# Predictive_AI_Maintenance_System

**Repository Path**: wu-nil/Predictive_AI_Maintenance_System

## Basic Information
- **Project Name**: Predictive_AI_Maintenance_System
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-11
- **Last Updated**: 2025-12-11

## Categories & Tags
**Categories**: Uncategorized
**Tags**: None

## README

# Predictive AI Maintenance System for Elevators

[![Python](https://img.shields.io/badge/Python-3.9%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/License-MIT-green)](LICENSE)
[![Status](https://img.shields.io/badge/Status-Master's_Thesis-orange)](https://github.com/Aladra8/Predictive_AI_Maintenance_System)

A reproducible, transparent Predictive Maintenance (PdM) pipeline for elevator systems using public telemetry data. This project establishes a baseline for detecting mechanical faults without ground-truth logs by combining physically grounded analytical labeling with a rigorous robustness protocol ("Leakage Guard").

---

## 📖 Table of Contents
- [Project Overview](#-project-overview)
- [Dataset & Feature Engineering](#-dataset--feature-engineering)
- [Methodology](#-methodology)
  - [Analytical Labeling](#1-analytical-labeling-2-of-3-rule)
  - [Leakage Guard Protocol](#2-leakage-guard-robustness)
- [Project Structure](#-project-structure)
- [Installation](#-installation)
- [Usage Pipeline](#-usage-pipeline)
- [Results](#-results)
- [Thesis Report](#-thesis-report)
- [Author](#-author)

---

## Project Overview

**Motivation:** Public research in elevator PdM is hindered by the lack of labeled datasets; most studies rely on proprietary "black box" data.

**Objective:** To create an auditable, open-source pipeline that:

1. Generates valid fault labels from raw telemetry.
2. Benchmarks an interpretable model (Random Forest) against a neural network (MLP).
3. Validates that the models learn physical failure precursors (Energy, Acceleration) rather than merely memorizing the labeling rule.
4. Aggregates high-frequency alerts into actionable maintenance tickets to reduce alarm fatigue.

---

## Dataset & Feature Engineering

**Source:** Huawei Munich Research Center (Zenodo/Kaggle).
**Raw Data:** 112,001 rows of high-frequency telemetry (approx. 4 Hz).

### Feature Mapping

The raw data contained anonymized features (`x1`...`x5`). Based on Exploratory Data Analysis (EDA) and physical correlations, we mapped them as follows:

| Original | Mapped Name | Description |
| :--- | :--- | :--- |
| `x1` | **Temperature** | Environmental context (slow drift). |
| `x2` | **Speed** | Correlates with motion phases. |
| `x3` | **Signal Strength** | IoT/network health proxy. |
| `x4` | **Energy** | Motor power consumption (peaks at start). |
| `x5` | **Motor Cycles** | Cumulative usage counter. |
| *Derived* | **Acceleration** | Discrete difference of Speed ($\Delta v$). |
| *Derived* | **Timestamp** | Synthesized via a Poisson process for time-series splits. |

---

## Methodology

### 1. Analytical Labeling ("2-of-3 Rule")

Since ground truth was unavailable, we developed a transparent labeling logic. A sample is flagged as **Faulty** if at least **2** of the following **3** sensors exceed their **95th percentile**:

* Vibration
* Speed
* Revolutions

*Validation:* Principal Component Analysis (PCA) confirmed that the labeled points cluster in a distinct "High Operational Intensity" manifold, indicating they are not random noise. A minimal sketch of this preprocessing and labeling step follows.
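The actual implementation of this step lives in `Code/src/preprocess_large_dataset.py`. As a quick illustration only, here is a minimal sketch of the derived features and the 2-of-3 rule; the column names `vibration` and `revolutions`, the file path, the rename, and the random seed are assumptions made for the example, not the repository's exact identifiers.

```python
import numpy as np
import pandas as pd

# Minimal sketch of the preprocessing and "2-of-3" labeling described above.
# Assumption: the raw CSV exposes `vibration` and `revolutions` columns
# alongside `x1`..`x5`; path and seed are illustrative only.
df = pd.read_csv("Code/data/raw/predictive-maintenance-dataset.csv")
df = df.rename(columns={"x2": "speed"})  # per the feature-mapping table

# Derived feature: Acceleration as the discrete difference of Speed.
df["acceleration"] = df["speed"].diff().fillna(0.0)

# Derived feature: synthetic timestamps from a Poisson arrival process,
# i.e. exponential inter-arrival times with a mean rate of ~4 Hz.
rng = np.random.default_rng(42)
gaps = rng.exponential(scale=0.25, size=len(df))  # mean 0.25 s -> ~4 Hz
df["timestamp"] = pd.Timestamp("2024-01-01") + pd.to_timedelta(gaps.cumsum(), unit="s")

# "2-of-3 rule": a row is Faulty when at least two of the three
# label-defining sensors exceed their 95th percentile.
sensors = ["vibration", "speed", "revolutions"]
thresholds = df[sensors].quantile(0.95)
df["fault"] = ((df[sensors] > thresholds).sum(axis=1) >= 2).astype(int)
```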
### 2. "Leakage Guard" (Robustness)

To prove the model isn't just "cheating" by reversing the labeling rule, we introduced the **Guard Regime** (a minimal sketch of both regimes follows the next subsection):

* **Standard Regime:** Train on all features.
* **Guard Regime:** **Drop** the label-defining features (Vibration, Speed, Revolutions). The model must predict faults using only secondary context (Temperature, Energy, Acceleration).

### 3. Event Aggregation

Raw row-level predictions (4 Hz) create too much noise, so we implemented an event logic:

* **Logic:** Merge consecutive faulty rows if the gap between them is < 60 seconds.
* **Output:** Discrete "Maintenance Events" with start time, duration, and intensity.
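The repository trains its models in `Code/src/phase2_training.py`; as a rough illustration of the two regimes, here is a minimal scikit-learn sketch. It continues from the labeling sketch above, so the DataFrame `df` and its column names carry over the same assumptions; the split and hyperparameters are illustrative, not the thesis configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Minimal sketch of the Standard vs. Guard regimes, reusing the labeled
# DataFrame `df` (assumed column names) from the sketch above.
label_defining = ["vibration", "speed", "revolutions"]
secondary = ["x1", "x4", "acceleration"]  # Temperature, Energy, Acceleration

for regime, features in {
    "standard": label_defining + secondary,  # train on all features
    "guard": secondary,                      # label-defining sensors dropped
}.items():
    X, y = df[features], df["fault"]
    # shuffle=False keeps chronological order, mimicking a time-series split
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{regime}: ROC-AUC = {auc:.3f}")
```

In the thesis pipeline, the same comparison is driven by the `--leakage-guard` flag shown in the Usage Pipeline section below.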
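The event analytics are produced inside the phase scripts; the sketch below illustrates only the merge rule stated above (gap < 60 s), again reusing the assumed `df` from the earlier sketches.

```python
import pandas as pd

# Minimal sketch of the event-aggregation rule: consecutive faulty rows
# merge into one maintenance event when the time gap to the previous
# faulty row is under 60 seconds. Assumes `df` from the sketches above.
faults = df.loc[df["fault"] == 1, ["timestamp"]].sort_values("timestamp")

# A gap of >= 60 s starts a new event; the cumulative sum yields event IDs.
new_event = faults["timestamp"].diff() >= pd.Timedelta(seconds=60)
faults["event_id"] = new_event.cumsum()

events = faults.groupby("event_id")["timestamp"].agg(start="min", end="max", rows="count")
events["duration"] = events["end"] - events["start"]
print(events.sort_values("rows", ascending=False).head(10))  # e.g. a "Top-10 Events" table
```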
---

## Project Structure

```text
Predictive_AI_Maintenance_System/
│
├── Code/
│   ├── src/
│   │   ├── preprocess_large_dataset.py   # ETL, Feature Engineering, Labeling
│   │   ├── phase1_noise_fs.py            # PCA, EDA, Threshold Grid
│   │   ├── phase2_training.py            # Model Training (RF, MLP), Calibration
│   │   ├── phase3_summarize.py           # Robustness Analysis & Tables
│   │   └── visualization/                # Plotting scripts
│   ├── data/
│   │   ├── raw/                          # Place 'predictive-maintenance-dataset.csv' here
│   │   └── processed/                    # Output CSVs appear here
│   ├── outputs/                          # Saved models (.pkl) and artifacts
│   └── requirements.txt                  # Python dependencies
│
├── Report/
│   ├── Full_Research_Report.tex          # Main Thesis Source
│   ├── references.bib                    # Bibliography
│   ├── Images/                           # Figures for the report
│   └── Tables/                           # CSV tables for LaTeX
│
└── README.md
```

---

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/Aladra8/Predictive_AI_Maintenance_System.git
   cd Predictive_AI_Maintenance_System
   ```

2. Set up a virtual environment (optional but recommended):

   ```bash
   python3 -m venv venv
   source venv/bin/activate   # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r Code/requirements.txt
   ```

---

## Usage Pipeline

Run the scripts in the following order to reproduce the thesis results.

**Phase 1: Data Preparation**

```bash
python Code/src/preprocess_large_dataset.py
```

**Phase 2: EDA & Structural Validation** — generates PCA plots, threshold visualizations, and distribution histograms to validate the labeling logic.

```bash
python Code/others/run_phase1_noise_fs.py
```

**Phase 3: Training & Benchmarking** — trains the Random Forest and MLP models, generates ROC/PR curves, and runs the Leakage Guard robustness test.

```bash
# Run standard training & save models
python Code/others/run_phase2.py --save-models

# Run robustness check (Leakage Guard)
python Code/others/run_phase2_training.py --leakage-guard
```

**Phase 4: Visualization & Events** — generates operational plots (events per day, faults by hour) and the Top-10 Events table.

```bash
python Code/src/visualization/visualize_combined_faults.py
# Note: event analytics are also generated during the Phase 1 & 2 outputs.
```

---

## Results

| Model | ROC-AUC | F1-Score | Avg. Precision | Interpretation |
| :--- | :--- | :--- | :--- | :--- |
| Random Forest | 0.999 | 0.985 | 0.998 | Highly interpretable (recommended) |
| MLP (Neural Net) | 0.999 | 0.982 | 0.997 | Black-box benchmark |

**Key Findings:**

* **Interpretability:** Random Forest matches the neural network's performance while offering superior transparency via feature importance and partial dependence plots.
* **Robustness:** Even in the Guard Regime (Vibration removed), the model maintained high precision, showing that it uses Energy and Acceleration as valid failure precursors.
* **Operations:** The event aggregation logic reduced ~5,600 raw fault rows to 507 actionable maintenance tickets.

---

## Thesis Report

The full academic report is available in the `Report/` directory and is written in LaTeX. To compile the PDF locally (VS Code):

1. Ensure a TeX distribution is installed (e.g., MacTeX on macOS).
2. Open `Report/Full_Research_Report.tex` in VS Code.
3. Run the build command via the LaTeX Workshop extension, using the recipe `latexmk (latexmk -> bibtex -> latexmk)`.

---

## Author

Buba Drammeh