- Live data can also be fed in. The live input is currently limited to 50 observations, but this can be scaled up.
- An anomaly score is generated from the data, and each sensor's readings (graph) can then be used by the AI to look for a probable cause.
- All generated findings can be reported; the probable cause is also generated and added to the PDF report.
- Main dashboard: select the Engine No. and Sensor No., and the corresponding telemetry data is displayed.
- The report is saved as a PDF.
✈️ AI-Powered Aviation Anomaly Detector
Hybrid Deep Learning + Generative AI Diagnostic System
🧩 Abstract
This project implements an AI-driven aviation engine diagnostic platform that integrates a sequence-based anomaly detection model (LSTM Autoencoder) with a Large Language Model (LLM) for natural-language diagnosis generation.
The system analyzes multi-sensor turbofan engine data (NASA C-MAPSS datasets) to:
- Detect operational anomalies,
- Identify deviating sensors,
- Generate expert-level technical reports via Gemini (Google Generative Language API),
- Produce fully formatted PDF diagnostic reports.
The application is deployed via Streamlit, enabling interactive data exploration and real-time anomaly assessment for both historical and live sensor feeds.
⚙️ System Architecture
┌────────────────────────────────────────────┐
│ Streamlit Frontend │
│ (Interactive UI + Visualization) │
└────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────┐
│ Data Pipeline & Preprocessing │
│ - Load NASA FD001–FD004 datasets │
│ - Apply MinMaxScaler (global normalization)│
└────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────┐
│ LSTM Autoencoder (Keras) │
│ - Sequence length = 50 cycles │
│ - 21 sensor inputs │
│ - Trained on normal operating data │
│ - Computes reconstruction error (MAE) │
└────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────┐
│ Anomaly Investigation Engine │
│ - Identify top 3 deviating sensors │
│ - Pass findings to Gemini LLM │
│ - Generate human-readable technical report │
└────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────┐
│ PDF Report Generator (FPDF) │
│ - Summarizes results, plots, LLM analysis │
│ - Exports professional-grade report │
└────────────────────────────────────────────┘
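The preprocessing and windowing stages in the diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the helper names `min_max_scale` and `make_windows` are hypothetical, the scaling is written out in NumPy to mirror what scikit-learn's `MinMaxScaler` does with its default [0, 1] range, and a random array stands in for one engine's sensor history.

```python
import numpy as np

SEQ_LEN = 50     # cycles per window, as in the diagram
N_SENSORS = 21   # sensor channels, as in the diagram


def min_max_scale(data, data_min, data_max):
    """Scale each sensor column to [0, 1] using precomputed min/max values."""
    span = data_max - data_min
    span = np.where(span == 0, 1.0, span)  # constant sensors map to 0 instead of NaN
    return (data - data_min) / span


def make_windows(series, seq_len=SEQ_LEN):
    """Slice a (cycles, sensors) array into overlapping (seq_len, sensors) windows."""
    n_windows = series.shape[0] - seq_len + 1
    if n_windows <= 0:
        return np.empty((0, seq_len, series.shape[1]))
    return np.stack([series[i:i + seq_len] for i in range(n_windows)])


# Placeholder data standing in for one engine's history of 120 cycles.
rng = np.random.default_rng(0)
engine = rng.normal(size=(120, N_SENSORS))

# For global normalization the min/max would come from the training set;
# the engine's own range is used here only to keep the sketch self-contained.
scaled = min_max_scale(engine, engine.min(axis=0), engine.max(axis=0))
windows = make_windows(scaled)
print(windows.shape)  # (71, 50, 21)
```

Each window of 50 cycles then becomes one input sequence for the autoencoder.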
🧠 Core Components
1. LSTM Autoencoder
- Architecture:
  - Encoder: 3 stacked LSTM layers compressing temporal signals
  - Bottleneck latent representation
  - Decoder: LSTM layers reconstructing input sequence
- Objective Function:
  Mean Absolute Error (MAE) between original and reconstructed sensor signals.
- Anomaly Score:
  MAE = (1/n) * Σ |xᵢ - x̂ᵢ|
where:
- xᵢ is the observed sensor value
- x̂ᵢ is the reconstructed sensor value
- n is the number of data points
A high MAE indicates abnormal sensor behavior or a potential fault.
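The architecture and score described above could look roughly like this in Keras. Only the three encoder LSTM layers, the 50-cycle window, the 21 sensors and the MAE objective come from this README; the layer widths `(64, 32, 16)`, the two-layer decoder, the Adam optimizer and the function names are assumptions made for illustration.

```python
import numpy as np


def build_lstm_autoencoder(seq_len=50, n_sensors=21, units=(64, 32, 16)):
    """Build a small LSTM autoencoder; the layer widths are illustrative guesses."""
    # Imported here so the scoring helper below works without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(seq_len, n_sensors)),
        # Encoder: three stacked LSTM layers; the last one returns only its final output.
        layers.LSTM(units[0], return_sequences=True),
        layers.LSTM(units[1], return_sequences=True),
        layers.LSTM(units[2]),  # bottleneck latent vector
        # Decoder: repeat the bottleneck for every time step and unroll it again.
        layers.RepeatVector(seq_len),
        layers.LSTM(units[1], return_sequences=True),
        layers.LSTM(units[0], return_sequences=True),
        layers.TimeDistributed(layers.Dense(n_sensors)),
    ])
    model.compile(optimizer="adam", loss="mae")  # MAE objective, as stated above
    return model


def anomaly_score(x, x_hat):
    """Mean absolute error between a window and its reconstruction."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return float(np.mean(np.abs(x - x_hat)))


# A reconstruction that is off by 0.1 everywhere gives a score of 0.1.
x = np.zeros((50, 21))
x_hat = np.full((50, 21), 0.1)
print(round(anomaly_score(x, x_hat), 4))  # 0.1
```

In use, `x_hat` would come from `model.predict` on a batch of scaled windows, with the score computed per window.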
2. LLM-Powered Diagnostic Engine (Gemini API)
- Uses Google’s generativelanguage.googleapis.com endpoint (gemini-2.5-flash-preview-05-20).
- Prompt engineering integrates:
  - Anomaly score
  - Top deviating sensors
- Returns structured text containing:
  - Expert Diagnosis
  - Probable Root Cause
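As a rough sketch of how the anomaly score and top sensors might be turned into a request, the snippet below assembles a prompt and the JSON body for a `generateContent` call. The prompt wording, the helper names `build_prompt` and `build_payload`, and the `v1beta` API version are assumptions for illustration; they are not the project's actual prompt.

```python
import json

GEMINI_MODEL = "gemini-2.5-flash-preview-05-20"  # model name taken from the text above
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{GEMINI_MODEL}:generateContent"
)


def build_prompt(anomaly_score, top_sensors):
    """Assemble a diagnosis prompt (the wording here is hypothetical)."""
    sensors = ", ".join(top_sensors)
    return (
        "You are an aviation engine diagnostics expert.\n"
        f"Anomaly score (reconstruction MAE): {anomaly_score:.4f}\n"
        f"Top deviating sensors: {sensors}\n"
        "Respond with two sections: 'Expert Diagnosis' and 'Probable Root Cause'."
    )


def build_payload(prompt):
    """Wrap the prompt in the JSON body used by generateContent."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


payload = build_payload(build_prompt(0.1624, ["s3", "s9", "s14"]))
print(json.dumps(payload, indent=2))
```

With the `requests` library, the payload could then be sent with something like `requests.post(GEMINI_URL, params={"key": api_key}, json=payload, timeout=30)`, and the generated text would typically be read from `response.json()["candidates"][0]["content"]["parts"][0]["text"]`.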
Example Prompt:
3. Streamlit User Interface
- Tab 1 – Test Data Analysis:
  Load and visualize engines from NASA FD001–FD004 datasets.
- Tab 2 – Live Data Analysis:
  Paste or stream real-time sensor readings for anomaly detection.
- Interactive plots built with Plotly and Matplotlib.
- PDF export via FPDF library.
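For the live-data tab, pasted text has to become a numeric window before it can be scored. The helper below is a hypothetical sketch of that step, assuming one row per cycle with 21 comma- or whitespace-separated values and exactly 50 rows (the current live-data limit); the function name and the input format are not taken from the app.

```python
import numpy as np

SEQ_LEN = 50     # rows (cycles) expected in one pasted window
N_SENSORS = 21   # values expected per row


def parse_pasted_readings(text, seq_len=SEQ_LEN, n_sensors=N_SENSORS):
    """Turn pasted rows of comma- or whitespace-separated numbers into an array."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = [float(tok) for tok in line.replace(",", " ").split()]
        if len(values) != n_sensors:
            raise ValueError(f"expected {n_sensors} values per row, got {len(values)}")
        rows.append(values)
    if len(rows) != seq_len:
        raise ValueError(f"expected {seq_len} rows, got {len(rows)}")
    return np.array(rows)


# Example: 50 identical rows of 21 readings.
sample = "\n".join(",".join(["0.5"] * 21) for _ in range(50))
window = parse_pasted_readings(sample)
print(window.shape)  # (50, 21)
```

Rejecting malformed input early keeps shape errors out of the model call and gives the user a readable message.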
📦 Folder Structure
project_root/
│
├── app.py ← Streamlit app (main UI)
├── expert_rules.py
├── run_combined.py
├── run_project_fd001.py
├── run_project_fd002.py
├── run_project_fd003.py
├── run_project_fd004.py
│
├── data/
│ ├── readme.txt
│ ├── train_FD001.txt
│ ├── test_FD001.txt
│ ├── RUL_FD001.txt
│ ├── Damage Propagation Modeling.pdf
│ └── ... (same for FD002–FD004)
│
├── models/ ← Trained models (.h5)
├── plots/ ← Generated charts/reports
├── venv/ ← Virtual environment
├── .env ← Contains secrets (DO NOT COMMIT)
└── .gitignore ← Controls what’s excluded from Git
🔒 Security and Privacy
Sensitive components not committed:
- .env → contains GOOGLE_API_KEY
- /models/ → stores trained weights
- /data/ → raw NASA datasets
- /plots/ → runtime outputs
- /venv/ → local environment
All secrets are loaded via:
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv("GOOGLE_API_KEY")
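Because `os.getenv` returns `None` when the variable is missing, it can help to fail early with a clear message instead of sending an empty key to the API. The helper below is a hypothetical addition (`require_api_key` is not part of the project) that sketches this check.

```python
import os


def require_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key, or raise a clear error if it is missing or empty."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key


# Passing a dict instead of os.environ makes the check easy to try out.
print(require_api_key({"GOOGLE_API_KEY": "example-key"}))  # example-key
```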
🧪 Example Workflow
- Select Dataset → e.g., FD002
- Choose Engine ID
- Click Generate PDF Report
The App Performs:
- Computes reconstruction MAE (Mean Absolute Error)
- Identifies abnormal sensors
- Queries Gemini API for natural-language diagnosis
- Outputs a formatted PDF report with plots and LLM explanations
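One plausible way to identify the abnormal sensors is to rank them by their own reconstruction error over the window. The sketch below assumes that approach; the function name `top_deviating_sensors`, the `s1`..`s21` labels and the synthetic arrays are illustrative and may differ from the app's logic.

```python
import numpy as np

# Sensor labels s1..s21 (label format assumed for illustration).
SENSOR_NAMES = [f"s{i}" for i in range(1, 22)]


def top_deviating_sensors(x, x_hat, k=3, names=SENSOR_NAMES):
    """Rank sensors by mean absolute reconstruction error over the window."""
    per_sensor = np.mean(np.abs(np.asarray(x) - np.asarray(x_hat)), axis=0)
    order = np.argsort(per_sensor)[::-1][:k]  # largest error first
    return [(names[i], float(per_sensor[i])) for i in order]


# Synthetic window: the reconstruction is off on three sensors only.
x = np.zeros((50, 21))
x_hat = np.zeros((50, 21))
x_hat[:, 2] = 0.30   # s3 deviates most
x_hat[:, 8] = 0.20   # s9
x_hat[:, 13] = 0.10  # s14
print([name for name, _ in top_deviating_sensors(x, x_hat)])  # ['s3', 's9', 's14']
```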
📊 Example Output
Anomaly Score: 0.1624
Top Deviating Sensors: s3, s9, s14
LLM Output:
Expert Diagnosis: Compressor efficiency degradation detected.
Probable Root Cause: Stage 2 turbine wear leading to increased vibration and thermal imbalance.
Generated Report:
Anomaly_Report_Engine_FD002_123.pdf
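A text-only version of the report step might look like the sketch below. The filename pattern is inferred from the example name above, and the page layout, fonts and function names are assumptions; the actual report also embeds plots, which this sketch leaves out.

```python
def report_filename(dataset, engine_id):
    """Build a report name following the pattern of the example above."""
    return f"Anomaly_Report_Engine_{dataset}_{engine_id}.pdf"


def write_report(path, score, sensors, diagnosis, root_cause):
    """Write a one-page, text-only PDF report."""
    from fpdf import FPDF  # imported here so report_filename works without fpdf

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", "B", 14)
    pdf.cell(0, 10, "Engine Anomaly Report")
    pdf.ln(12)  # move below the title line
    pdf.set_font("Helvetica", "", 11)
    body = (
        f"Anomaly Score: {score:.4f}\n"
        f"Top Deviating Sensors: {', '.join(sensors)}\n\n"
        f"Expert Diagnosis: {diagnosis}\n"
        f"Probable Root Cause: {root_cause}"
    )
    pdf.multi_cell(0, 7, body)  # wraps long lines automatically
    pdf.output(path)
    return path


print(report_filename("FD002", 123))  # Anomaly_Report_Engine_FD002_123.pdf
```

Calling `write_report(report_filename("FD002", 123), 0.1624, ["s3", "s9", "s14"], diagnosis, root_cause)` would then save the PDF in the current directory. The built-in PDF fonts cover only Latin-1 characters, so LLM output containing other symbols would need cleaning or a Unicode font first.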
🧠 Dependencies
| Library | Purpose |
|---|---|
| tensorflow.keras | LSTM Autoencoder architecture |
| sklearn.preprocessing | MinMax scaling |
| streamlit | Interactive web application |
| plotly, matplotlib | Visualization |
| fpdf | PDF generation |
| requests | Gemini API calls |
| python-dotenv | Secure environment variable management |
| numpy, pandas | Data manipulation |
Install via:
pip install -r requirements.txt
🧠 Author
Arya Keskar
📧 dcaryakeskar@gmail.com