A.E.R.I.S. — Autonomous Environmental RAG & Inference System
A self-hosted environmental intelligence platform that detects anomalies in real-time environmental data and generates causal explanations using a locally-hosted LLM with RAG and multi-source cross-referencing.
The Problem
Environmental hazards vary block-by-block and change hour-by-hour. Public data exists across dozens of agencies (EPA, NOAA, NASA, USGS) in incompatible formats. When a pollution spike hits your neighborhood, no system automatically detects it, determines the cause, and tells you what to do.
What AERIS Does
AERIS runs on a home server and:
- Aggregates 9 public environmental data sources, refreshed anywhere from every ~2 minutes to daily (air quality, weather, wildfires, satellite, traffic, water, energy)
- Detects anomalies using a three-method engine (statistical, seasonal decomposition, isolation forest)
- Explains causes via a locally-hosted LLM that cross-references all data sources through a RAG pipeline
- Visualizes everything on an interactive map with real-time feeds, evidence panels, and natural language querying
All inference runs locally. No student data, health queries, or location data leaves the server.
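As a rough illustration of the statistical leg of the detection engine, the sketch below flags points that sit far from a trailing window's mean. The function name, the 24-sample window, and the 3.0 threshold are assumptions made for this example, not values taken from the AERIS design; the STL and Isolation Forest methods (from statsmodels and scikit-learn) are not shown.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=24, threshold=3.0):
    """Return indices whose value deviates from the trailing window
    by more than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # the `window` samples before point i
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: the z-score is undefined, so skip
        z = (values[i] - mu) / sigma
        if abs(z) > threshold:
            flagged.append(i)
    return flagged


# Toy hourly PM2.5 series (made-up numbers): a stable day, then a spike.
readings = [10.0, 11.0] * 12 + [60.0]
spikes = rolling_zscore_anomalies(readings)  # the final value (index 24) is flagged
```

A trailing window keeps the check cheap enough to run on every new reading, but it does not account for daily or seasonal cycles, which is presumably why the engine pairs it with seasonal decomposition.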
Architecture
Home Server (Always-On)
├── Data Collectors ──── 9 public APIs (EPA, OpenAQ, PurpleAir, NOAA, NASA FIRMS,
│ Sentinel-5P, TomTom, USGS, EIA)
├── PostgreSQL + TimescaleDB ──── Time-series storage
├── Anomaly Detection ──── Z-score | STL decomposition | Isolation Forest
├── Ollama (Llama 3 8B) ──── Local LLM inference
├── ChromaDB ──── RAG vector store
└── FastAPI ──── REST API + WebSocket
Web Application (React)
├── Interactive Map ──── Mapbox GL JS with sensor/anomaly/satellite layers
├── Anomaly Feed ──── Real-time detected anomalies with LLM summaries
├── Anomaly Detail ──── Full explanation + evidence panels + health advisory
├── NL Query ──── "Why was air quality bad yesterday?"
└── System Dashboard ──── Collection status, model metrics, server health
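To make the explanation path through the server concrete, here is a minimal sketch of how retrieved evidence might be packed into a prompt and sent to Ollama's local HTTP API. The prompt wording, the `build_prompt` and `explain` helpers, the evidence dictionary shape, and the `llama3:8b` model tag are assumptions for illustration; retrieval from ChromaDB is assumed to have already happened.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(anomaly, evidence):
    """Combine an anomaly description with numbered evidence snippets.

    `evidence` is a list of dicts with hypothetical keys "source" and "text".
    """
    lines = [
        "You are an environmental analyst. Explain the likely cause of this",
        "anomaly using only the evidence listed. Say so if it is insufficient.",
        "",
        f"Anomaly: {anomaly}",
        "Evidence:",
    ]
    for number, item in enumerate(evidence, start=1):
        lines.append(f"[{number}] ({item['source']}) {item['text']}")
    return "\n".join(lines)


def explain(anomaly, evidence, model="llama3:8b"):
    """Send the prompt to a locally running Ollama and return its text reply."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(anomaly, evidence), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


prompt = build_prompt(
    "PM2.5 spike at a neighborhood sensor",
    [{"source": "NASA FIRMS", "text": "Active fire detected upwind"}],
)
```

Numbering the evidence lets the model cite which snippet supports each claim, which also gives a later hallucination check something to verify against.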
Tech Stack
| Layer | Technology |
|---|---|
| Backend | Python 3.11+, FastAPI, SQLAlchemy |
| Database | PostgreSQL + TimescaleDB |
| Vector Store | ChromaDB |
| Local LLM | Ollama (Llama 3 8B) |
| ML | scikit-learn, statsmodels |
| Frontend | React 18, TypeScript, Vite |
| Mapping | Mapbox GL JS |
| Charts | Recharts |
| Styling | Tailwind CSS |
Data Sources
| Source | Data | Frequency |
|---|---|---|
| EPA AirNow | AQI, PM2.5, ozone, NO2, SO2, CO | Hourly |
| OpenAQ | Global government air quality monitors | Hourly |
| PurpleAir | Crowdsourced hyperlocal PM2.5 | ~2 min |
| NOAA / OpenWeather | Weather (temp, wind, humidity, pressure) | Hourly |
| NASA FIRMS | Active wildfire locations | ~3 hours |
| Sentinel-5P | Satellite NO2, SO2, CO columns | Daily |
| TomTom | Real-time traffic density | ~15 min |
| USGS Water Services | Stream flow, water quality | 15 min - hourly |
| EIA Open Data | Power plant emissions, grid carbon | Hourly - daily |
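Because these sources differ in format and cadence, cross-referencing needs a shared record shape plus a way to ask "what else was observed nearby, around the same time?". The sketch below shows one possible approach. The `Reading` fields, the 25 km radius, and the 6 hour window are assumptions for illustration, not part of the AERIS specification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Reading:
    """Hypothetical normalized record shared by all collectors."""
    source: str
    metric: str
    value: float
    unit: str
    lat: float
    lon: float
    observed_at: datetime  # timezone-aware, UTC


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (mean Earth radius 6371 km)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def related_readings(anchor, candidates, max_km=25.0, max_gap=timedelta(hours=6)):
    """Readings from other sources that are close to `anchor` in space and time."""
    return [
        r for r in candidates
        if r.source != anchor.source
        and abs(r.observed_at - anchor.observed_at) <= max_gap
        and haversine_km(anchor.lat, anchor.lon, r.lat, r.lon) <= max_km
    ]


# Made-up example: a PM2.5 spike and three candidate readings.
noon = datetime(2024, 7, 1, 12, tzinfo=timezone.utc)
anchor = Reading("PurpleAir", "pm25", 88.0, "ug/m3", 34.05, -118.25, noon)
candidates = [
    Reading("NASA FIRMS", "fire", 1.0, "count", 34.15, -118.25, noon - timedelta(hours=2)),
    Reading("NASA FIRMS", "fire", 1.0, "count", 36.00, -118.25, noon),  # too far away
    Reading("TomTom", "density", 0.9, "ratio", 34.05, -118.25, noon - timedelta(days=2)),  # too old
]
matches = related_readings(anchor, candidates)
```

In a deployment backed by PostgreSQL and TimescaleDB, the same filter would more likely be expressed as a database query; the in-memory version only shows the logic.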
Research
Question: Can locally-hosted LLMs generate accurate causal explanations for environmental anomalies by cross-referencing heterogeneous data sources?
Evaluation:
- Expert-labeled anomaly ground truth (50-100 events)
- Local (Llama 3 8B) vs. cloud (GPT 5.4, Gemini 3 Thinking) comparison
- Automated hallucination detection accuracy
- User comprehension and actionability study
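One simple way the local versus cloud comparison could be scored is exact-match accuracy between each model's predicted cause category and the expert label. This is an assumed scoring scheme for illustration only; the source does not specify the metric, and the labels and records below are placeholders rather than results.

```python
from collections import defaultdict


def accuracy_by_model(records):
    """Compute per-model accuracy.

    `records` is an iterable of (model, expert_cause, predicted_cause) tuples.
    Labels are compared case-insensitively after trimming whitespace.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for model, expert, predicted in records:
        totals[model] += 1
        if expert.strip().lower() == predicted.strip().lower():
            correct[model] += 1
    return {model: correct[model] / totals[model] for model in totals}


# Placeholder records, not real evaluation data.
records = [
    ("local", "wildfire", "Wildfire"),
    ("local", "traffic", "industrial"),
    ("cloud", "wildfire", "wildfire"),
]
scores = accuracy_by_model(records)
```

Exact match only works if explanations are first mapped to a fixed set of cause categories; free-text explanations would need expert grading or a rubric instead.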
Getting Started
Prerequisites
- Python 3.11+
- Node.js 18+
- PostgreSQL 15+ with TimescaleDB extension
- Ollama with Llama 3 8B pulled
Setup
# Clone
git clone https://github.com/<your-username>/aeris.git
cd aeris
# Backend
cd server
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env # fill in API keys
uvicorn app.main:app --reload
# Frontend (new terminal, from the repository root)
cd client
npm install
npm run dev
Environment Variables
Copy server/.env.example to server/.env and fill in:
- DATABASE_URL: PostgreSQL connection string
- AIRNOW_API_KEY: EPA AirNow
- PURPLEAIR_API_KEY: PurpleAir
- OPENWEATHER_API_KEY: OpenWeather
- TOMTOM_API_KEY: TomTom traffic
- EIA_API_KEY: EIA energy data
- MAPBOX_TOKEN: Mapbox GL JS
- NASA_EARTHDATA_TOKEN: Sentinel-5P satellite data
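A startup check can catch missing settings before any collector runs. The sketch below is one way to do it, assuming the variables have already been loaded into the process environment; the `missing_env_vars` helper is hypothetical and not part of the project.

```python
import os

# Variable names as listed in server/.env.example.
REQUIRED_VARS = (
    "DATABASE_URL",
    "AIRNOW_API_KEY",
    "PURPLEAIR_API_KEY",
    "OPENWEATHER_API_KEY",
    "TOMTOM_API_KEY",
    "EIA_API_KEY",
    "MAPBOX_TOKEN",
    "NASA_EARTHDATA_TOKEN",
)


def missing_env_vars(environ=os.environ):
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

Passing a plain dictionary as `environ` makes the check easy to exercise in tests without touching the real environment.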
Roadmap
- [x] Design specification
- [ ] Month 1: Server infrastructure + data pipeline (9 collectors)
- [ ] Month 2: Anomaly detection engine + LLM explanation pipeline
- [ ] Month 3: Web application (map, feed, detail, query, dashboard)
- [ ] Month 4: Research evaluation + polish
- [ ] Month 5: Paper, competition submissions, stretch goals
License
MIT