A.E.R.I.S. — Autonomous Environmental RAG & Inference System

A self-hosted environmental intelligence platform that detects anomalies in real-time environmental data and generates causal explanations using a locally-hosted LLM with RAG and multi-source cross-referencing.

The Problem

Environmental hazards vary block-by-block and change hour-by-hour. Public data exists across dozens of agencies (EPA, NOAA, NASA, USGS) in incompatible formats. When a pollution spike hits your neighborhood, no system automatically detects it, determines the cause, and tells you what to do.

What AERIS Does

AERIS runs on a home server and:

Aggregates 9 real-time environmental data sources (air quality, weather, wildfires, satellite, traffic, water, energy)
Detects anomalies using a three-method engine (statistical, seasonal decomposition, isolation forest)
Explains causes via a locally-hosted LLM that cross-references all data sources through a RAG pipeline
Visualizes everything on an interactive map with real-time feeds, evidence panels, and natural language querying

All inference runs locally. No student data, health queries, or location data leaves the server.

Architecture

Home Server (Always-On)
├── Data Collectors ──── 9 public APIs (EPA, OpenAQ, PurpleAir, NOAA, NASA FIRMS,
│                        Sentinel-5P, TomTom, USGS, EIA)
├── PostgreSQL + TimescaleDB ──── Time-series storage
├── Anomaly Detection ──── Z-score | STL decomposition | Isolation Forest
├── Ollama (Llama 3 8B) ──── Local LLM inference
├── ChromaDB ──── RAG vector store
└── FastAPI ──── REST API + WebSocket

Web Application (React)
├── Interactive Map ──── Mapbox GL JS with sensor/anomaly/satellite layers
├── Anomaly Feed ──── Real-time detected anomalies with LLM summaries
├── Anomaly Detail ──── Full explanation + evidence panels + health advisory
├── NL Query ──── "Why was air quality bad yesterday?"
└── System Dashboard ──── Collection status, model metrics, server health

Tech Stack

Layer	Technology
Backend	Python 3.11+, FastAPI, SQLAlchemy
Database	PostgreSQL + TimescaleDB
Vector Store	ChromaDB
Local LLM	Ollama (Llama 3 8B)
ML	scikit-learn, statsmodels
Frontend	React 18, TypeScript, Vite
Mapping	Mapbox GL JS
Charts	Recharts
Styling	Tailwind CSS

Data Sources

Source	Data	Frequency
EPA AirNow	AQI, PM2.5, ozone, NO2, SO2, CO	Hourly
OpenAQ	Global government air quality monitors	Hourly
PurpleAir	Crowdsourced hyperlocal PM2.5	~2 min
NOAA / OpenWeather	Weather (temp, wind, humidity, pressure)	Hourly
NASA FIRMS	Active wildfire locations	~3 hours
Sentinel-5P	Satellite NO2, SO2, CO columns	Daily
TomTom	Real-time traffic density	~15 min
USGS Water Services	Stream flow, water quality	15 min - hourly
EIA Open Data	Power plant emissions, grid carbon	Hourly - daily

Research

Question: Can locally-hosted LLMs generate accurate causal explanations for environmental anomalies by cross-referencing heterogeneous data sources?

Evaluation:

Expert-labeled anomaly ground truth (50-100 events)
Local (Llama 3 8B) vs. cloud (GPT 5.4, Gemini 3 Thinking) comparison
Automated hallucination detection accuracy
User comprehension and actionability study

Getting Started

Prerequisites

Python 3.11+
Node.js 18+
PostgreSQL 15+ with TimescaleDB extension
Ollama with Llama 3 8B pulled

Setup

# Clone
git clone https://github.com/<your-username>/aeris.git
cd aeris

# Backend
cd server
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env  # fill in API keys
uvicorn app.main:app --reload

# Frontend (new terminal)
cd client
npm install
npm run dev

Environment Variables

Copy server/.env.example and fill in:

DATABASE_URL — PostgreSQL connection string
AIRNOW_API_KEY — EPA AirNow
PURPLEAIR_API_KEY — PurpleAir
OPENWEATHER_API_KEY — OpenWeather
TOMTOM_API_KEY — TomTom traffic
EIA_API_KEY — EIA energy data
MAPBOX_TOKEN — Mapbox GL JS
NASA_EARTHDATA_TOKEN — Sentinel-5P satellite data

Roadmap

[x] Design specification
[ ] Month 1: Server infrastructure + data pipeline (9 collectors)
[ ] Month 2: Anomaly detection engine + LLM explanation pipeline
[ ] Month 3: Web application (map, feed, detail, query, dashboard)
[ ] Month 4: Research evaluation + polish
[ ] Month 5: Paper, competition submissions, stretch goals

License

MIT

Built With

python

Updates

Mason Cao started this project — Apr 19, 2026 05:41 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.