Inspiration

I've been frustrated with system monitoring tools for years. Task Manager tells me my CPU is at 47% right now, but it can't tell me what happened five minutes ago when my fan suddenly spun up and the laptop got hot. By the time I open it, the moment has already passed.

What I really wanted was a dashcam for my operating system — something that records continuously, so when something feels off I can rewind and see what process was responsible.

The other thing I noticed: even when monitoring tools show you data, they don't interpret it. A graph going up means nothing to most people. So I wanted to pair the recording with an AI that could read the data and explain in plain English what it sees — and importantly, distinguish whether a problem is hardware (like a failing drive or worn battery) or software (like a runaway process).

That combination — continuous recording, statistical anomaly detection, plain-English AI diagnosis, all running locally — is ProcessLens.

What it does

ProcessLens is a real-time system forensics tool for Windows. It does four things continuously in the background:

  • Samples your system every second — running processes, CPU, RAM, disk I/O, GPU load, hardware sensors, per-core temperatures
  • Detects anomalies using a rolling 3-sigma threshold that adapts to whatever's normal for your machine, right now
  • Attributes suspect processes by computing per-process delta contribution during the anomaly window — so it identifies the rogue process that just spiked, not just the process that's always biggest
  • Generates AI diagnosis in structured plain English, distinguishing hardware issues from software issues, with prioritized recommendations

The dashboard shows live charts of all metrics, an anomaly sidebar that updates the moment something fires, and a top-processes table. Click any anomaly to zoom the chart and see ranked suspects with delta contributions. From there you can terminate the process directly, or click the AI diagnose button to get a structured analysis from Claude, OpenAI, Gemini, or a fully local Ollama model.

Beyond detection, it also does deep hardware enumeration — storage drives with SMART failure prediction, monitors with EDID-decoded model and manufacture year, RAM modules with manufacturer and speed, battery wear percentage, peripherals like trackpads and cameras and fingerprint readers — and feeds all of that into the LLM as diagnostic context. So when the AI says "this is likely a software issue, not hardware," it actually has hardware data to back that up.

Everything is local. SQLite persists anomalies across restarts. With Ollama as the LLM, even the diagnosis stays on your machine — no telemetry, nothing in the cloud.
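
As a rough illustration of the persistence idea, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and the open_store / save_anomaly / load_anomalies helpers are hypothetical and chosen only for this example; they are not ProcessLens's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per detected anomaly.
SCHEMA = """
CREATE TABLE IF NOT EXISTS anomalies (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    metric    TEXT NOT NULL,
    started   REAL NOT NULL,
    peak      REAL NOT NULL,
    threshold REAL NOT NULL
)
"""
COLUMNS = ("metric", "started", "peak", "threshold")


def open_store(path=":memory:"):
    """Open (or create) the anomaly database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_anomaly(conn, metric, started, peak, threshold):
    """Insert one anomaly; the context manager commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO anomalies (metric, started, peak, threshold) "
            "VALUES (?, ?, ?, ?)",
            (metric, started, peak, threshold),
        )


def load_anomalies(conn):
    """Return all stored anomalies, oldest first, as plain dicts."""
    rows = conn.execute(
        "SELECT metric, started, peak, threshold FROM anomalies ORDER BY started"
    )
    return [dict(zip(COLUMNS, row)) for row in rows]
```

On startup, something like load_anomalies would feed the stored rows back into the detector so the sidebar is not empty after a restart.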

How we built it

I went architecture-first. Before writing a single feature, I built a layered engine system: an abstract polling thread base class, then process and hardware engines that each run on their own background thread at their own polling rate, then an aggregator that unifies their snapshots into a single time-series with a 2-hour rolling buffer, then a detector that scans that buffer for anomalies. Each layer reads only from the layer below.
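
The base layer might look roughly like the sketch below. Only the names Engine, poll(), snapshot(), and history() come from this writeup; the constructor arguments, the locking, and the toy CounterEngine are assumptions made for illustration.

```python
import threading
import time
from abc import ABC, abstractmethod
from collections import deque


class Engine(ABC):
    """Polls on its own background thread and keeps a bounded history."""

    def __init__(self, interval_s, max_samples):
        self.interval_s = interval_s
        self._history = deque(maxlen=max_samples)  # old samples fall off the front
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    @abstractmethod
    def poll(self):
        """Take one reading and return it as a dict."""

    def _run(self):
        while not self._stop.is_set():
            sample = {"ts": time.time(), **self.poll()}
            with self._lock:
                self._history.append(sample)
            self._stop.wait(self.interval_s)  # sleeps, but wakes early on stop()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def snapshot(self):
        """Most recent sample, or None if nothing has been polled yet."""
        with self._lock:
            return self._history[-1] if self._history else None

    def history(self):
        """Copy of the rolling buffer, oldest first."""
        with self._lock:
            return list(self._history)


class CounterEngine(Engine):
    """Toy engine used only to exercise the base class."""

    def __init__(self):
        super().__init__(interval_s=0.01, max_samples=5)
        self._n = 0

    def poll(self):
        self._n += 1
        return {"value": self._n}
```

With one sample per second, a 2-hour rolling buffer corresponds to max_samples=7200. Higher layers only ever call snapshot() and history(), which is what keeps them decoupled from how a given engine gathers its data.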

This paid off enormously. Once the engine pipeline was working, every feature became "subscribe to the aggregator's history" or "add an endpoint that reads from the detector." No coupling, no global state, easy to test.

For hardware sensors, I integrated LibreHardwareMonitorLib — a .NET library — into Python via pythonnet. This gives access to per-core CPU loads, GPU memory, motherboard sensors, and other things psutil can't see. For deeper hardware enumeration (RAM modules, monitors, SMART status, battery wear) I used WMI queries directly, including the root\wmi namespace for things like WmiMonitorID (with EDID decoding) and MSStorageDriver_FailurePredictStatus.
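
Two of the small post-processing steps behind that inventory can be sketched as pure functions. This assumes, as is typical for WmiMonitorID, that string fields arrive as arrays of character codes padded with zeros, and that battery capacities are given in the same unit (for example mWh). The function names are hypothetical.

```python
def decode_wmi_chars(codes):
    """Turn a zero-padded array of character codes into a string."""
    return "".join(chr(c) for c in codes if c)


def battery_health_percent(design_capacity, full_charge_capacity):
    """Current full-charge capacity as a percentage of design capacity."""
    if design_capacity <= 0:
        raise ValueError("design capacity must be positive")
    return 100.0 * full_charge_capacity / design_capacity


def battery_wear_percent(design_capacity, full_charge_capacity):
    """Share of the original capacity that has been lost."""
    return 100.0 - battery_health_percent(design_capacity, full_charge_capacity)
```

For example, a pack designed for 50000 mWh that now tops out at 36500 mWh reports 73% health, i.e. 27% wear.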

The detector uses rolling 3-sigma anomaly detection. For each metric, every 3 seconds it computes mean and standard deviation over the last 2 minutes (excluding the very recent window to avoid self-poisoning), derives a threshold of max(μ + 3σ, floor), and fires only when 3 consecutive samples exceed it. The "rolling" part is critical — it adapts to whatever the machine is currently doing, so an ongoing compile gradually becomes "the new normal" while a sudden spike on top of it still fires.
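
The rule described above can be sketched in a few lines. This assumes one sample per second (so 120 samples is 2 minutes) and a 10-sample "recent" exclusion window; that exclusion length and the function names are guesses for illustration, not the project's actual values.

```python
from statistics import fmean, pstdev


def threshold(baseline, floor, k=3.0):
    """max(mu + k*sigma, floor) over the baseline samples."""
    mu = fmean(baseline)
    sigma = pstdev(baseline)
    return max(mu + k * sigma, floor)


def is_anomalous(series, floor, window_len=120, recent_len=10, consecutive=3):
    """True when the last `consecutive` samples all exceed the rolling threshold.

    The baseline is the last `window_len` samples minus the most recent
    `recent_len`, so a spike in progress does not inflate its own threshold.
    """
    if len(series) < window_len:
        return False  # not enough history to estimate a baseline
    baseline = series[-window_len:-recent_len]
    limit = threshold(baseline, floor)
    return all(value > limit for value in series[-consecutive:])
```

On a quiet machine oscillating between 1.0% and 1.2% CPU, mu + 3*sigma is only about 1.4%, so a jump to 2% would fire without a floor. With a floor of 25 the same jump is ignored, while a jump to 90% sustained for three samples still fires.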

For suspect attribution, I compute per-process delta contribution: for each process, average resource usage during the anomaly minus average in the 10 seconds before. This catches change agents instead of incumbents — the small Python process that just ballooned to 1.1GB, not Chrome which has been at 4GB all day. I also flag any process spawned within ±5 seconds of the anomaly onset, so triggers like schedulers don't get missed.
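
A minimal sketch of the delta ranking, assuming per-process samples are available as (timestamp, process name, usage) tuples. Keying by name rather than PID and treating a process with no "before" samples as starting from zero are simplifications for this example.

```python
from statistics import fmean


def rank_suspects(samples, onset, end, before_s=10.0):
    """Rank processes by (mean usage during anomaly) - (mean usage just before).

    samples: iterable of (ts, process_name, usage) tuples.
    onset/end: timestamps bounding the anomaly window.
    """
    before, during = {}, {}
    for ts, name, usage in samples:
        if onset - before_s <= ts < onset:
            before.setdefault(name, []).append(usage)
        elif onset <= ts <= end:
            during.setdefault(name, []).append(usage)

    deltas = {}
    for name, values in during.items():
        # A process that did not exist before the anomaly counts from zero.
        baseline = fmean(before[name]) if name in before else 0.0
        deltas[name] = fmean(values) - baseline

    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)
```

A browser sitting at a steady 4000 MB gets a delta of 0 and sinks to the bottom, while a script that grows from 100 MB to 1200 MB gets +1100 MB and lands at the top.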

The AI diagnosis layer is provider-agnostic. Same prompt, same expected JSON schema, four backends (Claude, OpenAI, Gemini, Ollama). The diagnostic report builder produces a Markdown document with system info, hardware inventory, recent metric stats, anomaly history, and a structured "diagnostic question" prompt at the end. The LLM returns a JSON object with overall_health, summary, issues_found[], hardware_vs_software, recommendations[], and watch_for[]. The frontend renders this as a styled card with severity tags, priority badges, and evidence citations.
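
Whatever backend answers, the response still has to be checked before the frontend renders it. The sketch below validates the six fields named above. The parse_diagnosis helper is hypothetical, and only the three array fields are type-checked, since the exact types of the other fields are not stated here.

```python
import json

LIST_FIELDS = ("issues_found", "recommendations", "watch_for")
OTHER_FIELDS = ("overall_health", "summary", "hardware_vs_software")


def parse_diagnosis(raw):
    """Extract and validate the diagnosis JSON from raw model output.

    Models sometimes wrap JSON in extra text, so this takes everything
    between the first '{' and the last '}' before parsing.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError

    for key in OTHER_FIELDS:
        if key not in data:
            raise ValueError(f"missing field: {key}")
    for key in LIST_FIELDS:
        if not isinstance(data.get(key), list):
            raise ValueError(f"field {key} must be a list")
    return data
```

Because every provider is held to the same check, a backend that returns malformed output fails in one predictable place instead of breaking the rendering code.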

The frontend is intentionally simple — vanilla JavaScript, no build step, Plotly for charts, custom CSS for everything else. WebSocket for real-time anomaly push from the detector to the UI. Light theme for legibility in bright demo rooms.

Tech stack ended up being Python 3.12, FastAPI, uvicorn, psutil, pythonnet, WMI, Anthropic/OpenAI/Google/requests for LLMs, SQLite, vanilla JS, and Plotly.

Challenges we ran into

LibreHardwareMonitor's .NET dependency hell. Getting LHM to load via pythonnet took hours. The library depends on specific versions of System.Memory, System.Numerics.Vectors, and System.Runtime.CompilerServices.Unsafe (specifically v4.5.3 for the right assembly version), and they have to be loaded in the right order before the main DLL. On top of that, Windows blocks .NET assemblies downloaded from the internet by default, so I had to run Unblock-File on the entire lib/ directory. Certain LHM hardware groups (Storage, Controller) also conflicted with newer dependencies and had to be disabled. The eventual solution was bundling specific NuGet versions of every dependency and pre-loading them in a deterministic order.

Cross-thread WebSocket coordination. The detector runs on a background thread, but FastAPI's WebSocket sends are async coroutines that need to run on the main event loop. Calling them from the detector thread kept raising RuntimeError: no running event loop. The solution was to capture the running loop on app startup and use asyncio.run_coroutine_threadsafe to schedule sends from the detector's thread.
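
The pattern can be shown without FastAPI at all. In this sketch the Broadcaster class and its list of sent messages are stand-ins for the real WebSocket connections; only asyncio.run_coroutine_threadsafe and the capture-the-loop idea come from the description above.

```python
import asyncio
import threading


class Broadcaster:
    """Lets a plain thread schedule async sends on the event loop."""

    def __init__(self):
        self.loop = None
        self.sent = []  # stand-in for connected WebSocket clients

    def bind_loop(self):
        # Must be called from inside the running loop, e.g. at app startup.
        self.loop = asyncio.get_running_loop()

    async def _send(self, message):
        # In a real app this would await websocket.send_json(message).
        self.sent.append(message)

    def publish_from_thread(self, message):
        # Safe to call from any thread; returns a concurrent.futures.Future.
        return asyncio.run_coroutine_threadsafe(self._send(message), self.loop)


async def demo():
    broadcaster = Broadcaster()
    broadcaster.bind_loop()

    def detector():
        # Runs on a background thread, like the anomaly detector.
        future = broadcaster.publish_from_thread({"metric": "cpu", "value": 97.0})
        future.result(timeout=2)  # wait until the loop has run the send

    thread = threading.Thread(target=detector)
    thread.start()
    await asyncio.to_thread(thread.join)  # keep the loop free while waiting
    return broadcaster.sent
```

The key point is that the detector thread never touches the loop directly. It only hands over a coroutine, and the loop runs it on its own thread.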

NaN in JSON serialization. Hardware sensors occasionally return NaN when a read fails transiently, and Starlette's default JSON response refuses to serialize NaN (it raises a ValueError instead). I discovered this when the timeline endpoint started returning 500 errors intermittently. I wrote a _safe_float helper that converts NaN and inf to None before serialization.

Statistical false positives on idle systems. Pure 3-sigma fires constantly on quiet machines because σ becomes tiny — a CPU jump from 1% to 1.5% is "anomalous" by pure statistics. Added per-metric absolute floors (CPU 25%, RAM 60%) so the detector requires the metric to be at least somewhat elevated in absolute terms, not just statistically unusual.

Anomaly chart bands obscuring the data lines. Initially rendered anomalies as red shaded bands behind the chart. With multiple stacked anomalies during demo testing, the bands compounded into solid pink and you couldn't see the metric lines anymore. Fixed by lowering fill opacity to 0.08 and adding thin 1px borders so each band stays visually discrete.

Suspect attribution logic getting it wrong. First version sorted suspects by current usage during the anomaly. Worked terribly — Chrome was always #1 in RAM anomalies even when it had nothing to do with them. Rewrote to use delta from before-window to during-window, which correctly surfaces the actual change agents.

Persistence ordering bug. When I added SQLite, my first version of the lifespan() function tried to hydrate anomalies into the detector before the detector was constructed. The server crashed on boot in under a second, so the elevated console window closed before I could read the error. I solved it by routing crashes to a crash.log file and reordering initialization: engines first, then detector, then storage, then hydration, then start the detector.
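
Routing a crash to a file takes only a small wrapper. This is a sketch of the general technique; the run_with_crash_log name and the log format are hypothetical.

```python
import traceback
from datetime import datetime


def run_with_crash_log(main, log_path="crash.log"):
    """Run main(); on any exception, append the traceback to log_path and re-raise."""
    try:
        return main()
    except Exception:
        with open(log_path, "a", encoding="utf-8") as fh:
            fh.write(f"--- crash at {datetime.now().isoformat()} ---\n")
            fh.write(traceback.format_exc())
        raise
```

Wrapping the server entry point this way means the traceback survives even when the console window that launched it disappears immediately.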

Accomplishments that we're proud of

The architecture genuinely scales. After the engine pipeline was built, every new feature — kill, export, AI diagnosis, SQLite, system info — slotted in without refactoring core code. That's something I usually fail at in time-pressured projects.

The suspect attribution algorithm actually works. In a real test, my system fired a RAM anomaly when I deliberately allocated 2GB in a Python script. python.exe came up at the top of suspects with a +1097 MB delta — pinpointed by the algorithm with no human input. That moment was satisfying.

The multi-provider LLM design — Claude, OpenAI, Gemini, and Ollama all working through the same UI with the same JSON output schema — feels production-grade. And the privacy story (run Ollama locally, your data never leaves your machine) is a genuinely useful angle for a system monitoring tool.

The deep hardware inventory finds things you usually need third-party tools to see: monitor manufacturer and manufacture year via EDID decoding, SMART failure prediction per drive, battery wear percentage from design vs. current full-charge capacity, distinguishing trackpads from regular mice via name pattern matching. When the AI sees "battery health 73%" in the report, that's real diagnostic context.

And on a personal level: this is the most cohesive end-to-end product I've shipped in a hackathon. Real-time backend, real-time frontend, statistics, AI integration, persistence, hardware enumeration — and it all works together.

What we learned

Architecture decisions early on save hours later. The decision to make every engine an Engine subclass with poll(), snapshot(), and history() was made in the first 30 minutes and paid back its cost ten times over.

Statistics is more useful than ML for this kind of problem. I considered training an LSTM or autoencoder for anomaly detection. Glad I didn't. Rolling 3-sigma with a per-metric floor and consecutive-sample requirement is interpretable, debuggable, and works across machines without any training data.

The hard part of LLM integration isn't the API — it's the prompt. Getting Claude/GPT to return strict JSON with the exact schema I wanted required several iterations on the system prompt. The structured Diagnostic Question section at the end of the report — explicitly listing the five things I want the LLM to do — made output dramatically more reliable than open-ended prompts.

Hardware monitoring on Windows is messier than I expected. Half a dozen different APIs (WMI, registry, .NET libraries, PowerShell cmdlets) with overlapping coverage and inconsistent reliability. LibreHardwareMonitor turned out to be the best general-purpose option but it has its own dependency quirks.

Demo-driven development works. Toward the end I started thinking explicitly about the live demo flow and tightened the trigger commands, the visual feedback, the timing — and that exercise surfaced UX issues I'd otherwise have shipped (anomaly bands too dense, insights panel covering the cards, NaN errors on the System Info button, etc.).

What's next for ProcessLens

ETW integration on Windows for direct per-process disk and network attribution. Right now suspect ranking for disk I/O anomalies falls back to CPU% as a proxy because Windows doesn't easily expose per-process disk stats. ETW would fix this.

Cross-platform support. The engine architecture is OS-agnostic; only the sensor layer is Windows-specific. macOS would use powermetrics and IOKit, Linux would use lm-sensors and procfs. About a day of porting work per OS.

Long-term retention with Parquet. SQLite is great for hours-to-days. For weeks-to-months of metric history with efficient querying, Parquet on disk would be a better fit.

Comparative anomalies across machines. If multiple instances are running on a network, aggregating "what's anomalous for me" against "what's anomalous for everyone" would let the tool distinguish "your machine has a problem" from "all our machines have a problem."

Process behavior fingerprints. Once enough history is collected, ProcessLens could learn what's normal for each specific process, not just for each metric. "Chrome has been at 4GB for an hour" is normal; "Notepad just hit 4GB" is not — even though both are statistically large.

Fleet view for IT teams. A multi-machine dashboard for small IT shops where admins could see anomalies across an org's laptops and use ProcessLens as a first-pass diagnostic before escalating.
