SŌMA | Devpost

The Moment This Became Real

I was reading my own journal from six months ago.

Every entry had the same themes. The same goals. The same promises to myself. The same frustrations.

Six months apart. Word for word identical.

That was not a motivation problem. That was not a discipline problem. That was a visibility problem. I could not see my own patterns because I was inside them. No tool I had ever used — journaling, habit tracking, productivity systems, therapy — had ever shown me my behavior objectively. They all waited for me to report what I thought was happening. And I was wrong, every time.

That is why I built SŌMA.

The Problem Is Bigger Than One Person

Research by organizational psychologist Tasha Eurich found that while 95% of people believe they are self-aware, only 10–15% actually are. There is less than a 30% correlation between how competent people believe they are and how competent they actually are.

This is not rare. This is the default human condition.

Every self-improvement tool ever built has one fatal flaw: it relies on your self-report. You tell it your goals. You log your mood. You describe your decisions. But you are the least reliable narrator of your own life. You misremember. You rationalize. You omit the embarrassing parts.

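$$r(\text{self-assessed competence},\ \text{actual competence}) < 0.3 \;\Rightarrow\; r^2 < 0.09$$

A correlation below 0.3 means self-assessment explains less than 9% of the variance in actual competence. Most of what you tell any tool about yourself is noise.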

What SŌMA Actually Is

SŌMA is the first objective observer of a human life.

Not a second brain. Not a memory tool. Not a habit tracker.

An always-on, local-first behavioral intelligence system that:

Captures raw behavioral signal across audio, screen, and text — entirely on your device, never uploaded
Builds a model of who you actually are from what you do, not what you say
Surfaces your real patterns, your blind spots, and the gap between your stated values and actual behavior
Intervenes at the moment you are about to repeat a mistake

It does not ask you anything. It watches what you do.

How I Built It

The entire stack runs locally. Zero cloud. Zero data leaving the device. Zero infrastructure cost.

| Layer | Tool |
| --- | --- |
| Audio Transcription | Whisper.cpp (real-time, on-device) |
| Screen Understanding | Moondream2 (local vision model) |
| Behavioral Inference | Llama 3.3 via Ollama |
| Semantic Embeddings | nomic-embed via Ollama |
| Memory + Pattern Store | LanceDB + NetworkX |
| Intervention Engine | Custom temporal pattern detection |
| Frontend | Electron (local desktop app) |
| Encryption | AES-256, user holds the only key |

The core architecture is a five-stage pipeline:

$$\text{Raw Signal} \rightarrow \text{Semantic Embedding} \rightarrow \text{Behavioral Graph} \rightarrow \text{Pattern Detection} \rightarrow \text{Intervention}$$

Every piece of raw data (audio, screen frames, keystrokes) is processed locally and immediately converted into semantic embeddings. The raw data is deleted after processing. What persists is meaning, not content. The stored embeddings are lossy and are not designed to be decoded back into the original signal. Embeddings can still leak some information about their source, which is why the store itself is encrypted.
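As a rough illustration of the capture-then-discard step described above, the sketch below turns a raw text buffer into an embedding record and then zeroes the buffer. Everything in it is an assumption made for illustration: `Event`, `toy_embed`, and `ingest` are hypothetical names, and `toy_embed` is a hash-based placeholder standing in for a real embedding model, so it carries no semantic meaning.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class Event:
    """What persists after ingestion: metadata plus an embedding, no raw content."""
    timestamp: float
    source: str            # e.g. "audio", "screen", or "text"
    embedding: list[float]


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Placeholder embedding (assumption): a unit vector derived from a SHA-256 digest.

    A real system would call a semantic embedding model here instead.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 - 0.5 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(raw: bytearray, source: str, timestamp: float) -> Event:
    """Embed a raw UTF-8 buffer, then overwrite the buffer with zeros."""
    event = Event(timestamp, source, toy_embed(raw.decode("utf-8")))
    raw[:] = bytes(len(raw))  # zero the caller's buffer in place
    return event


buffer = bytearray(b"opened email instead of the draft")
event = ingest(buffer, source="screen", timestamp=0.0)
```

Zeroing the buffer is best-effort in Python, because the decoded string stays in memory until it is garbage-collected. A real implementation would need more care at this step.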

What I Learned

Three things surprised me during the build:

1. The hardest problem is not technical — it is trust. An always-on system that watches your behavior will face immediate skepticism. The only answer is radical transparency: open-source the data pipeline, let anyone verify no exfiltration occurs, and make local-first architecture non-negotiable from day one.

2. Pattern detection across time is harder than pattern detection across content. Most AI systems find patterns in what is there. SŌMA needs to find patterns in sequences across days and weeks — what follows what, how often, under what conditions. That required building a custom temporal graph rather than relying on standard vector similarity.

3. The intervention timing problem is everything. Surfacing a pattern after the fact is useful. Surfacing it at the exact moment of deviation is transformative. Getting that timing right — early enough to matter, not so early it becomes noise — is the core product insight that separates SŌMA from every existing tool.
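To make the second point concrete, one simple way to look for patterns in sequences rather than in content is to count which event tends to follow which within a time window. The sketch below is a minimal illustration under stated assumptions, not SŌMA's actual detector: the event labels, the `max_gap_s` window, and both function names are hypothetical, and it uses plain dictionaries where a graph library such as NetworkX could hold the same weighted edges.

```python
from collections import Counter, defaultdict


def transition_counts(events, max_gap_s=3600.0):
    """Count how often label A is directly followed by label B.

    `events` is an iterable of (timestamp_seconds, label) pairs. Two
    consecutive events only count as a transition if they occur within
    `max_gap_s` seconds of each other.
    """
    counts = defaultdict(Counter)
    ordered = sorted(events)
    for (t1, a), (t2, b) in zip(ordered, ordered[1:]):
        if t2 - t1 <= max_gap_s:
            counts[a][b] += 1
    return counts


def transition_probability(counts, a, b):
    """Empirical probability that `b` follows `a`, or 0.0 if `a` was never seen."""
    total = sum(counts[a].values())
    return counts[a][b] / total if total else 0.0


# Three days of hypothetical labels: opening a draft is usually followed by email.
day = 86400
events = [
    (0, "open_draft"), (60, "check_email"),
    (day, "open_draft"), (day + 60, "check_email"),
    (2 * day, "open_draft"), (2 * day + 100, "write"),
]
counts = transition_counts(events)
print(round(transition_probability(counts, "open_draft", "check_email"), 2))  # 0.67
```

Plain vector similarity would treat each of these events in isolation. The transition table is what captures "what follows what, and how often".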

The Challenges

Privacy and trust — solved through uncompromising local-first architecture and open-source transparency.

Cold start problem — the behavioral model needs time to become meaningful. Solved by combining heuristic pattern detection in week one with ML-based detection after sufficient data accumulates.
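The hand-off between the two detectors could be as simple as the sketch below. The seven-day cutoff follows the "week one" wording above; the `min_events` floor and the function name are assumptions added for illustration, since "sufficient data" is not quantified here.

```python
def choose_detector(days_observed: int, n_events: int,
                    min_days: int = 7, min_events: int = 500) -> str:
    """Pick which detector to run for the current user.

    Returns "learned" only once both the observation period and the
    event count (hypothetical floor) are large enough; otherwise
    falls back to "heuristic".
    """
    if days_observed >= min_days and n_events >= min_events:
        return "learned"
    return "heuristic"
```

Requiring both conditions keeps a user with a week of very sparse data on the heuristic path.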

Intervention fatigue — too many nudges and users ignore them all. Solved by a strict relevance threshold: SŌMA only intervenes when pattern confidence exceeds 80% and the stakes of the current action are above baseline.
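The relevance gate described above can be sketched as a two-condition check. The 80% threshold comes from the text; how the `stakes` score and its baseline are computed is not specified, so both are treated as hypothetical inputs, as are the `Candidate` and `should_intervene` names.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.80  # "pattern confidence exceeds 80%"


@dataclass
class Candidate:
    pattern: str       # human-readable description of the detected pattern
    confidence: float  # detector confidence in the range 0..1
    stakes: float      # hypothetical score for the current action


def should_intervene(candidate: Candidate, baseline_stakes: float) -> bool:
    """Nudge only when confidence is above threshold AND stakes exceed baseline."""
    return (candidate.confidence > CONFIDENCE_THRESHOLD
            and candidate.stakes > baseline_stakes)
```

Both comparisons are strict, so a candidate sitting exactly at 80% confidence or exactly at baseline stakes stays silent.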

Why Now

Four technical constraints that made this impossible have all lifted within the last 18 months:

Local real-time transcription → Whisper v3
On-device vision understanding → Moondream2
Consumer-grade local LLMs → Llama 3.3, Mistral
Local vector databases → LanceDB, ChromaDB

$$\text{2026} = \text{First year the entire stack fits on a consumer device}$$

The technology was waiting for someone to assemble it into this.

The Vision

In three years: 1,000,000 people have replaced their therapist, coach, or self-help routine with SŌMA as their primary tool for self-understanding — measured by active subscribers who have used the system for a minimum of six consecutive months.

The six-month threshold matters. It is the point at which the behavioral model becomes dense enough to surface patterns invisible to any shorter observation window.

It is also the point at which no user has ever voluntarily left.

Because leaving means losing the most honest record of yourself that has ever existed.

SŌMA. The first system that tells you the truth about yourself.