Inspiration
There's a specific kind of pain every developer knows.
You're staring at a stack trace. It looks familiar. You know you've seen this before - maybe last month, maybe last sprint. But you can't remember what fixed it. You search Slack. You grep through old PRs. You ask a teammate who vaguely remembers something about a race condition but can't recall the details.
Forty minutes later, you fix it. Again.
That moment - debugging the same bug twice because nothing remembered the first time - is what DevTrace AI is built to eliminate.
The deeper I looked at how teams actually debug, the worse it got. Debugging context lives in the wrong places: ephemeral chat messages, half-finished Notion pages, closed browser tabs. The moment a session ends, the hard-won insight evaporates. The next developer who hits the same error starts from zero.
I wanted to build something that treated debugging as a first-class engineering discipline - with permanent records, AI-powered analysis, semantic memory, and real-time collaboration baked in at the foundation. Not a bolt-on. Not a plugin. A purpose-built debugging operating system for teams.
What I Learned
Local-First Is a Mindset Shift, Not a Feature
The biggest lesson from building DevTrace AI wasn't technical - it was philosophical. Local-first architecture forces you to rethink every assumption about where data lives and when it's available.
The insight that crystallized everything:
$$\text{Perceived Latency} = \text{Network Latency} \times \text{Read Frequency}$$
Most apps optimize writes. But reads happen orders of magnitude more often.
With PowerSync, every useQuery() hits local SQLite at ~0ms. The network becomes invisible to the user - a background sync concern, not a rendering concern. Once you internalize this, you can't go back to building apps that spin on every page load.
Semantic Search Doesn't Require a Server
Before this project, I assumed meaningful vector search required a managed vector database, a cloud embedding API, and a backend retrieval layer. transformers.js proved that wrong. Running Xenova/all-MiniLM-L6-v2 entirely in the browser generates 384-dimension embeddings with no API call, no server, and no cost per query.
Cosine similarity against those vectors stored in local SQLite:
$$\text{similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
...is fast enough for real-time "Similar Sessions" matching. Combine it with keyword token overlap scoring and you get a hybrid retrieval layer that catches both exact error matches and semantically related bugs - entirely on-device, entirely offline.
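That computation is a few lines of TypeScript. A minimal sketch (the function name is mine, not necessarily the codebase's):

```typescript
// Cosine similarity between two equal-length embedding vectors:
// dot(A, B) / (||A|| * ||B||), with 0 returned for degenerate inputs.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```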
Collaboration Doesn't Require a Collaboration Backend
Before DevTrace AI, my mental model for real-time collaboration was: WebSockets + a presence server + a pub/sub layer. Building the session and project collaboration on PowerSync WAL sync rewired that completely.
Every presence heartbeat, checklist toggle, and chat message is a powerSync.execute() write to local SQLite. PowerSync syncs it to every collaborator's device via WAL. No Socket.io. No Supabase Realtime subscription. No polling. The result is collaboration that works even when the network is flaky - because the source of truth is always local.
AI Security Belongs at the Edge
Client-side AI calls are a security antipattern - API keys in the browser, no rate limiting, no audit trail. Every AI call in DevTrace AI is server-side: JWT verified, rate limited, keys stored in Supabase Secrets. The rate limiting model is a rolling 1-hour window:
$$\text{requests\_allowed} = \max(0,\ 20 - \text{count in last 3600s})$$
Enforced per user in a rate_limits table before any Groq call fires.
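The window arithmetic itself is simple. A sketch, assuming the user's request timestamps have already been read from rate_limits (names and shape are illustrative):

```typescript
// Rolling 1-hour window: requests_allowed = max(0, 20 - count in last 3600s).
const WINDOW_S = 3600;
const LIMIT = 20;

// requestTimesS: epoch seconds of this user's prior requests.
function requestsAllowed(requestTimesS: number[], nowS: number): number {
  const recent = requestTimesS.filter((t) => nowS - t < WINDOW_S).length;
  return Math.max(0, LIMIT - recent);
}
```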
This pattern - thin client, smart Edge Function - turned out to be both more secure and more maintainable than any client-side alternative.
How I Built It
DevTrace AI is built as six distinct layers, each with a single clear responsibility.
Layer 1 - Local-First Data Core
PowerSync manages 11 tables across 5 sync bucket definitions. All reads are useQuery() against local SQLite - zero network, zero spinner. All writes are powerSync.execute() - written locally first, uploaded automatically. Large blobs like ai_analysis bypass the mutation queue and are written directly to Supabase, then sync back down via WAL. This hybrid write path keeps local SQLite responsive while handling payloads that would choke the WASM CRUD reader.
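The routing decision can be sketched as a pure function. The 4KB cutoff and all names here are hypothetical; the project's actual threshold isn't stated:

```typescript
// Route a write by payload size: large blobs (e.g. ai_analysis) bypass the
// PowerSync mutation queue and go straight to Supabase, syncing back via WAL.
type WritePath = "powersync-queue" | "direct-supabase";

const LARGE_BLOB_CHARS = 4 * 1024; // hypothetical cutoff, JSON length as a size proxy

function chooseWritePath(payload: unknown): WritePath {
  const size = JSON.stringify(payload).length;
  return size > LARGE_BLOB_CHARS ? "direct-supabase" : "powersync-queue";
}
```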
Layer 2 - Auth & Source of Truth
Supabase Postgres is the canonical store, with Row Level Security on every table. Three auth providers (email, GitHub OAuth, Google OAuth). Three Edge Functions own all server-side logic - analyze-bug for AI inference with rate limiting, debug-dna for debugging fingerprint generation, and mastra-agent as a JWT-verified proxy to Mastra Cloud.
Layer 3 - Hybrid Local-First RAG
On every bug log, transformers.js generates a 384-dim embedding in the browser and stores it via powerSync.execute(). On session open, two scoring layers fire against local SQLite simultaneously - keyword token overlap for exact matches, cosine similarity for semantically related bugs. Top matches surface as "Similar Sessions" with confidence scores. The entire retrieval pipeline runs offline.
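The keyword layer and the blend can be sketched like this, with the cosine score assumed to be computed separately over the embeddings. The 40/60 weighting is a hypothetical starting point, not the project's actual tuning:

```typescript
// Keyword layer: fraction of tokens shared between two error strings.
function tokenOverlap(a: string, b: string): number {
  const toSet = (s: string) =>
    new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const ta = toSet(a);
  const tb = toSet(b);
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  return shared / Math.max(ta.size, tb.size);
}

// Blend the keyword score with a cosine score over stored embeddings.
function hybridScore(keywordScore: number, cosineScore: number): number {
  return 0.4 * keywordScore + 0.6 * cosineScore; // illustrative weights
}
```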
Layer 4 - AI Intelligence
analyze-bug routes Groq + Llama 3.3 70B calls server-side and returns a structured 8-tab breakdown saved as JSONB - persistent across reloads, no re-analyzing needed. Two Mastra Cloud agents handle deeper work: Session Debugger for diff-format line-level fixes, Project Analyzer for pattern detection and health verdicts across the full session history.
Layer 5 - Real-Time Collaboration
Session and project collaboration run entirely on PowerSync WAL sync. Presence heartbeats, shared checklists, session chat, the project activity feed, and project chat are all powerSync.execute() writes that replicate instantly to every collaborator's local SQLite. No custom backend. No WebSocket server. Collaboration that works offline.
Layer 6 - Offline Intelligence
When a user is offline and opens a session without prior AI analysis, useOfflineMemory queries local SQLite for sessions with ai_analysis, scores them by token overlap, and synthesizes root causes, fixes, and checklist items from the top 5 matches. Every suggestion is tagged with a confidence level and linked to the source sessions it came from - useful, but never misleading.
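The top-5 selection and confidence tagging reduce to a small amount of logic. A sketch with illustrative thresholds, types, and names (the real ones aren't stated here):

```typescript
type Confidence = "High" | "Medium" | "Low";

interface ScoredMatch {
  sessionId: string;
  score: number; // 0..1 relevance, e.g. token overlap
}

interface Suggestion {
  sourceSessionId: string; // link back to where this came from
  confidence: Confidence;
}

// Map a relevance score to a confidence label (thresholds are hypothetical).
function toConfidence(score: number): Confidence {
  return score >= 0.7 ? "High" : score >= 0.4 ? "Medium" : "Low";
}

// Keep the 5 best matches, best first, each tagged and source-linked.
function synthesizeSuggestions(matches: ScoredMatch[]): Suggestion[] {
  return [...matches]
    .sort((a, b) => b.score - a.score)
    .slice(0, 5)
    .map((m) => ({ sourceSessionId: m.sessionId, confidence: toConfidence(m.score) }));
}
```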
Challenges
The Large Blob Problem
ai_analysis is a dense JSONB object - 8-12KB per session at full fidelity. Routing it through PowerSync's WASM CRUD reader introduced noticeable lag and occasional corruption on large payloads. The fix was a split write path: large blobs go directly to Supabase via supabase.update(), then sync back down via WAL. Small fields route through powerSync.execute() as normal. Getting this split clean - without race conditions between the two write paths - took more iteration than any other single problem in the project.
Embedding Storage Without a Vector Database
Storing 384-dimension float arrays in SQLite meant serializing vectors as JSON strings and deserializing on retrieval before running cosine similarity in JavaScript. It's not a purpose-built vector store - but it's fast enough for the dataset sizes DevTrace AI targets, it's fully offline, and it syncs to every device automatically via PowerSync. The constraint forced a pragmatic solution that turned out to be genuinely good enough.
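The round-trip itself is trivial, which is part of why it was good enough. A minimal sketch (function names are mine):

```typescript
// Store an embedding in a SQLite TEXT column as a JSON string.
// JSON.stringify round-trips IEEE 754 doubles exactly, so no precision is lost.
function serializeEmbedding(v: number[]): string {
  return JSON.stringify(v);
}

// Parse the column back into a number[] before running cosine similarity.
function deserializeEmbedding(s: string): number[] {
  const parsed = JSON.parse(s);
  if (!Array.isArray(parsed)) throw new Error("column is not an embedding");
  return parsed as number[];
}
```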
Offline Assistance Without Hallucination
The Offline Memory Assist had one hard requirement: it could never make a user think they were receiving fresh AI analysis when they weren't. Every suggestion needed a confidence level (High / Medium / Low), a link back to the source sessions it was synthesized from, and clear UI labeling that this was local history synthesis - not inference. Getting the UX language precise - helpful without misleading - was harder than writing the synthesis algorithm itself.
Mastra Agent Output Consistency
Mastra Cloud agents return rich, reasoning-heavy output. Parsing it reliably into a structured UI - root cause badge, before/after diff, verification steps, risk flags - required careful prompt engineering and JSON schema enforcement in the Edge Function proxy. Early builds had silent UI failures when the agent returned a slightly different shape. The fix was strict schema validation in the Edge Function before the response ever reached the client.
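A minimal version of that guard looks like the following, with hypothetical field names standing in for the real schema (the actual one isn't shown here):

```typescript
// Shape the client expects; field names are illustrative stand-ins.
interface AgentFix {
  rootCause: string;
  diff: string;
  verificationSteps: string[];
  riskFlags: string[];
}

// Validate the agent response in the Edge Function proxy BEFORE it reaches
// the client, so a drifted shape fails loudly server-side instead of
// silently breaking the UI.
function validateAgentFix(raw: unknown): AgentFix {
  const o = raw as Record<string, unknown>;
  if (
    typeof o !== "object" ||
    o === null ||
    typeof o.rootCause !== "string" ||
    typeof o.diff !== "string" ||
    !Array.isArray(o.verificationSteps) ||
    !Array.isArray(o.riskFlags)
  ) {
    throw new Error("agent response failed schema validation");
  }
  return o as unknown as AgentFix;
}
```

In practice a schema library would do this declaratively, but the principle is the same: the client never sees an unvalidated shape.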
Concurrent Rate Limit Enforcement
The rate_limits table uses a rolling 1-hour window enforced with a SELECT COUNT(*) before each Groq call. Under concurrent requests from the same user, two requests could pass the count check simultaneously before either write landed - a classic TOCTOU race. Solving this without a full mutex or a Postgres advisory lock meant restructuring the upsert pattern in the Edge Function to make the count check and the write effectively atomic. Small problem, surprisingly sharp edges.
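The principle is to fold the check and the write into one operation so nothing can interleave between them. In Postgres that means a single conditional statement; this in-memory sketch just illustrates the shape (names are mine):

```typescript
const WINDOW_S = 3600;
const LIMIT = 20;

// TOCTOU-free shape: check the window AND record the request in one call,
// rather than a separate read-count then write that a concurrent request
// could slip between. `log` holds ascending epoch seconds and is mutated.
function tryConsume(log: number[], nowS: number): boolean {
  const cutoff = nowS - WINDOW_S;
  while (log.length > 0 && log[0] <= cutoff) log.shift(); // expire old entries
  if (log.length >= LIMIT) return false; // over the rolling limit
  log.push(nowS); // check and write together
  return true;
}
```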
Built With
- cosine-similarity
- edge-functions
- github-oauth
- groq
- hybrid-rag
- jwt
- llama-3.3-70b
- local-first
- mastra
- oauth
- offline-first
- postgresql
- powersync
- react
- real-time-collaboration
- recharts
- row-level-security
- sqlite
- supabase
- tailwindcss
- transformers.js
- typescript
- vector-search
- vercel
- vite
- webassembly
- zustand