Inspiration

There's a specific kind of pain every developer knows.

You're staring at a stack trace. It looks familiar. You know you've seen this before - maybe last month, maybe last sprint. But you can't remember what fixed it. You search Slack. You grep through old PRs. You ask a teammate who vaguely remembers something about a race condition but can't recall the details.

Forty minutes later, you fix it. Again. For the second time.

That moment - debugging the same bug twice because nothing remembered the first time - is what DevTrace AI is built to eliminate.

The deeper I looked at how teams actually debug, the worse it got. Debugging context lives in the wrong places: ephemeral chat messages, half-finished Notion pages, closed browser tabs. The moment a session ends, the hard-won insight evaporates. The next developer who hits the same error starts from zero.

I wanted to build something that treated debugging as a first-class engineering discipline - with permanent records, AI-powered analysis, semantic memory, and real-time collaboration baked in at the foundation. Not a bolt-on. Not a plugin. A purpose-built debugging operating system for teams.


What I Learned

Local-First Is a Mindset Shift, Not a Feature

The biggest lesson from building DevTrace AI wasn't technical - it was philosophical. Local-first architecture forces you to rethink every assumption about where data lives and when it's available.

The insight that crystallized everything:

$$\text{Perceived Latency} = \text{Network Latency} \times \text{Read Frequency}$$

Most apps optimize writes. But reads happen orders of magnitude more often. With PowerSync, every useQuery() hits local SQLite at ~0ms. The network becomes invisible to the user - a background sync concern, not a rendering concern. Once you internalize this, you can't go back to building apps that spin on every page load.
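To make that concrete, here's a minimal sketch of a local-first read using PowerSync's React hook. The table and column names (debug_sessions, project_id) are illustrative assumptions, not the actual schema:

```ts
// Hypothetical local-first read: useQuery resolves against local SQLite,
// so the common case renders without touching the network.
import { useQuery } from '@powersync/react';

export function useRecentSessions(projectId: string) {
  const { data: sessions, isLoading } = useQuery(
    `SELECT id, title, created_at
       FROM debug_sessions
      WHERE project_id = ?
      ORDER BY created_at DESC
      LIMIT 50`,
    [projectId]
  );
  // isLoading is only true before local SQLite is hydrated on first mount.
  return { sessions: sessions ?? [], isLoading };
}
```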

Semantic Search Doesn't Require a Server

Before this project, I assumed meaningful vector search required a managed vector database, a cloud embedding API, and a backend retrieval layer. transformers.js proved that wrong. Running Xenova/all-MiniLM-L6-v2 entirely in the browser generates 384-dimension embeddings with no API call, no server, and no cost per query.

Cosine similarity against those vectors stored in local SQLite:

$$\text{similarity}(A, B) = \frac{A \cdot B}{|A| \cdot |B|}$$

...is fast enough for real-time "Similar Sessions" matching. Combine it with keyword token overlap scoring and you get a hybrid retrieval layer that catches both exact error matches and semantically related bugs - entirely on-device, entirely offline.
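A condensed sketch of that pipeline, using the transformers.js feature-extraction pipeline and the Xenova/all-MiniLM-L6-v2 model named above (the helper names are mine, not the project's):

```ts
import { pipeline } from '@xenova/transformers';

// Lazily loaded so the model download only happens on first use.
let embedder: any = null;

export async function embed(text: string): Promise<number[]> {
  embedder ??= await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  // Mean pooling + normalization yields a single 384-dimension sentence vector.
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data as Float32Array);
}

export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```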

Collaboration Doesn't Require a Collaboration Backend

Before DevTrace AI, my mental model for real-time collaboration was: WebSockets + a presence server + a pub/sub layer. Building the session and project collaboration on PowerSync WAL sync rewired that completely.

Every presence heartbeat, checklist toggle, and chat message is a powerSync.execute() write to local SQLite. PowerSync syncs it to every collaborator's device via WAL. No Socket.io. No Supabase Realtime subscription. No polling. The result is collaboration that works even when the network is flaky - because the source of truth is always local.
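For example, a session chat message is nothing more than a local insert (the table and column names here are assumptions for illustration):

```ts
// The app's PowerSync database instance -- import path is an assumption.
import { powerSync } from '@/lib/powersync';

export async function sendSessionMessage(sessionId: string, userId: string, body: string) {
  // Plain local write; PowerSync queues the upload and replicates the row
  // to every collaborator's local SQLite once it lands in Postgres.
  await powerSync.execute(
    `INSERT INTO session_chat (id, session_id, user_id, body, created_at)
     VALUES (?, ?, ?, ?, ?)`,
    [crypto.randomUUID(), sessionId, userId, body, new Date().toISOString()]
  );
}
```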

AI Security Belongs at the Edge

Client-side AI calls are a security antipattern - API keys in the browser, no rate limiting, no audit trail. Every AI call in DevTrace AI is server-side: JWT verified, rate limited, keys stored in Supabase Secrets. The rate limiting model is a rolling 1-hour window:

$$\text{requests\_allowed} = \max(0,\ 20 - \text{count in last 3600 s})$$

Enforced per user in a rate_limits table before any Groq call fires. This pattern - thin client, smart Edge Function - turned out to be both more secure and more maintainable than any client-side alternative.
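A hedged sketch of that check inside an Edge Function, assuming a rate_limits table keyed by user_id with a created_at timestamp; the production function differs in detail (see the concurrency caveat in Challenges):

```ts
import type { SupabaseClient } from '@supabase/supabase-js';

const LIMIT = 20;             // requests per rolling hour
const WINDOW_MS = 3_600_000;  // 1 hour

export async function allowRequest(supabase: SupabaseClient, userId: string): Promise<boolean> {
  const since = new Date(Date.now() - WINDOW_MS).toISOString();

  // Count this user's requests inside the rolling window.
  const { count, error } = await supabase
    .from('rate_limits')
    .select('*', { count: 'exact', head: true })
    .eq('user_id', userId)
    .gte('created_at', since);
  if (error) throw error;
  if ((count ?? 0) >= LIMIT) return false;

  // Record the request before the Groq call fires.
  await supabase.from('rate_limits').insert({ user_id: userId });
  return true;
}
```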


How I Built It

DevTrace AI is built as six distinct layers, each with a single clear responsibility.

Layer 1 - Local-First Data Core

PowerSync manages 11 tables across 5 sync bucket definitions. All reads are useQuery() against local SQLite - zero network, zero spinner. All writes are powerSync.execute() - written locally first, uploaded automatically. Large blobs like ai_analysis bypass the mutation queue and are written directly to Supabase, then sync back down via WAL. This hybrid write path keeps the local SQLite responsive while handling payloads that would choke the WASM CRUD reader.
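A simplified sketch of that split, with an illustrative size threshold (the cutoff, table name, and module paths are assumptions):

```ts
import { powerSync } from '@/lib/powersync';  // assumed module paths
import { supabase } from '@/lib/supabase';

const BLOB_THRESHOLD_BYTES = 4 * 1024; // illustrative cutoff, not the shipped constant

export async function saveAnalysis(sessionId: string, analysis: object) {
  const payload = JSON.stringify(analysis);

  if (payload.length > BLOB_THRESHOLD_BYTES) {
    // Large blob: write straight to Postgres; it syncs back down via WAL.
    const { error } = await supabase
      .from('sessions')
      .update({ ai_analysis: analysis })
      .eq('id', sessionId);
    if (error) throw error;
  } else {
    // Small payload: normal local-first write through the CRUD queue.
    await powerSync.execute(
      'UPDATE sessions SET ai_analysis = ? WHERE id = ?',
      [payload, sessionId]
    );
  }
}
```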

Layer 2 - Auth & Source of Truth

Supabase Postgres is the canonical store, with Row Level Security on every table. Three auth providers (email, GitHub OAuth, Google OAuth). Three Edge Functions own all server-side logic - analyze-bug for AI inference with rate limiting, debug-dna for debugging fingerprint generation, and mastra-agent as a JWT-verified proxy to Mastra Cloud.

Layer 3 - Hybrid Local-First RAG

On every bug log, transformers.js generates a 384-dim embedding in the browser and stores it via powerSync.execute(). On session open, two scoring layers fire against local SQLite simultaneously - keyword token overlap for exact matches, cosine similarity for semantically related bugs. Top matches surface as "Similar Sessions" with confidence scores. The entire retrieval pipeline runs offline.
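A compact sketch of the two layers combined into one score, reusing the cosineSimilarity helper sketched earlier (the weights and tokenizer are illustrative assumptions):

```ts
// Crude keyword layer: share of query tokens that also appear in the candidate.
function tokenOverlap(query: string, candidate: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const q = tokens(query);
  const c = tokens(candidate);
  if (q.size === 0) return 0;
  let shared = 0;
  for (const t of q) if (c.has(t)) shared++;
  return shared / q.size;
}

export function hybridScore(
  queryText: string, queryVec: number[],
  candidateText: string, candidateVec: number[]
): number {
  // Keyword overlap catches identical error strings; cosine catches paraphrased bugs.
  return 0.5 * tokenOverlap(queryText, candidateText)
       + 0.5 * cosineSimilarity(queryVec, candidateVec);
}
```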

Layer 4 - AI Intelligence

analyze-bug routes Groq + Llama 3.3 70B calls server-side and returns a structured 8-tab breakdown saved as JSONB - persistent across reloads, no re-analyzing needed. Two Mastra Cloud agents handle deeper work: Session Debugger for diff-format line-level fixes, Project Analyzer for pattern detection and health verdicts across the full session history.
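Inside analyze-bug, the Groq call itself is a plain server-side fetch - a hedged sketch assuming the OpenAI-compatible chat completions endpoint and the llama-3.3-70b-versatile model id, with the prompt and response schema heavily simplified:

```ts
// Runs inside the Edge Function (Deno); the API key never reaches the browser.
async function callGroq(errorText: string): Promise<unknown> {
  const res = await fetch('https://api.groq.com/openai/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${Deno.env.get('GROQ_API_KEY')}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'llama-3.3-70b-versatile',
      response_format: { type: 'json_object' },
      messages: [
        { role: 'system', content: 'Return a structured JSON debugging breakdown.' },
        { role: 'user', content: errorText },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Groq request failed: ${res.status}`);
  const json = await res.json();
  return JSON.parse(json.choices[0].message.content);
}
```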

Layer 5 - Real-Time Collaboration

Session and project collaboration run entirely on PowerSync WAL sync. Presence heartbeats, shared checklists, session chat, project activity feed, and project chat are all powerSync.execute() writes that replicate instantly to every collaborator's local SQLite. No custom backend. No WebSocket server. Collaboration that works offline.

Layer 6 - Offline Intelligence

When a user is offline and opens a session without prior AI analysis, useOfflineMemory queries local SQLite for sessions with ai_analysis, scores by token overlap, and synthesizes root causes, fixes, and checklist items from the top 5 matches. Every suggestion is tagged with a confidence level and linked to the source sessions it came from - useful, but never misleading.
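An illustrative sketch of that synthesis step, assuming a session shape with a title, error text, and saved ai_analysis, and reusing the tokenOverlap helper from the RAG sketch (the confidence thresholds are examples, not the shipped values):

```ts
interface AnalyzedSession {
  id: string;
  title: string;
  errorText: string;
  ai_analysis: { rootCause: string; suggestedFix: string };
}

interface OfflineSuggestion {
  rootCause: string;
  suggestedFix: string;
  confidence: 'High' | 'Medium' | 'Low';
  sourceSessionId: string; // always linked back to the session it came from
}

export function synthesizeOffline(query: string, sessions: AnalyzedSession[]): OfflineSuggestion[] {
  return sessions
    .map((s) => ({ s, score: tokenOverlap(query, `${s.title} ${s.errorText}`) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5) // top 5 matches, as described above
    .map(({ s, score }) => {
      const confidence: OfflineSuggestion['confidence'] =
        score > 0.6 ? 'High' : score > 0.3 ? 'Medium' : 'Low';
      return {
        rootCause: s.ai_analysis.rootCause,
        suggestedFix: s.ai_analysis.suggestedFix,
        confidence,
        sourceSessionId: s.id,
      };
    });
}
```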


Challenges

The Large Blob Problem

ai_analysis is a dense JSONB object - 8-12KB per session at full fidelity. Routing it through PowerSync's WASM CRUD reader introduced noticeable lag and occasional corruption on large payloads. The fix was a split write path: large blobs go directly to Supabase via supabase.update(), then sync back down via WAL. Small fields route through powerSync.execute() as normal. Getting this split clean - without race conditions between the two write paths - took more iteration than any other single problem in the project.

Embedding Storage Without a Vector Database

Storing 384-dimension float arrays in SQLite meant serializing vectors as JSON strings and deserializing on retrieval before running cosine similarity in JavaScript. It's not a purpose-built vector store - but it's fast enough for the dataset sizes DevTrace AI targets, it's fully offline, and it syncs to every device automatically via PowerSync. The constraint forced a pragmatic solution that turned out to be genuinely good enough.
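Concretely, the storage layer is just text columns and JSON (the table and column names below are assumptions):

```ts
import { powerSync } from '@/lib/powersync'; // assumed module path

// Persist a 384-dim vector as a JSON string alongside the bug log.
export async function storeEmbedding(bugLogId: string, vector: number[]) {
  await powerSync.execute(
    'UPDATE bug_logs SET embedding = ? WHERE id = ?',
    [JSON.stringify(vector), bugLogId]
  );
}

// Read every stored vector back and deserialize before scoring in JS.
export async function loadEmbeddings(): Promise<Array<{ id: string; vector: number[] }>> {
  const rows = await powerSync.getAll<{ id: string; embedding: string }>(
    'SELECT id, embedding FROM bug_logs WHERE embedding IS NOT NULL'
  );
  return rows.map((r) => ({ id: r.id, vector: JSON.parse(r.embedding) as number[] }));
}
```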

Offline Assistance Without Hallucination

The Offline Memory Assist had one hard requirement: it could never make a user think they were receiving fresh AI analysis when they weren't. Every suggestion needed a confidence level (High / Medium / Low), a link back to the source sessions it was synthesized from, and clear UI labeling that this was local history synthesis - not inference. Getting the UX language precise - helpful without misleading - was harder than writing the synthesis algorithm itself.

Mastra Agent Output Consistency

Mastra Cloud agents return rich, reasoning-heavy output. Parsing it reliably into a structured UI - root cause badge, before/after diff, verification steps, risk flags - required careful prompt engineering and JSON schema enforcement in the Edge Function proxy. Early builds had silent UI failures when the agent returned a slightly different shape. The fix was strict schema validation in the Edge Function before the response ever reached the client.
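The validation itself doesn't need to be elaborate - a hedged sketch of the shape check run in the proxy before the response reaches the client (the field names are illustrative; the real schema covers the full diff / verification / risk structure):

```ts
interface AgentFix {
  rootCause: string;
  diff: string;
  verificationSteps: string[];
  riskFlags: string[];
}

// Throws instead of letting a malformed agent response reach the UI.
function validateAgentOutput(raw: unknown): AgentFix {
  const o = raw as Partial<AgentFix>;
  if (
    typeof o?.rootCause !== 'string' ||
    typeof o?.diff !== 'string' ||
    !Array.isArray(o?.verificationSteps) ||
    !Array.isArray(o?.riskFlags)
  ) {
    throw new Error('Agent response failed schema validation');
  }
  return o as AgentFix;
}
```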

Concurrent Rate Limit Enforcement

The rate_limits table uses a rolling 1-hour window enforced with a SELECT COUNT(*) before each Groq call. Under concurrent requests from the same user, two requests could pass the count check simultaneously before either write landed - a classic TOCTOU race. Solving this without a full mutex or a Postgres advisory lock meant restructuring the upsert pattern in the Edge Function to make the count check and the write effectively atomic. Small problem, surprisingly sharp edges.


Built With

  • cosine-similarity
  • edge-functions
  • github-oauth
  • groq
  • hybrid-rag
  • jwt
  • llama-3.3-70b
  • local-first
  • mastra
  • oauth
  • offline-first
  • postgresql
  • powersync
  • react
  • real-time-collaboration
  • recharts
  • row-level-security
  • sqlite
  • supabase
  • tailwindcss
  • transformers.js
  • typescript
  • vector-search
  • vercel
  • vite
  • webassembly
  • zustand