Incuera
Inspiration
Every developer has stared at analytics dashboards wondering why users drop off at checkout or how they navigate a confusing UI. Traditional analytics tell you what happened—bounce rates, conversion funnels, click counts—but never why. We wanted to build something that lets you watch exactly what users experience, then have AI explain the patterns humans might miss.
The idea crystallized while we were debugging a production bug: logs showed errors, but understanding the user's journey required piecing together timestamps across multiple systems. What if we could just watch the session? And what if AI could watch thousands of sessions and surface the insights automatically?
What it does
Incuera captures user sessions as replayable videos with AI-powered analysis:
- Record - A lightweight SDK (under 5 KB gzipped) captures DOM mutations, mouse movements, scrolls, and interactions using rrweb
- Replay - Sessions are rendered into MP4 videos using headless browser technology
- Analyze - Molmo 2 vision model watches the videos and extracts:
  - Session summaries in natural language
  - Interaction heatmaps (where users clicked, hovered, scrolled)
  - Conversion funnel tracking
  - Error and frustration detection
  - Action counts and behavioral patterns
The dashboard lets teams watch any session, filter by user or timeframe, and get AI insights without manual review.
How we built it
The system has five main components:
SDK (@incuera/sdk) - TypeScript library using rrweb to record browser events. Events are batched and sent every 10 seconds or once 100 events accumulate, whichever comes first. Sessions under 30 seconds are discarded to filter noise. A heartbeat mechanism keeps long sessions alive.
Backend (FastAPI + Python) - Handles event ingestion, session management, and orchestrates video generation. Uses SQLAlchemy with PostgreSQL (Supabase) for persistence and ARQ with Redis for background job processing.
Video Generation (Playwright) - A headless Chromium browser renders the rrweb player with recorded events, then captures video at 1280×720. Thumbnails and keyframes are extracted for previews. Videos upload to Supabase Storage.
AI Analysis (OpenRouter + Molmo 2) - The vision-language model analyzes generated videos via API. We prompt it to extract structured data:
$$\text{Analysis} = f(\text{video}) \rightarrow \{\text{summary}, \text{heatmap}, \text{funnel}, \text{errors}, \text{actions}\}$$
Frontend (Next.js 16 + React 19) - Dashboard with project management, API key handling, session browsing, and video playback with analysis overlays.
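The SDK's batching rule (flush at 100 events or after 10 seconds) can be sketched as follows. This is an illustration only: the real SDK is TypeScript, and the `EventBatcher` class, its `tick` method, and the injected `clock` are hypothetical names rather than the SDK's actual API. The sketch also assumes the 10-second window starts at the first buffered event; a fixed-interval timer would be an equally valid reading.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]


class EventBatcher:
    """Buffers events and flushes on a size or age threshold, whichever comes first."""

    def __init__(
        self,
        send: Callable[[List[Event]], None],
        max_events: int = 100,           # size threshold from the writeup
        max_age_seconds: float = 10.0,   # time threshold from the writeup
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ) -> None:
        self._send = send
        self._max_events = max_events
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer: List[Event] = []
        self._first_event_at: Optional[float] = None

    def add(self, event: Event) -> None:
        # Remember when the current batch started so tick() can age it out.
        if not self._buffer:
            self._first_event_at = self._clock()
        self._buffer.append(event)
        if len(self._buffer) >= self._max_events:
            self.flush()

    def tick(self) -> None:
        """Call periodically (for example from a timer) to flush an aged batch."""
        if self._buffer and self._clock() - self._first_event_at >= self._max_age:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        # Swap the buffer out before sending so new events start a fresh batch.
        batch, self._buffer = self._buffer, []
        self._first_event_at = None
        self._send(batch)
```

Injecting the clock keeps the time-based path testable without real sleeps.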
Challenges we ran into
Session lifecycle management - Handling the gap between "user starts browsing" and "session is worth recording" required careful state management. We store metadata in Redis temporarily, only persisting to PostgreSQL when sessions exceed 30 seconds. Race conditions in concurrent session-end requests required distributed locking.
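A minimal in-memory sketch of that lifecycle, under stated assumptions: the `pending` dict stands in for the temporary Redis metadata, the `persisted` list for PostgreSQL, and a `threading.Lock` for the distributed lock. The `SessionStore` class and its return strings are hypothetical, not the project's code.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List

MIN_SESSION_SECONDS = 30.0  # threshold stated in the writeup


@dataclass
class PendingSession:
    session_id: str
    started_at: float


class SessionStore:
    """`pending` plays the role of Redis, `persisted` the role of PostgreSQL."""

    def __init__(self) -> None:
        self.pending: Dict[str, PendingSession] = {}
        self.persisted: List[str] = []
        self._lock = threading.Lock()  # stand-in for a distributed lock

    def start(self, session_id: str, now: float) -> None:
        with self._lock:
            # setdefault keeps the original start time if start is called twice.
            self.pending.setdefault(session_id, PendingSession(session_id, now))

    def end(self, session_id: str, now: float) -> str:
        """Finalize a session at most once; duplicate end requests are no-ops."""
        with self._lock:
            # pop() under the lock means only one concurrent caller sees the session.
            session = self.pending.pop(session_id, None)
            if session is None:
                return "unknown_or_ended"
            if now - session.started_at < MIN_SESSION_SECONDS:
                return "discarded"
            self.persisted.append(session_id)
            return "persisted"
```

The key property is that removing the pending entry and deciding whether to persist happen inside one critical section, so a second session-end request for the same ID cannot persist the session twice.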
Video generation at scale - Playwright is resource-intensive. Rendering a 5-minute session can take 30+ seconds. We implemented job queuing with ARQ, retry logic for failures, and careful cleanup of temporary files to prevent disk exhaustion.
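The retry-plus-cleanup pattern might look roughly like this. It is a generic standard-library sketch, not ARQ's retry API; `render_with_retry`, the `render` callback, and the exponential backoff schedule are all assumptions made for illustration.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional


def render_with_retry(
    render: Callable[[Path], bytes],
    max_attempts: int = 3,
    base_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> bytes:
    """Run `render` in a fresh temp directory per attempt, retrying on failure."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        # TemporaryDirectory deletes the directory and its contents on exit,
        # whether the attempt returned or raised, so failed renders leave
        # no files behind.
        with tempfile.TemporaryDirectory(prefix="render-") as workdir:
            try:
                return render(Path(workdir))
            except Exception as exc:
                last_error = exc
        if attempt < max_attempts:
            # Exponential backoff: 1x, 2x, 4x, ... the base delay.
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    raise RuntimeError(f"rendering failed after {max_attempts} attempts") from last_error
```

Scoping the working directory to a single attempt is what prevents disk exhaustion: a crash midway through rendering cannot strand partial frames or video files.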
AI prompt engineering - Getting Molmo 2 to return structured JSON instead of prose required iterative prompt refinement. The model sometimes hallucinated UI elements or misidentified actions. We added validation layers and fallback defaults.
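A validation layer with fallback defaults could be sketched like this. The field names follow the five outputs listed earlier (summary, heatmap, funnel, errors, actions), but their shapes, the `SessionAnalysis` dataclass, and `parse_analysis` are assumptions; the project itself may well use Pydantic models instead of this standard-library version.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SessionAnalysis:
    # Field shapes are illustrative guesses, not the project's schema.
    summary: str = ""
    heatmap: List[Dict[str, Any]] = field(default_factory=list)
    funnel: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    actions: Dict[str, int] = field(default_factory=dict)


def parse_analysis(raw: str) -> SessionAnalysis:
    """Parse model output, keeping the default for any missing or mistyped field."""
    # Models often wrap JSON in prose or code fences, so keep only the
    # span from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    try:
        data = json.loads(raw[start : end + 1]) if start != -1 and end > start else {}
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    result = SessionAnalysis()
    expected = {"summary": str, "heatmap": list, "funnel": list, "errors": list, "actions": dict}
    for name, expected_type in expected.items():
        value = data.get(name)
        # Only accept values whose top-level type matches; otherwise keep the default.
        if isinstance(value, expected_type):
            setattr(result, name, value)
    return result
```

This only checks top-level types. It does not catch hallucinated UI elements, which would need a cross-check against the recorded events.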
Source/dist synchronization - During rapid iteration, our SDK source fell out of sync with the compiled distribution. Debugging why production behavior differed from development was a painful lesson in build pipeline discipline.
Accomplishments that we're proud of
- End-to-end pipeline works - From a user clicking a button to watching an AI-analyzed video replay, the full loop functions
- Sub-5KB SDK - Recording doesn't bloat client bundles
- Serverless-ready architecture - Connection pooling, stateless API design, and background workers scale independently
- Clean multi-tenant model - Projects, API keys, and sessions are properly isolated with row-level security
What we learned
- rrweb is powerful but complex - The recording format captures everything, but replaying it correctly requires understanding DOM serialization deeply
- Vision models need visual clarity - Molmo 2 performs better on higher-contrast UIs; subtle hover states get missed
- Background jobs need observability - Silent failures in video generation taught us to add comprehensive logging at every step
- Type safety pays off - TypeScript and Pydantic caught numerous bugs before they reached production
What's next for Incuera
- Real-time replay - Stream sessions live without waiting for video generation
- Rage click detection - Identify frustrated users automatically
- Funnel builder - Define conversion funnels visually, get AI recommendations for improvement
- Team collaboration - Comments, annotations, and shared insights on sessions
- Self-hosted option - Docker Compose setup for privacy-conscious teams
Built With
- arq
- fastapi
- molmo-2
- next.js
- openrouter-api
- playwright
- postgresql
- pydantic
- python
- react
- redis
- rrweb
- shadcn/ui
- sqlalchemy
- supabase
- tailwind-css
- tanstack
- typescript