Incuera

Inspiration

Every developer has stared at analytics dashboards wondering why users drop off at checkout or how they navigate a confusing UI. Traditional analytics tell you what happened—bounce rates, conversion funnels, click counts—but never why. We wanted to build something that lets you watch exactly what users experience, then have AI explain the patterns humans might miss.

The idea crystallized while we were debugging a production bug: the logs showed errors, but understanding the user's journey meant piecing together timestamps across multiple systems. What if we could just watch the session? And what if AI could watch thousands of sessions and surface the insights automatically?

What it does

Incuera captures user sessions as replayable videos with AI-powered analysis:

Record - A lightweight SDK (under 5 KB gzipped) captures DOM mutations, mouse movements, scrolls, and interactions using rrweb
Replay - Sessions are rendered into MP4 videos using headless browser technology
Analyze - Molmo 2 vision model watches the videos and extracts:
- Session summaries in natural language
- Interaction heatmaps (where users clicked, hovered, scrolled)
- Conversion funnel tracking
- Error and frustration detection
- Action counts and behavioral patterns

The dashboard lets teams watch any session, filter by user or timeframe, and get AI insights without manual review.

How we built it

The system has five main components:

SDK (@incuera/sdk) - TypeScript library using rrweb to record browser events. Events are batched and sent every 10 seconds or every 100 events, whichever comes first. Sessions under 30 seconds are discarded to filter noise. A heartbeat mechanism keeps long sessions alive.
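
The flush rule can be sketched roughly as follows. This is an illustration in Python, not the SDK's TypeScript code: the `EventBatcher` class, its `tick` method, and the injectable `clock` are hypothetical names, and it assumes a batch is sent as soon as either threshold is reached.

```python
import time
from typing import Any, Callable, Dict, List, Optional

MAX_BATCH_EVENTS = 100     # flush once this many events are buffered
MAX_BATCH_SECONDS = 10.0   # or once the oldest buffered event is this old

Event = Dict[str, Any]


class EventBatcher:
    """Buffers events and flushes on a size or age threshold (hypothetical helper)."""

    def __init__(self, send: Callable[[List[Event]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._clock = clock  # injectable so the timing rule is easy to test
        self._buffer: List[Event] = []
        self._first_event_at: Optional[float] = None

    def add(self, event: Event) -> None:
        if not self._buffer:
            self._first_event_at = self._clock()
        self._buffer.append(event)
        if len(self._buffer) >= MAX_BATCH_EVENTS or self._is_stale():
            self.flush()

    def tick(self) -> None:
        """Call from a periodic timer so small but old batches still go out."""
        if self._buffer and self._is_stale():
            self.flush()

    def _is_stale(self) -> bool:
        return self._clock() - self._first_event_at >= MAX_BATCH_SECONDS

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._first_event_at = None
        self._send(batch)
```

Passing a fake clock makes the time-based branch testable without sleeping; a real client would call `tick` from a timer and `flush` when the page unloads.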

Backend (FastAPI + Python) - Handles event ingestion, session management, and orchestrates video generation. Uses SQLAlchemy with PostgreSQL (Supabase) for persistence and ARQ with Redis for background job processing.

Video Generation (Playwright) - A headless Chromium browser renders the rrweb player with recorded events, then captures video at 1280×720. Thumbnails and keyframes are extracted for previews. Videos upload to Supabase Storage.
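
A minimal sketch of the recording step, assuming Playwright's Python sync API and its built-in video recording. The `player_url` parameter is hypothetical: it stands for a page that hosts the rrweb player and starts playback on load. Playwright's recorder writes WebM files, so producing MP4 would need a conversion step that is not shown here, and thumbnail extraction and upload are also left out.

```python
from pathlib import Path
from typing import Any, Dict

VIDEO_WIDTH, VIDEO_HEIGHT = 1280, 720  # resolution stated in the write-up


def video_context_options(output_dir: str) -> Dict[str, Any]:
    """Keyword arguments for Playwright's browser.new_context()."""
    size = {"width": VIDEO_WIDTH, "height": VIDEO_HEIGHT}
    return {
        "viewport": dict(size),
        "record_video_dir": output_dir,
        "record_video_size": dict(size),
    }


def render_session(player_url: str, duration_ms: int, output_dir: str) -> Path:
    """Record a replay page to a video file and return its path.

    player_url is an assumed page that plays the recorded events on load.
    """
    # Imported lazily so the option helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(**video_context_options(output_dir))
        page = context.new_page()
        page.goto(player_url)
        page.wait_for_timeout(duration_ms)  # let the replay run to the end
        video_path = page.video.path()
        context.close()  # the video file is finalized when the context closes
        browser.close()
    return Path(video_path)
```

Matching the viewport to the recording size avoids scaling artifacts in the captured frames.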

AI Analysis (OpenRouter + Molmo 2) - The vision-language model analyzes generated videos via API. We prompt it to extract structured data:

$$\text{Analysis} = f(\text{video}) \rightarrow \{\text{summary}, \text{heatmap}, \text{funnel}, \text{errors}, \text{actions}\}$$

Frontend (Next.js 16 + React 19) - Dashboard with project management, API key handling, session browsing, and video playback with analysis overlays.

Challenges we ran into

Session lifecycle management - Handling the gap between "user starts browsing" and "session is worth recording" required careful state management. We store metadata in Redis temporarily, only persisting to PostgreSQL when sessions exceed 30 seconds. Race conditions in concurrent session-end requests required distributed locking.
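
The session-end path might look something like the sketch below. It uses an in-memory `FakeRedis` stand-in so it runs without a server; the key names, the `end_session` function, and the `persisted` list (a placeholder for the PostgreSQL insert) are all assumptions. With a real Redis client the lock would be taken with `SET ... NX` plus an expiry, so that a crashed worker cannot hold it forever.

```python
import threading
from typing import Dict, List, Optional

MIN_SESSION_SECONDS = 30  # shorter sessions are discarded, per the write-up


class FakeRedis:
    """In-memory stand-in with only the SET NX / GET / DELETE behavior used below."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._mutex = threading.Lock()

    def set(self, key: str, value: str, nx: bool = False) -> bool:
        with self._mutex:
            if nx and key in self._data:
                return False  # key exists, so the NX write is refused
            self._data[key] = value
            return True

    def get(self, key: str) -> Optional[str]:
        with self._mutex:
            return self._data.get(key)

    def delete(self, key: str) -> None:
        with self._mutex:
            self._data.pop(key, None)


def end_session(store: FakeRedis, session_id: str, ended_at: float,
                persisted: List[str]) -> str:
    """Finalize a session once; returns 'persisted', 'discarded', or 'skipped'."""
    lock_key = f"lock:session-end:{session_id}"
    if not store.set(lock_key, "1", nx=True):
        return "skipped"  # another request is already ending this session
    try:
        meta_key = f"session:{session_id}:started_at"
        started = store.get(meta_key)
        if started is None:
            return "skipped"  # an earlier request already finalized it
        store.delete(meta_key)  # temporary metadata is no longer needed
        if ended_at - float(started) < MIN_SESSION_SECONDS:
            return "discarded"
        persisted.append(session_id)  # placeholder for the PostgreSQL insert
        return "persisted"
    finally:
        store.delete(lock_key)
```

Because the temporary metadata is deleted under the lock, a duplicate session-end request that arrives later finds nothing to finalize and is skipped.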

Video generation at scale - Playwright is resource-intensive. Rendering a 5-minute session can take 30+ seconds. We implemented job queuing with ARQ, retry logic for failures, and careful cleanup of temporary files to prevent disk exhaustion.
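
The retry-and-cleanup pattern can be illustrated with the standard library alone. ARQ has its own retry support, which this sketch does not use; `render_with_retry`, its parameters, and the exponential backoff schedule are hypothetical choices made for the example.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional


def render_with_retry(render: Callable[[Path], bytes],
                      max_tries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Run render(workdir) up to max_tries times, cleaning up after every attempt."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_tries + 1):
        # Each attempt gets a fresh temp dir that is deleted on exit,
        # whether the render succeeds or raises.
        with tempfile.TemporaryDirectory(prefix="incuera-render-") as workdir:
            try:
                return render(Path(workdir))
            except Exception as exc:  # broad on purpose: any failure is retried
                last_error = exc
        if attempt < max_tries:
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise RuntimeError(f"render failed after {max_tries} attempts") from last_error
```

The `render` callable returns the finished video as bytes so that nothing on disk needs to outlive the temporary directory; a real worker would more likely upload the file before the directory is removed.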

AI prompt engineering - Getting Molmo 2 to return structured JSON instead of prose required iterative prompt refinement. The model sometimes hallucinated UI elements or misidentified actions. We added validation layers and fallback defaults.
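
A validation layer with fallback defaults could look roughly like this. The project uses Pydantic; this stdlib-only sketch is a simplified stand-in. The field types in `DEFAULT_ANALYSIS` (for example, `heatmap` as a list and `actions` as a mapping of counts) and the `parse_analysis` function are assumptions rather than the actual schema.

```python
import copy
import json
from typing import Any, Dict

# Assumed shape of the analysis result; each value doubles as the fallback default.
DEFAULT_ANALYSIS: Dict[str, Any] = {
    "summary": "",
    "heatmap": [],
    "funnel": [],
    "errors": [],
    "actions": {},
}


def parse_analysis(raw: str) -> Dict[str, Any]:
    """Parse model output, keeping well-typed fields and defaulting the rest."""
    result = copy.deepcopy(DEFAULT_ANALYSIS)
    # Models often wrap JSON in prose, so take the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return result
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return result
    if not isinstance(data, dict):
        return result
    for key, default in DEFAULT_ANALYSIS.items():
        value = data.get(key)
        if isinstance(value, type(default)):
            result[key] = value  # wrong-typed or missing fields keep the default
    return result
```

This only guards the shape of the response. It cannot tell whether a described UI element actually appeared in the video, so hallucinated content still needs a separate check.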

Source/dist synchronization - During rapid iteration, our SDK source fell out of sync with the compiled distribution. Debugging why production behavior differed from development was a painful lesson in build pipeline discipline.

Accomplishments that we're proud of

End-to-end pipeline works - From a user clicking a button to watching an AI-analyzed video replay, the full loop functions
Sub-5KB SDK - Recording doesn't bloat client bundles
Serverless-ready architecture - Connection pooling, stateless API design, and background workers scale independently
Clean multi-tenant model - Projects, API keys, and sessions are properly isolated with row-level security

What we learned

rrweb is powerful but complex - The recording format captures everything, but replaying it correctly requires understanding DOM serialization deeply
Vision models need visual clarity - Molmo 2 performs better on higher-contrast UIs; subtle hover states get missed
Background jobs need observability - Silent failures in video generation taught us to add comprehensive logging at every step
Type safety pays off - TypeScript and Pydantic caught numerous bugs before they reached production

What's next for Incuera

Real-time replay - Stream sessions live without waiting for video generation
Rage click detection - Identify frustrated users automatically
Funnel builder - Define conversion funnels visually, get AI recommendations for improvement
Team collaboration - Comments, annotations, and shared insights on sessions
Self-hosted option - Docker compose for privacy-conscious teams

Built With

arq
fastapi
molmo-2
next.js
openrouter-api
playwright
postgresql
pydantic
python
react
redis
rrweb
shadcn/ui
sqlalchemy
supabase
tailwind-css
tanstack
typescript

Updates

Anirudh Ramesh started this project — Jan 18, 2026 11:31 AM EST
