DigitalBrain

Inspiration

We all have the same problem: great ideas, articles, voice notes, and screenshots scattered across a dozen apps — and none of it ever gets used. The friction of capturing and organizing knowledge is so high that most of it just disappears.

We wanted to build something that lives where people already are. No new app to download, no new habit to build. Just WhatsApp — the app everyone already has open — turned into a personal knowledge system.

What it does

DigitalBrain is a WhatsApp bot that acts as your second brain. You send it anything (a voice note, a photo of a whiteboard, a PDF, a YouTube link, a random thought at 2am) and it saves it immediately, with no extra steps.

When you're ready to organize, /process triggers the AI. It reads everything in your inbox, groups notes by theme even if they arrived in mixed order, and proposes a structured Markdown summary for each topic. Once you approve a proposal, it is saved.

Beyond capture and organization, the bot also handles spaced repetition: it messages you proactively to review saved topics, generates quizzes to test your retention, and exports your entire knowledge base as Markdown files compatible with Obsidian.

How we built it

The stack is Python + FastAPI + Twilio for the WhatsApp layer, Google Gemini 2.5 Flash for AI (text, vision, and multimodal), and SQLite with WAL mode for persistence.

The key architectural decision was separating capture from processing. Heavy content — images, web pages, YouTube videos — is saved as lazy placeholders ([IMAGE_PENDING], [WEB_PENDING]) during capture so the bot always responds in under a second. Resolution happens only when the user runs /process.
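A minimal sketch of the capture step described above. The placeholder tokens come from our design, but the function name, its parameters, the URL heuristic, and the choice to store the source URL next to the placeholder are illustrative assumptions, not the bot's actual code.

```python
import re
from typing import Optional

# Assumption: a simple regex is enough to spot links in a message body.
URL_RE = re.compile(r"https?://\S+")


def capture(body: str,
            media_url: Optional[str] = None,
            media_type: Optional[str] = None) -> str:
    """Return the text to store in the inbox, without doing any heavy work."""
    # Images are not downloaded or described here; only a marker is stored.
    if media_url and media_type and media_type.startswith("image/"):
        return f"[IMAGE_PENDING] {media_url}"
    # Links are not fetched here; the page is resolved later by /process.
    match = URL_RE.search(body)
    if match:
        return f"[WEB_PENDING] {match.group(0)}"
    # Plain text is stored as-is.
    return body.strip()
```

Because nothing in this path touches the network or the LLM, its cost stays roughly constant regardless of what the user sends.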

The /process flow fuses theme detection and proposal generation into one LLM call: it always detects the thematic groups, and when there is only one topic it returns the proposal in the same response. That cut API usage from three calls per run to two. When multiple themes are detected, the user gets a menu and can freely combine groups by replying 1, 2 3, 1 2 3, or todos (Spanish for "all").
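The menu replies could be parsed along these lines. This is a sketch only: parse_selection is a hypothetical name, and accepting commas as separators and rejecting out-of-range numbers are assumptions about how the bot should behave.

```python
from typing import List


def parse_selection(reply: str, group_count: int) -> List[int]:
    """Turn a reply like '2 3' or 'todos' into sorted 1-based group numbers."""
    text = reply.strip().lower()
    # 'todos' selects every detected group.
    if text == "todos":
        return list(range(1, group_count + 1))
    chosen = set()
    for token in text.replace(",", " ").split():
        if not token.isdecimal():
            raise ValueError(f"unrecognised token: {token!r}")
        number = int(token)
        if not 1 <= number <= group_count:
            raise ValueError(f"group {number} is out of range")
        chosen.add(number)
    if not chosen:
        raise ValueError("no groups selected")
    return sorted(chosen)
```

Raising ValueError lets the caller re-send the menu rather than guess what the user meant.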

Everything is multi-user from the ground up — all inbox, state, and configuration operations are keyed by WhatsApp number.
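As an illustration of keying everything by WhatsApp number, here is a small sketch using the standard sqlite3 module with WAL mode enabled. The table name, column names, and helper functions are assumptions made for the example, not the project's real schema.

```python
import sqlite3
from typing import List


def open_db(path: str) -> sqlite3.Connection:
    """Open the database, enable WAL, and create a hypothetical inbox table."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS inbox (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_number TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_inbox_user ON inbox(user_number)"
    )
    conn.commit()
    return conn


def add_note(conn: sqlite3.Connection, user_number: str, content: str) -> None:
    """Store one captured item for the given WhatsApp number."""
    conn.execute(
        "INSERT INTO inbox (user_number, content) VALUES (?, ?)",
        (user_number, content),
    )
    conn.commit()


def get_inbox(conn: sqlite3.Connection, user_number: str) -> List[str]:
    """Return only this user's items, oldest first."""
    rows = conn.execute(
        "SELECT content FROM inbox WHERE user_number = ? ORDER BY id",
        (user_number,),
    )
    return [row[0] for row in rows]
```

Every query takes the number as a parameter, so one user's notes never appear in another user's inbox.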

Challenges we ran into

API rate limits hit us hard during development. Three chained LLM calls per /process burned through Gemini's free tier (200 req/day) faster than expected. We solved it by fusing the group detection and proposal generation into a single prompt, and adding a global 5-second throttle in LLMManager.
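A global throttle of this kind can be sketched as follows. The Throttle class and its injectable clock and sleep parameters are assumptions made so the example is testable; the real LLMManager may be organised differently.

```python
import threading
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._last_call = None

    def wait(self) -> float:
        """Block until the interval has elapsed; return the seconds slept."""
        with self._lock:
            delay = 0.0
            if self._last_call is not None:
                elapsed = self._clock() - self._last_call
                delay = max(0.0, self._min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
            # Record the time after sleeping so the next caller waits in full.
            self._last_call = self._clock()
            return delay
```

Calling wait() immediately before each LLM request spaces requests at least five seconds apart. Note that time.sleep blocks the calling thread, so inside an async FastAPI handler an asyncio-based variant would be the better fit.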

Twilio's Sandbox limit of 50 messages/day also forced us to develop most of the bot logic through a local console simulator that mimics the full WhatsApp flow — which turned out to be a productivity win.

Model deprecation hit mid-build: gemini-2.0-flash became unavailable for new accounts while we were working, and we had to migrate from the deprecated google-generativeai SDK to the new google-genai SDK mid-sprint.

Getting the AI to return consistent JSON for the dual-mode response (single topic vs. multi-group) required careful prompt engineering and defensive parsing with fallbacks at every level.
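The layered fallbacks can be sketched like this. The JSON field names ("groups", "summary") and the returned dictionary shape are hypothetical, and treating unparseable output as a single-topic summary is an assumption, not necessarily what the bot does.

```python
import json


def parse_llm_response(raw: str) -> dict:
    """Extract a single-topic or multi-group result from model output."""
    candidates = [raw]
    # Fallback 1: the model wrapped the JSON in prose or a code fence,
    # so try the span from the first '{' to the last '}'.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw[start:end + 1])

    for text in candidates:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue
        if not isinstance(data, dict):
            continue
        groups = data.get("groups")
        if isinstance(groups, list) and groups:
            return {"mode": "multi", "groups": groups}
        summary = data.get("summary")
        if isinstance(summary, str):
            return {"mode": "single", "summary": summary}

    # Fallback 2: nothing parsed, so keep the raw text rather than fail.
    return {"mode": "single", "summary": raw.strip()}
```

The function never raises on malformed output, which keeps a bad model response from breaking the conversation.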

Accomplishments that we're proud of

The thematic grouping works remarkably well even when notes arrive completely interleaved. You can send "Python has a GIL", then "Strawberries have more vitamin C than oranges", then "asyncio is better for I/O", and the bot correctly separates them into two distinct knowledge clusters.

The lazy resolution architecture means capture feels instant regardless of content type — the same response time whether you send a text or a 10MB PDF.

The export feature generates a full Markdown knowledge base with individual files per topic and an INDEX.md with Obsidian-compatible [[wiki links]], delivered directly as a WhatsApp attachment.
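A rough sketch of how such an export could be written to disk. The function names and the filename cleaning rule are assumptions, and the sketch does not handle two topics that clean to the same filename or the delivery over WhatsApp.

```python
import re
from pathlib import Path
from typing import Dict


def safe_name(title: str) -> str:
    """Drop characters that are awkward in filenames and wiki links."""
    name = re.sub(r"[^\w\- ]", "", title).strip()
    return name or "untitled"


def export_topics(topics: Dict[str, str], out_dir: str) -> Path:
    """Write one .md file per topic plus an INDEX.md; return the index path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    for title, markdown in topics.items():
        name = safe_name(title)
        (out / f"{name}.md").write_text(markdown, encoding="utf-8")
        # Obsidian resolves [[name]] to the note file called name.md.
        links.append(f"- [[{name}]]")
    index = out / "INDEX.md"
    index.write_text("# Index\n\n" + "\n".join(links) + "\n", encoding="utf-8")
    return index
```

Opening the output folder as an Obsidian vault should make each INDEX.md entry a clickable link to its topic file.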

What we learned

Keeping LLM calls to a minimum isn't just an optimization — it's a design constraint that forced us to write better prompts. The single-call dual-mode approach (detect + propose in one shot) produced cleaner results than the original three-call pipeline.

Separating capture from processing is the right model for any knowledge system. The moment you add friction to capture, the system fails — people stop using it. Everything else can be slow; capture must be instant.

Building a console simulator that mirrors the production flow saved enormous amounts of time and API quota. It's the first thing we'd add to any future bot project.

What's next for DigitalBrain

Persistent state: StateManager currently lives in memory, so a restart loses pending confirmations. Moving it to SQLite is the obvious next step.
Semantic search with embeddings: /remember currently sends recent topics to the LLM as context. Real vector search would make it genuinely useful at scale.
Graph of ideas: automatically linking related topics across sessions to surface connections the user didn't notice.
Voice-first interface: making the entire flow work via voice notes only, no typing required.
Multi-language: the bot currently assumes Spanish. It should detect the input language and respond accordingly.