Inspiration

Every brand strategist we talked to opened the same 14 browser tabs to answer the same question: "Who should we be talking to, and why now?" They'd bounce between YouGov, Similarweb, Semrush, McKinsey PDFs, and Deloitte Insights, copy-pasting quotes into a Google Doc, and still ship a deck where half the citations pointed at the wrong paragraph.

We wanted to collapse that week-long ritual into a single prompt. Not another chatbot - a real audience-targeting dossier with numbered citations, a knowledge graph, source-tier transparency (Internal / Syndicated / Published), and evidence you can click back to. The bar was: a strategist should be able to paste the output into a client deck without editing.

We also had a harder constraint: this thing has to run on a desktop. Strategists live inside authenticated tools like YouGov that actively hate being scraped from a data center. A cloud-only agent can never log in as them. A desktop app can.

What it does

Pumila.ai is an Electron desktop app. You type one business question - "Who are the emerging Gen-Z buyers for small-batch tequila in the DMV?" - and within about a minute you get:

  • An Audience Targeting Dossier with a serif hero, stats strip (sources · audiences · findings · avg confidence), and an executive summary with inline numeric citations [1][2]….
  • A source-mix stacked bar breaking evidence into Internal / Syndicated / Published tiers.
  • A knowledge graph (custom SVG) mapping audiences ↔ signals ↔ source tiers as columns with curved gradient edges; hovering a node dims everything outside its neighborhood.
  • A Report tab of audience cards with confidence + reach bars and structured Signals / Actions / Open questions columns.
  • An evidence trail - every citation is a real URL, and for firm sources we've also captured the actual page screenshot so Claude can visually corroborate its claim.
  • One-click Markdown / JSON export to drop into a deck or a Notion doc.

Under the hood, every run does three things in parallel:

  1. Headless Playwright sweeps Deloitte Insights, McKinsey, and NAHREP, extracts text, and captures PNG screenshots.
  2. A Cognito-authenticated cloud gateway (AWS API Gateway -> Lambda -> Bedrock Claude) fans out to Google News + optional Semrush/Similarweb, merges everything, and calls Bedrock with a strict JSON schema and multimodal image blocks so the LLM can see the screenshots it's citing.
  3. Optionally, a sibling headful Playwright crawler that we auto-spawn on 127.0.0.1:4010 drives a real, logged-in YouGov AI Search session and folds that evidence back into the dossier.
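
The fan-out above is, at its core, one Promise.allSettled over the branches; a minimal sketch, where the stage functions and the Evidence shape are illustrative stand-ins for the real firm-crawl, gateway, and YouGov-crawler calls:

```typescript
// Sketch of the per-run fan-out. Each stage is a stand-in for one of the
// three branches (headless Playwright, cloud gateway, headful YouGov crawler).
type Evidence = { source: string; tier: "internal" | "syndicated" | "published" };

async function runResearch(
  question: string,
  stages: Array<(q: string) => Promise<Evidence[]>>,
): Promise<Evidence[]> {
  // Run every stage in parallel; a single failed branch (e.g. the optional
  // YouGov crawler being offline) must not sink the whole dossier.
  const results = await Promise.allSettled(stages.map((stage) => stage(question)));
  return results.flatMap((r) => (r.status === "fulfilled" ? r.value : []));
}
```

The allSettled (rather than all) choice is what makes the YouGov branch genuinely optional: a rejection there degrades the evidence mix instead of failing the run.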

Auth is Cognito email/password + Google IdP, entirely in-app. JWTs live in the OS keychain via Electron safeStorage. Bedrock credentials never touch the desktop.

How we built it

We split the system into three coordinated surfaces so each could move independently:

The desktop shell - Electron + React 19 + TypeScript. Vite 8 for the renderer, tsup for the main/preload, React 19's new JSX transform, hand-rolled CSS with semantic variables (no component library - the UI is bespoke, including the Instrument Serif hero and the SVG knowledge graph). State is a single top-level state machine in App.tsx with four modes (loading / setup-error / unauthenticated / authenticated) and a clean IPC surface through contextBridge (auth:*, dossiers:*, crawler:*, visuals:*, sessions:*).
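
A simplified sketch of that top-level state machine as a discriminated union (the real App.tsx carries more per-mode data than this):

```typescript
// Simplified version of the top-level app state: a discriminated union means
// the renderer can never be "authenticated" without a user, or show a
// setup error without a message.
type AppMode =
  | { mode: "loading" }
  | { mode: "setup-error"; message: string }
  | { mode: "unauthenticated" }
  | { mode: "authenticated"; email: string };

function describe(state: AppMode): string {
  switch (state.mode) {
    case "loading": return "Starting up…";
    case "setup-error": return `Setup failed: ${state.message}`;
    case "unauthenticated": return "Please sign in";
    case "authenticated": return `Signed in as ${state.email}`;
  }
}
```

Exhaustive switches over the union mean adding a fifth mode later becomes a compile error everywhere it isn't handled.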

The cloud gateway - AWS CDK v2. One stack provisions Cognito User Pool + Hosted UI + Google federation, API Gateway HTTP API with a JWT authorizer, NodejsFunction Lambdas, a DynamoDB single-table (PK families: USER# / ORG# / SESSION# / USAGE#), CloudWatch, and fine-grained IAM. The main Lambda (create-dossier.ts) validates JWT claims, re-checks org membership in DDB (claims alone are never enough), fans out to search providers, runs the HTML->Markdown scraper pipeline, and calls Bedrock via ConverseCommand with a system prompt that locks the model into our Pumila JSON schema. A second Lambda (generate-visual.ts) hits Gemini/Bedrock image models for deck-ready visuals. Lambda Powertools handles logging, metrics, and tracing.
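
The "claims alone are never enough" rule looks roughly like this; getItem stands in for the DynamoDB DocumentClient lookup, and the exact key shapes here are illustrative:

```typescript
// Sketch of the authorization re-check: the JWT proves who the caller is,
// but org membership is re-read from the single table on every call.
type Claims = { sub: string };

async function assertOrgMember(
  claims: Claims,
  orgId: string,
  getItem: (pk: string, sk: string) => Promise<Record<string, unknown> | undefined>,
): Promise<void> {
  const membership = await getItem(`ORG#${orgId}`, `USER#${claims.sub}`);
  if (!membership) {
    // Surfaced to the client as a 403, never as a downstream Bedrock error.
    throw new Error("forbidden: not a member of this org");
  }
}
```

Injecting the lookup keeps the rule unit-testable without a live table, which is how the cloud-handler Vitest suite can cover it.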

The headful crawler - a sibling Express + Playwright repo. The desktop auto-spawns it: locates the folder, runs pnpm install (falling back to npm) if node_modules is missing, writes a minimal .env, spawns the dev server, pipes stdout with a [crawler] prefix, and polls /healthz for up to 45 s. The user only ever sees the Connect YouGov button - they log into YouGov once in a real browser window, we persist the Playwright storage_state in the OS keychain, and future runs drive YouGov AI Search silently.
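
The /healthz poll reduces to a small retry loop; check stands in for the real fetch against 127.0.0.1:4010, and the attempt/delay numbers are parameters rather than our exact values:

```typescript
// Sketch of the startup poll: after spawning the sibling crawler we probe
// /healthz until it answers or the budget (45 s in the real app) runs out.
async function pollHealthz(
  check: () => Promise<boolean>,
  attempts: number,
  delayMs: number,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      if (await check()) return true; // crawler is up
    } catch {
      // connection refused while the dev server boots - keep waiting
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // caller raises the startup-timeout crawler error
}
```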

Multimodal glue. firmResearch.ts in the main process does a headless Playwright pass over Deloitte/McKinsey/NAHREP, captures a PNG per hit, base64-encodes it, and ships it to the gateway as multimodal_sources. Lambda converts those into Bedrock ConverseCommand image blocks so Claude literally sees the page it's quoting. This is what makes citations trustworthy.
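
The conversion from captured screenshots to Converse image blocks is a small data transform; MultimodalSource names our wire format illustratively, and the block shape follows the Bedrock Converse ContentBlock:

```typescript
// Sketch of turning captured screenshots into Bedrock Converse image blocks.
// The per-run cap is what keeps payloads under API Gateway's body limit.
type MultimodalSource = { url: string; pngBase64: string };

function toImageBlocks(sources: MultimodalSource[], maxPerRun: number) {
  return sources.slice(0, maxPerRun).map((s) => ({
    image: {
      format: "png" as const,
      // Converse wants raw bytes, not a base64 string
      source: { bytes: Uint8Array.from(Buffer.from(s.pngBase64, "base64")) },
    },
  }));
}
```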

Hardening. contextIsolation: true, nodeIntegration: false, sandbox: false only where necessary, external links forced through shell.openExternal, typed CrawlerUnavailableError codes (no-directory, no-dependencies, no-package-manager, startup-timeout, spawn-failed) so a user sees "Headful crawler is starting up - first launch installs Chromium" instead of TypeError: fetch failed.
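
A sketch of what that typed error looks like; the user-facing copy here is paraphrased, not our exact strings:

```typescript
// Sketch of the typed crawler error: a closed set of codes maps to
// user-readable copy instead of a raw stack trace leaking into the UI.
type CrawlerErrorCode =
  | "no-directory" | "no-dependencies" | "no-package-manager"
  | "startup-timeout" | "spawn-failed";

const CRAWLER_ERROR_COPY: Record<CrawlerErrorCode, string> = {
  "no-directory": "Crawler repo not found next to the app.",
  "no-dependencies": "Installing crawler dependencies…",
  "no-package-manager": "Neither pnpm nor npm is on your PATH.",
  "startup-timeout": "Headful crawler is starting up - first launch installs Chromium.",
  "spawn-failed": "Could not start the crawler process.",
};

class CrawlerUnavailableError extends Error {
  constructor(public readonly code: CrawlerErrorCode) {
    super(CRAWLER_ERROR_COPY[code]);
    this.name = "CrawlerUnavailableError";
  }
}
```

Because the code set is closed, the Record forces a copy string for every failure mode at compile time - there is no path to an unlabeled error.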

Testing + CI hygiene. Vitest on both the renderer (jsdom + RTL) and the cloud handlers, ESLint 9 flat config with react-hooks/rules-of-hooks as an error, and tsc -b as part of the build. We hit 18 test files / 86 passing tests before submission. Windows installers via electron-builder (NSIS + portable x64).

Challenges we ran into

  • Consulting firms hate automation. Deloitte and McKinsey both throw JS-heavy interstitials; our first Playwright selectors broke on page two. We ended up shipping per-firm hostSuffixes + contentPathHints and a link-filter pass that only keeps URLs matching real insights paths (/insights/, /featured-insights, /our-thinking/, …). NAHREP needed year-suffix hints (/2023/, /2024/) because its CMS doesn't expose a clean tag.
  • YouGov from a data center is a dead end. Every cloud IP we tried eventually got a login wall. The entire reason the product is a desktop app is so the Playwright session runs from the user's own residential IP with their real cookies. Building the auto-spawn, auto-install, auto-heal child-process lifecycle for the sibling crawler repo took longer than the Bedrock integration did.
  • Getting Claude to return valid JSON every time. Early runs returned Markdown code fences, trailing commentary, or subtly wrong field names. We locked it down with a strict system prompt (Return valid JSON only. Do not wrap the JSON in markdown…) plus a coerceDossierResponse() parser that tolerates fence wrappers, normalizes confidence scores to 0-100, and re-keys snake_case payloads into the camelCase UI shape via dossierMapper.ts.
  • Multimodal payload size. Base64-encoded PNGs blew past API Gateway's body limit the first time. We now downsample and cap screenshots per firm, and the Lambda logs withScreenshots vs total source count so we can see which firms are pulling their weight.
  • Electron + React 19 + Vite 8 was bleeding edge. Some of our favorite devtooling simply didn't have updated typings yet. We leaned on tsup's --external electron, cross-env + wait-on to sequence vite / tsup --watch / electron . on Windows, and a flat ESLint config that actually understands the new JSX transform.
  • Auth UX without an external browser. Most Cognito examples pop a system browser for Hosted UI. That feels cheap in a desktop product. We built the full email/password + verification code + password reset flow in-app with USER_PASSWORD_AUTH, and only fall back to Hosted UI PKCE for Google. Then we gated Google behind a feature flag (PUMILA_GOOGLE_AUTH_ENABLED) so the button is hidden rather than broken when the pool has no Google IdP configured.
  • Keeping a desktop installer honest. We made a deliberate call: the built-in PUMILA_API_BASE_URL and Cognito client ID are public values (same class as an OAuth redirect URI). The sensitive thing is the user's JWT, which lives in safeStorage. No Bedrock keys, no AWS keys, ever, anywhere near the renderer.
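
The forgiving-parser idea from the JSON challenge above - fence stripping plus confidence normalization - sketches out like this; the real coerceDossierResponse also re-keys snake_case fields via dossierMapper.ts:

```typescript
// Sketch of the forgiving parse step: strip markdown fences if the model
// ignored the "JSON only" instruction, then clamp confidence to 0-100.
function coerceJson(raw: string): Record<string, unknown> {
  const unfenced = raw
    .replace(/^\s*```(?:json)?\s*/i, "")
    .replace(/\s*```\s*$/, "");
  return JSON.parse(unfenced);
}

function normalizeConfidence(value: number): number {
  // Models return 0.82 one run and 82 the next; settle on a 0-100 scale.
  const scaled = value <= 1 ? value * 100 : value;
  return Math.min(100, Math.max(0, Math.round(scaled)));
}
```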

Accomplishments that we're proud of

  • A working end-to-end research loop - question in, cited dossier + knowledge graph out - with real multimodal grounding, not a demo-grade stub.
  • An actually beautiful UI: custom SVG knowledge graph with gradient edges + neighborhood highlighting, serif hero, source-mix stacked bar, command palette (⌘K), keyboard shortcuts dialog, light/dark/system theme - all hand-rolled, no shadcn, no MUI.
  • A production-shaped AWS stack defined entirely as code (CDK v2). One cdk deploy stands up Cognito, API Gateway, Lambda, DynamoDB, Powertools observability, and the right IAM.
  • An auto-healing local crawler: if the sibling repo is missing deps, we install them; if it's not running, we start it; if it crashes, we surface a typed, user-readable error. Zero terminal knowledge required.
  • Shipping Cognito auth entirely in-app - no system-browser round-trip for email/password.
  • A clean, greppable architecture: 10k LOC across desktop + cloud, ESLint clean, 86 passing tests, Windows NSIS + portable installers produced by npm run dist:win.

What we learned

  • Electron's real superpower is authenticated automation. Running Playwright on a user's own machine with their own cookies unlocks a whole category of data (YouGov, internal BI tools, gated whitepapers) that no SaaS scraper can ethically touch.
  • Multimodal citations are a trust unlock. Once Claude gets both the text and a screenshot of the source, its quotes stop drifting. The visual channel is a free correctness check.
  • Schema-first prompting beats prompt engineering. Nailing the JSON schema + a strict "return valid JSON only" system prompt + a forgiving parser got us more reliability than any amount of few-shot tuning.
  • A desktop app is a distributed system. Renderer ↔ main ↔ Cognito ↔ gateway ↔ Lambda ↔ Bedrock ↔ DDB ↔ local crawler child process is eight moving parts. Designing typed errors at every boundary (e.g. CrawlerUnavailableError, GatewayUnauthorizedError) paid for itself by hour six of the hackathon.
  • Single-table DynamoDB design is genuinely fast to build once you commit. USER# / ORG# / SESSION# / USAGE# with workspace-aware authorization covered personal workspaces, org membership, invites, join requests, and usage audit in one table.
  • AWS CDK + TypeScript is the fastest way to go from a whiteboard diagram to a deployed serverless stack - provided you bootstrap the account first (we spent thirty minutes learning that the hard way).
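
The single-table commitment mentioned above reduces to a handful of small key builders; the PK families are the real ones, but the SK shapes here are illustrative:

```typescript
// Sketch of the single-table key families. Every access pattern - profile,
// org membership, session history, usage audit - is one of these lookups.
const keys = {
  user: (userId: string) => ({ PK: `USER#${userId}`, SK: "PROFILE" }),
  orgMember: (orgId: string, userId: string) =>
    ({ PK: `ORG#${orgId}`, SK: `USER#${userId}` }),
  session: (userId: string, sessionId: string) =>
    ({ PK: `SESSION#${userId}`, SK: sessionId }),
  usage: (orgId: string, month: string) =>
    ({ PK: `USAGE#${orgId}`, SK: `MONTH#${month}` }),
};
```

Listing an org's members is then a single Query on PK = ORG#<id> with an SK prefix of USER#, which is what makes the workspace-aware authorization cheap.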

What's next for Pumila.ai

  • More headful connectors. Same pattern as YouGov: Semrush, Similarweb, Google Search Console, and Google Analytics logged-in sessions driven from the user's own machine, with the captured cookies encrypted in safeStorage.
  • Dossier collaboration. Right now sessions live on-device in a JSON store. Next is org-scoped cloud persistence in DDB so teams can share, comment, and diff dossiers across a workspace.
  • An Evidence tab with inline screenshots. The data already flows - we just need the UI to show the firm screenshot next to the citation it supports, with a click-through to the live URL.
  • Auto-generated deck visuals. The /v1/visuals/generate Lambda already produces Gemini/Bedrock images; we want a one-click "Export to Google Slides" that lays out hero + audience cards + knowledge graph + generated visuals into a client-ready deck.
  • Scheduled re-runs + diffing. Let a strategist pin a dossier and have it re-generate weekly, surfacing what changed - new sources, new audiences, shifted confidence - as a notification.
  • macOS + Linux installers. electron-builder already supports them; we just prioritized Windows (NSIS + portable) for this hackathon.
  • Org admin surface. The CDK stack and data model already contemplate invites, join requests, and owner/admin/member roles; the UI for reviewing and approving is next.

Built With

amazon-bedrock · amazon-cognito · amazon-dynamodb · amazon-api-gateway · aws-cdk · aws-lambda · electron · playwright · react · typescript · vite
