Inspiration
We were inspired by how people actually learn from Instagram, TikTok, and Facebook—short, story-driven, and conversational formats that make ideas stick. Dense PDFs and lecture notes rarely feel that way. We wanted the same “easy to consume, hard to put down” energy applied to serious documents, without dumbing them down.
What it does
Narrify turns PDFs or free-form topics into three parallel narratives derived from the same source:
- Hero’s Journey (three beats) — Confusion (why it feels hard), Struggle (simple explanation), Breakthrough (clear takeaway).
- Educational story — a short, slightly imaginative but grounded story (a few minutes' read) that carries the main ideas.
- Podcast-style dialogue — alternating Host A (curious beginner) and Host B (explainer), with Article or Screenplay layout in the UI.
Each block can be played as AI narration via ElevenLabs (proxied through your Node server so the API key stays off the client). PDFs are read in memory (no disk upload of extracted text), with size and length limits to keep requests practical.
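The size and length limits above can be sketched as a small guard module. This is a minimal sketch, not code from the repo: the function names (`checkUpload`, `truncateExtract`) are hypothetical, and the thresholds are the 20 MB / ~40k-character limits described later in this write-up.

```javascript
// Hypothetical upload guards (names are illustrative, limits from the write-up):
// reject non-PDFs and oversized files, and cap the extracted text we send on.
const MAX_PDF_BYTES = 20 * 1024 * 1024; // 20 MB upload cap
const MAX_TEXT_CHARS = 40000;           // ~40k chars of extracted text

function checkUpload(fileSizeBytes, mimeType) {
  if (mimeType !== "application/pdf") return { ok: false, reason: "pdf-only" };
  if (fileSizeBytes > MAX_PDF_BYTES) return { ok: false, reason: "too-large" };
  return { ok: true };
}

function truncateExtract(text) {
  // Keep prompts practical: hard-truncate very long extracts.
  return text.length > MAX_TEXT_CHARS ? text.slice(0, MAX_TEXT_CHARS) : text;
}
```

Because the checks are pure functions, they can run before any file bytes or extracted text ever touch disk, matching the in-memory handling described above.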
How we built it
- Frontend: React (Vite), Axios, a custom starfield canvas background, and cards for each narrative type plus play buttons for audio.
- PDF path: Flask + PyMuPDF (`fitz`) extracts text from uploaded PDFs; Google Gemini 2.5 Flash returns strict JSON (with markdown-fence stripping when the model wraps it).
- Text path: Express `/generate` calls the same Gemini prompt for pasted topics (with a local fallback if the key is missing).
- Audio: Express `/audio` calls ElevenLabs (`eleven_multilingual_v2`) and streams MP3 back to the browser.
- Tooling: we used Claude heavily for scaffolding the frontend and backends; Gemini owns the transformations; ElevenLabs owns the voice.
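The markdown-fence stripping mentioned above could look something like the sketch below. The function names are our own for illustration; the idea is simply that when the model wraps its JSON in ```` ```json … ``` ```` fences, we remove them before parsing.

```javascript
// Hypothetical fence-stripper (a sketch): Gemini sometimes wraps JSON output
// in markdown code fences; strip them before handing the text to JSON.parse.
function stripFences(raw) {
  const m = raw.trim().match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
  return m ? m[1] : raw.trim();
}

function parseModelJson(raw) {
  return JSON.parse(stripFences(raw));
}
```

Plain JSON passes through untouched, so the same parser serves both the well-behaved and the fence-wrapped responses.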
Challenges we ran into
The Flask service that handles PDF uploads hit network errors that looked like bugs or API issues but turned out to be environmental (the local network or machine), not the parsing or Gemini integration itself. On the software side, getting reliable JSON from the model and guarding against problem PDFs (blank or scanned-only files, oversized uploads, very long extracts that needed truncating) took careful handling.
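One of those guards, detecting blank or scanned-only PDFs, can be sketched as a trivial check: if extraction yields no selectable text, the file is likely image-only and should be rejected with a clear error instead of being sent to the model. The function name here is hypothetical, not from the repo.

```javascript
// Hypothetical check (a sketch): a PDF whose pages yield only whitespace
// after text extraction is likely scanned/image-only.
function classifyExtract(text) {
  const cleaned = text.replace(/\s+/g, "");
  return cleaned.length === 0 ? "empty-or-scanned" : "ok";
}
```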
Accomplishments that we’re proud of
- End-to-end PDF → three narratives in one flow, with clear validation (PDF-only, 20 MB cap, truncation around 40k characters of extracted text).
- Two input modes—upload a PDF or type a topic—both feeding the same narrative pipeline.
- Polished, thematic UI (Hosts, “scroll” metaphor) plus one-click narration per section.
- Secrets kept on the server (Gemini/ElevenLabs keys in `.env`, not in client code).
What we learned
We learned not to push API keys to GitHub—use .env, .gitignore, and separate keys per provider; treat repos as public by default. We also learned to design for flaky LLM output (JSON fences, fallbacks for the text-only path) and to match product claims to the prompt (e.g. the app is explicitly Hero’s Journey + story + dialogue, not a separate “third-person narrator” mode in code—though the model often narrates in an accessible, story-like voice).
What’s next for Narrify
- Scientific papers and long textbook chapters—smarter chunking, citations, and section-aware summaries.
- Scanned PDFs via OCR so image-only documents work.
- Richer audio: two voices for the podcast dialogue, or chapter playlists.
- Deployment: one hosted backend instead of localhost URLs in the client.
Built With
- axios
- elevenlabs
- express.js
- flask
- gemini
- pymupdf
- python-dotenv
- react
- vite
