Inspiration

Academic reading is broken. Papers are published faster than anyone can read them, yet careers, grades, and research quality depend on staying current. The bottleneck isn't access — it's time and cognitive load. I built PaperVoice to attack that bottleneck by converting the hardest part of research consumption (dense, jargon-heavy academic text) into the most passive form of consumption possible (audio). The idea crystallized from my own experience as a graduate student with 15-20 papers in my reading queue at any given time. I read maybe 3 deeply per week, skim 5, and ignore the rest. PaperVoice changes that ratio. And for researchers working on unpublished drafts, there's a deeper problem: you can't upload your pre-submission manuscript to ChatGPT or Gemini without risking IP exposure. PaperVoice solves that too — with a private vault that processes your draft entirely on-device via OpenClaw.

What it does

PaperVoice turns your reading queue into a podcast. Upload any academic PDF, get a 3-minute audio digest with a yes/no/maybe verdict on whether it deserves a full read.

Built for two users: the overwhelmed grad student with 40 papers queued and 3 hours to spare, and the faculty researcher who can't upload pre-submission manuscripts to ChatGPT without risking IP exposure.

For the grad student — a Spotify-style library where every processed paper is searchable and playable instantly. A Courses tab that maps to Canvas readings with due dates. A focused digest mode where you ask a specific question about any paper and get a targeted audio answer back.

For the researcher — a Private Vault where your draft never leaves your machine. OpenClaw intercepts the upload, extracts text locally, and sends only plain text to Claude for processing. The raw PDF is never transmitted to any cloud service. The output is a pre-submission audit: contribution clarity, methodology gaps, and the exact reviewer questions your paper is likely to face.

Everything is narrated by ElevenLabs in natural English. The digest follows P-PACE structure — Problem, Finding, Evidence, Limitations, Takeaway — so every paper sounds the same and your brain stops spending energy on format and starts retaining content.

How we built it

Stack: Node.js/Express backend, React/Vite/Tailwind frontend, SQLite with WAL mode, Anthropic Claude Sonnet for digest generation, ElevenLabs eleven_turbo_v2 for audio narration, Gemini 2.0 Flash for enhanced PDF extraction and vault audit structural analysis, and OpenClaw as the on-device privacy and automation layer.

The architecture has three processing paths. Public papers go through the server queue: Gemini extracts structured text from the PDF, Claude generates the P-PACE digest with a triage verdict, and ElevenLabs narrates it. Private vault papers take a fundamentally different path: OpenClaw intercepts the upload, extracts text locally using pdf-parse (the PDF is never transmitted), and sends only plain text to Claude; the audio is written directly to the local filesystem. The third path handles focused digests — the user asks a specific question, Claude generates a targeted response, and ElevenLabs narrates it.

OpenClaw also handles push notifications via Telegram — when a digest completes, you get a message with the verdict while you're doing something else.
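The three paths can be sketched as follows. This is a simplified illustration, not the actual implementation — the function names (runPublicPath, runVaultPath, runFocusedDigest) and the stubbed client objects are hypothetical stand-ins for the real Gemini, Claude, and ElevenLabs calls:

```javascript
// Sketch of PaperVoice's three processing paths (illustrative names only).
// `clients` is a hypothetical bundle of API wrappers; all calls are stand-ins.

async function runPublicPath(pdfBuffer, clients) {
  // Public papers: the server queue sees the raw PDF bytes.
  const text = await clients.gemini.extractText(pdfBuffer);
  const digest = await clients.claude.generateDigest(text); // P-PACE + triage verdict
  const audio = await clients.elevenlabs.narrate(digest.script);
  return { digest, audio };
}

async function runVaultPath(localExtract, clients) {
  // Private vault: text is extracted on-device (e.g. pdf-parse via OpenClaw),
  // so the raw PDF never reaches the server or any cloud API.
  const text = await localExtract();
  const digest = await clients.claude.generateDigest(text);
  const audio = await clients.elevenlabs.narrate(digest.script);
  return { digest, audio }; // audio is written to the local filesystem
}

async function runFocusedDigest(text, question, clients) {
  // Focused digest: a targeted answer to one question about one paper.
  const answer = await clients.claude.answerQuestion(text, question);
  const audio = await clients.elevenlabs.narrate(answer);
  return { answer, audio };
}
```

The key property is visible in the signatures: only `runPublicPath` ever receives PDF bytes; the vault path receives a local extraction callback instead.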

Challenges we ran into

The hardest technical challenge was the privacy architecture for the vault. Ensuring raw PDFs never leave the machine while still getting high-quality Claude + ElevenLabs output required a careful split: OpenClaw handles local PDF extraction and sends only text to the server, so the server never touches the raw bytes. Getting the fallback chain right (OpenClaw first, server-side extraction only if OpenClaw is unavailable) without breaking the existing vault flow took significant debugging.

Gemini's free-tier quota exhaustion was a recurring problem during development — each 429 error caused a 30-second timeout before falling back to pdf-parse, making vault processing feel broken. The fix was a session-level quota flag that detects the first 429 and immediately skips Gemini for all subsequent calls in that session.

CORS handling with Vite's dynamic port assignment (it picks another free port when 5173 is occupied) required rewriting the CORS middleware to accept any localhost origin rather than a hardcoded port list.
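The session-level quota flag might look roughly like this. A minimal sketch, assuming the two extractors are passed in; `extractWithGemini` and `extractWithPdfParse` are hypothetical names, not the real functions:

```javascript
// Sketch of the session-level quota flag (illustrative, not the real code).
// After the first 429 from Gemini, all later calls in this process skip it
// entirely instead of each waiting out a long timeout before falling back.

let geminiQuotaExhausted = false; // resets when the server restarts

async function extractText(pdfBuffer, { extractWithGemini, extractWithPdfParse }) {
  if (!geminiQuotaExhausted) {
    try {
      return await extractWithGemini(pdfBuffer);
    } catch (err) {
      if (err.status === 429) {
        // Quota hit: remember it for the rest of the session.
        geminiQuotaExhausted = true;
      } else {
        throw err; // non-quota errors still surface to the caller
      }
    }
  }
  return extractWithPdfParse(pdfBuffer); // local fallback
}
```

The trade-off of a process-level flag is that a transient 429 disables Gemini until restart; a variant could store a retry-after timestamp instead.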

Accomplishments that we're proud of

The privacy model is genuine. Most "privacy-preserving" AI tools still send your data to a server — they just promise not to store it. PaperVoice's vault architecture actually prevents transmission at the infrastructure level. A researcher can verify this by watching their network traffic: the PDF bytes never leave the machine.

The P-PACE digest format produces consistently useful output. The triage verdict — worth a full read or not — is the killer feature. It changes how researchers interact with their reading queue.

The Spotify structural analogy held up through the entire build. The persistent bottom player, sidebar navigation, library, trending feed, and playlist-equivalent courses tab all map cleanly. It feels like a product, not a hackathon project.

What we learned

OpenClaw is most powerful as an infrastructure layer, not a user-facing feature. The original vision had users interacting with OpenClaw directly; the better design uses it invisibly as the privacy enforcement layer between the user's draft and the cloud APIs.

Progressive disclosure in skill design (YAML frontmatter → SKILL.md body → referenced scripts) maps directly to how Claude Code processes prompts — knowing this made the OpenClaw skill dramatically more reliable.

The ElevenLabs turbo model is fast enough for a real-time feel, but it requires careful SSML handling to avoid robotic output on dense academic terminology.
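The progressive-disclosure layout mentioned above might look something like this. The skill name, description, and steps are invented for illustration — only the frontmatter-then-body structure reflects how such a skill file is laid out:

```markdown
---
name: papervoice-vault
description: Extract text from a local PDF on-device and return plain text only; never upload the raw file.
---

# PaperVoice Vault Skill (hypothetical sketch)

1. Receive the local path of the uploaded PDF.
2. Run the referenced extraction script locally (pdf-parse).
3. Return the extracted plain text; the raw PDF bytes stay on disk.
```

The frontmatter is what gets scanned cheaply up front; the body and referenced scripts are only loaded when the skill is actually invoked — hence "progressive disclosure."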

What's next for PaperVoice

Field Digest (Discover Weekly for research): a weekly curated digest of 5 papers from your declared research area, sourced from arXiv and Semantic Scholar and trained on your triage behavior.

Author Pages: every researcher whose papers appear in the library gets a profile — follow them and get notified when new papers are processed.

Research Wrapped: annual stats showing papers processed, time saved, and most-listened authors — shareable on academic Twitter.

Freemium model: a free tier for library access and 3 personal uploads per month; premium ($10/month) for unlimited uploads, the private vault, and Field Digest.
