HNFM: your personalized AI-powered podcast

About the Project: HNFM

Inspiration

HNFM was inspired by a few different streams of thought:

NotebookLM, pdf-to-podcast, and other HN-to-podcast projects showed me the potential of turning text-heavy formats into rich audio-visual experiences.
I also drew inspiration from the TBPN podcast by John Coogan and Jordi Hays, which has a fast-paced, tech-and-business style. I liked the idea of creating something similar — but fully automated and powered by open-source AI models running on consumer GPUs.
Finally, I’ve always admired podcasters who cover AI and technology news in ways that are engaging and accessible. I wanted to build a system that could take the constant firehose of Hacker News content and repackage it in a format that feels fresh, fast, and fun.

What It Does

HNFM generates 2–3 minute podcast-style videos from Hacker News posts. Each episode includes:

Dialog between two AI speakers, written by GPT-OSS (a reasoning-focused LLM).
Narration with DIA, a TTS model capable of ultra-realistic cloned voices.
Engaging visuals generated by Flux Krea via the InvokeAI API, based on GPT-OSS’s image prompts.
Subtitles, synced to the narration using WhisperX for ASR and word-level timestamps.
Video assembly via FFmpeg, which stitches audio, images, and subtitles into a finished clip.
Metadata generation including a description, tags, and even a haiku about the article.

The result is a short-form podcast video that feels like a fast-moving Hacker News radio show — part entertainment, part news digest.

How We Built It

The development journey had two major phases:

Vibe coding MVP (CLI)

The first version was a simple command-line tool.
I hacked it together quickly to prove the concept: scrape → summarize → narrate → generate image → assemble.
It worked surprisingly well, but it was rough, hard to debug, and not sustainable for iteration.

AI-assisted rebuild (Web App)

For v2, I started over. I wrote polished, detailed PRDs for each subsystem.
I used Cursor’s coding agents, handing off these PRDs to guide implementation.
I emphasized simplicity above all else. Whenever code grew too complex, I stepped back, deleted large chunks, and re-specified the design.
The backend was built with FastAPI, Redis, and Celery, while the frontend was built with Nuxt.js and ShadCN components.
Each external model service (DIA, WhisperX, InvokeAI, GPT-OSS) runs on my home network with lightweight API wrappers for inference.

This rebuild process taught me how powerful AI-assisted coding can be if you guide it with strong specs — and how messy it gets if you don’t.

Challenges We Ran Into

Content edge cases
- Some Hacker News submissions don’t have articles behind them — just link posts, code, or discussions.
- Others were too long to fit in the LLM’s context window. I experimented with fallbacks, including fetching archived versions from the Internet Archive’s Wayback Machine.
Versioning leap
- Jumping straight from a v1 CLI to a v2 web app was messy.
- AI coding agents introduced unnecessary complexity, and debugging became painful. I ended up deleting large amounts of code and starting again with stripped-down PRDs.
Image generation quirks
- The Flux [dev] NIM sometimes returned blank PNGs (all black), likely due to content filtering.
- I switched to running Flux Krea locally through InvokeAI, which gave me more control.
Context window management
- Articles longer than ~30k tokens caused failures or truncated outputs. Balancing the model’s reasoning ability with available VRAM was tricky.

Accomplishments That We’re Proud Of

Watching the finished videos is the best reward — they’re engaging, fun, and genuinely informative.
I launched a YouTube channel @hn_fm to share the episodes!
I’m proud of building a system where every piece of inference runs locally on my RTX 4090 — no cloud dependencies.
I learned to treat AI-assisted development like a collaboration: guiding coding agents with PRDs, reviewing carefully, and simplifying relentlessly.

What We Learned

Hacker News API: I didn’t realize it was so generous — no API key, barely rate-limited. Perfect for this project.
GPT-OSS reasoning control: You can set reasoning levels by simply including "Reasoning: {low, medium, high}" in the system prompt. This wasn’t well documented, but it unlocked huge flexibility.
Forced alignment in ASR: I learned about techniques for aligning audio with transcripts by sending both to the ASR model. This could dramatically improve subtitle accuracy in future versions.
AI dev workflow: Writing clear, detailed specs for AI coding agents makes all the difference between chaos and progress. Vibe coding is fun and super fast for proving out simple ideas, but AI assisted development is crucial for sustained progress. Carefully writing and iterating on precise requirements documents and then actually reading the code that AI coding agents write is also super important. Who would have thought!?
gpt-oss web scraping capabilities I didn't know that gpt-oss-20b model would be so good as the LLM that powers Firecrawl, the open source web scraping tool I used in my project. Often times I find that local models fall short for certain tasks that larger models all well-suited to handle, but I was pleased to see how well this smaller model handled web scraping tasks!
AI coding agent capabilities AI coding agents are so good at running complex commands that help test and validate your code as it is written. Leaning on this helped me a lot in the development of HNFM.

What’s Next for HNFM

There are a lot of exciting directions to take HNFM:

Better scriptwriting: Add more personality to the dialog, experiment with styles, and refine prompt engineering.
Improved ASR: Explore forced alignment for word-perfect subtitles.
Dynamic images: Move beyond static images to simple animations or Ken Burns–style pans.
Edge case handling: Smarter fallbacks for posts without readable articles.
Agentic workflows: Let the LLM make more decisions — for example, filtering uninteresting posts or re-ranking stories by novelty.
Open source release: Share the codebase so others can experiment with local AI media pipelines.

Ultimately, I see HNFM as more than a tool — it’s a proof of concept for local-first, open-source AI content generation. It demonstrates the state of the art for open source models on consumer hardware. It shows that with today’s GPUs, anyone can build a fully automated AI podcast studio with their AI PC!

Built With

celery
dia
fastapi
gpt-oss
invokeai
nuxt
whisperx

Updates

Brian Caffey started this project — Sep 11, 2025 06:13 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.