💡 Inspiration
The modern news cycle is relentless, chaotic, and overwhelming. As an engineering student, staying updated with global events while balancing coursework and building projects is a constant struggle. I realized there was a massive gap between the raw velocity of breaking news and the polished, digestible format of televised broadcasts.
I wanted to build a system that consumes this chaos and autonomously transforms it into high-quality, cinematic content. The goal was to create a 24/7 infinite broadcast—a pipeline that entirely removes the human bottleneck from media production, proving that frontier AI models can act as a complete, autonomous production studio.
⚙️ How we built it
NovaStream is constructed as a fully autonomous, event-driven multi-agent architecture powered by AWS Bedrock. We split the workload across specialized AI agents:
- The Showrunner: Powered by Amazon Nova 2 Lite, this agent fetches live headlines via NewsAPI and generates a structured JSON blueprint containing scene breakdowns and scripts.
- The Voice Actor: We utilized Amazon Nova 2 Sonic to synthesize hyper-realistic, broadcast-quality narration for each generated scene.
- The Casting Director: This agent queries the Pexels API for cinematic stock footage.
- The Editor: A Python-driven FFmpeg instance that dynamically stitches the audio and video together, pushing the final render to Supabase Storage.
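The Editor's stitching step can be sketched as an ffmpeg invocation assembled in Python. This is a hypothetical, simplified reconstruction, not the project's actual code; the real pipeline applies additional filters, and the function and path names here are illustrative:

```python
def build_ffmpeg_cmd(video_path, audio_path, out_path, duration):
    """Assemble an ffmpeg command that trims the stock clip to the
    narration length and muxes in the synthesized audio track."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,          # stock footage from Pexels
        "-i", audio_path,          # synthesized narration
        "-t", f"{duration:.3f}",   # cut the output to the audio duration
        "-map", "0:v:0",           # video stream from the first input
        "-map", "1:a:0",           # audio stream from the second input
        "-c:v", "libx264", "-c:a", "aac",
        out_path,
    ]

cmd = build_ffmpeg_cmd("clip.mp4", "voice.mp3", "scene.mp4", 12.5)
```

Running the assembled command with `subprocess.run(cmd, check=True)` yields one finished scene ready for upload.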
To ensure the best visual match for the generated scripts, the Casting Director agent uses a mathematical weighting model to score and select the optimal background video based on keyword relevance, duration variance, and visual quality. The selection score $S$ for a candidate video $V$ against a scene text $T$ is calculated as:
$$S(V, T) = \alpha \sum_{i=1}^{n} w_i \cdot \text{match}(k_i, V_{\text{tags}}) - \beta \left| V_{\text{duration}} - T_{\text{audio\_length}} \right|$$
Where $k_i$ represents the extracted semantic keywords, and $\alpha$ and $\beta$ are tuning parameters for relevance and pacing.
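The scoring formula above translates directly into a small Python function. This is a minimal sketch under assumed defaults ($\alpha = 1.0$, $\beta = 0.1$); the names and the binary `match` implementation are illustrative, not the project's actual code:

```python
def score_video(keywords, weights, video_tags, video_duration,
                audio_length, alpha=1.0, beta=0.1):
    """S(V, T): weighted keyword-tag relevance minus a pacing penalty
    proportional to the video/narration duration mismatch."""
    tags = {t.lower() for t in video_tags}
    relevance = sum(
        w * (1.0 if k.lower() in tags else 0.0)  # match(k_i, V_tags)
        for k, w in zip(keywords, weights)
    )
    return alpha * relevance - beta * abs(video_duration - audio_length)

# A 12 s clip tagged "ocean" scored against 10 s of narration:
s = score_video(["ocean", "storm"], [0.7, 0.3], {"ocean", "waves"}, 12.0, 10.0)
```

The candidate with the highest $S$ wins, so a slightly off-topic clip can still beat a relevant one if its length matches the narration far better.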
Finally, drawing on recent deep dives into API management and architecture, we built the backend with FastAPI and asyncio to handle these heavy external calls concurrently. The frontend is a Next.js 14 application that relies on WebSockets to listen for new episode events and seamlessly transition the video player, ensuring a clean, immersive UI/UX for the viewer.
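The concurrency pattern behind this is asyncio's `gather`: independent external calls (TTS, footage search, and so on) run in parallel rather than back to back. A minimal sketch with stub coroutines standing in for the real API calls:

```python
import asyncio

async def call_external(name: str, delay: float) -> str:
    # Stand-in for a slow external call (NewsAPI, Pexels, Bedrock, TTS)
    await asyncio.sleep(delay)
    return name

async def build_scene() -> list[str]:
    # Launch the independent narration and footage fetches concurrently
    # instead of awaiting them one after another.
    return await asyncio.gather(
        call_external("narration", 0.01),
        call_external("footage", 0.01),
    )

audio, video = asyncio.run(build_scene())
```

With `gather`, a scene's wall-clock latency is bounded by its slowest external call instead of the sum of all of them.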
🚧 Challenges we ran into
- Strict JSON Output: One of the biggest hurdles was ensuring the Nova 2 Lite model consistently returned perfectly formatted JSON so the FFmpeg Editor wouldn't crash. This required rigorous prompt engineering and setting up fallback retry logic.
- Pipeline Latency: Generating TTS, fetching 4K videos, and rendering .mp4 files is incredibly time-consuming. We had to heavily optimize the asynchronous Python functions to ensure the pipeline generated episodes faster than they were consumed by the live player.
- Media Synchronization: Calculating the exact byte length and duration of the Nova 2 Sonic audio streams to trim the Pexels video clips perfectly took a lot of trial and error with FFmpeg filters.
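The fallback retry logic for strict JSON output can be sketched as a parse-and-retry loop around the model call. This is a hypothetical simplification of the approach described above; `parse_blueprint` and the stub generator are illustrative names, not the project's actual code:

```python
import json

def parse_blueprint(generate, max_retries=3):
    """Call a model-output generator and retry until it yields valid
    JSON, mirroring the fallback retry logic described above."""
    last_err = None
    for _ in range(max_retries):
        raw = generate()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_err = err  # malformed output: re-prompt and try again
    raise ValueError(f"no valid JSON after {max_retries} attempts") from last_err

# Simulate a model that fails once before producing valid JSON:
attempts = iter(["Sure! Here is the JSON:", '{"scenes": []}'])
blueprint = parse_blueprint(lambda: next(attempts))
```

Guarding every model call this way keeps a single malformed response from crashing the downstream FFmpeg stage.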
🧠 What we learned
Building NovaStream was an incredible crash course in orchestrating multiple foundation models into a single, cohesive workflow. I learned how to seamlessly integrate AWS Bedrock into a Python backend using boto3, completely bypassing the need to host or manage inference servers. Additionally, I gained deep, practical experience in managing complex asynchronous tasks, WebSocket streaming, and dynamic video manipulation via code.
Built With
- amazon-nova
- asyncio
- aws-bedrock
- aws-iam
- boto3
- docker
- fastapi
- ffmpeg
- newsapi
- next.js
- pexels-api
- python
- react
- render
- supabase
- tailwind-css
- vercel
- websockets