Agents on Air

[Screenshots: Home Page; Entering a topic; Loading Page; Starting the Podcast; Agent Nova; Agent Echo; Interrupting the podcast with a question; Agent Nova answering the question]

Inspiration

We each wanted to become better informed in our own areas of interest. With our busy schedules, though, there was no quick and efficient way to do that. When we did find time to work through articles and newspapers, we each found the experience incredibly boring. We wanted a more natural way to stay informed, one that turns news into something conversational, engaging, and easy to absorb. We were inspired by immersive learning experiences rather than textbook-style reading. By letting ideas interact instead of presenting them in isolation, Agents on Air (AoA) turns understanding from a task into an experience.

What it does

This application lets users generate, and take part in, a fully interactive AI-driven podcast in real time. When a user enters a topic, the system retrieves current news and builds an episode outline. It then orchestrates a dynamic discussion of 10 to 15 minutes between two AI hosts. Users can interject with questions or commentary and steer the discussion as it unfolds. The result is a personalized, on-demand audio experience that combines live information retrieval, conversational AI, and real-time speech synthesis in one cohesive, responsive platform.
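The build notes for this project say the backend pulls current headlines from NewsAPI. To illustrate that first retrieval step, here is a minimal sketch under a few assumptions: it targets NewsAPI's `/v2/everything` endpoint and runs on Node 18+, which provides a global `fetch`. The helper names `buildNewsUrl`, `summarizeArticles`, and `fetchNewsContext` are hypothetical, as are the chosen query parameters and the numbered-list context format. None of this is AoA's actual code.

```javascript
// Sketch only: fetch current headlines for a topic and condense them into
// prompt context. Assumes NewsAPI's /v2/everything endpoint and Node 18+.
// Helper names and the context format are hypothetical, not AoA's code.

const NEWSAPI_EVERYTHING = "https://newsapi.org/v2/everything";

function buildNewsUrl(topic, apiKey, pageSize = 5) {
  const url = new URL(NEWSAPI_EVERYTHING);
  url.searchParams.set("q", topic);
  url.searchParams.set("language", "en");
  url.searchParams.set("sortBy", "publishedAt"); // newest first
  url.searchParams.set("pageSize", String(pageSize));
  url.searchParams.set("apiKey", apiKey);
  return url.toString();
}

// Turn a NewsAPI { articles: [...] } payload into a numbered list the LLM can cite.
function summarizeArticles(payload, maxArticles = 5) {
  return (payload.articles ?? [])
    .filter((a) => a.title) // skip entries with no headline
    .slice(0, maxArticles)
    .map((a, i) => {
      const source = a.source?.name ?? "unknown source";
      const date = (a.publishedAt ?? "").slice(0, 10); // YYYY-MM-DD
      const desc = a.description ? ` - ${a.description}` : "";
      return `${i + 1}. ${a.title} (${source}${date ? `, ${date}` : ""})${desc}`;
    })
    .join("\n");
}

async function fetchNewsContext(topic, apiKey) {
  const res = await fetch(buildNewsUrl(topic, apiKey));
  if (!res.ok) throw new Error(`NewsAPI request failed with status ${res.status}`);
  return summarizeArticles(await res.json());
}
```

The condensed text from `fetchNewsContext` would then be passed into the outline prompt. In practice, the API key should stay on the server and never be exposed to the browser.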

How we built it

AoA uses a layered architecture that deliberately separates intelligence, audio synthesis, and real-time coordination, which keeps the system scalable and latency low. When a user submits a topic, the backend starts a news ingestion pipeline that uses NewsAPI to retrieve current headlines and context. It then calls Azure OpenAI to build a structured episode outline and generate dialogue between the two AI agents. The conversational engine manages turn order, maintains contextual memory, and enforces pacing constraints so the discussion stays coherent and within its time limit.

Each generated turn is converted into high-quality speech with Azure Speech Services, and the audio is streamed to the client as it is produced. A Redis-powered state layer tracks room-level data, including segment progression, interruption flags, and timing thresholds, so the episode flow stays deterministic even during user interjections. Azure Web PubSub provides real-time communication between the frontend and backend, synchronizing audio playback, turn transitions, and user-triggered events.

To minimize latency, the system pre-generates the upcoming dialogue and its audio while the current speaker is still playing, which allows near-instant handoffs between agents. If a user interrupts, the engine reprioritizes context, injects the user's input into the state layer, and resumes the episode naturally. At the end of the session, all audio segments are programmatically stitched into a single file, producing a complete, downloadable podcast episode.
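The final stitching step needs no audio library if each turn is synthesized as uncompressed PCM WAV. The sketch below assumes exactly that. All segments are taken to share one sample rate, channel count, and bit depth, and chunk sizes are taken to be even, as they are for 16-bit audio. `parseWav` and `stitchWav` are hypothetical helpers, and AoA may well use a different output format or tool.

```javascript
// Sketch: stitch per-turn WAV segments into one downloadable episode.
// Assumptions (not confirmed by the project): every segment is uncompressed PCM WAV,
// all segments share the same format chunk, and chunk sizes are even (e.g. 16-bit audio).

// Walk the RIFF chunks and return the raw "fmt " and "data" chunk bodies.
function parseWav(buf) {
  if (buf.toString("ascii", 0, 4) !== "RIFF" || buf.toString("ascii", 8, 12) !== "WAVE") {
    throw new Error("not a RIFF/WAVE buffer");
  }
  let fmt = null;
  let data = null;
  let offset = 12;
  while (offset + 8 <= buf.length) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const body = buf.subarray(offset + 8, offset + 8 + size);
    if (id === "fmt ") fmt = body;
    else if (id === "data") data = body;
    offset += 8 + size + (size % 2); // RIFF chunks are padded to even length
  }
  if (!fmt || !data) throw new Error("missing fmt or data chunk");
  return { fmt, data };
}

// Concatenate the PCM payloads and write a fresh header with corrected sizes.
function stitchWav(segments) {
  if (segments.length === 0) throw new Error("no segments to stitch");
  const parsed = segments.map(parseWav);
  const fmt = parsed[0].fmt;
  if (!parsed.every((p) => p.fmt.equals(fmt))) {
    throw new Error("segments use different audio formats");
  }
  const pcm = Buffer.concat(parsed.map((p) => p.data));
  const header = Buffer.alloc(12 + 8 + fmt.length + 8);
  header.write("RIFF", 0);
  header.writeUInt32LE(header.length - 8 + pcm.length, 4); // total file size minus 8
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(fmt.length, 16);
  fmt.copy(header, 20);
  header.write("data", 20 + fmt.length);
  header.writeUInt32LE(pcm.length, 24 + fmt.length);
  return Buffer.concat([header, pcm]);
}
```

The format chunk is copied from the first segment and checked against every other segment. A mismatched segment therefore fails loudly rather than producing garbled audio. Compressed outputs such as MP3 would need a different approach.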

Challenges we ran into and what we learned

On the technical side, Agents on Air challenged us far beyond building a simple web application. The most complex part was integrating and orchestrating multiple Microsoft Azure services from scratch: Static Web Apps, Container Apps, Azure Container Registry, Web PubSub, Redis, Speech Services, and Azure OpenAI. Each has its own configuration model, authentication flow, networking rules, and deployment patterns. Connecting them reliably required a deep understanding of cloud architecture, containerization, environment configuration, CORS policies, and CI/CD pipelines. The learning curve was steep, but it strengthened our understanding of cloud-native full-stack architecture and real-time system design.

Beyond the backend and cloud infrastructure, we also faced a significant learning curve on the front end, particularly in integrating Spline into our application. Working with 3D assets added a layer of complexity well beyond traditional UI development. We had to learn how to embed Spline scenes in our framework and make sure they rendered correctly within a responsive layout. Editing the 3D assets themselves took experimentation and iteration: adjusting animations, modifying interactions, refining visual elements, and exporting in the correct format. Aligning their motion, proportions, and overall aesthetic with our design system was not straightforward either. Balancing creative intent with technical constraints pushed us to broaden our understanding of both front-end engineering and interactive design.

Accomplishments that we're proud of

We are most proud of engineering a real-time, multi-agent podcast system that feels natural, responsive, and production-ready. At its core is a stateful turn engine that orchestrates two AI hosts, generates structured dialogue from live news sources, and converts each turn into streamed audio with minimal latency. The engine manages outline progression, conversation memory, interruption handling, and time-capped episode flow while keeping the discussion coherent and well paced. We implemented prefetching to shorten gaps between turns and integrated Azure OpenAI and Speech Services for generation and synthesis. We also coordinated Redis-backed room state, Web PubSub signaling, and containerized backend infrastructure to deliver a seamless end-to-end experience. Users can interrupt mid-episode, ask questions, and meaningfully shape the discussion. The system then reprioritizes context and resumes naturally, as if it were a live broadcast.
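One way to combine prefetching, interruption handling, and a time cap in a single loop is sketched below. It illustrates the idea under stated assumptions and is not AoA's engine. `generateTurn`, `synthesize`, and `play` are hypothetical callbacks standing in for Azure OpenAI, Azure Speech, and client playback. The in-memory `room` object stands in for the Redis-backed room state. The key device is a version counter: an interruption bumps it, and any turn prefetched under an older version is regenerated so it can address the question.

```javascript
// Minimal sketch of a prefetching turn engine, NOT AoA's actual implementation.
// Hypothetical dependencies: deps.generateTurn(ctx) -> Promise<string> stands in
// for Azure OpenAI, deps.synthesize(text) -> Promise<audio> for Azure Speech, and
// deps.play(turn) -> Promise<seconds> for client playback signaled over Web PubSub.
// `room` is an in-memory stand-in for the Redis-backed room state.

function createRoom(outline) {
  return { outline, segment: 0, history: [], questions: [], elapsed: 0, version: 0 };
}

// A user interjection queues the question and bumps the version, which marks
// any turn prefetched from the old context as stale.
function interrupt(room, question) {
  room.questions.push(question);
  room.version += 1;
}

// Snapshot the context synchronously, then generate text and audio for one turn.
async function prepareTurn(room, speaker, deps) {
  const version = room.version;
  const question = room.questions[0] ?? null;
  const ctx = {
    speaker,
    topic: room.outline[room.segment],
    history: room.history.slice(-6), // bounded conversational memory
    question,
  };
  const text = await deps.generateTurn(ctx);
  const audio = await deps.synthesize(text);
  return { speaker, text, audio, question, version };
}

async function runEpisode(room, deps, { maxSeconds = 900, turnsPerSegment = 4 } = {}) {
  const hosts = ["Nova", "Echo"];
  let turn = 0;
  let pending = prepareTurn(room, hosts[0], deps);
  while (pending) {
    let current = await pending;
    if (current.version !== room.version) {
      // A question arrived while this turn was being prefetched: regenerate it.
      current = await prepareTurn(room, current.speaker, deps);
    }
    // Commit the turn to shared state before prefetching the next one.
    room.history.push(`${current.speaker}: ${current.text}`);
    if (current.question !== null) room.questions.shift();
    turn += 1;
    if (turn % turnsPerSegment === 0) room.segment += 1;
    const outOfOutline = room.segment >= room.outline.length;
    // Start generating the next turn while the current one plays.
    pending = outOfOutline ? null : prepareTurn(room, hosts[turn % hosts.length], deps);
    room.elapsed += await deps.play(current);
    if (room.elapsed >= maxSeconds) pending = null; // time cap reached
  }
  return room;
}
```

Each turn is committed to history before the next prefetch starts, which keeps the hosts responding to each other. Generation of the next turn overlaps playback of the current one, but never begins before the current turn's text exists. This sketch lets an interrupted turn finish playing. Cutting audio off mid-sentence would need an extra cancel signal to the client.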

What's next for Agents on Air

Looking ahead, we plan to evolve AoA from a real-time AI podcast engine into a more immersive and versatile interactive media platform. Next steps:

Hands-Free Interruption Detection
Enable voice activity detection so users can naturally interrupt without pressing a button, allowing seamless, conversational engagement.

Interactive Podcast Replication
Allow users to upload existing podcast episodes, recreate them with AI voices, and interact with the content in real time.

More AI Voices & Panel Formats
Support multiple AI hosts for panel-style discussions and specialized personas.

Expanded Media Ingestion
Move beyond news to include research papers, videos, blogs, and uploaded documents, transforming diverse content into interactive conversations.
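For hands-free interruption detection, one simple starting point is energy-based voice activity detection. The detector computes the RMS level of short audio frames and requires several consecutive loud frames before declaring speech. It then requires a "hangover" run of quiet frames before declaring the speech over. The sketch assumes mono floating-point samples in [-1, 1] delivered in fixed-size frames, for example 320 samples (20 ms at 16 kHz). With those frames, the defaults mean speech must last 100 ms to register and ends after 300 ms of quiet. `createVad` and its threshold defaults are hypothetical and would need tuning. A learned VAD model would likely cope better with background noise.

```javascript
// Sketch: energy-based voice activity detection with onset and hangover counts.
// Assumptions (hypothetical, not AoA code): mono float samples in [-1, 1],
// fixed-size frames (e.g. 320 samples = 20 ms at 16 kHz), untuned default thresholds.

// Root-mean-square level of one frame of samples.
function frameRms(frame) {
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

// Returns a per-frame function that emits "speech_start", "speech_end", or null.
function createVad({ threshold = 0.02, minSpeechFrames = 5, hangoverFrames = 15 } = {}) {
  let speechRun = 0; // consecutive frames at or above threshold
  let silenceRun = 0; // consecutive frames below threshold
  let speaking = false;
  return function processFrame(frame) {
    if (frameRms(frame) >= threshold) {
      speechRun += 1;
      silenceRun = 0;
    } else {
      silenceRun += 1;
      speechRun = 0;
    }
    if (!speaking && speechRun >= minSpeechFrames) {
      speaking = true; // several loud frames in a row, so not just a click or bump
      return "speech_start";
    }
    if (speaking && silenceRun >= hangoverFrames) {
      speaking = false; // waited out short pauses between words
      return "speech_end";
    }
    return null;
  };
}
```

A `speech_start` event could trigger the same interruption flow that the button triggers today. The hosts' own audio will also reach the microphone, so the browser capture would need echo cancellation (for example, the `echoCancellation` constraint in `getUserMedia`) or playback ducking. Without it, the detector would fire on the podcast itself.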

Built With

azure-managed-redis
azure-openai
claude-code
javascript
microsoft-container-app
microsoft-speech-service
microsoft-static-web-app
microsoft-web-pubsub
next.js
node.js
react.js

Updates

Charlotte Qin started this project — Mar 01, 2026 06:00 PM EST
