Inspiration

The initial idea came to mind as I drove from NIU to UIUC, a two-and-a-half-hour drive. As I listened to a podcast on the way, a question surfaced: what if a podcast could adapt based on your mind?

Most audio content is static — the same story, the same pacing, the same tone, regardless of who's listening or what state they're in. But neuroscience tells us that how we receive information is deeply tied to our current mental state. A person in a flow state processes narrative differently than someone who's anxious or burnt out. We wanted to build something that closes that gap.

The deeper inspiration came from affective neuroscience — specifically the two-dimensional model of emotion (Arousal-Valence) developed by researchers like James Russell, which underpins how EEG devices actually interpret brain states in consumer neurotechnology. We wanted to take that model seriously and build a real experience around it, not just use "brain waves" as an aesthetic.


What it does

NeuroCast is an AI-powered podcast that generates a unique audio episode in real time based on your current neural state.

You set three sliders — Arousal (calm to activated), Valence (negative to positive), and Attention (diffuse to focused) — which map to the core dimensions used in EEG-based neurotechnology. The app detects one of eight moods from your input: Anxious, Wired, Euphoric, Flow State, Hyper-focused, Burnt Out, Dreamy, or Balanced.
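
Under the hood, the detection is a priority-ordered set of threshold checks over the three sliders. A minimal sketch (the thresholds and ordering here are illustrative, not our exact values):

```js
// Sketch of the slider-to-mood mapping. Inputs are normalised to 0..1;
// the exact thresholds in the app differ, but the shape is the same.
function detectMood(arousal, valence, attention) {
  const hi = v => v > 0.66, lo = v => v < 0.33;

  if (hi(arousal) && lo(valence))   return 'Anxious';
  if (hi(arousal) && lo(attention)) return 'Wired';
  if (hi(arousal) && hi(valence))   return 'Euphoric';
  if (hi(attention) && hi(valence)) return 'Flow State';
  if (hi(attention))                return 'Hyper-focused';
  if (lo(valence) && lo(attention)) return 'Burnt Out';
  if (lo(arousal) && hi(valence))   return 'Dreamy';
  return 'Balanced';
}
```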

From that mood, NeuroCast generates a fully produced podcast episode:

  • An original story written by an AI agent, tailored to your mood and genre
  • Two distinct voices — a host (Alex) and a narrator — each with their own ElevenLabs voice
  • Contextual sound effects generated per scene using ElevenLabs Sound Effects API
  • Mood-matched background music mixed under the host intro
  • Everything mixed in the browser using the Web Audio API and delivered as a single audio file with play, pause, and replay controls

The brainwave visualizer reacts to your slider input in real time, displaying animated Alpha, Beta, Theta, and Delta wave activity that reflects the neuroscience behind the mood model.


How we built it

Frontend: HTML, CSS, and JavaScript. The Web Audio API's OfflineAudioContext handles all audio decoding, mixing, and WAV export entirely in the browser.

AI Script Generation: We created an ElevenLabs Conversational AI Agent with a system prompt that writes structured podcast scripts in JSON format. The agent call is made server-side to avoid CORS and keep the API key secure.
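
For illustration, the script JSON looks roughly like this; the field names are representative rather than the exact production schema:

```js
// Illustrative example of the script JSON the agent is prompted to return.
// Field names here are representative, not the exact production schema.
const exampleScript = {
  title: "Static on the Night Road",
  hostIntro: "Hey, it's Alex. Tonight's story is for a wired mind...",
  scenes: [
    { narration: "The highway hummed under the tires...",
      sfxPrompt: "distant traffic at night, low hum" },
    { narration: "Then the radio caught a clear voice...",
      sfxPrompt: "radio static resolving into a warm signal" },
  ],
  musicPrompt: "slow ambient synth pads, warm and calming",
};
```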

Voice & Audio: ElevenLabs Text-to-Speech API generates each segment with distinct voices. The Sound Effects API generates both scene-specific SFX stings and mood-matched background music from text prompts.

Mixing pipeline: All audio blobs are decoded to PCM using the Web Audio API, scheduled in an OfflineAudioContext at precise time offsets, mixed with gain nodes for volume and fade control, and rendered to a final WAV file.
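
A simplified version of that mix step, omitting fades, music ducking, and error handling:

```js
// Simplified mix: decode MP3 blobs, schedule each at an offset,
// render offline, and hand back a single mixed AudioBuffer.
async function mixSegments(blobs, gapSeconds = 0.4) {
  const decoder = new AudioContext();
  const buffers = [];
  for (const blob of blobs) {
    buffers.push(await decoder.decodeAudioData(await blob.arrayBuffer()));
  }
  const totalLength = buffers.reduce((s, b) => s + b.duration + gapSeconds, 0);
  const ctx = new OfflineAudioContext(2, Math.ceil(totalLength * 44100), 44100);

  let offset = 0;
  for (const buffer of buffers) {
    const src = ctx.createBufferSource();
    src.buffer = buffer;
    const gain = ctx.createGain();       // per-segment volume / fade control
    gain.gain.value = 1.0;
    src.connect(gain).connect(ctx.destination);
    src.start(offset);
    offset += buffer.duration + gapSeconds;
  }
  return ctx.startRendering();           // resolves to the mixed AudioBuffer
}
```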

Backend: A lightweight Node.js proxy server handles CORS for local development. For production, a Cloudflare Worker proxies all ElevenLabs API calls with the API key stored as an encrypted secret — never exposed in client-side code.
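
The Worker itself is small. A sketch, with the secret name and CORS policy simplified (real code also answers OPTIONS preflights and restricts the allowed origin):

```js
// Cloudflare Worker: forward /v1/* calls to ElevenLabs, injecting the
// API key from an encrypted secret (env.ELEVEN_KEY, name illustrative).
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const upstream = 'https://api.elevenlabs.io' + url.pathname + url.search;

    const headers = new Headers(request.headers);
    headers.set('xi-api-key', env.ELEVEN_KEY);  // never reaches the client

    const resp = await fetch(upstream, {
      method: request.method,
      headers,
      body: ['GET', 'HEAD'].includes(request.method) ? undefined : request.body,
    });

    // Re-wrap the response so we can attach CORS headers for the app origin.
    const out = new Response(resp.body, resp);
    out.headers.set('Access-Control-Allow-Origin', '*');
    return out;
  },
};
```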

Mood model: The three axes (Arousal, Valence, Attention) are grounded in affective neuroscience. The brainwave display updates based on known relationships — Alpha rises with low arousal, Beta rises with high arousal and attention, Theta rises with low attention and positive valence, Delta rises when both arousal and attention are low.
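
In code this is a direct mapping from sliders to relative band amplitudes; the weights below are illustrative and just encode those directional relationships:

```js
// Map sliders (0..1) to relative band amplitudes for the visualizer.
// Weights are illustrative; they encode the directions described above.
function bandAmplitudes({ arousal, valence, attention }) {
  return {
    alpha: 1 - arousal,                      // rises as arousal falls
    beta:  (arousal + attention) / 2,        // rises with arousal and focus
    theta: ((1 - attention) + valence) / 2,  // diffuse, positive states
    delta: (1 - arousal) * (1 - attention),  // both low -> slow-wave activity
  };
}
```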


Challenges we ran into

CORS was our biggest technical wall. Browsers block direct calls to api.elevenlabs.io from a local HTML file and from within sandboxed iframe environments. We went through several approaches before landing on a clean solution: a Node.js proxy for local development and a Cloudflare Worker for production, with the API key stored as a server-side secret in both cases.

Audio stitching. Our first approach was raw byte concatenation of MP3 blobs, which only played the first segment — MP3 files can't be reliably concatenated at the byte level. We rebuilt the merge pipeline using OfflineAudioContext, decoding each blob to PCM, scheduling them at precise time offsets, and rendering a clean WAV output. This also unlocked per-segment music mixing.
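
Rendering that WAV means writing the RIFF header by hand, since the browser has no native WAV encoder. A minimal encoder along the lines of ours (16-bit PCM, interleaved):

```js
// Encode an AudioBuffer as a 16-bit PCM WAV Blob. Standard RIFF layout:
// 44-byte header followed by interleaved little-endian samples.
function audioBufferToWav(buffer) {
  const numCh = buffer.numberOfChannels;
  const rate = buffer.sampleRate;
  const frames = buffer.length;
  const dataSize = frames * numCh * 2;
  const out = new DataView(new ArrayBuffer(44 + dataSize));

  const writeStr = (o, s) =>
    [...s].forEach((c, i) => out.setUint8(o + i, c.charCodeAt(0)));
  writeStr(0, 'RIFF'); out.setUint32(4, 36 + dataSize, true); writeStr(8, 'WAVE');
  writeStr(12, 'fmt '); out.setUint32(16, 16, true);
  out.setUint16(20, 1, true);                 // PCM format
  out.setUint16(22, numCh, true);
  out.setUint32(24, rate, true);
  out.setUint32(28, rate * numCh * 2, true);  // byte rate
  out.setUint16(32, numCh * 2, true);         // block align
  out.setUint16(34, 16, true);                // bits per sample
  writeStr(36, 'data'); out.setUint32(40, dataSize, true);

  let offset = 44;
  const channels = Array.from({ length: numCh }, (_, c) => buffer.getChannelData(c));
  for (let i = 0; i < frames; i++) {
    for (let c = 0; c < numCh; c++) {
      const s = Math.max(-1, Math.min(1, channels[c][i]));  // clamp to [-1, 1]
      out.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
      offset += 2;
    }
  }
  return new Blob([out.buffer], { type: 'audio/wav' });
}
```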

ElevenLabs agent response parsing. The simulate-conversation endpoint returns transcripts in a shape that evolved as we debugged it — different field names across different response states. We ended up building a robust extraction function that tries every known response shape and falls back to pre-written scripts if the agent response is unparseable.
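
The extractor boils down to "try every shape we've seen, then give up gracefully". A sketch, with placeholder field names standing in for the variants we handled:

```js
// Try every transcript shape we observed; return null so the caller can
// swap in a pre-written script. Field names here are placeholders.
function extractScript(resp) {
  const candidates = [
    resp?.analysis?.transcript,
    resp?.simulated_conversation?.at(-1)?.message,
    resp?.messages?.at(-1)?.content,
  ];
  for (const text of candidates) {
    if (typeof text !== 'string') continue;
    const match = text.match(/\{[\s\S]*\}/);  // pull JSON out of any prose
    if (!match) continue;
    try { return JSON.parse(match[0]); } catch { /* try the next shape */ }
  }
  return null;  // caller falls back to hand-crafted scripts
}
```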

Keeping generation time reasonable. Each episode requires four TTS calls, two SFX calls, one music-generation call, and one agent call. We parallelised the TTS and SFX calls per segment using Promise.all, which cut generation time significantly, as sketched below.
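
The pattern itself is simple: fire every per-segment request at once and await the batch. Helper names below are hypothetical wrappers over our proxy endpoints:

```js
// Generate all per-segment audio concurrently instead of one-by-one.
// ttsFetch / sfxFetch are hypothetical thin wrappers over the proxy.
async function generateAudio(script) {
  const [intro, ...sceneAudio] = await Promise.all([
    ttsFetch(script.hostIntro, HOST_VOICE_ID),
    ...script.scenes.map(s => ttsFetch(s.narration, NARRATOR_VOICE_ID)),
  ]);
  const sfx = await Promise.all(script.scenes.map(s => sfxFetch(s.sfxPrompt)));
  return { intro, sceneAudio, sfx };
}
```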


Accomplishments that we're proud of

  • The audio pipeline. Taking raw API responses and producing a fully mixed, multi-track WAV file entirely in the browser — no server-side audio processing — was genuinely hard and we're proud it works cleanly.

  • The mood model. Grounding the three input axes in real affective neuroscience (Arousal-Valence-Attention) rather than arbitrary sliders makes NeuroCast defensible as a neurotechnology application, not just a gimmick.

  • The fallback architecture. The app never breaks. If the AI agent fails, it falls back to hand-crafted scripts. If music generation fails, it plays speech only. Every failure mode is graceful.

  • Zero dependencies. The entire frontend — audio mixing, wave visualisation, UI, API calls — runs on browser APIs and vanilla JS with no npm packages, no build step, and no framework.


What we learned

  • The Web Audio API is extraordinarily powerful. OfflineAudioContext is essentially a DAW in the browser — once we understood it, we could do things we thought would require a backend.

  • ElevenLabs is much more than TTS. Sound effects, music, conversational agents, and multi-voice dialogue are all accessible from the same key, which makes it a surprisingly complete audio production platform.

  • CORS is a real architectural constraint, not just a development nuisance. Designing for it from the start — rather than treating it as something to work around — leads to better architecture.


What's next for NeuroCast

  • Real EEG input. The slider model is a stand-in for actual neurotechnology. The natural next step is integration with consumer EEG headsets (Muse, Neurosity Crown, OpenBCI) via Web Bluetooth or a companion app, so the podcast adapts to your live brainwave data rather than self-reported sliders.

  • Adaptive playback. Rather than generating a fixed episode up front, NeuroCast could monitor your neural state during playback and dynamically adjust — speeding up narration when you're in flow, slowing down when arousal spikes, shifting genre mid-episode if your mood changes.

  • Longer episodes. The current model generates ~60-second episodes. With better script structure and chunked generation, 10-15 minute episodes with full narrative arcs are achievable.

  • Personalised voice profiles. ElevenLabs voice cloning could let listeners hear stories narrated in a voice they choose or create — making the experience genuinely personal.

  • Shared episodes. Generated episodes could be saved and shared — a library of mood-specific stories, each tied to a particular neural fingerprint.
