Inspiration

We spend way too much time context-switching during research. You're checking a stock price in one tab, searching an SEC filing in another, running a spreadsheet model somewhere else — and by the time you've pulled it all together, the moment's passed. We wanted to collapse all of that into a single interface where you just speak and get answers back. When we saw that Nova Sonic 2 does bidirectional audio streaming natively — meaning you can interrupt it mid-sentence and it responds — it felt like the right foundation to build something voice-first without hacking around limitations.


What it does

Ask Alpha is a voice-native financial research assistant. You speak a question, and it fetches live market data, searches SEC 10-K filings using RAG, runs Monte Carlo price simulations, or saves a structured research note to a personal vault — all spoken back to you in real time.

You can interrupt it mid-response and it adjusts. There's no typing, no clicking through dashboards. The entire research workflow — prices, filings, risk modeling, and note-taking — is accessible through natural speech.


How we built it

The core is a FastAPI server with a WebSocket endpoint that bridges the browser's microphone to AWS Bedrock's InvokeModelWithBidirectionalStream API. Raw PCM-16 audio at 16 kHz goes in, 24 kHz TTS audio comes back — all over a single persistent stream, no separate STT or TTS APIs.
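The relay pattern at the heart of that bridge can be sketched without the real endpoints. In the actual server the two sides are the browser WebSocket and Bedrock's bidirectional stream; here asyncio queues stand in for both, and `pump`/`bridge` are illustrative names, not our real functions:

```python
import asyncio

async def pump(source: asyncio.Queue, sink: asyncio.Queue) -> None:
    """Relay chunks from source to sink until a None end-of-stream sentinel."""
    while True:
        chunk = await source.get()
        if chunk is None:
            await sink.put(None)
            return
        await sink.put(chunk)

async def bridge(mic: asyncio.Queue, model_in: asyncio.Queue,
                 model_out: asyncio.Queue, speaker: asyncio.Queue) -> None:
    # Uplink (16 kHz PCM-16 mic audio) and downlink (24 kHz TTS audio)
    # run concurrently over the same session, never blocking each other.
    await asyncio.gather(pump(mic, model_in), pump(model_out, speaker))
```

The point of the sketch is the concurrency shape: both directions live for the whole session, which is what makes barge-in possible later.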

Nova Sonic handles speech recognition, intent understanding, tool selection, response generation, and text-to-speech natively. We built the four financial backends that Nova calls into:

  • Tool 1 hits Finnhub for real-time quotes with a Polygon fallback
  • Tool 2 queries an AWS Bedrock Knowledge Base with a relevance score threshold to filter out low-confidence retrieval — passages below 0.5 cosine similarity are dropped before Nova ever sees them
  • Tool 3 fetches live volatility from Tiingo (with Polygon as backup), then runs 10,000 Geometric Brownian Motion paths using NumPy in a vectorised (simulations × days) matrix — the whole simulation runs in under 50 ms
  • Tool 4 calls a Groq LLM to compose a structured Obsidian-compatible Markdown note with YAML front matter, pulls session metadata from the active session context, and writes it async with aiofiles
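The vectorised GBM step in Tool 3 looks roughly like this. Parameter names and defaults are illustrative, not our actual code; the key idea is drawing all shocks as one (simulations × days) matrix instead of looping:

```python
import numpy as np

def simulate_gbm(s0: float, mu: float, sigma: float,
                 days: int = 252, n_sims: int = 10_000,
                 seed: int = 0) -> np.ndarray:
    """Vectorised GBM: one (n_sims x days) matrix of daily log-returns,
    cumulatively summed along the time axis and exponentiated into paths."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / 252.0                          # one trading day, in years
    z = rng.standard_normal((n_sims, days))   # every shock drawn at once
    log_ret = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_ret, axis=1))
```

Because the whole simulation is two NumPy array operations, 10,000 paths finish fast enough to read the result back mid-conversation.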

The session has a state machine (IDLE → LISTENING → TOOL_EXECUTING → SPEAKING → CLOSED), and audio chunks from the browser are silently dropped during TOOL_EXECUTING so partial speech doesn't confuse the model mid-tool-call.
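The gating logic is simple once the states exist. A minimal sketch (the predicate name is hypothetical): audio is forwarded while listening or while the model is speaking, since barge-in depends on the model still hearing you, and dropped everywhere else:

```python
from enum import Enum, auto

class SessionState(Enum):
    IDLE = auto()
    LISTENING = auto()
    TOOL_EXECUTING = auto()
    SPEAKING = auto()
    CLOSED = auto()

def should_forward_audio(state: SessionState) -> bool:
    """Forward mic audio while LISTENING or SPEAKING (so interruption
    detection still works); silently drop it mid-tool-call and when idle."""
    return state in (SessionState.LISTENING, SessionState.SPEAKING)
```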


Challenges we ran into

Bidirectional streaming is not like a normal API call. You're sending and receiving events on the same live stream — audio chunks, tool call events, TTS events, generation complete signals — and the order matters. Getting the session event sequence right (sessionStart → promptStart → audioInput chunks → toolResult → contentBlockStop) took a lot of trial and error because errors here are often silent or produce malformed audio output.
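Because those ordering bugs fail silently, we found it useful to think of the requirement as a subsequence check: the named events must appear in order, with other events (and repeated audioInput chunks) allowed in between. A rough sketch, using the event names above (toolResult only appears on turns that invoke a tool):

```python
REQUIRED_ORDER = ["sessionStart", "promptStart", "audioInput",
                  "toolResult", "contentBlockStop"]

def check_event_order(events: list[str]) -> bool:
    """True if the required events occur as an in-order subsequence;
    unrelated events may interleave and audioInput may repeat."""
    it = iter(events)  # shared iterator: each search resumes where the last stopped
    return all(any(e == want for e in it) for want in REQUIRED_ORDER)
```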

The interruption behaviour was another tricky part. Nova Sonic's VAD handles it natively, but you have to make sure you're actually still sending audio to the stream while it's speaking — otherwise it never detects the interruption. Dropping audio during the wrong state breaks this entirely.

Relevance filtering for the SEC RAG tool was harder than expected. Bedrock Knowledge Base always returns the top N results regardless of actual relevance — so without the 0.5 score threshold, Nova would confidently quote irrelevant passages for companies not in the knowledge base. Adding that filter and testing it against edge cases (obscure tickers, misheard company names) was time-consuming.
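The filter itself is a one-liner; the work was in picking the threshold and testing the edge cases. A sketch (field names assume each retrieval hit carries a `score`, which is how we shaped ours; not necessarily the raw Bedrock response format):

```python
def filter_by_relevance(results: list[dict], threshold: float = 0.5) -> list[dict]:
    """Drop retrieved passages scoring below the threshold, so
    low-confidence matches never reach the model at all."""
    return [r for r in results if r.get("score", 0.0) >= threshold]
```

An empty list after filtering is the signal that lets Nova say "I don't have a filing for that company" instead of quoting the nearest irrelevant passage.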


Accomplishments that we're proud of

Honestly, getting the live interruption to work cleanly. When Nova is mid-sentence on a Monte Carlo readout and you cut in — it stops, processes your new question, and responds. That behaviour comes from Nova Sonic's built-in VAD, but wiring the audio pipeline correctly so it actually fires took real effort. Seeing it work in a live demo without special-casing anything felt genuinely satisfying.

Also the vault logger. It doesn't just dump a text file — it passes the full session context (tool history, session ID, the actual tool outputs from earlier in the conversation) into a Groq LLM call, which composes a structured note with an executive summary, evidence, risks, and next steps. The notes are Obsidian-compatible with full YAML front matter. That turned out way more polished than we initially planned.
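The note shape the vault logger produces looks roughly like this. The renderer below is a simplified stand-in (field names and the function are illustrative); the real version gets its body text from the Groq call and writes asynchronously with aiofiles:

```python
from datetime import datetime, timezone

def render_note(title: str, session_id: str, body: str,
                tags: list[str]) -> str:
    """Obsidian-compatible Markdown: YAML front matter, then the note body."""
    front_matter = "\n".join([
        "---",
        f"title: {title}",
        f"session: {session_id}",
        f"created: {datetime.now(timezone.utc).isoformat()}",
        "tags: [" + ", ".join(tags) + "]",
        "---",
    ])
    return front_matter + "\n\n" + body
```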


What we learned

Nova Sonic's bidirectional streaming model is fundamentally different from the typical request-response AI API. Thinking in terms of an event stream rather than individual calls changes how you architect everything — the state machine, the audio routing, the tool dispatch timing.

We also learned how much the fallback design matters for demos. Every tool has at least one fallback path (Polygon for Finnhub, FAISS for Bedrock KB, native Python for the ironclad Wasm sandbox, structural template for Groq). Tools that fail silently or crash during a live demo are worse than tools that were never built — so we invested heavily in graceful degradation from the start.
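The degradation pattern is the same everywhere, so it factors into one helper. A minimal sketch (the helper name is ours for illustration):

```python
def call_with_fallback(primary, fallback, *args, **kwargs):
    """Try the primary data source; on any failure, degrade
    to the fallback instead of crashing mid-demo."""
    try:
        return primary(*args, **kwargs)
    except Exception:
        return fallback(*args, **kwargs)
```

Wrapping every tool in this shape is what let the live demo keep moving when one upstream API hiccuped.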


What's next for Ask Alpha

  • Multi-turn context memory — right now each session is stateless beyond tool history. Persisting conversation context across sessions would make it a real research companion, not just a one-shot query tool.
  • More tools — earnings calendars, options chains, insider filings. The event router is designed to be extensible: adding a new tool is just a new Python file and a JSON schema.
  • Nova Lite for note composition — the vault logger currently uses Groq for note generation. The stub for Nova Lite is already in the codebase (_compose_with_nova_lite exists but returns None). Switching fully to AWS-native models would simplify the dependency stack.
  • Deployed version — right now it runs locally. Packaging this as a containerised AWS service with proper auth would make it shareable.
