APS | Devpost


Pitch

APS (AI Pre-Production Studio) is a collaborative screenplay writing platform with a real-time voice assistant powered by Amazon Nova. Writers speak naturally to their screenplay — querying scenes, brainstorming ideas, creating new scenes through narration, and editing existing ones — all through voice conversation.

Amazon Nova powers both layers of the AI: Nova Sonic handles real-time speech-to-speech interaction via Bedrock's bidirectional streaming, while Nova Lite serves as the reasoning engine inside the preprod graph agent. This means the entire AI pipeline — from understanding spoken words to generating structured screenplay content — runs on Amazon Nova models. Nova Sonic's low-latency audio streaming with built-in function calling makes it possible to have natural, multi-turn conversations about complex creative work, while Nova Lite's structured output capabilities handle the precise formatting and editing that screenplay work demands.

Architecture

The system has three layers:

React frontend with a custom screenplay editor and voice assistant drawer
FastAPI backend handling auth, database operations, WebSocket voice streaming, and REST APIs
Preprod Graph (LangGraph agent) handling all AI reasoning with Nova Lite, called by the backend over HTTP

The voice flow:

React connects to backend via WebSocket
The user's microphone audio streams to the backend as PCM at 16 kHz
Backend creates and manages a Nova Sonic session through Bedrock's bidirectional streaming API (amazon.nova-sonic-v1:0)
Audio flows in both directions: user speech goes in, and Nova Sonic's voice responses come back at 24 kHz
When Nova Sonic decides to call a function (e.g. fetch a scene, create one), the backend intercepts the tool use event
Backend routes the call to the preprod graph (which uses Nova Lite for reasoning)
Tool result is sent back to Nova Sonic to continue the conversation
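
The interception step in the flow above can be sketched as a small router that inspects each event coming back from the voice model. This is a minimal sketch, not the project's code: the event keys (audioOutput, toolUse, toolUseId, toolName, content) are assumptions about the stream's JSON shape, and route_output_event is a hypothetical helper name.

```python
import base64
import json
from typing import Any, Dict, Tuple


def route_output_event(raw: str) -> Tuple[str, Any]:
    """Classify one JSON event coming back from the voice model."""
    event: Dict[str, Any] = json.loads(raw).get("event", {})
    if "audioOutput" in event:
        # Assumption: audio arrives base64-encoded; decode it to raw PCM bytes
        # so it can be forwarded to the browser over the WebSocket.
        pcm = base64.b64decode(event["audioOutput"]["content"])
        return ("audio", pcm)
    if "toolUse" in event:
        tool = event["toolUse"]
        # Assumption: tool arguments arrive as a JSON string.
        args = json.loads(tool.get("content") or "{}")
        return ("tool", {"id": tool["toolUseId"], "name": tool["toolName"], "args": args})
    # Anything else (text transcripts, lifecycle events) is ignored here.
    return ("ignore", None)
```

In this sketch an "audio" result would be forwarded to the client, while a "tool" result would be routed to the preprod graph before its result is written back to the stream.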

This is the high-level architecture of the entire project.

Amazon Nova Models

Amazon Nova models are used in two distinct roles:

Nova Sonic (Voice Layer)

Real-time bidirectional audio streaming via Bedrock (amazon.nova-sonic-v1:0)
Speech recognition, natural language understanding, and voice synthesis in a single model
Supports tool use mid-conversation — the model decides when to trigger functions based on conversational context
Session initialization with system prompt, tool definitions, and audio configuration
Event-based protocol: sessionStart, promptStart, contentStart, audioInput, toolResult, etc.
7 registered tools: get_scene_num, get_scene_by_content, brainstorm_ideas, get_project_info, create_scene, update_project_info, update_scene
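
One way to keep the seven tools manageable on the backend is a single dispatch function that validates the tool name before forwarding the call. The sketch below assumes a hypothetical call_graph(operation, args) callable standing in for the HTTP call to the preprod graph; the result envelope ("status", "result", "message") is also an assumption, not the project's actual schema.

```python
from typing import Any, Callable, Dict

# The seven tool names registered with Nova Sonic, as listed above.
TOOL_NAMES = (
    "get_scene_num",
    "get_scene_by_content",
    "brainstorm_ideas",
    "get_project_info",
    "create_scene",
    "update_project_info",
    "update_scene",
)

# Hypothetical signature for the function that calls the preprod graph.
GraphCall = Callable[[str, Dict[str, Any]], Any]


def dispatch_tool(name: str, args: Dict[str, Any], call_graph: GraphCall) -> Dict[str, Any]:
    """Forward a known tool call to the graph and wrap the outcome."""
    if name not in TOOL_NAMES:
        return {"status": "error", "message": f"unknown tool: {name}"}
    try:
        return {"status": "ok", "result": call_graph(name, args)}
    except Exception as exc:
        # Return the failure as data so the voice session can keep going.
        return {"status": "error", "message": str(exc)}
```

Returning errors as data rather than raising lets the tool result still be sent back to the voice model, which can then explain the failure to the writer.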

Nova Lite (Reasoning Layer)

Used via Bedrock for all structured reasoning tasks in the preprod graph
Intent classification (routing user queries to the right operation)
Keyword extraction for content-based scene search
Scene candidate confirmation (picking the best match from search results)
Narration-to-screenplay parsing (converting spoken descriptions into formatted scene elements)
Surgical scene editing (find-and-replace on specific elements while preserving structure)
Project info field extraction (determining which fields the user wants to update)

The backend manages the Nova Sonic session lifecycle: opening the bidirectional stream, sending initialization events (session config, system prompt, tool definitions), forwarding audio chunks as base64-encoded events, intercepting tool use events, routing them to the preprod graph, and sending tool results back through the stream.
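
Forwarding audio as base64-encoded events could look roughly like the following. It is a sketch under stated assumptions: 16-bit mono PCM (the bit depth is not given above), an audioInput envelope with promptName, contentName, and content keys, and hypothetical helper names chunk_bytes and audio_input_event.

```python
import base64
import json

SAMPLE_RATE_HZ = 16_000   # input rate stated in the voice flow
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono PCM


def chunk_bytes(duration_ms: int) -> int:
    """Number of raw PCM bytes in a chunk of the given duration."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * duration_ms // 1000


def audio_input_event(pcm: bytes, prompt_name: str, content_name: str) -> str:
    """Wrap a raw PCM chunk in a JSON event with a base64 payload."""
    return json.dumps({
        "event": {
            "audioInput": {
                "promptName": prompt_name,
                "contentName": content_name,
                "content": base64.b64encode(pcm).decode("ascii"),
            }
        }
    })
```

Under these assumptions a 100 ms chunk is 3,200 bytes of PCM, which grows by about a third once base64-encoded.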

The following is the high-level Nova Sonic voice flow. Visit Here for a detailed version.

What it does

The voice assistant supports three categories of operations through natural speech:

1. Get Data

Retrieve and summarize information from the screenplay project.

Get Scene by Position

User says "show me the last scene" or "what's in the first scene"
Nova Sonic fires get_scene_num tool with the position
Backend calls preprod graph → fetches scene by ordinal from MongoDB
Nova Lite generates a conversational summary of the scene
Frontend scrolls to and highlights the scene in the editor

Get Scene by Content/Dialogue

User describes a scene: "the scene where they argue about the plan"
Nova Sonic fires get_scene_by_content tool
Nova Lite extracts 1-5 search keywords from the description
Backend string-matches keywords across all scenes in the screenplay
Nova Lite confirms the best candidate from results
Scene is summarized and highlighted in the editor
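
The string-matching step between keyword extraction and candidate confirmation can be sketched as a simple hit count per scene. This is an illustration only: the scene shape ("id", "text"), the rank_scenes name, and the limit of three candidates are assumptions, and the real backend may match differently.

```python
from typing import Dict, List


def rank_scenes(scenes: List[Dict[str, str]], keywords: List[str], limit: int = 3) -> List[Dict[str, str]]:
    """Return scenes that contain at least one keyword, best matches first."""
    scored = []
    for scene in scenes:
        text = scene["text"].lower()
        # Count how many distinct keywords appear as substrings of the scene text.
        hits = sum(1 for kw in keywords if kw.lower() in text)
        if hits:
            scored.append((hits, scene))
    # The sort is stable, so scenes with equal hits keep their script order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [scene for _, scene in scored[:limit]]
```

The short candidate list is what would then be handed to Nova Lite to confirm the best match.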

Get Project Information

User asks "what is this project about" or "what screenplays do we have"
Nova Sonic fires get_project_info tool
Graph fetches project metadata (title, description, screenplays) from backend
Formats and returns it conversationally

Brainstorm Ideas

User says "what should happen next" or "give me ideas"
Nova Sonic fires brainstorm_ideas tool
Graph pulls existing scene summaries from the screenplay
Checks beatsheet status to see which story beats are covered
Nova Lite suggests next directions based on uncovered beats
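
The beatsheet check that feeds brainstorming can be sketched as a filter over the beats followed by prompt assembly. The beat names in the usage note, the covered mapping, and both function names are hypothetical; the actual prompt sent to Nova Lite is not shown in this write-up.

```python
from typing import Dict, List


def uncovered_beats(beatsheet: List[str], covered: Dict[str, bool]) -> List[str]:
    """Beats, in beatsheet order, that no scene covers yet."""
    return [beat for beat in beatsheet if not covered.get(beat, False)]


def build_brainstorm_prompt(summaries: List[str], beats: List[str]) -> str:
    """Assemble a plain-text prompt from scene summaries and open beats."""
    lines = ["Existing scenes:"]
    lines += [f"{i}. {summary}" for i, summary in enumerate(summaries, start=1)]
    lines.append("Uncovered beats: " + (", ".join(beats) if beats else "none"))
    lines.append("Suggest what could happen next.")
    return "\n".join(lines)
```

For example, with a beatsheet of "Opening Image", "Catalyst", and "Midpoint" where only the first is covered, the prompt would list "Catalyst, Midpoint" as the beats to aim for.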

2. Create Data

Add new content to the screenplay through voice narration.

Add Scene to Screenplay

User narrates a scene naturally — describing setting, action, characters, dialogue
User signals completion with a key phrase ("end scene", "that's it")
Nova Sonic fires create_scene tool with the full narration
Nova Lite parses narration into proper screenplay format (scene heading, action, character, dialogue, parentheticals, transitions)
A preview of the formatted scene appears in the voice drawer
User approves or rejects via buttons (SVG icons)
On approval → scene saves to MongoDB → editor reloads with the new scene highlighted
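
Before a parsed scene is shown for approval, the model's output could be checked against basic screenplay structure. The sketch below is an assumption about how such a guard might look: the element type names mirror the list above, but the "type" key, the validate_scene name, and the two rules enforced are illustrative rather than taken from the project.

```python
from typing import Dict, List

# Element kinds named in the parsing step above (identifier spelling is assumed).
ELEMENT_TYPES = {
    "scene_heading", "action", "character",
    "dialogue", "parenthetical", "transition",
}


def validate_scene(elements: List[Dict[str, str]]) -> List[str]:
    """Return a list of structural problems; an empty list means the scene passes."""
    errors: List[str] = []
    if not elements or elements[0].get("type") != "scene_heading":
        errors.append("scene must start with a scene heading")
    previous = None
    for index, element in enumerate(elements):
        kind = element.get("type")
        if kind not in ELEMENT_TYPES:
            errors.append(f"element {index}: unknown type {kind!r}")
        elif kind == "dialogue" and previous not in ("character", "parenthetical"):
            # Dialogue should directly follow a character cue or a parenthetical.
            errors.append(f"element {index}: dialogue without a speaker")
        previous = kind
    return errors
```

A non-empty error list could trigger a retry of the parsing step instead of showing the writer a malformed preview.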

3. Update Data

Modify existing screenplay content through voice instructions.

Update Scene

User says "update the first scene, change juggling to photography"
Nova Sonic fires update_scene tool with the full instruction
Nova Lite identifies which scene (by position or content keywords)
Graph fetches the scene using shared search utilities
Nova Lite applies surgical find-and-replace edits — only changing what was asked, preserving all IDs, structure, and unchanged text
Updated scene preview shown in the voice drawer for approval
On approval → scene replaces the original in MongoDB → editor reloads and highlights it
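
In the project Nova Lite performs the edit itself; the sketch below shows a deterministic equivalent of the "juggling to photography" example plus a guard that could be run on any edited scene to confirm that IDs and element types survived. The scene shape ("id", "elements", "type", "text") and both function names are assumptions made for illustration.

```python
import copy
from typing import Any, Dict, Tuple

Scene = Dict[str, Any]


def replace_in_scene(scene: Scene, find: str, replace: str) -> Tuple[Scene, int]:
    """Return an edited copy of the scene and the number of elements changed."""
    updated = copy.deepcopy(scene)  # never mutate the stored original
    changed = 0
    for element in updated["elements"]:
        if find in element["text"]:
            element["text"] = element["text"].replace(find, replace)
            changed += 1
    return updated, changed


def structure_preserved(before: Scene, after: Scene) -> bool:
    """True when scene ID, element IDs, element types, and order are unchanged."""
    def shape(scene: Scene):
        return [(e["id"], e["type"]) for e in scene["elements"]]
    return before["id"] == after["id"] and shape(before) == shape(after)
```

Running structure_preserved on the model's output before showing the preview is one way to catch an edit that accidentally dropped or reordered elements.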

Update Project Information

User says "change the project name to..." or "update the description"
Nova Sonic fires update_project_info tool
Graph fetches current project info from backend
Nova Lite extracts which fields to change from the instruction
Backend applies the update via PATCH endpoint
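
Since only the fields named in the instruction should change, the PATCH body can be limited to whitelisted fields whose values actually differ. This is a minimal sketch: the field names in EDITABLE_FIELDS and the build_patch helper are hypothetical, and the real endpoint's schema is not described here.

```python
from typing import Any, Dict

# Assumption: only these project fields may be changed by voice.
EDITABLE_FIELDS = ("title", "description")


def build_patch(current: Dict[str, Any], requested: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only editable fields whose requested value differs from the current one."""
    return {
        field: requested[field]
        for field in EDITABLE_FIELDS
        if field in requested and requested[field] != current.get(field)
    }
```

An empty result means there is nothing to send, so the backend call can be skipped and the assistant can simply confirm that no change was needed.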

Tech Stack

React + TypeScript — Frontend with custom screenplay editor (ProseMirror-based) and voice assistant drawer
FastAPI (Python) — Backend REST API + WebSocket server for voice streaming
Amazon Nova Sonic — Real-time speech-to-speech voice conversation with tool use (amazon.nova-sonic-v1:0 via Bedrock)
Amazon Nova Lite — Structured reasoning, classification, and content generation (via Bedrock)
Amazon Bedrock — Managed inference for both Nova models
LangGraph — Stateful agent graph with classify-and-route pattern
MongoDB — Screenplay document storage (scenes, elements, revisions)
TiDB — Relational data (users, projects, screenplay metadata)

This is NOT The END!

The platform is designed for the full pre-production pipeline. Upcoming additions:

Outlines — Structured story outlines that feed into scene generation
Storyboard — Visual scene planning with AI-generated shot descriptions
Beatboard — Visual beat mapping tied to the beatsheet for story structure tracking

Built With

amazonbedrock
amazonnovalite
amazonnovasonic
fastapi
langgraph
mongodb
react
tidb

Updates

Nishtha Patel started this project — Mar 16, 2026 07:29 PM EDT
