Pitch

APS (AI Pre-Production Studio) is a collaborative screenplay writing platform with a real-time voice assistant powered by Amazon Nova. Writers speak naturally to their screenplay — querying scenes, brainstorming ideas, creating new scenes through narration, and editing existing ones — all through voice conversation.

Amazon Nova powers both layers of the AI: Nova Sonic handles real-time speech-to-speech interaction via Bedrock's bidirectional streaming, while Nova Lite serves as the reasoning engine inside the preprod graph agent. This means the entire AI pipeline — from understanding spoken words to generating structured screenplay content — runs on Amazon Nova models. Nova Sonic's low-latency audio streaming with built-in function calling makes it possible to have natural, multi-turn conversations about complex creative work, while Nova Lite's structured output capabilities handle the precise formatting and editing that screenplay work demands.

Architecture

The system has three layers:

  • React frontend with a custom screenplay editor and voice assistant drawer
  • FastAPI backend handling auth, database operations, WebSocket voice streaming, and REST APIs
  • Preprod Graph (LangGraph agent) handling all AI reasoning with Nova Lite, called by the backend over HTTP

The voice flow:

  • React connects to backend via WebSocket
  • User's microphone audio streams as PCM at 16kHz
  • Backend creates and manages a Nova Sonic session through Bedrock's bidirectional streaming API (amazon.nova-sonic-v1:0)
  • Audio flows both directions — user speech goes in, Nova Sonic's voice responses come back at 24kHz
  • When Nova Sonic decides to call a function (e.g. fetch a scene, create one), the backend intercepts the tool use event
  • Backend routes the call to the preprod graph (which uses Nova Lite for reasoning)
  • Tool result is sent back to Nova Sonic to continue the conversation
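
The tool-routing step in this flow can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `call_preprod_graph` is a hypothetical stand-in for the HTTP call to the preprod graph, and the function and variable names are assumptions. Only the seven tool names come from the project.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict

# Hypothetical handler type: parsed tool arguments in, JSON-serializable result out.
ToolHandler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]

TOOL_NAMES = [
    "get_scene_num", "get_scene_by_content", "brainstorm_ideas",
    "get_project_info", "create_scene", "update_project_info", "update_scene",
]


async def call_preprod_graph(tool_name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the HTTP request to the preprod graph (assumed interface)."""
    return {"tool": tool_name, "args": args, "status": "ok"}


def make_handler(tool_name: str) -> ToolHandler:
    async def handler(args: Dict[str, Any]) -> Dict[str, Any]:
        return await call_preprod_graph(tool_name, args)
    return handler


HANDLERS: Dict[str, ToolHandler] = {name: make_handler(name) for name in TOOL_NAMES}


async def handle_tool_use(tool_name: str, raw_args: str) -> str:
    """Route one intercepted tool use event and return the result as a JSON string."""
    handler = HANDLERS.get(tool_name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    try:
        args = json.loads(raw_args) if raw_args else {}
    except json.JSONDecodeError:
        return json.dumps({"error": "invalid tool arguments"})
    return json.dumps(await handler(args))
```

Returning an error payload instead of raising keeps the voice session alive, so Nova Sonic can tell the user something went wrong rather than going silent.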

This is the high-level architecture of the entire project.

Amazon Nova

Amazon Nova models are used in two distinct roles:

Nova Sonic (Voice Layer)

  • Real-time bidirectional audio streaming via Bedrock (amazon.nova-sonic-v1:0)
  • Speech recognition, natural language understanding, and voice synthesis in a single model
  • Supports tool use mid-conversation — the model decides when to trigger functions based on conversational context
  • Session initialization with system prompt, tool definitions, and audio configuration
  • Events-based protocol: sessionStart, promptStart, contentStart, audioInput, toolResult, etc.
  • 7 registered tools: get_scene_num, get_scene_by_content, brainstorm_ideas, get_project_info, create_scene, update_project_info, update_scene
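
As an illustration of what registering tools might look like, here is a small sketch that builds tool definitions for two of the seven tools. The `toolSpec` / `inputSchema` layout, with the JSON schema passed as a string, follows the Bedrock tool-spec shape as commonly used with Nova Sonic, but it should be checked against the current Bedrock documentation. The descriptions and the parameter names (`position`, `narration`) are assumptions for illustration only.

```python
import json


def tool_spec(name: str, description: str, properties: dict, required: list) -> dict:
    """Build one tool definition; the input schema is serialized to a JSON string."""
    schema = {"type": "object", "properties": properties, "required": required}
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            "inputSchema": {"json": json.dumps(schema)},
        }
    }


# Parameter names and descriptions below are illustrative assumptions.
TOOLS = [
    tool_spec(
        "get_scene_num",
        "Fetch a scene by its position in the screenplay.",
        {"position": {"type": "string", "description": "e.g. 'first', 'last', '3'"}},
        ["position"],
    ),
    tool_spec(
        "create_scene",
        "Create a new scene from the user's spoken narration.",
        {"narration": {"type": "string", "description": "Full narration text"}},
        ["narration"],
    ),
]

# Assumed to be sent as part of the session's prompt configuration.
tool_configuration = {"tools": TOOLS}
```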

Nova Lite (Reasoning Layer)

  • Used via Bedrock for all structured reasoning tasks in the preprod graph
  • Intent classification (routing user queries to the right operation)
  • Keyword extraction for content-based scene search
  • Scene candidate confirmation (picking the best match from search results)
  • Narration-to-screenplay parsing (converting spoken descriptions into formatted scene elements)
  • Surgical scene editing (find-and-replace on specific elements while preserving structure)
  • Project info field extraction (determining which fields the user wants to update)
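
The classify-then-route idea behind these tasks can be sketched in plain Python. This is not LangGraph code and not the project's implementation: the intent labels mirror the three operation categories described in this write-up, the node functions are placeholders, and `keyword_classifier` is a crude stand-in for the Nova Lite classification call.

```python
from typing import Callable, Dict

State = Dict[str, str]


def get_data_node(state: State) -> State:
    return {**state, "handled_by": "get_data"}


def create_data_node(state: State) -> State:
    return {**state, "handled_by": "create_data"}


def update_data_node(state: State) -> State:
    return {**state, "handled_by": "update_data"}


# Intent label -> node; labels are assumptions based on the three operation categories.
ROUTES: Dict[str, Callable[[State], State]] = {
    "get_data": get_data_node,
    "create_data": create_data_node,
    "update_data": update_data_node,
}


def keyword_classifier(query: str) -> str:
    """Stand-in for the model call: a naive keyword check, for illustration only."""
    text = query.lower()
    if "update" in text or "change" in text:
        return "update_data"
    if "add" in text or "create" in text:
        return "create_data"
    return "get_data"


def run_graph(query: str, classify: Callable[[str], str]) -> State:
    """Classify the query, then hand the state to the matching node."""
    state: State = {"query": query, "intent": classify(query)}
    node = ROUTES.get(state["intent"])
    if node is None:
        return {**state, "handled_by": "fallback"}
    return node(state)
```

Passing the classifier in as a function keeps the routing table testable without a model call.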

The backend manages the Nova Sonic session lifecycle: opening the bidirectional stream, sending initialization events (session config, system prompt, tool definitions), forwarding audio chunks as base64-encoded events, intercepting tool use events, routing them to the preprod graph, and sending tool results back through the stream.
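
A rough sketch of the audio-forwarding step: wrapping a raw PCM chunk in a base64-encoded event. The envelope (`event` / `audioInput` / `promptName` / `contentName` / `content`) reflects the Nova Sonic event protocol as I understand it and should be verified against the Bedrock documentation. The 16-bit mono sample format is an assumption, since only the 16kHz rate is stated here.

```python
import base64
import json

SAMPLE_RATE_HZ = 16_000   # input rate stated in the voice flow
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono PCM


def chunk_duration_ms(chunk: bytes) -> float:
    """How many milliseconds of audio a PCM chunk holds at the assumed format."""
    return len(chunk) * 1000 / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def audio_input_event(prompt_name: str, content_name: str, chunk: bytes) -> str:
    """Wrap one PCM chunk as a JSON event with base64-encoded audio."""
    return json.dumps({
        "event": {
            "audioInput": {
                "promptName": prompt_name,
                "contentName": content_name,
                "content": base64.b64encode(chunk).decode("ascii"),
            }
        }
    })
```

Under these assumptions, a 1024-byte chunk carries 32 ms of audio, which gives a sense of how often the backend forwards events during speech.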

The following is the high-level Nova flow. Visit Here for a detailed version.

What it does

The voice assistant supports three categories of operations through natural speech:

1. Get Data

Retrieve and summarize information from the screenplay project.

Get Scene by Position

  • User says "show me the last scene" or "what's in the first scene"
  • Nova Sonic fires get_scene_num tool with the position
  • Backend calls preprod graph → fetches scene by ordinal from MongoDB
  • Nova Lite generates a conversational summary of the scene
  • Frontend scrolls to and highlights the scene in the editor
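
Turning a spoken position into a scene index could look something like this. It is a hedged sketch: the accepted vocabulary and the function name are assumptions, and the real tool may receive the position in a different form.

```python
from typing import Optional, Union

# Assumed vocabulary; the real tool may accept other forms.
ORDINAL_WORDS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}


def resolve_position(position: Union[str, int], scene_count: int) -> Optional[int]:
    """Return a zero-based scene index, or None if the position cannot be resolved."""
    if scene_count <= 0:
        return None
    if isinstance(position, str):
        word = position.strip().lower()
        if word == "last":
            return scene_count - 1
        if word in ORDINAL_WORDS:
            number = ORDINAL_WORDS[word]
        elif word.isdigit():
            number = int(word)
        else:
            return None
    else:
        number = position
    # Spoken positions are 1-based; storage indexes are 0-based.
    if 1 <= number <= scene_count:
        return number - 1
    return None
```

Returning `None` for anything out of range lets the caller answer "there is no such scene" instead of fetching the wrong one.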

Get Scene by Content/Dialogue

  • User describes a scene: "the scene where they argue about the plan"
  • Nova Sonic fires get_scene_by_content tool
  • Nova Lite extracts 1-5 search keywords from the description
  • Backend string-matches keywords across all scenes in the screenplay
  • Nova Lite confirms the best candidate from results
  • Scene is summarized and highlighted in the editor
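
The string-matching step between keyword extraction and candidate confirmation might be sketched as below. The scene shape (a dict with a `text` field) and the scoring rule (number of distinct keywords found, case-insensitive) are assumptions. The actual matching logic may differ.

```python
from typing import Dict, List, Tuple


def rank_scenes(scenes: List[Dict[str, str]], keywords: List[str]) -> List[Tuple[int, int]]:
    """Return (scene_index, keyword_hits) pairs, best match first; zero-hit scenes are dropped."""
    needles = [k.strip().lower() for k in keywords if k.strip()]
    ranked = []
    for index, scene in enumerate(scenes):
        haystack = scene["text"].lower()
        hits = sum(1 for needle in needles if needle in haystack)
        if hits:
            ranked.append((index, hits))
    # More hits first; ties keep screenplay order.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


scenes = [
    {"text": "They argue about the plan in the garage."},
    {"text": "A quiet breakfast."},
    {"text": "The plan falls apart."},
]
candidates = rank_scenes(scenes, ["argue", "plan"])
```

The top few candidates would then be handed to Nova Lite, which picks the one that best fits the user's description.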

Get Project Information

  • User asks "what is this project about" or "what screenplays do we have"
  • Nova Sonic fires get_project_info tool
  • Graph fetches project metadata (title, description, screenplays) from backend
  • Formats and returns it conversationally

Brainstorm Ideas

  • User says "what should happen next" or "give me ideas"
  • Nova Sonic fires brainstorm_ideas tool
  • Graph pulls existing scene summaries from the screenplay
  • Checks beatsheet status to see which story beats are covered
  • Nova Lite suggests next directions based on uncovered beats
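
One way the beatsheet check could feed the brainstorming prompt is sketched here. The beatsheet shape (a list of beats, each with a `name` and the `scene_ids` that cover it) is an assumption, as are the beat names in the usage example.

```python
from typing import Dict, List


def uncovered_beats(beatsheet: List[Dict]) -> List[str]:
    """Names of beats that no scene covers yet, in story order."""
    return [beat["name"] for beat in beatsheet if not beat.get("scene_ids")]


def brainstorm_context(scene_summaries: List[str], beatsheet: List[Dict], max_beats: int = 3) -> str:
    """Assemble the text context handed to the model for idea generation."""
    pending = uncovered_beats(beatsheet)[:max_beats]
    lines = ["Scenes so far:"]
    lines += [f"{i}. {summary}" for i, summary in enumerate(scene_summaries, start=1)]
    lines.append("Uncovered beats: " + (", ".join(pending) if pending else "none"))
    return "\n".join(lines)


# Illustrative data only.
beatsheet = [
    {"name": "Opening Image", "scene_ids": ["s1"]},
    {"name": "Catalyst", "scene_ids": []},
    {"name": "Midpoint"},
]
context = brainstorm_context(["Maya arrives in town."], beatsheet)
```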

2. Create Data

Add new content to the screenplay through voice narration.

Add Scene to Screenplay

  • User narrates a scene naturally — describing setting, action, characters, dialogue
  • User signals completion with a key phrase ("end scene", "that's it")
  • Nova Sonic fires create_scene tool with the full narration
  • Nova Lite parses narration into proper screenplay format (scene heading, action, character, dialogue, parentheticals, transitions)
  • A preview of the formatted scene appears in the voice drawer
  • User approves or rejects via buttons (SVG icons)
  • On approval → scene saves to MongoDB → editor reloads with the new scene highlighted
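
Before a parsed scene reaches the preview, the model output could be validated along these lines. This is a sketch under assumptions: the element types are taken from the list above, while the field names (`id`, `type`, `text`), the ID scheme, and the rule that a scene starts with a heading are illustrative choices, not confirmed details.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List

ELEMENT_TYPES = {
    "scene_heading", "action", "character", "dialogue", "parenthetical", "transition",
}


@dataclass(frozen=True)
class SceneElement:
    id: str
    type: str
    text: str


def validate_elements(raw: List[Dict]) -> List[SceneElement]:
    """Check parsed model output and convert it to typed elements, or raise ValueError."""
    elements = []
    for position, item in enumerate(raw):
        kind = item.get("type")
        text = (item.get("text") or "").strip()
        if kind not in ELEMENT_TYPES:
            raise ValueError(f"element {position}: unknown type {kind!r}")
        if not text:
            raise ValueError(f"element {position}: empty text")
        # Assumed ID scheme: keep an existing id, otherwise generate one.
        elements.append(SceneElement(id=item.get("id") or uuid.uuid4().hex, type=kind, text=text))
    if not elements or elements[0].type != "scene_heading":
        raise ValueError("a scene must start with a scene heading")
    return elements
```

Rejecting malformed output before the preview means the user only ever approves scenes the editor can actually render.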

3. Update Data

Modify existing screenplay content through voice instructions.

Update Scene

  • User says "update the first scene, change juggling to photography"
  • Nova Sonic fires update_scene tool with the full instruction
  • Nova Lite identifies which scene (by position or content keywords)
  • Graph fetches the scene using shared search utilities
  • Nova Lite applies surgical find-and-replace edits — only changing what was asked, preserving all IDs, structure, and unchanged text
  • Updated scene preview shown in the voice drawer for approval
  • On approval → scene replaces the original in MongoDB → editor reloads and highlights it
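
The "change only what was asked" guarantee can be made checkable in code. The sketch below assumes the model returns edits as `{id, find, replace}` records and that elements are dicts with `id`, `type`, and `text`. Both shapes are assumptions for illustration.

```python
import copy
from typing import Dict, List


def apply_edits(elements: List[Dict], edits: List[Dict]) -> List[Dict]:
    """Apply find-and-replace edits to a copy of the scene, leaving the original untouched."""
    updated = copy.deepcopy(elements)
    by_id = {element["id"]: element for element in updated}
    for edit in edits:
        target = by_id.get(edit["id"])
        if target is None:
            raise KeyError(f"no element with id {edit['id']!r}")
        if edit["find"] not in target["text"]:
            raise ValueError(f"{edit['find']!r} not found in element {edit['id']!r}")
        target["text"] = target["text"].replace(edit["find"], edit["replace"])
    return updated


def structure_preserved(before: List[Dict], after: List[Dict]) -> bool:
    """True if IDs, types, and element order are identical in both versions."""
    return [(e["id"], e["type"]) for e in before] == [(e["id"], e["type"]) for e in after]


scene = [
    {"id": "e1", "type": "scene_heading", "text": "INT. PARK - DAY"},
    {"id": "e2", "type": "action", "text": "Maya practices juggling by the fountain."},
]
edited = apply_edits(scene, [{"id": "e2", "find": "juggling", "replace": "photography"}])
```

Working on a copy keeps the original scene intact until the user approves the preview.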

Update Project Information

  • User says "change the project name to..." or "update the description"
  • Nova Sonic fires update_project_info tool
  • Graph fetches current project info from backend
  • Nova Lite extracts which fields to change from the instruction
  • Backend applies the update via PATCH endpoint
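
Building the PATCH body from the extracted fields might look like the following. The allowed field set and the function name are assumptions. The idea is simply to send only fields that are editable and actually changed.

```python
from typing import Any, Dict

# Assumption: only these project fields are editable by voice.
ALLOWED_FIELDS = {"title", "description"}


def build_patch(current: Dict[str, Any], extracted: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allowed, non-empty fields whose value differs from the current one."""
    patch: Dict[str, Any] = {}
    for field, value in extracted.items():
        if field not in ALLOWED_FIELDS:
            continue
        if isinstance(value, str):
            value = value.strip()
        if value in (None, "") or current.get(field) == value:
            continue
        patch[field] = value
    return patch


current = {"title": "Untitled", "description": "A heist film."}
extracted = {"title": " Night Shift ", "description": "A heist film.", "owner_id": "u9"}
patch = build_patch(current, extracted)
```

An empty patch is a useful signal that the request should not be sent at all.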

The Tech Stack

  • React + TypeScript — Frontend with custom screenplay editor (ProseMirror-based) and voice assistant drawer
  • FastAPI (Python) — Backend REST API + WebSocket server for voice streaming
  • Amazon Nova Sonic — Real-time speech-to-speech voice conversation with tool use (amazon.nova-sonic-v1:0 via Bedrock)
  • Amazon Nova Lite — Structured reasoning, classification, and content generation (via Bedrock)
  • Amazon Bedrock — Managed inference for both Nova models
  • LangGraph — Stateful agent graph with classify-and-route pattern
  • MongoDB — Screenplay document storage (scenes, elements, revisions)
  • TiDB — Relational data (users, projects, screenplay metadata)

This is NOT The END!

The platform is designed for the full pre-production pipeline. Upcoming additions:

  • Outlines — Structured story outlines that feed into scene generation
  • Storyboard — Visual scene planning with AI-generated shot descriptions
  • Beatboard — Visual beat mapping tied to the beatsheet for story structure tracking

Built With

  • amazonbedrock
  • amazonnovalite
  • amazonnovasonic
  • fastapi
  • langgraph
  • mongodb
  • react
  • tidb