FlowPilot

landing page

Inspiration

Every small business runs critical workflows that live in people's heads. A founder explains lead handling in a voice note. An ops manager records their screen walking through a daily task. A sales assistant saves screenshots and says, "this is what I do every morning." These processes are real, repeatable, and ripe for automation — but the path from tribal knowledge to a working automation is too expensive and too technical for most small teams.

We built FlowPilot because we believe the starting point for automation should be the way people already communicate: showing, explaining, and answering questions — not filling out rigid specification forms.

What It Does

FlowPilot is a multi-agent AI system that converts unstructured workflow explanations into production-ready n8n automation drafts. Users upload a screen recording, a set of screenshots, or simply describe their workflow using live voice — and FlowPilot handles the rest.

The system orchestrates five specialized AI agents in an explicit pipeline:

Observer — Analyzes uploaded media (video, audio, images) to extract workflow structure, identify likely triggers, apps involved, and unknown details
Clarifier — Conducts targeted follow-up questions via text chat or real-time voice conversation to fill in gaps
Architect — Converts confirmed observations into a structured workflow design, selecting from a curated catalog of 17 supported n8n node types
Builder — Generates n8n-compatible workflow JSON with proper node connections, positions, and parameters
Reviewer — Validates the generated workflow for importability, surfaces missing credentials, risky patterns, and provides a quality assessment

The output is not just generated JSON — it's a reviewed workflow draft with explicit placeholders, a validation report, a visual step graph, and actionable recommendations.

How We Built It

Architecture

FlowPilot is a full-stack application with:

Backend: Python FastAPI server with a multi-agent orchestration layer
Frontend: Single-page app with real-time agent pipeline visualization
Voice Bridge: Node.js WebSocket sidecar for bidirectional audio streaming with Nova 2 Sonic
Storage: File-based job persistence (JSON per job) for easy judge inspection

Amazon Nova Integration

FlowPilot makes 5-6 Amazon Nova API calls per workflow generation job, each serving a distinct purpose:

Agent	Nova Model	Purpose
Observer	Nova 2 Lite	Extract workflow structure from video frames, audio transcript, or screenshot sequences
Clarifier	Nova 2 Lite + Nova 2 Sonic	Generate clarification questions; conduct live voice conversations
Architect	Nova 2 Lite	Design workflow schema with node selection and connection logic
Builder	Nova 2 Lite	Generate n8n-compatible JSON with proper parameters and positions
Reviewer	Nova 2 Lite	Validate workflow connectivity, credential completeness, and runnability

Nova 2 Lite powers all reasoning-heavy operations using Bedrock's Converse API with tool use for structured JSON extraction. Every Nova call uses explicit tool schemas to enforce output structure rather than relying on free-form text parsing.

Nova 2 Sonic handles two critical speech tasks:

Upload transcription: Windowed chunking (8-minute segments with 3-second overlap) for longer recordings
Live voice sessions: Full bidirectional streaming via WebSocket for real-time conversational clarification

Key Technical Decisions

Explicit agent pipeline — Each agent is a separate class with defined inputs/outputs, not a hidden prompt chain. Agent execution timing and status are tracked and surfaced in the API and UI.
Tool use for structure — Every Nova call uses Bedrock's tool_use feature to enforce JSON schema compliance, eliminating fragile text parsing.
Curated n8n catalog — 17 supported node types prevent hallucinated integrations. The system chooses from known, tested nodes rather than inventing fictional ones.
Graceful MCP degradation — An optional n8n MCP server provides integration context, but the system generates complete workflows even when the MCP server is unavailable.
Streaming responses — Chat uses SSE streaming for real-time typing. Analysis and generation run in background threads with status polling.

Challenges We Ran Into

Structured extraction from video — Getting reliable workflow structure from screen recordings required chunked map-reduce analysis and explicit observation schemas rather than single-shot prompts.
Voice session reliability — Bidirectional streaming with Nova 2 Sonic required careful WebSocket lifecycle management and a dedicated Node.js bridge sidecar to handle the binary audio protocol.
n8n JSON compliance — Generated workflows need exact node type names, valid connection structures, and proper coordinate positioning to import cleanly. This required a curated catalog and multi-phase generation (schema → design → JSON → validation).
Agent visibility — Making the multi-agent pipeline visible to users and judges without adding overhead required careful state tracking at each orchestration step.

Accomplishments We're Proud Of

End-to-end voice-to-workflow: A user can describe their entire workflow by speaking, and receive a validated automation draft without ever typing a specification.
Visible agent reasoning: The UI shows which agent is active, what it found, and what questions remain — making the AI system transparent rather than a black box.
Validation-first output: Every generated workflow comes with a quality report that tells users exactly what's confirmed, what's inferred, and what still needs configuration before deployment.
Zero-dependency on MCP: The optional MCP integration adds context when available but never blocks workflow generation.

What We Learned

Multimodal intake dramatically lowers the barrier — Users describe workflows 3-5x faster through recording or voice than through forms.
Structured tool use > free-form generation — Enforcing JSON schemas through Nova's tool_use feature eliminated an entire class of output parsing bugs.
Validation is the differentiator — Generating JSON is table stakes; catching what's missing before deployment is what makes the output trustworthy.
Agent visibility builds trust — Showing users which agent is working and why it's asking specific questions makes the system feel collaborative rather than opaque.

What's Next for FlowPilot

Multi-platform export — Extend beyond n8n to support Make, Zapier
Nova multimodal embeddings — Semantic similarity matching to suggest "similar workflows" from a template library
Team collaboration — Shared workflow libraries with version history and review workflows
Deployment assistant — Guide users through credential setup and one-click deployment to their automation platform

Built With

amazon-web-services
node.js
nova-2-lite
nova-2-sonic
python

Updates

Odebunmi Wasiu started this project — Mar 15, 2026 01:00 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.