## Inspiration
Every small business runs critical workflows that live in people's heads. A founder explains lead handling in a voice note. An ops manager records their screen walking through a daily task. A sales assistant saves screenshots and says, "this is what I do every morning." These processes are real, repeatable, and ripe for automation — but the path from tribal knowledge to a working automation is too expensive and too technical for most small teams.
We built FlowPilot because we believe the starting point for automation should be the way people already communicate: showing, explaining, and answering questions — not filling out rigid specification forms.
## What It Does
FlowPilot is a multi-agent AI system that converts unstructured workflow explanations into production-ready n8n automation drafts. Users upload a screen recording, a set of screenshots, or simply describe their workflow using live voice — and FlowPilot handles the rest.
The system orchestrates five specialized AI agents in an explicit pipeline:
- Observer — Analyzes uploaded media (video, audio, images) to extract workflow structure and identify likely triggers, the apps involved, and unknown details
- Clarifier — Conducts targeted follow-up questions via text chat or real-time voice conversation to fill in gaps
- Architect — Converts confirmed observations into a structured workflow design, selecting from a curated catalog of 17 supported n8n node types
- Builder — Generates n8n-compatible workflow JSON with proper node connections, positions, and parameters
- Reviewer — Validates the generated workflow for importability, surfaces missing credentials and risky patterns, and provides a quality assessment
The output is not just generated JSON — it's a reviewed workflow draft with explicit placeholders, a validation report, a visual step graph, and actionable recommendations.
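In code, the orchestration above amounts to a linear pass over a shared context that each agent reads from and writes back into. A minimal sketch (agent names are real; the stub implementations and context field names are illustrative):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentResult:
    agent: str    # which agent produced this step
    output: dict  # structured output merged back into the shared context

def run_pipeline(context: dict,
                 agents: list[tuple[str, Callable[[dict], dict]]]) -> list[AgentResult]:
    """Run each agent in order; every agent reads the shared context and
    contributes its structured output to it for downstream agents."""
    trace = []
    for name, agent_fn in agents:
        output = agent_fn(context)  # in FlowPilot, a Nova-backed call
        context.update(output)
        trace.append(AgentResult(name, output))
    return trace

# Stub agents standing in for the Nova-backed implementations
agents = [
    ("observer",  lambda ctx: {"steps": ["open CRM", "copy new lead"]}),
    ("clarifier", lambda ctx: {"open_questions": []}),
    ("architect", lambda ctx: {"node_plan": ["webhook", "httpRequest"]}),
    ("builder",   lambda ctx: {"workflow_json": {"nodes": ctx["node_plan"]}}),
    ("reviewer",  lambda ctx: {"importable": True}),
]

trace = run_pipeline({"media": "recording.mp4"}, agents)
```

Because the pipeline is an explicit list rather than a hidden prompt chain, the per-agent trace is what the UI and API surface as live status.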
## How We Built It
### Architecture
FlowPilot is a full-stack application with:
- Backend: Python FastAPI server with a multi-agent orchestration layer
- Frontend: Single-page app with real-time agent pipeline visualization
- Voice Bridge: Node.js WebSocket sidecar for bidirectional audio streaming with Nova 2 Sonic
- Storage: File-based job persistence (JSON per job) for easy judge inspection
### Amazon Nova Integration
FlowPilot makes 5-6 Amazon Nova API calls per workflow generation job, each serving a distinct purpose:
| Agent | Nova Model | Purpose |
|---|---|---|
| Observer | Nova 2 Lite | Extract workflow structure from video frames, audio transcript, or screenshot sequences |
| Clarifier | Nova 2 Lite + Nova 2 Sonic | Generate clarification questions; conduct live voice conversations |
| Architect | Nova 2 Lite | Design workflow schema with node selection and connection logic |
| Builder | Nova 2 Lite | Generate n8n-compatible JSON with proper parameters and positions |
| Reviewer | Nova 2 Lite | Validate workflow connectivity, credential completeness, and runnability |
Nova 2 Lite powers all reasoning-heavy operations using Bedrock's Converse API with tool use for structured JSON extraction. Every Nova call uses explicit tool schemas to enforce output structure rather than relying on free-form text parsing.
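As a sketch of what that looks like with boto3's `bedrock-runtime` client: the request/response shapes follow Bedrock's Converse API, while the helper names, tool name, and observation schema here are our own illustrations.

```python
def tool_config(name: str, schema: dict) -> dict:
    """Wrap a JSON schema as a Bedrock Converse tool spec, so the model
    must emit arguments matching the schema instead of free-form text."""
    return {"tools": [{"toolSpec": {
        "name": name,
        "inputSchema": {"json": schema},
    }}]}

def extract_tool_input(response: dict, name: str) -> dict:
    """Pull the structured toolUse payload out of a Converse response."""
    for block in response["output"]["message"]["content"]:
        if "toolUse" in block and block["toolUse"]["name"] == name:
            return block["toolUse"]["input"]
    raise ValueError("model did not call the tool")

# Usage sketch (requires boto3 and AWS credentials; model ID illustrative):
# client = boto3.client("bedrock-runtime")
# resp = client.converse(
#     modelId="us.amazon.nova-lite-v1:0",
#     messages=[{"role": "user", "content": [{"text": transcript}]}],
#     toolConfig=tool_config("record_workflow", observation_schema),
# )
# observation = extract_tool_input(resp, "record_workflow")
```

The payoff is that every agent consumes a dict that already conforms to its schema, not a blob of text to parse.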
Nova 2 Sonic handles two critical speech tasks:
- Upload transcription: Windowed chunking (8-minute segments with 3-second overlap) for longer recordings
- Live voice sessions: Full bidirectional streaming via WebSocket for real-time conversational clarification
### Key Technical Decisions
- Explicit agent pipeline — Each agent is a separate class with defined inputs/outputs, not a hidden prompt chain. Agent execution timing and status are tracked and surfaced in the API and UI.
- Tool use for structure — Every Nova call uses Bedrock's tool_use feature to enforce JSON schema compliance, eliminating fragile text parsing.
- Curated n8n catalog — 17 supported node types prevent hallucinated integrations. The system chooses from known, tested nodes rather than inventing fictional ones.
- Graceful MCP degradation — An optional n8n MCP server provides integration context, but the system generates complete workflows even when the MCP server is unavailable.
- Streaming responses — Chat uses SSE streaming for real-time typing. Analysis and generation run in background threads with status polling.
## Challenges We Ran Into
- Structured extraction from video — Getting reliable workflow structure from screen recordings required chunked map-reduce analysis and explicit observation schemas rather than single-shot prompts.
- Voice session reliability — Bidirectional streaming with Nova 2 Sonic required careful WebSocket lifecycle management and a dedicated Node.js bridge sidecar to handle the binary audio protocol.
- n8n JSON compliance — Generated workflows need exact node type names, valid connection structures, and proper coordinate positioning to import cleanly. This required a curated catalog and multi-phase generation (schema → design → JSON → validation).
- Agent visibility — Making the multi-agent pipeline visible to users and judges without adding overhead required careful state tracking at each orchestration step.
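For reference, a minimal importable workflow in the shape the Builder must target looks roughly like this, based on n8n's export format: exact node type strings, `[x, y]` canvas positions, and a connections map keyed by node name (the parameter values here are illustrative placeholders):

```python
def n8n_workflow(name: str) -> dict:
    """Minimal two-node n8n workflow skeleton (structure from n8n's
    export format; ids, versions, and parameters illustrative)."""
    return {
        "name": name,
        "nodes": [
            {"id": "1", "name": "Webhook",
             "type": "n8n-nodes-base.webhook", "typeVersion": 1,
             "position": [250, 300],
             "parameters": {"path": "lead-intake"}},
            {"id": "2", "name": "HTTP Request",
             "type": "n8n-nodes-base.httpRequest", "typeVersion": 4,
             "position": [500, 300],
             "parameters": {"url": "{{PLACEHOLDER_CRM_URL}}"}},
        ],
        "connections": {
            "Webhook": {"main": [[
                {"node": "HTTP Request", "type": "main", "index": 0}
            ]]},
        },
    }
```

Getting any of these fields wrong (a hallucinated node type, a connection keyed by id instead of name) makes the import fail silently or partially, which is why generation is split into phases with validation at the end.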
## Accomplishments We're Proud Of
- End-to-end voice-to-workflow: A user can describe their entire workflow by speaking, and receive a validated automation draft without ever typing a specification.
- Visible agent reasoning: The UI shows which agent is active, what it found, and what questions remain — making the AI system transparent rather than a black box.
- Validation-first output: Every generated workflow comes with a quality report that tells users exactly what's confirmed, what's inferred, and what still needs configuration before deployment.
- Zero-dependency on MCP: The optional MCP integration adds context when available but never blocks workflow generation.
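The validation-first idea boils down to structural checks like these two (a hypothetical helper for illustration; the real Reviewer also covers credentials and risky patterns):

```python
def review_workflow(wf: dict) -> dict:
    """Two Reviewer-style checks: every connection must point at a real
    node, and placeholder parameters are surfaced so the user knows
    exactly what still needs configuration before deployment."""
    names = {n["name"] for n in wf["nodes"]}
    dangling = [target["node"]
                for outputs in wf.get("connections", {}).values()
                for branch in outputs["main"]
                for target in branch
                if target["node"] not in names]
    placeholders = [n["name"] for n in wf["nodes"]
                    if any("PLACEHOLDER" in str(v)
                           for v in n.get("parameters", {}).values())]
    return {"importable": not dangling,
            "dangling_targets": dangling,
            "needs_configuration": placeholders}

wf = {
    "nodes": [
        {"name": "Webhook", "parameters": {"path": "intake"}},
        {"name": "HTTP Request", "parameters": {"url": "{{PLACEHOLDER_CRM_URL}}"}},
    ],
    "connections": {
        "Webhook": {"main": [[{"node": "HTTP Request", "type": "main", "index": 0}]]},
    },
}
report = review_workflow(wf)
```

The report feeds the quality assessment directly: confirmed structure, inferred gaps, and required configuration each map to a field the UI can display.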
## What We Learned
- Multimodal intake dramatically lowers the barrier — Users describe workflows 3-5x faster through recording or voice than through forms.
- Structured tool use > free-form generation — Enforcing JSON schemas through Nova's tool_use feature eliminated an entire class of output parsing bugs.
- Validation is the differentiator — Generating JSON is table stakes; catching what's missing before deployment is what makes the output trustworthy.
- Agent visibility builds trust — Showing users which agent is working and why it's asking specific questions makes the system feel collaborative rather than opaque.
## What's Next for FlowPilot
- Multi-platform export — Extend beyond n8n to support Make and Zapier
- Nova multimodal embeddings — Semantic similarity matching to suggest "similar workflows" from a template library
- Team collaboration — Shared workflow libraries with version history and review workflows
- Deployment assistant — Guide users through credential setup and one-click deployment to their automation platform
## Built With
- amazon-web-services
- node.js
- nova-2-lite
- nova-2-sonic
- python