Inspiration

Every small business runs critical workflows that live in people's heads. A founder explains lead handling in a voice note. An ops manager records their screen walking through a daily task. A sales assistant saves screenshots and says, "this is what I do every morning." These processes are real, repeatable, and ripe for automation — but the path from tribal knowledge to a working automation is too expensive and too technical for most small teams.

We built FlowPilot because we believe the starting point for automation should be the way people already communicate: showing, explaining, and answering questions — not filling out rigid specification forms.

What It Does

FlowPilot is a multi-agent AI system that converts unstructured workflow explanations into production-ready n8n automation drafts. Users upload a screen recording, a set of screenshots, or simply describe their workflow using live voice — and FlowPilot handles the rest.

The system orchestrates five specialized AI agents in an explicit pipeline:

  1. Observer — Analyzes uploaded media (video, audio, images) to extract workflow structure, identify likely triggers, apps involved, and unknown details
  2. Clarifier — Conducts targeted follow-up questions via text chat or real-time voice conversation to fill in gaps
  3. Architect — Converts confirmed observations into a structured workflow design, selecting from a curated catalog of 17 supported n8n node types
  4. Builder — Generates n8n-compatible workflow JSON with proper node connections, positions, and parameters
  5. Reviewer — Validates the generated workflow for importability, surfaces missing credentials, risky patterns, and provides a quality assessment

The output is not just generated JSON — it's a reviewed workflow draft with explicit placeholders, a validation report, a visual step graph, and actionable recommendations.

How We Built It

Architecture

FlowPilot is a full-stack application with:

  • Backend: Python FastAPI server with a multi-agent orchestration layer
  • Frontend: Single-page app with real-time agent pipeline visualization
  • Voice Bridge: Node.js WebSocket sidecar for bidirectional audio streaming with Nova 2 Sonic
  • Storage: File-based job persistence (JSON per job) for easy judge inspection

Amazon Nova Integration

FlowPilot makes 5-6 Amazon Nova API calls per workflow generation job, each serving a distinct purpose:

Agent Nova Model Purpose
Observer Nova 2 Lite Extract workflow structure from video frames, audio transcript, or screenshot sequences
Clarifier Nova 2 Lite + Nova 2 Sonic Generate clarification questions; conduct live voice conversations
Architect Nova 2 Lite Design workflow schema with node selection and connection logic
Builder Nova 2 Lite Generate n8n-compatible JSON with proper parameters and positions
Reviewer Nova 2 Lite Validate workflow connectivity, credential completeness, and runnability

Nova 2 Lite powers all reasoning-heavy operations using Bedrock's Converse API with tool use for structured JSON extraction. Every Nova call uses explicit tool schemas to enforce output structure rather than relying on free-form text parsing.

Nova 2 Sonic handles two critical speech tasks:

  • Upload transcription: Windowed chunking (8-minute segments with 3-second overlap) for longer recordings
  • Live voice sessions: Full bidirectional streaming via WebSocket for real-time conversational clarification

Key Technical Decisions

  • Explicit agent pipeline — Each agent is a separate class with defined inputs/outputs, not a hidden prompt chain. Agent execution timing and status are tracked and surfaced in the API and UI.
  • Tool use for structure — Every Nova call uses Bedrock's tool_use feature to enforce JSON schema compliance, eliminating fragile text parsing.
  • Curated n8n catalog — 17 supported node types prevent hallucinated integrations. The system chooses from known, tested nodes rather than inventing fictional ones.
  • Graceful MCP degradation — An optional n8n MCP server provides integration context, but the system generates complete workflows even when the MCP server is unavailable.
  • Streaming responses — Chat uses SSE streaming for real-time typing. Analysis and generation run in background threads with status polling.

Challenges We Ran Into

  • Structured extraction from video — Getting reliable workflow structure from screen recordings required chunked map-reduce analysis and explicit observation schemas rather than single-shot prompts.
  • Voice session reliability — Bidirectional streaming with Nova 2 Sonic required careful WebSocket lifecycle management and a dedicated Node.js bridge sidecar to handle the binary audio protocol.
  • n8n JSON compliance — Generated workflows need exact node type names, valid connection structures, and proper coordinate positioning to import cleanly. This required a curated catalog and multi-phase generation (schema → design → JSON → validation).
  • Agent visibility — Making the multi-agent pipeline visible to users and judges without adding overhead required careful state tracking at each orchestration step.

Accomplishments We're Proud Of

  • End-to-end voice-to-workflow: A user can describe their entire workflow by speaking, and receive a validated automation draft without ever typing a specification.
  • Visible agent reasoning: The UI shows which agent is active, what it found, and what questions remain — making the AI system transparent rather than a black box.
  • Validation-first output: Every generated workflow comes with a quality report that tells users exactly what's confirmed, what's inferred, and what still needs configuration before deployment.
  • Zero-dependency on MCP: The optional MCP integration adds context when available but never blocks workflow generation.

What We Learned

  • Multimodal intake dramatically lowers the barrier — Users describe workflows 3-5x faster through recording or voice than through forms.
  • Structured tool use > free-form generation — Enforcing JSON schemas through Nova's tool_use feature eliminated an entire class of output parsing bugs.
  • Validation is the differentiator — Generating JSON is table stakes; catching what's missing before deployment is what makes the output trustworthy.
  • Agent visibility builds trust — Showing users which agent is working and why it's asking specific questions makes the system feel collaborative rather than opaque.

What's Next for FlowPilot

  • Multi-platform export — Extend beyond n8n to support Make, Zapier
  • Nova multimodal embeddings — Semantic similarity matching to suggest "similar workflows" from a template library
  • Team collaboration — Shared workflow libraries with version history and review workflows
  • Deployment assistant — Guide users through credential setup and one-click deployment to their automation platform

Built With

Share this project:

Updates