Inspiration
Modern digital work is fragmented across dozens of apps, portals, and forms. Tasks like booking travel, submitting expense reports, applying for grants, or filing administrative requests require navigating multiple websites and repeating the same manual steps. While AI assistants can answer questions, they usually cannot actually complete tasks across real interfaces.

We asked a simple question: What if AI could not only understand your request but also complete the entire workflow for you?

With the capabilities of Amazon Nova models and Nova Act, we saw an opportunity to build an autonomous AI workflow assistant that can reason, plan, and interact with web interfaces to complete real-world tasks on behalf of the user. That idea became FlowPilot AI.

What it does
FlowPilot AI is a voice-driven autonomous AI assistant that can complete complex digital workflows across real web applications. Users can simply speak or type a request such as:

“Book the cheapest hotel in Tokyo next week and submit the receipt to my expense report.”

FlowPilot then:
• Understands the request using Amazon Nova 2 Lite
• Creates a step-by-step workflow plan
• Executes actions on websites using Nova Act
• Extracts and processes documents using multimodal embeddings
• Communicates with the user through voice using Nova 2 Sonic

The system can automate workflows like:
• Booking travel
• Filing expense reports
• Submitting forms
• Processing invoices
• Uploading documents
• Navigating web portals

Instead of switching between apps, users simply tell FlowPilot what they want done.

How we built it
FlowPilot AI is built using a multi-agent architecture powered by Amazon Nova models.
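To make the plan-then-execute idea concrete, here is a minimal sketch of a structured workflow plan being dispatched to specialized agents. It is an illustration under stated assumptions, not FlowPilot's actual code: the `Step` class, the `plan_request` stub, the agent names, and the action names are all hypothetical, and the planner returns a hard-coded plan where a real system would call a planning model. No Amazon Nova or Nova Act APIs are called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One unit of work in a plan (hypothetical structure)."""
    agent: str                      # which specialized agent should run it
    action: str                     # what that agent should do
    params: Dict[str, str] = field(default_factory=dict)


def plan_request(request: str) -> List[Step]:
    """Stand-in for the planning model.

    Assumption: a real planner would derive the steps from `request`;
    this stub ignores it and returns a fixed plan for the example request.
    """
    return [
        Step("browser", "search_hotels", {"city": "Tokyo", "sort": "price"}),
        Step("browser", "book_hotel", {"choice": "cheapest"}),
        Step("document", "extract_receipt", {"source": "confirmation_page"}),
        Step("browser", "submit_expense", {"attachment": "receipt"}),
    ]


def browser_agent(step: Step) -> str:
    # Placeholder for UI automation; it only reports what it would do.
    return f"browser:{step.action}"


def document_agent(step: Step) -> str:
    # Placeholder for document extraction; it only reports what it would do.
    return f"document:{step.action}"


# Registry mapping agent names to handlers (the orchestration layer).
AGENTS: Dict[str, Callable[[Step], str]] = {
    "browser": browser_agent,
    "document": document_agent,
}


def run_workflow(steps: List[Step]) -> List[str]:
    """Run the steps in order, failing loudly on an unknown agent."""
    results = []
    for step in steps:
        handler = AGENTS.get(step.agent)
        if handler is None:
            raise ValueError(f"no agent registered for {step.agent!r}")
        results.append(handler(step))
    return results


if __name__ == "__main__":
    plan = plan_request(
        "Book the cheapest hotel in Tokyo next week and submit the receipt "
        "to my expense report."
    )
    for line in run_workflow(plan):
        print(line)
```

Keeping the plan as plain data separates the planner from the executors: either side can be swapped out, and an unknown agent name is rejected before any action runs.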
Core AI Models
Amazon Nova 2 Lite
• Used for reasoning and task planning
• Breaks user requests into structured workflows
Amazon Nova 2 Sonic
• Provides real-time speech-to-speech interaction
• Enables natural voice conversations with the AI
Amazon Nova Multimodal Embeddings
• Interprets documents, screenshots, and forms
• Extracts relevant information from files like receipts or PDFs
Amazon Nova Act
• Automates UI interactions with real web applications
• Performs actions like clicking, filling forms, and navigating pages

System Architecture
User Interface
• Voice and chat interface
• Workflow dashboard
• Document upload
Intent Processing Layer
• Nova 2 Lite analyzes the request
• Generates a structured workflow plan
Agent Orchestration Layer
• Multi-agent system coordinates tasks
• Assigns actions to specialized agents
Execution Layer
• Nova Act agents interact with real websites
• Perform automated UI actions
Knowledge Layer
• Multimodal embeddings process documents and page content
Data Layer
• Task states stored in DynamoDB
• Files stored in S3

Challenges we ran into
One of the biggest challenges was designing a system that could translate natural language requests into reliable UI workflows. User instructions are often ambiguous, so we needed the planning agent to generate structured task steps that the automation agents could execute safely.

Another challenge was handling dynamic web interfaces. Websites change frequently, so the UI automation agent had to be resilient and capable of understanding page context rather than relying solely on fixed selectors.

Integrating voice interaction with real-time task execution was also complex, requiring synchronization between conversational AI and the workflow engine.

Accomplishments that we're proud of
We are proud that FlowPilot demonstrates how agentic AI can move beyond chat interfaces and actually perform work in the digital world.
Key accomplishments include:
• A working multi-agent workflow system powered by Amazon Nova
• Autonomous UI automation using Nova Act
• Voice-driven interaction using Nova 2 Sonic
• Multimodal understanding of documents and web content
• A modular architecture that can support many types of automation tasks

Most importantly, FlowPilot shows how AI can reduce administrative friction and give users back valuable time.

What we learned
Building FlowPilot taught us several important lessons about the future of AI systems.

First, agent orchestration is as important as model intelligence. Even powerful models need well-designed coordination layers to translate reasoning into actions.

Second, multimodal understanding is critical for real-world automation, because most workflows involve documents, forms, and visual interfaces.

Finally, we learned that voice interfaces dramatically lower the barrier to automation, making advanced AI tools accessible to non-technical users.

What's next for FlowPilot AI
Our vision is to turn FlowPilot into a universal AI automation layer for the internet.

Next steps include:
• Expanding the agent library for more workflows
• Adding integrations with enterprise tools (Slack, Jira, Salesforce)
• Building a user-defined workflow builder using natural language
• Adding memory so the system learns from past automations
• Deploying FlowPilot as a personal AI productivity platform

Long term, we believe systems like FlowPilot can become the operating system for autonomous digital work, allowing people to focus on creativity and decision-making while AI handles repetitive tasks.
Built With
- all