Inspiration
Applying to jobs online should be simple, but in practice it is one of the most repetitive digital workflows people face. A single application often means entering the same information over and over: name, email, resume, portfolio links, and answers to similar questions across different platforms. Job seekers can spend hours navigating websites, copying information between forms, and dealing with login walls or security prompts.
We asked a simple question: what if an AI agent could watch your screen and complete these workflows for you?
Instead of building another chatbot that only answers questions, we wanted to build something more capable: an AI system that understands interfaces visually and performs actions on the user's behalf.
That idea became FlowState, a collaborative AI agent that observes the browser, understands forms and UI elements, fills information from your profile, and completes repetitive workflows while collaborating with the user when obstacles appear.
What it does
FlowState is a visual AI workflow agent designed to automate repetitive web tasks such as job applications.
The agent observes the user’s screen, interprets UI elements using multimodal AI, and executes actions like typing, clicking, and navigating across pages.
Key capabilities include:
Profile-aware automation
When the agent encounters a form field, it checks the user's profile data. If the information exists, it fills the field automatically. If the data is missing, the agent asks the user for the value and optionally saves it to the profile for future workflows.
Over time, the system builds a reusable automation-ready profile.
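The fill-or-ask decision described above can be sketched in TypeScript as follows. This is a minimal illustration, not FlowState's actual code: the `Profile` shape and the `resolveField` and `withAnswer` helpers are hypothetical names chosen for the example.

```typescript
// Hypothetical profile shape: a flat map from field key to value.
type Profile = Record<string, string>;

// Either the agent can fill the field, or it has to ask the user.
type FieldResolution =
  | { kind: "fill"; key: string; value: string }
  | { kind: "ask"; key: string };

// Decide whether a form field can be filled from the profile.
function resolveField(profile: Profile, key: string): FieldResolution {
  const value = profile[key];
  if (value !== undefined && value.trim() !== "") {
    return { kind: "fill", key, value };
  }
  return { kind: "ask", key };
}

// Apply an answer the user supplied. Saving is optional, so the original
// profile is returned unchanged when `save` is false.
function withAnswer(
  profile: Profile,
  key: string,
  value: string,
  save: boolean
): Profile {
  return save ? { ...profile, [key]: value } : profile;
}
```

Returning a new object from `withAnswer` instead of mutating the profile keeps the "optionally saves" behavior explicit at the call site.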
Collaborative obstacle handling
Real websites contain login walls, CAPTCHAs, and security checks. Instead of failing silently, FlowState pauses and asks the user to complete the required step. Once the user resolves the obstacle, the agent resumes exactly where it stopped.
Agent reasoning timeline
FlowState includes a live reasoning timeline that shows how the agent is thinking and acting in real time. Each step is categorized (analyzing, navigating, filling forms, requesting user help, submitting applications, etc.) and displayed with confidence scores.
This transparency helps users understand and trust the automation.
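One way a timeline entry might be modeled is shown below. The category names mirror the ones listed above, but the `TimelineStep` shape, the assumption that confidence is a number between 0 and 1, and the `makeStep` and `formatStep` helpers are all illustrative rather than taken from the project.

```typescript
// Categories taken from the step types named in the text (names are assumed).
type StepCategory =
  | "analyzing"
  | "navigating"
  | "filling"
  | "user_help"
  | "submitting";

interface TimelineStep {
  category: StepCategory;
  message: string;
  confidence: number; // assumed range: 0 to 1
  timestamp: number;  // milliseconds since the Unix epoch
}

// Build a step, clamping confidence into the assumed 0..1 range.
function makeStep(
  category: StepCategory,
  message: string,
  confidence: number,
  now: number = Date.now()
): TimelineStep {
  const clamped = Math.min(1, Math.max(0, confidence));
  return { category, message, confidence: clamped, timestamp: now };
}

// Render a step as a single line for display in the timeline.
function formatStep(step: TimelineStep): string {
  const pct = Math.round(step.confidence * 100);
  return `[${step.category}] ${step.message} (${pct}%)`;
}
```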
Workflow memory
FlowState recognizes different job platforms and adapts its behavior accordingly. The system detects platforms such as LinkedIn, Greenhouse, Lever, and others, tagging workflow patterns so future automation becomes faster and more reliable.
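Platform detection could be as simple as matching the page's hostname against known domains. The sketch below assumes that approach; the `detectPlatform` function and the host table are hypothetical, and the write-up does not say how FlowState actually identifies a platform.

```typescript
type Platform = "linkedin" | "greenhouse" | "lever" | "unknown";

// Assumed host suffixes for each platform (illustrative, not exhaustive).
const PLATFORM_HOSTS: Array<[string, Platform]> = [
  ["linkedin.com", "linkedin"],
  ["greenhouse.io", "greenhouse"],
  ["lever.co", "lever"],
];

// Map a page URL to a platform tag by comparing its hostname.
function detectPlatform(url: string): Platform {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return "unknown"; // not a parseable absolute URL
  }
  for (const [domain, platform] of PLATFORM_HOSTS) {
    // Match the domain itself or any subdomain, but not look-alike hosts.
    if (host === domain || host.endsWith("." + domain)) {
      return platform;
    }
  }
  return "unknown";
}
```

Checking for an exact match or a `"." + domain` suffix avoids tagging a host such as `evil-linkedin.com` as LinkedIn.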
How we built it
FlowState is built as a multimodal AI agent system that combines browser automation, real-time reasoning, and cloud infrastructure.
The system architecture includes:
AI Planning Layer
A Gemini-powered planner interprets screenshots and UI context, deciding the next action the agent should perform. The planner outputs structured actions such as clicking elements, typing into fields, requesting user input, or navigating to new pages.
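To illustrate what "structured actions" could look like, here is a sketch of a discriminated union for the four action kinds mentioned above, plus a validator for the planner's JSON output. The field names (`target`, `text`, `prompt`, `url`) and the `parseAction` helper are assumptions made for the example; the real planner schema is not described here.

```typescript
// Hypothetical action schema covering the four action kinds in the text.
type AgentAction =
  | { type: "click"; target: string }
  | { type: "type"; target: string; text: string }
  | { type: "ask_user"; prompt: string }
  | { type: "navigate"; url: string };

// Validate raw planner output. Returns null for anything malformed so the
// orchestrator never executes a half-specified action.
function parseAction(raw: string): AgentAction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const o = data as Record<string, unknown>;

  // Read a property only if it is a string.
  const str = (k: string): string | null => {
    const v = o[k];
    return typeof v === "string" ? v : null;
  };
  const target = str("target");
  const text = str("text");
  const promptText = str("prompt");
  const url = str("url");

  switch (o["type"]) {
    case "click":
      return target !== null ? { type: "click", target } : null;
    case "type":
      return target !== null && text !== null
        ? { type: "type", target, text }
        : null;
    case "ask_user":
      return promptText !== null ? { type: "ask_user", prompt: promptText } : null;
    case "navigate":
      return url !== null ? { type: "navigate", url } : null;
    default:
      return null; // unknown action kind
  }
}
```

Validating before acting matters because model output is not guaranteed to match the schema, even when the prompt asks for it.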
Execution Layer
An orchestrator manages the agent loop:
observe → plan → act → verify
This component executes browser actions, tracks workflow progress, and manages pause/resume states when user assistance is required.
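The loop and its pause/resume behavior can be modeled as a small state machine. The sketch below is one possible formulation, with hypothetical `AgentState` and `AgentEvent` types; it leaves out the browser calls themselves and only shows how a paused agent can remember the phase it should resume at.

```typescript
type Phase = "observe" | "plan" | "act" | "verify";

// Hypothetical orchestrator state. `step` counts completed loop iterations.
type AgentState =
  | { status: "running"; phase: Phase; step: number }
  | { status: "paused"; resumeAt: Phase; step: number; reason: string }
  | { status: "done"; step: number };

type AgentEvent =
  | { type: "phase_ok" }                 // current phase finished normally
  | { type: "obstacle"; reason: string } // e.g. a login wall or CAPTCHA
  | { type: "user_resolved" }            // user completed the required step
  | { type: "goal_reached" };            // the workflow is complete

const NEXT: Record<Phase, Phase> = {
  observe: "plan",
  plan: "act",
  act: "verify",
  verify: "observe",
};

// Pure transition function: given a state and an event, return the next state.
function transition(state: AgentState, event: AgentEvent): AgentState {
  switch (state.status) {
    case "running":
      switch (event.type) {
        case "phase_ok": {
          // A completed verify phase closes one loop iteration.
          const step = state.phase === "verify" ? state.step + 1 : state.step;
          return { status: "running", phase: NEXT[state.phase], step };
        }
        case "obstacle":
          // Remember the interrupted phase so it can be retried later.
          return {
            status: "paused",
            resumeAt: state.phase,
            step: state.step,
            reason: event.reason,
          };
        case "goal_reached":
          return { status: "done", step: state.step };
        default:
          return state;
      }
    case "paused":
      // Only the user resolving the obstacle moves a paused agent forward.
      return event.type === "user_resolved"
        ? { status: "running", phase: state.resumeAt, step: state.step }
        : state;
    case "done":
      return state;
  }
}
```

Keeping the transition logic pure makes the pause/resume rules easy to test without a browser, while a separate driver performs the actual observe, plan, act, and verify work and feeds events in.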
Profile Memory
User information is stored in a structured profile system that includes both core fields and dynamically generated fields discovered during automation.
When the agent encounters a new form field (for example GitHub URL or visa status), it can create a new profile field automatically and reuse it later.
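A sketch of how a newly discovered field might be registered: the form label is normalized into a stable key so that "GitHub URL" on one site and "github url" on another map to the same profile entry. The `ProfileField` shape and both helper functions are hypothetical, and the normalization rule is an assumption made for illustration.

```typescript
interface ProfileField {
  key: string;
  label: string;
  value: string;
  source: "core" | "discovered";
}

// Turn a form label such as "GitHub URL" into a key such as "github_url".
function toFieldKey(label: string): string {
  return label
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse runs of other characters
    .replace(/^_+|_+$/g, "");    // drop leading/trailing underscores
}

// Create the field if it is new; otherwise reuse the stored one unchanged.
function upsertDiscoveredField(
  fields: Map<string, ProfileField>,
  label: string,
  value: string
): ProfileField {
  const key = toFieldKey(label);
  const existing = fields.get(key);
  if (existing !== undefined) {
    return existing; // this sketch never overwrites an existing value
  }
  const created: ProfileField = {
    key,
    label: label.trim(),
    value,
    source: "discovered",
  };
  fields.set(key, created);
  return created;
}
```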
Real-time Agent Interface
The frontend interface shows the agent’s activity through a reasoning timeline, confidence indicators, and collaborative prompts when user input is needed.
Challenges we ran into
Building a reliable web automation agent was significantly harder than expected.
Handling real-world websites
Job platforms often include login walls, dynamic interfaces, and security checks. We had to design a collaborative pause-and-resume system so the agent could safely continue after user intervention.
Form variability
Every job platform structures application forms differently. We solved this by creating a flexible profile system that allows the agent to dynamically create and reuse new profile fields.
Agent transparency
Many AI systems behave like black boxes. To make the system understandable and trustworthy, we implemented a reasoning timeline that shows each decision the agent makes.
What we learned
This project taught us that dependable AI agents need to collaborate with the people they work for.
Fully autonomous automation often fails in unpredictable environments. Instead, the most reliable systems are those that can adapt, ask for help, and learn from user interactions.
We also learned that combining multimodal AI with structured workflow orchestration opens the door to a new generation of intelligent automation tools.
What's next for FlowState
FlowState can extend far beyond job applications.
The same agent architecture could automate many repetitive workflows such as:
- lead generation and CRM updates
- form-based business processes
- research workflows
- QA testing for web applications
- cross-platform workflow automation
Our long-term vision is to build a general-purpose AI workflow agent that helps users complete complex digital tasks simply by describing their goal.
Built With
- TypeScript
- Angular
- Node.js
- Gemini API (Google AI)
- Google Cloud
- Firestore
- WebSocket real-time streaming
- Browser automation
- Multimodal UI interpretation