Inspiration

In nature, repeated actions tend to become easier and more efficient over time — from neuroplasticity in the brain to dirt paths forming naturally across grassy fields. These “paths” emerge as patterns are reinforced through repetition.

Do Less Agents applies this phenomenon to personal computer usage. Instead of humans manually repeating the same workflows, the system allows agents to observe interactions, learn from them, and gradually form optimized “pathways” that automate both current and future tasks. Over time, multiple agents and repeated iterations contribute cumulatively, strengthening these pathways and making automation faster and more robust.


What It Does

Do Less Agents is a personal computer-use agent and tool-generation platform.

Users and agents perform tasks directly on the computer, and the system learns from these interactions to automatically generate reusable functions and workflows. These functions can later be invoked with parameters to reproduce or adapt the task without repeating the original manual steps.

Key capabilities include:

  • Learning from real computer interactions (navigation, clicks, inputs, and context)
  • Automatically generating parameterized functions from those interactions
  • Creating reusable workflows that can generalize to related tasks
  • Leveraging multimodal AI (vision and speech) to improve task understanding and function creation

Multimodal features from the Gemini model — especially vision and speech — are critical for tools like the Smart Scraper, which extracts meaningful elements from complex webpages, and for enabling spoken descriptions of functions.
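The Smart Scraper's filtering step can be sketched roughly as follows. The vision-model call is stubbed out and all names are hypothetical: the (assumed) model looks at a page screenshot and returns selectors for the visually meaningful elements, and the scraper keeps only the matching DOM nodes.

```python
def vision_model_pick_selectors(screenshot: bytes) -> list[str]:
    """Placeholder for the Gemini vision call: given a page screenshot,
    return CSS selectors for the visually meaningful elements.
    (Stubbed here; a real implementation would call the Gemini API.)"""
    return ["#price", ".product-title"]

def smart_scrape(dom: list[dict], screenshot: bytes) -> list[dict]:
    """Keep only the DOM nodes the vision model judged relevant,
    discarding ads, trackers, and layout scaffolding."""
    relevant = set(vision_model_pick_selectors(screenshot))
    return [node for node in dom if node["selector"] in relevant]

dom = [
    {"selector": "#price", "text": "$499"},
    {"selector": ".ad-banner", "text": "Buy now!!!"},
    {"selector": ".product-title", "text": "Mechanical Keyboard"},
    {"selector": "#tracking-pixel", "text": ""},
]
elements = smart_scrape(dom, screenshot=b"...")
```

The point of routing through vision rather than the raw DOM is that visual salience is a far better signal of relevance than markup depth or class names on noisy pages.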


How We Built It

The project evolved through several stages:

  1. Record & Playback Chrome Extension: We started by building a Chrome extension that records user navigation along with audio and text notes. This data is then analyzed to infer the task structure and generate functions with parameters.

  2. Agent Autonomy via Computer Use API: We expanded the system by giving AI agents autonomy through a computer-use API, allowing them to navigate interfaces on their own.

  3. Function Generation by Agents: The agent’s actions were integrated into the same record-playback pipeline, enabling AI agents to generate their own reusable functions from autonomous navigation.

  4. Backend Simulation: A simulated backend was implemented using Python and Docker, allowing users to upload, store, and execute generated functions.
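The simulated backend's upload-store-execute loop can be sketched as a minimal in-memory store (names are hypothetical; the real version runs inside Docker):

```python
class FunctionStore:
    """Minimal stand-in for the backend: upload, store, and execute
    generated functions by name."""

    def __init__(self):
        self._functions: dict[str, str] = {}  # function name -> Python source

    def upload(self, name: str, source: str) -> None:
        """Store the source of a generated function under `name`."""
        self._functions[name] = source

    def execute(self, name: str, **kwargs):
        """Compile the stored source and call the function it defines."""
        namespace: dict = {}
        exec(self._functions[name], namespace)
        return namespace[name](**kwargs)

store = FunctionStore()
store.upload("greet", "def greet(user):\n    return f'Hello, {user}!'")
result = store.execute("greet", user="agent")   # "Hello, agent!"
```

A production version would add persistence and sandboxed execution; `exec` on uploaded source is only acceptable inside an isolated container.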


Challenges We Ran Into

  • Severe time constraints
  • Limitations of Chrome extensions, including workarounds required to call local models (e.g., Ollama)
  • Coordinating multiple tools and capabilities while keeping agent behavior stable and reliable

Accomplishments We’re Proud Of

  • Smart Scraper Tool: Uses Gemini’s vision capabilities to cut through noisy DOM structures and extract only the elements needed to form reliable automation functions.
  • Demonstrating cumulative learning through multiple agents and iterations
  • Seamlessly blending user-driven and agent-driven task creation into a single system

What We Learned

  • Effective prompting dramatically improves agent performance and reliability
  • Computer Use APIs provide significantly more flexibility and power compared to traditional Chrome extension-only approaches
  • Multimodal AI is essential for real-world automation, especially in messy, unstructured environments like the web

What’s Next for Do Less Agents

  1. Build and launch a full production backend for persistent function storage
  2. Expose generated tools through a locally running MCP server, making them accessible to other agents and applications
