Inspiration

Every agent demo we'd seen broke the moment something unexpected happened, because the agent had no idea who you are: your shortcuts, your preferences, your workflow. We wanted to build something that starts dumb and gets smart about you specifically, through the normal act of using your computer.


What We Built

Gregory is an Electron desktop app wrapping a Computer Use Agent (CUA) powered by CLōD, a unified API that routes requests across 25+ free LLMs from Anthropic, OpenAI, Google, Meta, and others.

The core loop: add a skill in plain English → Gregory generates a training scenario (a sequence of on-screen tasks and decision points) → you complete it → labelled trajectories feed a two-stage learning pipeline:

  • In-context learning — Gregory immediately distils a skill summary so the CUA can perform the task right away
  • LoRA fine-tuning — trajectories queue for local SFT via low-rank adaptation, $W = W_0 + BA$, with per-skill adapters that compose without interfering.
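
The low-rank update $W = W_0 + BA$ can be sketched in a few lines of plain Python. This is a toy illustration, not Gregory's training code: the 3×3 shapes, the rank of 1, and the helper names `matmul`, `add` and `apply_adapter` are all assumptions made for the example. It shows that $B$ (d×r) times $A$ (r×k) gives a d×k update added to the frozen $W_0$, and that with $B$ initialised to zero, as is usual for LoRA, the adapted weights start out equal to $W_0$.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    inner = len(y)
    assert all(len(row) == inner for row in x), "inner dimensions must match"
    cols = len(y[0])
    return [
        [sum(row[i] * y[i][j] for i in range(inner)) for j in range(cols)]
        for row in x
    ]


def add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def apply_adapter(w0: Matrix, b: Matrix, a: Matrix) -> Matrix:
    """Return W = W0 + B A. W0 stays frozen; only B and A would be trained."""
    return add(w0, matmul(b, a))


# Toy shapes (hypothetical): d = 3, k = 3, rank r = 1.
W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, -0.5, 1.0]]            # r x k
B_ZERO = [[0.0], [0.0], [0.0]]    # d x r, zero at initialisation
B_TRAINED = [[2.0], [0.0], [1.0]]  # d x r, made-up values standing in for training

W_INIT = apply_adapter(W0, B_ZERO, A)      # equals W0 before any training
W_SKILL = apply_adapter(W0, B_TRAINED, A)  # W0 plus a rank-1 update
```

In a per-skill setup, each skill would keep its own small `(B, A)` pair while sharing the same frozen `W0`.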

Everything, including trajectories, adapters and skill summaries, stays on your machine.


Challenges

Safety. Gregory needs broad permissions on the user's computer. Running the agent inside a sandbox keeps the user's data safe while the agent is in control.

Two-stage consistency. The in-context summary and the LoRA adapter can disagree. We fixed this by deriving the summary directly from the trajectory, so both are grounded in the same data.
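
One way to picture this fix: both the in-context summary and the fine-tuning examples are pure functions of the same recorded trajectory. The sketch below is hypothetical throughout; the `Step` and `Trajectory` types, their fields, and the `derive_summary` and `derive_sft_pairs` functions are illustrative names, not Gregory's actual data model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    observation: str  # what was on screen (hypothetical field)
    action: str       # what the user did in response (hypothetical field)


@dataclass(frozen=True)
class Trajectory:
    skill: str
    steps: Tuple[Step, ...]


def derive_summary(traj: Trajectory) -> str:
    """Build the in-context skill summary directly from the recorded steps."""
    lines = [f"Skill: {traj.skill}"]
    for i, step in enumerate(traj.steps, start=1):
        lines.append(f"{i}. When {step.observation}, {step.action}")
    return "\n".join(lines)


def derive_sft_pairs(traj: Trajectory) -> List[Tuple[str, str]]:
    """Build (prompt, target) pairs for fine-tuning from the same steps."""
    return [
        (f"Skill: {traj.skill}\nScreen: {step.observation}", step.action)
        for step in traj.steps
    ]


# A made-up trajectory used only to exercise the two functions.
demo = Trajectory(
    skill="archive newsletters",
    steps=(
        Step("the inbox is open", "select messages labelled Newsletter"),
        Step("messages are selected", "press the archive shortcut"),
    ),
)

summary = derive_summary(demo)
pairs = derive_sft_pairs(demo)
```

Because neither function reads anything except the trajectory, every action the adapter is trained on also appears in the summary.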

LoRA fast enough to feel immediate. With $r = 8$ adapters and a few hundred steps, a fine-tuning pass runs in under two minutes on CPU — fast enough to finish in the background.
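
A quick parameter count helps explain why a rank-8 adapter is cheap to train. The arithmetic below assumes a single 4096×4096 weight matrix, a size chosen purely for illustration and not taken from the model Gregory uses. It only counts trainable parameters and does not by itself reproduce the timing above.

```python
def full_param_count(d: int, k: int) -> int:
    """Trainable parameters when updating a full d x k weight matrix."""
    return d * k


def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r adapter: B is d x r, A is r x k."""
    return r * (d + k)


D = K = 4096  # hypothetical layer size, for illustration only
R = 8         # rank from the text above

full = full_param_count(D, K)     # 16,777,216
lora = lora_param_count(D, K, R)  # 65,536
ratio = lora / full               # 1/256, about 0.4% of the full matrix
```

Under these assumed sizes, the adapter trains 256 times fewer parameters per layer than full fine-tuning would.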


What We Learned

In-context learning and fine-tuning are complementary, not competing. In-context gets you to good enough immediately, and fine-tuning gets you to reliably correct over time. CLōD's dynamic cost routing also turned out to be genuinely useful: we made hundreds of CUA calls over the hackathon at near-zero cost.

