Inspiration

Agents are the most popular thing in tech right now. Claude Code and OpenClaw make it easy to issue natural language commands and get real results, so I built a browser agent that does exactly that. It's a CLI app: start it in your terminal, enter any natural language command, and watch it automatically open the browser and carry out the task. It's like magic.


What it does


There are two agents in this repo:

  1. One that navigates websites with extractable DOM trees (most traditional websites, such as Y Combinator's Hacker News or US government websites).
  2. Another that uses vision for websites with a hidden DOM (like Apple or Amazon).
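
One way to picture the split between the two agents is a small router that falls back to vision when the DOM yields too little to work with. This is a minimal sketch under stated assumptions, not the code in this repo: `chooseAgent`, the `label` field, and the `minUsable` threshold are all hypothetical names chosen for illustration.

```javascript
// Hypothetical router: the function name, the `label` field, and the
// threshold are illustrative assumptions, not taken from the project.
function chooseAgent(interactiveElements, minUsable = 3) {
  // Count only elements that carry a label the model could refer to.
  const usable = interactiveElements.filter(
    (el) => typeof el.label === "string" && el.label.trim().length > 0
  );
  // Enough labelled elements: use the DOM agent. Otherwise fall back to vision.
  return usable.length >= minUsable ? "dom" : "vision";
}

// A text-heavy page exposes plenty of labelled links.
console.log(chooseAgent([
  { tag: "a", label: "new" },
  { tag: "a", label: "past" },
  { tag: "a", label: "comments" },
]));

// A page whose controls come back unlabelled goes to the vision agent.
console.log(chooseAgent([{ tag: "div", label: "" }]));
```

A simple count like this is cheap to compute before any LLM call, which is why it is used here. A real router would probably need a better signal than a fixed threshold.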

How we built it

I built this in about 12 hours using Claude Code, Codex, and JavaScript.

Challenges we ran into

Extracting the DOM from websites is really hard. It's messy and hard to parse, and LLMs easily get confused when they read too much of it. I put a lot of work into my custom DOM parser so the agent gets only the context it needs.
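
To give a feel for the kind of pruning involved, here is a rough sketch that reduces a page tree to a short, numbered list of interactive elements. It is not the parser in this repo. Plain objects stand in for real DOM nodes, and `INTERACTIVE_TAGS`, `summarizeDom`, `maxItems`, and the `hidden` flag are all assumptions made for the example.

```javascript
// Sketch only: plain-object nodes stand in for real DOM nodes, and every
// name here (INTERACTIVE_TAGS, summarizeDom, maxItems) is hypothetical.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

function summarizeDom(root, maxItems = 50) {
  const lines = [];
  const stack = [root];
  while (stack.length > 0 && lines.length < maxItems) {
    const node = stack.pop();
    // Skip hidden nodes and everything beneath them.
    if (node.hidden) continue;
    if (INTERACTIVE_TAGS.has(node.tag)) {
      // Collapse whitespace and cap the label so one element cannot flood the prompt.
      const label = (node.text || "").replace(/\s+/g, " ").trim().slice(0, 80);
      lines.push(`[${lines.length}] <${node.tag}> ${label}`);
    }
    // Push children in reverse so they are visited in document order.
    const children = node.children || [];
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }
  return lines;
}

const page = {
  tag: "body",
  children: [
    { tag: "script", text: "var tracking = 1;" },
    {
      tag: "div",
      children: [
        { tag: "a", text: "  Hacker   News " },
        { tag: "button", text: "login", hidden: true },
        { tag: "input", text: "search" },
      ],
    },
  ],
};

console.log(summarizeDom(page).join("\n"));
```

The numbered prefix gives the model a short handle for each element, so it can answer with something like "click 1" instead of repeating a selector.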

Accomplishments that we're proud of


It's a little messy, but watching the agent successfully navigate the web is something I'm super proud of.

What we learned

Building an agent is really hard. I can see why this is so popular in tech right now.

What's next for autonomous web agent

I didn't have the time, but I would have loved to add a desktop agent that pairs with the browser agent. This would let you issue commands like "Open iMessage, see what my last text from mom was, and open the Amazon link in that message." I think with another couple of hours I would have been able to get it!
