Inspiration

We kept running into the same friction: you see something in real life and want it, but the path from "I want that" to "I own that" is still five steps too long. Search, compare, check reviews, add to cart, checkout. Every time. We wanted to cut all of it out. The name Kaimon means "buying gate" in Japanese, and that's exactly what we wanted to build: a gate you walk through once and come out the other side with your item on the way.

The timing felt right too. With global supply chains in flux and tariffs shifting constantly, we thought there was something interesting in not just buying things automatically, but buying them intelligently, with real awareness of where things come from and what conditions look like right now.

What it does

Kaimon lets you point your camera at any physical object and handles the entire purchase automatically. You scan, it figures out what it's looking at, maps out the global supply chain for that item on a 3D globe, then dispatches a fleet of AI agents that find the best option based on your budget and past buying behavior, then complete the purchase.

The supply chain layer is more than a visual. It pulls live data on maritime chokepoints, shipping rates, and tariff trends to color-code risk levels by manufacturer region, so you actually understand what you're buying and where it's coming from before you commit.

How we built it

The pipeline has a few distinct layers that all have to talk to each other in real time.

Video comes in from a webcam or mobile app, gets converted from WebM to MP4 via FFmpeg, and is uploaded to TwelveLabs where Pegasus 1.2 indexes it and returns a detailed description of what's visible. That raw description goes through Claude Haiku, which strips out everything that isn't a physical purchasable object and returns a clean JSON list.
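As a rough illustration of the filtering step's output handling, here is a minimal sketch of defensively parsing the model's reply into a clean list. It assumes the reply is a JSON array of object-name strings, possibly surrounded by extra text; the function name `parse_object_list` and the lowercase/dedupe normalization are our illustrative assumptions, not the exact code in the project.

```python
import json


def parse_object_list(raw: str) -> list[str]:
    """Pull a JSON array of object names out of a model reply.

    Assumption: the reply contains one JSON array of strings, possibly
    wrapped in extra prose. Anything else yields an empty list.
    """
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        items = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []

    seen: set[str] = set()
    result: list[str] = []
    for item in items:
        if not isinstance(item, str):
            continue  # skip numbers, nulls, nested structures
        name = item.strip().lower()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result
```

Returning an empty list on malformed output lets the rest of the pipeline treat "nothing purchasable detected" and "unparseable reply" the same way, which keeps the downstream agents simple.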

From there, WorldMonitor APIs pull live supply chain data: shipping stress, chokepoint status, tariff trends across major trade corridors. Claude takes that data alongside the detected objects and generates manufacturer pins with risk color-coding, placed on an interactive globe built with react-globe.gl.
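The risk color-coding can be sketched roughly as follows. This is a simplified illustration, not the project's actual logic: the three signal names, the 0-to-1 scale, the equal weighting, and the thresholds are all assumptions made for the example, and the pin dictionary shape is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionSignals:
    """Hypothetical per-region inputs, each normalized to 0.0 (calm) .. 1.0 (severe)."""
    shipping_stress: float
    chokepoint_risk: float
    tariff_pressure: float


def _clamp(value: float) -> float:
    return max(0.0, min(1.0, value))


def risk_score(signals: RegionSignals) -> float:
    """Equal-weight average of the clamped signals (an assumed weighting)."""
    parts = (signals.shipping_stress, signals.chokepoint_risk, signals.tariff_pressure)
    return sum(_clamp(p) for p in parts) / len(parts)


def risk_color(score: float) -> str:
    """Map a 0..1 score to a traffic-light color using assumed thresholds."""
    if score < 0.33:
        return "green"
    if score < 0.66:
        return "yellow"
    return "red"


def make_pin(name: str, lat: float, lng: float, signals: RegionSignals) -> dict:
    """Build one globe pin record; the field names are illustrative."""
    score = risk_score(signals)
    return {"name": name, "lat": lat, "lng": lng, "risk": score, "color": risk_color(score)}
```

Keeping the score separate from the color makes it easy to tune thresholds later without touching the data side.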

On the purchasing side, we built nine Fetch.ai uAgents that work in parallel, including an orchestrator that breaks down the shopping list, a search agent, a ranker that scores results, a treasury agent that manages budget approval, and four buyer agents that each handle a separate item simultaneously using Browser Use for browser automation. The whole thing is tied together with a FastAPI backend, a React frontend, and Stripe for payment verification before any purchase is triggered.
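The shape of the parallel buy step, with budget approval gating each purchase, looks roughly like the sketch below. It uses plain `asyncio` rather than the uAgents or Browser Use APIs, so `Treasury`, `buyer`, and `run_buyers` are hypothetical stand-ins, and the browser session is replaced by a placeholder.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    item: str
    price: float


class Treasury:
    """Stand-in for the treasury agent: approves offers against a shared budget."""

    def __init__(self, budget: float) -> None:
        self.remaining = budget
        self._lock = asyncio.Lock()  # serialize approvals so buyers cannot overspend

    async def approve(self, offer: Offer) -> bool:
        async with self._lock:
            if offer.price <= self.remaining:
                self.remaining -= offer.price
                return True
            return False


async def buyer(offer: Offer, treasury: Treasury) -> str:
    """Stand-in for one buyer agent handling a single item."""
    if not await treasury.approve(offer):
        return f"{offer.item}: rejected"
    await asyncio.sleep(0)  # placeholder for the real browser checkout session
    return f"{offer.item}: purchased"


async def run_buyers(offers: list[Offer], budget: float) -> list[str]:
    """Run one buyer per offer concurrently; results keep the input order."""
    treasury = Treasury(budget)
    return await asyncio.gather(*(buyer(o, treasury) for o in offers))
```

Routing every approval through a single lock-protected budget is one simple way to keep concurrent buyers from collectively exceeding the limit.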

Challenges we ran into

Coordinating nine agents without them stepping on each other was harder than expected. Getting the ranker, treasury, and buyer agents to communicate cleanly under time pressure required a lot of iteration on the message-passing schema.
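To give a feel for what a message-passing schema between the treasury and buyer agents can look like, here is a minimal sketch using standard-library dataclasses. The message types, field names, and JSON envelope are illustrative assumptions and not the schema we shipped; the key idea is a `request_id` that lets a reply be matched to its request when several buyers are in flight.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BudgetRequest:
    request_id: str   # correlates the reply with this request
    item: str
    price_cents: int  # integer cents avoid float rounding in budget math


@dataclass(frozen=True)
class BudgetDecision:
    request_id: str
    approved: bool
    reason: str = ""


# Registry of known message types, keyed by class name.
_TYPES = {cls.__name__: cls for cls in (BudgetRequest, BudgetDecision)}


def encode(message) -> str:
    """Wrap a message in a small envelope that names its type."""
    return json.dumps({"type": type(message).__name__, "body": asdict(message)})


def decode(raw: str):
    """Rebuild a typed message; raises KeyError for an unknown type."""
    envelope = json.loads(raw)
    cls = _TYPES[envelope["type"]]
    return cls(**envelope["body"])
```

Rejecting unknown message types loudly, instead of ignoring them, surfaces schema drift between agents early.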

TwelveLabs requires videos to be at least five seconds long, and webcam clips from the browser often came in shorter than that. We ended up padding clips by freezing the last frame with FFmpeg, which works but added edge cases to handle.
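The padding approach can be sketched as a small command builder around FFmpeg's `tpad` filter, which can clone the last frame for a given duration. The function name, the half-second safety margin, and the choice of `libx264` with `yuv420p` are assumptions for this example; the clip duration is expected to be measured elsewhere (for example with ffprobe).

```python
def build_pad_command(
    src: str,
    dst: str,
    duration_s: float,
    min_s: float = 5.0,
    margin_s: float = 0.5,
) -> list[str]:
    """Build an ffmpeg command that converts src to MP4 and, if the clip is
    shorter than min_s, freezes the last frame to reach min_s + margin_s.

    The margin is an assumed buffer against duration rounding.
    """
    cmd = ["ffmpeg", "-y", "-i", src]
    shortfall = (min_s + margin_s) - duration_s
    if duration_s < min_s:
        # tpad with stop_mode=clone repeats the final frame for stop_duration seconds
        cmd += ["-vf", f"tpad=stop_mode=clone:stop_duration={shortfall:.2f}"]
    cmd += ["-c:v", "libx264", "-pix_fmt", "yuv420p", dst]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building the argument list separately keeps the edge cases (already long enough, very short clips) easy to unit test without invoking FFmpeg.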

Browser automation is inherently brittle. Dynamic checkout flows, CAPTCHAs, and page load timing all caused failures we had to build around. Getting four buyer agents to run stable, parallel sessions was one of the last things to come together.

Accomplishments that we're proud of

Getting the full pipeline working end to end was the big one. Scan something, watch the globe populate with manufacturer locations and supply chain risk data, then see four browser agents spin up and start shopping in parallel. Seeing it actually work was a good moment.

The supply chain visualization turned out better than we expected. The combination of real WorldMonitor data and Claude-generated context means the pins aren't just decorative; they're grounded in actual shipping and trade conditions.

What we learned

Orchestrating multiple autonomous agents is a systems design problem as much as an AI problem. The intelligence is only as useful as the coordination layer underneath it. We spent more time on agent communication and state management than on any single model integration.

We also learned that video AI and browser automation both have a lot of rough edges in production. Building for the happy path is fast. Building for everything else takes most of the time.

What's next for Kaimon

Smarter personalization is the obvious next step. Right now the agents optimize for price within a budget. We want them to learn your preferences over time: brands you trust, specs you care about, retailers you've had bad experiences with.

We also want to expand beyond a single retailer and let agents comparison shop across platforms before committing. And on the supply chain side, there's a real opportunity to use the WorldMonitor data not just for visualization but to actually influence purchasing decisions, like automatically routing to a supplier from a less disrupted region when the primary source is under stress.
