Inspiration

One of us filed an LLC last year and spent a Saturday Googling each field. "Manager-managed vs member-managed"? "Registered agent"? Annoying for us. Genuinely gatekeeping for someone whose first language isn't English. The real gap isn't translation. A US-born founder calls their lawyer and gets told what to file, in what order, with citations. The conversation surfaces things they didn't know to ask. Translation tools can't do that. We wanted to build that consultation.

What it does

Shu Xiang is a cross-language onboarding agent for immigrant founders. Three steps:

1. User describes their business in their native language. Agent builds a profile.
2. Agent generates a personalized checklist of every bureaucratic obligation that business has, with citations to real regulations. For a Chicago single-member restaurant LLC, that's 12 items, including several most first-time founders don't know exist.
3. Agent opens the government websites and fills the forms autonomously. At judgment fields, it pauses, highlights the field with a bilingual annotation overlaid on the page, and asks a clarification question in the user's language. User responds by voice. Agent continues.
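The checklist step can be pictured as a filter over curated records. Below is a minimal sketch, not the project's actual schema: the `Obligation` and `Profile` classes, the tag-matching rule, and every record in `DATABASE` are hypothetical placeholders invented for illustration (the names, fees, and URLs are not real obligations).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obligation:
    """One curated requirement (hypothetical shape)."""
    name: str
    authority: str          # issuing authority
    fee_usd: float          # filing fee
    citation_url: str       # link to the regulation
    applies_to: frozenset   # profile tags that must ALL be present


@dataclass(frozen=True)
class Profile:
    """Business profile extracted from the user's description (hypothetical)."""
    tags: frozenset


# Placeholder records only; none of these are real obligations or fees.
DATABASE = [
    Obligation("Placeholder obligation A", "Placeholder authority", 0.0,
               "https://example.invalid/a", frozenset({"llc"})),
    Obligation("Placeholder obligation B", "Placeholder authority", 0.0,
               "https://example.invalid/b", frozenset({"restaurant", "chicago"})),
    Obligation("Placeholder obligation C", "Placeholder authority", 0.0,
               "https://example.invalid/c", frozenset({"corporation"})),
]


def build_checklist(profile, database):
    """Keep every obligation whose required tags are a subset of the profile's tags."""
    return [o for o in database if o.applies_to <= profile.tags]


if __name__ == "__main__":
    profile = Profile(tags=frozenset({"llc", "restaurant", "chicago"}))
    for item in build_checklist(profile, DATABASE):
        print(item.name, item.authority, item.fee_usd, item.citation_url)
```

Because the records are curated rather than generated, the model's job in this sketch is limited to producing the profile tags. The citations come straight from the database.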

How we built it

Python orchestrator. browser-use drives the browser with Claude Sonnet 4.6. Whisper for transcription, Claude for intent extraction, ElevenLabs for Chinese TTS. The overlay is a React app injected into the browser via Playwright. Highlights and annotations land on the right fields using DOM bounding boxes, so no computer vision needed. A custom browser-use tool handles the clarification interrupts. The requirements database is hand-curated. Every obligation has the issuing authority, fee, and a link to the real regulation. We did this work ourselves rather than ask the model, because the whole point of the consultation is that it's correct.
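To illustrate why DOM bounding boxes are enough for the overlay, here is a rough sketch of the placement geometry. It assumes a field box in the `{x, y, width, height}` shape that Playwright's `bounding_box()` returns. The `Box` class, the `place_annotation` function, the card dimensions, and the "right of the field, else below it" rule are all assumptions for illustration, not the project's actual overlay logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Field position in viewport pixels (same fields as a DOM bounding box)."""
    x: float
    y: float
    width: float
    height: float


def place_annotation(field, viewport_w, viewport_h,
                     card_w=280.0, card_h=120.0, gap=8.0):
    """Return (left, top) for an annotation card next to a form field.

    Hypothetical rule: prefer the space to the right of the field and
    fall back to directly below it when the card would overflow.
    """
    left = field.x + field.width + gap
    top = field.y
    if left + card_w > viewport_w:
        # Not enough room on the right: drop below, aligned with the field.
        left = field.x
        top = field.y + field.height + gap
    # Clamp so the card never leaves the viewport.
    left = max(0.0, min(left, viewport_w - card_w))
    top = max(0.0, min(top, viewport_h - card_h))
    return left, top


if __name__ == "__main__":
    print(place_annotation(Box(100, 200, 300, 40), 1280, 800))
```

The same arithmetic could live in the injected React app instead. The sketch only shows that the field's box and the viewport size are the only inputs needed.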

Challenges we ran into

The hardest part was integrating the overlay with voice input and the autonomous agent simultaneously. browser-use wants to run end-to-end. Our design needs it to pause, hand control to the overlay and mic, then resume with the answer. Getting the custom tool to actually halt the agent loop and wait for async input took most of our integration time. Intent detection for form submission was the other big one. The user describes their business in one sentence, and the agent has to map that onto specific English form fields with the right types, fast enough that the overlay renders before the user's attention moves. Real .gov sites also break in unpredictable ways. A lot of debugging went into making the demo path reliable on the actual Illinois SOS site.
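The pause-and-resume behavior described above can be sketched with a plain `asyncio` future. This is a simplified illustration using only the standard library. It does not use or reproduce the browser-use tool API, and `ClarificationGate`, `ask`, `answer`, and `demo` are hypothetical names.

```python
import asyncio


class ClarificationGate:
    """Lets an agent coroutine block until the user's answer arrives."""

    def __init__(self):
        self._pending = None

    async def ask(self, question, timeout=None):
        """Suspend the caller until answer() is called or the timeout expires."""
        loop = asyncio.get_running_loop()
        self._pending = loop.create_future()
        # A real system would render `question` in the overlay and open the mic here.
        try:
            return await asyncio.wait_for(self._pending, timeout)
        finally:
            self._pending = None

    def answer(self, text):
        """Deliver the user's reply. Returns False if nothing is waiting."""
        if self._pending is not None and not self._pending.done():
            self._pending.set_result(text)
            return True
        return False


async def demo():
    gate = ClarificationGate()
    log = []

    async def agent():
        log.append("fill: entity name")
        reply = await gate.ask("Member-managed or manager-managed?")
        log.append(f"fill: management = {reply}")

    async def user():
        await asyncio.sleep(0.01)  # stands in for voice capture and transcription
        gate.answer("member-managed")

    await asyncio.gather(agent(), user())
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The agent coroutine holds no lock and polls nothing while it waits. It stays suspended on the future, which is what lets the overlay and microphone take over in the meantime.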

Accomplishments that we're proud of

We have the core loop working. User speaks in Chinese, intent extracts, the form fills, the overlay renders bilingual context, clarification moments interrupt the flow cleanly. End to end, on a real government website, with real obligations from real regulations.

What we learned

The interesting work in an agent product right now isn't the model. It's everything around it: the curated knowledge, the interaction model, the moments where the agent should stop and ask. "Translation" framings are a trap. The user doesn't need a better translator; they need a system that knows what to surface, when to interrupt, and how to act on their behalf in a language they don't operate in.

What's next for Shu Xiang

Extending the requirements database to more business types and jurisdictions. Adding Spanish as the next language. Longer term, covering ongoing filings (annual reports, tax obligations, IRS notices), because the relationship between a founder and US bureaucracy isn't a one-time transaction.
