Glide Architecture: Manifest-Based Deterministic Automation

Glide — Natural Language Automation for Web Apps

Inspiration

I build web apps for cooperatives, local shops, and waste collectors in Kenya. These apps are data-heavy by nature, with dozens of farmers, daily deliveries, processing, and sales; this easily adds up to hundreds of data points entered manually every day.

Every new staff member means another week of training, and turnover means starting over. I wanted to eliminate that cost by letting users interact with any web app through natural language.

What it does

Glide is a Chrome extension that turns any web app into a natural language interface.

Users type commands like:

"Add farmer Jane Wambui from Kericho."
"Rekodi delivery ya James Mwangi, kilo 200, grade A."

Glide navigates the app, opens the right forms, fills fields, and submits automatically.

It supports create, update, and delete operations, works in any language Gemini understands, and requires zero code changes to the target app.

How I built it

Glide works in two phases.

First, a DOM scanner detects an app’s navigation, forms, and fields. Gemini enhances this scanned data into a structured manifest, adding semantic understanding so terms like "weight", "kilos", and "kg" all map to the same field.
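To make the manifest idea concrete, here is a minimal sketch of what one form entry and its keyword aliases might look like. The type names (`GlideManifest`, `FormSpec`, `FieldSpec`), the selectors, and the `resolveField` helper are assumptions made for illustration, not Glide's actual schema.

```typescript
// Hypothetical manifest shape; names and selectors are illustrative only.
interface FieldSpec {
  name: string;       // canonical field name
  selector: string;   // CSS selector captured by the DOM scanner
  keywords: string[]; // semantic aliases added during enhancement
  required: boolean;
}

interface FormSpec {
  id: string;
  route: string;          // where the form lives in the app
  submitSelector: string; // control that submits the form
  fields: FieldSpec[];
}

interface GlideManifest {
  app: string;
  forms: FormSpec[];
}

const sampleManifest: GlideManifest = {
  app: "coop-demo",
  forms: [
    {
      id: "record-delivery",
      route: "/deliveries/new",
      submitSelector: 'button[type="submit"]',
      fields: [
        { name: "farmer", selector: "#farmer", keywords: ["mkulima", "supplier"], required: true },
        { name: "weight", selector: "#weight", keywords: ["kilos", "kilo", "kg"], required: true },
        { name: "grade", selector: "#grade", keywords: ["quality"], required: false },
      ],
    },
  ],
};

// Map a user-facing term to its canonical field, ignoring case and padding.
function resolveField(form: FormSpec, term: string): FieldSpec | undefined {
  const needle = term.trim().toLowerCase();
  return form.fields.find(
    (f) =>
      f.name.toLowerCase() === needle ||
      f.keywords.some((k) => k.toLowerCase() === needle),
  );
}
```

With this shape, `resolveField(sampleManifest.forms[0], "kilos")` and `resolveField(sampleManifest.forms[0], "KG")` both return the `weight` field, while an unknown term returns `undefined`.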

Second, at runtime, each user command is sent to Gemini, which parses intent, extracts entities, and produces a step-by-step execution plan. A ghost navigator then executes the plan deterministically using the manifest’s selectors.
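The runtime phase can be sketched as a small interpreter over plan steps. Everything named here (`PlanStep`, `PageDriver`, `executePlan`) is hypothetical; the real plan format and navigator may differ. The point is that once Gemini has produced a plan, execution involves no further model calls, only selectors taken from the manifest.

```typescript
// Hypothetical plan step types; the real plan format may differ.
type PlanStep =
  | { action: "navigate"; route: string }
  | { action: "click"; selector: string }
  | { action: "fill"; selector: string; value: string };

// The executor talks to the page through a small driver interface, so the
// same code can run against the real DOM or a recording fake in tests.
interface PageDriver {
  navigate(route: string): Promise<void>;
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
}

// Runs steps strictly in order and resolves to the number of completed steps.
// A rejected driver call propagates, so the remaining steps are not executed.
async function executePlan(steps: PlanStep[], driver: PageDriver): Promise<number> {
  let completed = 0;
  for (const step of steps) {
    switch (step.action) {
      case "navigate":
        await driver.navigate(step.route);
        break;
      case "click":
        await driver.click(step.selector);
        break;
      case "fill":
        await driver.fill(step.selector, step.value);
        break;
    }
    completed += 1;
  }
  return completed;
}
```

Keeping the driver behind an interface is a design choice of this sketch: it makes the executor testable without a browser and keeps any DOM access out of the planning code.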

The stack is Chrome Extension (Manifest V3), React, TypeScript, Vite, Zustand, and the Gemini API.

Gemini Integration

gemini-3-flash-preview - primary model for command parsing and entity extraction
systemInstruction - passes the full app manifest as context, separate from the user's command (it counts toward input tokens, not output tokens)
responseMimeType: 'application/json' - constrains responses to syntactically valid JSON for execution plans
Manifest Enhancement - Gemini enriches raw DOM scans with semantic hints and multilingual keywords
Multilingual support - native English and Swahili entity extraction, including code-switching, with no per-language configuration
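The settings above could be combined into a request roughly like the following. This is a sketch under assumptions: the body shape follows the public Gemini `generateContent` REST reference as I understand it and should be checked against the current documentation, and the prompt wording plus the `buildRequest` and `parsePlan` helpers are invented for this example.

```typescript
// Assumed request body shape for the Gemini generateContent REST endpoint.
interface GenerateContentRequest {
  systemInstruction: { parts: { text: string }[] };
  contents: { role: "user"; parts: { text: string }[] }[];
  generationConfig: { responseMimeType: string };
}

// The manifest travels in the system instruction; the user's command is the
// only content turn. The prompt text here is illustrative.
function buildRequest(manifestJson: string, command: string): GenerateContentRequest {
  return {
    systemInstruction: {
      parts: [{ text: `Plan UI actions using this app manifest:\n${manifestJson}` }],
    },
    contents: [{ role: "user", parts: [{ text: command }] }],
    generationConfig: { responseMimeType: "application/json" },
  };
}

// JSON mode constrains syntax, not shape, so the parsed plan is still checked
// before anything is executed.
function parsePlan(responseText: string): { action: string }[] {
  const parsed: unknown = JSON.parse(responseText);
  if (!Array.isArray(parsed) || !parsed.every((s) => typeof s?.action === "string")) {
    throw new Error("Response is not a list of plan steps");
  }
  return parsed;
}
```

The shape check in `parsePlan` is deliberately shallow; a stricter version might validate each step against the manifest's known selectors before execution.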

Challenges I ran into

Voice input didn’t work in practice.
Glide started as a voice-first tool, but speech recognition struggled with Kenyan accents, Swahili–English code-switching, noisy environments, and local names. This led to frequent errors, especially with numbers. We pivoted to text-based commands, which proved more accurate, private, and reliable in real working conditions.

Selector collisions.
The scanner initially generated selectors like button.btn-primary for submit buttons inside modals. The same selector also matched the button that opened the modal, causing Glide to re-open the modal instead of submitting. We fixed this by preferring button[type="submit"] for submit actions.
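A minimal sketch of that preference rule follows. The `ButtonCandidate` shape and `chooseSubmitSelector` are hypothetical, and scoping the result to the form's container is an extra assumption of this example rather than something described above.

```typescript
// Hypothetical description of a button the scanner found inside a form or modal.
interface ButtonCandidate {
  selector: string;    // selector the scanner generated, e.g. "button.btn-primary"
  type: string | null; // the button's type attribute, if present
  matchCount: number;  // how many elements the selector matches page-wide
}

function chooseSubmitSelector(
  containerSelector: string,
  candidates: ButtonCandidate[],
): string | undefined {
  // Prefer an explicit submit button, scoped to the container so it cannot
  // match the button that opens the modal.
  if (candidates.some((c) => c.type === "submit")) {
    return `${containerSelector} button[type="submit"]`;
  }
  // Otherwise accept only a class-based selector that is unique on the page.
  const unique = candidates.find((c) => c.matchCount === 1);
  return unique ? `${containerSelector} ${unique.selector}` : undefined;
}
```

For a modal whose submit button shares `button.btn-primary` with its opener, this returns `#delivery-modal button[type="submit"]` instead of the ambiguous class selector, and returns `undefined` when no safe choice exists.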

Service worker restrictions.
Vite’s module preload polyfill injected DOM references into the build. Chrome Manifest V3 service workers have no DOM access, causing silent crashes. We disabled module preloading and enforced build checks to ensure zero DOM usage in the background script.
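In Vite, the preload behavior is typically controlled through the `build.modulePreload` option. The build check itself could be as simple as scanning the emitted background bundle for page-only globals. The sketch below is an assumption about how such a check might look, not the project's actual script; `findDomReferences` and the list of globals are hypothetical.

```typescript
// Globals that exist in page contexts but not in a Manifest V3 service worker.
const DOM_GLOBALS = ["document", "window"];

// Returns the page-only globals that appear to be accessed in the bundle.
// This is a conservative text match: it can flag occurrences inside strings
// or comments, which is acceptable for a check that should fail loudly.
function findDomReferences(bundleSource: string): string[] {
  return DOM_GLOBALS.filter((name) =>
    new RegExp(`\\b${name}\\s*\\.`).test(bundleSource),
  );
}
```

A build step could read the background script's output file, pass its contents to `findDomReferences`, and fail the build when the returned list is not empty.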

Over-validating user input.
Early versions rejected valid commands because they didn’t match predefined hint lists. We learned to trust Gemini more and limit validation to only required fields.
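Limiting validation to required fields might look like the sketch below. `RequiredFieldSpec` and `missingRequired` are hypothetical names, and the entity map is assumed to be whatever Gemini extracted from the command, keyed by canonical field name.

```typescript
interface RequiredFieldSpec {
  name: string;
  required: boolean;
}

// Returns the names of required fields that have no usable value.
// Optional fields and unexpected extra entities are deliberately ignored.
function missingRequired(
  fields: RequiredFieldSpec[],
  entities: Record<string, string | undefined>,
): string[] {
  return fields
    .filter((f) => f.required)
    .filter((f) => (entities[f.name] ?? "").trim() === "")
    .map((f) => f.name);
}
```

An empty result means the plan can run; otherwise the extension can ask the user for just the missing values instead of rejecting the whole command.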

Accomplishments that I'm proud of

A cooperative manager who has never seen the app before can type:

ongeza mkulima Mary Akinyi simu 0733112233 kutoka Nandi

Glide handles navigation, form filling, and submission in Swahili, on the first try. No training, no onboarding, no manual.

The manifest approach also lets any developer enable Glide in minutes with zero code changes — just generate, download, and drop in a JSON file.

What I learned

Trust the model. Early versions pre-validated every command against semantic hint lists before calling Gemini. Commands like "add 20 pieces of candy" got rejected because no word matched the expected hints. Removing that layer and letting Gemini handle entity extraction directly made the system both simpler and more accurate.

Structure beats vision. I assumed AI automation meant teaching models to "see" interfaces. The manifest approach, letting apps describe themselves, turned out to be more reliable, faster, and more private, since no screenshots are involved. It also made debugging trivial: when something breaks, you read the manifest, not a screenshot.

Gemini handles multilingual code-switching natively. I expected to need per-language rules for Swahili-English mixing. I didn't.

What's next for Glide

Web standard for automation - a <link rel="glide-manifest"> tag that lets any website declare its automation interface, the way sites declare APIs today
Beyond forms - extend manifests to cover checkout flows, multi-step wizards, settings, and data exports
Accessibility infrastructure - partner with screen readers and assistive tools so natural language becomes a navigation layer for visually impaired and motor-impaired users
Voice input and offline queuing - speech-to-text for hands-busy environments, with command queuing for unreliable connectivity

Built With

gemini
react
typescript
vite
zustand

Updates

Cynthia Pendo started this project — Feb 09, 2026 01:38 PM EST
