Inspiration
I kept thinking about a bakery owner in Connaught Place who wants to open a second location. To do it properly, she needs to hire a market research consultant, brief a designer, figure out social media, and wait weeks for everything to come together — and that's if she can afford it. Most small business owners in India just skip the whole thing and open on instinct.
That felt wrong to me. The intelligence exists. The tools exist. What's missing is something that puts it all together and hands it to you in plain language, without requiring you to be a marketing expert.
And then there's the deployment problem. Even when you build an automated pipeline, it breaks silently. Google OAuth tokens expire. Refresh tokens get revoked. Your "autonomous" system goes offline at 2 AM and nobody knows until morning. I wanted a pipeline that stays online — where authentication is managed infrastructure, not a tokens.json file on disk.
That's what I set out to build — not a chatbot, but something closer to an operator. Something you talk to, and it acts. Something that deploys, and it stays deployed.
What it does
Market Vector is a voice-first AI marketing operator for Indian small businesses. You talk to it. It does the work.
Here's what that looks like in practice:
Location scouting — drop a pin anywhere on the map and Market Vector scores that location out of 100. Not a vague score — a breakdown across footfall density, spending power, competitor saturation, accessibility, visibility, growth trajectory, and how well the area fits your specific business category. Competitor pins drop onto the map in real time. Demographic heatmaps show who walks through at which hour of the day. The AI pulls in live news, Google Trends, and Places data and synthesises it into a brief you can actually act on.
Space analysis — upload a photo of your shop interior or your storefront. The AI looks at it in the context of your neighbourhood, nearby landmarks, and what's happening culturally in the next 30 days. It suggests specific physical improvements — not generic advice, but things like "add cricket memorabilia near the entrance, IPL season starts in 11 days and the stadium is 400 metres away." Then it renders an AI-improved version of your space so you can actually see what it could look like.
Product analysis — upload a photo of your product. The AI identifies who buys it, which upcoming festivals drive demand for it, which channels to use to reach those customers, and generates a ready-to-use ad image.
Campaign strategy — describe your goal by voice. The Visionary Agent builds a complete brief: campaign name, headline, target audience, channel plan with priority rankings, timing tied to the cultural calendar, and prompts ready to fire into the image and video generators.
Ad Studio — generate images with Imagen, edit them with AI surgical editing, render 10-second video ads with Veo 3.1, arrange everything on a timeline, and deploy directly to YouTube or Gmail. The narration voice is dynamically selected by the AI — a warm female voice for a bakery, a deep authoritative voice for a luxury brand, a playful voice for a kids' product. All by voice.
Secure autonomous deployment: this is where Auth0 comes in. YouTube and Gmail deployments go through Auth0's Token Vault, so the pipeline never holds raw Google tokens. Auth0 manages the refresh cycle and exchanges tokens on demand, which keeps the autonomous pipeline online without silent invalid_grant failures. You authenticate once through Auth0's Universal Login, and the system handles everything from there.
The whole thing responds to natural speech. You never have to click through menus if you don't want to.
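The 100-point location score described above can be pictured as a weighted sum over the seven factors. The sketch below is only an illustration: the factor keys, the equal default weights, and the assumption that every factor arrives normalised to 0..1 with higher meaning better (so competitor saturation is inverted upstream into "headroom") are all hypothetical, not Market Vector's actual formula.

```javascript
// Hypothetical factor keys. Each value is assumed to be normalised to 0..1,
// where higher is better (competitor saturation is inverted before it gets here).
const FACTORS = [
  'footfall', 'spendingPower', 'competitorHeadroom', 'accessibility',
  'visibility', 'growth', 'categoryFit',
];

// Combine factor values into a 0..100 score plus a per-factor breakdown.
// Weights default to equal; a caller may pass non-negative weights (not all zero).
function scoreLocation(factors, weights = {}) {
  const w = FACTORS.map((name) => weights[name] ?? 1);
  const totalWeight = w.reduce((sum, x) => sum + x, 0);
  if (totalWeight <= 0) throw new Error('weights must not all be zero');
  const breakdown = {};
  let total = 0;
  FACTORS.forEach((name, i) => {
    const value = Math.min(1, Math.max(0, factors[name] ?? 0)); // clamp to 0..1
    const points = (100 * w[i] * value) / totalWeight;
    breakdown[name] = Math.round(points * 10) / 10; // points contributed by this factor
    total += points;
  });
  return { score: Math.round(total), breakdown };
}
```

Returning the breakdown alongside the total is what lets the UI show why a pin scored the way it did, rather than a single opaque number.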
How I built it
The frontend is React 19 with Vite, Framer Motion for the cockpit animations, Tailwind CSS for styling, and @react-google-maps/api for the live map layer.
The backend is Node.js with Express, with a WebSocket server that proxies the Gemini Live API connection. Google Places powers the competitor data. Google News RSS feeds the live market awareness panel. Google Trends handles momentum scoring. FFmpeg handles the video timeline rendering via a composition service.
The Auth0 integration uses express-openid-connect for OIDC session management and @auth0/ai-vercel for the Token Vault. When a user authenticates through Auth0's Universal Login with the google-oauth2 connection, Auth0 captures and manages the Google refresh token. At deployment time, the backend calls Token Vault to exchange the Auth0 session for a fresh Google access token — no stale tokens, no disk persistence, no expiry failures.
The key Token Vault exchange looks like:
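    // Configure the Auth0 AI SDK with the custom API client credentials
    const auth0AI = new Auth0AI({
      auth0: {
        domain: process.env.AUTH0_DOMAIN,
        clientId: process.env.AUTH0_CUSTOM_API_CLIENT_ID,
        clientSecret: process.env.AUTH0_CUSTOM_API_CLIENT_SECRET
      }
    });

    // Ask Token Vault for Google access on the google-oauth2 connection,
    // limited to the YouTube upload scope
    const googleToken = auth0AI.withTokenVault({
      connection: 'google-oauth2',
      scopes: ['https://www.googleapis.com/auth/youtube.upload']
    });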
The AI layer:
- Gemini 2.5 Flash Native Audio via the Live API handles the real-time voice conversation: it listens, reasons, speaks, and embeds [ACTION:TYPE:PARAMS] tags in its responses that trigger real operations in the UI
- Gemini 2.5 Flash handles market analysis, space improvement analysis, product analysis, campaign strategy, cultural calendar reasoning, and dynamic voice profiling: it reasons about the product category and campaign tone to select the optimal TTS voice and speaking style
- Gemini 2.5 Flash TTS renders narration with AI-selected voices and natural language style steering (e.g., "speak with warmth and invitation, slowly and deliberately")
- Imagen generates ad images and AI-renders improved versions of uploaded business spaces
- Veo 3.1 renders the video ads
The agent architecture has two specialised agents — Oracle (tactical intelligence, news recon, trend analysis) and Visionary (campaign strategy, creative direction, voice profiling) — coordinated through the central Gemini Live voice loop. The UI parses action tags from audio transcriptions, executes them sequentially, and returns TOOL_RESULT feedback to close the loop.
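A minimal sketch of that parse, execute, and report loop is shown below. It assumes tags follow the [ACTION:TYPE:PARAMS] shape with optional PARAMS. The `handlers` map and the exact `TOOL_RESULT:...` string layout are hypothetical stand-ins, since the writeup does not specify either.

```javascript
// Matches tags such as [ACTION:NAVIGATE:LOCATION_INTEL]; PARAMS is optional.
const ACTION_TAG = /\[ACTION:([A-Z_]+)(?::([^\]]*))?\]/g;

// Extract every action tag from a transcription string, in order of appearance.
function parseActionTags(text) {
  return [...text.matchAll(ACTION_TAG)].map((m) => ({
    type: m[1],
    params: m[2] ?? '',
  }));
}

// Run actions one at a time so a later action sees the effects of earlier ones.
// `handlers` is a hypothetical map from action type to an async function.
async function runActions(actions, handlers) {
  const results = [];
  for (const action of actions) {
    const handler = handlers[action.type];
    try {
      if (!handler) throw new Error(`no handler for ${action.type}`);
      const output = await handler(action.params);
      results.push(`TOOL_RESULT:${action.type}:OK:${output}`);
    } catch (err) {
      // A failed action is reported instead of aborting the remaining ones.
      results.push(`TOOL_RESULT:${action.type}:ERROR:${err.message}`);
    }
  }
  return results; // these strings would be sent back to the model
}
```

Awaiting each handler inside the loop, rather than using Promise.all, is what makes execution sequential: a "navigate" action finishes before a dependent "drop pin" action starts.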
Challenges I ran into
Getting action tags out of an audio-only model was the hardest problem. Gemini 2.5 Flash Native Audio doesn't output text; it speaks. The [ACTION:NAVIGATE:LOCATION_INTEL] tags I needed the AI to emit arrive only via outputTranscription, not through modelTurn.parts as they would from a text model. Making that pipeline reliable (intercepting transcription in real time, parsing tags, executing actions, and feeding results back) took a lot of debugging.
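One part of that reliability problem can be sketched in isolation. If transcription arrives as a stream of fragments, a tag can be split across two of them. The buffer below is an assumption about how that might be handled, not the project's actual code: it holds back any unclosed "[" until the closing "]" arrives and emits each complete tag exactly once. The name `createTagBuffer` is hypothetical.

```javascript
// Accumulates transcription fragments and returns complete action tags as they close.
function createTagBuffer() {
  let pending = ''; // text that may still turn into a tag

  return function push(chunk) {
    pending += chunk;
    const tags = [];
    let lastEnd = 0;
    for (const match of pending.matchAll(/\[ACTION:[^\]]*\]/g)) {
      tags.push(match[0]);
      lastEnd = match.index + match[0].length;
    }
    // After the last complete tag, keep only an unclosed "[" and what follows it.
    const rest = pending.slice(lastEnd);
    const open = rest.lastIndexOf('[');
    const unclosed = open !== -1 && rest.indexOf(']', open) === -1;
    pending = unclosed ? rest.slice(open) : '';
    return tags;
  };
}
```

Usage would look like calling `push(fragment)` for every transcription event and passing any returned tags to the action executor.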
OAuth token decay in autonomous pipelines — the original deployment flow used raw Google OAuth with tokens stored on disk. It worked for demo sessions but failed overnight. Refresh tokens got revoked, invalid_grant errors killed YouTube uploads silently, and the "autonomous" pipeline wasn't actually autonomous. Integrating Auth0 Token Vault solved this by moving token lifecycle management into managed infrastructure — Auth0 handles the refresh cycle, and the backend just asks for a fresh token at deploy time.
Audio desync in long sessions — the nextPlayTime variable that schedules audio chunks was a bare let inside the hook function. Every React re-render reset it to zero, causing the next audio response to overlap or stutter. Moving it to a useRef was a one-line fix that took a long time to find.
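The scheduling logic behind that fix can be sketched as a small pure function. This is an illustration under assumptions, not the hook itself: `scheduleChunk` is a hypothetical name, and `ref` stands in for the object returned by React's `useRef(0)`, which keeps its `current` value across re-renders where a bare `let` would not.

```javascript
// Decide when the next audio chunk should start.
// `now` is the audio clock (for example AudioContext.currentTime) and
// `durationSec` is the chunk length in seconds.
function scheduleChunk(ref, now, durationSec) {
  const startAt = Math.max(ref.current, now); // never schedule in the past
  ref.current = startAt + durationSec;        // the next chunk follows this one
  return startAt;
}

// Inside a hook this would be used roughly as:
//   const nextPlayTime = useRef(0);
//   source.start(scheduleChunk(nextPlayTime, audioCtx.currentTime, buffer.duration));
```

If the stored value resets to zero between chunks, `Math.max` falls back to `now`, so a chunk that should queue behind the previous one starts immediately and overlaps it.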
Google Trends rate limiting from Indian IPs — the google-trends-api package gets 429s quickly when you're making repeated queries from India. I built a resilient fetch helper with exponential backoff and graceful fallback data so the app degrades cleanly instead of crashing.
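A helper of that kind might look like the sketch below. The function names, default delays, and retry count are hypothetical, and `fetchFn` stands in for whatever call actually queries Google Trends; the real helper may differ, for example by adding jitter or by retrying only on 429 responses.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... up to a cap.
function backoffDelay(attempt, baseMs = 500, capMs = 8000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Call `fetchFn` up to `retries + 1` times. If every attempt fails, return
// `fallback` so the caller can still render something instead of crashing.
async function fetchWithBackoff(
  fetchFn,
  { retries = 3, baseMs = 500, capMs = 8000, fallback = null } = {}
) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return { data: await fetchFn(), fromFallback: false };
    } catch (err) {
      if (attempt === retries) break; // out of retries, fall through to fallback
      await sleep(backoffDelay(attempt, baseMs, capMs));
    }
  }
  return { data: fallback, fromFallback: true };
}
```

The `fromFallback` flag lets the UI label a panel as showing cached or default data rather than presenting it as live.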
CJS/ESM interop with Auth0 SDK — the @auth0/ai-vercel package ships broken ESM exports for certain Node.js versions. I had to use createRequire to load the CJS build as a workaround:
    import { createRequire } from 'module';
    const require = createRequire(import.meta.url);
    const { Auth0AI } = require('@auth0/ai-vercel');
Accomplishments that I'm proud of
The agentic loop actually works end to end. You can say "analyze Khan Market for my bakery" and within 30 seconds you have a scored location brief, competitor pins on the map, a demographic heatmap, and a campaign brief — without touching the keyboard.
The space improvement feature is the one I'm most proud of because I haven't seen another tool do it. Taking a photo of your interior, cross-referencing it with your location, what's nearby, and what's happening culturally in the next month, then rendering an AI-improved version: that felt genuinely new.
The Auth0 Token Vault integration makes this a production-grade system, not a demo. The deployment pipeline stays online across sessions and handles token refresh transparently, and the user authenticates once through a familiar Auth0 Universal Login flow. The Ad Studio deploy panel shows live Token Vault status: the user sees "✓ SECURE" when Auth0 is active, with their authenticated email visible.
The dynamic voice selection is a subtle but powerful touch: the AI reasons about the campaign (kids' product? luxury brand? local bakery?) and picks not just the voice but also the speaking style and pacing. A bakery ad gets a warm, inviting female voice at a natural pace. A premium watch ad gets a deep, authoritative male voice, slow and deliberate. It's the difference between TTS that sounds like a robot and TTS that sounds like a creative decision.
What I learned
Agentic voice interfaces are a completely different problem from chat. The hard part isn't the AI reasoning — it's keeping three systems (what the AI says, what the UI parses, what actually executes) in sync in real time. The [ACTION:TAG] pattern with sequential execution and TOOL_RESULT feedback was the thing that made it reliable.
Auth0 as infrastructure, not a feature. The biggest lesson from integrating Token Vault was that authentication should be invisible plumbing, not something the application worries about. Once I moved token management out of my code and into Auth0, an entire class of bugs — expired tokens, revoked grants, silent deployment failures — disappeared permanently.
Cultural context is underrated. Injecting today's date and the user's location into every AI prompt costs nothing and changes everything. "Diwali is in 15 days and you're in a Hindu neighbourhood" produces a completely different — and dramatically more useful — campaign brief than a generic seasonal recommendation.
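As a rough sketch of that injection step, the helper below prefixes a task prompt with the date, the location, and any events inside a 30-day window. Everything here is an assumption for illustration: the function names, the prompt wording, and the caller-supplied `events` list (the code holds no real festival dates).

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days from `today` to `eventDate`, compared at UTC midnight.
function daysUntil(today, eventDate) {
  const a = Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate());
  const b = Date.UTC(eventDate.getUTCFullYear(), eventDate.getUTCMonth(), eventDate.getUTCDate());
  return Math.round((b - a) / MS_PER_DAY);
}

// Prefix a task prompt with date, location, and upcoming events.
// `events` is supplied by the caller as [{ name, date }].
function withContext(task, { today, location, events = [], horizonDays = 30 }) {
  const upcoming = events
    .map((e) => ({ name: e.name, days: daysUntil(today, e.date) }))
    .filter((e) => e.days >= 0 && e.days <= horizonDays)
    .map((e) => `${e.name} is in ${e.days} ${e.days === 1 ? 'day' : 'days'}`);
  return [
    `Today is ${today.toISOString().slice(0, 10)}.`,
    `The business is located in ${location}.`,
    upcoming.length
      ? `Upcoming: ${upcoming.join('; ')}.`
      : `No listed events in the next ${horizonDays} days.`,
    task,
  ].join('\n');
}
```

Because the context is plain text prepended to the task, the same helper can wrap every prompt in the app without changing how the model is called.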
Building for a specific person leads to better decisions. Every time I wasn't sure what to build next, I asked myself what that bakery owner in Connaught Place would actually need. That question kept me focused.
What's next for Market Vector
Satellite development monitoring — using Google Maps Static API tiles with Gemini Vision to detect nearby construction, new hotel openings, or mall developments and alert business owners to emerging opportunities before their competitors notice.
Competitor price intelligence — surfacing pricing gaps from publicly available Google Places data and customer reviews so owners can position themselves more precisely.
Multi-location management — letting an owner with several outlets compare location scores, campaign performance, and market conditions across all of them in a single voice session.
Auth0 role-based access — extending the Auth0 integration to support team-level access with role-based permissions, so a bakery chain owner can give managers view-only access to analytics while retaining deployment authority.