GigConnect
Inspiration
It started with a conversation none of us expected to have at a hackathon.
One of our team members had recently moved to a new city. While looking for a plumber, he ended up calling five different numbers from a hand-written list taped to a chai stall. Three of them were disconnected. One arrived two days late. The guy who finally showed up was brilliant at his work: fast, thorough, professional. But he had no reviews, no digital presence, nothing. No way to prove his track record to the next person who needed him.
That image stuck with us: a genuinely skilled worker with zero verifiable identity in the digital economy.
India has over 400 million informal workers: daily wage labourers, domestic helpers, drivers, electricians, plumbers, cooks. They are the backbone of urban life, yet most of them are invisible to the formal economy. They can't get loans because they have no credit history. They can't get better-paying gigs because they have no verified reviews. And they can't use most existing platforms because those platforms assume you can type, have a smartphone you're comfortable with, and are comfortable navigating English-first UIs.
We asked ourselves: what does a gig platform look like if you design it from scratch for someone who might not be able to read?
The answer was voice-first, language-first, and dignity-first. That became GigConnect.
How We Built It
We split into two parallel tracks from the start, one person on the backend API and AI pipeline and another on the frontend and i18n system, and synced every few hours to integrate.
The Voice Pipeline First
We made a deliberate decision to nail the voice-to-profile pipeline before building anything else, because it was the riskiest and most novel part of the system. If we couldn't reliably convert a Hindi voice note into a clean MongoDB document, the whole premise fell apart.
The pipeline we landed on:
- The browser records audio via the MediaRecorder API and sends it as a base64 string in a JSON body.
- The server uploads the audio to Cloudinary to get a stable public URL.
- That URL is sent to Sarvam AI for transcription; we chose Sarvam specifically because it handles Indian languages and code-mixed speech far better than generic STT models.
- The transcript goes to Claude via the tool_use API, which returns a typed JSON object: not markdown, not prose, a real structured document.
- That document pre-fills a confirmation form on the frontend, which the user can edit before saving.
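The extraction step can be sketched as a tool definition plus a guard on the model's output. The tool name and field list below are illustrative placeholders, not the exact schema we shipped:

```javascript
// Hypothetical tool definition of the kind passed to Claude's tool_use API.
// Field names here are illustrative, not the real production schema.
const profileTool = {
  name: 'extract_worker_profile',
  description:
    'Extract a structured worker profile from a spoken introduction transcript.',
  input_schema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      skills: { type: 'array', items: { type: 'string' } },
      city: { type: 'string' },
      yearsExperience: { type: 'number' },
    },
    required: ['name', 'skills', 'city'],
  },
};

// Check the model's tool output has every required field before we use it
// to pre-fill the confirmation form.
function isUsableProfile(doc) {
  return profileTool.input_schema.required.every(
    (key) => doc[key] !== undefined && doc[key] !== null
  );
}
```

Forcing output through a declared schema, rather than parsing prose, is what made the "Hindi voice note in, clean MongoDB document out" promise testable.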
Getting this working end-to-end in the first few hours gave us confidence. Everything else was building on top of a solid foundation.
Multilingual UI Without Compromise
We made a rule early on: zero hardcoded English strings in any JSX file. Every user-facing label, button, toast, and placeholder had to go through react-i18next's t() function.
This sounds simple but it forces you to think differently about every component you write. You can't just write "Apply Now" and move on ” you have to define the key, add it to four locale files, and make sure the translation actually makes sense in context, not just as a word-for-word swap.
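A rule like this is only real if something enforces it. A minimal sketch of a key-coverage check that could run in CI, using toy locale objects rather than our real files:

```javascript
// Illustrative sketch: report every key present in the reference locale
// (English) but missing from another locale file. The locale objects
// below are toy examples, not the real GigConnect locale files.
function missingKeys(reference, candidate) {
  return Object.keys(reference).filter((key) => !(key in candidate));
}

const en = { applyNow: 'Apply Now', gigPosted: 'Gig posted' };
const hi = { applyNow: 'अभी आवेदन करें' };

missingKeys(en, hi); // → ['gigPosted']
```

Running this over all four locale files turns "zero hardcoded strings" from a team norm into a failing check.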
We supported Hindi, Tamil, Bhojpuri, and English. Language preference lives in localStorage only — the backend has no concept of language at all. This was a deliberate architectural choice: a user's language preference is a device setting, not a user attribute.
The AI Stack Came Together Piece by Piece
- Claude handled extraction, converting messy spoken language into clean structured data.
- Groq with llama-3.3-70b-versatile powered the voice navigation system. The model's job isn't to understand the world; it's only allowed to pick from a hardcoded list of actions per page. This constraint made the feature reliable instead of impressive-but-unpredictable.
- The RAG pipeline for government scheme suggestions was built on top of a dataset we assembled ourselves: eligibility rules for welfare programmes across 18 Indian states plus central schemes. We turned these into embeddable documents so the system could retrieve relevant schemes based on a worker's profile rather than running through a giant if-else chain.
- The TensorFlow.js neural network for market price prediction was trained offline on historical gig payout data. The trained weights are bundled with the server so inference is instant and free at runtime. We kept a hardcoded fallback table for cold-start situations where the model hasn't seen enough data for a given skill-region pair.
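The cold-start fallback for price prediction can be sketched as a table lookup behind the model. The rates, keys, and the `modelPredict` hook below are illustrative placeholders:

```javascript
// Sketch of the cold-start fallback, assuming a table keyed by
// skill and region. Rates and keys are illustrative, not real data.
const FALLBACK_RATES = {
  'plumber:mumbai': 800,
  'electrician:chennai': 750,
  'default': 500,
};

// modelPredict is a stand-in for inference against the bundled
// TensorFlow.js weights; it returns null when the model has not seen
// enough data for a given skill-region pair.
function predictRate(skill, region, modelPredict) {
  const modelOut = modelPredict ? modelPredict(skill, region) : null;
  if (modelOut != null) return modelOut;
  // Fall back to the hardcoded table rather than returning nothing.
  return FALLBACK_RATES[`${skill}:${region}`] ?? FALLBACK_RATES.default;
}
```

Keeping the fallback path as a plain object means a new skill-region pair always gets some answer on day one, and the model quietly takes over as data accumulates.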
The QR Work Passport
This was the feature we were most emotionally invested in. The idea that a worker could pull out their phone, show a QR code, and have an employer instantly see five years of verified gig history, a trust score, and a Gold badge: that felt genuinely meaningful.
The implementation was straightforward once the data model was solid: a single API endpoint (GET /api/proofs/passport/:workerId) aggregates the worker's profile, all GigCompletion records, trust score, badge tier, and Aadhaar status into one payload, and qrcode.react encodes a shareable URL to that page.
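The aggregation can be sketched as a pure function over the worker record and their completions. Only the endpoint path comes from the real system; the field names and badge thresholds below are assumptions for illustration:

```javascript
// Hypothetical badge thresholds; the real tiers may differ.
function badgeTier(trustScore) {
  if (trustScore >= 90) return 'Gold';
  if (trustScore >= 70) return 'Silver';
  return 'Bronze';
}

// Assemble the single payload served by GET /api/proofs/passport/:workerId.
// Field names here are illustrative, not the production schema.
function buildPassport(worker, completions) {
  return {
    workerId: worker.id,
    name: worker.name,
    totalGigs: completions.length,
    trustScore: worker.trustScore,
    badge: badgeTier(worker.trustScore),
    aadhaarVerified: Boolean(worker.aadhaarVerified),
  };
}
```

The QR code then only needs to encode a URL pointing at this payload; all the heavy lifting stays server-side.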
Challenges We Faced
Audio Encoding Was a Rabbit Hole
We spent an embarrassing amount of time on audio encoding. The MediaRecorder API produces different codecs on different browsers and operating systems: WebM/Opus on Chrome, sometimes MP4 on Safari. Sarvam AI is particular about what it accepts.
Our solution was to standardize on base64-encoded WebM and document the browser support clearly, rather than trying to transcode on the fly. We also discovered that resource_type: 'video' (not 'raw' or 'image') is the correct Cloudinary type for audio files, which cost us about an hour of debugging.
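Codec negotiation can be sketched as a preference list checked against the browser's capabilities. Injecting the support check as a predicate keeps the logic testable outside a browser; the candidate list reflects what the text describes, not an exhaustive survey:

```javascript
// Preferred codecs in order: Opus-in-WebM first, MP4 as the Safari case.
const CANDIDATES = ['audio/webm;codecs=opus', 'audio/webm', 'audio/mp4'];

// isTypeSupported is injected so this runs outside a browser; in the
// browser you would pass (t) => MediaRecorder.isTypeSupported(t).
function pickMimeType(isTypeSupported) {
  return CANDIDATES.find(isTypeSupported) ?? null;
}
```

A null result is the signal to show the "unsupported browser" message up front instead of failing after the user has already spoken.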
Making i18n Feel Natural, Not Translated
Machine-translating strings from English to Hindi and calling it done produces UI that feels robotic and condescending to native speakers. We had to go back through the Hindi and Tamil locale files and rephrase several strings that were technically correct but tonally wrong ” things like error messages that sounded harsh, or button labels that were grammatically accurate but not how anyone would actually say it.
The Bhojpuri locale was hardest. It has far less standardized written form than Hindi or Tamil, and we had to make judgment calls about register and vocabulary that native speakers might disagree with.
Geo-Matching Edge Cases
MongoDB's $nearSphere query with a 2dsphere index is powerful, but it requires coordinates to be stored as [longitude, latitude] (not [latitude, longitude], which is the intuitive order for most people). We had this reversed in the initial implementation, which caused the nearby gig feed to return results from across the country.
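Wrapping the coordinate construction in a helper with named parameters is a cheap way to stop reversing the order. A minimal sketch, with the query object mirroring the shape of a $nearSphere lookup (field names like `location` are assumptions):

```javascript
// GeoJSON order as MongoDB expects it: [longitude, latitude].
// Named parameters make it impossible to swap them silently.
function geoPoint({ lat, lng }) {
  return { type: 'Point', coordinates: [lng, lat] };
}

// Query shape for "gigs within maxMetres of here", assuming the gig
// documents store their position in a 2dsphere-indexed 'location' field.
function nearbyGigsQuery({ lat, lng }, maxMetres) {
  return {
    location: {
      $nearSphere: {
        $geometry: geoPoint({ lat, lng }),
        $maxDistance: maxMetres,
      },
    },
  };
}
```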
We also had to think carefully about what "within 5 km" means for a worker who doesn't have a precise GPS fix — the browser Geolocation API can return accuracy radii of several hundred metres, especially indoors. We settled on using the centroid and trusting the platform to surface gigs that are "probably nearby" rather than trying to be overly precise.
Trust Score Gaming
Once we had the trust score and badge system working, we immediately started thinking about how to break it. The most obvious attack: create fake employer accounts, post fake gigs, complete them with fake workers, and farm reviews to inflate trust scores.
We built a fraud filter that flags reviews under five words or containing profanity ” those reviews are excluded from avg_rating calculations. This is a basic heuristic, not a complete solution. A more robust implementation would look at review velocity, IP clustering, and timing patterns. We documented this as a known limitation rather than pretending the current filter is sufficient.
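The heuristic is simple enough to sketch in full. The blocklist below is a placeholder, not our real profanity list:

```javascript
// Placeholder profanity list; the real one is larger.
const BLOCKLIST = ['badword'];

// Flag reviews under five words or containing a blocklisted term.
function isSuspicious(review) {
  const words = review.text.trim().split(/\s+/).filter(Boolean);
  if (words.length < 5) return true;
  return words.some((w) => BLOCKLIST.includes(w.toLowerCase()));
}

// Flagged reviews are excluded from avg_rating entirely,
// rather than being down-weighted.
function avgRating(reviews) {
  const clean = reviews.filter((r) => !isSuspicious(r));
  if (clean.length === 0) return null;
  return clean.reduce((sum, r) => sum + r.rating, 0) / clean.length;
}
```

Excluding rather than down-weighting keeps the maths auditable: any rating in the average corresponds to a review a human could read and defend.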
The Voice Navigator Was Harder Than It Looked
The hardest part of the voice navigation system wasn't the AI: Groq's inference is fast and the intent classification worked well. The hard part was the state machine.
Speech synthesis and speech recognition are both asynchronous and stateful, and they interact in non-obvious ways. If the user taps the mic while the page description is still being read out, you need to cancel synthesis, start recognition, and handle any partial speech events in flight. We went through four different state machine designs before landing on the idle → speaking → paused → listening → resolving → idle model that's in the final code.
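A stripped-down version of that model can be expressed as an explicit transition table, so that stale events from cancelled synthesis or recognition simply fail to transition instead of corrupting state. This is a sketch of the idea, not the production code:

```javascript
// Legal transitions only; anything else (e.g. a late speech-end event
// arriving after we have already moved on) is rejected.
const TRANSITIONS = {
  idle: ['speaking'],
  speaking: ['paused', 'idle'],
  paused: ['listening', 'speaking'],
  listening: ['resolving', 'idle'],
  resolving: ['idle'],
};

function createNavigator() {
  let state = 'idle';
  return {
    get state() { return state; },
    transition(next) {
      if (!TRANSITIONS[state].includes(next)) {
        return false; // illegal transition: ignore the stale event
      }
      state = next;
      return true;
    },
  };
}
```

The mic-tap-during-description case becomes two legal transitions (speaking → paused → listening), and any partial speech event that fires afterwards against the old state is dropped by the table.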
Browser compatibility was also a genuine constraint. webkitSpeechRecognition is Chrome and Edge only. We made the design decision to let the feature degrade gracefully to TTS-only on unsupported browsers rather than blocking it entirely: a user on Firefox can still hear the page described; they just can't speak their command back.
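The per-page action constraint mentioned earlier can be sketched as an allowlist check on the model's output. Page names and action ids here are illustrative:

```javascript
// Hypothetical per-page allowlists; real pages and actions may differ.
const PAGE_ACTIONS = {
  gigFeed: ['open_gig', 'next_page', 'read_aloud'],
  profile: ['edit_profile', 'show_passport'],
};

// Accept the model's pick only if it is on the current page's allowlist;
// otherwise do nothing rather than guess.
function resolveIntent(page, modelOutput) {
  const allowed = PAGE_ACTIONS[page] ?? [];
  return allowed.includes(modelOutput) ? modelOutput : null;
}
```

Because the model can only ever select from a short fixed menu, a hallucinated action is structurally impossible rather than merely unlikely.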
What We Learned
Voice-first design forces you to rethink everything. Onboarding flows, error states, confirmation steps — all of them are designed around reading text. When you strip that assumption away, you realise how much of "standard" UX is inaccessible to a significant portion of users.
Multilingual is not an afterthought. Building i18n in from day one with the strict rule of zero hardcoded strings made it genuinely multilingual. We've seen projects bolt on translation at the end and the result always feels broken around the edges. Doing it right means the translation keys shape how you structure the UI, not the other way around.
Constraints make better products. No multer. No bidding. Language in localStorage only. Market rates as a JS object. Each of these constraints came from wanting to keep the system simple and auditable, and every one of them turned out to be the right call under time pressure.
The informal economy is underserved, not unserved. These workers already have trust networks, referral systems, and reputation; they're just analogue. GigConnect isn't introducing a foreign concept; it's digitising something that already exists and making it portable.
What's Next
GigConnect is a hackathon prototype, but it points at a real problem worth solving properly. The directions we'd most want to explore with more time:
- Offline support: workers in low-connectivity areas need the core flows to work without a stable internet connection.
- UPI payment integration: closing the loop so pay flows through the platform, which also gives us honest transaction data for the trust score.
- Lender API: exposing the Work Passport data via a consent-based API so NBFCs and microfinance institutions can underwrite loans based on verified gig history.
- Expanding the scheme dataset: 18 states is a start; full national coverage would require partnerships with state welfare departments.
- Stronger fraud detection: the current review filter is a heuristic; a proper model trained on review patterns would be far more robust.
Built at Odyssey Hackathon, 2025.