Inspiration
India has 100 million informal workers — cooks, tailors, weavers, caregivers — with decades of real skill and zero credentials to prove it. No certification system was ever built for them. HUNAR changes that. Workers demonstrate their skill on camera, answer questions by voice in their own language, and collect employer feedback — all on any basic Android phone in four minutes. Google Gemini evaluates everything together. No written test. No travel. No cost. The result: a QR-verified certificate they own permanently. Because the credential gap was never about skill. It was always about design.
What it does
The worker picks her language (Hindi, Rajasthani, Bhojpuri or English) and her skill. Then she does not just answer questions. She cooks. On camera. In real time. Her hands, her technique, her masalas, her sequencing: everything she has learned over twenty years becomes visible. While she cooks, she talks through what she is doing. When she is done, her employer records a thirty-second voice comment. No typing. Just speak. Google Gemini watches the video, listens to her narration, and reads the employer's comment, then evaluates all three together. You cannot fake twenty years of skill in front of a camera. The result is a certificate that is QR-verified, backed by three layers of evidence (video, narration, employer feedback), and hers permanently. She pays nothing. Zero. Ever.
How we built it
The AI evaluation pipeline works like this. After the worker submits her video, I extract frames at regular intervals using the camera package. I experimented with different sampling rates: too many frames slowed processing, and too few missed key moments. One frame every two seconds gave the best balance of accuracy and speed. Each frame is analysed with computer vision. Face detection confirms that the same person is present throughout, and activity analysis evaluates what she is doing in each frame: hand position, tool usage, ingredient handling. I then pass the frame analysis results to Google Gemini along with a skill-specific prompt. The prompt tells Gemini exactly what to look for in a cook versus a tailor versus a caregiver, because the scoring criteria differ for each profession. Gemini returns a structured JSON score, which I parse to render the progress bars on the analysis screen.
One technical challenge: early versions gave inconsistent scores when video quality was poor or lighting was bad. I solved this with a weighting system. Video Quality is one of the seven criteria, but it contributes less to the overall score than Technique or Ingredient Handling, so a worker with a bad camera but excellent technique still scores well. The AI judges what she does, not how her phone looks.
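The sampling rate and the weighting step can be sketched in a few lines. This is a minimal Python sketch, not the app's actual code. It assumes Gemini's reply is a flat JSON object mapping criterion names to 0-100 scores. The weights are hypothetical values chosen only so that Video Quality counts for less than Technique or Ingredient Handling. Only three of the seven criteria are named in this write-up, so any other criterion falls back to a default weight.

```python
import json
import math
from typing import Dict, List

# Hypothetical weights. The write-up only says Video Quality counts for less
# than Technique or Ingredient Handling; these numbers are illustrative.
WEIGHTS: Dict[str, float] = {
    "Technique": 2.0,
    "Ingredient Handling": 2.0,
    "Video Quality": 0.5,
}
DEFAULT_WEIGHT = 1.0  # used for any of the seven criteria not listed above


def sample_timestamps(duration_s: float, interval_s: float = 2.0) -> List[float]:
    """Times (in seconds) at which to extract one frame, one every interval_s."""
    if duration_s <= 0 or interval_s <= 0:
        return []
    return [i * interval_s for i in range(math.ceil(duration_s / interval_s))]


def weighted_score(gemini_json: str, weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine per-criterion scores into one weighted average.

    Assumes the model's reply is a flat JSON object such as
    {"Technique": 90, "Video Quality": 40}, with each score on a 0-100 scale.
    """
    scores = json.loads(gemini_json)
    total = 0.0
    total_weight = 0.0
    for criterion, score in scores.items():
        w = weights.get(criterion, DEFAULT_WEIGHT)
        total += w * float(score)
        total_weight += w
    # An empty reply has no weight at all; report 0.0 rather than dividing by zero.
    return total / total_weight if total_weight else 0.0


if __name__ == "__main__":
    print(sample_timestamps(10.0))  # [0.0, 2.0, 4.0, 6.0, 8.0]
    reply = '{"Technique": 90, "Ingredient Handling": 80, "Video Quality": 40}'
    print(weighted_score(reply))  # 80.0
```

With these illustrative weights, a submission scoring 90 on Technique, 80 on Ingredient Handling and 40 on Video Quality comes out at 80.0, where a plain average would give 70.0. The low camera score still counts, but it cannot drag down strong technique.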
Challenges we ran into
The hardest bug I dealt with was inconsistent AI scoring across different lighting conditions. The same worker doing the same task would get an 87 in bright light and a 62 in dim light, not because her technique changed but because the frame analysis was struggling with low contrast. I debugged this by testing with multiple users in different environments: my hostel room, the kitchen, and outdoors. I collected the raw frame data and found that the face detection confidence score dropped below 0.6 in low light, which was throwing off the whole pipeline. The fix was a minimum confidence threshold: if face detection confidence for a frame drops below 0.6, that frame is excluded from scoring and the remaining frames carry more weight. This made scores consistent across lighting conditions. I also ran into a separate problem: some videos were not being analysed at all. I traced this to a file path issue on certain Android versions and fixed it by using the app's temporary directory instead of external storage.
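The confidence filter can be sketched as follows. This is a minimal Python sketch under stated assumptions. `FrameResult` and its per-frame `activity_score` are hypothetical stand-ins for whatever the frame analysis produces, and the final score is assumed to be a plain average of the surviving frames. In the real pipeline, the surviving frames may instead be what gets passed to Gemini. The 0.6 threshold is the one reported in this section.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold from the write-up: frames below this face-detection confidence are dropped.
MIN_FACE_CONFIDENCE = 0.6


@dataclass
class FrameResult:
    """Hypothetical per-frame output of the computer-vision step."""
    face_confidence: float  # face detector confidence, 0.0 to 1.0
    activity_score: float   # hypothetical per-frame skill score, 0 to 100


def score_frames(frames: List[FrameResult],
                 min_conf: float = MIN_FACE_CONFIDENCE) -> Optional[float]:
    """Average the activity scores of frames whose face confidence meets min_conf.

    Excluded frames leave the denominator, so each remaining frame's share of
    the final score grows. Returns None if no frame passes the threshold.
    """
    kept = [f for f in frames if f.face_confidence >= min_conf]
    if not kept:
        return None  # nothing reliable to score; the app could ask for a re-record
    return sum(f.activity_score for f in kept) / len(kept)


if __name__ == "__main__":
    frames = [
        FrameResult(face_confidence=0.9, activity_score=85.0),
        FrameResult(face_confidence=0.4, activity_score=20.0),   # dim frame, dropped
        FrameResult(face_confidence=0.8, activity_score=89.0),
        FrameResult(face_confidence=0.55, activity_score=30.0),  # dim frame, dropped
    ]
    print(score_frames(frames))  # 87.0
```

Dropping a frame shrinks the denominator, which is how the remaining frames come to carry more weight. In the example, the two low-confidence frames are discarded and the score is the mean of the other two. Averaging all four would have given 56.0.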
Accomplishments that we're proud of
I tested with my hostel warden, exactly the kind of person HUNAR was built for. She showed me two things immediately. First, she kept reading the text on screen even though the app was speaking it aloud. I responded by reducing the text size significantly and making the microphone button much larger, so her eye goes to the button, not the words. Second, she did not understand what Level 2 meant on her certificate, so I added the word Proficient next to the number. Two changes. Ten minutes of watching a real person use something I built. Neither came from a textbook.
What we learned
I started with one assumption: the system exists, but workers aren't using it. I was wrong. No system was ever built for them, and that realisation changed everything. Technical challenge: making an app work for someone who cannot read. Solution: TextToSpeech reads every word aloud and SpeechRecognizer captures every answer, so the worker never touches a keyboard. Human challenge: when I tested with my warden, she kept looking at the text on screen instead of the microphone button. I made the button bigger, and she stopped looking at the text. The best product decisions came from watching real people, not from writing code.
What's next for HUNAR
The current prototype demonstrates the full flow. The full product adds three things: computer vision trained on domain-specific technique markers across all skill categories, support for 22 Indian languages using Google MuRIL, and an employer feedback loop in which real-world performance feeds back into the certificate level over time. HUNAR was designed from day one to scale from prototype to infrastructure.