What Inspired Us
Mental health and physical wellness care in India is broken by design.
Walk into any city — gyms are expensive, dietitians charge per consultation, therapists have 3-week waitlists, and wellness apps cost more per month than a family's grocery budget. The people who need help the most have the least access to it.
The AI revolution was producing extraordinary tools — LLMs that could hold empathetic conversations, speech models that could listen in real time, multimodal models that understood images and audio together. But all of it was fragmented across a dozen products, none of which spoke to each other, and none of which were free.
We asked: what if we built the wellness app India actually needs? Not a SaaS. Not a premium product. A platform where free access is a core feature, not an afterthought. That was the spark for NUTRACIA.
The name reflects the mission: Nutr (nutrition) + acia (vitality and care) — a word that belongs in a future where health is a right, not a luxury.
What We Learned
1. Real-time voice AI is harder than it looks
Getting end-to-end response latency under 800 ms (from speaking, to transcription, to LLM reasoning, to audio playback) required understanding every bottleneck in the stack. The browser's native SpeechRecognition API paired with SpeechSynthesis delivers a surprisingly good experience when the LLM backend (Groq) is fast enough, and Deepgram Nova-2 gives us higher accuracy when needed.
Key insight: the bottleneck is almost always the LLM, not the STT or TTS.
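A minimal TypeScript sketch of how per-stage timings can show which stage is the bottleneck. The stage names, the `timeStage` and `summarize` helpers, and the sample numbers are illustrative assumptions, not measurements from NUTRACIA. The 800 ms default is the target stated above.

```typescript
// The three stages of one voice turn; names are illustrative.
type Stage = "stt" | "llm" | "tts";
const STAGES: Stage[] = ["stt", "llm", "tts"];

type Timings = Record<Stage, number>; // milliseconds per stage

// Hypothetical helper: runs one stage and records its wall-clock duration.
async function timeStage<T>(
  timings: Partial<Timings>,
  stage: Stage,
  run: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await run();
  } finally {
    timings[stage] = Date.now() - start; // recorded even if the stage throws
  }
}

interface LatencyReport {
  totalMs: number;
  withinBudget: boolean;
  bottleneck: Stage;
}

// Sums the stage durations and names the slowest stage.
function summarize(timings: Timings, budgetMs = 800): LatencyReport {
  let totalMs = 0;
  let bottleneck: Stage = "stt";
  for (const stage of STAGES) {
    totalMs += timings[stage];
    if (timings[stage] > timings[bottleneck]) bottleneck = stage;
  }
  return { totalMs, withinBudget: totalMs <= budgetMs, bottleneck };
}

// Illustrative numbers only, not real measurements:
const report = summarize({ stt: 120, llm: 450, tts: 90 });
// report → { totalMs: 660, withinBudget: true, bottleneck: "llm" }
```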
2. Multimodal AI requires careful prompt engineering
Getting Gemini 2.0 Flash to reliably return structured JSON from an image (emotions, themes, mood tags, sentiment score) without markdown fences breaking the parse took real iteration. We learned to always run a regex extraction (/\{[\s\S]*\}/) as a safety net. Never trust that JSON will arrive clean.
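Here is a minimal TypeScript sketch of that safety net. The `ImageAnalysis` field names are assumptions based on the fields listed above, and `extractJson` is a hypothetical helper name.

```typescript
// Greedy match from the first "{" to the last "}" in the model's reply.
const JSON_OBJECT = /\{[\s\S]*\}/;

// Assumed shape of the image-analysis result; field names are hypothetical.
interface ImageAnalysis {
  emotions: string[];
  themes: string[];
  moodTags: string[];
  sentimentScore: number;
}

// Pulls the JSON object out of fences or surrounding prose, then parses it.
function extractJson(raw: string): unknown {
  const match = JSON_OBJECT.exec(raw);
  if (match === null) {
    throw new Error("No JSON object found in model output");
  }
  return JSON.parse(match[0]); // still throws if the object itself is malformed
}

// A reply wrapped in a markdown fence with leading commentary.
const fence = "`".repeat(3);
const reply =
  "Here is the analysis:\n" +
  fence + "json\n" +
  '{"emotions":["calm"],"themes":["nature"],"moodTags":["peaceful"],"sentimentScore":0.7}\n' +
  fence;

const analysis = extractJson(reply) as ImageAnalysis;
// analysis.sentimentScore → 0.7
```

The match is greedy, so it tolerates fences and prose before or after the object. However, a stray `}` in trailing commentary would be swept into the match and make `JSON.parse` throw.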
3. LLM fallback chains are essential
Groq is fast and free, but free-tier rate limits are real. Building a transparent fallback to OpenAI GPT-3.5-turbo — same interface, same system prompt, same response format — meant users never experienced downtime.
Lesson: design your AI service layer to be provider-agnostic from day one.
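A minimal TypeScript sketch of a provider-agnostic fallback chain, assuming each provider module is wrapped to expose the same `complete` method. The `LlmProvider` interface, `completeWithFallback`, and the stub providers are hypothetical. The stubs stand in for real Groq and OpenAI SDK calls.

```typescript
// Provider-agnostic message and provider types (names are hypothetical).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface LlmProvider {
  name: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

// Tries each provider in order and returns the first successful reply.
async function completeWithFallback(
  providers: LlmProvider[],
  messages: ChatMessage[],
): Promise<{ provider: string; text: string }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.complete(messages);
      return { provider: provider.name, text };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed (${failures.join("; ")})`);
}

// Stubs standing in for real Groq and OpenAI wrappers.
const groqStub: LlmProvider = {
  name: "groq",
  complete: async () => {
    throw new Error("429 rate limited");
  },
};
const openaiStub: LlmProvider = {
  name: "openai",
  complete: async () => "Hello from the fallback provider",
};
```

This sketch falls back on any error. A real service layer might fall back only on rate-limit (HTTP 429) or server errors, and surface other failures directly.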
4. The math of wellness personalization
Designing the diet and grocery agent required grounding prompts in real nutritional science:
$$\text{Protein (g/day)} = 0.8 \times \text{body weight (kg)}$$
$$\text{Daily Caloric Need} = \text{BMR} \times \text{Activity Factor}$$
$$\text{BMR} = 10W + 6.25H - 5A + S$$
where \(W\) = weight (kg), \(H\) = height (cm), \(A\) = age (years), and \(S\) = \(+5\) for males or \(-161\) for females. This is the Mifflin-St Jeor equation, and it gives BMR in kcal/day.
These formulas gave the grocery agent nutritional context to reason over, not just keyword-match product names.
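A minimal TypeScript sketch of the three formulas above. The activity tiers use commonly cited multipliers and are an assumption, because the writeup does not list the factors it uses. `nutritionContext` is a hypothetical helper that turns the numbers into a line of prompt context.

```typescript
type Sex = "male" | "female";

// Commonly cited activity multipliers; these tiers are an assumption,
// not values taken from NUTRACIA's prompts.
const ACTIVITY_FACTORS = {
  sedentary: 1.2,
  light: 1.375,
  moderate: 1.55,
  active: 1.725,
  veryActive: 1.9,
} as const;
type ActivityLevel = keyof typeof ACTIVITY_FACTORS;

interface Profile {
  weightKg: number; // W
  heightCm: number; // H
  ageYears: number; // A
  sex: Sex;
  activity: ActivityLevel;
}

// BMR = 10W + 6.25H - 5A + S (kcal/day)
function bmr(p: Profile): number {
  const s = p.sex === "male" ? 5 : -161;
  return 10 * p.weightKg + 6.25 * p.heightCm - 5 * p.ageYears + s;
}

// Daily caloric need = BMR x activity factor
function dailyCalories(p: Profile): number {
  return bmr(p) * ACTIVITY_FACTORS[p.activity];
}

// Protein (g/day) = 0.8 x body weight (kg)
function proteinGramsPerDay(p: Profile): number {
  return 0.8 * p.weightKg;
}

// Hypothetical one-line context string to prepend to the grocery agent's prompt.
function nutritionContext(p: Profile): string {
  return `Targets: ${Math.round(dailyCalories(p))} kcal/day, ${Math.round(proteinGramsPerDay(p))} g protein/day`;
}

const example: Profile = { weightKg: 70, heightCm: 175, ageYears: 30, sex: "male", activity: "moderate" };
// bmr(example) → 1648.75
// nutritionContext(example) → "Targets: 2556 kcal/day, 56 g protein/day"
```

Worked through for the example profile, BMR is 700 + 1093.75 − 150 + 5 = 1648.75 kcal/day. Multiplied by 1.55, that gives about 2555.6 kcal/day. Protein comes to 0.8 × 70 = 56 g/day.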
5. 3D on mobile is a privilege, not a given
WebGL workout galleries looked stunning on desktop. On the budget Android phones that most of India's internet users rely on, they caused dropped frames and crashes. We implemented a device capability check with a graceful 2D fallback.
Accessibility and inclusion are engineering problems, not just design problems.
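A minimal TypeScript sketch of such a capability check, written as a pure function so it can be tested outside the browser. The `DeviceCaps` shape and both thresholds are hypothetical. The comments show one way to read each signal in a browser, and `navigator.deviceMemory` is only available in Chromium-based browsers.

```typescript
type GalleryMode = "3d" | "2d";

// Capabilities a browser can report; the comments show one way to read each.
interface DeviceCaps {
  webgl: boolean; // e.g. document.createElement("canvas").getContext("webgl") !== null
  cpuCores?: number; // navigator.hardwareConcurrency
  memoryGb?: number; // navigator.deviceMemory (Chromium-only; undefined elsewhere)
}

// Hypothetical thresholds, chosen for illustration only.
const MIN_CORES = 4;
const MIN_MEMORY_GB = 4;

// Defaults to 3D, and drops to 2D when WebGL is missing or a reported signal looks low-end.
function chooseGalleryMode(caps: DeviceCaps): GalleryMode {
  if (!caps.webgl) return "2d";
  if (caps.cpuCores !== undefined && caps.cpuCores < MIN_CORES) return "2d";
  if (caps.memoryGb !== undefined && caps.memoryGb < MIN_MEMORY_GB) return "2d";
  return "3d";
}
```

Unknown signals do not force the 2D path here, so browsers that do not expose `deviceMemory` are not downgraded. A stricter policy could default to 2D whenever a signal is missing.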
How We Built It
We chose a monorepo structure — React + Vite frontend, Node.js + Express backend, run concurrently in development.
The architecture is intentionally simple and linear:
Browser → Express REST API → AI Service Layer → MongoDB
No microservices. No message queues. Just clean route handlers, thin service modules (one per AI provider), and Mongoose models, so that a new contributor can read any part of the codebase in under an hour.
Voice Therapy runs on a four-state machine: idle → listening → thinking → speaking. Each state is reflected in the animated, morphing orb. Wave layers, equalizer bars, and glow colors all shift to make the AI's "mental state" feel tangible and human.
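A minimal, table-driven TypeScript sketch of that four-state machine. The state names come from the text. The event names, the `CANCEL` escape hatch, returning to `idle` after playback, and the orb colors are all assumptions.

```typescript
type VoiceState = "idle" | "listening" | "thinking" | "speaking";

// Hypothetical event names; the real app's triggers may differ.
type VoiceEvent = "START" | "TRANSCRIPT_READY" | "REPLY_READY" | "PLAYBACK_END" | "CANCEL";

// Allowed transitions; any event missing from a state's entry is ignored.
const TRANSITIONS: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle: { START: "listening" },
  listening: { TRANSCRIPT_READY: "thinking", CANCEL: "idle" },
  thinking: { REPLY_READY: "speaking", CANCEL: "idle" },
  speaking: { PLAYBACK_END: "idle", CANCEL: "idle" },
};

function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  return TRANSITIONS[state][event] ?? state;
}

// Illustrative orb styling per state (colors are placeholders).
const ORB_STYLE: Record<VoiceState, { glow: string; showEqualizer: boolean }> = {
  idle: { glow: "#64748b", showEqualizer: false },
  listening: { glow: "#22c55e", showEqualizer: true },
  thinking: { glow: "#a855f7", showEqualizer: false },
  speaking: { glow: "#3b82f6", showEqualizer: true },
};
```

Keeping the transitions in one table means an out-of-order event, such as a late `REPLY_READY` while the user is still speaking, leaves the state and the orb unchanged.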
The AI Grocery Agent skips a formal LangGraph pipeline entirely. Instead, we implement the full reasoning loop inside a single, carefully structured prompt to LLaMA-3.3-70b — instructing it to validate macros, check cost, and match diet preference as part of its chain-of-thought before emitting JSON. It behaves like a multi-step agent without the infrastructure overhead.
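A minimal TypeScript sketch of how such a single-prompt agent loop might be assembled. The `GroceryRequest` fields, the diet options, the one-week framing, the 10% macro tolerance, and the JSON shape are all hypothetical. This is not NUTRACIA's actual prompt.

```typescript
// Hypothetical request shape; field names and diet options are assumptions.
interface GroceryRequest {
  dietPreference: "vegetarian" | "vegan" | "non-vegetarian";
  weeklyBudgetInr: number;
  dailyProteinG: number;
  dailyCalories: number;
}

// Builds one prompt that asks the model to run the agent loop itself:
// draft, validate macros, check cost, check diet, revise, then emit JSON.
function buildGroceryPrompt(req: GroceryRequest): string {
  return [
    "You are a grocery planning agent for users in India. Quote all prices in ₹.",
    `User profile: ${req.dietPreference}; weekly budget ₹${req.weeklyBudgetInr}; ` +
      `targets ${req.dailyProteinG} g protein and ${req.dailyCalories} kcal per day.`,
    "Reason through these steps before answering:",
    "1. Draft a one-week grocery list.",
    "2. Validate macros: estimated daily protein and calories should be within 10% of the targets.",
    "3. Check cost: the total must not exceed the weekly budget.",
    "4. Match diet preference: remove any item that conflicts with it.",
    "5. If any check fails, revise the list and repeat steps 2-4.",
    'End with a single JSON object shaped like {"items":[{"name":"...","priceInr":0}],"totalInr":0}.',
  ].join("\n");
}
```

The model writes its reasoning before the JSON, so the reply still needs the /\{[\s\S]*\}/ extraction described under "Consistent JSON from LLMs". A brace anywhere in the reasoning text would be swept into that greedy match.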
Challenges We Faced
The voice latency problem
Our initial architecture made three sequential API calls: Deepgram (STT) → Groq (LLM) → Deepgram (TTS). On slow networks, this felt sluggish. We solved it by moving default STT and TTS to the browser's native Web Speech APIs, which removed two round trips through our backend. The Deepgram endpoints remain available for higher accuracy, but the default path now makes only one server call per turn: the LLM request.
Consistent JSON from LLMs
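LLMs occasionally wrap JSON in markdown fences, add trailing commas, or prepend commentary. Every AI service that returns structured data runs the same /\{[\s\S]*\}/ extraction pattern, which strips fences and surrounding prose. The regex cannot repair a trailing comma, so that kind of output still fails to parse. Each service therefore also has hardcoded fallback responses, and the UI never crashes on a bad parse.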
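A minimal TypeScript sketch of extraction plus a hardcoded fallback. `parseWithFallback`, the `MoodResult` type, its shape check, and `FALLBACK_MOOD` are hypothetical names used for illustration.

```typescript
const JSON_OBJECT = /\{[\s\S]*\}/;

// Extracts and parses model JSON; returns `fallback` instead of throwing
// when there is no object, JSON.parse fails, or the shape check rejects it.
function parseWithFallback<T>(raw: string, isValid: (value: unknown) => value is T, fallback: T): T {
  const match = JSON_OBJECT.exec(raw);
  if (match === null) return fallback;
  try {
    const parsed: unknown = JSON.parse(match[0]);
    return isValid(parsed) ? parsed : fallback;
  } catch {
    return fallback; // e.g. a trailing comma makes JSON.parse throw
  }
}

// Hypothetical response type and hardcoded fallback for a mood endpoint.
interface MoodResult {
  mood: string;
  score: number;
}

function isMoodResult(value: unknown): value is MoodResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v["mood"] === "string" && typeof v["score"] === "number";
}

const FALLBACK_MOOD: MoodResult = { mood: "neutral", score: 0 };
```

The shape check matters as much as the parse. Valid JSON with a missing field would otherwise reach the UI and crash it later, far from the cause.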
Emotion detection accuracy
The Smile Gate runs entirely on-device — no data sent to any server. Getting reliable detection across different lighting conditions, skin tones, and webcam qualities was a real challenge. We tuned the confidence threshold and added a real-time face overlay so users always understand what the system is looking for.
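A minimal TypeScript sketch of a threshold-based smile gate. The text only describes tuning a confidence threshold. The consecutive-frame requirement is an added assumption, meant to smooth out single noisy frames from poor lighting or webcams. The detector that produces each per-frame confidence is not named in the writeup, and the `SmileGate` class and its values are hypothetical.

```typescript
interface SmileGateOptions {
  threshold: number; // minimum per-frame smile confidence, 0..1
  framesRequired: number; // consecutive qualifying frames before the gate opens
}

// Hypothetical gate: opens only after `framesRequired` consecutive frames
// whose smile confidence meets `threshold`; any miss resets the streak.
class SmileGate {
  private readonly options: SmileGateOptions;
  private streak = 0;

  constructor(options: SmileGateOptions) {
    this.options = options;
  }

  // `confidence` is null when no face is detected in the frame.
  update(confidence: number | null): boolean {
    if (confidence !== null && confidence >= this.options.threshold) {
      this.streak += 1;
    } else {
      this.streak = 0;
    }
    return this.streak >= this.options.framesRequired;
  }
}
```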
Keeping everything free
Every API had to have a genuine free tier — not a trial that expires in two weeks. This constraint drove every architectural decision: Groq over paid-only OpenAI, Deepgram's $200 credit over ElevenLabs' 10,000-character cap, MongoDB Atlas M0 over a paid cluster, Cloudinary's 25 GB over S3.
The free-first constraint turned out to be a feature. It forced us to find the best tools, not just the most popular ones.
Building for India specifically
A generic wellness chatbot gives generic advice. NUTRACIA's system prompts encode cultural context explicitly: the grocery agent knows Flipkart and Amazon India, recommends MuscleBlaze and Patanjali, prices everything in ₹. The chatbot knows what dal is. These details matter enormously for trust and real-world adoption.
The Road Ahead
NUTRACIA is a foundation, not a finished product. What comes next:
- Wearable integration — sync with fitness trackers for real biometric context
- Regional language support — Hindi, Tamil, Marathi, Bengali voice therapy
- Community features — peer wellness groups with AI-moderated support
- Offline mode — lightweight cached models for intermittent connectivity
- Clinical partnerships — a verified doctor network for Nearby Care
Every future feature will be built under the same constraint that started this project:
It must be free, and it must work for the 800 million.
Built With
- autoprefixer
- axios
- bcryptjs
- cloudinary
- cloudinary-sdk
- concurrently
- cors
- deepgram-nova-2-(stt)-&-aura-(tts)
- deepgram-sdk-(nova-2-stt, aura-tts)
- drei
- es-modules
- express-rate-limit
- express.js
- framer-motion
- github
- google-gemini-2.0-flash
- google-generative-ai-sdk-(gemini-2.0-flash)
- google-maps-places-api
- groq-(llama-3.3-70b)
- groq-sdk-(llama-3.3-70b-versatile, llama-3.1-8b-instant)
- gsap
- helmet
- jwt
- jwt-(jsonwebtoken)
- lucide-react
- mongodb-atlas
- mongoose
- morgan
- multer
- node.js
- nodemon
- ogl
- openai-gpt-3.5-turbo
- openai-sdk-(gpt-3.5-turbo)
- postcss
- react-18
- react-router-v6
- react-three-fiber
- render
- tailwind-css
- three.js
- uuid
- vercel
- vite
- web-speech-api
- web-speech-api-(speechrecognition-+-speechsynthesis)
- webgl
- websocket-(ws)