Inspiration
At a local market I visit often in Lagos, there is a vendor who cannot speak. Every interaction I had with him was a silent struggle. I would point at tomatoes; he would type a price on a calculator. I would type a quantity; he would nod. Numbers replaced words, and calculators replaced voices.
This isn't just his story. In Nigeria and across the globe, millions of people living with ALS, stroke-related disabilities, autism, and other speech impairments are widely misunderstood and underserved. Existing tools are often expensive, complex, or require fine motor skills that many do not have. We built SayIt to change this. We wanted to create a tool that doesn't just "speak" for you, but understands what you want to say before you even finish the sentence, using the power of Gemini 3 to turn a simple head movement into a full conversation.
What it does
SayIt is a web-based, accessibility-focused communication tool that lets users "speak" using head movements alone, or head and hand movements combined, depending on their preference.

- Head-Tracking Interface: Users navigate a grid of words, phrases, and categories by moving their head. A "dwell click" or a specific gesture, such as a nod or opening and closing the mouth, selects an item without ever touching the screen (a minimal sketch of the dwell-click logic follows this list).
- Smart Sentence Construction: Unlike traditional AAC (Augmentative and Alternative Communication) boards that require selecting every single word, SayIt uses Gemini 3 to predict the user's intent. If a user selects "Market" and "Price," Gemini generates a natural sentence like "How much does this cost?"
- Context-Aware Responses: SayIt can listen to the conversation partner (via microphone) and suggest relevant responses for the user to select quickly.
- Multilingual Support: SayIt bridges the gap between English and local dialects (like Pidgin), as well as other languages, so users can communicate naturally in their environment and even practice a language of their choice.
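As an illustration of the dwell-click behaviour, here is a minimal sketch (not our exact implementation; the `DWELL_MS` threshold and `GridCell` type are placeholders):

```typescript
// Minimal dwell-click sketch: if the head-driven cursor stays over the same
// grid cell for DWELL_MS milliseconds, that cell is selected.
type GridCell = { id: string; label: string };

const DWELL_MS = 1200; // placeholder threshold; in practice this should be user-adjustable

let hoveredCell: GridCell | null = null;
let hoverStart = 0;

function onCursorOverCell(
  cell: GridCell | null,
  now: number,
  select: (c: GridCell) => void,
) {
  if (cell?.id !== hoveredCell?.id) {
    // Cursor moved to a different cell (or off the grid): restart the dwell timer.
    hoveredCell = cell;
    hoverStart = now;
    return;
  }
  if (cell && now - hoverStart >= DWELL_MS) {
    select(cell);     // fire the selection once…
    hoverStart = now; // …then reset so it does not re-trigger every frame
  }
}
```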
How we built it
We built SayIt as a modern web application designed for speed and accessibility.

- Frontend: Built with Next.js and React for a responsive, fast user interface, styled with Tailwind CSS for accessible, high-contrast design.
- Head Tracking: We integrated MediaPipe Face Mesh to detect facial landmarks in real time, right in the browser. This lets us track the user's nose tip to control the cursor and detect "nods" for clicking, eliminating the need for physical touch.
- The Brain (Gemini 3 Integration): The core intelligence is powered by the Google Gemini 3 API.
  - Predictive Text Engine: We send the user's selected keywords and context to Gemini 3, whose reasoning capabilities instantly formulate grammatically correct, context-appropriate sentences (see the sketch after this list).
  - Intent Classification: Gemini 3 analyzes the user's history to re-order the grid dynamically, putting the most likely phrases front and center.
- Text-to-Speech: We utilize the Web Speech API (or a dedicated TTS library) to vocalize the generated text.
- Deployment: The application is deployed on Vercel for global availability and edge caching.
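A minimal sketch of the keywords-to-speech flow, assuming the `@google/generative-ai` JavaScript SDK; the model ID, prompt wording, and `buildSentence` helper are illustrative placeholders rather than our exact code:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Placeholder model ID; substitute the Gemini model your API key has access to.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-pro" });

// Turn the user's selected grid keywords into a natural sentence.
async function buildSentence(keywords: string[]): Promise<string> {
  const prompt =
    `You assist a non-speaking user. Combine these keywords into one short, ` +
    `natural first-person sentence they could say aloud: ${keywords.join(", ")}. ` +
    `Return only the sentence.`;
  const result = await model.generateContent(prompt);
  return result.response.text().trim();
}

// Vocalize the sentence with the browser's Web Speech API.
function speak(sentence: string) {
  const utterance = new SpeechSynthesisUtterance(sentence);
  window.speechSynthesis.speak(utterance);
}

// Example: selecting "Market" and "Price" might yield "How much does this cost?"
buildSentence(["Market", "Price"]).then(speak);
```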
Challenges we ran into
The "Jitter" Problem: Initially, the head tracking was too sensitive; the cursor would shake, making it hard to select buttons. We implemented a smoothing algorithm (Kalman filter-inspired) to stabilize the cursor movement without adding lag.Latency vs. Accuracy: We needed Gemini 3 to generate sentences instantly. We optimized our prompt engineering to be "token-efficient," instructing Gemini to return JSON objects strictly, which reduced parsing time and latency significantly.Lighting Conditions: The tracking struggled in low light (common in the market scenarios we envisioned). We adjusted the contrast thresholds and added a "High Contrast Mode" to help the vision models perform better in varying environments.
Accomplishments that we're proud of
- It Works Offline-ish: We built a hybrid system in which core phrases are cached locally while complex sentences use Gemini, ensuring the user is never completely silenced even with spotty internet (a sketch of the fallback follows this list).
- True Hands-Free: We successfully achieved a 100% hands-free navigation flow. A user can open the app, select phrases, and speak them without touching the screen once.
- Gemini's "Empathy": We were blown away by how well Gemini 3 could grasp context. When we tested the keyword "Help," it didn't just say "Help"; it offered context-aware options like "I need assistance" and "Please call my emergency contact."
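A hedged sketch of the hybrid online/offline idea, reusing the hypothetical `buildSentence` helper from the earlier example; the cached phrase map is purely illustrative:

```typescript
// Fallback strategy: try Gemini for a natural sentence, but never leave the
// user silent if the network or the API is unavailable.
const CACHED_PHRASES: Record<string, string> = {
  "Market,Price": "How much does this cost?",
  "Help": "I need assistance.",
};

async function sentenceFor(keywords: string[]): Promise<string> {
  try {
    return await buildSentence(keywords); // online path via Gemini
  } catch {
    // Offline path: fall back to a cached phrase, or simply read the keywords.
    return CACHED_PHRASES[keywords.join(",")] ?? keywords.join(" ");
  }
}
```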
What we learned
- Accessibility is Specific: We learned that "disability" is not a monolith. Designing for someone with ALS (limited muscle movement) is different from designing for a mute vendor (full motor control but no speech). We had to make the sensitivity customizable.
- Prompt Engineering is UI: In a text-based AI app, the prompt is the backend. Tweaking the system instructions for Gemini 3 directly improved the user interface by providing better suggestions.
What's next for SayIt
- Launch on Lightweight Tablets: We aim to port SayIt to affordable, lightweight Android tablets and distribute them to families, caregivers, and clinics in Nigeria and other countries.
- Simplified Touch Modes: For users with partial hand movement, we will add large, accessible touch buttons as an alternative to head tracking, alongside additional hands-free gestures.
- Vision Integration: We plan to use Gemini 3's multimodal vision capabilities. Ideally, the user could point their camera at an object (e.g., a banana), and SayIt would recognize it and suggest phrases like "I want to buy these bananas."
Built With
- figma
- gemini3
- react.js
- shadcn
- tailwindcss
- typescript
- ui
- vite