Inspiration
I grew up hearing Ghomala' (Ghɔ́málá') — the language of the Bamiléké people from West Cameroon — spoken by my parents and grandparents. Today, I speak only about 30% of my mother tongue. My generation is losing it. Young Cameroonians grow up speaking French or English and never learn their ancestral language. Ghomala' has:
- Zero presence on Duolingo, Google Translate, or any major language platform
- No digital corpus: the largest resource is a scanned dictionary from the 1970s
- No voice assistant that understands or speaks it
- ~1 million speakers, most of them aging
UNESCO classifies Ghomala' as a vulnerable language. If we do nothing, it dies with our elders. NAM SA' is my answer: use AI to make sure the sun rises on our language, not sets.
What it does
NAM SA' is a live multimodal agent that teaches Ghomala' through natural voice interaction:

- 🎙️ Real-time voice conversations: speak naturally with the agent via the Gemini Live API and ADK streaming, with support for barge-in (interruption), silence detection, and continuous conversational flow.
- 📖 AI language tutor: a fine-tuned Gemini Flash model that actually knows Ghomala' and can translate, explain grammar, and share cultural context.
- 🔄 Dictionary & translation: a French ↔ English ↔ Ghomala' translation interface.
- 🌿 Bamiléké proverbs: 20+ traditional proverbs with cultural explanations and audio.
- 📚 Structured learning: 3 levels, 15 topics, and 75+ vocabulary words with TTS.
How we built it
Mandatory Tech Compliance:

- ✅ Gemini Live API: real-time bidirectional audio via ADK `run_live()`
- ✅ ADK (Agent Development Kit): agent orchestration with tool calling
- ✅ Google Cloud: Cloud Run (backend), Vertex AI (SFT), Firestore (sessions), GCS (data)
Fine-Tuning on Vertex AI:

- Base model: `gemini-2.0-flash-001`
- Training data: 18K Ghomala' conversation pairs from community datasets + dictionary extraction
- SFT via `vertexai.tuning.sft.train()` with 3 epochs, `adapter_size=4`
- Tuned model deployed automatically on a Vertex AI endpoint
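The training file for `sft.train()` is JSONL in Gemini's `contents` chat format. A minimal sketch of building one line of such a dataset; the `make_example` helper and the placeholder texts are illustrative, not the project's actual data:

```python
import json

def make_example(user_text: str, model_text: str) -> dict:
    """One SFT training example in the Gemini `contents` chat format."""
    return {
        "contents": [
            {"role": "user", "parts": [{"text": user_text}]},
            {"role": "model", "parts": [{"text": model_text}]},
        ]
    }

# One dictionary-derived pair (placeholder text, not real Ghomala' data)
line = json.dumps(
    make_example("Translate 'water' into Ghomala'.", "<Ghomala' translation + note>"),
    ensure_ascii=False,
)
# Each pair becomes one JSONL line; the file is staged to GCS and passed as
# train_dataset to vertexai.tuning.sft.train(source_model="gemini-2.0-flash-001",
#                                            epochs=3, adapter_size=4)
```

The JSONL-of-`contents` shape mirrors what the tuned model sees at inference time, which is why dictionary entries are reformulated as user/model turns rather than raw glossary rows.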
Agent Architecture (ADK):

```python
from google.adk.agents import Agent

root_agent = Agent(
    name="namsa_agent",
    model="gemini-2.5-flash-native-audio-preview",
    tools=[ghomala_dictionary, ghomala_translator],
    instruction="Tu es NAM SA', tuteur Ghomala'...",  # "You are NAM SA', Ghomala' tutor..."
)

# Voice streaming via runner.run_live()
```
The agent uses two tools:

- `ghomala_dictionary`: queries the fine-tuned model for translations and definitions
- `ghomala_translator`: direct translation between FR/EN/Ghomala'
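ADK derives a tool's schema from a plain Python function's signature and docstring, so a tool like `ghomala_translator` can be sketched as below. The in-memory lookup and return shape are illustrative assumptions; the real tool delegates to the fine-tuned model:

```python
def ghomala_translator(text: str, source_lang: str, target_lang: str) -> dict:
    """Translate a word or phrase between French, English and Ghomala'.

    Args:
        text: The word or phrase to translate.
        source_lang: Language code of the input ("fr", "en" or "gho").
        target_lang: Language code to translate into ("fr", "en" or "gho").
    """
    # Illustrative in-memory lookup; the real tool queries the tuned model.
    mini_dict = {("en", "gho", "water"): "<Ghomala' word>"}
    entry = mini_dict.get((source_lang, target_lang, text.lower()))
    if entry is None:
        return {"status": "not_found", "query": text}
    return {"status": "ok", "translation": entry}
```

Returning a structured `status` field lets the agent phrase a graceful fallback ("I don't know that word yet") instead of hallucinating a translation.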
Backend: FastAPI on Cloud Run, deployed automatically via `cloudbuild.yaml`.
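The Cloud Build pipeline can be sketched as a standard build-push-deploy `cloudbuild.yaml`; the image name, service name, and region below are illustrative assumptions, not the project's actual configuration:

```yaml
steps:
  # Build the backend container image
  - name: gcr.io/cloud-builders/docker
    args: ["build", "-t", "gcr.io/$PROJECT_ID/namsa-backend", "."]
  # Push it to the registry
  - name: gcr.io/cloud-builders/docker
    args: ["push", "gcr.io/$PROJECT_ID/namsa-backend"]
  # Deploy the pushed image to Cloud Run
  - name: gcr.io/cloud-builders/gcloud
    args:
      ["run", "deploy", "namsa-backend",
       "--image", "gcr.io/$PROJECT_ID/namsa-backend",
       "--region", "europe-west1", "--allow-unauthenticated"]
images:
  - gcr.io/$PROJECT_ID/namsa-backend
```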
Mobile: React Native + Expo SDK 55, bilingual FR/EN, 6 screens.
Google Cloud Services Used
| Service | Role |
|---|---|
| Vertex AI | SFT fine-tuning of Gemini Flash on Ghomala' data |
| Gemini Live API | Real-time bidirectional voice streaming via ADK |
| Cloud Run | Serverless backend hosting |
| Firestore | Session persistence and conversation history |
| Cloud Storage | Training data and audio file staging |
| Cloud Build | Automated deployment (CI/CD) |
Challenges we ran into
Ghomala' has almost no digital data. The largest source was a physical dictionary, so I used a vision AI model to extract entries page by page, then manually curated them.
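The curation step after page-by-page extraction can be sketched as a small normalize-and-dedup pass; the entry fields and cleaning rules here are illustrative assumptions, not the project's actual pipeline:

```python
import unicodedata

def curate(entries: list[dict]) -> list[dict]:
    """Normalize extracted dictionary entries and drop duplicates and noise."""
    seen = set()
    cleaned = []
    for e in entries:
        # NFC-normalize so visually identical tone-marked forms compare equal
        headword = unicodedata.normalize("NFC", e.get("headword", "").strip())
        gloss = e.get("gloss", "").strip()
        if not headword or not gloss:
            continue  # page-extraction noise: skip incomplete entries
        key = headword.lower()
        if key in seen:
            continue  # duplicate headword from overlapping page scans
        seen.add(key)
        cleaned.append({"headword": headword, "gloss": gloss})
    return cleaned
```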
Bedrock SFT constraints: Nova 2 Lite on Bedrock caps the batch size at 1. My first attempt with 18K samples projected 48 days of training; I solved it by curating 2,000 balanced samples, bringing training down to 5.5 hours.
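Curating a balanced subset from a larger pool can be sketched as stratified sampling over task categories; the category labels and counts below are hypothetical:

```python
import random

def balanced_sample(samples: list[dict], per_category: int, seed: int = 0) -> list[dict]:
    """Pick up to `per_category` examples from each category, shuffled reproducibly."""
    rng = random.Random(seed)
    by_cat: dict[str, list[dict]] = {}
    for s in samples:
        by_cat.setdefault(s["category"], []).append(s)
    subset = []
    for _cat, items in sorted(by_cat.items()):
        rng.shuffle(items)
        subset.extend(items[:per_category])
    return subset
```

Balancing across categories (e.g. translation, grammar, culture) keeps a small training set from being dominated by the most common entry type.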
The Nova 2 Sonic SDK is brand new. It uses a Smithy-based Python SDK separate from boto3, with a strict event protocol (sessionStart → promptStart → contentStart → audioInput → contentEnd) that must be followed exactly.
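A strict event protocol like this can be guarded with a tiny order check before each send. This helper is purely illustrative and not part of any SDK; it only encodes the event sequence named above:

```python
# Required order of streaming events, as described above
EVENT_ORDER = ["sessionStart", "promptStart", "contentStart", "audioInput", "contentEnd"]

class EventSequencer:
    """Rejects events sent out of the required protocol order."""

    def __init__(self) -> None:
        self._next = 0

    def send(self, event: str) -> None:
        # audioInput may repeat: streamed audio arrives as multiple chunks
        if event == "audioInput" and self._next == EVENT_ORDER.index("audioInput") + 1:
            return
        if self._next >= len(EVENT_ORDER) or EVENT_ORDER[self._next] != event:
            raise ValueError(f"event {event!r} out of order")
        self._next += 1
```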
No TTS exists for Ghomala'. I use Amazon Polly's French neural voice (Léa) as a phonetic approximation: imperfect, but functional for learners.
Accomplishments that we're proud of
- Built the first Gemini-powered voice agent for a Cameroonian language
- Fine-tuned Gemini Flash on Vertex AI with community-sourced Ghomala' data
- Created a 4,929-entry digital Ghomala' dictionary from a scanned 1970s PDF
- Automated deployment with `cloudbuild.yaml` (Cloud Build)
- The app works: real users can actually learn Ghomala' words, hear pronunciation, and have voice conversations
What we learned
- How to fine-tune Gemini Flash on Vertex AI with SFT for low-resource languages
- ADK streaming architecture for real-time voice agents
- Building data pipelines when your language has almost zero digital presence
- The power of open-source community datasets (Masakhane, HuggingFace contributors)
What's next for NAM SA' - Live Voice Agent for Ghomala'
- Community data collection — Record audio sessions with native speakers (my parents) to build a speech corpus
- Proper dictionary digitization — Use SIL Keyman Unicode keyboard with native speaker verification
- Open-source the fine-tuned model on HuggingFace under Daemon Craft
- Expand to other Cameroonian languages — Medumba, Fe'fe', Yemba (all Bamiléké family)
- Camera vision mode — Point at objects and learn the Ghomala' word (planned for MVP2)
Built With
- cloud-build
- cloud-run
- cloud-storage
- expo.io
- fastapi
- firestore
- gemini-flash
- gemini-live-api
- google-adk
- python
- react-native
- vertex-ai
