Inspiration

I grew up hearing Ghomala' (Ghɔ́málá') — the language of the Bamiléké people from West Cameroon — spoken by my parents and grandparents. Today, I speak only about 30% of my mother tongue. My generation is losing it. Young Cameroonians grow up speaking French or English and never learn their ancestral language. Ghomala' has:

  • Zero presence on Duolingo, Google Translate, or any language platform
  • No digital corpus — the largest resource is a scanned dictionary from the 1970s
  • No voice assistant that understands or speaks it
  • ~1 million speakers, most of them aging

UNESCO classifies Ghomala' as a vulnerable language. If we do nothing, it dies with our elders. NAM SA' is my answer: use AI to make sure the sun rises on our language, not sets.

What it does

NAM SA' is a live multimodal agent that teaches Ghomala' through natural voice interaction:

  • 🎙️ Real-time voice conversations — Speak naturally with the agent using the Gemini Live API via ADK streaming. Supports barge-in (interruption), silence detection, and continuous conversational flow.
  • 📖 AI Language Tutor — Fine-tuned Gemini Flash model that actually knows Ghomala'. Translates, explains grammar, shares cultural context.
  • 🔄 Dictionary & Translation — French ↔ English ↔ Ghomala' translation interface.
  • 🌿 Bamiléké Proverbs — 20+ traditional proverbs with cultural explanations and audio.
  • 📚 Structured Learning — 3 levels, 15 topics, 75+ vocabulary words with TTS.

How we built it

Mandatory Tech Compliance:

  • Gemini Live API — Real-time bidirectional audio via ADK run_live()
  • ADK (Agent Development Kit) — Agent orchestration with tool calling
  • Google Cloud — Cloud Run (backend), Vertex AI (SFT), Firestore (sessions), GCS (data)

Fine-Tuning on Vertex AI:

  • Base model: gemini-2.0-flash-001
  • Training data: 18K Ghomala' conversation pairs from community datasets + dictionary extraction
  • SFT via vertexai.tuning.sft.train() with 3 epochs, adapter_size=4
  • Tuned model deployed automatically on Vertex AI endpoint
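The launch call can be sketched roughly as follows — a minimal sketch based on the documented `vertexai.tuning.sft.train()` interface; the project ID, location, and dataset path are placeholders, not the project's real values:

```python
# Hyperparameters from the run described above; bucket path and project are placeholders.
SFT_PARAMS = {
    "source_model": "gemini-2.0-flash-001",
    "train_dataset": "gs://<your-bucket>/ghomala_sft_train.jsonl",  # placeholder path
    "epochs": 3,
    "adapter_size": 4,
}

def launch_sft(params: dict = SFT_PARAMS):
    """Start a supervised fine-tuning job on Vertex AI (requires auth + the vertexai SDK)."""
    import vertexai
    from vertexai.tuning import sft

    vertexai.init(project="<your-project>", location="us-central1")  # placeholders
    return sft.train(
        source_model=params["source_model"],
        train_dataset=params["train_dataset"],
        epochs=params["epochs"],
        adapter_size=params["adapter_size"],
    )
```

Once the job finishes, Vertex AI exposes the tuned model behind an endpoint, which is what the dictionary tool queries at runtime.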

Agent Architecture (ADK):

```python
from google.adk.agents import Agent

# Root agent: a native-audio Gemini model wired to the two Ghomala' tools.
root_agent = Agent(
    name="namsa_agent",
    model="gemini-2.5-flash-native-audio-preview",
    tools=[ghomala_dictionary, ghomala_translator],
    instruction="Tu es NAM SA', tuteur Ghomala'...",  # "You are NAM SA', Ghomala' tutor..."
)
# Voice streaming via runner.run_live()
```

The agent uses two tools:

  1. ghomala_dictionary — Queries the fine-tuned model for translations and definitions
  2. ghomala_translator — Direct translation between FR/EN/Ghomala'
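As a rough illustration of the tool shape ADK expects (plain Python functions with typed parameters and docstrings), here is a hedged sketch. The tiny lookup table is a stand-in for the fine-tuned model and the 4,929-entry dictionary, and the entry values are placeholders, not real Ghomala':

```python
# Stand-in data: real entries come from the fine-tuned model / digitized dictionary.
SAMPLE_ENTRIES = {
    "bonjour": {"ghomala": "<Ghomala' greeting>", "note": "placeholder entry"},
}

def ghomala_dictionary(word: str) -> dict:
    """Look up a French word and return its Ghomala' entry.

    In the real agent this queries the fine-tuned Gemini model, not a local dict.
    """
    entry = SAMPLE_ENTRIES.get(word.strip().lower())
    if entry is None:
        return {"status": "not_found", "word": word}
    return {"status": "ok", "word": word, **entry}

def ghomala_translator(text: str, source_lang: str, target_lang: str) -> dict:
    """Translate text between fr/en/Ghomala' (sketch: validates the language pair only)."""
    supported = {"fr", "en", "ghomala"}
    if source_lang not in supported or target_lang not in supported:
        return {"status": "error", "detail": "unsupported language pair"}
    # The real implementation would call the tuned model here.
    return {"status": "ok", "source": source_lang, "target": target_lang, "input": text}
```

ADK infers each tool's schema from its signature and docstring, which is why both functions take simple typed arguments and return plain serializable dicts.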

Backend: FastAPI on Cloud Run, deployed via cloudbuild.yaml (automated deployment).
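The deployment pipeline might look roughly like this — a hedged sketch of a cloudbuild.yaml, not the project's actual file; the service name, image path, region, and flags are assumptions:

```yaml
steps:
  # Build and push the backend container (image name is a placeholder).
  - name: gcr.io/cloud-builders/docker
    args: ["build", "-t", "gcr.io/$PROJECT_ID/namsa-backend", "."]
  - name: gcr.io/cloud-builders/docker
    args: ["push", "gcr.io/$PROJECT_ID/namsa-backend"]
  # Deploy the pushed image to Cloud Run.
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: gcloud
    args:
      - run
      - deploy
      - namsa-backend
      - --image=gcr.io/$PROJECT_ID/namsa-backend
      - --region=europe-west1
      - --allow-unauthenticated
images:
  - gcr.io/$PROJECT_ID/namsa-backend
```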

Mobile: React Native + Expo SDK 55, bilingual FR/EN, 6 screens.

Google Cloud Services Used

  • Vertex AI — SFT fine-tuning of Gemini Flash on Ghomala' data
  • Gemini Live API — Real-time bidirectional voice streaming via ADK
  • Cloud Run — Serverless backend hosting
  • Firestore — Session persistence and conversation history
  • Cloud Storage — Training data and audio file staging
  • Cloud Build — Automated deployment (CI/CD)

Challenges we ran into

Ghomala' has almost no digital data. The largest source was a physical dictionary. I had to use a vision AI model to extract entries page-by-page, then manually curate them.
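The curation step after vision extraction can be sketched as a small parser. The tab-separated output format and the glosses below are assumptions for illustration, not the actual pipeline:

```python
def parse_page_output(raw: str) -> list[dict]:
    """Turn one page of vision-model output ('headword<TAB>gloss' per line) into entries.

    Malformed lines are skipped so they can be flagged for manual curation.
    """
    entries = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("\t")]
        if len(parts) == 2 and all(parts):
            entries.append({"headword": parts[0], "gloss": parts[1]})
    return entries

# Placeholder page: real headwords and glosses come from the scanned dictionary.
page = "word-1\t<gloss 1>\nword-2\t<gloss 2>\nnoise line without a tab"
```

Running the parser over every extracted page and diffing the skipped lines against the scan is one way the manual curation pass described above could be organized.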

Bedrock SFT constraints: Nova 2 Lite on Bedrock caps the batch size at 1. My first attempt with 18K samples projected 48 days of training; curating a balanced subset of 2,000 samples brought that down to 5.5 hours.

The Nova 2 Sonic SDK is brand new: a Smithy-based Python SDK separate from boto3, with a strict event protocol (sessionStart → promptStart → contentStart → audioInput → contentEnd) that must be followed exactly.

No TTS exists for Ghomala'. I use Amazon Polly's French neural voice (Lea) as a phonetic approximation — imperfect but functional for learners.

Accomplishments that we're proud of

  • Built the first AI voice agent that understands Ghomala' — a Gemini-powered assistant for a Cameroonian language
  • Fine-tuned Gemini Flash on Vertex AI with community-sourced Ghomala' data
  • Created a 4,929-entry digital Ghomala' dictionary from a scanned 1970s PDF
  • Automated Cloud deployment with cloudbuild.yaml
  • The app works — real users can actually learn Ghomala' words, hear pronunciation, and have voice conversations

What we learned

  • How to fine-tune Gemini Flash on Vertex AI with SFT for low-resource languages
  • ADK streaming architecture for real-time voice agents
  • Building data pipelines when your language has almost zero digital presence
  • The power of open-source community datasets (Masakhane, HuggingFace contributors)

What's next for NAM SA' - Live Voice Agent for Ghomala'
  • Community data collection — Record audio sessions with native speakers (my parents) to build a speech corpus
  • Proper dictionary digitization — Use SIL Keyman Unicode keyboard with native speaker verification
  • Open-source the fine-tuned model on HuggingFace under Daemon Craft
  • Expand to other Cameroonian languages — Medumba, Fe'fe', Yemba (all Bamiléké family)
  • Camera vision mode — Point at objects and learn the Ghomala' word (MVP2 with Nova vision)


Updates


I ran into a cost issue with my custom model on Vertex AI: the endpoints caused an unintended overcharge, so I disabled the backend, and Google Support has since resolved it. Since I don't know when the judges will test the APK, I've left the fine-tuned model in place but switched the app to a cheaper base model backed by RAG and a customised tool.
