Our Journey with Sema Sasa

The Inspiration: Closing the Linguistic Chasm

As a team of African innovators, engineers, and strategists, our inspiration for Sema Sasa wasn't born in a lab; it was born from our daily reality. We saw firsthand how the digital revolution, while promising, was creating a new kind of divide: a linguistic chasm. In a continent with over 2,000 languages, the digital world overwhelmingly speaks English. This locks millions out of critical information in healthcare, education, and civic life, effectively silencing their voices in the global conversation.

We were inspired by a simple, powerful idea: what if technology could adapt to people, instead of forcing people to adapt to it? What if we could build tools that honored our linguistic diversity and empowered our communities? This question became our mission. We chose Swahili as our starting point: a language spoken by over 200 million people, yet critically underserved by modern AI. We named our project Sema Sasa, which means "Speak Now" in Swahili, as a call to action to unlock Africa's voices.

How We Built It: A Foundation of Open-Source and Real-World Data

Our approach was guided by two principles: build on the shoulders of giants and root our work in authentic data.

  1. An Open-Source First Philosophy: We knew we couldn't reinvent the wheel. We leveraged powerful, open-source models as our foundation. Our core technology stack includes:

    • Speech Recognition: We fine-tuned models like Whisper (OpenAI) and wav2vec 2.0 (Meta), which have shown remarkable capabilities in multilingual and low-resource settings.
    • Translation & Summarization: We utilized transformer-based models like NLLB-200 (Meta) and mBART, specifically designed for high-quality multilingual translation.
    • Sentiment Analysis: We started with a lightweight and efficient model like DistilBERT to analyze the tone of Swahili text.
    • Backend & API: We chose FastAPI for its high performance and ease of serving machine learning models, creating a modular backend that could scale.
  2. A Culturally-Aware Data Strategy: We recognized that models are only as good as their data. The biggest challenge for African languages is data scarcity. We tackled this by:

    • Starting with the Mozilla Common Voice Swahili dataset, a crucial crowdsourced resource.
    • Enriching this with a large corpus of public domain Swahili texts, including books, news articles, and government publications. This was vital for teaching our models the nuances of formal, narrative, and conversational language, something a sterile dataset could never capture.
    • Designing a future-proof system for community-driven data contribution, allowing users to help us improve the models over time.
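
A cleaning and de-duplication pass over mixed text sources, of the kind this data strategy relies on, could be sketched as follows. This is a simplified illustration under assumptions, not the project's actual pipeline: the function names `normalize_text` and `clean_corpus`, the three-word minimum, and the sample sentences are all hypothetical.

```python
import hashlib
import re
import unicodedata


def normalize_text(line: str) -> str:
    """Normalize Unicode, strip control characters, and collapse whitespace."""
    text = unicodedata.normalize("NFC", line)
    # Drop control/format characters (Unicode categories starting with "C"),
    # but keep ordinary whitespace so words stay separated.
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(lines, min_words=3):
    """Yield normalized, de-duplicated lines with at least `min_words` words."""
    seen = set()
    for raw in lines:
        text = normalize_text(raw)
        if len(text.split()) < min_words:
            continue  # too short to be a useful sentence
        # Hash a case-folded copy so lines differing only in case collide.
        key = hashlib.sha1(text.casefold().encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        yield text


# Hypothetical sample lines: a duplicate, a fragment, and a hidden character.
sample = [
    "Habari  ya   asubuhi, rafiki yangu.",
    "habari ya asubuhi, rafiki yangu.",
    "Sawa",
    "Serikali imetangaza\u200b mpango mpya wa afya.",
]
cleaned = list(clean_corpus(sample))
for line in cleaned:
    print(line)
```

Exact-match hashing only catches duplicates that are identical after normalization; near-duplicates (for example, the same news story with a different byline) would need a fuzzier technique.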

Our development process is phased, moving from a Proof of Concept (PoC) to validate our core pipeline, through a Pilot Deployment with real users, and ultimately to a Full, Scalable Deployment.

The Challenges We Faced: Thriving Under Constraints

This project has been a constant exercise in solving problems under constraints, which is the very essence of the Africa Deep Tech Challenge.

  • The Data Desert: Our primary challenge was the scarcity of high-quality, labeled Swahili data. Unlike English, we couldn't simply download a massive, perfectly curated dataset. We had to become data archeologists, piecing together corpora from diverse sources and building robust cleaning and preprocessing pipelines to make it usable.

  • The Behemoth Model Problem: State-of-the-art models are computationally expensive. Running them in real time on a standard mobile device in a low-bandwidth environment seemed impossible, and our initial prototypes suffered from significant latency. This forced us to go beyond just using the models and to dive deep into optimization, exploring techniques like ONNX/TensorRT, quantization, and model distillation to create lightweight versions that don't sacrifice too much accuracy.

  • The Ethical Tightrope of Bias: We quickly realized that our models could inherit and amplify biases present in the data. For example, a model trained primarily on formal news articles might perform poorly on regional dialects or misinterpret gendered language. We are actively working to mitigate this through diverse data sampling and by building feedback loops for users to report errors. We measure fairness alongside accuracy, knowing that a model isn't truly working if it doesn't work for everyone. Success means not just a low Word Error Rate (WER), but equitable performance across demographics. We evaluate our classifiers not just on accuracy, but on metrics like the $F_1$ score, which balances precision and recall and is especially informative for minority classes:

    $$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
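
To make "equitable performance across demographics" concrete, the sketch below computes Word Error Rate separately for each speaker group, along with an $F_1$ helper. It is a minimal illustration, not the project's evaluation code: the group labels (`coastal`, `inland`), the sample transcripts, and the function names are hypothetical, and WER is taken in its usual form as word-level edit distance divided by the number of reference words.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance between reference and hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: ref word missing from hyp
                curr[j - 1] + 1,     # insertion: extra word in hyp
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Return {group: WER}, pooling errors and reference words per group."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        errors[group] += word_errors(reference, hypothesis)
        words[group] += len(reference.split())
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical (group, reference transcript, model output) triples.
samples = [
    ("coastal", "ninaenda sokoni leo", "ninaenda sokoni leo"),
    ("coastal", "maji ni uhai", "maji ni uhai"),
    ("inland", "ninaenda sokoni leo", "ninaenda shuleni leo"),
    ("inland", "maji ni uhai", "maji uhai"),
]
rates = wer_by_group(samples)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

Reporting the gap between the best and worst group next to the overall WER keeps a strong average from hiding a group the model serves poorly.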

What We Learned: Technology in Service of Humanity

This journey has taught us more than just how to fine-tune a model.

  1. Context is King: The best technology is useless if it doesn't fit the context of its users. Building for Africa requires a relentless focus on offline capability, low-power consumption, and mobile-first design.
  2. Community is the Best Dataset: While web scraping and public datasets are useful, the richest, most nuanced data comes from the community itself. Our future success depends on building a system where users are not just consumers, but co-creators.
  3. Constraints Breed Creativity: The limitations we faced forced us to be more innovative. Instead of throwing more compute power at a problem, we had to find smarter, more efficient solutions. This challenge has solidified our belief that resource-constrained environments are the most fertile ground for true innovation.
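
As one example of a "smarter, more efficient" technique, the quantization mentioned earlier can be illustrated in a few lines. The sketch below shows symmetric 8-bit quantization of a weight list in plain Python. It is a teaching illustration only, not the ONNX or TensorRT API and not the project's actual optimization code; the function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The largest-magnitude weight maps to +/-127; all-zero input uses scale 1.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights standing in for one small slice of a model layer.
weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of the four used by a 32-bit float, at the cost of a rounding error of at most half the scale per weight. Very small weights, such as `0.003` here, collapse to zero, which is one way quantization can cost accuracy.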

Sema Sasa is our answer to the challenge of building deep tech for Africa. It’s a testament to the idea that by embracing our constraints, we can build solutions that are not only technologically advanced but also deeply human and truly inclusive.

Built With

  • distilbert
  • fastapi
  • mbart
  • python
  • wav2vec2.0(meta)
  • whisper(openai)