Gallery
- Drisa_AI architecture diagram
- Desktop view of the Drisa_AI end-user interface (widget)
- Desktop view of the Drisa_AI Admin Dashboard (Setting tab)
- Desktop view of the Drisa_AI Admin Dashboard (Admin Authorize tab)
- Desktop view of the Drisa_AI Admin Dashboard (Leads and Follow-Ups tab)
- Desktop view of the Drisa_AI Admin Dashboard (Call Session History tab)
- Desktop view of the Drisa_AI Admin Dashboard (Knowledge Base tab)
- Desktop view of the Drisa_AI Admin Dashboard (Agent tab)
- Mobile view of the end-user interface (widget) before a call
- Mobile view of the end-user interface (widget) during a call
- Mobile view of the Drisa_AI Admin Dashboard
Drisa_AI Multilingual Support Agent
Inspiration
Across Nigeria and many parts of Africa, millions of people struggle to access digital services because of language barriers, literacy limitations, and a lack of accessible technology interfaces. Most modern platforms, from customer support systems to public service portals, are designed primarily for English-speaking users.
However, a large percentage of Nigerians communicate daily in Hausa, Igbo, Yoruba, and Nigerian Pidgin. When digital systems do not support these languages, people are often excluded from services such as healthcare information, market access, government programs, or business support.
As someone building technology solutions for SMEs and underserved communities, I have frequently encountered individuals who are capable and entrepreneurial but unable to effectively interact with digital systems simply because of language limitations.
This inspired a simple but powerful question:
What if anyone could simply speak to an AI assistant in their own language and receive clear, accurate support instantly?
The result is the Drisa_AI Multilingual Support Agent, an AI-powered voice and chat agent that lets businesses and organizations provide real-time multilingual support through natural conversation.
The system bridges the gap between advanced AI technology and everyday users, ensuring that even people with limited literacy or disabilities can interact with modern services naturally through voice.
What the Project Does
Drisa_AI is a multimodal AI support agent capable of acting as:
- A Customer Support Representative
- A Sales Assistant
- A Business Information Assistant
- A Personal AI Agent
Users can interact with the system through:
- Voice conversations
- Traditional phone calls
- Text chat
The AI agent automatically detects and communicates in:
- English
- Hausa
- Igbo
- Yoruba
- Nigerian Pidgin
The system can also retrieve information from business knowledge bases and perform automated actions such as:
- Querying product catalogs
- Scheduling appointments
- Sending WhatsApp follow-up messages
- Sending email notifications
- Logging customer leads
By combining multilingual AI, voice interaction, and business automation, Drisa_AI allows organizations to provide 24/7 customer support that is accessible to a wider population.
Why This Matters
Language should never be a barrier to accessing technology or essential information.
Across Nigeria, millions of citizens primarily communicate in local languages. However, most digital services and AI tools are designed in English or other international languages.
This creates serious challenges:
- Many people struggle to understand healthcare information.
- Citizens miss government programs or registration deadlines.
- Rural communities cannot easily find nearby services.
- Small businesses lose customers due to poor communication.
- Emergency information becomes difficult to access during critical moments.
When people cannot access clear information, the consequences can include:
- missed healthcare
- failed registrations
- misinformation
- lost economic opportunities
Drisa_AI addresses this problem by enabling people to interact with intelligent systems in their own language and at their own level of literacy.
The solution is especially beneficial for:
- rural communities
- low-literacy populations
- non-English speakers
- people with disabilities
- small businesses that cannot afford large support teams
How I Built the Project
To build the system efficiently while maintaining scalability and low operational costs, I designed the architecture using modern AI tools, cloud-native services, and serverless infrastructure.
Frontend Interface
The user interface was built using React and TypeScript, providing a responsive web interface and embeddable AI widget that can be integrated into any website.
The interface supports:
- voice interaction
- text chat
- real-time conversation streaming
This allows users to communicate naturally with the AI assistant.
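Streaming voice and text between the widget and the backend needs an agreed message format. As a minimal sketch, assuming JSON frames over a WebSocket, the protocol could be typed as a discriminated union with a defensive parser. The message names (`audio_chunk`, `text`, `transcript`, `end`) and the functions `parseClientMessage` and `serializeServerMessage` are hypothetical, not taken from the Drisa_AI source.

```typescript
// Hypothetical wire protocol between the embeddable widget and the backend.
// Message names and fields are illustrative assumptions.
type ClientMessage =
  | { type: "audio_chunk"; data: string } // base64-encoded PCM from the microphone
  | { type: "text"; text: string }        // typed chat message
  | { type: "end" };                      // user closed the conversation

type ServerMessage =
  | { type: "audio_chunk"; data: string } // base64-encoded agent speech
  | { type: "transcript"; role: "user" | "agent"; text: string }
  | { type: "error"; message: string };

// Parse and validate a raw WebSocket frame; returns null for anything malformed.
function parseClientMessage(raw: string): ClientMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const msg = value as Record<string, unknown>;
  switch (msg.type) {
    case "audio_chunk":
      return typeof msg.data === "string" ? { type: "audio_chunk", data: msg.data } : null;
    case "text":
      return typeof msg.text === "string" && msg.text.trim() !== ""
        ? { type: "text", text: msg.text }
        : null;
    case "end":
      return { type: "end" };
    default:
      return null;
  }
}

// Serialize an outgoing message before handing it to socket.send().
function serializeServerMessage(msg: ServerMessage): string {
  return JSON.stringify(msg);
}
```

Rejecting malformed frames at the edge keeps the rest of the backend working with well-typed messages only.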
Backend Infrastructure
The backend is powered by Node.js and Express, acting as a real-time orchestration layer responsible for:
- managing user sessions
- handling WebSocket connections
- processing audio streams
- coordinating AI responses
- triggering external integrations
The backend runs on Google Cloud Run, allowing the system to scale automatically without managing traditional servers.
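Session management on a Cloud Run instance could look roughly like the following sketch, assuming one in-memory registry per instance. Because Cloud Run instances can be shut down at any time, anything that must outlive a call (history, leads) would be persisted to a store such as Firestore. `SessionManager`, `sweepIdle`, and the idle-timeout policy are hypothetical names and choices for illustration.

```typescript
// Hypothetical in-memory session registry for one backend instance.
type Channel = "web" | "phone" | "chat";

interface Session {
  id: string;
  channel: Channel;
  language: string | null; // filled in once the caller's language is detected
  startedAt: number;       // epoch milliseconds
  lastActivityAt: number;
  turns: { role: "user" | "agent"; text: string }[];
}

class SessionManager {
  private sessions = new Map<string, Session>();
  private idleTimeoutMs: number;

  constructor(idleTimeoutMs: number) {
    this.idleTimeoutMs = idleTimeoutMs;
  }

  open(id: string, channel: Channel, now: number): Session {
    const session: Session = { id, channel, language: null, startedAt: now, lastActivityAt: now, turns: [] };
    this.sessions.set(id, session);
    return session;
  }

  recordTurn(id: string, role: "user" | "agent", text: string, now: number): void {
    const session = this.sessions.get(id);
    if (!session) throw new Error(`Unknown session: ${id}`);
    session.turns.push({ role, text });
    session.lastActivityAt = now;
  }

  // Remove a session and hand it back so the caller can persist the transcript.
  close(id: string): Session | undefined {
    const session = this.sessions.get(id);
    this.sessions.delete(id);
    return session;
  }

  // Drop sessions whose socket went quiet without a clean close; returns their ids.
  sweepIdle(now: number): string[] {
    const expired: string[] = [];
    this.sessions.forEach((s, id) => {
      if (now - s.lastActivityAt > this.idleTimeoutMs) expired.push(id);
    });
    for (const id of expired) this.sessions.delete(id);
    return expired;
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

Passing `now` explicitly instead of calling `Date.now()` inside the class keeps the logic deterministic and easy to test.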
Real-Time AI Intelligence
The core intelligence of the system is powered by the Gemini 2.5 Flash Live API, which enables:
- real-time audio processing
- multilingual understanding
- natural conversational responses
- emotion and tone awareness from speech
Unlike traditional pipelines that require separate speech-to-text and text-to-speech models, the Gemini Live API processes audio streams directly, enabling low-latency voice conversations.
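Conceptually, the response pipeline can be summarized as:
\[ \text{Audio}_{\text{input}} \rightarrow \text{Intent}_{\text{understanding}} \rightarrow \text{Tool}_{\text{execution}} \rightarrow \text{Response}_{\text{generation}} \]
Because speech understanding and speech generation happen inside one model session instead of separate services, there are fewer network hops between the user speaking and the agent replying, which reduces response latency.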
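The write-up does not show how the backend talks to the Live API. As a hedged sketch, assuming the raw WebSocket protocol rather than an SDK, the first message configures the session and later messages stream audio. The field names below follow the Live API's `BidiGenerateContent` message reference as I understand it and should be checked against the current docs; the model id is deliberately left as a parameter, and `buildSetupMessage` and `buildAudioInputMessage` are hypothetical helpers.

```typescript
// Sketch of JSON messages a backend might send over a raw Gemini Live API
// WebSocket. Verify field names against the current Live API reference.
interface FunctionDeclaration {
  name: string;
  description: string;
  parameters?: Record<string, unknown>; // OpenAPI-style schema object
}

// First message on the socket: model, output modality, system prompt, tools.
function buildSetupMessage(model: string, systemInstruction: string, functions: FunctionDeclaration[]) {
  return {
    setup: {
      model: model.startsWith("models/") ? model : `models/${model}`,
      generationConfig: { responseModalities: ["AUDIO"] }, // reply with speech, not text
      systemInstruction: { parts: [{ text: systemInstruction }] },
      tools: functions.length > 0 ? [{ functionDeclarations: functions }] : [],
    },
  };
}

// One chunk of microphone audio: base64-encoded 16-bit little-endian PCM.
function buildAudioInputMessage(base64Pcm: string, sampleRate = 16000) {
  return {
    realtimeInput: {
      audio: { data: base64Pcm, mimeType: `audio/pcm;rate=${sampleRate}` },
    },
  };
}
```

In a live session these objects would be passed through `JSON.stringify` and sent on the socket, with the model's audio replies and tool-call requests arriving as separate server messages.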
Tool-Orchestrated Agent System
To allow the AI agent to perform useful tasks, I implemented a tool orchestration layer inside the backend.
The agent can autonomously call specialized tools such as:
- Firestore database queries
- Google Calendar scheduling
- WhatsApp messaging
- Email notifications
This allows the system to behave like a real business assistant rather than a simple chatbot.
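A minimal sketch of such a tool layer, assuming a registry keyed by tool name: each entry pairs the declaration the model sees with the backend handler that performs the action, and errors are returned as data so the model can explain a failure to the user instead of the call crashing. `ToolRegistry`, `dispatch`, and the example tool `log_lead` are hypothetical.

```typescript
// Hypothetical tool registry pairing model-facing declarations with handlers.
type ToolArgs = Record<string, unknown>;

interface Tool {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON-schema-like description for the model
  handler: (args: ToolArgs) => Promise<unknown>;
}

interface ToolCall {
  id: string; // echoed back so the model can match the result to its request
  name: string;
  args: ToolArgs;
}

interface ToolResponse {
  id: string;
  name: string;
  response: Record<string, unknown>;
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) throw new Error(`Duplicate tool: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }

  // Declarations only: handlers never leave the backend.
  declarations(): { name: string; description: string; parameters: Record<string, unknown> }[] {
    return Array.from(this.tools.values()).map(({ name, description, parameters }) => ({
      name,
      description,
      parameters,
    }));
  }

  // Run a model-requested call; failures become data instead of exceptions.
  async dispatch(call: ToolCall): Promise<ToolResponse> {
    const tool = this.tools.get(call.name);
    if (!tool) return { id: call.id, name: call.name, response: { error: `Unknown tool: ${call.name}` } };
    try {
      const result = await tool.handler(call.args);
      return { id: call.id, name: call.name, response: { result } };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { id: call.id, name: call.name, response: { error: message } };
    }
  }
}
```

The same registry could hold the Firestore, Calendar, WhatsApp, and email tools, so adding a capability means registering one more entry.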
Data Storage and Knowledge Retrieval
The system uses Google Firestore and SQLite to store:
- conversation history
- business knowledge bases
- product catalogs
- lead information
A lightweight Retrieval-Augmented Generation (RAG) mechanism lets the AI agent search for relevant knowledge-base entries before generating a response, so answers are grounded in the business's own data.
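The write-up does not say how entries are ranked, so the sketch below substitutes the simplest possible technique: keyword overlap between the question and each entry. An embedding-based search would be a common alternative. `KnowledgeEntry`, `retrieve`, and `buildContext` are hypothetical names.

```typescript
// Minimal keyword-overlap retriever (a stand-in, not necessarily Drisa_AI's method).
interface KnowledgeEntry {
  id: string;
  title: string;
  body: string;
}

// Lowercase and split on whitespace and common punctuation only, so letters
// carrying Yoruba or Igbo tone marks stay inside their words; drop short tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[\s.,;:!?()"']+/)
    .filter((t) => t.length > 2);
}

// Score each entry by how many of its words appear in the query; keep the top k.
function retrieve(query: string, entries: KnowledgeEntry[], k = 3): KnowledgeEntry[] {
  const queryTerms = new Set(tokenize(query));
  return entries
    .map((entry) => {
      const terms = tokenize(`${entry.title} ${entry.body}`);
      const score = terms.filter((t) => queryTerms.has(t)).length;
      return { entry, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.entry);
}

// Format the matches as grounding context placed ahead of the user's question.
function buildContext(matches: KnowledgeEntry[]): string {
  return matches.map((m) => `[${m.id}] ${m.title}: ${m.body}`).join("\n");
}
```

Tagging each snippet with its id makes it easy to log which entries an answer relied on.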
Telephony Integration
To support traditional phone users, I integrated Twilio Media Streams.
This allows users to call a regular phone number and interact with the AI assistant through natural voice conversation.
This feature is particularly important in regions where many people still rely on phone calls rather than web applications.
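Twilio Media Streams sends call audio over a WebSocket as JSON events (`connected`, `start`, `media`, `stop`). A hedged sketch of the handling logic, typing only the fields used here; `handleTwilioEvent`, `mediaMessage`, `clearMessage`, and `CallState` are hypothetical helpers, and the event shapes should be checked against Twilio's Media Streams message reference.

```typescript
// Only the fields used below are typed; Twilio's events carry more.
type TwilioEvent =
  | { event: "connected" }
  | { event: "start"; streamSid: string; start: { callSid: string; mediaFormat: { encoding: string; sampleRate: number } } }
  | { event: "media"; streamSid: string; media: { payload: string } } // base64 μ-law audio
  | { event: "stop"; streamSid: string };

interface CallState {
  streamSid: string | null;
  callSid: string | null;
}

// Returns the base64 μ-law payload for media events (to be transcoded and
// forwarded to the model), or null for control events.
function handleTwilioEvent(state: CallState, raw: string): string | null {
  const msg = JSON.parse(raw) as TwilioEvent;
  switch (msg.event) {
    case "start":
      state.streamSid = msg.streamSid;
      state.callSid = msg.start.callSid;
      return null;
    case "media":
      return msg.media.payload;
    case "stop":
      state.streamSid = null;
      return null;
    default:
      return null;
  }
}

// Agent speech back to the caller on the same stream.
function mediaMessage(streamSid: string, base64Mulaw: string): string {
  return JSON.stringify({ event: "media", streamSid, media: { payload: base64Mulaw } });
}

// Ask Twilio to drop buffered audio, e.g. when the caller interrupts the agent.
function clearMessage(streamSid: string): string {
  return JSON.stringify({ event: "clear", streamSid });
}
```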
External Integrations
The agent integrates with several external services:
- Meta WhatsApp Business API for automated follow-ups
- SMTP email services for notifications
- Google Calendar API for appointment scheduling
These integrations transform the AI assistant into a complete business automation tool.
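As a sketch of what two of these integrations send, the helpers below build request bodies in the shapes documented for the WhatsApp Cloud API (`POST /{phone-number-id}/messages`) and Google Calendar `events.insert`; authentication and the HTTP calls are omitted. The helper names, the `Africa/Lagos` default, and the Nigerian number normalization are illustrative assumptions. Note that WhatsApp only allows free-form text inside the 24-hour customer service window; later follow-ups need an approved template message.

```typescript
// Hypothetical helper: turn a local Nigerian number like 0803 123 4567 into 2348031234567.
function normalizeNigerianNumber(input: string): string {
  const digits = input.replace(/\D/g, "");
  if (digits.startsWith("234")) return digits;
  if (digits.startsWith("0") && digits.length === 11) return `234${digits.slice(1)}`;
  throw new Error(`Unrecognized phone number: ${input}`);
}

// Body for a plain-text WhatsApp Cloud API message.
function whatsappTextMessage(to: string, body: string) {
  return {
    messaging_product: "whatsapp",
    to: normalizeNigerianNumber(to),
    type: "text",
    text: { body },
  };
}

// Body for a Google Calendar event; times are sent as RFC 3339 strings.
function calendarEvent(summary: string, start: Date, durationMinutes: number, attendeeEmail?: string) {
  const end = new Date(start.getTime() + durationMinutes * 60000);
  return {
    summary,
    start: { dateTime: start.toISOString(), timeZone: "Africa/Lagos" },
    end: { dateTime: end.toISOString(), timeZone: "Africa/Lagos" },
    ...(attendeeEmail ? { attendees: [{ email: attendeeEmail }] } : {}),
  };
}
```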
Challenges I Faced
Building a real-time multilingual AI voice system involved several technical challenges.
Real-Time Audio Streaming
Handling continuous audio streams between Twilio, WebSockets, and Gemini Live API required careful orchestration to maintain low latency and stable connections.
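One common latency trade-off in such a bridge is chunk size: forwarding every tiny frame adds per-message overhead, while buffering too long delays the reply. A minimal sketch, assuming audio is grouped into fixed-duration chunks before it is sent to the model; `PcmChunker` and the chunk duration are hypothetical tunables, not values from the project.

```typescript
// Hypothetical fixed-duration chunker for 16-bit PCM samples.
class PcmChunker {
  private buffer: Int16Array;
  private filled = 0;

  constructor(sampleRate: number, chunkMs: number) {
    this.buffer = new Int16Array(Math.round((sampleRate * chunkMs) / 1000));
  }

  // Append samples; returns every complete chunk produced by this call.
  push(samples: Int16Array): Int16Array[] {
    const ready: Int16Array[] = [];
    let offset = 0;
    while (offset < samples.length) {
      const take = Math.min(this.buffer.length - this.filled, samples.length - offset);
      this.buffer.set(samples.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === this.buffer.length) {
        ready.push(this.buffer.slice()); // copy so the internal buffer can be reused
        this.filled = 0;
      }
    }
    return ready;
  }

  // Emit a partial chunk, e.g. when the caller stops speaking or hangs up.
  flush(): Int16Array | null {
    if (this.filled === 0) return null;
    const rest = this.buffer.slice(0, this.filled);
    this.filled = 0;
    return rest;
  }
}
```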
Multilingual Prompt Engineering
Ensuring the agent could switch naturally between five different languages required extensive prompt design and testing.
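The exact prompt is not published, but the rules it has to encode can be sketched. The builder below is illustrative only: the wording, `BusinessProfile`, and the fallback-language rule are assumptions, not Drisa_AI's actual system instruction.

```typescript
// Illustrative system-instruction builder for the five supported languages.
const SUPPORTED_LANGUAGES = ["English", "Hausa", "Igbo", "Yoruba", "Nigerian Pidgin"] as const;
type SupportedLanguage = (typeof SUPPORTED_LANGUAGES)[number];

interface BusinessProfile {
  name: string;
  role: "customer support" | "sales" | "business information";
  fallbackLanguage: SupportedLanguage;
}

function buildSystemInstruction(profile: BusinessProfile): string {
  return [
    `You are the ${profile.role} voice assistant for ${profile.name}.`,
    `You understand and speak: ${SUPPORTED_LANGUAGES.join(", ")}.`,
    "Reply in the language the user last spoke. If they switch languages mid-conversation, switch with them.",
    `If you cannot tell which language the user is speaking, reply in ${profile.fallbackLanguage} and ask which language they prefer.`,
    "Keep answers short and easy to follow when spoken aloud: no lists, links, or markdown.",
    "Only state prices, stock, or policies that come from tool results or the provided knowledge base.",
  ].join("\n");
}
```

Keeping each rule on its own line makes it straightforward to test prompt variants one rule at a time.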
Audio Transcoding
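Twilio Media Streams deliver 8 kHz μ-law (G.711) audio, while the Gemini Live API works with raw 16-bit PCM (16 kHz for input, 24 kHz for output), so audio has to be transcoded and resampled in real time in both directions.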
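A sketch of that bridge, assuming the rates above: standard G.711 μ-law expansion and compression, plus naive linear-interpolation resampling. The base64 payload from Twilio would first be decoded to bytes (for example with Node's `Buffer.from(payload, "base64")`). The function names are hypothetical, and because no low-pass filter is applied, the 24 kHz to 8 kHz step can alias; a production bridge would filter before downsampling.

```typescript
const MULAW_BIAS = 0x84;
const MULAW_CLIP = 32635;

// Decode one μ-law byte to a 16-bit PCM sample (G.711 expansion).
function mulawToPcm(byte: number): number {
  const u = ~byte & 0xff; // μ-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return sign ? -magnitude : magnitude;
}

// Encode one 16-bit PCM sample to a μ-law byte (G.711 compression).
function pcmToMulaw(sample: number): number {
  const sign = sample < 0 ? 0x80 : 0;
  const magnitude = Math.min(Math.abs(sample), MULAW_CLIP) + MULAW_BIAS;
  let exponent = 7;
  // Find the segment: position of the highest set bit among bits 14..7.
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Linear-interpolation resampler (no anti-aliasing filter).
function resampleLinear(input: Int16Array, fromRate: number, toRate: number): Int16Array {
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Int16Array(outLength);
  for (let j = 0; j < outLength; j++) {
    const pos = (j * fromRate) / toRate;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    output[j] = Math.round(input[i0] * (1 - frac) + input[i1] * frac);
  }
  return output;
}

// Caller audio (μ-law, 8 kHz) -> model input (PCM, 16 kHz).
function twilioToModel(mulaw: Uint8Array): Int16Array {
  const pcm8k = Int16Array.from(mulaw, mulawToPcm);
  return resampleLinear(pcm8k, 8000, 16000);
}

// Model output (PCM, 24 kHz) -> caller audio (μ-law, 8 kHz).
function modelToTwilio(pcm24k: Int16Array): Uint8Array {
  const pcm8k = resampleLinear(pcm24k, 24000, 8000);
  return Uint8Array.from(pcm8k, pcmToMulaw);
}
```

μ-law is lossy: a round trip through `pcmToMulaw` and `mulawToPcm` returns a nearby value rather than the exact sample, with larger steps for louder signals.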
Tool Integration
Building a reliable tool orchestration system that allows the AI to safely trigger external APIs required careful design to ensure accuracy and security.
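One safeguard such a design can include is validating every model-requested call before it reaches an external API: allowlist the tool name, require the declared arguments with the right types, and reject anything unexpected. The sketch below assumes a simplified schema format; `validateToolCall` and `ToolSchema` are hypothetical. It uses own-property checks so that names like `toString` cannot slip through via the object prototype.

```typescript
// Hypothetical pre-dispatch guard for model-requested tool calls.
type ArgType = "string" | "number" | "boolean";

interface ToolSchema {
  required: Record<string, ArgType>;
  optional?: Record<string, ArgType>;
}

type Validation = { ok: true; args: Record<string, unknown> } | { ok: false; error: string };

// Own-property check; the `in` operator would also match inherited keys.
function has(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function validateToolCall(schemas: Record<string, ToolSchema>, name: string, args: unknown): Validation {
  if (!has(schemas, name)) return { ok: false, error: `Tool not allowed: ${name}` };
  const schema = schemas[name];
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return { ok: false, error: "Arguments must be an object" };
  }
  const input = args as Record<string, unknown>;
  const allowed: Record<string, ArgType> = { ...schema.optional, ...schema.required };
  // Reject arguments the schema does not declare.
  for (const key of Object.keys(input)) {
    if (!has(allowed, key)) return { ok: false, error: `Unexpected argument: ${key}` };
  }
  // Check presence of required arguments and the type of every supplied one.
  for (const [key, type] of Object.entries(allowed)) {
    const value = input[key];
    if (value === undefined) {
      if (has(schema.required, key)) return { ok: false, error: `Missing argument: ${key}` };
      continue;
    }
    if (typeof value !== type) return { ok: false, error: `Argument ${key} must be a ${type}` };
  }
  return { ok: true, args: input };
}
```

Returning a structured error instead of throwing lets the agent tell the caller what went wrong and ask for the missing detail.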
What I Learned
This project reinforced several key insights about building real-world AI systems.
AI Becomes Powerful When It Is Inclusive
Many AI tools focus on performance and benchmarks, but real impact happens when AI removes barriers for underserved populations.
Even simple multilingual support can dramatically expand access to technology.
Voice Interfaces Are the Future of Accessibility
For many users, especially those with limited literacy or disabilities, voice interaction is far more natural than typing.
Designing voice-first AI systems can unlock access to digital services for millions of people.
Agent Architectures Enable Scalable AI Systems
Using an agent-based architecture with tool orchestration allows AI systems to move beyond simple chatbots and perform real-world tasks such as scheduling, notifications, and data retrieval.
Serverless Infrastructure Simplifies Scaling
By using Google Cloud Run and Firestore, the system can scale dynamically without heavy infrastructure management.
This makes it possible to build powerful AI platforms even with limited resources.
Looking Forward
Drisa_AI is designed as the foundation for a broader AI platform for multilingual digital services across Africa.
Future expansions could include:
- additional African languages
- industry-specific AI agents
- integrations with healthcare, agriculture, and government services
- a SaaS platform for SMEs
The long-term vision is simple:
A world where access to technology and information is not limited by language, literacy, or disability.
By enabling people to interact with AI in their own language, Drisa_AI moves us one step closer to that future.
Built With
- gemini-api
- google-ai-studio
- google-cloud-run
- google-firebase
- javascript
- meta-whatsapp-business-api
- node.js
- react
- twilio-voice-api
- typescript
- websockets