Call Wiz — AI Call Center for Vietnam
Inspiration
Over 100 million Vietnamese speakers are underserved by AI call center solutions. Most products focus on English, leaving Vietnamese customers frustrated with robotic systems that miss their language, accents, and cultural context.
We set out to build something different: an AI call center that truly speaks Vietnamese — not just translates, but understands and responds naturally like a native speaker.
What We Built
CallWiz is a fully AI-powered call center platform featuring:
- No-Code Flow Designer — Create complex call scenarios with just natural language prompts
- Natural Vietnamese conversations — Customers speak normally, AI understands and responds fluently
- Real-time monitoring — Supervisors watch live calls with instant Vietnamese ↔ English translation
- Smart human handoff — AI recognizes its limits and transfers to human agents seamlessly
- Automatic form filling — Collected data exports directly to Excel templates
Prompt-to-Flow: Empowering Non-Technical Users
The centerpiece of our solution: anyone can create AI call scenarios without writing a single line of code.
A bank operations manager simply types:
"Create a flow for customers reporting lost credit cards. Verify their ID and last 4 digits of card number. If verified, lock the card immediately. If verification fails twice, transfer to human agent."
Qwen3-Max then generates a complete conversation flow with greeting scripts, verification logic, error handling, and escalation rules. Users can refine the flow in the visual editor with drag-and-drop, or simply type further prompts such as "Add a step to ask for callback number".
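To make the idea concrete, here is a minimal TypeScript sketch of how a generated flow for the lost-card prompt could be represented and walked. The `FlowNode` schema, the node ids, and the `tracePath` helper are all hypothetical and chosen for illustration; they are not CallWiz's actual flow format.

```typescript
// Hypothetical flow schema for illustration; not CallWiz's actual format.
type SayNode = { id: string; kind: "say"; text: string; next: string };
type VerifyNode = {
  id: string;
  kind: "verify";
  fields: string[];
  maxAttempts: number;
  onSuccess: string;
  onFailure: string;
};
type ActionNode = { id: string; kind: "action"; name: string; next: string };
type HandoffNode = { id: string; kind: "handoff"; reason: string };
type EndNode = { id: string; kind: "end" };
type FlowNode = SayNode | VerifyNode | ActionNode | HandoffNode | EndNode;

// The lost-card scenario from the example prompt, written out as nodes.
const lostCardFlow: FlowNode[] = [
  { id: "greet", kind: "say", text: "Hello, I can help with your lost card.", next: "verify" },
  {
    id: "verify",
    kind: "verify",
    fields: ["idNumber", "cardLast4"],
    maxAttempts: 2,
    onSuccess: "lock",
    onFailure: "human",
  },
  { id: "lock", kind: "action", name: "lockCard", next: "done" },
  { id: "human", kind: "handoff", reason: "verification failed twice" },
  { id: "done", kind: "end" },
];

function getNode(flow: FlowNode[], id: string): FlowNode {
  for (const node of flow) {
    if (node.id === id) return node;
  }
  throw new Error(`Unknown node id: ${id}`);
}

// Decide where to go after `node`. `verifyResults` is consumed one entry per
// verification attempt; a missing entry counts as a failed attempt.
function nextNodeId(node: FlowNode, verifyResults: boolean[]): string | undefined {
  switch (node.kind) {
    case "say":
    case "action":
      return node.next;
    case "verify": {
      for (let attempt = 0; attempt < node.maxAttempts; attempt++) {
        if (verifyResults.shift() === true) return node.onSuccess;
      }
      return node.onFailure;
    }
    case "handoff":
    case "end":
      return undefined; // terminal nodes stop the walk
  }
}

// Return the ids of the nodes visited for a given sequence of verification
// outcomes. Assumes the flow has no cycles.
function tracePath(flow: FlowNode[], startId: string, verifyResults: boolean[]): string[] {
  const remaining = [...verifyResults]; // copy so the caller's array is untouched
  const path: string[] = [];
  let currentId: string | undefined = startId;
  while (currentId !== undefined) {
    const node: FlowNode = getNode(flow, currentId);
    path.push(node.id);
    currentId = nextNodeId(node, remaining);
  }
  return path;
}
```

With this sketch, `tracePath(lostCardFlow, "greet", [false, true])` visits `greet`, `verify`, `lock`, `done`, while two failed attempts end at the `human` handoff node. Keeping the flow as plain data is what would let a visual editor and a prompt-based generator edit the same structure.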
Why this matters for banks:
- Dozens of scenarios, zero engineering cost — Loan applications, account updates, fraud reports, appointment scheduling
- Business teams own their flows — No waiting for IT, no developer bottlenecks
- Rapid iteration — Test a new script in minutes, not weeks
- Consistency across branches — Same AI, same quality, every call
Batch Calling – Reduce workload with one click

Powered by Qwen & Alibaba Cloud
Our solution is built entirely on the Alibaba Cloud AI ecosystem, leveraging cutting-edge Qwen models and cloud infrastructure.
Qwen3-Max — The Brain
Qwen3-Max via DashScope serves as our core reasoning engine:
- Intelligent conversation management — Understands customer intent even with unclear expressions, slang, or regional dialects
- Multi-step reasoning — Handles complex requests: identity verification → information lookup → action execution
- Tool-calling capability — Extracts structured data from natural speech and fills forms automatically
- Flow compilation — Transforms visual flow designs into optimized conversation prompts
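The tool-calling step can be sketched as follows. This assumes the widely used OpenAI-style function-calling format for the tool definition; the source does not say which format CallWiz uses. The tool name `fill_lost_card_form`, the field names, and both helper functions are hypothetical. The sketch only shows how arguments returned by the model might be validated and merged into a form; it makes no network calls.

```typescript
// Hypothetical tool definition in the OpenAI-style function-calling format.
const fillFormTool = {
  type: "function",
  function: {
    name: "fill_lost_card_form",
    description: "Record details the caller has provided about a lost card.",
    parameters: {
      type: "object",
      properties: {
        fullName: { type: "string" },
        cardLast4: { type: "string", pattern: "^[0-9]{4}$" },
        callbackNumber: { type: "string" },
      },
    },
  },
};

type LostCardForm = {
  fullName?: string;
  cardLast4?: string;
  callbackNumber?: string;
};

// Merge the JSON arguments of one tool call into the form.
// Values that fail validation are ignored so the dialog can ask again.
function applyToolCall(form: LostCardForm, argumentsJson: string): LostCardForm {
  const parsed: unknown = JSON.parse(argumentsJson);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Tool arguments must be a JSON object");
  }
  const args = parsed as Record<string, unknown>;
  const next: LostCardForm = { ...form };
  if (typeof args["fullName"] === "string") {
    next.fullName = args["fullName"].trim();
  }
  const last4 = args["cardLast4"];
  if (typeof last4 === "string" && /^[0-9]{4}$/.test(last4)) {
    next.cardLast4 = last4;
  }
  if (typeof args["callbackNumber"] === "string") {
    next.callbackNumber = args["callbackNumber"].trim();
  }
  return next;
}

// List the fields the conversation still has to collect.
function missingFields(form: LostCardForm): string[] {
  const keys: (keyof LostCardForm)[] = ["fullName", "cardLast4", "callbackNumber"];
  return keys.filter((key) => form[key] === undefined);
}
```

Validating on the application side, rather than trusting the model's output, means a misheard card number never reaches the form; `missingFields` then tells the dialog which question to ask next.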
Custom Vietnamese STT — Trained to Listen
We integrated a Vietnamese-optimized Speech-to-Text model specifically trained for:
- Vietnamese phonetics — Recognizes all 6 tones accurately (sắc, huyền, hỏi, ngã, nặng, ngang)
- Regional accents — Understands Northern, Central, and Southern Vietnamese dialects
- Real-world audio — Handles background noise, phone quality, and natural speech patterns
PAI-EAS OmniVoice — Trained to Speak
We deployed OmniVoice on Alibaba Cloud PAI-EAS for natural Vietnamese text-to-speech:
- Custom voice cloning — Created a natural Vietnamese female voice from just 10 seconds of audio
- Tone-perfect pronunciation — Correctly pronounces Vietnamese tones that other TTS engines struggle with
- Ultra-low latency — First response in ~200ms, enabling real-time conversation flow
- Emotional expression — Adjusts tone for empathy, urgency, or reassurance based on context
Qwen MT Flash — Real-time Translation
Qwen MT Flash enables instant Vietnamese ↔ English translation:
- Sub-500ms latency — Supervisors see English translations as customers speak Vietnamese
- Context-aware translation — Understands banking terminology and customer service phrases
- Streaming output — Translations appear word-by-word for long responses
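On the supervisor side, streamed translations have to be assembled into readable caption lines. The sketch below shows one way this could be done. The `TranslationDelta` event shape and the `CaptionBoard` class are assumptions made for illustration; they do not describe the Qwen MT Flash API or CallWiz's actual monitoring code.

```typescript
// Hypothetical event: one partial translation chunk for a given utterance.
type TranslationDelta = { utteranceId: string; delta: string; done: boolean };

type CaptionLine = { text: string; final: boolean };

class CaptionBoard {
  private readonly lines = new Map<string, CaptionLine>();

  // Append a streamed chunk to its utterance and return the text to display.
  push(event: TranslationDelta): string {
    const current: CaptionLine = this.lines.get(event.utteranceId) ?? { text: "", final: false };
    if (current.final) {
      return current.text; // ignore late chunks once the line is complete
    }
    const updated: CaptionLine = { text: current.text + event.delta, final: event.done };
    this.lines.set(event.utteranceId, updated);
    return updated.text;
  }

  // True once the last chunk for this utterance has arrived.
  isFinal(utteranceId: string): boolean {
    const line = this.lines.get(utteranceId);
    return line !== undefined && line.final;
  }
}
```

Keying captions by utterance keeps overlapping customer and AI turns from interleaving on screen, and the `final` flag lets the UI style in-progress text differently from completed lines.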
Alibaba Cloud — The Complete AI Infrastructure
Our entire solution runs on Alibaba Cloud, providing enterprise-grade reliability and seamless integration:
| Service | Role in Our Solution |
|---|---|
| DashScope | Powers Qwen3-Max (conversation) + Qwen MT Flash (translation) |
| PAI-EAS | Hosts custom OmniVoice TTS model with auto-scaling |
| ECS | Application servers in Bangkok region (low latency to Vietnam) |
Why Alibaba Cloud?
- Unified AI ecosystem — Qwen models + PAI deployment + cloud infra work together seamlessly
- Southeast Asia presence — Bangkok data center ensures <50ms latency to Vietnam
- Enterprise compliance — Meets banking security and data residency requirements
- Cost efficiency — Pay-per-use for AI inference, auto-scaling for traffic spikes
Challenges We Faced
Vietnamese Language Complexity
- Vietnamese has 6 tones that completely change word meaning. The syllable "ma" can mean ghost (ma), mother (má), horse (mã), rice seedling (mạ), tomb (mả), or "but" (mà) depending on tone. Generic AI models often struggle with these distinctions.
- Voice conversations require sub-second response times. Any delay feels unnatural.
- Customers speak in slang, abbreviations, and regional dialects that are hard for generic models to understand.
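The tone problem is easy to demonstrate in code. The following TypeScript sketch (the names `maByTone` and `stripDiacritics` are ours, for illustration) shows that a pipeline which drops diacritics, as some text normalizers not built for Vietnamese do, collapses six distinct words into the same string.

```typescript
// The six tonal variants of "ma" and their meanings.
const maByTone: Record<string, string> = {
  "ma": "ghost",         // ngang (level tone, no mark)
  "má": "mother",        // sắc
  "mà": "but",           // huyền
  "mả": "tomb",          // hỏi
  "mã": "horse",         // ngã
  "mạ": "rice seedling", // nặng
};

// Decompose each character (NFD) and remove combining marks.
// Note: this also removes vowel-quality marks (as in â, ă, ơ, ư),
// so it loses even more information than just the tones.
function stripDiacritics(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

const words = Object.keys(maByTone);
const collapsed = new Set(words.map(stripDiacritics));
console.log(`${words.length} words collapse to ${collapsed.size} distinct string(s)`);
```

Any STT or text-processing stage that loses tone marks therefore cannot tell "mother" from "ghost", which is why tone handling has to be preserved end to end.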
What We Learned
- Qwen3-Max excels at Vietnamese — Tool-calling works reliably even with colloquial speech
- PAI-EAS is production-ready — Custom model deployment in minutes, scales automatically
- The Alibaba Cloud ecosystem is cohesive — DashScope, PAI-EAS, and ECS integrate seamlessly
- No-code is the key to adoption — Banks embrace AI when business teams control it
What's Next
More Natural Vietnamese Voices
We plan to train multiple voice personas on PAI-EAS:
- Professional female — For banking and formal services
- Friendly male — For customer support and casual interactions
- Regional accents — Northern, Central, Southern Vietnamese options
- Emotional range — Happy, empathetic, apologetic tones for different scenarios
SIP Trunking Integration — Real Phone Calls
Currently our solution works via WebRTC (browser/app). Next phase: direct integration with Vietnam telecom networks using Alibaba Cloud SIP Trunking:
- Inbound calls — Customers dial a hotline number, AI answers automatically
- Outbound campaigns — AI calls customers for appointment reminders, surveys, payment notifications
- PSTN connectivity — Works with any phone, no app installation required
- Alibaba Cloud Voice Service — Scalable, reliable, integrated with our existing infrastructure
This would take our solution from a demo to a production-ready call center replacement.
Tech Stack
| Component | Technology |
|---|---|
| Reasoning & Dialog | Qwen3-Max (DashScope) |
| Speech-to-Text | Custom Vietnamese STT |
| Text-to-Speech | OmniVoice (Deployed on PAI-EAS) |
| Translation | Qwen MT Flash |
| Infrastructure | Alibaba Cloud ECS |
| Real-time Audio | LiveKit + WebRTC |
| Frontend | React + TypeScript |
| Open-source Libraries | Ant Design, React Flow, Axios, LiveKit |
The Vision
"Every Vietnamese deserves a call center that actually understands them."
With Qwen and Alibaba Cloud, we're making that vision a reality — one natural conversation at a time.

