CallWiz: AI Call Center for Vietnam

Inspiration

Over 100 million Vietnamese speakers are underserved by AI call center solutions. Most products focus on English, leaving Vietnamese customers frustrated with robotic responses that don't understand their language, accents, or cultural context.

We set out to build something different: an AI call center that truly speaks Vietnamese — not just translates, but understands and responds naturally like a native speaker.


What We Built

CallWiz is a fully AI-powered call center platform featuring:

  • No-Code Flow Designer — Create complex call scenarios with just natural language prompts
  • Natural Vietnamese conversations — Customers speak normally, AI understands and responds fluently
  • Real-time monitoring — Supervisors watch live calls with instant Vietnamese ↔ English translation
  • Smart human handoff — AI recognizes its limits and transfers to human agents seamlessly
  • Automatic form filling — Collected data exports directly to Excel templates

Prompt-to-Flow: Empowering Non-Technical Users

What sets our solution apart: anyone can create AI call scenarios without writing a single line of code.

A bank operations manager simply types:

"Create a flow for customers reporting lost credit cards. Verify their ID and last 4 digits of card number. If verified, lock the card immediately. If verification fails twice, transfer to human agent."

Qwen3-Max instantly generates a complete conversation flow — with greeting scripts, verification logic, error handling, and escalation rules. The visual editor lets users refine with drag-and-drop, or simply type more prompts like "Add a step to ask for callback number".
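
As a rough illustration of what a generated flow for the lost-card prompt above might look like, the sketch below models it as plain data plus a tiny interpreter. The node names, dictionary keys, and the `run_flow` helper are hypothetical; CallWiz's actual flow format is not shown in this write-up.

```python
# Hypothetical flow representation, not CallWiz's real schema.
LOST_CARD_FLOW = {
    "start": "verify",
    "nodes": {
        "verify": {
            "say": "Please give your ID number and the last 4 digits of your card.",
            "on_success": "lock_card",
            "on_failure": "verify",        # ask again
            "max_failures": 2,             # "fails twice" in the prompt
            "on_max_failures": "handoff",
        },
        "lock_card": {"say": "Your card is now locked.", "terminal": True},
        "handoff": {"say": "Transferring you to a human agent.", "terminal": True},
    },
}


def run_flow(flow, verification_results):
    """Return the list of visited node names.

    verification_results holds one boolean per verification attempt
    (True means the ID and the last 4 digits both matched).
    """
    results = iter(verification_results)
    name = flow["start"]
    path = [name]
    failures = 0
    while not flow["nodes"][name].get("terminal", False):
        node = flow["nodes"][name]
        if next(results):
            name = node["on_success"]
        else:
            failures += 1
            if failures >= node["max_failures"]:
                name = node["on_max_failures"]
            else:
                name = node["on_failure"]
        path.append(name)
    return path


print(run_flow(LOST_CARD_FLOW, [False, True]))   # ['verify', 'verify', 'lock_card']
print(run_flow(LOST_CARD_FLOW, [False, False]))  # ['verify', 'verify', 'handoff']
```

Keeping the flow as data rather than code is what would let a visual editor or a follow-up prompt add or rewire a step without touching the interpreter.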

Why this matters for banks:

  • Dozens of scenarios, zero engineering cost — Loan applications, account updates, fraud reports, appointment scheduling
  • Business teams own their flows — No waiting for IT, no developer bottlenecks
  • Rapid iteration — Test a new script in minutes, not weeks
  • Consistency across branches — Same AI, same quality, every call

[Screenshot: Flow Designer]

Batch Calling – Reduce workload with one click

[Screenshots: Batch 1, Batch 2]

Powered by Qwen & Alibaba Cloud

Our solution is built entirely on the Alibaba Cloud AI ecosystem, leveraging cutting-edge Qwen models and cloud infrastructure.

Qwen3-Max — The Brain

Qwen3-Max via DashScope serves as our core reasoning engine:

  • Intelligent conversation management — Understands customer intent even with unclear expressions, slang, or regional dialects
  • Multi-step reasoning — Handles complex requests: identity verification → information lookup → action execution
  • Tool-calling capability — Extracts structured data from natural speech and fills forms automatically
  • Flow compilation — Transforms visual flow designs into optimized conversation prompts
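
To make the tool-calling bullet concrete, here is a minimal sketch of a tool definition in the JSON-schema style commonly used by function-calling chat APIs, together with a local check of the arguments a model might return. The tool name, field names, and the `apply_tool_call` helper are assumptions for illustration; the write-up does not show CallWiz's real tool definitions.

```python
import json

# Hypothetical tool definition in the common JSON-schema function-calling style.
LOST_CARD_TOOL = {
    "type": "function",
    "function": {
        "name": "fill_lost_card_form",
        "description": "Record the details a customer gives when reporting a lost card.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "card_last4": {"type": "string"},
                "callback_number": {"type": "string"},
            },
            "required": ["customer_id", "card_last4"],
        },
    },
}


def apply_tool_call(tool, arguments_json):
    """Parse and validate the JSON argument string of a model tool call.

    Returns the form fields as a dict, or raises ValueError if the
    arguments do not match the tool's declared parameters.
    """
    params = tool["function"]["parameters"]
    args = json.loads(arguments_json)
    missing = [key for key in params["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = [key for key in args if key not in params["properties"]]
    if unknown:
        raise ValueError(f"unexpected fields: {unknown}")
    last4 = args["card_last4"]
    if not (last4.isdigit() and len(last4) == 4):
        raise ValueError("card_last4 must be exactly 4 digits")
    return args


# Example argument string, as a model might emit it after hearing the customer.
form = apply_tool_call(LOST_CARD_TOOL, '{"customer_id": "012345678", "card_last4": "4821"}')
print(form)
```

Validating arguments before they reach a form or a banking action is a design choice worth keeping even when the model is reliable, since speech recognition errors can still produce malformed values.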

Custom Vietnamese STT — Trained to Listen

We integrated a Vietnamese-optimized Speech-to-Text model specifically trained for:

  • Vietnamese phonetics — Recognizes all 6 tones accurately (sắc, huyền, hỏi, ngã, nặng, ngang)
  • Regional accents — Understands Northern, Central, and Southern Vietnamese dialects
  • Real-world audio — Handles background noise, phone quality, and natural speech patterns
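
A small sketch of why tone accuracy matters for transcription: the six tones listed above turn the syllable "ma" into six different words, and a system that loses the tone marks cannot tell them apart. The helper name `strip_diacritics` is our own; it uses only Python's standard `unicodedata` module and says nothing about how the STT model works internally.

```python
import unicodedata

# One word per tone, all built on the same syllable.
MA_WORDS = {
    "ma": ("ngang", "ghost"),
    "má": ("sắc", "mother"),
    "mà": ("huyền", "but"),
    "mả": ("hỏi", "tomb"),
    "mã": ("ngã", "horse"),
    "mạ": ("nặng", "rice seedling"),
}


def strip_diacritics(text):
    """Remove all combining marks (tone marks included) from the text."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


for word, (tone, meaning) in MA_WORDS.items():
    print(f"{word} ({tone}): {meaning} -> without tone: {strip_diacritics(word)}")
```

All six entries collapse to the same string once the marks are removed, which is the ambiguity a tone-aware transcript avoids.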

PAI-EAS OmniVoice — Trained to Speak

We deployed OmniVoice on Alibaba Cloud PAI-EAS for natural Vietnamese text-to-speech:

  • Custom voice cloning — Created a natural Vietnamese female voice from just 10 seconds of audio
  • Tone-perfect pronunciation — Correctly pronounces Vietnamese tones that other TTS engines struggle with
  • Ultra-low latency — First response in ~200ms, enabling real-time conversation flow
  • Emotional expression — Adjusts tone for empathy, urgency, or reassurance based on context

Qwen MT Flash — Real-time Translation

Qwen MT Flash enables instant Vietnamese ↔ English translation:

  • Sub-500ms latency — Supervisors see English translations as customers speak Vietnamese
  • Context-aware translation — Understands banking terminology and customer service phrases
  • Streaming output — Translations appear word-by-word for long responses
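
The streaming behavior can be pictured with a short sketch: each chunk that arrives from the translation model is appended to what the supervisor already sees. The `accumulate_stream` helper and the sample chunks are hypothetical and stand in for whatever streaming client the dashboard actually uses.

```python
from typing import Iterable, Iterator


def accumulate_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the growing translation after each incoming chunk."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield text


# Simulated chunks; a real client would receive these over the network.
for partial in accumulate_stream(["Your ", "card ", "is locked."]):
    print(partial)
```

Rendering each partial result as it arrives is what keeps the perceived delay low for long responses, since the reader does not wait for the full sentence.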

Alibaba Cloud — The Complete AI Infrastructure

Our entire solution runs on Alibaba Cloud, providing enterprise-grade reliability and seamless integration:

Service   | Role in Our Solution
DashScope | Powers Qwen3-Max (conversation) + Qwen MT Flash (translation)
PAI-EAS   | Hosts custom OmniVoice TTS model with auto-scaling
ECS       | Application servers in Bangkok region (low latency to Vietnam)

Why Alibaba Cloud?

  • Unified AI ecosystem — Qwen models + PAI deployment + cloud infra work together seamlessly
  • Southeast Asia presence — Bangkok data center ensures <50ms latency to Vietnam
  • Enterprise compliance — Meets banking security and data residency requirements
  • Cost efficiency — Pay-per-use for AI inference, auto-scaling for traffic spikes

Challenges We Faced

Vietnamese Language Complexity and Response Latency

  • Vietnamese has 6 tones that completely change word meaning. "Ma" can mean ghost, mother, horse, rice seedling, tomb, or "but" depending on tone. Standard AI models struggle with this.
  • Voice conversations require sub-second response times. Any delay feels unnatural.
  • Customers speak in slang, abbreviations, and regional dialects that are hard for generic models to understand.
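
The sub-second requirement can be read as a latency budget shared by every stage of a voice turn. The sketch below adds up one such budget. The network and TTS figures are the ones quoted in this write-up (<50 ms and ~200 ms); the STT and LLM figures are placeholder assumptions, not measurements from CallWiz.

```python
BUDGET_MS = 1000  # "sub-second" target for one conversational turn

STAGES_MS = {
    "network_round_trip": 50,     # upper bound quoted for the Bangkok region
    "stt_final_transcript": 300,  # assumption, not a measured value
    "llm_first_token": 400,       # assumption, not a measured value
    "tts_first_audio": 200,       # approximate figure quoted for OmniVoice
}


def remaining_budget(stages, budget_ms=BUDGET_MS):
    """Return how many milliseconds are left after all stages run in sequence."""
    return budget_ms - sum(stages.values())


print(f"total: {sum(STAGES_MS.values())} ms, headroom: {remaining_budget(STAGES_MS)} ms")
```

With these illustrative numbers the stages sum to 950 ms, leaving 50 ms of headroom, which shows how little slack a sequential pipeline has and why streaming between stages helps.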

What We Learned

  • Qwen3-Max excels at Vietnamese — Tool-calling works reliably even with colloquial speech
  • PAI-EAS is production-ready — Custom model deployment in minutes, scales automatically
  • The Alibaba Cloud ecosystem is cohesive — DashScope, PAI-EAS, and ECS integrate seamlessly
  • No-code is the key to adoption — Banks embrace AI when business teams control it

What's Next

More Natural Vietnamese Voices

We plan to train multiple voice personas on PAI-EAS:

  • Professional female — For banking and formal services
  • Friendly male — For customer support and casual interactions
  • Regional accents — Northern, Central, Southern Vietnamese options
  • Emotional range — Happy, empathetic, apologetic tones for different scenarios

SIP Trunking Integration — Real Phone Calls

Currently our solution works via WebRTC (browser/app). Next phase: direct integration with Vietnam telecom networks using Alibaba Cloud SIP Trunking:

  • Inbound calls — Customers dial a hotline number, AI answers automatically
  • Outbound campaigns — AI calls customers for appointment reminders, surveys, payment notifications
  • PSTN connectivity — Works with any phone, no app installation required
  • Alibaba Cloud Voice Service — Scalable, reliable, integrated with our existing infrastructure

This would take our solution from a WebRTC demo to a production-ready call center replacement.


Tech Stack

Component             | Technology
Reasoning & Dialog    | Qwen3-Max (DashScope)
Speech-to-Text        | Custom Vietnamese STT
Text-to-Speech        | OmniVoice (deployed on PAI-EAS)
Translation           | Qwen MT Flash
Infrastructure        | Alibaba Cloud ECS
Real-time Audio       | LiveKit + WebRTC
Frontend              | React + TypeScript
Open-source Libraries | Ant Design, React Flow, Axios, LiveKit

The Vision

"Every Vietnamese deserves a call center that actually understands them."

With Qwen and Alibaba Cloud, we're making that vision a reality — one natural conversation at a time.
