Inspiration
Shinhan Bank Vietnam's consumer lending book has grown rapidly, and with it, the operational cost of collections and outbound sales. Today, human agents handle every early-stage delinquency reminder and cross-sell call—a linear cost that caps growth. Furthermore, most off-the-shelf voice-bot products are English-first. Vietnamese TTS/ASR from Western providers sounds stilted, misses regional pronunciation, and handles the formal/informal register switch poorly—a non-starter for a premium retail bank.
We built Shinhan MAI for the Qwen AI Build Day 2026 to directly address Shinhan InnoBoost 2026 Use Case #4 (AI Call Bot for Collections & Sales).
What it does
Shinhan MAI is a single, Qwen-powered Vietnamese voice agent that:
- Places outbound calls for early-delinquency payment reminders and cross-selling of eligible products.
- Handles conversations natively in Vietnamese—mastering polite registers (anh/chị/em), face-saving objection handling ("lương chưa về"), and a bank-appropriate tone.
- Uses function calling to autonomously look up customer context, propose structured payment plans, log outcomes to the CRM, and escalate to a human when needed.
- Streams to a real-time supervisor dashboard featuring a live bilingual transcript, a tool-call feed, and call outcomes for human monitoring.
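The function-calling behaviour listed above can be illustrated with a minimal sketch. The tool names (lookupCustomer, proposePaymentPlan, escalateToHuman), their fields, and the mock CRM record are all hypothetical, and the declaration shape assumes the widely used JSON-Schema "function" format rather than anything confirmed from the project's code.

```typescript
// Hypothetical tool declaration in the common JSON-Schema "function" shape.
type ToolDeclaration = {
  type: "function";
  function: { name: string; description: string; parameters: object };
};

const tools: ToolDeclaration[] = [
  {
    type: "function",
    function: {
      name: "proposePaymentPlan",
      description: "Split a customer's overdue amount into equal installments.",
      parameters: {
        type: "object",
        properties: {
          customerId: { type: "string" },
          installments: { type: "integer", minimum: 1 },
        },
        required: ["customerId", "installments"],
      },
    },
  },
];

// Mock CRM data standing in for real delinquency records (assumption).
type Customer = { name: string; overdueVnd: number; daysPastDue: number };
const mockCrm: Record<string, Customer> = {
  C001: { name: "Nguyen Van A", overdueVnd: 6000000, daysPastDue: 12 },
};

// A tool call as emitted by the model: arguments arrive as a JSON string.
type ToolCall = { name: string; arguments: string };

const handlers: Record<string, (args: any) => unknown> = {
  lookupCustomer: ({ customerId }) => mockCrm[customerId] ?? { error: "not_found" },
  proposePaymentPlan: ({ customerId, installments }) => {
    const customer = mockCrm[customerId];
    if (!customer) return { error: "not_found" };
    return {
      installments,
      amountPerInstallmentVnd: Math.ceil(customer.overdueVnd / installments),
    };
  },
  escalateToHuman: ({ reason }) => ({ escalated: true, reason }),
};

// Routes one tool call to its handler and returns a JSON string for the model.
function dispatchToolCall(call: ToolCall): string {
  const handler = handlers[call.name];
  if (!handler) return JSON.stringify({ error: `unknown tool: ${call.name}` });
  return JSON.stringify(handler(JSON.parse(call.arguments)));
}
```

In a loop of this kind, the string returned by dispatchToolCall would be appended to the conversation as the tool result before the model is asked for its next message.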
How we built it
We built a full-stack web app using Next.js 15 (App Router), React 19, and TypeScript, styled with Tailwind CSS.
At the core, Qwen acts as the load-bearing component:
- Agent Brain: We use qwen-max-latest via Alibaba Cloud Model Studio for streaming reasoning and robust tool-calling.
- Voice Output (TTS): We use qwen3-omni-flash to keep the voice output inside the Qwen family, ensuring premium Vietnamese pronunciation.
- Translations: We use qwen-turbo to generate real-time English subtitles for the supervisor dashboard.
- Voice Input (ASR): We offloaded this to the browser's Web Speech API (vi-VN). Since Qwen-Omni doesn't support tools yet, splitting the ASR off the Qwen path allowed our agent to retain its crucial tool-calling capabilities.
We also used Zod for runtime-safe tool argument validation.
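To show what runtime-safe argument validation guards against, here is a dependency-free sketch. It is a hand-written stand-in, not Zod itself: the result type only imitates the success/failure shape of a schema "safe parse", and the argument fields (customerId, installments) and the 1 to 12 range are assumptions made for illustration.

```typescript
// Result shape imitating a schema "safe parse": either data or an error message.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Hypothetical arguments for a payment-plan tool.
type PaymentPlanArgs = { customerId: string; installments: number };

// Validates the raw JSON string the model produced before any tool runs.
function parsePaymentPlanArgs(rawJson: string): ParseResult<PaymentPlanArgs> {
  let value: unknown;
  try {
    value = JSON.parse(rawJson);
  } catch {
    return { success: false, error: "arguments are not valid JSON" };
  }
  if (typeof value !== "object" || value === null) {
    return { success: false, error: "arguments must be an object" };
  }
  const { customerId, installments } = value as Record<string, unknown>;
  if (typeof customerId !== "string" || customerId.length === 0) {
    return { success: false, error: "customerId must be a non-empty string" };
  }
  if (
    typeof installments !== "number" ||
    !Number.isInteger(installments) ||
    installments < 1 ||
    installments > 12
  ) {
    return { success: false, error: "installments must be an integer from 1 to 12" };
  }
  return { success: true, data: { customerId, installments } };
}
```

Returning a structured error instead of throwing lets the caller feed the message back to the model so it can retry with corrected arguments.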
Challenges we ran into
The biggest challenge was architecting a pipeline that could deliver top-tier Vietnamese voice synthesis while still executing complex, multi-step tool calls (like looking up accounts and proposing payment installments). We solved this by splitting the ASR/TTS and Reasoning layers, keeping Qwen-Max strictly focused on text-based reasoning and function calling, while letting Qwen3-Omni handle the voice synthesis.
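The split described above can be sketched as three stages behind small interfaces, so the speech layers and the reasoning layer can be swapped independently. Every name here (Transcriber, Reasoner, Synthesizer, runTurn, stubStages) is hypothetical, and the stubs only stand in for the Web Speech API, the Qwen-Max call, and the Qwen3-Omni call, none of which are invoked.

```typescript
type Turn = { role: "customer" | "agent"; text: string };

// Speech-to-text stage (the browser's Web Speech API in the real project).
interface Transcriber { transcribe(audio: Uint8Array): Promise<string>; }
// Text-only reasoning and tool-calling stage.
interface Reasoner { reply(history: Turn[]): Promise<string>; }
// Text-to-speech stage.
interface Synthesizer { speak(text: string): Promise<Uint8Array>; }

type Stages = { asr: Transcriber; llm: Reasoner; tts: Synthesizer };

// Runs one conversational turn: audio in, updated history and audio out.
async function runTurn(
  audio: Uint8Array,
  history: Turn[],
  stages: Stages
): Promise<{ history: Turn[]; audio: Uint8Array }> {
  const heard = await stages.asr.transcribe(audio);
  const withCustomer: Turn[] = [...history, { role: "customer", text: heard }];
  // The reasoning stage only ever sees text, so it keeps tool calling available.
  const answer = await stages.llm.reply(withCustomer);
  const spoken = await stages.tts.speak(answer);
  return {
    history: [...withCustomer, { role: "agent", text: answer }],
    audio: spoken,
  };
}

// Stub stages for illustration only; they return fixed or trivially derived values.
const stubStages: Stages = {
  asr: { transcribe: async () => "lương chưa về" },
  llm: { reply: async (history) => `ack:${history.length}` },
  tts: { speak: async (text) => new Uint8Array(text.length) },
};
```

Because each stage is injected, a test or an offline demo can pass stubs like these while production wiring passes real clients.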
Accomplishments that we're proud of
- One Agent, Multiple Scenarios: The same agent engine seamlessly handles both tense collection calls and polite cross-selling pitches just by adjusting the initial tool-call context.
- Cultural Nuance: Achieving a natural-sounding Vietnamese banking persona ("Mai") that correctly uses honorifics and empathetic negotiation tactics.
- Demo Resilience: We built a "Demo Mode" with pre-scripted, auto-advancing customer replies, so the live pitch does not depend on venue Wi-Fi or a working microphone.
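A scripted, auto-advancing customer of the kind Demo Mode describes could look roughly like the sketch below. The class and function names, the delay field, and the sample replies are assumptions; the project's actual demo script is not shown in this write-up.

```typescript
type ScriptedReply = { text: string; delayMs: number };

// Hands out pre-scripted customer replies one at a time, in order.
class DemoCustomer {
  private index = 0;
  private readonly script: ScriptedReply[];

  constructor(script: ScriptedReply[]) {
    this.script = script;
  }

  // Returns the next scripted reply, or null once the script is exhausted.
  next(): ScriptedReply | null {
    const reply = this.script[this.index];
    if (reply === undefined) return null;
    this.index += 1;
    return reply;
  }
}

const realSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Plays the whole script, waiting for each reply's delay before delivering it.
// The sleep function is injectable so tests can skip the real waiting.
async function playDemo(
  customer: DemoCustomer,
  onReply: (text: string) => void,
  sleep: (ms: number) => Promise<void> = realSleep
): Promise<number> {
  let delivered = 0;
  for (let reply = customer.next(); reply !== null; reply = customer.next()) {
    await sleep(reply.delayMs);
    onReply(reply.text);
    delivered += 1;
  }
  return delivered;
}
```

Feeding onReply into the same entry point that normally receives transcribed speech would let the rest of the agent run unchanged without microphone input.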
What we learned
We learned how to deeply integrate the Qwen model family via Alibaba Cloud's DashScope API, utilizing different models for their specific strengths (Max for logic/tools, Omni for voice, Turbo for translation). We also learned a lot about prompting LLMs to maintain strict banking compliance and tone in Vietnamese.
What's next for Shinhan MAI
If selected for Shinhan InnoBoost PoC funding, our roadmap includes:
- Telephony integration: Twilio or a Vietnamese SIP provider (FPT/Viettel) for real outbound dialing.
- Shinhan SOL app integration: Consuming real delinquency queues from Shinhan's core banking instead of mock data.
- Compliance layer: Call recording, consent capture, and SBV-compliant audit logs.
- Expanded use cases: Customer service (SOL app support) and fraud alert confirmations.
Built With
- ai
- alibaba-cloud
- nextjs
- qwen-max
- qwen-omni
- react
- tailwind-css
- typescript
- zod