CESTestSuiteAgent: Conversational AI QA Platform

Inspiration

Every enterprise deploying a conversational AI agent faces the same invisible problem: how do you know your bot actually works? Manual QA is slow, inconsistent, and doesn't scale. When a Dialogflow CX agent handles thousands of customer interactions daily, a single broken intent or a missed edge case can cascade into real business damage — failed bookings, unresolved complaints, frustrated customers.

We were inspired by the gap between how much effort goes into building conversational agents versus how little tooling exists for testing them rigorously. Traditional software has mature CI/CD test frameworks. Conversational AI has... spreadsheets and hope.

We wanted to change that.

What it does

CESTestSuiteAgent is a full-stack, AI-powered quality assurance platform purpose-built for Dialogflow CX conversational agents. It gives QA teams and developers a single control plane to:

🎯 Simulate conversations in real time with intent detection visualization, parameter extraction, and page flow tracking
📦 Run bulk test suites via CSV upload with concurrent execution, progress tracking, and pass/fail reporting
🛡️ Scan for vulnerabilities — automated prompt injection and jailbreak detection, safety scoring, and adversarial input analysis
📊 Track coverage — intent coverage, page coverage, untested flow identification, and gap analysis
📈 Monitor quality trends — customer satisfaction scoring, First Contact Resolution (FCR) tracking, escalation rates, and historical analytics
🔍 Analyze conversations — automated insights from multi-turn sessions with Gemini-powered evaluation
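
As a rough illustration of how the coverage and gap-analysis numbers above could be derived, here is a minimal TypeScript sketch. The `TurnResult` shape, the field names, and both function names are assumptions made for this example, not the platform's actual code.

```typescript
// Hypothetical shape of one executed test turn; field names are assumptions.
interface TurnResult {
  matchedIntent: string | null; // intent the agent matched, or null for no match
  page: string;                 // page the agent was on after the turn
}

interface CoverageReport {
  intentCoverage: number;    // fraction of known intents hit at least once (0..1)
  pageCoverage: number;      // fraction of known pages visited at least once (0..1)
  untestedIntents: string[]; // the gap list for intents
  untestedPages: string[];   // the gap list for pages
}

// Returns the fraction of `known` items present in `seen`, plus the missed items.
// An empty catalogue is treated as fully covered.
function coverageOf(known: string[], seen: Set<string>): { ratio: number; missing: string[] } {
  const unique = Array.from(new Set(known));
  const missing = unique.filter((item) => !seen.has(item));
  const ratio = unique.length === 0 ? 1 : (unique.length - missing.length) / unique.length;
  return { ratio, missing };
}

function computeCoverage(
  knownIntents: string[],
  knownPages: string[],
  results: TurnResult[],
): CoverageReport {
  const seenIntents = new Set<string>();
  const seenPages = new Set<string>();
  for (const r of results) {
    if (r.matchedIntent !== null) seenIntents.add(r.matchedIntent);
    seenPages.add(r.page);
  }
  const intents = coverageOf(knownIntents, seenIntents);
  const pages = coverageOf(knownPages, seenPages);
  return {
    intentCoverage: intents.ratio,
    pageCoverage: pages.ratio,
    untestedIntents: intents.missing,
    untestedPages: pages.missing,
  };
}
```

The untested lists are what a gap analysis view would surface: every intent or page in the agent definition that no test turn has reached yet.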

The platform is live at: https://testsuite-frontend-7vgsrczewq-uc.a.run.app

How we built it

Frontend: React 18 + TypeScript + Vite + TailwindCSS + Recharts + Zustand for state management

Backend: Node.js + Express.js + TypeScript, with SQLite for persistence, Bull/Redis for async job queuing, and concurrent test execution via p-limit

AI & Cloud:

Gemini (Vertex AI) — powers conversation quality analysis, vulnerability detection, and automated test insights
Google Cloud Dialogflow CX SDK — gRPC-based agent communication for real-time intent detection
Google Cloud Run — fully containerized, auto-scaling serverless deployment for both frontend and backend
Google Cloud Build — CI/CD pipeline for automated container builds and deployments

Architecture: The system follows a clean layered pattern — React frontend → REST API → Bull job queue → Dialogflow CX via gRPC → GCP. Test runs are processed asynchronously with real-time progress updates pushed to the UI.
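
To make the asynchronous processing step concrete, here is a minimal, dependency-free TypeScript sketch of running test cases with a concurrency cap while reporting progress. It is a simplified stand-in for the p-limit and Bull setup described above, not the project's code; `TestTask`, `Progress`, and `runWithLimit` are hypothetical names.

```typescript
// One unit of work: resolves to true when the test case passed.
type TestTask = () => Promise<boolean>;

interface Progress {
  completed: number;
  total: number;
  passed: number;
  failed: number;
}

// Runs tasks with at most `concurrency` in flight and calls `onProgress`
// after every finished task. Results keep the order of the input tasks.
async function runWithLimit(
  tasks: TestTask[],
  concurrency: number,
  onProgress: (p: Progress) => void,
): Promise<boolean[]> {
  const results: boolean[] = new Array<boolean>(tasks.length).fill(false);
  const progress: Progress = { completed: 0, total: tasks.length, passed: 0, failed: 0 };
  let next = 0; // index of the next unclaimed task

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++; // claimed synchronously, so no two workers share an index
      let ok = false;
      try {
        ok = await tasks[index]();
      } catch {
        ok = false; // a thrown error counts as a failed test, not a crashed run
      }
      results[index] = ok;
      progress.completed++;
      if (ok) progress.passed++;
      else progress.failed++;
      onProgress({ ...progress }); // pass a copy so listeners cannot mutate our state
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In a setup like the one described, the `onProgress` callback is where a job would publish its status for the UI to pick up, and each task would wrap one Dialogflow CX call.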

Challenges we ran into

Conversation state management at scale: Multi-turn conversations require maintaining session context across parallel workers. We solved this with Bull queues and careful session ID scoping per conversation group.
Accurate vulnerability detection: Distinguishing legitimate edge-case inputs from actual prompt injection attempts required iterative prompt engineering with Gemini to minimize false positives.
Cloud Run cold start latency: Initial test runs triggered container cold starts that added noticeable delay. We addressed this by configuring a minimum number of instances and sending warmup requests.
CSV parsing variability: Real-world test datasets came in wildly inconsistent formats. We built a robust parser with Zod schema validation and graceful error recovery.
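
Two of the points above, tolerant CSV handling and per-conversation session scoping, can be sketched together in TypeScript. This is a dependency-free illustration under stated assumptions: the column names, the `runId` prefix, and the function names are hypothetical, and the hand-written checks stand in for the Zod schema the project uses.

```typescript
interface TestRow {
  conversationId: string;
  turn: number;
  utterance: string;
  expectedIntent: string;
}

interface RowError {
  line: number; // 1-based line number in the file, counting the header
  reason: string;
}

interface ParseResult {
  rows: TestRow[];
  errors: RowError[];
}

// Assumed columns. Headers match case-insensitively, ignoring spaces and underscores.
const REQUIRED_COLUMNS = ["conversationid", "turn", "utterance", "expectedintent"];
const normalizeHeader = (h: string): string => h.toLowerCase().replace(/[\s_]/g, "");

// Naive comma split: quoted fields containing commas are out of scope for this sketch.
function parseTestCsv(csv: string): ParseResult {
  const lines = csv.split(/\r?\n/);
  const headers = (lines[0] ?? "").split(",").map(normalizeHeader);
  const missing = REQUIRED_COLUMNS.filter((name) => headers.indexOf(name) === -1);
  if (missing.length > 0) {
    return { rows: [], errors: [{ line: 1, reason: `missing columns: ${missing.join(", ")}` }] };
  }
  const result: ParseResult = { rows: [], errors: [] };
  lines.slice(1).forEach((raw, i) => {
    if (raw.trim() === "") return; // skip blank lines instead of failing
    const cells = raw.split(",").map((c) => c.trim());
    const get = (name: string): string => cells[headers.indexOf(name)] ?? "";
    const line = i + 2;
    const turn = Number(get("turn"));
    if (get("conversationid") === "" || get("utterance") === "") {
      result.errors.push({ line, reason: "conversationId and utterance are required" });
    } else if (!Number.isInteger(turn) || turn < 1) {
      result.errors.push({ line, reason: `turn must be a positive integer, got "${get("turn")}"` });
    } else {
      result.rows.push({
        conversationId: get("conversationid"),
        turn,
        utterance: get("utterance"),
        expectedIntent: get("expectedintent"),
      });
    }
  });
  return result;
}

// Groups rows by conversation and gives each group its own session ID, so
// parallel workers never share session state. A real implementation would
// likely hash this ID to stay within Dialogflow CX's session ID length limit.
function groupBySession(runId: string, rows: TestRow[]): Map<string, TestRow[]> {
  const sessions = new Map<string, TestRow[]>();
  for (const row of rows) {
    const sessionId = `${runId}-${row.conversationId}`;
    const group = sessions.get(sessionId) ?? [];
    group.push(row);
    sessions.set(sessionId, group);
  }
  sessions.forEach((group) => group.sort((a, b) => a.turn - b.turn));
  return sessions;
}
```

Bad rows are collected with their line numbers rather than aborting the upload, so one malformed line does not block the rest of the suite. Turns inside a session are sorted and meant to run in order, while separate sessions can run in parallel.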

Accomplishments that we're proud of

✅ End-to-end deployed system running live on Google Cloud Run — not a prototype
✅ Automated vulnerability scanning that catches prompt injection attacks before they reach production agents
✅ Concurrent bulk testing that can process hundreds of conversation turns in parallel
✅ Gemini-powered analysis that goes beyond pass/fail to deliver actionable quality insights
✅ Clean, production-grade codebase with full TypeScript, ESLint, and architectural separation of concerns

What we learned

Conversational AI testing is fundamentally different from unit testing — context, state, and sequence matter as much as individual responses
Gemini's ability to evaluate quality (not just correctness) unlocks a new class of QA tooling that wasn't possible with rule-based systems
Cloud Run's simplicity for deploying containerized full-stack apps is genuinely impressive for rapid iteration
Prompt injection is a real, underappreciated risk in production conversational agents — and most teams have zero automated coverage for it

What's next for CESTestSuiteAgent: Conversational AI QA Platform

🔊 Live API integration — real-time audio testing for voice-enabled Dialogflow CX agents using Gemini Live API
🔁 CI/CD webhooks — trigger test suites automatically on agent deployment events
📱 Agent Development Kit (ADK) support — extend beyond Dialogflow CX to any ADK-built agent
🌐 Multi-language test coverage — automated test generation for multilingual agents
📋 Compliance reporting — exportable audit trails for regulated industries

Built with: TypeScript, React, Node.js, Express.js, Google Cloud Run, Dialogflow CX, Vertex AI, Gemini, Google Cloud Build, TailwindCSS, SQLite, Redis, Docker, Vite, Recharts, Zustand

Yash Kavaiya started this project — Mar 16, 2026 05:55 AM EDT