

Inspiration

Every enterprise deploying a conversational AI agent faces the same invisible problem: how do you know your bot actually works? Manual QA is slow, inconsistent, and doesn't scale. When a Dialogflow CX agent handles thousands of customer interactions daily, a single broken intent or a missed edge case can cascade into real business damage — failed bookings, unresolved complaints, frustrated customers.

We were inspired by the gap between how much effort goes into building conversational agents versus how little tooling exists for testing them rigorously. Traditional software has mature CI/CD test frameworks. Conversational AI has... spreadsheets and hope.

We wanted to change that.


What it does

CESTestSuiteAgent is a full-stack, AI-powered quality assurance platform purpose-built for Dialogflow CX conversational agents. It gives QA teams and developers a single control plane to:

  • 🎯 Simulate conversations in real time with intent detection visualization, parameter extraction, and page flow tracking
  • 📦 Run bulk test suites via CSV upload with concurrent execution, progress tracking, and pass/fail reporting
  • 🛡️ Scan for vulnerabilities: automated prompt injection and jailbreak detection, safety scoring, and adversarial input analysis
  • 📊 Track coverage: intent coverage, page coverage, untested flow identification, and gap analysis
  • 📈 Monitor quality trends: customer satisfaction scoring, First Contact Resolution (FCR) tracking, escalation rates, and historical analytics
  • 🔍 Analyze conversations: automated insights from multi-turn sessions with Gemini-powered evaluation
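
To make the coverage idea concrete, here is a minimal sketch of how intent coverage and the list of untested intents could be computed from a test run. The `TurnResult` and `CoverageReport` shapes and the intent names are hypothetical illustrations, not the project's actual data model.

```typescript
// Hypothetical shapes: the platform's real data model is not shown in this write-up.
interface TurnResult {
  matchedIntent: string | null; // intent the agent matched, or null for a no-match turn
}

interface CoverageReport {
  covered: string[];
  untested: string[];
  coveragePct: number;
}

// Compare the intents defined on the agent with the intents hit during a test run.
function intentCoverage(definedIntents: string[], results: TurnResult[]): CoverageReport {
  const hit = new Set<string>();
  for (const r of results) {
    if (r.matchedIntent !== null) hit.add(r.matchedIntent);
  }
  const covered = definedIntents.filter((name) => hit.has(name));
  const untested = definedIntents.filter((name) => !hit.has(name));
  // Guard against division by zero for an agent with no intents.
  const coveragePct =
    definedIntents.length === 0
      ? 0
      : Math.round((covered.length / definedIntents.length) * 100);
  return { covered, untested, coveragePct };
}

const report = intentCoverage(
  ["book.flight", "cancel.booking", "check.status", "talk.to.agent"],
  [
    { matchedIntent: "book.flight" },
    { matchedIntent: "check.status" },
    { matchedIntent: "book.flight" },
    { matchedIntent: null },
  ],
);
console.log(report.coveragePct, report.untested);
```

The same set-difference approach would apply to page coverage, with page names in place of intent names.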

The platform is live at: https://testsuite-frontend-7vgsrczewq-uc.a.run.app


How we built it

Frontend: React 18 + TypeScript + Vite + TailwindCSS + Recharts + Zustand for state management

Backend: Node.js + Express.js + TypeScript, with SQLite for persistence, Bull/Redis for async job queuing, and concurrent test execution via p-limit
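
For readers unfamiliar with the concurrency-limiting pattern that p-limit provides, the sketch below shows the idea using only built-in promises. It is a stand-in written for illustration, not the p-limit API and not the project's code; `mapWithLimit` and `fakeTestTurn` are hypothetical names.

```typescript
// Dependency-free stand-in for the concurrency-limit pattern: run `worker` over
// `items` with at most `limit` calls in flight, keeping results in input order.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the list is exhausted.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T, i);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Hypothetical stand-in for sending one utterance to the agent under test.
async function fakeTestTurn(utterance: string): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  return `ok:${utterance}`;
}

mapWithLimit(["hi", "book a flight", "cancel", "status"], 2, fakeTestTurn)
  .then((replies) => console.log(replies));
```

Capping the number of in-flight calls keeps a large test suite from sending every request to the agent at once.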

AI & Cloud:

  • Gemini (Vertex AI) — powers conversation quality analysis, vulnerability detection, and automated test insights
  • Google Cloud Dialogflow CX SDK — gRPC-based agent communication for real-time intent detection
  • Google Cloud Run — fully containerized, auto-scaling serverless deployment for both frontend and backend
  • Google Cloud Build — CI/CD pipeline for automated container builds and deployments

Architecture: The system follows a clean layered pattern: React frontend → REST API → Bull job queue → Dialogflow CX on Google Cloud via gRPC. Test runs are processed asynchronously, with real-time progress updates pushed to the UI.


Challenges we ran into

  • Conversation state management at scale: Multi-turn conversations require maintaining session context across parallel workers. We solved this by dispatching work through Bull queues and scoping each session ID to a single conversation group, so parallel workers never share session state.
  • Accurate vulnerability detection: Distinguishing legitimate edge-case inputs from actual prompt injection attempts required iterative prompt engineering with Gemini to minimize false positives.
  • Cloud Run cold start latency: The first test runs after an idle period triggered container cold starts, which added a noticeable delay. We addressed this with a minimum instance configuration and warmup requests.
  • CSV parsing variability: Real-world test datasets came in wildly inconsistent formats. We built a robust parser with Zod schema validation and graceful error recovery.
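
The "graceful error recovery" idea for CSV uploads can be sketched as follows. The project does its validation with Zod; this example is a dependency-free stand-in, and the column names (`utterance`, `expectedIntent`) and the `TestCase` shape are assumptions made for illustration.

```typescript
// Hypothetical test-case shape; the project's actual CSV columns are not documented here.
interface TestCase {
  utterance: string;
  expectedIntent: string;
}

interface ParseOutcome {
  cases: TestCase[];
  errors: { line: number; reason: string }[];
}

// Split one CSV line, honouring double-quoted fields and "" escapes.
// Quoted fields that span several lines are not handled in this sketch.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let quoted = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (quoted) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"';
        i++; // skip the second quote of the escape pair
      } else if (ch === '"') {
        quoted = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      quoted = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields.map((f) => f.trim());
}

// Validate row by row: bad rows are reported and skipped, good rows are kept.
function parseTestCsv(text: string): ParseOutcome {
  const outcome: ParseOutcome = { cases: [], errors: [] };
  const lines = text.replace(/^\uFEFF/, "").split(/\r?\n/); // drop BOM, accept CRLF or LF
  const header = splitCsvLine(lines[0] ?? "").map((h) => h.toLowerCase());
  const uIdx = header.indexOf("utterance");
  const iIdx = header.indexOf("expectedintent");
  if (uIdx === -1 || iIdx === -1) {
    outcome.errors.push({ line: 1, reason: "missing required column" });
    return outcome;
  }
  for (let n = 1; n < lines.length; n++) {
    const raw = lines[n] ?? "";
    if (raw.trim() === "") continue; // tolerate blank lines
    const fields = splitCsvLine(raw);
    const utterance = fields[uIdx] ?? "";
    const expectedIntent = fields[iIdx] ?? "";
    if (utterance === "" || expectedIntent === "") {
      outcome.errors.push({ line: n + 1, reason: "empty utterance or expectedIntent" });
      continue; // skip the bad row and keep going
    }
    outcome.cases.push({ utterance, expectedIntent });
  }
  return outcome;
}

const sample = [
  "Utterance, ExpectedIntent",
  '"I want to fly, tomorrow",book.flight',
  "",
  "cancel my booking,",
  "where is my order,check.status",
].join("\r\n");
const parsed = parseTestCsv(sample);
console.log(parsed.cases.length, parsed.errors);
```

Collecting per-line errors instead of failing on the first bad row lets a user fix a messy upload in one pass.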

Accomplishments that we're proud of

  • ✅ End-to-end deployed system running live on Google Cloud Run, not a prototype
  • ✅ Automated vulnerability scanning that catches prompt injection attacks before they reach production agents
  • ✅ Concurrent bulk testing that can process hundreds of conversation turns in parallel
  • ✅ Gemini-powered analysis that goes beyond pass/fail to deliver actionable quality insights
  • ✅ Clean, production-grade codebase with full TypeScript, ESLint, and architectural separation of concerns

What we learned

  • Conversational AI testing is fundamentally different from unit testing — context, state, and sequence matter as much as individual responses
  • Gemini's ability to evaluate quality (not just correctness) unlocks a new class of QA tooling that wasn't possible with rule-based systems
  • Cloud Run's simplicity for deploying containerized full-stack apps is genuinely impressive for rapid iteration
  • Prompt injection is a real, underappreciated risk in production conversational agents — and most teams have zero automated coverage for it

What's next for CESTestSuiteAgent: Conversational AI QA Platform

  • 🔊 Live API integration: real-time audio testing for voice-enabled Dialogflow CX agents using Gemini Live API
  • 🔁 CI/CD webhooks: trigger test suites automatically on agent deployment events
  • 📱 Agent Development Kit (ADK) support: extend beyond Dialogflow CX to any ADK-built agent
  • 🌐 Multi-language test coverage: automated test generation for multilingual agents
  • 📋 Compliance reporting: exportable audit trails for regulated industries

Built with: TypeScript, React, Node.js, Express.js, Google Cloud Run, Dialogflow CX, Vertex AI, Gemini, Google Cloud Build, TailwindCSS, SQLite, Redis, Docker, Vite, Recharts, Zustand
