I wanted clinical trial simulation to feel more natural and less like filling in endless forms. The idea grew from watching researchers struggle with complex dashboards. I wondered whether trial planning could be done simply by talking to an assistant. That curiosity pushed me to build a voice-first, multilingual platform powered by Cloud Run.

The app lets users simulate clinical trials, ask questions, update parameters, and receive AI-generated insights, all through text or voice. It supports English, Hindi, and Bengali with real-time transcription and speech output. It also includes an agentic GenAI system that analyzes trial results; in the future it could integrate fine-tuned scientific models to make early predictions before experiments finish.

I built the system with FastAPI, Tailwind, Google Gemini 2.5 Flash, and Cloud Speech-to-Text/Text-to-Speech. A custom simulation engine generates trial outcomes from CSV-based datasets. On top of that sits a multi-agent architecture with an intelligent router that chooses between cloud AI and local fallback agents. The entire project runs serverlessly on Cloud Run, with voice processing, authentication, simulation CRUD, and logging integrated.

Getting WebM/Opus microphone recordings working inside Cloud Run was a major hurdle. Handling multilingual voice flows without breaking the UI also took time. The agent routing was tricky: it meant balancing cloud AI calls against fallback logic. Managing service account credentials securely across platforms (Windows → Cloud Run) added extra friction.

I’m proud that the app delivers a true voice-first clinical trial assistant with smooth language switching, stop-speech control, and conversational updates. The agentic layer is another achievement, simple but real: four specialized agents, structured context, and a router that makes the system reliable. Even as a POC, it demonstrates how trial planning can move beyond static dashboards into natural conversation.
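The router-with-fallback idea described above can be illustrated with a minimal sketch. This is not the project's actual code: `TrialContext`, `local_fallback_agent`, and `route` are hypothetical names, the context fields are assumed for illustration, and the cloud agent is any callable standing in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TrialContext:
    """Structured state handed to every agent (fields are illustrative)."""
    question: str
    language: str = "en"
    parameters: Dict[str, float] = field(default_factory=dict)


# An agent is any callable that turns the structured context into a reply.
Agent = Callable[[TrialContext], str]


def local_fallback_agent(ctx: TrialContext) -> str:
    """Rule-based reply that needs no network access (hypothetical)."""
    sample_size = ctx.parameters.get("sample_size")
    if sample_size is None:
        return "No sample size set yet."
    return f"Current sample size is {int(sample_size)}."


def route(ctx: TrialContext, cloud_agent: Agent, fallback_agent: Agent) -> Tuple[str, str]:
    """Try the cloud agent first; on any error, answer with the fallback.

    Returns (source, reply) so callers can log which path was taken.
    """
    try:
        return "cloud", cloud_agent(ctx)
    except Exception:
        # Timeouts, quota errors, or network failures all land here.
        return "fallback", fallback_agent(ctx)
```

Returning the source alongside the reply is one way to keep logging simple: the caller can record how often the fallback path is taken without inspecting the reply text.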
I learned how to stitch together voice input/output pipelines, design safe confirmation workflows, and optimize latency for AI calls. The biggest lesson was in agent orchestration: how to pass structured state to models, fall back gracefully, and keep a POC both fast and dependable. I also gained experience deploying audio-heavy applications on Cloud Run.

The next step is to connect fine-tuned domain models so scientists can merge their private data with external knowledge and get early predictive insights. I also plan to expand the voice engine, add more languages, generate full PDF reports, and introduce richer analytics. Real-time collaboration and mobile support are long-term goals.

Built With

  • browser-mediarecorder-api
  • fastapi
  • gemini-2.5-flash
  • google-cloud-speech-to-text
  • google-cloud-text-to-speech
  • jinja
  • openstreetmap-nominatim
  • python
  • tailwindcss