Inspiration: We wanted to create a tool that lets users convert any text prompt into high-quality, lifelike speech instantly, combining the latest generative AI with accessible voice synthesis. The idea came from experimenting with AI chat and TTS services and noticing how cumbersome it was to combine them manually.

What we learned: We learned how to integrate Google Vertex AI’s generative models with ElevenLabs’ TTS API, how to handle serverless deployment constraints on Vercel, and how to securely manage credentials in a cloud environment without committing sensitive files. We also improved our front-end skills to build a compact, mobile-friendly UI.

How we built it:

Back-end: Node.js with Express handles API requests, calls Vertex AI to generate text from the prompt, and sends the generated text to ElevenLabs TTS for audio generation.
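
The two-step flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `createSpeechPipeline`, `generateText`, and `synthesizeSpeech` are hypothetical names, and the two service calls are passed in as functions so the sketch makes no claims about the Vertex AI or ElevenLabs client APIs.

```javascript
// Hypothetical pipeline: prompt -> generated text -> audio bytes.
// generateText and synthesizeSpeech are injected placeholders standing in
// for the real Vertex AI and ElevenLabs calls.
function createSpeechPipeline({ generateText, synthesizeSpeech }) {
  return async function promptToSpeech(prompt, voiceId) {
    if (typeof prompt !== "string" || prompt.trim() === "") {
      throw new Error("prompt must be a non-empty string");
    }

    // Step 1: ask the text model for a response to the trimmed prompt.
    const text = await generateText(prompt.trim());
    if (typeof text !== "string" || text === "") {
      throw new Error("text generation returned no content");
    }

    // Step 2: hand the generated text to the TTS service.
    const audio = await synthesizeSpeech(text, voiceId);

    // Return both so the UI can show the text alongside the audio.
    return { text, audio };
  };
}
```

An Express route would presumably call `promptToSpeech` with values from the request body and write the returned audio bytes to the response with an audio content type. Injecting the two service functions also lets the flow be exercised with stubs, without network access.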

Front-end: A single-page HTML interface with embedded CSS and JS allows users to input prompts, select voices, and play audio results.

Deployment: The app runs as serverless functions on Vercel. Credentials are supplied through environment variables and loaded at runtime rather than read from files in the repository.
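
One way to load service account credentials at runtime is sketched below, assuming the JSON key is stored base64-encoded in a single environment variable. The variable name `GOOGLE_CREDENTIALS_BASE64` and the helper `loadServiceAccount` are assumptions for illustration; the write-up does not say which variable or encoding the project uses.

```javascript
// Hypothetical variable name; the project's actual name is not stated.
const CREDENTIALS_ENV_VAR = "GOOGLE_CREDENTIALS_BASE64";

// Decode and validate a base64-encoded service account key from the
// environment, so no JSON key file has to exist on disk or in the repo.
function loadServiceAccount(env = process.env) {
  const encoded = env[CREDENTIALS_ENV_VAR];
  if (!encoded) {
    throw new Error(`${CREDENTIALS_ENV_VAR} is not set`);
  }

  const json = Buffer.from(encoded, "base64").toString("utf8");

  let parsed;
  try {
    parsed = JSON.parse(json);
  } catch (err) {
    throw new Error(`${CREDENTIALS_ENV_VAR} does not contain valid JSON`);
  }
  if (parsed === null || typeof parsed !== "object") {
    throw new Error(`${CREDENTIALS_ENV_VAR} must decode to a JSON object`);
  }

  const { client_email, private_key, project_id } = parsed;
  if (!client_email || !private_key) {
    throw new Error("service account JSON is missing client_email or private_key");
  }

  // Return only the fields an auth client typically needs.
  return { client_email, private_key, project_id };
}
```

The returned object could then be passed to a Google auth client that accepts in-memory credentials instead of a key file path. Base64 encoding is used here because it avoids problems with the newlines in `private_key` when the value is pasted into a dashboard field.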

Challenges faced:

Securely managing Google service account credentials in a serverless environment.

Making Vertex AI calls from Vercel serverless functions without relying on local JSON files.

Ensuring audio playback works reliably across browsers and devices.
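
For the playback challenge, one common source of cross-browser trouble is that `play()` on an audio element returns a promise that can be rejected with `NotAllowedError` when the browser blocks playback that was not started by a user gesture. The sketch below shows one way to handle that case; it is an assumption about the kind of fix involved, not a description of what the project does, and `playWithFallback` and `onBlocked` are hypothetical names.

```javascript
// Try to play audio and fall back to a manual control if the browser blocks it.
// audioEl is expected to behave like an HTMLAudioElement (a src property and a
// play() method returning a promise); onBlocked is a hypothetical callback that
// could reveal a "Tap to play" button.
async function playWithFallback(audioEl, src, onBlocked) {
  audioEl.src = src;
  try {
    await audioEl.play();
    return true; // playback started
  } catch (err) {
    if (err && err.name === "NotAllowedError") {
      // Playback was blocked by policy; let the user start it explicitly.
      onBlocked();
      return false;
    }
    // Other failures (for example an unsupported format) are real errors.
    throw err;
  }
}
```

Calling this from inside a click handler keeps playback tied to a user gesture, which is usually what mobile browsers require.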
