Inspiration

VoiceForge was inspired by a communication problem that affects many people every day.

• Many people go unheard, but not because they have nothing valuable to say.

• They go unheard because they struggle to express themselves confidently, clearly, and comfortably in real situations.

This problem appears in many important moments, such as classroom discussions, interviews, presentations, idea pitches, and even everyday conversations. In each of these situations, strong ideas can be lost when the speaker feels nervous, unprepared, or unsure of how to deliver their thoughts.

• We wanted to create a tool that helps people practice speaking in a realistic, supportive, and engaging way.

• Instead of only giving general advice, VoiceForge allows users to actually speak, respond, and improve through live AI interaction.

VoiceForge combines real-time AI conversation, transcript-based feedback, and non-verbal presence analysis to make speaking practice more personal and more effective. Our goal is to help people build confidence and communicate their ideas in a way that truly lands.

• Our broader vision is a future where more people can speak with confidence.

• We want meaningful ideas to be heard, not lost because the speaker was too nervous or unclear to express them well.

What it does

VoiceForge is an AI speaking coach that lets users practice out loud and receive feedback on how they communicate.

• Users talk with live AI voice agents in real time instead of responding to text prompts.

• After each session, VoiceForge provides transcript-based feedback, including metrics and an AI-generated coaching summary.

• While the user speaks, VoiceForge analyzes non-verbal presence through the camera, using signals such as face presence, head angle, and general on-camera attention.

• Practice is available in multiple modes, including scenarios, custom practice, and debate mode.

How we built it

We built VoiceForge as a modern web application with a fast frontend and an AI-powered backend workflow. For the frontend foundation, we used Vite as the build tool and Node.js as the runtime environment during development, which gave us a quick iteration loop and a clean JavaScript-based stack.

To accelerate the product design phase, we first used MeDo to generate the initial UI skeleton and layout ideas. After that, we moved the exported code into our main project and used Codex as the core development assistant to refactor the structure, clean up the generated code, and implement the actual MVP features in a more maintainable way. OpenAI describes Codex as a coding agent for software development, with support for project-specific instructions and skills, which matched our workflow well.

For the live speaking experience, we integrated ElevenLabs to power the voice-based AI agents. This allowed VoiceForge to offer real-time spoken interaction instead of just text prompts, which was essential to making the product feel like a realistic speaking coach and debate partner. ElevenLabs provides conversational agents and voice infrastructure specifically for natural dialogue and real-time interactions.

For the feedback layer, we used Groq to power fast LLM-based analysis and structured post-session responses. Groq’s API is designed to be largely OpenAI-compatible, which made it easier to integrate into our existing backend logic while keeping latency low for coaching summaries and evaluations.
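A minimal sketch of what such a post-session feedback call could look like, assuming Groq's OpenAI-compatible chat completions endpoint and JSON mode. The model name, the prompt wording, and the `CoachingSummary` fields are assumptions chosen for illustration, not the project's actual code.

```typescript
// Hypothetical shape of the structured feedback; the field names are assumptions.
interface CoachingSummary {
  strengths: string[];
  improvements: string[];
  overallScore: number; // clamped to 0-100
}

interface ChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  response_format: { type: "json_object" };
  temperature: number;
}

// Groq's OpenAI-compatible chat completions endpoint.
const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";

// Build the request body. The default model name is a placeholder assumption.
function buildFeedbackRequest(transcript: string, model = "llama-3.3-70b-versatile"): ChatRequest {
  return {
    model,
    messages: [
      {
        role: "system",
        content:
          "You are a speaking coach. Reply only with JSON containing " +
          "strengths (string[]), improvements (string[]), and overallScore (0-100).",
      },
      { role: "user", content: `Transcript:\n${transcript}` },
    ],
    response_format: { type: "json_object" },
    temperature: 0.2,
  };
}

// Validate the model's JSON reply instead of trusting its shape blindly.
function parseCoachingSummary(raw: string): CoachingSummary {
  const data = JSON.parse(raw) as Partial<CoachingSummary>;
  if (
    !Array.isArray(data.strengths) ||
    !Array.isArray(data.improvements) ||
    typeof data.overallScore !== "number"
  ) {
    throw new Error("Unexpected feedback shape");
  }
  return {
    strengths: data.strengths.map(String),
    improvements: data.improvements.map(String),
    overallScore: Math.min(100, Math.max(0, data.overallScore)),
  };
}

// Send the request. The API key is passed in by the caller and should stay server-side.
async function requestFeedback(transcript: string, apiKey: string): Promise<CoachingSummary> {
  const res = await fetch(GROQ_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(buildFeedbackRequest(transcript)),
  });
  if (!res.ok) throw new Error(`Groq request failed: ${res.status}`);
  const json = (await res.json()) as { choices: { message: { content: string } }[] };
  return parseCoachingSummary(json.choices[0].message.content);
}
```

Keeping the request builder and the parser separate from the network call makes both easy to test without an API key, and the parser guards the UI against malformed model output.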

To analyze non-verbal communication, we used Google MediaPipe, especially its Face Landmarker web tools. This allowed us to track signals such as face presence, head angle, and general on-camera attention in real time, which helped us generate simple non-verbal feedback alongside the transcript-based verbal feedback. MediaPipe’s Face Landmarker is built for landmark detection and facial expression analysis on images and continuous video streams, which fit our browser-based camera feature well.
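As a rough illustration of how face presence and head angle could be derived from landmark output, the sketch below works on plain arrays of normalized `{x, y, z}` points, the shape Face Landmarker reports per detected face. The landmark indices (1 for the nose tip, 33 and 263 for the outer eye corners) follow the commonly used face-mesh numbering, and the threshold is an arbitrary assumption. It is a crude horizontal-turn heuristic, not the project's actual logic.

```typescript
// Minimal landmark shape, mirroring the normalized points Face Landmarker returns.
interface Landmark { x: number; y: number; z: number; }

interface PresenceSignal {
  facePresent: boolean;
  yawRatio: number; // signed; 0 means the nose sits midway between the eye corners
  facingCamera: boolean;
}

// Assumed face-mesh indices: nose tip and the two outer eye corners.
const NOSE_TIP = 1;
const EYE_OUTER_A = 33;
const EYE_OUTER_B = 263;

// Assumed threshold: allowed nose drift from the eye midpoint, as a fraction of eye distance.
const MAX_YAW_RATIO = 0.25;

// Analyze one video frame; `faces` holds one landmark array per detected face.
function analyzePresence(faces: Landmark[][]): PresenceSignal {
  const face = faces[0];
  if (!face || face.length <= EYE_OUTER_B) {
    return { facePresent: false, yawRatio: 0, facingCamera: false };
  }
  const nose = face[NOSE_TIP];
  const a = face[EYE_OUTER_A];
  const b = face[EYE_OUTER_B];
  const eyeDistance = Math.abs(b.x - a.x);
  if (eyeDistance < 1e-6) {
    // Degenerate geometry (for example an extreme profile): present, but not facing the camera.
    return { facePresent: true, yawRatio: 0, facingCamera: false };
  }
  const midX = (a.x + b.x) / 2;
  const yawRatio = (nose.x - midX) / eyeDistance;
  return { facePresent: true, yawRatio, facingCamera: Math.abs(yawRatio) <= MAX_YAW_RATIO };
}

// Share of frames in which the speaker faced the camera, as a simple attention score.
function attentionRate(frames: PresenceSignal[]): number {
  if (frames.length === 0) return 0;
  return frames.filter((f) => f.facingCamera).length / frames.length;
}
```

In the browser, the `faces` argument would typically come from the `faceLandmarks` field of the result returned by `detectForVideo` on each frame. Working with a ratio rather than raw coordinates keeps the check independent of how close the speaker sits to the camera.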

For backend and data architecture, we used Supabase as part of our database and platform workflow because it offers a Postgres-based backend with Auth, Realtime, and Storage capabilities. This helped support the broader app architecture for handling user data, session records, and scalable product features. Supabase describes itself as a Postgres development platform and documents built-in database, auth, realtime, and storage services.

We also incorporated Featherless AI as an open-model inference option during development. Featherless provides serverless access to a large library of open-weight models through an API, which made it useful for experimenting with flexible model choices without managing our own inference infrastructure.

Overall, VoiceForge was built by combining rapid UI prototyping, AI-assisted development, real-time voice agents, transcript-based language analysis, and browser-side non-verbal tracking into one interactive communication platform. The result is a product that blends frontend engineering, backend architecture, and applied AI into a single speaking coach experience.

Challenges we ran into

Building VoiceForge was challenging because it combined several systems into one product.

• One major challenge was making the real-time AI conversation feel smooth and natural. We had to handle live voice interaction, transcript updates, and session flow without making the experience feel delayed or awkward.

• Another challenge was keeping the project focused. We had many ideas at the start, but we had to simplify the MVP so it stayed centered on one strong speaking-coach experience instead of becoming too complicated.

• We also had to make the feedback feel useful and believable. That meant combining transcript-based metrics, AI-generated summaries, and non-verbal analysis in a way that felt clear rather than random.

• Non-verbal tracking was also difficult. With Google MediaPipe, we had to stay realistic and focus on practical signals like face presence, head angle, and general on-camera attention instead of trying to build overly advanced analysis.

Overall, the biggest challenge was making all these parts work together in one clean and polished experience.
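To make the idea of transcript-based metrics more concrete, here is a minimal sketch that computes speaking pace and filler-word usage from a session transcript. The write-up does not name the metrics VoiceForge uses, so words per minute, the filler list, and the `TranscriptMetrics` fields are all assumptions for illustration.

```typescript
interface TranscriptMetrics {
  wordCount: number;
  wordsPerMinute: number;
  fillerCount: number;
  fillerRate: number; // fillers per 100 words
}

// Assumed filler list. Words such as "like" also have legitimate uses,
// so this count is a rough signal rather than a precise measurement.
const FILLERS = ["um", "uh", "like", "you know", "basically", "actually"];

function computeMetrics(transcript: string, durationSeconds: number): TranscriptMetrics {
  // Lowercase and replace punctuation with spaces so "Um," and "um" count the same.
  const normalized = transcript.toLowerCase().replace(/[^a-z0-9'\s]/g, " ");
  const words = normalized.split(/\s+/).filter(Boolean);
  // Pad with spaces so every word or phrase can be matched as a whole unit.
  const text = ` ${words.join(" ")} `;

  let fillerCount = 0;
  for (const filler of FILLERS) {
    const needle = ` ${filler} `;
    let from = 0;
    while (true) {
      const idx = text.indexOf(needle, from);
      if (idx === -1) break;
      fillerCount++;
      // Resume at the trailing space so back-to-back fillers ("um um") both match.
      from = idx + needle.length - 1;
    }
  }

  const minutes = durationSeconds / 60;
  return {
    wordCount: words.length,
    wordsPerMinute: minutes > 0 ? Math.round(words.length / minutes) : 0,
    fillerCount,
    fillerRate: words.length > 0 ? (fillerCount / words.length) * 100 : 0,
  };
}
```

Deterministic numbers like these can be shown next to the AI-generated summary, which gives the user something concrete to check the written feedback against.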

Accomplishments that we're proud of

• We are proud that we turned VoiceForge into a real working product, not just an idea.

• We successfully built live AI voice interaction with ElevenLabs, which made speaking practice feel realistic and engaging.

• We combined verbal feedback with non-verbal analysis, allowing VoiceForge to evaluate both what the user says and how they present themselves on camera.

• We also built multiple modes, including scenarios, custom practice, and debate mode, which made the platform more flexible and dynamic.

• Most importantly, we are proud that VoiceForge addresses a real problem by helping people communicate with more confidence and clarity.

What we learned

• We learned that building a strong MVP is not just about adding more features, but about choosing the right features and keeping the product focused.

• We learned that real-time AI systems require careful design. It is not enough for the technology to work; it also has to feel smooth, natural, and useful to the user.

• We also learned that good feedback comes from combining different methods. Structured metrics, transcript analysis, and non-verbal tracking worked better together than relying on a single model alone.

• Most importantly, we learned that communication is a real and meaningful problem to solve. Building VoiceForge showed us how technology can help people express themselves with more confidence and clarity.

What's next for VoiceForge

Built With

  • codex
  • elevenlabs
  • featherless
  • groq
  • mediapipe
  • medo
  • node.js
  • supabase
  • vite