Inspiration

VoiceScribe was inspired by the need to quickly turn meeting recordings and long voice notes into clear, actionable summaries. With more meetings happening online and people missing important details or struggling to catch up, we wanted to create a tool that makes audio content instantly searchable, summarized, and interactive, helping teams save time and stay aligned.

What it does

VoiceScribe lets users upload audio recordings in a range of formats (MP3, WAV, M4A, OGG, and more). The app transcribes the audio with AssemblyAI, summarizes the key discussion points with Google Gemini, and lets users ask questions about the meeting transcript through a chat interface. Users get structured summaries (including action items, deadlines, and people responsible) and can chat with the transcript to clarify anything that happened during the meeting.
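As a rough illustration of what a structured summary could look like in code, here is a minimal sketch. The `MeetingSummary` and `ActionItem` shapes, their field names, and the `parseSummary` helper are all hypothetical and not taken from VoiceScribe's source. The sketch assumes the model is asked to reply with JSON and validates that reply before anything is shown to the user.

```typescript
// Hypothetical shape for a structured meeting summary; the field names are
// illustrative assumptions, not VoiceScribe's actual schema.
interface ActionItem {
  task: string;
  owner: string | null;    // person responsible, if one was named
  deadline: string | null; // deadline as stated in the meeting, if any
}

interface MeetingSummary {
  keyPoints: string[];
  actionItems: ActionItem[];
}

// Parse a model's JSON reply defensively: the output is not guaranteed to
// match the requested schema, so every field is checked before use.
function parseSummary(raw: string): MeetingSummary | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON at all
  }
  if (typeof data !== "object" || data === null) return null;

  const obj = data as Record<string, unknown>;
  const points = obj["keyPoints"];
  const items = obj["actionItems"];
  if (!Array.isArray(points) || !Array.isArray(items)) return null;

  // Keep only string key points; silently drop anything else.
  const keyPoints = points.filter((p): p is string => typeof p === "string");

  const actionItems: ActionItem[] = [];
  for (const item of items) {
    if (typeof item !== "object" || item === null) continue;
    const rec = item as Record<string, unknown>;
    const task = rec["task"];
    if (typeof task !== "string") continue; // an action item needs a task
    const owner = rec["owner"];
    const deadline = rec["deadline"];
    actionItems.push({
      task,
      owner: typeof owner === "string" ? owner : null,
      deadline: typeof deadline === "string" ? deadline : null,
    });
  }
  return { keyPoints, actionItems };
}
```

Returning `null` for malformed replies lets the caller retry or fall back to a plain-text summary instead of rendering partial data.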

How we built it

  • Frontend: Built with React and Tailwind CSS for a clean, responsive UI with drag-and-drop file upload, audio playback, and live chat features.
  • Backend: Powered by Express.js, which handles file uploads, converts audio with ffmpeg, and manages all API integrations.
  • AI/ML: Uses AssemblyAI for accurate transcription and Google Gemini for summarization and Q&A.
  • Deployment: The backend runs on Railway and the frontend is hosted on Netlify. CORS and environment variables were configured so that the preview and production environments integrate smoothly.
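The conversion step in the backend can be sketched roughly as follows. This is a sketch under stated assumptions, not the project's code: the pass-through extension set, the mono 16 kHz WAV target, and the helper names `needsConversion` and `buildFfmpegArgs` are all illustrative. The ffmpeg flags used (`-y`, `-i`, `-vn`, `-ac`, `-ar`) are standard options.

```typescript
// Extensions assumed to be forwarded to transcription unchanged; everything
// else is converted first. This set is an assumption for illustration.
const PASSTHROUGH_EXTENSIONS = new Set(["mp3", "wav", "m4a", "ogg"]);

// Lower-cased extension without the dot, or "" when there is none.
function extensionOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
}

function needsConversion(filename: string): boolean {
  return !PASSTHROUGH_EXTENSIONS.has(extensionOf(filename));
}

// Build the ffmpeg argument list for re-encoding any input to mono 16 kHz WAV.
//   -y   overwrite the output file if it exists
//   -vn  drop any video stream (e.g. from a webm screen recording)
//   -ac  number of audio channels
//   -ar  audio sample rate in Hz
function buildFfmpegArgs(inputPath: string, outputPath: string): string[] {
  return ["-y", "-i", inputPath, "-vn", "-ac", "1", "-ar", "16000", outputPath];
}
```

In a Node backend the resulting array could be passed to `spawn("ffmpeg", args)` from `node:child_process`. Passing arguments as an array rather than a shell string avoids quoting problems with user-supplied filenames.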

Challenges we ran into

  • Getting CORS to work across the Bolt.new preview, production deployments, and third-party APIs took a lot of debugging and careful configuration.
  • Handling different audio file formats (especially WebM) required on-the-fly conversion and additional error handling.
  • Managing rate limits and asynchronous polling against AssemblyAI's API while keeping the user experience smooth.
  • Keeping changes synchronized across environments so that the app behaved the same locally, in preview, and in production.
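One common way to handle the CORS problem described above is an origin allowlist that mixes exact origins with a pattern for preview URLs. The sketch below is an assumption about how this could be done, not the configuration VoiceScribe uses. Every origin in it is a placeholder on `example.com`, and `isOriginAllowed` is a hypothetical helper.

```typescript
// A rule is either an exact origin or a pattern for dynamically named
// preview deployments.
type OriginRule = string | RegExp;

function isOriginAllowed(origin: string | undefined, rules: OriginRule[]): boolean {
  // Requests without an Origin header (curl, server-to-server calls) are not
  // cross-origin browser requests; letting them through is a policy choice.
  if (origin === undefined) return true;
  return rules.some((rule) =>
    typeof rule === "string" ? rule === origin : rule.test(origin)
  );
}

// Placeholder origins for illustration only.
const exampleRules: OriginRule[] = [
  "http://localhost:5173",
  "https://voicescribe.example.com",
  // Anchored at both ends so "https://x.preview.example.com.evil.com" fails.
  /^https:\/\/[a-z0-9-]+\.preview\.example\.com$/,
];
```

With the `cors` middleware for Express, a check like this can be wired in through the function form of the `origin` option, which receives the request origin and a callback. Anchoring the pattern with `^` and `$` matters: an unanchored match would accept attacker-controlled domains that merely contain the preview hostname.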

Accomplishments that we're proud of

  • Built a real-time audio transcription and summarization tool from scratch in a short timeframe.
  • Created a seamless upload-to-summary workflow, making advanced AI features easy to use for non-technical users.
  • Made the app flexible enough to support multiple deployment targets and preview environments.
  • Solved tricky cross-origin and API integration issues to provide a reliable experience.

What we learned

  • How to connect multiple third-party APIs (AssemblyAI, Google Gemini) and handle their quirks.
  • Techniques for file conversion, asynchronous polling, and state management in React.
  • Best practices for managing environment variables and CORS in modern full-stack apps.
  • The value of clear error handling, logging, and user feedback.
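The asynchronous polling mentioned above can be sketched as a loop with capped exponential backoff. This is a generic illustration, not VoiceScribe's code. The status values follow our understanding of AssemblyAI's transcript states and should be checked against its documentation. `fetchStatus` is a hypothetical caller-supplied function that performs the actual HTTP request, and the delay values are arbitrary.

```typescript
type TranscriptStatus = "queued" | "processing" | "completed" | "error";

interface PollResult {
  status: TranscriptStatus;
  text?: string;
  error?: string;
}

// Delays double each time and are capped at maxMs.
function backoffDelays(initialMs: number, maxMs: number, count: number): number[] {
  const delays: number[] = [];
  let delay = initialMs;
  for (let i = 0; i < count; i++) {
    delays.push(Math.min(delay, maxMs));
    delay *= 2;
  }
  return delays;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call fetchStatus until the job reaches a terminal state or attempts run out.
// The sleep function is injectable so the loop can be tested without waiting.
async function pollUntilDone(
  fetchStatus: () => Promise<PollResult>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void> = defaultSleep,
  initialMs = 1000,
  maxMs = 8000
): Promise<PollResult> {
  const delays = backoffDelays(initialMs, maxMs, maxAttempts - 1);
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await fetchStatus();
    if (result.status === "completed" || result.status === "error") return result;
    const wait = delays[attempt];
    if (wait !== undefined) await sleep(wait); // no wait after the final attempt
  }
  throw new Error(`Transcript not ready after ${maxAttempts} attempts`);
}
```

Backing off reduces the number of requests made while a long recording is processing, which helps stay under rate limits, and the attempt cap ensures the UI can eventually show a timeout instead of spinning forever.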

What's next for VoiceScribe

  • Add support for even more audio and video file formats.
  • Allow users to export transcripts and summaries to Google Docs, Notion, or email.
  • Build in team collaboration features for sharing and commenting on summaries.
  • Explore more advanced AI features, like speaker identification and topic tracking.
