Inspiration
I have personally struggled with the pronunciation of certain words. I searched for applications that address this specific problem but didn't find anything suitable. I also reviewed various research papers on how pronunciation scoring systems are implemented, but those approaches didn't work well. So, when this hackathon was announced, I decided to build one myself, since Gemini can process both audio and text.
What it does
This is a simple application that assists users in analyzing and improving their pronunciation. It:
- Generates scripts for users to test their pronunciation
- Records the users as they speak the provided scripts
- Analyzes the users' recordings and provides suggestions and corrections for improvement
How we built it
The frontend runs on Next.js. When a user visits the page, the frontend presents a script, generated dynamically with Gemini, for the user to practice their pronunciation. The user then records their voice while reading this script. Recording happens on the client side, within the user's browser, by capturing audio from the user's microphone.
Once the recording is complete, both the recorded audio and the generated script are sent to the backend API, which is built with FastAPI. The Gemini model on the backend then analyzes the recording against the script and returns suggestions and remarks. This way, the user gets immediate feedback on their pronunciation, which can help them improve.
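As a rough illustration of the backend step, here is a minimal sketch of how the uploaded recording and script could be checked and turned into the text part of a model prompt. This is not the project's actual code: `AnalysisRequest`, `validate`, `build_analysis_prompt`, the allowed audio types, and the prompt wording are all hypothetical, and the FastAPI route and the Gemini call itself are left out.

```python
from dataclasses import dataclass

# Hypothetical allow-list; browser recorders often produce WebM or Ogg audio.
ALLOWED_AUDIO_TYPES = {"audio/webm", "audio/ogg", "audio/wav", "audio/mpeg"}


@dataclass(frozen=True)
class AnalysisRequest:
    """What the frontend sends: the generated script plus the recording."""
    script: str
    audio: bytes
    mime_type: str


def validate(request: AnalysisRequest) -> None:
    """Reject obviously unusable uploads before spending a model call."""
    if not request.script.strip():
        raise ValueError("script is empty")
    if not request.audio:
        raise ValueError("recording is empty")
    # Browsers may append codec info, e.g. "audio/webm;codecs=opus".
    base_type = request.mime_type.split(";")[0].strip().lower()
    if base_type not in ALLOWED_AUDIO_TYPES:
        raise ValueError(f"unsupported audio type: {request.mime_type}")


def build_analysis_prompt(script: str) -> str:
    """Build the text part of the model input (assumed wording).

    The audio bytes would be attached to the model request separately.
    """
    return (
        "The attached recording is a speaker reading the script below aloud.\n"
        "Compare the recording with the script, list mispronounced words, "
        "and suggest how to improve each one.\n\n"
        f"Script:\n{script.strip()}"
    )
```

In this sketch, the backend would call `validate` on the incoming request, then pass the prompt text together with the audio bytes to the model. Validating first avoids spending a model call on an empty or unsupported upload.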
Challenges we ran into
I used Docker and Google Cloud Run to deploy the backend as a containerized serverless application. Initially, I had difficulty setting up the Docker build from GitHub, so I switched to deploying via the gcloud CLI, which worked well. However, after updating the code, I struggled to find documentation on how to redeploy so that the changes took effect. After a few searches, I found the solution.
Since I was working with Firebase Authentication for the first time, it was initially difficult to find good documentation on using it with Next.js. Later, I found some videos and code snippets and implemented it.
Accomplishments that we're proud of
- I managed to develop this application within a tight schedule and am happy to have completed the project.
- The Cloud Run deployment and Firebase authentication were both added at the last moment, hours before the deadline.
What we learned
- I learned how to create better prompts for Gemini.
- I learned how to work with multiple types of files with Gemini.
- I learned how to Dockerize an application.
- I learned how to deploy the application as a serverless container in Google Cloud.
- I learned how to implement Firebase Authentication.
- I learned some new concepts in Next.js.
What's next for Accent Ace: Your personal pronunciation helper
At present, the product build is straightforward and minimalistic.
- I would like to immediately set up a database for this project and provide personalized recommendations for improvement.
- I plan to extend the language support from English to various regional languages.