Inspiration
As a team of international students in the US, we all struggled to varying degrees with adapting to a new country, language, and culture. Lectures were a particular challenge: professors often spoke very quickly and had accents that were new to us. We wanted to build the product we wish we had had when starting our academic journeys abroad!
What it does
HearSay transcribes lectures (or any kind of speech, really) in real time and reads them back to you in an easy-to-understand voice. It allows students to turn their lecture transcripts into summaries, flashcards, and other study materials, all of which are organised on our platform. It also allows students to add visual study materials, such as pictures of class whiteboards, PDFs, and PowerPoint slides, which are integrated into their study notes.
How we built it
We built our web app on top of Streamlit. We used OpenAI's Whisper model for speech-to-text and ElevenLabs for text-to-speech, and we post-processed audio recordings with PyAudio, PyDub, and other Python packages. We used Llama for OCR on visual study materials and for text summarisation. We stored audio transcripts and other study materials in MongoDB Atlas. Finally, we accelerated all our AI inference using Groq.
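To illustrate the speech-to-text step, one common way to approximate real-time transcription with a batch model is to cut the incoming audio into short, slightly overlapping windows and transcribe each window as it arrives. The sketch below only computes the window boundaries and is not taken from our codebase: `chunk_spans`, `transcribe_in_chunks`, the `transcribe` callback, and the 5-second window with 0.5-second overlap are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, List, Tuple


def chunk_spans(total_ms: int, chunk_ms: int = 5000,
                overlap_ms: int = 500) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond spans that together cover a recording.

    Consecutive spans overlap by overlap_ms so that a word cut off at the
    end of one window is heard in full at the start of the next.
    """
    if chunk_ms <= 0 or not 0 <= overlap_ms < chunk_ms:
        raise ValueError("need chunk_ms > 0 and 0 <= overlap_ms < chunk_ms")
    spans: List[Tuple[int, int]] = []
    start = 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        spans.append((start, end))
        if end == total_ms:
            break  # the recording is fully covered
        start = end - overlap_ms  # step back to create the overlap
    return spans


def transcribe_in_chunks(total_ms: int,
                         transcribe: Callable[[int, int], str],
                         chunk_ms: int = 5000,
                         overlap_ms: int = 500) -> List[str]:
    """Call a (hypothetical) transcribe(start_ms, end_ms) on every span."""
    return [transcribe(start, end)
            for start, end in chunk_spans(total_ms, chunk_ms, overlap_ms)]
```

In a real app, the `transcribe` callback would slice the recorded audio for the given span and send it to the speech-to-text model. Shorter windows lower the delay before text appears, at the cost of more requests and less context per request.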
Challenges we ran into
We ran into issues building real-time speech-to-text, as Streamlit did not have good audio recording tools. Streamlit also had page rendering issues, which we could have avoided by choosing a different web development framework. We also had issues with inference latency due to rate limiting. Finally, we wanted to integrate AI agents (such as Fetch.ai agents) into our app to make our study platform more interactive for international students, but we had issues registering the agents on the Agentverse platform.
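For the rate-limiting problem, a standard mitigation is to retry throttled requests with exponential backoff. The sketch below shows the general pattern and is not what we shipped: `RateLimited` is a hypothetical exception standing in for whatever error a real client raises when throttled, and `call_with_backoff`, the attempt count, and the delays are illustrative assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by a client wrapper when the API throttles us."""


def call_with_backoff(request: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run request(), retrying on RateLimited with exponentially growing waits.

    The wait before retry n (counting from 0) is base_delay * 2**n seconds.
    The sleep function is injectable so the logic can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Backoff trades a little extra latency on throttled requests for fewer outright failures. It does not remove the underlying limit, so batching requests or caching results may still be needed.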
Accomplishments that we're proud of
We are proud of building a platform that we think can solve problems for other international students. We are also happy that we got to immerse ourselves in a multi-modal AI toolkit spanning images, text, audio, and AI agents.
What we learned
We learned the importance of creating a product that solves real-world problems by staying focused on the user experience. As international students ourselves, we can directly empathise with the pain of struggling to understand lectures, and we learned a lot about how technology can bridge language gaps. We also learned how important accuracy and speed are in real-time processing, since they are what make a product users can depend on every day.
What's next for HearSay
We're excited to refine our real-time transcription experience to improve accuracy, particularly across a wider range of accents, and to support more languages for real-time translation. Additionally, we want to integrate personalised AI agents to provide more customised study recommendations based on students' lecture notes and preferences. It would be great to partner with universities to make the platform available to students as a tool for learning in new cultures.