Speak Smart

Inspiration

The idea for Speak Smart came to us while watching a Django tutorial on YouTube in Hindi. We didn’t fully understand the language, but we could tell the content was valuable, and following along was a challenge. That’s when it hit us: so much useful information is trapped behind language barriers, limiting access for many people worldwide.

In that moment, we thought of a solution: something that would make learning from any source, in any language, simple and accessible. This led to the creation of Speak Smart. We envisioned a platform that could break language barriers by providing real-time translations. Speak Smart would pull content from different formats, generate comprehensive notes and Quizlet-style flashcards, and offer interactive quizzes. It would even feature an AI tutor to help users grasp complex concepts.

Speak Smart represents the belief that knowledge should be accessible to everyone, regardless of language or location. By using technology, it empowers learners globally, turning educational content into a truly universal resource.

What it does

Accepts video URLs, PDFs, and .mp4 and .mp3 files.
Offers AI dubbing that translates the content and voices it in the target language.
Generates detailed notes from the resources.
Creates flashcards to help users master the material.
Provides an AI tutor, powered by ChatGPT, that uses the content to offer quizzes, answer questions, and give feedback.
Generates interactive quizzes.

How we built it

We built the frontend in React for a seamless user experience, and the backend in Python with Flask. Videos are downloaded using the YouTube API, and the AssemblyAI API transcribes the audio to text. We then translate the text with Google Translator and generate dubbed audio using Microsoft Azure Text-to-Speech (TTS). The dubbed audio is synchronized with the original video using MoviePy.
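
The flow above (transcribe, translate, synthesize) can be sketched as a chain of three stages. This is an illustrative sketch rather than the project's exact code: `dub_pipeline`, `DubbingResult`, and the three stage parameters are hypothetical names, and each external service is assumed to be wrapped in a plain function so it can be swapped for a stub during testing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingResult:
    transcript: str   # text returned by the transcription stage
    translation: str  # transcript translated into the target language
    audio: bytes      # synthesized speech for the translated text


def dub_pipeline(
    audio_path: str,
    target_lang: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> DubbingResult:
    """Run the three stages in order and keep every intermediate result.

    The stage functions are hypothetical wrappers around the real services
    (assumed here: speech-to-text, translation, and text-to-speech clients).
    """
    transcript = transcribe(audio_path)
    translation = translate(transcript, target_lang)
    audio = synthesize(translation, target_lang)
    return DubbingResult(transcript, translation, audio)
```

Keeping the transcript and translation in the result means the later steps (notes, flashcards, the tutor) can reuse them without calling the services again.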

The same process applies to podcasts. From the transcript, we generate notes using the OpenAI GPT-3.5 Turbo API, then create flashcard questions and answers formatted as JSON. We also built a GPT-based AI tutor grounded in the transcript, which answers user questions and provides interactive quizzes. For PDFs, we use ConvertAPI to convert them to text and then follow the same process as for videos.
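
Getting flashcards back as clean JSON can be handled defensively, since a language model may wrap the JSON in extra text. Below is a minimal sketch under stated assumptions: the `question` and `answer` keys and the function name `parse_flashcards` are hypothetical, not confirmed details of our format.

```python
import json
from typing import Dict, List


def parse_flashcards(raw: str) -> List[Dict[str, str]]:
    """Extract a JSON array of flashcards from model output.

    Assumes (hypothetically) that each card is an object with string
    "question" and "answer" fields. Text before or after the array is
    ignored, and malformed or empty cards are dropped.
    """
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model output")

    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON.
    items = json.loads(raw[start : end + 1])

    cards = []
    for item in items:
        if not isinstance(item, dict):
            continue
        question, answer = item.get("question"), item.get("answer")
        if isinstance(question, str) and isinstance(answer, str):
            question, answer = question.strip(), answer.strip()
            if question and answer:
                cards.append({"question": question, "answer": answer})
    return cards
```

Because the function raises `ValueError` when no usable array is present, the caller can catch it and retry the generation request instead of sending broken cards to the frontend.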

Challenges we ran into

Formatting flashcards properly in JSON.
Syncing audio with video using MoviePy.
Connecting the React frontend with the Flask backend.
Finding affordable and high-quality APIs like Microsoft Azure TTS.
Handling English-to-English conversions, which required retaining the original video and increased storage needs.
Planning for future scalability, particularly efficient video storage based on user IDs.
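
The audio sync challenge comes down to fitting dubbed speech, which is rarely the same length as the original, into the video's duration. One possible approach is sketched below, not necessarily the one we shipped: speed the dub up by a bounded factor, or pad it with silence when it is shorter. The names `SyncPlan` and `plan_sync` and the 1.25 speed cap are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SyncPlan:
    speed: float      # playback speed for the dubbed audio (1.0 = unchanged)
    pad_s: float      # seconds of silence to append after the dubbed audio
    overrun_s: float  # seconds of audio still past the video's end


def plan_sync(video_s: float, dub_s: float, max_speed: float = 1.25) -> SyncPlan:
    """Decide how to fit dubbed audio of length dub_s into a video of length video_s.

    max_speed is an assumed cap so that sped-up speech stays intelligible.
    """
    if video_s <= 0 or dub_s <= 0:
        raise ValueError("durations must be positive")
    if max_speed < 1.0:
        raise ValueError("max_speed must be at least 1.0")

    # Dub is shorter than (or equal to) the video: keep speed, pad with silence.
    if dub_s <= video_s:
        return SyncPlan(speed=1.0, pad_s=video_s - dub_s, overrun_s=0.0)

    # Dub is longer: speed it up just enough, if the cap allows it.
    ratio = dub_s / video_s
    if ratio <= max_speed:
        return SyncPlan(speed=ratio, pad_s=0.0, overrun_s=0.0)

    # Even at the cap the dub is too long; report how much audio remains.
    return SyncPlan(speed=max_speed, pad_s=0.0, overrun_s=dub_s / max_speed - video_s)
```

The two durations would come from the loaded video and audio clips, and a non-zero `overrun_s` signals that the translation should be shortened or split per segment rather than sped up further.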

Accomplishments we're proud of

AI Dubbing Innovation: Developed an AI dubbing feature that translates and voices content into multiple languages.
Comprehensive Learning Platform: Integrated features like notes, flashcards, and an AI tutor to create a robust educational platform.
Smooth Backend-Frontend Integration: Successfully connected React with Flask and Python to ensure the platform runs smoothly.
Effective API Usage: Used APIs like AssemblyAI and Microsoft Azure TTS to deliver optimal functionality at a low cost.
Streamlined Workflows: Built efficient processes for transcription, translation, and content generation.
User-Centric Design: Created a simple, intuitive interface with accessibility in mind, making navigation easy for everyone.
Problem-Solving: Tackled challenges like JSON formatting and API integration through persistence and creativity.
Clear Growth Vision: Established a roadmap for enhancing video processing, data handling, and deploying custom machine learning models.

What we learned

We learned how to integrate complex technologies and APIs, optimize workflows, and create a platform that meets real-world educational needs. This experience also taught us the value of user-friendly design and the challenges involved in scaling an innovative product.

What's next for Speak Smart

Optimized Video Processing: Reduce processing times, allowing a 30-minute video to be transcribed and translated in seconds with custom machine learning models.
Efficient Data Pipelines: Implement advanced pipelines to handle multimedia content quickly and deliver translations almost instantly.
Edge Computing: Explore edge computing for faster transcription and translation, even in low-bandwidth environments.
Streamlined UI: Redesign the interface to ensure quicker, easier access to content with fewer clicks and maximum usability.
Performance Monitoring: Introduce real-time system monitoring to gather user feedback and track system efficiency, ensuring a fast and reliable platform.
Custom Models: Develop proprietary machine learning models to enhance accuracy and speed, minimizing dependence on third-party APIs.
Deployment: Use Docker and Google Cloud services (leveraging $300 in free credits) for scalable deployment.