Snap & Speak is an innovative, AI-powered language learning tool that turns the user's world into an interactive classroom. By simply taking a photo or uploading an image, users can unlock a rich, contextual learning experience. The application leverages the Google Gemini API to analyze the image and instantly generate:

- A clear description of the scene.
- A list of key vocabulary words with translations and phonetic pronunciation guides.
- Practical example sentences that use the new vocabulary in context.
- Text-to-speech functionality to hear the correct pronunciation of words and sentences.
- Alternative descriptions of the image to encourage creative thinking.

This approach moves beyond traditional flashcards, connecting language to the user's immediate environment and making learning more personal, memorable, and effective.

How It's Built

Snap & Speak is a modern web application built with:

- Frontend: React and TypeScript for a robust and interactive user interface.
- AI Engine: The Google Gemini API (gemini-flash-latest model) provides the core multimodal intelligence, analyzing images and generating structured JSON for the learning content.
- Styling: Tailwind CSS is used for a clean, responsive, and mobile-first design.
- Web APIs: It utilizes the browser's Camera API (getUserMedia) for direct photo capture and the Web Speech API for text-to-speech pronunciation.

The architecture is designed to be efficient and scalable, providing a seamless experience from capturing an image to learning from the AI-powered analysis.

Inspiration & Journey

My journey with Snap & Speak began with a personal struggle: learning a new language. Traditional methods felt disconnected from the real world. I wanted to learn the words for the things I see every day, not just words from a textbook. The idea was born: what if I could just point my camera at something and instantly get a language lesson about it?
As someone who isn't a professional programmer, bringing this vision to life was a challenge. After trying a few approaches that didn't quite work, I realized my priority was to make the core feature functional and reliable. This is where the Google Gemini API became a game-changer. I decided to use the API directly, which allowed me to bypass complexities and focus on what truly mattered: the user's learning experience. The power and flexibility of the Gemini API made it possible to quickly build the intelligent analysis engine that is the heart of Snap & Speak. This project is a testament to how modern AI can empower anyone to build powerful applications that solve real-world problems. AI is providing incredible new ways for us to learn, and I'm deeply grateful to Google for making these transformative tools so accessible.
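
As noted under How It's Built, the Gemini API returns structured JSON for the learning content. The app's actual schema is not shown in this write-up, so the following is only a minimal sketch of how such a reply might be checked before it reaches the UI. The Lesson and VocabularyItem shapes, their field names, and the parseLesson helper are all hypothetical, chosen to mirror the features listed above.

```typescript
// Hypothetical lesson shape; the real app's JSON schema is not documented here.
interface VocabularyItem {
  word: string;
  translation: string;
  pronunciation: string;
}

interface Lesson {
  description: string;
  vocabulary: VocabularyItem[];
  exampleSentences: string[];
  alternativeDescriptions: string[];
}

// True for plain JSON objects (not null, not arrays).
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// True when the value is an array containing only strings.
function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// True when the value has the three string fields assumed for a vocabulary entry.
function isVocabularyItem(value: unknown): value is VocabularyItem {
  return (
    isRecord(value) &&
    typeof value.word === "string" &&
    typeof value.translation === "string" &&
    typeof value.pronunciation === "string"
  );
}

// Parses the model's raw text reply into a Lesson.
// Throws if the text is not valid JSON or a required field is missing or mistyped.
function parseLesson(rawText: string): Lesson {
  const data: unknown = JSON.parse(rawText);
  if (!isRecord(data)) {
    throw new Error("Lesson must be a JSON object");
  }
  const { description, vocabulary, exampleSentences, alternativeDescriptions } = data;
  if (typeof description !== "string") {
    throw new Error("Missing or invalid description");
  }
  if (!Array.isArray(vocabulary) || !vocabulary.every(isVocabularyItem)) {
    throw new Error("Missing or invalid vocabulary list");
  }
  if (!isStringArray(exampleSentences)) {
    throw new Error("Missing or invalid example sentences");
  }
  if (!isStringArray(alternativeDescriptions)) {
    throw new Error("Missing or invalid alternative descriptions");
  }
  return { description, vocabulary, exampleSentences, alternativeDescriptions };
}
```

Validating at this boundary means a malformed model reply surfaces as a single catchable error, so the interface can show a retry prompt instead of rendering a half-formed lesson.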

Updates

posted an update

I'm thrilled to announce that my project, Snap & Speak, is now live and ready for you to try!

Snap & Speak is a language learning tool that turns your world into a classroom. Just take a picture, and it uses AI to give you vocabulary, example sentences, and pronunciation guides for the language you're learning. Currently, the app is powered by the Google Gemini API, which has been an incredible tool for bringing this idea to life, especially as a solo developer learning the ropes.

Looking Ahead & A Question for the Experts:

As I think about future improvements, I'm really fascinated by the potential of on-device AI to make the experience even faster and potentially work offline. I was wondering if anyone could share some insights on this: How can I modify the app to use a browser's built-in AI, like Chrome's integrated Gemini model, for image analysis directly on the client side? My goal would be to perform the image recognition locally in the browser, reducing the reliance on a server-side API key and network requests. Any pointers, articles, or code examples on this topic would be immensely helpful!

Please give the app a try and let me know what you think. I'd love to hear your feedback on both the app itself and my question about on-device AI. Thank you for your support.
