Snap & Speak

Welcome to Snap & Speak! Start your language journey by uploading a photo or using your camera to capture the world around you.
Your world is your classroom. From the supermarket to the warehouse, Snap & Speak turns any real-life scene into a practical language lesson.
Learn words in context. The 'Sentences' tab gives you practical examples to help you master new vocabulary and speak more naturally.
Customize your lesson. The settings modal lets you instantly change your native and target languages for a truly personalized experience.

Snap & Speak is an innovative, AI-powered language learning tool that turns the user's world into an interactive classroom. By taking a photo or uploading an image, users unlock a rich, contextual learning experience. The application uses the Google Gemini API to analyze the image and generate:

A clear description of the scene.
A list of key vocabulary words with translations and phonetic pronunciation guides.
Practical example sentences that use the new vocabulary in context.
Alternative descriptions of the image to encourage creative thinking.

Text-to-speech then lets users hear the correct pronunciation of words and sentences.

This approach moves beyond traditional flashcards, connecting language to the user's immediate environment and making learning more personal, memorable, and effective.

How It's Built

Snap & Speak is a modern web application built with:

Frontend: React and TypeScript for a robust, interactive user interface.
AI Engine: the Google Gemini API (gemini-flash-latest model) provides the core multimodal intelligence, analyzing images and generating structured JSON for the learning content.
Styling: Tailwind CSS for a clean, responsive, mobile-first design.
Web APIs: the browser's Camera API (getUserMedia) for direct photo capture and the Web Speech API for text-to-speech pronunciation.

The architecture is designed to be efficient and scalable, providing a seamless experience from capturing an image to learning from the AI-powered analysis.

Inspiration & Journey

My journey with Snap & Speak began with a personal struggle: learning a new language. Traditional methods felt disconnected from the real world. I wanted to learn the words for the things I see every day, not just words from a textbook. The idea was born: what if I could just point my camera at something and instantly get a language lesson about it?
As someone who isn't a professional programmer, I found bringing this vision to life a challenge. After trying a few approaches that didn't quite work, I realized my priority was to make the core feature functional and reliable. This is where the Google Gemini API became a game-changer. I decided to use the API directly, which let me avoid unnecessary complexity and focus on what truly mattered: the user's learning experience. The power and flexibility of the Gemini API made it possible to quickly build the intelligent analysis engine at the heart of Snap & Speak. This project is a testament to how modern AI can empower anyone to build powerful applications that solve real-world problems. AI is providing incredible new ways for us to learn, and I'm deeply grateful to Google for making these transformative tools so accessible.
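
The How It's Built notes say the Gemini API analyzes each photo and returns structured JSON for the lesson. As a rough sketch of how such a request could be assembled for the `@google/genai` SDK, assuming the image arrives as a base64 data URL, the TypeScript below builds a `generateContent` argument with an inline image part and a response schema, then checks the JSON that comes back. The `Lesson` field names, the prompt wording, and the helpers `splitDataUrl`, `buildLessonRequest`, and `parseLesson` are hypothetical; only the model name comes from the description above, and this is not the app's actual code.

```typescript
// Hypothetical lesson shape: field names are illustrative, not Snap & Speak's actual schema.
interface VocabularyItem {
  word: string;
  translation: string;
  pronunciation: string; // phonetic guide
}

interface ExampleSentence {
  sentence: string;
  translation: string;
}

interface Lesson {
  description: string;
  vocabulary: VocabularyItem[];
  sentences: ExampleSentence[];
  alternativeDescriptions: string[];
}

// Split "data:image/jpeg;base64,...." (e.g. from FileReader.readAsDataURL or
// canvas.toDataURL) into the MIME type and the bare base64 payload that an
// inlineData part expects.
function splitDataUrl(dataUrl: string): { mimeType: string; data: string } {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) throw new Error("Expected a base64-encoded data URL");
  return { mimeType: match[1], data: match[2] };
}

// Build the argument for ai.models.generateContent(...) in @google/genai.
// Schema types are plain strings ("OBJECT", "ARRAY", "STRING"), the values of
// the SDK's Type enum, so this sketch runs without the SDK installed.
function buildLessonRequest(dataUrl: string, nativeLang: string, targetLang: string) {
  const str = { type: "STRING" };
  return {
    model: "gemini-flash-latest",
    contents: [
      { inlineData: splitDataUrl(dataUrl) },
      {
        text:
          `Describe this scene for a ${nativeLang} speaker learning ${targetLang}. ` +
          `Give key ${targetLang} vocabulary with ${nativeLang} translations and ` +
          `pronunciation guides, example sentences, and alternative descriptions.`,
      },
    ],
    config: {
      responseMimeType: "application/json",
      responseSchema: {
        type: "OBJECT",
        properties: {
          description: str,
          vocabulary: {
            type: "ARRAY",
            items: {
              type: "OBJECT",
              properties: { word: str, translation: str, pronunciation: str },
              required: ["word", "translation", "pronunciation"],
            },
          },
          sentences: {
            type: "ARRAY",
            items: {
              type: "OBJECT",
              properties: { sentence: str, translation: str },
              required: ["sentence", "translation"],
            },
          },
          alternativeDescriptions: { type: "ARRAY", items: str },
        },
        required: ["description", "vocabulary", "sentences", "alternativeDescriptions"],
      },
    },
  };
}

// Parse the model's JSON text and check the top-level shape before rendering it.
function parseLesson(jsonText: string): Lesson {
  const raw = JSON.parse(jsonText) as Partial<Lesson>;
  if (
    typeof raw.description !== "string" ||
    !Array.isArray(raw.vocabulary) ||
    !Array.isArray(raw.sentences) ||
    !Array.isArray(raw.alternativeDescriptions)
  ) {
    throw new Error("Response does not match the expected lesson schema");
  }
  return raw as Lesson;
}
```

With the SDK installed, the call would look roughly like `const ai = new GoogleGenAI({ apiKey }); const res = await ai.models.generateContent(buildLessonRequest(photo, "English", "Spanish")); const lesson = parseLesson(res.text ?? "");`. In a typed project you would likely import `Type` from `@google/genai` and write `Type.OBJECT` and so on instead of plain strings, so the object type-checks against the SDK's parameter types. Declaring a response schema asks the model for JSON in a fixed shape, and the extra check in `parseLesson` keeps a malformed reply from breaking the UI.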
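
On the text-to-speech side, the Web Speech API speaks text through `speechSynthesis.speak()` with a `SpeechSynthesisUtterance`. One detail a language app likely has to handle is choosing a voice that matches the target language. The sketch below is a hypothetical helper, `pickVoice`, that works on a locally declared `VoiceLike` interface so it can run outside a browser; the matching rules (exact language tag first, then the primary subtag, preferring on-device voices) are an assumption, not a description of how Snap & Speak selects voices.

```typescript
// Minimal stand-in for the SpeechSynthesisVoice fields used here, declared
// locally (hypothetical) so the sketch compiles and runs outside a browser.
interface VoiceLike {
  name: string;
  lang: string; // BCP 47 tag such as "es-ES"
  localService: boolean;
}

// Normalize a language tag for comparison: lowercase, underscores to hyphens.
function normalizeTag(tag: string): string {
  return tag.toLowerCase().replace(/_/g, "-");
}

// Choose a voice for the target language. Tier 1 is an exact tag match;
// tier 2 matches only the primary subtag ("es" also accepts "es-MX").
// Within a tier, on-device voices (localService) come first.
// Returns undefined when nothing fits, leaving the browser's default voice.
function pickVoice<V extends VoiceLike>(voices: V[], targetLang: string): V | undefined {
  const target = normalizeTag(targetLang);
  const primary = target.split("-")[0];
  const localFirst = (a: V, b: V) => Number(b.localService) - Number(a.localService);
  const exact = voices.filter((v) => normalizeTag(v.lang) === target).sort(localFirst);
  if (exact.length > 0) return exact[0];
  return voices
    .filter((v) => normalizeTag(v.lang).split("-")[0] === primary)
    .sort(localFirst)[0];
}

// Browser usage (not runnable in Node):
// const utterance = new SpeechSynthesisUtterance("manzana");
// utterance.lang = "es-ES";
// const voice = pickVoice(window.speechSynthesis.getVoices(), "es-ES");
// if (voice) utterance.voice = voice;
// window.speechSynthesis.speak(utterance);
```

In Chrome, `getVoices()` can return an empty list until the `voiceschanged` event fires, so an app may need to call it again after that event before picking a voice.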

Built With

camera-api
css3
google-gemini-api
google/genai
html5
javascript
lucide
react
tailwind-css
typescript
web-speech-api

Submitted to

Google Chrome Built-in AI Challenge 2025

Created by

As the sole creator of Snap & Speak, I was responsible for the entire project from concept to deployment. This included the UI/UX design, frontend development using React and TypeScript, and the integration of the core AI features.
This project was a huge learning journey for me, as I am not a professional programmer. I initially faced some challenges trying to get the AI functionality working, but the real breakthrough came when I decided to use the Google Gemini API directly. It was incredibly powerful and allowed me to build the smart, responsive experience I had envisioned for language learners.
I learned so much about modern web development and AI integration through this process. I'm especially grateful for tools from Google AI that empower solo creators like me to bring ambitious ideas to life.

he fangsheng

Updates

he fangsheng posted an update — Oct 27, 2025 05:46 AM EDT

I'm thrilled to announce that my project, Snap & Speak, is now live and ready for you to try!

Snap & Speak is a language learning tool that turns your world into a classroom. Just take a picture, and it uses AI to give you vocabulary, example sentences, and pronunciation guides for the language you're learning. Currently, the app is powered by the Google Gemini API, which has been an incredible tool for bringing this idea to life, especially as a solo developer learning the ropes.

Looking Ahead & A Question for the Experts

As I think about future improvements, I'm really fascinated by the potential of on-device AI to make the experience even faster and potentially work offline. I was wondering if anyone could share some insights on this: how can I modify the app to use a browser's built-in AI, such as the Gemini Nano model built into Chrome, for image analysis directly on the client side? My goal would be to perform the image recognition locally in the browser, reducing the reliance on a server-side API key and network requests. Any pointers, articles, or code examples on this topic would be immensely helpful!

Please give the app a try and let me know what you think. I'd love to hear your feedback on both the app itself and my question about on-device AI. Thank you for your support!

he fangsheng started this project — Oct 27, 2025 05:25 AM EDT

Leave feedback in the comments!
