Inspiration
We live in a world of staged photos. We stop the fun to say "cheese," posing for the camera instead of living in the moment. We realized that our camera rolls are full of "perfect" pictures, but they often miss the genuine sparks of joy: the laughter during a board game, the surprise of a joke, or the quiet smile of a loved one.
We wanted to build the "anti-Instagram": a camera that doesn't demand your attention but instead passively captures the moments that actually matter. We wanted to use AI not to fake reality, but to document it exactly as it feels, and, in the process, to show you what actually makes you happy!
What it does
Happy Moments is an autonomous, AI-powered photographer and journalist.
It Watches (Passively): Using a webcam or a connected phone camera, it monitors the room in real time.
It Detects Joy: We use computer vision to analyze facial expressions. When it detects a genuine smile that crosses a specific "happiness threshold," it automatically triggers a capture (see the sketch after this list).
It Remembers: The app snaps a photo and instantly sends it to Google Gemini 2.0 Flash.
It Journals: Gemini analyzes the image to understand the context (who is there, where you are, and what you're doing) and writes a short journal entry for you.
It Maps Your Happiness: The memories are saved to a visual timeline and map, helping you understand what people, places, and activities truly bring you joy.
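Here's a minimal sketch of that trigger loop, assuming DeepFace for emotion scores and OpenCV for capture; the threshold and frame-skip values are illustrative, not our exact tuning:

```python
import cv2
from deepface import DeepFace

HAPPY_THRESHOLD = 80.0  # percent confidence that counts as a genuine smile (illustrative)
ANALYZE_EVERY = 15      # run the heavy model on every 15th frame only (illustrative)

cap = cv2.VideoCapture(0)
frame_count = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_count += 1
    if frame_count % ANALYZE_EVERY:
        continue
    # enforce_detection=False keeps the loop alive when no face is in view
    results = DeepFace.analyze(frame, actions=["emotion"], enforce_detection=False)
    happy_score = results[0]["emotion"]["happy"]  # first detected face
    if happy_score > HAPPY_THRESHOLD:
        cv2.imwrite("moment.jpg", frame)  # hand the photo off to the journaling step
```

Analyzing only every Nth frame is also how we kept the video feed smooth (more on that under Challenges).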
How we built it
We built a full-stack application connecting real-time computer vision with Generative AI.
The "Eye" (Computer Vision): We used Python and OpenCV to handle video streams. We implemented multi-threading to handle a primary face-cam (for emotion) and a secondary IP-camera (for capturing the wider scene).
The "Brain" (Emotion Detection): We utilized DeepFace to analyze micro-expressions in real-time. We had to optimize this to run efficiently without lagging the video feed.
The "Storyteller" (Generative AI): We integrated Google Gemini 2.0 Flash. We engineered specific prompts to force Gemini to return structured JSON data, allowing us to extract clean captions, location context, and activity tags automatically.
The Dashboard: The frontend is built with React, featuring a dynamic timeline and map visualization to display the captured memories.
The Backend: A Flask server acts as the traffic controller, managing the video threads, API calls, and data storage.
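A sketch of that dual-camera threading, assuming the phone exposes an MJPEG stream over the local network (the URL below is hypothetical). Each reader thread keeps only its latest frame, so neither stream blocks the other:

```python
import threading
import cv2

class CameraThread(threading.Thread):
    """Continuously reads frames so each stream stays current without blocking."""
    def __init__(self, source):
        super().__init__(daemon=True)
        self.cap = cv2.VideoCapture(source)
        self.latest = None
        self.lock = threading.Lock()

    def run(self):
        while self.cap.isOpened():
            ok, frame = self.cap.read()
            if ok:
                with self.lock:
                    self.latest = frame

    def read(self):
        with self.lock:
            return None if self.latest is None else self.latest.copy()

face_cam = CameraThread(0)                                   # laptop webcam: emotion
scene_cam = CameraThread("http://192.168.1.42:8080/video")   # phone: the wide shot
face_cam.start()
scene_cam.start()
```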
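And a sketch of the journaling call, using the google-generativeai client; the prompt wording and JSON field names are illustrative, not our exact schema:

```python
import json
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash")

PROMPT = (
    "Describe this photo as a short first-person journal entry. "
    "Respond with ONLY raw JSON (no markdown fences) using the keys "
    '"caption", "location", and "activity".'
)

def journal(image_path):
    response = model.generate_content([PROMPT, Image.open(image_path)])
    text = response.text.strip()
    # Fallback: strip the markdown code fences Gemini sometimes adds anyway
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    return json.loads(text)
```

The fence-stripping fallback exists because, as we note under Challenges, the model sometimes wraps its JSON in markdown anyway.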
Challenges we ran into
The "Zombie" Port: We spent hours debugging a Flask issue where our video threads wouldn't close properly, leaving "ghost" processes holding onto port 5001. We learned a lot about process management and graceful shutdowns!
The "Smile" Lag: DeepFace is powerful but heavy. Initially, running it on every frame froze the video. We solved this by running the detection logic on a separate thread and only analyzing every few frames, keeping the UI buttery smooth.
Structured AI Outputs: Getting an LLM to consistently return perfect JSON without markdown formatting took some serious prompt engineering. We had to build robust error handling (and fallback modes) for when the AI hallucinated the format.
Network Juggling: Connecting a phone camera via local IP was tricky on university WiFi, where client isolation blocks devices from reaching each other. We had to get creative with hotspots and network configs.
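A sketch of the graceful-shutdown pattern, assuming camera threads like the CameraThread above; the Flask reloader note is a common culprit for this symptom rather than a confirmed detail of our final fix:

```python
import signal
import sys

def shutdown(signum, frame):
    # Release capture devices so no orphaned thread keeps port 5001 busy
    face_cam.cap.release()
    scene_cam.cap.release()
    sys.exit(0)

signal.signal(signal.SIGINT, shutdown)   # Ctrl+C
signal.signal(signal.SIGTERM, shutdown)  # kill <pid>

# Flask's debug reloader spawns a child process that re-binds the port and
# can outlive its parent, so it helps to run with:
#   app.run(port=5001, use_reloader=False)
```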
Accomplishments that we're proud of
It actually works! Sitting in front of the laptop and seeing it snap a photo only when we genuinely laughed was a magical moment.
Dual-Camera Sync: We successfully synced a laptop webcam (for emotion detection) with a phone camera (for the "world view"), allowing us to capture the scene, not just our own faces.
Fail-Safe Design: We built a system that degrades gracefully. Even if we hit the Gemini API quota (which we did!), the app catches the error and saves the memory locally so nothing is lost (sketch below).
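A sketch of that fallback path, assuming the journal() helper sketched earlier; the entry shape and file name are illustrative:

```python
import json
import time

def save_memory(image_path):
    try:
        entry = journal(image_path)  # Gemini call; may raise on quota limits (429)
    except Exception as err:
        # Degrade gracefully: keep the photo now, journal it later
        entry = {"caption": None, "location": None, "activity": None,
                 "error": str(err)}
    entry["image"] = image_path
    entry["timestamp"] = time.time()
    with open("memories.jsonl", "a") as f:  # append-only local store
        f.write(json.dumps(entry) + "\n")
```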
What we learned
AI is a commodity; Context is King: The raw emotion data is cool, but the value comes from Gemini explaining why we were happy (e.g., "playing cards with friends").
Latency Matters: In a real-time app, milliseconds count. Threading isn't just a "nice to have"; it's a requirement.
Hardware Limits: We pushed a standard laptop to its edge, running heavy ML models alongside a web server.
What's next for Happy Moments
User Accounts & Cloud: Moving from local JSON files to a proper database so users can access their "Happy Map" from anywhere.
Video Highlights: Instead of just photos, we want to capture the 5 seconds before the smile to create "Live Photos" of the laughter.
Mood Analytics: Expanding beyond just "Happy" to track surprise, excitement, and calm, providing a fuller mental-health picture.
Smart Glasses: Integrating wearables like Meta glasses so capture happens hands-free and screens stay out of the moment.