Welcome to MirrorMate!

The first-of-its-kind local smart AI mirror for real-time insights and interaction, right at home.

Inspiration

With the buzz around the new Raspberry Pi AI HAT+ (boasting 26 additional TOPS of neural compute) and the powerful brand-new Raspberry Pi 5 12GB model, we realized that running advanced AI tools locally was now a reality. Thankfully, ultra-thin USB-C-powered displays are now on the market and cheap, so after ripping one apart we had a display for our mirror. This breakthrough in on-device processing inspired us to build a seamless "smart mirror" experience: an intelligent, interactive display that responds and adapts to users in real time, showcasing local YOLOv6 model capabilities, face detection, and pose estimation groundwork for future translation features. Our goal was to create something that felt magical and intuitive, harnessing cutting-edge local compute so that, for the very first time, local LLMs and cutting-edge AI models can run offline and integrate in a useful way for the average consumer (without a subscription or a powerful desktop!).

What it does

  • AI-Powered Display: The mirror continuously streams real-time face and pose estimations, visually presenting users with animated outlines and indicators of their position. These estimates were also stored to rotate our 3D head model in the top right, though we had to disable that feature because this project had already pushed portable local compute to its limit.

  • Gemini Welcome: Our project integrates Google's Gemini to greet and engage users with dynamic, AI-driven content. Although there is no microphone interaction yet, this visual prompt lays the groundwork for future expansions. The output is rendered with custom layout and CSS that should feel familiar and comfortable to read and interact with.

  • Local LLM Assistance: A local large language model (LLM) is fully orchestrated on the Pi with the attached AI HAT+ board, offering quick responses and personalized user suggestions without requiring constant online connectivity (a minimal sketch of this kind of local inference appears after this list).

  • Natural, Intuitive Experience: Each displayed element, from the facial-recognition markers to the AI's dynamic text, was carefully positioned and styled to deliver a seamless, high-tech yet welcoming feel, combined with useful components such as a simple weather and time interface.
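
As an illustration of the local-LLM piece, here is a minimal sketch of serving an offline greeting on the Pi. It assumes the llama-cpp-python bindings and a small quantized GGUF model; the model path, file, and parameters are placeholders, not our exact setup:

```python
# Minimal sketch: offline LLM greeting on a Raspberry Pi 5.
# Assumes `pip install llama-cpp-python` and a small quantized GGUF
# model on disk -- the path below is a hypothetical placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="models/assistant-q4.gguf",  # placeholder model file
    n_ctx=2048,    # modest context window to fit Pi memory
    n_threads=4,   # Pi 5 has 4 Cortex-A76 cores
)

def greet(name: str) -> str:
    """Generate a short, mirror-friendly greeting entirely offline."""
    out = llm(
        f"Write a one-sentence friendly greeting for {name} "
        f"looking into a smart mirror this morning.",
        max_tokens=48,
        temperature=0.7,
    )
    return out["choices"][0]["text"].strip()

print(greet("Alex"))
```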

How we built it

  • Hardware Setup: We combined a Raspberry Pi 5 12GB model with the new Raspberry Pi AI HAT+ (26 TOPS) board released this month. This provides a potent, small-footprint platform with enough horsepower to handle both local LLM inference and real-time pose/face estimation. Note that the hardware was assembled before the hackathon, but no OS, configuration, or files were created beforehand.

  • Modular Architecture: We developed a separate set of modules for face tracking, pose detection, AI inference, and the user interface components. This allowed for parallel experimentation and isolated improvements.

  • Prototyping for User Experience: We iterated through numerous prototypes, each time refining how and where the visual feedback would appear on the mirror surface. From adjusting font sizes to altering the overlay’s color palette, we tested diverse layouts to ensure interactions felt natural and visually appealing.

  • Integration of Gemini: Incorporating Gemini's AI capabilities into our pipeline was straightforward using readily available APIs. We set it up to generate context-specific greetings and insights for the user, augmenting the local LLM's functionality; a sketch of such a call is shown after this list.
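
As a reference point, a greeting along these lines can be generated with Google's google-generativeai Python SDK. This is a hedged sketch rather than our exact pipeline code; the model name and prompt are illustrative:

```python
# Sketch: generating a context-specific mirror greeting with Gemini.
# Assumes `pip install google-generativeai` and an API key in the
# environment; the model name and prompt are illustrative choices.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

def mirror_greeting(context: str) -> str:
    """Ask Gemini for a short greeting tailored to what the mirror sees."""
    response = model.generate_content(
        "You are a friendly smart mirror. In one sentence, greet the "
        f"user given this context: {context}"
    )
    return response.text.strip()

print(mirror_greeting("morning, user just walked up, light rain outside"))
```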

Challenges we ran into

  • Hardware Constraints: Even with the additional TOPS, running face/pose estimation and an LLM on-device pushed our Pi's limits, causing our head model to break. Balancing model size, video compression, frame rate, goRTC functionality, and CPU/GPU usage proved crucial to ensure smooth performance.

  • No Documentation: We had to figure out as we went how to use these new models and how to make the AI HAT+ board run and serve a preview of the model. This project is the first of its kind, so we relied heavily on mentor help and obscure videos of adjacent attempts.

  • Latency Management: YOLOv6, operating in "preview" mode, used named pipes to send frames to ffmpeg for real-time video processing so that detection inferences could be generated, but the stream introduced latency delays on school Wi-Fi (see the named-pipe sketch after this list). Meanwhile, goRTC packages handled low-latency RTC data for MagicMirror, but overlaying the inference results on the twice-encoded video display led to frequent hardware crashes.

  • Ensuring a User-Friendly Experience: We revised our UI/UX multiple times to avoid an overwhelmingly technical or cramped display. Presenting real-time analytics in a neat, unobtrusive reflection was a constant balancing act.
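
To make the named-pipe hand-off concrete, here is a minimal sketch of pushing camera frames through a FIFO into ffmpeg for low-latency encoding. It assumes OpenCV and an ffmpeg binary are installed; the FIFO path, resolution, and streaming target are placeholder assumptions, not our exact configuration:

```python
# Sketch: feeding raw camera frames to ffmpeg through a named pipe (FIFO).
# Assumes `pip install opencv-python` and ffmpeg on PATH; the FIFO path,
# resolution, and UDP target below are illustrative placeholders.
import os
import subprocess
import cv2

FIFO = "/tmp/mirror_frames.fifo"
W, H, FPS = 640, 480, 15

if not os.path.exists(FIFO):
    os.mkfifo(FIFO)

# ffmpeg reads raw BGR frames from the FIFO and emits a low-latency stream.
encoder = subprocess.Popen([
    "ffmpeg", "-f", "rawvideo", "-pix_fmt", "bgr24",
    "-s", f"{W}x{H}", "-r", str(FPS), "-i", FIFO,
    "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
    "-f", "mpegts", "udp://127.0.0.1:5000",
])

cap = cv2.VideoCapture(0)
with open(FIFO, "wb") as pipe:  # blocks until ffmpeg opens the read end
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (W, H))
        pipe.write(frame.tobytes())  # raw bytes; ffmpeg does the encoding
```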

Favorite Accomplishments

  • Seamless Real-Time Interaction: Achieving near real-time image analysis and on-screen rendering was a big win considering the constrained environment of embedded hardware.

  • On-Device LLM: Successfully running a pared-back, Gemini-like LLM locally proves that personal, conversational AI experiences can be built even without a steady internet connection.

  • Integrated System Design: By combining visual recognition, a text-based AI, and advanced hardware into a single solution, we demonstrated the versatility and power of embedded AI when carefully optimized, in a package that fits into the average person's morning routine.

  • Building Prototypes: Our thorough prototyping approach led to an experience that feels alive and personalized without being intrusive or gimmicky, even though the mirror ended up thicker than we'd planned.

What we learned

  • Prototyping Is Key: Each iteration taught us more about user expectations and hardware capabilities, affirming that frequent testing and feedback loops lead to the best results.

  • Hardware-Software Co-Optimization: Fully leveraging the AI HAT’s compute potential required close cooperation between software layers—our approach to concurrency and resource sharing was paramount.

  • Practical AI UX Considerations: When integrating ML models in real-world products, cohesive user interfaces are just as important as raw algorithmic performance.

  • Potential of On-Device AI: This project showcased how powerful on-device AI can become when the latest hardware is tailored for local computation.

What’s next for us?

  • Voice Interaction & Advanced NLP: The natural next step is to include a microphone with Free-WiLi and advanced speech recognition, letting users converse directly with the system. We tried for many hours to integrate this feature with the hardware given, but had no luck.

  • Multi-User Personalization: We aim to expand face recognition so that our simulated head model responds to user movements, possibly enabling a Web 3.0 social platform for two-way communication.

  • Expanded Sensor Integration: Beyond camera data, we'll explore other sensors (e.g., ambient light, temperature) to enable features like waking the display when motion is detected, showing room temperature, and telemetry-based gesture controls.

  • Wider Model Support: In addition to Gemini, we’re eager to incorporate more generative AI models for text, images, and perhaps even audio, further enriching the mirror’s interactive repertoire.
