Mirror Mind

Inspiration

The project was inspired by a collection of technologies that we wanted to learn during the hackathon:

VLMs
Raspberry Pi
RL

We were also inspired by projects developed with Meta's AR glasses.

What it does

Our project is a portable tool that you carry around to take pictures of the world and talk about what you see, while its "brain" detects the objects around you. We plan to let it interface with Obsidian so you can drop in notes directly and read from your vault when needed.
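Since an Obsidian vault is a folder of Markdown files, the planned note integration could be as simple as reading and writing files. The sketch below is only an illustration under that assumption: the function names `append_note` and `read_note`, the one-file-per-title layout, and the vault path are hypothetical, not something we have built.

```python
from pathlib import Path


def append_note(vault_dir, title, text):
    """Append a line of text to <vault_dir>/<title>.md, creating it if needed."""
    path = Path(vault_dir) / f"{title}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(text.rstrip() + "\n")
    return path


def read_note(vault_dir, title):
    """Return the full Markdown text of a note in the vault."""
    return (Path(vault_dir) / f"{title}.md").read_text(encoding="utf-8")
```

Because the notes are plain Markdown, anything written this way should show up in Obsidian the next time the vault is opened.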

How we built it

We built the project with a data pipeline consisting of:

User query summarization with an LLM
Object detection with Grounding-DINO
Semantic segmentation with SAMv2
Custom short-term memory, inspired by this paper
An LLM to generate the output (essentially ChatGPT)

The Raspberry Pi was set up with a camera and a microphone and streamed its data over WebRTC to the server. The two were connected via Ethernet.
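The order of the stages listed above can be sketched as a small class that chains them. This is a rough outline, not our actual code: every stage is passed in as a plain callable, the `fake_*` functions are toy stand-ins so the example runs without model weights, and the fixed-size `deque` is only a placeholder for the short-term memory, not the method from the paper.

```python
from collections import deque


class Pipeline:
    """Chain the stages in order; each stage is an injected callable (all hypothetical)."""

    def __init__(self, summarize, detect, segment, answer, memory_size=5):
        self.summarize = summarize  # query text -> short detection prompt
        self.detect = detect        # (frame, prompt) -> list of (label, box)
        self.segment = segment      # (frame, boxes) -> list of masks
        self.answer = answer        # (query, memory) -> reply text
        # Placeholder short-term memory: keep only the last N observations.
        self.memory = deque(maxlen=memory_size)

    def step(self, frame, query):
        prompt = self.summarize(query)
        detections = self.detect(frame, prompt)
        masks = self.segment(frame, [box for _, box in detections])
        self.memory.append({
            "labels": [label for label, _ in detections],
            "num_masks": len(masks),
        })
        return self.answer(query, list(self.memory))


# Toy stand-ins for the real models.
def fake_summarize(query):
    return query.lower().rstrip("?")


def fake_detect(frame, prompt):
    return [(prompt, (0, 0, 10, 10))]


def fake_segment(frame, boxes):
    return ["mask"] * len(boxes)


def fake_answer(query, memory):
    seen = sorted({label for obs in memory for label in obs["labels"]})
    return "Seen so far: " + ", ".join(seen)
```

Swapping a `fake_*` function for a real model call leaves `Pipeline.step` unchanged, which makes it easier to test each stage on its own.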

We optimized the ML models by using multithreading for object detection and LLM responses. We also skipped frames on a per-model basis to get a better FPS.
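The two optimizations can be illustrated with a minimal sketch. It assumes each model is wrapped in a plain function; `FrameSkipper`, `start_worker`, and the `every_n` parameter are names made up for this example and are not taken from our codebase.

```python
import queue
import threading


class FrameSkipper:
    """Run a model on every `every_n`-th frame and reuse the last result in between."""

    def __init__(self, model_fn, every_n):
        self.model_fn = model_fn
        self.every_n = every_n
        self.count = 0
        self.last_result = None

    def __call__(self, frame):
        if self.count % self.every_n == 0:
            self.last_result = self.model_fn(frame)
        self.count += 1
        return self.last_result


def start_worker(fn, inbox, outbox):
    """Apply fn to items from inbox in a background thread until None arrives."""

    def loop():
        while True:
            item = inbox.get()
            if item is None:  # sentinel value: stop the worker
                break
            outbox.put(fn(item))

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Giving each model its own `FrameSkipper` lets a slow model run less often than a fast one. In CPython, threads help mainly when the work releases the GIL, for example while waiting on a network call to an LLM or inside native inference code.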

Challenges we ran into

Audio transcription during streaming was painful. It is hard to work with audio formats over the network and reconstruct them afterwards.
The FPS was low at the start, before we added the optimizations.
Grounding-DINO has low accuracy, since it is a small model.
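For the audio reconstruction problem mentioned above, one approach that may help is to collect the decoded audio as raw PCM chunks and wrap them in a WAV container before transcription. The sketch below assumes the chunks are already decoded (WebRTC audio is usually Opus-encoded on the wire) and arrive in order. The function name and the 16 kHz, mono, 16-bit defaults are assumptions for illustration, not the settings we used.

```python
import io
import wave


def pcm_chunks_to_wav(chunks, sample_rate=16000, channels=1, sample_width=2):
    """Join raw PCM chunks (in arrival order) into one in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample, 2 = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(chunks))
    return buf.getvalue()
```

The returned bytes can be written to disk or handed to a transcription model that accepts WAV input. The parameters must match the decoded stream, otherwise the audio plays back at the wrong speed or as noise.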