My grandfather has Alzheimer's. Not the dramatic kind you see in movies, but the quiet, daily kind. He asks where his glasses are. You tell him. Five minutes later, he asks again. You tell him again. He is not being difficult. He genuinely cannot hold onto the memory of where he put something.

I kept thinking: what if something was always watching, so he never had to remember? That became Senova.

What it does

Senova is an AI memory companion that watches your world through a camera and remembers where things are so you don't have to. Today that camera is your phone; the goal is a pair of glasses. It runs quietly in the background, watching your environment. When it sees your water bottle, your keys, your medicine, your bag — it quietly notes what it saw, where it was, and when.

When you forget, you just ask. Out loud, in plain English: "Where did I leave my water bottle?" "Have you seen my keys?" "Where did I put my medicine this morning?" And Senova answers. Not with a list of search results. Not with a form to fill in. It answers the way a person would: "Right now I can see your orange water bottle on the desk next to your laptop." That is it. That is the whole experience. Ask, get an answer, move on with your day.

How we built it

The hardest part of building Senova was not the AI; it was making the AI fast enough, accurate enough, and honest enough to be genuinely useful to someone who is already struggling. Under the hood, Senova runs a continuous vision pipeline on your device. Every few seconds, it captures a frame from the camera and runs it through Florence-2 (a Microsoft vision model) to detect objects, read any text on them, and describe where they are in the scene. A second model, YOLO, catches everyday items like food and clothing that Florence occasionally misses. Each detected object gets a visual fingerprint (a CLIP embedding) and a text description, both stored in a database alongside a timestamp.
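
Concretely, the capture loop looks something like the sketch below. It is a simplified version, assuming the Hugging Face release of Florence-2 and a CLIP model from sentence-transformers; `store` is a hypothetical stand-in for our database write, and the YOLO second pass is omitted for brevity.

```python
import time
import cv2
from PIL import Image
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoProcessor

florence = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True)
clip = SentenceTransformer("clip-ViT-B-32")  # visual fingerprints

def detect_objects(frame):
    """Florence-2 object detection over the full frame."""
    inputs = processor(text="<OD>", images=frame, return_tensors="pt")
    ids = florence.generate(input_ids=inputs["input_ids"],
                            pixel_values=inputs["pixel_values"],
                            max_new_tokens=512)
    raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
    parsed = processor.post_process_generation(
        raw, task="<OD>", image_size=frame.size)["<OD>"]
    return zip(parsed["labels"], parsed["bboxes"])

cap = cv2.VideoCapture(0)
while True:
    ok, bgr = cap.read()
    if not ok:
        continue
    frame = Image.fromarray(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
    for label, box in detect_objects(frame):
        crop = frame.crop(tuple(int(v) for v in box))
        fingerprint = clip.encode(crop)              # CLIP embedding
        store(label, box, fingerprint, time.time())  # hypothetical DB write
    # a second YOLO pass over the same frame fills detection gaps (omitted)
    time.sleep(3)  # "every few seconds"
```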

When you ask a question, three things happen at once: a visual search (finding objects that look like what you described), a semantic text search (finding objects whose descriptions match the meaning of your question), and a keyword scan (finding exact brand names or words you mentioned). All three result sets are merged, ranked by how recently the object was seen, and handed to Claude (Anthropic's AI) along with the live camera frame (what the camera sees right now), so the answer is always grounded in the current moment.
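
In code, the answer path is roughly the sketch below. Here `visual_search`, `text_search`, and `keyword_scan` are hypothetical stand-ins for the three retrieval paths, each returning hits with `object_id`, `description`, and `seen_at` fields; the final call uses Anthropic's Python SDK.

```python
import base64
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer(question: str, live_jpeg: bytes) -> str:
    # merge the three retrieval paths, keeping the freshest sighting per object
    candidates = {}
    for hit in (visual_search(question) + text_search(question)
                + keyword_scan(question)):
        best = candidates.get(hit.object_id)
        if best is None or hit.seen_at > best.seen_at:
            candidates[hit.object_id] = hit
    # rank by recency and keep a handful for context
    ranked = sorted(candidates.values(),
                    key=lambda h: h.seen_at, reverse=True)[:5]
    sightings = "\n".join(
        f"- {h.description}, seen {int(time.time() - h.seen_at)}s ago"
        for h in ranked)
    # hand Claude the ranked history plus the live frame
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=300,
        messages=[{"role": "user", "content": [
            {"type": "image", "source": {
                "type": "base64", "media_type": "image/jpeg",
                "data": base64.b64encode(live_jpeg).decode()}},
            {"type": "text", "text": (
                f"Recent sightings:\n{sightings}\n\n"
                f"Question: {question}\n"
                "Answer in one plain sentence, grounded in the live image.")}]}])
    return msg.content[0].text
```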

The whole vision pipeline runs locally on any edge device, keeping inference fast. No cloud compute for vision. Sub-2-second responses.

Challenges we ran into

Making "where is it?" actually work. Early versions would confidently answer with sightings from an hour ago even when the object had moved. The system was ranking visually similar historical frames above more recent ones. We rebuilt the retrieval ranking to strongly favour recency a sighting from 5 minutes ago now outscores a cleaner visual match from an hour ago and added the live camera frame as direct input to every answer, so Claude can see the current moment, not just indexed history.

Reading product labels. If you ask "where is my BaByliss trimmer" and the system only stored it as a "box", no amount of visual search will find it. We added per-crop OCR to read brand names and product text off every detected object, then built a keyword-first search path that finds objects by their text labels independently of visual similarity. Now if "BaByliss" is readable on the box, it gets found.
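
A rough sketch of that path, assuming pytesseract for the OCR step and a hypothetical list of stored sightings, each carrying the text read off its crop:

```python
import pytesseract

def ocr_crop(crop):
    """Read printed text (brand names, model numbers) off a detected object."""
    return pytesseract.image_to_string(crop).strip().lower()

def keyword_first_search(question, sightings):
    """Exact text match wins outright, independent of visual similarity."""
    words = {w for w in question.lower().split() if len(w) > 3}
    hits = [s for s in sightings if words & set(s.ocr_text.split())]
    return hits or None  # None -> fall back to visual/semantic search
```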

Hallucination on small crops. Florence-2, when asked to describe a tiny cropped image, sometimes returned alarming nonsense (it once described a plain bag as "a man in a black hoodie holding a gun"). We removed per-crop captioning entirely and switched to full-frame dense captions, which are far more stable.
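
The switch amounts to asking Florence-2 for dense region captions on the full frame instead of captioning each crop in isolation, reusing the model and processor from the capture-loop sketch above:

```python
def dense_captions(frame):
    """Caption every region with full-frame context (no tiny-crop guessing)."""
    task = "<DENSE_REGION_CAPTION>"
    inputs = processor(text=task, images=frame, return_tensors="pt")
    ids = florence.generate(input_ids=inputs["input_ids"],
                            pixel_values=inputs["pixel_values"],
                            max_new_tokens=512)
    raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        raw, task=task, image_size=frame.size)[task]
```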

Accomplishments we're proud of

Watching someone with no technical background pick up their phone, ask "where is my water bottle?", and get a real, accurate, spoken answer in under two seconds — that is the moment we are most proud of. Also: the entire vision pipeline runs on-device with no internet required. Your environment, your memories, your device. Nothing leaves unless you ask.

What we learned

Designing for someone with memory loss is humbling. The interface has no room for error messages, retry prompts, or "try rephrasing your question." If the answer is wrong, the person asking may not even realize it is wrong. That forces a completely different standard of reliability than a typical consumer app.

We also learned that the gap between "technically works" and "actually useful" is enormous. The system retrieved the right object dozens of times in testing, but it only became genuinely useful when the answer came back in plain, warm, human language: no ISO-format timestamps, no bounding-box coordinates, no confidence scores. Just: here is where your thing is.

What's next for Senova

The camera on a phone sitting on a desk is a proof of concept. The real version lives in a lightweight wearable — glasses with a small camera, always on, always watching. Next steps:

  • Wearable form factor — clip-on camera glasses, no phone needed
  • Proactive reminders — "You haven't taken your medicine today," without being asked
  • Person recognition — "Your daughter visited at 3 PM and left something on the kitchen table"
  • Caregiver dashboard — a quiet, private view for family members to check in without intruding
  • Multi-room awareness — remembering not just what was seen but where in the home it was seen

The goal is not to replace memory. It is to make forgetting less frightening.
