Inspiration
We have experienced the hurdles of caring for our grandparents and other older relatives: the monetary cost of hiring a caregiver and the emotional cost of constantly worrying whether they have done what they need to do each day. Existing solutions fail to keep family members informed about the status of their loved ones at a reasonable price.
What it does
Memento cares for the user in nearly every way that a caregiver can. It keeps track of what the user does throughout the day, talks with them to keep them from being lonely, reminds them to do the things they need to do to take care of themselves, and notifies family members if anything goes wrong.
How we built it
We started from an existing project of ours: a basic app that connects to smart glasses to estimate what the user is doing. For the hackathon, we added an AI agent on top of it that can reason about the user's actions, communicate with the user and their family members, and make informed decisions based on the user's data and past events. We gave the agent access to several tools, including querying our database of information on the user, querying the event log, and asking the user questions directly.
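To make the tool setup concrete, here is a minimal sketch of how three such tools might be exposed to an agent. It is not our actual code: the names `Event`, `AgentTools`, `query_profile`, `query_events`, `ask_user`, and `needs_reminder` are hypothetical, and the in-memory dictionary and list stand in for the real database and event log.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    """One entry in the (hypothetical) event log."""
    timestamp: float  # seconds since some reference time
    action: str       # e.g. "take_medication"


@dataclass
class AgentTools:
    """Hypothetical tool set; in-memory stand-ins for the database and log."""
    profile: Dict[str, str]
    ask: Callable[[str], str]  # how a question reaches the user
    events: List[Event] = field(default_factory=list)

    def query_profile(self, key: str) -> str:
        """Look up a stored fact about the user."""
        return self.profile.get(key, "unknown")

    def query_events(self, action: str, since: float = 0.0) -> List[Event]:
        """Return logged events for an action at or after `since`."""
        return [e for e in self.events
                if e.action == action and e.timestamp >= since]

    def ask_user(self, question: str) -> str:
        """Ask the user directly and return the answer."""
        return self.ask(question)


def needs_reminder(tools: AgentTools, action: str, since: float) -> bool:
    """True if the action has not been logged since the given time."""
    return not tools.query_events(action, since)
```

In this sketch, an agent deciding whether to remind the user about medication would call `needs_reminder(tools, "take_medication", since=start_of_day)` and fall back to `ask_user` when the log is inconclusive.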
Challenges we ran into
Existing computer vision models are very effective at object detection but perform very poorly at action detection from a first-person perspective. This is partly because existing models operate frame by frame rather than analyzing how frames change over time to estimate the user's intention. To work around this, we used algorithms to locate key objects and then implemented an LLM-based reasoning system to detect the user's actions over several frames. Additionally, these models are not very reliable and sometimes hallucinate events and objects, which we handled by using a combination of models to verify each other's guesses.
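As a rough illustration of the cross-checking idea, the sketch below keeps an object only if several models report it in the same frame, and then only if it persists across most of a short window of frames. This is an assumed voting scheme for illustration, not necessarily the one we used; `agreed_objects`, `stable_objects`, and both thresholds are hypothetical, and the LLM reasoning step that would consume the result is left out.

```python
from collections import Counter
from typing import List, Set


def agreed_objects(model_outputs: List[Set[str]], min_votes: int = 2) -> Set[str]:
    """Keep labels reported by at least `min_votes` models for one frame.

    Each set holds one model's labels, so a model votes at most once per label.
    """
    votes = Counter(label for labels in model_outputs for label in labels)
    return {label for label, n in votes.items() if n >= min_votes}


def stable_objects(frames: List[Set[str]], min_fraction: float = 0.6) -> Set[str]:
    """Keep labels present in at least `min_fraction` of the frames."""
    if not frames:
        return set()
    counts = Counter(label for frame in frames for label in frame)
    return {label for label, n in counts.items()
            if n / len(frames) >= min_fraction}


# Three frames, each seen by three (hypothetical) detectors.
window = [
    agreed_objects([{"cup", "pill bottle"}, {"cup"}, {"cup", "dog"}]),
    agreed_objects([{"cup", "pill bottle"}, {"pill bottle", "cup"}, {"cup"}]),
    agreed_objects([{"cup"}, {"cup", "pill bottle"}, {"pill bottle"}]),
]
objects_for_reasoning = stable_objects(window)
```

Here the one-off "dog" detection is dropped because only one model reported it, while objects that survive both filters would be passed on to the reasoning step.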
Accomplishments that we're proud of
We are proud of integrating the agent into the rest of the app, including the database and event logging, which gives the agent all the information a regular caregiver has when assisting the user.
What we learned
In order to detect user intent and actions, we need much more advanced models that can handle videos because existing ones perform poorly at these tasks.
What's next for Memento
We want to continue building our app in the future.

