Inspiration
Briefly, this project is a facial recognition program that automatically records the conversation between the user and the people they talk to, then sends the transcript to an LLM to extract useful information, such as names and hobbies, to help the user have a better conversation experience.
This project was inspired by a conversation with my friend. I have trouble matching people's names to their faces, and this problem keeps bothering me, especially when I have to work within a large group. At the beginning of this semester, my friend told me that he was thinking about starting a business around AR glasses and asked me for suggestions. The first idea that came to mind was using CV (computer vision) techniques to help users remember other people's names. This repository is a feasibility test of that idea. Since I don't have AR glasses, the program currently runs on a PC. The program capture
What it does
"WHO ARE YOU?" is a real-time facial recognition assistant that helps users identify people around them and instantly recall personalized information—such as their name and hobby—through a live video feed and audio-based LLM analysis. It supports auto-enrollment for unknown faces, and seamlessly transitions between recognition, enrollment, and update modes.
It is designed for scenarios where remembering faces and social context matters, such as sales, healthcare, or everyday life for people with face blindness. A secondary floating window displays the detected person's name, interests, and the system status live, which makes the design a natural fit for future use in AR smart glasses.
How we built it
- OpenCV for real-time face detection and recognition (using LBPH).
- PyAudio for capturing the audio of the user's conversations.
- OpenAI’s GPT API for extracting structured info (like name & hobby) from raw transcripts.
- Whisper (OpenAI) for speech-to-text transcription.
- SQLite for storing face metadata.
- Threading and state-machine-style mode switching for smooth operation between Recognition, Enrollment, and Update modes.
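The LBPH (Local Binary Patterns Histograms) idea behind the recognition step can be illustrated without OpenCV. The pure-Python sketch below computes a basic 3x3 local binary pattern code for each pixel, concatenates one 256-bin histogram per grid cell, and returns the enrolled person whose histogram has the smallest chi-square distance. It is a simplified illustration, not the project's code: OpenCV's cv2.face.LBPHFaceRecognizer_create (from the opencv-contrib face module) uses circular neighbourhoods, an 8x8 grid by default, and differs in details such as histogram normalisation. The helper names and the gallery dict are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale face crop, values 0-255, row-major

# The 8 neighbours of a pixel, clockwise from the top-left; each contributes one bit.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img: Image) -> Image:
    """Basic 3x3 LBP: set bit i when neighbour i is >= the centre pixel."""
    codes = []
    for y in range(1, len(img) - 1):
        row = []
        for x in range(1, len(img[0]) - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbph_histogram(img: Image, grid: int = 2) -> List[int]:
    """Split the LBP image into grid x grid cells and concatenate a 256-bin histogram per cell."""
    codes = lbp_codes(img)
    h, w = len(codes), len(codes[0])
    features: List[int] = []
    for gy in range(grid):
        for gx in range(grid):
            hist = [0] * 256
            for y in range(gy * h // grid, (gy + 1) * h // grid):
                for x in range(gx * w // grid, (gx + 1) * w // grid):
                    hist[codes[y][x]] += 1
            features.extend(hist)
    return features


def chi_square(a: List[int], b: List[int]) -> float:
    """Chi-square distance between two histograms; 0 means identical."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)


def predict(face: Image, gallery: Dict[str, List[int]], grid: int = 2) -> Tuple[str, float]:
    """Nearest-neighbour match over a non-empty gallery: return (label, distance)."""
    query = lbph_histogram(face, grid)
    label = min(gallery, key=lambda name: chi_square(query, gallery[name]))
    return label, chi_square(query, gallery[label])
```

In a real pipeline the face crop would first come from a detector and be resized to a fixed size, and a distance threshold would decide when a face counts as unknown and should trigger enrollment.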
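PyAudio delivers recorded audio as raw byte chunks (for example from repeated stream.read(...) calls), while the Whisper transcription API takes an audio file. A minimal sketch of that hand-off, assuming 16-bit mono PCM at 16 kHz (the project's actual recording settings are not stated), packs the chunks into an in-memory WAV file using only Python's standard wave module; frames_to_wav is a hypothetical helper name.

```python
import io
import wave
from typing import Iterable

# Assumed recording format (would match pyaudio.paInt16, channels=1, rate=16000).
SAMPLE_RATE = 16_000  # Hz; Whisper resamples input to 16 kHz internally anyway
CHANNELS = 1
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit PCM


def frames_to_wav(frames: Iterable[bytes]) -> bytes:
    """Join raw PCM chunks and wrap them in a WAV header, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:  # closing the writer finalises the header
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(b"".join(frames))
    return buf.getvalue()
```

The returned bytes could be written to disk as a .wav file or uploaded directly for transcription.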
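For the GPT step, the model's free-text reply has to become fields the program can store. One defensive approach, sketched here under assumptions, asks for JSON in the prompt and then parses the span between the first "{" and the last "}" of the reply, tolerating extra chatter around it and falling back to empty fields when parsing fails. The prompt wording, the PersonInfo fields, and the helpers build_prompt and parse_reply are hypothetical, and the actual API call is omitted.

```python
import json
import re
from typing import Optional, TypedDict


class PersonInfo(TypedDict):
    name: Optional[str]
    hobby: Optional[str]


INSTRUCTIONS = (
    "Extract the other person's name and hobby from the conversation transcript below. "
    'Reply with JSON only, for example {"name": "...", "hobby": "..."}, '
    "and use null for anything that is not mentioned.\n\nTranscript:\n"
)


def build_prompt(transcript: str) -> str:
    """Combine the fixed instructions with a Whisper transcript."""
    return INSTRUCTIONS + transcript.strip()


def _clean(value: object) -> Optional[str]:
    """Keep non-empty strings; turn null, blanks, and other types into None."""
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def parse_reply(reply: str) -> PersonInfo:
    """Parse the span from the first '{' to the last '}' in the model reply."""
    data: dict = {}
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            data = {}  # malformed JSON: treat as "nothing extracted"
    return {"name": _clean(data.get("name")), "hobby": _clean(data.get("hobby"))}
```

Returning None instead of raising keeps a noisy transcript or an off-format reply from crashing the recording thread; the caller can simply skip the database update.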
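A minimal sketch of how face metadata might be kept in SQLite, using Python's built-in sqlite3 module. It assumes one row per person keyed by the integer label the face recognizer predicts (OpenCV's LBPH recognizer trains on integer labels); the people table, its columns, and the helper functions are hypothetical. COALESCE lets a later update fill in a hobby without overwriting a name that is already known.

```python
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS people (
    label      INTEGER PRIMARY KEY,  -- integer label used by the face recognizer
    name       TEXT,
    hobby      TEXT,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def enroll(conn: sqlite3.Connection) -> int:
    """Insert an empty row for a newly seen face and return its new label."""
    with conn:  # commits the transaction
        cur = conn.execute("INSERT INTO people (name, hobby) VALUES (NULL, NULL)")
    return cur.lastrowid


def update_info(conn: sqlite3.Connection, label: int,
                name: Optional[str] = None, hobby: Optional[str] = None) -> None:
    """Store LLM-extracted fields; a None argument keeps the existing value."""
    with conn:
        conn.execute(
            "UPDATE people SET name = COALESCE(?, name), hobby = COALESCE(?, hobby), "
            "updated_at = CURRENT_TIMESTAMP WHERE label = ?",
            (name, hobby, label),
        )


def lookup(conn: sqlite3.Connection, label: int) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (name, hobby) for a recognized label, or None if it is not enrolled."""
    return conn.execute("SELECT name, hobby FROM people WHERE label = ?", (label,)).fetchone()
```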
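The mode switching can be modelled as a small transition table. In this sketch the three modes come from the project description, while the events (UNKNOWN_FACE, INCOMPLETE_INFO, INFO_SAVED), the ModeMachine class, and the transition rules are assumptions about how such a flow could work. A lock serialises transitions because the camera, audio, and LLM threads may all report events at the same time, and events with no entry for the current mode are ignored.

```python
import threading
from enum import Enum, auto


class Mode(Enum):
    RECOGNITION = auto()
    ENROLLMENT = auto()
    UPDATE = auto()


class Event(Enum):
    UNKNOWN_FACE = auto()     # recognizer distance above the "known" threshold
    INCOMPLETE_INFO = auto()  # known face whose name or hobby is still missing
    INFO_SAVED = auto()       # extracted info has been written to the database


# (current mode, event) -> next mode; pairs not listed leave the mode unchanged.
TRANSITIONS = {
    (Mode.RECOGNITION, Event.UNKNOWN_FACE): Mode.ENROLLMENT,
    (Mode.RECOGNITION, Event.INCOMPLETE_INFO): Mode.UPDATE,
    (Mode.ENROLLMENT, Event.INFO_SAVED): Mode.RECOGNITION,
    (Mode.UPDATE, Event.INFO_SAVED): Mode.RECOGNITION,
}


class ModeMachine:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._mode = Mode.RECOGNITION

    @property
    def mode(self) -> Mode:
        with self._lock:
            return self._mode

    def handle(self, event: Event) -> Mode:
        """Apply one event atomically and return the resulting mode."""
        with self._lock:
            self._mode = TRANSITIONS.get((self._mode, event), self._mode)
            return self._mode
```

Keeping the rules in one table makes it easy to see, for example, that an unknown face seen while already enrolling someone does not restart enrollment.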
Challenges we ran into
- Learning how OpenCV's LBPHFaceRecognizer works took some time.
- Thread safety: Managing UI, audio recording, database writes, and model updates all from different threads led to concurrency bugs and race conditions.
- Speech transcription and LLM parsing errors, especially when background noise or unclear speech affected Whisper's output.
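One common way to tame concurrency problems like the ones described above is to give a single worker thread exclusive ownership of a shared resource and let other threads send it work through a queue.Queue. The sketch below applies this pattern to database writes; it is an illustration, not the project's design, and the DbWriter class and its people table are hypothetical. Creating the connection inside the worker also respects sqlite3's default check_same_thread=True, which raises an error when a connection created in one thread is used from another.

```python
import queue
import sqlite3
import threading
from typing import Any, List, Optional


class DbWriter(threading.Thread):
    """Single thread that owns the SQLite connection; other threads submit SQL via a queue."""

    def __init__(self, path: str = ":memory:") -> None:
        super().__init__(daemon=True)
        self.path = path
        self.jobs: "queue.Queue[Optional[tuple]]" = queue.Queue()

    def run(self) -> None:
        # The connection is created in this thread and never shared.
        conn = sqlite3.connect(self.path)
        conn.execute("CREATE TABLE IF NOT EXISTS people (label INTEGER PRIMARY KEY, name TEXT)")
        while True:
            job = self.jobs.get()
            if job is None:  # sentinel pushed by stop()
                break
            sql, params, reply = job
            try:
                with conn:  # commits on success, rolls back on error
                    result: Any = conn.execute(sql, params).fetchall()
            except sqlite3.Error as exc:
                result = exc  # in this sketch, errors from fire-and-forget jobs are dropped
            if reply is not None:
                reply.put(result)
        conn.close()

    def submit(self, sql: str, params: tuple = (), wait: bool = False) -> Optional[List[tuple]]:
        """Queue a statement; with wait=True, block until it has run and return its rows."""
        reply: Optional[queue.Queue] = queue.Queue(maxsize=1) if wait else None
        self.jobs.put((sql, params, reply))
        if reply is None:
            return None
        result = reply.get()
        if isinstance(result, Exception):
            raise result
        return result

    def stop(self) -> None:
        """Ask the worker to finish queued jobs, then wait for it to exit."""
        self.jobs.put(None)
        self.join()
```

Because the queue is FIFO, a read submitted after a batch of writes sees all of them, without any locks in the calling threads.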
Accomplishments that we're proud of
Although I ran into some issues during development, I still finished the project on time!
What we learned
Through this project, I improved my skills with OpenCV and related models, and I learned new techniques for multi-threaded programming.
What's next for WHO ARE YOU?
1. Switch to a more precise face recognition model (ArcFace)
2. Support other LLM platforms (Claude, Grok, etc.)
3. Perform speech-to-text locally (run Whisper locally)
4. (If I get the chance) Support more devices (phones, AR glasses)
5. Improve the recorder, which currently does not work well in noisy environments
6. Move to the cloud
7. A friendlier GUI (I have no front-end experience, which is why I'm using the terminal here (;_ ;))
8. Support updating the hobby once the name is known
9. Support modifying stored data manually