Inspiration

Smart locks are amazing tools, with studies showing that the presence of even one smart lock in a neighborhood can reduce crime by 55%! However, modern locks like Nest and Ring have one major limitation: their notifications can only tell you that someone is at your door, not who. That's why, over the past 36 hours, we not only built a fully functional smart lock but also implemented a transformer-powered image captioning system that sends you a description of who is at your door in a notification!

What it does

"Who's There" is a smart doorbell system that uses a transformer model to analyze an image of your front door and sends you a description of what is happening there. The doorbell has a distance sensor that constantly measures how far away someone is from the door. When an individual gets close enough to the sensor, the camera snaps a photo and sends it to our server, where a state-of-the-art transformer model generates a caption. The caption is then sent to the homeowner through Twilio SMS, and they are given the option to either let the person in or keep them out. If the homeowner texts "yes", the lock opens and their guest is let inside!
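The reply-handling step described above can be sketched in a few lines. This is only an illustration, not our actual server code: the function names `build_notification` and `decide_action` are hypothetical, the Twilio call itself is left out, and treating every reply other than "yes" as "keep the door locked" is an assumption.

```python
def build_notification(caption: str) -> str:
    """Turn a generated caption into the SMS text sent to the homeowner."""
    return f"Someone is at your door: {caption}. Reply 'yes' to let them in."


def decide_action(reply: str) -> str:
    """Map the homeowner's SMS reply to a lock action.

    Only an explicit "yes" (ignoring case and surrounding whitespace)
    unlocks the door. Anything else keeps it locked (an assumption made
    here so that an unclear reply never opens the door).
    """
    normalized = reply.strip().lower()
    return "unlock" if normalized == "yes" else "keep_locked"


if __name__ == "__main__":
    print(build_notification("a person holding a cardboard box"))
    print(decide_action(" Yes "))
    print(decide_action("who is that?"))
```

Defaulting to "keep_locked" means a typo or an unrelated text can never open the door by accident.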

How we built it

The project consists of a hardware part and a software part. We built the door with a video camera attached to a Raspberry Pi 4. The ultrasonic distance sensor measures the distance to an object in a loop until the object gets close enough, at which point the camera takes a photo. After the photo has been processed and sent to the user as a description, the Pi receives a command to open the door or keep it closed and activates the motor on the door accordingly.
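The sensor loop can be sketched roughly as follows. This is a simplified illustration rather than the code running on our Pi: the sensor is passed in as a plain function so the loop can be tried without hardware, and the name `wait_for_visitor`, the 50 cm threshold, and the 0.1 s polling interval are all assumed values.

```python
import time
from typing import Callable, Optional


def wait_for_visitor(
    read_distance_cm: Callable[[], float],
    threshold_cm: float = 50.0,      # assumed trigger distance
    poll_interval_s: float = 0.1,    # assumed delay between readings
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> Optional[float]:
    """Poll the distance sensor until something is within threshold_cm.

    Returns the reading that triggered, or None if max_polls readings
    were taken without anything coming close enough.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        distance = read_distance_cm()
        if distance <= threshold_cm:
            return distance  # close enough: caller can now take the photo
        polls += 1
        sleep(poll_interval_s)
    return None


if __name__ == "__main__":
    # Fake readings standing in for the ultrasonic sensor.
    fake_readings = iter([200.0, 120.0, 45.0])
    triggered_at = wait_for_visitor(lambda: next(fake_readings), sleep=lambda _: None)
    print(f"Visitor detected at {triggered_at} cm")
```

On the real device, `read_distance_cm` would wrap whatever call reads the ultrasonic sensor, and the returned value would be the cue to capture a photo and send it to the server.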

Challenges we ran into

  • Finding an image captioning model that generated accurate descriptions while not using an absurd amount of computational resources
  • The most difficult part was integrating our systems with a web server. Our original plan to use Linode fell through because they weren't a sponsor of this hackathon. When we tried AWS, we realized after an hour of working with a mentor that our account was not eligible for GPU access, which made captioning impossibly slow.
  • Devpost deleting our descriptions... twice

Accomplishments that we're proud of

  • We made a custom server with Tailscale that was able to interface with Twilio and run our image captioning model!
  • All of our hardware worked perfectly: the webcam takes a picture when someone walks up close enough to it and sends it to our web server, and the motor can lock and unlock the door upon request!
  • We got all the components (Pi 4, phone, web server) working together to generate descriptions and lock and unlock the door

What we learned

  • How to develop a functional IoT product, from the hardware to the software
  • We went into this hackathon with barely any hardware experience, so it was a really fun and challenging time building the prototype and getting the Pi and all the sensors to work
  • How to deploy a transformer model on a web server (and how to make your own web server)

What's next for Who's There

Due to time and technology constraints, we were not able to build Who's There to its full potential. In the future we wish to:

  • Fine-tune the transformer on a dataset of labeled service workers (food delivery people, mail carriers, plumbers, etc.) to allow for more accurate captions
  • Design and implement an app to interface with the smart lock. The app should let you lock and unlock the door as well as get a live view from the smart lock camera
  • Interface it with other messaging apps like Telegram and WhatsApp instead of just SMS
  • Create a "vacation mode" for homeowners who are away but still want to check in: a system that automatically generates a summary of everything that happened over a given period of time (e.g., every 24 hours)
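
As a rough idea of what the "vacation mode" summary could look like, here is a minimal sketch. None of this exists yet: the function name `summarize_events`, the (timestamp, caption) event format, and the wording of the digest are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]  # (when the photo was taken, generated caption)


def summarize_events(
    events: List[Event],
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> str:
    """Build one digest message from all captions inside the time window."""
    cutoff = now - window
    # Keep only events inside the window, oldest first.
    recent = sorted((ts, cap) for ts, cap in events if cutoff <= ts <= now)
    if not recent:
        return "No activity at your door in this period."
    lines = [f"{ts:%H:%M} {cap}" for ts, cap in recent]
    return f"{len(recent)} event(s) at your door:\n" + "\n".join(lines)


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    log = [
        (datetime(2024, 1, 2, 9, 30), "a person holding a cardboard box"),
        (datetime(2024, 1, 1, 18, 5), "a man in a yellow vest"),
    ]
    print(summarize_events(log, now))
```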