Inspiration:
We took inspiration for our project from the theme of this year's hackathon, Cavemen, and what better way to represent a caveman than with man's best friend - 'Flash' the dog!
What it does:
We built Flash the dog to analyze a scene and detect threats. When it detects one, it alerts its owner, the caveman, with a caveman-like text-to-speech warning about the threat.
- Real-World Application: We plan on taking this project further by putting Flash the dog's code on a service dog for blind people. When the service dog senses something amiss, it alerts the owner with a clear and concise text-to-speech description of the imminent threat and guides them away to safety.
How we built it:
We built it using the Google Gemini API, with both a vision model and a language model, running on a Sense App indicator ESP32 device. The firmware is written in C.
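As a rough illustration, here is a minimal sketch of how firmware like ours can send a prompt to the Gemini API over HTTPS from an ESP32. It assumes ESP-IDF's esp_http_client component, a placeholder API key and model name, and a hypothetical caveman persona string; it is not our exact code, and it omits attaching the camera image for brevity.

```c
// Sketch only: POST a text prompt to the Gemini API from an ESP32 (ESP-IDF).
// The API key, model name, and persona text below are placeholders.
#include <stdio.h>
#include <string.h>
#include "esp_http_client.h"
#include "esp_crt_bundle.h"
#include "esp_log.h"

#define GEMINI_URL "https://generativelanguage.googleapis.com/v1beta/models/" \
                   "gemini-1.5-flash:generateContent?key=YOUR_API_KEY" // placeholder key

static const char *TAG = "flash_dog";

void ask_gemini(const char *scene_description)
{
    // Build the JSON body: a caveman persona as the system instruction,
    // plus the scene description as the user prompt.
    char body[1024];
    snprintf(body, sizeof(body),
             "{\"system_instruction\":{\"parts\":[{\"text\":"
             "\"You are Flash, the caveman's loyal dog. Warn owner of danger in short caveman speech.\"}]},"
             "\"contents\":[{\"parts\":[{\"text\":\"%s\"}]}]}",
             scene_description);

    esp_http_client_config_t config = {
        .url = GEMINI_URL,
        .method = HTTP_METHOD_POST,
        .crt_bundle_attach = esp_crt_bundle_attach, // built-in CA bundle for TLS
    };
    esp_http_client_handle_t client = esp_http_client_init(&config);
    esp_http_client_set_header(client, "Content-Type", "application/json");
    esp_http_client_set_post_field(client, body, strlen(body));

    if (esp_http_client_perform(client) == ESP_OK) {
        ESP_LOGI(TAG, "Gemini replied, HTTP status %d",
                 esp_http_client_get_status_code(client));
        // The JSON response would then be parsed and handed to text-to-speech.
    } else {
        ESP_LOGE(TAG, "Request to Gemini failed");
    }
    esp_http_client_cleanup(client);
}
```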
Challenges we ran into:
- First Challenge: getting output to display on the Sense App indicator ESP32.
- Second Challenge: hooking the Sense App indicator ESP32 up to Wi-Fi. We had to restore the original firmware to get Wi-Fi working (a minimal station-mode connection sketch follows this list).
- Third Challenge: the camera we needed for image input would not function properly.
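For the Wi-Fi challenge, the sketch below shows roughly what connecting an ESP32 as a Wi-Fi station looks like under ESP-IDF. It is a simplified illustration with placeholder credentials, not our exact setup, and it omits the event handlers that wait for an IP address or retry on disconnect.

```c
// Simplified sketch of bringing an ESP32 up as a Wi-Fi station under ESP-IDF.
// SSID and password are placeholders; event handling is omitted for brevity.
#include "nvs_flash.h"
#include "esp_netif.h"
#include "esp_event.h"
#include "esp_wifi.h"

void wifi_connect(void)
{
    ESP_ERROR_CHECK(nvs_flash_init());                 // Wi-Fi driver stores calibration data in NVS
    ESP_ERROR_CHECK(esp_netif_init());                 // bring up the TCP/IP stack
    ESP_ERROR_CHECK(esp_event_loop_create_default());  // default event loop for Wi-Fi events
    esp_netif_create_default_wifi_sta();               // default station network interface

    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    ESP_ERROR_CHECK(esp_wifi_init(&cfg));

    wifi_config_t sta_config = {
        .sta = {
            .ssid = "HACKATHON_WIFI",       // placeholder network name
            .password = "CAVEMAN_PASSWORD", // placeholder password
        },
    };
    ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_STA));
    ESP_ERROR_CHECK(esp_wifi_set_config(WIFI_IF_STA, &sta_config));
    ESP_ERROR_CHECK(esp_wifi_start());
    ESP_ERROR_CHECK(esp_wifi_connect());
}
```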
Accomplishments that we're proud of:
We got the Google Gemini API working, got the caveman persona working, and ended up with a viable, usable product that functions almost exactly as it's supposed to.
What we learned:
We learned that Google Gemini's Gems feature is pretty good at making prompts. We also learned that the Sense App indicator ESP32 is a load of work: it may look fancy, but it is also deadly.
What's next for Caveman's Best Friend:
We want to push this toward real-world applications, such as seeing-eye dogs, and genuinely help people in the future.
One application is helping visually impaired people. A regular guide dog following a visually impaired person can only bark at threats as its sole means of communicating with its owner. A robot dog, in contrast, can intelligently identify threats using AI vision and tell its owner exactly what it is seeing, which is a great use of the Google Gemini API and of our implementation. The voice lines would be read aloud to the owner by a text-to-speech model.
Another application is security. For security systems that scan environments 24/7, it would be helpful for a vision model to intelligently detect different types of threats. That information could then be used to notify the user, or even the authorities if the user is not available. If a dinosaur is breaking into your house while you are away and not actively watching the cameras, having that knowledge beamed to your phone would be extremely important.
WORKING VIDEO LINK: https://drive.google.com/file/d/1bvsKIylsEfS-qTyi0eAAREtUMNUqDJNd/view?usp=sharing
YouTube no work :(