Inspiration

BeVIS was inspired by JARVIS, aiming to create an AI that seamlessly integrates into daily life. The idea was to enhance human interaction with the environment using intuitive gestures, making information accessible in a highly interactive manner. BeVIS aims to make a significant contribution to educational equity, providing everyone with access to the best education resources and learning opportunities.

What it does

BeVIS uses a moving camera to let users select objects or areas by framing them with their fingers. It then provides real-time explanations or descriptions of the selected object. This is particularly beneficial for learning, exploring historical sites, and assisting visually impaired individuals by describing their surroundings.
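The framing gesture boils down to turning two fingertip positions into a rectangular region of interest. Here is a minimal sketch of that idea, assuming fingertip keypoints arrive as (x, y) pixel tuples (a hypothetical format, not necessarily BeVIS's internal one):

```python
# Sketch: derive the selection rectangle from the two fingertips that
# "frame" a region. Keypoint format (x, y) is an assumption for
# illustration, not the project's actual data structure.

def selection_box(left_tip, right_tip):
    """Return (x_min, y_min, x_max, y_max) of the framed region."""
    x_min, x_max = sorted((left_tip[0], right_tip[0]))
    y_min, y_max = sorted((left_tip[1], right_tip[1]))
    return (x_min, y_min, x_max, y_max)

# The box is the same regardless of which hand is on which side.
print(selection_box((120, 80), (340, 260)))  # (120, 80, 340, 260)
```

The cropped region defined by this box can then be passed downstream for description or explanation.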

How we built it

  • The BeVIS Buzz, a device based on the ESP32-CAM platform
  • A 3D-printed cover, purposefully designed to resemble the cute HackTX 24 logo

BeVIS combines advanced computer vision and machine learning technologies:
  • YOLOv5: A lightweight version for efficient hand detection.
  • ResNet50: Self-trained for precise keypoint detection.
  • Simple IOU-Based Hand Tracking: Ensures smooth and reliable tracking of hand gestures.
  • 2D Angle Analysis and Spatio-Temporal Motion Analysis: For interpreting gesture semantics and tracking motion trajectories.
  • OpenCV and PyTorch: Used for real-time image processing and deep learning inference.
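The IoU-based tracking step can be illustrated with a short sketch. This is a minimal greedy matcher, assuming boxes are (x_min, y_min, x_max, y_max) tuples; the names and threshold are illustrative, and the actual BeVIS tracker may differ:

```python
# Minimal sketch of IoU-based hand tracking: associate each track from
# the previous frame with the new detection that overlaps it most.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def match_tracks(tracks, detections, threshold=0.3):
    """Greedily map track IDs to detection indices when IoU > threshold."""
    matches, used = {}, set()
    for tid, tbox in tracks.items():
        best, best_iou = None, threshold
        for i, dbox in enumerate(detections):
            if i not in used and iou(tbox, dbox) > best_iou:
                best, best_iou = i, iou(tbox, dbox)
        if best is not None:
            matches[tid] = best
            used.add(best)
    return matches
```

Greedy IoU matching is cheap enough to run every frame on constrained hardware, which keeps hand identities stable between detections without a heavier tracker.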

Challenges we ran into

Some of the main challenges included:

  • Ensuring accurate hand gesture detection and tracking under various lighting conditions and backgrounds.
  • Integrating the complex gesture semantics with motion trajectory analysis to enable intuitive and responsive interactions.
  • Optimizing the system to run efficiently on resource-constrained devices without compromising performance.

Accomplishments that we're proud of

We are proud of building a system that detects and interprets hand gestures with high accuracy, providing a valuable tool for education and for assisting the visually impaired.

What we learned

We gained insights into optimizing deep learning models for real-time applications, handling the nuances of gesture detection, and the importance of user-friendly interactions in AI systems.

What's next for BeVIS

Future plans include:

  • Expanding the range of detectable gestures and enhancing the system’s ability to understand more complex interactions.
  • Improving the AI’s contextual understanding to provide more detailed and contextually relevant explanations.
  • Developing a portable hardware solution for BeVIS to make it more accessible and user-friendly.
