Inspiration
This project grew out of the need for efficient, low-cost object detection in industrial settings, such as sorting and identifying screws and nuts. I wanted to explore whether edge AI could replace traditional cloud-based solutions and run entirely on low-power hardware.
What it does
This project is a real-time embedded vision system built on the ESP32-S3. It detects screws and nuts directly on-device using an onboard camera. The system runs without a computer and needs only a power supply.
How I built it
I started by collecting my own dataset with the ESP32-S3 camera and manually labeling images of screws and nuts. I first trained a lightweight classification model in Edge Impulse, then moved to an object detection model so the system could output bounding boxes in real time.
After training, I deployed the model to the ESP32-S3 using the Arduino development environment. I integrated the camera module for image capture and the onboard display for real-time visualization. The system processes frames continuously, performs on-device inference, and displays detection results with bounding boxes and labels.
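The step between inference and display can be sketched on a host machine. The following is a minimal Python illustration, not the firmware itself: it assumes the model reports boxes in model-input pixels and that low-confidence detections are dropped before drawing. The `Detection` type, the `to_display` helper, the 96x96 model input, the 240x240 display, and the 0.5 threshold are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float  # confidence in [0, 1]
    x: int        # top-left corner, in model-input pixels
    y: int
    w: int
    h: int


def to_display(dets, model_size, display_size, min_score=0.5):
    """Drop low-confidence detections and rescale boxes to display pixels."""
    sx = display_size[0] / model_size[0]
    sy = display_size[1] / model_size[1]
    out = []
    for d in dets:
        if d.score < min_score:
            continue  # too uncertain to draw
        out.append(Detection(d.label, d.score,
                             round(d.x * sx), round(d.y * sy),
                             round(d.w * sx), round(d.h * sy)))
    return out


# Hypothetical detections on an assumed 96x96 model input
dets = [Detection("screw", 0.91, 24, 12, 16, 16),
        Detection("nut", 0.32, 60, 60, 8, 8)]
shown = to_display(dets, model_size=(96, 96), display_size=(240, 240))
```

Here only the screw survives the threshold, and its box is scaled by 2.5 in each direction before it would be drawn.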
Challenges I ran into
One of the biggest challenges was working within the hardware limitations of the ESP32-S3, including limited memory and processing power. I had to carefully balance model complexity and performance to ensure it could run in real time.
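To give a feel for the memory side of that balance, here is a small Python calculation of raw frame buffer sizes. The resolutions and pixel formats are assumptions for illustration, not the settings used in this project. For scale, the ESP32-S3 datasheet lists 512 KB of internal SRAM, only part of which is free for application buffers.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame buffer in bytes."""
    return width * height * bytes_per_pixel


# Assumed example formats, not the project's actual configuration
rgb565_qvga = frame_bytes(320, 240, 2)  # 16-bit color at 320x240
rgb888_96 = frame_bytes(96, 96, 3)      # 24-bit color at 96x96
gray_96 = frame_bytes(96, 96, 1)        # 8-bit grayscale at 96x96

for name, size in [("RGB565 320x240", rgb565_qvga),
                   ("RGB888 96x96", rgb888_96),
                   ("Gray 96x96", gray_96)]:
    print(f"{name}: {size} bytes ({size / 1024:.1f} KiB)")
```

A single 320x240 RGB565 frame is 150 KiB, while a 96x96 grayscale input is 9 KiB, which is why input resolution and color depth weigh so heavily on what fits next to the model.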
Another major issue was camera image quality. The captured images often appeared washed out or had inconsistent colors due to automatic exposure and white balance settings. I spent a significant amount of time tuning camera parameters such as brightness, contrast, saturation, and exposure to achieve stable input for the model.
Because I deployed through the Arduino environment rather than ESP-IDF, my camera pipeline relied on JPEG compression, which reduced image quality. Compression sometimes stripped detail from objects, so parts of them were recognized as background and detection accuracy dropped.
I also ran into a gap between training results and real-world inference. The model performed well during training but struggled in live conditions because of differences in lighting, background clutter, and object orientation. To address this, I collected additional data under different lighting conditions and improved dataset diversity.
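My fix was to collect more real data. A complementary technique, which is not what I did here, is to synthesize lighting variation from existing images. The Python sketch below shows the idea on a flat list of 8-bit pixel values; the `adjust` and `lighting_variants` helpers and the gain and bias ranges are hypothetical choices for illustration.

```python
import random


def adjust(pixels, gain, bias):
    """Apply out = gain * p + bias to 8-bit values, clamped to 0..255."""
    return [min(255, max(0, round(gain * p + bias))) for p in pixels]


def lighting_variants(pixels, n, seed=0):
    """Return n copies of the image with random contrast and brightness."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    variants = []
    for _ in range(n):
        gain = rng.uniform(0.7, 1.3)  # assumed contrast range
        bias = rng.uniform(-30, 30)   # assumed brightness offset range
        variants.append(adjust(pixels, gain, bias))
    return variants


row = [0, 64, 128, 200]           # one row of grayscale pixels
darker = adjust(row, 0.5, 0)      # halve the contrast
brighter = adjust(row, 1.0, 100)  # raise brightness, clamping at 255
variants = lighting_variants(row, 4)
```

Synthetic variation like this does not replace images captured under real lighting, but it can widen the range of exposures the model sees during training.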
Because the system was designed for standalone operation, I displayed detection results directly on the device’s built-in screen instead of developing a separate computer-based UI. This improved portability and independence, but it constrained rendering performance and visualization flexibility.
Integrating the display with real-time inference also introduced performance challenges. Higher image quality improved accuracy but reduced frame rate, so I had to balance visual quality against system responsiveness.
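The trade-off is easier to reason about as a per-frame time budget. In the Python sketch below, one loop iteration is assumed to capture, infer, and render in sequence, so the frame rate is the inverse of the summed stage times. All timings are made-up placeholders, not measurements from this project.

```python
def fps(capture_ms, inference_ms, render_ms):
    """Frames per second when the three stages run back to back."""
    return 1000.0 / (capture_ms + inference_ms + render_ms)


# Placeholder timings in milliseconds, not measured values
low_quality = fps(capture_ms=20, inference_ms=150, render_ms=30)
high_quality = fps(capture_ms=40, inference_ms=400, render_ms=60)

print(f"low quality:  {low_quality:.1f} fps")
print(f"high quality: {high_quality:.1f} fps")
```

With these placeholder numbers, the higher-quality setting takes 500 ms per frame instead of 200 ms, which cuts the frame rate from 5 to 2 frames per second.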
What I learned
Through this project, I learned how to deploy machine learning models on microcontrollers and optimize them for edge devices. I gained hands-on experience with Edge Impulse, Arduino-based deployment, and embedded system constraints. I also learned how important data quality and real-world testing are for improving model robustness.
What's next
Future work will add support for more object categories, improve robustness under varying lighting conditions, and further optimize real-time performance. I also plan to add object size estimation to make the system more useful in real-world industrial scenarios.