Inspiration

According to the IAPB (International Agency for the Prevention of Blindness), 1.1 billion people globally were living with vision loss in 2020. Among those 1.1 billion, 350 million people had at least moderate-to-severe vision impairment. That is more than 12% of the world! In addition, according to this research study (https://pmc.ncbi.nlm.nih.gov/articles/PMC7721280/), blindness is one of the most feared health problems, feared by a higher proportion of respondents than cancer or paralysis. Two of our teammates have only minor vision issues like astigmatism, yet even that made us recognize how life-altering a full visual impairment can be. Inspired by this realization, we created LUNA, a device that scans and narrates the world in real time for people who are blind.

What it does

By combining advanced multimodal Optical Character Recognition (OCR) machine-learning models, transformer-based text-to-speech technology, and minimal, intuitive hardware, LUNA provides enriched sensory context. This instant source of information empowers users to confidently understand their surroundings. LUNA focuses on providing an accurate, rich description of the environment rather than attempting navigation or safety advice. While we want to fully capitalize on the immense potential of machine-learning models, we also acknowledge their limitations. By not relying on the models for spatial reasoning or critical navigation, we reduce the potential harm caused by their inevitable errors. With that said, the model performs excellently at scene description and object recognition, which we take advantage of to the fullest extent.

How we built it

We used the XIAO ESP32S3 Sense microcontroller development board, housed in a casing that clips onto eyewear. This is the main LUNA device. The microcontroller is connected to a camera and a speaker so it can capture images and provide auditory narration. We used PlatformIO with C++ to program the microcontroller and send HTTP POST and GET requests between it and our backend Express.js server. These requests transfer images from the microcontroller to the backend and synthesized audio back to the LUNA device. We use Anthropic's Claude 3.5 Sonnet, given its superior OCR performance, to generate text descriptions of our images. The text response is then piped to Google Cloud Text-to-Speech, and the resulting audio is played back locally on LUNA.
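As a sketch of the backend side of this pipeline: the Express server essentially builds two request payloads, one for Claude 3.5 Sonnet carrying the base64-encoded JPEG from the camera, and one for Google Cloud Text-to-Speech carrying Claude's text response. The payload shapes below follow the public Anthropic Messages API and the Google Cloud Text-to-Speech v1 REST API; the function names, prompt text, and voice choice are our own illustrative assumptions, not taken from the LUNA code.

```javascript
// Hypothetical payload builders for the image -> description -> speech pipeline.
// Shapes follow the public Anthropic Messages API and Google Cloud TTS v1 API;
// names, prompt, and voice are assumptions for illustration.

// Request body for POST https://api.anthropic.com/v1/messages
function buildClaudePayload(base64Jpeg) {
  return {
    model: "claude-3-5-sonnet-20240620",
    max_tokens: 300,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: { type: "base64", media_type: "image/jpeg", data: base64Jpeg },
          },
          { type: "text", text: "Briefly describe this scene for a blind listener." },
        ],
      },
    ],
  };
}

// Request body for POST https://texttospeech.googleapis.com/v1/text:synthesize
function buildTtsPayload(description) {
  return {
    input: { text: description },
    voice: { languageCode: "en-US", name: "en-US-Neural2-C" }, // voice is an assumption
    audioConfig: { audioEncoding: "MP3" },
  };
}
```

On the server, these bodies would be sent with the appropriate API keys in the request headers, and the resulting audio bytes returned to the microcontroller for playback.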

Challenges we ran into

Our microcontroller uses Arduino libraries with Arduino C++. A recurring problem throughout the weekend was finding libraries that fit the functions we needed; those available to us were often out of date or poorly documented, which made debugging difficult and development slow. We switched from the Arduino IDE to PlatformIO because it saved time during code compilation and upload to the microcontroller. We still had to contend with the same library issues, however, such as finding a reliable way to make HTTPS rather than plain HTTP requests. At one point we had our backend hosted on a dedicated server, but long API call times made demos slow, so we decided to scrap the hosted server.

Accomplishments that we're proud of

Integrating the ESP32S3 with complex peripherals such as a high-definition camera, an I2S DAC, and networking proved much more challenging than expected. All the challenges we outlined, mixed with the seemingly random error messages we would constantly receive, made the process grueling, which made the final result all the more satisfying. We are also really proud of the actual hardware of LUNA: a highly iterative prototyping process involving CAD and a 3D printer resulted in a compact, polished package that can be comfortably worn by the user. Finally, we are proud of the seamless integration of the two separate machine-learning models (the vision model and the text-to-speech model) in our final project, which work so well together.

What we learned

Overall, we were impressed by the current state of OCR models and their capabilities. They were excellent at object recognition and showed promise in spatial reasoning under limited information.

What's next for LUNA

Although the current LUNA is sleek and streamlined, its footprint could be further reduced by replacing the loudspeaker with a smaller model. It would also be a great feature to add Bluetooth support so that LUNA can connect to earbuds rather than relying exclusively on the loudspeaker. Finally, we would like to add voice activation for certain commands to make the user experience more fluid.
