Inspiration: The inspiration behind Nemo was to create a novel embedded computer vision system that could ingest everything the user reads. By tracking where the user's finger is pointing, Nemo can conversationally discuss the exact passage the user is referring to. The goal was a seamless, intuitive way for users to interact with documents and easily retrieve information when needed: the user simply talks with Nemo, which responds through large language models using a prompt architecture we designed.
What it does: Nemo is an innovative embedded computer vision system that utilizes a combination of image processing and machine learning techniques to track the position of the user's finger as they read documents. Nemo captures the document text the user is currently reading, along with the user's finger position, in real time. This information is then processed and stored in a database for later retrieval.
When the user needs to recall information from a document they have read in the past, they can simply touch the corresponding location on the document with their finger, and Nemo will retrieve the stored information associated with that location. This allows users to quickly and easily access information from previously read documents without the need for manual annotations or bookmarks.
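The recall step described above can be sketched as a nearest-position lookup: each stored snippet is paired with the page coordinates where the finger was when it was captured, and a touch retrieves the closest one. The data layout and the `recall_at` function below are illustrative assumptions, not Nemo's actual schema.

```python
import math

# Each stored record pairs a fingertip position (page coordinates)
# with the text that was being read at that moment. This structure
# is a made-up stand-in for Nemo's real database rows.
store = [
    {"pos": (120, 340), "text": "Photosynthesis converts light into chemical energy."},
    {"pos": (130, 610), "text": "Chlorophyll absorbs mostly blue and red light."},
]

def recall_at(x, y, max_dist=50):
    """Return the stored snippet whose capture position is nearest to
    the touched point, or None if nothing is within max_dist pixels."""
    best, best_d = None, max_dist
    for rec in store:
        px, py = rec["pos"]
        d = math.hypot(x - px, y - py)
        if d <= best_d:
            best, best_d = rec, d
    return best["text"] if best else None

print(recall_at(118, 350))  # touch near the first snippet
```

A distance threshold matters here: without it, any stray touch would return whichever snippet happens to be least far away.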
How we built it: To build Nemo, we combined computer vision techniques with machine learning. We used image processing libraries to capture and track the user's finger position in real time as they read documents, and optical character recognition (OCR) to extract the text from those documents, which was then processed and stored in a database for retrieval.
For machine learning, we used a convolutional neural network (CNN) to train a finger detection model. The model was trained on a large dataset of finger images in various positions and orientations, allowing it to accurately track the user's finger position even in different lighting conditions and angles.
Challenges we ran into: Building Nemo presented several challenges, including the accuracy and robustness of finger tracking in different environments and lighting conditions. We had to fine-tune the finger detection model to ensure accurate and reliable finger tracking in real time. Additionally, integrating the OCR component with the finger tracking system required careful coordination and synchronization to capture and store the correct text associated with the user's finger position.
Accomplishments that we're proud of: We are proud of the accuracy and reliability of Nemo in tracking finger position and capturing text from documents. Our finger detection model achieved high accuracy even in challenging conditions, and our OCR component was able to accurately extract text from a variety of document types. We are also proud of the seamless and intuitive user experience we were able to create, allowing users to easily store and retrieve information from documents they have read in the past.
What we learned: Building Nemo was a valuable learning experience. We gained expertise in computer vision techniques, including image processing and machine learning, as well as integrating different components to create a cohesive system. We also learned about the challenges of working with embedded systems and the importance of fine-tuning models for different environments and use cases.
What's next for Nemo: There are several potential future directions for Nemo. One potential direction is to further improve the accuracy and robustness of finger tracking, including expanding the system's capability to track multiple fingers or even entire hand gestures. Another direction is to explore additional applications for Nemo, such as in educational settings for tracking reading progress or in professional settings for document annotation and collaboration. We also plan to continue refining and expanding the OCR component to support a wider range of document types and languages. Overall, the future for Nemo looks promising, with potential for further advancements and applications in the field of computer vision and document processing.
Built With
- express.js
- ocr
- openai
- opencv
- python
- pytorch
- siri
