Inspiration
We wanted to build something that could interpret data from images, and we combined this idea with this year's UofTHacks theme: exploration.
What it does
This application invites users to upload an image or take one with their device's camera. They can then tap on objects in the picture to see definitions of those objects.
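The tap-to-define flow can be sketched as a simple hit test: map the tapped pixel into normalized coordinates and check it against each detected object's bounding polygon (Cloud Vision's object localizer returns vertices normalized to [0, 1]). This is a minimal sketch, not our exact implementation; the function and data shape are illustrative.

```python
def find_tapped_object(objects, tap_x, tap_y, img_w, img_h):
    """Return the name of the first detected object whose bounding box
    contains the tapped pixel, or None.

    `objects` is a list of (name, vertices) pairs, where vertices are
    (x, y) tuples normalized to [0, 1], as Cloud Vision's object
    localizer reports them.
    """
    # Convert the tap from pixel space to normalized image space.
    nx, ny = tap_x / img_w, tap_y / img_h
    for name, vertices in objects:
        xs = [v[0] for v in vertices]
        ys = [v[1] for v in vertices]
        # Treat the polygon as its axis-aligned bounding box.
        if min(xs) <= nx <= max(xs) and min(ys) <= ny <= max(ys):
            return name
    return None
```

Once a tap resolves to an object name, that name is what we feed to the definition step described below.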
How we built it
We use the Google Cloud Vision API, accessed through Google's Python SDK, to detect multiple objects in the provided photo. For definitions, we use the Cohere API's text-generation capability via its Python SDK, passing a prompt, a pretrained model, and a maximum token count with each request. By adjusting the prompt, we tailor the API's responses to our use case: the prompt mimics a conversation in which person 1 says a word and person 2 gives its definition. This framing helps Cohere predict an appropriate definition (person 2's reply) for each detected object (person 1's word).
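The conversation-style prompt described above can be sketched as a few-shot template that ends with the detected object's name as person 1's next line, so the model completes person 2's definition. The example words and definitions here are illustrative, not our actual prompt.

```python
# Few-shot examples teaching the word/definition conversation pattern.
FEW_SHOT = (
    "Person 1: apple\n"
    "Person 2: A round fruit with red or green skin and crisp flesh.\n"
    "Person 1: bicycle\n"
    "Person 2: A two-wheeled vehicle propelled by pedals.\n"
)

def build_definition_prompt(word):
    """Append the detected object as person 1's next line so the model's
    completion is person 2's definition of it."""
    return FEW_SHOT + f"Person 1: {word}\nPerson 2:"

# The prompt is then sent to Cohere's generation endpoint; exact client
# calls and parameter names depend on the SDK version, roughly:
#   import cohere
#   co = cohere.Client("YOUR_API_KEY")
#   response = co.generate(prompt=build_definition_prompt("laptop"),
#                          max_tokens=40)
#   definition = response.generations[0].text.strip()
```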
Challenges we ran into
Our biggest challenge was merging our separate files into one file while maintaining the integrity of all the features we had implemented individually. Another issue was imports: cv2 and cohere would import correctly on some members' laptops but not others.
Accomplishments that we're proud of
We are happy that we were able to use the Cohere API successfully and tailor its responses for our purposes. We are also pleased that we mastered PyQt6, a UI framework that was new to us, and that we employed the Cloud Vision API effectively to identify objects in pictures. Last but not least, we are glad we were able to link the front-end and back-end components of our project.
What we learned
We learned how to use PyQt6, the Cohere API, and Google's Cloud Vision API.
What's next for Retina
Next for Retina is improving accessibility by adding a feature that speaks definitions aloud. Future updates could also make Retina run faster and enhance its UI design. Other potential features include making the definitions of objects in an image downloadable and allowing users to add their own definitions.
Built With
- cloudvision
- co:here
- github
- pyqt6
- python
- visual-studio