Inspiration

It's common for language learners to spend a lot of time studying words they may never encounter in their entire lives. What if we had information about the learner's ambient environment? Could we improve the learning task by taking advantage of it?

What it does

MemEye uses the front-facing camera of AR glasses attached to an Android phone to take pictures periodically and send them to a backend server for OCR and object detection. The user can then query an open-source LLM to generate a custom word list of the day for learning, and query the OpenAI GPT-4 API for exemplary usage of those words, in rather decorated language.
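
For a sense of scale, the GPT-4 step is a single chat-completion call per word list. Below is a minimal sketch of that call; the prompt wording, helper name, and sample words are illustrative, not our exact production prompt.

```python
# Minimal sketch of querying GPT-4 for exemplary usage of the day's words.
# The prompt wording and sample words are illustrative, not our exact prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def usage_examples(words):
    prompt = (
        "For each of these words, write one exemplary sentence "
        "in rather decorated, literary English: " + ", ".join(words)
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(usage_examples(["ephemeral", "luminous"]))
```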

How we built it

We used the open-source PaddleOCR network for text extraction from images, Facebook's DETR network for object detection, and the open-source Falcon-7B model hosted on the Hugging Face cloud for word-list generation. The Android Unity app communicates with the local backend server, running Ubuntu 22.04, through a Unity TCP ROS bridge.
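
As a rough illustration of how these pieces fit together on the backend, here is a condensed sketch. The model checkpoints are the public defaults; the detection threshold, prompt wording, and function names are illustrative rather than our exact code.

```python
# Condensed sketch of the backend pipeline: PaddleOCR for text extraction,
# DETR for object detection, Falcon-7B (Hugging Face cloud) for the word list.
# Checkpoints are the public defaults; prompt and threshold are illustrative.
import torch
from PIL import Image
from paddleocr import PaddleOCR
from transformers import DetrImageProcessor, DetrForObjectDetection
from huggingface_hub import InferenceClient

ocr = PaddleOCR(lang="en")
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
detector = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
falcon = InferenceClient(model="tiiuae/falcon-7b-instruct")

def analyze_frame(path):
    # PaddleOCR returns, per image, a list of [box, (text, confidence)] lines.
    texts = [line[1][0] for line in ocr.ocr(path)[0]]
    # DETR object detection on the same frame.
    image = Image.open(path).convert("RGB")
    with torch.no_grad():
        outputs = detector(**processor(images=image, return_tensors="pt"))
    dets = processor.post_process_object_detection(
        outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.9
    )[0]
    objects = [detector.config.id2label[i.item()] for i in dets["labels"]]
    return texts, objects

def word_list_of_the_day(texts, objects):
    # Ask the hosted Falcon-7B for a study list built from today's surroundings.
    prompt = (
        "A language learner saw these words and objects today: "
        f"{', '.join(texts + objects)}. "
        "Pick the ten most useful vocabulary words for them to study."
    )
    return falcon.text_generation(prompt, max_new_tokens=200)
```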

Challenges we ran into

Integrating the LLM into API calls, and a lack of computational resources both locally and in the cloud.

Accomplishments that we're proud of

We achieved several milestones in this project: for example, sending picture frames over the ROS bridge from the Unity device to the backend server and getting the detection results back, as sketched below. We also aimed to minimize calls to non-free APIs such as GPT-4 by handling as many tasks as possible with open-source solutions.
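
For reference, the receiving end of that bridge is roughly a plain ROS subscriber. In the sketch below, the node and topic names are hypothetical, not the ones we actually use.

```python
# Sketch of the backend ROS node receiving compressed frames from the Unity
# app over the TCP bridge. Node and topic names here are hypothetical.
import rospy
from sensor_msgs.msg import CompressedImage
from std_msgs.msg import String
import cv2
import numpy as np

def on_frame(msg):
    # Decode the compressed payload sent from the Unity side.
    frame = cv2.imdecode(np.frombuffer(msg.data, np.uint8), cv2.IMREAD_COLOR)
    # ... run OCR / object detection on `frame`, then publish the results back.
    results_pub.publish(String(data="detections for this frame"))

rospy.init_node("memeye_backend")
results_pub = rospy.Publisher("/memeye/detections", String, queue_size=1)
rospy.Subscriber("/memeye/frames/compressed", CompressedImage, on_frame)
rospy.spin()
```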

What we learned

Having a clear objective from the start really helps.

What's next for MemEye

The next stage would be fine-tuning the Falcon model on GPT-4 responses about word usage examples, so that API calls that incur a fee are reduced further. Beyond that, we plan to distill the knowledge of Falcon-7B into an even smaller LLM so that inference can be sped up.
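
We have not built the distillation step yet, but a natural starting point is the standard soft-target loss (Hinton-style knowledge distillation). The sketch below shows that loss; the temperature and mixing weight are untuned hyperparameters, not values we have settled on.

```python
# Sketch of a standard knowledge-distillation loss: the student matches the
# teacher's temperature-softened token distribution, mixed with ordinary
# cross-entropy on the ground-truth tokens. T and alpha are untuned.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened distributions, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: cross-entropy against the ground-truth next tokens.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft + (1 - alpha) * hard
```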

Built With

android, detr, falcon-7b, gpt-4, paddleocr, python, ros, ubuntu, unity