Inspiration
Admit it, we were all excited when Siri and Alexa first debuted over a decade ago. These technologies have since failed to fulfill their promise of supreme digital assistants for one main reason, mobile phones can only do so much while stored in your pocket or purse. Instead, our team built lightweight AR headphones (think Beats by Dr. Dre meets Google Glass) that are differentiated from similar products through the potential of a retractable display and EEG sensors that make it a high bandwidth device while also flexible for use on the go. Imagine having the ability to walk through life pointing, and simply looking at objects; instantly receiving information and starting a conversation about something.
What it does
Our model is built to allow users to ask questions and have conversations about things they see in the real world with a LLM-based assistant.
How we built it
Hardware: We used Raspberry Pi, Latte Panda, and NeuroSky Mindwave EEG sensors powered by off-the-shelf components to assemble two clunky headset prototypes. Software: For parsing visual data we experimented with using open-source multi-modal models, object detection modes, and OCR models. For the sake of time, we chose OCR, first using Tesseract, then easyocr. We then passed the parsed text and associated metadata to GPT-4 as context for the user-prompted query, which was then detected and activated by whisper running locally using faster_whisper and an int8 quantized model, and later using the whisper API. We finally played back the audio using gTTS.
Challenges we ran into
By far worst of all, the SDKs interfacing with our EEG sensors were extremely outdated (6+ years) and we had to troubleshoot their entire source code, rewriting most of it from scratch. This became our bottleneck to retrieving useful information about our brain wave data. We then pivoted to using audio and visual data to communicate with the assistant. Other issues include Shipping cross-compatible code. Once we had working code on our Macs, installing it on Raspberry Pi's Ubuntu and LattePanda's Windows environments proved time-consuming. Battery dying and memory constraints of the raspberry pi.
Accomplishments that we're proud of
We used the GPT assistant model to facilitate multimodal input. Happy about our preliminary attempt at merging more domains of information. Quick implementation of techniques such as OCR detection, voice recognition, prompting engineering, etc. to provide solutions for our users. Hackers, mentors, and judges stopped by our table to ask questions and express their excitement about our product.
What we learned
We learned that brainwave data via EGG was severely limited by our older, outdated hardware (NeuroSky Mindwave) used to collect the data. This is fine as we aim to launch a friendly alternative to Neuralink's surgical implant technique. Although our headset may not have advanced brain wave recognition capabilities today, we will invest in this R&D space until we become the industry standard. Rewriting the manufacturer's 'Getting Started' code was a good exercise that taught us how the brain waves are parsed into data we can analyze (we found 8 data streams of ouput).
What's next for ElisXR
For now, we will ship our augmented reality headphones with AI capabilities centered around visual input and voice interaction. We will continue tinkering with newer BCI headsets and learn more about implementing more ways to interact with an assistant, both input and output. As well as learning about state-of-the-art hardware methods and capabilities in order to establish ourselves at the forefront of consumer-friendly BCI.
Built With
- lattepanda
- mindwave
- ocr
- openai
- opencv
- raspberry-pi
- tesseract
- whisper
Log in or sign up for Devpost to join the conversation.