Inspiration
We wanted to build something that improves accessibility for others. The daily interaction barriers faced by individuals with visual, hearing, and speech impairments are what inspired this project: we wanted to bridge the gaps in communication and awareness that those barriers create.
What it does
AccessLens is a web-based AR application that acts as an assistive layer over the real world through the device camera. It runs directly in the browser on desktop or mobile. Key features:

- Live Speech-to-Text Captions: real-time captions for users who are hard of hearing (sketched in the code after this list)
- Scene & Person Description: audio narration of detected objects and identified people for visually impaired users
- AR Memory Face System: face recognition that automatically displays a person's name and a customized summary drawn from past conversations
- Voice Command Interface: hands-free operation of the application's features
- Hand Command Interface: gesture-based access to AR features by tracking the user's hands
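To illustrate how the live captions work, here is a minimal sketch using the browser's Web Speech API; the `caption` element id is a hypothetical name, not our actual markup.

```js
// Minimal live-captions sketch using the Web Speech API. The recognition
// constructor is vendor-prefixed in Chromium-based browsers.
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;     // keep listening across pauses
recognition.interimResults = true; // stream partial text while the user speaks

// Hypothetical caption overlay element.
const caption = document.getElementById('caption');

recognition.onresult = (event) => {
  // Join all transcript pieces received so far into one caption line.
  let text = '';
  for (let i = 0; i < event.results.length; i++) {
    text += event.results[i][0].transcript;
  }
  caption.textContent = text;
};

recognition.start();
```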
How we built it
- Frontend: Vite and JavaScript, prioritizing speed, performance, and cross-browser compatibility.
- ML/Vision: open-source libraries running entirely client-side: face-api.js for face recognition and COCO-SSD (TensorFlow.js) for object detection (see the detection sketch after this list).
- Backend & Automation: a serverless architecture on Firebase Firestore to store user data and face embeddings, plus a Firebase Cloud Function that automatically calls the OpenAI API to summarize recorded memories.
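As a concrete illustration of the client-side vision piece, here is a hedged sketch of COCO-SSD object detection narrated through speech synthesis; the element id and confidence threshold are assumptions, not our exact code.

```js
// Sketch: detect objects in the live camera feed with COCO-SSD (TensorFlow.js)
// and narrate them aloud with speech synthesis.
import '@tensorflow/tfjs'; // registers the WebGL/CPU backends
import * as cocoSsd from '@tensorflow-models/coco-ssd';

const video = document.getElementById('camera'); // <video> showing the MediaStream
const modelPromise = cocoSsd.load();             // load the model once, reuse it

async function describeScene() {
  const model = await modelPromise;
  const predictions = await model.detect(video); // [{ bbox, class, score }, ...]
  const labels = predictions
    .filter((p) => p.score > 0.6) // assumed confidence cutoff
    .map((p) => p.class);
  if (labels.length > 0) {
    speechSynthesis.speak(
      new SpeechSynthesisUtterance(`I see ${labels.join(', ')}`)
    );
  }
}
```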
Challenges we ran into
Some of the challenges we ran into:

- Cloud Function paywall: Firebase's free Spark plan does not allow Cloud Functions to call external APIs like OpenAI's. We looked for a way to avoid upgrading to a paid plan, but eventually realized that the pay-as-you-go plan bills by usage and, at our volume, would ultimately cost near $0.
- ML performance: running multiple complex ML models at once caused various performance issues. We solved them with frame throttling (sketched below), a dynamic CPU fallback, and a persistence buffer for face registration.
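The frame-throttling idea is simple: run the expensive detectors only on every Nth animation frame so the camera preview stays smooth between inferences. A minimal sketch, with an assumed interval and a hypothetical `runDetectors` wrapper:

```js
// Throttle ML inference: only run detection on every Nth animation frame.
const DETECT_EVERY_N_FRAMES = 5; // assumed interval; tune per device
let frameCount = 0;

async function runDetectors() {
  // Hypothetical wrapper: run face-api.js and COCO-SSD on the current frame.
}

function loop() {
  frameCount += 1;
  if (frameCount % DETECT_EVERY_N_FRAMES === 0) {
    runDetectors();
  }
  requestAnimationFrame(loop);
}
requestAnimationFrame(loop);
```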
Accomplishments that we're proud of
We are proud of deploying a robust, secure, and automated backend through Firebase Cloud Functions that summarizes recordings with an LLM (OpenAI's GPT-3.5 Turbo). It was quite challenging to set up originally, but we eventually figured it out. We are also proud of how we tied several ML libraries together into one full application.
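A hedged sketch of that summarization flow, assuming a Firestore-triggered function, a `memories` collection with a `transcript` field, and an illustrative prompt:

```js
// Sketch (Node.js Cloud Function): when a memory document is created,
// summarize its transcript with GPT-3.5 Turbo and store the result.
const { onDocumentCreated } = require('firebase-functions/v2/firestore');
const OpenAI = require('openai');

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

exports.summarizeMemory = onDocumentCreated(
  'memories/{memoryId}', // assumed collection path
  async (event) => {
    const { transcript } = event.data.data();
    const completion = await openai.chat.completions.create({
      model: 'gpt-3.5-turbo',
      messages: [
        { role: 'system', content: 'Summarize this conversation in one or two sentences.' },
        { role: 'user', content: transcript },
      ],
    });
    // Write the summary back so the AR overlay can show it next time.
    await event.data.ref.update({
      summary: completion.choices[0].message.content,
    });
  }
);
```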
What we learned
We learned how to manage and access databases through Firebase and how to work with its serverless features. Building a scalable data model that safely isolates user-specific data was another valuable lesson (see the sketch below).
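For example, keeping every document under a `users/{uid}` path makes per-user isolation straightforward to enforce with Firestore security rules. A minimal sketch, with assumed collection and field names:

```js
// Sketch of a user-isolated Firestore layout: face data lives at
// users/{uid}/faces/{faceId}, so rules can restrict access to the owner.
import { getFirestore, doc, setDoc } from 'firebase/firestore';
import { getAuth } from 'firebase/auth';

const db = getFirestore();

async function saveFace(faceId, name, descriptor) {
  const uid = getAuth().currentUser.uid;
  await setDoc(doc(db, 'users', uid, 'faces', faceId), {
    name,
    // face-api.js descriptors are Float32Arrays; store as a plain array
    embedding: Array.from(descriptor),
  });
}
```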
What's next for AccessLens
Next up for AccessLens, we would like to add features that further help users with impairments, such as richer scene understanding and sign reading, and to make the UI feel more like a true AR overlay. Ideally, we would bring AccessLens to Meta smart glasses, which is the end goal.
Built With
- Languages: JavaScript (ES6+ modules), HTML, CSS, Node.js
- Frameworks: Vite, esbuild
- ML Libraries: MediaPipe Hands, TensorFlow.js, COCO-SSD, face-api.js (@vladmandic/face-api)
- Web APIs: Web Speech API (speech recognition & synthesis), MediaStream API (camera)
- Cloud Services: Firebase (Firestore, Authentication, Cloud Functions)
- Databases: Firebase Firestore (NoSQL)
- External APIs: OpenAI API (GPT-3.5 Turbo for memory summarization)
- Platforms: Web browsers, GitHub Pages, Vercel