Echo | Devpost

Echo logo and app design

Inspiration

There are approximately 253 million visually impaired people across the world (US EEOC 2014). Walking sticks are a good tool, but they cannot describe elements of the world in detail. Although other accommodations exist for the visually impaired, much of the currently available technology is either ineffective or inaccessible for certain demographics.

As the world progresses with technology, so too can applications that aid the physically disabled. Visual detection software on mobile devices can provide a cheap and accessible aid that helps blind or severely visually impaired people navigate and understand their surroundings.

What it does

Echo is an iOS application that takes an image of your surroundings and echoes back (audibly reads out) the objects in the image. It aids visually impaired individuals by informing them of their surroundings. The UI is a camera view that plays a shutter sound when the user captures an image.

How we built it

The logo was made using Figma. We built the app as a Flutter application with a serverless Firebase backend that uses two Google Cloud APIs, Cloud Vision and Cloud Text-to-Speech. Images taken with Echo are sent to the Cloud Vision API, and a Firebase cloud function parses the resulting JSON objects into a single string, which the Cloud Text-to-Speech API converts into an mp3 file. The Flutter application then plays the audio to the user.
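The parsing step can be sketched roughly as follows. This is a minimal illustration, not our actual cloud function: it assumes the Vision response comes from label detection (a `labelAnnotations` array whose entries carry a `description` and a `score`), and the helper name `describeLabels` and the sentence wording are hypothetical.

```typescript
// Minimal shape of one entry in a Cloud Vision label-detection response.
// Assumption: only the fields used here are declared.
interface LabelAnnotation {
  description: string;
  score: number; // confidence between 0 and 1
}

interface VisionResponse {
  labelAnnotations?: LabelAnnotation[];
}

// Hypothetical helper: joins the detected labels into one sentence
// that can be handed to a text-to-speech service.
function describeLabels(response: VisionResponse): string {
  const names = (response.labelAnnotations ?? []).map((label) =>
    label.description.toLowerCase()
  );
  if (names.length === 0) {
    return "No objects were detected.";
  }
  if (names.length === 1) {
    return `I see ${names[0]}.`;
  }
  // List all but the last name with commas, then add the last with "and".
  const head = names.slice(0, -1).join(", ");
  const last = names[names.length - 1];
  return `I see ${head} and ${last}.`;
}
```

Building the whole sentence before calling the speech service means a single synthesis request per image rather than one per detected object.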

Challenges we ran into

Kennan: The greatest challenge was learning how to use Git and code on a team, especially in languages and APIs that I am unfamiliar with. I had never really coded on a realistic project outside of an academic environment, so everything was new to me. One problem we ran into was the team not knowing how to distribute work, especially at the beginning when we had little idea of how anything worked. Additionally, connecting the Google Cloud Vision API to Firebase proved to be really difficult. It was hard to tell what our previous code was actually doing, which made it hard to implement new code.

Nithin: The biggest challenge I ran into was learning to connect the various working parts of the project. It was a little difficult to connect the Firebase serverless functions with the Cloud Vision API and have that return the correct output audio to the client. Additionally, every test for the cloud functions that we wrote had to be deployed, which took a long time to run.

Varun: I found learning how to use Firebase challenging, particularly understanding how cloud functions are deployed with the CLI. One aspect I spent a lot of time on was sending images from Firebase storage to the Cloud Vision API.

Francis: Learning how to integrate the Cloud Text-to-Speech API with our Firebase cloud storage proved to be extremely difficult, especially uploading the synthesized mp3 file to the database. In general, the interactions between all the different APIs and software that we had integrated in our project were definitely harder to manage than we thought. There were a lot of cloud function nuances that we had to overcome before the different parts of our code were able to send information to each other, and we would often wait a really long time for Firebase to deploy (which made debugging even more difficult).

Accomplishments we're proud of

Kennan: I am most proud of how we were able to code in languages we hadn't used before, and how well we adapted to a realistic, team-based coding environment rather than the typical academic one, which, for some of us, is the only coding environment we've been exposed to. I'm also really proud of how we were able to integrate all of the software and APIs together, as it took all four of us working together to debug many of these interactions.

Nithin: I am most proud of how we chose to break down a complex problem into more bite-sized problems and tackle them individually. Even when we faced complexities and unforeseen challenges, we adapted well and went back to the drawing board. A lot of the frameworks and libraries that we used were unfamiliar, and we did a good job of reading through the documentation and writing robust code.

Varun: I was most proud of how we chose to tackle this problem using concepts completely unfamiliar to us. I’m thrilled that our product has such a practical application.

Francis: I'm most proud of how we were able to manage the transfer of data between all of these applications. There were so many things being passed around, including JSON objects, audio files, and cloud function triggers. Having everything work at the end without any significant latency or overloading problems was an awesome achievement.

What we learned

We had little to no experience with many of the APIs and languages that we used today (especially both of the Google Cloud services), so all of us gained knowledge from browsing through their documentation and experimenting with the various interfaces. Additionally, because we did not use any of these tools in isolation, we also learned how to have them communicate with each other to produce a proper flow of information.

This project also taught us how to code collaboratively to implement a project. We had to figure out how to distribute work and manage our schedules so that we could finish on time and create a functioning product.

What's next for Echo

Our next step is to implement Natural Language Processing so that the image can be described in a natural way, rather than with the simple list-like sentence that the audio reads out now. We could also improve the descriptions of the objects with details like shades of color and the size of objects. Finally, adding a relevance or weight calculation for each detectable object would let us return only the most significant parts of the image.
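One simple form the relevance idea could take is sketched below. It is only an illustration under stated assumptions: each detected object is assumed to carry a confidence `score` between 0 and 1 (as Cloud Vision label annotations do), and the function name `selectSignificant`, the threshold of 0.7, and the limit of 3 are hypothetical choices rather than anything we have implemented.

```typescript
// Assumed shape of a detected object with a confidence score.
interface ScoredLabel {
  description: string;
  score: number; // confidence between 0 and 1
}

// Hypothetical helper: keeps only labels at or above minScore,
// orders them from most to least confident, and returns at most maxCount.
function selectSignificant(
  labels: ScoredLabel[],
  minScore: number = 0.7,
  maxCount: number = 3
): ScoredLabel[] {
  return labels
    .filter((label) => label.score >= minScore) // filter() returns a new array
    .sort((a, b) => b.score - a.score) // so sorting does not mutate the input
    .slice(0, maxCount);
}
```

A confidence score is only a stand-in for relevance here; a fuller weighting might also consider factors such as how much of the image an object occupies.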

Built With

cloud-text-to-speech
cloud-vision
figma
firebase
flutter
google-cloud
node.js

Submitted to

DubHacks '21

Created by

Francis Peng
I worked primarily on the backend of our product, implementing Firebase Cloud Functions for the Cloud Text-to-Speech API. I also contributed to the design and function of the Echo UI.

Varun Agrawal
I worked on the backend of the product in Firebase and was responsible for sending images from Firebase storage to the Cloud Vision API. I was also involved in designing our logo.

Kennan Gaibel
I worked on the backend, helping connect the different APIs and software together. I extracted information from the Google Cloud Vision API and parsed it into a string that would be read by the text-to-speech API.

Nithin Srinivasan
I worked on the frontend and the backend and helped connect the backend APIs with the frontend. I also helped with the user flow of uploading the image to the Cloud Vision API and downloading the parsed audio file.