Back at university, we had a friend who couldn't see the professor's notes during lectures, so he had to take pictures and often fell behind in class. But what if there were a way to convert those pictures straight into notes? Introducing NoteSense, the fast and easy way to digitize captured photos and audio into ready-to-use typed notes.
What it does 🤔
NoteSense is a notes accessibility app that lets users create notes from images or audio snippets. By harnessing technologies such as speech recognition and optical character recognition (OCR), users with hearing or vision impairments can create notes in a format they can access quickly and conveniently! Our platform converts the image or audio they capture on their mobile device into a PDF that is sent straight to their email, so users can stay on track during their lectures and not feel behind or at a disadvantage compared to their classmates. Users can also view their generated PDFs on their device for quick reference!
How we built it 🖥️
When building out NoteSense, we chose three key design principles to help ensure our product meets the design challenge of accessibility: simplicity, elegance, and scalability.
We wanted NoteSense to be simple to design, upgrade, and debug. This led us to the lightweight Flask framework and the magic of Python for our backend infrastructure. To keep our platform scalable and efficient, we used the Google Cloud Platform for both our speech and image conversions, through its Speech-to-Text and Vision APIs respectively. Using GCP as our backbone allowed our product to be efficient and responsive! We then used various Python libraries to build our email and file conversion services, letting us take the output from GCP and rapidly send PDFs of their notes to our users' emails!
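The email service can be sketched with Python's standard library alone. This is a minimal illustration, not our exact implementation: the sender address, subject line, and SMTP host below are placeholders, and a production service would load real credentials from configuration.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical sender address; the real service's address and SMTP host differ.
SENDER = "notesense@example.com"

def build_notes_email(recipient: str, pdf_bytes: bytes,
                      filename: str = "notes.pdf") -> EmailMessage:
    """Package a generated PDF of notes as an email attachment."""
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = recipient
    msg["Subject"] = "Your NoteSense notes are ready"
    msg.set_content("Your converted notes are attached as a PDF.")
    msg.add_attachment(pdf_bytes, maintype="application",
                       subtype="pdf", filename=filename)
    return msg

def send_notes_email(msg: EmailMessage) -> None:
    """Send the message over SMTP (host and port are placeholders)."""
    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()
        # server.login(SENDER, password)  # credentials omitted in this sketch
        server.send_message(msg)
```

Separating message construction from sending keeps the attachment logic easy to test without a live mail server.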
To create an elegant and user-friendly experience we leveraged React Native and various design libraries to present our users with a new, accessible platform to create notes for individuals who may have hearing and/or vision difficulties. React Native also worked seamlessly with our Flask backend and our third-party APIs. This integration also allowed for concurrent development streams for our front end and back end teams.
Challenges we ran into 🔧
Throughout the course of this hackathon, we faced a variety of challenges before producing our final product. Issues with PDF reading and writing, audio conversion, and cross-platform compatibility were the most notable of the bunch.
Since this was our first time manipulating a phone’s filesystem with React Native, we had a few hiccups while developing the PDF code that writes to and reads from the phone’s document directory. More specifically, we were unsure how to create and populate a file in the local filesystem from a stream of PDF data. After some thorough research, we discovered that we could encode our data in Base64 format and asynchronously write the string to a file in the local filesystem. We could then read that same file asynchronously and decode the Base64 to display the PDF in the app.
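The pattern itself is just a Base64 round trip. Our app does this through the React Native filesystem API, but the idea can be sketched in Python with the standard library (the file name and payload below are stand-ins):

```python
import base64
import tempfile
from pathlib import Path

def save_pdf(path: Path, pdf_bytes: bytes) -> None:
    """Write PDF bytes to disk as Base64 text, so the payload travels as a plain string."""
    path.write_text(base64.b64encode(pdf_bytes).decode("ascii"))

def load_pdf(path: Path) -> bytes:
    """Read the Base64 text back and decode it into the original PDF bytes."""
    return base64.b64decode(path.read_text())

# Round trip through a temporary directory, mirroring the write-then-read flow.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "notes.pdf.b64"
    original = b"%PDF-1.4\n% minimal stand-in payload"
    save_pdf(target, original)
    restored = load_pdf(target)
```

Because Base64 is lossless, the decoded bytes are identical to the original PDF stream.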
Audio conversion was initially a big issue: neither the frontend nor the backend had built-in or third-party library support for converting between two specific file types we believed we could not avoid. However, we later found that the client-side recording could be saved in a file type that was already compatible with Google Cloud Platform’s Speech-to-Text API.
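A minimal sketch of the server-side transcription call, assuming the `google-cloud-speech` client library and configured credentials. The extension-to-encoding mapping is our own simplification for illustration, not part of the API:

```python
def encoding_for_extension(filename: str) -> str:
    """Map a recording's file extension to a Speech-to-Text encoding name.

    Simplified mapping for the formats we cared about; anything else is
    rejected rather than guessed at.
    """
    mapping = {".wav": "LINEAR16", ".flac": "FLAC"}
    for ext, encoding in mapping.items():
        if filename.lower().endswith(ext):
            return encoding
    raise ValueError(f"unsupported audio format: {filename}")

def transcribe(audio_bytes: bytes, filename: str) -> str:
    """Send a short (under 60 s) recording to Speech-to-Text and join the results."""
    # Imported here so the pure helper above works without the client library.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=getattr(speech.RecognitionConfig.AudioEncoding,
                         encoding_for_extension(filename)),
        language_code="en-US",
    )
    audio = speech.RecognitionAudio(content=audio_bytes)
    response = client.recognize(config=config, audio=audio)
    return " ".join(r.alternatives[0].transcript for r in response.results)
```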
Cross-platform compatibility was an issue that arose in multiple places throughout development. Some UI elements appeared and behaved differently on different operating systems. Fortunately, we were able to test on both Android and iOS devices, so we could pinpoint the cross-platform issues and fix them with conditionals that adjust the UI based on the platform the app is running on.
Although we faced various obstacles during the development of our app, we were able to overcome every single one of them and create a functional application with our desired outcome.
What we learned 🤓
Hack the 6ix really helped develop our hard and soft skills. For starters, for many of us it was our first time using Google Cloud Platform and various other Google services! Learning GCP in a high-pressure, fast-paced environment was definitely a great and unique experience. This was also the first hackathon where we targeted a specific challenge (accessibility and GCP) and designed a product accordingly. As a result, this event enabled us to hone both our technical and design skills to create a product that helps solve a specific problem. Furthermore, we also learned how to handle file conversions in both Python and React Native!
Participating in this hackathon in a virtual setting definitely tested our teamwork and communication skills. We collaborated through Discord to coordinate issues and track progress, and played music on our server to keep morale high :).
What's next for NoteSense 🏃‍♂️
For the future we have many ideas to improve the accessibility and scalability of NoteSense. One feature we weren’t able to build yet but are planning is improved image recognition that can handle detailed diagrams and drawings. Diagrams often paint a better picture and deepen one's understanding, which is something we want to take advantage of in NoteSense. Due to current Google Cloud Platform limitations, our speech-to-text functionality is capped at 60 seconds of audio. This is fine for shorter recordings, but in the future we want to explore options that allow longer audio files so users can record live lectures, meetings, or calls. Another feature we would like to explore is video: not only converting the audio into notes, but also capturing any visual aids in the video to enhance the PDF notes we create.
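One possible direction for lifting the 60-second cap, assuming the same `google-cloud-speech` client library: upload the recording to a Cloud Storage bucket and use the asynchronous `long_running_recognize` call, which accepts a GCS URI instead of inline audio. The bucket name below is a placeholder, and the duration threshold is just the documented synchronous limit:

```python
SYNC_LIMIT_S = 60  # Speech-to-Text's cap for the synchronous recognize call

def needs_async(duration_s: float) -> bool:
    """True when a recording exceeds the synchronous API's 60-second limit."""
    return duration_s > SYNC_LIMIT_S

def transcribe_long(gcs_uri: str, timeout_s: int = 600) -> str:
    """Transcribe audio longer than a minute via the asynchronous API.

    `gcs_uri` must point at a file already uploaded to Cloud Storage,
    e.g. "gs://some-bucket/lecture.flac" (placeholder bucket name).
    """
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        language_code="en-US",
    )
    audio = speech.RecognitionAudio(uri=gcs_uri)
    # Returns an operation handle immediately; block until the transcript is ready.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout_s)
    return " ".join(r.alternatives[0].transcript for r in response.results)
```

The app could pick between the two paths at upload time: short clips go through the existing synchronous call, longer lecture recordings through this one.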