We drew inspiration from the mutual understanding we had with Kate regarding dining halls. Being of college age, we immediately understood how important it is to be able to sit where one wants. Sharing meals with people is a large part of blossoming socially in college, so we set out to make that easier for the visually impaired as well.
What it does
CafeCam is an intuitive, easy-to-use contextual awareness tool.
The user visits our website and uploads a photo of their surroundings. CafeCam analyzes the image, tells the user through audio feedback whether a table is full or empty, and recognizes the presence of any of their friends.
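The step from image analysis to spoken feedback can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it assumes the image model returns (label, confidence) pairs, and the label names `empty_table` and `full_table`, the `friend:` prefix, and the 0.8 threshold are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical label names and confidence threshold (not taken from the project).
EMPTY_LABEL = "empty_table"
FULL_LABEL = "full_table"
FRIEND_PREFIX = "friend:"
THRESHOLD = 0.8


def describe_scene(predictions: List[Tuple[str, float]]) -> str:
    """Turn (label, confidence) pairs into a sentence suitable for text-to-speech."""
    # Keep only the labels the model is reasonably sure about.
    confident = {label: score for label, score in predictions if score >= THRESHOLD}

    parts = []
    empty_score = confident.get(EMPTY_LABEL)
    full_score = confident.get(FULL_LABEL, 0.0)
    if empty_score is not None and empty_score >= full_score:
        parts.append("The table is empty.")
    elif FULL_LABEL in confident:
        parts.append("The table is full.")
    else:
        parts.append("I could not tell whether the table is free.")

    # Collect recognized friends, stripping the hypothetical prefix.
    friends = sorted(
        label[len(FRIEND_PREFIX):]
        for label in confident
        if label.startswith(FRIEND_PREFIX)
    )
    if friends:
        parts.append("Nearby: " + ", ".join(friends) + ".")
    return " ".join(parts)
```

The returned string would then be handed to whatever audio mechanism the site uses. Falling back to an explicit "could not tell" message, rather than guessing, seems the safer choice for a user who cannot check the result visually.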
Our aim is to make it more comfortable for the visually impaired to interact with their environment.
How we built it
CafeCam was built with the Google Cloud Platform as its backbone: a Google App Engine instance runs the web application, which uses Python and Flask to serve the HTML5 front end, Bootstrap to make the website responsive, and the Clarifai API to perform image analysis. Using Clarifai, we trained a model specific to the user, allowing it to recognize their friends and the availability of a table.
Challenges we ran into
We ran into problems trying to capture frames from the camera's livestream so that the Clarifai API could provide live contextual awareness. We were also unable to autoplay the results of the prediction model because of the limitations mobile web browsers place on autoplaying content.
Accomplishments that we're proud of
While there are apps out there that will verbally recite objects in the frame, CafeCam is able to recognize people specific to the user. When given a picture, the program can identify people the user is familiar with and can also differentiate empty tables from full ones.
What we learned
What's next for CafeCam
CafeCam would eventually use Faces from Apple Photos to train the Clarifai model. Further out, we could create a pair of goggles with a built-in camera that would provide the user with contextual information relevant to them and their surroundings in real time.