OvenBot

Inspiration

"Appliances all around us have digital displays. Most of the time, even though I can’t see the display, I can memorize where the buttons are or use tactile stickers as a reference. But now more of these displays are becoming context dependent: the display on my oven looks different when I’m setting the timer than it does when I’m starting to cook a meal.

These changing displays make it difficult for me to use my oven, stove and crock pot on my own, like I used to. I wish there was an alternative to having a sighted friend on hand to read the options out loud!"

What it does

We are creating an app, intended for mobile use, that helps visually impaired users interact more efficiently with modern home appliances. It combines a front-end web app that takes speech input with a back end that performs OCR (image-to-text) on a photo of the appliance display and produces text-to-speech output. The app will map out and vocalize where the buttons are on the display, as well as the text in the readout.

How I built it

A front-end web app with speech input calls the Python script.
The script gets the image and detects the text in it using the Google Cloud OCR API.
OpenCV image processing is used to detect color at the fingertip.
The appliance's digital display is mapped to a grid that serves as a reference coordinate system for the fingertip.
IBM Watson provides the text-to-speech output to the user.
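
The grid mapping described above could look roughly like the following. This is a minimal sketch, not our actual code: the `TextBox` class, the `to_cell` and `label_under_fingertip` helpers, the 3x3 grid size, and the pixel values are all assumptions made for illustration. It takes OCR text boxes and a fingertip position in pixels, maps both onto the same grid, and returns the label that shares a cell with the fingertip.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TextBox:
    """Hypothetical container for one piece of OCR text and its pixel box."""
    text: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def center(self) -> Tuple[float, float]:
        """Return the (x, y) center of the bounding box in pixels."""
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def to_cell(x: float, y: float, width: int, height: int,
            rows: int, cols: int) -> Tuple[int, int]:
    """Map a pixel coordinate to a (row, col) cell, clamped to the grid."""
    col = min(cols - 1, max(0, int(x * cols / width)))
    row = min(rows - 1, max(0, int(y * rows / height)))
    return row, col


def label_under_fingertip(fingertip: Tuple[float, float],
                          boxes: List[TextBox],
                          width: int, height: int,
                          rows: int = 3, cols: int = 3) -> Optional[str]:
    """Return the text whose box center shares a grid cell with the fingertip."""
    finger_cell = to_cell(fingertip[0], fingertip[1], width, height, rows, cols)
    for box in boxes:
        cx, cy = box.center()
        if to_cell(cx, cy, width, height, rows, cols) == finger_cell:
            return box.text
    return None  # no label in the fingertip's cell


# Made-up example: a 600x300 pixel image of a display with two labels.
boxes = [
    TextBox("TIMER", 20, 20, 180, 80),
    TextBox("BAKE", 420, 220, 580, 280),
]
found = label_under_fingertip((510, 260), boxes, width=600, height=300)
```

A coarse grid is forgiving of small errors in fingertip detection, at the cost of not distinguishing two labels that fall in the same cell.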

Challenges I ran into

-Mobile device orientation
-Method of mapping appliance button location
-Lighting: glare, reflection
-Connecting the front end and back end: design architecture

Accomplishments that I'm proud of

-Mapping and navigation
-Integrating various APIs and code
-Speech verification of relative location
-UI implementation
-Overall vision for the idea, even though we didn't get to accomplish all the tasks we had in mind
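
As an illustration of what speech verification of relative location can mean, here is a minimal sketch that turns two grid positions into a spoken hint. The `direction_hint` function, the (row, column) cell convention, and the exact wording are assumptions made for this example rather than our implementation; the returned string would be handed to the text-to-speech step.

```python
from typing import Tuple


def direction_hint(finger_cell: Tuple[int, int],
                   target_cell: Tuple[int, int]) -> str:
    """Describe how to move from the fingertip's cell to the target's cell.

    Cells are (row, col) pairs; rows grow downward and columns grow to the
    right, matching image coordinates.
    """
    d_row = target_cell[0] - finger_cell[0]
    d_col = target_cell[1] - finger_cell[1]
    if d_row == 0 and d_col == 0:
        return "You are on the button."
    parts = []
    if d_col:
        parts.append(f"{abs(d_col)} {'right' if d_col > 0 else 'left'}")
    if d_row:
        parts.append(f"{abs(d_row)} {'down' if d_row > 0 else 'up'}")
    return "Move " + " and ".join(parts) + "."


# Fingertip in the top-left cell, target button in the bottom-right cell.
hint = direction_hint((0, 0), (2, 2))
```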

What I learned

Architecture strategy is key for integrating front end and back end
Design with compensating behavior in mind

What's next for OvenBot

Machine learning algorithm for fingertip detection, to remove the need for an external aid
Field of view
Converting project into a mobile app
More natural conversational speech output
Speech input

Submitted to

PerkinsHacks 2018
- Winner Making a Meal Challenge Prize

Created by

I worked on the front-end portion of the application and created an interface through which the front-end and back-end features can interact. I used HTML, CSS, JavaScript, jQuery, JSON, and Python! I had little or no prior experience with the latter half of that list, so it was a very educational experience!

Pratyusha Karnati
I worked on implementing the text-to-speech part of the project using the IBM Watson TTS API.
I also worked on some of the front-end parts of the project, which taught me a ton about Django and how to interface between a Python script and a web page.

Winston Moh T.
Computer engineering student with Minor in Networking and Information Security
I worked on developing the computer vision and image recognition interface, detecting the fingertip (using color detection), integrating the back-end modules, and implementing the beep generation module.
I also tested the Microsoft Azure Computer Vision API to check its signal-to-noise ratio.

Karan Tyagi
I created the design architecture and worked on developing the computer vision and image recognition interface and integrating the back-end modules. I used OpenCV and the Google Cloud OCR API for the first time. I learned a ton about image processing, mapping and GPS.

HITESH VERMA
Machine Learning Enthusiast | Computer Science Graduate student at Northeastern University
I worked on building the image extraction and processing algorithms, mapping text sections in the image, and image preprocessing tasks. I also worked on the GPS and the beeping GPS.

sairohith07
First time seriously coding (outside of MATLAB). Worked on the front end while learning how to use HTML, CSS and JavaScript. Researched depth mapping in Python. Parsed feedback from clients, which better established the problem statement and helped make the output of the text-to-speech back end more user-centered.

MaxZin8