Driving the Price

Inspiration

We wanted to tackle a difficult technical problem: turning gas station price signs, whose design varies widely across companies and locations, into price data using computer vision. Our goal was a tool for crowdsourcing gas prices without the hassle of typing them in manually.

What it does

Our code currently takes an image of a gas station price sign from a mobile phone app and stores it in a Firestore database. A Google Cloud Function then pre-processes the image, sends it to the Google Vision API, and post-processes the result. Finally, the extracted data is returned to the front end for display.

How we built it

The front-end user interfaces were built with React Native templates and components and tested on mobile with Expo for cross-platform support. The back end uses the Google Cloud Vision API for image analysis, Google Firebase for data storage and transfer, and Google Cloud Functions for both pre- and post-processing.

Challenges we ran into

Engineering Challenge: Gas signs are extremely varied, and the Google Vision API performs poorly on them
- Intensity profiles can vary greatly per image
  - Sometimes the text and surrounding area are extremely bright, as with LED signs in dark lighting conditions
  - Other times the text is dark, and the sky or something unrelated is bright
- Varying colorations (red/green, white on blue, black on yellow)
- Angles (text is skewed)
- Occlusion (objects in foreground)
- Noise from background
Pre-processing considerations to improve OCR
- Localization (Recognizing where in the photo the sign is)
- Deskewing (Applying an affine transformation so the text reads horizontally)
- Binarization (Setting the pixels of interest to 1 and everything else to 0)
- Edge Detection
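To make the binarization idea concrete, here is a minimal sketch of Otsu thresholding in NumPy. It is not the project's actual code: the function names `otsu_threshold` and `binarize` are hypothetical, and it assumes an 8-bit grayscale image in which the text is brighter than the background (as with LED signs at night). For dark text on a bright background the mask would need to be inverted.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the 8-bit level that maximizes between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                 # pixel count at or below each level
    w1 = hist.sum() - w0                 # pixel count above each level
    sum0 = np.cumsum(hist * levels)      # intensity sum at or below each level
    sum_total = sum0[-1]
    valid = (w0 > 0) & (w1 > 0)          # both classes must be non-empty
    mu0 = sum0 / np.maximum(w0, 1)
    mu1 = (sum_total - sum0) / np.maximum(w1, 1)
    between = np.where(valid, w0 * w1 * (mu0 - mu1) ** 2, 0.0)
    return int(np.argmax(between))

def binarize(gray: np.ndarray, threshold=None) -> np.ndarray:
    """Set pixels brighter than the threshold to 1 and everything else to 0.

    Assumption: bright pixels are the pixels of interest.
    """
    if threshold is None:
        threshold = otsu_threshold(gray)
    return (gray > threshold).astype(np.uint8)
```

Passing an explicit `threshold` gives the simple constant-threshold variant; leaving it as `None` picks the level from the image histogram.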
Steps we took:
- Grayscaling: Human vision is most sensitive to green and least sensitive to blue. Gas station price boards also frequently have prices in red, green, and white, so it makes sense to use a commonly used colorimetric grayscaling that weights the green and red channels most.
- Tested: various image pre-processing techniques (Gaussian smoothing, bilateral filter)
- Binarization: Otsu thresholding or a simple constant threshold on the grayscale values worked on many images but notably struggled on images with unusually skewed luminance profiles. Applying noise-reducing pre-processing such as Gaussian smoothing or a bilateral filter (which removes noise while preserving edges for downstream OCR) improved results, but operating in the image domain was not robust enough. So we turned to the gradient domain instead, i.e. looking at edges.
- Edge detection: as a pre-processing step, this worked well with the Google Vision API. The Canny edge detection algorithm can be used to binarize the image by edges rather than by high-luminance pixels.
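The grayscaling and gradient-domain steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the grayscale weights are the Rec. 601 luma coefficients (the same weighting Pillow applies in `Image.convert("L")`), and the edge map is a plain thresholded gradient magnitude, which is a much simpler stand-in for Canny (no smoothing, non-maximum suppression, or hysteresis). The names `to_grayscale`, `edge_map`, and `rel_threshold` are hypothetical.

```python
import numpy as np

# Rec. 601 luma weights for R, G, B: green counts most, blue least.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB array to an H x W float luma image."""
    return rgb[..., :3].astype(np.float64) @ LUMA_WEIGHTS

def edge_map(gray: np.ndarray, rel_threshold: float = 0.25) -> np.ndarray:
    """Binarize by gradient magnitude instead of by brightness.

    Pixels whose gradient magnitude reaches rel_threshold times the
    strongest gradient in the image are set to 1, all others to 0.
    The 0.25 default is an arbitrary illustrative value.
    """
    gy, gx = np.gradient(gray.astype(np.float64))  # d/drow, d/dcolumn
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak == 0:                                  # flat image: no edges
        return np.zeros(gray.shape, dtype=np.uint8)
    return (magnitude >= rel_threshold * peak).astype(np.uint8)
```

Because the threshold is relative to the strongest edge in each image, the result depends on local contrast rather than absolute brightness, which is the property that made the gradient domain attractive for signs with very different luminance profiles.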
Post-processing:
- Extract and correct prices
- Match prices to regular/plus/special and diesel, cash/credit, etc.
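As an illustration of what the post-processing might look like, the sketch below parses OCR tokens into prices and pairs them with fuel grades. Everything here is an assumption made for the example: the token format, the rule that a grade label precedes its price in reading order, the single-digit dollar amount, and the names `GRADES`, `parse_price`, and `match_prices`. The "correction" shown is only re-inserting a decimal point that OCR dropped (for example `2599` read as 2.599).

```python
import re

# Hypothetical set of grade labels, taken from the grades listed above.
GRADES = {"regular", "plus", "special", "diesel"}

# One dollar digit, an optional separator, two cent digits, and an
# optional tenth-of-a-cent digit, e.g. "2.599", "2599", "$3.05".
PRICE_RE = re.compile(r"\$?(\d)[.,]?(\d{2})(\d)?")

def parse_price(token: str):
    """Return the price as a float, or None if the token is not a price."""
    match = PRICE_RE.fullmatch(token.strip())
    if match is None:
        return None
    dollars, cents, tenths = match.groups()
    return float(f"{dollars}.{cents}{tenths or ''}")

def match_prices(tokens):
    """Pair each grade label with the first price that follows it."""
    prices = {}
    pending = None                      # grade label still waiting for a price
    for token in tokens:
        word = token.strip().lower()
        if word in GRADES:
            pending = word
            continue
        price = parse_price(token)
        if price is not None and pending is not None:
            prices.setdefault(pending, price)
            pending = None
    return prices
```

For example, `match_prices(["Regular", "2599", "Plus", "2.899"])` returns `{"regular": 2.599, "plus": 2.899}`. Cash/credit matching is left out of this sketch.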

Accomplishments that we're proud of

As an inexperienced team, we managed to produce a final product that not only works but is something we are proud of. It handles very difficult cases, including in real-world testing.

What we learned

We came into this challenge with only a moderate amount of computer vision knowledge and very limited knowledge of Google Cloud Platform and React. As a group, we learned a lot about those technologies, as well as about the process of building such a product with tools like Git and GitHub.

What's next for Driving the Price

We hope to take this project to the next level by integrating a full back end that lets distributed users contribute to the platform, and to truly achieve its goal of crowdsourcing data using fully native apps on Android and iOS. We want to also expand this

Built With

firebase
google-cloud-function
google-cloud-vision
javascript
npm
numpy
pillow
python
react-native
scipy

Submitted to

HackGT 6 Into the Rabbit Hole

Created by

I worked on the interconnect, connecting the front-end and back-end. I truly enjoyed working on this project, and learned a lot as a non-CS CompE major.

Arvind Srinivasan
I worked on all of the front end and user interfaces, writing the code for them in React Native using the cross-platform tool Expo. As a non-CS major, I enjoyed learning React from the ground up and putting together a product that is clean and easy to use.

Varun Mosur
I worked on back-end image pre-processing with computer vision and on post-processing of the Google Cloud Vision API output.

Akhil Goel
Neil Thistlethwaite

Updates

Arvind Srinivasan started this project — Oct 27, 2019 06:29 AM EDT
