We wanted to tackle a difficult technical problem: turning gas station price signs, whose displays vary widely across companies and locations, into structured price data using computer vision. Our goal was a tool for crowdsourcing gas prices without the friction of manually typing prices into an app.

What it does

Our app currently takes an image of a gas station price sign from a mobile phone and stores it in a Firestore database. A Google Cloud Function then pre-processes the image, runs it through the Google Vision API, and post-processes the results within the same function. Finally, the extracted data is returned for display by the front-end.

How we built it

Front-end user interfaces were built using React Native templates and components, and tested on mobile with Expo for cross-platform work. Back-end functionality was built using the Google Cloud Vision API for image analysis, Google Firebase for data storage and transfer, and Google Cloud Functions for both pre- and post-processing.

Challenges we ran into

  • Engineering challenge: gas signs are extremely varied, and the Google Vision API performs poorly on raw photos
    • Intensity profiles can vary greatly per image
      • Sometimes the text and surrounding area are extremely bright, as with LED signs in dark lighting conditions
      • Other times the text is dark, and the sky or something unrelated is bright
    • Varying colorations (red/green, white on blue, black on yellow)
    • Angles (text is skewed)
    • Occlusion (objects in foreground)
    • Noise from background
  • Pre-processing considerations to improve OCR
    • Localization (Recognizing where in the photo the sign is)
    • Deskewing (Applying an affine transformation to make the text read horizontal)
    • Binarization (setting the interest pixels to 1, and everything else to 0)
    • Edge Detection
  • Steps we took:
    • Grayscaling: Human vision is most sensitive to green and least sensitive to blue. Gas station price boards also frequently have prices in red, green, and white, so it makes sense to use a commonly used colorimetric grayscaling that weights the green and red channels most.
    • Tested: various image pre-processing techniques (Gaussian smoothing, bilateral filtering)
    • Binarization: Otsu thresholding or a simple constant threshold on the grayscaled values worked on many images, but notably struggled on images with unusually skewed luminance profiles. Applying noise-reduction pre-processing such as Gaussian smoothing or a bilateral filter (which removes noise while preserving the edges needed by downstream OCR) improved results, but operating in the image domain was proving not robust enough. So we instead turned to the gradient domain - i.e. looking at edges.
    • Edge detection as a pre-processing step proved to work well with the Google Vision API. The Canny edge detection algorithm can be used to binarize the image on edges rather than on high-luminance pixels.
  • Post-processing:
    • Extract & Correct prices
    • Match prices to regular/plus/special & diesel, cash/credit etc.
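The grayscaling and binarization steps above can be sketched in pure NumPy. This is a minimal illustration, not our production code: the BT.601 weights shown are the standard colorimetric choice that favors green and red, and the Otsu implementation is a textbook version.

```python
import numpy as np

# BT.601 colorimetric weights (R, G, B): green weighted most, blue least,
# which suits red/green/white price digits.
GRAY_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Weighted grayscale of an (H, W, 3) uint8 RGB image."""
    return (rgb.astype(np.float64) @ GRAY_WEIGHTS).astype(np.uint8)

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the threshold that maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # class-0 probability up to t
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to t
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

def binarize(gray: np.ndarray) -> np.ndarray:
    """Interest pixels -> 1, everything else -> 0."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)
```

On a cleanly bimodal sign photo this separates digits from background well; the skewed-luminance failures described above are exactly the cases where a single global threshold like this breaks down.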
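The gradient-domain idea can be illustrated with a simplified stand-in for Canny: binarize on Sobel gradient magnitude instead of luminance. (The real pipeline uses full Canny edge detection; this sketch, with an assumed threshold value, only shows why edges are more robust than raw brightness.)

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 3x3 cross-correlation (sign flips vs. true
    convolution cancel out in the gradient magnitude)."""
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), 1)
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def edge_binarize(gray: np.ndarray, thresh: float = 100.0) -> np.ndarray:
    """Binarize on gradient magnitude instead of pixel brightness."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    mag = np.hypot(gx, gy)
    return (mag > thresh).astype(np.uint8)
```

Because digit strokes produce strong gradients under both bright-on-dark and dark-on-bright lighting, this representation is far less sensitive to the skewed luminance profiles that defeated global thresholding.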
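The post-processing steps (extract prices, match them to grades) might look roughly like the following. The grade keywords, the regex, and the trailing 9/10-cent fix-up are illustrative assumptions, not our exact implementation.

```python
import re

# Hypothetical grade vocabulary; real signs vary by brand.
GRADE_WORDS = {
    "regular": "regular", "unleaded": "regular",
    "plus": "plus",
    "premium": "premium", "special": "premium",
    "diesel": "diesel",
}

# A dollar digit, optional separator, two cent digits, optional "9/10".
PRICE_RE = re.compile(r"(\d)[.,]?(\d{2})\s*(?:9/10|9)?")

def parse_prices(ocr_lines):
    """Pair each OCR'd price with the most recent grade label seen."""
    prices = {}
    current_grade = None
    for line in ocr_lines:
        lower = line.lower()
        for word, grade in GRADE_WORDS.items():
            if word in lower:
                current_grade = grade
        m = PRICE_RE.search(line)
        if m and current_grade:
            dollars, cents = m.groups()
            # US gas prices conventionally end in 9/10 of a cent.
            prices[current_grade] = float(f"{dollars}.{cents}9")
            current_grade = None
    return prices
```

For example, `parse_prices(["REGULAR", "3.49 9/10", "DIESEL 4.15"])` pairs the regular price with the label on the previous line and the diesel price with the label on its own line. A real version would also need cash/credit columns and OCR error correction (e.g. a dropped decimal point).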

Accomplishments that we're proud of

We, as an inexperienced team, managed to produce a final product that not only works, but that we are proud of. It handles very difficult cases, including in real-world testing.

What we learned

We came into this challenge as a group with only a moderate amount of computer vision knowledge and very limited experience with Google Cloud Platform and React. We learned a great deal about those technologies, as well as about the process of building such a product using tools like Git and GitHub.

What's next for Driving the Price

We hope to take this project to the next level: integrating a full back end so that distributed users can contribute to the platform, truly achieving its goal of crowdsourcing data, and shipping fully native apps on Android and iOS. We also want to expand this
