Inspiration
What it does
How we built it
Challenges we ran into
Accomplishments that we're proud of
What we learned
What's next for SmartScanner
Parking signs in SF are crucial for compliance with parking regulations, which can be tedious to follow. The problem is made worse by the fact that these signs are often difficult to read and decipher, especially for tourists or recent immigrants. Another often overlooked aspect is that even where parking is permitted, visiting tourists often have no idea of the crime in the area and end up parking in hotspots that are infamous for car break-ins.
When the user scans the parking board, the app tells them whether, according to the regulations on the parking plate, they are allowed to park there. It then shows a summary of crimes committed in the last month at the location where they scanned the parking meter, to give them an idea of whether it would be safe to park there.
The app backend is served through Flask. It exposes an API that accepts the location and the parking meter image. It then runs the image through three Optical Character Recognition (OCR) models, collates their outputs, appends them to a custom prompt fine-tuned to get the most accurate response, and feeds the result to an LLM API to get an answer on whether the user can park, along with an explanation if they choose to see it. It also takes the location data and, by plugging into the SF crime data, brings the user a summary of crimes committed at the spot where they scanned the image. This is served to a React Native front end.
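As a rough illustration of the collation step, here is a minimal sketch of how readings from several OCR models could be merged into one prompt. The function name `build_parking_prompt`, the model labels, the `when` parameter, and the prompt wording are all hypothetical and chosen for this example; our actual prompt and model set are not reproduced here.

```python
from typing import Dict


def build_parking_prompt(ocr_outputs: Dict[str, str], when: str) -> str:
    """Collate OCR readings of one sign into a single LLM prompt.

    ocr_outputs maps a (hypothetical) model label to the raw text it read.
    when is a human-readable timestamp; passing it in is an assumption.
    """
    readings = []
    for model_name, text in ocr_outputs.items():
        # Collapse newlines and repeated spaces so readings are comparable.
        cleaned = " ".join(text.split())
        if cleaned:  # skip models that returned nothing usable
            readings.append(f"[{model_name}] {cleaned}")
    if not readings:
        raise ValueError("no OCR model returned any text")

    joined = "\n".join(readings)
    return (
        "The following are readings of the same parking sign from "
        f"{len(readings)} OCR models. They may disagree or contain errors.\n"
        f"{joined}\n"
        f"Current date and time: {when}\n"
        "Reconcile the readings, then answer YES or NO: may the user park "
        "here now? Follow with a short explanation."
    )
```

The returned string would then be sent to whichever LLM API is in use. Keeping each reading labelled by model lets the LLM weigh disagreements instead of seeing one merged blob of text.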
We played around with different OCR models but found they all fell short in some way, either in handling the mismatched font sizes often seen on parking boards or in accounting for the mistakes people might make while taking the picture. So we decided to feed the image to multiple models and tune our LLM prompt to come to a consensus based on these multiple responses. Getting the crime stats for a fixed radius around a definite point also had us thinking about how to calculate the coverage. On the front end, it was critical to get the permissions working.
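One common way to handle the fixed-radius question is to compute the great-circle (haversine) distance from the scanned location to each incident and keep those inside the radius. The sketch below assumes each incident is a dict with hypothetical `lat` and `lon` keys and uses an arbitrary 250 m default radius; it illustrates the technique and is not necessarily the calculation our backend uses.

```python
import math
from typing import Dict, Iterable, List

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def incidents_within(incidents: Iterable[Dict[str, float]],
                     lat: float, lon: float,
                     radius_m: float = 250.0) -> List[Dict[str, float]]:
    """Return the incidents whose location lies within radius_m of (lat, lon)."""
    return [
        inc for inc in incidents
        if haversine_m(lat, lon, inc["lat"], inc["lon"]) <= radius_m
    ]
```

For a city-scale dataset, a linear scan like this is usually fine; a bounding-box pre-filter or spatial index would only matter with far more records.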
We went through a slew of OCR models and LLM prompt-tuning cycles before arriving at a point where our API could interpret even multiple boards with varied instructions in different fonts.
We learnt that churning through and processing historical data is a tedious process, and that, as with parking boards, most legacy design choices for public utilities were made at a time when these technologies were not around, so they have varied effects. For example, the mismatched font sizes on boards were a challenge.