Inspiration
Urban drivers spend an average of 17 hours a year searching for parking, leading to increased emissions and city congestion. While "smart cities" typically attempt to solve this with expensive, static ground sensors, we realized the solution is already moving on the road: dashcams.
We were inspired to use the "eyes" of every vehicle to create a living, breathing map of the city’s curbside. Our goal was to turn every driver into a contributor to a smarter urban mesh, moving away from centralized infrastructure toward a crowdsourced, intelligent network.
How we built it
ParkMesh is built on a stack that prioritizes real-time multimodal reasoning over simple object detection:
- The Brain: We utilized Gemini 3.0 Flash for its native multimodal video understanding. Unlike traditional models, it provides superior spatial reasoning out of the box.
- The Intelligence: We developed a custom "Instructional Mesh" within Gemini's prompt, allowing the system to handle complex spatial geometry and regulatory compliance. For instance, the model calculates whether a detected curbside gap is actually large enough for a car to fit.
- The Interface: A premium Vue 3 dashboard integrated with the Google Maps API, featuring synchronized video-to-map playback and automatic GPS interpolation.
- Zero-Shot Compliance: We leveraged Gemini’s reasoning to identify complex rules (like temporary signs or curb colors) without training a single custom detection model.
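The GPS interpolation behind the synchronized video-to-map playback can be sketched as simple linear interpolation between sparse GPS fixes. This is an illustrative sketch, not the ParkMesh codebase: the `interpolate_fix` function name and the sample coordinates are our own assumptions.

```python
from bisect import bisect_left

def interpolate_fix(fixes, t):
    """Linearly interpolate a (lat, lon) position at video time t.

    fixes: list of (timestamp_seconds, lat, lon), sorted by timestamp.
    Linear interpolation is a fair approximation over the short gaps
    between consecutive dashcam GPS fixes.
    """
    times = [f[0] for f in fixes]
    i = bisect_left(times, t)
    if i == 0:                       # before the first fix: clamp
        return fixes[0][1], fixes[0][2]
    if i == len(times):              # after the last fix: clamp
        return fixes[-1][1], fixes[-1][2]
    (t0, lat0, lon0), (t1, lat1, lon1) = fixes[i - 1], fixes[i]
    w = (t - t0) / (t1 - t0)         # fractional position between fixes
    return lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)

# A 1 Hz GPS track; video frames arrive at ~30 fps, so most frames
# fall between fixes and need an interpolated position on the map.
track = [(0.0, 37.7749, -122.4194), (1.0, 37.7750, -122.4192)]
lat, lon = interpolate_fix(track, 0.5)
```

Clamping at the track edges keeps the map marker sane when the video starts or ends slightly outside the logged GPS window.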
Challenges we faced
The biggest hurdle was "Spatial Precision." Initially, the model could identify an empty space but couldn't reliably tell if a car would actually fit or if a hidden fire hydrant made it illegal.
We solved this by implementing a telemetry-overlay system. By feeding Gemini 3 context about the vehicle's movement and scale, we allowed the model to perform much more accurate spatial calculations, transforming a 2D video feed into a 3D-aware understanding of the street.
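A back-of-envelope version of that telemetry math: if the ego vehicle passes a curbside gap at a known speed, the gap's real-world length falls out of the pass duration. The helper names and the 4.5 m / 1.0 m defaults below are illustrative assumptions, not ParkMesh's actual implementation.

```python
def gap_length_m(speed_mps, t_enter, t_exit):
    """Estimate a curbside gap's length from ego-vehicle telemetry:
    the gap spans the distance travelled while it is alongside."""
    return speed_mps * (t_exit - t_enter)

def gap_fits(gap_m, car_length_m=4.5, clearance_m=1.0):
    """A gap is parkable if it covers the car plus maneuvering room.
    The 4.5 m car length and 1.0 m clearance are illustrative defaults."""
    return gap_m >= car_length_m + clearance_m

# Dashcam vehicle cruising at 8 m/s (~29 km/h); the empty stretch of
# curb stays alongside for 0.9 s of video, giving a ~7.2 m gap.
gap = gap_length_m(8.0, t_enter=10.0, t_exit=10.9)
ok = gap_fits(gap)
```

Feeding this kind of scale context into the prompt is what lets a 2D video feed support fit/no-fit judgments without depth hardware.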
What we learned
We learned that the future of Computer Vision is not just about "labels" (drawing boxes around cars), but about "reasoning" (understanding why a spot is available). Working with Gemini 3.0 Flash showed us that multimodal LLMs can handle 3D spatial tasks that previously required expensive LiDAR or complex depth-sensing hardware.
What's next
The "Mesh" in ParkMesh represents a hyper-connected, live urban grid. Our next milestone is direct integration with cloud-connected dashcams for live, low-latency inference, turning every connected vehicle into a live sensor that updates the city map in real-time.
Built With
- css3
- fastapi
- ffmpeg
- gemini-3.0-flash
- google-maps
- html5
- javascript
- python
- vite
- vue-3