Inspiration

We built this project after noticing a consistent safety and usability gap in everyday navigation: when you're driving alone, changing a route often requires typing and tapping through multiple screens. Even a small change, such as adding a coffee stop, fixing a destination, or inserting a waypoint, pulls attention away from the road. Google Maps and Apple Maps are powerful, but route editing is still largely text-driven, which creates friction and risk in real driving scenarios.

The problem became concrete for us in an everyday situation. Imagine you're driving from your home in New Jersey to Manhattan. The fastest default route often sends you through a nearby tunnel. But that tunnel is expensive, and you know a cheaper option exists if you cross the George Washington Bridge instead. In theory, the fix is simple: add the George Washington Bridge as a stop. In practice, it isn't.

When you add the George Washington Bridge in Google Maps, the bridge is treated as a normal stop pinned to a single latitude and longitude. Because that pin often lands on the side of the bridge carrying traffic in the opposite direction, the generated route becomes nonsensical: the car is instructed to go up and down the bridge repeatedly, backtrack, or cross it in the wrong direction. The result is a route that is:

  • longer than before
  • more confusing
  • more expensive
  • and completely unusable while driving

Fixing this requires staring at the screen, dragging pins, and manually correcting the route, which is exactly the kind of interaction that should not happen behind the wheel. This experience highlighted two core problems that existing navigation apps do not solve:

  • Drivers need to add and adjust routes hands-free, using voice.
  • Not all "locations" should be treated as stops. Some, like bridges and tunnels, should act as pass-through constraints that shape the route, not destinations that force rerouting.

Our goal was to build a navigation system that understands these realities: one where you can simply say what you want ("Take the George Washington Bridge instead, not the tunnel.") and get a clean, accurate, and cost-aware route, without typing, pin-dragging, or fighting the map.

In short: no typing. Just speak, verify, and drive.

This motivation directly shaped our design decisions, from voice-first route editing, to structured intent parsing, to the introduction of via/pass-through waypoints that produce stable, real-world routes where existing apps fail.

What it does

Voice-text Navigation Web Application is a voice-driven web app that turns natural speech into validated multi-stop routes.

With a single voice command, users can:

  • Start navigation by speaking an origin and a destination
  • Add, remove, or reorder stops before, between, or after existing stops
  • Use different location styles naturally:
    • Full addresses (e.g., “40 Wickley Ave, Piscataway, NJ”)
    • Landmarks / POIs (e.g., “Times Square”)
    • Partial/ambiguous inputs (e.g., “Main Street Boston”)
    • Relative requests (e.g., “the nearest coffee shop”)
  • Get a clean multi-stop route shown on an interactive map
  • Request coffee stops by voice and receive ranked recommendations
  • Send the finalized route by email as a one-tap link that opens directly in Google Maps

How we built it

Voice → Structured Routing (Gemini 3)

Instead of using Gemini as a chatbot, we used it as a routing intent engine. After speech-to-text, we send the transcript to Gemini 3 and ask it to output structured JSON for each stop:

  • type: full_address | landmark | partial | relative
  • parsed components (street, city, state, etc.) when available
  • searchQuery: a map-optimized query string
  • confidence: a score in \([0, 1]\)
  • via: whether an intermediate stop is a pass-through waypoint (bridges/tunnels)
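A minimal sketch of validating the model's structured output before it reaches geocoding, assuming the model returns a JSON array of stop objects that use the field names listed above, with parsed components under a hypothetical `components` key. `ParsedStop` and `parse_stops` are illustrative names, not the project's actual code.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List

# The four stop types the parser is asked to emit.
STOP_TYPES = {"full_address", "landmark", "partial", "relative"}


@dataclass
class ParsedStop:
    type: str
    searchQuery: str
    confidence: float
    via: bool = False
    # Hypothetical key for parsed parts (street, city, state, ...) when available.
    components: Dict[str, str] = field(default_factory=dict)


def parse_stops(raw: str) -> List[ParsedStop]:
    """Parse the model's JSON array of stops and reject malformed entries."""
    stops = []
    for item in json.loads(raw):
        stop_type = item["type"]  # a missing key raises KeyError
        if stop_type not in STOP_TYPES:
            raise ValueError(f"unknown stop type: {stop_type!r}")
        confidence = float(item["confidence"])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        stops.append(ParsedStop(
            type=stop_type,
            searchQuery=item["searchQuery"],
            confidence=confidence,
            via=bool(item.get("via", False)),
            components=item.get("components", {}),
        ))
    return stops
```

Rejecting malformed output at this boundary means every later stage can rely on a known type and an in-range confidence.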

This structured output is the core of the system: it enables downstream validation and reliable routing, not just “LLM text.”

Address Validation + Geocoding + Places (Conditional Pipeline)

We implemented a pattern-aware resolution layer rather than one fixed API call sequence:

  • Full address path: Address Validation → Geocoding cross-check → confirm if ambiguous
  • POI path: Places → Geocoding cross-check → confirm if mismatched
  • Partial/relative path: Places + Geocoding with location bias → confirm if multiple candidates
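One way to express this pattern-aware layer is a table from stop type to an ordered list of stages, with the real API calls injected as callables. This is a sketch: the stage names are hypothetical labels, and `relative` stops are assumed to share the partial path, as the bullets above group them.

```python
from typing import Callable, Dict, List

# A stage takes the query and the candidates so far, and returns refined candidates.
Stage = Callable[[str, List[dict]], List[dict]]

# Ordered stages per stop type (hypothetical labels mirroring the paths above).
RESOLUTION_PATHS: Dict[str, List[str]] = {
    "full_address": ["address_validation", "geocoding_cross_check"],
    "landmark": ["places", "geocoding_cross_check"],
    "partial": ["places_biased", "geocoding_biased"],
    "relative": ["places_biased", "geocoding_biased"],
}


def run_path(stop_type: str, query: str, stages: Dict[str, Stage]) -> List[dict]:
    """Run the stages for this stop type in order, threading candidates through."""
    candidates: List[dict] = []
    for name in RESOLUTION_PATHS[stop_type]:
        candidates = stages[name](query, candidates)
    return candidates
```

The confirmation decision is kept out of the stages here and would run on the returned candidates, which keeps each path a plain data pipeline.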

We also preserve metadata (type/confidence) through the pipeline so the UI can explain “why” a confirmation is needed.

Multi-stop Routing and “Via” Waypoints

Multi-stop routing required strict waypoint formatting. After fixing an early bug in how we formatted the waypoint payload, we moved to the Google Routes API, which gave us cleaner multi-stop routes and finer control over individual waypoints.

For tricky intermediate locations like bridges and tunnels, we introduced pass-through (‘via’) waypoints rather than hard stops to avoid messy backtracking.
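A minimal sketch of a Routes API `computeRoutes` request body with a pass-through intermediate, assuming the public Routes API v2 field names (`intermediates`, `via`, `location.latLng`, `location.heading`, `travelMode`). The helper names are hypothetical, and the body is only built here, not sent.

```python
from typing import List, Optional


def via_waypoint(lat: float, lng: float, heading: Optional[int] = None) -> dict:
    """A pass-through waypoint: the route goes through it without stopping."""
    location: dict = {"latLng": {"latitude": lat, "longitude": lng}}
    if heading is not None:
        # Compass direction of travel at this point, in degrees (0-360).
        location["heading"] = heading
    return {"via": True, "location": location}


def compute_routes_body(origin: str, destination: str, intermediates: List[dict]) -> dict:
    """Request body for computeRoutes, using address strings for the endpoints."""
    return {
        "origin": {"address": origin},
        "destination": {"address": destination},
        "intermediates": intermediates,
        "travelMode": "DRIVE",
    }
```

The body would be POSTed to `https://routes.googleapis.com/directions/v2:computeRoutes` with `X-Goog-Api-Key` and `X-Goog-FieldMask` headers. Marking the bridge as `via` tells the router to pass through it rather than treat it as a stop it must pull into.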

Coffee-stop Recommendation System

For “coffee” requests, we built a search + ranking pipeline that:

  • detects brand intent (e.g., “Starbucks”) and category intent (“coffee shop”)
  • supports nearby and route-aware search
  • filters for open-now (when available)
  • ranks by a practical objective: minimize detour cost first, then rating/reviews as tie-breakers
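A sketch of the ranking objective, assuming each candidate already carries a precomputed detour cost (the hypothetical `detour_minutes`, meaning extra driving time versus the current route) plus Places-style rating, review count, and open-now fields. The brand filter is a naive case-insensitive substring match that falls back to the full list when nothing matches; both choices are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    detour_minutes: float              # assumed detour metric
    rating: Optional[float] = None     # may be missing
    review_count: int = 0
    open_now: Optional[bool] = None    # None when opening hours are unavailable


def rank_coffee(candidates: List[Candidate], brand: Optional[str] = None) -> List[Candidate]:
    pool = candidates
    if brand:
        branded = [c for c in pool if brand.lower() in c.name.lower()]
        pool = branded or candidates  # fall back to category results
    # Drop places known to be closed; keep ones whose hours are unknown.
    pool = [c for c in pool if c.open_now is not False]
    # Minimize detour first; higher rating, then more reviews, break ties.
    return sorted(pool, key=lambda c: (c.detour_minutes, -(c.rating or 0.0), -c.review_count))
```

Putting detour cost first in the sort key means a highly rated shop ten minutes off-route never outranks an adequate one on the way.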

UI and Workflow

On the frontend, we built:

  • an interactive map view with route visualization
  • a stop-editing workflow with confirmation steps
  • confidence/type indicators to make decisions transparent

User Customization and History

To enable personalized routing, we implemented:

  • Route History: automatically saves completed routes with timestamps so they can be revisited quickly
  • Favorite Locations: save frequently visited addresses (home, work) for faster voice reference
  • Quick Access: displays recent routes and saved favorites for one-tap reloading
  • Email to Phone: one-button send to the user's registered email, opening the route directly in the Google Maps mobile app
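The one-tap link can be built with the public Google Maps URLs format (`https://www.google.com/maps/dir/?api=1` with `origin`, `destination`, pipe-separated `waypoints`, and `travelmode`). This sketch assumes stops are passed as plain-text queries; `google_maps_link` is a hypothetical helper, and sending the email is out of scope.

```python
from typing import List
from urllib.parse import quote, urlencode


def google_maps_link(origin: str, destination: str, waypoints: List[str]) -> str:
    """Build a cross-platform Google Maps directions URL."""
    params = {"api": "1", "origin": origin, "destination": destination, "travelmode": "driving"}
    if waypoints:
        params["waypoints"] = "|".join(waypoints)  # '|' is percent-encoded as %7C
    # quote (not quote_plus) encodes spaces as %20.
    return "https://www.google.com/maps/dir/?" + urlencode(params, quote_via=quote)
```

On a phone with the Google Maps app installed, opening this URL typically launches the app directly with the route loaded.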

Challenges we ran into

1) Relative references aren’t real addresses

Requests like “the nearest Starbucks” cannot be routed directly. The system must:
1) detect a relative reference
2) search candidates (Places)
3) select/confirm the correct one
4) only then generate a route
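These steps map onto a small function, sketched here with the Places search and the confirmation UI injected as hypothetical callables (`search`, `choose`). Step 1 is assumed to happen upstream, when the parser labels the stop `relative`, and step 4 runs only on the place this returns.

```python
from typing import Callable, List, Optional


def resolve_relative(
    query: str,
    search: Callable[[str], List[str]],
    choose: Callable[[List[str]], Optional[str]],
) -> Optional[str]:
    """Turn a relative request into one concrete place before any routing."""
    candidates = search(query)      # step 2: search candidates
    if not candidates:
        return None                 # nothing to route to
    if len(candidates) == 1:
        return candidates[0]        # unambiguous: no confirmation needed
    return choose(candidates)       # step 3: user selects/confirms one
```

Returning `None` instead of guessing keeps an unresolved relative reference from silently becoming a wrong stop.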

This pushed us to treat voice navigation as both a routing problem and a retrieval/ranking problem.

2) Confidence vs. confirmation UX

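We found that model confidence alone isn’t enough. Even with \(c \ge 0.90\), geocoding can return multiple plausible matches or results far from the location bias. We refined the confirmation logic to incorporate these ambiguity signals:

\[ \text{NeedConfirm} = \mathbb{1}\left( c < \tau \;\lor\; n_{\text{candidates}} > 1 \;\lor\; d_{\text{bias}} > \delta \right) \]

where \(c\) is the model confidence, \(\tau\) the confidence threshold, \(n_{\text{candidates}}\) the number of plausible geocoding matches, \(d_{\text{bias}}\) the distance between the resolved location and the location-bias point, and \(\delta\) the maximum acceptable distance.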
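The NeedConfirm rule translates directly into a predicate. The thresholds `tau` and `delta` are left as required parameters because their values are not stated; the distance unit is whatever the caller uses consistently for `d_bias` and `delta`.

```python
def need_confirm(c: float, n_candidates: int, d_bias: float,
                 tau: float, delta: float) -> bool:
    """Ask the user to confirm when any single ambiguity signal fires."""
    low_confidence = c < tau
    ambiguous = n_candidates > 1
    far_from_bias = d_bias > delta
    return low_confidence or ambiguous or far_from_bias
```

Because the signals are OR-ed, a high model confidence cannot mask a geocoder that returned two matches.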

3) Bridges/tunnels causing route backtracking

Geocoding a bridge to a single lat/lng can land on the wrong side of the traffic flow. We solved this by supporting via/pass-through waypoints, and improving stability by adding heading-aware hints based on origin-to-destination bearing.
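The heading-aware hint can be sketched with the standard initial-bearing (forward azimuth) formula on a sphere. Which formula the project uses is not stated, so this is an assumption, and the origin-to-destination bearing is only a coarse proxy for the direction of travel at the bridge. The rounded value fits the integer `heading` field (0-360) of a Routes API `Location`.

```python
import math
from typing import Tuple


def initial_bearing(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_lambda = math.radians(lng2 - lng1)
    x = math.sin(d_lambda) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(d_lambda)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def heading_hint(origin: Tuple[float, float], destination: Tuple[float, float]) -> int:
    """Round the origin-to-destination bearing to an integer heading (360 wraps to 0)."""
    return round(initial_bearing(*origin, *destination)) % 360
```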

4) “Won’t reroute while editing stop.”

Live route editing introduced state-sync complexity between UI stops, cached routes, and server re-routing. We improved re-validation and rerouting logic so user-edited stops can be re-geocoded and confirmed without breaking the route flow.
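A minimal sketch of the invalidation rule, assuming a client-side state object (the hypothetical `RouteState`) that holds the stops and the last computed route. Editing a stop discards that stop's resolved coordinates and the cached route, so the stop must be re-geocoded and confirmed before rerouting. Real sync between UI, cache, and server involves more than this.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Stop:
    query: str
    latlng: Optional[Tuple[float, float]] = None  # None until geocoded and confirmed


@dataclass
class RouteState:
    stops: List[Stop] = field(default_factory=list)
    cached_route: Optional[dict] = None

    def edit_stop(self, index: int, new_query: str) -> None:
        # Replace the stop with an unresolved one and drop the stale route.
        self.stops[index] = Stop(new_query)
        self.cached_route = None

    def needs_reroute(self) -> bool:
        return self.cached_route is None

    def unresolved(self) -> List[int]:
        """Indices of stops that still need geocoding/confirmation."""
        return [i for i, s in enumerate(self.stops) if s.latlng is None]
```

Only the edited stop is reset, so untouched stops keep their confirmed coordinates and are not re-geocoded.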


Accomplishments that we're proud of

  • Built a working voice → structured stops → validated route pipeline where Gemini 3 is a core component, not an add-on.
  • Designed an address-type framework that handles full addresses, POIs, partial inputs, and relative references.
  • Implemented real-world routing fixes for bridges/tunnels using via waypoints (and heading-aware routing).
  • Delivered a route-aware coffee recommendation system with ranking + open-now filtering + fallbacks.
  • Created a smooth end-to-end UX: speak → verify → route appears → email deep link to Google Maps.

What we learned

  • Voice navigation reliability depends on structured output and verification, not conversational text.
  • A single confidence score is not enough—routing needs combined signals from:
    • model confidence,
    • geocoding ambiguity,
    • validation verdicts,
    • distance-from-bias checks.
  • Many “locations” in natural speech (bridges, tunnels, highways) should be treated as constraints, not stops.
  • For real driving scenarios, recommendations must prioritize what drivers care about most: minimal detour cost.

What's next for Voice-text Navigation Web Application

  • Full support for “nearest Starbucks”-style requests, with a standardized candidate-selection UI every time.
  • More robust multilingual voice input (language auto-detection + localized Places bias).
  • Automatically acquire the current location when the origin is not specified.
  • Replace unrestricted Google API keys with proper restrictions and secure deployment.
  • Improve route-accuracy policies: dynamically tune confirmation thresholds based on route length and density.
  • Performance tuning (lower latency) and better caching for repeated routes within a session.