Inspiration
We all know someone who drives terribly and won't admit it. The problem with bad driving is that there's no feedback loop. You brake too hard, take a turn too fast, creep over the speed limit for 10 minutes straight, and nothing tells you. Driving instructors exist for learners, but once you have your license you're on your own. We wanted to build something that sits in your car and actually watches how you drive, tells you when you mess up in real time, and gives you an honest breakdown at the end. No expensive hardware, no OBD dongles, just your phone.
What it does
PilotPal turns your phone into an AI driving coach. You mount it under your rearview mirror and it runs continuous analysis on your driving using the camera and onboard sensors.
While you drive, it tracks braking smoothness using the accelerometer and gyroscope, monitors your speed against posted limits using GPS and vision-based sign detection, measures how smooth your turns are through lateral g-force analysis, and estimates following distance using visual analysis. If you do something wrong, it tells you out loud immediately. Drive 12 over the limit and it says "slow down." Brake too hard and it flags it.
When you park, the app generates a full post-drive summary. It scores you across braking, turn smoothness, speed compliance, lane discipline, and overall safety. It pulls up specific video clips from the drive showing exactly what you did wrong, with explanations attached. Over time it tracks your trends so you can see yourself improving.
How we built it
The app is built with Flutter. We capture the live camera feed and stream frames to Gemini's multimodal API for vision analysis. Gemini handles everything visual: detecting cars, pedestrians, traffic lights, speed limit signs, lane positioning, following distance, and overall road scene understanding. Instead of running multiple specialized ML models on-device, we let one powerful multimodal model do all the heavy lifting.
Sensor data comes from the phone's gyroscope and accelerometer at 100Hz, fused with GPS speed through a Kalman filter. Braking events are detected via accelerometer z-axis spikes above a threshold, and turn smoothness is scored by computing jerk (rate of change of lateral acceleration) through each turn. The sensor pipeline runs entirely on-device and feeds into the same analysis stream as the vision data.
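To make the braking and turn-smoothness metrics concrete, here is a minimal Python sketch. The app itself is built with Flutter, so this is not the project's code. It assumes already-filtered acceleration in m/s^2, with braking showing up as negative values on the axis aligned with the direction of travel. The 3.5 m/s^2 threshold, the RMS aggregation of jerk, and the function names are all illustrative assumptions.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 100.0        # sensor rate mentioned in the write-up
HARD_BRAKE_THRESHOLD = 3.5    # m/s^2, assumed value for illustration


def detect_hard_brakes(longitudinal_accel: Sequence[float],
                       threshold: float = HARD_BRAKE_THRESHOLD) -> List[int]:
    """Return the sample index where each hard-brake event begins.

    A sample counts as hard braking when its deceleration exceeds the
    threshold. Consecutive braking samples are merged into one event.
    """
    events = []
    in_event = False
    for i, accel in enumerate(longitudinal_accel):
        braking = -accel > threshold
        if braking and not in_event:
            events.append(i)
        in_event = braking
    return events


def rms_jerk(lateral_accel: Sequence[float],
             sample_rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Root-mean-square jerk (m/s^3) over one turn.

    Jerk is approximated by the finite difference of consecutive
    lateral acceleration samples. Lower values mean a smoother turn.
    """
    if len(lateral_accel) < 2:
        return 0.0
    dt = 1.0 / sample_rate_hz
    jerks = [(b - a) / dt for a, b in zip(lateral_accel, lateral_accel[1:])]
    return (sum(j * j for j in jerks) / len(jerks)) ** 0.5
```

Merging consecutive over-threshold samples into a single event matters at 100Hz, because one real braking maneuver spans many samples and should only be flagged once.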
We built a rolling video buffer that continuously records and discards old footage. When an event is flagged (hard brake, speed violation, red light), the surrounding clip gets saved. Post-drive summaries are generated by sending a structured JSON payload of all events and scores to Gemini, which returns a conversational breakdown with specific feedback.
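The rolling buffer idea can be sketched with a fixed-size ring of recent frames. The Python below is illustrative only (the app is Flutter, and real video would be stored as encoded chunks rather than individual frames). The class name, the 5-second windows before and after an event, and the in-memory `saved` list are assumptions.

```python
from collections import deque


class RollingClipBuffer:
    """Keeps the last few seconds of frames and saves clips around events."""

    def __init__(self, fps: int = 30, pre_seconds: int = 5, post_seconds: int = 5):
        # deque with maxlen silently drops the oldest frame once it is full
        self.recent = deque(maxlen=fps * pre_seconds)
        self.post_frames = fps * post_seconds
        self.pending = []   # clips still collecting post-event frames
        self.saved = []     # finished clips as (label, frames)

    def add_frame(self, frame) -> None:
        still_pending = []
        for clip in self.pending:
            clip["frames"].append(frame)
            clip["remaining"] -= 1
            if clip["remaining"] <= 0:
                self.saved.append((clip["label"], clip["frames"]))
            else:
                still_pending.append(clip)
        self.pending = still_pending
        self.recent.append(frame)

    def flag_event(self, label: str) -> None:
        """Start a clip seeded with the footage leading up to the event."""
        frames = list(self.recent)
        if self.post_frames <= 0:
            self.saved.append((label, frames))
            return
        self.pending.append(
            {"label": label, "frames": frames, "remaining": self.post_frames}
        )
```

Because the pre-event window is copied at the moment of flagging, footage that would otherwise be discarded survives only when something worth reviewing happened.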
Real-time audio warnings use the native TTS engine with a priority queue and cooldown system so it doesn't spam the driver.
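One way such a priority queue with cooldowns might work is sketched below in Python. It is not the app's implementation: the per-warning-type cooldown, the 8-second default, the "lower number means more urgent" convention, and the class name are assumptions. The caller passes in the current time and hands the returned text to whatever TTS engine it uses.

```python
import heapq
import itertools
from typing import Optional


class WarningQueue:
    """Priority queue of spoken warnings with a per-kind cooldown."""

    def __init__(self, cooldown_s: float = 8.0):
        self.cooldown_s = cooldown_s
        self._heap = []
        self._order = itertools.count()   # tie-breaker keeps FIFO order
        self._last_spoken = {}            # warning kind -> last spoken time

    def _cooling_down(self, kind: str, now: float) -> bool:
        last = self._last_spoken.get(kind)
        return last is not None and now - last < self.cooldown_s

    def push(self, kind: str, text: str, priority: int, now: float) -> bool:
        """Queue a warning. Returns False if this kind was spoken too recently."""
        if self._cooling_down(kind, now):
            return False
        heapq.heappush(self._heap, (priority, next(self._order), kind, text))
        return True

    def pop(self, now: float) -> Optional[str]:
        """Return the most urgent warning to speak, or None if nothing is due."""
        while self._heap:
            _, _, kind, text = heapq.heappop(self._heap)
            if self._cooling_down(kind, now):
                continue  # a duplicate queued before the first was spoken
            self._last_spoken[kind] = now
            return text
        return None
```

Checking the cooldown both on `push` and on `pop` means a burst of identical warnings collapses into a single spoken message.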
Challenges we ran into
Latency was the biggest issue. Streaming frames to Gemini for analysis means you're dependent on network round-trip time. We had to figure out the right balance of frame sampling rate and prompt structure to keep responses fast enough for real-time feedback. Sending every frame was way too slow and expensive, so we cut back to sampling key frames at intervals short enough to still catch the events that matter.
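The simplest form of interval-based sampling is a time gate in front of the upload step, sketched here in Python. The one-second interval and the class name are assumptions, and the write-up does not state the interval that was actually used.

```python
from typing import Optional


class FrameSampler:
    """Lets a frame through only if enough time has passed since the last one."""

    def __init__(self, interval_s: float = 1.0):
        self.interval_s = interval_s
        self._last_sent: Optional[float] = None

    def should_send(self, timestamp_s: float) -> bool:
        due = (
            self._last_sent is None
            or timestamp_s - self._last_sent >= self.interval_s
        )
        if due:
            self._last_sent = timestamp_s
        return due
```

With a 30fps camera and a one-second interval, this sends roughly one frame in thirty, which is the kind of trade between cost and coverage described above.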
Sensor fusion was harder than expected. Raw accelerometer data is incredibly noisy, especially on bumpy roads. Without the Kalman filter, every pothole registered as a hard brake event. Tuning the thresholds to distinguish between "bad road" and "bad driving" took a lot of trial and error.
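For readers unfamiliar with the technique, here is a deliberately small one-dimensional Kalman filter in Python that fuses accelerometer readings with GPS speed. It is a sketch under simplifying assumptions, not the filter used in the app: the state is just speed, the noise variances are made-up values, and the gyroscope is ignored entirely.

```python
class SpeedKalman:
    """Scalar Kalman filter: accelerometer drives prediction, GPS corrects it."""

    def __init__(self, process_var: float = 0.5, gps_var: float = 1.0):
        self.speed = 0.0          # estimated speed, m/s
        self.var = 1.0            # uncertainty of the estimate
        self.process_var = process_var  # assumed accelerometer noise growth per second
        self.gps_var = gps_var          # assumed GPS speed measurement variance

    def predict(self, accel: float, dt: float) -> None:
        """Advance the estimate using one accelerometer sample."""
        self.speed += accel * dt
        self.var += self.process_var * dt

    def update(self, gps_speed: float) -> float:
        """Blend in a GPS speed reading, weighted by relative uncertainty."""
        gain = self.var / (self.var + self.gps_var)
        self.speed += gain * (gps_speed - self.speed)
        self.var *= (1.0 - gain)
        return self.speed
```

A single pothole spike only moves the estimate by `accel * dt`, and the next GPS reading pulls it back, which is why filtering like this helps separate road noise from real braking.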
The rolling video buffer caused memory pressure issues. Recording continuously at 30fps while streaming frames for inference eats RAM fast. We had to implement aggressive chunk rotation and drop resolution on saved clips to keep the app from getting killed by the OS.
Getting Gemini to return structured, consistent analysis across frames was another challenge. The model is powerful, but you have to be very deliberate with your prompting to get reliable JSON output with consistent scoring rather than freeform descriptions. We iterated on the system prompt a lot to get repeatable, actionable responses.
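Prompting is only half of getting reliable structured output. The other half is refusing anything that doesn't match the expected shape. Below is a Python sketch of that validation step. The field names (`lane_score`, `hazards`, `speed_limit_mph`) and the 0 to 100 score range are hypothetical, chosen for illustration, and are not the schema the app uses.

```python
import json
from typing import Optional

FENCE = "`" * 3  # models sometimes wrap JSON in a markdown code fence


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_frame_analysis(raw: str) -> Optional[dict]:
    """Return the parsed analysis, or None if the response is unusable."""
    text = raw.strip()
    if text.startswith(FENCE):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    # Hypothetical schema: a 0-100 lane score, a list of hazards,
    # and an optional detected speed limit.
    score = data.get("lane_score")
    if not _is_number(score) or not 0 <= score <= 100:
        return None
    if not isinstance(data.get("hazards"), list):
        return None
    limit = data.get("speed_limit_mph")
    if limit is not None and not _is_number(limit):
        return None
    return data
```

Dropping a malformed response and waiting for the next sampled frame is usually safer than trying to salvage partial output in a real-time loop.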
Accomplishments that we're proud of
The real-time audio feedback system feels natural. It warns you without being annoying, which is a harder UX problem than it sounds. Getting the cooldown timing and priority logic right so it doesn't nag you every 2 seconds made a huge difference.
Using Gemini as our vision backbone meant we didn't have to train or fine-tune any models. We got object detection, sign reading, scene understanding, and depth reasoning all from one API. That let us move fast and focus on the product experience instead of the ML pipeline.
The post-drive summary is genuinely useful. It doesn't just dump numbers at you. It tells you "your braking was rough on highway exits, specifically at 14:32 and 14:47" and shows you the clips. That level of specificity makes it actionable.
What we learned
Prompting a multimodal model for real-time structured output is its own skill. You can't just say "analyze this driving frame." You need to tell it exactly what to look for, what format to return, and how to score things consistently across hundreds of frames. Small changes in the prompt caused big swings in output quality.
We also learned that sensor data is messy. Textbook algorithms for IMU-based motion detection assume clean signals, and real phone sensors in a moving car on real roads are anything but clean. Filtering and thresholding is more art than science.
On the product side, we learned that the feedback timing matters more than the feedback content. A perfectly accurate warning delivered 3 seconds too late is useless. A slightly less precise warning delivered instantly is valuable. That tension between API latency and feedback urgency shaped a lot of our architecture decisions.
What's next for PilotPal
The first next step is a calibration flow so the app adapts to different phone mounting positions and angles. We want to add night driving support by tuning our prompts and frame preprocessing for low-light conditions. Historical trend tracking across drives is partially built and still needs to be finished. We're also looking at adding a social/competitive element where drivers can compare scores and challenge friends to drive better. Longer term, we want to explore insurance integrations, since safe driving data has real value for usage-based insurance programs.
