Inspiration
When we think about a “golden age,” we think about growth, accessibility, and tools that empower more people to create and express themselves. One creative outlet we love is photography, but getting the perfect shot is hard. You're either fumbling with a tripod, asking a stranger to take your photo, or settling for an awkward selfie. Professional photographers know exactly where to stand, how to angle the camera, and when the lighting is just right. We thought: what if an AI could do that for you, autonomously, using a robot?
What it does
Golden Hour is an AI-powered robotic photographer. You set it down, tap one button, and it does the rest. A ground robot with a camera gimbal drives around, adjusts its height and tilt, and frames you using real composition principles: golden ratio, rule of thirds, leading lines, lighting quality. It sees through your phone's camera, thinks with Gemini's multimodal Live API, and physically moves to find the best angle. When it's happy with the shot, it talks to you: "Looking amazing! Ready for the shot?" Say yes, and it snaps. Say no, and it keeps working.
How we built it
The system has four layers. The phone app (React Native / Expo) captures the camera feed and audio, streams them to the Gemini Live API over WebSocket, and plays back the AI's voice responses. The Gemini 2.5 Flash model runs in a persistent bidirectional session, receiving ~1 fps video frames and live audio, then making tool calls to control the robot. A Node.js bridge server running on a laptop translates WebSocket commands from the phone into USB serial for the Arduino. The Arduino drives DC motors (forward, backward, rotate), a stepper motor (camera height), and a servo (camera tilt), sending back a "done" acknowledgment after each movement so the AI knows when to evaluate the next frame.
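A minimal sketch of the bridge layer, assuming the `ws` and `serialport` npm packages; the serial path, baud rate, and one-line-per-command protocol here are illustrative assumptions, not our exact wiring:

```typescript
// bridge.ts — relay phone/AI movement commands to the Arduino over USB serial.
// Sketch only: the serial path, baud rate, and message format are assumptions.
import { WebSocketServer, WebSocket } from "ws";
import { SerialPort } from "serialport";
import { ReadlineParser } from "@serialport/parser-readline";

const arduino = new SerialPort({ path: "/dev/tty.usbmodem1101", baudRate: 115200 });
const parser = arduino.pipe(new ReadlineParser({ delimiter: "\n" }));

const wss = new WebSocketServer({ port: 8765 });
let phone: WebSocket | null = null; // single client: the phone app

wss.on("connection", (ws) => {
  phone = ws;
  // Phone sends JSON like {"cmd":"rotate_right","ms":400}; forward one line
  // the Arduino sketch can read with Serial.readStringUntil('\n').
  ws.on("message", (raw) => {
    const { cmd, ms } = JSON.parse(raw.toString());
    arduino.write(`${cmd} ${ms}\n`);
  });
});

// The Arduino prints "done" once a movement finishes; relay it upstream so
// the AI's pending tool call can resolve and evaluate the next frame.
parser.on("data", (line: string) => {
  if (line.trim() === "done") phone?.send(JSON.stringify({ status: "done" }));
});
```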
Challenges we ran into
Getting the AI to actually iterate instead of declaring victory after one adjustment was tough. Early prompts led to the model calling stop_analysis way too soon. We had to be very explicit in the system prompt that audio is muted during analysis and that it should behave like a perfectionist.
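For a sense of what "very explicit" looked like, here is a condensed, paraphrased version of the instruction block that finally got the behavior we wanted (illustrative, not our verbatim prompt):

```typescript
// Paraphrased excerpt of the system instruction — illustrative, not verbatim.
const SYSTEM_INSTRUCTION = `
You are a perfectionist robot photographer.
During analysis the user's audio is muted; do not wait for a reply.
After each movement you receive a fresh frame. Judge it against the
rule of thirds, golden ratio, leading lines, and lighting, then keep
adjusting. Never call stop_analysis after a single adjustment; only
call it when the composition is genuinely strong.
`;
```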
The send-and-receive synchronization between the phone, server, and Arduino was tricky. We needed the tool call to block until the robot physically finished moving, which meant tracking a pending promise that only resolves when the Arduino sends "done" back over serial. Without this, the AI would re-analyze before the motors stopped, seeing a blurred or half-moved frame.
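In TypeScript terms, the pattern looks roughly like this (building on the bridge sketch above; `sendToArduino` and the handler signature are stand-ins for our actual code):

```typescript
// One movement in flight at a time: the tool call's promise parks here
// until the Arduino's "done" line arrives over serial.
declare function sendToArduino(line: string): void;                          // stub for the serial write
declare const parser: { on(ev: "data", cb: (line: string) => void): void };  // serial line reader

let pendingDone: (() => void) | null = null;

function moveRobot(cmd: string, ms: number): Promise<void> {
  return new Promise((resolve) => {
    pendingDone = resolve;            // remember how to unblock the tool call
    sendToArduino(`${cmd} ${ms}\n`);  // e.g. "rotate_right 400"
  });
}

parser.on("data", (line) => {
  if (line.trim() === "done" && pendingDone) {
    pendingDone();                    // motors have stopped: safe to re-analyze
    pendingDone = null;
  }
});

// The tool handler awaits the physical move before replying, so the model
// never gets a tool response while the frame is still motion-blurred.
async function handleMoveTool(args: { cmd: string; ms: number }) {
  await moveRobot(args.cmd, args.ms);
  return { result: "movement complete" };
}
```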
Mapping the AI's abstract understanding of composition ("subject is too far left") into concrete motor commands (rotate right for 400ms) required a lot of prompt tuning. The AI doesn't know the robot's speed, so it has to learn from visual feedback whether its adjustment was too big, too small, or in the wrong direction.
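One thing that helped was baking the physics into the tool schema itself. A hypothetical declaration (our real one differs in detail):

```typescript
// Hypothetical tool declaration — the key ideas are the bounded duration and
// the reminder that movement is open-loop, corrected only by the next frame.
const moveRobotTool = {
  name: "move_robot",
  description:
    "Adjust framing. Movement is open-loop: choose a short duration, " +
    "look at the next frame, and correct. Do not repeat a move that failed.",
  parameters: {
    type: "object",
    properties: {
      action: {
        type: "string",
        enum: ["forward", "backward", "rotate_left", "rotate_right",
               "raise_camera", "lower_camera", "tilt_up", "tilt_down"],
      },
      duration_ms: {
        type: "integer",
        description: "Motor run time in milliseconds, roughly 100-1000.",
      },
    },
    required: ["action", "duration_ms"],
  },
} as const;
```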
Accomplishments that we're proud of
The closed feedback loop actually works! The AI sees a frame, makes a physical adjustment, sees the result, and corrects course. It genuinely iterates toward better composition rather than making one move and giving up. The live rule-of-thirds grid that lights up green when the subject hits a guideline is a small touch, but it makes the AI's reasoning visible and tangible: you can watch it "think" in real time. And the whole pipeline (phone camera → WebSocket → Gemini Live API → tool calls → WebSocket → serial → motors → "done" → back to AI) runs in real time with voice interaction; that's a lot of moving parts, and they all came together.
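The grid highlight itself is just a proximity check; a sketch, assuming a subject center normalized to [0, 1] from whatever detector is upstream (the tolerance is a made-up value of the kind we tuned by eye):

```typescript
// Light a gridline green when the subject's normalized center lands near a
// rule-of-thirds line. THIRDS and TOLERANCE are illustrative values.
const THIRDS = [1 / 3, 2 / 3];
const TOLERANCE = 0.04;

function onThirdsLine(cx: number, cy: number): { x: boolean; y: boolean } {
  const near = (v: number) => THIRDS.some((t) => Math.abs(v - t) < TOLERANCE);
  return { x: near(cx), y: near(cy) }; // which axes should glow
}

// Subject centered at (0.35, 0.62): close to the left vertical third,
// slightly too far from the lower horizontal third.
console.log(onThirdsLine(0.35, 0.62)); // { x: true, y: false }
```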
What we learned
Prompt engineering for agentic tool use is very different from chat. The model needs to understand physical constraints (motors take time, small adjustments compound, don't repeat failed moves). Treating the AI as a photographer with a specific workflow was more effective than giving it open-ended instructions.
What's next for Golden Hour
Adding obstacle detection so the robot doesn't drive into things. Supporting multiple subjects (group photos) with smarter framing logic. Implementing a "golden hour detector" that uses time-of-day and light angle analysis to suggest when and where to shoot for the best natural lighting, living up to the name. And eventually, swapping the Arduino for a more capable platform so it can navigate outdoor terrain autonomously.