What it does
pose-coach-android-starter turns a phone into a two-sided photo coach. It gives the photographer step-by-step framing guidance (horizon, distance, angle) while the subject hears voice prompts (e.g., “tilt right shoulder 5°”, “lower chin slightly”) so both people move in sync. It targets a widely discussed pain point—many people struggle to take flattering photos of their partners—and reframes it as a collaborative, real-time coaching experience. Under the hood, the app runs on-device pose estimation to track 33 body landmarks in real time, and uses a low-latency voice interaction pipeline to deliver natural prompts without a custom backend.
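To make the landmark-to-prompt idea concrete, here is a minimal sketch of deriving one posture signal (shoulder tilt) from two of the 33 body landmarks. The `Landmark` type and normalized-coordinate convention are assumptions for illustration, not the app's actual data model.

```kotlin
import kotlin.math.atan2

// Hypothetical landmark type: normalized image coordinates, in the style of
// on-device pose estimators that emit 33 body landmarks per frame.
data class Landmark(val x: Float, val y: Float)

// Signed shoulder-tilt angle in degrees. In image space y grows downward, so
// a positive result means the right shoulder sits lower than the left.
fun shoulderTiltDegrees(leftShoulder: Landmark, rightShoulder: Landmark): Double {
    val dy = (rightShoulder.y - leftShoulder.y).toDouble()
    val dx = (rightShoulder.x - leftShoulder.x).toDouble()
    return Math.toDegrees(atan2(dy, dx))
}
```

Signals like this one feed the prompt generator, e.g. a 5° tilt becomes "tilt right shoulder 5°."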
How we built it
- On-device perception (edge-first): Camera frames → pose landmark extraction; we derive posture angles, symmetry, and center-of-mass hints locally for responsiveness and privacy.
- Scoring loop: Each frame gets a composite quality score:
$$ S = w_c\,S_{\text{composition}} + w_p\,S_{\text{pose}} + w_l\,S_{\text{lighting}} + w_s\,S_{\text{stability}} $$
We iterate on prompts while each adjustment still yields an improvement $\Delta S \ge \epsilon$, and stop prompting once gains fall below the threshold.
- Voice coaching: A streaming, bidirectional voice channel turns perception signals into bite-size, actionable instructions that users can interrupt or confirm mid-pose.
- Android integration: Kotlin + CameraX + Jetpack Compose. For the demo path, the client streams audio directly to the speech/LLM layer and receives synthesized voice prompts, avoiding server glue.
- Privacy by default: Landmarking, scoring, and immediate guidance run on device; any optional cloud reasoning uses compact feature summaries.
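The scoring loop above can be sketched as follows. The weights and the $\epsilon$ threshold are illustrative defaults, not the tuned production values.

```kotlin
// Per-frame sub-scores, each normalized to [0, 1].
data class FrameSignals(
    val composition: Double,
    val pose: Double,
    val lighting: Double,
    val stability: Double
)

// Composite quality score S from the formula above (assumed weight values).
fun compositeScore(
    s: FrameSignals,
    wC: Double = 0.35, wP: Double = 0.35, wL: Double = 0.2, wS: Double = 0.1
): Double = wC * s.composition + wP * s.pose + wL * s.lighting + wS * s.stability

// Keep issuing prompts while the latest adjustment still yields a meaningful
// improvement; stop once the gain drops below epsilon.
fun shouldKeepPrompting(previous: Double, current: Double, epsilon: Double = 0.02): Boolean =
    (current - previous) >= epsilon
```

Keeping $S$ a simple weighted sum makes it easy to re-tune per scene preset (portrait, backlight, night) without touching the prompt logic.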
Challenges we ran into
- End-to-end latency: Real-time posing feels awkward if feedback lags. We kept landmarking on device and streamed voice to minimize round trips.
- Actionable language: Vague tips (“be natural”) underperform. We converted advice into measurable cues (degrees, centimeters, gaze anchor points).
- Device variance: Differences in field of view, optical stabilization, and dynamic range across phones required per-device heuristics for horizon leveling and exposure hints.
- Multi-person scenes: We added rules to lock onto one active subject before issuing pose prompts.
- Context & tone: We leaned into humor for marketing, but kept inclusive, encouraging copy in-product.
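As an example of converting measurable signals into actionable language, here is a minimal sketch of a cue generator. The deadband value and the exact phrasing are assumptions; the real app's prompt vocabulary is richer.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Turn a measured shoulder-tilt angle into a bite-size voice cue, in the
// spirit of "tilt right shoulder 5 degrees". Returns null inside a small
// deadband so the coach stays quiet when the pose is already close enough.
fun shoulderCue(tiltDegrees: Double, deadbandDegrees: Double = 2.0): String? {
    if (abs(tiltDegrees) < deadbandDegrees) return null
    val amount = abs(tiltDegrees).roundToInt()
    return if (tiltDegrees > 0) "Raise your right shoulder about $amount degrees"
    else "Raise your left shoulder about $amount degrees"
}
```

Grounding each cue in a number (degrees, centimeters) is what lets the scoring loop verify whether the subject actually complied.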
Accomplishments that we're proud of
- A working dual-coach loop where both roles receive timely, coordinated guidance—before and during framing, not just post-shot filters.
- Conversational feel: users can interrupt, retry, or adjust mid-pose without losing flow.
- A clear, extensible scoring function $S$ that ties aesthetics to concrete micro-actions rather than abstract style notes.
- A serverless demo path on Android that reduces integration friction for hackathons and small teams.
What we learned
- Two-sided coaching > one-sided tips. Coaching both photographer and subject reduces frustration and shortens time to a “first good shot.”
- Latency is UX. Sub-second, bidirectional streaming is the difference between robotic and natural guidance.
- Metrics drive behavior. When prompts are grounded in measurable signals (angles, distances, stability), users improve faster.
- Tone matters. Humor attracts; kind coaching retains. Inclusive copy keeps the product welcoming to everyone.
What's next for pose-coach-android-starter
- AR “ghost-pose” overlays and haptic nudges for horizon leveling.
- Multi-subject choreography: group spacing, triangle compositions, collision avoidance for limbs in crowded frames.
- Scene-aware presets: backlight/night/portrait templates that adapt prompt sequences.
- Offline pack: smaller on-device prompt sets for airplane-mode coaching; sync richer sequences when online.
- Evaluation kit: an A/B harness with a rubric to quantify $\Delta S$, success-to-first-shot time, and user CSAT—so teams can iterate systematically.
Built With
- android-studio
- gemini
- geminiapi
- kotlin
- liveapi