What it does
pose-coach-android-starter turns a phone into a two-sided photo coach. It gives the photographer step-by-step framing guidance (horizon, distance, angle) while the subject hears voice prompts (e.g., “tilt right shoulder 5°”, “lower chin slightly”) so both people move in sync. It targets a widely discussed pain point—many people struggle to take flattering photos of their partners—and reframes it as a collaborative, real-time coaching experience. Under the hood, the app runs on-device pose estimation to track 33 body landmarks in real time, and uses a low-latency voice interaction pipeline to deliver natural prompts without a custom backend.
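To make the landmark-to-prompt idea concrete, here is a minimal sketch of deriving one posture signal (shoulder tilt) from two of the 33 body landmarks. The `Landmark` type and normalized-coordinate convention are assumptions for illustration, not the app's actual data model.

```kotlin
import kotlin.math.atan2

// Hypothetical landmark type: normalized image coordinates, in the style of
// on-device pose estimators that emit 33 body landmarks per frame.
data class Landmark(val x: Float, val y: Float)

// Signed shoulder-tilt angle in degrees. In image space y grows downward, so
// a positive result means the right shoulder sits lower than the left.
fun shoulderTiltDegrees(leftShoulder: Landmark, rightShoulder: Landmark): Double {
    val dy = (rightShoulder.y - leftShoulder.y).toDouble()
    val dx = (rightShoulder.x - leftShoulder.x).toDouble()
    return Math.toDegrees(atan2(dy, dx))
}
```

Signals like this one feed the prompt generator, e.g. a 5° tilt becomes "tilt right shoulder 5°."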
How we built it
- On-device perception (edge-first): Camera frames → pose landmark extraction; we derive posture angles, symmetry, and center-of-mass hints locally for responsiveness and privacy.
- Scoring loop: Each frame gets a composite quality score:
$$ S = w_c\,S_{\text{composition}} + w_p\,S_{\text{pose}} + w_l\,S_{\text{lighting}} + w_s\,S_{\text{stability}} $$
We iterate on prompts while each adjustment still yields an improvement $\Delta S \ge \epsilon$, and stop prompting once gains fall below the threshold.
- Voice coaching: A streaming, bidirectional voice channel turns perception signals into bite-size, actionable instructions that users can interrupt or confirm mid-pose.
- Android integration: Kotlin + CameraX + Jetpack Compose. For the demo path, the client streams audio directly to the speech/LLM layer and receives synthesized voice prompts, avoiding server glue.
- Privacy by default: Landmarking, scoring, and immediate guidance run on device; any optional cloud reasoning uses compact feature summaries.
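The scoring loop above can be sketched as follows. The weights and the $\epsilon$ threshold are illustrative defaults, not the tuned production values.

```kotlin
// Per-frame sub-scores, each normalized to [0, 1].
data class FrameSignals(
    val composition: Double,
    val pose: Double,
    val lighting: Double,
    val stability: Double
)

// Composite quality score S from the formula above (assumed weight values).
fun compositeScore(
    s: FrameSignals,
    wC: Double = 0.35, wP: Double = 0.35, wL: Double = 0.2, wS: Double = 0.1
): Double = wC * s.composition + wP * s.pose + wL * s.lighting + wS * s.stability

// Keep issuing prompts while the latest adjustment still yields a meaningful
// improvement; stop once the gain drops below epsilon.
fun shouldKeepPrompting(previous: Double, current: Double, epsilon: Double = 0.02): Boolean =
    (current - previous) >= epsilon
```

Keeping $S$ a simple weighted sum makes it easy to re-tune per scene preset (portrait, backlight, night) without touching the prompt logic.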
Challenges we ran into
- End-to-end latency: Real-time posing feels awkward if feedback lags. We kept landmarking on device and streamed voice to minimize round trips.
- Actionable language: Vague tips (“be natural”) underperform. We converted advice into measurable cues (degrees, centimeters, gaze anchor points).
- Device variance: Differences in field of view, optical stabilization, and dynamic range across phones required per-device heuristics for horizon leveling and exposure hints.
- Multi-person scenes: We added rules to lock onto one active subject before issuing pose prompts.
- Context & tone: We leaned into humor for marketing, but kept inclusive, encouraging copy in-product.
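As an example of converting measurable signals into actionable language, here is a minimal sketch of a cue generator. The deadband value and the exact phrasing are assumptions; the real app's prompt vocabulary is richer.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Turn a measured shoulder-tilt angle into a bite-size voice cue, in the
// spirit of "tilt right shoulder 5 degrees". Returns null inside a small
// deadband so the coach stays quiet when the pose is already close enough.
fun shoulderCue(tiltDegrees: Double, deadbandDegrees: Double = 2.0): String? {
    if (abs(tiltDegrees) < deadbandDegrees) return null
    val amount = abs(tiltDegrees).roundToInt()
    return if (tiltDegrees > 0) "Raise your right shoulder about $amount degrees"
    else "Raise your left shoulder about $amount degrees"
}
```

Grounding each cue in a number (degrees, centimeters) is what lets the scoring loop verify whether the subject actually complied.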
Accomplishments that we're proud of
- A working dual-coach loop where both roles receive timely, coordinated guidance—before and during framing, not just post-shot filters.
- Conversational feel: users can interrupt, retry, or adjust mid-pose without losing flow.
- A clear, extensible scoring function $S$ that ties aesthetics to concrete micro-actions rather than abstract style notes.
- A serverless demo path on Android that reduces integration friction for hackathons and small teams.
What we learned
- Two-sided coaching > one-sided tips. Coaching both photographer and subject reduces frustration and shortens time to a “first good shot.”
- Latency is UX. Sub-second, bidirectional streaming is the difference between robotic and natural guidance.
- Metrics drive behavior. When prompts are grounded in measurable signals (angles, distances, stability), users improve faster.
- Tone matters. Humor attracts; kind coaching retains. Inclusive copy keeps the product welcoming to everyone.
What's next for pose-coach-android-starter
- AR “ghost-pose” overlays and haptic nudges for horizon leveling.
- Multi-subject choreography: group spacing, triangle compositions, collision avoidance for limbs in crowded frames.
- Scene-aware presets: backlight/night/portrait templates that adapt prompt sequences.
- Offline pack: smaller on-device prompt sets for airplane-mode coaching; sync richer sequences when online.
- Evaluation kit: an A/B harness with a rubric to quantify $\Delta S$, success-to-first-shot time, and user CSAT—so teams can iterate systematically.
Built With
- android-studio
- gemini
- geminiapi
- kotlin
- liveapi