What it does

pose-coach-android-starter turns a phone into a two-sided photo coach. It gives the photographer step-by-step framing guidance (horizon, distance, angle) while the subject hears voice prompts (e.g., “tilt right shoulder 5°”, “lower chin slightly”) so both people move in sync. It targets a widely discussed pain point—many people struggle to take flattering photos of their partners—and reframes it as a collaborative, real-time coaching experience. Under the hood, the app runs on-device pose estimation to track 33 body landmarks in real time, and uses a low-latency voice interaction pipeline to deliver natural prompts without a custom backend.


How we built it

  • On-device perception (edge-first): Camera frames → pose landmark extraction; we derive posture angles, symmetry, and center-of-mass hints locally for responsiveness and privacy.
  • Scoring loop: Each frame gets a composite quality score:

$$ S \;=\; w_c\,S_{\text{composition}} \;+\; w_p\,S_{\text{pose}} \;+\; w_l\,S_{\text{lighting}} \;+\; w_s\,S_{\text{stability}} $$

We iteratively refine prompts until the resulting improvement satisfies $\Delta S \ge \epsilon$ (a minimal scoring sketch follows this list).

  • Voice coaching: A streaming, bidirectional voice channel turns perception signals into bite-size, actionable instructions that users can interrupt or confirm mid-pose (an interruptible-prompt sketch follows this list).
  • Android integration: Kotlin + CameraX + Jetpack Compose. For the demo path, the client streams audio directly to the speech/LLM layer and receives synthesized voice prompts, avoiding server glue (see the camera-wiring sketch after this list).
  • Privacy by default: Landmarking, scoring, and immediate guidance run on device; any optional cloud reasoning uses compact feature summaries.
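
To make the perception and scoring bullets concrete, here is a minimal Kotlin sketch of the per-frame loop. The `Landmark` and `SubScores` types, the example weights, and the $\epsilon$ value are illustrative assumptions, not the app's tuned values:

```kotlin
import kotlin.math.atan2

// Hypothetical normalized landmark (x/y in [0, 1], image coordinates).
data class Landmark(val x: Float, val y: Float)

// Per-frame sub-scores, each assumed to be normalized to [0, 1].
data class SubScores(
    val composition: Float,
    val pose: Float,
    val lighting: Float,
    val stability: Float,
)

// Example posture angle: shoulder tilt in degrees (0 = level shoulders).
fun shoulderTiltDeg(leftShoulder: Landmark, rightShoulder: Landmark): Float {
    val dy = (rightShoulder.y - leftShoulder.y).toDouble()
    val dx = (rightShoulder.x - leftShoulder.x).toDouble()
    return Math.toDegrees(atan2(dy, dx)).toFloat()
}

// Composite score S = w_c*S_composition + w_p*S_pose + w_l*S_lighting + w_s*S_stability.
fun compositeScore(
    s: SubScores,
    wC: Float = 0.3f, wP: Float = 0.4f, wL: Float = 0.2f, wS: Float = 0.1f,
): Float = wC * s.composition + wP * s.pose + wL * s.lighting + wS * s.stability

// Refinement gate from the write-up: an adjustment counts as successful once
// the improvement over the previous frame satisfies deltaS >= epsilon.
fun promptSatisfied(previousScore: Float, currentScore: Float, epsilon: Float = 0.02f): Boolean =
    currentScore - previousScore >= epsilon
```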
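
For the voice-coaching bullet, one way to keep prompts interruptible is to cancel a prompt that is still playing whenever a newer cue supersedes it. This sketch assumes a hypothetical `VoiceChannel` abstraction over the streaming speech layer:

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.collectLatest

// Hypothetical voice hook: speak() suspends until playback finishes
// (or until the surrounding coroutine is cancelled).
interface VoiceChannel {
    suspend fun speak(text: String)
}

// Turn a stream of cues into spoken prompts. collectLatest cancels an
// in-flight prompt whenever a newer cue arrives, which is what makes the
// coaching feel interruptible mid-pose.
suspend fun coach(cues: Flow<String>, voice: VoiceChannel) {
    cues.collectLatest { cue -> voice.speak(cue) }
}
```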
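
And for the Android integration bullet, a sketch of how the CameraX analysis use case could hand frames to the on-device landmarker. `PoseFrameListener` is a hypothetical hook; the keep-only-latest backpressure strategy is what keeps guidance responsive:

```kotlin
import java.util.concurrent.Executors
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy

// Hypothetical hook that runs landmark extraction + scoring on a frame.
fun interface PoseFrameListener {
    fun onFrame(image: ImageProxy)
}

// CameraX analysis use case: analyze only the latest frame so feedback never
// queues up behind stale frames.
fun buildPoseAnalysis(listener: PoseFrameListener): ImageAnalysis {
    val analysis = ImageAnalysis.Builder()
        .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
        .build()
    val executor = Executors.newSingleThreadExecutor()
    analysis.setAnalyzer(executor) { imageProxy ->
        try {
            listener.onFrame(imageProxy)
        } finally {
            imageProxy.close() // must close so CameraX can deliver the next frame
        }
    }
    return analysis
}
```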

Challenges we ran into

  1. End-to-end latency: Real-time posing feels awkward if feedback lags. We kept landmarking on device and streamed voice to minimize round trips.
  2. Actionable language: Vague tips (“be natural”) underperform. We converted advice into measurable cues (degrees, centimeters, gaze anchor points); see the sketch after this list.
  3. Device variance: Differences in field of view, stabilization (OIS), and dynamic range across phones required per-device heuristics for horizon leveling and exposure hints.
  4. Multi-person scenes: We added rules to lock onto one active subject before issuing pose prompts (also sketched after this list).
  5. Context & tone: We leaned into humor for marketing, but kept inclusive, encouraging copy in-product.

Accomplishments that we're proud of

  • A working dual-coach loop where both roles receive timely, coordinated guidance—before and during framing, not just post-shot filters.
  • Conversational feel: users can interrupt, retry, or adjust mid-pose without losing flow.
  • A clear, extensible scoring function $S$ that ties aesthetics to concrete micro-actions rather than abstract style notes.
  • A serverless demo path on Android that reduces integration friction for hackathons and small teams.

What we learned

  • Two-sided coaching > one-sided tips. Coaching both photographer and subject reduces frustration and shortens time to a “first good shot.”
  • Latency is UX. Sub-second, bidirectional streaming is the difference between robotic and natural guidance.
  • Metrics drive behavior. When prompts are grounded in measurable signals (angles, distances, stability), users improve faster.
  • Tone matters. Humor attracts; kind coaching retains. Inclusive copy keeps the product welcoming to everyone.

What's next for pose-coach-android-starter

  • AR “ghost-pose” overlays and haptic nudges for horizon leveling.
  • Multi-subject choreography: group spacing, triangle compositions, collision avoidance for limbs in crowded frames.
  • Scene-aware presets: backlight/night/portrait templates that adapt prompt sequences.
  • Offline pack: smaller on-device prompt sets for airplane-mode coaching; sync richer sequences when online.
  • Evaluation kit: an A/B harness with a rubric to quantify $\Delta S$, success-to-first-shot time, and user CSAT—so teams can iterate systematically.
