About Posey

Inspiration

For many people, the hardest part of fitness is not the workout itself; it's the moment before.

The moment filled with doubt. The crowded gym full of mirrors and confident people. The fear of doing something wrong while everyone watches. The voice in your head asking: "Do I even belong here?"

We built Posey for that moment.

The reality is stark:

  • 50% of new gym members quit within 6 months, often citing intimidation and lack of guidance
  • Personal trainers cost $50–150 per hour, which puts them out of reach for most people
  • Fitness apps give you videos to follow, but never tell you if you're doing it right
  • Introverts, beginners, and people with body image concerns are underserved by an industry designed for the already-confident

When the Gemini Live API launched with real-time vision and voice capabilities, we saw an opportunity to solve a deeply human problem: What if everyone could have a patient, encouraging coach who watches their form, speaks to them in real time, and never judges?

Not a louder coach. A kinder start.

What it does

Posey is an AI personal trainer that turns your iPhone into a judgment-free coaching experience. No gym required. No crowds. No mirrors reflecting your insecurities back at you.

It Understands You First

Before Posey coaches you, it listens. Through an onboarding experience, Posey learns:

  • "How active are you right now?"
  • "Do you have any pain or discomfort?"
  • "How much time do you have each day?"
  • "Do you prefer working out at home or at the gym?"
  • "What's your goal — confidence, strength, posture, or simply getting started?"

That conversation becomes the foundation for everything Posey does next.

It Creates a Plan That Fits Your Life

Based on your onboarding, Posey generates a personalized 3-week workout plan - not a random workout, not a one-size-fits-all video, but a plan built around your body, your schedule, and your confidence level.

For a beginner with limited time and posture concerns, Posey might create:

  • A short home workout (10–15 minutes)
  • A posture-focused warm-up
  • Simple bodyweight exercises
  • Core stability work
  • A gentle cooldown

No commute. No crowded gym. No pressure to keep up with anyone else.

It Coaches You While You Move

Here's where Posey becomes different from every other fitness app:

Posey watches you exercise and coaches you in real time.

Using Apple Vision, Posey detects 19 body keypoints at 30fps and calculates joint angles for biomechanical analysis. This data streams to Gemini Live API alongside camera frames, and Posey responds with spoken coaching:

"Lift your chest." "Keep your back straight." "Lower a little more — great depth!" "Nice correction. That's it."

The user no longer has to wonder:

  • "Am I doing this wrong?"
  • "Do I look weird?"
  • "Should I stop?"

Instead of confusion, they get guidance. Instead of embarrassment, they get support.

It Encourages, Not Intimidates

Posey motivates with warmth, not pressure:

"You're doing great." "One more rep - you've got this." "Progress, not perfection." "That was a good try - let's make the next one even better."

Because for many people, the hardest part is starting when you don't feel confident yet. Showing up while still feeling insecure. Trying before you feel ready.

Posey is designed for exactly that moment.

How we built it

Architecture Overview

The architecture is documented in the code repository.

The Intelligence Behind Real-Time Form Analysis

We extract biomechanically meaningful joint angles using vector mathematics:

$$\theta = \arccos\left(\frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \cdot |\vec{b}|}\right)$$

where $\vec{a}$ and $\vec{b}$ are the vectors along the two body segments that meet at a joint (for the knee, the thigh and the shin). We calculate:

| Joint | Why It Matters |
|---|---|
| Knee angle | Squat depth, lunge safety |
| Elbow angle | Push-up and curl range |
| Hip angle | Deadlift hinge, core engagement |
| Shoulder angle | Overhead press alignment |
| Torso angle | Forward lean detection |

This data streams to Gemini alongside video frames every 4 seconds - enabling coaching that references specific biomechanical measurements rather than guessing from pixels alone.
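
As a rough illustration of the joint-angle formula above, here is a minimal sketch in Python (the app itself is written in Swift, so this is not Posey's actual code). The function name `joint_angle` and the normalized 2D coordinates are assumptions made for the example.

```python
import math

def joint_angle(a, joint, b):
    """Angle in degrees at `joint` between the segments joint->a and joint->b."""
    v1 = (a[0] - joint[0], a[1] - joint[1])
    v2 = (b[0] - joint[0], b[1] - joint[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return None  # degenerate case: two keypoints coincide
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Hypothetical normalized image coordinates for one leg.
hip, knee, ankle = (0.50, 0.40), (0.50, 0.60), (0.70, 0.60)
knee_angle = joint_angle(hip, knee, ankle)  # thigh and shin are perpendicular here
```

Clamping the cosine matters in practice: for nearly straight limbs, rounding can push the ratio slightly past 1 or -1, and `acos` would then raise an error.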

Multimodal Payload Example

{
  "image": "<JPEG frame>",
  "pose": {
    "joints": ["nose", "leftShoulder", "rightKnee", ...],
    "angles": {
      "leftKnee": 87.3,
      "rightElbow": 142.1,
      "torsoInclination": 12.4
    }
  }
}
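
A payload of this shape could be assembled roughly as follows. This is a sketch in Python, not the app's Swift code. The helper name `build_payload` is hypothetical, and base64 encoding of the frame is an assumption, since the example above only shows `<JPEG frame>`.

```python
import base64
import json

def build_payload(jpeg_bytes, joints, angles):
    """Assemble one coaching message in the shape shown above."""
    return {
        # Assumption: the frame travels as base64 text.
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
        "pose": {
            "joints": list(joints),
            # Round to one decimal place, matching the example values.
            "angles": {name: round(deg, 1) for name, deg in angles.items()},
        },
    }

payload = build_payload(
    b"\xff\xd8\xff\xe0 placeholder bytes, not a real image",
    ["nose", "leftShoulder", "rightKnee"],
    {"leftKnee": 87.34, "rightElbow": 142.06, "torsoInclination": 12.4},
)
message = json.dumps(payload)  # serialized form, ready to send
```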

Gemini responds with streaming PCM audio (16-bit, 24kHz mono) that plays through AVAudioEngine with sub-200ms latency.

Tech Stack

| Layer | Technology |
|---|---|
| Mobile App | Swift 6, SwiftUI |
| Pose Detection | Apple Vision Framework |
| AI Coaching | Gemini Live API via Firebase AI Logic SDK |
| Training Plans | Gemini 3 Flash Lite via Google ADK |
| Backend | Python, FastAPI, Google Cloud Run |
| Database | Firebase Firestore |
| Auth | Firebase Auth + Google Sign-In |
| Audio | AVAudioEngine (low-latency PCM playback) |

Challenges we ran into

1. Making AI Coaching Feel Human, Not Robotic

Early versions of Posey sounded like a drill sergeant. Users didn't need harsh corrections - they needed encouragement. We spent significant time crafting system prompts that balance accuracy with warmth:

Before: "Your form is incorrect. Fix your knee position."

After: "Try pushing your knees out a bit more - that's it, nice adjustment!"

The difference is subtle but crucial for users who already feel insecure.

2. Real-Time Audio Without Stuttering

Gemini Live returns audio in chunks. Naive playback caused stuttering and awkward silences. We implemented a buffered audio queue with AVAudioPlayerNode that schedules chunks ahead of playback, achieving smooth, continuous speech.
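
The buffering idea can be modeled independently of AVAudioPlayerNode. The sketch below, in Python, uses a simple prebuffer: playback starts only once enough audio is queued, and pauses to refill after an underrun. It is an illustration of the general technique, not the app's Swift implementation. The class name `PrebufferedQueue` and the threshold values are hypothetical, and the sample format is the 16-bit, 24kHz mono PCM that Gemini returns.

```python
from collections import deque

SAMPLE_RATE = 24_000   # Hz
BYTES_PER_SAMPLE = 2   # 16-bit mono PCM

def chunk_ms(chunk: bytes) -> float:
    """Playback duration of one PCM chunk in milliseconds."""
    return len(chunk) / (SAMPLE_RATE * BYTES_PER_SAMPLE) * 1000

class PrebufferedQueue:
    """Holds chunks back until enough audio is queued to absorb network jitter."""

    def __init__(self, prebuffer_ms: float = 120.0):  # hypothetical default
        self.prebuffer_ms = prebuffer_ms
        self.chunks = deque()
        self.buffered_ms = 0.0
        self.playing = False

    def push(self, chunk: bytes) -> None:
        self.chunks.append(chunk)
        self.buffered_ms += chunk_ms(chunk)
        if self.buffered_ms >= self.prebuffer_ms:
            self.playing = True

    def pop(self):
        """Next chunk to hand to the player, or None while buffering."""
        if not self.playing:
            return None
        if not self.chunks:
            self.playing = False  # underrun: refill before resuming
            return None
        chunk = self.chunks.popleft()
        self.buffered_ms -= chunk_ms(chunk)
        return chunk

queue = PrebufferedQueue(prebuffer_ms=80)
queue.push(bytes(2400))          # 1200 samples, about 50 ms: still buffering
first_attempt = queue.pop()      # None, threshold not reached yet
queue.push(bytes(2400))          # about 100 ms queued: playback may start
second_attempt = queue.pop()     # returns the first chunk
```

A real player would also flush whatever remains when the model signals the end of a turn, so that a short final chunk is not left waiting for the threshold.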

3. Balancing Responsiveness vs. Cost

Sending 30fps video to Gemini would be prohibitively expensive. We settled on 4-second intervals - fast enough for meaningful form correction, practical enough for real-world use. The pose skeleton overlay runs at full 30fps locally for immediate visual feedback.
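
To make the trade-off concrete: at 30fps with a 4-second interval, only 1 frame in 120 leaves the device. A minimal throttle along those lines might look like this in Python (a sketch, not the app's Swift code; the class name `FrameThrottle` is hypothetical).

```python
class FrameThrottle:
    """Passes at most one frame per `interval_s` seconds upstream."""

    def __init__(self, interval_s: float = 4.0):
        self.interval_s = interval_s
        self.last_sent = None

    def should_send(self, timestamp_s: float) -> bool:
        if self.last_sent is None or timestamp_s - self.last_sent >= self.interval_s:
            self.last_sent = timestamp_s
            return True
        return False

FPS = 30
throttle = FrameThrottle(interval_s=4.0)
# Simulate 12 seconds of camera frames. Every frame would still feed the
# local skeleton overlay; only the selected ones go to the model.
sent = [i for i in range(12 * FPS) if throttle.should_send(i / FPS)]
```

In this simulation `sent` contains frames 0, 120, and 240, which is 3 uploads out of 360 captured frames.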

4. Swift 6 Strict Concurrency

Camera frames, pose detection, audio playback, and network streaming all happen on different threads. Swift 6's strict concurrency flagged dozens of potential data races. We restructured using @MainActor, careful DispatchQueue isolation, and @unchecked Sendable where thread-safety was manually guaranteed.

5. Pose Detection in Real Workout Conditions

Apple Vision struggles with partial body visibility, fast movements, and low lighting. We added confidence thresholds and graceful degradation - Gemini coaches based on visible joints rather than failing entirely.
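
One way to express that degradation is to report an angle only when all three of its keypoints clear a confidence threshold. The Python sketch below illustrates the idea. The threshold value, the function name `measurable_angles`, and joint names such as `leftHip` and `leftAnkle` are assumptions that follow the camelCase naming in the payload example, not confirmed details of the app.

```python
MIN_CONFIDENCE = 0.3  # hypothetical threshold

# Each angle needs three keypoints: (end, vertex, end).
ANGLE_TRIPLETS = {
    "leftKnee": ("leftHip", "leftKnee", "leftAnkle"),
    "leftElbow": ("leftShoulder", "leftElbow", "leftWrist"),
}

def measurable_angles(keypoints, min_confidence=MIN_CONFIDENCE):
    """Names of the angles whose three keypoints were all detected confidently.

    `keypoints` maps a joint name to (x, y, confidence).
    """
    visible = {name for name, (_, _, conf) in keypoints.items() if conf >= min_confidence}
    return [angle for angle, joints in ANGLE_TRIPLETS.items() if visible.issuperset(joints)]

# The ankle is barely visible, so the knee angle is dropped
# while the elbow angle is still reported.
frame = {
    "leftShoulder": (0.42, 0.30, 0.95),
    "leftElbow": (0.40, 0.45, 0.91),
    "leftWrist": (0.41, 0.58, 0.88),
    "leftHip": (0.45, 0.60, 0.80),
    "leftKnee": (0.46, 0.78, 0.55),
    "leftAnkle": (0.47, 0.93, 0.12),
}
angles_to_send = measurable_angles(frame)
```

Here `angles_to_send` is `["leftElbow"]`, so coaching can continue for the upper body instead of stopping altogether.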

Accomplishments that we're proud of

Technical Achievements

  • True real-time multimodal AI - The AI genuinely watches and responds to form within seconds
  • Sub-200ms audio latency - Coaching feels immediate, like a human trainer
  • Zero exercise configuration - The AI automatically identifies what movement you're doing
  • Production deployment - One-command Cloud Run deployment with proper auth and security

Human Impact

  • Democratized personal training - What cost $100+/hour is now accessible to everyone
  • Designed for the underserved - Built specifically for introverts, beginners, and people with gym anxiety
  • Judgment-free zone - No mirrors, no crowds, no comparison to others
  • Encouragement over criticism - A coaching style that builds confidence, not insecurity

What we learned

On Building with Multimodal AI

  1. Gemini Live API enables genuinely new experiences - Bidirectional streaming with interruption handling creates interactions impossible with request/response APIs

  2. Context is everything - Sending pose angles alongside images dramatically improves form analysis. The AI can say "your left knee is at 105°, try to get below 90°" instead of vaguely guessing

  3. Firebase AI Logic SDK is powerful - Managed Gemini access with automatic auth eliminated client-side API key management entirely

On Building for Humans

  1. Voice-first is essential for fitness - Users can't look at screens mid-workout; audio is the only viable output modality

  2. Tone matters more than accuracy - A technically correct but harsh correction discourages users; a warm suggestion with the same content keeps them going

  3. The barrier isn't physical, it's emotional - People don't avoid fitness because it's hard. They avoid it because it feels intimidating. Technology that addresses the emotional barrier is more impactful than technology that addresses the physical one.

What's next for Posey

Immediate Roadmap

  • Rep counting - Automatic repetition detection using pose angle patterns
  • Workout history - Track progress and celebrate milestones
  • Apple Watch integration - Heart rate data for fatigue detection and recovery guidance
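
Rep counting from angle patterns is not built yet, but one plausible approach is a two-threshold (hysteresis) detector on a single joint angle. The Python sketch below counts squat reps from a stream of knee angles. The function name `count_reps` and both thresholds are hypothetical values chosen for illustration.

```python
def count_reps(knee_angles, down_below=100.0, up_above=160.0):
    """Count squat reps as down-then-up transitions, using two thresholds."""
    reps = 0
    is_down = False
    for angle in knee_angles:
        if not is_down and angle < down_below:
            is_down = True    # reached the bottom of the squat
        elif is_down and angle > up_above:
            is_down = False   # stood back up: one full rep
            reps += 1
    return reps

# Two squats, with some wobble around 100 degrees at the bottom of the first.
samples = [170, 150, 110, 95, 88, 97, 102, 96, 130, 165, 172, 120, 92, 140, 168]
total = count_reps(samples)
```

Using separate thresholds for "down" and "up" keeps small wobbles near either one from being counted as extra reps; `total` is 2 for the samples above.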

Expanding Access

  • Android version - Bring Posey to the majority of global smartphone users
  • Multi-language support - Coaching in users' native languages
  • Offline mode - On-device models for gym environments with poor connectivity

Deeper Impact

  • Physical therapy integration - Partner with healthcare providers for rehabilitation exercises
  • Mental health awareness - Recognize signs of overexertion or exercise avoidance patterns
  • Community features - Connect users with similar journeys (optional, never forced)

The Bigger Picture

Health, movement, and confidence should not belong only to the people who already have them.

They should belong to everyone.

For the person who wants to feel stronger but doesn't want to feel judged. For the person who wants to begin but doesn't want to begin alone. For the person who's tried before and quit because no one was there to say "You're doing great. Keep going."

Sometimes, what people need is not a louder coach. They need a kinder start.

That's Posey.
