AI-Powered Yoga Virtual Assistant

Inspiration

With yoga studios closed during the pandemic, millions turned to online classes but lost access to personalized feedback on form and technique. Traditional instruction relies on real-time corrections from instructors who observe subtle misalignments. Without this feedback, practitioners risk poor form habits, reduced effectiveness, and potential injuries.

We asked: "What if AI could provide the same detailed, personalized feedback as an in-person yoga instructor?" This led us to create a virtual assistant using computer vision and generative AI to analyze videos, detect poses, and provide actionable feedback.

What it does

AI-Powered Yoga Virtual Assistant evaluates yoga poses through video analysis and provides detailed feedback:

Training Phase: Upload reference video → Extract frames → Detect 33 body landmarks with MediaPipe → Calculate joint angles → Create statistical "golden standard"
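The joint-angle step above can be sketched in plain Python: MediaPipe returns coordinates for each of the 33 landmarks, and the angle at a joint is the angle between the two limb vectors meeting there. The landmark triple in the example is illustrative:

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at vertex b, formed by points a-b-c (each an (x, y) tuple)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    mag = math.hypot(*v1) * math.hypot(*v2)
    cos = max(-1.0, min(1.0, dot / mag))  # clamp against floating-point drift
    return math.degrees(math.acos(cos))

# Example: a hip-knee-ankle triple bent at a right angle
print(joint_angle((0, 1), (0, 0), (1, 0)))  # → 90.0
```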

Testing Phase: Upload user video → Validate with Claude 3 Sonnet → Compare angles against standard → Generate evaluation with scores and recommendations

Output: Overall score (0-100), letter grade, per-angle analysis, status indicators, and actionable feedback for improvement.
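A minimal sketch of how per-angle scores could roll up into the overall score and letter grade. The tolerance band, point deductions, and grade cutoffs here are hypothetical, not our production values:

```python
def letter_grade(score):
    """Map a 0-100 score to a letter grade (cutoffs are illustrative)."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"

def score_pose(measured, standard, tolerance=15.0):
    """Score measured joint angles (degrees) against "golden standard" means.

    measured / standard: dicts mapping joint name -> angle in degrees.
    tolerance: hypothetical full-credit band; each degree of deviation
    beyond it costs two points on that angle's 0-100 score.
    """
    per_angle = {}
    for joint, target in standard.items():
        deviation = abs(measured.get(joint, 0.0) - target)
        per_angle[joint] = max(0.0, 100.0 - max(0.0, deviation - tolerance) * 2.0)
    overall = sum(per_angle.values()) / len(per_angle)
    return overall, letter_grade(overall), per_angle
```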

Supported Poses: Downward Dog, Warrior I & II, Tree Pose, Triangle Pose

How we built it

Serverless architecture on AWS with conversational AI interface:

  • Amazon Bedrock AgentCore (Runtime, Observability and Memory) with Strands Agents framework
  • Claude 3 Sonnet for conversational AI and video validation
  • AWS Lambda (Dockerized) for video processing
  • MediaPipe for pose detection and angle calculation
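The Dockerized processing Lambda's entry point has roughly this shape. Field names such as `video_key` are hypothetical; the real handler also downloads the video from S3 and runs the MediaPipe pipeline:

```python
import json

def handler(event, context):
    """Sketch of the video-processing Lambda entry point (names hypothetical)."""
    body = json.loads(event.get("body", "{}"))
    video_key = body.get("video_key")
    if not video_key:
        return {"statusCode": 400,
                "body": json.dumps({"error": "video_key required"})}
    # In production: fetch the video from S3, extract frames, detect pose
    # landmarks with MediaPipe, and compare angles against the standard.
    result = {"video_key": video_key, "status": "queued"}
    return {"statusCode": 200, "body": json.dumps(result)}
```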

Challenges we ran into

API Throttling: Claude API calls caused ThrottlingException errors. Implemented exponential backoff (2s→4s→5s delays), reducing errors to <1%.
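The backoff schedule can be sketched generically. Our production code catches botocore's ThrottlingException specifically; a generic exception stands in here to keep the sketch dependency-free:

```python
import time

THROTTLE_DELAYS = (2, 4, 5)  # seconds, matching the 2s→4s→5s schedule

def call_with_backoff(fn, delays=THROTTLE_DELAYS):
    """Call fn, sleeping through the delay schedule after each failure.

    In production we catch botocore's ThrottlingException; a bare
    Exception stands in so this sketch has no AWS dependency.
    """
    for delay in delays:
        try:
            return fn()
        except Exception:
            time.sleep(delay)
    return fn()  # final attempt: let any remaining error propagate
```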

Long Processing Times: Videos took 94-136 seconds to process. Reduced validation frames (6→3) and removed a duplicate validation pass, saving 50 seconds total.
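Cutting validation frames amounts to sampling a few evenly spaced frames instead of many. A sketch of the index selection (function name hypothetical):

```python
def sample_frame_indices(total_frames, num_samples):
    """Pick num_samples evenly spaced frame indices across a video."""
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the midpoint of each of num_samples equal segments.
    return [int(i * step + step / 2) for i in range(num_samples)]

print(sample_frame_indices(300, 3))  # → [50, 150, 250]
```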

Poor Pose Detection: Initial frame success rate was only 13.8%. Lowered the visibility threshold (0.5→0.3) and added per-angle visibility checks, improving success to >80%.
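The per-angle check skips any angle whose landmarks are unreliable instead of discarding the whole frame. A sketch with illustrative landmark and angle names (MediaPipe indexes its 33 landmarks 0-32 and attaches a visibility score to each):

```python
VISIBILITY_THRESHOLD = 0.3  # lowered from 0.5

def usable_angles(landmark_visibility, angle_landmarks):
    """Keep only the angles whose three landmarks all clear the threshold.

    landmark_visibility: landmark name -> visibility score in [0, 1].
    angle_landmarks: angle name -> (a, b, c) landmark-name triple.
    Names are illustrative placeholders.
    """
    return {
        name: triple
        for name, triple in angle_landmarks.items()
        if all(landmark_visibility.get(lm, 0.0) >= VISIBILITY_THRESHOLD
               for lm in triple)
    }
```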

AgentCore Integration: Created wrapper tools to format Lambda events correctly while maintaining conversational capabilities.

Accomplishments that we're proud of

We built a production-ready serverless system that processes videos in <60 seconds with a >80% pose detection success rate. Key achievements:

  • Performance Optimization: 50% faster processing through systematic improvements
  • Conversational AI: Natural language interaction via AgentCore with tool-based architecture
  • Dual Interface: Both direct upload and chat-based user experiences
  • Scalable Architecture: Serverless design that scales automatically and incurs cost only when used
  • AI-Powered Validation: Intelligent video validation ensuring correct pose detection

What we learned

Computer Vision: Pose detection varies with lighting and angles; visibility thresholds need careful tuning for real-world scenarios.

API Integration: Always implement exponential backoff for external APIs; small optimizations (6→3 frames) can save significant time.

AgentCore: Tool-based architecture is powerful for workflow orchestration; natural language interfaces greatly improve user experience.

Optimization: Measure everything, eliminate duplicate work, and test with real user data to identify bottlenecks.

What's next for AI-Powered Yoga Virtual Assistant

Short-term: Multi-agent orchestration, Dynamic Time Warping analysis, conversation memory integration

Medium-term: Expand to 20+ poses, real-time feedback via WebRTC, personalized training plans

Long-term: Multi-person support, AR/VR integration, advanced AI coaching with voice guidance

Built With

  • agentcore
  • aws-api-gateway
  • aws-bedrock-agent-core-orchestrator
  • bedrock
  • cognito
  • flask
  • frontend-application(not-defined)-aws-cognito
  • lambda
  • mediapipe