Inspiration
I've been taking dancing lessons for a couple of years and was genuinely surprised by how popular Bachata has become in Germany. The dance floors are packed, the community is vibrant, and people are hungry to improve.
Dancers want to:
- Learn new moves beyond what they pick up in weekly classes
- Share their progress on social media with polished choreography videos
- Practice at home with structured routines they can follow
But here's the challenge: putting moves together when social dancing is hard. You learn individual steps in class, but combining them into a flowing sequence that matches the music? That's where most dancers struggle.
Bachata Buddy is your personalized AI dance teacher. Just describe what you want—"a romantic beginner routine" or "something energetic for an advanced dancer"—and the AI creates a custom choreography video synced to music.
No more awkward transitions. No more forgetting what comes next. Just dance.
What it does
Bachata Buddy transforms natural language requests into personalized dance choreography videos:
The Magic Flow
- You describe → "Create a sensual intermediate choreography with medium energy"
- AI understands → OpenAI extracts difficulty, style, energy level, and mood
- Music analysis → Librosa analyzes tempo, rhythm patterns, and energy curves
- Smart matching → Trimodal embeddings find the perfect moves for your request
- Video assembly → FFmpeg stitches clips together, synced to music
- You dance → Download your personalized choreography and practice!
Key Features
🤖 Natural Language Interface
- Chat with the AI like you'd talk to a dance instructor
- "Make me something fun for a party" just works
🎬 Real Video Output
- Actual dance clips assembled into a cohesive routine
- Audio perfectly synced to the choreography
🔍 Trimodal Embedding Search
- Pose embeddings (512D) - Match body positions and transitions
- Audio embeddings (128D) - Align moves to music characteristics
- Text embeddings (384D) - Understand style, difficulty, mood
📚 Collection Management
- Save your favorite choreographies
- Build a personal library of routines
- Track your progress over time
🎯 Difficulty Levels
- Beginner, Intermediate, Advanced
- Energy levels from chill to high-intensity
- Styles: Romantic, Sensual, Energetic, Playful
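The levels and styles listed above can be treated as a small closed vocabulary that the extracted request is checked against before any search runs. Below is a minimal sketch of that idea, not the project's actual code: `ChoreographyRequest` and `parse_request` are hypothetical names, and the `low`/`medium`/`high` energy labels are an assumption, since the text only says "chill to high-intensity".

```python
from dataclasses import dataclass

# Difficulty and style values come from the feature list above.
DIFFICULTIES = {"beginner", "intermediate", "advanced"}
STYLES = {"romantic", "sensual", "energetic", "playful"}
# Assumed labels: the write-up does not name the energy levels.
ENERGY_LEVELS = {"low", "medium", "high"}


@dataclass(frozen=True)
class ChoreographyRequest:
    """Hypothetical container for parameters extracted from a prompt."""
    difficulty: str
    style: str
    energy: str

    def __post_init__(self):
        # Reject any value outside the known vocabulary.
        checks = (
            ("difficulty", DIFFICULTIES),
            ("style", STYLES),
            ("energy", ENERGY_LEVELS),
        )
        for name, allowed in checks:
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"unknown {name}: {value!r}")


def parse_request(raw: dict) -> ChoreographyRequest:
    """Normalize case and whitespace, then validate."""
    def clean(key):
        return str(raw.get(key, "")).strip().lower()

    return ChoreographyRequest(clean("difficulty"), clean("style"), clean("energy"))
```

Validating early means a malformed extraction fails with a clear error instead of producing an empty search result later in the pipeline.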
How we built it
Architecture Overview
User Request → OpenAI Agent → Music Analysis → Vector Search → Video Assembly (FFmpeg service) → Result
Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React + Vite + Tailwind | Modern, responsive UI with real-time updates |
| Backend | Django + DRF | Robust API with JWT authentication |
| AI Orchestration | OpenAI GPT-4 | Natural language understanding & function calling |
| Audio Analysis | Librosa | Extract tempo, MFCCs, spectral features |
| Pose Detection | YOLOv8 | Extract dancer keypoints from video clips |
| Text Embeddings | Sentence Transformers | Semantic understanding of move descriptions |
| Vector Search | FAISS | Fast similarity search across 1024D embeddings |
| Video Processing | FFmpeg | Concatenate clips, add audio, normalize formats |
| Database | PostgreSQL | Store users, tasks, embeddings, collections |
The Trimodal Embedding Innovation
What makes Bachata Buddy special is how we match moves to requests. Each dance move is represented by three types of embeddings:
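Combined (1024D) = concat(0.35 × Pose (512D), 0.35 × Audio (128D), 0.30 × Text (384D))
The three weighted vectors are concatenated, not summed, which is why the dimensions add up: 512 + 128 + 384 = 1024.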
This allows us to find moves that:
- Look right (similar body positions)
- Sound right (match the music's energy)
- Feel right (align with user intent)
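The fusion step can be sketched as follows: L2-normalize each embedding, scale it by its weight, and concatenate the results. This is a minimal illustration in plain Python under the dimensions and weights stated above. The function names (`l2_normalize`, `fuse`, `dot`) are hypothetical, and a real implementation would more likely use NumPy arrays.

```python
import math

# Weights and dimensions as stated in the write-up.
WEIGHTS = {"pose": 0.35, "audio": 0.35, "text": 0.30}
DIMS = {"pose": 512, "audio": 128, "text": 384}


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector stays zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    return [x / norm for x in vec]


def fuse(pose, audio, text):
    """Return the 1024-dimensional weighted concatenation."""
    parts = {"pose": pose, "audio": audio, "text": text}
    fused = []
    for name in ("pose", "audio", "text"):
        vec = parts[name]
        if len(vec) != DIMS[name]:
            raise ValueError(f"{name} must have {DIMS[name]} dimensions")
        fused.extend(WEIGHTS[name] * x for x in l2_normalize(vec))
    return fused


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))
```

One consequence of this scheme is worth noting. If similarity is measured by inner product, each modality contributes its weight squared times its cosine similarity, so a fused vector's inner product with itself is 0.35² + 0.35² + 0.30² = 0.335 rather than 1. If the weights are meant to act as linear proportions, taking their square roots before scaling is one option.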
Kiro-Assisted Development
Built with Kiro's AI-powered development assistance:
- Spec-driven development for complex features
- Intelligent code generation for boilerplate
- Real-time debugging and error resolution
- Architecture guidance for scalable design
Challenges we ran into
1. Embedding Dimension Mismatch
Early on, our pose, audio, and text embeddings had incompatible dimensions. We solved this with weighted normalization—L2 normalize each embedding, apply weights, then concatenate.
2. Video Synchronization
Getting clips to flow smoothly was tricky. Different source videos had varying frame rates and resolutions. FFmpeg normalization to 30fps with consistent encoding solved this.
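A minimal sketch of such a normalization pass, driven from Python, might look like this. The 30fps target comes from the text. The 1280×720 resolution, the H.264/`yuv420p` encoding, and the choice to drop each clip's own audio (on the assumption that the music track is added later) are all assumptions, and `build_normalize_cmd` and `normalize_clip` are hypothetical helper names.

```python
import subprocess


def build_normalize_cmd(src, dst, fps=30, width=1280, height=720):
    """Build an FFmpeg command that re-encodes one clip to a common format."""
    return [
        "ffmpeg", "-y",                               # overwrite dst if present
        "-i", src,
        "-vf", f"scale={width}:{height},fps={fps}",   # same size and frame rate
        "-c:v", "libx264",                            # assumed codec
        "-pix_fmt", "yuv420p",                        # widely compatible pixel format
        "-an",                                        # assumed: drop clip audio
        dst,
    ]


def normalize_clip(src, dst):
    """Run the command; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_normalize_cmd(src, dst), check=True)
```

Once every clip shares the same resolution, frame rate, and codec settings, concatenation no longer has to reconcile mismatched streams.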
3. Race Conditions in Video Delivery
The frontend would navigate to the video page before the file was fully written. We implemented a retry mechanism with exponential backoff in the video player.
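The retry schedule can be sketched as below. The real player runs in the browser; Python is used here only to keep the examples in one language. The base delay, growth factor, cap, and attempt count are assumed values, and `backoff_delays` and `wait_until_ready` are hypothetical names.

```python
import time


def backoff_delays(base=0.5, factor=2.0, max_delay=8.0, attempts=5):
    """Delays in seconds that grow geometrically up to a cap."""
    return [min(base * factor ** i, max_delay) for i in range(attempts)]


def wait_until_ready(is_ready, delays=None, sleep=time.sleep):
    """Poll is_ready(), sleeping between failed attempts.

    Returns True as soon as is_ready() succeeds, otherwise the result
    of one final check after the last delay.
    """
    if delays is None:
        delays = backoff_delays()
    for delay in delays:
        if is_ready():
            return True
        sleep(delay)
    return is_ready()
```

With the defaults, the delays are 0.5, 1, 2, 4, and 8 seconds, so a file that lands within a couple of seconds is picked up quickly while a slow render is still given time to finish.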
4. OpenAI Function Calling Reliability
The agent sometimes wouldn't call all required functions. We added automatic fallback logic—if the blueprint isn't assembled, the system calls assemble_video automatically.
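The guardrail amounts to a check after the agent finishes. Here is a minimal sketch, assuming the orchestrator records the names of the functions the agent called. Only `assemble_video` is named in the text; `ensure_video_assembled` and its parameters are hypothetical.

```python
def ensure_video_assembled(called_functions, blueprint, assemble_video):
    """Call assemble_video ourselves if the agent skipped it.

    called_functions: names of the functions the agent invoked
                      (assumed to be tracked by the orchestrator).
    blueprint:        the choreography plan the agent produced, or None.
    assemble_video:   callable that renders the blueprint to a video.
    Returns "agent" or "fallback" to show who triggered assembly.
    """
    if "assemble_video" in called_functions:
        return "agent"
    if blueprint is None:
        # Nothing to assemble: surface the failure instead of guessing.
        raise RuntimeError("agent produced no blueprint to assemble")
    assemble_video(blueprint)
    return "fallback"
```

Returning which path ran makes it easy to log how often the fallback fires, which is a useful signal when tuning the agent's prompt.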
5. FAISS Index Management
Keeping the vector index in sync with the database required careful cache invalidation. We implemented a TTL-based cache with manual refresh capability.
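A TTL cache with a manual refresh hook can be sketched in a few lines. This is an illustration rather than the project's code: `TTLCache` is a hypothetical class, `loader` stands in for whatever rebuilds the index from the database, and the 300-second default is an assumed value.

```python
import time


class TTLCache:
    """Hold one expensive value and rebuild it once it is older than the TTL."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # callable that rebuilds the value
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._value = None
        self._loaded_at = None

    def get(self):
        """Return the cached value, reloading it first if it has expired."""
        expired = (
            self._loaded_at is None
            or self._clock() - self._loaded_at >= self._ttl
        )
        if expired:
            self.refresh()
        return self._value

    def refresh(self):
        """Manual refresh: rebuild now, regardless of age."""
        self._value = self._loader()
        self._loaded_at = self._clock()
```

The TTL bounds how stale the index can get, and calling `refresh()` right after new moves are written to the database closes the gap immediately.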
Accomplishments that we're proud of
✅ End-to-End AI Pipeline
From natural language to video output—fully automated. No manual intervention required.
✅ Trimodal Embedding Fusion
A novel approach combining pose, audio, and text embeddings for holistic move matching.
✅ Real-Time Reasoning Panel
Users can watch the AI "think"—see which functions it calls and why. Transparency builds trust.
✅ Production-Ready Architecture
- JWT authentication with refresh tokens
- Proper error handling and logging
- Docker containerization
- AWS deployment ready
✅ Smooth Video Playback
Custom video player with:
- Loop controls for practice sections
- Playback speed adjustment (0.5x - 1.5x)
- Keyboard shortcuts for dancers
- Authenticated streaming
✅ Unique Use Case
- This app takes a completely new angle on Latin dancing and has the potential to become a profitable standalone product.
What we learned
Technical Insights
Multimodal embeddings are powerful - Combining different data types (video, audio, text) creates richer representations than any single modality.
FAISS is incredibly fast - Even with 1024-dimensional vectors, similarity search is nearly instantaneous.
FFmpeg is a Swiss Army knife - Video processing that would take hundreds of lines of code is a single command.
OpenAI function calling needs guardrails - Always have fallback logic for when the AI doesn't behave as expected.
Product Insights
Dancers want personalization - Generic tutorials don't cut it. People want routines tailored to their level and style.
Video is king - Text instructions for dance moves are nearly useless. Visual demonstration is essential.
Progress tracking matters - Saving and organizing choreographies creates long-term engagement.
Development Insights
AI-assisted coding accelerates everything - Kiro helped us move fast without sacrificing quality.
Spec-driven development prevents scope creep - Having clear requirements upfront kept us focused.
Start with the happy path - Get the core flow working, then handle edge cases.
What's next for Bachata Buddy
Short Term (Next Month)
- [ ] More dance moves - Expand the clip library with turns, dips, and advanced patterns
- [ ] Music upload - Let users upload their own songs for choreography
- [ ] Social sharing - One-click export to Instagram/TikTok format
- [ ] Mobile app - React Native version for practice on the go, with move recognition and feedback. Something like BachaTrainer should be integrated: https://devpost.com/software/bachatrainer?ref_content=user-portfolio&ref_feature=in_progress
