Inspiration

I've been taking dancing lessons for a couple of years and was genuinely surprised by how popular Bachata has become in Germany. The dance floors are packed, the community is vibrant, and people are hungry to improve.

Dancers want to:

  • Learn new moves beyond what they pick up in weekly classes
  • Share their progress on social media with polished choreography videos
  • Practice at home with structured routines they can follow

But here's the challenge: putting moves together while social dancing is hard. You learn individual steps in class, but combining them into a flowing sequence that matches the music? That's where most dancers struggle.

Bachata Buddy is your personalized AI dance teacher. Just describe what you want—"a romantic beginner routine" or "something energetic for an advanced dancer"—and the AI creates a custom choreography video synced to music.

No more awkward transitions. No more forgetting what comes next. Just dance.

What it does

Bachata Buddy transforms natural language requests into personalized dance choreography videos:

The Magic Flow

  1. You describe → "Create a sensual intermediate choreography with medium energy"
  2. AI understands → OpenAI extracts difficulty, style, energy level, and mood
  3. Music analysis → Librosa analyzes tempo, rhythm patterns, and energy curves
  4. Smart matching → Trimodal embeddings find the perfect moves for your request
  5. Video assembly → FFmpeg stitches clips together, synced to music
  6. You dance → Download your personalized choreography and practice!

Key Features

🤖 Natural Language Interface

  • Chat with the AI like you'd talk to a dance instructor
  • "Make me something fun for a party" just works

🎬 Real Video Output

  • Actual dance clips assembled into a cohesive routine
  • Audio perfectly synced to the choreography

🔍 Trimodal Embedding Search

  • Pose embeddings (512D) - Match body positions and transitions
  • Audio embeddings (128D) - Align moves to music characteristics
  • Text embeddings (384D) - Understand style, difficulty, mood

📚 Collection Management

  • Save your favorite choreographies
  • Build a personal library of routines
  • Track your progress over time

🎯 Difficulty Levels

  • Beginner, Intermediate, Advanced
  • Energy levels from chill to high-intensity
  • Styles: Romantic, Sensual, Energetic, Playful
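As a rough illustration of what the AI hands to the rest of the pipeline, the parameters above can be modeled as a small validated record. This is a sketch, not the project's actual schema: the class name `ChoreographyRequest`, its field names, and the three energy labels are assumptions made for the example. Only the difficulty levels and styles come from the lists above.

```python
from dataclasses import dataclass

# Values listed in this write-up.
DIFFICULTIES = ("beginner", "intermediate", "advanced")
STYLES = ("romantic", "sensual", "energetic", "playful")
# Assumption: three discrete labels; the write-up only says "chill to high-intensity".
ENERGY_LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class ChoreographyRequest:
    """Hypothetical container for the parameters extracted from a prompt."""

    difficulty: str
    style: str
    energy: str

    def __post_init__(self):
        # Normalize case and whitespace, then reject anything outside the known values.
        for name, allowed in (
            ("difficulty", DIFFICULTIES),
            ("style", STYLES),
            ("energy", ENERGY_LEVELS),
        ):
            value = getattr(self, name).strip().lower()
            if value not in allowed:
                raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
            object.__setattr__(self, name, value)


# "Create a sensual intermediate choreography with medium energy"
request = ChoreographyRequest(difficulty="Intermediate", style="sensual", energy="medium")
```

Validating at this boundary means a misspelled or unsupported value fails early, before any music analysis or video work starts.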

How we built it

Architecture Overview

User Request → OpenAI Agent → Music Analysis → Vector Search → Video Assembly (FFmpeg Service) → Result

Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Frontend | React + Vite + Tailwind | Modern, responsive UI with real-time updates |
| Backend | Django + DRF | Robust API with JWT authentication |
| AI Orchestration | OpenAI GPT-4 | Natural language understanding & function calling |
| Audio Analysis | Librosa | Extract tempo, MFCCs, spectral features |
| Pose Detection | YOLOv8 | Extract dancer keypoints from video clips |
| Text Embeddings | Sentence Transformers | Semantic understanding of move descriptions |
| Vector Search | FAISS | Fast similarity search across 1024D embeddings |
| Video Processing | FFmpeg | Concatenate clips, add audio, normalize formats |
| Database | PostgreSQL | Store users, tasks, embeddings, collections |

The Trimodal Embedding Innovation

What makes Bachata Buddy special is how we match moves to requests. Each dance move is represented by three types of embeddings:

[Pose (512D) × 35%] ⊕ [Audio (128D) × 35%] ⊕ [Text (384D) × 30%] = Combined (1024D), where ⊕ means concatenation

This allows us to find moves that:

  • Look right (similar body positions)
  • Sound right (match the music's energy)
  • Feel right (align with user intent)

Kiro-Assisted Development

Built with Kiro's AI-powered development assistance:

  • Spec-driven development for complex features
  • Intelligent code generation for boilerplate
  • Real-time debugging and error resolution
  • Architecture guidance for scalable design

Challenges we ran into

1. Embedding Dimension Mismatch

Early on, our pose, audio, and text embeddings had incompatible dimensions (512D, 128D, and 384D). We solved this with weighted normalization: L2-normalize each embedding, apply its weight, then concatenate the three into a single 1024D vector.
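The normalize, weight, concatenate steps can be sketched in a few lines. This is a minimal illustration that uses plain Python lists so it runs without dependencies (a real implementation would more likely use NumPy arrays). The function names `l2_normalize` and `fuse` are made up for the example, and the inputs are dummy vectors.

```python
import math

# Weights from the write-up: pose 35%, audio 35%, text 30%.
WEIGHTS = {"pose": 0.35, "audio": 0.35, "text": 0.30}


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(pose, audio, text, weights=WEIGHTS):
    """L2-normalize each modality, scale it by its weight, then concatenate."""
    combined = []
    for name, vec in (("pose", pose), ("audio", audio), ("text", text)):
        combined.extend(weights[name] * x for x in l2_normalize(vec))
    return combined


# Dummy inputs with the dimensions quoted above (512 + 128 + 384 = 1024).
combined = fuse([1.0] * 512, [2.0] * 128, [0.5] * 384)
print(len(combined))
```

One consequence of this scheme: because each block has unit length before weighting, the dot product of two fused vectors is a sum of the three per-modality cosine similarities, each scaled by the square of its weight. If the weights are meant to apply directly to the similarities, scaling each block by the square root of its weight would be one way to get that.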

2. Video Synchronization

Getting clips to flow smoothly was tricky. Different source videos had varying frame rates and resolutions. FFmpeg normalization to 30fps with consistent encoding solved this.
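A normalization pass along these lines can be expressed as an FFmpeg argument list built in Python. Treat this as a sketch under assumptions: only the 30fps target comes from the text above. The 1280×720 resolution, the H.264/AAC codecs, the helper name `build_normalize_cmd`, and the file names are illustrative choices, not the project's confirmed settings.

```python
import subprocess


def build_normalize_cmd(src, dst, fps=30, width=1280, height=720):
    """Return an ffmpeg argument list that re-encodes src to a uniform format."""
    return [
        "ffmpeg",
        "-y",                                        # overwrite dst if it exists
        "-i", src,
        "-vf", f"fps={fps},scale={width}:{height}",  # uniform frame rate and size
        "-c:v", "libx264",                           # assumed video codec
        "-pix_fmt", "yuv420p",                       # widely supported pixel format
        "-c:a", "aac",                               # assumed audio codec
        dst,
    ]


def normalize_clip(src, dst):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_normalize_cmd(src, dst), check=True)


cmd = build_normalize_cmd("clip_raw.mp4", "clip_norm.mp4")
```

Once every clip shares the same frame rate, resolution, and codecs, concatenating them is much less likely to produce stutter or a resolution jump at the cut points. Note that a fixed `scale` stretches clips with a different aspect ratio, so sources that are not 16:9 would need padding or cropping instead.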

3. Race Conditions in Video Delivery

The frontend would navigate to the video page before the file was fully written. We implemented a retry mechanism with exponential backoff in the video player.
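The project's retry lives in the frontend video player; the sketch below shows the same backoff idea in Python for illustration only. The function name `wait_until_ready`, the attempt count, and the base delay are assumptions, and `is_ready` stands in for whatever check the player performs (for example, a request for the video file).

```python
import time


def wait_until_ready(is_ready, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Poll is_ready(); wait base_delay, then 2x, 4x, ... between failed attempts."""
    for attempt in range(max_attempts):
        if is_ready():
            return True
        if attempt < max_attempts - 1:
            # Double the wait after every failure so a slow write gets more room.
            sleep(base_delay * 2 ** attempt)
    return False


# Demo: the "file" becomes available on the third check; delays are recorded
# instead of actually sleeping.
delays = []
answers = iter([False, False, True])
ready = wait_until_ready(lambda: next(answers), sleep=delays.append)
```

Passing `sleep` as a parameter keeps the function testable without real waiting. Capping the number of attempts matters too, so a video that never appears ends in an error message rather than an endless spinner.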

4. OpenAI Function Calling Reliability

The agent sometimes wouldn't call all required functions. We added automatic fallback logic—if the blueprint isn't assembled, the system calls assemble_video automatically.
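A guardrail of this kind can be as small as a check that runs after the agent finishes. The sketch below is hypothetical: only the name `assemble_video` comes from the text above. The helper `ensure_video_assembled`, the other function names in the demo, and the idea of inspecting a list of called function names are assumptions about how such a check might look.

```python
def ensure_video_assembled(called_functions, assemble_video):
    """Call assemble_video() ourselves if the agent never did.

    called_functions: names of the functions the agent invoked, in order.
    assemble_video:   zero-argument callable that performs the assembly.
    Returns True when the fallback had to run.
    """
    if "assemble_video" in called_functions:
        return False
    assemble_video()
    return True


# Demo: the agent analyzed the music and searched for moves but stopped early.
log = []
used_fallback = ensure_video_assembled(
    ["analyze_music", "search_moves"],  # hypothetical function names
    lambda: log.append("assembled"),
)
```

Returning whether the fallback ran makes it easy to log how often the agent skips the step, which is useful when tuning the prompt.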

5. FAISS Index Management

Keeping the vector index in sync with the database required careful cache invalidation. We implemented a TTL-based cache with manual refresh capability.
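A TTL cache with a manual refresh hook might look roughly like the following. It is a sketch under assumptions: the class name `TTLCache`, the 300-second TTL, and the `load_index` stand-in are illustrative, and the loader here returns a string where the real one would rebuild the vector index from the database.

```python
import time


class TTLCache:
    """Cache one loaded value (e.g. a vector index) for ttl seconds."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        """Return the cached value, reloading it if missing or expired."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self.refresh()
        return self._value

    def refresh(self):
        """Manual refresh: reload now, e.g. right after the database changes."""
        self._value = self._loader()
        self._loaded_at = self._clock()


# Demo with a fake clock so no real time has to pass.
now = [0.0]
loads = []


def load_index():
    loads.append(now[0])  # record when each rebuild happened
    return f"index built at t={now[0]}"


cache = TTLCache(load_index, ttl=300, clock=lambda: now[0])
cache.get()    # first call loads the index
now[0] = 100.0
cache.get()    # still fresh, served from cache
now[0] = 400.0
cache.get()    # older than the TTL, so it reloads
```

The TTL bounds how stale the index can get when nobody intervenes, while `refresh()` covers the case where new moves are added and should be searchable immediately.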

Accomplishments that we're proud of

✅ End-to-End AI Pipeline

From natural language to video output—fully automated. No manual intervention required.

✅ Trimodal Embedding Fusion

A novel approach combining pose, audio, and text embeddings for holistic move matching.

✅ Real-Time Reasoning Panel

Users can watch the AI "think"—see which functions it calls and why. Transparency builds trust.

✅ Production-Ready Architecture

  • JWT authentication with refresh tokens
  • Proper error handling and logging
  • Docker containerization
  • AWS deployment ready

✅ Smooth Video Playback

Custom video player with:

  • Loop controls for practice sections
  • Playback speed adjustment (0.5x - 1.5x)
  • Keyboard shortcuts for dancers
  • Authenticated streaming

✅ Unique Use Case

  • This app takes a completely new angle on Latin dancing, and it has the potential to become a profitable standalone product

What we learned

Technical Insights

  1. Multimodal embeddings are powerful - Combining different data types (video, audio, text) creates richer representations than any single modality.

  2. FAISS is incredibly fast - Even with 1024-dimensional vectors, similarity search is nearly instantaneous.

  3. FFmpeg is a Swiss Army knife - Video processing that would take hundreds of lines of code is a single command.

  4. OpenAI function calling needs guardrails - Always have fallback logic for when the AI doesn't behave as expected.

Product Insights

  1. Dancers want personalization - Generic tutorials don't cut it. People want routines tailored to their level and style.

  2. Video is king - Text instructions for dance moves are nearly useless. Visual demonstration is essential.

  3. Progress tracking matters - Saving and organizing choreographies creates long-term engagement.

Development Insights

  1. AI-assisted coding accelerates everything - Kiro helped us move fast without sacrificing quality.

  2. Spec-driven development prevents scope creep - Having clear requirements upfront kept us focused.

  3. Start with the happy path - Get the core flow working, then handle edge cases.

What's next for Bachata Buddy

Short Term (Next Month)
