🌍 Earth2Echo

From starlight to soundtrack.

Music is the universal language. Earth2Echo helps early musicians, indie composers, filmmakers, and content creators turn any video—celestial or otherwise—into evolving scores they control. It accelerates content, music, media, and marketing workflows: fast idea generation, creator-first control, and exportable stems/MIDI for real DAW work.

Impact (who benefits & why they’d pay):

  • Up-and-coming musicians & composers: instant picture-aware cue sketches → less time blocking, more time finishing tracks.
  • Indie filmmakers & editors: unique, non-stock temp cues that adapt to the cut → better creative direction before hiring/finishing.
  • Creators/marketers: rapid, on-brand sound beds for reels/trailers → higher output without generic libraries.
  • Why they’d pay: saves hours per cue, produces exportable assets (stems/MIDI), and keeps human taste in control, with no AI lock-in.

Novelty (what’s truly new):

  • Not just “music from text”—video-aware, continuously steerable scoring that adapts as the scene changes.
  • Multimodal loop (video + user steer + audio feedback) → a real co-composer, not a one-shot generator.
  • Astral Data Mode ties live space metrics to sound design for a signature cosmic feel.

🌌 Problem & Why Now

Scoring to picture is slow and expensive; stock music is generic. Short-form and indie production are exploding, but creators need fast, picture-aware music that preserves creative control. Earth2Echo is built for that exact gap.

💡 What It Does

  • Live Space-to-Sound (or any video): Pick an ISS/telescope livestream or paste a YouTube link. Every few seconds we capture a frame and produce a concise music directive (BPM, mood, timbre, motion cue).
  • Realtime Co-Composer: Directives stream into a realtime music model to render evolving audio that follows the visuals.
  • You Steer the Vibe: Prompt box + “Mood Mixer” sliders (Energy↔Calm, Warm↔Cool, Acoustic↔Electronic). Tiny nudges = audible shifts.
  • Astral Data Mode (optional): Map ISS position / solar wind metrics to reverb, pitch, or tempo.
  • Upload Your Universe: Works on reels, timelapses, and film clips—any video.
  • Export for Real Workflows: Download stems/MIDI/WAV and finish in Logic/Ableton/Pro Tools.
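The "concise music directive" described above can be modeled as a small, strict structure. A minimal sketch (field names are illustrative, not Earth2Echo's exact schema):

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class MusicDirective:
    """Compact, picture-aware instruction for the realtime music model.
    Field names are illustrative, not the project's exact schema."""
    bpm: int     # suggested tempo
    mood: str    # e.g. "serene", "tense"
    timbre: str  # e.g. "airy pads", "warm strings"
    motion: str  # e.g. "slow swell", "driving pulse"

# The ISS Earth-limb example from the demo, serialized for the music socket.
d = MusicDirective(bpm=68, mood="serene", timbre="airy pads", motion="slow swell")
print(json.dumps(asdict(d)))
```

Keeping the directive this small is what makes tiny slider nudges map to audible, predictable shifts.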

🧠 Inspiration

From Interstellar’s awe to Voyager’s Golden Record (music as a message for anyone), plus our friend Ryan—a lofi producer/film composer—who wanted faster, picture-aware cues without losing his touch. Earth2Echo is a co-composer, not a replacement.

🌟 Key Benefits (Usability for this community)

  • Creator-first: Short, predictable directives → consistent results you can shape.
  • Workflow-ready: Stems/MIDI export, tempo suggestions, and cue-friendly durations.
  • Fewer dead ends: Visual-difference checks avoid over-steering; you get stable musical arcs.
  • Faster iteration: Try multiple moods in seconds; keep what works, edit the rest in your DAW.

🖼️ Live Demo Moments

  1. ISS Earth Limb → music starts: “~68 BPM, airy pads, slow swell.”
  2. Type “warmer strings, less percussion” → clear audible shift.
  3. Toggle Astral Data Mode (solar flux → reverb) → subtle, explainable change.
  4. Paste a YouTube city timelapse → instant genre pivot.
  5. Click Download Stems → open in a DAW.

🛠️ How We Built It

🖥️ Frontend

  • React + WebAudio + WebSocket for low-latency audio streaming & waveform/spectrum
  • Video player, prompt box, Mood Mixer sliders, and “Gemini thinks…” caption strip

⚙️ Backend

  • FastAPI/Flask session driver
  • Frame capture (YouTube/local) every ~5s
  • SSIM-based frame diff to skip unnecessary re-prompts
  • Persistent realtime music socket; streams chunks to client & to a downloadable file
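The SSIM-based frame diff above can be sketched with a single-window SSIM over whole frames; this is a simplified stand-in for the windowed SSIM in scikit-image, and the 0.90 threshold is an assumed value, not the project's tuned one:

```python
import numpy as np

def global_ssim(a: np.ndarray, b: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM over two grayscale frames (simplified, whole-image)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    )

def scene_changed(prev: np.ndarray, cur: np.ndarray, threshold: float = 0.90) -> bool:
    """Re-prompt Gemini only when frames differ enough to matter."""
    return global_ssim(prev, cur) < threshold

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (72, 128)).astype(float)
print(scene_changed(frame, frame))          # identical frames → False
print(scene_changed(frame, 255.0 - frame))  # inverted frame → True
```

Skipping re-prompts on near-identical frames is what keeps the score from jittering on static shots.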

🤖 AI & Data

  • Gemini (Vision + text): joint multimodal prompts produce compact, Lyria-ready directives (BPM, mood, timbre, motion).
  • (Optional) Vertex SFT: small (frame → directive) JSONL set to tighten style/consistency.
  • Audio feedback loop (optional): tempo/loudness summaries feed back for tri-modal reasoning.
  • Astral hooks: ISS position / solar metrics mapped to subtle sound parameters.
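The astral hooks reduce to clamped linear mappings from a space metric into a sound-parameter range. A minimal sketch (the 300–800 km/s solar-wind range and the 0.1–0.6 reverb range are illustrative assumptions):

```python
def map_metric(value: float, lo: float, hi: float,
               out_lo: float = 0.0, out_hi: float = 1.0) -> float:
    """Linearly map a space metric into a sound-parameter range, clamped."""
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    return out_lo + t * (out_hi - out_lo)

# Hypothetical: solar wind speed (km/s) driving reverb wet/dry mix.
reverb_wet = map_metric(550.0, 300.0, 800.0, out_lo=0.1, out_hi=0.6)
print(round(reverb_wet, 2))  # → 0.35
```

Clamping keeps a spiky metric (e.g. a solar storm) from pushing a parameter out of its musical range.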

🔄 Data Flow

Video/Livestream → frame sampler → [image + user steer (+ audio summary?)] → Gemini
→ concise directive → Realtime music model → streamed audio chunks (→ UI & file)
→ repeat (5s cadence or streaming)
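The loop above can be sketched as plain Python; every function here is a placeholder stub, not the project's actual API:

```python
def sample_frame():
    """Stub: grab the latest frame from the stream."""
    return None

def gemini_directive(frame, steer):
    """Stub: multimodal call → compact directive (frame + user steer)."""
    return {"bpm": 70, "mood": "calm"}

def render_chunk(directive):
    """Stub: realtime music model → one chunk of audio bytes."""
    return b"\x00" * 1024

def score_loop(steer_text: str, steps: int = 3) -> bytes:
    """One pass of the video → directive → audio loop at a fixed cadence."""
    chunks = []
    for _ in range(steps):
        frame = sample_frame()
        directive = gemini_directive(frame, steer_text)
        chunks.append(render_chunk(directive))
        # a real loop would sleep out the ~5 s cadence here,
        # and stream each chunk to the UI and the download file
    return b"".join(chunks)

audio = score_loop("warmer strings", steps=3)
print(len(audio))  # → 3072
```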

🧪 Tech Stack

  • Frontend: React, WebAudio, Canvas, WebSocket
  • Backend: FastAPI/Flask, websockets, ffmpeg/yt-dlp
  • AI: Gemini 2.x (Vision+Text), (optional) Vertex AI SFT, Lyria Realtime
  • Utils: Pillow, OpenCV, SSIM, librosa/pydub, (optional) Redis

🚧 Challenges We Overcame

  • Latency vs. musicality: 5-second cadence balances responsiveness with coherence; streaming mode ready.
  • Prompt discipline: strict, compact directives = repeatable, editable outcomes.
  • Stability: only re-steer on meaningful visual change; avoids jittery cues.
  • Generality: space feeds and arbitrary uploads work equally well.

🏆 Accomplishments

  • Realtime picture-aware scoring with human steering
  • Exportable stems/MIDI for professional post work
  • Astral Data Mode that ties real space metrics to sound design
  • Clean, minimal UI a non-technical creator can use in minutes

📚 What We Learned

  • Structured prompts beat verbose prose for control.
  • Visual features × user steer can feel surprisingly musical.
  • A tiny fine-tune set (50–200 pairs) improves consistency more than long prompts.
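A (frame → directive) pair in that fine-tune set might be serialized like this; the pairs are invented examples, and the exact JSONL schema depends on the Vertex AI tuning format in use:

```python
import json

# Hypothetical (frame-description → directive) training pairs.
pairs = [
    ("Earth limb at night, aurora on the horizon",
     "BPM 68; mood: serene; timbre: airy pads; motion: slow swell"),
    ("City timelapse, dense traffic light trails",
     "BPM 120; mood: kinetic; timbre: synth arps; motion: driving pulse"),
]

# Assumed chat-style JSONL layout; check the current Vertex tuning docs.
lines = [
    json.dumps({
        "contents": [
            {"role": "user", "parts": [{"text": frame_desc}]},
            {"role": "model", "parts": [{"text": directive}]},
        ]
    })
    for frame_desc, directive in pairs
]
print(len(lines))  # → 2
```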

🚀 What’s Next

  • Full streaming multimodal loop (continuous frame+text)
  • Style packs (Ambient, Cinematic, Synthwave) with quick A/B
  • Tempo map & cue markers aligned to timecode
  • Better stem separation & MIDI detail for deep editing

🎯 Prize Fit

  • Best Overall: polished UX, complex realtime pipeline, and meaningful creator impact
  • Best Celestial-Themed: live space feeds + real astronomical mappings
  • MLH Best Use of Gemini API: multimodal reasoning + optional Vertex SFT
  • MLH Best Use of Gemini 2.5 Computer Use: vision-driven interaction and control

🔒 Privacy & Safety

  • Secrets live only on the server; frontend never sees keys.
  • Clear session deletion and auto-purge of buffers after download.
  • Uploads are processed for scoring only—no resale, no third-party sharing.

❤️ Why Earth2Echo

Because creators deserve speed without losing soul—a co-composer that listens to the picture, follows your direction, and hands you assets you can finish and own.

🧑‍🚀 Team

Sahas, Evan, Alan, and Adarsh.
