Inspiration

Space photos hit me in a weirdly human way. A nebula feels calm, a galaxy feels loud, a star field feels lonely. The images are silent though. I wanted to turn that feeling into something you can actually hear. I tried pure synthesis first, but quick DSP often sounds thin. So I built a system that understands an image like a person would and then curates music that matches the story in the photo.

What it does

CelestiSynth takes an astronomy image, extracts mood, colors, and objects with Gemini Vision, adds simple image metrics like brightness and complexity, and then uses Gemini Text to build a listening guide. You get a poetic caption, three themed mixes with vibes and tempo ranges, and clickable search queries for Spotify or YouTube that lead to real music that fits the image. There is also an optional AI Original Mix from my synth engine if you want a custom ambient piece.

How we built it

  • Frontend: React, Vite, TypeScript, Tailwind for a clean starfield UI. Wavesurfer for playback and waveform.
  • Backend: FastAPI for endpoints. Static serving for generated tracks. Simple storage in a local tracks folder.
  • AI analysis: Gemini Vision returns mood adjectives, color words, object hints, and a short description. Pillow and NumPy compute brightness and edge density (sketched just after this list).
  • Curation: A strict Gemini Text prompt converts the analysis into mixes, search queries, and a short caption. The model is instructed to return strict JSON to avoid parsing errors.
  • Audio engine: NumPy and pydub drive a pad and arpeggio with detuned saws, ADSR envelopes, a low-pass filter, stereo width, and reverb (a toy version appears after this list). It now sounds warmer, but curation stays the default path for quality.
  • Optional narration: ElevenLabs can read the caption over the intro at low volume.
  • Simple mapping math derives tempo and energy from image features: E = 0.5 + 0.3B + 0.2C, where B is brightness in [0, 1] and C is complexity in [0, 1].
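
For concreteness, here is a minimal sketch of that feature extraction and mapping, assuming the image loads through Pillow. The gradient threshold and the BPM range are illustrative choices, not the exact production values.

```python
import numpy as np
from PIL import Image

def image_features(path: str) -> tuple[float, float]:
    """Return (brightness, complexity), both in [0, 1]."""
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
    brightness = float(gray.mean())
    # Complexity as edge density: the fraction of pixels with a strong gradient.
    gy, gx = np.gradient(gray)
    complexity = float((np.hypot(gx, gy) > 0.1).mean())  # threshold is illustrative
    return brightness, complexity

def energy_and_tempo(brightness: float, complexity: float) -> tuple[float, int]:
    """Map image features to an energy score and a hypothetical BPM."""
    energy = 0.5 + 0.3 * brightness + 0.2 * complexity  # E = 0.5 + 0.3B + 0.2C
    bpm = int(60 + (energy - 0.5) * 120)  # 60 BPM (calm) up to 120 BPM (busy)
    return energy, bpm
```

The constants keep E in [0.5, 1.0], so even a dark, sparse frame gets a gentle baseline of energy rather than silence.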
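
And here is a toy version of the pad voice on the synth path: two detuned saws shaped by a linear ADSR envelope and a one-pole low-pass, at 44.1 kHz mono. The real engine layers the noise bed, stereo width, and reverb on top and renders through pydub.

```python
import numpy as np

SR = 44100  # sample rate in Hz

def saw(freq: float, dur: float) -> np.ndarray:
    """Naive sawtooth in [-1, 1]."""
    t = np.arange(int(SR * dur)) / SR
    return 2.0 * ((t * freq) % 1.0) - 1.0

def adsr(n: int, a=0.5, d=0.5, s=0.6, r=1.0) -> np.ndarray:
    """Linear ADSR envelope over n samples (a, d, r in seconds, s as a level)."""
    a_n, d_n, r_n = int(a * SR), int(d * SR), int(r * SR)
    hold = max(n - a_n - d_n - r_n, 0)
    env = np.concatenate([
        np.linspace(0.0, 1.0, a_n),   # attack
        np.linspace(1.0, s, d_n),     # decay
        np.full(hold, s),             # sustain
        np.linspace(s, 0.0, r_n),     # release
    ])
    return env[:n]

def pad_note(freq: float, dur: float = 4.0, detune: float = 1.003) -> np.ndarray:
    """Two slightly detuned saws beat against each other for warmth."""
    mix = (saw(freq, dur) + saw(freq * detune, dur)) * adsr(int(SR * dur))
    # One-pole low-pass to tame the saw's harsh upper harmonics.
    out, y = np.empty_like(mix), 0.0
    for i, x in enumerate(mix):
        y += 0.05 * (x - y)
        out[i] = y
    return out / np.max(np.abs(out))
```

Calling pad_note(220.0) yields four seconds of A3 as normalized floats, ready to be scaled to int16 and wrapped for pydub export.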

Challenges we ran into

  • Synthesis quality: Basic sine pads felt gimmicky. I added detune, envelopes, filtering, stereo, noise beds, and reverb. Better, but still limited given hackathon time.
  • Prompt design: Curation needed strictly valid JSON, so I tightened the system prompt and added fallback parsing (first sketch after this list).
  • Latency and UX: I wanted the app to feel instant, so I cached analysis for identical images by hash (second sketch after this list) and kept audio rendering short.
  • Mapping emotion: Deciding how brightness, color, and mood influence scale, tempo, and timbre took several test loops.
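
The fallback parsing amounts to something like this, assuming the model reply arrives as raw text; the regex rescue handles replies that wrap the JSON in code fences or surrounding prose.

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Parse a reply that should be strict JSON, with a fallback for chatty output."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: pull the outermost {...} out of fences or prose.
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise ValueError(f"No JSON object in model reply: {raw[:80]!r}")
```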
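
And the cache is little more than a hash-keyed lookup in front of the Vision call, along these lines (an in-memory dict here; a folder on disk works the same way). The analyze argument is a hypothetical stand-in for the real Gemini call.

```python
import hashlib

_analysis_cache: dict[str, dict] = {}

def analyze_cached(image_bytes: bytes, analyze) -> dict:
    """Skip the Gemini round trip when the exact same image comes back."""
    key = hashlib.sha256(image_bytes).hexdigest()
    if key not in _analysis_cache:
        _analysis_cache[key] = analyze(image_bytes)  # expensive Vision call
    return _analysis_cache[key]
```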

Accomplishments that we're proud of

  • Curator Mode feels right. The music you find actually matches the photo. The caption adds a human layer.
  • The UI is simple and cinematic. Upload, see mood and colors, click to explore mixes, and play results fast.
  • The codebase is clean. React and FastAPI with clear contracts, strict JSON responses, and a minimal audio pipeline that still feels musical.

What we learned

  • Multimodal AI is powerful when you treat it like a collaborator. Vision pulls the facts. Text shapes the taste.
  • Sound design needs motion and space. Detune, envelopes, filtering, stereo, and reverb move the needle more than big pitch swings.
  • Clear contracts win. A strict schema for AI output cut errors and let me focus on UX.
  • A good product sometimes means pivoting. Curation gave a better experience than pushing synthesis beyond what time allowed.

What's next for CelestiSynth

  • Deeper curation with platform links and optional use of the Spotify API for direct track embeds.
  • Multi-image mode that stitches short chapters into a longer galactic piece with crossfades.
  • Constellation visualizer that traces bright points from the image behind the waveform.
  • Educator mode that shows the mapping math live and adds short facts about the object in the image.
  • MIDI export and stems from the synth path so creators can remix in a DAW.