Inspiration

Inspired by the complexity of exoplanet datasets, we came up with Cosmic DJ to explain the concepts behind them to elementary-aged kids through music. Each planet has a song, and each song tells you something about the planet you're exploring. The goal is to turn space literacy into something playful and intuitive, making astrophysics feel like a musical learning journey.

What it does

Cosmic DJ generates a song per planet using GPT-4o. The generated music captures:

  • Planetary traits (like orbit, mass, and brightness).
  • Artistic personas (based on star temperature, constellation, and system type).
  • Melodic patterns (reflecting orbital periods, eccentricities, and multiplicity).

Kids ultimately get a musical biography of each planet.

How we built it

  1. Data Prep:
     • Load & filter: The runner reads SpaceData.csv and keeps only rows with values in every required field.
     • Required fields: It enforces non-empty values for pl_name, hostname, disc_year, st_teff, pl_orbper, ra, dec, sy_gaiamag, and also pl_bmasse when present.
     • Identity/system: pl_name, hostname.
     • Multiplicity: sy_snum, sy_pnum.
     • Discovery/meta: disc_year.
     • Dynamics/physics: pl_orbper, st_teff, pl_orbeccen, ra, dec, sy_gaiamag.
     • Diversity sampling: Optional, enabled via --max-rows, to avoid token exhaustion.
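The load-and-filter step could look roughly like the sketch below. It uses only Python's standard csv module. The names filter_rows and load_planets, the seed parameter, and the plain random sample behind the --max-rows cap are assumptions made for illustration; the actual runner and its sampling strategy may differ.

```python
import csv
import io
import random

# Columns that must be non-empty, as listed in the Data Prep step.
REQUIRED = ["pl_name", "hostname", "disc_year", "st_teff",
            "pl_orbper", "ra", "dec", "sy_gaiamag"]
# Checked only when the CSV actually contains the column.
REQUIRED_IF_PRESENT = ["pl_bmasse"]


def filter_rows(rows, fieldnames):
    """Keep rows whose required columns all hold non-blank values."""
    required = REQUIRED + [c for c in REQUIRED_IF_PRESENT if c in fieldnames]
    return [row for row in rows
            if all((row.get(col) or "").strip() for col in required)]


def load_planets(csv_file, max_rows=None, seed=0):
    """Read an open CSV file and return the filtered rows as dicts.

    max_rows mirrors the --max-rows option. The seeded random sample
    is an assumption, not the project's documented behavior.
    """
    reader = csv.DictReader(csv_file)
    kept = filter_rows(reader, reader.fieldnames or [])
    if max_rows is not None and len(kept) > max_rows:
        kept = random.Random(seed).sample(kept, max_rows)
    return kept


# Tiny in-memory example: the second row lacks st_teff and is dropped.
demo = io.StringIO(
    "pl_name,hostname,disc_year,st_teff,pl_orbper,ra,dec,sy_gaiamag\n"
    "A b,A,2010,5500,10.5,1.0,2.0,11.2\n"
    "B b,B,2012,,4.2,3.0,4.0,12.0\n"
)
print([row["pl_name"] for row in load_planets(demo)])
```

For the real file, the same function would be called on `open("SpaceData.csv", newline="")`.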

  2. Core Mapping: Mappings are expressed through instructions inside the system prompt (see next section). The runner itself preserves and forwards the raw values needed by those rules.
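Since the mapping rules live in the system prompt, the runner's per-planet job is mostly to pass raw values through untouched. A minimal sketch of that step, assuming each row is a dict of CSV strings and that the values are sent to the model as a JSON object (build_user_message is a hypothetical helper name, and the real message format is not specified here):

```python
import json

# Fields named in the Data Prep step, grouped as in the write-up.
FORWARDED_FIELDS = [
    "pl_name", "hostname",                  # identity/system
    "sy_snum", "sy_pnum",                   # multiplicity
    "disc_year",                            # discovery/meta
    "pl_orbper", "st_teff", "pl_orbeccen",  # dynamics/physics
    "ra", "dec", "sy_gaiamag",
]


def build_user_message(row):
    """Serialize the raw values for one planet without transforming them."""
    payload = {}
    for field in FORWARDED_FIELDS:
        value = (row.get(field) or "").strip()
        if value:
            payload[field] = value  # kept as the raw CSV string
    return json.dumps(payload, indent=2)
```

Keeping the values as raw strings leaves all binning and interpretation to the prompt, which matches the division of labor described above.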

  3. System Prompt:
     • Role: The system prompt defines “Cosmic DJ” as a creative engine that blends astrophysics literacy, data-driven mapping, and musicology.
     • Input guarantees: It declares the filters and fields the model will receive, aligning with how the runner prepared each row.
     • Mapping rules: The prompt enumerates the bins and interpretations for temperature, orbital period, multiplicity, discovery year, eccentricity, the MCI formula, Gaia magnitude tiers, and constellation/hemisphere style cues.
     • Output contract (strict JSON): The model must return exactly:
       { "Trait Snapshot": ["..."], "Artist Name": "Exact Artist Name", "Song Blueprint": ["Song Title 1"], "Data Confidence": ["..."], "Kid Summary": "Single paragraph summary" }
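The output contract can also be checked in code after each model reply. The sketch below is one way to do that with the standard json module; check_contract is a hypothetical name, and treating the three array fields as lists of strings is an assumption read off the example object.

```python
import json

# Expected Python type for each key in the output contract.
CONTRACT = {
    "Trait Snapshot": list,
    "Artist Name": str,
    "Song Blueprint": list,
    "Data Confidence": list,
    "Kid Summary": str,
}


def check_contract(raw_text):
    """Return (parsed, errors); parsed is None when the reply is unusable."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value is not a JSON object"]

    errors = []
    for key in sorted(CONTRACT.keys() - data.keys()):
        errors.append(f"missing key: {key}")
    for key in sorted(data.keys() - CONTRACT.keys()):
        errors.append(f"unexpected key: {key}")
    for key, expected in CONTRACT.items():
        if key not in data:
            continue
        value = data[key]
        if not isinstance(value, expected):
            errors.append(f"{key} should be a {expected.__name__}")
        elif expected is list and not all(isinstance(v, str) for v in value):
            errors.append(f"{key} should contain only strings")
    return (data if not errors else None), errors
```

A reply that parses but fails the check can then be counted as a failure instead of reaching the UI.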

  4. Build UI: We created a Streamlit app.
     • Data loading: Reads the JSONL results and constructs a summary DataFrame for quick counts and success rate.
     • Layout: Wide layout with a sidebar selector (choose a system or view all) and a shuffled planet list to keep exploration playful.
     • Display blocks: Artist (name, optional Spotify artist image), Song Blueprint list (optional album art for the first track, if resolvable), Scientific Justification (bulleted “Trait Snapshot” from the model), and Kid Summary (friendly explanation). Imagery comes from Spotify's API.
     • Spotify integration: Uses client-credentials to fetch artist/track imagery when environment variables are present.
     • Sidebar metrics: Total planets processed, JSON success rate, and unique artist count.
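The three sidebar metrics can be derived from the JSONL results with a small helper. This is a sketch under stated assumptions: the record layout (one JSON object per line with a "response" key holding the raw model text) and the name summarize are hypothetical, since the write-up does not describe the file's schema.

```python
import json


def summarize(jsonl_lines):
    """Compute total planets, JSON success rate, and unique artist count."""
    total = 0
    successes = 0
    artists = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        total += 1
        try:
            record = json.loads(line)
            reply = json.loads(record["response"])  # assumed key name
            artists.add(reply["Artist Name"])
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # counted in the total, but not as a success
        else:
            successes += 1
    rate = successes / total if total else 0.0
    return {
        "total_planets": total,
        "json_success_rate": rate,
        "unique_artists": len(artists),
    }
```

In a Streamlit app, each value could then be passed to `st.sidebar.metric(label, value)`.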

Challenges we ran into

  • Scaling data: Many of these columns are hard to grasp conceptually, so figuring out how to turn orbit and mass into musical trends was difficult.
  • Consistent formatting: Ensuring GPT outputs always respected the JSON schema.
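One common guardrail for the formatting problem, though not necessarily the one used here, is to retry a request when the reply does not parse as a JSON object. In the sketch below, generate stands in for whatever function calls the model and returns its raw text; that callable, the helper name generate_with_retries, and the default of three attempts are all assumptions.

```python
import json


def generate_with_retries(generate, prompt, max_attempts=3):
    """Call generate(prompt) until it returns a JSON object.

    Returns (parsed_object, attempts_used) or raises RuntimeError
    once max_attempts replies have failed to parse.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = generate(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if isinstance(data, dict):
            return data, attempt
        last_error = ValueError("reply was valid JSON but not an object")
    raise RuntimeError(
        f"no valid JSON object after {max_attempts} attempts"
    ) from last_error
```

Passing the model call in as a parameter keeps the retry logic testable without network access.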

Accomplishments that we're proud of

  • Created a data-to-music pipeline that consistently works across hundreds of exoplanets.
  • Designed mappings that are both scientifically meaningful and musically fun.
  • Built a system that makes astrophysics accessible to elementary students without oversimplifying.

What we learned

  • Importance of strict system prompts and guardrails when using LLMs.
  • How to convert raw astrophysical features into engaging narrative formats.

What's next for Cosmic DJ

We want to expand beyond astronomy and bring other complex subjects into our data-to-music pipeline. Our vision is to create a playful space where kids can explore challenging datasets, making learning easy, fun, and unforgettable.
