Habi-scope: an interpretable, uncertainty-aware habitability engine

Ranks exoplanets by estimated surface habitability: transparent sub-scores plus a confidence term, then Monte Carlo sampling to turn the score into a probability. The result is a telescope-ready shortlist, not just “inside the habitable zone (HZ)”.


Why I built it (inspiration)

Zero Gravity made me drop rigid checklists. Classic Habitable Zone cuts use average flux, but oceans need more: seasons, a surface, and an atmosphere. I wanted a tool that tells the whole story and is honest about uncertainty, useful to scientists deciding where to point a telescope.


What I built (overview)

Habi-scope turns the NASA Exoplanet Archive CSV into an interpretable score (HABI) with probability and confidence.

  • 7 sub-scores (0 to 1): Energy (flux and seasons), Surface (rockiness), Atmosphere (escape and retention), Stellar (host Teff window), Orbital (sanity and tides), System (multiplicity), Feasibility (can we observe it).
  • HABI: weighted average of available factors, with hard vetoes for obvious no-gos and a confidence term that down-weights missing data.
  • Probability: Monte Carlo (MC) propagates measurement errors and nuisance physics to estimate \( p(\text{viable surface conditions}) \). A calibrated fallback maps HABI to \( p \) when MC is impossible.
  • Outputs: ranked tables, stacked contributions, radar for top candidates, ablation (what matters most), and a small model card.

How I built it (methods)

Data cleaning and dictionary
Parsed the raw CSV (including # COLUMN lines) into a clean table and an auto data dictionary.
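
The parsing step above can be sketched as follows. This is a minimal illustration, not the project's actual loader: it assumes header lines of the form `# COLUMN name: description`, and the function name `parse_archive_csv` and the sample rows are hypothetical.

```python
import csv
import io

def parse_archive_csv(text):
    """Split an archive-style CSV into (data dictionary, list of row dicts)."""
    dictionary = {}
    data_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if body.startswith("COLUMN "):
                # "pl_name:   Planet Name" -> key "pl_name", value "Planet Name"
                name, _, desc = body[len("COLUMN "):].partition(":")
                dictionary[name.strip()] = desc.strip()
            continue  # all other comment lines are skipped
        if line.strip():
            data_lines.append(line)
    rows = list(csv.DictReader(io.StringIO("\n".join(data_lines))))
    return dictionary, rows

# Made-up sample in the assumed layout (not real archive data).
sample = """# This file was produced by an archive export
# COLUMN pl_name:        Planet Name
# COLUMN pl_rade:        Planet Radius [Earth Radius]
pl_name,pl_rade
Planet A,1.1
Planet B,
"""
dictionary, rows = parse_archive_csv(sample)
```

Missing values come back as empty strings here, so a later step would still need to convert them to NaN before scoring. Splitting on lines also assumes no quoted field contains a newline.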

Derived physics

  • Stellar luminosity (solar units, \( R_\star \) in solar radii): \( L_\star = R_\star^{2}\,(T_\star/T_\odot)^{4} \)
  • Insolation (Earth = 1, \( a \) in au): \( S = \dfrac{L_\star}{a^{2}} \)
  • Seasonal extremes from eccentricity:
    \( S_{\min}=\dfrac{L_\star}{(a(1+e))^2} \), \( S_{\max}=\dfrac{L_\star}{(a(1-e))^2} \)
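
The derived quantities above translate directly into a few lines of Python. This is a sketch under stated assumptions: stellar radius in solar radii, semi-major axis in au, and a solar reference temperature of 5772 K (my choice; the write-up does not state which value it uses). The function names are illustrative.

```python
T_SUN_K = 5772.0  # assumed solar effective temperature, kelvin

def luminosity_lsun(r_star_rsun, teff_k):
    """Stellar luminosity in solar units from radius (solar radii) and Teff (K)."""
    return r_star_rsun ** 2 * (teff_k / T_SUN_K) ** 4

def insolation(l_star_lsun, a_au, e=0.0):
    """Return (S at distance a, S_min at apoapsis, S_max at periapsis), Earth = 1."""
    if not 0.0 <= e < 1.0:
        raise ValueError("eccentricity must satisfy 0 <= e < 1")
    s = l_star_lsun / a_au ** 2
    s_min = l_star_lsun / (a_au * (1.0 + e)) ** 2
    s_max = l_star_lsun / (a_au * (1.0 - e)) ** 2
    return s, s_min, s_max

# A Sun-like star with a planet at 1 au and e = 0.2.
s, s_min, s_max = insolation(luminosity_lsun(1.0, T_SUN_K), 1.0, e=0.2)
```

Even a moderate eccentricity of 0.2 spreads the flux from roughly 0.69 to 1.56 times Earth's, which is why a planet that looks comfortable at its mean flux can cross an HZ edge during part of its orbit.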

Teff-dependent HZ edges (Kopparapu)
\( S_{\rm eff}(T_\star)=S_\odot + aT' + bT'^2 + cT'^3 + dT'^4 \), with \( T'=\dfrac{T_\star-5780}{1000} \)
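
A hedged sketch of how the polynomial could be evaluated and used as an HZ membership check. Here \( a, b, c, d \) are fit coefficients, not the semi-major axis. The coefficient tuples below are placeholders chosen only to make the example run; they are not Kopparapu's published values, and real coefficients must be expressed for the same \( T' \) scaling used here.

```python
def seff_edge(teff_k, s_sun, coeffs):
    """Effective flux at an HZ edge for a star of temperature teff_k (K)."""
    t = (teff_k - 5780.0) / 1000.0
    a, b, c, d = coeffs  # fit coefficients, unrelated to the orbital semi-major axis
    return s_sun + a * t + b * t ** 2 + c * t ** 3 + d * t ** 4

def in_hz(s, teff_k, inner, outer):
    """True if insolation s lies between the outer and inner edge fluxes."""
    return seff_edge(teff_k, *outer) <= s <= seff_edge(teff_k, *inner)

# PLACEHOLDER edges: (S_sun, (a, b, c, d)). Substitute published coefficients.
PLACEHOLDER_INNER = (1.10, (0.10, 0.0, 0.0, 0.0))
PLACEHOLDER_OUTER = (0.35, (0.05, 0.0, 0.0, 0.0))

earth_like_inside = in_hz(1.0, 5780.0, PLACEHOLDER_INNER, PLACEHOLDER_OUTER)
```

Passing the edges in as data keeps the function independent of any one coefficient table, so conservative and optimistic limits can be swapped without touching the code.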

Atmosphere retention proxy

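  • Equilibrium temperature: \( T_{\rm eq}\approx 278\,\text{K}\,S^{1/4}\left(\dfrac{1-A}{\varepsilon}\right)^{1/4} \), with Bond albedo \( A \) and emissivity \( \varepsilon \). The 278 K prefactor already contains the factor of 4 from averaging over the sphere.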
  • Jeans-like escape check via \( v_{\rm esc}(M_p,R_p) \) vs. \( T_{\rm eq} \) to score whether air is retainable.
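
One way to make the retention proxy concrete is the Jeans parameter \( \lambda = (v_{\rm esc}/v_{\rm th})^2 \), where \( v_{\rm th}=\sqrt{2kT/m} \) is the most probable thermal speed of a gas particle. The sketch below is an assumption about how such a check could look, not the project's scoring code. It uses \( T_{\rm eq} \) in place of the exobase temperature, which is a crude simplification, and it leaves the mapping from \( \lambda \) to a 0 to 1 sub-score unspecified.

```python
import math

SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
S0 = 1361.0              # solar constant at 1 au, W m^-2
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
K_B = 1.380649e-23       # Boltzmann constant, J K^-1
AMU = 1.66053907e-27     # atomic mass unit, kg
M_EARTH = 5.972e24       # kg
R_EARTH = 6.371e6        # m

def t_eq(s, albedo=0.3, emissivity=1.0):
    """Equilibrium temperature (K) for insolation s in Earth units."""
    return (S0 * s * (1.0 - albedo) / (4.0 * SIGMA * emissivity)) ** 0.25

def v_esc(m_earth, r_earth):
    """Escape velocity (m/s) from mass and radius in Earth units."""
    return math.sqrt(2.0 * G * m_earth * M_EARTH / (r_earth * R_EARTH))

def jeans_parameter(m_earth, r_earth, temp_k, mu_amu):
    """(v_esc / v_thermal)^2 for a gas of particle mass mu_amu at temp_k."""
    v_th = math.sqrt(2.0 * K_B * temp_k / (mu_amu * AMU))
    return (v_esc(m_earth, r_earth) / v_th) ** 2

temp = t_eq(1.0)                                    # Earth-like flux and albedo
lam_n2 = jeans_parameter(1.0, 1.0, temp, 28.0)      # molecular nitrogen
lam_h = jeans_parameter(1.0, 1.0, temp, 1.0)        # atomic hydrogen
```

Larger values mean the gas is more tightly bound. Because \( \lambda \) scales linearly with particle mass, nitrogen on an Earth-like planet is bound 28 times more strongly than atomic hydrogen at the same temperature.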

Scoring and probability

  • HABI is a weighted mean of the seven sub-scores that exist for a planet. Confidence is the fraction of total weight supported by data. Vetoes guard against non-physical regimes.
  • MC perturbs \( T_\star, R_\star, a, e, R_p \) with priors for albedo and greenhouse to estimate \( p \). If essential values (usually \( R_p \) for RV planets) are missing, switch to a conservative, calibrated fallback.
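
# Confidence-penalized HABI (illustrative snippet).
# Assumes `df` has one row per planet and sub_<factor> columns in [0, 1], NaN where missing.
import pandas as pd

W = dict(energy=.25, surface=.20, atmosphere=.20, stellar=.15, orbital=.10, system=.05, feasibility=.05)
SUB_COLS = [f"sub_{k}" for k in W]

def HABI(row):
    """Weighted mean over the sub-scores that are present for this planet."""
    num = den = 0.0
    for k, w in W.items():
        v = row.get(f"sub_{k}")
        if v is not None and not pd.isna(v):
            num += w * v
            den += w
    return (num / den) if den > 0 else float("nan")

H = df.apply(HABI, axis=1)
# Confidence = fraction of total weight backed by data. The weight index must
# match the sub_<factor> column names, otherwise DataFrame.dot raises an alignment error.
weights = pd.Series({f"sub_{k}": w for k, w in W.items()})
conf = df[SUB_COLS].notna().astype(float).dot(weights) / weights.sum()
# Soft penalty: scales the score between 0.5x (little data) and 1x (all factors present).
df["HABI_penalized_soft"] = H * (0.5 + 0.5 * conf)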
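
The Monte Carlo step described above can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the Gaussian error model, the uniform priors on albedo and greenhouse warming, the additive greenhouse offset, the 1.6 Earth-radius rocky cutoff, and the 273 to 373 K viability window. The dictionary keys (`teff`, `r_star`, `a`, `e`, `r_p`, each with an `_err` partner) are hypothetical names, not archive columns.

```python
import random

T_SUN_K = 5772.0                                   # assumed solar Teff, kelvin
T_REF_K = (1361.0 / (4 * 5.670374419e-8)) ** 0.25  # ~278 K at S = 1 with zero albedo

def draw(rng, planet, key, lo, hi):
    """Gaussian draw around planet[key] with sigma planet[key + '_err'], clamped."""
    value = rng.gauss(planet[key], planet[key + "_err"])
    return min(max(value, lo), hi)

def surface_temp(s, albedo, greenhouse_k):
    """Equilibrium temperature plus an additive greenhouse offset (toy model)."""
    return T_REF_K * (s * (1.0 - albedo)) ** 0.25 + greenhouse_k

def mc_probability(planet, n=2000, seed=0):
    """Fraction of draws that stay rocky and temperate over the whole orbit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        teff = draw(rng, planet, "teff", 2000.0, 10000.0)
        r_star = draw(rng, planet, "r_star", 0.05, 10.0)
        a = draw(rng, planet, "a", 1e-3, 100.0)
        e = draw(rng, planet, "e", 0.0, 0.9)
        r_p = draw(rng, planet, "r_p", 0.1, 30.0)
        albedo = rng.uniform(0.1, 0.5)        # hypothetical prior
        greenhouse = rng.uniform(0.0, 60.0)   # hypothetical prior, kelvin
        lum = r_star ** 2 * (teff / T_SUN_K) ** 4
        coldest = surface_temp(lum / (a * (1 + e)) ** 2, albedo, greenhouse)
        hottest = surface_temp(lum / (a * (1 - e)) ** 2, albedo, greenhouse)
        if r_p <= 1.6 and coldest >= 273.0 and hottest <= 373.0:
            hits += 1
    return hits / n

earth_like = dict(teff=5772.0, teff_err=50.0, r_star=1.0, r_star_err=0.02,
                  a=1.0, a_err=0.01, e=0.02, e_err=0.01, r_p=1.0, r_p_err=0.05)
p = mc_probability(earth_like)
```

A fixed seed keeps the estimate reproducible between runs, and the sample count `n` is the knob to cap when memory or time is tight. A planet with no radius cannot enter this loop at all, which is the case the calibrated fallback is meant to cover.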

What I learned (insights)

  • HZ is not the same as habitable. Accounting for seasonal extremes \( (S_{\min}, S_{\max}) \) changes borderline rankings.
  • Atmosphere and energy dominate. Ablation shows these factors drive most score movement. Seasonal stability often flips near-threshold cases.
  • Interpretability helps. Sub-scores make it easy to justify or challenge a planet’s rank.
  • Uncertainty matters. MC vs. fallback and confidence penalties prevent over-claiming when key physics are missing.
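
The ablation mentioned above could be run as a leave-one-factor-out loop: drop one factor, recompute the weighted mean, and record how far scores move on average. This is one plausible reading of "ablation", not necessarily the procedure used in the notebook, and the function names and the example planet are made up.

```python
W = dict(energy=.25, surface=.20, atmosphere=.20, stellar=.15,
         orbital=.10, system=.05, feasibility=.05)

def habi(subs, weights):
    """Weighted mean over the sub-scores present in `subs` (None means missing)."""
    num = den = 0.0
    for k, w in weights.items():
        v = subs.get(k)
        if v is not None:
            num += w * v
            den += w
    return num / den if den > 0 else float("nan")

def ablation(planets, weights):
    """Mean absolute score change when each factor is removed, largest first."""
    base = [habi(p, weights) for p in planets]
    impact = {}
    for k in weights:
        reduced = {j: w for j, w in weights.items() if j != k}
        scores = [habi(p, reduced) for p in planets]
        # x == x is False only for NaN, so undefined scores are skipped.
        diffs = [abs(s - b) for s, b in zip(scores, base) if s == s and b == b]
        impact[k] = sum(diffs) / len(diffs) if diffs else 0.0
    return dict(sorted(impact.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical planet: every factor at 0.5 except a perfect Energy score.
planet = {k: 0.5 for k in W}
planet["energy"] = 1.0
impact = ablation([planet], W)
```

For this single planet, removing Energy moves the score by 0.125, more than any other factor, because it carries both the largest weight and the only sub-score that differs from the rest.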

Results (what the model surfaces)

  • A shortlist of likely-rocky, MC-backed candidates with high HABI and high \( p \).
  • A shortlist of HZ systems needing radius. These have strong energy and stellar context but missing \( R_p \). They are prime for follow-up or habitable moon searches.
  • Clear reasons for each planet via stacked contributions. For example: wins on Energy and Atmosphere, tradeoff on Orbital.

Challenges and how I solved them

  • Sparse RV planets with no radius. MC cannot run, so I mark insufficient_data, reduce rank via confidence, and use the conservative fallback \( p \).
  • Memory limits. Replaced merge-heavy steps with idempotent in-place arrays, down-cast types, and capped MC samples. The notebook stays fast on low-RAM hardware.
  • Scope control. Dropped the optional stellar activity penalties to keep the pipeline reproducible under time constraints. Noted as future work.

Why this is valuable (judge lens)

  • Impact: A rigorous, uncertainty-aware telescope triage. Spend precious photons on the right worlds.
  • Innovation: Adds seasons and atmosphere retention to HZ logic, plus an interpretable score and a probability.
  • Completeness: Guardrails, confidence penalties, MC uncertainty, clean exports and visuals.
  • Communication: Every ranking is explainable. Two lists guide immediate follow-up vs. data-collection needs.

Limitations and next steps

  • No stellar activity penalties in this edition. Add GALEX UV and TESS flares to refine stellar environment.
  • Incorporate new masses and radii as they are published to strengthen Atmosphere and Surface scoring.
  • Explore moon habitability proxies around HZ giants.

TL;DR: Habi-scope goes beyond “in the HZ”. It combines physics, interpretability, and uncertainty to produce a ranked, defensible, probability-aware shortlist: a practical tool for choosing the next worlds to study.
