Habi-scope: an interpretable, uncertainty-aware habitability engine
Ranks exoplanets by true surface habitability: transparent sub-scores + confidence, then Monte Carlo → probability. A telescope-ready shortlist, not just “inside the HZ”.
Why I built it (inspiration)
Zero Gravity made me drop rigid checklists. Classic Habitable Zone cuts use average flux, but oceans need more: seasons, a surface, and an atmosphere. I wanted a tool that tells the whole story and is honest about uncertainty, useful to scientists deciding where to point a telescope.
What I built (overview)
Habi-scope turns the NASA Exoplanet Archive CSV into an interpretable score (HABI) with probability and confidence.
- 7 sub-scores (0 to 1): Energy (flux and seasons), Surface (rockiness), Atmosphere (escape and retention), Stellar (host Teff window), Orbital (sanity and tides), System (multiplicity), Feasibility (can we observe it).
- HABI: weighted average of available factors, with hard vetoes for obvious no-gos and a confidence term that down-weights missing data.
- Probability: Monte Carlo (MC) propagates measurement errors and nuisance physics to estimate \( p(\text{viable surface conditions}) \). A calibrated fallback maps HABI to \( p \) when MC is impossible.
- Outputs: ranked tables, stacked contributions, radar for top candidates, ablation (what matters most), and a small model card.
How I built it (methods)
Data cleaning and dictionary
Parsed the raw CSV (including # COLUMN lines) into a clean table and an auto data dictionary.
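A minimal sketch of this parsing step, assuming the archive export marks its column descriptions with lines of the form `# COLUMN name: description`. The function name `parse_archive_csv` and the two sample planets are hypothetical, chosen only for illustration.

```python
import csv
import io

def parse_archive_csv(text):
    """Split an archive export into (rows, data_dictionary)."""
    dictionary = {}
    data_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            # Header lines are assumed to look like "# COLUMN pl_name:   Planet Name".
            if body.startswith("COLUMN "):
                name, _, description = body[len("COLUMN "):].partition(":")
                dictionary[name.strip()] = description.strip()
        elif line.strip():
            data_lines.append(line)
    # The first non-comment line is the CSV header row.
    rows = list(csv.DictReader(io.StringIO("\n".join(data_lines))))
    return rows, dictionary

# Made-up sample, not real archive data.
sample = """# This file was produced by the NASA Exoplanet Archive
# COLUMN pl_name:        Planet Name
# COLUMN pl_rade:        Planet Radius [Earth Radius]
#
pl_name,pl_rade
Demo b,1.3
Demo c,
"""
rows, data_dict = parse_archive_csv(sample)
```

Empty strings (such as the missing radius of the second sample planet) are left as-is here, so the later scoring steps can decide how to treat missing values.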
Derived physics
- Stellar luminosity (solar units): \( L_\star \propto R_\star^{2}\,(T_\star/T_\odot)^{4} \)
- Insolation (Earth = 1), with \( L_\star \) in \( L_\odot \) and \( a \) in AU: \( S = \dfrac{L_\star}{a^{2}} \)
- Seasonal extremes from eccentricity:
\( S_{\min}=\dfrac{L_\star}{(a(1+e))^2} \), \( S_{\max}=\dfrac{L_\star}{(a(1-e))^2} \)
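The three relations above can be sketched in a few lines. This is an illustration, not the notebook's code: the function names are hypothetical, and the solar reference temperature of 5772 K (the IAU nominal value) is an assumption, since the formula only names \( T_\odot \).

```python
T_SUN_K = 5772.0  # assumed solar reference temperature (IAU nominal value)

def luminosity_solar(r_star_rsun, teff_k, t_sun_k=T_SUN_K):
    """Stellar luminosity in L_sun from radius (R_sun) and effective temperature (K)."""
    return r_star_rsun ** 2 * (teff_k / t_sun_k) ** 4

def insolation(l_star, a_au, e=0.0):
    """Return (S, S_min, S_max) in Earth units.

    S uses the semi-major axis, S_min the apoastron distance a(1+e),
    and S_max the periastron distance a(1-e).
    """
    if not 0.0 <= e < 1.0:
        raise ValueError("eccentricity must satisfy 0 <= e < 1")
    s = l_star / a_au ** 2
    s_min = l_star / (a_au * (1.0 + e)) ** 2
    s_max = l_star / (a_au * (1.0 - e)) ** 2
    return s, s_min, s_max

# Sun-like star, 1 AU orbit, moderately eccentric (e = 0.1).
lum = luminosity_solar(1.0, 5772.0)
s, s_min, s_max = insolation(lum, 1.0, e=0.1)
```

Even at e = 0.1 the flux swings by roughly 20% either side of the mean, which is why the seasonal extremes can move a borderline planet in or out of the window.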
Teff-dependent HZ edges (Kopparapu)
\( S_{\rm eff}(T_\star)=S_\odot + aT' + bT'^2 + cT'^3 + dT'^4 \), with \( T'=\dfrac{T_\star-5780}{1000} \)
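A hedged sketch of how this polynomial can be evaluated and turned into an HZ membership test. The coefficient values below are placeholders, not the published Kopparapu coefficients; the real ones must be taken from the paper and rescaled to match the \( T' \) convention above. The function names are hypothetical.

```python
def s_eff(teff_k, s_sun, coeffs, t_ref_k=5780.0, scale_k=1000.0):
    """Evaluate s_sun + a*T' + b*T'^2 + c*T'^3 + d*T'^4 with Horner's rule.

    coeffs is the tuple (a, b, c, d); T' = (teff_k - t_ref_k) / scale_k.
    """
    t = (teff_k - t_ref_k) / scale_k
    acc = 0.0
    for c in reversed(coeffs):  # process d, c, b, a
        acc = (acc + c) * t
    return s_sun + acc

def in_hz(s, teff_k, inner, outer):
    """True if flux s lies between the outer and inner edge fluxes.

    inner and outer are (s_sun, coeffs) pairs for the two HZ limits.
    """
    s_inner = s_eff(teff_k, *inner)
    s_outer = s_eff(teff_k, *outer)
    return s_outer <= s <= s_inner

# PLACEHOLDER coefficients for illustration only (not the published values).
INNER = (1.10, (0.10, 0.01, 0.0, 0.0))
OUTER = (0.35, (0.05, 0.00, 0.0, 0.0))

earth_like = in_hz(1.0, 5780.0, INNER, OUTER)
```

Passing the edge definitions in as data keeps the evaluator independent of which limit (for example a conservative or optimistic edge) is being used.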
Atmosphere retention proxy
- Equilibrium temperature: \( T_{\rm eq}\approx 278\,\text{K}\,S^{1/4}\!\left(\dfrac{1-A}{\varepsilon}\right)^{1/4} \), where the 278 K prefactor already includes the factor of 4 from averaging over the sphere.
- Jeans-like escape check via \( v_{\rm esc}(M_p,R_p) \) vs. \( T_{\rm eq} \) to score whether air is retainable.
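One way to sketch the Jeans-like check is to compare the escape velocity with the thermal speed of a gas species through the Jeans parameter \( \lambda = (v_{\rm esc}/v_{\rm th})^2 \). Everything below is an assumption for illustration: the function names, the use of \( T_{\rm eq} \) as a stand-in for the exobase temperature (real exobases are usually hotter), and the ramp thresholds in `retention_score`.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
K_B = 1.380649e-23   # Boltzmann constant, J/K
AMU = 1.66054e-27    # atomic mass unit, kg
M_EARTH = 5.972e24   # kg
R_EARTH = 6.371e6    # m

def escape_velocity(m_earth, r_earth):
    """Surface escape velocity in m/s for mass and radius in Earth units."""
    return math.sqrt(2.0 * G * m_earth * M_EARTH / (r_earth * R_EARTH))

def jeans_parameter(m_earth, r_earth, t_k, mu_amu):
    """lambda = (v_esc / v_th)^2 with v_th = sqrt(2 k T / m)."""
    v_esc = escape_velocity(m_earth, r_earth)
    v_th = math.sqrt(2.0 * K_B * t_k / (mu_amu * AMU))
    return (v_esc / v_th) ** 2

def retention_score(lam, lam_lo=10.0, lam_hi=30.0):
    """Hypothetical linear ramp from 0 (lam <= lam_lo) to 1 (lam >= lam_hi)."""
    return min(1.0, max(0.0, (lam - lam_lo) / (lam_hi - lam_lo)))

# Earth at roughly 255 K: molecular nitrogen (28 amu) vs. atomic hydrogen (1 amu).
lam_n2 = jeans_parameter(1.0, 1.0, 255.0, 28.0)
lam_h = jeans_parameter(1.0, 1.0, 255.0, 1.0)
score_n2 = retention_score(lam_n2)
```

A larger \( \lambda \) means the gas is more tightly bound, so heavy species on a cool, massive planet score high while light species on a hot, small planet score low.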
Scoring and probability
- HABI is a weighted mean of the seven sub-scores that exist for a planet. Confidence is the fraction of total weight supported by data. Vetoes guard against non-physical regimes.
- MC perturbs \( T_\star, R_\star, a, e, R_p \) with priors for albedo and greenhouse to estimate \( p \). If essential values (usually \( R_p \) for RV planets) are missing, switch to a conservative, calibrated fallback.
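```python
# confidence-penalized HABI (illustrative snippet)
import pandas as pd

# Sub-score weights (sum to 1). `df` is the cleaned planet table with sub_* columns.
W = dict(energy=.25, surface=.20, atmosphere=.20, stellar=.15,
         orbital=.10, system=.05, feasibility=.05)

def HABI(row):
    """Weighted mean over the sub-scores that exist for this planet."""
    num = den = 0.0
    for k, w in W.items():
        v = row.get(f"sub_{k}")
        if v is not None and not pd.isna(v):
            num += w * v
            den += w
    return (num / den) if den > 0 else float("nan")

H = df.apply(HABI, axis=1)

# Confidence = fraction of total weight backed by data.
# Weights are indexed by column name so .dot() aligns with the sub_* columns.
w_cols = pd.Series({f"sub_{k}": w for k, w in W.items()})
conf = df[w_cols.index].notna().astype(float).dot(w_cols) / w_cols.sum()

# Soft penalty: complete data keeps H unchanged, no data halves it.
df["HABI_penalized_soft"] = H * (0.5 + 0.5 * conf)
```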
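The Monte Carlo step described above can be sketched with the standard library alone. This is a simplified stand-in, not the notebook's implementation: the viability test (surface temperature between 273 K and 373 K at both orbital extremes, radius at most 1.6 Earth radii), the uniform albedo and greenhouse priors, the emissivity of 1, and the name `mc_probability` are all assumptions made for illustration.

```python
import random

def mc_probability(obs, n=5000, seed=0):
    """Estimate p(viable) from obs: name -> (value, 1-sigma).

    Expected keys: teff_k, r_star (R_sun), a_au, e, r_p (Earth radii).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        d = {k: rng.gauss(mu, sd) for k, (mu, sd) in obs.items()}
        if d["a_au"] <= 0 or d["r_star"] <= 0 or d["teff_k"] <= 0:
            continue  # unphysical draw counts as a miss
        e = min(max(d["e"], 0.0), 0.95)  # clip eccentricity to a bound orbit
        lum = d["r_star"] ** 2 * (d["teff_k"] / 5772.0) ** 4
        albedo = rng.uniform(0.2, 0.4)          # assumed prior
        greenhouse_k = rng.uniform(20.0, 40.0)  # assumed warming prior, in K
        ok = 0.0 < d["r_p"] <= 1.6              # assumed "likely rocky" cut
        for dist in (d["a_au"] * (1.0 - e), d["a_au"] * (1.0 + e)):
            s = lum / dist ** 2
            t_surf = 278.0 * (s * (1.0 - albedo)) ** 0.25 + greenhouse_k
            ok = ok and 273.0 <= t_surf <= 373.0
        if ok:
            hits += 1
    return hits / n

# Made-up measurements for an Earth-like case and a close-in hot case.
earth_like = {"teff_k": (5772.0, 50.0), "r_star": (1.0, 0.02),
              "a_au": (1.0, 0.01), "e": (0.017, 0.005), "r_p": (1.0, 0.05)}
hot = dict(earth_like, a_au=(0.1, 0.001))

p_earth = mc_probability(earth_like)
p_hot = mc_probability(hot)
```

Fixing the seed keeps the estimate reproducible, and capping `n` is the knob that trades precision for memory and runtime.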
What I learned (insights)
- HZ is not the same as habitable. Accounting for seasonal extremes \( (S_{\min}, S_{\max}) \) changes borderline rankings.
- Atmosphere and energy dominate. Ablation shows these factors drive most score movement. Seasonal stability often flips near-threshold cases.
- Interpretability helps. Sub-scores make it easy to justify or challenge a planet’s rank.
- Uncertainty matters. MC vs. fallback and confidence penalties prevent over-claiming when key physics are missing.
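The ablation idea can be illustrated with a leave-one-factor-out loop: drop one sub-score, recompute the weighted mean, and record how far scores move on average. Whether the notebook uses exactly this scheme is an assumption; the helper names and the single sample planet are hypothetical.

```python
import math

W = dict(energy=.25, surface=.20, atmosphere=.20, stellar=.15,
         orbital=.10, system=.05, feasibility=.05)

def habi(sub, weights):
    """Weighted mean over the sub-scores present in `sub` (missing = None)."""
    num = den = 0.0
    for k, w in weights.items():
        v = sub.get(k)
        if v is not None:
            num += w * v
            den += w
    return num / den if den > 0 else float("nan")

def ablation(planets, weights):
    """Mean absolute HABI shift when each factor is removed in turn."""
    base = [habi(p, weights) for p in planets]
    impact = {}
    for k in weights:
        reduced = {j: w for j, w in weights.items() if j != k}
        shifted = [habi(p, reduced) for p in planets]
        diffs = [abs(b - s) for b, s in zip(base, shifted)
                 if not (math.isnan(b) or math.isnan(s))]
        impact[k] = sum(diffs) / len(diffs) if diffs else float("nan")
    return impact

# Made-up planet: strong Energy score, everything else neutral.
planets = [dict(energy=0.9, surface=0.5, atmosphere=0.5, stellar=0.5,
                orbital=0.5, system=0.5, feasibility=0.5)]
impact = ablation(planets, W)
top_factor = max(impact, key=impact.get)
```

Sorting `impact` from largest to smallest gives a quick ranking of which factors move the scores most for a given catalog.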
Results (what the model surfaces)
- A shortlist of likely-rocky, MC-backed candidates with high HABI and high \( p \).
- A shortlist of HZ systems needing radius. These have strong energy and stellar context but missing \( R_p \). They are prime for follow-up or habitable moon searches.
- Clear reasons for each planet via stacked contributions. For example: wins on Energy and Atmosphere, tradeoff on Orbital.
Challenges and how I solved them
- Sparse RV planets with no radius. MC cannot run, so I mark insufficient_data, reduce rank via confidence, and use the conservative fallback \( p \).
- Memory limits. Replaced merge-heavy steps with idempotent in-place arrays, down-cast types, and capped MC samples. The notebook stays fast on low-RAM hardware.
- Scope control. Dropped the optional stellar activity penalties to keep the results reproducible within the time limit. Noted as future work.
Why this is valuable (judge lens)
- Impact: A rigorous, uncertainty-aware telescope triage. Spend precious photons on the right worlds.
- Innovation: Adds seasons and atmosphere retention to HZ logic, plus an interpretable score and a probability.
- Completeness: Guardrails, confidence penalties, MC uncertainty, clean exports and visuals.
- Communication: Every ranking is explainable. Two lists guide immediate follow-up vs. data-collection needs.
Limitations and next steps
- No stellar activity penalties in this edition. Add GALEX UV and TESS flares to refine stellar environment.
- Incorporate new masses and radii as they publish to strengthen Atmosphere and Surface scoring.
- Explore moon habitability proxies around HZ giants.
TL;DR: Habi-scope goes beyond “in the HZ”. It combines physics, interpretability, and uncertainty to produce a ranked, defensible, probability-aware shortlist: a practical tool for choosing the next worlds to study.