Inspiration

Finding exoplanets is like searching for needles in a cosmic haystack. The Kepler mission has collected light curve data from thousands of stars, but exoplanet-hosting stars represent less than 1% of all observations. Manual review of this data is time-consuming and prone to human error. Automated detection can significantly accelerate discoveries and reduce the workload for astronomers.

What it does

We've developed a machine learning pipeline that:

- Preprocesses raw light curve data through multiple stages:
  - Fourier transformation to analyze frequency-domain characteristics
  - Savitzky-Golay filtering to reduce noise while preserving signal features
  - Normalization and robust scaling to standardize inputs
  - SMOTE augmentation to address class imbalance

- Compares multiple model architectures:
  - Dense neural networks as a baseline
  - 1D convolutional neural networks designed for time-series data, which recognize a pattern regardless of where it occurs in the light curve

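- Identifies exoplanet candidates with high accuracy:
  - Over 99.5% validation accuracy (because fewer than 1% of stars host exoplanets, a model that always predicts "no planet" already scores above 99%, so precision and recall on the exoplanet class are the more informative measures)
  - High precision and recall across both classes
  - Robust performance on unseen test data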
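The preprocessing stages listed above can be sketched with NumPy and SciPy. This is a minimal sketch, not the project's code: the stage order, the Savitzky-Golay window of 21 points with polynomial order 4, and the helper names `preprocess` and `RobustScaler1D` are assumptions for illustration. The scaler follows the default behaviour of scikit-learn's `RobustScaler` (subtract the median, divide by the interquartile range) and is fitted on training stars only, so test stars do not leak into the statistics.

```python
import numpy as np
from scipy.signal import savgol_filter


def preprocess(flux: np.ndarray, window: int = 21, polyorder: int = 4) -> np.ndarray:
    """Per-star transforms for flux of shape (n_stars, n_points).

    Hypothetical helper; window and polyorder are assumed values.
    """
    # 1. Fourier transform: magnitude spectrum of each light curve. For real
    #    input the second half mirrors the first; it is kept here so the
    #    output length stays n_points.
    spectrum = np.abs(np.fft.fft(flux, axis=1))
    # 2. Savitzky-Golay filter: fit a local polynomial in a sliding window
    #    along each row, smoothing noise while keeping peak shapes.
    smoothed = savgol_filter(spectrum, window_length=window,
                             polyorder=polyorder, axis=1)
    # 3. Normalization: scale each star's row to unit L2 norm.
    norms = np.linalg.norm(smoothed, axis=1, keepdims=True)
    return smoothed / np.where(norms == 0, 1.0, norms)


class RobustScaler1D:
    """Hypothetical per-column (x - median) / IQR scaler, fitted on training rows."""

    def fit(self, X: np.ndarray) -> "RobustScaler1D":
        self.median_ = np.median(X, axis=0)
        q25, q75 = np.percentile(X, [25, 75], axis=0)
        iqr = q75 - q25
        self.scale_ = np.where(iqr == 0, 1.0, iqr)  # avoid division by zero
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        return (X - self.median_) / self.scale_


# Usage on random stand-in data (not Kepler flux).
rng = np.random.default_rng(0)
X_train = rng.normal(1.0, 0.01, size=(8, 3197))
X_test = rng.normal(1.0, 0.01, size=(2, 3197))
scaler = RobustScaler1D().fit(preprocess(X_train))  # statistics from training stars only
X_train_s = scaler.transform(preprocess(X_train))
X_test_s = scaler.transform(preprocess(X_test))
print(X_train_s.shape, X_test_s.shape)  # (8, 3197) (2, 3197)
```

Fitting the scaler per column (per time step or frequency bin) rather than per star is a design choice here; the median and IQR make it less sensitive to the occasional extreme flux outlier than mean and standard deviation would be.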

How we built it

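Our solution is implemented in Python using TensorFlow and Keras, which keeps it accessible and deployable for research institutions. The CNN architecture is designed to detect transit signatures in light curves regardless of their temporal position, addressing a key limitation of the dense baseline: each convolutional filter slides along the whole series with the same weights, so a transit dip produces the same response wherever it occurs, whereas a dense layer has separate weights for every time step.

Model Architecture Highlights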

- Input: 3197 flux measurements per star
- CNN with 1D convolutions (kernel, or filter, sizes of 5 and 3)
- MaxPooling layers to capture features at different scales
- Dense output layer with sigmoid activation for binary classification
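A Keras sketch of the architecture highlights above. The input length and the kernel sizes of 5 and 3 follow the list; the number of filters (16 and 32), the pool size of 2, the `GlobalMaxPooling1D` layer, the 64-unit hidden layer, and the dropout rate are assumptions, not the project's published configuration. Global max pooling is one common way to keep only the strongest filter response anywhere along the curve, which matches the position-independence goal; the original model may flatten the feature maps instead.

```python
import numpy as np
from tensorflow import keras

N_POINTS = 3197  # flux measurements per star


def build_cnn(n_points: int = N_POINTS) -> keras.Model:
    """Hypothetical 1D CNN; filter counts, pool sizes and dense width are assumed."""
    model = keras.Sequential([
        keras.Input(shape=(n_points, 1)),             # one channel per time step
        keras.layers.Conv1D(16, kernel_size=5, activation="relu"),
        keras.layers.MaxPooling1D(pool_size=2),
        keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
        keras.layers.MaxPooling1D(pool_size=2),
        keras.layers.GlobalMaxPooling1D(),            # strongest response at any position
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.3),                    # regularization against overfitting
        keras.layers.Dense(1, activation="sigmoid"),  # probability of hosting an exoplanet
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.Precision(), keras.metrics.Recall()])
    return model


model = build_cnn()
probs = model.predict(np.zeros((2, N_POINTS, 1), dtype="float32"), verbose=0)
print(probs.shape)  # (2, 1)
```

Tracking precision and recall at compile time, rather than accuracy alone, keeps the minority class visible during training.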

Challenges we ran into

- Extreme class imbalance: With exoplanet-hosting stars representing less than 1% of observations, our models initially struggled with bias toward the majority class.
- Signal noise: Raw light curve data contained significant noise that obscured the subtle transit signatures, requiring sophisticated filtering techniques.
- Feature extraction complexity: Identifying the most relevant features among 3197 flux measurements per star presented computational challenges.
- Temporal variability: Transit events occur at different times in different light curves, making pattern recognition difficult without specialized architectures.
- Overfitting risks: Given the limited number of positive samples, we had to carefully implement regularization techniques to ensure generalizability.
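Class imbalance also changes how results should be read. The pure-Python sketch below uses made-up counts (1,000 stars, 8 of them planet hosts, so 0.8%) to show that a classifier that never predicts a planet scores 99.2% accuracy with zero recall, while a classifier that finds 6 of the 8 hosts barely moves accuracy. The helper `report` is hypothetical; scikit-learn's `precision_score` and `recall_score` compute the same quantities.

```python
def report(y_true, y_pred):
    """Return (accuracy, precision, recall) for the positive class (1 = planet host)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no positive predictions -> 0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up labels: 8 planet hosts followed by 992 non-hosts.
y_true = [1] * 8 + [0] * 992

# Majority-class baseline: never predicts a planet.
all_negative = [0] * 1000
print(report(y_true, all_negative))  # (0.992, 0.0, 0.0)

# Hypothetical model: finds 6 of 8 hosts and raises 4 false alarms.
candidate = [1] * 6 + [0] * 2 + [1] * 4 + [0] * 988
print(report(y_true, candidate))  # (0.994, 0.6, 0.75)
```

Accuracy rises by only 0.2 percentage points between the two, while recall goes from 0 to 0.75, which is why minority-class precision and recall are the numbers to watch.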

Accomplishments that we're proud of

- Scientific Advancement: Accelerates the pace of exoplanet discovery by efficiently pre-screening thousands of stellar observations
- Resource Optimization: Allows astronomers to focus their attention on high-probability candidates rather than sifting through vast amounts of raw data
- Educational Value: Demonstrates the power of machine learning in solving complex astronomical problems
- Future Exploration: Creates a pipeline that can be applied to new data from ongoing and future space missions

What we learned

- Data preprocessing is crucial: The quality of input data dramatically impacts model performance. Our multi-stage preprocessing pipeline (Fourier transformation, Savitzky-Golay filtering, normalization) proved essential for success.
- CNN architectures excel at this task: 1D convolutional networks outperformed traditional dense networks by effectively capturing transit signatures regardless of their position in the time series.
- Class imbalance techniques: SMOTE augmentation significantly improved model training by generating synthetic minority-class examples, each interpolated between a real exoplanet-host light curve and one of its nearest neighbors in that class.
- Interdisciplinary collaboration: Combining astronomical domain knowledge with machine learning expertise was vital for designing effective features and validating results.
- Interpretability matters: Beyond raw accuracy, developing methods to visualize and interpret what the model identifies as transit signatures helps build trust with the astronomical community.
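The SMOTE step can be illustrated with a short NumPy sketch of its core interpolation: for each synthetic row, pick a random exoplanet-host light curve, pick one of its k nearest neighbors within that class, and place the new row at a random point on the line segment between them. The function name `smote`, the neighbor count, and the stand-in data are assumptions; in practice the `SMOTE` class from the imbalanced-learn package (via `fit_resample`) is the usual choice. Synthetic rows should be generated from the training split only, so that validation and test scores are measured on real stars.

```python
import numpy as np


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Hypothetical SMOTE sketch: n_new synthetic rows from minority rows X_min (needs >= 2 rows)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class
    # (O(n^2 * features) memory; fine for a few dozen positive stars).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                   # a row is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]      # indices of k nearest minority rows
    base = rng.integers(0, n, size=n_new)         # random starting row per synthetic sample
    pick = neighbors[base, rng.integers(0, k, size=n_new)]  # one of its neighbors
    lam = rng.random((n_new, 1))                  # interpolation weight in [0, 1)
    return X_min[base] + lam * (X_min[pick] - X_min[base])


# Usage on random stand-in data: 6 "planet host" rows of 3197 points each.
X_pos = np.random.default_rng(1).normal(size=(6, 3197))
synthetic = smote(X_pos, n_new=20, k=3)
print(synthetic.shape)  # (20, 3197)
```

Because every synthetic row is a convex combination of two real minority rows, it always lies inside their range; it adds variety without inventing flux values far from observed ones.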

What's next for Exoplanet Detection

This project could be extended to:

- Classify types of exoplanets based on their transit signatures
- Estimate planetary parameters like size and orbital period
- Integrate with other exoplanet detection methods (radial velocity, direct imaging)
- Implement as a live processing system for incoming telescope data
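For the size and period estimates, two textbook first-order relations are a natural starting point: the fractional transit depth is roughly the ratio of the planet's and star's disc areas, depth ≈ (Rp / R*)^2, and the orbital period is the spacing between consecutive transit mid-times. The sketch below ignores limb darkening and grazing transits; the function names, the use of the median gap, and the example transit times are hypothetical.

```python
import math
import statistics

R_SUN_KM = 695_700.0  # IAU nominal solar radius
R_EARTH_KM = 6_371.0  # mean Earth radius


def planet_radius_earths(depth: float, stellar_radius_suns: float = 1.0) -> float:
    """Planet radius in Earth radii from fractional transit depth, using depth ~ (Rp / R*)^2."""
    return math.sqrt(depth) * stellar_radius_suns * R_SUN_KM / R_EARTH_KM


def period_from_transits(mid_times: list[float]) -> float:
    """Orbital period as the median gap between consecutive transit mid-times (same units)."""
    gaps = [b - a for a, b in zip(mid_times, mid_times[1:])]
    return statistics.median(gaps)


# An Earth-size planet crossing a Sun-like star dims it by about 84 ppm.
earth_depth = (R_EARTH_KM / R_SUN_KM) ** 2
print(round(earth_depth * 1e6))                       # 84
print(round(planet_radius_earths(earth_depth), 6))    # 1.0
print(period_from_transits([2.0, 12.5, 23.0, 33.5]))  # 10.5
```

The median gap tolerates a single mistimed transit better than the mean would, but a transit missing from the data doubles one gap, so a real pipeline would need to handle gaps in coverage.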

