What it does

Our project analyzes a dataset of over 5,000 exoplanets from NASA and uses it to train a model that predicts whether a planet is habitable. This lets astronauts and scientists narrow their search for life to fewer planets and focus their research on the most promising ones.

How we built it

We used the NASA Exoplanet Archive to compile and clean a database of 5,562 planets, each with its own set of measured values. We then combine these values to find which planets fall in the Goldilocks Habitability Zone, the region where a planet can be classified as habitable.

We then use Earth as a reference, defining the total stellar flux (the amount of radiation) it receives from the Sun. Because Earth is habitable and has life, its stellar flux is an important value to store and reuse when evaluating other planets. We then normalize the stellar flux so that the same measure can be applied to every other planet.
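One common way to express stellar flux relative to Earth is the inverse-square law: the flux in Earth units is the host star's luminosity in solar units divided by the square of the orbital distance in AU. The sketch below assumes that formula and a circular orbit. The function name and inputs are illustrative and are not necessarily how the project computes the value.

```python
def relative_stellar_flux(luminosity_solar: float, distance_au: float) -> float:
    """Flux received by a planet, in units of the flux Earth receives.

    Assumes the inverse-square law and a circular orbit:
        S / S_earth = (L / L_sun) / (a / 1 AU)^2
    """
    if luminosity_solar < 0:
        raise ValueError("luminosity_solar must not be negative")
    if distance_au <= 0:
        raise ValueError("distance_au must be positive")
    return luminosity_solar / distance_au ** 2


# Earth around the Sun is 1.0 by definition of the units.
earth_flux = relative_stellar_flux(1.0, 1.0)

# A planet at roughly Mars's distance (1.524 AU) from a Sun-like star.
mars_like_flux = relative_stellar_flux(1.0, 1.524)
```

With this convention Earth sits at exactly 1.0, so any other planet's value reads directly as "how many times Earth's flux".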

Finally, we define what being “habitable” means using all of the gathered information. We set an acceptable range for the planet's size and an acceptable range for its flux. If a planet falls within both ranges, it is defined as habitable and stored with a value of 1; otherwise, it is stored as 0.
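The labeling rule described above can be sketched as a small function. The numeric ranges here are hypothetical placeholders chosen only for illustration, since this write-up does not state the actual cutoffs, and the function name is likewise an assumption.

```python
# Hypothetical ranges for illustration only; the project's real cutoffs may differ.
RADIUS_RANGE = (0.5, 1.6)   # planet radius, in Earth radii
FLUX_RANGE = (0.25, 1.5)    # stellar flux, in units of Earth's flux


def label_habitable(radius_earth, flux_earth,
                    radius_range=RADIUS_RANGE, flux_range=FLUX_RANGE):
    """Return 1 if both values fall inside their ranges (inclusive), else 0."""
    radius_ok = radius_range[0] <= radius_earth <= radius_range[1]
    flux_ok = flux_range[0] <= flux_earth <= flux_range[1]
    return 1 if radius_ok and flux_ok else 0


# (radius in Earth radii, flux in Earth units) for three made-up planets.
examples = [(1.0, 1.0), (11.2, 0.04), (1.2, 30.0)]
labels = [label_habitable(radius, flux) for radius, flux in examples]
```

Only the first example passes both checks; the second is too large and too cold, and the third is the right size but receives far too much flux.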

We split our 5,000 chosen planets into 4,000 for training and 1,000 for testing the model. Both sets keep a proportion of habitable to non-habitable planets that is realistic for what astronauts may actually encounter.
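A split that keeps the habitable share similar in both sets is often called a stratified split. The sketch below is one way to do it with only the standard library; the helper name, the seed, and the 20% test fraction are assumptions for illustration, and the label counts are the ones reported in this write-up.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with similar class proportions."""
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# 5,000 labels: 117 habitable (1) and 4,883 not habitable (0).
labels = [1] * 117 + [0] * 4883
train_idx, test_idx = stratified_split(labels)

habitable_in_train = sum(labels[i] for i in train_idx)
habitable_in_test = sum(labels[i] for i in test_idx)
```

Because each class is split separately, a rare class cannot end up entirely in one set by chance.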

We train a Random Forest Classifier on the training data; it combines the predictions of many different decision trees. The testing data is labeled in the same way, but those labels serve only as the true values to compare against the AI’s predictions.
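As a rough illustration of this step, the sketch below fits a random forest on synthetic stand-in data. It assumes scikit-learn's `RandomForestClassifier` and NumPy are available; the feature columns, the labeling ranges, and the tree count are hypothetical and are not taken from the project's code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_planets = 2000

# Synthetic features, sampled log-uniformly (not real NASA data):
# column 0 is radius in Earth radii, column 1 is flux in Earth units.
radius = 10 ** rng.uniform(-0.5, 1.3, size=n_planets)
flux = 10 ** rng.uniform(-2.0, 3.0, size=n_planets)
X = np.column_stack([radius, flux])

# Hypothetical rule-based labels: 1 = habitable, 0 = not habitable.
y = ((radius >= 0.5) & (radius <= 1.6) & (flux >= 0.25) & (flux <= 1.5)).astype(int)

# Simple 80/20 split for the sketch (no stratification here).
X_train, X_test = X[:1600], X[1600:]
y_train, y_test = y[:1600], y[1600:]

# Each of the 100 trees votes; the forest returns the majority class.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
test_accuracy = float((predictions == y_test).mean())
```

The test labels are never passed to `fit`; they are used only afterwards to score the predictions.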

In the training data, 3,906 planets were uninhabitable and 94 were habitable; in the testing data, 977 were uninhabitable and 23 were habitable.

For the final part of the process, we had our newly trained model predict the classification of the 1,000 test planets based on the values that determine whether they fall within the Goldilocks Zone. Our Confusion Matrix shows that the model reached 100% accuracy: every planet was predicted correctly. Of course, 1,000 planets are not enough, and we estimate that a pool of about 100,000 would be needed to be confident that the model works every time. However, NASA has data for only about 5,900 planets, and some of these lack the data we need to analyze whether they may be habitable.
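For readers new to the term, a two-class confusion matrix simply counts how often each true label was paired with each predicted label. The sketch below computes one by hand using the test-set class counts reported above; the function names are illustrative, and the predictions are made up to show two extreme cases.

```python
def confusion_matrix_2x2(true_labels, predicted_labels):
    """Return [[TN, FP], [FN, TP]] for labels encoded as 0 and 1."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    matrix = [[0, 0], [0, 0]]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[truth][guess] += 1  # row = true label, column = prediction
    return matrix


def accuracy(matrix):
    """Fraction of predictions on the diagonal (correct ones)."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Test set from this write-up: 977 uninhabitable (0) and 23 habitable (1).
true_labels = [0] * 977 + [1] * 23

# Case 1: every prediction is correct.
perfect = confusion_matrix_2x2(true_labels, true_labels)

# Case 2: a trivial baseline that always predicts "not habitable".
always_zero = confusion_matrix_2x2(true_labels, [0] * 1000)
```

Here `perfect` is `[[977, 0], [0, 23]]` with accuracy 1.0, while `always_zero` is `[[977, 0], [23, 0]]` with accuracy 0.977. With so few habitable planets, the per-class counts in the matrix say more than the accuracy figure alone.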

Our Confusion Matrix is attached in the media and can be seen below.

Our graph compares the logarithm of each planet's stellar flux with its radius measured in Earth radii (how many times larger or smaller the planet’s radius is than the radius of the Earth).

The blue dots represent uninhabitable planets, while the red dots represent habitable planets. From these, we can see where most habitable planets fall: around the top left of the graph, with about the same radius as Earth and about the same stellar flux that Earth receives.

Our graph is attached in the media and can be seen below.

Challenges we ran into

We had difficulty creating a way to collect user input and decided to leave it for the future. We also learned that we needed to sort and classify our data more specifically, which would take much more time to implement than we had at this hackathon.

Accomplishments that we're proud of

We are proud of successfully compiling our data and sorting it into training and testing sets. We are also proud of our model's 100% accuracy on the 1,000 planets in the testing data. However, we believe this may decrease as more planets become available to test. As of now, NASA has data for a little under 6,000 exoplanets, some of which are missing values that we require. As time passes, we will have access to more data that we could use to improve this project and make it more realistic and usable.

What's next for APEC - AI-Powered Exoplanet Classifier

As we gain access to more data through NASA's database, we plan to run more tests on our model to push its accuracy to the limit. We also want to add support for user input, so your average Joe could enter data for any exoplanet and find out whether or not it is habitable.
