Problem Statement

Understanding the evolutionary characteristics that enabled dinosaurs to thrive across the Mesozoic Era is crucial for answering broader questions about life and adaptability. However, research into dinosaur traits often lacks a holistic approach because of fragmented studies, data limitations, methodological challenges, and technological barriers. Although papers such as Zano 2011 and Sookias 2012 have explored this subject, none have been able to examine many attributes at once and develop a reproducible strategy for handling a large amount of data.

Our project aims to identify which evolutionary characteristics best helped dinosaur species survive during the Mesozoic Era, specifically across three periods: Triassic, Jurassic, and Cretaceous. By looking at attributes such as diet, number of legs, and other features, this project brings Data Science to Paleontology and Evolutionary Biology, offering a more holistic view and building upon the sparse dinosaur datasets currently available. Additionally, by analyzing the past, we hope to provide some insight into the patterns and habits of newly discovered or already known dinosaurs.

Method

Data and Feature Analysis

Using data from Kaggle and the Natural History Museum, we analyzed dinosaurs across the three periods and explored how certain features of each dinosaur affected how long it existed. For this project, we used four attributes: diet (e.g. whether it ate plants or meat), the number of legs it walked on, dinosaur type (e.g. theropod), and length. Additionally, we analyzed how these attributes differed by period and location (continent). We then used linear regression with one-hot encoding to find which features had the most impact on the time span each dinosaur existed for. The results and code can be found here or below. After that, we looked at the overall dataset, found the top 3 "existence" periods, and identified the dinosaurs that fell under those categories. From there we pulled their attributes and built our own Dinosaur!
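To make the regression step concrete, here is a minimal sketch of one-hot encoding a categorical attribute and fitting an ordinary least-squares model to the time span. It is not our actual notebook code: the rows are invented toy values (not taken from the Kaggle or Natural History Museum data), the column names `diet`, `length_m`, and `span_myr` are hypothetical, and it uses only the Python standard library rather than Pandas so that it runs on its own. The toy target is constructed to be exactly linear so the fitted coefficients are easy to check.

```python
# Toy rows, invented for illustration only; span_myr was constructed as
# 2 + 4*(herbivore) + 1*(omnivore) + 0.5*length_m so the fit is checkable.
ROWS = [
    {"diet": "carnivore", "length_m": 4.0, "span_myr": 4.0},
    {"diet": "carnivore", "length_m": 12.0, "span_myr": 8.0},
    {"diet": "herbivore", "length_m": 10.0, "span_myr": 11.0},
    {"diet": "herbivore", "length_m": 20.0, "span_myr": 16.0},
    {"diet": "omnivore", "length_m": 2.0, "span_myr": 4.0},
    {"diet": "omnivore", "length_m": 6.0, "span_myr": 6.0},
]


def one_hot(rows, column):
    """One-hot encode `column`, dropping the first level as the baseline.

    Dropping one level keeps the design matrix full rank once an
    intercept column is added.
    """
    levels = sorted({row[column] for row in rows})
    kept = levels[1:]
    names = [f"{column}={level}" for level in kept]
    vectors = [[1.0 if row[column] == level else 0.0 for level in kept]
               for row in rows]
    return names, vectors


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_regression(features, target):
    """Ordinary least squares via the normal equations, with an intercept."""
    xs = [[1.0] + row for row in features]
    k = len(xs[0])
    xtx = [[sum(r[i] * r[j] for r in xs) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(xs, target)) for i in range(k)]
    return solve(xtx, xty)


def rank_features(rows):
    """Return (intercept, [(feature_name, coefficient), ...]) sorted by |coefficient|."""
    names, dummies = one_hot(rows, "diet")
    features = [d + [row["length_m"]] for d, row in zip(dummies, rows)]
    target = [row["span_myr"] for row in rows]
    coefs = fit_linear_regression(features, target)
    named = list(zip(names + ["length_m"], coefs[1:]))
    named.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return coefs[0], named


if __name__ == "__main__":
    intercept, ranked = rank_features(ROWS)
    print(f"intercept: {intercept:.2f}")
    for name, coef in ranked:
        print(f"{name}: {coef:.2f}")
```

Each one-hot coefficient is read relative to the dropped baseline level (here `carnivore`). Comparing raw coefficient sizes between a dummy column and a numeric column such as length is only a rough guide to "impact", since the two are on different scales.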

Visualization

For our visualization, we built a website with CSS and JSX to present our findings. This visualization can be found in the Project Media and GitHub repository. There, we show the best evolutionary features for each period and overall, as well as the dinosaurs that have those features. We also present our version of the "best" dinosaur to live in the Mesozoic Era.

Challenges we ran into

One of the challenges we ran into was the lack of datasets on dinosaur characteristics. Analyses of phenotypical traits are mostly done in research settings, where data is sourced from academic papers or collected first-hand. However, this data is not easily accessible to the online community or to data scientists looking to contribute to Paleontology using a data science approach. Additionally, there were few websites with organized and well-documented information, making it difficult to web-scrape and acquire more information. This could also be because there are many dinosaurs with little to no information available. Therefore, in our project, we attempted to manually add some data that would help future data scientists in their Paleontology endeavors.

Accomplishments that we're proud of

We are proud that we accomplished so much in such a short amount of time. Since this was everyone's first hackathon, we are especially proud that we were able to create a project from scratch, collaborate, combine exploratory analysis with creativity, and learn so much in the span of two days. By taking a holistic approach to the study of Paleontology, we were able to provide some insight into the past and hopefully assist future Paleontology research when discovering new dinosaurs or finding information about known ones.

What we learned

Everyone learned new things from this project and from the workshops. We picked up many new skills from each other, whether that was Pandas, HTML, or how to use git. Additionally, we were able to apply the skills we learned from the workshops to build our visualization.

What's next for The Best Dinosaur of All Time

For future work, we hope to incorporate more data into a larger dataset, whether by entering it manually or by finding a new web-scraping strategy. We want to include more features such as ornamentation (e.g. beaks, spiked tails, frills) and examine how location and these attributes affect time span. We can also explore more nuanced analysis of the interactions between features. To address missing values, we hope that with more time, we can fill them in through research to provide a more comprehensive dataset.
