Inspiration

With pop culture as our theme, we were inspired to create an interactive project using familiar characters anyone would recognize.

What it does

Our project takes a user's personality traits and a brief personal background as input and returns the Disney character most similar to them.

How we built it

We built the project in Python on Google Colab. We used an HTML parser (BeautifulSoup) to scrape each character's personality traits from the web and an NLP library (spaCy) to process that text. We then organized the processed data into a dataframe and used a pre-trained transformer model (sentence-transformers all-MiniLM-L6-v2) to embed each character's description as a vector. Any user input can then be embedded with the same model and compared against the character vectors to find the most similar Disney character.
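A minimal sketch of the matching step, assuming cosine similarity as the comparison metric (the write-up only says "most similar") and one description string per character. `build_index`, `match`, and `sentence_transformer_embedder` are hypothetical names, not the project's code. The embedder is passed in as a function so the matching logic can be exercised without downloading the model.

```python
import math
from typing import Callable, Sequence

# An embedder maps a list of texts to a list of vectors (one vector per text).
Embedder = Callable[[list[str]], Sequence[Sequence[float]]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(character_texts: dict[str, str], embed: Embedder):
    """Hypothetical helper: embed every character description once, up front."""
    names = list(character_texts)
    vectors = embed([character_texts[name] for name in names])
    return names, vectors


def match(user_text: str, index, embed: Embedder) -> tuple[str, float]:
    """Embed the user's text with the same model and return the closest character."""
    names, vectors = index
    user_vec = embed([user_text])[0]
    scores = [cosine_similarity(user_vec, vec) for vec in vectors]
    best = max(range(len(names)), key=lambda i: scores[i])
    return names[best], scores[best]


def sentence_transformer_embedder(model_name: str = "all-MiniLM-L6-v2") -> Embedder:
    """Wrap a pre-trained sentence-transformers model as an Embedder."""
    # Imported lazily so the functions above work without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(list(texts))
```

With the real model, you would create `embed = sentence_transformer_embedder()` once, build the index once from the scraped character descriptions, and then call `match(user_text, index, embed)` for each user. Embedding the characters ahead of time means each query only costs one model call. If you prefer to stay in tensors, sentence-transformers also provides `util.cos_sim`.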

Challenges we ran into

To get data about each character, we needed a link to their fandom page to scrape. However, some characters share their name with their movie, so appending the character's name to our base URL would instead lead to the movie page, which contains no personality information. To handle this, if no personality information is found, our code retries with "(character)" appended to the URL so that it reaches the character page. We also tried having an LLM explain why the user is similar to the matched Disney character, but its explanations were too nonsensical to be useful.
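A minimal sketch of that fallback, assuming the Disney Fandom wiki at `https://disney.fandom.com/wiki/` as the base URL (the exact base URL is not given above). `candidate_urls`, `scrape_personality`, and the `fetch_html` and `extract_personality` callbacks are hypothetical; the callbacks stand in for the real HTTP request and the BeautifulSoup parsing so the retry logic can be shown on its own.

```python
from typing import Callable, Optional

# Assumption: base URL of the Disney Fandom wiki; the write-up does not state it.
BASE_URL = "https://disney.fandom.com/wiki/"


def candidate_urls(name: str, base_url: str = BASE_URL) -> list[str]:
    """URLs to try in order: the plain name first, then the '(character)' variant."""
    slug = name.strip().replace(" ", "_")
    return [f"{base_url}{slug}", f"{base_url}{slug}_(character)"]


def scrape_personality(
    name: str,
    fetch_html: Callable[[str], Optional[str]],
    extract_personality: Callable[[str], str],
    base_url: str = BASE_URL,
) -> Optional[tuple[str, str]]:
    """Return (url, personality_text) from the first page that has personality info.

    A page whose name collides with a movie title yields no personality text,
    so the loop falls through to the '(character)' URL.
    """
    for url in candidate_urls(name, base_url):
        html = fetch_html(url)
        if not html:
            continue  # page missing or request failed; try the next candidate
        personality = extract_personality(html)
        if personality:
            return url, personality
    return None  # neither URL had personality info
```

Keeping the fetch and parse steps as parameters means the same loop works whether pages come from live requests or from a local cache.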

Accomplishments that we're proud of

We're proud of building an HTML-parsing and NLP pipeline whose output plugs directly into our embedding model to find the character most similar to the user.

What we learned

We learned how to scrape the web effectively with BeautifulSoup, process text with spaCy, apply our vector embedding model to the data, and do both front-end and back-end web development.

What's next for Disney Doppelgängers

Next, we plan to implement a fully functioning website, build a larger dataset to improve accuracy, and include characters from other domains.
