Inspiration

We were inspired by the vast number of awesome mythical creatures from old legends, so we thought it would be really fun to make a dedicated search engine for them!

What it does

We made a web scraper tool that retrieves information on almost every mythical creature from Wikipedia. It uses natural language processing to generate a set of adjectives describing each creature, then combines all the information into a single file. The website and API work together to let users search for creatures by name or by descriptive features.
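To give a flavour of how a creature record and a search against it fit together, here's a minimal sketch in Go. The field names and the adjective-matching logic are illustrative assumptions, not the exact format or code we settled on.

```go
package main

import (
	"fmt"
	"strings"
)

// Creature is an illustrative record shape; the real scraper output
// has its own field names, so treat these as placeholders.
type Creature struct {
	Name       string   // e.g. "Kraken"
	Summary    string   // short description pulled from the article
	Adjectives []string // generated by the NLP step, e.g. "gigantic", "aquatic"
	ImageURL   string
}

// matches reports whether a creature should appear in results for a query,
// checking both the name and the generated adjectives.
func matches(c Creature, query string) bool {
	q := strings.ToLower(query)
	if strings.Contains(strings.ToLower(c.Name), q) {
		return true
	}
	for _, adj := range c.Adjectives {
		if strings.Contains(strings.ToLower(adj), q) {
			return true
		}
	}
	return false
}

func main() {
	kraken := Creature{
		Name:       "Kraken",
		Summary:    "A legendary sea monster said to dwell off Norway.",
		Adjectives: []string{"gigantic", "aquatic", "terrifying"},
	}
	fmt.Println(matches(kraken, "aquatic")) // true
	fmt.Println(matches(kraken, "fiery"))   // false
}
```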

How we built it

We worked in a team of four. Billy and James built the web scraper using PHP and JavaScript, with Billy writing the functionality to retrieve the core information from Wikipedia and James using natural language processing to describe each article. Laurence designed and built the user-facing website using HTML, CSS and JavaScript. The website interacts with the search API, written by Tom Panton in Go.
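For the search API side, the sketch below shows the rough shape of a Go HTTP endpoint like the one the website talks to. The /search route, the query parameter name, and the JSON fields are assumptions for illustration, not our actual API.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// result mirrors the kind of data the website needs to render a hit;
// the field names here are assumptions, not our actual API schema.
type result struct {
	Name     string `json:"name"`
	Summary  string `json:"summary"`
	ImageURL string `json:"imageUrl"`
}

func main() {
	// Assumed endpoint: GET /search?q=<query>
	http.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
		query := r.URL.Query().Get("q")

		// In the real service the query is matched against the scraped
		// creature file; here we just return a canned result for the sketch.
		hits := []result{}
		if query != "" {
			hits = append(hits, result{
				Name:    "Phoenix",
				Summary: "A bird that is reborn from its own ashes.",
			})
		}

		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(hits)
	})

	http.ListenAndServe(":8080", nil)
}
```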

Challenges we ran into

Ensuring everyone's code worked with exactly the same creature data format was a definite challenge, requiring several tweaks to get right. The large differences in the formatting and wording of Wikipedia articles were also tricky. Billy and James managed to fine-tune the scraper considerably to handle this, but some minor problems remain, which we'll fix over time.

Accomplishments that we're proud of

We're really proud of the end product because we managed to use the core technologies behind major search engines in just 36 hours. We're also pleased with how we've communicated in order to successfully integrate four sets of code written in several different programming languages.

What we learned

We've learned that communication is super important, especially when it comes to things requiring meticulous attention to detail, like file formats. We've also found that coding something from scratch so quickly is extremely satisfying. Finally, we've learned that although natural language processing is evolving, it's still not perfect, especially on challenging text!

What's next for Legendary Creature Browser

We'll give the site a domain name! It already has one, but the DNS propagation is taking a long time. We'll also work on fine-tuning description and image extraction from Wikipedia for pages that are challenging for the natural language processing and web scraper systems.
