toogle
A custom search engine I'm making
A simple search engine, complete with a web crawler written in Go and an indexer written in Python. The crawler scans the web for pages and saves their titles and descriptions, and the indexer removes stopwords, lemmatizes, and formats all of that data. There is also a query processor whose search function performs synonym expansion and lemmatization. See the pseudocode for more information on how the project works now and how I plan for it to work.
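Here is a minimal Python sketch of the indexing and query steps described above. It is not the project's code. The stopword list, the lemma table, and the synonym table are tiny hypothetical examples. The lemma table is a simple dictionary-lookup lemmatizer standing in for a WordNet-based one such as NLTK's.

```python
import re

# Hypothetical, deliberately tiny stand-ins for real resources
# (e.g. a full stopword list and WordNet); not the project's data.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "for", "on"}
LEMMAS = {"pages": "page", "engines": "engine", "searching": "search", "running": "run"}
SYNONYMS = {"car": {"automobile", "auto"}, "search": {"lookup"}}


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and lemmatize by table lookup."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOPWORDS]


def expand_query(query):
    """Preprocess a query, then add any known synonyms of each term."""
    terms = preprocess(query)
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded
```

For example, preprocess("The Search Engines are searching pages") returns ['search', 'engine', 'search', 'page'], and expand_query("searching for a car") returns a set containing search, lookup, car, automobile, and auto.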
Note that it only runs on Python 3.7 to 3.12.
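If you want the indexer to fail early on an unsupported interpreter, a small guard like the following could help. It is a hypothetical helper (is_supported is not part of the project) that encodes the 3.7 to 3.12 range stated above.

```python
import sys

# Supported range from the note above: Python 3.7 through 3.12 inclusive.
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 12)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is in the supported range."""
    return MIN_VERSION <= tuple(version_info[:2]) <= MAX_VERSION
```

Calling it at the top of a script and exiting with a message when it returns False would give a clearer error than whatever happens to fail later.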
Dependencies
You will need an up-to-date installation of:
Note that after installing nltk, you will need to run nltk.download('wordnet') to install WordNet.
How to run:
Linux, macOS & Windows
- Git clone the repository and cd into the crawler directory
- Run go mod tidy
- Run go run crawler.go
- Cd into the indexer directory and run python indexer.py
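The run steps above can also be scripted. The sketch below is a hypothetical launcher, not part of the repository. It assumes it is run from the repository root, where the crawler and indexer directories live, and that go is on your PATH. It runs the indexer with the current interpreter (sys.executable) rather than whatever python resolves to.

```python
import subprocess
import sys


def build_steps(python=sys.executable):
    """Return (directory, command) pairs in the order the README runs them.

    Assumes the repository root contains "crawler" and "indexer" directories.
    """
    return [
        ("crawler", ["go", "mod", "tidy"]),
        ("crawler", ["go", "run", "crawler.go"]),
        ("indexer", [python, "indexer.py"]),
    ]


def main():
    """Run each step in turn; a failing command raises CalledProcessError."""
    for cwd, cmd in build_steps():
        try:
            subprocess.run(cmd, cwd=cwd, check=True)
        except KeyboardInterrupt:
            # Ctrl + C stops the current step (e.g. the long-running crawler);
            # move on so the indexer still runs over what was already saved.
            print("Interrupted: %s; continuing." % " ".join(cmd))
```

Call main() from the repository root. Pressing Ctrl + C during the crawl stops only that step, and the script then moves on to the indexer.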
Accessing the database
Expect the crawler to take a while: it is trying to scan and save the titles and descriptions of every website it can reach. You can, however, stop it whenever you want with Ctrl + C, and the database will already have been updated with the websites scanned up to that point.
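One way a crawler can keep its progress when interrupted is to commit each page to the database as soon as it is scanned. The sketch below illustrates that idea in Python with sqlite3. It is not the project's crawler, which is written in Go. The use of SQLite and the pages table schema are both assumptions, since the database isn't described here.

```python
import sqlite3

# Hypothetical schema: the README only says titles and descriptions are saved.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    url TEXT PRIMARY KEY,
    title TEXT,
    description TEXT
)
"""


def save_page(conn, url, title, description):
    """Insert or update one page and commit immediately, so an interrupt
    (Ctrl + C) can only lose the page currently being written."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title, description) VALUES (?, ?, ?)",
        (url, title, description),
    )
    conn.commit()


def crawl(conn, fetched_pages):
    """Save each (url, title, description) tuple as soon as it is available."""
    conn.execute(SCHEMA)
    saved = 0
    try:
        for url, title, description in fetched_pages:
            save_page(conn, url, title, description)
            saved += 1
    except KeyboardInterrupt:
        # Everything committed before the interrupt stays in the database.
        pass
    return saved
```

Because save_page commits after every row, an interrupt can lose at most the page being written. INSERT OR REPLACE keeps a page that is crawled twice from producing duplicate rows.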
Because the indexer runs in $O(n)$ time, where $n$ is the amount of crawled data, the longer you run the crawler, the longer the indexer will take. The indexer will eventually prompt you for a search query and then print the results (if there are any) along with their tf-idf scores.
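Here is a minimal sketch of tf-idf ranking in Python. It uses one common weighting, tf = term count / document length and idf = log(N / df). The project's exact formula isn't stated. The function name and input format (a dict mapping a page ID to its preprocessed terms) are hypothetical.

```python
import math
from collections import Counter


def tf_idf_search(query_terms, documents):
    """Rank documents for already-preprocessed query terms by summed tf-idf.

    documents maps a document id (e.g. a URL) to its list of terms.
    tf = count of the term / number of terms in the document;
    idf = log(N / df), N = number of documents, df = documents containing it.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for terms in documents.values():
        doc_freq.update(set(terms))  # count each term once per document

    scores = {}
    for doc_id, terms in documents.items():
        if not terms:
            continue
        counts = Counter(terms)
        score = 0.0
        for term in set(query_terms):
            if counts[term]:
                tf = counts[term] / len(terms)
                idf = math.log(n_docs / doc_freq[term])
                score += tf * idf
        if score > 0:
            scores[doc_id] = score
    # Highest score first; documents scoring 0 are left out.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With documents {"a": ["search", "engine", "search"], "b": ["web", "crawler"], "c": ["search", "web"]}, the query ["search"] ranks a (about 0.270) above c (about 0.203) and leaves out b. Counting document frequencies and scoring each document both walk every stored term once. That is consistent with the indexer's running time growing linearly with the amount of crawled data.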
Hope you enjoy!