toogle

A custom search engine I'm making

A simple search engine with a web crawler written in Go and an indexer written in Python. The crawler scans the web for pages and their data, and the indexer removes stopwords, lemmatizes, and formats all of the data. There is also a query processor whose search function uses synonym expansion and lemmatization. See the pseudocode for more information on how the project works (and how it will work).

Note that it only runs on Python 3.7 to 3.12.
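If you are unsure whether your interpreter is in that range, a quick check like the one below can help. It is only a sketch: the function name `is_supported` and the constants are made up for illustration and are not part of the project.

```python
import sys

# Supported range as stated in this README: Python 3.7 through 3.12.
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 12)


def is_supported(version_info=sys.version_info):
    """Return True if the major.minor version is within the supported range."""
    return MIN_VERSION <= tuple(version_info[:2]) <= MAX_VERSION


# Report on the interpreter that is running this snippet.
current = ".".join(str(part) for part in sys.version_info[:3])
if is_supported():
    print("Python " + current + " is in the supported range.")
else:
    print("Python " + current + " is outside the supported range (3.7 to 3.12).")
```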

Dependencies

You will need an up-to-date installation of:

NLTK

Go

Python

Git

spaCy

SQLite3

scikit-learn (sklearn)

Xcode developer tools (macOS only)

Note that after installing NLTK, you will need to run nltk.download('wordnet') in Python to install the WordNet data.

How to run:

___LINUX, MAC OS & WINDOWS___

  1. Clone the repository with git clone and cd into the crawler directory
  2. Run go mod tidy
  3. Run go run crawler.go
  4. cd into the indexer directory and run python indexer.py

Accessing the database

Expect the crawler to take a while: it tries to scan and save the title and description of every website it can reach. You can stop it whenever you want with Ctrl + C, and the database will already contain the websites scanned up to that point.
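To see what the crawler has saved so far, you can inspect the SQLite file directly. The sketch below lists every table with its row count, so it does not depend on the actual schema. The function name `table_row_counts` and the file name `crawler.db` are placeholders I made up; use whatever database path the crawler actually writes to.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        # Skip SQLite's internal bookkeeping tables.
        names = [row[0] for row in rows if not row[0].startswith("sqlite_")]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + name.replace('"', '""') + '"'
            query = "SELECT COUNT(*) FROM " + quoted
            counts[name] = conn.execute(query).fetchone()[0]
        return counts
    finally:
        conn.close()
```

For example, `print(table_row_counts("crawler.db"))` would print one entry per table, again assuming that placeholder file name.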

The indexer runs in $O(n)$ time in the amount of crawled data, so the longer you run the crawler, the longer the indexer will take. The indexer will eventually prompt you to search for something, and then it will print the results (if there are any) along with their tf-idf scores.
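For readers new to tf-idf, here is a minimal sketch of how such scores can be computed. It is not the project's indexer: the function name `tf_idf_scores` is made up, tokenization is a plain whitespace split, and it uses the textbook formula (term frequency times the log of inverse document frequency). Libraries such as scikit-learn apply a smoothed variant by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, documents):
    """Score each document against the query terms with a basic tf-idf sum."""
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue  # term absent from this document, contributes nothing
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            score += tf * idf
        scores.append(score)
    return scores
```

A term that appears in every document gets an idf of zero here, which is why common words contribute little to the ranking.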

Hope you enjoy!
