I wanted to do something with an HTML parser. Someone mentioned this idea to me, and it sounded fun.

What it does

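It takes a start URL, an end URL, and optionally a prefix to limit the search, then looks for the shortest path of links from the start URL to the end URL. For instance, if you only want to look at links on one site, that site's prefix is passed as a parameter when parsing the neighboring links from a URL.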
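As a rough illustration of the prefix option, here is a minimal sketch of filtering a page's outgoing links by prefix. The class name `PrefixFilter`, the method `filterByPrefix`, and the example URLs are made up for this sketch and are not taken from the project; in the real program the list of links would come from the HTML parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the project's actual code.
class PrefixFilter {
    // Keeps only the links that start with the given prefix.
    // A null or empty prefix is treated as "no restriction" (an assumption).
    static List<String> filterByPrefix(List<String> links, String prefix) {
        if (prefix == null || prefix.isEmpty()) {
            return new ArrayList<>(links);
        }
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            if (link.startsWith(prefix)) {
                kept.add(link);
            }
        }
        return kept;
    }
}
```

Calling `PrefixFilter.filterByPrefix(links, "https://example.com/")` on a list of absolute URLs drops every link that leaves that site, which keeps the search space much smaller.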

How I built it

I used Java and the jsoup library for the HTML parser.

Challenges I ran into

The hardest part was getting rid of the unnecessary edges after finding the shortest path from the start to the end URL. The other difficulties were dealing with infinite loops and processing time. Limiting the URLs to a certain prefix helped a lot with the processing time and made the program potentially more useful.

Accomplishments that I'm proud of

I was able to run a BFS on links, using knowledge of hashing, HashMaps, and HashSets to improve running time.
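To show how the hashing helps, here is a minimal sketch of a BFS over links. It is not the project's code: the class `LinkBfs`, the method `shortestPath`, and the in-memory `graph` map (standing in for pages fetched and parsed on the fly) are all assumptions for this example. It also uses a parent map to rebuild the path, which is one common approach and may differ from how the project does it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch; the graph map replaces real page fetching.
class LinkBfs {
    // Returns the URLs on a shortest path from start to end,
    // or an empty list if end cannot be reached.
    static List<String> shortestPath(Map<String, List<String>> graph, String start, String end) {
        Set<String> visited = new HashSet<>();      // O(1) expected "seen before?" checks
        Map<String, String> parent = new HashMap<>(); // page -> page it was first reached from
        Queue<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(end)) {
                // Walk parent links back to the start, then reverse.
                List<String> path = new ArrayList<>();
                for (String node = end; node != null; node = parent.get(node)) {
                    path.add(node);
                }
                Collections.reverse(path);
                return path;
            }
            List<String> neighbors = graph.getOrDefault(current, Collections.<String>emptyList());
            for (String next : neighbors) {
                // add() returns false for a page already seen, so link
                // cycles cannot cause an infinite loop.
                if (visited.add(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

Because each page enters the queue at most once, the search stops even when pages link back to each other, and the first time the end URL is dequeued the path found is a shortest one.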

What I learned

I learned a linear-time way to remove unnecessary edges, leaving only the path from A to B, by traversing backwards through the output ArrayList of added edges.
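The backward traversal can be sketched roughly as follows. This is a guess at the idea rather than the project's code: `PathPruner` and `prune` are hypothetical names, each edge is modeled as a two-element `String[]` of `{from, to}`, and the sketch assumes the list holds one edge per newly discovered page, in the order BFS found them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of pruning a BFS edge list down to one path.
class PathPruner {
    // edges: {from, to} pairs in BFS discovery order (assumed: each page
    // appears as a "to" at most once). Returns only the edges on the path
    // that ends at the given end URL, in start-to-end order.
    static List<String[]> prune(List<String[]> edges, String end) {
        List<String[]> path = new ArrayList<>();
        String target = end;
        // Single pass from the last edge to the first: the edge that
        // discovered a page always comes after the edge that discovered
        // its parent, so one backward sweep is enough.
        for (int i = edges.size() - 1; i >= 0; i--) {
            String[] edge = edges.get(i);
            if (edge[1].equals(target)) {
                path.add(edge);
                target = edge[0]; // now look for the edge into the parent
            }
        }
        Collections.reverse(path);
        return path;
    }
}
```

Every edge is inspected exactly once, so the cost is linear in the number of recorded edges, and edges that led to dead ends are simply skipped.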

What's next for LinkPath

Possibly a GUI, a timeout to prevent lengthy queries, and other information related to parsing the HTML along the path.
