I wanted to build something with an HTML parser. Someone mentioned this idea to me, and it sounded fun.
What it does
It takes a start URL, an end URL, and optionally a prefix to limit the search, then looks for the shortest path of links from the start URL to the end URL. For instance, if you only want to follow links on uiowa.edu, uiowa.edu is passed as a parameter when parsing the neighboring links of a URL.
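As a rough illustration of the prefix option, here is a minimal sketch of filtering a page's outgoing links. The class and method names (`LinkFilter`, `keepMatching`) are made up for this example, and it assumes the "prefix" is matched as a substring of each absolute URL, which may not be exactly how LinkPath does it. In the real program the list of links would come from the HTML parser rather than being passed in by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the actual LinkPath code.
class LinkFilter {
    // Keeps only the links that contain the filter text (assumption:
    // substring match). A null or empty filter means "no restriction".
    static List<String> keepMatching(List<String> links, String filter) {
        if (filter == null || filter.isEmpty()) {
            return new ArrayList<>(links);
        }
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            if (link.contains(filter)) {
                kept.add(link);
            }
        }
        return kept;
    }
}
```

Calling `LinkFilter.keepMatching(links, "uiowa.edu")` on a page's links would drop every link that points off uiowa.edu before those links are ever queued for searching.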
How I built it
I used Java and the jsoup library for HTML parsing.
Challenges I ran into
The hardest part was getting rid of the unnecessary edges after finding the shortest path from the start URL to the end URL. The other difficulties were dealing with infinite loops and processing time. Limiting the URLs to a certain prefix helped a lot with the processing time and made the program potentially more useful.
Accomplishments that I'm proud of
I was able to run BFS on links, using my knowledge of hashing, HashMaps, and HashSets to improve the running time.
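A minimal sketch of that kind of search, under some assumptions: the web is replaced by an in-memory `Map` from each page to its links (in the real program those links would be fetched and parsed), and the names `LinkBfs` and `search` are invented for this example. The `HashSet` of visited pages is what keeps pages that link back to each other from causing an infinite loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch, not the actual LinkPath code.
class LinkBfs {
    // Returns the edges {from, to} in the order BFS discovered them,
    // stopping as soon as `end` is reached. Returns an empty list if
    // `end` is unreachable or if start and end are the same page.
    static List<String[]> search(Map<String, List<String>> graph, String start, String end) {
        List<String[]> edges = new ArrayList<>();
        if (start.equals(end)) {
            return edges;
        }
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String next : graph.getOrDefault(page, List.of())) {
                if (!visited.add(next)) {
                    continue; // already seen, so skip it (O(1) hash lookup)
                }
                edges.add(new String[] {page, next});
                if (next.equals(end)) {
                    return edges;
                }
                queue.add(next);
            }
        }
        return new ArrayList<>(); // ran out of pages without reaching end
    }
}
```

Because BFS explores pages level by level, the first time it reaches the end URL it has done so in the fewest possible clicks. The returned list still contains every edge explored along the way, not just the ones on the final path.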
What I learned
I learned a linear-time way to remove the unnecessary edges, leaving only the path from A to B, by traversing backwards through the output ArrayList of edges in the order they were added.
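Here is a sketch of how such a backward pass might look. It assumes the edges are stored as `{from, to}` pairs in the order a BFS added them and that the last edge is the one that reached B. The names `PathPruner` and `prune` are made up for the example. It works because BFS discovers each page only once, so each page has exactly one incoming edge in the list, and that edge always appears earlier than any edge leaving the page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the actual LinkPath code.
class PathPruner {
    // edges: {from, to} pairs in BFS discovery order; the last edge is
    // assumed to end at the target. Returns only the edges on the path,
    // ordered from start to target.
    static List<String[]> prune(List<String[]> edges) {
        List<String[]> path = new ArrayList<>();
        if (edges.isEmpty()) {
            return path;
        }
        // Start by looking for the edge that arrives at the target.
        String wanted = edges.get(edges.size() - 1)[1];
        for (int i = edges.size() - 1; i >= 0; i--) {
            String[] edge = edges.get(i);
            if (edge[1].equals(wanted)) {
                path.add(edge);
                wanted = edge[0]; // now look for the edge that reached this page
            }
        }
        Collections.reverse(path); // collected end-to-start, so flip it
        return path;
    }
}
```

For example, given the edges A→B, A→C, B→D with D as the target, the pass keeps B→D, skips A→C, keeps A→B, and returns A→B, B→D. Each edge is looked at once, so the cost is linear in the number of edges.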
What's next for LinkPath
Possibly a GUI, a timeout to prevent lengthy queries, and other information gathered from parsing the HTML along the path.