We wanted to see how data and information spread across the internet, find new ways of tracing that spread, and perhaps learn how to avoid being influenced by it.
What it does
It displays how information spreads on the internet and how links and pieces of information are connected to each other. From there you can see which domains reference each other and which one has the biggest influence.
How I built it
We built it using PHP and Python, with Gource for visualising the data.
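The crawling side comes down to pulling the outgoing links out of each fetched page. As a minimal sketch of that step (using only Python's standard-library `html.parser`; the class name and sample page are our own illustration, not the project's actual code):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag seen in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical sample page standing in for a crawled document.
sample = '<p><a href="https://example.com/a">A</a> <a href="https://example.org/b">B</a></p>'
parser = LinkExtractor()
parser.feed(sample)
print(parser.links)  # the outgoing links found on the page
```

Feeding each crawled page through an extractor like this yields the URL-to-URL edges that the visualisation is built from.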
Challenges I ran into
We had some problems implementing the shortest path between two URLs, as there was a large number of backlinks to take into consideration. At the moment the process runs smoothly as long as there is a modest number of nodes between the links. This could be improved by building the dataset in C++ and running it on a more powerful machine.
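The core of that step is an unweighted shortest-path search over the link graph. A minimal sketch with breadth-first search (the toy graph and domain names are made up for illustration; the real dataset is far larger, which is exactly where the performance problems came from):

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search over a link graph, where graph maps a URL
    to the URLs (forward links and backlinks) connected to it."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for neighbour in graph.get(path[-1], ()):
            if neighbour in seen:
                continue
            if neighbour == goal:
                return path + [neighbour]
            seen.add(neighbour)
            queue.append(path + [neighbour])
    return None  # no route between the two URLs

# Toy graph with hypothetical domains.
graph = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["d.com"],
    "c.com": ["d.com"],
    "d.com": [],
}
print(shortest_path(graph, "a.com", "d.com"))  # ['a.com', 'b.com', 'd.com']
```

BFS visits each node and edge at most once, so the cost grows with the number of backlinks considered, which matches the slowdown we saw on dense parts of the graph.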
Accomplishments that I'm proud of
We managed to make a meaningful and trustworthy visualisation of how one piece of information travels across the internet and influences our lives, even if we do not realise it. We can also use this data to find other webpages that referenced the main URLs.
What I learned
I have learned a lot about crawling and about how big the internet is. The main lesson was why algorithms have to be efficient. THE INTERNET IS HUGE!!!
What's next for LinkTracker
We might have to improve the efficiency of our shortest-path tree algorithm and rewrite some of our functions, and build a more powerful front-end.