Sonobi's Social Blog Dataset was enormous and contained very interesting information about its user base and their activities. I wanted to use my data mining and analytical skills to extract useful information from it.
The program uses the external Python libraries Pandas and NumPy to organize the data logically into DataFrames, which make the data easy to extract and manipulate. A number of functions perform various statistical analyses and are easily scalable.
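The DataFrame-plus-functions approach described above can be sketched as follows. The write-up does not list the dataset's columns or the actual analysis functions, so the columns (`blogger_id`, `age`, `post_count`), the toy values, and the helpers `column_summary` and `mean_by_group` are hypothetical. Because each helper takes column names as parameters, it can be reused on any column without changes, which is one plausible reading of "easily scalable".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the real dataset's columns are not documented,
# so these column names and values are illustrative assumptions only.
bloggers = pd.DataFrame(
    {
        "blogger_id": [1, 2, 3, 4],
        "age": [23, 35, 23, 41],
        "post_count": [10, 4, 7, 12],
    }
)


def column_summary(df: pd.DataFrame, column: str) -> dict:
    """Return basic statistics for one numeric column (hypothetical helper)."""
    values = df[column].to_numpy(dtype=float)
    return {
        "count": int(values.size),
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        # ddof=1 gives the sample standard deviation.
        "std": float(np.std(values, ddof=1)),
    }


def mean_by_group(df: pd.DataFrame, group_col: str, value_col: str) -> pd.Series:
    """Average value_col within each distinct value of group_col (hypothetical helper)."""
    return df.groupby(group_col)[value_col].mean()


print(column_summary(bloggers, "post_count"))
print(mean_by_group(bloggers, "age", "post_count"))
```

On this toy data, `column_summary` reports a mean of 8.25 and a median of 8.5 posts, and `mean_by_group` returns one average per distinct age.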
Python, Pandas, NumPy, HTML and a lot of energy drinks
Time was extremely limited and I was the only member of the team, so data cleanup took up a large share of the available time. Most of the problems came from missing ("nan") values in the dataset.
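A common way to handle this kind of cleanup in Pandas is to normalize every missing marker to a real `NaN`, then decide per column whether to drop or fill. The write-up does not say whether the "nan" values were literal strings or true floating-point `NaN`s, so this minimal sketch handles both. The columns, the toy records, the `clean_bloggers` helper, and the policy of dropping rows without an age while filling text fields with `"unknown"` are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records; column names and values are assumptions.
# Missing entries appear both as real NaN and as the literal string "nan".
raw = pd.DataFrame(
    {
        "blogger_id": [1, 2, 3, 4],
        "gender": ["male", "nan", "female", np.nan],
        "age": [23, np.nan, 35, 41],
        "topic": ["Technology", "Arts", "nan", "Fashion"],
    }
)


def clean_bloggers(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize missing values, then drop or fill them column by column (hypothetical policy)."""
    # Turn the literal text "nan" into a true missing value.
    cleaned = df.replace("nan", np.nan)
    # Rows without an age cannot be used in age-based statistics, so drop them.
    cleaned = cleaned.dropna(subset=["age"])
    # For text fields, keep the row but mark the value as unknown.
    cleaned = cleaned.fillna({"gender": "unknown", "topic": "unknown"})
    # Renumber the index after rows were dropped.
    return cleaned.reset_index(drop=True)


print(clean_bloggers(raw))
```

Dropping versus filling is a per-column judgment: dropping keeps statistics honest but shrinks the data, while filling keeps every row at the cost of an explicit "unknown" category.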
So far, information about 26,811 bloggers has been loaded into the DataFrames, from which any data can easily be extracted or added.
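Extracting and adding data in a Pandas DataFrame usually comes down to boolean masks, `.loc`, `pd.concat`, and column assignment. The following minimal sketch uses a tiny hypothetical table standing in for the blogger DataFrame. Its columns, its values, and the derived `age_over_30` column are illustrative assumptions, not taken from the project.

```python
import pandas as pd

# Hypothetical stand-in for the blogger table; columns and values are illustrative only.
bloggers = pd.DataFrame(
    {
        "blogger_id": [1, 2, 3],
        "age": [23, 35, 41],
        "topic": ["Technology", "Arts", "Technology"],
    }
)

# Extract: a boolean mask selects the matching rows.
tech_bloggers = bloggers[bloggers["topic"] == "Technology"]

# Extract: .loc filters rows by condition and picks a column by name in one step.
older_ids = bloggers.loc[bloggers["age"] > 30, "blogger_id"].tolist()

# Add a row: concatenate a one-row DataFrame and renumber the index.
new_row = pd.DataFrame([{"blogger_id": 4, "age": 29, "topic": "Arts"}])
bloggers = pd.concat([bloggers, new_row], ignore_index=True)

# Add a column derived from existing data (done after the new row so it covers every row).
bloggers["age_over_30"] = bloggers["age"] > 30

print(tech_bloggers)
print(older_ids)
print(bloggers)
```

The row is appended before the derived column is computed. Otherwise the new row would get a missing value in `age_over_30`.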
I practiced my Python skills and learned a lot more about Pandas and NumPy.
I am going to make the project open source so that other people can use it to read, sort and analyze their own datasets easily. If issues arise, I will keep improving the code.