Stack Explorer

Inspiration

Both Daman and Sheeza are software developers who often find it difficult to choose a technology stack when starting a new project. Like many other hackers out there, they also wonder:

What language has gained the most popularity over the years?
What is the most commonly used programming language?
What types of questions were asked on Stack Overflow?
What is the most widely used operating system?

Rather than following random opinions posted online, they want to rely on fact-based findings to choose a tech stack. So they came up with a solution called Stack Explorer!

What it does

Stack Explorer is a JupyterLab notebook in which we analyzed more than 3 million Stack Overflow questions to understand the latest trends in programming languages. We are happy to share our findings with everyone.

How we built it

For this project, we analyzed a Stack Overflow dataset provided by Kaggle and broke the work down into the following steps:

Data Collection
Data Cleaning
Data Analysis
Result Interpretation
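The four steps above can be sketched with Pandas as a tiny pipeline. This is a minimal illustration, not the notebook's actual code. It assumes the data has Title, Body and Tags columns with space-separated tags, and it treats questions with identical title and body as duplicates. The helper names and the in-memory sample are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for Train.csv (row 3 duplicates row 1).
SAMPLE_CSV = """Id,Title,Body,Tags
1,How to sort a list,How do I sort a list in place?,python list
2,NullPointerException in loop,Why does my loop throw?,java
3,How to sort a list,How do I sort a list in place?,python list
4,Read a CSV file,How do I load a CSV?,python pandas
"""


def collect(source) -> pd.DataFrame:
    """Data collection: load the raw questions (a path or file-like object)."""
    return pd.read_csv(source)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Data cleaning: drop untagged rows and duplicate questions."""
    df = df.dropna(subset=["Tags"])
    # Assumption: same title and body means the same question.
    return df.drop_duplicates(subset=["Title", "Body"])


def analyze(df: pd.DataFrame) -> pd.Series:
    """Data analysis: count how often each tag appears."""
    return df["Tags"].str.split().explode().value_counts()


def interpret(counts: pd.Series, top: int = 3) -> list:
    """Result interpretation: report the most frequent tags."""
    return list(counts.head(top).index)


questions = clean(collect(io.StringIO(SAMPLE_CSV)))
tag_counts = analyze(questions)
print(interpret(tag_counts))
```

On the real data, `collect("Train.csv")` would replace the in-memory sample. Keeping each step as its own function makes it easy to rerun only the later steps while iterating on the analysis.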

Challenges we ran into

It was a challenging project for us. The main challenges were:

Analyzing a large volume of data (6 GB), which made every analytics run take longer than usual.
Deciding which questions to address during the data analysis phase.
Eliminating duplicate data from the dataset.
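The first and third challenges can both be handled with one common approach, though not necessarily the one used in the notebook. The approach streams the CSV with Pandas' `chunksize` option and de-duplicates as it goes, so the full 6 GB never has to fit in memory at once. This sketch makes two assumptions for illustration: the data has Title, Body and Tags columns with space-separated tags, and two questions with the same title and body count as duplicates.

```python
import io
from collections import Counter

import pandas as pd


def count_tags_in_chunks(source, chunksize=100_000):
    """Stream a large CSV in fixed-size chunks, skipping duplicate questions."""
    seen = set()  # hashes of (Title, Body) pairs already counted
    counts = Counter()
    reader = pd.read_csv(source, usecols=["Title", "Body", "Tags"], chunksize=chunksize)
    for chunk in reader:
        # Only `chunksize` rows are held in memory at a time.
        for title, body, tags in chunk[["Title", "Body", "Tags"]].itertuples(index=False):
            key = hash((title, body))  # small memory footprint, tiny collision risk
            if key in seen or not isinstance(tags, str):
                continue  # duplicate question or missing tags
            seen.add(key)
            counts.update(tags.split())
    return counts


# Tiny in-memory stand-in for Train.csv (row 3 duplicates row 1, row 5 has no tags).
SAMPLE_CSV = """Id,Title,Body,Tags
1,How to sort a list,How do I sort a list in place?,python list
2,NullPointerException in loop,Why does my loop throw?,java
3,How to sort a list,How do I sort a list in place?,python list
4,Read a CSV file,How do I load a CSV?,python pandas
5,Untagged question,No tags here,
"""

print(count_tags_in_chunks(io.StringIO(SAMPLE_CSV), chunksize=2))
```

On the real data this would be called as `count_tags_in_chunks("Train.csv")`. The chunk size trades memory use against per-chunk overhead.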

Accomplishments that we're proud of

Everything we did was an accomplishment: we delivered the functionality we imagined at the start of the project. In other words, our project rocks! 🎸

What we learned

Our personal learnings are the following:

Daman:
- Python libraries such as Matplotlib, NumPy and Pandas, including how to build scatter plots.
- Using JupyterLab notebooks and the JupyterLab Git extension.
- Using GitHub project boards.
Sheeza:
- JupyterLab and the JupyterLab Git extension
- Python libraries
- How to ask applicable, real-world questions of a raw dataset
- How to debug (a lot)!

What's next for Stack Explorer

Build a machine learning model that can answer questions such as "Based on current popularity and growth rate, which language will be the most popular in 2025?", or one that automatically classifies Stack Overflow questions using natural language processing.
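Before building a full machine learning model, the forecasting question could start from a much simpler baseline: fit a straight line to each language's yearly question counts and extrapolate it. This sketch uses ordinary least squares in plain Python. The `history` numbers are made up for illustration and are not results from the notebook. A linear trend is itself an assumption that real popularity curves may not follow.

```python
def linear_trend(years, counts):
    """Ordinary least-squares fit of counts = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("need at least two distinct years")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(history, target_year):
    """Rank languages by their extrapolated question count in target_year."""
    projections = {}
    for language, yearly in history.items():
        years = sorted(yearly)
        slope, intercept = linear_trend(years, [yearly[y] for y in years])
        projections[language] = slope * target_year + intercept
    return sorted(projections.items(), key=lambda item: item[1], reverse=True)


# Hypothetical yearly question counts, purely illustrative.
history = {
    "python": {2019: 100, 2020: 120, 2021: 140},
    "java": {2019: 150, 2020: 145, 2021: 140},
}
print(project(history, 2025))  # → [('python', 220.0), ('java', 120.0)]
```

A model trained on real tag counts would replace `linear_trend`. A baseline like this is still useful for judging whether the more complex model actually does better.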

Built With

python

How to run the app locally

Installation Requirements:

Python >= 3.8
Node.js >= 12.0 (with npm)
pip >= 20
JupyterLab >= 3.4.7
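The Python-side requirements above can be checked from inside the virtual environment with a short script. This is a sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8). Its version parsing is deliberately rough; the `packaging` library's `Version` class is the more robust choice. The `MINIMUMS` table is our own transcription of the list, and the non-Python requirement is not checked here.

```python
import itertools
import sys

# Fail early: importlib.metadata below only exists on Python 3.8+.
if sys.version_info < (3, 8):
    sys.exit("Python 3.8 or newer is required")

from importlib.metadata import PackageNotFoundError, version  # noqa: E402

# Minimum versions transcribed from the requirements list.
MINIMUMS = {"pip": "20", "jupyterlab": "3.4.7"}


def version_tuple(text):
    """Rough parse: '3.4.7' -> (3, 4, 7); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        leading = "".join(itertools.takewhile(str.isdigit, piece))
        if not leading:
            break
        parts.append(int(leading))
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    for package, minimum in minimums.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    print(check_requirements() or "All Python-side requirements are met.")
```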

To run our JupyterLab notebook, follow these steps:

1. Install Python 3.
2. Create a virtual environment: python3 -m venv venv
3. Activate the virtual environment: source venv/bin/activate (on Windows: venv\Scripts\activate)
4. Install JupyterLab: pip install jupyterlab
5. Install the JupyterLab Git extension: pip install jupyterlab-git
6. Generate a GitHub personal access token.
7. Run JupyterLab: jupyter lab
8. Open the hosting URL in your browser. The URL is printed in your terminal/command line after completing step 7.
9. Install all imported dependencies using: pip install <package-name>
10. Download the "Train.csv" file to your local project directory. Note that this single file is too large (2.35 GB) to store in a GitHub repository.
11. You should see the JupyterLab notebook at the hosting link, as well as the Git extension in the left-hand panel of the web application. You can now use the Git extension to commit changes to our notebook through its user interface instead of the terminal/command line.
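Before opening the notebook, a quick check can confirm that the imported dependencies are installed and that Train.csv was downloaded. This is a minimal sketch using only the standard library. `REQUIRED_MODULES` is a hypothetical list and should be replaced with the modules the notebook actually imports. Import names can differ from pip package names; for example, `sklearn` is installed as `scikit-learn`.

```python
import importlib.util
from pathlib import Path

# Hypothetical list: replace with the modules the notebook actually imports.
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def data_file_ready(path="Train.csv"):
    """True if the dataset exists next to the notebook and is not empty."""
    csv_path = Path(path)
    return csv_path.is_file() and csv_path.stat().st_size > 0


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    if not data_file_ready():
        print("Train.csv not found in the current directory")
```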