I lived in Delhi for years, and I love the city and its people. However, I am appalled at its recent condition and took up this project to help improve it. In cities like Delhi, staying outdoors could be equivalent to smoking 50 cigarettes. The cause is smog and air pollution. Air pollution is the single biggest killer in the world, with some pollutants reducing average global life expectancy by 2.2 years. Smog causes more than 5 million deaths a year. According to the World Economic Forum, air pollution cost the global economy USD 29 trillion in 2018. What people do not understand is that this problem could easily be ended if certain international laws were followed. Chief among these are cap-and-trade schemes for emissions, which the United Nations calls emissions trading schemes (ETS) and which many countries already use. What stops them from working, however, is a very complicated problem: tracking pollutants. When a factory pollutes, it is very difficult to trace the pollutants back to it.

What it does

Smog-Pirate is an inventive, data-driven solution to the above problem. First, I studied years of weather and air pollution data and found correlations between some factors. Then, using multiple regression and adjusting the number of factors, I was able to forecast AQI (Air Quality Index) with >97% accuracy. I started with 17 inputs and ended with 4. I then compare the forecast AQI to the actual AQI, and if the actual value is much larger, I trace the pollutants to different factories based on wind velocities over the last day. I had to work through several complex steps to make this project. The first was scraping the web for weather data and live AQI, which I did with Beautiful Soup 4 and requests. Then I tried to make a novel algorithm for tracking pollutants, and it worked really well. I converted the wind velocities from the last day to vectors and added them to find the coordinates the pollution came from. I then referenced the known coordinates of factories to produce a list of polluters. I also had to use an API to find my current coordinates. Finally, I compared the AQI values and prescribed a penalty for the polluters.
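The tracking step described above can be sketched roughly as follows. This is an illustrative sketch, not the notebook's actual code: the function names, the one-reading-per-hour assumption, the meteorological convention that wind direction is the direction the wind blows from, and the sample factory list are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def wind_to_vector(speed_kmh, direction_deg):
    """Return the (east, north) velocity of the air in km/h.

    Assumes direction_deg is where the wind blows FROM, measured
    clockwise from north, so the air itself moves the opposite way.
    """
    theta = math.radians(direction_deg)
    return -speed_kmh * math.sin(theta), -speed_kmh * math.cos(theta)


def backtrack_source(lat, lon, hourly_winds):
    """Estimate where the air at (lat, lon) was before the readings began.

    hourly_winds is a list of (speed_kmh, direction_deg) pairs, one per
    hour (an assumption), so each velocity doubles as a displacement in km.
    """
    vectors = [wind_to_vector(s, d) for s, d in hourly_winds]
    east_km = sum(v[0] for v in vectors)
    north_km = sum(v[1] for v in vectors)
    # The source lies at the observer's position minus the total displacement.
    dlat = math.degrees(-north_km / EARTH_RADIUS_KM)
    dlon = math.degrees(-east_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return lat + dlat, lon + dlon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_factories(source, factories):
    """Sort (name, lat, lon) factories by distance from the estimated source."""
    return sorted(factories, key=lambda f: haversine_km(source[0], source[1], f[1], f[2]))


if __name__ == "__main__":
    # Hypothetical inputs: a steady 10 km/h northerly wind for 24 hours.
    source = backtrack_source(28.6, 77.2, [(10.0, 0.0)] * 24)
    factories = [("Factory A", 30.7, 77.2), ("Factory B", 26.0, 77.2)]
    print(source, rank_factories(source, factories)[0][0])
```

Treating the day's wind as a simple sum of vectors ignores turbulence and changes in wind along the path, so the result is best read as a rough starting point for a list of suspects rather than proof of a source.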

How I built it

I built the project in an .ipynb notebook using Python. First, I collected data from websites about AQI and weather in Delhi in past years. I made a multiple regression model to forecast AQI and narrowed 17 variables down to 4. I tried using a neural network too, but it was very inaccurate. I scraped live data from the web to predict AQI and compared the prediction to the actual AQI. If the actual AQI was greater than the predicted AQI, pollutant tracking was done. By converting wind velocities to vectors and adding them to my coordinates, which I got by geolocating my IP address, I found the coordinates where the pollution was created. I then referenced a list of known factories to create a list of the polluters closest to that point. I also created a penalty for polluters.
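The forecasting and comparison steps can be sketched as follows. This is a dependency-free illustration of multiple regression by ordinary least squares, not the notebook's actual model: the function names, the toy data, and the `margin` threshold that decides when the actual AQI counts as much larger than the forecast are all assumptions.

```python
def fit_linear(X, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of ones for the intercept
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefs, features):
    """Apply the fitted coefficients (intercept first) to one set of features."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], features))


def needs_tracking(actual_aqi, predicted_aqi, margin=25.0):
    """Flag readings where the actual AQI exceeds the forecast by more than margin.

    The default margin is a hypothetical value chosen for this example.
    """
    return actual_aqi - predicted_aqi > margin


if __name__ == "__main__":
    # Toy data with two made-up weather features per observation.
    X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
    y = [50 + 2 * a - 3 * b for a, b in X]
    coefs = fit_linear(X, y)
    forecast = predict(coefs, (6, 4))
    print(coefs, forecast, needs_tracking(180, forecast))
```

In practice a library such as scikit-learn or NumPy would handle the fitting; the hand-rolled solver is only there to keep the example self-contained.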

Challenges I ran into

I had not done web scraping before, and it took a lot of time to learn how to do it using Beautiful Soup and requests. Reducing the 17 variables to 4 while keeping a high accuracy also took a lot of time. Working alone was very difficult, but I had really wanted to create a project like this since late Friday.

Accomplishments that I'm proud of

I am very proud of the pollutant tracking algorithm I invented. Converting velocities to vectors that can be added is a real-world application of my knowledge, and it gives me a lot of confidence. I am also very proud that I was able to come up with a novel project of such complexity in so little time.

What I learned

I learned how to collect data from the web and plan to use this to automate data collection in other projects. I also learned that neural networks, while they sound cooler than most other algorithms, do not always analyse data in the way that is most useful for a project.

What's next for Smog-Pirate

The next step is to put the code on a website or web page so that the UX is better and people can access the data and view polluters. Governments can also be approached to implement the project at a small scale. The factory list can be changed so that nearby factories are obtained from an API instead of being entered by the user.
