Inspiration
Having grown up in an urbanizing region grappling with the double-edged sword of development, we witnessed firsthand the environmental degradation that often accompanies economic progress. Once-clear skies filled with smog, and rivers that had nourished communities for generations ran dry or became polluted. These experiences opened our eyes to the limitations of current environmental monitoring tools in rapidly evolving cities. Fragmented data and reactive policies fail to provide a holistic perspective or predict emerging threats. After studying machine learning, we realized its potential for integrating disparate data streams to enable proactive insights and interventions. By developing an adaptive AI system to synthesize environmental indicators, we hope to create the comprehensive diagnostics communities need to curb pollution early and balance sustainability with growth. This project is our opportunity to translate personal passion into an impactful solution that empowers data-driven environmental stewardship.
What it does
This project develops an automated pipeline that combines several data sources, manual labeling, ensemble modeling, and feature selection to assess environmental quality reliably. It provides data-driven insights into the fundamental components of environmental deterioration, which can be used to improve sustainability policies and activities. The project demonstrates machine learning's ability to enable evidence-based environmental assessments.
How we built it
The machine learning pipeline was developed and tested in Python, with the Indian Air Quality and Indian Water Quality datasets as the main sources. The pipeline was structured into stages to support high accuracy and precision. Data preprocessing involved calculating air quality indices from chemical concentrations and airborne particles, while data labeling involved manual review of global city data. A stacking ensemble model, consisting of a Random Forest Classifier, a Support Vector Classifier, and Logistic Regression, was employed for feature selection. A labeling classifier was introduced to create a comprehensive environmental quality label by combining the air and water quality indices and assigning holistic labels based on predefined scenarios. This weighted, multi-faceted labeling approach enhanced predictions by capitalizing on the strengths of both data sources.
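To make the combined labeling step more concrete, here is a minimal sketch of how an air quality category and a water quality category could be merged into one holistic label. The category names, the equal weights, and the round-toward-worse rule are all assumptions chosen for illustration. They are not the scenarios or weights used in our actual pipeline.

```python
import math

# Hypothetical categories, ordered from best (rank 0) to worst (rank 2).
LEVELS = ("good", "moderate", "poor")

# Hypothetical weights for each source; assumed equal here for illustration.
AIR_WEIGHT = 0.5
WATER_WEIGHT = 0.5


def combine_labels(air_label: str, water_label: str) -> str:
    """Merge an air and a water quality category into one overall label."""
    air_rank = LEVELS.index(air_label)
    water_rank = LEVELS.index(water_label)

    # Weighted average of the two ranks.
    score = AIR_WEIGHT * air_rank + WATER_WEIGHT * water_rank

    # Round up toward the worse category so that one bad reading is not
    # hidden by a good one. The small epsilon guards against float error.
    overall_rank = math.ceil(score - 1e-9)
    return LEVELS[overall_rank]


if __name__ == "__main__":
    for air, water in [("good", "good"), ("good", "poor"), ("moderate", "poor")]:
        print(air, water, "->", combine_labels(air, water))
```

Rounding toward the worse category is a deliberately conservative choice in this sketch: a city with clean air but badly polluted water is never labeled "good" overall.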
Challenges we ran into
A major obstacle we faced was the highly fragmented nature of environmental data, which is spread across disparate sources and separate indices and required extensive manual aggregation and harmonization before we could develop integrated insights. Additionally, many important metrics lack real-time, continuous monitoring and rely instead on intermittent reporting, which limits timeliness and predictive ability. Processing massive multivariate spatio-temporal datasets also posed scaling challenges that taxed computational resources and required thoughtful data infrastructure. Furthermore, insufficient labeled training data hindered supervised learning, so we had to turn to creative data augmentation and alternative training methods. The complex machine learning models needed for multivariate integration were also difficult to explain and make transparent, both of which are important for stakeholder adoption. Still, the potential impact of the project made overcoming these obstacles worthwhile.
Accomplishments that we're proud of
Our machine learning model achieved very high accuracy scores across a large dataset, and our website is able to showcase the model.
What we learned
We learned about multi-class labeling algorithms.
What's next for FuturaSustain: Environmental Monitor using Machine Learning
With an initial prototype developed, FuturaSustain is focused on refining our machine learning model and expanding integration across more environmental data streams. On the algorithmic side, we are tuning neural network architectures to optimize predictive accuracy and are augmenting training with generative processes to handle sparse datasets. To enhance comprehensiveness, we are incorporating more unstructured data sources, like social media and satellite feeds, alongside structured statistics.
Operationally, we are transitioning our pilot from simulated data to live city feeds in partnership with smart cities. As more urban centers use FuturaSustain for real-time monitoring and diagnosis, our model will improve rapidly through continuous learning. Our goal is a planetary-scale deployment that synthesizes data worldwide to share insights across cities.
Looking ahead, we aim to evolve FuturaSustain into a prescriptive system, moving beyond diagnostics to recommending tailored interventions for each city based on the specifics of its environmental challenges. We believe data-driven sustainability is the future and are excited to lead the way with our machine learning platform.