Inspiration
As part of a generation facing the climate crisis head-on, we have found it increasingly apparent how much influence financial firms and institutions have over carbon emissions and their impact on the planet. We believe there is a win-win situation in which policymakers can make informed decisions on productive legislation and investors can begin to gain traction in a carbon market projected to be worth $250bn.
What it does
Our platform unifies and analyses vast banks of data from various sources to provide actionable intelligence to regulators and investors. The core of our project is the financial market surrounding carbon credits and offsets. Because this market is immature, financial data alone does not tell the whole story about its future, so we enrich that data with sentiment analysis, geospatial data and government reporting to make the most of what is available to us.
How we built it
Our project has a very simple structure but complex individual parts. At the centre of the product is the data aggregation and predictive analytics pipeline, which combines vast datasets spanning stock prices, sentiment, geospatial data and government reporting. By bringing this information together and applying machine learning and statistical modelling, we can provide deep insights into the future of carbon offsets and emissions by region and sector.
Our web scraper, hosted by CodeWords, scans government reports and financial news sites for quotes and headlines about emissions and movements in carbon-related financial instruments. We continuously run sentiment analysis on the scraped text using a top-rated natural language processing model from HuggingFace, and the results feed into our super-dataset.
The super-dataset also takes in geospatial data from satellite imagery, which lets us exploit correlations between carbon offset price movements and the influxes or drops in large freight vehicles, ships, planes or construction sites that we spot in a region. We use computer vision, tree-based classifiers and linear regression to estimate the carbon footprint of these vehicles.
Combined with information from accredited researchers following carbon capture projects, as well as timeseries market data from KraneShares carbon allowance tracking ETFs, this lets us make informed spatial-temporal predictions about emissions and carbon offset price movements, as well as how many carbon offsets will be deposited, released and allocated by governments and projects.
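To illustrate how scored headlines might feed the super-dataset, here is a minimal sketch that rolls classifier results up into one sentiment value per region and day. It assumes each result is a dict with `label` and `score` keys, which is the shape HuggingFace text-classification pipelines typically return. The record keys (`region`, `date`, `result`), the function names and the sample values are all hypothetical, and this is not necessarily how our pipeline does it.

```python
from collections import defaultdict
from statistics import mean


def signed_score(result):
    """Map a {'label', 'score'} classifier result to a value in [-1, 1]."""
    label = result["label"].upper()
    if label == "POSITIVE":
        return result["score"]
    if label == "NEGATIVE":
        return -result["score"]
    return 0.0  # any other label (e.g. NEUTRAL) is treated as no signal


def daily_sentiment(records):
    """Average the signed sentiment per (region, date) pair.

    `records` is an iterable of dicts with the hypothetical keys
    'region', 'date' and 'result'.
    """
    buckets = defaultdict(list)
    for rec in records:
        key = (rec["region"], rec["date"])
        buckets[key].append(signed_score(rec["result"]))
    return {key: mean(values) for key, values in buckets.items()}


# Made-up example records, standing in for scored headlines.
records = [
    {"region": "EU", "date": "2024-01-02", "result": {"label": "POSITIVE", "score": 0.9}},
    {"region": "EU", "date": "2024-01-02", "result": {"label": "NEGATIVE", "score": 0.5}},
    {"region": "US", "date": "2024-01-02", "result": {"label": "NEUTRAL", "score": 0.7}},
]
features = daily_sentiment(records)
```

Keying the output on (region, date) means the sentiment feature can later be joined against other sources that share the same key.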
Challenges we ran into
The hardest part of all this was assembling a super-dataset that captures the complex correlations and relationships between its different elements. We combined multiple models with intricate preprocessing techniques to join our datasets meaningfully while preserving the spatial-temporal relationships hidden inside them.
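One common way to join sources that update at different rates without breaking their time ordering is an "as-of" join: each observation is paired with the most recent value known on or before its date. The sketch below shows that idea on made-up numbers. It is an illustration under our own assumptions (ISO date strings, a `ship_count` field, hypothetical function names), not a description of the preprocessing we actually used.

```python
from bisect import bisect_right


def asof_lookup(series, date):
    """Return the latest value in `series` dated on or before `date`, else None.

    `series` is a list of (iso_date, value) tuples sorted by date.
    ISO dates compare correctly as plain strings.
    """
    dates = [d for d, _ in series]
    i = bisect_right(dates, date)
    return series[i - 1][1] if i else None


def join_features(observations, prices_by_region):
    """Attach the most recent known price to each (region, date) observation."""
    joined = []
    for obs in observations:
        series = prices_by_region.get(obs["region"], [])
        joined.append({**obs, "price": asof_lookup(series, obs["date"])})
    return joined


# Made-up illustrative data: sparse price series and daily observations.
prices_by_region = {"EU": [("2024-01-01", 80.0), ("2024-01-03", 82.5)]}
observations = [
    {"region": "EU", "date": "2024-01-02", "ship_count": 14},
    {"region": "EU", "date": "2024-01-03", "ship_count": 9},
    {"region": "US", "date": "2024-01-02", "ship_count": 21},
]
rows = join_features(observations, prices_by_region)
```

Because the lookup never reads a price dated after the observation, the joined rows do not leak future market data into earlier dates. Regions with no price history get `None` rather than a guessed value.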
The more tedious issue was deployment. It was difficult to get all of our models deployed for free within the time constraints we were under. For some demonstrations, we had to pre-train our models and query them in advance to serve the data.
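Serving pre-queried results can be as simple as running the model once offline and reading the stored answers at demo time. The following is a rough sketch of that pattern, assuming predictions fit in a single JSON file. `toy_model`, the key format and the file name are hypothetical stand-ins rather than our real models or storage.

```python
import json
import os
import tempfile


def precompute(model, inputs, path):
    """Run `model` once per input and store the predictions as JSON."""
    predictions = {key: model(features) for key, features in inputs.items()}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(predictions, fh)


def load_predictions(path):
    """Load stored predictions so a demo can serve them without the model."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def toy_model(features):
    # Hypothetical stand-in for a trained model: the mean of the features.
    return round(sum(features) / len(features), 2)


# Made-up inputs keyed by "region|date".
inputs = {"EU|2024-01-02": [0.2, 0.4], "US|2024-01-02": [0.1, 0.3]}
path = os.path.join(tempfile.mkdtemp(), "predictions.json")
precompute(toy_model, inputs, path)
served = load_predictions(path)
```

The trade-off is that the demo can only answer queries that were anticipated when the file was generated.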
Accomplishments that we're proud of
Having a working project at all! Getting our models to acceptable error metrics, deploying them, setting up the backends and a frontend dashboard, and organising all of our data was a huge task that the entire team put a lot of work into.
What we learned
Machine learning gets a lot harder as the dependencies, relationships and size of your data increase. Exponentially harder. Optimisation, tuning, and model structure become much more important than when you are working with traditional timeseries forecasting.
What's next?
More data, more granularity, more analytics. There is so much data enrichment still to be done, and this platform can become a valuable tool that lets stakeholders work through vast amounts of connected data whose internal dependencies, joins and correlations have already been exploited. We have many more ideas involving national grid statistics, thermal imaging and global temperature statistics.