Inspiration

Sustainability claims are everywhere, but they are very hard to verify.

Businesses publish ESG and sustainability reports that claim reduced environmental impact, ethical sourcing, or carbon neutrality. However, these claims are often vague, selective, or misleading, so investors, regulators, and the general public struggle to judge whether they are true.

This leads to serious problems:

  • Greenwashing distorts decision-making.
  • Trust in ESG reporting is eroding.
  • Regulators lack scalable tools to validate claims.

We built Green Lens to close this gap.

Green Lens uses AI to automatically analyse ESG reports, identify environmental claims, and assess them against the Seven Sins of Greenwashing. It produces evidence-backed risk assessments in place of laborious manual audits.

What it does

  1. The user uploads an ESG or sustainability report (PDF).
  2. The system parses the document's text, structure, and tables.
  3. Environmental claims are extracted automatically.
  4. Each claim is assessed against the Seven Sins of Greenwashing.

The frontend then displays:

  • Risk scores for each sin
  • Evidence snippets from the report
  • A summary of key insights

How we built it

Technical Flow

Frontend

The frontend is a Streamlit dashboard where users can explore the results. It includes:

  • a radar chart to show Seven Sins risk scores
  • an evidence panel to inspect supporting text from the report
  • a summary section with key findings

Backend

The backend is built with FastAPI and handles the document analysis pipeline.

How the pipeline works

Step 1. PDF parsing

The system first reads the uploaded ESG or sustainability report PDF and extracts:

  • the main text
  • document structure
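Before claims can be cited, the raw PDF text has to be split into units that later steps can point back to. The sketch below assumes a PDF library has already returned the text of each page as a string (the write-up does not name the library, and tables are out of scope here). It splits that text into paragraph chunks tagged with their page number. `Chunk` and `chunk_pages` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    """A paragraph of report text plus its location (hypothetical schema)."""
    page: int   # 1-based page number, kept so evidence can be cited later
    index: int  # position of the paragraph among the kept chunks on its page
    text: str


def chunk_pages(pages: list[str], min_chars: int = 40) -> list[Chunk]:
    """Split raw per-page text into paragraph chunks.

    Assumption: paragraphs are separated by blank lines in the extracted text.
    Fragments shorter than min_chars (page numbers, short running headers)
    are dropped.
    """
    chunks: list[Chunk] = []
    for page_no, page_text in enumerate(pages, start=1):
        # Re-join words hyphenated across line breaks, e.g. "emis-\nsions".
        page_text = re.sub(r"(\w)-\n(\w)", r"\1\2", page_text)
        kept = 0
        for para in re.split(r"\n\s*\n", page_text):
            # Collapse internal line breaks and repeated whitespace.
            cleaned = " ".join(para.split())
            if len(cleaned) < min_chars:
                continue
            chunks.append(Chunk(page=page_no, index=kept, text=cleaned))
            kept += 1
    return chunks
```

The hyphen-joining rule is deliberately crude: it will also merge a genuine compound such as "well-known" if the line happens to break at its hyphen, which is one reason messy layouts make parsing inconsistent.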

Step 2. Claim extraction

Next, the system identifies environmental or sustainability-related claims in the report. This step uses a ClimateBERT-based model to detect and group relevant claims.
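One way to structure this step is to keep the model behind a plain callable, so it can be swapped or mocked. In the sketch below, `classify` is any function that returns the probability that a sentence is an environmental claim. In Green Lens that role is played by the ClimateBERT-based model; here `keyword_stub` (an assumption, not a real model) stands in so the example runs without model weights. Grouping by a small keyword table is likewise a hypothetical stand-in for however the project actually groups claims.

```python
import re
from typing import Callable

# Hypothetical topic keywords, used only to group claims for display.
# Dict order matters: the first topic whose keyword matches wins.
TOPICS = {
    "emissions": ("emission", "carbon", "co2", "net zero", "net-zero"),
    "energy": ("renewable", "solar", "wind", "energy"),
    "sourcing": ("sourcing", "supplier", "supply chain"),
}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def topic_of(sentence: str) -> str:
    lowered = sentence.lower()
    for topic, keywords in TOPICS.items():
        if any(k in lowered for k in keywords):
            return topic
    return "other"


def extract_claims(
    paragraphs: list[str],
    classify: Callable[[str], float],
    threshold: float = 0.5,
) -> dict[str, list[str]]:
    """Keep sentences the classifier scores at or above threshold, grouped by topic."""
    groups: dict[str, list[str]] = {}
    for para in paragraphs:
        for sentence in split_sentences(para):
            if classify(sentence) >= threshold:
                groups.setdefault(topic_of(sentence), []).append(sentence)
    return groups


def keyword_stub(sentence: str) -> float:
    """Stand-in for a ClimateBERT classifier; NOT a real model."""
    cues = ("we reduced", "we are committed", "carbon neutral", "sustainable", "eco-friendly")
    return 1.0 if any(c in sentence.lower() for c in cues) else 0.0
```

Swapping in a real classifier only means passing a different `classify` function; the grouping and thresholding logic stays the same.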

Step 3. Seven Sins analysis

Each extracted claim is then checked against the Seven Sins of Greenwashing, with a separate detection module for each sin:

  • Hidden Trade-Off
  • No Proof
  • Vagueness
  • Irrelevance
  • Lesser of Two Evils
  • Fibbing
  • False Labels

Step 4. RAG + LLM reasoning layer

After that, the system retrieves supporting evidence from the report and uses an LLM to:

  • judge how credible the claim is
  • explain why it may be risky
  • connect the score to actual evidence in the text
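The retrieval half of this step can be illustrated without any external service. The sketch below ranks report paragraphs by TF-IDF-weighted word overlap with a claim and packs the top matches into a numbered prompt for the LLM. This lexical retriever is a swapped-in technique chosen so the example is self-contained. The write-up does not say which retriever, embedding model, or LLM Green Lens uses, and `build_prompt` and its wording are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(claim: str, paragraphs: list[str], k: int = 2) -> list[str]:
    """Return up to k paragraphs with the highest TF-IDF overlap with the claim."""
    docs = [Counter(tokenize(p)) for p in paragraphs]
    n = len(docs)
    # Document frequency: how many paragraphs contain each term.
    df = Counter(term for doc in docs for term in doc)
    query = set(tokenize(claim))

    def score(doc: Counter) -> float:
        # Sum of tf * smoothed idf over terms shared with the claim; rarer terms weigh more.
        return sum(doc[t] * math.log((n + 1) / (df[t] + 1)) for t in query if t in doc)

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    # Drop paragraphs that share no weighted terms with the claim.
    return [paragraphs[i] for i in ranked[:k] if score(docs[i]) > 0]


def build_prompt(claim: str, evidence: list[str]) -> str:
    """Assemble a prompt asking the LLM for a judgement grounded in numbered evidence."""
    numbered = "\n".join(f"[{i}] {e}" for i, e in enumerate(evidence, start=1))
    return (
        "You are assessing a sustainability claim for greenwashing risk.\n"
        f"Claim: {claim}\n"
        f"Evidence from the report:\n{numbered}\n"
        "Rate the claim's credibility from 0 to 1, explain the main risk, "
        "and cite evidence by its [number]."
    )
```

Numbering the evidence lets the LLM's explanation point back to specific snippets, which is what allows the dashboard to connect a score to actual text in the report.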

Step 5. Score aggregation

The outputs from all modules are combined into a structured set of risk scores.
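The write-up does not specify how module outputs are combined. One simple option, shown below as an assumption, is to give each sin a report-level score that blends the mean risk across claims with the single worst claim, so one badly unsupported claim is not diluted by many harmless ones. The default 50/50 weighting (`alpha = 0.5`) is arbitrary and would need calibration against labelled reports.

```python
from statistics import mean

SINS = (
    "Hidden Trade-Off", "No Proof", "Vagueness", "Irrelevance",
    "Lesser of Two Evils", "Fibbing", "False Labels",
)


def aggregate(claim_scores: list[dict[str, float]], alpha: float = 0.5) -> dict[str, float]:
    """Combine per-claim sin scores into one report-level score per sin.

    Assumed rule: score = alpha * mean + (1 - alpha) * max, so a single
    high-risk claim still shows up even when most claims look fine.
    A sin missing from a claim's results counts as 0.0 for that claim.
    """
    report: dict[str, float] = {}
    for sin in SINS:
        values = [scores.get(sin, 0.0) for scores in claim_scores]
        if not values:
            report[sin] = 0.0
            continue
        report[sin] = round(alpha * mean(values) + (1 - alpha) * max(values), 3)
    return report
```

For example, Vagueness scores of 1.0, 0.0, and 0.5 across three claims give 0.5 × 0.5 + 0.5 × 1.0 = 0.75.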

Step 6. Final output

The backend returns the results as structured JSON, which the frontend uses to generate the dashboard visualisations.
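A shared, typed schema helps with the frontend/backend alignment problem. The dataclasses below are a hypothetical shape for that JSON; the field names are assumptions, not the project's actual schema. They carry per-sin scores for the radar chart, evidence snippets with page numbers for the evidence panel, and a summary string.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Evidence:
    sin: str      # which sin this snippet relates to
    claim: str    # the extracted claim being assessed
    snippet: str  # supporting or contradicting text from the report
    page: int     # page number, so users can check the source


@dataclass
class AnalysisResult:
    report_name: str
    sin_scores: dict[str, float]                       # feeds the radar chart
    evidence: list[Evidence] = field(default_factory=list)
    summary: str = ""

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses inside lists.
        return json.dumps(asdict(self), indent=2)
```

Defining the schema once and having both sides read it is one way to keep the radar chart, evidence panel, and summary in sync with what the backend actually returns.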

Challenges we ran into

One of the biggest challenges was that ESG reports are messy: PDF layouts vary widely, which makes parsing inconsistent.

Another challenge was that greenwashing detection is not just a text classification problem. Assessing a claim properly often requires surrounding context, a check for missing evidence, or comparison with the rest of the document.

We also found that some of the Seven Sins overlap conceptually, which makes clean category boundaries difficult.

On the engineering side, aligning the frontend and backend schemas took careful design, and the ML components added dependency and setup complexity to the backend.

Accomplishments that we're proud of

  • Built a working end-to-end prototype, from PDF upload to risk visualisation
  • Designed a modular Seven Sins pipeline
  • Connected risk categories to actual evidence in the report
  • Created a frontend structure that can cleanly support real backend outputs
  • Framed greenwashing detection as an explainability problem, not just a scoring problem

What we learned

Greenwashing detection is not just about labelling text. It needs context, evidence, and explainability.

We also learned that document quality matters a lot. If the PDF structure is weak, downstream NLP becomes much harder.

A modular architecture was important because each sin requires slightly different logic. Some depend more on language patterns, while others depend more on missing evidence or contextual judgment.

Most importantly, users trust a system more when it shows why something was flagged, not just what score it got.

What's next for Green Lens

Short term

  • Integrate the frontend upload flow directly with the backend API
  • Finalise a consistent response schema between backend and frontend
  • Add downloadable report export for users
  • Improve score calibration and the quality of explanations
  • Validate the system on a broader set of ESG reports

Longer term

  • Introduce a human-in-the-loop review workflow
  • Add persistent storage for reports and past analyses
  • Enable comparisons of the same company across reporting years
  • Develop tailored reporting views for regulators and investors
