About the project: Causality (SDG/ESG Evaluation Agent)
What inspired us
We kept seeing “green” investment recommendations that sounded convincing but were difficult to verify. Most real-world workflows still rely on opaque ESG scores, ad-hoc prompt tests, or narrative justifications, which makes it hard to tell whether a sustainability claim is evidence-supported or drifting into greenwashing. This gap motivated Causality: an evaluation-layer agent that audits sustainability reasoning instead of generating investment advice.
What we learned
Building Causality taught us how to convert a qualitative question (“Is this SDG-aligned?”) into something testable and repeatable:
- how to decompose narratives into atomic, verifiable claims,
- how to align claims to indicators/disclosures with traceable provenance,
- how rubric design affects reliability and false positives/negatives,
- how to reduce hallucinations by forcing evidence-grounded outputs and structured constraints.
How we built it
Causality is a modular pipeline that turns free-form recommendations into a structured audit:
- Input & normalization: ingest a recommendation (AI-generated or human-written) and standardize entities (company, sector, timeframe).
- Claim extraction: split text into testable sustainability claims (e.g., emissions reduction, labor practices, governance policies).
- Indicator mapping: map each claim to relevant SDG targets and ESG indicators/disclosures; attach citations/evidence snippets.
- Rubric scoring: apply SDG-aligned rubrics to score support strength, reasoning quality, and transparency.
- Risk flagging: detect red flags (missing evidence, vague claims, inconsistent logic, cherry-picked metrics).
- Report generation: output a compact scorecard plus an audit-style explanation and evidence links.
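To make the claim and risk-flagging steps above more concrete, here is a minimal sketch in Python. It is not the project's actual code: the `Claim` fields, the `VAGUE_TERMS` list, the flag names, and the example indicator id are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical list of vague marketing terms (an assumption, not the project's list).
VAGUE_TERMS = {"sustainable", "responsible", "green", "eco-friendly"}


@dataclass
class Claim:
    """One atomic sustainability claim extracted from a recommendation."""
    text: str
    indicator: Optional[str] = None  # mapped ESG/SDG indicator id, if any (assumed format)
    evidence: List[str] = field(default_factory=list)  # citations or evidence snippets


def flag_risks(claim: Claim) -> List[str]:
    """Return illustrative red flags for a single claim."""
    flags = []
    if not claim.evidence:
        flags.append("missing_evidence")
    if claim.indicator is None:
        flags.append("no_indicator")
    # Strip surrounding punctuation so "sustainable." still matches "sustainable".
    words = {w.strip(".,;:!?\"'()").lower() for w in claim.text.split()}
    if words & VAGUE_TERMS and claim.indicator is None:
        flags.append("vague_claim")
    return flags


# Made-up example inputs, for illustration only.
vague = Claim("The company is sustainable and responsible.")
grounded = Claim(
    "Scope 1 emissions fell year over year.",
    indicator="hypothetical-ghg-scope-1",
    evidence=["placeholder snippet from a disclosure"],
)
print(flag_risks(vague))
print(flag_risks(grounded))
```

In this sketch a claim with vague wording is only flagged as vague when it also lacks a mapped indicator, so measurable claims that happen to use marketing language are not penalized for the wording alone.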
A simple aggregation used for summarizing rubric dimensions is:
$$ S_{\text{overall}}=\sum_{k=1}^{K} w_k\, s_k, \qquad \sum_{k=1}^{K} w_k = 1 $$
where $s_k$ is the rubric score for dimension $k$ (e.g., evidence support, SDG alignment, transparency) and $w_k$ is its weight.
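The weighted sum can be sketched in a few lines of Python. The dimension names, scores, and weights below are made-up illustrative values, and `overall_score` is a hypothetical helper name rather than a function from the project.

```python
import math
from typing import Dict


def overall_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Compute S_overall = sum_k w_k * s_k, requiring the weights to sum to 1."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same dimensions")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    return sum(weights[k] * scores[k] for k in scores)


# Illustrative values only (assumed, not taken from the project).
scores = {"evidence_support": 0.8, "sdg_alignment": 0.6, "transparency": 0.5}
weights = {"evidence_support": 0.5, "sdg_alignment": 0.3, "transparency": 0.2}
print(overall_score(scores, weights))  # 0.5*0.8 + 0.3*0.6 + 0.2*0.5, roughly 0.68
```

Checking the weight constraint up front keeps a mis-specified rubric from silently inflating or deflating the overall score.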
Challenges we faced (and how we handled them)
- Ambiguous claims (e.g., “sustainable”, “responsible”): we enforced claim templates and required measurable indicators or disclosures.
- Evidence gaps / inconsistent disclosures: we treated “insufficient evidence” as a first-class outcome rather than forcing a confident score.
- Hallucination risk in LLM reasoning: we used structured outputs, strict evidence citations, and validation checks to keep the model grounded.
- Rubric calibration: we iterated on scoring rules using edge cases (strong marketing language, partial data, mixed SDG impacts) to reduce over-penalizing legitimate claims.
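One way to treat "insufficient evidence" as a first-class outcome, as described above, is to make it an explicit verdict instead of a low score. The sketch below is an assumption about how this could look: the `Verdict` names, the `assess` function, and the 0.5 threshold are hypothetical and not taken from the project.

```python
from enum import Enum
from typing import List, Optional


class Verdict(Enum):
    SUPPORTED = "supported"
    NOT_SUPPORTED = "not_supported"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"


def assess(
    evidence: List[str],
    support_score: Optional[float],
    threshold: float = 0.5,  # hypothetical cut-off, chosen for illustration
) -> Verdict:
    """Return an explicit verdict; never force a score when evidence is missing."""
    if not evidence or support_score is None:
        return Verdict.INSUFFICIENT_EVIDENCE
    if support_score >= threshold:
        return Verdict.SUPPORTED
    return Verdict.NOT_SUPPORTED


# Illustrative calls with placeholder inputs.
print(assess([], None))
print(assess(["placeholder disclosure snippet"], 0.7))
print(assess(["placeholder disclosure snippet"], 0.2))
```

Keeping the third verdict separate means downstream reports can distinguish "the evidence contradicts this claim" from "there was nothing to check it against".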
Outcome
Causality turns “trust me” sustainability narratives into a repeatable evaluation: a scorecard with evidence links and greenwashing risk flags, designed to plug into compliance and portfolio workflows.
Built With
- api
- interactive-data
- natural-language-processing
- python
- webplatform