Gradient Gang

About the project: Causality (SDG/ESG Evaluation Agent)

What inspired us

We kept seeing “green” investment recommendations that sounded convincing but were difficult to verify. Most real-world workflows still rely on opaque ESG scores, ad-hoc prompt tests, or narrative justifications, which makes it hard to tell whether a sustainability claim is evidence-supported or drifting into greenwashing. This gap motivated Causality: an evaluation-layer agent that audits sustainability reasoning instead of generating investment advice.

What we learned

Building Causality taught us how to convert a qualitative question (“Is this SDG-aligned?”) into something testable and repeatable:

how to decompose narratives into atomic, verifiable claims,
how to align claims to indicators/disclosures with traceable provenance,
how rubric design affects reliability and false positives/negatives,
how to reduce hallucinations by requiring evidence-grounded outputs and imposing structured constraints.

How we built it

Causality is a modular pipeline that turns free-form recommendations into a structured audit:

Input & normalization: ingest a recommendation (AI- or human-written) and standardize its entities (company, sector, timeframe).
Claim extraction: split text into testable sustainability claims (e.g., emissions reduction, labor practices, governance policies).
Indicator mapping: map each claim to relevant SDG targets and ESG indicators/disclosures; attach citations/evidence snippets.
Rubric scoring: apply SDG-aligned rubrics to score support strength, reasoning quality, and transparency.
Risk flagging: detect red flags (missing evidence, vague claims, inconsistent logic, cherry-picked metrics).
Report generation: output a compact scorecard plus an audit-style explanation and evidence links.
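The claim-extraction and risk-flagging stages can be illustrated with a minimal sketch. It assumes a hypothetical `Claim`/`Evidence` data model, uses naive sentence splitting as a stand-in for the actual extraction step, and checks only two of the red flags listed above (missing evidence, vague claims). Detecting inconsistent logic or cherry-picked metrics needs model reasoning and is out of scope here. The vocabulary in `VAGUE_TERMS` and the example text are invented for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical vocabulary of vague sustainability terms (illustrative only).
VAGUE_TERMS = {"sustainable", "responsible", "green", "eco-friendly"}


@dataclass
class Evidence:
    source: str   # identifier of a disclosure or report (hypothetical)
    snippet: str  # quoted text that supports the claim


@dataclass
class Claim:
    text: str
    sdg_targets: list[str] = field(default_factory=list)  # e.g. "13.2"
    indicators: list[str] = field(default_factory=list)   # measurable indicators
    evidence: list[Evidence] = field(default_factory=list)


def extract_claims(text: str) -> list[str]:
    """Naive stand-in for claim extraction: one sentence per candidate claim."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def flag_risks(claim: Claim) -> list[str]:
    """Return rule-based red flags for a single claim."""
    flags = []
    if not claim.evidence:
        flags.append("missing_evidence")
    words = {w.strip(".,;:!?\"'").lower() for w in claim.text.split()}
    # A vague term is only a problem if no measurable indicator backs it.
    if words & VAGUE_TERMS and not claim.indicators:
        flags.append("vague_claim")
    return flags


if __name__ == "__main__":
    rec = ("The company is committed to sustainable growth. "
           "It reduced Scope 1 emissions year over year.")
    claims = [Claim(text=s) for s in extract_claims(rec)]
    claims[1].indicators.append("Scope 1 GHG emissions")
    claims[1].evidence.append(Evidence("report_2023", "Scope 1 emissions decreased"))
    for c in claims:
        print(c.text, "->", flag_risks(c))
    # -> first claim: ['missing_evidence', 'vague_claim']; second claim: []
```

Keeping each claim as a small structured record lets later stages attach SDG targets, indicators, and evidence without re-parsing free text.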

A simple aggregation used for summarizing rubric dimensions is:

$$ S_{\text{overall}}=\sum_{k=1}^{K} w_k\, s_k, \qquad \sum_{k=1}^{K} w_k = 1 $$

where $s_k$ is the rubric score for dimension $k$ (e.g., evidence support, SDG alignment, transparency) and $w_k$ is its weight.
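The aggregation above can be computed directly. The following sketch assumes dimension scores on a 0 to 1 scale. The dimension names and weights in the example are hypothetical, not the project's calibrated values. The function rejects weight sets that do not sum to 1.

```python
import math


def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Compute S_overall = sum_k w_k * s_k, requiring sum_k w_k = 1."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same rubric dimensions")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[k] * scores[k] for k in scores)


if __name__ == "__main__":
    scores = {"evidence_support": 0.8, "sdg_alignment": 0.6, "transparency": 0.5}
    weights = {"evidence_support": 0.5, "sdg_alignment": 0.3, "transparency": 0.2}
    print(round(overall_score(scores, weights), 2))  # 0.5*0.8 + 0.3*0.6 + 0.2*0.5 = 0.68
```

Because the weights sum to 1, $S_{\text{overall}}$ stays on the same scale as the individual rubric scores.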

Challenges we faced (and how we handled them)

Ambiguous claims (e.g., “sustainable”, “responsible”): we enforced claim templates and required measurable indicators or disclosures.
Evidence gaps / inconsistent disclosures: we treated “insufficient evidence” as a first-class outcome rather than forcing a confident score.
Hallucination risk in LLM reasoning: we used structured outputs, strict evidence citations, and validation checks to keep the model grounded.
Rubric calibration: we iterated on scoring rules using edge cases (strong marketing language, partial data, mixed SDG impacts) to reduce over-penalizing legitimate claims.
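Two of these mitigations, treating "insufficient evidence" as a first-class outcome and validating structured outputs against the retrieved evidence, can be sketched as follows. This assumes a hypothetical JSON-like output schema with `claim`, `verdict`, `score`, and `citations` fields and a 0 to 1 score range. The project's actual schema and checks may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    SUPPORTED = "supported"
    UNSUPPORTED = "unsupported"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"


@dataclass
class ClaimAssessment:
    claim: str
    verdict: Verdict
    score: Optional[float]  # None when evidence is insufficient
    citations: list[str]


def validate(raw: dict, known_sources: set[str]) -> ClaimAssessment:
    """Validate one structured model output (hypothetical schema)."""
    verdict = Verdict(raw["verdict"])  # raises ValueError on unknown labels
    citations = list(raw.get("citations", []))
    # Every citation must point at evidence that was actually retrieved.
    unknown = [c for c in citations if c not in known_sources]
    if unknown:
        raise ValueError(f"citations not in retrieved evidence: {unknown}")
    if verdict is Verdict.INSUFFICIENT_EVIDENCE:
        # Do not force a confident score when the evidence is missing.
        return ClaimAssessment(raw["claim"], verdict, None, citations)
    if not citations:
        raise ValueError("a scored verdict must cite at least one evidence source")
    score = float(raw["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    return ClaimAssessment(raw["claim"], verdict, score, citations)
```

Rejecting citations that do not match retrieved sources catches one common hallucination pattern: the model citing a plausible but nonexistent disclosure.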

Outcome

Causality turns “trust me” sustainability narratives into a repeatable evaluation: a scorecard with evidence links and greenwashing risk flags, designed to plug into compliance and portfolio workflows.

GitHub Repository