R.I.S.E. — Reputation-Integrated Scoring Engine

Community-vouched alternative credit scoring for the 1.3 billion adults invisible to traditional finance.

Hack the Globe 2026 | Economic Empowerment Track

Inspiration

1.3 billion adults worldwide lack a financial account. Over 2 billion work in the informal economy — street vendors, gig drivers, domestic workers — with no financial paper trail. The global MSME and informal credit gap sits at USD 8 trillion (IFC). These aren't risky borrowers. They're invisible ones.

Legacy credit systems like FICO were designed in 1989 for salaried workers with bank accounts, credit cards, and mortgages. That framework structurally excludes the majority of borrowers in emerging markets — and 26 million credit-invisible Americans.

We studied the solutions that came before and found that each solved part of the problem while creating new ones:

Grameen Bank proved the poor are creditworthy but introduced coercive peer pressure dynamics that excluded widows and migrants.
M-Pesa achieved 91% mobile penetration in Kenya but created surveillance risks and ignored social context.
India's PMJDY opened 180 million accounts in a year, but 40%+ went dormant — access without relevance.
Tala and Branch brought digital speed but rely on black-box models, face GDPR/NDPR pressure, and treat borrowers as isolated digital entities.

R.I.S.E. was designed to combine the strengths of each while directly addressing their failures. The core question: can we turn informal income and real business relationships into a financial identity — without surveillance, coercion, or black boxes?

What it does

R.I.S.E. is a B2B2C credit scoring platform that scores borrowers who are invisible to traditional finance by combining three pillars:

1. Alternative Data Scoring

A LightGBM classifier trained on mobile money flows, utility payment regularity, airtime top-up patterns, and merchant transaction history. Research from the IFC and BSP has validated these as reliable proxies for repayment capacity in markets like Kenya and the Philippines. No need to trust verbal income claims — M-Pesa transaction history proves it.

2. Verified References (Transaction-Backed)

References in R.I.S.E. are business and transaction relationships, not a blind social graph. A reference like "I've supplied Maria with fresh produce for three years" is checked against real cash-flow patterns and verification rules. Unverifiable references receive zero credit in the model. This directly mitigates the coercive peer pressure dynamics that undermined Grameen-style joint liability, while preserving a genuine economic signal.

This pairs with network contamination scoring — exposure to risky or unverified counterparties is a separate model input that penalizes blind trust in default-prone clusters.

3. SHAP Explainability

Every credit decision comes with a per-applicant factor breakdown showing exactly which features helped or hurt the score. Loan officers see why, not just what. This is designed to support the transparency expectations of GDPR Article 22 and Nigeria's NDPR, which is critical for adoption by regulated MFIs.
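As a rough illustration of the factor breakdown, the sketch below splits per-feature contributions into those that helped and those that hurt. The contribution values are made up for illustration, the names late_payments and debt_ratio are hypothetical, and the sign convention (positive means the score went up) is an assumption: raw SHAP values for a default classifier would point the other way and need flipping.

```python
from typing import Dict, List, Tuple

Driver = Tuple[str, float]


def split_drivers(
    contributions: Dict[str, float], top_n: int = 3
) -> Tuple[List[Driver], List[Driver]]:
    """Return the top positive and top negative score drivers."""
    # Assumed convention: positive contribution = feature raised the score.
    helped = sorted(
        ((f, v) for f, v in contributions.items() if v > 0),
        key=lambda fv: fv[1],
        reverse=True,
    )
    hurt = sorted(
        ((f, v) for f, v in contributions.items() if v < 0),
        key=lambda fv: fv[1],
    )
    return helped[:top_n], hurt[:top_n]


# Illustrative numbers only, not output from the trained model.
example = {
    "voucher_centrality": 0.42,
    "mpesa_regularity_score": 0.31,
    "utility_streak_months": 0.08,
    "late_payments": -0.27,  # hypothetical feature name
    "debt_ratio": -0.12,     # hypothetical feature name
}
helped, hurt = split_drivers(example)
```

A dashboard could render `helped` and `hurt` as two bar groups, which is one way to give loan officers the "why" next to the score.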

The Product

The loan officer dashboard (React SPA) lets officers select a borrower, view their mobile money signals and credit behavior, run a credit score via our FastAPI backend, and see the full SHAP breakdown, network contamination index, and verified references — all in one interface.

Demo personas tell the story:

Persona	Background	Score	What It Shows
Maria Otieno	Market vendor, Nairobi. Consistent M-Pesa, one late payment.	64 → 80 (with verified reference)	Borderline borrower approved through community vouching
James Okonkwo	Gig driver, Manila. Strong digital trail, zero delinquencies.	~83	R.I.S.E. correctly identifies strong borrowers
Aisha Mbeki	Recent migrant, Dar es Salaam. Limited history.	~37	R.I.S.E. doesn't rubber-stamp everyone — responsible scoring

Competitive Positioning

Feature	R.I.S.E.	Tala / Branch	Grameen-Style MFIs
Core Signal	Cash flow + verified refs + network contamination	Device metadata	Joint liability groups
Explainability	SHAP per-applicant	Black box	Manual / subjective
Regulatory Risk	Low (no invasive data)	High (GDPR/NDPR)	Low
Scalability	API-first, near-zero marginal cost	High (digital-native)	Low (in-person ops)
Coercion Risk	Mitigated via cash-flow overlay	None	High (peer pressure)

How we built it

Architecture

Frontend (Vite + React SPA)
        ↕ POST /score
FastAPI API (main.py)
  - Reference graph lookup (voucher_graph.json)
  - Network contamination computation
  - SHAP TreeExplainer
        ↕ predict_proba
LightGBM Model (16 features, AUC ~0.88)
  - Trained on 150K labeled records
  - Kaggle GMSC + synthetic M-Pesa augmentation

Data Pipeline

Base data: Kaggle "Give Me Some Credit" dataset — 150,000 labeled borrower records with income, debt ratio, delinquency history, and repayment outcomes.
Synthetic augmentation (data_gen.py): We generated four features to simulate the mobile money data R.I.S.E. would ingest in production via M-Pesa/GCash APIs — mpesa_regularity_score, airtime_topups_monthly, utility_streak_months, and merchant_txn_density. A fifth feature, voucher_centrality, encodes verified reference strength (~70% of rows get 0.0, ~30% get a value correlated with good outcomes). A sixth, network_contamination_risk, is derived from the same heuristic used at inference time.
Preprocessing (preprocess.py): Median imputation for missing income, outlier capping at the 99th percentile, 80/20 stratified train/validation split.
Training (train.py): LightGBM classifier with balanced class weights (93/7 class imbalance). Validation AUC-ROC consistently hits ~0.87–0.88 after adding the mobile money, voucher, and network features.
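The preprocessing steps above can be sketched with the standard library alone. This is a minimal illustration, not the contents of preprocess.py: the function names, the nearest-rank percentile rule, and the seed are assumptions.

```python
import math
import random
import statistics
from typing import Dict, List, Optional, Tuple


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def cap_at_percentile(values: List[float], pct: float = 99.0) -> List[float]:
    """Clip values above the pct-th percentile (nearest-rank definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


def stratified_split(
    labels: List[int], val_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, val_idx), keeping class ratios similar in both."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train: List[int] = []
    val: List[int] = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_fraction)
        val.extend(idxs[:n_val])
        train.extend(idxs[n_val:])
    return sorted(train), sorted(val)


incomes = impute_median([1000.0, None, 3000.0, 2000.0])
capped = cap_at_percentile([float(i) for i in range(1, 100)] + [10000.0])
labels = [0] * 93 + [1] * 7  # mirrors the 93/7 class ratio
train_idx, val_idx = stratified_split(labels)
```

Stratifying matters here because, with only about 7% defaults, a purely random 20% split could leave the validation set with too few positive cases to evaluate on.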

Scoring Flow

When a loan officer clicks "Calculate R.I.S.E.":

1. Frontend sends borrower features + optional reference ID to POST /score
2. API looks up the reference in voucher_graph.json: if verified, sets voucher_centrality; otherwise 0.0
3. API computes network_contamination_risk (0–1) using the same formula as training: weaker M-Pesa regularity, failed reference checks, and missing verification raise contamination
4. LightGBM runs predict_proba on all 16 features → default probability
5. Score = (1 - default_probability) × 100 on a 0–100 scale
6. SHAP TreeExplainer computes per-feature contributions
7. API returns score, risk band, SHAP breakdown, reference metadata, contamination index, and network graph payload
8. Frontend renders the score gauge, SHAP drivers, network map, verified references, and benchmark comparison
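The score conversion in the flow above is simple arithmetic and can be sketched as follows. The formula comes from the flow itself; the risk-band thresholds (70 and 50) and band names are hypothetical, since the writeup does not state them.

```python
def to_rise_score(default_probability: float) -> float:
    """Map a default probability in [0, 1] to a 0-100 score."""
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    return (1.0 - default_probability) * 100.0


def risk_band(score: float) -> str:
    """Bucket a score into a band. Thresholds are assumed, not from the project."""
    if score >= 70.0:
        return "low"
    if score >= 50.0:
        return "medium"
    return "high"


# A default probability of 0.20 corresponds to a score of about 80.
score = to_rise_score(0.20)
band = risk_band(score)
```

Under these assumed thresholds, a borrower at 64 would sit in the middle band and one at 80 in the lowest-risk band, which is the kind of shift the Maria Otieno persona illustrates.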

The Voucher Mechanism (Key Innovation)

The voucher score shift is not hardcoded — it comes from the trained model. Because voucher_centrality was included as a training feature with variance correlated to good outcomes, LightGBM learned that high voucher centrality predicts repayment. At inference:

V001 (Joseph Kamau, supplier, centrality 0.72): Strong positive SHAP driver when verified — this is what pushes Maria from 64 to 80.
V002 (Agnes Wanjiku, neighboring vendor, centrality 0.58): Moderate boost when verified.
V003 (Joseph Mwangi, unverified contact, centrality 0.0): Fails verification — API sets voucher_centrality to 0.

This directly addresses the collusion and coercion problems identified in social collateral research.
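A minimal sketch of the lookup step, assuming a schema for voucher_graph.json with one entry per reference ID. The field names (name, relationship, centrality, verified) and the function name are assumptions; the IDs and centrality values are the ones listed above.

```python
from typing import Any, Dict, Optional

# Assumed shape of voucher_graph.json once loaded into memory.
VOUCHER_GRAPH: Dict[str, Dict[str, Any]] = {
    "V001": {"name": "Joseph Kamau", "relationship": "supplier",
             "centrality": 0.72, "verified": True},
    "V002": {"name": "Agnes Wanjiku", "relationship": "neighboring vendor",
             "centrality": 0.58, "verified": True},
    "V003": {"name": "Joseph Mwangi", "relationship": "unverified contact",
             "centrality": 0.0, "verified": False},
}


def resolve_voucher_centrality(
    reference_id: Optional[str], graph: Dict[str, Dict[str, Any]]
) -> float:
    """Return the centrality to feed the model, or 0.0 if not verified."""
    if reference_id is None:
        return 0.0  # no reference supplied
    entry = graph.get(reference_id)
    if entry is None or not entry.get("verified", False):
        return 0.0  # unknown or failed verification earns no credit
    return float(entry["centrality"])
```

The important design property is that this function only selects a feature value. Any score change still has to come from the trained model, which matches the claim that the shift is learned rather than hardcoded.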

Validation

Model performance: AUC-ROC of ~0.88 on held-out validation set (80/20 stratified split, 150K records)
SHAP consistency: Feature attributions are stable across runs and align with domain expectations (verified references and M-Pesa regularity are top positive drivers; late payments and high debt ratio are top negative drivers)
Demo honesty: Reference detail pages show illustrative transaction narratives for the hackathon demo. Scores and SHAP values come from the trained model and live API responses — not mocked.
Network contamination: The contamination index tracks expected behavior — verified references with strong payment history lower contamination; unverified or risky counterparties raise it
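For readers less familiar with the AUC-ROC metric quoted above, it can be read as the probability that a randomly chosen defaulter is ranked as riskier than a randomly chosen non-defaulter. The sketch below computes it by brute-force pair counting; it is an illustration of the metric, not the project's evaluation code, and it is too slow for 150K records.

```python
from typing import List


def auc_roc(labels: List[int], scores: List[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    labels use 1 for default and 0 for non-default; scores are predicted
    default probabilities. Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Small worked example: 3 of the 4 positive/negative pairs are ordered correctly.
example_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

On this reading, an AUC of about 0.88 means the model orders such a pair correctly roughly 88% of the time on the validation set.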

Tech Stack

Component	Technology	Why
ML Model	LightGBM	Production-grade gradient boosting; handles class imbalance natively; fast inference
Explainability	SHAP (TreeExplainer)	Gold standard for per-prediction feature attribution; regulatory-grade
API	FastAPI	Async Python, auto-generated OpenAPI docs, Pydantic validation
Frontend	Vite + React 18 + React Router + Recharts + Chart.js + Tailwind CSS	SPA with dashboard and reference detail pages, network contamination panel, verified references
Data	Kaggle GMSC + synthetic augmentation	150K labeled records; synthetic features bridge to M-Pesa production data

Challenges we ran into

Data access is the fundamental challenge. Mobile money operators like Safaricom (M-Pesa) don't share transaction data openly. For the hackathon, we bridged this with synthetic augmentation, generating realistic mobile money features calibrated to published M-Pesa usage statistics. The pipeline is designed so that swapping synthetic features for real Daraja API data requires changing only the ingestion layer; the feature schema and model architecture stay the same.

Class imbalance (93/7 split). The Kaggle dataset has a ~93% non-default / 7% default ratio. We addressed this with LightGBM's native is_unbalance flag and validated that recall on the minority class was reasonable without sacrificing precision.
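To make the reweighting concrete, the sketch below computes the weights implied by a 93/7 split. It uses the common "balanced" heuristic (total / (number of classes × class count)); whether train.py uses exactly this, or LightGBM's is_unbalance or scale_pos_weight parameters, is not stated here, so treat the numbers as illustrative.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[int, int]) -> Dict[int, float]:
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Per 100 records: 93 non-defaults (label 0) and 7 defaults (label 1).
counts = {0: 93, 1: 7}
weights = balanced_class_weights(counts)

# Equivalent single ratio, the kind of value one might pass to LightGBM's
# scale_pos_weight parameter (negatives divided by positives).
pos_weight_ratio = counts[0] / counts[1]
```

Each default ends up counting roughly 13 times as much as a non-default during training. LightGBM documents is_unbalance and scale_pos_weight as alternatives, so only one of the two should be set.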

Voucher signal design. Getting the model to learn voucher centrality as a genuine predictive signal (rather than noise) required careful synthetic generation — correlating voucher presence with good outcomes at realistic rates (~30% voucher coverage, not 100%) so the model doesn't overfit to it.
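One way such a feature could be generated is sketched below. Only the roughly 30% coverage and the correlation with good outcomes come from the description above; the uniform ranges, the function name, and the seed are hypothetical and are not taken from data_gen.py.

```python
import random


def synth_voucher_centrality(
    defaulted: bool, rng: random.Random, coverage: float = 0.30
) -> float:
    """Draw a synthetic voucher_centrality value for one borrower."""
    if rng.random() >= coverage:
        return 0.0  # about 70% of rows have no verified reference
    # Assumed ranges: overlapping, so the signal is informative but noisy.
    if defaulted:
        return round(rng.uniform(0.05, 0.45), 3)
    return round(rng.uniform(0.35, 0.95), 3)


rng = random.Random(7)
repaid = [synth_voucher_centrality(False, rng) for _ in range(10_000)]
defaulters = [synth_voucher_centrality(True, rng) for _ in range(10_000)]

zero_share = sum(1 for v in repaid if v == 0.0) / len(repaid)
mean_repaid = sum(v for v in repaid if v > 0) / sum(1 for v in repaid if v > 0)
mean_default = sum(v for v in defaulters if v > 0) / sum(1 for v in defaulters if v > 0)
```

Overlapping ranges are the point of the exercise: if vouched borrowers never defaulted, the model could lean on this single feature and overfit to it.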

Network contamination calibration. Tuning the contamination heuristic so it penalizes real risk exposure without unfairly punishing borrowers in dense informal networks was iterative. We anchored the formula to payment stability and verification status rather than network size.
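The actual contamination formula is not given in this writeup, so the following is only a hypothetical sketch of a heuristic with the stated properties: it depends on payment stability and verification status, and it takes no network-size input. The weights (0.6 and 0.4), the penalty levels, and the assumption that mpesa_regularity_score lies in 0–1 are all invented for illustration.

```python
def clamp01(x: float) -> float:
    """Restrict a value to the [0, 1] interval."""
    return max(0.0, min(1.0, x))


def network_contamination_risk(
    mpesa_regularity: float,
    reference_provided: bool,
    reference_verified: bool,
) -> float:
    """Hypothetical contamination index in [0, 1]; higher means riskier."""
    # Weaker payment regularity raises contamination.
    instability = 1.0 - clamp01(mpesa_regularity)
    if reference_verified:
        verification_penalty = 0.0
    elif reference_provided:
        verification_penalty = 1.0  # a reference that failed its check
    else:
        verification_penalty = 0.5  # no verification available
    # Note: network size is deliberately not an input.
    return clamp01(0.6 * instability + 0.4 * verification_penalty)


verified = network_contamination_risk(0.9, True, True)
missing = network_contamination_risk(0.9, False, False)
failed = network_contamination_risk(0.9, True, False)
```

Because the same function would be called in both data generation and the API, the feature the model sees at inference has the same meaning it had during training.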

Frontend complexity. Building a polished loan officer dashboard with a score gauge, SHAP chart, network contamination graph, verified references section, and reference detail pages — all in a hackathon timeframe — required aggressive scope management and component-level division of labor.

Accomplishments that we're proud of

The voucher mechanism works as designed. The score shift from verified references is learned by the model, not hardcoded. Maria Otieno's score genuinely moves from 64 to 80 when Joseph Kamau's reference is verified — and SHAP correctly attributes that shift to "Verified reference strength." This is the core innovation and it holds up.

AUC of ~0.88 on a real dataset. This isn't a toy model. 150,000 labeled records, 16 features, proper train/validation split, and consistent performance across runs.

Full-stack integration. The frontend hits a real API that runs a real model with real SHAP computation. Nothing is mocked on the scoring side. The loan officer can change inputs and get meaningfully different scores and explanations.

Network contamination as a model feature. Most alternative credit scoring approaches ignore counterparty risk entirely. We built it as a first-class model input that's computed at inference time, visualized on the dashboard, and explained via SHAP.

Demo honesty. We were transparent about what's real (model, scores, SHAP, API) and what's illustrative (transaction narratives on the reference detail page). The README and UI both label the illustrative components clearly.

What we learned

Alternative data is powerful but access is the bottleneck. The ML side of alternative credit scoring is well-understood — the real barrier to adoption is getting mobile money operators to share data through APIs. Our production roadmap starts with Safaricom's Daraja API for exactly this reason.

Explainability changes the product. Adding SHAP didn't just satisfy a regulatory checkbox — it changed how we designed the dashboard. When loan officers can see why a score moved, they trust the system enough to act on it. Black-box scores get ignored.

Community vouching needs guardrails. Naive social graph scoring recreates the coercion problems of joint liability. Transaction-backed verification and network contamination scoring are the guardrails that make community signals safe to use.

Scope discipline matters. We cut several planned features (multi-applicant batch scoring, historical score tracking, PDF report export) to ship a polished core experience. The features we kept all work end-to-end.

What's next for R.I.S.E.

Short-Term Improvements (1–2 weeks)

PostgreSQL persistence: Replace in-memory JSON lookups with a proper database for borrower records, reference graphs, and score history
JWT authentication: Secure the API for multi-tenant use by MFI partners
Batch scoring endpoint: Allow loan officers to score multiple applicants in a single request
PDF report export: Generate downloadable credit reports with score, SHAP breakdown, and reference verification details for compliance records
Real feature engineering pipeline: Replace synthetic M-Pesa features with a structured ingestion layer ready to accept Daraja API data

Long-Term Scalability (6–12+ months)

Safaricom Daraja API integration: Connect to real M-Pesa transaction data for the Kenya pilot (target: 500–1,000 borrowers with 3–5 MFI partners)
Model retraining pipeline: Airflow + MLflow for automated retraining as real repayment outcome data accumulates
GDPR/NDPR compliance certification: Formal audit trail, data retention policies, and right-to-explanation infrastructure
Public API marketplace: Per-query pricing ($0.08–0.15/call) + monthly SaaS tiers ($500–2,000) for MFIs and digital lenders
White-label dashboard: Customizable frontend that MFI partners can brand and deploy to their loan officers
Credit bureau partnerships: Feed R.I.S.E. scores back into national credit systems so borrowers build formal financial identity over time
Geographic expansion: Philippines (GCash integration, 57.4% digital payment share) and additional East African markets