Inspiration

Infrastructure budgets in institutional and government contexts are routinely allocated without any real-time mechanism for stakeholders to verify whether funds are reaching the field. Reports can be self-filed. Photos can be staged. Ledger records can be silently altered after entry.

As Cyber Security students, we recognised this immediately as a data integrity failure — the attack surface is the audit trail itself: centralised, mutable, and unverified. Citizens — the actual taxpayers funding these projects — have zero visibility or agency over where their money goes.

Our goal was to close that attack surface permanently and put control back in the hands of citizens.


What it does

Track-My-Tax is a full-stack civic technology platform that fuses three independent verification technologies into a single auditable system:

  • Blockchain-based escrow — Every rupee is traceable on-chain via a live Solidity smart contract (CitizenCredits.sol) on Ganache. Citizens choose exactly which projects receive their CitizenCredits, with every allocation recorded immutably.
  • AI-powered risk prediction — A RandomForest classifier (200 trees, 88.2% CV accuracy) trains in real time on live MySQL project data, automatically flagging financial risk before it becomes a loss.
  • 3-stage image forensics — Every field photo uploaded by Faculty passes through GPS proximity (≤ 500m Haversine), 7-day timestamp recency, and MobileNetV2 classification before any fund release is triggered. Photos with no EXIF metadata are routed to UNDER REVIEW — not auto-rejected — preserving human judgment for ambiguous cases.
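
The three forensic checks described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the REJECTED and VERIFIED status strings, and the choice to reject (rather than review) photos that fail a check are all assumptions, and the MobileNetV2 stage is stood in for by a plain boolean.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

EARTH_RADIUS_M = 6_371_000     # mean Earth radius in metres
MAX_DISTANCE_M = 500           # GPS proximity threshold from the text
MAX_AGE = timedelta(days=7)    # timestamp recency window from the text


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def check_photo(
    photo_gps: Optional[Tuple[float, float]],  # (lat, lon) from EXIF, or None
    taken_at: Optional[datetime],              # EXIF capture time, or None
    site_gps: Tuple[float, float],             # reference location (assumed: the project site)
    now: datetime,
    classifier_ok: bool,                       # stand-in for the MobileNetV2 verdict
) -> str:
    # Missing EXIF metadata goes to human review instead of auto-rejection.
    if photo_gps is None or taken_at is None:
        return "UNDER REVIEW"
    # Stage 1: GPS proximity.
    if haversine_m(*photo_gps, *site_gps) > MAX_DISTANCE_M:
        return "REJECTED"  # assumed status name
    # Stage 2: timestamp recency (future-dated photos also fail).
    if taken_at > now or now - taken_at > MAX_AGE:
        return "REJECTED"
    # Stage 3: image classification.
    if not classifier_ok:
        return "REJECTED"
    return "VERIFIED"      # assumed status name
```

Ordering the checks from cheapest to most expensive means the image model only needs to run on photos that already passed the metadata stages.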

Key platform metrics:

| Stat | Value |
| --- | --- |
| Active projects tracked | 4 |
| Total escrow secured | 465K CitizenCredits |
| ML cross-validation accuracy | 88% (±1.8%) |
| Live blockchain refresh | 5 seconds |

How we built it

Stack: React · TypeScript · Flask · MySQL · Solidity · Ganache · Hardhat · MobileNetV2 · RandomForest

We spent the first phase mapping the problem space, not the codebase — identifying where audit failures occur, what data is available in a real deployment, and what each user role actually needs. Blockchain was the first architectural decision, made before we chose a frontend framework.

Build sequence:

  1. Foundation — Full-stack scaffold: React + Vite frontend, Flask REST API (8 endpoints), MySQL schema, SHA-256 auth, and role-based session management for Admin, Faculty, and Citizen roles.
  2. Citizen CC Allocation — Citizens browse active projects with live risk ratings, escrow balances, and GPS locations, then allocate CitizenCredits to their chosen project. Allocations hit MySQL and trigger allocate() on-chain via CitizenCredits.sol.
  3. Blockchain Layer — Authored CitizenCredits.sol in Solidity with 5 escrow functions (mint, allocate, lock, release, freeze). Deployed to Ganache via Hardhat. Built the web3.py bridge in Flask with SHA-256 fallback for offline resilience.
  4. ML & Forensics — Integrated MobileNetV2 for 4-class image classification. Built the 3-stage forensics pipeline. Built the RandomForest risk engine trained in real time on live MySQL project data — no static dataset.
  5. Crisis & Recovery — Truffle deprecation broke our deployment pipeline mid-project. Migrated fully to Hardhat in 48 hours. The SHA-256 fallback tx hash introduced during this sprint became a permanent resilience layer.
  6. Integration & QA — End-to-end testing of all objectives. API contract finalisation between Flask and React.
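
The SHA-256 fallback mentioned in steps 3 and 5 could look something like the sketch below. It assumes the fallback hashes a canonical JSON form of the allocation payload; the function names, the payload fields, and the `send_onchain` callable (a stand-in for the web3.py bridge) are hypothetical and not taken from the project.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple


def fallback_tx_hash(payload: Dict) -> str:
    """Deterministic SHA-256 digest of the payload, formatted like a tx hash."""
    # Sorted keys and fixed separators make the hash independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "0x" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def submit_allocation(
    payload: Dict,
    send_onchain: Callable[[Dict], str],  # hypothetical bridge: returns a tx hash
) -> Tuple[str, str]:
    """Try the chain first; fall back to a local hash if the node is unreachable."""
    try:
        return send_onchain(payload), "onchain"
    except (ConnectionError, TimeoutError):
        # Node offline: record a reproducible identifier instead of failing.
        return fallback_tx_hash(payload), "fallback"
```

Returning the source alongside the hash lets the caller store whether a record was confirmed on-chain or only hashed locally, so fallback entries can be replayed once the node is back.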

The dual-persistence architecture is the core technical contribution: every critical state change exists in both a queryable relational database (MySQL) and an immutable distributed ledger (Ganache/Solidity). Neither can be compromised without the other making the discrepancy immediately visible.
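
One way to make such a discrepancy visible is a periodic reconciliation pass over both stores. The sketch below assumes each store can be exported as a mapping from transaction ID to amount; the function name and record shape are illustrative and not part of the project's API.

```python
from typing import Dict, List


def find_discrepancies(db_rows: Dict[str, int], chain_rows: Dict[str, int]) -> List[str]:
    """Compare database records against on-chain records and list every mismatch."""
    issues: List[str] = []
    # Walk the union of IDs so records missing from either side are caught.
    for tx_id in sorted(set(db_rows) | set(chain_rows)):
        db_amt = db_rows.get(tx_id)
        chain_amt = chain_rows.get(tx_id)
        if db_amt is None:
            issues.append(f"{tx_id}: on-chain only")
        elif chain_amt is None:
            issues.append(f"{tx_id}: database only")
        elif db_amt != chain_amt:
            issues.append(f"{tx_id}: amount mismatch (db={db_amt}, chain={chain_amt})")
    return issues
```

An empty result means the two stores agree; any entry points to a record that was altered, dropped, or never mirrored.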


Challenges we ran into

Truffle deprecation: a mid-project crisis. Midway through development, Truffle became incompatible with our Hardhat-based environment, and the smart contract deployment pipeline broke completely rather than gradually. We converted all Truffle migration scripts to Hardhat deployment scripts and updated the web3.py Flask bridge within 48 hours, without disrupting the concurrent frontend timeline.

GPS data unavailability in demo environment. Real EXIF GPS data requires photos taken with location services enabled. In an academic environment, most test images are screenshots or web images with no EXIF metadata. This is precisely the scenario our UNDER REVIEW pipeline handles — photos with absent EXIF metadata are automatically routed to UNDER REVIEW, not rejected, which is the correct production behaviour.

Live ML training on a small initial dataset. Training the RandomForest on real MySQL data meant early accuracy was constrained by dataset size. We validated with 5-fold cross-validation across 2,000 records to ensure the reported 88.2% accuracy was robust, not an artefact of a lucky holdout split.


Accomplishments that we're proud of

  • 88.2% cross-validated accuracy from a model trained on live data, not a static pre-built dataset. The model trains on real operational project records (funding velocity, escrow utilisation, transaction count, project age) and improves as the system is used.
  • 3-stage forensics with principled UNDER REVIEW routing — the decision to route metadata-absent photos to human review rather than auto-rejection was the right security call. It prevents false positives for legitimate field submissions while still blocking automated fund release on unverifiable evidence.
  • Full civic transparency loop — a citizen can choose which infrastructure project receives their CitizenCredits, see that allocation confirmed on-chain in real time, and track the project's forensically-verified progress. Every technical decision in the system exists to make that promise trustworthy.
  • Delivered every stated objective — 7-page real-time React dashboard, 8 production API endpoints, live Solidity smart contract, 3-stage forensics pipeline, RandomForest risk classifier — two developers, zero budget, one semester.

What we learned

  • Live ML training changes the system's character. A model frozen on a static snapshot goes stale as conditions change; one that retrains on real operational data keeps improving as the platform is used. This is the correct design for a system intended to operate over time.
  • Missing metadata is a signal, not a failure. Auto-rejecting photos without EXIF data punishes legitimate field submissions. Routing to UNDER REVIEW preserves human judgment for ambiguous cases — the system is strict where it can be certain, and defers to Admin where it cannot.
  • Security and immutability are architecture decisions, not features. Treating the blockchain audit trail as a first-class requirement from the start shaped better design decisions throughout — particularly the dual-persistence pattern.
  • Consistent cryptographic strategy matters. Using SHA-256 throughout — from authentication to blockchain fallback tx hashes — creates a coherent, auditable security model rather than mixing hashing approaches across layers.
  • Citizen agency is a design constraint, not a feature. Building CC allocation around citizen choice — not admin assignment — forced better data modelling, better UX, and a more honest implementation of the platform's civic mission.

| Component | Production Path |
| --- | --- |
| Blockchain network | Change web3 provider URL to Polygon Mumbai or Ethereum mainnet; no contract or API changes required |
| ML model | Continuous retraining as MySQL project data grows; joblib serialisation for persistence across Flask restarts |
| Image evidence storage | IPFS storage: photos become immutable evidence alongside tx hashes |
| Smart contract testing | 5 Hardhat unit tests minimum, one per escrow action (mint, allocate, lock, release, freeze) |
| Citizen allocation scale | Rate limiting on /api/escrow/allocate; CC balance validation at DB level with row locks to prevent double-spend |
| Deployment | Frontend: static build hosted on Vercel or Netlify. Backend: Python virtual environment (venv) on a cloud Linux server (AWS EC2 or Render), served via Gunicorn with an Nginx reverse proxy |
| Allocation behaviour ML | Log which projects citizens choose to fund and at what CC volumes, giving a second ML-ready dataset that enables allocation behaviour prediction alongside project risk prediction |
| DEMO_MODE flag | Explicit DEMO_MODE=true/false environment config to make demo-vs-production boundaries clear and prevent demo behaviour from persisting in real deployments |
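
The DB-level balance validation planned under "Citizen allocation scale" could be sketched as below. Note the substitutions: this uses an in-memory SQLite database so it runs standalone, and an atomic conditional UPDATE inside one transaction instead of the explicit row locks a MySQL deployment would take (for example with a locking read). The table names, column names, and the `allocate` and `make_demo_db` helpers are hypothetical.

```python
import sqlite3


def make_demo_db(balance: int) -> sqlite3.Connection:
    """In-memory stand-in for the MySQL schema (names are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE citizens (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE allocations (citizen_id INTEGER, project_id INTEGER, amount INTEGER)")
    conn.execute("INSERT INTO citizens (id, balance) VALUES (1, ?)", (balance,))
    conn.commit()
    return conn


def allocate(conn: sqlite3.Connection, citizen_id: int, project_id: int, amount: int) -> bool:
    """Debit the citizen and record the allocation atomically; False if refused."""
    if amount <= 0:
        return False
    with conn:  # commits on success, rolls back if an exception is raised
        # The balance check and the debit happen in one statement, so two
        # concurrent requests cannot both spend the same credits.
        cur = conn.execute(
            "UPDATE citizens SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, citizen_id, amount),
        )
        if cur.rowcount != 1:
            return False  # unknown citizen or insufficient balance
        conn.execute(
            "INSERT INTO allocations (citizen_id, project_id, amount) VALUES (?, ?, ?)",
            (citizen_id, project_id, amount),
        )
    return True
```

With a starting balance of 100, a first allocation of 60 succeeds and a second allocation of 60 is refused, leaving the balance at 40 and a single allocation row.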
