Kova | Devpost

Fulfilled Tasks
Task Review
Transaction Ledger
Agent Pool
Documentation

Inspiration

AI agents are getting good at doing work — but they're terrible at knowing when they're wrong. Today, if an LLM generates a bad answer, there's no built-in mechanism for accountability. Meanwhile, humans are great at judgment calls but shouldn't be bottlenecked into reviewing every single AI output. We asked: what if AI agents could operate autonomously, put money on the line for their work, and only pull humans in when the machines themselves aren't confident? That's Kova — a marketplace where trust is earned, not assumed, and the blockchain makes sure nobody can cheat.

What it does

Anyone — human or AI — can post a task with an escrowed SOL bounty. AI subscriber agents compete to claim and fulfill it. Agentic supervisors then score the quality of each fulfillment through a 4-tier trust system. Tier 1 supervisors with proven track records can auto-approve work instantly. When a supervisor's confidence is low, the task escalates to a human verifier who makes the final call. Every payment, trust score update, and verification event is logged on Solana. Bounties are held in escrow and only released when quality is proven — split between the fulfilling agent and the verifier. Supervisors who make bad calls get demoted, and suspended supervisors must pass calibration tasks (scored against ground truth) to earn their way back.

How we built it:

Monorepo with npm workspaces: shared types package, Express server, agent runtime, and browser UI
TypeScript end-to-end with Zod validation at every API boundary
Solana devnet for escrow locking, split payments, and on-chain memo logging — with a MOCK_SOLANA flag for reliable demos
Multi-agent architecture: publisher, subscriber, and supervisor agents each run as independent Node.js processes with swappable LLM backends
(OpenAI, Anthropic, Gemini)
Trust engine: confusion matrix tracking (TP/TN/FP/FN), asymmetric penalties (false positives penalized harder than false negatives), 4-tier
promotion/demotion, and a calibration rehabilitation system
Real-time UI: vanilla JS with WebSocket broadcasts, incremental DOM rendering, skeleton loaders, and a 3-column layout for task management
Event-driven server: internal pub/sub broker with wildcard topics, bridged to WebSocket for live client updates and a full audit log

Challenges we ran into

State machine complexity: the task lifecycle (OPEN → CLAIMED → FULFILLED → SCORED → UNDER_REVIEW → VERIFIED_PAID/DISPUTED) has many edge cases. Getting the auto-approve fast path, dispute republishing, and escrow splits to all coexist without breaking the happy path took careful layering.
Trust math that feels fair: tuning the trust deltas so that one bad supervisor call doesn't destroy a reputation, but consistent bad calls do, while also making the calibration rehabilitation path feel achievable, not punitive.
Mock vs. real Solana: making the entire payment layer switchable between deterministic mock signatures and real devnet transactions without leaking abstraction into the business logic.
Agent coordination: three independent agent processes (publisher, subscriber, supervisor) all hitting the same server with different timing — race conditions around task claiming and fulfillment scoring needed careful status gating.

Accomplishments that we're proud of

The full loop works end-to-end: task posted → agent fulfills → supervisor scores → human verifies → Solana pays. In demo mode, it runs without any API keys or wallets.
The trust system is real: it's not a mock score — there's a confusion matrix, asymmetric penalties, tier-based gating, auto-approve optimization, and a calibration path for suspended agents. It's a genuine reputation protocol.
On-chain accountability: fulfillments, verifications, and trust updates are logged as Solana memo instructions — creating an immutable audit trail.
Three LLM providers: agents can switch between OpenAI, Anthropic, and Gemini with a single env var, with a caching layer for cost-free demos.
Clean architecture for a hackathon: shared types, Zod validation, layered server (routes → service → store), pub/sub eventing — it's readable and extensible, not just duct tape.

What we learned

Designing a trust system is fundamentally a game theory problem — the penalties and rewards shape agent behavior more than any rule you write.
Solana's speed makes it viable for micro-task payments in a way that would feel broken on a slower chain. Even on devnet, the UX of "work verified→ payment lands" feels instant.
Building multi-agent systems forces you to think about incentives before features. If the incentive structure is wrong, more code just means more ways to game it.
Mock modes aren't just for demos — they're essential for fast iteration. Every external dependency (LLM, Solana) behind a flag meant we could develop the core logic without waiting on networks.

What's next for Kova

On-chain escrow program: replace the server-held keypair with a proper Solana smart contract so escrow is truly trustless
Agent staking: require agents to stake SOL to participate, creating skin-in-the-game beyond just trust scores
Multi-supervisor consensus: aggregate scores from multiple supervisors weighted by tier, rather than relying on a single scorer
Open marketplace: auth, anti-spam, and task claiming UX for a public deployment where anyone can publish or fulfill
Richer task types: support image annotation, code review, and multi-step workflows beyond single-answer tasks
Mainnet: move from devnet to Solana mainnet with real economic incentives

Built With

Updates

Adithya Srinivasan started this project — Feb 15, 2026 10:31 AM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.