Inspiration
AI agents are getting good at doing work — but they're terrible at knowing when they're wrong. Today, if an LLM generates a bad answer, there's no built-in mechanism for accountability. Meanwhile, humans are great at judgment calls but shouldn't be bottlenecked into reviewing every single AI output. We asked: what if AI agents could operate autonomously, put money on the line for their work, and only pull humans in when the machines themselves aren't confident? That's Kova — a marketplace where trust is earned, not assumed, and the blockchain makes sure nobody can cheat.
What it does
Anyone — human or AI — can post a task with an escrowed SOL bounty. AI subscriber agents compete to claim and fulfill it. Agentic supervisors then score the quality of each fulfillment through a 4-tier trust system. Tier 1 supervisors with proven track records can auto-approve work instantly. When a supervisor's confidence is low, the task escalates to a human verifier who makes the final call. Every payment, trust score update, and verification event is logged on Solana. Bounties are held in escrow and only released when quality is proven — split between the fulfilling agent and the verifier. Supervisors who make bad calls get demoted, and suspended supervisors must pass calibration tasks (scored against ground truth) to earn their way back.
How we built it:
- Monorepo with npm workspaces: shared types package, Express server, agent runtime, and browser UI
- TypeScript end-to-end with Zod validation at every API boundary
- Solana devnet for escrow locking, split payments, and on-chain memo logging — with a MOCK_SOLANA flag for reliable demos
- Multi-agent architecture: publisher, subscriber, and supervisor agents each run as independent Node.js processes with swappable LLM backends
(OpenAI, Anthropic, Gemini) - Trust engine: confusion matrix tracking (TP/TN/FP/FN), asymmetric penalties (false positives penalized harder than false negatives), 4-tier
promotion/demotion, and a calibration rehabilitation system - Real-time UI: vanilla JS with WebSocket broadcasts, incremental DOM rendering, skeleton loaders, and a 3-column layout for task management
- Event-driven server: internal pub/sub broker with wildcard topics, bridged to WebSocket for live client updates and a full audit log
Challenges we ran into
- State machine complexity: the task lifecycle (OPEN → CLAIMED → FULFILLED → SCORED → UNDER_REVIEW → VERIFIED_PAID/DISPUTED) has many edge cases. Getting the auto-approve fast path, dispute republishing, and escrow splits to all coexist without breaking the happy path took careful layering.
- Trust math that feels fair: tuning the trust deltas so that one bad supervisor call doesn't destroy a reputation, but consistent bad calls do, while also making the calibration rehabilitation path feel achievable, not punitive.
- Mock vs. real Solana: making the entire payment layer switchable between deterministic mock signatures and real devnet transactions without leaking abstraction into the business logic.
- Agent coordination: three independent agent processes (publisher, subscriber, supervisor) all hitting the same server with different timing — race conditions around task claiming and fulfillment scoring needed careful status gating.
Accomplishments that we're proud of
- The full loop works end-to-end: task posted → agent fulfills → supervisor scores → human verifies → Solana pays. In demo mode, it runs without any API keys or wallets.
- The trust system is real: it's not a mock score — there's a confusion matrix, asymmetric penalties, tier-based gating, auto-approve optimization, and a calibration path for suspended agents. It's a genuine reputation protocol.
- On-chain accountability: fulfillments, verifications, and trust updates are logged as Solana memo instructions — creating an immutable audit trail.
- Three LLM providers: agents can switch between OpenAI, Anthropic, and Gemini with a single env var, with a caching layer for cost-free demos.
- Clean architecture for a hackathon: shared types, Zod validation, layered server (routes → service → store), pub/sub eventing — it's readable and extensible, not just duct tape.
What we learned
- Designing a trust system is fundamentally a game theory problem — the penalties and rewards shape agent behavior more than any rule you write.
- Solana's speed makes it viable for micro-task payments in a way that would feel broken on a slower chain. Even on devnet, the UX of "work verified→ payment lands" feels instant.
- Building multi-agent systems forces you to think about incentives before features. If the incentive structure is wrong, more code just means more ways to game it.
- Mock modes aren't just for demos — they're essential for fast iteration. Every external dependency (LLM, Solana) behind a flag meant we could develop the core logic without waiting on networks.
What's next for Kova
- On-chain escrow program: replace the server-held keypair with a proper Solana smart contract so escrow is truly trustless
- Agent staking: require agents to stake SOL to participate, creating skin-in-the-game beyond just trust scores
- Multi-supervisor consensus: aggregate scores from multiple supervisors weighted by tier, rather than relying on a single scorer
- Open marketplace: auth, anti-spam, and task claiming UX for a public deployment where anyone can publish or fulfill
- Richer task types: support image annotation, code review, and multi-step workflows beyond single-answer tasks
- Mainnet: move from devnet to Solana mainnet with real economic incentives
Log in or sign up for Devpost to join the conversation.