JudgeVC

Description

Inspiration

Early-stage incubators and VCs face a massive screening bottleneck. Programs like Techstars and Y Combinator receive tens of thousands of applications annually. Even reviewing 1,000 startups at just 30 minutes each costs:

$$ 1000 \times 0.5 = 500 \text{ analyst hours} $$

At a 40-hour work week, that is more than 12 weeks of manual, cognitively demanding evaluation, and the results are often subjective and inconsistent.

Human investors don’t make decisions linearly. They:

Form hypotheses
Challenge assumptions
Debate trade-offs
Weigh uncertainty

So we asked:

What if AI could simulate an actual investment committee — not just answer prompts?

That question led to JudgeVC.

What it does

JudgeVC is an AI Investment Committee.

It automates top-of-funnel startup screening by simulating structured debate between specialized agents.

Instead of a single LLM producing a yes/no answer, JudgeVC:

Analyzes market opportunity
Evaluates founder strength
Assesses technical defensibility
Runs financial projections
Conducts pro vs. contra debate
Assigns Tier 1 / Tier 2 / Tier 3 with confidence scores
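One way the final tiering step could combine the four analysis dimensions above is a weighted composite score mapped onto tier cutoffs. The sketch below is hypothetical: the weights, the 0-10 scale, and the cutoff values are illustrative assumptions, not JudgeVC's calibrated settings, and the confidence value is simply passed through from an upstream step.

```python
# Hypothetical weights for the four analysis dimensions; JudgeVC's actual weighting is not described.
WEIGHTS = {"market": 0.3, "team": 0.3, "technical": 0.2, "financial": 0.2}
# Hypothetical cutoffs on a 0-10 composite score, checked from highest to lowest.
TIER_CUTOFFS = [(7.5, "Tier 1"), (5.0, "Tier 2")]


def composite_score(scores: dict[str, float]) -> float:
    """Weighted average of the per-dimension scores (each 0-10); weights sum to 1."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


def assign_tier(scores: dict[str, float], confidence: float) -> dict:
    """Map the composite score to Tier 1/2/3 and attach the supplied confidence (0-1)."""
    composite = composite_score(scores)
    tier = "Tier 3"  # default when no cutoff is reached
    for cutoff, label in TIER_CUTOFFS:
        if composite >= cutoff:
            tier = label
            break
    return {"tier": tier, "score": round(composite, 2), "confidence": confidence}
```

For example, scores of 8, 9, 7 and 6 give a composite of 7.7, which lands in Tier 1 under these assumed cutoffs.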

Users upload an Excel sheet of startup applications. JudgeVC returns a tiered Excel output with structured reasoning.
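The batch workflow can be sketched as a loop over spreadsheet rows that appends each verdict and sorts the output strongest first. In this minimal sketch the rows are plain dicts, and `process_batch`, `toy_evaluate`, and the `traction` column are hypothetical names standing in for the real multi-agent evaluation and application schema.

```python
from typing import Callable

Row = dict[str, object]

TIER_ORDER = {"Tier 1": 0, "Tier 2": 1, "Tier 3": 2}


def process_batch(rows: list[Row], evaluate: Callable[[Row], dict]) -> list[Row]:
    """Evaluate every application row and return augmented rows, strongest first."""
    results: list[Row] = []
    for row in rows:
        try:
            verdict = evaluate(row)
        except Exception as exc:
            # One malformed row should not abort the batch; flag it for a human instead.
            verdict = {"tier": "Needs review", "score": None, "rationale": f"evaluation failed: {exc}"}
        results.append({**row, **verdict})
    # Tier 1 first, then higher scores first; rows without a known tier sink to the bottom.
    results.sort(key=lambda r: (TIER_ORDER.get(r["tier"], len(TIER_ORDER)), -(r["score"] or 0.0)))
    return results


def toy_evaluate(row: Row) -> dict:
    """Hypothetical evaluator that scores a single 'traction' column (0-10)."""
    score = float(row["traction"])
    tier = "Tier 1" if score >= 7 else "Tier 2" if score >= 4 else "Tier 3"
    return {"tier": tier, "score": score, "rationale": f"traction={score}"}
```

For the Excel side, pandas could supply `rows` via `pd.read_excel(path).to_dict(orient="records")` and write the result with `pd.DataFrame(results).to_excel(out_path, index=False)`, given an Excel engine such as openpyxl is installed.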

It compresses weeks of screening into minutes — without sacrificing transparency.

How we built it

JudgeVC is built on a multi-agent architecture orchestrated by NVIDIA Nemotron.

System Design

Nemotron (Orchestrator & Market Agent): coordinates the workflow, activates agents, calls tools, and synthesizes the final decision.
Claude (Team & Counter-Thesis Agent): evaluates founder quality and generates structured counter-arguments.
GPT-4 (Technical Risk & Judge Agent): assesses defensibility and scores debate outputs.
Financial Projection Tool (Python function): a deterministic valuation estimator:

$$ \text{Projected Revenue} = \text{Market Size} \times \text{Capture Rate} $$

$$ \text{Valuation} = \text{Projected Revenue} \times \text{Industry Multiple} $$
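To make the two formulas concrete, here is a worked example with purely illustrative inputs (not JudgeVC data): a USD 2 billion market, a 1% capture rate, and a 10x industry multiple.

$$ \text{Projected Revenue} = (2 \times 10^{9}) \times 0.01 = 2 \times 10^{7} $$

$$ \text{Valuation} = (2 \times 10^{7}) \times 10 = 2 \times 10^{8} $$

That is USD 20 million in projected revenue and a USD 200 million valuation. Because the calculation is deterministic, identical inputs always yield identical outputs, which is what lets it anchor the probabilistic agents.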

Nemotron performs multi-step reasoning, structured tool-calling, and agent coordination.

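This is not simple prompt chaining, where prompts run in a fixed order. The orchestrator decides which agents to activate and when to call tools, which makes it agentic automation.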
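The difference from a fixed chain can be sketched as a loop in which an orchestrator policy picks the next action from the current state. This is a minimal sketch under assumptions: `rule_based_policy` is a hypothetical rule-based stand-in for Nemotron's decisions, and the agent names and state layout are illustrative, not JudgeVC's actual interfaces.

```python
from typing import Callable

# Each agent reads the shared state and returns its findings as a dict.
Agent = Callable[[dict], dict]


def rule_based_policy(state: dict) -> str:
    """Hypothetical stand-in for the LLM orchestrator's next-step decision."""
    done = state["findings"]
    for step in ("market", "team", "technical"):
        if step not in done:
            return step
    # Call the financial tool only if the market agent produced a market size.
    if "financial" not in done and done["market"].get("market_size") is not None:
        return "financial"
    if "debate" not in done:
        return "debate"
    return "finish"


def run_committee(
    application: dict,
    agents: dict[str, Agent],
    choose_next: Callable[[dict], str],
    max_steps: int = 10,
) -> dict:
    """Ask the orchestrator for the next action, run that agent, record findings; repeat."""
    state = {"application": application, "findings": {}}
    for _ in range(max_steps):
        action = choose_next(state)
        if action == "finish":
            return state
        if action not in agents:
            raise ValueError(f"orchestrator requested unknown action: {action!r}")
        state["findings"][action] = agents[action](state)
    raise RuntimeError("orchestrator did not finish within max_steps")
```

Because the next step depends on what earlier agents found, an application without a market size skips the financial tool entirely, which a fixed prompt chain cannot do.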

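Example of a tool the orchestrator can call, the deterministic financial projection function:

def financial_projection(market_size: float, capture_rate: float, multiple: float) -> dict:
    # Projected Revenue = Market Size x Capture Rate
    revenue = market_size * capture_rate
    # Valuation = Projected Revenue x Industry Multiple
    valuation = revenue * multiple
    return {"revenue": revenue, "valuation": valuation}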

Challenges we ran into

Model Disagreement

Different models often produced conflicting evaluations. We implemented a structured debate layer and calibrated scoring thresholds to stabilize classifications.
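A debate layer with calibrated thresholds might look like the sketch below: a debate round is triggered only when agent scores diverge, and the judge's net rating of pro vs. contra arguments then nudges the score. The threshold, weight, and 0-5 argument-strength scale are hypothetical values chosen for illustration.

```python
from statistics import mean, pstdev

# Hypothetical calibration values on a 0-10 score scale.
DISAGREEMENT_THRESHOLD = 1.5  # population std-dev of agent scores that triggers a debate round
DEBATE_WEIGHT = 0.5           # how far the judge's net verdict can move the score


def needs_debate(agent_scores: dict[str, float]) -> bool:
    """True when the agents disagree enough that a pro vs. contra round is worthwhile."""
    return pstdev(list(agent_scores.values())) > DISAGREEMENT_THRESHOLD


def resolve_debate(base_score: float, pro_strengths: list[float], contra_strengths: list[float]) -> float:
    """Shift the base score by the judge's net rating of pro vs. contra arguments (each 0-5)."""
    pro = mean(pro_strengths) if pro_strengths else 0.0
    contra = mean(contra_strengths) if contra_strengths else 0.0
    adjusted = base_score + DEBATE_WEIGHT * (pro - contra)
    return max(0.0, min(10.0, adjusted))  # clamp to the 0-10 scale
```

For instance, scores of 8 and 3 from two agents have a standard deviation of 2.5, which exceeds the assumed threshold and triggers a debate.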

Stochastic Variability

LLMs are probabilistic. Small output variations led to inconsistent tiering. We reduced volatility by integrating deterministic financial tools and confidence banding.
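The writeup does not define "confidence banding", so the following is only one plausible interpretation: sample the stochastic evaluation several times, report the median score, and label confidence by how widely the samples spread. `banded_verdict`, the sample count, and the band edges are hypothetical.

```python
from statistics import median, pstdev
from typing import Callable


def banded_verdict(sample_score: Callable[[], float], n_samples: int = 5) -> dict:
    """Run a stochastic scorer several times; report the median and a confidence band."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    scores = [sample_score() for _ in range(n_samples)]
    center = median(scores)  # robust to a single outlier run
    spread = pstdev(scores)
    # Hypothetical band edges on a 0-10 score scale.
    if spread < 0.5:
        band = "high"
    elif spread < 1.5:
        band = "medium"
    else:
        band = "low"
    return {"score": center, "spread": round(spread, 3), "confidence": band, "samples": scores}
```

Using the median rather than the mean keeps one outlier run from flipping a startup's tier, while a low band signals that a human should look at the application.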

Tool Integration

Ensuring Nemotron could reliably call Python functions and integrate outputs into downstream reasoning required careful function schema design and structured prompting.
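A function schema plus a validating dispatcher is a common pattern for reliable tool calling. The sketch below assumes an OpenAI-style tool definition (a widely used convention); JudgeVC's exact schema format is not stated, and `TOOL_SCHEMAS`, `TOOL_FUNCTIONS`, and `dispatch_tool_call` are hypothetical names. The dispatcher checks only presence and numeric type, not the schema's minimum/maximum.

```python
import json


def financial_projection(market_size: float, capture_rate: float, multiple: float) -> dict:
    revenue = market_size * capture_rate
    return {"revenue": revenue, "valuation": revenue * multiple}


# Assumed OpenAI-style tool schema; field names follow that convention, not a confirmed JudgeVC format.
TOOL_SCHEMAS = {
    "financial_projection": {
        "type": "function",
        "function": {
            "name": "financial_projection",
            "description": "Projected revenue = market size x capture rate; valuation = revenue x industry multiple.",
            "parameters": {
                "type": "object",
                "properties": {
                    "market_size": {"type": "number", "description": "Addressable market in USD"},
                    "capture_rate": {"type": "number", "minimum": 0, "maximum": 1},
                    "multiple": {"type": "number", "minimum": 0},
                },
                "required": ["market_size", "capture_rate", "multiple"],
            },
        },
    }
}
TOOL_FUNCTIONS = {"financial_projection": financial_projection}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Validate a model-requested call, run it, and return JSON to feed back to the model."""
    if name not in TOOL_FUNCTIONS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"arguments are not valid JSON: {exc}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    params = TOOL_SCHEMAS[name]["function"]["parameters"]
    missing = [p for p in params["required"] if p not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    # Every parameter of this tool is a number; reject strings and booleans.
    non_numeric = [p for p in params["properties"] if p in args
                   and (isinstance(args[p], bool) or not isinstance(args[p], (int, float)))]
    if non_numeric:
        return json.dumps({"error": f"arguments must be numbers: {non_numeric}"})
    result = TOOL_FUNCTIONS[name](**{p: args[p] for p in params["properties"] if p in args})
    return json.dumps(result)
```

Returning errors as JSON instead of raising lets the orchestrator model read the problem and retry with corrected arguments.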

Human-Level Alignment

Matching real investor intuition was non-trivial. We tuned evaluation thresholds and debate scoring logic to better approximate real screening dynamics.
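Threshold tuning against investor judgment can be framed as a search for the tier cutoffs that best agree with human-assigned tiers. This sketch uses a simple grid search; the candidate grid and the idea of a human-labeled set are assumptions, since the writeup does not describe how its thresholds were tuned.

```python
from itertools import combinations


def to_tier(score: float, t1: float, t2: float) -> int:
    """Tier 1 at or above t1, Tier 2 at or above t2, otherwise Tier 3."""
    return 1 if score >= t1 else 2 if score >= t2 else 3


def calibrate_cutoffs(scores: list[float], human_tiers: list[int], grid: list[float]) -> tuple[float, float, float]:
    """Grid-search cutoffs (t1 > t2) maximizing agreement with human-assigned tiers."""
    if not scores or len(scores) != len(human_tiers):
        raise ValueError("need equally sized, non-empty score and label lists")
    candidates = sorted(set(grid))
    if len(candidates) < 2:
        raise ValueError("grid needs at least two distinct values")
    best = (candidates[-1], candidates[0], -1.0)
    # combinations() over a sorted list yields pairs (low, high), so t2 < t1 always holds.
    for t2, t1 in combinations(candidates, 2):
        agreement = sum(to_tier(s, t1, t2) == h for s, h in zip(scores, human_tiers)) / len(scores)
        if agreement > best[2]:
            best = (t1, t2, agreement)
    return best
```

With real data, cutoffs should be chosen on one labeled split and checked on another, so that the thresholds do not simply memorize a handful of examples.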

Automating judgment under uncertainty was our hardest engineering challenge.

Accomplishments that we're proud of

Built a fully functioning multi-agent evaluation pipeline
Successfully integrated NVIDIA Nemotron as the orchestration brain
Implemented real tool-calling within reasoning loops
Created structured debate instead of single-opinion outputs
Automated Excel → Tiered Excel workflow end-to-end
Delivered explainable decisions rather than black-box outputs

We transformed LLMs from chat tools into structured decision infrastructure.

What we learned

Multi-agent reasoning outperforms single-prompt workflows for complex decisions.
Debate structures reduce bias and improve interpretability.
Deterministic tools are critical for stabilizing AI systems.
Orchestration matters more than raw model size.

We learned that early-stage investment decisions are fundamentally dialectical — not linear.