Inspiration

Scientists and researchers have incredible ideas, but translating a vague hypothesis into a rigorous, reproducible, preregistration-ready experimental protocol is genuinely hard. This gap feeds the "reproducibility crisis"—studies that are underpowered, fail to account for critical confounders, or fall apart under peer scrutiny. We wanted to build a system that acts as a world-class methodology team in a box, forcing scientific concepts through a gauntlet of rigorous stress tests before a single data point is ever collected.

What it does

NovaLab is an agentic AI system that transforms a simple research hypothesis into a rigorous, preregistration-ready experimental protocol.

You simply type or speak your hypothesis (using Amazon Nova Sonic's speech-to-speech capabilities), and a pipeline of 10 specialized AI agents takes over. It produces:

  1. Refined Constraints: Identifies ambiguous areas and clarifies the research question.
  2. Literature Review: Conducts live retrieval via Semantic Scholar.
  3. Experimental Designs: Proposes three distinct methodologies (e.g., RCT, observational) ranked by feasibility and rigor.
  4. The Epistemic Tribunal: An Advocate agent defends a proposed design, a Red Team ruthlessly attacks its flaws, and a Synthesis Judge renders a verdict to create the optimal combined approach.
  5. Statistical Plan: Generates power analysis guidance, sample sizes, and endpoints.
  6. Risk Audit: Identifies ethical risks, confounders, and adds a reproducibility checklist.
  7. Preregistration Protocol: Outputs a clean, structured Markdown document ready to drop into a registry.
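The flow above can be sketched as a sequential agent chain. This is a minimal illustration, not our actual module layout—the stage names mirror a subset of the ten roles, and the lambdas are placeholders for the real LLM-backed agents:

```python
from typing import Callable

Agent = Callable[[dict], dict]  # each agent reads shared state, returns its output

def run_pipeline(hypothesis: str, agents: list[tuple[str, Agent]]) -> dict:
    """Run agents in order, accumulating their structured outputs in one state dict."""
    state = {"hypothesis": hypothesis}
    for name, agent in agents:
        state[name] = agent(state)  # each stage sees all prior outputs
    return state

# Placeholder stages standing in for the real LLM-backed agents:
PIPELINE = [
    ("clarifier", lambda s: {"constraints": f"refined: {s['hypothesis']}"}),
    ("lit_scout", lambda s: {"papers": []}),
    ("architect", lambda s: {"designs": ["RCT", "cohort", "case-control"]}),
    ("advocate",  lambda s: {"defense": "..."}),
    ("red_team",  lambda s: {"attacks": "..."}),
    ("judge",     lambda s: {"verdict": "..."}),
    ("stats",     lambda s: {"power_analysis": "..."}),
    ("auditor",   lambda s: {"risks": "..."}),
]

result = run_pipeline("Does X improve Y?", PIPELINE)
```

The key property is that every stage's output lands in the shared state, so downstream agents (like the Red Team) can be fed exactly the slices they need.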

How we built it

The core backend is built in Python with FastAPI, functioning as an orchestrator for the diverse multi-agent pipeline. We relied entirely on Amazon Bedrock, taking advantage of the unique strengths of the Amazon Nova model family:

  • Amazon Nova Pro powers our "heavy reasoning" agents (the Judge, the Quality Evaluator, and the Literature Scout) where deep, nuanced analytical framing is crucial.
  • Amazon Nova Lite drives our fast, specialized agents (Clarifier, Advocate, Red Team, Stats Planner, Auditor), taking advantage of its exceptional speed and cost-efficiency while retaining strong instruction following.
  • Amazon Titan Text Embeddings V2 provides fallback embeddings for semantic search over locally indexed academic literature.
  • Amazon Nova Sonic enables bidirectional speech-to-speech streaming over WebSockets, so users can literally brainstorm aloud.
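The Pro/Lite split boils down to a small routing table: each agent role maps to the Bedrock model ID it invokes. The model IDs are the public Bedrock identifiers; the role names and defaulting logic are illustrative:

```python
# Heavy-reasoning agents go to Nova Pro; fast specialized agents to Nova Lite.
NOVA_PRO = "amazon.nova-pro-v1:0"
NOVA_LITE = "amazon.nova-lite-v1:0"

MODEL_FOR_AGENT = {
    "judge": NOVA_PRO,
    "quality_evaluator": NOVA_PRO,
    "lit_scout": NOVA_PRO,
    "clarifier": NOVA_LITE,
    "advocate": NOVA_LITE,
    "red_team": NOVA_LITE,
    "stats_planner": NOVA_LITE,
    "auditor": NOVA_LITE,
}

def model_for(agent: str) -> str:
    # Default to Lite: cheap and fast unless deep synthesis is required.
    return MODEL_FOR_AGENT.get(agent, NOVA_LITE)
```

Each agent then calls Bedrock with its routed model ID, which is what keeps the per-run cost down without sacrificing quality where it matters.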

The frontend is constructed using Next.js and Tailwind CSS. We utilized Server-Sent Events (SSE) to stream the thought processes and state updates of each agent live to the UI.
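SSE frames are just text: each event is a few `field: value` lines terminated by a blank line. A minimal sketch of the server side (event and field names illustrative; in FastAPI the generator would be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Serialize one Server-Sent Event frame: a named event plus a JSON
    payload, ending with the blank line that delimits SSE messages."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_agent_updates(updates):
    # Yield one frame per agent state change as the pipeline progresses.
    for agent, payload in updates:
        yield sse_event("agent_update", {"agent": agent, **payload})

frames = list(stream_agent_updates([("red_team", {"status": "attacking"})]))
```

Because SSE is one-directional and plain HTTP, it was a much simpler fit for streaming agent progress than a second WebSocket.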

Challenges we ran into

Coordinating 10 different agents with complex dependencies was challenging. State had to be carefully passed down the chain—the Red Team agent needed to know precisely what the Architect proposed and what the Literature Scout had found. If we just dumped all prior context into every prompt, we quickly ran into context window bloat and lost model focus.

To solve this, we implemented strict Pydantic schema parsing at each step, forcing agents to output structured, concise intermediate JSON states that could be tightly formatted into prompt injections for the downstream agents.
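As an illustration of the pattern (the field names here are examples, not our exact production schema): the Red Team must return a short, structured critique rather than free-form prose, and parsing fails loudly if the model drifts from the schema.

```python
from pydantic import BaseModel, Field

class RedTeamFinding(BaseModel):
    flaw: str
    severity: str = Field(pattern="^(low|medium|high)$")
    affected_design: str

class RedTeamReport(BaseModel):
    findings: list[RedTeamFinding]
    verdict_summary: str

# Raw model output (illustrative):
raw = '''{"findings": [{"flaw": "unmeasured confounder: baseline fitness",
  "severity": "high", "affected_design": "RCT"}],
  "verdict_summary": "Design needs stratified randomization."}'''

# Validation raises on any schema drift; the compact dump is what gets
# injected into downstream prompts instead of the agent's full context.
report = RedTeamReport.model_validate_json(raw)
compact = report.model_dump_json()
```

Keeping the intermediate states this tight is what let eight-plus agents share a chain without blowing the context window.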

Integrating Nova Sonic over WebSockets for real-time voice streaming was also a hurdle. Bidirectional audio requires careful handling of buffers, sample rates, and stream chunking in FastAPI to ensure low-latency responsiveness without dropping packets.
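The buffering problem reduces to accumulating raw PCM bytes and emitting fixed-size frames for the upstream socket. A sketch of that piece—the 16 kHz / 16-bit mono figures and 20 ms frame size are illustrative assumptions, not Nova Sonic's actual audio spec:

```python
FRAME_MS = 20
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
FRAME_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * FRAME_MS // 1000  # 640 bytes

class PcmChunker:
    """Accumulate incoming PCM bytes and emit complete fixed-size frames."""

    def __init__(self, frame_bytes: int = FRAME_BYTES):
        self.frame_bytes = frame_bytes
        self.buf = bytearray()

    def push(self, data: bytes) -> list[bytes]:
        """Append incoming audio; return any frames that are now complete."""
        self.buf.extend(data)
        frames = []
        while len(self.buf) >= self.frame_bytes:
            frames.append(bytes(self.buf[: self.frame_bytes]))
            del self.buf[: self.frame_bytes]
        return frames

chunker = PcmChunker()
out = chunker.push(b"\x00" * 1000)  # one full 640-byte frame + 360-byte remainder
```

Re-chunking on the server this way decouples the browser's arbitrary WebSocket message sizes from the fixed frame cadence the streaming API expects.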

Finally, getting the Epistemic Tribunal to argue effectively took heavy prompt engineering. Initially, the LLM agents were "too polite" and agreeable. We had to construct very strict personas to turn the Red Team agent into a ruthless skeptic, while keeping the Advocate objective rather than defensive.
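For flavor, heavily condensed versions of the two contrasting personas (the production prompts are far longer and more prescriptive):

```python
# Condensed, illustrative persona prompts for the Tribunal agents.
RED_TEAM_SYSTEM = (
    "You are a ruthless methodological skeptic. Your only job is to find "
    "fatal flaws: confounders, selection bias, underpowered designs. "
    "Never praise the design. Never soften criticism with pleasantries."
)

ADVOCATE_SYSTEM = (
    "You defend the proposed design on its merits, citing the evidence "
    "gathered so far. Concede genuine weaknesses; do not become defensive."
)
```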

Accomplishments that we're proud of

We are especially proud of the Epistemic Tribunal dynamic. Watching the Red Team agent successfully identify a massive unmeasured confounder in a proposed study design, and then seeing the Judge agent synthesize a new design that actively addresses that flaw, feels like magic.

We are also extremely proud of the full-stack performance and cost optimization. By strategically using Nova Lite for the bulk of the pipeline and saving Nova Pro for complex synthesis, we managed to make an incredibly rigorous, 10-step generation process run beautifully fast while keeping costs per run under a few cents.

What we learned

We learned that multi-agent architectures require highly structured handoffs. An agent pipeline succeeds or fails based on the quality and brevity of its intermediary data structures.

We also learned that extremely fast, cost-effective models like Nova Lite punch way above their weight class when isolated with a very narrow, specific system prompt (like "You are only the Stats Planner...") rather than trying to make one giant model do everything at once.

What's next for NovaLab

The immediate next step is to integrate directly with platform APIs like the Open Science Framework (OSF) so users can push their generated protocols straight to preregistration with one click.

We also want to implement a "Code Generator" agent that writes the actual R or Python data analysis scripts based on the finalized statistical plan, so researchers can generate their analysis pipeline the moment data collection ends.

Built With

  • amazon-bedrock
  • amazon-nova-lite
  • amazon-nova-pro
  • amazon-nova-sonic
  • amazon-titan
  • docker
  • fastapi
  • next.js
  • python
  • react
  • render
  • semantic-scholar-api
  • sqlite
  • tailwind-css