Amawta

Inspiration

Modern “AI science” often stops at persuasive narrative. We wanted a system that treats science as an auditable workflow: a claim becomes falsifiable tests, runs produce reproducible artifacts, and uncertainty is handled explicitly (not papered over by confident prose). Our guiding principle was: evidence first, traceability always.

What it does

Amawta turns a user hypothesis into an end-to-end scientific workflow:

Normalizes the claim into a minimal schema (domain, entities, relation, observables).
Generates a bounded falsification plan (tests + small variant matrix).
Performs grounded literature search to avoid “refrying” existing work.
Executes a two-phase runner:
- Toy run (always): quick falsifiers / sanity checks.
- Field run (mandatory when toy isn’t falsified): resolve/download datasets and run with real evidence.
Emits deterministic gate reports: PASS / FAIL / UNRESOLVED.
If UNRESOLVED, it self-recovers (retry datasets, rerun field) within a bounded budget, and remains resumable.
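
The two-phase runner and its bounded recovery can be sketched roughly as follows. This is a minimal illustration, not Amawta's actual runner: `PhaseResult`, `RunState`, `runWorkflow`, and `maxFieldAttempts` are hypothetical names, and treating a falsified toy run as an immediate FAIL is an assumption.

```typescript
// All names here are hypothetical; they illustrate the control flow only.
type Verdict = "PASS" | "FAIL" | "UNRESOLVED";

interface PhaseResult {
  falsified: boolean;        // a falsifier fired during this phase
  evidenceComplete: boolean; // all required datasets/results were available
}

interface RunState {
  attempts: number;          // field attempts made so far (persisted so a run can resume)
  verdict: Verdict | null;
}

function runWorkflow(
  toy: () => PhaseResult,
  field: (attempt: number) => PhaseResult,
  maxFieldAttempts: number,
  state: RunState = { attempts: 0, verdict: null },
): RunState {
  // The toy run always executes first (assumption: a fired falsifier ends the run).
  if (toy().falsified) return { ...state, verdict: "FAIL" };

  // Toy not falsified: the field run is mandatory, retried within a fixed budget.
  let attempts = state.attempts;
  while (attempts < maxFieldAttempts) {
    attempts += 1;
    const result = field(attempts);
    if (!result.evidenceComplete) continue; // missing evidence: retry, never a false FAIL
    return { attempts, verdict: result.falsified ? "FAIL" : "PASS" };
  }
  // Budget exhausted without complete evidence.
  return { attempts, verdict: "UNRESOLVED" };
}
```

Passing a previously saved `RunState` back in continues from the recorded attempt count, which is one simple way to keep a run resumable.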

All stages produce versioned JSON artifacts; the system does not depend on chat history.
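
As an illustration of what a versioned artifact might look like, the sketch below types the normalized claim and validates it at load time. Only `domain`, `entities`, `relation`, and `observables` come from the description above; `schemaVersion`, the interface name, and the non-empty checks are assumptions.

```typescript
// Hypothetical shape for hypothesis_normalization.json.
// `schemaVersion` is an assumed field standing in for artifact versioning.
interface NormalizedClaim {
  schemaVersion: string;
  domain: string;
  entities: string[];
  relation: string;
  observables: string[];
}

// True only for an array whose items are all strings.
const isStringArray = (x: unknown): x is string[] =>
  Array.isArray(x) && x.every((item) => typeof item === "string");

// Runtime check so a stage can trust an artifact read from disk
// instead of relying on chat history.
function isNormalizedClaim(value: unknown): value is NormalizedClaim {
  if (typeof value !== "object" || value === null) return false;
  const { schemaVersion, domain, entities, relation, observables } =
    value as Partial<Record<keyof NormalizedClaim, unknown>>;
  return (
    typeof schemaVersion === "string" &&
    typeof domain === "string" &&
    typeof relation === "string" &&
    isStringArray(entities) && entities.length > 0 &&
    isStringArray(observables) && observables.length > 0
  );
}
```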

How we built it

Google ADK-powered multi-agent orchestration integrated directly in our TypeScript CLI runtime.
Gemini 3 enforced everywhere (hard-blocks non–Gemini 3 models).
A science workflow agent that runs:
- dialectic + Bacon-style analysis → normalization → literature → falsification plan → runner → gates → repair loop.
A deterministic artifact layer:
- hypothesis_normalization.json, literature_search.json, falsification_plan.json, runner code + logs, dataset manifests, results, and gate_<gate_id>.json.
Deterministic gates (pure functions over artifacts) and an autopoietic repair loop driven by recommended_actions.
A “truth anchor” inspired by Ledger Closure: operational existence requires closing an explicit ledger inside a feasible set $\mathcal{F}$: $$O \in \mathcal{F} \ \wedge \ L\ \text{closes} \ \Rightarrow \ \text{EXISTS}$$ Otherwise, Amawta reports METHOD-NOTE/UNRESOLVED and seeks missing evidence instead of inventing.
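
A gate written as a pure function over artifacts might look like the sketch below. The artifact shapes (`DatasetManifest`, `TestResult`, `GateInput`) and the action strings are hypothetical; only the three statuses, `gate_id`, and `recommended_actions` are taken from the description above. The ordering of the checks (missing evidence is tested before falsification) is also an assumption.

```typescript
type GateStatus = "PASS" | "FAIL" | "UNRESOLVED";

// Hypothetical artifact shapes; the real schemas are not given in the text.
interface DatasetManifest { datasetId: string; resolved: boolean; }
interface TestResult { testId: string; falsified: boolean; }

interface GateInput {
  gateId: string;
  manifests: DatasetManifest[];
  results: TestResult[];
  requiredTests: string[]; // test ids the falsification plan expects results for
}

interface GateReport {
  gate_id: string;
  status: GateStatus;
  recommended_actions: string[];
}

// Pure function: no I/O, no clock, no model calls. Same artifacts in, same report out.
function evaluateGate(input: GateInput): GateReport {
  const actions: string[] = [];
  for (const m of input.manifests) {
    if (!m.resolved) actions.push(`retry_dataset:${m.datasetId}`);
  }
  const recorded = new Set(input.results.map((r) => r.testId));
  for (const id of input.requiredTests) {
    if (!recorded.has(id)) actions.push(`rerun_test:${id}`);
  }
  actions.sort(); // keep the report stable regardless of input order

  let status: GateStatus;
  if (actions.length > 0) status = "UNRESOLVED";      // evidence is missing
  else if (input.results.some((r) => r.falsified)) status = "FAIL";
  else status = "PASS";

  return { gate_id: input.gateId, status, recommended_actions: actions };
}
```

A repair loop could then read `recommended_actions` from the stored report and dispatch each action, rather than asking a model what to do next.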

Challenges we ran into

Latency + routing: keeping greetings/simple Q&A fast while reliably triggering the full workflow for hypotheses.
Groundedness: ensuring URLs/DOIs and “what ran” claims are always backed by artifacts, never hallucinated.
Resumability: supporting partial failures (downloads, missing evidence) without losing state or corrupting runs.
Determinism: stabilizing JSON-only steps (canonical JSON, code-fence tolerance) and making gates artifact-driven.
E2E reliability: building smoke tests that cover interleaving quick chat + workflow + retry + resume.
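
For the determinism point, one possible approach to canonical JSON and code-fence tolerance is sketched below. It is a simplified illustration (sorted keys, no insignificant whitespace), not a full canonicalization standard and not necessarily what Amawta does; `stripCodeFence`, `canonicalize`, and `parseModelJson` are hypothetical helper names.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Three backticks, built programmatically so this example nests cleanly in a fenced block.
const FENCE = "`".repeat(3);

// If the model wrapped its JSON in a Markdown code fence, return only the inner text.
function stripCodeFence(raw: string): string {
  const text = raw.trim();
  if (text.length < 6 || !text.startsWith(FENCE) || !text.endsWith(FENCE)) return text;
  const firstNewline = text.indexOf("\n");
  if (firstNewline === -1) return text; // single-line input: leave untouched
  // Drop the opening fence line (including any language tag) and the closing fence.
  return text.slice(firstNewline + 1, text.length - FENCE.length).trim();
}

// Serialize with recursively sorted object keys so equal values yield identical bytes.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  const entries = Object.entries(value).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`).join(",")}}`;
}

// Parse model output that may or may not be fenced.
function parseModelJson(raw: string): Json {
  return JSON.parse(stripCodeFence(raw)) as Json;
}
```

Writing artifacts through `canonicalize` means two runs that produce the same data also produce byte-identical files, which makes artifact diffs and hashes meaningful.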

Accomplishments that we're proud of

A working scientific autopoietic loop with explicit state + artifacts + deterministic gates + bounded self-repair.
Two-phase execution (toy → field) with mandatory field attempt when toy isn’t falsified.
Actionable UNRESOLVED: when evidence is insufficient, the system retries acquisition/execution instead of producing false FAILs.
Auditability by construction: every meaningful claim about execution, datasets, or results maps to a stored artifact.
No regex heuristics for workflow decisions: LLM orchestration + deterministic evaluators only.

What we learned

“Autonomy” in science is less about longer reasoning and more about contracts: explicit schemas, artifact persistence, deterministic evaluators, and bounded retries.
The most important safety feature isn’t a refusal; it’s the discipline to say UNRESOLVED and request/obtain evidence.
A fast-path is essential: scientific rigor shouldn’t make basic interactions unusable.

What's next for Amawta

Add a regression “metabolism” with ADK Evaluate (hallucination checks + rubrics for expert vs guided behavior).
Expand field execution across more data modalities while keeping one auditable runner/gate contract.
Tighten Ledger Closure further toward full “physics-as-accounting” semantics (more explicit ledger lines and invariants) and then layer additional epistemic gates on top without sacrificing determinism or traceability.

Built With

Updates

Daslav Ríos started this project — Feb 09, 2026 07:48 PM EST
