Inspiration

Over 70% of researchers have tried and failed to reproduce another scientist's experiment (Baker, 2016). Methodologies are locked in PDF prose, and converting them into runnable pipelines requires deep expertise. We asked: what if AI could read a paper and produce a validated, executable workflow automatically?

Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016).

What it does

A researcher uploads a PDF, and VeriFlow autonomously converts it into an executable workflow using six Gemini 3 features.

The Scholar Agent ingests the PDF as inline bytes (types.Part.from_bytes(..., mime_type="application/pdf")) and relies on Gemini's native PDF document understanding to interpret text, figures, and methodology diagrams in a single pass. It extracts the methodology as a structured ISA-JSON hierarchy with per-field confidence scores.

The Engineer Agent generates Python execution scripts. It uses tool/function calling to generate iteratively against local validation feedback, and structured output via a Pydantic response_schema for type-safe results.

A self-healing validation node retries up to 3 times. The Reviewer Agent then performs a semantic review, with thought signatures (thought_signature) preserved across multi-step tool calls to maintain reasoning continuity.

All agents use thinking level control (thinking_level: high for complex reasoning, medium for validation). The generated workflow is visualized as an interactive DAG in the frontend and can be exported for execution.
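To make the "ISA-JSON hierarchy with per-field confidence scores" concrete, here is a minimal sketch of what such a structure could look like. It uses standard-library dataclasses rather than the project's Pydantic models, and every class name, field name, and the 0.7 threshold are illustrative assumptions, not VeriFlow's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ScoredField:
    """One extracted value plus a confidence score between 0.0 and 1.0."""
    value: str
    confidence: float


@dataclass
class Assay:
    # Hypothetical, simplified subset of ISA assay attributes.
    measurement_type: ScoredField
    technology_type: ScoredField


@dataclass
class Study:
    title: ScoredField
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: ScoredField
    studies: list[Study] = field(default_factory=list)


def low_confidence_fields(inv: Investigation, threshold: float = 0.7) -> list[str]:
    """Return a path for every field scored below the threshold."""
    flagged = []
    if inv.title.confidence < threshold:
        flagged.append("investigation.title")
    for s_idx, study in enumerate(inv.studies):
        if study.title.confidence < threshold:
            flagged.append(f"studies[{s_idx}].title")
        for a_idx, assay in enumerate(study.assays):
            for name in ("measurement_type", "technology_type"):
                if getattr(assay, name).confidence < threshold:
                    flagged.append(f"studies[{s_idx}].assays[{a_idx}].{name}")
    return flagged
```

Attaching the score to each field, rather than to the document as a whole, lets a reviewer or a UI point at exactly the parts of the extraction that deserve a second look.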

How we built it

Python 3.11 / FastAPI backend with a LangGraph StateGraph ($\text{scholar} \rightarrow \text{engineer} \rightarrow \text{validate} \rightarrow \text{reviewer}$) implementing think-act-observe loops for self-healing. Three agents on the google-genai SDK with json_repair for resilient parsing. PyMuPDF extracts page images for supplementary figure analysis. Vue 3 + Vue Flow frontend with real-time WebSocket streaming. 10 Docker Compose services.
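The routing logic of the scholar, engineer, validate, reviewer graph can be sketched in plain Python. This is not LangGraph code: it is a dependency-free illustration of a conditional edge that loops back to the engineer on validation failure and falls through to the reviewer once a retry cap is hit. The state keys, function names, and the engineer/validate callables are assumptions made for the example.

```python
from typing import Callable, TypedDict

MAX_RETRIES = 3  # the writeup states the validation node retries up to 3 times


class State(TypedDict):
    script: str
    errors: list[str]
    retries: int
    trace: list[str]


def route_after_validate(state: State) -> str:
    """Conditional edge: pick the next node from the validation result."""
    if not state["errors"]:
        return "reviewer"
    if state["retries"] >= MAX_RETRIES:
        return "reviewer"  # retry budget spent: hand off with errors attached
    return "engineer"


def run(engineer: Callable[[State], str], validate: Callable[[str], list[str]]) -> State:
    """Walk the graph from the engineer node until the reviewer is reached."""
    state: State = {"script": "", "errors": [], "retries": 0, "trace": ["scholar"]}
    node = "engineer"
    while node != "reviewer":
        if node == "engineer":
            if state["errors"]:
                state["retries"] += 1  # this pass is a repair attempt
            state["script"] = engineer(state)
            state["trace"].append("engineer")
            node = "validate"
        else:  # node == "validate"
            state["errors"] = validate(state["script"])
            state["trace"].append("validate")
            node = route_after_validate(state)
    state["trace"].append("reviewer")
    return state
```

Because the retry decision lives in the routing function rather than inside a node, the error handling is visible in the graph's shape, and the cap guarantees the loop terminates.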

Challenges we ran into

Deeply nested structured output schemas (ISA-JSON) required iterative prompt engineering and json_repair fallbacks. The self-healing retry loop sometimes oscillated; we solved this by capping retries and handing off to the Reviewer Agent as graceful degradation. Preserving thought signatures across multi-turn tool-calling cycles took significant experimentation. Integrating Apache Airflow for workflow execution proved too complex within the hackathon timeline.
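The fallback pattern for malformed model output can be sketched as "parse strictly first, repair only on failure". The naive_repair function below is a hypothetical stand-in written for this example; it is not the json_repair library's API and handles only two common defects (surrounding prose and trailing commas).

```python
import json
import re
from typing import Any, Callable


def naive_repair(text: str) -> str:
    """Hypothetical stand-in for a repair library; deliberately simplistic."""
    # Keep only the outermost {...} span, discarding prose around the payload.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start:end + 1]
    # Remove trailing commas before a closing brace or bracket.
    # Caveat: this would also alter such sequences inside string values.
    return re.sub(r",\s*([}\]])", r"\1", text)


def parse_model_json(raw: str, repair: Callable[[str], str] = naive_repair) -> Any:
    """Parse strictly first; invoke the repair step only if that fails."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # May still raise if the repair was not enough; the caller decides.
        return json.loads(repair(raw))
```

Trying the strict parse first keeps well-formed responses untouched, so the repair step can never corrupt output that was already valid.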

Accomplishments that we're proud of

End-to-end autonomous PDF-to-workflow pipeline. Six Gemini 3 features as the core engine, not add-ons. Self-healing graph topology where LangGraph edges are the error handling. Full real-time transparency via WebSocket.

What we learned

Pydantic response_schema removes most of the fragile JSON parsing, though deeply nested schemas still needed json_repair fallbacks. Higher thinking levels dramatically improve scientific extraction. Tool/function calling with local validation feedback outperforms single-shot generation. Scope management matters: we scoped down from full CWL/Airflow execution to Python script generation to deliver a working end-to-end demo.
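As an illustration of "local validation feedback", the sketch below checks a generated script with Python's ast module and feeds the problems back into the next generation round. The generate callable stands in for a model call, and the required main function is a hypothetical check chosen for the example; neither is taken from VeriFlow's code.

```python
import ast
from typing import Callable


def validate_script(source: str, required_functions: tuple[str, ...] = ("main",)) -> list[str]:
    """Return human-readable problems; an empty list means the script passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"SyntaxError on line {exc.lineno}: {exc.msg}"]
    # Collect the names of top-level function definitions only.
    defined = {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}
    return [f"missing function: {name}" for name in required_functions if name not in defined]


def generate_with_feedback(
    generate: Callable[[list[str]], str], max_rounds: int = 3
) -> tuple[str, list[str]]:
    """Regenerate until validation passes or the round budget is spent."""
    feedback: list[str] = []
    source = ""
    for _ in range(max_rounds):
        source = generate(feedback)  # the generator sees the previous problems
        feedback = validate_script(source)
        if not feedback:
            break
    return source, feedback
```

The checks run locally and cost nothing, so each round gives the generator a specific error to fix instead of another blind attempt.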

What's next for VeriFlow

Grounding with Google Search for tool reference verification, CWL v1.3 workflow generation, Apache Airflow execution with Docker-in-Docker, SPARC SDS-compliant export with provenance tracking, multi-paper synthesis, and cloud deployment on Google Cloud Run.
