CellForge AI

Arize AX Trace Observability

Inspiration

Battery innovation is moving fast, but researchers still spend a great deal of time reading papers, comparing experimental claims, and checking whether a promising idea is actually grounded in evidence. We built CellForge AI because green battery manufacturing and better electrode materials are critical for the future of EVs, clean energy storage, and sustainable industry.

What it does

CellForge AI is a self-auditing research agent for green lithium-ion battery materials. It ingests battery papers, extracts structured experimental claims, and benchmarks retrieval quality over them. It then generates candidate research hypotheses, audits each one for citation coverage, grounding, contradicting evidence, feasibility, novelty, and hallucination risk, and exports an evidence-backed research proposal package.
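To make one of the audit dimensions concrete, here is a minimal sketch of citation coverage over typed evidence. It is an illustration under stated assumptions, not CellForge AI's actual schema: the classes `Claim` and `Hypothesis`, their fields, and the `citation_coverage` helper are hypothetical. The project lists Pydantic, but this sketch uses standard-library dataclasses so it runs without dependencies. Coverage is taken here to mean the fraction of a hypothesis's cited claim IDs that resolve to an extracted claim with a known source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    """One structured experimental claim extracted from a paper (hypothetical schema)."""
    claim_id: str
    text: str
    source: str  # citation key or identifier of the paper; empty string if unknown


@dataclass
class Hypothesis:
    """A candidate research hypothesis and the claim IDs it cites (hypothetical schema)."""
    hypothesis_id: str
    statement: str
    cited_claim_ids: list[str] = field(default_factory=list)


def citation_coverage(hyp: Hypothesis, claims: dict[str, Claim]) -> float:
    """Fraction of cited claim IDs that resolve to an extracted claim with a known source."""
    if not hyp.cited_claim_ids:
        return 0.0  # an uncited hypothesis has no coverage at all
    resolved = sum(
        1
        for cid in hyp.cited_claim_ids
        if cid in claims and claims[cid].source  # must exist AND carry a source
    )
    return resolved / len(hyp.cited_claim_ids)
```

For example, a hypothesis that cites three claim IDs, where one resolves to a sourced claim, one resolves to a claim with no recorded source, and one was never extracted, scores 1/3.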

It does not claim to automatically write a publishable final paper. It helps a human researcher move from literature to validated research directions faster.

How we built it

We built a Python research-agent pipeline with typed evidence models, real extraction artifacts from Gemini Document Understanding, local retrieval benchmarks, deterministic hypothesis generation, evidence auditing, and a human validation gate. Gemini Pro preview is used to generate the final research brief from audited evidence.
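A local retrieval benchmark usually scores a ranked result list against hand-labeled relevant documents. The source does not say which metrics CellForge AI reports, so the sketch below assumes two standard information-retrieval metrics, recall@k and mean reciprocal rank (MRR). The function names and the `runs`/`qrels` layout (query ID to ranked document IDs, query ID to relevant document IDs) are hypothetical.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def benchmark(
    runs: dict[str, list[str]], qrels: dict[str, set[str]], k: int = 5
) -> dict[str, float]:
    """Average recall@k and MRR over every labeled query (missing runs score 0)."""
    queries = list(qrels)
    if not queries:
        return {f"recall@{k}": 0.0, "mrr": 0.0}
    recall = sum(recall_at_k(runs.get(q, []), qrels[q], k) for q in queries)
    mrr = sum(reciprocal_rank(runs.get(q, []), qrels[q]) for q in queries)
    return {f"recall@{k}": recall / len(queries), "mrr": mrr / len(queries)}
```

Averaging over the labeled queries, rather than over whatever the retriever happened to return, means a query with no results still drags the score down instead of silently disappearing.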

For observability, we integrated Arize AX / Phoenix-style tracing so each major agent step becomes inspectable: evidence loading, retrieval, hypothesis generation, evaluator spans, and self-introspection. The repo also includes reports, audit outputs, a manuscript draft, and a molecular interface schematic for the demo.
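The value of tracing comes from nesting: each agent step becomes a child span of one run, so a slow or failing step is visible in context. The sketch below is a dependency-free, in-memory stand-in that only illustrates that parent/child structure; it does not use the Arize AX, Phoenix, or OpenTelemetry libraries, and the `Tracer` class and span names are hypothetical. In real OpenTelemetry instrumentation, the Python API's `start_as_current_span` provides the same nesting.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    name: str
    parent: Optional[str]
    attributes: dict[str, object] = field(default_factory=dict)
    duration_s: float = 0.0


class Tracer:
    """Records nested spans in memory (illustrative stand-in, not a real tracing SDK)."""

    def __init__(self) -> None:
        self.spans: list[Span] = []
        self._stack: list[str] = []  # names of currently open spans

    @contextmanager
    def span(self, name: str, **attributes: object) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        record = Span(name, parent, dict(attributes))
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield record
        finally:
            # Runs even if the step raises, so failed steps are still recorded.
            record.duration_s = time.perf_counter() - start
            self._stack.pop()
            self.spans.append(record)  # appended on exit: children precede parents


# Illustrative run covering the steps named above; top_k=5 is a placeholder value.
tracer = Tracer()
with tracer.span("agent.run"):
    with tracer.span("evidence.load"):
        pass
    with tracer.span("retrieval", top_k=5):
        pass
    with tracer.span("hypothesis.generate"):
        pass
    with tracer.span("evaluator.audit"):
        pass
    with tracer.span("self_introspection"):
        pass

for s in tracer.spans:
    print(f"{s.parent or '<root>'} -> {s.name} ({s.duration_s * 1000:.3f} ms)")
```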

Challenges we ran into

The hardest part was making the system more than a RAG chatbot. We had to narrow the research scope, convert PDFs into usable structured evidence, handle messy scientific claims, and design scoring that could expose weak grounding or contradictions instead of hiding them.
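One way scoring can hide weak grounding is averaging: a strong novelty score can offset a serious contradiction. A minimal sketch of the alternative, assuming every dimension is scored in [0, 1], is to check each dimension against its own threshold and return explicit flags. The six dimension names come from the project description above; the threshold values, the split into "higher is better" and "lower is better" dimensions, and the `audit` function itself are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical floors for dimensions where higher is better.
MIN_SCORE = {
    "citation_coverage": 0.8,
    "grounding": 0.7,
    "feasibility": 0.5,
    "novelty": 0.3,
}
# Hypothetical caps for risk-type dimensions where lower is better.
MAX_RISK = {
    "contradiction": 0.3,
    "hallucination_risk": 0.2,
}


@dataclass
class AuditResult:
    passed: bool
    flags: list[str]


def audit(scores: dict[str, float]) -> AuditResult:
    """Check every dimension separately so one weak score cannot be averaged away."""
    flags: list[str] = []
    for dim, floor in MIN_SCORE.items():
        value = scores.get(dim, 0.0)  # a missing quality score counts as failing
        if value < floor:
            flags.append(f"{dim}={value:.2f} below {floor}")
    for dim, cap in MAX_RISK.items():
        value = scores.get(dim, 1.0)  # a missing risk score counts as maximal risk
        if value > cap:
            flags.append(f"{dim}={value:.2f} above {cap}")
    return AuditResult(passed=not flags, flags=flags)
```

With every quality score comfortably high but `contradiction` at 0.6, `audit` returns a single flag, `contradiction=0.60 above 0.3`, rather than a reassuring average. Treating missing scores as failures is a deliberate choice: an evaluator that silently skips a dimension should not make a hypothesis look better.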

Another challenge was making the demo understandable in a short video. We added trace evidence, burned-in subtitles, and a concise final research report so judges can inspect the actual outputs.

Accomplishments that we're proud of

We are proud that CellForge AI has a full evidence-to-hypothesis workflow, not just a search box. It produces real benchmark reports, audits hypotheses, selects the best candidate through self-introspection, and shows the reasoning path in Arize traces.
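The source does not describe how self-introspection ranks candidates. The sketch below assumes one plausible rule: only hypotheses that passed the audit are eligible, they are ranked by grounding first and novelty second, and if none qualify the function returns `None` so the decision goes to a human instead of forcing a pick. The `Candidate` type, its fields, and `select_best` are hypothetical.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    hypothesis_id: str
    passed_audit: bool
    grounding: float  # assumed in [0, 1]
    novelty: float    # assumed in [0, 1]


def select_best(candidates: list[Candidate]) -> Optional[Candidate]:
    """Pick the best audited candidate, or None so a human reviews the whole set."""
    eligible = [c for c in candidates if c.passed_audit]
    if not eligible:
        return None  # nothing survived the audit: escalate instead of guessing
    # Lexicographic ranking: grounding first, novelty breaks ties.
    # On a full tie, max() returns the first candidate encountered.
    return max(eligible, key=lambda c: (c.grounding, c.novelty))
```

Ranking grounding ahead of novelty reflects the write-up's emphasis on evidence over fluency; a different weighting is equally possible and would change only the `key` function.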

We are also proud of the research direction it surfaced: circular natural-graphite anodes with bio-based conductive surface repair, combining recycled/natural graphite purification with green surface stabilization.

What we learned

We learned that research agents need evaluation and observability as first-class features. In scientific domains, a fluent answer is not enough. The system must show what evidence supports a claim, what contradicts it, and where human validation is still required.

We also learned that a narrow, domain-specific workflow can be more valuable than a general chatbot, especially when the output is a research proposal that scientists can inspect and improve.

What's next for CellForge AI

Next, we want to connect more real literature sources, improve chart and figure understanding, add stronger retrieval benchmarks, and build larger evidence-auditor eval sets. We also want to deepen the Arize feedback loop so the agent can compare past runs and improve prompts, retrieval, and hypothesis quality over time.

Longer term, the same workflow could be extended to other materials research domains such as solar materials, catalysts, carbon capture, semiconductors, and biomedical materials.

Built With

arize-ax
elasticsearch-compatible-retrieval
fastapi
gemini-3 / gemini-pro-preview
gemini-document-understanding
google-cloud
markdown
mcp
next.js
phoenix-style-openinference-tracing
pydantic
python
tailwind-css
typescript

Updates

Quang Nguyen started this project — Jun 11, 2026 03:38 PM EDT
