Inspiration

Math progress is bottlenecked by proof search and verification. With open models like GPT‑OSS‑120B showing strong mathematical reasoning, we asked: can an open, reproducible agent systematically propose conjectures, attempt rigorous proofs, and iterate toward publishable results—accelerating not just olympiad problem solving, but science more broadly?

What it does

DeepResearch++ is an end‑to‑end research‑and‑proving agent for mathematics (a code sketch of the flow follows the list):

  • Literature review → conjecture generation: builds candidate conjectures from seeds and recent results.
  • Novelty filter: down‑selects to likely‑new statements to avoid re‑discoveries.
  • Proof engine: multi‑attempt prover generates Markdown proofs; a judge critiques flaws; the system iterates and selects the best proof.
  • Outputs: a reproducible report with statement, status (proved/failed), proof, and failure analysis.
  • Interfaces: minimal web UI; runs against OpenAI‑compatible endpoints (local GPT‑OSS via Ollama or hosted).
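
To make the flow concrete, here is a minimal sketch under assumed naming. Every function below is an illustrative stub, not the project's actual API; the real stages live in research.py and solver.py with different signatures.

```python
# Hypothetical sketch of the DeepResearch++ flow; all stubs are illustrative.
def literature_review(seeds: list[str]) -> list[str]:
    # Placeholder: summarize seeds and recent results into working context.
    return [f"context for {s}" for s in seeds]

def generate_conjectures(context: list[str]) -> list[str]:
    # Placeholder: propose candidate statements from the reviewed context.
    return [f"Conjecture suggested by {c}" for c in context]

def is_likely_novel(statement: str) -> bool:
    # Placeholder: the novelty filter drops probable re-discoveries.
    return True

def prove_with_judge(statement: str, max_rounds: int = 3) -> dict:
    # Placeholder: multi-attempt prover + judge critique loop (see solver.py).
    return {"status": "failed", "proof": None,
            "failure_analysis": "illustrative stub; no real prover called"}

def run(seeds: list[str]) -> list[dict]:
    # Each entry mirrors the report fields: statement, status, proof, analysis.
    report = []
    for conj in generate_conjectures(literature_review(seeds)):
        if is_likely_novel(conj):
            report.append({"statement": conj, **prove_with_judge(conj)})
    return report
```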

How we built it

  • Orchestration: solver.py coordinates parallel Provers and a Judge, with retries and final selection (see the sketch after this list).
  • Structured I/O: Pydantic schemas in output_schemas.py; model calls go through a provider‑agnostic wrapper (llm_provider.py) using the Responses API.
  • Research loop: research.py runs review → prediction → novelty filter → proof → report; optional guideline steering focuses the search on a chosen direction.
  • Safety/rigor: result_refiner.py cleans and standardizes proofs.
  • Stack: Python, FastAPI/CLI backend; Vite/React front‑end; runs with GPT‑OSS locally (Ollama) or cloud models interchangeably.
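
A hedged sketch of that orchestration, assuming simplified Pydantic schemas and stub model calls; the real schemas, prompts, and Responses‑API plumbing live in output_schemas.py, solver.py, and llm_provider.py, and the field names below are our invention:

```python
# Assumed shapes only: fields, scores, and stubs are illustrative.
from concurrent.futures import ThreadPoolExecutor
from pydantic import BaseModel

class ProofAttempt(BaseModel):
    proof_markdown: str

class Verdict(BaseModel):
    sound: bool
    score: float          # judge's rigor score in [0, 1] (hypothetical field)
    flaws: list[str]

def prove(statement: str) -> ProofAttempt:
    # Placeholder for a call through llm_provider.py, which targets any
    # OpenAI-compatible endpoint: local GPT-OSS via Ollama, or hosted.
    return ProofAttempt(proof_markdown=f"*Attempted proof of:* {statement}")

def judge(statement: str, attempt: ProofAttempt) -> Verdict:
    # Placeholder for the judge call that critiques each attempt.
    return Verdict(sound=False, score=0.4, flaws=["unjustified lemma"])

def solve(statement: str, n_provers: int = 4) -> tuple[ProofAttempt, Verdict]:
    # Run provers in parallel, judge every attempt, keep the highest-scored.
    with ThreadPoolExecutor(max_workers=n_provers) as pool:
        attempts = list(pool.map(prove, [statement] * n_provers))
    verdict, best = max(((judge(statement, a), a) for a in attempts),
                        key=lambda pair: pair[0].score)
    return best, verdict
```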

Challenges we ran into

  • Stabilizing long, multi‑step proofs (hallucination vs. rigor).
  • Novelty checking without a full formal mathematics knowledge base.
  • Geometry brittleness, mitigated with a strict complex‑plane policy (illustrated after this list).
  • Latency/cost/VRAM constraints for large open models.
  • Normalizing different provider APIs and structured outputs.
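
The strict complex‑plane policy amounts to a hard constraint in the prover's geometry instructions. The wording below is our illustration of such a guard, not the project's verbatim prompt:

```python
# Illustrative policy text; the project's actual prompt may differ.
COMPLEX_PLANE_POLICY = """\
For any Euclidean geometry statement:
1. Place the configuration in the complex plane (unit circumcircle when natural).
2. Express every point, line, and circle algebraically; no synthetic steps
   that rely on unstated diagram facts.
3. Prove collinearity, concyclicity, and concurrency by explicit computation,
   never by inspection of a figure.
"""

def geometry_prompt(problem: str) -> str:
    # Prepend the policy so every geometry attempt is coordinate-based.
    return f"{COMPLEX_PLANE_POLICY}\nProblem:\n{problem}"
```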

Accomplishments that we're proud of

  • A reproducible agent that produces checkable Markdown proofs and failure analyses.
  • Strong performance on olympiad‑style problems with open models (GPT‑OSS) in our internal runs.
  • A clean, scriptable CLI and lightweight UI that make research loops easy to reproduce.
  • Provider‑agnostic design: same pipeline runs locally via Ollama or in the cloud.

What we learned

  • Diverse attempts, judging, and iteration matter more than a single long pass.
  • Most users don't come close to extracting the full capability of today's models; the right scaffolding can get far more out of them and push science forward faster.
  • Superintelligence is here.

What's next for DeepResearch++

  • Make the LLM more creative at predicting new results (we already have an approach in mind).
  • Polish the generated report.
  • Build automated systems that read, understand, and apply the new mathematical results to physics, biology, chemistry, and machine learning; this will require extensive tool‑call wiring.

Built With

python, fastapi, react, vite, gpt-oss, ollama