## Inspiration

The scientific reproducibility crisis is widespread: more than 70% of researchers report having tried and failed to reproduce another scientist's experiments. With millions of papers published annually, the gap between reading research and implementing it remains a major barrier to scientific progress.

We asked: What if AI could bridge this gap instantly?

## What it does

Paper to Code transforms any scientific PDF into a working Jupyter notebook. Upload a research paper, and our Gemini 3-powered agent:

  1. Analyzes the full paper using multimodal understanding (text + figures + equations)
  2. Extracts methodology, algorithms, data sources, and dependencies
  3. Generates production-ready Python code with proper structure
  4. Validates the code through an agentic self-correction loop
  5. Outputs an executable notebook + requirements.txt
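Step 5 can be illustrated with a minimal sketch. The project lists nbformat in its stack, and that library would be the usual choice for this; the version below builds the nbformat v4 JSON structure by hand using only the standard library so that it stays self-contained. The helper names `build_notebook` and `build_requirements` are hypothetical and are not taken from the project's code.

```python
import json


def build_notebook(cells):
    """Build an nbformat v4 notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells additionally carry these two fields in nbformat v4.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def build_requirements(dependencies):
    """Render a sorted, de-duplicated requirements.txt body."""
    return "\n".join(sorted(set(dependencies))) + "\n"


# Illustrative content only; a real run would use the generated code.
notebook = build_notebook([
    ("markdown", "# Reproduction of the paper's method"),
    ("code", "import numpy as np\nprint(np.__version__)"),
])
notebook_json = json.dumps(notebook, indent=1)
requirements_txt = build_requirements(["numpy", "matplotlib", "numpy"])
```

Writing `notebook_json` to a `.ipynb` file and `requirements_txt` to `requirements.txt` would give the two output artifacts described above.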

## How we built it

We leveraged Gemini 3's most powerful features:

| Feature | Usage |
|---------|-------|
| 1M Token Context | Process entire papers without chunking |
| Multimodal Understanding | Analyze figures, diagrams, and equations visually |
| Thinking Levels | High reasoning level for complex code generation |
| Thought Signatures | Maintain context across multi-step agent workflows |
| Structured Outputs | Extract methodology into precise JSON schemas |
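To make the structured-output idea concrete, here is a hedged sketch of how an extracted methodology might be validated once the model returns JSON. The `Methodology` class and all of its field names are assumptions made for illustration, not the project's actual schema, and no model call is made: the `reply` string stands in for a model response.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Methodology:
    # All field names are illustrative assumptions, not the project's schema.
    title: str
    algorithms: list = field(default_factory=list)
    data_sources: list = field(default_factory=list)
    dependencies: list = field(default_factory=list)


def parse_methodology(raw_json):
    """Parse a JSON reply, rejecting missing or unexpected keys."""
    data = json.loads(raw_json)
    allowed = set(Methodology.__dataclass_fields__)
    unexpected = set(data) - allowed
    if unexpected:
        raise ValueError(f"unexpected keys: {sorted(unexpected)}")
    if "title" not in data:
        raise ValueError("missing required key: title")
    return Methodology(**data)


# Stand-in for a model reply; the content is invented for the example.
reply = '{"title": "Example paper", "algorithms": ["k-means"], "dependencies": ["scikit-learn"]}'
method = parse_methodology(reply)
```

Validating the reply before code generation means a malformed extraction fails early, rather than surfacing later as broken notebook code.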

Architecture: PDF → Gemini 3 (Analysis) → Structured Extraction → Gemini 3 (Generation) → Self-Validation Loop → Jupyter Notebook + requirements.txt

Tech Stack: Python, Streamlit, PyMuPDF, nbformat, google-genai SDK

## Challenges we faced

  • Equation extraction: Scientific papers contain complex typeset equations, which required careful prompt engineering to extract
  • Code validation: Building a reliable self-correction loop without infinite iterations
  • Context management: Balancing detail vs. token usage for long papers
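The code-validation challenge above comes down to bounding the retry loop. A minimal sketch, assuming the generated code can be run in a fresh Python process: the `repair` callback is a hypothetical stand-in for a model call that returns revised code, and `max_attempts` is an assumed parameter, not a value from the project.

```python
import subprocess
import sys


def run_code(code, timeout=30):
    """Run code in a fresh Python process; return (succeeded, stderr)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return result.returncode == 0, result.stderr


def self_correct(code, repair, max_attempts=3):
    """Retry until the code runs cleanly or the attempt budget is spent.

    `repair(code, error)` is a hypothetical callback standing in for a
    model call that returns a revised version of the code.
    """
    for attempt in range(1, max_attempts + 1):
        ok, error = run_code(code)
        if ok:
            return code, attempt
        if attempt < max_attempts:
            code = repair(code, error)
    raise RuntimeError(f"still failing after {max_attempts} attempts")


# Stub repair function for demonstration: fixes one known typo.
def stub_repair(code, error):
    return code.replace("pritn", "print")


fixed, attempts = self_correct("pritn('hello')", stub_repair)
```

The hard cap on attempts and the per-run timeout together guarantee termination, at the cost of sometimes giving up on code that one more repair round might have fixed.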

## What we learned

  • Gemini 3's thinking levels dramatically improve code quality when set to high
  • Thought signatures are essential for maintaining coherent multi-step reasoning
  • Multimodal input (page images + text) catches information that text-only misses

## What's next

  • Support for arXiv URL input
  • Dataset auto-download integration
  • GPU-accelerated code detection and optimization suggestions
  • Community library of reproduced papers

## Built With

  • gemini3
  • googlegeminiapi
  • jupyter
  • pymupdf
  • python
  • streamlit