Paper to Code

## Inspiration

Science has a reproducibility crisis: more than 70% of researchers report having tried and failed to reproduce another scientist's experiments. With millions of papers published every year, the gap between reading research and implementing it remains a major barrier to scientific progress.

We asked: What if AI could bridge this gap instantly?

## What it does

Paper to Code transforms any scientific PDF into a working Jupyter notebook. Upload a research paper, and our Gemini 3-powered agent:

1. Analyzes the full paper using multimodal understanding (text + figures + equations)
2. Extracts methodology, algorithms, data sources, and dependencies
3. Generates production-ready Python code with proper structure
4. Validates the code through an agentic self-correction loop
5. Outputs an executable notebook + requirements.txt
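
The final output step can be sketched as follows. This is a minimal, standard-library-only illustration and not necessarily how Paper to Code does it: the function names, the `PYPI_NAMES` override table, and the idea of deriving `requirements.txt` from the imports in the generated code are all assumptions. The notebook is written directly as nbformat v4 JSON.

```python
from __future__ import annotations

import ast
import json
import sys
from pathlib import Path

# Hypothetical overrides for modules whose import name differs from the PyPI name.
PYPI_NAMES = {"sklearn": "scikit-learn", "cv2": "opencv-python", "PIL": "Pillow"}


def third_party_requirements(code: str) -> list[str]:
    """List packages imported by `code`, skipping the standard library."""
    # sys.stdlib_module_names exists on Python 3.10+; older versions filter nothing.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    modules = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return sorted(PYPI_NAMES.get(m, m) for m in modules if m not in stdlib)


def build_notebook(sections: list[tuple[str, str]]) -> dict:
    """Build an nbformat v4 notebook: one heading cell plus one code cell per section."""
    cells = []
    for heading, code in sections:
        cells.append({"cell_type": "markdown", "metadata": {}, "source": f"## {heading}"})
        cells.append({
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": code,
        })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_outputs(sections: list[tuple[str, str]], out_dir: str) -> None:
    """Write notebook.ipynb and requirements.txt into `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    all_code = "\n".join(code for _, code in sections)
    (out / "notebook.ipynb").write_text(json.dumps(build_notebook(sections), indent=1))
    (out / "requirements.txt").write_text("\n".join(third_party_requirements(all_code)) + "\n")
```

With the `nbformat` library from the tech stack, the same cells could be built with `nbformat.v4.new_markdown_cell` and `nbformat.v4.new_code_cell`. The AST scan only sees static imports, so pinned versions and dynamically imported packages would still need to come from the model's extraction.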

## How we built it

We leveraged Gemini 3's most powerful features:

| Feature | Usage |
|---------|-------|
| 1M Token Context | Process entire papers without chunking |
| Multimodal Understanding | Analyze figures, diagrams, and equations visually |
| Thinking Levels | High reasoning level for complex code generation |
| Thought Signatures | Maintain context across multi-step agent workflows |
| Structured Outputs | Extract methodology into precise JSON schemas |
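
To make the structured-output idea concrete, here is a hedged sketch of what an extraction schema and its validation might look like. The field names are assumptions based on the items the agent is described as extracting (methodology, algorithms, data sources, dependencies); the project's actual schema is not shown in this write-up, and no Gemini API call is made here.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

# Hypothetical JSON schema for the extraction step.
EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "methodology": {"type": "string"},
        "algorithms": {"type": "array", "items": {"type": "string"}},
        "data_sources": {"type": "array", "items": {"type": "string"}},
        "dependencies": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "methodology"],
}


@dataclass
class PaperExtraction:
    title: str
    methodology: str
    algorithms: list[str] = field(default_factory=list)
    data_sources: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)


def parse_extraction(raw_json: str) -> PaperExtraction:
    """Parse a model response and reject anything that does not fit the schema."""
    data = json.loads(raw_json)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in EXTRACTION_SCHEMA["required"] if k not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    fields = {}
    for name, spec in EXTRACTION_SCHEMA["properties"].items():
        if name not in data:
            continue  # optional field; the dataclass default applies
        value = data[name]
        if spec["type"] == "string":
            ok = isinstance(value, str)
        else:  # the only other type used above is an array of strings
            ok = isinstance(value, list) and all(isinstance(v, str) for v in value)
        if not ok:
            raise ValueError(f"field {name!r} does not match the schema")
        fields[name] = value
    return PaperExtraction(**fields)
```

Validating on the client side as well means a malformed response fails loudly before it reaches the code-generation step.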

Architecture: PDF → Gemini 3 (Analysis) → Structured Extraction → Gemini 3 (Generation) → Self-Validation Loop → Jupyter Notebook + requirements.txt
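
The data flow in this architecture can be sketched as a chain of stages. The stage signatures and the `PipelineResult` type are hypothetical, and each stage is passed in as a plain callable so the sketch runs without any model access; in the real system the analysis and generation stages would call Gemini 3.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    code: str
    dependencies: list[str]
    validated: bool


def run_pipeline(
    pdf_bytes: bytes,
    analyze: Callable[[bytes], str],
    extract: Callable[[str], dict],
    generate: Callable[[dict], str],
    validate: Callable[[str], tuple[str, bool]],
) -> PipelineResult:
    """Run the stages in the order given by the architecture line."""
    analysis = analyze(pdf_bytes)   # PDF -> analysis
    spec = extract(analysis)        # analysis -> structured extraction
    code = generate(spec)           # extraction -> generated code
    code, ok = validate(code)       # self-validation may rewrite the code
    return PipelineResult(code, list(spec.get("dependencies", [])), ok)


if __name__ == "__main__":
    # Stub stages standing in for the model calls.
    result = run_pipeline(
        b"%PDF-",
        analyze=lambda pdf: "paper uses linear regression",
        extract=lambda text: {"methodology": text, "dependencies": ["numpy"]},
        generate=lambda spec: "import numpy as np\nprint(np.ones(1))",
        validate=lambda code: (code, True),
    )
    print(result.validated, result.dependencies)
```

Keeping the stages injectable also makes each one testable in isolation with canned inputs.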

Tech Stack: Python, Streamlit, PyMuPDF, nbformat, google-genai SDK

## Challenges we faced

- Equation extraction: Scientific papers contain complex LaTeX equations, and extracting them correctly required careful prompt engineering
- Code validation: Building a reliable self-correction loop that is guaranteed to terminate rather than iterate forever
- Context management: Balancing detail vs. token usage for long papers
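
One way to keep a self-correction loop from running forever is to cap the number of attempts and stop early when the model returns code it has already produced. The sketch below assumes this design; the names `self_correct`, `syntax_check`, and `run_check` are hypothetical, and `fix` stands in for a model call that receives the code and the error message.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional

# A check returns None on success, or an error message to feed back to the model.
Check = Callable[[str], Optional[str]]
Fix = Callable[[str, str], str]


def syntax_check(code: str) -> Optional[str]:
    """Cheap first gate: does the code compile at all?"""
    try:
        compile(code, "<generated>", "exec")
    except SyntaxError as exc:
        return f"SyntaxError: {exc}"
    return None


def run_check(code: str, timeout_s: float = 30.0) -> Optional[str]:
    """Run the code in a separate interpreter so crashes and hangs stay contained."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout_s, cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return f"Timed out after {timeout_s} s"
    if proc.returncode != 0:
        return proc.stderr.strip() or f"Exited with status {proc.returncode}"
    return None


def self_correct(
    code: str, fix: Fix, check: Check = syntax_check, max_attempts: int = 3
) -> tuple[str, bool, int]:
    """Return (final_code, passed, checks_run); never runs more than max_attempts checks."""
    seen = {code}
    for attempt in range(1, max_attempts + 1):
        error = check(code)
        if error is None:
            return code, True, attempt
        if attempt == max_attempts:
            break  # budget exhausted; do not ask for another fix
        code = fix(code, error)
        if code in seen:  # the fixer is repeating itself, so stop early
            return code, False, attempt
        seen.add(code)
    return code, False, max_attempts
```

Passing `check=run_check` swaps the compile-only gate for actual execution. A subprocess with a timeout bounds each attempt, and the attempt cap bounds the loop as a whole, so total validation time is at most roughly `max_attempts * timeout_s`.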

## What we learned

- Gemini 3's thinking levels dramatically improve code quality when set to high
- Thought signatures are essential for maintaining coherent multi-step reasoning
- Multimodal input (page images + text) catches information that text-only misses
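
The page-images-plus-text input can be sketched with PyMuPDF as below. This is an illustration under assumptions: the part dictionaries are a generic, hypothetical layout rather than the google-genai SDK's own types, and `render_pages` and `build_parts` are names introduced here.

```python
from __future__ import annotations

import base64
from typing import Optional


def render_pages(pdf_path: str, dpi: int = 150) -> list[tuple[str, bytes]]:
    """Return (extracted_text, png_bytes) for every page of the PDF."""
    import fitz  # PyMuPDF; imported lazily so build_parts works without it

    pages = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pixmap = page.get_pixmap(dpi=dpi)  # rasterize the full page
            pages.append((page.get_text(), pixmap.tobytes("png")))
    return pages


def build_parts(pages: list[tuple[str, bytes]], max_pages: Optional[int] = None) -> list[dict]:
    """Interleave each page's text with its image so the model sees both views."""
    parts = []
    for number, (text, png) in enumerate(pages[:max_pages], start=1):
        parts.append({"type": "text", "text": f"[Page {number}]\n{text.strip()}"})
        parts.append({
            "type": "image",
            "mime_type": "image/png",
            "data": base64.b64encode(png).decode("ascii"),
        })
    return parts
```

Sending the rendered page alongside its extracted text gives the model a second chance at content that text extraction tends to mangle, such as equations, figure labels, and table layout. The `dpi` and `max_pages` parameters are the obvious knobs for trading detail against token usage.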

## What's next

- Support for arXiv URL input
- Dataset auto-download integration
- GPU-accelerated code detection and optimization suggestions
- Community library of reproduced papers

## Built With

gemini3
googlegeminiapi
jupyter
pymupdf
python
streamlit

## Updates

Umurzoq Sirliboyev started this project — Jan 09, 2026 04:01 PM EST
