Gemini Labs research prototyping dashboard

Automated Prototyping Engine for Research Papers (ArXiv-to-Code)

Rationale

AI research advances quickly, but reproducibility has not kept pace. Groundbreaking papers appear on arXiv daily with elegant mathematics and elaborate architecture diagrams, yet a reliable implementation often does not exist for months, if one ever appears at all. As students and practitioners following the latest work, we ran into the same problem again and again: understanding the mathematics was only half the battle, and translating that understanding into working code was the harder half.

To close this gap, we built ArXiv-to-Code, an automated research prototyping engine that converts research papers into executable models.

Objective

ArXiv-to-Code is a multimodal artificial intelligence system that converts dense deep learning research papers into working deep learning code.

For instance, if provided with a research paper in PDF format, the system:

Reads long form academic text
Interprets mathematical equations and Greek-letter notation
Visually analyzes architecture diagrams and figures
Extracts the model definition and loss function
Generates executable PyTorch source code files (e.g., model.py, loss.py)

As an example, the system extracts the loss function from a paper's Figure 3 by reconstructing each equation the figure contains, such as

\[ \mathcal{L} = \mathbb{E}_{(x,y)\sim\mathcal{D}} \left[ \lambda_1 \|\hat{y} - y\|_2^2 + \lambda_2 \, \mathrm{KL}\left(q(z \mid x)\,\|\,p(z)\right) \right] \]

and directly implements them as numerically stable Python code ready for training.
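
A minimal sketch of how the objective above might be computed for one batch. It assumes details the text does not state: that q(z|x) is a diagonal Gaussian with mean `mu` and log-variance `logvar`, that p(z) is a standard normal (so the KL term has a closed form), and that the expectation is approximated by a batch mean. The function name and its arguments are hypothetical. Plain Python lists are used so the arithmetic can be checked by hand; a generated loss.py would presumably express the same terms with PyTorch tensors.

```python
import math


def vae_style_loss(y_hat, y, mu, logvar, lambda1=1.0, lambda2=1.0):
    """Batch mean of lambda1 * ||y_hat - y||_2^2 + lambda2 * KL(q(z|x) || p(z)).

    Assumes q(z|x) = N(mu, diag(exp(logvar))) and p(z) = N(0, I). These are
    illustrative assumptions, not details taken from any specific paper.
    Each argument is a list of per-sample vectors (lists of floats).
    """
    total = 0.0
    for yh_i, y_i, mu_i, lv_i in zip(y_hat, y, mu, logvar):
        # Squared L2 norm of the prediction error.
        recon = sum((a - b) ** 2 for a, b in zip(yh_i, y_i))
        # Closed-form KL of a diagonal Gaussian against a standard normal.
        # Taking log-variance as input avoids computing log() of a tiny variance.
        kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu_i, lv_i))
        total += lambda1 * recon + lambda2 * kl
    return total / len(y)
```

With a perfect prediction and a posterior equal to the prior (`mu` and `logvar` all zero), both terms vanish, which makes a convenient first sanity check for any generated implementation.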

How we built it

We built the project using Gemini’s multimodal capabilities in AI Studio, focusing on research-level fidelity rather than summarization.

Our pipeline consists of:

Long-context document understanding to process full research PDFs
Vision–language reasoning to interpret figures and architecture diagrams
Mathematical parsing to preserve equations, symbols, and constraints
Code synthesis to translate theory into PyTorch implementations
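
The four stages above could be chained roughly as follows. This is only a structural sketch: every class and function name here is hypothetical, and each stage body is a placeholder standing in for a model call, since the write-up does not describe the actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class PaperArtifacts:
    """Intermediate results handed from one stage to the next (hypothetical)."""
    text: str = ""
    figure_notes: list = field(default_factory=list)
    equations: list = field(default_factory=list)
    files: dict = field(default_factory=dict)


def understand_document(state, raw_text):
    # Placeholder for long-context reading of the full paper.
    state.text = raw_text.strip()
    return state


def interpret_figures(state, captions):
    # Placeholder for vision-language reasoning over figures and diagrams.
    state.figure_notes = [c.strip() for c in captions if c.strip()]
    return state


def parse_math(state, latex_snippets):
    # Placeholder for equation parsing; symbols are kept verbatim.
    state.equations = list(latex_snippets)
    return state


def synthesize_code(state):
    # Placeholder for code synthesis; emits one stub per target file and
    # records which equations the loss file is meant to implement.
    header = "".join(f"# source equation: {eq}\n" for eq in state.equations)
    state.files = {
        "model.py": "# model definition goes here\n",
        "loss.py": header + "# loss implementation goes here\n",
    }
    return state


def run_pipeline(raw_text, captions, latex_snippets):
    state = PaperArtifacts()
    state = understand_document(state, raw_text)
    state = interpret_figures(state, captions)
    state = parse_math(state, latex_snippets)
    return synthesize_code(state)
```

Passing a single state object through the stages keeps the equations and figure notes available to the synthesis step, so the generated files can cite what they were derived from.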

We designed a strict prompting framework that enforces:

No hallucinated equations
Exact variable matching with the paper
Diagram-first reasoning when figures define key logic
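
One way the "exact variable matching" rule could be checked after generation is to compare the names the code introduces against the symbols extracted from the paper. The sketch below uses Python's standard `ast` module; the helper name, the `ignore` default, and the convention of spelling a symbol such as λ₁ as `lambda_1` are assumptions made for illustration, not the project's documented behavior.

```python
import ast


def unmatched_names(generated_code, paper_symbols, ignore=("self",)):
    """Return names introduced by generated_code that are not paper symbols.

    Only assignment targets and function parameters are inspected, so
    references to imported modules or builtins are not reported.
    """
    tree = ast.parse(generated_code)
    introduced = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            introduced.add(node.id)   # assignment target
        elif isinstance(node, ast.arg):
            introduced.add(node.arg)  # function parameter
    return sorted(introduced - set(paper_symbols) - set(ignore))
```

For instance, checking a generated function with parameters `y_hat`, `y`, and `lambda_1` plus a local variable `err` against the symbol set `{"y_hat", "y", "lambda_1"}` reports only `err`, which a reviewer can then either accept as a helper variable or trace back to the paper.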

The result is an AI system that behaves more like a research engineer than a chatbot.

Challenges we ran into

One of the biggest challenges was preventing the model from guessing missing details. Research papers often omit implementation specifics, and it was crucial that the system explicitly flag missing information rather than silently invent it.
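
A small sketch of what "flag rather than invent" might look like in the generated output. Each setting carries either a value with a pointer to where the paper states it, or an explicit missing marker. The `Setting` class, both helper functions, and the comment wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    name: str
    value: object = None  # None means the paper did not specify it
    source: str = ""      # where the value was found, e.g. "Eq. 3"

    @property
    def missing(self):
        return self.value is None


def render_config(settings):
    """Emit config lines, marking unspecified values instead of guessing."""
    lines = []
    for s in settings:
        if s.missing:
            lines.append(f"{s.name} = None  # MISSING: not specified in the paper")
        else:
            lines.append(f"{s.name} = {s.value!r}  # from {s.source}")
    return "\n".join(lines)


def require_complete(settings):
    """Fail loudly before training if anything is still unspecified."""
    missing = [s.name for s in settings if s.missing]
    if missing:
        raise ValueError("unspecified in paper: " + ", ".join(missing))
```

Surfacing the gap in the emitted file and again at run time means a missing learning rate, for example, cannot silently default to an invented value.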

Another challenge was diagram interpretation. Figures often encode critical logic that is not fully described in text, so we had to ensure the system treated diagrams as first-class sources of truth.

Finally, translating complex mathematical expressions into stable, shape-correct code—especially for multi-term loss functions—required careful reasoning and validation.

Accomplishments that we're proud of

Successfully extracting and implementing loss functions directly from paper figures
Generating clean, modular, runnable PyTorch code from brand-new research papers
Building a reproducibility-focused system that prioritizes correctness over verbosity
Demonstrating how multimodal AI can meaningfully accelerate research workflows

What we learned

This project taught us how powerful multimodal reasoning can be when applied to real research problems. We learned that diagrams are not just visual aids—they often are the specification. We also gained a deeper appreciation for the gap between mathematical elegance and implementation reality, and how carefully designed AI systems can help bridge that gap.

What's next for ArXiv-to-Code: Automated Research Prototyping Engine

Next, we plan to:

Add full training script generation and dataset hooks
Support multiple frameworks (TensorFlow, JAX)
Automatically generate experiment-ready GitHub repositories
Evaluate implementations against official or community benchmarks
Extend the system to other domains such as robotics and scientific computing

Our long-term vision is to make research reproducibility the default, not the exception.

Built With

gemini
google ai studio
multimodal ai
python
pytorch

Submitted to

Gemini 3 Hackathon

Created by

I designed the paper-to-code workflow, created the multimodal prompting logic, and ensured math-faithful PyTorch code generation using Gemini.

SREE NITHYA K J 717823I256
NIRANJANA A 717823I234
POOJA E 717823I236
PADMAPRIYA S 717823I136

Updates

SREE NITHYA K J 717823I256 started this project — Feb 09, 2026 07:40 AM EST
