Inspiration

Every year, thousands of breakthrough research papers are published, yet most remain trapped in static PDF format. For engineers, the gap between reading a complex LaTeX equation and having a verified, working implementation can take days of head-scratching and debugging. We were inspired by the "Action Era" of AI, which moves beyond chat to actual execution. We wanted to build a system where the AI doesn't just explain a paper, but proves it understands the methodology by building it from scratch and checking the results against the paper's reported benchmarks.

What it does

ResearchLoop is an autonomous research assistant that handles the entire pipeline of paper implementation:

  1. Multimodal Extraction: It leverages Gemini 3’s native vision to read academic PDFs, identifying primary algorithms, latent methodology, and reported benchmarks.
  2. Autonomous Synthesis: It generates complete, modular Python code built on NumPy from the extracted mathematics.
  3. WASM Execution: It runs the generated code in an isolated, browser-based Python environment (Pyodide), so verification requires no local Python installation or server.
  4. Self-Correction Loop: If the code fails or its results don't match the reported benchmarks, the agent (acting as a "Marathon Agent") analyzes the traceback, revises its logic, and iterates autonomously until the results match.
  5. Theory-Code Mapping: It provides a transparency layer that links specific equations from the paper directly to the corresponding lines of generated code.
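
The self-correction loop in step 4 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `run_candidate`, `converge`, `FIRST_DRAFT`, and `stub_revise` are hypothetical names, plain `exec` stands in for the Pyodide sandbox, and `stub_revise` stands in for the call to the model. It also assumes the generated code exposes a `solve()` function that returns a single number to compare against one reported benchmark value.

```python
import math
import traceback
from typing import Callable, List, Optional, Tuple


def run_candidate(source: str, entry: str = "solve") -> Tuple[Optional[float], Optional[str]]:
    """Run generated source in a fresh namespace; return (result, traceback text)."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # assumption: plain exec stands in for the sandbox
        return namespace[entry](), None
    except Exception:
        return None, traceback.format_exc()


def converge(
    source: str,
    revise: Callable[[str, str], str],
    target: float,
    rel_tol: float = 1e-6,
    max_iters: int = 5,
) -> Tuple[str, List[Tuple[int, str]]]:
    """Run, compare against the benchmark, and ask `revise` for a patch on failure."""
    history: List[Tuple[int, str]] = []
    for attempt in range(1, max_iters + 1):
        result, error = run_candidate(source)
        matches = (
            error is None
            and isinstance(result, (int, float))
            and math.isclose(result, target, rel_tol=rel_tol)
        )
        if matches:
            history.append((attempt, "ok"))
            return source, history
        # Feedback is either the traceback or a description of the mismatch.
        feedback = error if error is not None else f"got {result!r}, expected {target!r}"
        history.append((attempt, feedback))
        source = revise(source, feedback)
    raise RuntimeError(f"no parity after {max_iters} attempts")


# Hypothetical first draft with a deliberate length ("shape") mismatch.
FIRST_DRAFT = '''
def solve():
    weights = [0.5, 0.25]
    inputs = [2.0, 4.0, 6.0]
    if len(weights) != len(inputs):
        raise ValueError("shape mismatch: %d vs %d" % (len(weights), len(inputs)))
    return sum(w * x for w, x in zip(weights, inputs))
'''


def stub_revise(source: str, feedback: str) -> str:
    """Stand-in for the model call: patches the one bug this demo contains."""
    if "shape mismatch" in feedback:
        return source.replace("[0.5, 0.25]", "[0.5, 0.25, 0.125]")
    return source


if __name__ == "__main__":
    patched, history = converge(FIRST_DRAFT, stub_revise, target=2.75)
    for attempt, outcome in history:
        print(attempt, outcome.strip().splitlines()[-1])
```

Capping the loop with `max_iters` is a design choice in this sketch: it keeps a reviser that never fixes the bug from iterating forever.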

How we built it

ResearchLoop is powered by the Gemini 3 family, utilizing high thinking budgets (up to 32k tokens on Pro) to handle dense mathematical reasoning.

  • Reasoning Engine: We used the @google/genai SDK to implement "Thought Signatures," allowing the model to "think through" execution errors before attempting a patch.
  • Frontend: React 19 with a "Brutalist Academic" aesthetic, focusing on technical transparency and high readability.
  • Runtime: Pyodide (WASM) was integrated to allow users to run heavy NumPy-based algorithms directly in their browser without any server-side infrastructure.
  • Multimodal Assets: We used gemini-2.5-flash-image for 2K architectural blueprints and gemini-2.5-flash-preview-tts for neural audio theory maps.

Challenges we ran into

The biggest challenge was handling the "Reasoning vs. Context" balance. High-frequency debugging loops consume significant tokens, so we optimized our system instructions to ensure Gemini focused on "First Principles" logic repairs. Another hurdle was multimodal parsing: extracting precise logic from multi-column academic layouts required specific prompt engineering to maintain context across complex page structures.
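
To make the token cost of debugging loops concrete, here is one complementary tactic. It is not the approach described above, which relied on tuning system instructions. The sketch trims a long traceback before it is sent back to the model, assuming that the header line and the last few lines (the failing frame and the exception message) carry most of the signal. The function name `trim_traceback` and the `max_lines` default are hypothetical.

```python
def trim_traceback(tb_text: str, max_lines: int = 8) -> str:
    """Keep the header and the final lines of a traceback, eliding the middle."""
    if max_lines < 3:
        raise ValueError("max_lines must be at least 3 (header, marker, one tail line)")
    lines = tb_text.rstrip().splitlines()
    if len(lines) <= max_lines:
        return "\n".join(lines)
    head = lines[:1]                    # usually "Traceback (most recent call last):"
    tail = lines[-(max_lines - 2):]     # failing frame(s) plus the exception message
    omitted = len(lines) - len(head) - len(tail)
    marker = f"  ... {omitted} lines omitted ..."
    return "\n".join(head + [marker] + tail)
```

The trimmed text still ends with the exception type and message, which a repair step usually needs first.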

Accomplishments that we're proud of

We are incredibly proud of the Autonomous Convergence Loop. Watching the agent encounter a ValueError in its first attempt, reason about the "shape mismatch" in a matrix operation, and then successfully patch the code in a single iteration without human intervention is the definition of the "Action Era." Achieving mathematical parity in a browser sandbox is a major technical milestone.

What we learned

We learned that the Thinking Budget is the most important variable in modern AI orchestration. By giving the model space to "think" before it "codes," the quality of the first-draft implementation increased dramatically. We also realized that transparency (showing the internal state and WASM logs) is crucial for building trust in autonomous agents.

What's next for ResearchLoop

We want to expand ResearchLoop beyond single-file implementations. Our roadmap includes:

  • Multi-module Synthesis: Handling papers that require complex project structures and multiple file dependencies.
  • Web-Search Grounding: Fully integrating Gemini's Google Search tool to compare implementation results with existing State-of-the-Art (SOTA) benchmarks on the web.
  • GPU Acceleration: Leveraging WebGPU to allow the autonomous agent to implement and test larger neural networks directly in the browser environment.

Built With

  • browser-localstorage
  • fira-code
  • google-gemini-3-flash-(gemini-3-flash-preview)
  • google/genai-sdk
  • marked
  • numpy
  • pyodide
  • python
  • react-19
  • recharts
  • space-grotesk
  • tailwind-css
  • typescript
  • webassembly-(wasm)