REXA - Project Story

Inspiration

As developers and data scientists, we constantly face the frustration of lost experiments, unreproducible results, and the fear of breaking working code when trying new approaches. We watched teams waste hours recreating experiments they had already run, debugging environment-specific issues, and hesitating to try new ideas because there was no easy way to track or revert changes.

We were inspired by the need for a system that could:

  • Preserve every experiment with full execution context
  • Enable fearless iteration through safe branching
  • Accelerate development with AI-powered code generation
  • Ensure reproducibility through isolated sandbox execution

The vision was clear: Code experimentation without the chaos.

What it does

REXA is an AI-powered agent snapshot system that transforms how developers and data scientists experiment with code:

Core Features:

  1. AI Code Generation: Simply describe what you want to do in natural language, and REXA generates Python code using Google Gemini. Users can even use voice input through Galileo - just speak your task description and get working code!

  2. Isolated Execution: Every code run executes in a fresh Daytona sandbox, ensuring complete isolation and security. No more "works on my machine" issues.

  3. Snapshot History: Every execution is captured as a snapshot with full context - code, input, output, errors, and timestamp. Nothing gets lost.

  4. Visual Timeline: See all your experiments in a beautiful horizontal timeline. Each snapshot shows ID, timestamp, and task description at a glance.

  5. Replay Any Snapshot: Re-run any snapshot's code with one click to verify results or debug issues. Perfect for reproducibility testing.

  6. Branch & Iterate: Create new snapshots from any existing one. The form autofills with the parent's code, letting you modify and experiment safely without losing the original version.

Real-World Example:

A data scientist working on clustering:

  • Starts with K-Means clustering (snapshot created)
  • Branches to try DBSCAN with new input
  • Modifies the code in the autofilled form
  • Runs and compares both approaches
  • Replays any version to verify results

All experiments are tracked, compared, and reproducible.

How we built it

Backend Architecture (FastAPI + Python)

  • API Layer: RESTful endpoints (/run, /replay, /branch, /snapshots, /generate-code)
  • Daytona Integration: Secure sandbox creation, code execution, and cleanup
  • Gemini Integration: AI-powered code generation from natural language
  • Snapshot Management: In-memory storage with full execution tracking
  • Configuration: Environment-based setup for API keys and settings
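The snapshot model and in-memory store described above can be sketched in a few lines. Field and function names here are illustrative assumptions, not REXA's actual code:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import uuid

@dataclass
class Snapshot:
    """One execution record: code, I/O, and lineage (hypothetical fields)."""
    code: str
    task: str = ""
    stdout: str = ""
    stderr: str = ""
    parent_id: Optional[str] = None
    id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# In-memory storage: a plain dict keyed by snapshot id.
SNAPSHOTS: dict[str, Snapshot] = {}

def save_snapshot(snap: Snapshot) -> Snapshot:
    SNAPSHOTS[snap.id] = snap
    return snap

def list_snapshots() -> list[Snapshot]:
    # Newest first, matching the timeline view.
    return sorted(SNAPSHOTS.values(), key=lambda s: s.created_at, reverse=True)
```

Keeping the store behind two small functions is what makes the later migration to PostgreSQL or MongoDB straightforward.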

Frontend Architecture (Next.js + TypeScript)

  • Terminal-Style UI: Black background (#000) with fluorescent green accents (#39FF14)
  • Timeline Component: Horizontal scrollable list of snapshot cards
  • Task Form: Input field with AI code generation button
  • Snapshot Details: Full execution context display with code and output
  • Real-time Updates: Live snapshot display and execution results

Key Technical Decisions:

  • Isolated Execution: Each run in fresh Daytona sandbox ensures no state pollution
  • Text-Based Output: All results printed to stdout (no GUI dependencies)
  • Code Normalization: Automatic indentation fixing for pasted code
  • Branch Workflow: Non-executing branches with form autofill for safe iteration
  • Error Handling: Comprehensive error messages with graceful degradation

Integration Points:

  • Daytona SDK: Python SDK for sandbox management
  • Google Gemini API: Code generation from natural language
  • FastAPI: Modern async Python web framework
  • Next.js: React framework with App Router

Challenges we ran into

Challenge 1: Sandbox Isolation

Problem: Ensuring each execution runs in complete isolation without state leakage between runs.

Solution: Create a fresh Daytona sandbox for every execution and delete it immediately after capturing the output. This guarantees a clean slate for each run.
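The Daytona SDK handles the real sandbox lifecycle; as a stdlib-only stand-in, the same create → run → always-clean-up pattern looks roughly like this (a temporary directory and a separate interpreter process are not a security boundary, just an illustration of the flow):

```python
import subprocess
import sys
import tempfile

def run_isolated(code: str, timeout: int = 30) -> tuple[str, str]:
    """Run code in a fresh working directory and a separate interpreter,
    then discard the directory when the context manager exits -- the same
    create/run/cleanup lifecycle REXA applies to Daytona sandboxes."""
    with tempfile.TemporaryDirectory() as workdir:  # fresh, auto-deleted
        proc = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.stdout, proc.stderr
```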

Challenge 2: Code Indentation Errors

Problem: Pasted Python code often carries leading indentation that causes an IndentationError at module level.

Solution: Implemented automatic code normalization that detects and removes common leading whitespace while preserving relative indentation within code blocks.
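Python's standard library already covers this case; a minimal version of the normalization step could look like:

```python
import textwrap

def normalize_code(code: str) -> str:
    """Strip the common leading whitespace so pasted code runs at module
    level, while preserving relative indentation inside blocks."""
    return textwrap.dedent(code).strip("\n") + "\n"
```

`textwrap.dedent` removes only the whitespace prefix shared by every non-blank line, so nested blocks keep their shape.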

Challenge 3: Output Visualization

Problem: Matplotlib visualizations don't work in terminal output. Need text-based alternatives.

Solution: Designed print-based output format with ASCII art visualizations and detailed text statistics. Updated AI prompts to avoid GUI libraries.
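As a rough illustration of the idea, a text-only bar chart for cluster sizes needs only a few lines (the function name and format are our own sketch, not REXA's exact output):

```python
def ascii_bar_chart(counts: dict[str, int], width: int = 30) -> str:
    """Render label -> count pairs as a text bar chart that survives
    stdout-only environments (no matplotlib needed)."""
    peak = max(counts.values())
    lines = []
    for label, n in counts.items():
        bar = "#" * max(1, round(n / peak * width))
        lines.append(f"{label:>10} | {bar} {n}")
    return "\n".join(lines)
```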

Challenge 4: Branch Execution Flow

Problem: Initially, branches executed immediately with the parent's code and showed the same output, which made for confusing UX.

Solution: Changed branch behavior to create the snapshot without executing it. The form autofills with the code, letting the user modify it before running. This provides a clear separation between "copying" and "executing".
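A minimal sketch of the non-executing branch, with plain dicts standing in for the real snapshot objects:

```python
import uuid

def branch_snapshot(parent: dict) -> dict:
    """Create a child snapshot that copies the parent's code but carries no
    execution results yet; the frontend autofills its form from `code` and
    only runs after the user edits."""
    return {
        "id": uuid.uuid4().hex[:8],
        "parent_id": parent["id"],
        "code": parent["code"],  # copied for editing
        "stdout": "",            # intentionally empty: not executed
        "stderr": "",
    }
```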

Challenge 5: Replay Output Update

Problem: The replay function re-ran the code but didn't update the snapshot's stdout/stderr, so the output appeared unchanged.

Solution: Modified replay to build an updated snapshot object with the new execution results and overwrite the stored snapshot.
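The fix boils down to merging the fresh results back into the stored record; a simplified version:

```python
def apply_replay(snapshots: dict, snap_id: str, stdout: str, stderr: str) -> dict:
    """Merge fresh execution results into the stored snapshot so the UI
    shows the replay's output instead of the stale one."""
    updated = {**snapshots[snap_id], "stdout": stdout, "stderr": stderr}
    snapshots[snap_id] = updated
    return updated
```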

Challenge 6: Dependency Management

Problem: The default Daytona sandbox doesn't include every package (e.g., hdbscan is unavailable).

Solution: Updated AI prompts to constrain code generation to standard Python and scikit-learn only. Created fallback examples using only available libraries.

Challenge 7: API Key Configuration

Problem: Managing multiple API keys (Daytona, Gemini) across environments.

Solution: Centralized configuration with .env file support, clear error messages for missing keys, and automatic path resolution from the project root.
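A sketch of such a loader, using only the standard library (the key names DAYTONA_API_KEY and GEMINI_API_KEY are assumptions based on the services involved, not necessarily REXA's exact variable names):

```python
import os
from pathlib import Path

REQUIRED_KEYS = ("DAYTONA_API_KEY", "GEMINI_API_KEY")  # assumed names

def load_config(env_path: str = ".env") -> dict:
    """Read KEY=VALUE pairs from a .env file if present, let real
    environment variables win, and fail loudly on missing keys."""
    values: dict[str, str] = {}
    path = Path(env_path)
    if path.exists():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    # Real environment variables override the file.
    values.update({k: os.environ[k] for k in REQUIRED_KEYS if k in os.environ})
    missing = [k for k in REQUIRED_KEYS if not values.get(k)]
    if missing:
        raise RuntimeError(f"Missing required API keys: {', '.join(missing)}")
    return values
```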

Accomplishments that we're proud of

1. Complete Integration Success

  • Successfully integrated Daytona SDK for secure sandbox execution
  • Integrated Google Gemini API for AI code generation
  • Seamless frontend-backend communication with error handling

2. Innovative Branching Workflow

  • Created unique branching system that doesn't execute immediately
  • Form autofill feature for smooth iteration experience
  • Clear separation between snapshot creation and execution

3. Text-Based Visualization

  • Developed ASCII art visualization system for clustering results
  • Comprehensive text output format that works perfectly in terminal
  • Rich statistics and metrics without GUI dependencies

4. Production-Ready Error Handling

  • Graceful degradation for all API failures
  • Clear error messages for users
  • Automatic cleanup of resources even on errors

5. Developer Experience

  • Beautiful terminal-style UI that feels modern and intuitive
  • One-click operations for all major features
  • Visual timeline makes experiment history immediately clear

6. Code Generation Integration

  • AI-powered code generation with constraints (scikit-learn only)
  • Voice input support through Galileo integration
  • Smart prompt engineering for relevant code output

7. Full Workflow Demonstration

  • Complete clustering algorithm comparison example (K-Means → DBSCAN)
  • Shows real-world use case for data scientists
  • Demonstrates iterative experimentation workflow

What we learned

Technical Learnings:

  1. Sandbox Isolation: Daytona's sandbox model provides true isolation - each execution is completely independent, which is crucial for reproducibility.

  2. AI Code Generation: Prompt engineering is critical. Constraints must be explicit (e.g., "only scikit-learn") to avoid dependency issues. Voice input through Galileo adds accessibility.

  3. State Management: In-memory storage works for MVP, but snapshot data structure is designed for easy migration to database (PostgreSQL, MongoDB, etc.).

  4. Async Operations: FastAPI's async model makes it easy to handle multiple concurrent requests while sandboxes are being created/executed.

  5. Frontend-Backend Sync: React state management with real-time updates requires careful coordination. The form autofill via refs works perfectly for programmatic updates.

Process Learnings:

  1. Iterative Development: Started with placeholders, integrated Daytona, then added Gemini. Each layer built on the previous one.

  2. User-Centric Design: Changed branch behavior based on user feedback. Non-executing branches with autofill proved much better UX than immediate execution.

  3. Error Handling Early: Building error handling from the start prevents cascade failures and provides better debugging.

  4. Example-Driven Development: Creating comprehensive examples (clustering workflow) helped validate the system's real-world utility.

Domain Learnings:

  1. Data Science Workflows: Understanding how data scientists iterate (baseline → advanced algorithms) helped design branching workflow.

  2. Experiment Tracking: Snapshot structure captures everything needed for experiment tracking - this could extend to full ML experiment tracking systems.

  3. Code Execution Safety: Isolation is critical for untrusted code execution. Daytona provides this out of the box.

What's next for REXA

Immediate Enhancements:

  1. Database Persistence: Migrate from in-memory storage to PostgreSQL or MongoDB for persistent snapshot history.

  2. Artifact Handling: Implement file upload/download for execution artifacts. Store generated images, data files, and reports.

  3. Multi-Language Support: Extend beyond Python to support JavaScript, TypeScript, R, and other languages.

  4. Enhanced AI Integration:

    • Support for multiple AI models (Claude, GPT-4, etc.)
    • Code explanation and documentation generation
    • Error explanation and debugging suggestions

Built With

  • daytona-sandboxes
  • daytona-sdk
  • fastapi
  • gemini-2.5-flash
  • google-gemini-api
  • next.js-14
  • pydantic
  • python-3.10+
  • react-18
  • tailwindcss
  • typescript
  • uvicorn