Inspiration
As a developer, I frequently use AI to generate code, but the workflow is often manual and inefficient: copying code, running it, hitting errors, and sending those errors back to the AI by hand. I realized that if an AI agent could directly observe runtime errors and interact with project files, it could fix its own mistakes autonomously. This insight led me to build an agent that doesn't just write code but takes ownership of ensuring it actually runs: a true "Marathon" runner that keeps going until the task is complete.
What it does
Writes Code: You give it a prompt, and it uses the Gemini 2.0 Flash API to generate a complete Python script.
Executes Locally: Unlike a standard chatbot, this agent has "hands." It saves the file to the local system and executes it using a subprocess.
Self-Heals: If the code crashes (runtime errors, syntax errors, or logic bugs), the agent captures the error message (stderr).
Iterates: It feeds that error back into Gemini 2.0, which analyzes the crash, rewrites the code to fix the specific bug, and re-runs it. This loop continues until the application runs successfully.
How we built it
Brain: We used the Google GenAI SDK to access Gemini 2.0 Flash and Gemini 1.5 Flash. We implemented a "Model Roster" system that rotates between models to handle rate limits and ensure uptime.
Hands (Tools): We built custom Python tool functions (such as create_file) and connected them to Gemini using the Function Calling API. This lets the model interact directly with the local file system.
Body (Execution Environment): We used Python's subprocess module to create a controlled environment where the generated code runs. We capture standard output and errors in real-time.
Frontend: We built a clean interface using Streamlit, allowing users to interact with the agent through a web browser instead of just the terminal.
Challenges we ran into
Infinite Error Loops: Sometimes the AI would make the same mistake twice. We had to implement a strict "retry limit" (5 attempts) and improve the prompt engineering to force it to try different solutions.
API Quotas: We hit the "429 Resource Exhausted" error frequently during testing. To solve this, we built a fallback system that automatically switches from gemini-2.0-flash-exp to gemini-1.5-flash if one model is overloaded.
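The fallback logic amounts to trying each model in roster order and rotating when one is rate-limited. A stripped-down sketch (with a stand-in exception class in place of the SDK's 429 error):

```python
class RateLimitError(Exception):
    """Stand-in for the SDK's 429 'Resource Exhausted' error."""


MODEL_ROSTER = ["gemini-2.0-flash-exp", "gemini-1.5-flash"]


def generate_with_fallback(make_request, models=MODEL_ROSTER):
    """Try each model in roster order; rotate to the next on a rate limit."""
    last_error = None
    for model in models:
        try:
            return make_request(model)
        except RateLimitError as exc:
            last_error = exc  # this model is overloaded; try the next one
    raise last_error          # every model in the roster was exhausted
```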
File System Security: Giving an AI write access to your computer is risky. We had to ensure it only wrote to specific files (app.py) and learned how to securely manage API keys using keys.py and .gitignore so they wouldn't be leaked on GitHub.
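The keys.py pattern works roughly like this: the key lives in an untracked local file, with an environment-variable fallback (the fallback is an assumption; the write-up only mentions keys.py and .gitignore).

```python
import os

# keys.py (listed in .gitignore, so it never reaches GitHub) would contain:
#   GEMINI_API_KEY = "your-key-here"


def load_api_key():
    """Read the API key from keys.py if present, else from the environment."""
    try:
        from keys import GEMINI_API_KEY  # local, untracked file
        return GEMINI_API_KEY
    except ImportError:
        return os.environ.get("GEMINI_API_KEY")
```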
Accomplishments that we're proud of
True Autonomy: Watching the agent crash, detect the error, and fix itself without me touching the keyboard was a magic moment.
Robust Error Handling: We successfully handled complex crashes (like IndexError and ModuleNotFoundError) where the agent identified missing libraries or logic gaps and fixed them instantly.
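Detecting a missing library from stderr comes down to parsing the traceback. The helper below is hypothetical, not necessarily how the agent handles it; the pip-install step is one possible follow-up action.

```python
import re
import subprocess
import sys


def missing_module(stderr_text):
    """Extract the module name from a ModuleNotFoundError traceback, if any."""
    m = re.search(
        r"ModuleNotFoundError: No module named '([\w\.]+)'", stderr_text
    )
    return m.group(1) if m else None


def install_and_retry(stderr_text):
    """If a module is missing, pip-install it so the next run can succeed."""
    name = missing_module(stderr_text)
    if name is None:
        return False
    subprocess.run(
        [sys.executable, "-m", "pip", "install", name], check=False
    )
    return True
```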
Security Best Practices: We successfully implemented secure key handling, separating our secrets from our codebase, which is crucial for a production-level application.
What we learned
Agents vs. Chatbots: We learned that an "Agent" is fundamentally different from a Chatbot. An Agent needs a loop, memory of past errors, and the ability to execute actions, not just generate text.
Prompt Engineering for Debugging: We discovered that giving the AI the exact Python traceback (error message) is much more effective than just saying "it didn't work." The model is incredibly good at debugging when it sees the raw error logs.
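In practice that means the repair prompt embeds the failing code and the verbatim traceback. A hypothetical prompt builder along these lines:

```python
def build_repair_prompt(task, code, traceback_text):
    """Hand the model the raw traceback instead of 'it didn't work'."""
    return (
        f"Task: {task}\n\n"
        "The following Python script crashed:\n"
        f"{code}\n\n"
        "Full traceback:\n"
        f"{traceback_text}\n\n"
        "Rewrite the complete script so it runs without this error, "
        "and try a different approach if your previous fix failed."
    )
```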
What's next for Gemini-Marathon-Agent
Sandboxing with Docker: Currently, the code runs on the host machine. The next step is to run the generated code inside a Docker container to prevent accidental system damage (e.g., if the AI tries to delete files).
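A planned sandbox might look like the sketch below: build a docker run invocation with no network, a memory cap, and the project mounted read-only. The image name and limits are assumptions, not settled choices.

```python
import os


def docker_run_args(script="app.py", image="python:3.12-slim"):
    """Sketch of the planned sandbox: run the generated script in a container."""
    return [
        "docker", "run", "--rm",
        "--network", "none",              # no network access for untrusted code
        "--memory", "256m",               # cap memory use
        "-v", f"{os.getcwd()}:/work:ro",  # mount the project read-only
        "-w", "/work",
        image, "python", script,
    ]
```

The agent would still write app.py on the host as it does today; only execution moves inside the container.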
Multi-File Projects: Enabling the agent to build complex apps that span multiple files (Python modules, HTML, CSS, database schemas) interacting with each other.
Web Search Capability: Adding a tool that allows the agent to Google documentation if it gets stuck on a library it doesn't know.