Running Doom with Lagrange: creating a ViZDoom RL environment using the Lagrange model backend
Lagrange's implementation of Conway's Game of Life
Multi-page Streamlit application made using Lagrange
Chat UI

Inspiration

After the disappointment of discovering that Devin.ai's demonstrations were fake, our team felt a mix of frustration and determination. We realized there was a gap between the promise of AI and reality. So, we decided to step up and fill that gap. Inspired by the potential of AI and the need for genuine innovation, we embarked on a journey to create our own AI software engineer. Our aim is simple: to build a tool that makes coding easier and more reliable for everyone. We're not aiming to be saviors or revolutionaries; we just want to create something that works, something that people can trust. It's a grounded approach to a complex problem, driven by a desire to make a real difference in the world of technology.

What it does

Lagrange is a full-fledged AI software engineer that currently uses Python to craft a solution for a user's query. Functionalities:

Research: Lagrange performs meticulous research to thoroughly understand the problem space of each project. This involves analyzing existing solutions, exploring relevant technologies, and synthesizing information to identify the most efficient and innovative approaches.
Code generation: Based on the insights gained from in-depth research and an iterative user feedback loop, Lagrange crafts a detailed implementation strategy. This strategy encompasses all aspects of project development, from directory structure and management to writing and executing code directly on the user's local system.
PowerShell support: Our model can use PowerShell for a range of tasks, such as managing directories, cloning a Git repository, installing dependencies, and setting up the environment.
Python support: Python's flexibility in applications such as web development, machine learning, and GUI application creation allows Lagrange to implement complex workflows effectively. The model leverages Python to build versatile and scalable solutions that cater to a wide range of use cases.

How we built it

We began our project by exploring the capabilities of Gemini models within AI Studio, ultimately choosing Gemini 1.5 Pro due to its extensive one million token context window and robust multimodal support. Our approach involves finely tuned prompt engineering to develop specialized functionalities within the Gemini model. Specifically, we've designed distinct agents for different tasks:

Research Agent: This agent is tasked with conducting in-depth research, synthesizing large volumes of information, and providing comprehensive analyses to guide the project’s direction.
Code Generation Agent: Focused on translating the insights and strategies developed by the Research Agent into executable code, this agent helps automate and streamline the development process.
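
As a rough illustration of the two-agent setup described above, the sketch below gives each agent its own system prompt and passes the Research Agent's notes to the Code Generation Agent. It is a minimal sketch under assumptions, not the project's actual code: the prompt texts, the Agent class, run_pipeline, and echo_model are all hypothetical names, and the model is represented as a plain callable so the example runs without any API access.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical system prompts; the project's real prompts are not shown here.
RESEARCH_PROMPT = "You are a research agent. Analyze the problem and summarize viable approaches."
CODEGEN_PROMPT = "You are a code generation agent. Turn the research notes into Python code."

# A model is any callable that maps a prompt string to a response string.
# In the real project this would wrap a call to Gemini 1.5 Pro (assumption).
Model = Callable[[str], str]


@dataclass
class Agent:
    name: str
    system_prompt: str
    model: Model

    def run(self, task: str) -> str:
        # Prepend the role-specific instructions to the task text.
        return self.model(f"{self.system_prompt}\n\n{task}")


def run_pipeline(query: str, model: Model) -> str:
    """Run the research step first, then feed its notes to code generation."""
    researcher = Agent("research", RESEARCH_PROMPT, model)
    coder = Agent("codegen", CODEGEN_PROMPT, model)
    notes = researcher.run(f"User query: {query}")
    return coder.run(f"User query: {query}\nResearch notes:\n{notes}")


def echo_model(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return f"[{len(prompt)} chars received]"


if __name__ == "__main__":
    print(run_pipeline("Implement Conway's Game of Life", echo_model))
```

Keeping the model behind a plain callable means the same pipeline can be exercised with a stub during testing and with a real client in production.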

For the architecture of our application, we used Flask to create a responsive backend. The frontend is built with HTML and JavaScript, with Tailwind CSS for styling, giving a clean, modern user interface. We used pyautogui to automate the GUI interactions involved in code generation.

Challenges we ran into

Developing Lagrange posed a variety of challenges, particularly in managing the complex requirements of the project. Here are some key issues we faced and the solutions we implemented:

Context Retention: Initially, we were concerned about our ability to maintain context in a project as intricate as Lagrange. Fortunately, the Gemini 1.5 Pro model’s one million token context window proved instrumental, providing the extensive memory capacity needed to handle the project's complexity effectively.
Local System Access: A major technical challenge was figuring out how to interact with the user's local system to manage directories and write code. Initially, we explored using Hunter and Peck, a tool for operating the system without a mouse. We eventually settled on a combination of pyautogui for automating GUI interactions and PowerShell for executing system-level commands, which provided a more robust and integrated solution.
Code Generation and File Management: We needed a method to accurately parse information and write code into specific files. This process involved not only generating the correct code but also ensuring it was placed correctly within the user's file system.
PowerShell Command Integration: During testing, we noticed that Gemini had difficulties generating PowerShell commands without errors. To address this, we introduced delimiters in the command generation process and developed a specialized parser. This ensured that commands were formatted correctly, significantly reducing errors and streamlining command execution.
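
The delimiter-and-parser idea, together with the file placement step, could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the write-up does not say which delimiters were used, so the `<<<PS>>>` and `<<<FILE:...>>>` markers and the function names are hypothetical, and run_powershell assumes a `powershell` executable is available on PATH.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical delimiters; the actual markers used by the project are not documented.
CMD_RE = re.compile(r"<<<PS>>>(.*?)<<<END_PS>>>", re.DOTALL)
FILE_RE = re.compile(r"<<<FILE:([^>\n]+)>>>\n(.*?)<<<END_FILE>>>", re.DOTALL)


def parse_commands(response: str) -> list[str]:
    """Return the PowerShell commands found between the command delimiters."""
    return [m.strip() for m in CMD_RE.findall(response) if m.strip()]


def parse_files(response: str) -> dict[str, str]:
    """Map each relative file path to the code the model wrote for it."""
    return {path.strip(): body for path, body in FILE_RE.findall(response)}


def write_files(files: dict[str, str], root: Path) -> list[Path]:
    """Write each file under root, creating parent directories as needed."""
    root = root.resolve()
    written = []
    for rel, body in files.items():
        target = (root / rel).resolve()
        # Refuse paths that would land outside the project root.
        if not target.is_relative_to(root):
            raise ValueError(f"path escapes project root: {rel}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body, encoding="utf-8")
        written.append(target)
    return written


def run_powershell(command: str) -> subprocess.CompletedProcess:
    # Assumes `powershell` is on PATH (Windows); not called in this sketch.
    return subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True,
        text=True,
    )
```

Extracting only the delimited spans means any explanatory text the model adds around a command is ignored rather than passed to the shell, and the root check keeps generated files inside the intended project directory.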

Accomplishments that we're proud of

One of our proudest milestones in the development of Lagrange was its successful interaction with ViZDoom, a complex, Doom-based AI research platform. This achievement highlighted several advanced capabilities of our AI system:

Repository Management: Lagrange successfully identified and cloned the correct GitHub repository for ViZDoom. This step was crucial as it involved understanding and locating the specific repository from a myriad of available options, demonstrating Lagrange's adeptness in handling real-world software repositories.
Directory Management and File Handling: After cloning the repository, Lagrange seamlessly managed directory creation, organizing the project structure efficiently without any manual intervention. This automation was particularly valuable as it ensured error-free directory setup, which is often prone to human error.
Path Management: A standout feature was Lagrange’s ability to accurately remember and manage file paths within the project. This capability is essential for integrating and running complex software where the correct file referencing is crucial.
Action Performance: After setup, Lagrange was able to execute actions within the ViZDoom environment, interfacing with the software to perform tasks that typically require human input. This showcased not only the AI's ability to interact with advanced gaming environments but also its potential to automate and enhance user interaction in sophisticated software ecosystems.

What we learned

Working on the Lagrange project has been an incredibly enriching experience for our team, significantly enhancing our understanding of large language models (LLMs) and the intricacies of prompt engineering. This project allowed us to explore the full capabilities of sophisticated AI tools and taught us how to leverage their reasoning abilities to perform sequential tasks efficiently. Here are some key areas where we gained valuable knowledge and skills:

Advanced Automation Techniques: We developed a deeper understanding of automation, learning to streamline complex processes such as directory management and system interactions without manual oversight.
Integration of Multiple Technologies: The need to combine Python, PowerShell, pyautogui, and AI models pushed us to enhance our skills in integrating various technologies. This experience was invaluable in learning how to architect and implement multifaceted solutions that require seamless functionality across different platforms.
Debugging and Problem Solving: The challenges we encountered, especially in ensuring error-free PowerShell command execution, honed our debugging and problem-solving skills. These are crucial in a complex, multi-component software environment where precision is key.
Performance Optimization: Managing the large context window of the Gemini 1.5 Pro model and ensuring efficient task execution taught us about performance optimization, particularly in high-demand computational contexts.
User-Centric Design: Implementing a feedback loop for code generation and system interaction improved our approach to user-centric design, emphasizing the creation of intuitive and responsive systems.
Ethical Considerations and Security: Handling sensitive operations and interacting with user systems deepened our awareness of the ethical considerations and security measures essential in software development, especially when dealing with user data and system access.

What's next for Lagrange

As we continue to advance Lagrange, our vision encompasses transforming it from a mere coding assistant to an all-encompassing tech partner capable of handling full-stack development projects, tech research, infrastructure planning, and strategic management. Here's how we plan to expand its capabilities:

Integration of Retrieval-Augmented Generation (RAG): By incorporating RAG, Lagrange will have a vast knowledge base at its disposal, enabling it to autonomously assemble complex workflows from necessary technological components.
Tech Research Automation: Lagrange will automate advanced tech research, ensuring it is continuously updated with the latest technologies, tools, and methodologies. This capability will empower it to offer cutting-edge advice and solutions.
Infrastructure Planning: We will enhance Lagrange's ability to suggest optimal infrastructure setups based on project requirements, covering server architectures, cloud services, and data storage solutions.
Strategic Project Management: Beyond technical assistance, Lagrange will provide strategic insights on resource allocation, timeline estimation, and risk management, ensuring efficient and effective project completion.
Multi-Language and Platform Integration: To serve a global market, we plan to support multiple programming languages and ensure seamless integration with various development tools and platforms.
Enhanced AI Collaboration and Customization: Lagrange will be designed to collaborate with human programmers, adapting to individual coding styles and preferences, and featuring customizable workflow automation to fit specific organizational needs.
Real-Time Error Detection and Resolution: Enhancing its debugging capabilities will allow Lagrange to detect and resolve errors in real-time, significantly streamlining the development process.
Security and Privacy Enhancements: As Lagrange takes on more complex tasks, ensuring robust security measures and privacy protocols will be paramount to protect user data and intellectual property.
Natural Language Programming: Our goal is to make programming through natural language a reality, lowering the entry barrier for product creation. This will enable users to focus more on conceptualizing products while leaving the complexities of technological implementation to Lagrange.

By broadening Lagrange's scope to include not just coding but also comprehensive tech assistance, we aim to transform it into a tool that builds, tests, and deploys complex full-stack projects. Beyond that, we want it to change how technology is approached, making software development more accessible, efficient, and innovative for everyone involved.