Inspiration

Research is a vast, highly congested ocean of information. For students, newcomers, and even seasoned researchers entering an unfamiliar field, pinning down a specific research gap is an overwhelming challenge. The sheer volume of papers published daily makes it nearly impossible to see where the current state of the art ends and where new opportunities begin. We were inspired to build a tool that acts as a GPS through this academic congestion, democratizing the ability to find and solve unexplored scientific problems.
What it does

ResearchGPT is an autonomous multi-agent pipeline that transforms a simple research topic into a verified technical implementation. It coordinates four specialized AI agents:
- The Researcher: scans arXiv and Semantic Scholar for relevant literature.
- The Analyst: identifies research gaps and proposes novel architectures.
- The Coder: implements the proposed methodology in Python (PyTorch/TensorFlow).
- The Verifier: executes the code, generates metrics, and fixes errors in real time.
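At its core, the pipeline is a sequential hand-off: each agent reads a shared state and enriches it before passing it on. The sketch below shows the shape of that loop; the function bodies and state keys are illustrative placeholders, not our exact implementation.

```python
# Minimal sketch of the sequential agent hand-off.
# Each agent reads the shared state and adds its own output.

def researcher(state):
    state["papers"] = ["paper metadata fetched from arXiv / Semantic Scholar"]
    return state

def analyst(state):
    state["gap"] = "research gap derived from state['papers']"
    return state

def coder(state):
    state["code"] = "# PyTorch implementation generated from state['gap']"
    return state

def verifier(state):
    state["metrics"] = {"status": "executed", "loss": None}
    return state

PIPELINE = [researcher, analyst, coder, verifier]

def run_pipeline(topic):
    state = {"topic": topic}
    for agent in PIPELINE:
        state = agent(state)  # each stage enriches the state before handing off
    return state

print(run_pipeline("graph neural networks for protein folding"))
```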
How we built it

We built the core logic in Python using a multi-agent system (MAS) architecture powered by Google's Gemini API. The backend is a Flask web application that manages weighted state hand-offs between agents. We integrated the arXiv and Semantic Scholar APIs for data retrieval and used PyMuPDF for parsing complex PDFs. To run the code the agents produce, we implemented a real-time execution sandbox that captures standard output and system logs for iterative self-rectification.
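Stripped to its essentials, the PDF-parsing step looks roughly like this (a simplified sketch; the real version layers heuristics on top to cope with multi-column layouts and reference sections):

```python
import fitz  # PyMuPDF

def extract_paper_text(pdf_path):
    """Pull raw text from an academic PDF, page by page."""
    doc = fitz.open(pdf_path)
    pages = [page.get_text() for page in doc]
    doc.close()
    return "\n".join(pages)
```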
Challenges we ran into

The primary hurdle was API rate limits: navigating the strict quotas of academic databases required building robust caching and queuing systems. Data acquisition was equally difficult, since extracting structured data from unstructured academic PDFs is notoriously noisy.

Most importantly, we hit a "data desert" when trying to verify the newly generated methods. Because real-world data for these brand-new hypotheses didn't exist yet, we pivoted and built logic to generate synthetic datasets on the fly, letting the agent verify its code and produce performance metrics even in the absence of pre-existing datasets. Finally, real-time verification was the hardest part: getting an LLM to not only write code but autonomously debug its own environment and fix execution errors in a sandbox was a massive engineering effort.
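The synthetic-data fallback can be as simple as sampling features and deriving labels from a hidden rule, so the generated model has something learnable to train against. A minimal sketch (parameter names are illustrative):

```python
import numpy as np

def make_synthetic_dataset(n_samples=1000, n_features=16, seed=0):
    """Fabricate a learnable classification dataset so generated code
    can be smoke-tested when no real benchmark exists yet."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    w = rng.normal(size=n_features)              # hidden linear rule
    noise = rng.normal(scale=0.5, size=n_samples)
    y = (X @ w + noise > 0).astype(int)          # binary labels
    return X, y
```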
Accomplishments that we're proud of

We are incredibly proud of our Self-Rectification Engine. Watching the agent hit a ModuleNotFoundError or an indentation error, analyze the traceback, and rewrite its own code correctly without human intervention was our biggest "wow" moment. We successfully bridged the gap between theoretical research and functional, running code within a single, unified interface.
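Conceptually, the loop is simple: execute, capture the traceback, ask the model for a fix, retry. The sketch below shows the shape of it; `ask_llm_to_fix` stands in for the actual Gemini call and is purely a placeholder.

```python
import subprocess
import sys
import tempfile

def ask_llm_to_fix(code, traceback_text):
    """Placeholder for the model call that rewrites code given its traceback."""
    raise NotImplementedError  # hypothetical stub; swap in the real LLM call

def self_rectify(code, max_attempts=3):
    """Run generated code in a subprocess; on failure, feed the traceback
    back to the model and retry with the rewritten code."""
    for _ in range(max_attempts):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=120)
        if result.returncode == 0:
            return code, result.stdout            # success: code plus its output
        code = ask_llm_to_fix(code, result.stderr)  # rewrite and try again
    raise RuntimeError("could not self-rectify within the attempt budget")
```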
What we learned

We learned that the hardest part of applied AI is not the individual intelligence of the model, but the orchestration and communication between agents. Managing the context window as a project moves from literature search to code execution taught us a great deal about state persistence and structured data hand-offs. We also realized that synthetic data generation is a powerful tool for rapid prototyping when real data is a bottleneck.
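In practice, a structured hand-off means serializing a well-defined state object between stages rather than passing loose prompt text. Something along these lines (field names are illustrative, not our exact schema):

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class PipelineState:
    """State object persisted between pipeline stages."""
    topic: str
    papers: list = field(default_factory=list)
    gap: str = ""
    code: str = ""
    metrics: dict = field(default_factory=dict)

    def save(self, path):
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            return cls(**json.load(f))
```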
What's next for ResearchGPT

We plan to expand our database reach to include IEEE Xplore and PubMed, and to integrate a human-in-the-loop dashboard where researchers can tweak the agent's hypotheses mid-stream. Our ultimate goal is to turn ResearchGPT into a standard collaborator for labs across the globe, accelerating the pace of human discovery.
Built With
- arxiv-api
- flask
- gemini-api
- javascript
- pymupdf
- python
- report-lab
- semantic-scholar-api