Inspiration
Our inspiration came from the limitations we noticed in current AI chatbots. While ChatGPT, Gemini, and other large language models are powerful, they can provide incomplete answers or fail to examine complex problems from multiple angles. Human experts often solve problems by breaking them into components and approaching each systematically. By combining multiple AI models in a "chain of thought" approach, our system mimics this human problem-solving process to produce more comprehensive and accurate responses. It also addresses common shortcomings such as hallucinations and reasoning errors by having the models verify and refine each other's work, much like peer review in academic settings.
What it does
Our system implements a sophisticated workflow:
1. Problem Decomposition: The primary chatbot analyzes the user's prompt and breaks it down into distinct subproblems or components.
2. Distributed Problem Solving: These subproblems are directed to a secondary AI model (potentially a different LLM or a specialized model), which tackles each subproblem independently.
3. Answer Synthesis and Verification: The original chatbot collects all subproblem solutions, evaluates their coherence and correctness, and synthesizes them into a unified response.
4. Quality Assurance: Before delivering the final answer, the system performs verification checks to ensure accuracy, completeness, and alignment with the original query.
5. Refinement Loop: If inconsistencies or gaps are detected, the system iterates through steps 2-4 again to improve answer quality.
This process delivers answers that are more thorough, well-reasoned, and reliable than what a single chatbot could produce on its own.
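The five steps above can be sketched as a simple orchestration loop. This is a minimal illustration rather than our exact implementation: `decompose`, `solve`, `synthesize`, and `verify` are hypothetical stand-ins for the real LLM API calls.

```python
# Minimal sketch of the decompose -> solve -> synthesize -> verify loop.
# All four helpers are hypothetical stubs standing in for LLM API calls.

def decompose(prompt):
    # Step 1: the primary model splits the prompt into subproblems (stubbed).
    return [part.strip() for part in prompt.split("?") if part.strip()]

def solve(subproblem):
    # Step 2: a secondary model answers one subproblem (stubbed).
    return f"answer to: {subproblem}"

def synthesize(answers):
    # Step 3: the primary model merges partial answers into one response.
    return " ".join(answers)

def verify(response):
    # Step 4: quality check; a real system would ask a model to critique the draft.
    return len(response) > 0

def answer(prompt, max_iterations=3):
    subproblems = decompose(prompt)
    for _ in range(max_iterations):
        partial = [solve(sp) for sp in subproblems]
        draft = synthesize(partial)
        if verify(draft):  # step 5: retry steps 2-4 if the draft fails checks
            return draft
    return draft
```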
How we built it
Our implementation leverages Streamlit for the frontend interface and Python for backend processing:
- Frontend Development: We used Streamlit to create a clean, interactive web interface where users can input their prompts and view responses.
- API Integration: We developed wrapper functions to interface with multiple LLM APIs (such as OpenAI's API for ChatGPT and Google's API for Gemini).
- Prompt Engineering System: We created a sophisticated prompt engineering framework that structures how queries are broken down, distributed, and reassembled.
- Response Processing Pipeline: We implemented custom Python functions to handle the parsing, evaluation, and refinement of responses between the different AI models.
- Data Flow Management: We built efficient data handling to manage the flow of information between models while maintaining context.
Our architecture prioritizes simplicity and modularity, making it easier for us to maintain and potentially expand with additional models in the future.
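As an illustration of the prompt-engineering layer, a decomposition prompt can be built and the model's numbered-list reply parsed back into subproblems. The prompt wording and the `parse_subproblems` helper here are illustrative, not our production prompts:

```python
import re

def build_decomposition_prompt(user_query):
    # Instructs the primary model to return a numbered list of subproblems.
    return (
        "Break the following question into independent subproblems, "
        "one per line, numbered 1., 2., 3., ...\n\n"
        f"Question: {user_query}"
    )

def parse_subproblems(model_reply):
    # Extract lines of the form "1. ..." from the model's reply.
    return re.findall(r"^\s*\d+\.\s*(.+)$", model_reply, flags=re.MULTILINE)

# Example reply a model might return for the decomposition prompt:
reply = "1. Define the terms.\n2. Compare the options.\n3. Recommend one."
subproblems = parse_subproblems(reply)
```

Structuring the expected output format in the prompt (here, a numbered list) is what makes the reply machine-parseable by the rest of the pipeline.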
Challenges we ran into
We faced several significant technical hurdles during implementation:
- Context Management: Ensuring each AI model retained sufficient context from the original query while focusing on its specific subproblem.
- API Limitations: Working within the rate limits and token constraints of different AI service providers.
- Model Disagreements: Developing resolution strategies for when different models gave conflicting answers to the same subproblem.
- Response Formatting: Producing a consistent, user-friendly final answer from the varied output formats of different AI sources.
- Latency Issues: Balancing the improved quality of multi-model responses against the increased response time of sequential API calls.
- Prompt Design: Crafting effective prompts that guide each model to perform its specific role in the chain-of-thought process.
- Error Handling: Building robust fallback mechanisms for when one model in the chain fails to return a usable response.
These challenges required iterative development and creative problem-solving approaches, ultimately leading us to build a more resilient system.
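The fallback behaviour described above can be sketched as a chain of model callables tried in order. The model functions here are stubs; a real version would wrap the actual OpenAI and Gemini client calls:

```python
# Sketch of a fallback mechanism: try each model in order and return
# the first usable response. The model callables below are stubs.

def call_with_fallback(prompt, models):
    errors = []
    for name, model in models:
        try:
            response = model(prompt)
            if response:  # treat an empty reply as a failure too
                return response
            errors.append(f"{name}: empty response")
        except Exception as exc:  # rate limits, timeouts, API errors, etc.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))

# Stub models: the first one fails, the second succeeds.
def flaky_model(prompt):
    raise TimeoutError("rate limited")

def backup_model(prompt):
    return f"ok: {prompt}"

result = call_with_fallback(
    "hello", [("primary", flaky_model), ("backup", backup_model)]
)
```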
Built With
- python
- streamlit
