consensus.ai

The consensus.ai landing page
The fundamental consensus.ai architecture

Inspiration

We have all experienced the frustrations of using an artificial intelligence model for a specific task only to be disappointed when it gives inaccurate information. Well, what if there was an AI model that integrates features of three widely used artificial intelligence systems to create a more seamless and efficient way of driving better responses with less hallucinations…think no more! Welcome to Consensus AI - a platform for collaboration between AI models which leverages their strengths in different aspects to give better results!

What it does

Consensus AI fosters collaboration between widely used LLMs like GPT, Claude, and Llama, by incorporating the most efficient feature of each model. On the front end, a user inputs a query into a chat box, then Consensus AI calls each of the models mentioned above and prompts them to give a response to the user's query. Each answer is then added to a data structure, and the models are prompted to vote for the best and most accurate response. After the voting is completed, a consensus has been reached, and the response with the most votes is displayed to the user. Consensus AI also has memory and history features to provide optimal feedback to users based on previous questions, and to store users' answers for easy referencing.

Demo Video

In our short demonstration video, we input 3 prompts to consensus.ai.

First, we say hello!
'In what episode of Community did Jimmy buy a giant hat?'. (Hint: This episode doesn't exist! GPT 3.5 will often try to tell you some random INCORRECT episode). However, consensus.ai's voting system quickly is able to identify Claude's answer as the most accurate.
'What was the last question I asked you?'. This is to quickly demonstrate consensus.ai's multi-model memory feature. Although the answers are coming from different LLMs, they can keep track of previous things said in the conversation.

How we built it

Brainstorming: We started with brainstorming sessions to outline the core functionalities and design of our model. We noted down interface ideas, user experience issues, additional features, etc. After this, we began planning how the code of these ideas and features could be implemented in an efficient and user-friendly format.
Frontend: For consensus.ai's front end, we combined a variety of programming languages and frameworks, like React, HTML, CSS, and JavaScript to create a user-friendly website with optimal user experience features
Backend: For the backend, we employed various Python libraries to call the various LLMs, store their responses and vote for the most accurate response. We used Flask to handle incoming requests.
LLM Integration: We integrated several LLMs, each designed to handle one input query from the user. These models then analyze the question and generate their respective answers.
Voting System: Implemented a voting system where each AI model could vote on the best answer provided by all the other models. The model with the highest votes was selected as the final answer. This system helps to refine and validate the answer of each LLM, ensuring accuracy and reliability before presenting it to the user.
Testing and Refinement: For testing, we asked rather various individual models commonly hallucinated questions and got a mix of correct and incorrect answers from each of the models. Then we ran the same questions on consensus.ai and only the best answers were presented to the user.

Challenges we ran into

API setup can be tricky!
Parallelizing API calls for faster responses
Deciding on system architecture

Accomplishments that we're proud of

We created a model that incorporates multiple LLM’s to yield stronger performance and better accuracy across different platforms through our system. The model is good for enabling more reliable AI responses and reducing hallucinations. It allows for the best LLM’s answer each time it is prompted with a question. It also utilizes a system with multi-model memory which stores information of all three models.