Inspiration

Notorious RAG is an AI Agent that responds to user questions in Discord. This early version is optimized for developer tooling communities and will soon be rolled out to support the Discord community for https://github.com/BoundaryML/baml.

As communities grow, new users often ask questions similar to ones that have been answered before. Many of us have hit this issue managing developer tools and developer communities, but it's often more effort to go find the previous answer than to just answer the question again. We wanted to tackle this with LLMs, but with a very high quality bar.

The Problem

Today's LLMs are good enough to do 90% of the work and to get the answer right ~90% of the time, but there's a lot of nuance in replying in a way that:

  • Makes the user feel heard
  • Doesn't feel like a canned AI response
  • Genuinely solves the problem

For people building developer tools, community members are precious, and developers have very low patience for low-effort or low-quality community support. Most folks who try to answer community questions with LLMs find that bad results are frequent enough to do more harm than good.

In order to make a reply bot that saves people time without alienating community members, we applied the following key approaches:

  1. Use RAG against previous Discord replies from humans as context for answer formulation
  2. Implement low-friction human approval to ensure all answers are very high quality

What it does

When a user asks a question, a Discord bot agent picks up the question, decides if it needs an answer, and then hands it off to an answer formulation agent. The agent uses RAG + tool calls to formulate an answer from docs chunks and previous Discord threads. The agent then validates the proposed answer with a human in Slack. If the human has feedback, the agent continues looping and calling tools until a human approves the response. At that point, the response is posted.

How we built it

Pre-Indexing

All RAG is performed against a single index containing both Discord threads and docs.

  • Built a custom Pinecone loader that collects Discord messages into threads, then inserts whole threads as chunks. Threads over 100 messages are split into two chunks (no thread in our dataset exceeded 200 messages). We had to write a custom loader because the built-in LlamaIndex loader strips thread metadata from messages.

  • Used firecrawl to load markdown documentation into Pinecone.
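The thread-chunking step above can be sketched as follows. This is a minimal, hypothetical illustration; `chunk_thread` is not the actual loader code, and a real version would attach thread metadata and upsert the chunks into Pinecone.

```python
def chunk_thread(messages: list[str], max_messages: int = 100) -> list[str]:
    """Join one Discord thread's messages into a single chunk.

    Threads longer than `max_messages` are split into two roughly
    equal halves (our dataset had no thread over 200 messages, so
    two chunks always suffice).
    """
    if len(messages) <= max_messages:
        return ["\n".join(messages)]
    mid = len(messages) // 2
    return ["\n".join(messages[:mid]), "\n".join(messages[mid:])]
```

Keeping whole threads together preserves the question-and-answer context that per-message chunking would lose.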

Responding to Messages

When a message comes in:

  1. A Discord bot detects it and uses a TogetherAI model to classify whether the message needs a response, e.g. whether it's a question or someone asking for help.
  2. If an answer is needed, the Discord bot creates a new thread and sends the user's question to a FastAPI server.
  3. FastAPI kicks off an AnswerAgent background task and returns an id to refer to this question. State lives in Firebase.
  4. The agent runs on GPT-4 and has the following tools, looping until it has a good answer:
    • RagQuery
    • FinalAnswer
  5. The RagQuery tool uses a LlamaIndex retriever to query the pre-loaded index in Pinecone
  6. The agent may call RagQuery one or more times, and eventually reaches the FinalAnswer state
  7. When a FinalAnswer is reached, the proposed answer is sent to a human in Slack for approval before it is posted
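The tool loop in steps 4–7 can be sketched roughly as below. All names here are illustrative stand-ins, not our actual implementation: `model` abstracts the GPT-4 call (returning a tool name and argument), and `rag_query` abstracts the LlamaIndex retriever over Pinecone.

```python
def run_agent(model, rag_query, max_steps: int = 10) -> str:
    """Loop: ask the model for the next tool call until FinalAnswer.

    `model(context)` returns a (tool_name, argument) pair;
    `rag_query(query)` returns retrieved chunks as text.
    """
    context: list[str] = []
    for _ in range(max_steps):
        tool, arg = model(context)
        if tool == "RagQuery":
            # Retrieved chunks feed back into the next model call.
            context.append(rag_query(arg))
        elif tool == "FinalAnswer":
            # The proposed answer then goes to human approval.
            return arg
    raise RuntimeError("agent did not reach a FinalAnswer")
```

Bounding the loop with `max_steps` keeps a confused model from querying the index indefinitely.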

Human Approval

We use humanLayer to approve potential actions in Slack.
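The approval flow can be sketched as a loop like the one below. This is a hypothetical outline, not the humanLayer API: `request_approval` stands in for the humanLayer call that surfaces the proposed answer in Slack, and `propose` / `post` stand in for the agent and the Discord reply.

```python
def approve_and_post(propose, request_approval, post) -> str:
    """Keep revising until a human approves, then post the answer.

    `propose(feedback)` produces a (possibly revised) answer;
    `request_approval(answer)` returns {"approved": bool, "comment": str | None};
    `post(answer)` publishes the approved answer to the Discord thread.
    """
    answer = propose(feedback=None)
    while True:
        verdict = request_approval(answer)  # human decision in Slack
        if verdict["approved"]:
            post(answer)
            return answer
        # Human feedback flows back into the agent for another pass.
        answer = propose(feedback=verdict["comment"])
```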

Web UI

We built an intuitive Web UI for testing questions and observing the Agent workflow.

Challenges we ran into

  • Getting the right Discord content split into separate threads and questions.
  • Forking the LlamaIndex Discord loader so that entire threads land in the vector store (shout out to LlamaIndex).
  • Classifying whether a user question warrants a bot response. This could use signals like previous questions asked and user intent; for now we use a TogetherAI model to classify, and we're still tuning it.
  • Handling the various modes of human feedback, and allowing the LLM to incorporate that into further tool calls and composition:
    • Approve / go
    • Rewrite it for style
    • Fetch context from more places
    • Stop and I'll take over
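The feedback modes above can be dispatched to agent actions with something like the sketch below. The mode names and action strings are hypothetical, chosen only to illustrate the branching; our real handler feeds the feedback back into the LLM's tool loop.

```python
def handle_feedback(mode: str) -> str:
    """Map a human-feedback mode to the agent's next action."""
    actions = {
        "approve": "post_answer",          # approve / go
        "rewrite": "restyle_answer",       # rewrite it for style
        "more_context": "call_rag_again",  # fetch context from more places
        "takeover": "halt_agent",          # stop and I'll take over
    }
    # Unknown feedback defaults to the safe option: hand off to the human.
    return actions.get(mode, "halt_agent")
```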

Accomplishments that we're proud of

  • Really high-quality answers
  • A streamlined agentic RAG process
  • A view into the brain and thought process of the agent
  • Human approval workflows that let us iterate and improve without alienating the community

What's next for Notorious R A G

We want to go live in some more communities. We also want to add more concise question/answer pairs to the Discord message index.

Built With

  • arize-phoenix
  • baml
  • discord-py
  • fastapi
  • firebase
  • humanlayer
  • llamaindex
  • nextjs
  • openai
  • pinecone
  • together-ai