Inspiration

Notorious RAG is an AI Agent that responds to user questions in Discord. This early version is optimized for developer tooling communities and will soon be rolled out to support the Discord community for https://github.com/BoundaryML/baml.

As communities grow, new users often ask questions similar to ones that have been answered before. Many of us have hit this issue managing developer tools and developer communities, but it's often more effort to go find the previous answer than to just answer the question again. We wanted to tackle this with LLMs, but with a very high quality bar.

The Problem

Today's LLMs are good enough to do 90% of the work and to get the answer right ~90% of the time, but there's a lot of nuance in replying in a way that:

  • Makes the user feel heard
  • Doesn't feel like a canned AI response
  • Genuinely solves the problem

For people building developer tools, community members are precious, and developers have very low patience for low-effort or low-quality community support. Most folks who try to answer community questions with LLMs find that bad results are frequent enough to do more harm than good.

In order to make a reply bot that saves people time without alienating community members, we applied the following key approaches:

  1. Use RAG against previous Discord replies from humans as context for answer formulation
  2. Implement low-friction human approval to ensure all answers are very high quality

What it does

When a user asks a question, a Discord bot agent picks up the question, decides if it needs an answer, and then hands it off to an answer formulation agent. The agent uses RAG + tool calls to formulate an answer from docs chunks and previous Discord threads. The agent then validates the proposed answer with a human in Slack. If the human has feedback, the agent continues looping and calling tools until a human approves the response. At that point, the response is posted.

How we built it

Pre-Indexing

All RAG is performed against a single index containing both Discord threads and docs.

  • Built a custom Pinecone loader that collects Discord messages into threads, then inserts whole threads as chunks. Threads over 100 messages are split into two chunks (no thread in our dataset exceeded 200 messages). We had to write a custom loader because the built-in LlamaIndex loader strips thread metadata from messages.

  • Used firecrawl to load markdown documentation into Pinecone.
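The thread-chunking step above can be sketched as follows. This is a minimal, hypothetical illustration; `chunk_thread` is not the actual loader code, and a real version would attach thread metadata and upsert the chunks into Pinecone.

```python
def chunk_thread(messages: list[str], max_messages: int = 100) -> list[str]:
    """Join one Discord thread's messages into a single chunk.

    Threads longer than `max_messages` are split into two roughly
    equal halves (our dataset had no thread over 200 messages, so
    two chunks always suffice).
    """
    if len(messages) <= max_messages:
        return ["\n".join(messages)]
    mid = len(messages) // 2
    return ["\n".join(messages[:mid]), "\n".join(messages[mid:])]
```

Keeping whole threads together preserves the question-and-answer context that per-message chunking would lose.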

Responding to Messages

When a message comes in:

  1. A Discord bot detects it and uses a TogetherAI model to classify whether the message needs a response, e.g. whether it's a question or someone asking for help.
  2. If an answer is needed, the Discord bot creates a new thread and sends the user's question to a FastAPI server.
  3. FastAPI kicks off an AnswerAgent background task and returns an id to refer to this question. State lives in Firebase.
  4. The agent runs on GPT-4 and has the following tools, looping until it has a good answer:
    • RagQuery
    • FinalAnswer
  5. The RagQuery tool uses a LlamaIndex retriever to query the pre-loaded index in Pinecone
  6. The agent may call RagQuery one or more times, and eventually reaches the FinalAnswer state
  7. When a FinalAnswer is reached, the proposed answer is sent to a human in Slack for approval before it is posted
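The tool loop in steps 4–7 can be sketched roughly as below. All names here are illustrative stand-ins, not our actual implementation: `model` abstracts the GPT-4 call (returning a tool name and argument), and `rag_query` abstracts the LlamaIndex retriever over Pinecone.

```python
def run_agent(model, rag_query, max_steps: int = 10) -> str:
    """Loop: ask the model for the next tool call until FinalAnswer.

    `model(context)` returns a (tool_name, argument) pair;
    `rag_query(query)` returns retrieved chunks as text.
    """
    context: list[str] = []
    for _ in range(max_steps):
        tool, arg = model(context)
        if tool == "RagQuery":
            # Retrieved chunks feed back into the next model call.
            context.append(rag_query(arg))
        elif tool == "FinalAnswer":
            # The proposed answer then goes to human approval.
            return arg
    raise RuntimeError("agent did not reach a FinalAnswer")
```

Bounding the loop with `max_steps` keeps a confused model from querying the index indefinitely.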

Human Approval

We use humanLayer to approve potential actions in Slack.
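The approval flow can be sketched as a loop like the one below. This is a hypothetical outline, not the humanLayer API: `request_approval` stands in for the humanLayer call that surfaces the proposed answer in Slack, and `propose` / `post` stand in for the agent and the Discord reply.

```python
def approve_and_post(propose, request_approval, post) -> str:
    """Keep revising until a human approves, then post the answer.

    `propose(feedback)` produces a (possibly revised) answer;
    `request_approval(answer)` returns {"approved": bool, "comment": str | None};
    `post(answer)` publishes the approved answer to the Discord thread.
    """
    answer = propose(feedback=None)
    while True:
        verdict = request_approval(answer)  # human decision in Slack
        if verdict["approved"]:
            post(answer)
            return answer
        # Human feedback flows back into the agent for another pass.
        answer = propose(feedback=verdict["comment"])
```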

Web UI

We built an intuitive Web UI for testing questions and observing the Agent workflow.

Challenges we ran into

  • Getting the right Discord content split into separate threads and questions.
  • Forking the LlamaIndex Discord loader so that entire threads land in the vector store (shout out to LlamaIndex).
  • Classifying whether a user question warrants a bot response. This could use signals like previous questions asked and user intent; for now we use a TogetherAI model to classify, and we're still tuning it.
  • Handling the various modes of human feedback, and allowing the LLM to incorporate that into further tool calls and composition:
    • Approve / go
    • Rewrite it for style
    • Fetch context from more places
    • Stop and I'll take over
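The feedback modes above can be dispatched to agent actions with something like the sketch below. The mode names and action strings are hypothetical, chosen only to illustrate the branching; our real handler feeds the feedback back into the LLM's tool loop.

```python
def handle_feedback(mode: str) -> str:
    """Map a human-feedback mode to the agent's next action."""
    actions = {
        "approve": "post_answer",          # approve / go
        "rewrite": "restyle_answer",       # rewrite it for style
        "more_context": "call_rag_again",  # fetch context from more places
        "takeover": "halt_agent",          # stop and I'll take over
    }
    # Unknown feedback defaults to the safe option: hand off to the human.
    return actions.get(mode, "halt_agent")
```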

Accomplishments that we're proud of

  • Really high-quality answers
  • A streamlined agentic RAG process
  • A view into the brain and thought process of the agent
  • Human approval workflows that let us iterate and improve without alienating the community

What's next for Notorious R A G

We want to go live in some more communities. We also want to add more concise question/answer pairs to the Discord message index.

Built With

  • arize-phoenix
  • baml
  • discord-py
  • fastapi
  • firebase
  • humanlayer
  • llamaindex
  • nextjs
  • openai
  • pinecone
  • together-ai