Inspiration

We identified a problem in our marketing department that LLMs could solve. When conducting marketing research, our colleagues spend a lot of time going through hundreds of pages of financial reports to summarize key areas such as investment strategy and revenue growth.

What it does

We developed a chatbot to which the user can upload one or more financial reports. The chatbot has two main features. The first is an automated summary, generated from a list of predefined questions that are included in the LLM prompt. The second is a chat function where the user can ask additional free-form questions about the documents.
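As a rough illustration of the automated summary, the sketch below shows one way a list of predefined questions could be folded into a single prompt. It is not our actual prompt: the question wording, the instruction text, and the names `SUMMARY_QUESTIONS` and `build_summary_prompt` are all assumptions made up for this example.

```python
# Hypothetical question list; the wording is assumed, based on the key
# areas mentioned above (investment strategy, revenue growth).
SUMMARY_QUESTIONS = [
    "What is the company's investment strategy?",
    "How did revenue grow over the reporting period?",
]


def build_summary_prompt(report_excerpt, questions=SUMMARY_QUESTIONS):
    """Combine the predefined questions and a report excerpt into one prompt."""
    # Number the questions so the answers can be matched back to them.
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return (
        "Answer each question using only the report excerpt below.\n"
        "If the excerpt does not contain the answer, say so.\n\n"
        f"Questions:\n{numbered}\n\n"
        f"Report excerpt:\n{report_excerpt}"
    )


prompt = build_summary_prompt("Revenue grew 12 percent year over year.")
print(prompt)
```

Keeping the questions in a plain list makes it easy for the marketing team to add or reword a key area without touching the rest of the pipeline.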

How we built it

We used the llama_index library to build a RAG system with small-to-big retrieval, plus a sub-question summary engine for the chatbot. The engine calls OpenAI's LLM to generate sub-questions and to produce the summaries.
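The idea behind small-to-big retrieval is to match the query against small chunks (here, single sentences) but hand the larger parent chunk to the LLM, so it sees more context. The sketch below is a library-free toy version of that idea, not our llama_index code: the word-overlap score stands in for embedding similarity, and every name in it (`SmallChunk`, `build_chunks`, `small_to_big`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SmallChunk:
    text: str
    parent_id: int  # index of the larger parent chunk this sentence came from


def build_chunks(parents):
    """Split each parent chunk into sentence-sized child chunks."""
    chunks = []
    for pid, parent in enumerate(parents):
        for sentence in parent.split("."):
            sentence = sentence.strip()
            if sentence:
                chunks.append(SmallChunk(sentence, pid))
    return chunks


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(",.?!").lower() for w in text.split()}


def score(query, text):
    """Toy relevance score: number of shared words (assumed stand-in for embeddings)."""
    return len(tokenize(query) & tokenize(text))


def small_to_big(query, parents, top_k=1):
    """Rank the small chunks, then return their de-duplicated parent chunks."""
    ranked = sorted(build_chunks(parents), key=lambda c: score(query, c.text), reverse=True)
    parent_ids = []
    for chunk in ranked[:top_k]:
        if chunk.parent_id not in parent_ids:
            parent_ids.append(chunk.parent_id)
    return [parents[pid] for pid in parent_ids]


parents = [
    "Revenue grew 12 percent year over year. Growth was driven by the cloud segment.",
    "The investment strategy focuses on renewable energy. Capital spending will rise.",
]
result = small_to_big("What is the investment strategy?", parents)
print(result)
```

The best-matching sentence is about the investment strategy, but the whole parent paragraph is returned, including the sentence on capital spending that the query alone would not have matched.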

Challenges we ran into

Picking the right query engines, choosing the right questions for the summary, and deciding how to approach retrieval. The packages we depend on are updated so frequently that keeping the environment stable was also a big challenge. OpenAI's API usage is metered, so costs can accumulate quickly with extensive use, and usage quotas may apply.

Accomplishments that we're proud of

Using small-to-big retrieval helped a lot. We helped the marketing team save time.

What we learned

Building and evaluating a RAG system is hard because there are many components that can be optimized.

What's next for Doc-Insider

Refined querying techniques, advanced hallucination detection, and output quality evaluation. A user interface could also be developed. We also want to streamline PDF uploads by enabling the agent to retrieve company-specific annual reports automatically from the internet and to extract pertinent individual profiles from LinkedIn.

Built With

  • databricks
  • llamaindex
  • openai
  • python