Inspiration

The idea for our AI chatbot stemmed from the desire to simplify the way we interact with information. Faced with the deluge of data in PDFs, we wanted to create a tool that could provide instant summaries and answer questions with precision, transforming how the R&D department accesses and uses information in its daily work.

What We Learned

Our journey was a deep dive into the intricacies of natural language processing (NLP), machine learning (ML) and large language models (LLMs). We explored state-of-the-art language models and experimented with various algorithms for text extraction and summarization. The process was a hands-on experience in agile development and iterative improvement.

How We Built It

The construction of our chatbot followed a modular approach:

  1. PDF Processing: Implementing text extraction techniques to convert PDF content into analyzable text.
  2. NLP Engine & LLM: Integrating Dolly 2.0 for understanding queries and generating human-like responses.
  3. Data Management: Utilizing MLflow for model lifecycle management and Databricks for scalable computing.
  4. Search Capability: Employing FAISS to efficiently retrieve information from the document corpus.
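
The flow from extracted text to a grounded prompt can be sketched roughly as follows. This is a minimal illustration, not our actual code. It uses only the Python standard library, with a bag-of-words cosine similarity standing in for the embedding model and FAISS index, and it stops short of calling the LLM. All function names (chunk_text, embed, retrieve, build_prompt) are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping windows of words."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def embed(text):
    """Toy embedding: lowercase token counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (brute-force search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "The enzyme assay ran at 37 degrees.",
        "Budget review is scheduled for March.",
    ]
    question = "When is the budget review?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In a full pipeline, the text passed to chunk_text would come from the PDF extraction step, a vector index would replace the brute-force sort in retrieve, and the prompt would be passed to the language model.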

Challenges Faced

Among the hurdles we encountered were:

  • Data Quality: Cleaning and structuring data from PDFs was more complex than anticipated.
  • Model Tuning: Balancing the trade-offs between response accuracy and processing time.
  • Scalability: Ensuring the bot could handle a large number of documents and user queries simultaneously.
  • User Experience: Crafting a chatbot interface that was intuitive and user-friendly.

Despite these challenges, the project was a rewarding experience that pushed the boundaries of our technical expertise and creativity.
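
To give a flavor of the data-quality hurdle, the sketch below shows the kind of cleanup that raw PDF text often needs before it can be chunked and indexed. It is an illustrative assumption rather than our exact cleaning code. The function name clean_pdf_text is hypothetical, and the rules are simple heuristics that can misfire (for example, on genuinely hyphenated compounds split across lines, or on lines that legitimately contain only a number).

```python
import re


def clean_pdf_text(raw):
    """Normalize text extracted from a PDF into clean paragraphs."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break: "extrac-\ntion" -> "extraction".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop lines that contain only digits (likely page numbers).
    text = re.sub(r"(?m)^[ \t]*\d+[ \t]*$", "", text)
    # Turn single newlines into spaces but keep blank-line paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs, then runs of blank lines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{2,}", "\n\n", text)
    paragraphs = [p.strip() for p in text.split("\n\n")]
    return "\n\n".join(p for p in paragraphs if p)


if __name__ == "__main__":
    raw = "Text extrac-\ntion is hard.\nIt needs care.\n\n12\n\nNext para-\ngraph here."
    print(clean_pdf_text(raw))
```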

Built With

  • databricks
  • dolly
  • huggingface
  • langchain
  • pypdf
  • python
  • vectordb