Inspiration

We are Wolfram Summer School graduates with an inclination to work with Mathematica and the Wolfram Language. As yet, Wolfram does not offer its own AI chatbot, so we were inspired to build a local AI chatbot using NVIDIA AI Workbench and expose it to some of the tools available in Mathematica.

What it does

This project exposes research papers downloaded from arxiv.org to the chatbot and allows users to query them.

How we built it

To fine-tune a Large Language Model (LLM) efficiently, we employed a combination of Low-Rank Adaptation (LoRA) and k-bit precision techniques, ensuring optimal use of computational resources while maintaining model performance. Below is an overview of the key components of our approach:
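To see why LoRA keeps resource use low, note that it freezes the full weight matrix W and trains only a low-rank update BA, so the trainable parameter count for a d_out × d_in layer drops from d_out·d_in to r·(d_out + d_in). A minimal sketch in plain Python (the dimensions and rank below are illustrative, not the model's actual sizes):

```python
# Illustrative trainable-parameter counts for one projection layer.
# LoRA freezes the full weight W (d_out x d_in) and trains only the
# low-rank factors B (d_out x r) and A (r x d_in).

def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters if the full weight matrix were fine-tuned."""
    return d_out * d_in

def lora_update_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters with a rank-r LoRA adapter (B @ A)."""
    return r * (d_out + d_in)

d_out = d_in = 4096   # hypothetical hidden size of one projection layer
r = 16                # hypothetical LoRA rank
full = full_update_params(d_out, d_in)
lora = lora_update_params(d_out, d_in, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
```

At these illustrative sizes the adapter trains roughly 1/128 of the parameters of a full update, which is what makes fine-tuning an 8B model feasible on a single GPU when combined with 4-bit weights.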

  1. Model and Data Preparation: We selected Meta's Llama 3.1 8B Instruct model and configured it for memory efficiency using 4-bit precision. We gathered and cleaned a dataset of machine learning papers, transforming paper titles into user queries and pairing them with summaries to simulate a natural conversation.
  2. LoRA Fine-Tuning: We fine-tuned the model with LoRA, targeting specific projection layers to minimize computational overhead. The Hugging Face Trainer API was used to optimize the training process with techniques such as mixed precision and gradient checkpointing for efficient resource management.
  3. Deployment: After training, the model was saved with LoRA weights and prepared for deployment. The model was tested to ensure it could generate accurate and contextually relevant responses to new user queries based on the learned patterns.
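The data-preparation step above (turning paper titles into user queries paired with summaries) can be sketched as follows. The field names and the exact query wording are illustrative assumptions, not our exact pipeline, but the messages format matches what most chat fine-tuning tooling expects:

```python
# Sketch: convert (title, summary) pairs from arXiv metadata into
# chat-style training examples. Field names are illustrative.

def to_chat_example(title: str, summary: str) -> dict:
    """Pair a paper title (posed as a user query) with its summary
    (as the assistant reply) in chat-messages format."""
    return {
        "messages": [
            {"role": "user",
             "content": f"Tell me about the paper '{title.strip()}'."},
            {"role": "assistant",
             "content": summary.strip()},
        ]
    }

papers = [
    {"title": "Attention Is All You Need",
     "summary": "Introduces the Transformer architecture..."},
]
dataset = [to_chat_example(p["title"], p["summary"]) for p in papers]
print(dataset[0]["messages"][0]["content"])
```

Each example then only needs to be run through the model's chat template before tokenization and training.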

Challenges we ran into

Our three team members lived in two countries and two states, so a shared GitHub repository was essential and allowed us to work on the project simultaneously. A single GPU machine made testing manageable.

We also had to overcome some important conceptual problems. One of the most significant was exposing a URI so that the chatbot could link to Mathematica's functions. We did not accomplish this, but we recommend NVIDIA's Hybrid RAG example for a working demonstration of that functionality. We were, however, able to connect to this project with the Wolfram Language.

Accomplishments that we're proud of

Working together as a physically dispersed team without any background in using NVIDIA products has been well worth the journey. AI Workbench seems solid, albeit very structured, and requires some getting used to.

What we learned

We discovered that fine-tuning a chatbot to recommend papers to users faces significant challenges. One major issue is extrinsic hallucination, where the model generates inaccurate or fabricated information, which negatively impacts the quality of the recommendations. This problem is inherent to the model’s generation process and difficult to avoid. After further analysis, we concluded that Retrieval-Augmented Generation (RAG) would be a more effective solution for this task, as it can ground the model’s responses in real, factual data from an external knowledge source, thereby improving accuracy and reliability.
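The grounding idea behind RAG can be shown with a deliberately minimal retrieval sketch: score documents by word overlap with the query, then build a prompt that instructs the model to answer only from the retrieved context. A real system would use embeddings and a vector store (as in NVIDIA's Hybrid RAG example); this toy version only illustrates why grounding curbs hallucination:

```python
# Toy RAG retrieval: rank documents by word overlap with the query,
# then ground the prompt in the top hits. Illustrative only; a real
# pipeline would use embeddings and a vector store.

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def grounded_prompt(query: str, docs: list[str]) -> str:
    """Build a prompt that confines the model to retrieved context."""
    context = "\n".join(retrieve(query, docs))
    return (f"Answer using ONLY the context below.\n"
            f"Context:\n{context}\n\nQuestion: {query}")

docs = [
    "LoRA adapts large language models with low-rank updates.",
    "Mathematica provides symbolic computation tools.",
]
print(grounded_prompt("How does LoRA adapt language models?", docs))
```

Because the answer is constrained to retrieved passages, fabricated paper titles or summaries are far less likely than with a purely fine-tuned model.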


What's next for Wolfram Language Container

The Wolfram Language (WL) includes a variety of capabilities for working with LLMs. Provided the URI interface works, it can provide chat-based access. WL also includes programmatic functionality to allow LLMs to access Wolfram Language tools. The Wolfram Prompt Repository provides a curated collection of prompts delivering a range of capabilities.

Share this project:

Updates