About the Projects
Project 1: AI-Powered NJIT ChatBot (Highlander Bot)
Inspiration
Highlander Bot was created to empower NJIT students and prospective students with a personal AI assistant that can address their campus-related questions. The goal is to make information readily accessible, support academic and administrative needs, and improve the overall NJIT experience through a user-friendly, intelligent chatbot. Recognizing the growing demand for fast, accurate responses in university environments, Highlander Bot was designed as an NJIT companion that adapts over time through user feedback.
What it Does
Highlander Bot acts as an AI-powered "buddy" for NJIT students:
- Answers Campus Questions: The bot provides instant answers on a wide range of topics, from admissions to campus events.
- User-Adaptable Responses: Highlander Bot learns from interactions to continually improve response quality.
- Seamless Information Access: By leveraging real-time data from NJIT’s website, the bot keeps its responses up-to-date.
- Fast and Reliable: Built with a tiered architecture, the bot delivers responses quickly, switching to a lighter fallback model when needed for efficiency.
How It Works 🛠️
The NJIT Highlander Bot is powered by an LLM trained on data scraped from various sections of the NJIT website, creating a knowledge base that enables the bot to respond intelligently to queries. The process of data gathering and embedding is outlined below.
1. Data Scraping with BeautifulSoup 🥣
To build the bot's knowledge base, we used BeautifulSoup to scrape content from the NJIT website. This content serves as the foundation of the bot’s responses, allowing it to provide answers that are accurate and relevant.
The script collects text from specified URLs, focusing on paragraph elements that contain the core information about NJIT. This collected text is then processed for further steps.
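The scraping step can be sketched roughly as below. The helper names are illustrative, and the real script's URL list and selectors may differ; this assumes `requests` and `bs4` are installed.

```python
import requests
from bs4 import BeautifulSoup

def scrape_paragraphs(html: str) -> list[str]:
    """Extract non-empty paragraph text from a page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p") if p.get_text(strip=True)]

def scrape_urls(urls: list[str]) -> list[str]:
    """Fetch each URL and collect its paragraph text (requires network access)."""
    corpus = []
    for url in urls:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        corpus.extend(scrape_paragraphs(resp.text))
    return corpus
```

Focusing on `<p>` elements keeps navigation menus and footers out of the knowledge base, which reduces noise in the downstream embeddings.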
2. Creating Vector Embeddings with FAISS 🧬
Once the text is scraped, we use a Sentence Transformer model (all-MiniLM-L6-v2) to generate embeddings for each piece of content. These embeddings are vector representations of the textual data, allowing the bot to perform efficient similarity searches.
To enable quick and scalable querying, we store the embeddings in a FAISS (Facebook AI Similarity Search) vector database. FAISS allows the bot to find the most relevant information for user queries by searching through the embedding vectors, resulting in accurate and contextually appropriate responses.
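Conceptually, the embed-and-search step works like the toy sketch below, which substitutes small hand-written vectors for real all-MiniLM-L6-v2 embeddings and a brute-force NumPy search for the FAISS index; the real pipeline swaps in `SentenceTransformer.encode` and a FAISS index in place of these stand-ins.

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    """Normalize rows so an inner product equals cosine similarity (stand-in for a FAISS index)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms

def search(index: np.ndarray, query_vec: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k stored vectors most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    sims = index @ q
    return list(np.argsort(-sims)[:k])

# Toy "embeddings": in the real bot these come from all-MiniLM-L6-v2.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
index = build_index(docs)
```

FAISS does the same nearest-neighbor lookup, but with index structures that stay fast as the corpus grows far beyond what brute force can handle.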
3. Integrating the First LLM with NVIDIA's Llama Model 🦙
To create an intelligent RAG-based chatbot, we leveraged nvidia/llama-3.1-nemotron-70b-instruct, a powerful language model from NVIDIA's suite. This model allowed us to process and understand complex queries by retrieving relevant information from our FAISS-powered vector database. The combination of Llama’s robust language understanding and our vector embeddings enabled the bot to deliver precise and contextually accurate responses.
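At a high level, the RAG step stitches the retrieved passages into the prompt sent to nvidia/llama-3.1-nemotron-70b-instruct. A minimal sketch of that prompt assembly follows; the template wording is an assumption, not the project's actual prompt.

```python
def build_rag_prompt(question: str, retrieved: list[str]) -> str:
    """Combine the top retrieved passages and the user question into one LLM prompt."""
    context = "\n\n".join(f"[{i + 1}] {passage}" for i, passage in enumerate(retrieved))
    return (
        "Answer the question using only the NJIT context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Grounding the model in retrieved context this way is what keeps answers tied to NJIT's actual pages rather than the model's general training data.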
4. Fine-Tuning with BERT for Custom Data 🧠
In addition to the initial LLM, we implemented BERT from the transformers library and fine-tuned it on custom NJIT data. To generate this fine-tuning dataset, we used nvidia/nemotron-4-340b, another advanced language model from NVIDIA, which produced synthetic data closely aligned with the types of queries and responses relevant to NJIT.
The integration of BERT, fine-tuned with this custom dataset, added a layer of specificity, enabling the Highlander Bot to better address nuanced questions and provide more personalized assistance to NJIT users. This approach, combining both a large-scale LLM and a fine-tuned BERT model, helped us achieve a balanced chatbot capable of accurate retrieval and context-aware responses.
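The synthetic question/answer pairs produced by nemotron-4-340b have to be serialized before fine-tuning; a common choice (assumed here, not confirmed by the project) is JSONL with one record per line:

```python
import json

def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as JSONL, one training record per line."""
    return "\n".join(json.dumps({"question": q, "answer": a}) for q, a in pairs)

def from_jsonl(text: str) -> list[dict]:
    """Parse JSONL back into a list of records for the fine-tuning loader."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```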
# NJIT Highlander Bot Architecture Flow
The NJIT Highlander Bot architecture is designed to provide quick, accurate, and contextually relevant responses by combining predefined responses, a large language model (LLM), and a fine-tuned fallback model. Here’s a detailed step-by-step explanation of the flow:
---
## Architecture Flow Diagram
```plaintext
+----------------+
| User Query |
+----------------+
|
v
+------------------------+
| Check for Stop Words |
+------------------------+
|
v
+---------------------------------+
| Check Predefined Q&A from FAQs |
+---------------------------------+
|
v
+--------------------------+
| Check Conversation Log |
+--------------------------+
|
+----------+-----------+
| |
v v
+------------------+ +-------------------------+
| Predefined | | NVIDIA LLM Processes |
| Response | | Query |
+------------------+ +-------------------------+
|
v
+-----------------------+
| Search Vector DB |
+-----------------------+
|
+------+------+
| |
v v
+------------------+ +-----------------+
| Context Found | | No Context |
| | | Scrape Web |
+------------------+ | for Solution |
| +-----------------+
|
+--------+--------+
| Generate Output |
+--------+--------+
|
v
+-------------------+
| Store in |
| Conversation Log |
+-------------------+
|
v
+----------------------+
| Timeout Handling |
| Fine-tuned BERT |
| Generates Response |
+----------------------+
|
v
+-------------------+
| Store in |
| Conversation Log |
+-------------------+
Architecture Flow Explained
The NJIT Highlander Bot follows a structured flow to ensure fast, accurate, and contextually relevant responses. Here’s a breakdown of each step in the architecture:
User Query 💬
- The process begins when a user submits a query to the chatbot.
Check for Stop Words 🔍
- The bot initially checks if the query contains any stop words or irrelevant words that might indicate a simple or unimportant query. This helps to filter out unnecessary queries that do not require complex processing.
Check Predefined Q&A 📖
- Before proceeding further, the bot checks a set of hardcoded, predefined questions and answers. This Q&A set handles frequently asked questions and common queries, providing an immediate response when there’s a match.
- Predefined Response: If a match is found in the predefined Q&A set, the bot provides the corresponding answer without further processing.
Check Conversation Log 📜
- If the query isn’t found in the predefined Q&A set, the bot searches its conversation log file to check if the query has been previously asked and answered. This log helps in quickly responding to repeated queries.
- Stored Response: If a match is found in the conversation log, the bot retrieves and provides the saved response.
- NVIDIA LLM Processes Query: If the query is not found in the log, the bot proceeds to the next stage, where the NVIDIA LLM handles the query.
NVIDIA LLM and Vector Database Search 🧠
- For complex queries, the NVIDIA Large Language Model (LLM) is activated. The LLM processes the query and searches a FAISS-powered Vector Database for relevant information:
- Context Found: If relevant context is found in the vector database, the bot generates a response based on this context.
- No Context Found: If no context is available, the LLM initiates a web scraping process to gather information that could help answer the query.
- After generating the response (whether from context in the vector database or web scraping), the bot stores the output in the conversation log for future reference.
Timeout Handling and Fine-Tuned BERT ⏱️
- If the NVIDIA LLM takes too long to generate a response, the bot activates a fine-tuned BERT model as a fallback. This BERT model, fine-tuned on NJIT-specific data, quickly generates a response to ensure the user isn’t kept waiting.
- The response from the BERT model is also stored in the conversation log, allowing for quick retrieval in future queries.
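The timeout fallback can be sketched with `concurrent.futures`; the five-second limit and function names are illustrative, not the project's actual values.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def answer_with_fallback(query: str, slow_llm, fast_bert, timeout_s: float = 5.0) -> str:
    """Try the primary LLM, but fall back to the fine-tuned BERT model on timeout."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_llm, query)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            future.cancel()
            return fast_bert(query)
```

The pattern keeps the user-facing latency bounded: the heavyweight model gets a fixed window, and the lighter fine-tuned model answers whenever that window closes first.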
Store in Conversation Log 🗂️
- Regardless of the response source (predefined Q&A, NVIDIA LLM, web scraping, or BERT), the final output is stored in the conversation log. This log is continuously updated, allowing the bot to improve its efficiency and response accuracy over time.
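The routing steps above can be sketched as a single function; the stop-word list, lookup tables, and LLM stub below are placeholders for the real components.

```python
STOP_WORDS = {"hi", "hello", "thanks"}  # illustrative, not the bot's actual list

def route_query(query: str, faq: dict, log: dict, llm) -> str:
    """Route a query through the bot's fallback chain and record new answers."""
    key = query.strip().lower()
    if key in STOP_WORDS:               # 1. filter trivial queries
        return "Hello! Ask me anything about NJIT."
    if key in faq:                      # 2. hardcoded predefined Q&A
        return faq[key]
    if key in log:                      # 3. previously answered queries
        return log[key]
    answer = llm(query)                 # 4. NVIDIA LLM + vector DB search
    log[key] = answer                   # 5. store for future reuse
    return answer
```

Because every LLM answer is written back to the log, repeated questions are served from the cheap lookup path on subsequent visits.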
Summary
This architecture flow allows the NJIT Highlander Bot to:
- Quickly handle common queries with predefined responses.
- Leverage the power of NVIDIA LLM for complex, context-dependent questions.
- Fallback on fine-tuned BERT for timely responses when needed.
Chatbot Application Snapshots
Home page
Response for a stop word
Response for data not present in the database
Response for a question from the conversation log
Response for a predefined question
Project 2: Star Micronics POS Data Analysis Dashboard
This project is designed to streamline and analyze POS data from Star Micronics. By uploading .stm files, the system consolidates data into an Excel file with multiple sheets and performs insightful analysis to generate interactive dashboards.
Inspiration Behind the Project
The inspiration for this project stemmed from the need to simplify and enhance the analysis of Point-of-Sale (POS) data for businesses. Working with raw POS data can be challenging, especially when it's spread across multiple files and formats. Star Micronics, a leader in POS solutions, generates .stm files that contain valuable transaction data. However, this data is often underutilized due to the lack of streamlined analysis tools.
The primary goals of this project were to:
- Automate Data Consolidation: Simplify the process of combining and structuring POS data by automatically parsing `.stm` files and organizing them into a single, accessible Excel file.
- Generate Valuable Insights: Enable businesses to gain actionable insights into their sales performance, customer behavior, and operational trends. By integrating external data, such as weather and game-day information, we can provide more context for sales patterns.
- Leverage AI for Enhanced Analysis: Integrate advanced AI models to generate insights, using both a fine-tuned Large Language Model (LLM) and traditional data analysis approaches. This dual approach ensures that users receive comprehensive insights from both structured data and unstructured queries.
- Build an Interactive Dashboard: Create an intuitive, easy-to-navigate dashboard that allows non-technical users to explore the data and insights effortlessly, empowering them to make informed decisions.
By developing this tool, we aim to unlock the potential of POS data and provide Star Micronics and their customers with a powerful tool for data-driven decision-making. The project brings together data engineering, data science, and interactive visualization to create a holistic solution for POS data analysis.
Project Workflow
Upload & Parse Data
- The application accepts a zip file containing `.stm` files.
- A Python script parses each `.stm` file and consolidates the data into an Excel file with three sheets:
  - Orders: Contains fields such as `Order Number`, `Total Price`, `Order Date Time`, `Payment Method`, `Total Amount`, `VAT Amount`, `Change Due`, `VAT Number`, and `Receipt Print ID`.
  - Items: Contains fields including `Order Number`, `Quantity`, `Name`, `Price`, `VAT Rate`, and `VAT Amount`.
  - VAT Summary: Includes the fields `Order Number`, `VAT Rate`, and `Amount`.
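The parsing stage might look like the sketch below. Note that the `key=value` record layout used here is purely hypothetical, since the actual `.stm` format is Star Micronics' own; the real parser follows that format instead.

```python
def parse_stm_record(text: str) -> dict:
    """Parse one hypothetical key=value .stm record into a flat dict."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            record[key.strip()] = value.strip()
    return record
```

Each parsed dict maps directly onto one row of the Orders sheet, so consolidation is just collecting these records and writing them out with an Excel writer.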
Enrich Data with External API
- After extracting data from the `.stm` files, an API call retrieves weather and game-day information.
- Two new fields, `Weather` and `Game Day`, are added to the Orders sheet based on the date and time of each order.
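Conceptually, the enrichment step joins each order's timestamp against external lookups; the lookup dict and set below stand in for the real weather and game-day API responses, and the timestamp format is an assumption.

```python
from datetime import datetime

def enrich_order(order: dict, weather_by_date: dict, game_days: set) -> dict:
    """Add Weather and Game Day fields to an order based on its date."""
    date = datetime.strptime(order["Order Date Time"], "%Y-%m-%d %H:%M").date()
    enriched = dict(order)
    enriched["Weather"] = weather_by_date.get(date, "Unknown")
    enriched["Game Day"] = "Yes" if date in game_days else "No"
    return enriched
```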
Data Analysis
- The analysis phase has two approaches:
  - LLM-Powered Analysis: Leverages a fine-tuned Large Language Model to generate insights in a fixed format by prompting the model.
  - Pandas Data Analysis: Uses the pandas library to perform traditional data analysis and derive various insights from the data.
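On the pandas side, one of the insights might be computed as below, for example total spending by day of week; the column names follow the Orders sheet described above.

```python
import pandas as pd

def spending_by_day(orders: pd.DataFrame) -> pd.Series:
    """Sum Total Price per day of week from the Orders sheet."""
    df = orders.copy()
    df["Day"] = pd.to_datetime(df["Order Date Time"]).dt.day_name()
    return df.groupby("Day")["Total Price"].sum()
```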
Dashboard Generation
- The final step utilizes the Dash framework to create an interactive dashboard for data visualization.
- The dashboard includes various insights such as:
- Total spending by day of the week.
- Top-selling items.
- VAT anomalies.
- Spending patterns based on weather and game day conditions.
Dashboard Screenshots
Here are some screenshots of the dashboard to illustrate its functionality:
Upload Data
Sales Insights Dashboard and Spending by Day
Top-Selling Items
Installation & Usage
Prerequisites
- Python 3.x
- Required Python libraries (listed in `requirements.txt`)
Installation
1. Clone this repository:
   ```bash
   git clone https://github.com/PremchandJalla/sma-flask-dashboard.git
   ```
2. Navigate to the project directory:
   ```bash
   cd sma-flask-dashboard
   ```
3. Run the application:
   ```bash
   python app.py
   ```
Built With
- beautiful-soup
- bert
- docker
- faiss
- faiss-vector-database
- flask
- github
- nvidia-language-model-(llm)
- openai-embeddings
- python
- reinforcement-learning-(rl)-framework
- replit
- retrieval-augmented-generation-(rag)
- scrapy
- vs