Inspiration
This project was inspired by my own university: the sheer number of documents in the university portal prompted me to build it.
What it does
The chatbot is a context-aware AI system designed for educational institutions, with an Admin Portal and a Student Portal. In the Admin Portal, administrators upload, edit, and manage PDF files (e.g., textbooks, lecture notes) and their metadata (departments, semesters, tags) through a Streamlit-based interface, with files securely stored in AWS S3. In the Student Portal, students log in, and their profile (e.g., department, semester) provides the context used to tailor their queries. The chatbot uses Cortex Search to retrieve the most relevant document chunks for the student's query and context, then uses a Mistral LLM to generate a response from those chunks. Students get answers grounded in the latest educational resources, while administrators keep the document repository organized.
How we built it
Admin Portal:
- Built with Streamlit.
- Upload, edit, and delete PDFs and metadata.
- Metadata includes departments, semesters, and tags.
- Files stored in AWS S3 using boto3.
- Automatically generates and uploads metadata CSV files.
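A minimal sketch of the upload step described above, assuming one metadata CSV is written next to each PDF. The `.metadata.csv` naming convention, the CSV column names, and the `;` separator for multi-valued fields are assumptions for illustration, not the project's actual layout. The `s3_client` argument is expected to be a `boto3.client("s3")`; only its `put_object` call is used.

```python
import csv
import io


def metadata_key_for(pdf_key: str) -> str:
    """Hypothetical convention: 'docs/a.pdf' -> 'docs/a.metadata.csv'."""
    stem = pdf_key[:-4] if pdf_key.lower().endswith(".pdf") else pdf_key
    return stem + ".metadata.csv"


def build_metadata_csv(pdf_key, departments, semesters, tags) -> bytes:
    """Serialize one PDF's metadata as a header row plus one data row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["file", "departments", "semesters", "tags"])
    writer.writerow([
        pdf_key,
        ";".join(departments),                 # multi-valued fields are ';'-joined
        ";".join(str(s) for s in semesters),
        ";".join(tags),
    ])
    return buffer.getvalue().encode("utf-8")


def upload_pdf_with_metadata(s3_client, bucket, pdf_key, pdf_bytes,
                             departments, semesters, tags) -> str:
    """Upload the PDF and its metadata CSV side by side; return the CSV key."""
    s3_client.put_object(Bucket=bucket, Key=pdf_key, Body=pdf_bytes)
    csv_key = metadata_key_for(pdf_key)
    s3_client.put_object(
        Bucket=bucket,
        Key=csv_key,
        Body=build_metadata_csv(pdf_key, departments, semesters, tags),
    )
    return csv_key
```

Deriving the CSV key from the PDF key means the two objects can always be matched up later without a separate index.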
Student Portal:
- Built with Streamlit.
- Student login with authentication using Snowflake.
- Personalized context (department, semester) passed to the chatbot.
- Chat interface for querying and receiving responses.
Context-Aware RAG Chatbot:
- Uses Cortex Search to retrieve relevant document chunks.
- Generates responses using Mistral LLM (e.g., mistral-7b).
- Integrates with Snowflake Cortex for queries and LLM processing.
- Tailors responses based on student context (department, semester).
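The retrieval and generation steps above can be sketched as two small helpers: one builds a Cortex Search request restricted to the student's department and semester, the other assembles the prompt from the retrieved chunks. The column names (`chunk`, `file`, `department`, `semester`), the prompt wording, and storing the semester as a string are assumptions; the filter columns would also need to be declared as attributes of the search service. Sending the request and calling `SNOWFLAKE.CORTEX.COMPLETE` with a model such as `mistral-7b` is left out here.

```python
def build_search_request(query, department, semester, limit=5):
    """Build a Cortex Search request body limited to the student's context."""
    return {
        "query": query,
        "columns": ["chunk", "file"],          # assumed column names
        "filter": {
            "@and": [
                {"@eq": {"department": department}},
                {"@eq": {"semester": str(semester)}},
            ]
        },
        "limit": limit,
    }


def build_prompt(query, chunks, department, semester):
    """Assemble a grounded prompt from the retrieved chunks."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        f"You are a university assistant for a {department} student "
        f"in semester {semester}.\n"
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

Filtering at search time, rather than after retrieval, keeps chunks from other departments from using up the result limit.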
Authentication:
- Student credentials stored in Snowflake.
- Login managed using Streamlit Session State.
- Ensures secure access and personalized context.
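A minimal sketch of the login flow, under assumptions the write-up does not confirm: that passwords are stored as salted PBKDF2 hashes, and that a hypothetical `fetch_student` callable looks the student up in Snowflake and returns a dict with `salt`, `password_hash`, `department`, and `semester` (or `None`). The `session_state` argument can be Streamlit's `st.session_state`, which supports dict-style assignment; a plain dict works the same way for testing.

```python
import hashlib
import hmac


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash with PBKDF2-HMAC-SHA256 (assumed storage scheme)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def login(session_state, fetch_student, username: str, password: str) -> bool:
    """Verify credentials and store the student's context in the session."""
    record = fetch_student(username)           # hypothetical Snowflake lookup
    if record is None:
        return False
    candidate = hash_password(password, record["salt"])
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(candidate, record["password_hash"]):
        return False
    session_state["authenticated"] = True
    session_state["student"] = {
        "username": username,
        "department": record["department"],
        "semester": record["semester"],
    }
    return True
```

Because the department and semester are stored in the session at login, the chat page can read them on every rerun without querying Snowflake again.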
Integration:
- Streamlit for frontend (Admin and Student Portals).
- Snowflake for backend (authentication, Cortex Search, Mistral LLM).
- AWS S3 for file and metadata storage.
- boto3 for S3 interactions.
Deployment:
- Deployed using Streamlit Sharing or Snowflake Native App Framework.
- AWS S3 hosts PDFs and metadata.
- Snowflake hosts backend logic and processing.
Technologies Used:
- Streamlit (frontend).
- Snowflake (backend, authentication, Cortex Search, Mistral LLM).
- AWS S3 (storage).
- boto3 (S3 interaction).
- Snowpark Python (Snowflake integration).
Challenges we ran into
- Metadata management: we had difficulty extracting metadata from AWS headers and had to find a workaround using CSV files.
- Automating the file management was very difficult.
- Authentication Management:
- Ensuring secure and seamless student login while maintaining personalized context (department, semester).
- Handling session state in Streamlit for persistent user authentication.
- Metadata Consistency:
- Managing metadata (departments, semesters, tags) across uploaded files.
- Ensuring metadata updates are reflected accurately in both AWS S3 and the chatbot's retrieval system.
- Cortex Search Integration:
- Configuring Cortex Search to retrieve the most relevant document chunks based on student queries and context.
- Optimizing search performance for large datasets.
- LLM Response Quality:
- Fine-tuning Mistral LLM (e.g., mistral-7b) to generate accurate and contextually appropriate responses.
- Balancing response generation speed with quality.
- File Management in S3:
- Handling large file uploads and deletions efficiently in AWS S3.
- Ensuring metadata CSV files are correctly linked to their corresponding PDFs.
- Streamlit Performance:
- Managing real-time updates and interactions in the Streamlit app without performance bottlenecks.
- Ensuring a smooth user experience for both administrators and students.
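For the challenge of keeping metadata CSV files linked to their PDFs, one possible check is sketched below. It assumes a hypothetical naming convention where `docs/a.pdf` is paired with `docs/a.metadata.csv`; the write-up does not say how the project names its files. The input is a plain list of object keys, which could be collected from a bucket listing (for example with boto3's `list_objects_v2`).

```python
METADATA_SUFFIX = ".metadata.csv"   # assumed naming convention
PDF_SUFFIX = ".pdf"


def find_unlinked(keys):
    """Return (PDFs with no metadata CSV, metadata CSVs with no PDF), sorted."""
    pdfs = {k[: -len(PDF_SUFFIX)]: k for k in keys if k.endswith(PDF_SUFFIX)}
    csvs = {
        k[: -len(METADATA_SUFFIX)]: k
        for k in keys
        if k.endswith(METADATA_SUFFIX)
    }
    # Compare on the shared stem, then report the original keys.
    missing_metadata = sorted(pdfs[s] for s in pdfs.keys() - csvs.keys())
    orphan_metadata = sorted(csvs[s] for s in csvs.keys() - pdfs.keys())
    return missing_metadata, orphan_metadata
```

Running a check like this after each upload or deletion would surface PDFs that lost their metadata and CSVs left behind by a deleted PDF.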
Accomplishments that we're proud of
- Managed to work around many problems
- The fully fledged portal and the automation of file uploads and deletions were rewarding to build
What we learned
- Became familiar with Snowflake's environment
- Learned to use Streamlit
- Learned SQL programming
- Learned how RAG systems work
What's next for University Chatbot
- Fix metadata editing for files, which is not working yet
- Scale to larger PDFs
- Fine-tune the LLM further
Built With
- amazon-web-services
- python
- snowflake
- snowpark
- sql
- streamlit