Project Overview
Inspiration
The main inspiration behind this project came from the challenge of transitioning a company's outdated, paper-based system to a modern, efficient, and centralized digital database. The organization had significant communication issues due to departments being spread across different floors, with critical information being stored in physical files that were not easily accessible to everyone. The lack of a centralized repository for student, faculty, and fundraising data made it difficult for teams to collaborate effectively. This situation inspired us to explore technologies like OCR and GenAI for digitizing historical data and creating a relational database that would serve as the foundation for a more connected, data-driven decision-making platform.
What We Learned
During the course of the project, we learned several key lessons:
- Data Extraction with OCR: Leveraging OCR and GenAI was crucial in automating the extraction of relevant information from physical documents. This was a valuable learning experience in applying AI technologies to transform unstructured data into a structured format.
- Database Design: Designing a relational database schema was an interesting exercise. We carefully mapped out relationships between students, faculty, courses, and fundraising efforts to ensure optimal data organization.
- AI-Powered Querying: Integrating NLP-based querying for natural language data retrieval added an intelligent, user-friendly interface. This allowed us to provide insights without complex SQL queries.
How We Built It
The project was built in multiple phases:
- Data Extraction: We used OCR (Optical Character Recognition) to digitize physical records and employed GenAI models to extract and clean the data. This ensured that we had structured information ready for storage.
- Database Design & Setup: We created a relational database using PostgreSQL, which included tables for students, faculty, courses, and fundraising efforts. These tables were linked using primary and foreign keys to establish relationships.
- AI Querying: Using GPT-based models, we integrated natural language querying to enable users to interact with the data intuitively. This made the system more user-friendly and accessible to non-technical users.
- Data Visualization: We integrated Power BI and Tableau to create dashboards that visualize trends in fundraising, student performance, and faculty schedules. This made it easy for administrators to monitor key metrics.
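To make the database phase more concrete, here is a minimal sketch of how tables linked by primary and foreign keys might be laid out. It is an illustration under stated assumptions, not our actual schema: every table and column name, the `enrollments` junction table, and the helper functions `build_db` and `courses_for_student` are hypothetical. It also uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server.

```python
import sqlite3

# Hypothetical schema: all table and column names are assumptions made
# for illustration. The project itself used PostgreSQL; sqlite3 is a
# stand-in so the sketch is runnable on its own.
SCHEMA = """
CREATE TABLE faculty (
    faculty_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE courses (
    course_id  INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    faculty_id INTEGER NOT NULL REFERENCES faculty(faculty_id)
);
-- Junction table: one row per (student, course) pair avoids
-- repeating student or course details in several places.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    course_id  INTEGER NOT NULL REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)
);
CREATE TABLE fundraising (
    campaign_id  INTEGER PRIMARY KEY,
    campaign     TEXT NOT NULL,
    amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0),
    organizer_id INTEGER REFERENCES faculty(faculty_id)
);
"""


def build_db():
    """Create an in-memory database with foreign key checks enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def courses_for_student(conn, student_id):
    """Return (course title, instructor name) pairs for one student."""
    rows = conn.execute(
        """
        SELECT c.title, f.name
        FROM enrollments AS e
        JOIN courses AS c ON c.course_id = e.course_id
        JOIN faculty AS f ON f.faculty_id = c.faculty_id
        WHERE e.student_id = ?
        ORDER BY c.title
        """,
        (student_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_db()
    conn.execute("INSERT INTO faculty VALUES (1, 'Dr. Example')")
    conn.execute("INSERT INTO students VALUES (1, 'Sample Student')")
    conn.execute("INSERT INTO courses VALUES (10, 'Intro Course', 1)")
    conn.execute("INSERT INTO enrollments VALUES (1, 10)")
    print(courses_for_student(conn, 1))
```

With foreign keys enforced, inserting a course that points at a non-existent faculty member is rejected by the database, which is the kind of consistency guarantee scattered paper files could not give us.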
Challenges Faced
We encountered a few significant challenges:
- Data Accuracy: Extracting accurate data using OCR from older, degraded physical files was challenging. Inconsistent formatting in the files required extensive preprocessing.
- Database Relationships: Designing an optimal schema to reflect the relationships between students, faculty, and fundraising efforts without causing redundancy or inefficiencies took careful planning.
- Performance Optimization: As the volume of data grew, we had to ensure that the system was optimized to handle complex queries efficiently. We also had to address latency issues when running AI-powered queries.
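The preprocessing mentioned under Data Accuracy can be illustrated with a small sketch. It shows one plausible cleanup step: collapsing the stray whitespace OCR tends to produce and mapping differently formatted dates onto a single ISO form. The function names `clean_text` and `normalize_date` and the list of accepted date formats are assumptions for illustration, not a record of what our pipeline actually did.

```python
import re
from datetime import datetime

# Assumed set of date layouts found in the scanned files; a real
# pipeline would derive this list from the documents themselves.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d %B %Y", "%B %d, %Y")


def clean_text(raw):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no format matches."""
    text = clean_text(raw)
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known layout
    # Unparseable values are flagged rather than guessed at.
    return None


if __name__ == "__main__":
    for sample in ["12/05/2001", "  3   March 1998 ", "illegible"]:
        print(repr(sample), "->", normalize_date(sample))
```

Returning `None` for values that match no known layout keeps unreadable fields visible for manual review instead of silently storing a wrong date.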
Despite these challenges, the final product was a powerful and efficient system that addressed both the communication issues and the inefficiencies caused by physical files, providing a centralized, AI-driven platform for managing all student, faculty, and fundraising data.