Inspiration
The inspiration for this project came from a common frustration among students: quality study materials are hard to share and hard to find. Students often struggle to find well-organized notes for their courses, and valuable handwritten notes are difficult to digitize and share. We found no centralized platform where students could collaborate on, share, and discover educational content. We wanted to use OCR technology to bridge the gap between physical handwritten notes and digital accessibility. Our goal was a comprehensive note-sharing ecosystem that lets students upload and share notes and makes those notes searchable and discoverable through AI-powered OCR, all behind an intuitive user interface.
What It Does
Gist is a full-stack, feature-complete note-sharing platform designed for students. Its core functionality includes:
- AI-Powered OCR: Users can upload images or PDFs of handwritten notes, and our platform uses Google Gemini AI to convert them into searchable, digital Markdown text, complete with LaTeX support for mathematical equations.
- Centralized Note Hub: It provides a central place for users to upload, store, and organize their study materials for various courses.
- Advanced Search and Discovery: Notes are easily discoverable through a powerful search engine with filters for tags, courses, and popularity.
- Smart Recommendations: A multi-strategy recommendation algorithm suggests relevant notes to users based on their course enrollment, social connections, and tag-based similarities.
- Social Collaboration: The platform includes social features, allowing users to follow others, comment on notes, leave reactions, and bookmark useful materials, creating a collaborative learning community.
- User Dashboards: Each user has a personal dashboard to view their uploaded notes, bookmarks, and engagement statistics.
How We Built It
We built Gist as a full-stack web application with a clear separation between the frontend and backend. The frontend, a single-page application, communicates with the backend via a RESTful API. The backend handles all business logic, including user authentication, database operations, file processing, and the core AI-powered OCR integration.
For the OCR pipeline, a user uploads a PDF or image file from the client-side interface. The backend receives this file, converts PDF pages into images if necessary, and then sends these images to the Google Gemini AI API for text extraction. The returned text, formatted in Markdown, is then saved to our database and associated with the user's note, making it immediately available and searchable on the platform.
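The flow described above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: `run_ocr_pipeline`, `OcrResult`, and the two injected callables are hypothetical names. The callables stand in for the PyMuPDF page rendering and the Gemini API call, so the sketch makes no claims about either library's interface.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins for the real integrations:
PdfToImages = Callable[[bytes], List[bytes]]  # e.g. backed by PyMuPDF
ImageToMarkdown = Callable[[bytes], str]      # e.g. backed by the Gemini API


@dataclass
class OcrResult:
    page_count: int
    markdown: str


def run_ocr_pipeline(
    filename: str,
    data: bytes,
    pdf_to_images: PdfToImages,
    image_to_markdown: ImageToMarkdown,
) -> OcrResult:
    """Turn one uploaded file into a single Markdown document."""
    # PDFs are split into one image per page; images pass through unchanged.
    if filename.lower().endswith(".pdf"):
        pages = pdf_to_images(data)
    else:
        pages = [data]

    # Extract text page by page, then join non-empty pages with blank lines.
    parts = [image_to_markdown(page).strip() for page in pages]
    markdown = "\n\n".join(part for part in parts if part)
    return OcrResult(page_count=len(pages), markdown=markdown)
```

Passing the two integrations in as parameters keeps the orchestration testable without network access or real PDF files.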
Technology Stack
Frontend
- React 18: A modern UI library with hooks for building the user interface.
- React Router v6: Used for client-side routing.
- Axios: An HTTP client for making API calls to the backend.
- Tailwind CSS: A utility-first CSS framework for styling.
- React Markdown: A component for rendering Markdown content with LaTeX support.
- Vite: A fast build tool and development server.
Backend
- Flask: A lightweight Python web framework.
- Flask-RESTX: An extension for building RESTful APIs with automatic Swagger documentation.
- SQLAlchemy: An Object-Relational Mapper (ORM) for database interactions.
- PostgreSQL: A powerful, open-source relational database.
- JWT (JSON Web Tokens): Used for implementing secure user authentication.
AI and File Processing
- Google Gemini AI: The core AI service used for OCR and converting handwritten notes to Markdown.
- PyMuPDF (fitz): A Python library for converting PDF pages into images for processing.
- Pillow (PIL): A library for image processing tasks.
Development Process
Our development was structured in distinct phases over a 24-hour period:
Phase 1: Frontend Foundation (Hours 1-4): We began by setting up the React project using Vite, configuring routing with React Router, and establishing the basic folder structure. We built the UI for authentication (Login and Register pages) and created core reusable components like the navigation bar and note display cards.
Phase 2: Backend Infrastructure (Hours 4-7): Simultaneously, we set up the Flask application and designed the database schema with models for users, notes, courses, and tags. We implemented the JWT-based authentication system and built the foundational CRUD (Create, Read, Update, Delete) API endpoints for notes, complete with Swagger documentation.
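To show what JWT-based authentication involves, here is a minimal standard-library sketch of issuing and verifying an HS256 token. It is an assumption-laden illustration rather than the project's implementation, which would more typically rely on an existing JWT library. The function names, the claim set, and the one-hour lifetime are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    # JWTs use URL-safe base64 with the "=" padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url_encode stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id: int, secret: bytes, ttl_seconds: int = 3600,
                now: Optional[int] = None) -> str:
    """Build header.payload.signature for the given user."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": str(user_id), "iat": issued_at, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url_encode(_sign(signing_input, secret))


def verify_token(token: str, secret: bytes,
                 now: Optional[int] = None) -> Optional[dict]:
    """Return the payload if the signature and expiry check out, else None."""
    current = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = _sign(header_b64 + "." + payload_b64, secret)
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None  # malformed token, bad base64, or bad JSON
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    if not isinstance(payload, dict) or not isinstance(payload.get("exp"), int):
        return None
    if current >= payload["exp"]:
        return None  # expired
    return payload
```

A protected endpoint would call something like `verify_token` on the bearer token from the `Authorization` header and reject the request when it returns `None`.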
Phase 3: OCR Integration (Hours 7-13): This phase focused on the core feature. We initially attempted to use a local ML model (MonkeyOCR) but pivoted to the Google Gemini API due to resource constraints. We built the backend service to handle file uploads, convert PDFs to images, and orchestrate the API calls to Gemini for text extraction. On the frontend, we developed the drag-and-drop file upload interface.
Phase 4: Feature Integration (Hours 13-18): With the core infrastructure in place, we connected the frontend and backend. We built out the user dashboard, the detailed note view page with Markdown rendering, the advanced search page, and user profile pages. On the backend, we implemented the systems for tagging, bookmarking, and tracking user engagement.
Phase 5: Smart Features and Polish (Hours 18-24): In the final phase, we developed the multi-strategy recommendation algorithm on the backend. We polished the frontend by implementing features like infinite scroll, debounced real-time search for better performance, and ensuring the entire application was mobile-responsive. We concluded with end-to-end integration testing to ensure a smooth user experience.
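One way a multi-strategy recommendation score could be combined is sketched below. The weights, the data classes, and the choice of Jaccard similarity for tags are assumptions made for illustration. The writeup names the signals (course enrollment, social connections, tag similarity, popularity) but not how they were weighted.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Note:
    id: int
    author_id: int
    course: str
    tags: Set[str] = field(default_factory=set)
    likes: int = 0


@dataclass
class UserProfile:
    id: int
    courses: Set[str]
    following: Set[int]
    interest_tags: Set[str]


# Hypothetical weights; a real system would tune these.
WEIGHTS = {"course": 3.0, "social": 2.0, "tags": 2.0, "popularity": 1.0}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(user: UserProfile, note: Note, max_likes: int) -> float:
    """Weighted sum of the four signals, each normalized to [0, 1]."""
    course = 1.0 if note.course in user.courses else 0.0
    social = 1.0 if note.author_id in user.following else 0.0
    tags = jaccard(user.interest_tags, note.tags)
    popularity = note.likes / max_likes if max_likes else 0.0
    return (WEIGHTS["course"] * course + WEIGHTS["social"] * social
            + WEIGHTS["tags"] * tags + WEIGHTS["popularity"] * popularity)


def recommend(user: UserProfile, notes: List[Note], limit: int = 10) -> List[Note]:
    """Rank other users' notes by score, breaking ties by note id."""
    candidates = [n for n in notes if n.author_id != user.id]
    max_likes = max((n.likes for n in candidates), default=0)
    ranked = sorted(candidates, key=lambda n: (-score(user, n, max_likes), n.id))
    return ranked[:limit]
```

Normalizing each signal before weighting keeps a single very popular note from drowning out the course and social signals.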
Challenges We Faced
OCR Model Resource Constraints: Our initial plan to use MonkeyOCR, a local model, proved infeasible because it required over 25 GB of RAM. We solved this by pivoting to the cloud-based Google Gemini API, which eliminated local resource issues and provided higher accuracy.
CORS Configuration Issues: We encountered standard cross-origin request errors between our frontend and backend servers. This was resolved by properly configuring the Flask backend to allow requests from our frontend's specific origin.
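To make the fix concrete, the sketch below shows the response headers involved in allowing a single frontend origin, written as a small WSGI middleware using only the standard library. This is not how the project necessarily did it. With Flask the usual route is an extension such as Flask-CORS, and the class name and default header lists here are hypothetical.

```python
from typing import List, Tuple


class SingleOriginCors:
    """WSGI middleware that allows cross-origin requests from one origin."""

    def __init__(self, app, allowed_origin: str,
                 allowed_methods: str = "GET, POST, PUT, DELETE, OPTIONS",
                 allowed_headers: str = "Authorization, Content-Type"):
        self.app = app
        self.allowed_origin = allowed_origin
        self.allowed_methods = allowed_methods
        self.allowed_headers = allowed_headers

    def _cors_headers(self) -> List[Tuple[str, str]]:
        return [
            ("Access-Control-Allow-Origin", self.allowed_origin),
            ("Access-Control-Allow-Methods", self.allowed_methods),
            ("Access-Control-Allow-Headers", self.allowed_headers),
            ("Vary", "Origin"),
        ]

    def __call__(self, environ, start_response):
        # Requests from any other origin pass through without CORS headers,
        # so the browser will block the cross-origin read.
        if environ.get("HTTP_ORIGIN") != self.allowed_origin:
            return self.app(environ, start_response)

        # Answer preflight requests directly instead of routing them.
        if (environ.get("REQUEST_METHOD") == "OPTIONS"
                and "HTTP_ACCESS_CONTROL_REQUEST_METHOD" in environ):
            start_response("204 No Content",
                           self._cors_headers() + [("Content-Length", "0")])
            return [b""]

        # For normal requests, append the CORS headers to the app's response.
        def cors_start_response(status, headers, exc_info=None):
            return start_response(status, list(headers) + self._cors_headers(),
                                  exc_info)

        return self.app(environ, cors_start_response)
```

With Flask, a middleware like this could be attached via `app.wsgi_app = SingleOriginCors(app.wsgi_app, "http://localhost:5173")`, where the origin shown is only an example value for a local Vite dev server.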
Asynchronous Processing Feedback: Large file uploads and OCR processing could take time, making the UI appear frozen. We addressed this by implementing a progress bar for uploads on the frontend and creating loading states to provide clear feedback to the user while the backend processed the files.
Complex Database Relationships: Our application required several many-to-many relationships (e.g., users-to-followers, notes-to-tags, notes-to-bookmarks). We successfully managed this by carefully designing our database schema and using SQLAlchemy's association tables to handle these complex connections efficiently.
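The shape of such association tables is sketched below in plain SQL through Python's built-in `sqlite3` module, rather than through SQLAlchemy, so that the example runs without extra dependencies. The table and column names are hypothetical, and the project's actual models may differ. Each association table holds only a pair of foreign keys, with a composite primary key that prevents duplicate links.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
CREATE TABLE notes (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES users(id)
);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- notes <-> tags: one row per (note, tag) link
CREATE TABLE note_tags (
    note_id INTEGER NOT NULL REFERENCES notes(id) ON DELETE CASCADE,
    tag_id INTEGER NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (note_id, tag_id)
);

-- users <-> users: self-referential "follows" relationship
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    followed_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    PRIMARY KEY (follower_id, followed_id),
    CHECK (follower_id <> followed_id)
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def notes_with_tag(conn: sqlite3.Connection, tag_name: str) -> list:
    """Return titles of notes carrying the given tag, via the association table."""
    rows = conn.execute(
        """
        SELECT n.title
        FROM notes AS n
        JOIN note_tags AS nt ON nt.note_id = n.id
        JOIN tags AS t ON t.id = nt.tag_id
        WHERE t.name = ?
        ORDER BY n.title
        """,
        (tag_name,),
    )
    return [row[0] for row in rows]
```

An ORM layers relationship attributes on top of tables like these, but the underlying join is the same.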
Inconsistent AI Responses: The Markdown formatting returned by the Gemini API was sometimes inconsistent. We built a robust parsing function on our backend with several fallbacks to correctly extract the content regardless of minor formatting variations.
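A parser with fallbacks along these lines might look like the sketch below. The specific variations it handles (a labeled Markdown code fence with surrounding chatter, a bare fence wrapping the whole reply, or unwrapped text) are assumptions about what "inconsistent" meant in practice, and `extract_markdown` is a hypothetical name.

```python
import re
from typing import Optional

_FENCE = "`" * 3  # a Markdown code fence, built this way to keep the source readable

# A fence explicitly labeled as Markdown, possibly with chatter around it.
# The greedy match runs to the last closing fence so inner code blocks survive.
_LABELED = re.compile(
    _FENCE + r"(?:markdown|md)[ \t]*\n(.*)\n" + _FENCE,
    re.DOTALL | re.IGNORECASE,
)
# An unlabeled fence that wraps the entire response.
_BARE = re.compile(_FENCE + r"[ \t]*\n(.*)\n" + _FENCE, re.DOTALL)


def _labeled_fence(text: str) -> Optional[str]:
    match = _LABELED.search(text)
    return match.group(1) if match else None


def _bare_wrapper(text: str) -> Optional[str]:
    match = _BARE.fullmatch(text)
    # Only unwrap when the body has no fences of its own; otherwise the reply
    # is more likely a note that begins and ends with code blocks.
    if match and _FENCE not in match.group(1):
        return match.group(1)
    return None


def _plain(text: str) -> Optional[str]:
    return text


# Tried in order, from most specific to least specific.
_STRATEGIES = (_labeled_fence, _bare_wrapper, _plain)


def extract_markdown(raw: str) -> str:
    """Return the Markdown body of a model response, or "" if it is empty."""
    text = raw.replace("\r\n", "\n").strip()
    for strategy in _STRATEGIES:
        result = strategy(text)
        if result is not None and result.strip():
            return result.strip()
    return ""
```

Ordering the strategies from strictest to loosest means a well-formed reply is never mangled by a more permissive rule.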
Accomplishments We're Proud Of
We successfully built a feature-complete, full-stack note-sharing platform from scratch. We are particularly proud of:
- A Modern and Responsive UI: We created a clean, intuitive, and fully responsive user interface, built from over 15 reusable components, that works seamlessly across desktops, tablets, and mobile devices.
- AI-Powered OCR Pipeline: We designed and implemented a robust end-to-end pipeline that takes user-uploaded handwritten notes and transforms them into accessible, searchable Markdown text using Google Gemini AI.
- Smart Recommendation Engine: We developed a multi-strategy recommendation algorithm that provides personalized and relevant content to users, enhancing note discovery.
- Comprehensive Backend API: We built a secure and scalable backend with over 15 RESTful API endpoints, complete with full Swagger documentation, JWT authentication, and advanced search capabilities.
- Full-Stack Integration: We achieved seamless communication between the React frontend and Flask backend, resulting in a polished application with real-time feedback, optimistic UI updates, and consistent error handling across the entire stack.
What We Learned
This project was an immense learning experience. We gained practical skills across the entire development stack:
- Frontend Development: We deepened our expertise in modern React development, including state management with Hooks and Context, building a single-page application with client-side routing, creating responsive layouts with Tailwind CSS, and integrating with a RESTful API using Axios with interceptors for authentication.
- Backend Architecture: We learned how to build a robust RESTful API with Flask, design a complex relational database schema, and implement secure, token-based authentication. A key takeaway was integrating a third-party AI service (Google Gemini) into our backend logic.
- Full-Stack Integration: We learned how to design a consistent API that serves a client-side application, manage an authentication flow from login to token-based requests, and implement a complex feature like a file upload pipeline from the browser all the way to backend processing.
- Algorithm Design: We designed and implemented a practical recommendation algorithm, learning how to weigh different factors like social connections, content similarity, and popularity to provide relevant suggestions. We also implemented performance optimizations like debouncing for our real-time search feature.
- Problem-Solving and Adaptability: One of the most critical lessons was how to adapt under pressure. When our initial local OCR model proved too resource-intensive, we quickly researched alternatives and pivoted to a cloud-based API. This taught us the importance of being flexible and making pragmatic technical decisions to overcome roadblocks.
What's Next for Gist
We believe Gist has the potential to grow into a much larger platform. Future enhancements we would love to add include:
- Real-time Collaboration: Implement collaborative editing on notes using WebSockets, allowing multiple users to work on the same study guide simultaneously.
- AI-Powered Study Tools: Extend the use of AI to automatically generate flashcards, summaries, or practice quizzes from uploaded note content.
- Mobile Applications: Develop native mobile apps for iOS and Android using React Native, allowing students to capture and upload notes directly from their phone cameras.
- LMS Integration: Integrate with popular Learning Management Systems like Canvas or Blackboard to automatically pull course information and sync notes.
- Advanced Analytics: Create a more advanced analytics dashboard for users to track their study habits and identify which notes are most effective for them and others.
- Note Versioning: Add a version history for notes, allowing users to track changes and revert to previous versions if needed.
Built With
- flask
- javascript
- python
- react
- tailwind

