Inspiration

The inspiration for MDScholar came from the growing need to democratize academic research and make scholarly papers more accessible to researchers worldwide. With millions of academic papers published annually in PDF format, extracting and structuring their knowledge remains a significant challenge. We were inspired by the potential of AI-powered document processing to transform how researchers interact with academic literature: moving from manual reading to intelligent, structured data extraction that can power knowledge discovery, cross-referencing, and automated research synthesis.

What it does

MDScholar Backend is a GPU-accelerated document processing microservice that converts academic papers and complex documents into structured, searchable markdown format. The system:

  • Accepts document uploads via a REST API with an email notification system
  • Processes PDFs with the AI-powered Docling toolkit, using CUDA acceleration for faster conversion
  • Extracts structured content including text, tables, images, and document metadata
  • Converts to clean markdown with preserved formatting and reading order
  • Generates structured topics using OpenAI integration for intelligent content categorization
  • Persists processed data in a Supabase database for later retrieval
  • Provides asynchronous processing with Redis-based task queuing for scalability
  • Delivers results via email notifications and API endpoints

The system specializes in academic paper processing, extracting key sections like abstracts, methodologies, results, and conclusions while maintaining document structure and relationships.
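To make the request lifecycle above concrete, here is a minimal sketch of how a processing job might be represented as it moves from upload to delivery. The `ProcessingJob` class and field names are illustrative, not the service's actual schema:

```python
import json
import uuid
from dataclasses import dataclass, field
from enum import Enum


class JobStatus(Enum):
    """States a document passes through, from upload to notification."""
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class ProcessingJob:
    """One document-processing request (hypothetical shape for illustration)."""
    email: str        # address notified when processing finishes
    source_url: str   # location of the uploaded PDF
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: JobStatus = JobStatus.QUEUED

    def to_response(self) -> str:
        """JSON payload a status endpoint could return to the client."""
        return json.dumps({"job_id": self.job_id, "status": self.status.value})


job = ProcessingJob(email="researcher@example.org",
                    source_url="https://example.org/paper.pdf")
print(job.to_response())
```

A client would poll this status (or simply wait for the email notification) while the Celery worker advances the job through the states.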

How we built it

We built MDScholar using a modern, cloud-native architecture with the following technology stack:

Backend Framework:

  • FastAPI for high-performance REST API development
  • Python 3.12 with uv package manager for ultra-fast dependency management
  • Pydantic for robust data validation and serialization

AI/ML Processing:

  • Docling as the core document processing engine with GPU acceleration
  • CUDA and Flash Attention 2 for optimized AI model performance
  • OpenAI API integration for intelligent content structuring and topic generation
  • A layout-analysis model trained on the DocLayNet dataset, plus TableFormer for table structure recognition

Infrastructure & Scalability:

  • Celery distributed task queue for background processing
  • Redis as message broker and result backend
  • Docker containerization with GPU runtime support
  • Docker Compose for multi-service orchestration

Data & Notifications:

  • Supabase for persistent data storage
  • Mailgun API for reliable email notifications
  • Structured JSON/Markdown output formats

Development & Deployment:

  • Hot-reload development environment with file watching
  • NVIDIA Container Toolkit (formerly NVIDIA Docker) for GPU access inside containers
  • Health checks and monitoring for service reliability
  • Cloudflare tunneling for secure external access

The architecture follows microservices principles with clear separation between API handling, background processing, and data storage layers.
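The separation between the API layer, background processing, and storage can be sketched in a few lines. Here an in-process queue stands in for the Redis broker and a stub replaces the Docling conversion step, so this is a toy model of the architecture rather than the real Celery wiring:

```python
import queue
import threading

# In-process stand-in for the Redis broker; in the real system,
# Celery workers consume tasks from Redis instead.
task_queue: queue.Queue = queue.Queue()
results: dict = {}


def enqueue_document(doc_id: str, pdf_bytes: bytes) -> None:
    """API layer: accept the upload and return immediately (non-blocking)."""
    task_queue.put({"doc_id": doc_id, "payload": pdf_bytes})


def worker() -> None:
    """Processing layer: convert each queued document (Docling step stubbed)."""
    while True:
        task = task_queue.get()
        results[task["doc_id"]] = f"# Document {task['doc_id']}\n\n(converted markdown)"
        task_queue.task_done()


threading.Thread(target=worker, daemon=True).start()
enqueue_document("paper-001", b"%PDF-1.7 ...")
task_queue.join()  # the storage layer would persist results to Supabase here
print(results["paper-001"].splitlines()[0])
```

Because the API layer only enqueues work, it can scale independently of the GPU-bound workers, which is the main payoff of this separation.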

Challenges we ran into

Technical Integration Challenges:

  • GPU Memory Management: Optimizing CUDA acceleration for document processing while preventing memory leaks in containerized environments
  • Async Processing Complexity: Implementing reliable task queuing with proper error handling and recovery mechanisms
  • Docling Configuration: Fine-tuning AI model pipeline settings for optimal performance across diverse document formats
  • Docker GPU Access: Configuring the NVIDIA container runtime and ensuring GPU resources are properly allocated to Celery workers
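The recovery mechanisms mentioned above follow a familiar retry-with-backoff pattern. Celery provides this natively through task retry options; the generic sketch below shows the same idea as a standalone decorator (the `flaky_conversion` function simulates a transient failure and is purely illustrative):

```python
import time
from functools import wraps


def with_retries(max_attempts: int = 3, base_delay: float = 0.01):
    """Retry a flaky pipeline step with exponential backoff between attempts."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error to the caller
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = {"n": 0}


@with_retries(max_attempts=3)
def flaky_conversion():
    """Simulated conversion step that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


print(flaky_conversion())  # succeeds on the third attempt
```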

Architecture & Scalability:

  • Resource Optimization: Balancing processing speed with memory usage for large document batches
  • Error Handling: Implementing comprehensive error handling across the entire pipeline from upload to notification
  • Task State Management: Providing real-time updates on processing status while maintaining system reliability
  • Security Considerations: Addressing potential SSRF vulnerabilities in URL-based document processing
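The SSRF concern in the last bullet arises because the service fetches documents from user-supplied URLs. A simplified guard, sketched below, allows only http(s) URLs and refuses hosts that resolve to private, loopback, link-local, or reserved addresses; a production check would also pin the resolved IP for the actual fetch to defend against DNS rebinding:

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_document_url(url: str) -> bool:
    """Reject URLs that could be abused for SSRF against internal services."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable host: refuse rather than guess
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True


print(is_safe_document_url("http://127.0.0.1/admin"))   # loopback: rejected
print(is_safe_document_url("ftp://example.org/x.pdf"))  # wrong scheme: rejected
```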

AI/ML Model Integration:

  • Model Performance: Optimizing AI model inference speed while maintaining accuracy
  • Content Structuring: Developing effective prompts and parsing logic for intelligent topic extraction
  • Format Diversity: Handling diverse document formats and ensuring consistent markdown output quality
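The parsing logic for topic extraction has to tolerate models that drift from the requested format. A defensive sketch, assuming a prompt that asks for a JSON array of topic strings (the `parse_topics` helper is hypothetical, not the service's actual parser):

```python
import json


def parse_topics(model_output: str) -> list:
    """Extract a topic list from a language-model response.

    Assumes the model was prompted to return a JSON array of strings;
    falls back to line splitting when the output is not valid JSON.
    """
    try:
        data = json.loads(model_output)
        if isinstance(data, list):
            return [str(t).strip() for t in data if str(t).strip()]
    except json.JSONDecodeError:
        pass
    # Fallback: one topic per line, tolerating bullet prefixes.
    return [line.lstrip("-* ").strip()
            for line in model_output.splitlines() if line.strip()]


print(parse_topics('["Transformers", "Table Extraction"]'))
# ['Transformers', 'Table Extraction']
print(parse_topics("- Transformers\n- Table Extraction"))
# ['Transformers', 'Table Extraction']
```

Both the well-formed and the drifted response yield the same structured result, which is what makes the downstream categorization reliable.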

Accomplishments that we're proud of

Technical Excellence:

  • Successfully implemented GPU-accelerated document processing with sub-second page processing speeds
  • Achieved seamless AI model integration with Docling, TableFormer, and OpenAI APIs
  • Built a production-ready microservices architecture with proper containerization and orchestration
  • Implemented robust async processing with Redis queuing and Celery workers

Innovation & Impact:

  • Created an intelligent content structuring system that transforms raw PDFs into organized, searchable markdown
  • Developed end-to-end automation from document upload to processed results delivery
  • Integrated multiple AI technologies (document AI, layout analysis, language models) into a cohesive system
  • Built scalable infrastructure capable of handling multiple concurrent document processing tasks

Code Quality & Best Practices:

  • Maintained clean, well-documented codebase with comprehensive type hints and error handling
  • Implemented proper dependency management using modern Python tooling (uv, pyproject.toml)
  • Created extensive testing notebooks for validating different system components
  • Followed security best practices with environment variable management and input validation

Real-World Application:

  • Delivered a complete working system that processes real academic papers with high accuracy
  • Achieved reliable email notifications and status tracking for user experience
  • Created persistent data storage with structured database schema for future applications

What we learned

AI/ML Integration:

  • GPU acceleration significantly improves processing speed: we achieved 2-6 second table processing times vs. much slower CPU-only processing
  • Model pipeline optimization is crucial: proper configuration of Flash Attention 2 and CUDA settings dramatically impacts performance
  • Document AI requires careful tuning: different document types need specific processing parameters for optimal results
  • Structured prompting enhances AI output quality: well-designed prompts for content extraction produce more consistent results

System Architecture:

  • Microservices architecture provides excellent scalability: separate services for API, processing, and storage allow independent scaling
  • Async processing is essential for user experience: background tasks prevent API blocking and enable better resource utilization
  • Container orchestration simplifies deployment: Docker Compose makes complex multi-service applications manageable
  • Proper error handling is critical: comprehensive error handling across the entire pipeline prevents system failures

Development Best Practices:

  • Modern Python tooling improves development velocity: the uv package manager and FastAPI enable rapid development
  • Type hints and validation prevent runtime errors: Pydantic models catch issues early in development
  • Comprehensive logging aids debugging: proper logging throughout the system simplifies troubleshooting
  • Testing with real data reveals edge cases: processing actual academic papers uncovered handling requirements not apparent with synthetic data
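The "catch issues early" point deserves a concrete illustration. In the real service the validation lives in Pydantic models; the same fail-fast idea is sketched here with a stdlib dataclass, using invented field names:

```python
from dataclasses import dataclass


@dataclass
class UploadRequest:
    """Validated upload payload (Pydantic in the real service; a stdlib
    dataclass here so the sketch runs without dependencies)."""
    email: str
    filename: str

    def __post_init__(self):
        # Fail at construction time, not deep inside the pipeline.
        if "@" not in self.email:
            raise ValueError(f"invalid email: {self.email!r}")
        if not self.filename.lower().endswith(".pdf"):
            raise ValueError("only PDF uploads are supported")


try:
    UploadRequest(email="not-an-email", filename="paper.pdf")
except ValueError as exc:
    print(exc)  # invalid email: 'not-an-email'
```

Rejecting malformed requests at the boundary means the GPU workers only ever see data that has already passed validation.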

Technical Insights:

  • GPU memory management requires careful attention: preventing memory leaks in long-running processes is crucial
  • Database design impacts query performance: proper schema design for JSON storage in Supabase affects retrieval speed
  • API design affects user adoption: clear response formats and status tracking improve developer experience

What's next for MDScholar

Enhanced AI Capabilities:

  • Multi-language support for processing international academic papers
  • Advanced semantic analysis for automatic literature review generation
  • Citation network analysis to map relationships between papers
  • Research trend identification using processed paper collections
  • Custom model fine-tuning for domain-specific document types

Platform Expansion:

  • Web interface development for direct user interaction beyond API
  • Batch processing capabilities for handling large document collections
  • Integration with academic databases (arXiv, PubMed, IEEE Xplore) for automatic paper ingestion
  • Collaboration features for research teams to share and annotate processed documents
  • Mobile app development for on-the-go academic research

Advanced Features:

  • Real-time collaborative editing of processed markdown content
  • Automated summarization and abstract generation
  • Cross-reference resolution and bibliography management
  • Figure and equation extraction with LaTeX conversion
  • Research methodology extraction for systematic reviews

Infrastructure & Scalability:

  • Kubernetes deployment for production-scale orchestration
  • Advanced monitoring and analytics for performance optimization
  • Multi-region deployment for global accessibility
  • CDN integration for faster document delivery
  • Auto-scaling capabilities based on processing demand

Research & Academic Integration:

  • University partnerships for large-scale deployment
  • API marketplace for third-party integrations
  • Open dataset creation from processed academic papers
  • Research collaboration platform connecting researchers worldwide
  • Academic workflow integration with tools like Zotero, Mendeley, and Overleaf
