Inspiration

The inspiration for MDScholar came from the growing need to democratize academic research and make scholarly papers more accessible to researchers worldwide. With millions of academic papers published annually in PDF format, extracting and structuring their knowledge remains a significant challenge. We were inspired by the potential of AI-powered document processing to transform how researchers interact with academic literature: moving from manual reading to intelligent, structured data extraction that can power knowledge discovery, cross-referencing, and automated research synthesis.

What it does

MDScholar Backend is a GPU-accelerated document processing microservice that converts academic papers and complex documents into structured, searchable markdown format. The system:

  • Accepts document uploads via a REST API with an email notification system
  • Processes PDFs with the AI-powered Docling toolkit, using CUDA acceleration for faster conversion
  • Extracts structured content including text, tables, images, and document metadata
  • Converts to clean markdown with preserved formatting and reading order
  • Generates structured topics using OpenAI integration for intelligent content categorization
  • Persists processed data in a Supabase database for later retrieval
  • Provides asynchronous processing with Redis-based task queuing for scalability
  • Delivers results via email notifications and API endpoints

The system specializes in academic paper processing, extracting key sections like abstracts, methodologies, results, and conclusions while maintaining document structure and relationships.
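To make the request lifecycle above concrete, here is a minimal sketch of how a processing job might be represented as it moves from upload to delivery. The `ProcessingJob` class and field names are illustrative, not the service's actual schema:

```python
import json
import uuid
from dataclasses import dataclass, field
from enum import Enum


class JobStatus(Enum):
    """States a document passes through, from upload to notification."""
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class ProcessingJob:
    """One document-processing request (hypothetical shape for illustration)."""
    email: str        # address notified when processing finishes
    source_url: str   # location of the uploaded PDF
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: JobStatus = JobStatus.QUEUED

    def to_response(self) -> str:
        """JSON payload a status endpoint could return to the client."""
        return json.dumps({"job_id": self.job_id, "status": self.status.value})


job = ProcessingJob(email="researcher@example.org",
                    source_url="https://example.org/paper.pdf")
print(job.to_response())
```

A client would poll this status (or simply wait for the email notification) while the Celery worker advances the job through the states.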

How we built it

We built MDScholar using a modern, cloud-native architecture with the following technology stack:

Backend Framework:

  • FastAPI for high-performance REST API development
  • Python 3.12 with uv package manager for ultra-fast dependency management
  • Pydantic for robust data validation and serialization

AI/ML Processing:

  • Docling as the core document processing engine with GPU acceleration
  • CUDA and Flash Attention 2 for optimized AI model performance
  • OpenAI API integration for intelligent content structuring and topic generation
  • A layout-analysis model trained on the DocLayNet dataset, plus TableFormer for table structure recognition

Infrastructure & Scalability:

  • Celery distributed task queue for background processing
  • Redis as message broker and result backend
  • Docker containerization with GPU runtime support
  • Docker Compose for multi-service orchestration

Data & Notifications:

  • Supabase for persistent data storage
  • Mailgun API for reliable email notifications
  • Structured JSON/Markdown output formats

Development & Deployment:

  • Hot-reload development environment with file watching
  • NVIDIA Container Toolkit (formerly NVIDIA Docker) for GPU access inside containers
  • Health checks and monitoring for service reliability
  • Cloudflare tunneling for secure external access

The architecture follows microservices principles with clear separation between API handling, background processing, and data storage layers.
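The separation between the API layer, background processing, and storage can be sketched in a few lines. Here an in-process queue stands in for the Redis broker and a stub replaces the Docling conversion step, so this is a toy model of the architecture rather than the real Celery wiring:

```python
import queue
import threading

# In-process stand-in for the Redis broker; in the real system,
# Celery workers consume tasks from Redis instead.
task_queue: queue.Queue = queue.Queue()
results: dict = {}


def enqueue_document(doc_id: str, pdf_bytes: bytes) -> None:
    """API layer: accept the upload and return immediately (non-blocking)."""
    task_queue.put({"doc_id": doc_id, "payload": pdf_bytes})


def worker() -> None:
    """Processing layer: convert each queued document (Docling step stubbed)."""
    while True:
        task = task_queue.get()
        results[task["doc_id"]] = f"# Document {task['doc_id']}\n\n(converted markdown)"
        task_queue.task_done()


threading.Thread(target=worker, daemon=True).start()
enqueue_document("paper-001", b"%PDF-1.7 ...")
task_queue.join()  # the storage layer would persist results to Supabase here
print(results["paper-001"].splitlines()[0])
```

Because the API layer only enqueues work, it can scale independently of the GPU-bound workers, which is the main payoff of this separation.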

Challenges we ran into

Technical Integration Challenges:

  • GPU Memory Management: Optimizing CUDA acceleration for document processing while preventing memory leaks in containerized environments
  • Async Processing Complexity: Implementing reliable task queuing with proper error handling and recovery mechanisms
  • Docling Configuration: Fine-tuning AI model pipeline settings for optimal performance across diverse document formats
  • Docker GPU Access: Configuring the NVIDIA container runtime and ensuring GPU resources are properly allocated to Celery workers
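The recovery mechanisms mentioned above follow a familiar retry-with-backoff pattern. Celery provides this natively through task retry options; the generic sketch below shows the same idea as a standalone decorator (the `flaky_conversion` function simulates a transient failure and is purely illustrative):

```python
import time
from functools import wraps


def with_retries(max_attempts: int = 3, base_delay: float = 0.01):
    """Retry a flaky pipeline step with exponential backoff between attempts."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error to the caller
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = {"n": 0}


@with_retries(max_attempts=3)
def flaky_conversion():
    """Simulated conversion step that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


print(flaky_conversion())  # succeeds on the third attempt
```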

Architecture & Scalability:

  • Resource Optimization: Balancing processing speed with memory usage for large document batches
  • Error Handling: Implementing comprehensive error handling across the entire pipeline from upload to notification
  • Task State Management: Providing real-time updates on processing status while maintaining system reliability
  • Security Considerations: Addressing potential SSRF vulnerabilities in URL-based document processing
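The SSRF concern in the last bullet arises because the service fetches documents from user-supplied URLs. A simplified guard, sketched below, allows only http(s) URLs and refuses hosts that resolve to private, loopback, link-local, or reserved addresses; a production check would also pin the resolved IP for the actual fetch to defend against DNS rebinding:

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_document_url(url: str) -> bool:
    """Reject URLs that could be abused for SSRF against internal services."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable host: refuse rather than guess
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True


print(is_safe_document_url("http://127.0.0.1/admin"))   # loopback: rejected
print(is_safe_document_url("ftp://example.org/x.pdf"))  # wrong scheme: rejected
```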

AI/ML Model Integration:

  • Model Performance: Optimizing AI model inference speed while maintaining accuracy
  • Content Structuring: Developing effective prompts and parsing logic for intelligent topic extraction
  • Format Diversity: Handling diverse document formats and ensuring consistent markdown output quality
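The parsing logic for topic extraction has to tolerate models that drift from the requested format. A defensive sketch, assuming a prompt that asks for a JSON array of topic strings (the `parse_topics` helper is hypothetical, not the service's actual parser):

```python
import json


def parse_topics(model_output: str) -> list:
    """Extract a topic list from a language-model response.

    Assumes the model was prompted to return a JSON array of strings;
    falls back to line splitting when the output is not valid JSON.
    """
    try:
        data = json.loads(model_output)
        if isinstance(data, list):
            return [str(t).strip() for t in data if str(t).strip()]
    except json.JSONDecodeError:
        pass
    # Fallback: one topic per line, tolerating bullet prefixes.
    return [line.lstrip("-* ").strip()
            for line in model_output.splitlines() if line.strip()]


print(parse_topics('["Transformers", "Table Extraction"]'))
# ['Transformers', 'Table Extraction']
print(parse_topics("- Transformers\n- Table Extraction"))
# ['Transformers', 'Table Extraction']
```

Both the well-formed and the drifted response yield the same structured result, which is what makes the downstream categorization reliable.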

Accomplishments that we're proud of

Technical Excellence:

  • Successfully implemented GPU-accelerated document processing with sub-second page processing speeds
  • Achieved seamless AI model integration with Docling, TableFormer, and OpenAI APIs
  • Built a production-ready microservices architecture with proper containerization and orchestration
  • Implemented robust async processing with Redis queuing and Celery workers

Innovation & Impact:

  • Created an intelligent content structuring system that transforms raw PDFs into organized, searchable markdown
  • Developed end-to-end automation from document upload to processed results delivery
  • Integrated multiple AI technologies (document AI, layout analysis, language models) into a cohesive system
  • Built scalable infrastructure capable of handling multiple concurrent document processing tasks

Code Quality & Best Practices:

  • Maintained clean, well-documented codebase with comprehensive type hints and error handling
  • Implemented proper dependency management using modern Python tooling (uv, pyproject.toml)
  • Created extensive testing notebooks for validating different system components
  • Followed security best practices with environment variable management and input validation

Real-World Application:

  • Delivered a complete working system that processes real academic papers with high accuracy
  • Achieved reliable email notifications and status tracking for user experience
  • Created persistent data storage with structured database schema for future applications

What we learned

AI/ML Integration:

  • GPU acceleration significantly improves processing speed: we achieved 2-6 second table processing times vs. much slower CPU-only processing
  • Model pipeline optimization is crucial: proper configuration of Flash Attention 2 and CUDA settings dramatically impacts performance
  • Document AI requires careful tuning: different document types need specific processing parameters for optimal results
  • Structured prompting enhances AI output quality: well-designed prompts for content extraction produce more consistent results

System Architecture:

  • Microservices architecture provides excellent scalability: separate services for API, processing, and storage allow independent scaling
  • Async processing is essential for user experience: background tasks prevent API blocking and enable better resource utilization
  • Container orchestration simplifies deployment: Docker Compose makes complex multi-service applications manageable
  • Proper error handling is critical: comprehensive error handling across the entire pipeline prevents system failures

Development Best Practices:

  • Modern Python tooling improves development velocity: the uv package manager and FastAPI enable rapid development
  • Type hints and validation prevent runtime errors: Pydantic models catch issues early in development
  • Comprehensive logging aids debugging: proper logging throughout the system simplifies troubleshooting
  • Testing with real data reveals edge cases: processing actual academic papers uncovered handling requirements not apparent with synthetic data
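The "catch issues early" point deserves a concrete illustration. In the real service the validation lives in Pydantic models; the same fail-fast idea is sketched here with a stdlib dataclass, using invented field names:

```python
from dataclasses import dataclass


@dataclass
class UploadRequest:
    """Validated upload payload (Pydantic in the real service; a stdlib
    dataclass here so the sketch runs without dependencies)."""
    email: str
    filename: str

    def __post_init__(self):
        # Fail at construction time, not deep inside the pipeline.
        if "@" not in self.email:
            raise ValueError(f"invalid email: {self.email!r}")
        if not self.filename.lower().endswith(".pdf"):
            raise ValueError("only PDF uploads are supported")


try:
    UploadRequest(email="not-an-email", filename="paper.pdf")
except ValueError as exc:
    print(exc)  # invalid email: 'not-an-email'
```

Rejecting malformed requests at the boundary means the GPU workers only ever see data that has already passed validation.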

Technical Insights:

  • GPU memory management requires careful attention: preventing memory leaks in long-running processes is crucial
  • Database design impacts query performance: proper schema design for JSON storage in Supabase affects retrieval speed
  • API design affects user adoption: clear response formats and status tracking improve developer experience

What's next for MDScholar

Enhanced AI Capabilities:

  • Multi-language support for processing international academic papers
  • Advanced semantic analysis for automatic literature review generation
  • Citation network analysis to map relationships between papers
  • Research trend identification using processed paper collections
  • Custom model fine-tuning for domain-specific document types

Platform Expansion:

  • Web interface development for direct user interaction beyond API
  • Batch processing capabilities for handling large document collections
  • Integration with academic databases (arXiv, PubMed, IEEE Xplore) for automatic paper ingestion
  • Collaboration features for research teams to share and annotate processed documents
  • Mobile app development for on-the-go academic research

Advanced Features:

  • Real-time collaborative editing of processed markdown content
  • Automated summarization and abstract generation
  • Cross-reference resolution and bibliography management
  • Figure and equation extraction with LaTeX conversion
  • Research methodology extraction for systematic reviews

Infrastructure & Scalability:

  • Kubernetes deployment for production-scale orchestration
  • Advanced monitoring and analytics for performance optimization
  • Multi-region deployment for global accessibility
  • CDN integration for faster document delivery
  • Auto-scaling capabilities based on processing demand

Research & Academic Integration:

  • University partnerships for large-scale deployment
  • API marketplace for third-party integrations
  • Open dataset creation from processed academic papers
  • Research collaboration platform connecting researchers worldwide
  • Academic workflow integration with tools like Zotero, Mendeley, and Overleaf
