Inspiration

My motivation for Cognito stemmed directly from the significant accessibility gaps and engagement challenges inherent in traditional research documents, particularly PDFs. The research paper "A framework for improving the accessibility of research papers on arXiv.org" notes that "The research content hosted by arXiv is not fully accessible to everyone due to disabilities and other barriers." It further states that a "significant proportion of people have reading and visual disabilities," and survey respondents identified PDF formatting as the "biggest barrier" to accessing papers. Even with efforts to improve its accessibility, the PDF format has "serious limitations" and "low native accessibility," which makes it difficult for people with blindness, low vision, dyslexia, and other disabilities. PDFs also offer "poor mobile performance": 65% of Americans find consuming content on mobile frustrating, and 45% stop or don't even try to read documents on mobile.

Cognito addresses this critical need by transforming static, inaccessible PDFs into dynamic, engaging, and accessible HTML content. By leveraging AI, I aim to overcome these barriers so that scientific knowledge and complex information are universally available and easy to digest, helping fulfill mandates for equivalent access to federally funded research. The research paper confirms that "well formatted HTML will substantially increase accessibility to research" and that "screen readers are much more efficient when working with HTML."

What it does

Cognito is an application designed to revolutionize how users interact with research papers and dense documents. It provides:

  • PDF to HTML Conversion: Users can upload any PDF document, which Cognito automatically converts into a semantically structured HTML format. This is crucial as HTML "mitigates a wide range of access issues" and is preferred by assistive technology users.
  • AI-Powered Sanitization & Structuring: Leveraging the Gemini Pro and Gemini Flash models, Cognito sanitizes the raw HTML output and restructures it for readability and accessibility. This produces proper headings, lists, and other semantic elements, directly addressing the lack of native semantic structure in PDFs that hinders screen readers.
  • Engaging Content Transformation: Beyond simple conversion, Cognito utilizes AI to transform "boring PDFs" into engaging and digestible content formats, such as structured blog posts or summaries. This aims to counter the frustration users experience with dense academic text, making complex information accessible to a wider audience.
  • Vector Search: Cognito stores vector embeddings of documents in MongoDB and uses vector search to let users quickly find related papers and surface relevant information within and across documents.
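To make the vector search step concrete, here is a minimal TypeScript sketch of the ranking it performs, assuming each embedding is a plain array of numbers. It scores every stored chunk by cosine similarity against the query embedding and keeps the best k. This brute-force, in-memory version is a stand-in for MongoDB, not Cognito's code; the `StoredChunk` shape, the `topK` function, and the toy three-dimensional vectors are illustrative assumptions.

```typescript
// Hypothetical record shape: which paper a chunk came from, its text, and its embedding.
interface StoredChunk {
  paperId: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero for empty vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest-neighbour search: score every chunk, sort descending, keep the top k.
function topK(query: number[], chunks: StoredChunk[], k: number): { chunk: StoredChunk; score: number }[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-dimensional embeddings; real embedding models produce hundreds of dimensions.
const corpus: StoredChunk[] = [
  { paperId: "paper-a", text: "Screen readers and semantic HTML", embedding: [0.9, 0.1, 0.0] },
  { paperId: "paper-b", text: "Transformer training at scale", embedding: [0.0, 0.2, 0.9] },
  { paperId: "paper-c", text: "Accessible PDF remediation", embedding: [0.8, 0.3, 0.1] },
];

const results = topK([1, 0, 0], corpus, 2);
console.log(results.map((r) => r.chunk.paperId).join(", ")); // paper-a, paper-c
```

Brute force is fine for a handful of papers. At library scale, a database vector index typically uses approximate nearest-neighbour search so that it does not have to score every stored vector.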

How I built it

Technology Stack:

  • Core Conversion: I use pdf2html to perform the initial conversion of PDF documents into HTML.
  • AI Backend: The heart of Cognito's intelligence is a streaming backend powered by the Gemini Pro model. This model is responsible for:
    1. Sanitizing and structuring the raw HTML output for accessibility and readability.
    2. Generating engaging summaries and creative content (e.g., blog posts) from the original text.
    3. Creating vector embeddings for efficient and accurate semantic search.
  • Vector Database: MongoDB is used for efficient storage and retrieval of vector embeddings, enabling powerful semantic search capabilities.
  • Backend Framework: The backend is built on Node.js, whose event-driven I/O suits real-time processing of document transformations and AI interactions. The streaming architecture delivers output as it is generated, keeping the user experience responsive.
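Before embeddings can be created and stored, a paper usually has to be split into smaller passages. Embedding models accept limited input, and shorter passages make search hits more precise. The write-up does not say how Cognito chunks papers, so this TypeScript sketch assumes a simple scheme based on overlapping windows of words. The `chunkWords` name and the window and overlap values are illustrative assumptions.

```typescript
// Split text into overlapping word windows so each chunk fits an embedding model's input limit.
// Assumption: whitespace-separated words are a good-enough unit for this sketch.
function chunkWords(text: string, windowSize: number, overlap: number): string[] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new Error("Require windowSize > 0 and 0 <= overlap < windowSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = windowSize - overlap; // each new window starts `step` words after the previous one
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + windowSize).join(" "));
    if (start + windowSize >= words.length) break; // this window already reached the end
  }
  return chunks;
}

const sample = "one two three four five six seven eight nine ten";
console.log(chunkWords(sample, 4, 1).join(" | "));
// one two three four | four five six seven | seven eight nine ten
```

Each chunk would then be embedded and stored in MongoDB alongside its paper's identifier. The overlap reduces the chance that a passage split across a chunk boundary becomes unreachable by search.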

Key Implementation Details:

  • Modular Pipeline: The architecture is designed as a modular pipeline, allowing for seamless integration of PDF conversion, AI processing steps, and output formatting.
  • Semantic Enhancement: I focused heavily on making the AI-processed HTML rich in semantic markup, directly addressing the research paper's point that "screen readers rely on semantic markup... to correctly interpret content."
  • Efficient AI Integration: The Gemini Pro model handles the complex natural language processing tasks, from content summarization to restructuring text for clarity.
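The modular pipeline can be pictured as a list of stages, each taking a document and returning a transformed one, so stages can be added, removed, or reordered independently. This TypeScript sketch assumes that shape. `PipelineDoc`, `runPipeline`, and both stages are hypothetical stand-ins, including the `title` class they match. They are not Cognito's actual code or pdf2html's real output.

```typescript
// Hypothetical document passed between stages: current HTML plus a record of stages applied.
interface PipelineDoc {
  html: string;
  log: string[];
}

// A stage is a pure function from one document to the next.
type Stage = (doc: PipelineDoc) => PipelineDoc;

// Apply the stages in order, feeding each one the previous stage's output.
function runPipeline(doc: PipelineDoc, stages: Stage[]): PipelineDoc {
  return stages.reduce((current, stage) => stage(current), doc);
}

// Hypothetical stage: remove inline style attributes left over from layout-faithful conversion.
const stripStyles: Stage = (doc) => ({
  html: doc.html.replace(/\sstyle="[^"]*"/g, ""),
  log: [...doc.log, "stripStyles"],
});

// Hypothetical stage: turn a visually styled title div into a semantic <h1>.
const promoteTitle: Stage = (doc) => ({
  html: doc.html.replace(/<div class="title">(.*?)<\/div>/, "<h1>$1</h1>"),
  log: [...doc.log, "promoteTitle"],
});

const result = runPipeline(
  { html: '<div class="title" style="font-size:20pt">Cognito</div>', log: [] },
  [stripStyles, promoteTitle],
);
console.log(result.html); // <h1>Cognito</h1>
```

Real AI stages would be asynchronous because they call a model, but the same idea carries over by awaiting each stage in turn. The example also shows why stage order matters: `promoteTitle` only matches after `stripStyles` has removed the inline style.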

Challenges I ran into

  • PDF Parsing Nuances: Converting various PDF layouts (especially multi-column or heavily formatted academic papers) into clean, semantically correct HTML proved challenging. Ensuring accurate preservation of tables, figures, and equations required iterative refinement.
  • Maintaining Fidelity with AI Transformation: Balancing the AI's creative content generation (e.g., engaging summaries) with the need to maintain factual accuracy and core information from the original research was a delicate process.
  • Ensuring True Accessibility: Beyond just converting to HTML, truly making the content accessible for diverse needs (as highlighted by the research paper for users with visual impairments or dyslexia) required deep consideration of HTML structure and metadata for assistive technologies.
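One concrete check behind "deep consideration of HTML structure" is the heading hierarchy. Screen-reader users often navigate by headings, so levels should not skip, as when an `<h2>` is followed directly by an `<h4>`. This TypeScript sketch flags such skips with a regular expression. `findSkippedHeadings` is a hypothetical helper, and regex matching simplifies real HTML parsing. A production check would use a proper HTML parser or a dedicated accessibility checker such as axe-core.

```typescript
// Report places where heading levels jump by more than one (e.g. h2 -> h4).
// Assumption: headings appear as literal <h1>..<h6> tags in the sanitized HTML.
function findSkippedHeadings(html: string): string[] {
  const problems: string[] = [];
  const headingTag = /<h([1-6])\b[^>]*>/gi; // opening tags only; <header> and <hr> do not match
  let previous = 0;
  let match: RegExpExecArray | null;
  while ((match = headingTag.exec(html)) !== null) {
    const level = Number(match[1]);
    // Going deeper by more than one level skips a level; going back up is always allowed.
    if (previous > 0 && level > previous + 1) {
      problems.push(`h${previous} followed by h${level}`);
    }
    previous = level;
  }
  return problems;
}

const report = findSkippedHeadings(
  "<h1>Title</h1><h2>Method</h2><h4>Details</h4><h2>Results</h2><h3>Table</h3>",
);
console.log(report.join("; ")); // h2 followed by h4
```

A report like this could be fed back to the model as a correction prompt, or simply used to flag documents that need another sanitization pass.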

Accomplishments that I'm proud of

  • Revolutionizing Research Accessibility: I've successfully built a system that fundamentally transforms how complex research is consumed, directly tackling the "low levels of accessibility" in the vast majority of research papers today.
  • Enhanced Content Engagement: Cognito moves beyond mere data extraction to create truly engaging and digestible formats, making "boring PDFs" a thing of the past and significantly improving the "Read Research" user journey.
  • Powerful AI Integration: Seamlessly integrating Gemini Pro for complex content understanding, sanitization, and generation showcases a powerful application of modern AI in document processing.
  • User-Centric Design: Converting to HTML directly addresses assistive technology users' preference for HTML, as noted in the research paper: "HTML has a significant edge for researchers with disabilities: 'I prefer HTML versions. As an assistive tech user I find it much faster to navigate'".

What I've learned

  • The Critical Need for HTML in Research: The research paper validated my initial hypothesis that PDF's limitations and HTML's benefits make HTML essential for true accessibility and modern digital consumption of research.
  • AI's Role in Semantic Enrichment: I learned that raw PDF-to-HTML conversion is insufficient; AI is crucial for adding the semantic structure and clarity that PDFs inherently lack, making content genuinely machine-readable and accessible.
  • User Experience is Paramount: Transforming dense academic text into engaging formats is not just a convenience but a necessity for broader participation and comprehension, especially given the widespread frustration with mobile PDF consumption.
  • Complex Document Processing is a Grand Challenge: Building robust pipelines for parsing, transforming, and enhancing unstructured documents with AI is a significant but rewarding engineering challenge.

What's next for Cognito

  • Extracting Visual Elements (Images/Graphs): Automatically identify, extract, and convert key visual elements such as images and graphs into accessible formats (e.g., by generating detailed alternative text descriptions for screen readers) or prepare them for interactive display.
  • Multi-Document Analysis: Expand capabilities to process and interlink insights from multiple research papers or a library of documents, enabling cross-document search and synthesis.
  • Interactive Data Visualization: Automatically extract data from tables and graphs within PDFs and convert them into interactive HTML visualizations, addressing the current difficulty with screen readers interpreting figures.
  • Personalized Learning Paths: Leverage AI to create tailored learning or reading paths based on a user's interests or knowledge gaps identified from the processed content.
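A first step toward the visual-elements item is finding images in the converted HTML that lack usable alternative text. Those images could then be queued for AI-generated descriptions. This TypeScript sketch assumes images appear as `<img>` tags. `findImagesNeedingAlt` is a hypothetical helper, and regex matching simplifies real HTML parsing.

```typescript
// Find <img> tags whose alt attribute is missing or blank.
// Handles double-quoted, single-quoted, and unquoted alt values.
function findImagesNeedingAlt(html: string): string[] {
  const imgTag = /<img\b[^>]*>/gi;
  // Leading \s ensures we match "alt=" itself, not e.g. "data-alt=".
  const altAttr = /\salt\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i;
  const flagged: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = imgTag.exec(html)) !== null) {
    const tag = match[0];
    const alt = altAttr.exec(tag);
    // Whichever capture group matched holds the value; missing alt counts as empty.
    const altText = alt ? (alt[1] || alt[2] || alt[3] || "") : "";
    if (altText.trim() === "") flagged.push(tag);
  }
  return flagged;
}

const flagged = findImagesNeedingAlt(
  `<img src="a.png"><img src="b.png" alt="Bar chart of results"><img src="c.png" alt=""><img src="d.png" alt='Flow diagram'>`,
);
console.log(flagged.length); // 2
```

An empty `alt=""` is valid HTML for purely decorative images. A real implementation would therefore need to decide which flagged images actually carry content before generating descriptions for them.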