The toshokAI Journey

Inspiration

The inspiration for toshokAI came from a common challenge in academic research - the difficulty in efficiently exploring and understanding research papers. As a researcher, I identified three key pain points:

  1. arXiv's minimalistic interface limits advanced exploration
  2. Reading and comprehending papers is time-intensive
  3. Lack of interactive ways to engage with research content

I saw an opportunity to leverage AI Large Language Models to create a more intuitive and intelligent way to interact with academic papers.

What it does

toshokAI transforms research paper interaction through:

  1. Smart Library Management:

    • Metadata and Semantic search
    • Intelligent paper recommendations based on library analysis
  2. Interactive Paper Chat:

    • Natural witty easy-to-understand conversations with paper content
    • Contextual Q&A
    • AI-generated insights and deep-dive questions

How I built it

My modern web application consists of:

  1. Frontend: Streamlit for clean, responsive UI
  2. Backend:
    • Snowflake for as backend databse
    • Multi-step RAG implementation with hybrid search
    • Smart recommendation system using LLM
  3. Document Processing:
    • Automated paper chunking
    • Context-aware chat system
    • Conversation memory management

Challenges and Solutions

  1. Document Processing: Optimized speed by using html version of the paper whenever possible
  2. RAG Quality: Improved through:
    • Refined chunking approaches
    • Enhanced prompt engineering
    • Optimized context retrieval
  3. User Experience: Balanced functionality with simplicity through iterative design
  4. Performance: Optimized LLM context, search pagination, and response generation

Key Achievements

  1. Intelligent Search: Semantic search with AI-generated relevance explanations
  2. Smart Discovery: LLM-powered recommendation system analyzing library patterns
  3. Advanced RAG: Multi-step system for comprehensive paper insights
  4. Smooth UX: Clean interface with clear operation feedback
  5. Robust Architecture: Efficient paper processing with maintained chat context

Future Development

  1. Enhanced Discovery:

    • Citation network analysis
    • Cross-paper relationship mapping
  2. Advanced Features:

    • Multi-paper synthesis
    • Automated literature reviews
    • Figure and equation understanding

toshokAI demonstrates the potential of combining modern AI with academic research tools, making research more efficient and insightful while maintaining an engaging user experience.

Built With

  • docling
  • langchain
  • python
  • snowflake
  • streamlit
Share this project:

Updates