Inspiration
While working on a college research paper, I realized how tedious it is to write a first draft. Existing LLMs often fail to produce tailored, relevant content for a research paper, and manually searching for and summarizing multiple papers is time-consuming.
What It Does
This tool:
Finds 5 relevant research papers on a given topic.
Summarizes the key points of those papers.
Generates a basic research paper draft from the provided input and adds relevant insights from the selected research papers.
Highlights and cites all additional information incorporated from the papers.
How We Built It
Research Paper Retrieval: Used the ArXiv API to gather research papers based on a specific topic.
Text Extraction: Leveraged pdfminer to extract text from PDFs.
Data Structuring: Employed LayoutLM tokenizers and regex to extract and organize important sections (e.g., abstract, introduction) of research papers into a structured JSON format.
Data Storage: Stored extracted information in an SQL table on Snowflake, converting titles and abstracts into vector embeddings.
Search and Retrieval: Utilized Cortex Search to find relevant papers efficiently.
Text Generation: Sent extracted paper content to an LLM in structured parts for assimilation and generation of research paper drafts.
Frontend: Built a user-friendly interface using Streamlit.
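As a rough illustration of the Data Structuring step, here is a minimal sketch of regex-based section splitting into JSON. It is not the project's actual code: the heading list, the `split_sections` and `to_json_record` names, and the assumption that each heading sits on its own line are all hypothetical, and the LayoutLM part of the pipeline is left out entirely.

```python
import json
import re

# Hypothetical heading list; real papers use many more variants.
SECTION_NAMES = [
    "abstract", "introduction", "related work",
    "methods", "results", "conclusion", "references",
]

# Assumption: a heading is a line holding only an optional number
# ("1" or "2.") followed by one of the known section names.
HEADING_RE = re.compile(
    r"^[ \t]*(?:\d+\.?[ \t]+)?("
    + "|".join(re.escape(name) for name in SECTION_NAMES)
    + r")[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(text: str) -> dict:
    """Map each recognized heading (lowercased) to the text that follows it."""
    matches = list(HEADING_RE.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        name = match.group(1).lower()
        body = text[match.end():end].strip()
        # Keep the first occurrence if a heading name repeats.
        sections.setdefault(name, body)
    return sections


def to_json_record(title: str, text: str) -> str:
    """Serialize one paper as a JSON string (hypothetical record layout)."""
    return json.dumps({"title": title, "sections": split_sections(text)})
```

Text that appears before the first recognized heading (for example the title block) is ignored in this sketch, and a heading that is not on the list is simply folded into the preceding section.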
Challenges We Ran Into
Extracting text accurately from research papers while maintaining structure and key details.
Condensing content to save storage without losing meaningful information.
Crafting effective prompts and developing a looping mechanism so the LLM can process content that exceeds its token limit.
Generating well-structured research papers that maintain proper flow and length.
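One way such a looping mechanism could look is sketched below, under stated assumptions: the text is split by word count as a crude stand-in for tokens, and `llm` is a hypothetical callable that takes a prompt string and returns the model's reply. The function names and the prompt wording are illustrative, not the project's actual implementation.

```python
from typing import Callable, Iterator


def chunk_words(text: str, max_words: int) -> Iterator[str]:
    """Yield consecutive pieces of text with at most max_words words each."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    for start in range(0, len(words), max_words):
        yield " ".join(words[start:start + max_words])


def rolling_summary(text: str, llm: Callable[[str], str], max_words: int = 500) -> str:
    """Feed the text to the LLM chunk by chunk, carrying the summary forward.

    `llm` is a hypothetical callable (prompt in, reply out); swap in
    whatever client the application actually uses.
    """
    summary = ""
    for chunk in chunk_words(text, max_words):
        prompt = (
            f"Summary so far:\n{summary}\n\n"
            f"New text:\n{chunk}\n\n"
            "Update the summary to include the new text."
        )
        # Each call sees only the running summary plus one chunk,
        # so the prompt size stays bounded regardless of paper length.
        summary = llm(prompt)
    return summary
```

Word counts only approximate token counts, so a real budget would need the model's own tokenizer and some headroom for the running summary.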
Accomplishments That We're Proud Of
Successfully retrieving and summarizing relevant research papers.
Generating high-quality research paper drafts.
Preserving the structural integrity of research papers, which enables effective knowledge extraction from individual sections such as abstracts and methods.
What We Learned
Advanced text extraction and structuring techniques for research papers.
Effective prompt engineering and token handling for large-scale LLM use cases.
Streamlining workflows for data retrieval, storage, and generation in an end-to-end system.
What's Next for Research_RAG
Enhanced Paper Display: Allow users to view the complete downloaded research papers with proper structure and formatting.
Rich Media Extraction: Improve the ability to extract and preserve images, tables, and formulas without losing their quality or structure.
Better Data Extraction: Further refine the text extraction process to improve accuracy and completeness.
User Interaction Features: Enable users to interactively refine and customize research paper drafts based on their preferences.