- Checkpointing for saved research papers, which keeps the library consistent after logout
- Information about the selected paper and its connections to the rest of the library, with confidence scores and extended functionality
- Chatbot feature powered by Google Gemini 3, specialized for the selected research paper
- Personalized graph of the research paper library
- Project layout
About the project
Like many people in the AI field, I spend part of every day scrolling through X (Twitter) and getting bombarded with dozens of new, groundbreaking research papers. "Attention Is All You Need," "Llama 3," "Gemini 1.5"... the list never ends. Each paper is a PDF island: isolated, dense, and disconnected. I found myself drowning in a sea of PDFs, unable to quickly grasp which papers mattered, how they connected to what I already knew, and why they were important. I realized I didn't need just another "summarizer"; I needed a Detective. Someone to read the papers for me (including the charts and diagrams), uncover the "Villain" (the problem) and the "Hero" (the solution), and map the genealogy of ideas.
That's why I built AI Arch Detective.
What it does
AI Arch Detective transforms a static folder of PDFs into a living, breathing Genealogy Graph of AI history.
- Multimodal DNA Extraction: Using Gemini 1.5 Pro's multimodal capabilities, it doesn't just read the text; it analyzes figures, tables, and architectural diagrams to extract a "DNA Fingerprint." This identifies the core problem ("The Villain") and the proposed solution ("The Hero") with full visual context.
- Genealogy Mapping: It visualizes citations and relationships, showing you how ideas evolved from 2017's Transformers to today's Multimodal giants.
- Missing Link Discovery: It finds the "ghosts" in your library—important papers you don't have but should. It uses a cascading search engine (combining Semantic Scholar, CrossRef, and Google Search via Serper) to identify and fetch metadata for these missing bridge papers, automatically filling the gaps in your knowledge graph.
- Context-Aware Chat: You can interrogate your entire library. Ask "How does FlashAttention optimize the Transformer?" and get an answer synthesized from the specific papers in your graph.
- Checkpointed Personal Library: After adding new papers and selecting the connected subgraph you want, you can checkpoint the graph state (git-like snapshots). When you sign in again, AI Arch Detective restores your saved library and lets you roll back to prior checkpoints.
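To make the checkpointing idea concrete, here is a minimal in-memory sketch of git-like snapshots. It is not the project's actual storage code: `CheckpointStore`, its methods, and the `nodes`/`edges` dictionary shape are all hypothetical names chosen for illustration, and a real deployment would presumably persist snapshots per user rather than keep them in a list.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """In-memory, git-like snapshots of a graph state (hypothetical helper)."""

    snapshots: list = field(default_factory=list)

    def checkpoint(self, graph: dict, message: str = "") -> int:
        # Deep-copy so later edits to the live graph cannot alter the snapshot.
        self.snapshots.append({"message": message, "graph": copy.deepcopy(graph)})
        return len(self.snapshots) - 1  # the checkpoint id is its list position

    def restore(self, checkpoint_id: int) -> dict:
        # Return a copy so the stored snapshot itself is never mutated.
        return copy.deepcopy(self.snapshots[checkpoint_id]["graph"])


store = CheckpointStore()
graph = {"nodes": ["Attention Is All You Need"], "edges": []}
first = store.checkpoint(graph, "initial library")

# Grow the library, then take a second snapshot.
graph["nodes"].append("FlashAttention")
graph["edges"].append(("FlashAttention", "Attention Is All You Need"))
second = store.checkpoint(graph, "added FlashAttention")

# Rolling back returns the library as it was at the first checkpoint.
rolled_back = store.restore(first)
```

Copying on both write and read is the design choice that matters here: it keeps every checkpoint independent of whatever happens to the live graph afterwards.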
How I built it (Gemini Power)
This project was only possible because of Gemini's unique capabilities:
- Multimodal Reasoning (Text + Vision): Research papers rely heavily on diagrams. I used Gemini's vision capabilities to "see" the charts and architecture diagrams, allowing the system to understand the visual evidence of a paper's claims, not just the abstract.
- Long-Context Understanding: I leveraged the massive context window to feed entire PDFs—text, captions, and references—into the model simultaneously. This allowed for holistic understanding of "Villain/Hero" narratives that span across sections.
- Hybrid Reasoning & Search: To solve the "Missing Link" problem, I built a hybrid agent that combines Gemini's reasoning (to parse citations) with real-time data from Semantic Scholar and Serper (Google Search). Gemini acts as the "judge," deciding which external results match the citation context.
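The cascading search with a "judge" described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `cascading_search`, the provider stubs, and `title_judge` are hypothetical. The stubs stand in for real calls to Semantic Scholar, CrossRef, and Serper, and the simple title check stands in for the Gemini call that decides whether a result matches the citation context.

```python
from typing import Callable, Optional

Provider = Callable[[str], list]    # citation text -> candidate metadata dicts
Judge = Callable[[str, dict], bool]  # (citation text, candidate) -> is it a match?


def cascading_search(citation: str, providers: list, judge: Judge) -> Optional[dict]:
    """Try each (name, provider) pair in order; return the first judged match."""
    for name, search in providers:
        try:
            candidates = search(citation)
        except Exception:
            # A failing provider should not stop the cascade.
            continue
        for candidate in candidates:
            if judge(citation, candidate):
                return {**candidate, "source": name}
    return None  # no provider produced an accepted match


# Stand-in providers (assumptions): the first finds nothing, the second does.
def semantic_scholar_stub(query: str) -> list:
    return []


def crossref_stub(query: str) -> list:
    return [{"title": "Attention Is All You Need", "year": 2017}]


# Stand-in judge (assumption): accept if the candidate title appears in the citation.
def title_judge(citation: str, candidate: dict) -> bool:
    return candidate["title"].lower() in citation.lower()


result = cascading_search(
    "Vaswani et al., Attention Is All You Need, 2017",
    [("semantic_scholar", semantic_scholar_stub), ("crossref", crossref_stub)],
    title_judge,
)
```

Because the judge only accepts or rejects candidates that a provider returned, the model never gets to introduce a paper that no external source knows about.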
Tech Stack:
- Frontend: React + React Flow for the interactive graph visualization.
- Backend: FastAPI + Python.
- Search: Semantic Scholar API + Serper.dev.
- AI: Google Gemini Pro (via API).
Challenges I ran into
- Hallucination in Citations: Early on, the model would sometimes invent relationships. I solved this by grounding the "Missing Link" discovery in real citation data (via Semantic Scholar) and using Gemini only to verify relevance, not to invent connections.
- Visualizing Complexity: Displaying hundreds of papers without overwhelming the user was tough. I implemented a physics-based force-directed graph that clusters papers by Domain (LLM, CV, Audio) and Year, creating an intuitive "Timeline of Innovation."
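The grounding fix for hallucinated citations can be illustrated with a small filter: relationships proposed by the model are kept only if they also appear in real citation data. The function name `ground_edges` and the shape of `citation_index` are assumptions made for this sketch; in the project, the citation data comes from Semantic Scholar.

```python
def ground_edges(proposed: list, citation_index: dict) -> list:
    """Keep only (citing, cited) pairs that are backed by real citation data.

    citation_index maps a paper title to the set of titles it actually cites
    (a hypothetical structure, assumed to be built from an external source).
    """
    grounded = []
    for citing, cited in proposed:
        if cited in citation_index.get(citing, set()):
            grounded.append((citing, cited))
    return grounded


# Citation data from a trusted source (hard-coded here for illustration).
citation_index = {"FlashAttention": {"Attention Is All You Need"}}

# Edges proposed by the model; the second one is invented.
proposed = [
    ("FlashAttention", "Attention Is All You Need"),
    ("FlashAttention", "Llama 3"),
]

grounded = ground_edges(proposed, citation_index)
```

Only the first edge survives, since the second has no backing in the citation index.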
What I learned
Building AI Arch Detective taught me how to turn unstructured research artifacts into structured, queryable knowledge—especially when the truth of a paper lives in its diagrams as much as its text. It also reinforced how critical grounding and verification are when dealing with citation graphs and knowledge discovery.
What's next for AI Arch Detective
I plan to add Multi-Hop Reasoning—allowing the detective to answer questions like "Trace the evolution of 'Attention' from 2017 to 2024," generating a visual path through the graph. I also want to integrate Google Scholar real-time feeds so new papers land on your desk automatically.
AI Arch Detective isn't just a project; it's the tool I wish I had when I started my AI journey. Now, it's here to help everyone else.