Inspiration

Every researcher knows the struggle: spending endless hours reading papers, only to lose track of how ideas connect. Traditional keyword search can't explain why methods differ or how research ideas evolve over time, and with the exponential growth of scientific publications (thousands of new papers on arXiv every week), keyword search and manual reading are no longer scalable.

That frustration sparked our idea for Research Flow Agent — an AI that thinks like a researcher.
Instead of just summarizing papers, it understands how discoveries build on each other and explains those connections clearly and factually.

We built Research Flow Agent to read, reason, and write — just like a human researcher.

The result is an intelligent assistant that can:

  • Automatically ingest and understand papers,
  • Map relationships among them, and
  • Generate human-readable summaries and literature flows that are factual, organized, and explainable.

How We Built It

Our process followed a Retrieve → Parse → Graph → Reason → Generate → Visualize pipeline:

  1. Data Ingestion: Used the Semantic Scholar API to collect research metadata and Gemini 2.5 Pro to extract key concepts, methods, and results from PDFs.
  2. Knowledge Graph: Built a Neo4j graph database (Docker-hosted) to link papers, datasets, and citations — creating a living map of research connections.
  3. Semantic Retrieval: Integrated Vertex AI Search and Embeddings to enable contextual discovery beyond simple keyword search.
  4. Reasoning & Generation: Leveraged Gemini 2.5 Pro to compare methods, find patterns, and generate structured, citation-backed literature reviews.
  5. Visualization & UI: Developed a React + Flask interface, allowing users to upload papers, explore graphs, and see how ideas evolve, all in one place.
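The first two pipeline stages can be sketched roughly as follows. The Semantic Scholar Graph API paper-search endpoint and Neo4j's MERGE semantics are real; the field list, helper names, and the exact shape of the Cypher statements are illustrative assumptions, not our production code.

```python
import urllib.parse

# Public Semantic Scholar Graph API paper-search endpoint.
S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"

def build_search_params(query: str, limit: int = 20) -> dict:
    """Query parameters for a paper search; the field list is an assumption."""
    return {
        "query": query,
        "limit": limit,
        "fields": "title,year,references.paperId,references.title",
    }

def paper_to_cypher(paper: dict) -> list[tuple[str, dict]]:
    """Turn one paper record into (Cypher statement, params) pairs.

    MERGE keeps ingestion idempotent: re-processing the same paper
    never duplicates Paper nodes or CITES edges.
    """
    stmts = [(
        "MERGE (p:Paper {id: $id}) SET p.title = $title",
        {"id": paper["paperId"], "title": paper.get("title", "")},
    )]
    for ref in paper.get("references") or []:
        if not ref.get("paperId"):  # some references come back unresolved
            continue
        stmts.append((
            "MERGE (p:Paper {id: $src}) "
            "MERGE (r:Paper {id: $dst}) "
            "MERGE (p)-[:CITES]->(r)",
            {"src": paper["paperId"], "dst": ref["paperId"]},
        ))
    return stmts

if __name__ == "__main__":
    # Build the request URL; fetching it (e.g. with urllib.request.urlopen)
    # and feeding each result through paper_to_cypher is left to the pipeline.
    url = S2_SEARCH + "?" + urllib.parse.urlencode(build_search_params("knowledge graphs"))
    print(url)
```

In the real pipeline, each (statement, params) pair would be executed through the official Neo4j Python driver inside a session transaction.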

Challenges We Faced

Building something that thinks like a researcher wasn’t easy.

  • Parsing research PDFs was messy: formats varied widely.
  • Normalizing similar entities (like “BERT-base” vs “BERT”) took careful effort.
  • Neo4j had to remain fast and scalable even with thousands of connections.
  • Ensuring factual accuracy was critical: every AI-generated statement had to trace back to a real citation.
  • Integrating Gemini, Neo4j, and Vertex AI seamlessly required significant orchestration.
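The normalization challenge above ("BERT-base" vs "BERT") can be sketched as a canonicalization pass applied to every extracted entity before it is merged into the graph. This is a minimal sketch assuming a hand-curated alias table; the table contents and function name are illustrative.

```python
import re

# Illustrative alias table mapping surface forms found in papers
# to one canonical node name; a real table would be hand-curated.
ALIASES = {
    "bert-base": "BERT",
    "bert-large": "BERT",
    "bert": "BERT",
}

def normalize_entity(name: str) -> str:
    """Canonicalize an extracted entity name before MERGE-ing it into Neo4j."""
    # Collapse whitespace, lowercase, and unify separators before lookup.
    key = re.sub(r"\s+", " ", name).strip().lower().replace("_", "-")
    return ALIASES.get(key, name.strip())
```

Routing every entity through one such function means the graph gets a single node per concept regardless of how a paper happened to spell it.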

Accomplishments We’re Proud Of

  • Built an automated pipeline that goes from raw PDFs to a populated Neo4j graph.
  • Designed an interactive Neo4j knowledge graph that visually maps how research ideas evolve.
  • Combined reasoning + retrieval + graph intelligence in one cohesive system.
  • Published the entire setup on GitHub for flexible local and cloud deployment.
  • Developed an intuitive front-end using React.js, complemented by Flask backend APIs.

What We Learned

  • True understanding comes from blending symbolic graphs with neural reasoning (LLMs).
  • Explainability is essential: researchers must be able to trace every insight.
  • Ontology design and data quality determine how accurate and useful the graph becomes.
  • Building such an agent taught us the power of cross-functional collaboration between AI, backend, and visualization engineers.

Future Scope

We’re just getting started!

  1. Data Integration: Incorporate Google Scholar and publisher APIs for richer coverage.
  2. Collaboration Tools: Add annotation, teamwork, and version-tracking features for researchers.
  3. Multimodal Output: Generate visual and video explainers.
  4. Analytics Dashboard: Build real-time trend and impact analysis powered by Neo4j.
  5. Open-Source Launch: Release our modular framework to empower the research community.

Built With

  • Gemini 2.5 Pro
  • Vertex AI Search & Embeddings
  • Neo4j (Docker)
  • Semantic Scholar API
  • React + Flask
