Inspiration
Analyzing large-scale review datasets can be challenging, especially when trying to uncover meaningful relationships between products and users. The idea for this project stemmed from the need to build an intelligent, graph-based system that could extract insights from Amazon Electronics reviews using AI-powered queries. By leveraging graph databases and network analysis, we aimed to create an efficient and scalable solution.
What it does
This project builds a bipartite graph from Amazon product reviews and enables AI-driven querying. Users can ask complex questions in natural language, such as which reviewers are the most influential or what hidden connections exist between products and consumers. The system integrates ArangoDB, NetworkX, cuGraph, and LangGraph to process and analyze review data efficiently.
How we built it
- Data Preprocessing: We downloaded and processed the Amazon Electronics review dataset, converting it into a structured CSV format.
- Graph Construction: Using NetworkX, we built a bipartite graph where:
  - Products and users are represented as nodes.
  - Reviews are represented as edges with attributes like review score and text.
- Graph Persistence: The graph was stored in ArangoDB for efficient querying.
- AI-Powered Queries: We integrated LangChain and LangGraph so that users can pose natural-language queries, which are processed by OpenAI’s GPT-4o-pro.
- Optimized Analysis: cuGraph was used for GPU-accelerated graph computations, enhancing performance for large-scale analysis.
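The graph-construction step above can be sketched in NetworkX as follows. This is a minimal sketch, not the project's code: the CSV column names (`user_id`, `product_id`, `rating`, `text`), the `user:`/`product:` node prefixes, the helper `build_review_graph`, and the sample rows are all hypothetical.

```python
import csv
import io

import networkx as nx

# Made-up sample in an assumed CSV layout: one row per review.
# Column names are illustrative, not the project's actual schema.
SAMPLE_CSV = """user_id,product_id,rating,text
u1,p1,5,Great headphones
u1,p2,3,Battery could be better
u2,p1,4,Solid sound
u3,p2,1,Stopped working
"""


def build_review_graph(csv_text: str) -> nx.Graph:
    """Build a user-product bipartite graph where each review is an edge (hypothetical helper)."""
    graph = nx.Graph()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Prefixes keep a user ID from colliding with an identical product ID.
        user = f"user:{row['user_id']}"
        product = f"product:{row['product_id']}"
        # NetworkX convention: the "bipartite" node attribute marks the two sides.
        graph.add_node(user, bipartite=0, kind="user")
        graph.add_node(product, bipartite=1, kind="product")
        # Review score and text are stored as edge attributes.
        graph.add_edge(user, product, rating=int(row["rating"]), text=row["text"])
    return graph


G = build_review_graph(SAMPLE_CSV)
users = {n for n, d in G.nodes(data=True) if d["bipartite"] == 0}
print(nx.bipartite.is_bipartite(G))  # True
print(sorted(users))  # ['user:u1', 'user:u2', 'user:u3']
```

A plain `nx.Graph` keeps only one edge per user-product pair, so a repeat review by the same user would overwrite the earlier one; an `nx.MultiGraph` would be needed if repeat reviews matter.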
Challenges we ran into
The LLM agent first ran an AQL query to find the 10 most influential users based on their reviews, then generated NetworkX code to compute influence scores with PageRank. However, it did not recognize that user nodes in the graph were identified by full ArangoDB document IDs (e.g., AmazonReviewsNode/51915) rather than bare user IDs (51915). As a result, pagerank_scores.get(user, 0) fell back to its default of 0 instead of returning each user's actual score. Despite multiple debugging attempts, the LLM did not produce an executable script. The generated code also failed to recognize the PageRank function, even though it was properly defined. This issue remains unresolved and is an area for future research and improvement.
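The ID mismatch described above can be reproduced on a toy graph. This is a minimal sketch under stated assumptions: graph node IDs are full ArangoDB document handles (`<collection>/<_key>`) while the AQL step returned bare keys. Only the `AmazonReviewsNode` collection name comes from this write-up; the graph, the keys, and the helper `to_handle` are hypothetical. Recent NetworkX releases compute `nx.pagerank` with SciPy, so SciPy must be installed.

```python
import networkx as nx

# Toy graph whose node IDs mimic ArangoDB document handles ("<collection>/<_key>").
# Keys and edges are made up for illustration.
G = nx.Graph()
G.add_edges_from([
    ("AmazonReviewsNode/51915", "AmazonReviewsNode/p1"),
    ("AmazonReviewsNode/51915", "AmazonReviewsNode/p2"),
    ("AmazonReviewsNode/77", "AmazonReviewsNode/p1"),
])

pagerank_scores = nx.pagerank(G)

# Bare keys, as an AQL query projecting only _key might return them (assumption).
user_ids = ["51915", "77"]

# Buggy lookup: bare keys never match handle-style node IDs, so .get() returns the default 0.
buggy = {u: pagerank_scores.get(u, 0) for u in user_ids}


def to_handle(key: str, collection: str = "AmazonReviewsNode") -> str:
    """Rebuild a full document handle from a bare key (hypothetical helper)."""
    return key if "/" in key else f"{collection}/{key}"


# Fixed lookup: normalize to handles, and index directly so a mismatch raises KeyError.
fixed = {u: pagerank_scores[to_handle(u)] for u in user_ids}
top = sorted(fixed.items(), key=lambda kv: kv[1], reverse=True)

print(buggy)  # {'51915': 0, '77': 0}
print(top[0][0])  # 51915
```

Indexing with `pagerank_scores[...]` instead of `.get(..., 0)` is a deliberate choice here: an ID mismatch then fails loudly with a `KeyError` rather than silently scoring every user as 0.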
Accomplishments that we're proud of
- Successfully integrated multiple technologies (ArangoDB, NetworkX, cuGraph, LangChain) into a cohesive system.
- Built an AI-powered agent that understands and processes complex graph-based queries.
- Optimized graph computations using GPU acceleration, significantly improving performance.
What we learned
- How to work with graph databases using ArangoDB and apply network analysis techniques.
- The power of AI-driven query systems in extracting meaningful insights from large datasets.
- Efficient data structuring and storage methods for large-scale review datasets.
- Optimizing graph processing using cuGraph for high-performance computing.
What's next for Exploring Agentic App with ArangoDB, cuGraph, and LangGraph
- Enhancing query capabilities with more advanced AI reasoning.
- Expanding the dataset to include more categories beyond Electronics.
- Implementing a visualization dashboard for better data exploration.
- Exploring real-time data updates to keep the graph current.

Log in or sign up for Devpost to join the conversation.