Gearguide

The application is hosted on GearGuide

Github Link: https://github.com/Anusha-raju/GearGuidev2.git

Gear Guide: A Graph-Powered AI Chatbot for Smarter Troubleshooting

The Problem:

Modern car owners often face technical issues, whether it's "AC not cooling" or "headlights flickering” but traditional troubleshooting tools and chatbots fall short. They struggle with maintaining context, handling ambiguous symptoms, and offering precise repair instructions. The experience is frustrating: repetitive conversations, generic answers, and no memory of what was said just moments ago.

The Spark:

Our idea for Gear Guide, an AI-powered RAG (Retrieval-Augmented Generation) chatbot, was born out of this gap. We envisioned a smarter support assistant, one that understands vehicle systems, remembers user queries, and guides them through structured, contextual answers. By blending OpenAI’s embeddings with Neo4j’s graph-based knowledge and a hybrid retrieval strategy, we designed a chatbot that behaves less like a script and more like a skilled mechanic’s assistant.

How It Works: Under the Hood of Gear Guide

Gear Guide combines the best of semantic search and knowledge graphs:

A Rich, Structured Knowledge Base We modeled technical car data as a graph in Neo4j, organizing relationships between components, symptoms, procedures, and problems. For example:

a. "AC not cooling" connects to symptoms like "low airflow" and tests like "compressor pressure check".

b. "Remote key not working" links to "keyless entry malfunction", a common user concern.
Data Ingestion via LLM (Large Language Models) Manuals and troubleshooting documents were converted into structured XML using a custom PDF-to-Graph pipeline powered by GPT. Tags like <symptom>, <procedure>, and <test> were extracted to feed into the Neo4j graph—no manual curation needed.
Hybrid Retrieval (Dense + Sparse)

a. Dense Retrieval uses OpenAI’s embeddings + Neo4j vector index for semantic matches.

b. Sparse Retrieval uses full-text Lucene search to capture exact keyword hits.

c. A Hybrid Strategy reranks results using a weighted score (hybrid_score = α * dense + (1-α) * sparse), ensuring the best of both worlds: relevance and accuracy.
Multi-Turn Conversations Every query goes through rephrasing and context injection, allowing the system to maintain continuity in chat sessions—an often-missing feature in traditional bots.
Interactive UI The frontend is built with Flask, providing a responsive interface where users can ask natural questions like:

a. “Why isn’t my AC working?”

b. “How do I replace the bumper on a Yaris?”

c. “What could cause dim headlights after engine start?”

Why It Matters

Gear Guide isn’t just a chatbot—it’s a structured AI assistant that thinks in graphs.

By linking symptoms, problems, and test procedures with real-world semantics, we provide users with a guided diagnostic experience. Our hybrid search approach outperforms both keyword-based and purely semantic systems, delivering answers that are contextually rich and technically grounded.

Key Impact

Smarter Repairs: Faster and more accurate troubleshooting for drivers and technicians
Context-Aware AI: True multi-turn memory and relevance optimization
Hybrid Retrieval: Combines the power of semantic understanding and exact match precision
Graph-Powered Insights: Every problem is navigated through connected components and causes

Full Technical Document

Graph Schema Overview

Schema Overview

Graph Schema Overview

Node Types

Node Label	Description
ProductGroup	Vehicle categories
Manufacturer	Car maker (e.g., Toyota)
Model	Car model (e.g., Yaris)
Component	Part of the vehicle
Problem	Issues faced
Symptom	Observable signs
Procedures	Repair/installation steps
BasicInfo	General car information
SubComponent	Detailed internal components
AdditionalInfo	Notes, warnings, extra data
SuspectArea	Potential faulty regions
TestProcedures	Diagnostic steps

🔗 Relationships

Relationship Name	Description
HAS_MODEL	ProductGroup → Model
MANUFACTURED_BY	Model → Manufacturer
HAS_COMPONENT	Model → Component
HAS_PROCEDURES	Component → Procedures
HAS_SYMPTOM	Problem → Symptom
HAS_SUBPROBLEM	Problem → Sub-problem (hierarchical)
HAS_ADDITIONALINFO	Component/Problem → AdditionalInfo
HAS_SUSPECTAREA	Problem → SuspectArea
HAS_BASICINFO	Component → BasicInfo
HAS_SUBCOMPONENT	Component → SubComponent
HAS_TESTPROCEDURES	Component → TestProcedures

PDF → Structured XML via LLM

Script: `pdf_extraction.py`

Extracts raw text from manuals
Breaks long text into chunks
Uses GPT to output structured data with tags like:

  `<problem>ac not cooling</problem>`
  `<symptom>weak airflow</symptom>`
  `<test><name>compressor test</name><procedure>Check pressure</procedure></test>`

- Tags are flexible; non-standard tags are stored under `<additional_info>`

Data Ingestion Flow

Script Logic

Function: parse_and_insert_data(xml_content, component_name)

Cleans XML from LLM output
Maps each XML tag to Neo4j label via tag_to_label_map
Creates and links graph nodes using Cypher queries
Handles nested problems and long lists efficiently
Uses APOC for conditional logic in Cypher (dynamic merging)

Embedding-Based Dense Retrieval

Function: get_openai_embedding

Embedding Model: text-embedding-3-small
Converts user queries into dense vectors

Function: vector_search(query_vector, node_label, top_k=10, threshold=0.7)

Vector Index Name: vectorIndex_<NodeLabel>
Cypher Syntax: cypher CALL db.index.vector.queryNodes($index_name, $top_k, $query_vector)

Sparse Retrieval

Function: sparse_search(user_query, node_label)
Uses full-text search index:

CALL db.index.fulltext.queryNodes('search_<Label>', $query) YIELD node, score
RETURN node.name, score

Hybrid Retriever

Function: Use ThreadPoolExecutor to parallelize vector and full-text search
Combines dense and sparse scores:

hybrid_score = alpha * dense_score + (1 - alpha) * sparse_score

Create retrieve_data() to unify both retrieval strategies
Normalize scores before combining
Filter by top-k if needed
Merge and deduplicate results by node name

User Flow Diagram

[User Query]
     ↓
[Rephrasing with LLM (if chat history exists)]
     ↓
[Embedding Generation using OpenAI]
     ↓
┌────────────────────┐       ┌────────────────────┐
│ Dense Search        │     │ Sparse Search       │
│ (Neo4j Vector Index)│     │ (Fulltext Lucene)   │
└──────────┬──────────┘     └──────────┬──────────┘
           ↓                           ↓
     [Hybrid Merge & Rerank Scores (alpha-weighted)]
                       ↓
         [Deduplication & Top-K Selection]
                       ↓
       [Graph Expansion (Related Nodes)]
                       ↓
         [Final LLM Response Generation]

Sample User Queries & Results

User Query	Top Match Node	Node Type	Score
"AC not working properly"	"ac system not cooling"	Problem	0.89
"Car remote doesn't lock"	"keyless entry malfunction"	Problem	0.86
"Headlights flickering at night"	"check headlamp fuse"	Procedures	0.81
"Steering makes noise when turning"	"steering fluid low"	Symptom	0.83
"What's the cause of engine overheating?"	"radiator coolant leak"	SuspectArea	0.84

Visual Graph Snapshot – Example Traversal Path

Query: "AC not cooling"

AC Graph Schema Overview

Traversal Path (Graph View):

(:Component {name: "air conditioning"}) 
   └──[:HAS_PROBLEM]──> (:Problem {name: "ac not cooling"})
       └──[:HAS_SYMPTOM]──> (:Symptom {name: "low airflow"})
       └──[:HAS_PROCEDURES]──> (:Procedures {name: "check compressor pressure"})

Comparative Outcomes – Dense vs Sparse vs Hybrid

Query: "Remote key not working"

Retrieval Mode	Top Match	Type	Score	Notes
Sparse	"key" → "lock" → "battery"	Problem	0.65	Keyword match, misses context
Dense	"keyless entry not responding"	Problem	0.86	Captures intent, loses exact token
Hybrid	"keyless entry malfunction"	Problem	0.92	Best of both, exact + semantic

Conclusion: Hybrid retrieval improved both relevance and clarity. It correctly inferred "remote key" refers to "keyless entry".

Real-World Query Examples (for Demo/Docs)

Query	Expected Top Match	Node Type	Why it's relatable
"AC not working"	"ac system not cooling"	Problem	Common summer issue
"Car won’t lock with remote"	"keyless entry malfunction"	Problem	Frequent user complaint
"Headlights dim after engine starts"	"battery voltage drop"	Symptom	Electrical fault diagnosis
"Steering hard to turn"	"low power steering fluid"	SuspectArea	Linked to hydraulic issue
"How to remove bumper?"	"bumper removal procedure"	Procedures	Practical DIY fix

Tip: Capture real user-like phrasing to test the robustness of your retrieval system.

Environment Configuration

Key	Purpose
`NEO4J_URI`	Neo4j DB connection URI
`NEO4J_USERNAME`	Neo4j DB Username
`NEO4J_PASSWORD`	Neo4j DB Password
`OPENAI_API_KEY`	LLM + Embedding access
`EMBEDDING_MODEL`	OpenAI model (default: text-embedding-3-small)
`model`	Chat model (default: gpt-4o)
`alpha`	Weight for hybrid scoring
`top_k`	Result cutoff
`threshold`	Similarity threshold
`SECRET_KEY`	Secret key for the sql database
`SQLALCHEMY_DATABASE_URI`	Database connection server

Function Reference

Function Name	Description
`extract_pdf_content`	Read PDF into plain text
`structure_content_with_llm`	Generate structured XML tags
`parse_and_insert_data`	Create nodes and relationships in Neo4j
`get_openai_embedding`	Generate dense vector from text
`vector_search`	Perform ANN vector search
`fulltext_search`	Run Lucene query on text fields
`hybrid_search`	Merge dense + sparse results
`process_top_nodes`	Expand retrieved node context
`final_call`	Generate final LLM response
`rag_advisor`	Full pipeline: input → answer

Future Vision

Gear Guide began as a proof-of-concept for smarter troubleshooting using RAG and graph-based reasoning, but its potential goes far beyond automotive diagnostics.
Here's where we see it heading:

Multi-Domain Diagnostic Assistant

Extend beyond vehicles into other verticals where structured troubleshooting is critical:

a. Consumer electronics (e.g., phone won't charge → battery port → test cable → reset)

b. Medical equipment diagnostics for technicians

c. Smart home systems (e.g., thermostat issues, smart lock malfunctions)
Continual Knowledge Ingestion

Automate ingestion of new manuals, guides, and repair logs in real-time using:

a. Fine-tuned LLMs for multi-lingual manuals

b. Auto-tagging pipelines with confidence thresholds

c. Crowdsourced updates from technician feedback
Voice-Enabled Mobile Companion

Build a mobile app where users can:

a. Speak their car issues out loud ("my engine's rattling at startup")

b. Get step-by-step procedures

c. Use AR overlays (e.g., guide you to open the hood, check the dipstick)
Closed-Loop Learning System

a. Track which troubleshooting paths led to successful resolutions

b. Reinforce the knowledge graph based on real-world usage

c. Use reinforcement learning or fine-tuning to improve LLM grounding
B2B Integration with Repair Shops

Partner with:

a. Automotive repair shops for technician-facing AI tools

b. Manufacturers for embedding Gear Guide in owner apps, onboard systems, or support portals.

Contributors:

Anusha Umashankar | LinkedIn | Email: anusha.u.raju@gmail.com

Dinesh Karnati | LinkedIn | Email: dineshkarnati510@gmail.com

Built With

amazon-web-services
chatbot
flask
graphrag
neo4j
openai
python
vector-embeddings
web-application

Updates

Anusha Umashankar started this project — May 29, 2025 11:57 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.