Inspiration
EasySQL started from a very real pain at work.
In a weekly business review, we asked a simple question: "Why did net revenue drop for repeat customers in one region last month?"
We tried several open-source Text-to-SQL projects. They were impressive on toy datasets, but in our production schema they failed in predictable ways:
- They missed multi-hop relationships and generated wrong JOIN paths.
- They produced syntactically valid SQL that was semantically wrong for our metrics.
- They could not use our local business knowledge.
That third point became our core insight: database schemas describe structure, not business truth.
If a company practices Domain-Driven Design (DDD), the missing business semantics already exist in code: domain services, policies, and application-layer rules. In the AI era, this DDD-layer code is not just an implementation detail; it is strategic company IP.
So we built EasySQL to bridge that gap: from schema-level understanding to business-level understanding.
What it does
EasySQL is an enterprise Text-to-SQL system that combines:
- Neo4j for relationship-aware schema reasoning
- Milvus for semantic retrieval over tables/columns
- Code-context retrieval for DDD business logic grounding
- Few-shot memory for continuous adaptation
- SQL Agent loop for iterative generation + validation + repair
Given a natural-language question, EasySQL retrieves relevant schema, expands join paths, injects business logic context, optionally adds similar historical examples, generates SQL, validates it, and returns execution-ready output.
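The flow above can be sketched as a minimal staged pipeline. This is an illustrative sketch, not EasySQL's actual API: the stage functions are placeholder stubs standing in for calls to Milvus, Neo4j, the code index, the few-shot store, and the LLM.

```python
from dataclasses import dataclass, field

@dataclass
class QueryContext:
    """Accumulates everything the SQL generator needs to see."""
    question: str
    tables: list[str] = field(default_factory=list)
    join_paths: list[str] = field(default_factory=list)
    business_rules: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)
    sql: str = ""

# Placeholder stages; real versions query Milvus, Neo4j, the DDD code
# index, the few-shot store, and the LLM respectively.
def retrieve_schema(ctx):        # semantic table/column search
    ctx.tables = ["orders", "customers"]
    return ctx

def expand_joins(ctx):           # FK-graph expansion
    ctx.join_paths = ["orders.customer_id -> customers.id"]
    return ctx

def retrieve_business(ctx):      # DDD-layer snippets
    ctx.business_rules = ["net_revenue = gross_revenue - refunds"]
    return ctx

def retrieve_examples(ctx):      # similar historical Q&A pairs
    ctx.examples = []
    return ctx

def generate_sql(ctx):           # stand-in for the LLM call
    ctx.sql = "SELECT 1"
    return ctx

def validate(ctx):               # validation gate before returning
    assert ctx.sql.strip().upper().startswith("SELECT")
    return ctx

PIPELINE = [retrieve_schema, expand_joins, retrieve_business,
            retrieve_examples, generate_sql, validate]

def answer(question: str) -> QueryContext:
    ctx = QueryContext(question=question)
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx
```

The key design point is that each stage only enriches a shared context object, so stages can be reordered, skipped, or traced independently, which is what makes the node-level execution trace in the UI possible.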
The frontend exposes the full execution trace (schema retrieval, few-shot retrieval, context build, business-logic retrieval, SQL generation/validation), so users can see why a query was produced—not just the final SQL text.
How we built it
We built EasySQL as a modular, production-oriented stack:
- Core Engine: Python 3.10+, Pydantic, SQLAlchemy
- Agent Orchestration: LangGraph node pipeline
- Data Services: Neo4j + Milvus
- API Layer: FastAPI
- Web UI: React + TypeScript
- Primary Hackathon Model: Gemini 3 for planning, generation, and tool-calling SQL agent loops
- Developer Workflow Tooling: Gemini CLI for rapid debugging, prompt iteration, and delivery acceleration
Implementation highlights:
- Schema sync pipeline (`python main.py`): extracts metadata from source databases and writes graph + vector representations.
- Hybrid retrieval: semantic table/column retrieval from Milvus + FK expansion and join-path reasoning from Neo4j.
- DDD code-context retrieval: retrieves domain/business-layer snippets to provide business semantics that schemas do not encode.
- Few-shot subsystem: users can save high-quality Q&A pairs and retrieve similar examples by vector similarity.
- SQL Agent mode: tool-calling loop with mandatory SQL validation before returning final SQL.
- Visualization recommendation: after SQL execution, the UI recommends chart plans and now includes a dynamic "chart thinking" state for better UX during suggestion generation.
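The join-path reasoning in the hybrid retrieval step can be illustrated with a small self-contained sketch: given seed tables from semantic search, walk the FK graph to find the chain of join conditions connecting them. In EasySQL that graph lives in Neo4j; here it is a toy in-memory dict, and the table names are invented for illustration.

```python
from collections import deque

# Toy FK graph: table -> [(neighbor, join condition)], standing in for Neo4j.
FK_GRAPH = {
    "orders": [("customers", "orders.customer_id = customers.id"),
               ("order_items", "order_items.order_id = orders.id")],
    "customers": [("orders", "orders.customer_id = customers.id")],
    "order_items": [("orders", "order_items.order_id = orders.id"),
                    ("products", "order_items.product_id = products.id")],
    "products": [("order_items", "order_items.product_id = products.id")],
}

def join_path(src: str, dst: str) -> list[str]:
    """BFS over FK edges: shortest chain of join conditions from src to dst."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        table, conds = queue.popleft()
        if table == dst:
            return conds
        for neighbor, cond in FK_GRAPH.get(table, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, conds + [cond]))
    return []  # no join path found
```

Multi-hop paths like `customers -> orders -> order_items -> products` are exactly the case where schema-only retrieval tends to produce wrong JOINs, and where an explicit graph walk helps.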
Challenges we ran into
- Enterprise schema complexity: hundreds of tables and naming inconsistency made naive retrieval unstable.
- Business-semantic gap: many business terms (like "net revenue") are defined in code, not in schema comments.
- Cross-model behavior differences: different LLM providers stream different content shapes.
  - Example: Gemini may stream structured list content, while OpenAI-compatible models often stream plain text.
  - We hardened stream normalization to support both safely.
- Gemini-first integration complexity: we optimized prompts and streaming handling specifically for Gemini 3 behavior in production-like traffic.
- Provider reliability under load: we observed transient `503 UNAVAILABLE` errors when Gemini was overloaded. This reinforced the need for resilient retries and multi-provider fallback.
These were not just bugs—they shaped our architecture decisions around robustness and observability.
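The stream-normalization hardening can be sketched as a single coercion helper. The chunk shapes below are simplified assumptions for illustration, not either provider's exact wire format:

```python
def normalize_chunk(chunk) -> str:
    """Coerce a streamed content chunk to plain text.

    Some providers stream plain strings; others stream structured lists of
    parts like [{"type": "text", "text": "..."}]. These shapes are
    simplified stand-ins, not exact provider wire formats.
    """
    if chunk is None:
        return ""
    if isinstance(chunk, str):
        return chunk
    if isinstance(chunk, list):   # list of structured parts
        return "".join(normalize_chunk(part) for part in chunk)
    if isinstance(chunk, dict):   # single structured part
        return str(chunk.get("text", ""))
    return str(chunk)             # last-resort fallback
```

Funneling every provider's stream through one normalizer means the rest of the agent loop only ever sees plain text, regardless of which backend produced it.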
Accomplishments that we're proud of
- Built an end-to-end enterprise-ready Text-to-SQL workflow, not just a prompt demo.
- Combined graph reasoning + vector retrieval + DDD code context in one coherent pipeline.
- Added transparent node-level execution steps in UI, including separate few-shot stage visibility.
- Implemented few-shot CRUD + retrieval loop to continuously improve domain adaptation.
- Hardened SQL Agent streaming compatibility across providers (Gemini and OpenAI-compatible ecosystems).
- Successfully built and demoed the core workflow using Gemini 3 as the primary model and Gemini CLI as our day-to-day engineering co-pilot.
- Delivered a practical product demo dataset (`examples/product`) suitable for hackathon judging.
What we learned
Accuracy is a systems problem, not a single-model problem. Better SQL comes from retrieval quality, context composition, and validation discipline.
DDD code is AI gold. For real companies, business-layer code is the richest source of semantic truth.
Observability builds trust. Showing intermediate steps helps users verify and debug model behavior.
Provider heterogeneity is real. Production AI systems must normalize responses and be resilient to transient API failures.
What's next for EasySQL
In the next phase, we plan to:
- Add robust retry/backoff + automatic provider fallback for LLM overload scenarios.
- Expand DDD code ingestion quality (better chunking, ranking, and policy-level extraction).
- Improve benchmark coverage on complex multi-join enterprise queries.
- Add governance features: query safety policies, audit trails, and role-aware guardrails.
- Provide stronger self-serve onboarding for teams to plug in their own domain repositories.
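The planned retry/backoff-plus-fallback behavior could look something like this minimal sketch, where `ProviderOverloaded` and the provider-callable interface are illustrative assumptions rather than EasySQL's actual types:

```python
import random
import time

class ProviderOverloaded(Exception):
    """Stand-in for a transient 503 UNAVAILABLE from an LLM provider."""

def call_with_fallback(providers, prompt, retries=3, base_delay=0.5):
    """Try each provider in order; retry transient failures with backoff."""
    last_error = None
    for call in providers:
        for attempt in range(retries):
            try:
                return call(prompt)
            except ProviderOverloaded as exc:
                last_error = exc
                # exponential backoff with jitter before the next attempt
                time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
        # this provider stayed overloaded; fall through to the next one
    raise last_error or RuntimeError("no providers configured")
```

Jittered exponential backoff avoids hammering an already-overloaded endpoint, and ordering the provider list lets a Gemini-first deployment degrade gracefully to an OpenAI-compatible fallback.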
Our long-term vision is simple:
EasySQL should not only answer questions about tables. It should answer questions the way your business actually defines them.
Built With
- langfuse
- langgraph
- milvus
- neo4j
- postgresql
- python
- react
- sqlalchemy
- typescript