Inspiration

Working as a developer at a modern data catalog company for three years has taught me a lot about data observability. I saw firsthand why visibility into data across teams and silos matters so much when data teams support key business decisions. A data lineage graph is therefore one of the most important assets an organization can have. I wanted to tackle this million-dollar problem with graph analytics powered by agentic AI.

What it does

We work with a data lineage graph: a graph of all the data sources and how they relate, for example how a table is turned into a view, which is then consumed by a Tableau dashboard showing critical business metrics. Consider a few use cases. Suppose some executives feel that something on a dashboard looks wrong, in an organization with millions or even billions of data assets. How do you find the root cause without depending on other teams or roles? Simple: you look at the lineage graph for that dashboard, see what its sources are, and check whether any of them is broken. You can ask the agent, “Do a BFS from this dashboard and find me everything upstream of it,” and you know where to fix it. Running the same BFS over descendants instead shows which assets a broken source affects. Now say a few assets matter more than the rest and you have to maintain a high SLA for them. You analyze the graph to find the most important assets, then focus on keeping them up all the time.
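The two questions above (what feeds a dashboard, and which assets matter most) can be sketched in plain Python. This is only an illustration under stated assumptions: the asset names and edges are made up, and "importance" is approximated by the number of downstream assets, which may differ from the measure the project actually uses.

```python
from collections import deque

# Hypothetical lineage edges: (source asset, consuming asset).
EDGES = [
    ("raw.orders", "stg.orders"),
    ("raw.customers", "stg.customers"),
    ("stg.orders", "view.revenue"),
    ("stg.customers", "view.revenue"),
    ("view.revenue", "tableau.exec_dashboard"),
    ("stg.orders", "tableau.ops_dashboard"),
]


def build_adjacency(edges, reverse=False):
    """Build an adjacency dict; reverse=True flips every edge."""
    adj = {}
    for src, dst in edges:
        if reverse:
            src, dst = dst, src
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])
    return adj


def bfs(adj, start):
    """Return every node reachable from start, in BFS order, excluding start."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def upstream_of(asset, edges=EDGES):
    """Root-cause view: everything the asset depends on."""
    return bfs(build_adjacency(edges, reverse=True), asset)


def downstream_of(asset, edges=EDGES):
    """Impact view: everything that depends on the asset."""
    return bfs(build_adjacency(edges), asset)


def rank_by_impact(edges=EDGES):
    """Rank assets by how many downstream assets they feed (assumed proxy for importance)."""
    adj = build_adjacency(edges)
    counts = {node: len(bfs(adj, node)) for node in adj}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(upstream_of("tableau.exec_dashboard"))
    print(downstream_of("stg.orders"))
    print(rank_by_impact()[:3])
```

Ranking by downstream count is a deliberately simple choice; a centrality measure such as PageRank could be swapped in without changing the traversal code.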

How we built it

Prebuilt data lineage graphs are hard to find because they don’t really exist publicly, so I created a synthetic data lineage graph and loaded it into ArangoGraph. I then used NetworkX with the cuGraph adapter to run the graph queries, and a LangGraph agent flow to make sure users’ requests translate into the required graph queries.
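As a rough idea of what generating a synthetic lineage graph could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the three asset layers, the `assets` collection name, the function name, and its parameters are hypothetical and not taken from the project. The edge dictionaries use `_from` and `_to` fields in the `collection/key` form that ArangoDB edge documents expect.

```python
import random


def make_synthetic_lineage(n_tables=5, n_views=4, n_dashboards=3,
                           max_parents=2, seed=42, collection="assets"):
    """Generate a layered lineage DAG: tables -> views -> dashboards.

    Returns (nodes, edges) as lists of dicts shaped like ArangoDB
    vertex and edge documents. The collection name is hypothetical.
    """
    rng = random.Random(seed)  # seeded so the same graph can be regenerated

    layers = {
        "table": [f"table_{i}" for i in range(n_tables)],
        "view": [f"view_{i}" for i in range(n_views)],
        "dashboard": [f"dashboard_{i}" for i in range(n_dashboards)],
    }

    nodes = [
        {"_key": name, "type": kind}
        for kind, names in layers.items()
        for name in names
    ]

    edges = []
    # Each view reads from 1..max_parents tables; each dashboard from views.
    for parent_kind, child_kind in (("table", "view"), ("view", "dashboard")):
        parents = layers[parent_kind]
        for child in layers[child_kind]:
            k = rng.randint(1, min(max_parents, len(parents)))
            for parent in rng.sample(parents, k):
                edges.append({
                    "_from": f"{collection}/{parent}",
                    "_to": f"{collection}/{child}",
                })
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = make_synthetic_lineage()
    print(len(nodes), "assets,", len(edges), "lineage edges")
```

Because edges only ever point from one layer to the next, the result is acyclic by construction, which matches how lineage normally flows from sources to dashboards.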

What's next for LineageImpact

The next step is to find a way, if possible, to extend this work so that enterprises can use it.

Data lineage is one of the most important observability and governance tools for data teams. Giving those teams the superpower to quickly find critical assets and failure points is what we are working on.

Built With

  • arangodb
  • arangograph
  • cugraph
  • langgraph
  • openai