Inspiration

Tech teams spend days or weeks setting up data pipelines that should take minutes. Roughly 80% of backend developers end up doing pipeline work as a side task, without dedicated data engineering support. Current solutions are either too expensive (enterprise pricing) or too complex (requiring DevOps expertise). We saw engineers losing 10-20 hours per pipeline setup and companies paying for idle infrastructure they forgot to tear down.

What it does

DataFlow AI lets tech teams build data pipelines through conversation. Say "Create a pipeline from PostgreSQL to ClickHouse for audit logs" and in minutes you get a validated, secure pipeline configured automatically. The AI guides you through source selection, schema validation, transformation setup, and destination configuration. When you're done, one-click cleanup stops the bill.
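To make the flow concrete, here is a toy sketch of how a natural-language request might be turned into a structured pipeline spec. The real system uses a Gemini agent for this; the keyword matching, `PipelineSpec` fields, and connector names below are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineSpec:
    """Structured spec the agent fills in during the conversation."""
    source: str                      # e.g. "postgresql"
    destination: str                 # e.g. "clickhouse"
    topic: str                       # Kafka topic carrying the change stream
    transforms: list = field(default_factory=list)

def parse_request(text: str) -> PipelineSpec:
    """Toy intent extraction: map keywords in the request to a spec."""
    text = text.lower()
    sources = ["postgresql", "mysql", "mongodb"]
    dests = ["clickhouse", "s3", "postgresql"]
    source = next((s for s in sources if s in text), "unknown")
    dest = next((d for d in dests if d in text and d != source), "unknown")
    topic = "audit-logs" if "audit" in text else "events"
    return PipelineSpec(source=source, destination=dest, topic=topic)

spec = parse_request("Create a pipeline from PostgreSQL to ClickHouse for audit logs")
# spec.source == "postgresql", spec.destination == "clickhouse", spec.topic == "audit-logs"
```

In the product, the LLM fills this spec incrementally over several conversational turns rather than in one keyword pass.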

How we built it

  • Frontend: Next.js + Tailwind + shadcn/ui
  • Backend: FastAPI with Gemini 2.0 Flash (LangChain agents)
  • Data Pipeline: Confluent Kafka for real-time streaming
  • Processing: ksqlDB & Apache Flink for transformations
  • Destinations: ClickHouse, PostgreSQL, S3
  • Auth: Firebase OAuth for secure access
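The glue between the AI layer and the infrastructure layer is tool dispatch: the agent emits named tool calls, and a thin registry executes them against real services. A minimal, framework-agnostic sketch of that layer is below; the tool names, payload shapes, and return values are hypothetical (in production the connector tool would call the Confluent Connect REST API rather than return a stub).

```python
# Minimal sketch of the agent-to-infrastructure dispatch layer.
TOOLS = {}

def tool(fn):
    """Register a function so the LLM agent can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def create_connector(source: str, topic: str) -> dict:
    # Stub: in production this would hit the Confluent Connect REST API.
    return {"connector": f"{source}-cdc", "topic": topic, "status": "RUNNING"}

@tool
def create_sink(destination: str, topic: str) -> dict:
    # Stub: would provision a sink connector into the destination store.
    return {"sink": f"{topic}->{destination}", "status": "RUNNING"}

def dispatch(call: dict) -> dict:
    """Execute one tool call emitted by the model."""
    return TOOLS[call["name"]](**call["args"])

result = dispatch({"name": "create_connector",
                   "args": {"source": "postgresql", "topic": "audit-logs"}})
```

LangChain's tool abstraction plays this role for us; the point of the sketch is only that each conversational step maps to one auditable infrastructure action.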

Challenges we ran into

  • Building a full CDC (Change Data Capture) pipeline in a hackathon timeframe
  • Orchestrating multi-step pipeline creation through conversational AI
  • Implementing dynamic schema validation across different source/destination types
  • Estimating costs in real time before pipeline deployment
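The schema-validation challenge boils down to checking, before deployment, that every source column has a compatible home in the destination. A simplified sketch of that check follows; the type-compatibility table and column names are illustrative, and the real validator covers far more type pairs per source/destination combination.

```python
# Hypothetical type-compatibility table: (source type, destination type) -> OK?
COMPAT = {
    ("integer", "Int64"): True,
    ("text", "String"): True,
    ("timestamp", "DateTime"): True,
}

def validate_schema(source: dict, dest: dict) -> list:
    """Return a list of human-readable mismatch messages (empty = valid)."""
    issues = []
    for col, src_type in source.items():
        if col not in dest:
            issues.append(f"column '{col}' missing in destination")
        elif not COMPAT.get((src_type, dest[col]), False):
            issues.append(f"column '{col}': {src_type} -> {dest[col]} not supported")
    return issues

issues = validate_schema(
    {"id": "integer", "event": "text", "ts": "timestamp"},
    {"id": "Int64", "event": "String"},   # destination is missing 'ts'
)
```

Surfacing these messages back through the conversation is what lets the AI fix mismatches with the user before anything is provisioned.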

Accomplishments that we're proud of

  • Conversational pipeline builder — no YAML, no CLI, just describe what you need
  • Minutes, not weeks — what used to take DevOps sprints now takes one conversation
  • Built-in validation — AI catches schema mismatches and security issues before deployment
  • One-click cleanup — achieve your goal, tear it down, stop paying

What we learned

  • Confluent Cloud makes Kafka accessible for rapid prototyping
  • ksqlDB is powerful for streaming transformations without heavy Flink setup
  • LangChain tools enable seamless AI-to-infrastructure orchestration
  • Simple guided workflows beat complex configuration UIs for developer adoption
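As an example of the ksqlDB lesson: a streaming transformation is just a SQL statement submitted to ksqlDB's REST endpoint, with no Flink cluster to manage. The sketch below builds such a statement in Python; the stream names and the masked-email use case are illustrative, though `MASK` is a real ksqlDB scalar function.

```python
import json

def masked_stream_statement(source_stream: str, target_stream: str) -> str:
    """Build a ksqlDB statement that masks an email column on the fly."""
    return (
        f"CREATE STREAM {target_stream} AS "
        f"SELECT id, MASK(email) AS email, action "
        f"FROM {source_stream} EMIT CHANGES;"
    )

stmt = masked_stream_statement("audit_logs_raw", "audit_logs_masked")
payload = json.dumps({"ksql": stmt, "streamsProperties": {}})
# In production this payload is POSTed to the ksqlDB server's /ksql endpoint.
```

Because the transformation is a single declarative statement, the agent can generate, show, and tear it down as easily as any other pipeline step.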

What's next for DataFlow AI

  • Add more connectors: MongoDB, Snowflake, BigQuery, S3
  • Advanced transformation templates with Flink SQL
  • Real-time alerting and monitoring dashboards
  • Team collaboration and pipeline sharing
  • Target: 100 early adopters, enterprise pilot programs

Early respondents get beta access.

Take the 2-min survey at the end and you'll get an exclusive peek at what we're building.

survey.highguts.com

Built With

  • clickhouse
  • confluent
  • fastapi
  • gcp
  • gemini
  • kafka
  • langchain
  • nextjs
  • python
