DataMate

Project Story

Inspiration

In today's data-driven economy, organizations sit on valuable datasets but struggle with the friction of data collaboration. We've all experienced the painful process: endless emails to find the right data, manual negotiations over pricing and usage terms, and weeks of back-and-forth just to access a simple CSV file.

We were inspired by the vision of autonomous economic agents - what if AI agents could automatically discover, negotiate, and share data on behalf of their owners? Instead of humans spending hours coordinating data exchanges, intelligent agents could handle the entire process through natural language conversations.

What it does

DataMate creates a decentralized marketplace where AI agents automatically handle data discovery and sharing:

🤖 Consumer Agent: Understands your data needs in plain English ("I need e-commerce user behavior data for recommendation algorithms") and automatically searches for matching datasets across the network.

📊 Provider Agent: Continuously scans local data files, understands their semantic meaning, and automatically responds to relevant data requests with intelligent matching.

🤝 Autonomous Collaboration: Agents communicate directly using the A2A (Agent-to-Agent) protocol, eliminating the need for human intervention in routine data discovery and basic negotiations.

The entire process happens through intuitive chat interfaces powered by Google's ADK (Agent Development Kit), making advanced multi-agent collaboration accessible to non-technical users.

How we built it

Our architecture builds on an existing agent runtime, a standard agent-to-agent protocol, and a hosted language model rather than custom infrastructure:

Core Framework Stack:

Google ADK (Agent Development Kit): Provides the complete agent runtime with built-in web UI, memory management, and tool integration
A2A Protocol: Enables standardized agent-to-agent communication across different applications
Gemini 2.0 Flash: Powers natural language understanding and intelligent decision-making

Multi-Agent Architecture:

from google.adk.agents import Agent

# The tool callables referenced below (parse_requirements_tool, etc.)
# are defined elsewhere in the project.

# Consumer Agent - ADK with A2A client capabilities
consumer_agent = Agent(
    name="data_consumer_agent",
    model="gemini-2.0-flash",
    instruction="Understand user data needs and find matching providers",
    tools=[parse_requirements_tool, search_providers_tool],
)

# Provider Agent - ADK with A2A server capabilities
provider_agent = Agent(
    name="data_provider_agent",
    model="gemini-2.0-flash",
    instruction="Scan local data and respond to relevant requests",
    tools=[scan_data_tool, match_requests_tool, provide_access_tool],
)

Intelligent Data Processing:

Semantic Data Understanding: Automatically extract schema, generate tags, and assess data quality
Smart Matching Algorithm: Combine keyword matching with semantic similarity for accurate data discovery
Natural Language Processing: Convert user requirements into structured queries
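
To make the "Semantic Data Understanding" step concrete, here is a minimal sketch of profiling a CSV file. It is an illustration under our own assumptions, not the project's actual scanner: the function names (profile_csv, infer_type), the tag rule (tokens of column names), and the quality measure (share of non-empty cells) are all hypothetical, and it uses only the standard library rather than pandas.

```python
import csv
import io

def infer_type(values):
    """Return 'int', 'float', or 'str' for a list of non-empty strings."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"

def profile_csv(text):
    """Extract column types, simple tags, and a completeness score from CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    schema, missing, total = {}, 0, 0
    for col in columns:
        cells = [row[col] for row in rows]
        present = [c for c in cells if c not in ("", None)]
        missing += len(cells) - len(present)
        total += len(cells)
        schema[col] = infer_type(present) if present else "unknown"
    # Assumption: tags are simply the lowercase tokens of the column names.
    tags = sorted({tok for col in columns for tok in col.lower().split("_") if tok})
    # Assumption: "quality" is approximated by the share of non-empty cells.
    completeness = 1.0 - missing / total if total else 0.0
    return {"schema": schema, "tags": tags, "completeness": completeness}

SAMPLE = "customer_id,purchase_amount,country\n1,19.99,DE\n2,,FR\n3,5.00,\n"
profile = profile_csv(SAMPLE)
print(profile)
```

A Provider Agent tool could return a dictionary like this for each local file, giving the matching step a schema and tags to compare against incoming requests.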

Challenges we ran into

Agent Coordination Complexity: Initially, we tried building custom agent communication from scratch. The complexity of managing async messaging, error handling, and state synchronization was overwhelming. We solved this by adopting the standardized A2A protocol, which handled all the low-level communication details.

ADK Integration Learning Curve: Google's ADK is powerful but required understanding its opinionated architecture. We spent significant time learning how to properly integrate custom tools while leveraging ADK's built-in services (memory, sessions, UI). The breakthrough came when we embraced ADK's conventions rather than fighting them.

Natural Language to Data Schema Mapping: Getting agents to understand vague requests like "sales data for analysis" and match them to specific datasets with columns like customer_id and purchase_amount proved challenging. We solved this with a two-stage process: semantic embedding similarity first narrows the candidates, then schema compatibility scoring checks whether a dataset's columns actually fit the request.
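
The two-stage idea can be sketched roughly as follows. This is an assumption-laden illustration, not our production matcher: embed() is a toy bag-of-words stand-in for a real sentence embedding (for example from sentence-transformers), and the threshold min_similarity, the blend weight, the function names, and the sample datasets are all hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def schema_compatibility(required, available):
    """Fraction of required columns that the dataset actually provides."""
    if not required:
        return 1.0
    return len(set(required) & set(available)) / len(set(required))

def match(request, required_columns, datasets, min_similarity=0.2, weight=0.5):
    """Stage 1 filters by similarity; stage 2 blends in schema compatibility."""
    query = embed(request)
    results = []
    for ds in datasets:
        sim = cosine(query, embed(ds["description"]))
        if sim < min_similarity:  # stage 1: drop semantically unrelated datasets
            continue
        compat = schema_compatibility(required_columns, ds["columns"])  # stage 2
        results.append((ds["name"], weight * sim + (1 - weight) * compat))
    return sorted(results, key=lambda r: r[1], reverse=True)

DATASETS = [
    {"name": "retail_sales", "description": "retail sales data per customer",
     "columns": ["customer_id", "purchase_amount", "date"]},
    {"name": "web_sales", "description": "online sales data summary",
     "columns": ["session_id", "revenue"]},
    {"name": "weather", "description": "hourly temperature readings",
     "columns": ["station", "temp_c"]},
]
ranked = match("sales data for analysis", ["customer_id", "purchase_amount"], DATASETS)
print(ranked)
```

In this toy data, web_sales is textually closer to the request than retail_sales, but retail_sales ranks first because it has both required columns. That reordering is the reason for scoring schema compatibility separately from text similarity.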

Real-time UI Updates: Showing the agents' reasoning steps and A2A communication status in real time required careful state management. ADK's built-in streaming capabilities ultimately provided the solution.

Accomplishments that we're proud of

🏆 Seamless Multi-Agent Experience: We achieved autonomous agent collaboration - users can watch agents discover datasets and negotiate access without human intervention.

🚀 Production-Ready Architecture: Built on enterprise-grade frameworks (ADK + A2A), our solution is immediately scalable and maintainable, not just a hackathon demo.

💡 Natural Language Data Discovery: Our agents understand context and intent, turning "I need customer behavior data for ML training" into precise dataset matches.

🎯 End-to-End Automation: From data scanning to delivery, the entire pipeline runs autonomously while keeping humans in control through intuitive chat interfaces.

⚡ Sub-5-Second Response Times: Despite complex semantic processing, our agents respond in under five seconds thanks to optimized tool integration and efficient A2A communication.

What we learned

Agent Frameworks are Game-Changers: Using production-ready frameworks like ADK accelerated our development by 10x compared to building from scratch. The built-in UI, memory management, and tool integration eliminated weeks of boilerplate coding.

Standardized Agent Communication is Critical: The A2A protocol solved countless edge cases we didn't even know existed. Agent networks need robust standards to scale beyond toy examples.

Natural Language Understanding Requires Iteration: Getting agents to accurately parse user intent took multiple prompt engineering cycles and careful tool design. The key was building tools that fail gracefully and provide clear feedback.
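
As an illustration of a tool that fails gracefully, the sketch below returns a structured status and a readable message instead of raising an exception, so the agent can relay the problem to the user. The function name parse_requirements, the KNOWN_DOMAINS list, and the result keys are hypothetical and are not taken from our actual tools.

```python
import re

# Hypothetical list of data domains the tool recognizes.
KNOWN_DOMAINS = {"e-commerce", "sales", "marketing", "finance", "iot"}

def parse_requirements(request: str) -> dict:
    """Turn a free-text request into a structured query, or explain what is missing."""
    text = request.strip().lower()
    if not text:
        return {"status": "error",
                "error_message": "The request is empty. Describe the data you need."}
    # Tokenize into lowercase words, keeping internal hyphens (e.g. "e-commerce").
    words = set(re.findall(r"[a-z][a-z\-]*", text))
    domains = sorted(words & KNOWN_DOMAINS)
    if not domains:
        return {"status": "error",
                "error_message": "No known data domain found. Try one of: "
                                 + ", ".join(sorted(KNOWN_DOMAINS))}
    return {"status": "success", "query": {"domains": domains, "raw": request}}

ok = parse_requirements("I need e-commerce user behavior data")
bad = parse_requirements("something useful please")
print(ok["status"], bad["status"])
```

Because the error case carries a suggestion, the model can ask the user a targeted follow-up question rather than guessing or stopping silently.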

User Experience Makes or Breaks Agent Applications: Even with perfect backend logic, users will abandon agent applications with poor interfaces. ADK's professional UI components were essential for creating a compelling demo.

What's next for DataMate

🌐 Distributed Agent Network: Deploy agents across multiple organizations to create a true peer-to-peer data marketplace where agents automatically discover and share data across company boundaries.

💰 Automated Pricing and Negotiation: Implement dynamic pricing algorithms where Provider agents automatically adjust prices based on demand, data quality, and usage patterns.

🔒 Privacy-Preserving Data Sharing: Integrate differential privacy and secure multi-party computation so agents can share insights without exposing raw sensitive data.

📈 Google Cloud Integration: Leverage Google Cloud's data analytics and ML services for automated data quality assessment, anomaly detection, and intelligent recommendation systems.

🤖 Agent Ecosystem: Build a marketplace of specialized data agents - some focused on financial data, others on marketing analytics, IoT sensors, etc. - creating a rich ecosystem of autonomous data collaboration.

Built with

Agent Frameworks & AI:

Google ADK (Agent Development Kit)
A2A (Agent-to-Agent) Protocol
Gemini 2.0 Flash
Google Cloud AI Platform

Data Processing & Analysis:

Python 3.11+
pandas
numpy
sentence-transformers

Web & Communication:

FastAPI
httpx
uvicorn
WebSocket (real-time updates)

Development & Deployment:

Docker
asyncio
pydantic
python-dotenv

Updates

Jiaqi Wen started this project — Jun 14, 2025 10:20 AM EDT
