Inspiration
AI agents today are stuck with whatever data they were trained on, which gets outdated fast and doesn't cover specialized domains well. We wanted to build a platform where anyone could create their own AI agent with access to fresh, verified data from any source on the internet.
What it does
MAI lets you build custom AI agents that answer questions using real-time data scraped from any website. The process is straightforward: you provide the URLs you want to scrape, and our system pulls the content through Bright Data's API, converts the text into searchable vector embeddings with OpenAI, stores them in ChromaDB for semantic search, and records cryptographic proofs on the Sui blockchain. When someone queries your agent, it searches your custom dataset and returns relevant answers with full source attribution and verification. Every piece of data is covered by a Merkle root stored on-chain, so you can prove exactly where and when it came from. It's like building your own specialized ChatGPT that knows only the topics you care about, whether that's the latest tech news, research papers, company docs, or anything else you can scrape from the web, with blockchain-backed proof that the information is legitimate and traceable.
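To make the Merkle root idea concrete, here is a minimal sketch of how a single root can be derived from a list of scraped text chunks. It is an illustration under assumptions, not the project's actual contract or backend code: the choice of SHA-256, the leaf/node prefix bytes, the rule of promoting an unpaired node, and the name `merkleRoot` are all hypothetical.

```typescript
import { createHash } from "node:crypto";

// SHA-256 is an assumed choice; the project's hash function is not stated.
function sha256(data: Buffer): Buffer {
  return createHash("sha256").update(data).digest();
}

// Compute a hex-encoded Merkle root over text chunks.
// Leaves are prefixed with 0x00 and internal nodes with 0x01 so a leaf hash
// can never be reinterpreted as an internal node. An unpaired node at the end
// of a level is promoted to the next level unchanged.
function merkleRoot(chunks: string[]): string {
  if (chunks.length === 0) {
    throw new Error("cannot build a Merkle tree from zero chunks");
  }
  let level: Buffer[] = chunks.map((chunk) =>
    sha256(Buffer.concat([Buffer.from([0x00]), Buffer.from(chunk, "utf8")]))
  );
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      if (i + 1 < level.length) {
        next.push(
          sha256(Buffer.concat([Buffer.from([0x01]), level[i], level[i + 1]]))
        );
      } else {
        next.push(level[i]); // odd node out: carry it up as-is
      }
    }
    level = next;
  }
  return level[0].toString("hex");
}
```

Changing any chunk changes the root, so storing only the 32-byte root is enough to later detect whether a dataset was altered.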
How we built it
The backend runs on Node.js with TypeScript handling all the API orchestration. Bright Data's Web Unlocker API handles the scraping so we can get past anti-bot measures and pull content reliably from any site. We chunk the scraped text and send it to OpenAI's embedding API to generate 1536-dimensional vectors that capture semantic meaning. These vectors go into ChromaDB for fast similarity search when users query agents. Supabase stores all the metadata about datasets, agents, API keys, and user info. On the blockchain side, we wrote 780 lines of Move code for Sui smart contracts that handle data provenance by storing Merkle roots and that manage payment distribution for dataset creators. The frontend is React with TypeScript, using Tailwind and shadcn components for the UI, with forms for creating agents and datasets plus an analytics dashboard showing usage metrics. The services communicate through REST APIs, and the whole pipeline, from scraping a URL to querying an agent, takes about 10 seconds end to end.
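The chunking step described above can be sketched roughly as follows. This is a simple fixed-size, overlapping character window, which is only one possible strategy; the function name `chunkText` and the default sizes are assumptions for illustration, not the values the project used.

```typescript
// Split text into fixed-size character windows that overlap, so a sentence
// cut at one chunk boundary still appears whole in the neighbouring chunk.
// chunkSize and overlap defaults are hypothetical, not the project's values.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in the range [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text, so no trailing
    // chunk is emitted that only repeats the overlap.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Example: chunkText("abcdefghij", 4, 2) returns ["abcd", "cdef", "efgh", "ghij"]
```

Each chunk would then be sent to the embedding API and stored alongside its source URL, so query results can point back to where the text came from.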
Challenges we ran into
Getting ChromaDB to work with real embeddings was harder than expected because the local instance is strict about vector dimensions, and we had to figure out the right chunking strategy so searches would actually return relevant results. Coordinating the data schemas across four different systems (Supabase, ChromaDB, Sui contracts, and our API) meant a lot of field mapping and making sure IDs matched up correctly. The blockchain integration required learning Move's ownership model and figuring out how to generate and verify Merkle proofs efficiently. Web scraping is inherently unreliable, so we had to build in retry logic and handle cases where sites return minimal content or block requests entirely. Time pressure forced trade-offs between features, and we had some tense moments debugging why the frontend wasn't talking to the backend properly.
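The retry logic mentioned above might look something like this minimal sketch: exponential backoff around any async operation, plus a check that treats near-empty pages as failures so they get retried too. The names `withRetry` and `assertEnoughContent`, the attempt count, the delays, and the 200-character threshold are all hypothetical, and the actual scraping call is left as a caller-supplied function.

```typescript
// Run an async operation, retrying with exponential backoff on failure.
// Waits baseDelayMs, then 2x, then 4x, ... between attempts.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  if (maxAttempts < 1) {
    throw new Error("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}

// Reject pages that came back nearly empty (for example, a block page or a
// JavaScript-only shell). The threshold is an assumed value.
function assertEnoughContent(text: string, minChars = 200): string {
  if (text.trim().length < minChars) {
    throw new Error(`scraped content too short (${text.trim().length} chars)`);
  }
  return text;
}
```

A caller could wrap its scrape function as `withRetry(async () => assertEnoughContent(await scrape(url)))`, where `scrape` stands in for whatever function actually fetches the page.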
Accomplishments that we're proud of
We got the full pipeline working end to end, from scraping a live website with Bright Data to storing vectors in ChromaDB to querying them through our UI. The 780 lines of Move smart contracts we wrote implement complete data provenance logic with Merkle tree verification and payment distribution mechanisms. The frontend turned out really clean, with proper form validation, loading states, error handling, and an analytics dashboard. We successfully scraped and ingested multiple real websites, including TechCrunch, Hacker News, and Anthropic's blog. The architecture we designed holds together with all four major services integrated and working in concert, which felt like a minor miracle given the time constraints.
What we learned
Sui's Move language is actually really elegant for handling ownership and verification once you understand the object model, and having first-class support for cryptographic primitives made the provenance logic cleaner than it would have been on other chains. Vector search is more nuanced than we thought because chunking strategies dramatically affect result quality and you need way more data points than we initially assumed for good semantic matching. Building trust in AI systems really does require cryptographic verification rather than just promises, and having immutable blockchain records adds legitimacy. Bright Data's infrastructure is incredibly powerful for web scraping at scale and handles edge cases we would have spent days debugging ourselves. You can actually integrate four complex services in 24 hours if you architect carefully upfront and don't get stuck bikeshedding.
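To show why chunk quality and dataset size matter so much, here is a small sketch of what a similarity search does under the hood: score every stored chunk against the query vector and keep the best few. It uses cosine similarity as one common metric; which metric the project's ChromaDB collection used is not stated, and the names `cosineSimilarity`, `StoredChunk`, and `topK` are illustrative only.

```typescript
// Cosine similarity between two vectors of the same dimension.
// Returns 0 when either vector has zero length, to avoid dividing by zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical shape of a stored chunk: an id plus its embedding vector.
interface StoredChunk {
  id: string;
  embedding: number[];
}

// Brute-force nearest-neighbour search: score every chunk, sort by score
// descending, and return the k best matches.
function topK(
  query: number[],
  chunks: StoredChunk[],
  k: number
): { id: string; score: number }[] {
  return chunks
    .map((chunk) => ({
      id: chunk.id,
      score: cosineSimilarity(query, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Because the search can only return chunks that exist, a dataset with few or poorly cut chunks tends to return the same loosely related passages for every query, which matches the lesson above about needing more data points.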
What's next for MAI
Deploy the Sui contracts to testnet and hook up real blockchain transactions instead of mock mode. Add support for data sources beyond web scraping, such as PDFs, APIs, databases, and document uploads. Build a proper marketplace where people can browse and purchase access to other users' agents and datasets. Implement usage-based pricing and revenue sharing so dataset creators can monetize their work. Add collaborative features so teams can build agents together and share datasets internally. Improve vector search quality with better chunking algorithms and possibly fine-tuned embeddings for specific domains. Add agent composition so you can combine multiple specialized agents into more powerful meta-agents.
Built With
- brightdata
- chroma
- openai
- sui
- supabase
- typescript
