ZK Vector DB

Inspiration

Perplexity is an LLM-powered search engine.
Problems:
- lack of privacy, since the company collects our data
- reliance on a single company to keep providing high-quality results

So, let's decentralize the database portion! Assuming sound incentives, this lets users add web results and host nodes themselves. The long-term goal is for it to converge into the world's single source of truth.

What it does

A vector database stores key-value pairs. The key is a vector; the value can be anything. In this example, the key is an embedding (a list of numbers that represents the meaning of some text), and the value is a reference to the website whose data was used to create the embedding. The vector DB is designed to perform similarity search and return the top-n results extremely quickly.
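To make the key-value idea concrete, here is a minimal sketch of such a store in plain Rust. It is an illustration, not the project's actual code: the names `Entry`, `VectorDb`, `insert`, and `search` are hypothetical, the value is assumed to be a URL string, and the search is a brute-force scan using cosine similarity (production vector DBs typically use an approximate index to get their speed).

```rust
/// One stored pair: an embedding (key) and a reference to its source (value).
/// Storing the value as a URL string is an assumption for this sketch.
#[derive(Debug, Clone)]
pub struct Entry {
    pub embedding: Vec<f32>,
    pub source: String,
}

/// Hypothetical in-memory vector store.
#[derive(Debug, Default)]
pub struct VectorDb {
    entries: Vec<Entry>,
}

/// Cosine similarity of two vectors.
/// Returns 0.0 for mismatched lengths or zero-length vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

impl VectorDb {
    pub fn new() -> Self {
        Self::default()
    }

    /// Store an embedding together with the source it was derived from.
    pub fn insert(&mut self, embedding: Vec<f32>, source: &str) {
        self.entries.push(Entry {
            embedding,
            source: source.to_string(),
        });
    }

    /// Brute-force top-n search: score every entry, sort descending, keep n.
    pub fn search(&self, query: &[f32], n: usize) -> Vec<(f32, &str)> {
        let mut scored: Vec<(f32, &str)> = self
            .entries
            .iter()
            .map(|e| (cosine_similarity(query, &e.embedding), e.source.as_str()))
            .collect();
        scored.sort_by(|a, b| b.0.total_cmp(&a.0));
        scored.truncate(n);
        scored
    }
}
```

Calling `db.search(&query, 3)` on a populated `VectorDb` would return the three entries whose embeddings point in the direction closest to `query`, each paired with its similarity score.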

How we built it

We built an abstraction of the DB in Rust and used Succinct SP1 to prove that the search executed correctly. The proof can be verified on-chain, which also means users can interact with the database through a smart contract while minimizing load on the network.

Challenges we ran into

ZK proofs are tricky to get right with anything related to AI, because the workloads rely heavily on vectors, floating-point numbers, and complex calculations (e.g. vector cosine similarity). Many of the options we tried also failed because they hooked into other programming languages or had irremovable dependencies on randomness. The easiest route for us was to build a PoC in plain Rust with minimal libraries. We were also not that familiar with Rust.
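One common way around the floating-point problem, shown here only as a sketch and not necessarily what this project did, is to normalize and quantize embeddings to integers before they reach the proven code. For unit-length vectors, cosine similarity reduces to a dot product, so the ranking can be done with integer arithmetic alone. The scale factor and the names `quantize_unit`, `dot_fixed`, and `top_n_fixed` are assumptions made for illustration.

```rust
/// Fixed-point scale (an assumption for this sketch): 1.0 maps to 4096.
pub const SCALE: f32 = 4096.0;

/// Normalize a vector to unit length and quantize it to integers.
/// Intended to run outside the proven program, so the proven part
/// never touches floating point. A zero vector stays all zeros.
pub fn quantize_unit(v: &[f32]) -> Vec<i32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return vec![0; v.len()];
    }
    v.iter().map(|x| (x / norm * SCALE).round() as i32).collect()
}

/// Integer dot product, widened to i64 so the products cannot overflow.
pub fn dot_fixed(a: &[i32], b: &[i32]) -> i64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| i64::from(*x) * i64::from(*y))
        .sum()
}

/// Indices of the n best candidates by integer dot product.
/// Ties are broken by the lower index so the result is deterministic.
pub fn top_n_fixed(query: &[i32], candidates: &[Vec<i32>], n: usize) -> Vec<usize> {
    let mut scored: Vec<(i64, usize)> = candidates
        .iter()
        .enumerate()
        .map(|(i, c)| (dot_fixed(query, c), i))
        .collect();
    scored.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(&b.1)));
    scored.into_iter().take(n).map(|(_, i)| i).collect()
}
```

The trade-off is precision: quantization rounds each component, so two candidates with nearly identical similarity may swap places compared with a floating-point ranking.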