This is a proof of concept for a decentralized vector database. Reproducing Web2 AI applications directly on-chain would be wildly inefficient; instead, we use SP1 to generate proofs of the execution. SP1 made it easy to generate and verify ZK proofs with minimal understanding of the topic!
Motivation
Existing market options and drawbacks:
- Perplexity.ai is an amazing AI-powered search engine
  - TL;DR: an LLM focused on summarizing relevant websites and citing its sources
  - But it's centralized. Why is this an issue?
    - Lack of privacy; web search data is often a cornerstone of demographic modeling for big tech
    - Reliance on the company to maintain data quality and continually process new websites
- Privacy-focused search engines: decentralized ones like Presearch, self-hosted ones like Searx, etc.
  - Not popular, often due to poor result quality: low relevance or missing the latest websites
The end goal for this PoC is a decentralized version of Perplexity that addresses the drawbacks of both kinds of existing options: centralization on one side, poor result quality on the other.
Proposal
The project is split into two components:
- Self-hosted LLM
  - Users are free to use whatever model they want for summarization, including self-hosted open-source models like Llama
  - Due to the nature of the UX, half of the result-quality concerns are handled by the LLM itself (fixable with prompt engineering)
- Decentralized vector database
  - A public DB of website results for users
  - Assuming appropriate incentives:
    - Users can freely host nodes
    - Users can contribute new websites to the DB, addressing the concerns with data availability
This repo contains a webapp to demonstrate the end use case with the LLM, as well as a simple vector database built in Rust.
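To make the vector database idea concrete, here is a minimal sketch of what such a store could look like: websites are kept as embedding vectors and queried by cosine similarity. This is an illustration under assumptions, not the repo's actual code; the names `Entry`, `VectorDb`, `insert`, `search`, and `cosine_similarity` are hypothetical, the brute-force scan is just the simplest possible search strategy, and the SP1 proving step is left out entirely.

```rust
/// One indexed website: its URL plus an embedding of its content.
/// (Hypothetical layout; the real schema may differ.)
#[derive(Debug, Clone)]
pub struct Entry {
    pub url: String,
    pub embedding: Vec<f32>,
}

/// Hypothetical in-memory store used only for illustration.
#[derive(Debug, Default)]
pub struct VectorDb {
    entries: Vec<Entry>,
}

/// Cosine similarity between two vectors.
/// Returns 0.0 if the lengths differ or either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

impl VectorDb {
    pub fn new() -> Self {
        Self::default()
    }

    /// Add a website to the store (the "contribute new websites" step).
    pub fn insert(&mut self, url: &str, embedding: Vec<f32>) {
        self.entries.push(Entry {
            url: url.to_string(),
            embedding,
        });
    }

    /// Return up to `k` URLs ranked by cosine similarity to `query`,
    /// most similar first. Scans every entry (no index structure).
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(&str, f32)> {
        let mut scored: Vec<(&str, f32)> = self
            .entries
            .iter()
            .map(|e| (e.url.as_str(), cosine_similarity(&e.embedding, query)))
            .collect();
        // Sort descending by score; total_cmp gives a total order on f32.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```

In the intended flow, the top results from something like `search` would be handed to the user's LLM for summarization. A linear scan is deterministic and easy to reason about, which seems like a reasonable starting point for code that is meant to be proven, though a larger deployment would likely want an approximate nearest-neighbor index.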
Built With
- express.js
- javascript
- ollama
- react.js
- rust
- sp1