This is a proof of concept for a decentralized vector database. Web2 AI applications would be wildly inefficient if reproduced on-chain; instead, we run the computation off-chain and use SP1 to generate ZK proofs of its execution. SP1 made it easy to generate and verify ZK proofs with minimal understanding of the topic!

Motivation

Existing market options and drawbacks:

  1. Perplexity.ai is an amazing AI-powered search engine
    • TLDR: an LLM that is focused on summarizing relevant websites and cites its sources
    • But it's centralized. Why is this an issue?
      • Lack of privacy; web search data is often a cornerstone of demographic modeling for big tech
      • Reliance on the company to maintain data quality and continually process new websites
  2. Privacy-focused search engines: decentralized ones like Presearch, self-hosted ones like Searx, etc.
    • Not popular, often due to poor results quality: low relevance or not returning latest websites

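The end goal for this PoC is a decentralized version of Perplexity that addresses both sets of problems at once: the privacy and centralization issues of Perplexity, and the poor result quality of existing privacy-focused search engines.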

Proposal

The project is split into two components:

  1. Self-hosted LLM
    • users are free to use whatever models they want to do summarization, including self-hosted open-source models like Llama
    • because the LLM sits between the search results and the user, half of the result-quality concerns are handled by the LLM itself (and are fixable with prompt engineering)
  2. Decentralized vector database
    • a public DB of website results for users to query
    • assuming appropriate incentives:
      • users can freely host nodes
      • users can contribute new websites to the DB, addressing the concerns with data availability

This repo contains a webapp to demonstrate the end use case with the LLM, as well as a simple vector database built in Rust.
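
To make the "simple vector database" idea concrete, here is a minimal sketch of what such a store could look like in Rust. It is not the code from this repo: the names `VectorDb`, `Entry`, `insert`, and `search` are hypothetical, embeddings are assumed to be computed elsewhere, and ranking uses a brute-force cosine-similarity scan. SP1 proof generation is not shown.

```rust
/// One stored page: its URL and an embedding computed elsewhere.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone)]
pub struct Entry {
    pub url: String,
    pub embedding: Vec<f32>,
}

/// A toy in-memory vector store with brute-force search.
#[derive(Debug, Default)]
pub struct VectorDb {
    entries: Vec<Entry>,
}

/// Cosine similarity of two vectors.
/// Returns 0.0 if the lengths differ or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

impl VectorDb {
    pub fn new() -> Self {
        Self::default()
    }

    /// Add a website and its embedding to the store.
    pub fn insert(&mut self, url: &str, embedding: Vec<f32>) {
        self.entries.push(Entry {
            url: url.to_string(),
            embedding,
        });
    }

    /// Return up to `k` (url, score) pairs, most similar first.
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(String, f32)> {
        let mut scored: Vec<(String, f32)> = self
            .entries
            .iter()
            .map(|e| (e.url.clone(), cosine_similarity(query, &e.embedding)))
            .collect();
        // Sort by score, highest first; total_cmp gives a total order on f32.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```

A caller would `insert` each website's embedding and then call `search` with the query embedding, passing the top results to the LLM for summarization. The linear scan keeps the logic deterministic and easy to reason about, which seems like a reasonable starting point for code that is meant to be proven, though a real deployment would likely need an index.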
