Bread Search

What it does

We built a semantic search engine in Databricks to search HEB data efficiently. It combines a GPT-powered vector embedding database, L2 similarity scoring, and a hybrid search method to find relevant information. We tested many datasets, models, and methods (e.g., cross adaptation) and picked the most successful combination.
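To illustrate the hybrid search idea, here is a minimal sketch of one common way to merge a keyword ranking with an embedding ranking: reciprocal rank fusion. This is an assumption made for illustration; the write-up does not say how the two result lists were actually combined, and the function name, the constant k=60, and the product names are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists, best match first.
keyword_hits = ["sourdough", "rye", "bagel"]      # from a keyword search
vector_hits = ["rye", "baguette", "sourdough"]    # from an embedding search

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Items that appear in both lists ("rye", "sourdough") outrank items found by only one search, which is the behavior a hybrid method is after.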

How we built it

*We leveraged Databricks to build it.

*We used the Git ingestion capability to pull the provided JSON files directly into the platform, where they were ingested by a serverless data warehouse.

*We used the SQL interface to manipulate the provided categories to create an effective aggregate of categories.

*We created serving endpoints and models to implement custom embedding approaches, which we then applied to the datasets.

*We made extensive use of the databricks.vector_search package to work with vector embeddings, especially similarity_search(), which (as we learned) scores queries by L2 similarity. This process formed the core of our product.

*We used the generative AI capabilities to build simple RAG models that attempted to create better aggregate categories for embedding.
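The L2 similarity scoring at the core of the steps above can be sketched in plain Python. This is a brute-force illustration, not the Databricks implementation: the distance-to-score mapping 1 / (1 + d^2) is an assumption, and the two-dimensional vectors and product names are made up (real embeddings have hundreds of dimensions or more).

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l2_similarity(a, b):
    """Map distance to a score in (0, 1]; identical vectors score 1.0.

    Assumed mapping: 1 / (1 + d^2).
    """
    d = l2_distance(a, b)
    return 1.0 / (1.0 + d * d)


def top_k(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, l2_similarity(query_vec, vec))
              for doc_id, vec in index.items()]
    # Highest score first; ties are broken by id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical embeddings keyed by aggregate category text.
toy_index = {
    "white bread": [1.0, 0.0],
    "tortillas": [0.0, 1.0],
    "wheat bread": [0.8, 0.2],
}

results = top_k([1.0, 0.0], toy_index, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

A managed vector index replaces the linear scan here with an approximate nearest-neighbor structure, but the ranking principle is the same: a smaller L2 distance means a higher score.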

Challenges we ran into

Learning Databricks was a challenge, and some optimization features we wanted seemed locked down until we dug into the documentation with the help of the Databricks team here at the datathon. (Thank you!)