What it does
We built a semantic search engine in Databricks to search HEB data efficiently. It combines a vector database of GPT-generated (OpenAI) embeddings, L2 similarity scoring, and a hybrid search method to find relevant results. We tested many datasets, models, and methods (e.g., cross adaptation) and kept the approach that performed best.
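To make the ranking idea concrete, here is a minimal, framework-free sketch of L2-based scoring combined with a keyword term, which is one common way to do hybrid search. It is not our Databricks code. The 1 / (1 + distance) mapping, the token-overlap keyword score, the weight alpha, and the helper names (l2_distance, keyword_score, hybrid_search) are all illustrative assumptions, and the 2-d vectors stand in for real embeddings.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def keyword_score(query, text):
    """Fraction of query tokens that also appear in the document text (assumed scoring)."""
    q_tokens = set(query.lower().split())
    d_tokens = set(text.lower().split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & d_tokens) / len(q_tokens)


def hybrid_search(query, query_vec, docs, alpha=0.7, k=3):
    """Rank docs by alpha * vector similarity + (1 - alpha) * keyword overlap.

    docs is a list of (doc_id, text, embedding) tuples. L2 distance is mapped
    into (0, 1] as 1 / (1 + d), so closer vectors get higher similarity.
    """
    scored = []
    for doc_id, text, vec in docs:
        vec_sim = 1.0 / (1.0 + l2_distance(query_vec, vec))
        score = alpha * vec_sim + (1 - alpha) * keyword_score(query, text)
        scored.append((doc_id, score))
    # Highest combined score first; return the top-k document ids.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


# Toy data: hypothetical products with made-up 2-d "embeddings".
docs = [
    ("bread", "whole wheat sandwich bread", [0.9, 0.1]),
    ("milk", "organic whole milk", [0.1, 0.9]),
    ("bagel", "plain bagels", [0.8, 0.2]),
]
top = hybrid_search("whole wheat bread", [1.0, 0.0], docs, k=2)
print(top)  # ['bread', 'bagel']
```

Here "bagel" outranks "milk" even though only "milk" shares a query word, because its embedding is much closer to the query; lowering alpha would shift weight toward exact keyword matches.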
How we built it
* We built the project on Databricks.
* We used the Git ingestion capability to pull the provided JSON files directly into the platform, where they were ingested by a serverless data warehouse.
* We used the SQL interface to combine the provided categories into an effective aggregate category field.
* We created model serving endpoints for custom embedding approaches, which we then applied to the datasets.
* We made extensive use of the databricks.vector_search Python package to work with the vector embeddings, especially its similarity_search() method, which (as we learned) scores results against a query using L2 similarity. This process formed the core of our product.
* We used the generative AI capabilities to build simple RAG models that tried to produce better aggregate categories for embedding.
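The category aggregation step above was done in SQL; as a rough illustration of the idea, the Python sketch below flattens one product record into a single string that could then be embedded. The field names (name, categories, description), the " > " and " | " separators, and the build_embedding_text helper are hypothetical, not the actual HEB schema or our query.

```python
import json


def build_embedding_text(product):
    """Flatten a product record into one string for embedding.

    Field names are hypothetical; missing or blank fields are skipped.
    """
    # Join the category hierarchy, trimming stray whitespace and dropping blanks.
    categories = " > ".join(
        c.strip() for c in product.get("categories", []) if c.strip()
    )
    parts = [product.get("name", ""), categories, product.get("description", "")]
    return " | ".join(p for p in parts if p)


record = json.loads(
    '{"name": "Sourdough Loaf", "categories": ["Bakery", " Bread "],'
    ' "description": "Fresh baked daily"}'
)
print(build_embedding_text(record))  # Sourdough Loaf | Bakery > Bread | Fresh baked daily
```

Putting the category path into the embedded text lets a query like "bakery bread" match products whose names alone would not mention those words.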
Challenges we ran into
Learning Databricks was a challenge, and some optimization features we wanted seemed locked down until we dug into the documentation with help from the Databricks team here at the datathon (thank you!).
Accomplishments that we're proud of
Being at the top of the HEB challenge for the last several hours (and hopefully at the end, too!)
Building something in a cutting-edge tool like Databricks
What we learned
Databricks and semantic search.
What's next for Bread Search
Built With
- databricks
- openai
- semantic-engines-semantic
