Semantic Similarity Search

Inspiration

We wanted to give consumers and businesses in the grocery retail sector a more intuitive, efficient, and intelligent search experience. By addressing language barriers, spelling errors, and vague search queries, we aim to make grocery shopping more accessible and user-friendly for everyone.

What it does

Enhances search functionality: Replaces traditional keyword-based search with intelligent and user-focused features.
Improves accuracy: Delivers precise search results by understanding user intent and context.
Supports fuzzy search: Handles spelling errors effortlessly (e.g., "rise" → "rice").
Implements semantic search: Understands contextual queries like "show beverages" or "small pack sizes."
Includes regional language support: Bridges language gaps with Hindi-to-English mapping (e.g., "chawal" → "rice").
Optimizes user experience: Customizes results for users, enhancing satisfaction for both consumers and businesses.
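
To make the fuzzy and regional-language features above concrete, here is a minimal sketch of query normalization. It is an illustration only, not the project's actual code: the mapping table, the vocabulary, the `normalize_query` name, and the 0.7 cutoff are all assumptions, and Python's standard-library `difflib` stands in for fuzzywuzzy so the example runs without extra installs.

```python
import difflib

# Hypothetical Romanized-Hindi-to-English table (the project's real mapping is not shown).
HINDI_TO_ENGLISH = {"chawal": "rice", "doodh": "milk", "cheeni": "sugar"}

# Hypothetical vocabulary, e.g. words collected from product names and categories.
VOCABULARY = ["rice", "milk", "sugar", "beverages", "flour", "tea"]


def normalize_query(query, cutoff=0.7):
    """Map regional terms to English, then correct likely misspellings."""
    tokens = []
    for token in query.lower().split():
        if token in HINDI_TO_ENGLISH:
            # Regional term: replace with its English counterpart.
            tokens.append(HINDI_TO_ENGLISH[token])
        elif token in VOCABULARY:
            # Already a known word: keep as-is.
            tokens.append(token)
        else:
            # Unknown word: take the closest vocabulary word if it is similar enough.
            matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
            tokens.append(matches[0] if matches else token)
    return " ".join(tokens)


print(normalize_query("rise"))
print(normalize_query("chawal"))
```

Words that are neither mapped nor close to any vocabulary entry pass through unchanged, so a query like "show beverages" keeps "show" for the later intent-handling step.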

How we built it

Frontend:
- HTML/CSS: Designed and styled the user interface to make it clean and user-friendly.
- JavaScript: Added interactivity to enable seamless query submission and dynamic result displays.
Backend:
- Python: Handled data preprocessing, vectorization, and the core search logic.
- TF-IDF and Truncated SVD: Used for vectorization and dimensionality reduction of product data.
- FAISS: Implemented for efficient similarity-based search over large datasets.
Search Features:
- Fuzzy Matching: Incorporated to correct spelling errors and improve query interpretation.
- Semantic Understanding: Engineered to handle specific user intents like size or category-related searches.
- Regional Language Handling: Included preprocessing logic to map regional keywords to their English counterparts.
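
The retrieval idea behind the backend can be sketched in a few lines. This is a simplified stand-in, not our implementation: it computes TF-IDF vectors in pure Python and ranks products by brute-force cosine similarity, where the project used TF-IDF with Truncated SVD and a FAISS index. The product names and the function names (`build_index`, `search`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical product catalogue for illustration.
PRODUCTS = [
    "basmati rice 5 kg",
    "brown rice 1 kg",
    "whole milk 1 litre",
    "green tea 100 g",
]


def tokenize(text):
    return text.lower().split()


def to_vector(tokens, idf):
    """Build an L2-normalized TF-IDF vector as a sparse dict."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(documents):
    """Compute smoothed IDF weights and one vector per document."""
    tokenized = [tokenize(d) for d in documents]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [to_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def search(query, documents, idf, vectors, top_k=3):
    """Rank documents by cosine similarity to the query (brute force)."""
    q = to_vector(tokenize(query), idf)
    scored = [
        (sum(w * vec.get(t, 0.0) for t, w in q.items()), doc)
        for doc, vec in zip(documents, vectors)
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [(doc, score) for score, doc in ranked[:top_k] if score > 0]


idf, vectors = build_index(PRODUCTS)
for doc, score in search("basmati rice", PRODUCTS, idf, vectors):
    print(f"{score:.3f}  {doc}")
```

Brute-force scoring is fine for a handful of products. For a large catalogue, reducing the vectors with SVD and searching them through an approximate or indexed nearest-neighbour library such as FAISS avoids comparing the query against every product.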

Challenges we ran into

Data Quality: Handling inconsistencies, missing fields, and non-standard formats in the dataset.
Query Interpretation: Ensuring that both fuzzy and semantic search features deliver relevant results for diverse query types.
Performance Optimization: Managing computational efficiency for large datasets with minimal latency.
Language Mapping: Translating and normalizing regional terms like Romanized Hindi into English equivalents.
Frontend-Backend Integration: Establishing a smooth data flow between the user interface and backend systems.

Accomplishments that we're proud of

Developed an efficient semantic search engine.
Enhanced accessibility with regional language support, making the tool more inclusive.

What we learned

Data Handling: Preprocessing and cleaning large datasets to ensure high-quality results.
Search Technologies: Understanding and leveraging tools like TF-IDF, SVD, and FAISS for efficient query handling.
Backend Integration: Building robust APIs for secure and efficient data transfer between client and server.
Collaboration: Working as a team to divide responsibilities and integrate diverse technical expertise.
User-Centric Design: Developing features tailored to end-user needs, such as spelling correction and regional language support.

What's next for Semantic Similarity Search

Expanded Regional Support: Extend to more languages and dialects for a wider audience.
Real-Time Suggestions: Add features like autocomplete and dynamic query suggestions.
Personalized Recommendations: Integrate user behavior data to recommend products based on search history and preferences.
Scalability Improvements: Optimize the system to handle even larger datasets and more complex queries.
Advanced AI Integration: Explore deep learning models like BERT for enhanced semantic understanding and multilingual support.

Built With

css3
fuzzywuzzy
html5
javascript
json
md
python
tf-idf
word2vec

Updates

Ishita R started this project — Nov 24, 2024 12:01 AM EST
