🔍 LocationTrust - ML for Trustworthy Location Reviews
📋 Project Overview
LocationTrust is a machine learning application that automatically detects policy violations in Google location reviews and assesses their quality. It identifies:
- Advertisement violations: Reviews promoting other businesses or services
- Irrelevant content: Reviews unrelated to the location or experience
- Rant detection: Emotional outbursts from users who likely haven't visited the location
- Quality assessment: Overall trustworthiness scoring of reviews
🎯 Key Features
- Interactive Web Interface: Streamlit-based dashboard with real-time processing
- Flexible Data Input: Support for various CSV formats with configurable column mapping
- Advanced NLP Processing: Powered by Hugging Face transformers and spaCy
- Comprehensive Evaluation: Detailed metrics, visualizations, and exportable results
- Scalable Architecture: Batch processing with concurrent execution for large datasets
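Batch processing with concurrent execution can be sketched with the standard library alone. The example below is a minimal illustration, not the project's code. `analyze_review` is a hypothetical stand-in for the real model call, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_in_batches(
    items: Sequence[T],
    worker: Callable[[T], R],
    batch_size: int = 16,
    max_workers: int = 4,
) -> List[R]:
    """Run `worker` on every item, one batch at a time, using a thread pool.

    Threads suit I/O-bound work such as HTTP calls to a hosted model API.
    Results come back in the same order as the input items.
    """
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in batched(items, batch_size):
            # pool.map preserves input order within each batch
            results.extend(pool.map(worker, batch))
    return results


def analyze_review(text: str) -> dict:
    """Hypothetical placeholder for a model call."""
    return {"text": text, "length": len(text)}


if __name__ == "__main__":
    reviews = ["Great coffee!", "Visit my shop at example.com", "Terrible."]
    for result in process_in_batches(reviews, analyze_review, batch_size=2):
        print(result)
```

Working batch by batch limits how many requests are in flight at once, and each finished batch is a natural point to update a progress bar.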
🏗️ System Architecture
📁 LocationTrust/
├── 🎯 app.py # Main Streamlit application
├── 📦 src/ # Core modules
│ ├── data_processor.py # Data preprocessing and feature extraction
│ ├── model_handler.py # Hugging Face model integration
│ ├── policy_detector.py # Policy violation detection logic
│ ├── evaluation.py # Metrics calculation and evaluation
│ └── visualization.py # Charts and dashboard components
├── 🛠️ utils/ # Utility functions
│ ├── helpers.py # General helper functions
│ └── prompts.py # ML model prompts and templates
├── 📊 datasets/ # Sample datasets
│ ├── reviews.csv # General review dataset
│ └── sepetcioglu_restaurant.csv # Restaurant-specific dataset
└── 📋 requirements.txt # Python dependencies
🚀 Setup Instructions
Prerequisites
- Python 3.8+
- Hugging Face Account (for API access)
- Git (for cloning the repository)
1. Clone the Repository
git clone https://github.com/stinkray77/LocationTrust.git
cd LocationTrust
2. Install Dependencies
Option A: Using pip
pip install -r requirements.txt
Option B: Using uv (recommended)
pip install uv
uv sync
3. Get a Hugging Face API Key
- Create an account at huggingface.co
- Go to Settings > Access Tokens
- Create a new token with Read permission
- Copy the token for the next step
4. Environment Configuration
For Local Development:
Create a .env file or set the environment variable:
export HUGGINGFACE_API_KEY="your_token_here"
For Streamlit Cloud:
Add to your app's secrets in the Streamlit dashboard:
HUGGINGFACE_API_KEY = "your_token_here"
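At runtime the app has to find the token in one of these two places. Below is a minimal sketch of that lookup, assuming the variable name `HUGGINGFACE_API_KEY` from above. The helper name `get_hf_token` is hypothetical, and the project may resolve the key differently.

```python
import os
from typing import Optional


def get_hf_token(name: str = "HUGGINGFACE_API_KEY") -> Optional[str]:
    """Return the Hugging Face token from the environment or Streamlit secrets.

    The environment variable is checked first (local development), then
    Streamlit secrets (Streamlit Cloud). Returns None if neither is set.
    """
    token = os.environ.get(name)
    if token:
        return token
    try:
        import streamlit as st

        # st.secrets behaves like a read-only mapping. A missing key or a
        # missing secrets.toml raises, so any failure falls through to None.
        return st.secrets[name]
    except Exception:
        return None
```

Reading a .env file also requires a loader such as python-dotenv's `load_dotenv()`, called before this lookup.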
5. Run the Application
Local Development:
streamlit run app.py
Production Deployment:
The app is deployed on Streamlit Cloud at: https://locationtrust.streamlit.app
📊 How to Reproduce Results
Using Sample Data
The repository includes sample datasets in the datasets/ folder:
- Launch the application following the setup instructions above
- Navigate to "📊 Data Upload & Processing"
- Upload sample data: use datasets/reviews.csv or datasets/sepetcioglu_restaurant.csv
- Configure column mapping:
  - Review Text: review_text or text
  - Rating: rating (optional)
  - Location: location or business_name (optional)
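In essence, column mapping means working out which header in the uploaded file fills each role. The sketch below assumes the candidate names listed above. The function names and the dictionary layout are illustrative and do not describe the app's internals.

```python
import csv
import io
from typing import Dict, List, Optional

# Candidate header names for each standard field, in order of preference.
COLUMN_CANDIDATES: Dict[str, List[str]] = {
    "review_text": ["review_text", "text"],
    "rating": ["rating"],
    "location": ["location", "business_name"],
}
REQUIRED_FIELDS = {"review_text"}


def map_columns(headers: List[str]) -> Dict[str, Optional[str]]:
    """Map each standard field to the first matching header (case-insensitive)."""
    lookup = {h.strip().lower(): h for h in headers}
    mapping: Dict[str, Optional[str]] = {}
    for field, candidates in COLUMN_CANDIDATES.items():
        mapping[field] = next((lookup[c] for c in candidates if c in lookup), None)
    missing = [f for f in REQUIRED_FIELDS if mapping[f] is None]
    if missing:
        raise ValueError(f"Missing required column(s): {missing}")
    return mapping


def load_standardized(csv_text: str) -> List[Dict[str, Optional[str]]]:
    """Parse CSV text and rename columns to the standard field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping = map_columns(list(reader.fieldnames or []))
    return [
        {field: (row[src] if src else None) for field, src in mapping.items()}
        for row in reader
    ]


if __name__ == "__main__":
    sample = "text,rating,business_name\nLovely staff,5,Cafe A\n"
    print(load_standardized(sample))
```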
Step-by-Step Workflow
1. Data Processing
- Upload your CSV file with location reviews
- Map columns to standard format
- Preview processed data with extracted features
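The README does not list which features the app extracts. As a hedged illustration, here are some cheap text signals that are often used alongside a model to catch ads and rants. Every feature name and pattern below is an assumption.

```python
import re
from typing import Dict

URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_features(text: str) -> Dict[str, float]:
    """Compute simple, hypothetical signals for a single review."""
    letters = [c for c in text if c.isalpha()]
    return {
        "char_count": float(len(text)),
        "word_count": float(len(text.split())),
        # Share of letters that are uppercase; high values can indicate shouting.
        "uppercase_ratio": (
            sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
        ),
        "exclamation_count": float(text.count("!")),
        # Links and phone numbers are common in promotional reviews.
        "has_url": float(bool(URL_PATTERN.search(text))),
        "has_phone": float(bool(PHONE_PATTERN.search(text))),
    }
```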
2. Model Configuration
- Choose Model: Select from the available Hugging Face models
  - Default: microsoft/DialoGPT-medium (lightweight)
  - Advanced: facebook/bart-large-mnli (higher accuracy)
- Set Parameters:
  - Max Length: 256 tokens (recommended)
  - Temperature: 0.3 (low randomness)
  - Confidence Threshold: 0.7 (default)
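facebook/bart-large-mnli is an NLI model that is commonly used through the transformers `zero-shot-classification` pipeline. The sketch below shows one way the confidence threshold could be applied to its scores. The label wording is an assumption, and the project may call the hosted Inference API with the key configured earlier rather than running the model locally. The classifier is passed in as an argument so the thresholding logic can be tested without downloading the model.

```python
from typing import Callable, Dict

# Hypothetical natural-language labels for the three policy categories.
VIOLATION_LABELS: Dict[str, str] = {
    "advertisement": "an advertisement or promotion",
    "irrelevant": "unrelated to the location",
    "rant": "a rant by someone who has not visited",
}


def make_zero_shot_classifier(model: str = "facebook/bart-large-mnli") -> Callable:
    """Load a transformers zero-shot pipeline (downloads the model on first use)."""
    from transformers import pipeline

    return pipeline("zero-shot-classification", model=model)


def detect_violations(
    text: str, classifier: Callable, threshold: float = 0.7
) -> Dict[str, float]:
    """Return the violation types whose score meets `threshold`.

    multi_label=True scores each label independently, so one review can be
    flagged for more than one violation.
    """
    label_to_key = {v: k for k, v in VIOLATION_LABELS.items()}
    output = classifier(
        text, candidate_labels=list(VIOLATION_LABELS.values()), multi_label=True
    )
    return {
        label_to_key[label]: score
        for label, score in zip(output["labels"], output["scores"])
        if score >= threshold
    }
```

With the real model, usage would look like `detect_violations("Use code SAVE20 at my bakery!", make_zero_shot_classifier())`. Raising the threshold trades recall for precision. Temperature only matters for generative decoding, as with DialoGPT; the zero-shot pipeline does not sample, so temperature has no effect on this path.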
3. Policy Detection
- Run Analysis: Process reviews through the ML pipeline
- Monitor Progress: Real-time progress tracking with batch processing
- Review Results: Detailed violation detection with confidence scores
4. Results & Evaluation
- Performance Metrics: Precision, Recall, F1-Score for each violation type
- Interactive Visualizations:
- Violation distribution charts
- Confidence score histograms
- Rating vs. trustworthiness scatter plots
- Detailed Analysis: Review-by-review breakdown with explanations
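Per-type precision, recall, and F1 follow the standard definitions. Here is a stdlib sketch, assuming each review carries a set of true labels and a set of predicted labels (an assumed data layout). scikit-learn's `precision_recall_fscore_support` computes the same quantities.

```python
from typing import Dict, Iterable, Set, Tuple


def per_label_metrics(
    pairs: Iterable[Tuple[Set[str], Set[str]]], labels: Iterable[str]
) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall and F1 for each label.

    `pairs` holds (true_labels, predicted_labels) for each review.
    Undefined ratios (division by zero) are reported as 0.0.
    """
    rows = list(pairs)
    results: Dict[str, Dict[str, float]] = {}
    for label in labels:
        tp = sum(label in t and label in p for t, p in rows)
        fp = sum(label not in t and label in p for t, p in rows)
        fn = sum(label in t and label not in p for t, p in rows)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results
```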
5. Export & Summary
- Download Results: JSON export with full analysis
- Generate Report: Automated summary with key insights
- Save Configuration: Export model settings for reproducibility
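Saving the configuration alongside the results is what makes a run reproducible. Below is a minimal sketch of such an export. The field names and layout are assumptions, not the app's actual JSON schema.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_export(results: List[Dict[str, Any]], config: Dict[str, Any]) -> str:
    """Bundle per-review results with the settings that produced them."""
    flagged = sum(1 for r in results if r.get("violations"))
    payload = {
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "config": config,
        "summary": {
            "total_reviews": len(results),
            "flagged_reviews": flagged,
            "violation_rate": flagged / len(results) if results else 0.0,
        },
        "results": results,
    }
    # ensure_ascii=False keeps non-English review text readable in the file.
    return json.dumps(payload, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    config = {
        "model": "facebook/bart-large-mnli",
        "max_length": 256,
        "temperature": 0.3,
        "confidence_threshold": 0.7,
    }
    results = [
        {"text": "Great view", "violations": {}},
        {"text": "Buy at my shop!", "violations": {"advertisement": 0.91}},
    ]
    print(build_export(results, config))
```

In Streamlit, the returned string can be passed to `st.download_button(label, data=..., file_name="results.json", mime="application/json")`.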
Expected Results
Using the sample datasets, you should see:
Performance Metrics:
- Advertisement Detection: ~85-90% F1-Score
- Irrelevant Content: ~80-85% F1-Score
- Rant Detection: ~75-80% F1-Score
- Overall Accuracy: ~82-87%
Key Insights:
- ~15-25% of reviews contain policy violations
- Advertisement violations most common in restaurant reviews
- Rant detection correlates with extreme ratings (1★ or 5★)
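You can check figures like the violation share and the link between rants and extreme ratings against your own output. Here is a sketch, assuming each result is a dict with an integer `rating` (or None) and a `violations` mapping. This layout is hypothetical, and ratings read from CSV must be converted to int first.

```python
from typing import Any, Dict, List


def summarize(results: List[Dict[str, Any]]) -> Dict[str, float]:
    """Share of flagged reviews, and rant rate for extreme vs. moderate ratings."""

    def rate(rows: List[Dict[str, Any]], key: str) -> float:
        if not rows:
            return 0.0
        return sum(key in r["violations"] for r in rows) / len(rows)

    rated = [r for r in results if r.get("rating") is not None]
    extreme = [r for r in rated if r["rating"] in (1, 5)]
    moderate = [r for r in rated if r["rating"] not in (1, 5)]
    flagged = sum(bool(r["violations"]) for r in results)
    return {
        "violation_share": flagged / len(results) if results else 0.0,
        "rant_rate_extreme": rate(extreme, "rant"),
        "rant_rate_moderate": rate(moderate, "rant"),
    }
```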
🔧 Customization
Adding New Models
- Update Model List in src/model_handler.py:
AVAILABLE_MODELS = [
    "your-model-name",
    # existing models...
]
- Configure Model Parameters for optimal performance
Custom Policy Detection
- Modify Rules in src/policy_detector.py
- Add New Violation Types with custom logic
- Update Evaluation Metrics in src/evaluation.py
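The contents of src/policy_detector.py are not shown here, so the following sketch illustrates just one way to plug in a new violation type: a registry of named rule functions, each returning a confidence. The registry, the decorator, and the example `contact_info` rule are all hypothetical.

```python
import re
from typing import Callable, Dict

Rule = Callable[[str], float]

# Registry mapping violation type -> rule returning a confidence in [0, 1].
RULES: Dict[str, Rule] = {}


def violation_rule(name: str) -> Callable[[Rule], Rule]:
    """Decorator that registers a rule under a violation type name."""

    def register(func: Rule) -> Rule:
        RULES[name] = func
        return func

    return register


@violation_rule("contact_info")
def detect_contact_info(text: str) -> float:
    """Hypothetical new type: reviews that share emails or phone numbers."""
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
    phone = re.search(r"\+?\d[\d\s().-]{7,}\d", text)
    return 1.0 if email or phone else 0.0


def apply_rules(text: str, threshold: float = 0.7) -> Dict[str, float]:
    """Run every registered rule and keep scores at or above the threshold."""
    scores = {name: rule(text) for name, rule in RULES.items()}
    return {name: s for name, s in scores.items() if s >= threshold}
```

Each new type then needs matching entries wherever metrics are computed in src/evaluation.py.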
Data Format Support
- Required Column: Review text (any column name)
- Optional Columns: Rating, location, date, user info
- Supported Formats: CSV files with UTF-8 encoding
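Because only UTF-8 CSV files are supported, it helps to fail early with a clear message when a file uses another encoding. Below is a hedged sketch; the function name is hypothetical. Decoding with `utf-8-sig` also strips the byte-order mark that Excel's "CSV UTF-8" export writes.

```python
import csv
import io
from typing import Dict, List


def read_reviews_csv(raw: bytes, text_column: str) -> List[Dict[str, str]]:
    """Decode uploaded bytes as UTF-8 and check that the review column exists."""
    try:
        # utf-8-sig behaves like utf-8 but also drops a leading BOM if present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError(
            "File is not valid UTF-8; re-save it as 'CSV UTF-8' and retry."
        ) from exc
    reader = csv.DictReader(io.StringIO(text))
    if text_column not in (reader.fieldnames or []):
        raise ValueError(f"Column {text_column!r} not found in {reader.fieldnames}")
    # Skip rows whose review text is empty or whitespace only.
    return [row for row in reader if (row.get(text_column) or "").strip()]
```

With Streamlit's `st.file_uploader`, calling `getvalue()` on the uploaded file returns the bytes this function expects.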
🔍 Troubleshooting
Common Issues
Dependencies:
# Install missing spaCy model
python -m spacy download en_core_web_sm
# Update packages
pip install --upgrade -r requirements.txt
API Issues:
- Verify Hugging Face API key is valid
- Check model availability and permissions
- Monitor API rate limits
Memory Issues:
- Reduce batch size in model configuration
- Use lighter models for large datasets
- Process data in smaller chunks
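One way to process data in smaller chunks is to stream the CSV rather than load it whole. Here is a stdlib sketch; the chunk size is arbitrary and the function name is hypothetical.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_chunks(handle: TextIO, chunk_size: int = 500) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunk_size` rows without reading the whole file."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk


if __name__ == "__main__":
    sample = io.StringIO("review_text\n" + "\n".join(f"review {i}" for i in range(7)))
    for chunk in iter_chunks(sample, chunk_size=3):
        print(len(chunk))  # prints 3, then 3, then 1
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass the handle in; only one chunk of rows is held in memory at a time.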
Getting Help
- Issues: GitHub Issues
- Documentation: See inline code comments
- Community: Streamlit Community Forum
📜 License
This project is open source and available under the MIT License.
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Built with ❤️ using Streamlit, Hugging Face, and modern NLP techniques