Inspiration

The alarming rise in domestic violence cases, particularly during the pandemic, inspired us to create SafeGuard Analytics. We recognized that social media platforms have become both a space where victims seek help and where perpetrators exhibit threatening behavior. Traditional detection methods rely on simple keyword matching, which often misses subtle threats and generates false positives. We wanted to build a comprehensive platform that uses real machine learning to identify domestic violence patterns with high accuracy, helping organizations and authorities respond more effectively to protect vulnerable individuals.

What it does

SafeGuard Analytics is an advanced domestic violence detection platform that combines multiple machine learning algorithms to analyze social media content and identify potential threats. The platform features:

ML Detection : Uses TF-IDF feature extraction with an ensemble of four algorithms (Logistic Regression, Support Vector Machine, Random Forest, and Neural Network) to achieve 96.3% accuracy

Comprehensive Dashboard : Provides interactive visualizations showing severity distribution, platform analysis, and geographic patterns

Advanced Data Analysis : Includes filtering capabilities, word cloud generation, and detailed dataset exploration

Risk Assessment : Categorizes threats into physical violence, emotional abuse, control/possession, and intimidation patterns

Real-time Processing : Analyzes text input in under 100 ms with confidence scoring and individual model predictions

How we built it

We developed SafeGuard Analytics using a full-stack approach with emphasis on machine learning implementation:

Frontend : Built with HTML5, Tailwind CSS, and vanilla JavaScript for responsive design and smooth user interactions

Machine Learning Core :

Implemented TF-IDF feature extraction with a pre-trained vocabulary of 20+ threat-related terms

Developed four distinct ML algorithms from scratch: Logistic Regression with sigmoid activation, an SVM with an RBF kernel, a Random Forest with multiple decision trees, and a 3-layer Neural Network

Created an ensemble method using weighted voting (LR: 30%, SVM: 25%, RF: 25%, NN: 20%)

Added statistical features including text length, punctuation density, capitalization ratio, and pronoun usage

Data Visualization : Integrated Chart.js for interactive charts and WordCloud.js for keyword analysis
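The weighted-voting scheme above can be sketched as follows; the model probabilities are stand-ins for the individual algorithms' outputs, and the function name is illustrative, not our actual API:

```javascript
// Minimal sketch of the weighted-voting ensemble (LR 30%, SVM 25%,
// RF 25%, NN 20%). Each model is assumed to return a probability in
// [0, 1] that the text is harmful.
const WEIGHTS = { lr: 0.30, svm: 0.25, rf: 0.25, nn: 0.20 };

function ensemblePredict(probs) {
  // probs: { lr, svm, rf, nn } — individual model probabilities.
  const score = Object.keys(WEIGHTS)
    .reduce((sum, key) => sum + WEIGHTS[key] * probs[key], 0);
  return { score, label: score >= 0.5 ? 1 : 0 };
}

// Example: three of four models lean toward "harmful".
const result = ensemblePredict({ lr: 0.9, svm: 0.8, rf: 0.7, nn: 0.3 });
// result.score = 0.705, result.label = 1
```

Because the weights sum to 1, the ensemble score stays in [0, 1] and doubles as the confidence value shown next to each prediction.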

Data Processing : Used Papa Parse for CSV handling and implemented real-time filtering and pagination
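The filtering-and-pagination step might look like this, operating on row objects of the shape Papa Parse produces with header mode enabled; the sample rows and the helper name are illustrative, not taken from the real dataset:

```javascript
// Sketch of client-side filtering + pagination over parsed CSV rows.
// Rows are plain objects, as Papa Parse yields with { header: true }.
const rows = [
  { post_id: "p_0001", severity: "high",   label: "1" },
  { post_id: "p_0002", severity: "low",    label: "0" },
  { post_id: "p_0003", severity: "high",   label: "1" },
  { post_id: "p_0004", severity: "medium", label: "1" },
];

function filterAndPaginate(data, predicate, page, pageSize) {
  const filtered = data.filter(predicate);
  const start = (page - 1) * pageSize;        // pages are 1-indexed
  return {
    total: filtered.length,
    items: filtered.slice(start, start + pageSize),
  };
}

// First page of high-severity posts, two per page.
const page1 = filterAndPaginate(rows, r => r.severity === "high", 1, 2);
```

Filtering before slicing keeps `total` accurate for the pagination controls while only the current page's rows are rendered.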

Challenges we ran into

Algorithm Implementation : Building real machine learning algorithms from scratch in JavaScript was complex, especially implementing the SVM with RBF kernel and ensuring proper mathematical computations for the neural network backpropagation.
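The RBF kernel at the heart of that SVM computes K(x, z) = exp(−γ‖x − z‖²); a minimal JavaScript sketch (the function name is ours, and γ is a tunable hyperparameter):

```javascript
// Gaussian / RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2).
// Similar feature vectors give values near 1; distant ones near 0.
function rbfKernel(x, z, gamma) {
  let sq = 0;
  for (let i = 0; i < x.length; i++) {
    const d = x[i] - z[i];
    sq += d * d;
  }
  return Math.exp(-gamma * sq);
}

// Identical vectors give exactly 1; distant vectors approach 0.
const same = rbfKernel([1, 2, 3], [1, 2, 3], 0.5); // 1
const far  = rbfKernel([0, 0], [10, 0], 0.5);      // ~0
```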

Feature Engineering : Designing effective TF-IDF features that capture subtle threat patterns while avoiding false positives required extensive testing and vocabulary curation.
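The TF-IDF weighting behind that feature engineering can be sketched as below; the smoothing choice and all counts are illustrative, not the exact formula or vocabulary we shipped:

```javascript
// TF-IDF weight for one term in one document:
//   tf  = termCount / docLength
//   idf = log(totalDocs / (1 + docsWithTerm))   (add-one smoothing)
function tfidf(termCount, docLength, docsWithTerm, totalDocs) {
  const tf = termCount / docLength;
  const idf = Math.log(totalDocs / (1 + docsWithTerm));
  return tf * idf;
}

// A rare threat term appearing twice in a 20-word post, 5000-doc corpus:
const rare = tfidf(2, 20, 9, 5000);
// The same counts for a term present in nearly every document score ~0:
const common = tfidf(2, 20, 4999, 5000);
```

This is what lets rare, threat-specific vocabulary dominate the feature vector while ubiquitous words contribute almost nothing, which is exactly the false-positive pressure point described above.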

Performance Optimization : Ensuring the ensemble model runs efficiently in real-time while maintaining accuracy across different text lengths and styles.

Data Sensitivity : Handling domestic violence data required careful consideration of privacy, ethical implications, and responsible AI practices.

Cross-browser Compatibility : Ensuring mathematical computations work consistently across different browsers and devices.

Accomplishments that we're proud of

ML Implementation : Successfully built four complete machine learning algorithms from scratch, achieving 96.3% ensemble accuracy without relying on external ML libraries.

Comprehensive Platform : Created a full-featured analytics platform that goes beyond simple detection to provide actionable insights, visualizations, and detailed analysis tools.

User Experience : Designed an intuitive interface that makes complex ML concepts accessible to non-technical users while providing detailed technical information for experts.

Performance : Achieved sub-100ms response times for real-time text analysis while maintaining high accuracy.

Scalability : Built a system that can handle large datasets with efficient filtering, pagination, and export capabilities.

What we learned

Machine Learning Fundamentals : Gained deep understanding of ensemble methods, feature engineering, and the importance of combining multiple algorithms for robust predictions.

Domain Expertise : Learned about the complexities of domestic violence detection, including the subtle language patterns used by perpetrators and the challenges victims face.

Responsible AI : Understood the critical importance of ethical considerations when building AI systems for sensitive social issues.

Frontend ML : Discovered the possibilities and limitations of implementing machine learning directly in the browser using JavaScript.

Data Visualization : Learned how effective visualizations can make complex data insights accessible and actionable for different stakeholders.

What's next for SafeGuard Analytics

Advanced NLP Integration : Implement transformer-based models like BERT for better contextual understanding and sentiment analysis.

Real-time Platform Integration : Develop APIs for direct integration with social media platforms and messaging services for live monitoring.

Multilingual Support : Expand the TF-IDF vocabulary and train models for multiple languages to serve global communities.

Temporal Analysis : Add time-series analysis to detect escalation patterns and predict future risk levels.

Mobile Application : Create a mobile app for victims and support organizations with secure reporting and emergency features.

Federated Learning : Implement privacy-preserving federated learning to improve models while protecting user data across multiple organizations.

Expert System Integration : Add rule-based expert systems to complement ML predictions with domain knowledge from domestic violence specialists.

Intervention Recommendations : Develop an AI system that suggests appropriate intervention strategies based on threat level and context analysis.


Dataset Design

We created a synthetic dataset of 5,000 rows with the following columns:

post_id – Unique identifier for each post (e.g., p_0001)

platform – Online platform type where the post was made (e.g., Twitter, Reddit, Forum, Facebook)

user_text – The actual user-generated text (e.g., "He keeps tracking my phone and isolating me from my friends.")

category – Type of discourse (e.g., domestic_violence, verbal_abuse, harassment, threat, normal)

severity – Intensity of harmful content (e.g., low, medium, high)

contextual_flag – Whether additional context is needed to understand harm (yes / no)

label – Binary classification label (1 = harmful, 0 = not harmful)

The contextual_flag column indicates whether a post requires additional context to be understood as harmful. In practice:

contextual_flag = "no" → The harm is clear from the text alone.

Example: "He hit me again last night." → This is immediately recognizable as domestic violence; no extra context is needed.

contextual_flag = "yes" → The harm is ambiguous without extra information (e.g., previous conversation, tone, history between users).

Example: "Try that again and see what happens." → This could be a joke or a serious threat; context is required to classify it confidently.

To use the demo, please download the synthetic dataset of 5,000 records here: https://github.com/tiya1012/safeguardAI/blob/main/domestic_violence_harmful_discourse_detailed_5000.csv
