AdVanguard: Elevating Digital Advertising Integrity

Inspiration

In today's digital age, online advertising plays a pivotal role in connecting businesses with their target audiences. However, maintaining the integrity of online advertising, ensuring user safety, and optimizing the advertiser experience remain ongoing challenges. To address them, we set out to develop an ad review and moderation system that safeguards users while maximizing the value of each ad impression. Our vision is a dynamic ecosystem where ads are thoroughly evaluated, matched with the most suitable moderators, and delivered to users seamlessly.

By leveraging data science and machine learning algorithms, this project aims to improve the quality of the ads users encounter and to increase advertisers' return on investment. It is designed to help catch scams, minimize irrelevant content, and put user experience first.

What it does

Ad Quality and Risk Assessment:
Develop a model that combines multiple indicators of an ad's risk and value into a single score, so that ad reviews can be prioritized effectively.
Help ensure that the ads shown to users meet high standards of quality and user safety.
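As a minimal sketch of how several risk and value indicators could be folded into one review-priority score, the example below min-max scales each indicator to [0, 1], averages the risk indicators and the value indicators separately, and blends the two with a weight. The indicator columns (violation probability, user reports, revenue), the function name `ad_priority`, and the 0.6 risk weight are hypothetical assumptions for illustration, not the project's actual features or parameters.

```python
import numpy as np


def ad_priority(risk_signals, value_signals, risk_weight=0.6):
    """Blend per-ad risk and value indicators into one review-priority score.

    Hypothetical sketch: each argument is a 2-D array with one row per ad and
    one column per indicator. Higher scores mean "review sooner".
    """
    def minmax(x):
        # Scale each column to [0, 1] so indicators on different scales can be averaged.
        x = np.asarray(x, dtype=float)
        span = x.max(axis=0) - x.min(axis=0)
        span[span == 0] = 1.0  # a constant column becomes all zeros instead of NaN
        return (x - x.min(axis=0)) / span

    risk = minmax(risk_signals).mean(axis=1)
    value = minmax(value_signals).mean(axis=1)
    return risk_weight * risk + (1 - risk_weight) * value


# Hypothetical data for three ads.
risk = [[0.9, 12], [0.1, 0], [0.5, 3]]  # [predicted violation probability, user reports]
value = [[200.0], [50.0], [500.0]]      # [expected revenue]
scores = ad_priority(risk, value)
review_order = np.argsort(-scores)      # ad indices, highest priority first
```

Here the first ad (highest risk) is reviewed first, followed by the high-revenue third ad. A learned model could replace the fixed weights once labelled review outcomes are available.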
Matching Content with Moderators:
Implement an intelligent matching mechanism that pairs ads with moderators who possess the expertise needed to ensure precise and efficient reviews.
Elevate the review accuracy, efficiency, and overall experience for both users and moderators.
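One way to implement this matching is content-based filtering: describe each ad and each moderator's expertise with vectors over the same features, then route each ad to the moderator whose profile is most similar by cosine similarity. The sketch below assumes a hypothetical feature space (finance, health, gaming, English, Spanish) and illustrative names (`cosine_matrix`, `match_ads`). It is not the team's actual feature set.

```python
import numpy as np


def cosine_matrix(a, b):
    """Cosine similarity between every row of a and every row of b.

    Assumes no row is all zeros (that would divide by zero).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def match_ads(ad_features, moderator_expertise):
    """For each ad, return the index of the most similar moderator plus the similarity matrix."""
    sims = cosine_matrix(ad_features, moderator_expertise)
    return sims.argmax(axis=1), sims


# Hypothetical columns: finance, health, gaming, lang_en, lang_es
ads = [
    [1, 0, 0, 1, 0],  # English finance ad
    [0, 1, 0, 0, 1],  # Spanish health ad
]
moderators = [
    [0.2, 0.9, 0.0, 0.1, 0.8],  # health specialist who reviews in Spanish
    [0.9, 0.1, 0.3, 0.9, 0.0],  # finance specialist who reviews in English
]
best, sims = match_ads(ads, moderators)
```

Cosine similarity ignores vector length, so a moderator with many expertise tags is not favoured just for having more of them. In practice, the capacity limits of each moderator would also have to be respected.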
Moderator Scoring and Optimization:
Create a scoring model for moderators that evaluates their performance based on productivity, handling time, ad utilization, and ad accuracy.
Empower the best moderators to scrutinize the most critical ads, promoting a virtuous cycle of continuous improvement.
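A minimal sketch of the moderator score, assuming each of the four metrics named above is min-max scaled across moderators, handling time is inverted (faster is better), and the metrics are combined with placeholder equal weights. A simple rank-based assignment then pairs the highest-priority ads with the highest-scoring moderators. The function names, units, and weights are hypothetical illustrations, not the project's tuned model.

```python
import numpy as np


def moderator_score(productivity, handling_time, utilization, accuracy,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted score per moderator from four metrics (1-D arrays, one entry per moderator).

    The equal weights are a placeholder assumption, not tuned values.
    """
    def minmax(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span else np.zeros_like(x)

    metrics = np.vstack([
        minmax(productivity),
        1.0 - minmax(handling_time),  # lower handling time scores higher
        minmax(utilization),
        minmax(accuracy),
    ])
    return np.asarray(weights) @ metrics


def assign_by_rank(ad_priority, mod_scores):
    """Map ad index -> moderator index, pairing the most critical ads with the best moderators."""
    ads_desc = np.argsort(-np.asarray(ad_priority))
    mods_desc = np.argsort(-np.asarray(mod_scores))
    n = min(len(ads_desc), len(mods_desc))
    return dict(zip(ads_desc[:n].tolist(), mods_desc[:n].tolist()))


# Hypothetical metrics for three moderators.
mod_scores = moderator_score(
    productivity=[120, 80, 100],   # ads reviewed per day
    handling_time=[30, 45, 20],    # seconds per ad
    utilization=[0.9, 0.7, 0.8],
    accuracy=[0.95, 0.85, 0.97],
)
assignment = assign_by_rank([0.2, 0.9, 0.5], mod_scores)
```

This assigns one ad per moderator in a single pass. A real queue would repeat the pairing as moderators free up and could combine it with the expertise matching described earlier.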

How we built it

SQL
Data cleaning
Python (Jupyter Notebook)
Data manipulation
Exploratory data analysis
Optimization - Gaussian Process Regression (GPR)
Predictive modeling - content-based filtering
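The writeup does not say exactly what GPR was used to optimize. One common pattern is Bayesian optimization: fit a Gaussian process to a few expensive evaluations, then pick the next setting to try with an upper-confidence-bound rule. The NumPy-only sketch below tunes a single hypothetical scoring weight `w` against a stand-in `objective`. The kernel length scale, the UCB factor of 2, and the objective are all assumptions. scikit-learn's `sklearn.gaussian_process.GaussianProcessRegressor` provides a full implementation of the regression step.

```python
import numpy as np


def rbf(a, b, length_scale=0.2):
    """Squared-exponential (RBF) kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6, length_scale=0.2):
    """Posterior mean and standard deviation of a zero-mean GP with an RBF kernel."""
    K = rbf(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_query, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^-1 y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v ** 2, axis=0)  # the RBF prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))


def objective(w):
    # Hypothetical stand-in for an expensive evaluation, e.g. validation
    # accuracy of the review-priority model for scoring weight w.
    return -(w - 0.3) ** 2


grid = np.linspace(0.0, 1.0, 101)
x_obs = np.array([0.0, 0.5, 1.0])
y_obs = objective(x_obs)

for _ in range(5):
    mean, std = gp_posterior(x_obs, y_obs, grid)
    ucb = mean + 2.0 * std  # upper confidence bound: favour high mean or high uncertainty
    x_next = grid[np.argmax(ucb)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_w = x_obs[np.argmax(y_obs)]
```

Because each evaluation may mean retraining a model (which the team notes can take up to half an hour), spending a cheap GP fit to choose the next setting is usually better than a dense grid search.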

Libraries used: NumPy, pandas, matplotlib, seaborn, scikit-learn, TensorFlow, Keras

Challenges we ran into

All of us are engineering or physics majors with little data science and machine learning background, so we had to teach ourselves and do our own research in a short period of time.
Model training is slow: a single run can take up to half an hour, which hinders development and iteration of the models. Reducing it requires optimizing model architectures, hyperparameters, and hardware resources.
Difficulty relating the ads data to the moderator data: we defined a 'scoring' variable to link the two datasets.