Optimizing Advertisement Moderation

Description of solution

We score both advertisements and moderators, then match each advertisement to the best-fitting moderator.

Priority Score

To score the advertisements, we take a linear combination of the log-transformed ad revenue, average ad revenue and number of punishments, and then multiply the result by the task complexity. To score the reviewers, we take a linear combination of accuracy and the log-transformed productivity and handling time. We chose these non-linear transformations based on the distribution of each column, as seen in our exploratory Jupyter notebook. The weights in the linear combinations can be tuned to change how much each variable counts, and the final scores are scaled to lie between 0 and 1.
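The scoring step can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's code: the weight values, the dictionary field names (revenue, avg_revenue, punishments, task_complexity, accuracy, productivity, handling_time) and the helper names are hypothetical. It uses log1p, i.e. log(1 + x), so that zero revenue or zero punishments stay finite, and it assumes that a longer handling time should lower a reviewer's score, which is why that weight is negative.

```python
import math

# Hypothetical weights: the writeup says they are tunable but does not give values.
AD_WEIGHTS = {"revenue": 0.5, "avg_revenue": 0.3, "punishments": 0.2}
# Assumption: longer handling time is worse, hence the negative weight.
REVIEWER_WEIGHTS = {"accuracy": 0.5, "productivity": 0.3, "handling_time": -0.2}


def min_max_scale(values):
    """Rescale raw scores to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def ad_raw_score(ad):
    # Linear combination of log-transformed columns (log1p keeps zeros finite),
    # then multiplied by the task complexity.
    linear = (AD_WEIGHTS["revenue"] * math.log1p(ad["revenue"])
              + AD_WEIGHTS["avg_revenue"] * math.log1p(ad["avg_revenue"])
              + AD_WEIGHTS["punishments"] * math.log1p(ad["punishments"]))
    return linear * ad["task_complexity"]


def reviewer_raw_score(reviewer):
    # Accuracy is used as-is; productivity and handling time are log-transformed.
    return (REVIEWER_WEIGHTS["accuracy"] * reviewer["accuracy"]
            + REVIEWER_WEIGHTS["productivity"] * math.log1p(reviewer["productivity"])
            + REVIEWER_WEIGHTS["handling_time"] * math.log1p(reviewer["handling_time"]))


def ad_priority_scores(ads):
    return min_max_scale([ad_raw_score(a) for a in ads])


def reviewer_scores(reviewers):
    return min_max_scale([reviewer_raw_score(r) for r in reviewers])


if __name__ == "__main__":
    ads = [
        {"revenue": 100.0, "avg_revenue": 10.0, "punishments": 0, "task_complexity": 1.0},
        {"revenue": 1000.0, "avg_revenue": 50.0, "punishments": 3, "task_complexity": 2.0},
    ]
    print(ad_priority_scores(ads))
```

Min-max scaling is one simple way to map the scores onto [0, 1]; it makes ad and reviewer scores directly comparable, which the matching step relies on.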

Matching Algorithm

To match advertisements to moderators, we define an objective function that takes into account:

the squared difference between the priority scores of the advertisement and the moderator
the match between the advertisement's delivery country and the moderator's market expertise
the current utilisation of the moderator
the number of tasks already assigned to the moderator

The objective function is always greater than or equal to 0.
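A per-pair version of this objective might look like the sketch below. The writeup lists the four terms but not how they are weighted or combined, so the weights, the 0/1 market penalty, the linear load term and the function name pair_cost are all hypothetical. Because every term is non-negative and every weight is non-negative, the cost can never drop below 0, which matches the property stated above.

```python
# Hypothetical weights: the writeup does not specify how the terms are combined.
W_PRIORITY = 1.0
W_MARKET = 0.5
W_UTILISATION = 0.3
W_LOAD = 0.1


def pair_cost(ad_score, mod_score, ad_country, mod_markets, mod_utilisation, mod_assigned):
    """Mismatch cost of giving one ad to one moderator (lower is better, always >= 0)."""
    priority_gap = (ad_score - mod_score) ** 2                # squared score difference
    market_penalty = 0.0 if ad_country in mod_markets else 1.0  # assumed 0/1 market match
    return (W_PRIORITY * priority_gap
            + W_MARKET * market_penalty
            + W_UTILISATION * mod_utilisation                 # assumed to lie in [0, 1]
            + W_LOAD * mod_assigned)                          # tasks already assigned


if __name__ == "__main__":
    print(pair_cost(0.9, 0.1, "US", {"SG"}, 0.5, 2))
```

Since the load term depends on how many tasks a moderator already holds, the total objective for a full assignment is not simply a sum of independent pair costs; it has to be evaluated over the whole assignment.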

The goal of our stochastic optimization is to reach the lowest possible value of the objective function; the lower the value, the better the match between an advertisement and a reviewer. We explored several other algorithms and variants, but none performed as well. The selected algorithm starts with a random assignment of reviewers to ads, then uses simulated annealing to reassign a reviewer to an ad whenever the reassignment lowers the objective. It also accepts some reassignments that raise the objective, with a probability that shrinks as the temperature decreases (see here: https://www.mit.edu/~dbertsim/papers/Optimization/Simulated%20annealing.pdf). This randomness lets the search escape local minima and reach better scores within the same number of iterations.
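The annealing loop described above can be sketched as follows. This is a generic simulated annealing sketch, not the project's implementation: the function name, the geometric cooling schedule and its parameters (t_start, cooling, n_iter) are assumptions. It takes any total_cost callable that scores a full assignment, accepts improving moves outright, and accepts worsening moves with probability exp(-delta / T).

```python
import math
import random


def simulated_annealing(n_ads, n_reviewers, total_cost, n_iter=10_000,
                        t_start=1.0, cooling=0.999, seed=0):
    """Assign each ad to a reviewer (requires n_reviewers >= 2); returns (best, cost)."""
    rng = random.Random(seed)
    # Random initial assignment: assignment[i] is the reviewer index for ad i.
    assignment = [rng.randrange(n_reviewers) for _ in range(n_ads)]
    cost = total_cost(assignment)
    best, best_cost = list(assignment), cost
    temperature = t_start
    for _ in range(n_iter):
        ad = rng.randrange(n_ads)
        old = assignment[ad]
        # Pick a different reviewer uniformly at random.
        new = rng.randrange(n_reviewers - 1)
        if new >= old:
            new += 1
        assignment[ad] = new
        new_cost = total_cost(assignment)  # full recompute handles load-dependent terms
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(assignment), cost
        else:
            assignment[ad] = old  # reject the move
        temperature *= cooling  # geometric cooling (assumed schedule)
    return best, best_cost


if __name__ == "__main__":
    ad_scores = [0.1, 0.5, 0.9, 0.2, 0.8]
    rev_scores = [0.1, 0.5, 0.9]

    def toy_cost(assignment):
        return sum((ad_scores[i] - rev_scores[r]) ** 2 for i, r in enumerate(assignment))

    print(simulated_annealing(len(ad_scores), len(rev_scores), toy_cost))
```

Recomputing the full cost after each move keeps the sketch simple and correct for non-separable objectives; a faster version would update only the terms touched by the moved ad.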

Accomplishments that we're proud of

We successfully implemented a simulated annealing algorithm that finds a near-optimal assignment of advertisements to reviewers. The selected algorithm has an acceptable runtime of up to 5 minutes.

We also spotted some discrepancies in the original dataset. For example, some reviewers have a utilisation greater than 1, which does not make sense, and some advertisements are missing their start date. We cleaned and re-validated these data points so that they would not distort our downstream processing.
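Checks for the two kinds of bad rows mentioned above could look like this. It is a sketch only: the writeup does not say how the rows were fixed, so clipping utilisation into [0, 1] is just one possible repair, and the field names (id, utilisation, start_date) and function names are hypothetical.

```python
def find_issues(reviewers, ads):
    """Return (kind, id, problem) tuples for out-of-range utilisation and missing start dates."""
    issues = []
    for r in reviewers:
        if not 0.0 <= r["utilisation"] <= 1.0:
            issues.append(("reviewer", r["id"], "utilisation outside [0, 1]"))
    for a in ads:
        if a.get("start_date") is None:  # absent key or explicit None
            issues.append(("ad", a["id"], "missing start date"))
    return issues


def clip_utilisation(reviewers):
    """Return copies of the reviewer rows with utilisation clipped into [0, 1]."""
    return [{**r, "utilisation": min(max(r["utilisation"], 0.0), 1.0)} for r in reviewers]
```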

What we learned

We learned how to perform data cleaning, how to implement a variety of stochastic optimization algorithms, and how to evaluate and compare them. After settling on simulated annealing, we also learned about many variants in this family of algorithms.

What's next for Optimizing Advertisement Moderation

If possible, we will try a simulated annealing variant that makes larger moves. That is, instead of changing the assignments of one or two advertisements at a time, we can change them in batches, with the batch size decreasing as the temperature decreases.
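One way the proposed batch move could work is sketched below. Everything here is hypothetical, including the function names and the choice of a batch size proportional to T / T0: early on, when the temperature is high, many ads are reassigned at once; as the temperature falls, the move shrinks to a single ad, recovering the one-ad move of the current algorithm.

```python
import random


def batch_size(temperature, t_start, max_batch):
    """Hypothetical schedule: batch size proportional to T / T0, never below 1."""
    return max(1, round(max_batch * temperature / t_start))


def propose_batch_move(assignment, n_reviewers, temperature, t_start, max_batch, rng):
    """Return a new assignment with a temperature-dependent number of ads reassigned."""
    k = min(batch_size(temperature, t_start, max_batch), len(assignment))
    candidate = list(assignment)
    for ad in rng.sample(range(len(assignment)), k):  # k distinct ads
        candidate[ad] = rng.randrange(n_reviewers)
    return candidate
```

The candidate would then go through the same accept/reject rule as a single-ad move.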

Bio

Sneha Kumar is a final year student from the National University of Singapore pursuing a Bachelor’s degree with a primary major in Data Science and Analytics and a secondary major in Innovation and Design. While she primarily focuses on machine learning and artificial intelligence, Sneha’s range of interests also includes data engineering and front-end development.

Barnabas is a final year student from the National University of Singapore pursuing a Bachelor’s degree with a major in Data Science and Analytics and a minor in Computer Science. He is interested in machine learning, database systems and optimization algorithms.

Huang Hongyi is a penultimate-year student at the National University of Singapore, pursuing a Bachelor's degree with a first major in Computer Science and a second major in Mathematics. His interests lie in the domain of algorithms and theory, particularly optimization algorithms and formal methods.