"Duv-Duv Gap" - a phrase in Uzbek meaning "gossip in the streets"
The NewsQ Challenge called for a news recommendation engine for a country other than the US. We decided to build one for Uzbekistan!
What it does
Duv-Duv Gap is a framework for low-resource languages that uses historical news articles and their metadata to predict user engagement with new articles.
How we built it
We started by identifying several Uzbek news sources that together account for the majority of viewers in the country. We then scraped around 24,000 news articles along with their metadata. For each article we extracted several data points: Title, Content, Date/Time posted, Number of views, Source, Number of images, Number of hyperlinks, and Number of quotes.
Our next task was to decide on a metric to quantify the quality of the news. There are many ways to define the quality of news or information, among which relevance, facticity, style, and potential impact play a big role. However, measuring these features is anything but trivial, especially for languages that lack data resources and the relevant technologies. As a team, we spent a considerable amount of time defining a metric that is:
meaningful -> does it actually capture the quality of news?
simple -> is it straightforward and easily comprehensible?
universal -> is it language-agnostic?
After many failed attempts, we reached our final solution. We define the target (label) as the value of NumberOfViews / ActivePeriod * SubscriberCount, min-max normalized within the article's own source. This way we avoid the problem of domain mismatch and bias towards small news outlets. Formally, the formula is defined as:
MinMaxScalerOfSource(NumberOfViews / ActivePeriod * SubscriberCount)
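As a rough illustration of the formula above, here is a minimal Python sketch, not the project's actual code. The function name `engagement_targets`, the dictionary keys, and the sample numbers are all hypothetical. The raw score is read literally with standard operator precedence, as (NumberOfViews / ActivePeriod) * SubscriberCount, and the unit of ActivePeriod (hours, days) is left open. Because SubscriberCount is a constant positive factor within a single source, it cancels out under per-source min-max scaling, so multiplying or dividing by it would give the same targets.

```python
from collections import defaultdict


def engagement_targets(articles):
    """Return one score in [0, 1] per article, min-max scaled within its source.

    Each article is a dict with the (hypothetical) keys:
    "source", "views", "active_period", "subscriber_count".
    """
    # Raw score, read literally: (views / active_period) * subscriber_count.
    raw = [
        a["views"] / a["active_period"] * a["subscriber_count"]
        for a in articles
    ]

    # Collect the raw scores of each source to find its own min and max.
    by_source = defaultdict(list)
    for score, a in zip(raw, articles):
        by_source[a["source"]].append(score)
    bounds = {src: (min(vals), max(vals)) for src, vals in by_source.items()}

    targets = []
    for score, a in zip(raw, articles):
        low, high = bounds[a["source"]]
        # If every article of a source has the same score, fall back to 0.0
        # to avoid dividing by zero (an arbitrary choice for this sketch).
        targets.append(0.0 if high == low else (score - low) / (high - low))
    return targets


# Made-up sample data: two sources of very different size.
articles = [
    {"source": "A", "views": 1000, "active_period": 10, "subscriber_count": 50000},
    {"source": "A", "views": 4000, "active_period": 20, "subscriber_count": 50000},
    {"source": "A", "views": 9000, "active_period": 30, "subscriber_count": 50000},
    {"source": "B", "views": 50, "active_period": 5, "subscriber_count": 800},
    {"source": "B", "views": 300, "active_period": 10, "subscriber_count": 800},
]
targets = engagement_targets(articles)
print(targets)
```

Scaling within each source means the best-performing article of a small outlet and the best-performing article of a large outlet both receive a target of 1.0, which is what makes the labels comparable across sources.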
Challenges we ran into
One of the main challenges was the lack of language technologies and resources for Uzbek. In fact, there are no automated fact-checkers, factual databases, grammar or dependency parsers, or lemmatization tools for the language.
Accomplishments that we're proud of
We are proud to have built the first fully functional news aggregation and ranking engine for the Uzbek language.
What we learned
For some team members, it was the first time building a recommendation and ranking system, so it was a valuable experience. For others, it was a great coding challenge in which they got to learn and write HTML, CSS, and JS.
What's next for Duv-Duv Gap
We want to make it an open-source project that news agencies in Uzbekistan can use.