AppLovin Query Challenge

Inspiration

As a team, we were interested in learning more about databases, how queries are executed, and different methods of storing data. AppLovin's challenge seemed like a great opportunity to dip our toes into the world of data by optimizing queries.

What it does

Our scripts use data aggregation, caching, and DuckDB to speed up queries on a given dataset.
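
To make the aggregation and caching idea concrete, here is a minimal sketch in plain Python. It is not our actual DuckDB code: the rows, the column names (`day`, `country`, `event_type`, `bid`), and the function names are all made up for illustration. The point is just that the raw data is scanned once up front, and later queries read a much smaller rollup, with repeated queries served from a cache.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical raw event rows: (day, country, event_type, bid).
# The real dataset and its schema are not shown here; this is an assumption.
EVENTS = [
    ("2024-01-01", "US", "impression", 0.50),
    ("2024-01-01", "US", "click", 1.25),
    ("2024-01-01", "JP", "impression", 0.75),
    ("2024-01-02", "US", "impression", 0.25),
]


def build_rollup(rows):
    """Scan the raw rows once and keep (day, country) -> [row_count, total_bid]."""
    rollup = defaultdict(lambda: [0, 0.0])
    for day, country, _event_type, bid in rows:
        cell = rollup[(day, country)]
        cell[0] += 1
        cell[1] += bid
    return dict(rollup)


# Built during preprocessing, before any query arrives.
ROLLUP = build_rollup(EVENTS)


@lru_cache(maxsize=1024)
def total_bid_by_country(country):
    """Answer from the small rollup; an identical repeat call hits the cache."""
    return sum(total for (_, c), (_, total) in ROLLUP.items() if c == country)


if __name__ == "__main__":
    print(total_bid_by_country("US"))
    print(total_bid_by_country("US"))  # served from the cache
    print(total_bid_by_country.cache_info())
```

The trade-off is that the rollup only helps queries whose columns it kept. Anything that needs `event_type` in this sketch would still have to go back to the raw rows.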

How we built it

Building on the starter scripts, we tested and implemented a number of optimizations, evaluating the viability of techniques such as indexing, partitioning, and sorting along the way.

Challenges we ran into

Many of the techniques we wanted to implement weren't feasible, since preprocessing gigabytes of data took a significant amount of time. We were limited to roughly 5-10 minutes of preprocessing, and combining techniques like sorting and indexing went over that budget.

Accomplishments that we're proud of

Queries that can be answered from our aggregated data tables are extremely fast. We spent a good amount of time testing which columns to aggregate, and we think we came up with a pretty good set of aggregations.
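
One way to think about whether a query "falls into" an aggregated table is sketched below. This is a simplified illustration under our own assumptions, not the routing logic from our scripts: the table names (`agg_day_country`, `agg_day_type`, `events`), the column names, and the `pick_table` helper are all hypothetical. A query can use a rollup only if every column it groups or filters on was kept as a dimension of that rollup, and only if its aggregates can be recombined from partial results.

```python
# Hypothetical rollup catalog: table name -> the columns it was grouped by.
ROLLUPS = {
    "agg_day_country": {"day", "country"},
    "agg_day_type": {"day", "type"},
}

# Aggregates that can be recombined from per-group partial results.
# (An average would need both a stored sum and a stored count.)
MERGEABLE = {"sum", "count", "min", "max"}


def pick_table(group_by, filters, aggregates, raw_table="events"):
    """Return the smallest rollup that can answer the query, else the raw table."""
    if not set(aggregates) <= MERGEABLE:
        return raw_table
    needed = set(group_by) | set(filters)
    candidates = [name for name, dims in ROLLUPS.items() if needed <= dims]
    # Fewer dimensions usually means fewer rows to scan.
    return min(candidates, key=lambda name: len(ROLLUPS[name]), default=raw_table)


if __name__ == "__main__":
    print(pick_table(["country"], ["day"], ["sum"]))
    print(pick_table(["country", "type"], [], ["sum"]))
    print(pick_table(["day"], [], ["median"]))
```

Under these assumptions, the first query is served from `agg_day_country`, while the other two fall back to the raw `events` table: one needs a column combination no rollup kept, and the other asks for a median, which cannot be rebuilt from grouped sums and counts.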

What we learned

As a team, we gained a lot of experience working together and loved the opportunity to learn about how databases organize data and how architectural decisions influence performance.