Team name: Aenocyon dirus
Team members:
Anastasia Horvat
Daniel Kim
Gabby Ruehle
Robert Seybold
Inspiration
As four master's students at the Institute of Advanced Analytics, we wanted to apply what we learned to real-world data. Logistic regression was one of the main topics covered in the past month, and using it to predict who will purchase apps was both exciting and informative.
What it does
Our project included two logistic regression models. First, we built a binary logistic regression model to separate the 0's (customers who spent nothing on apps) from the 1's (customers willing to pay at least a penny for an app). We believed that customers who were willing to spend money and customers who were not were two different types of customers. From this model, we were able to provide app developers with the characteristics of these two types of customers so that they could make informed business decisions. After separating the 1's from the 0's, the team built an ordinal logistic regression model on the 1's to find out how much money those customers were willing to spend. We binned the response variable into 3 categories and found variables that differentiated customers who were willing to spend less than 2 dollars, more than 2 dollars but less than 5 dollars, or more than 5 dollars. Using this model, we were able to find the consumer characteristics associated with a willingness to spend more on an app.
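The two-stage setup described above can be sketched in R roughly as follows. This is a minimal illustration on simulated data, not our actual code: the columns `age`, `plays_game`, and `spend_usd` are hypothetical, the use of `MASS::polr` for the ordinal model is an assumption, and how amounts of exactly 2 or 5 dollars are binned is also an assumption.

```r
library(MASS)  # polr() fits proportional-odds (cumulative logit) models

set.seed(42)

# Simulated stand-in for the survey data; every column here is hypothetical.
n <- 500
customers <- data.frame(
  age        = round(runif(n, 18, 65)),
  plays_game = rbinom(n, 1, 0.5)
)
buys <- rbinom(n, 1, plogis(-0.5 + 0.8 * customers$plays_game))
customers$spend_usd <- buys * round(rexp(n, rate = 1 / (2 + 0.05 * customers$age)), 2)

# Stage 1: binary logistic regression, 1 = spent at least a penny, 0 = spent nothing.
customers$spent <- as.integer(customers$spend_usd > 0)
stage1 <- glm(spent ~ age + plays_game, data = customers, family = binomial)

# Stage 2: keep only the 1's and bin their spend into three ordered categories.
# Assumption: boundary values fall in the lower bin, i.e. (0, 2], (2, 5], (5, Inf).
payers <- customers[customers$spent == 1, ]
payers$spend_bin <- cut(
  payers$spend_usd,
  breaks = c(0, 2, 5, Inf),
  labels = c("under_2", "2_to_5", "over_5"),
  ordered_result = TRUE
)
stage2 <- polr(spend_bin ~ age + plays_game, data = payers, Hess = TRUE)

# Odds ratios: stage 1 describes who pays at all, stage 2 who pays more.
print(exp(coef(stage1)))
print(exp(coef(stage2)))
```

Fitting the second model only on the payers keeps the question of whether someone pays separate from the question of how much they pay, which matches the idea that the two groups are different types of customers.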
How we built it
Our project was written entirely in R. We used packages such as dplyr and tidyverse to write the code. We used Mantel-Haenszel and Chi-Square tests to check for significant associations between variables, and used backward elimination and stepwise selection to compare candidate models.
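A minimal sketch of these steps in base R, on simulated data with hypothetical column names, might look like the following. Two assumptions are worth flagging: the Mantel-Haenszel test is taken here to mean the test of linear association for ordinal variables, computed as M^2 = (n - 1) * r^2 on 1 degree of freedom, and `step()` selects by AIC, which may differ from the selection criterion we actually used.

```r
set.seed(7)

# Simulated stand-in data; all column names are hypothetical.
n <- 400
survey <- data.frame(
  income_level = sample(1:3, n, replace = TRUE),  # ordinal: 1 = low, 3 = high
  uses_ios     = rbinom(n, 1, 0.5),               # binary predictor
  noise        = rnorm(n)                         # predictor unrelated to the outcome
)
survey$spent <- rbinom(
  n, 1, plogis(-1 + 0.6 * survey$income_level + 0.5 * survey$uses_ios)
)

# Chi-square test of general association between a binary predictor and the outcome.
chisq_result <- chisq.test(table(survey$uses_ios, survey$spent))

# Mantel-Haenszel test of linear association for an ordinal predictor:
# M^2 = (n - 1) * r^2, compared against a chi-square distribution with 1 df.
r <- cor(survey$income_level, survey$spent)
mh_stat <- (n - 1) * r^2
mh_p <- pchisq(mh_stat, df = 1, lower.tail = FALSE)

# Backward elimination: start from the full model and drop terms (AIC-based).
full_model <- glm(spent ~ income_level + uses_ios + noise,
                  data = survey, family = binomial)
backward_model <- step(full_model, direction = "backward", trace = 0)

# Stepwise selection: start from the intercept-only model, add or drop terms.
null_model <- glm(spent ~ 1, data = survey, family = binomial)
stepwise_model <- step(null_model,
                       scope = ~ income_level + uses_ios + noise,
                       direction = "both", trace = 0)

print(formula(backward_model))
print(formula(stepwise_model))
```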
Challenges we ran into
The response variable that we were interested in was the amount of money that people had spent on app purchases. Since this was a global dataset and the question was free response, the data required a lot of cleaning. We had to locate currency indicators such as the currency symbol or the name of the currency. We also had to find typos and correct them. Finally, we converted every amount in the column to US dollars.
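The cleaning steps above can be illustrated with a short base R sketch. Everything in it is hypothetical: the example responses, the regular expressions, the conversion rates, and the rule that an amount with no currency indicator is treated as US dollars. Our real data needed many more patterns than this.

```r
# Hypothetical free-text responses standing in for the survey answers.
raw_spend <- c("$5", "3.50 euro", "2 gbp", "10 dollers", "0")

# Illustrative conversion rates to USD (assumed values, not real market rates).
usd_per_unit <- c(USD = 1.00, EUR = 1.10, GBP = 1.30)

# Patterns mapping a symbol, a currency name, or a known typo to a currency code.
# Other symbols and spellings would be added here in the same way.
currency_patterns <- c(
  USD = "\\$|usd|doll[ae]rs?",
  EUR = "eur",
  GBP = "gbp|pounds?"
)

# Return the first currency code whose pattern appears in the response.
detect_currency <- function(text) {
  text <- tolower(text)
  for (code in names(currency_patterns)) {
    if (grepl(currency_patterns[[code]], text)) return(code)
  }
  "USD"  # assumption: no indicator means the amount is already in US dollars
}

# Pull the first number out of the response, or NA if there is none.
extract_amount <- function(text) {
  match <- regmatches(text, regexpr("[0-9]+(\\.[0-9]+)?", text))
  if (length(match) == 0) NA_real_ else as.numeric(match)
}

# Convert every response to US dollars.
spend_usd <- vapply(
  raw_spend,
  function(x) extract_amount(x) * usd_per_unit[[detect_currency(x)]],
  numeric(1),
  USE.NAMES = FALSE
)
print(spend_usd)
```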
Accomplishments that we're proud of
As a team, we are proud of the teamwork that this project required. With a very busy schedule during the week, our team had to meet outside of business hours to complete all the work. We are also proud of applying what we learned in the classroom to real data.
What we learned
As a team, we learned more about the complications that come with real-world data and the cleaning that is often necessary to produce results. We also learned more about logistic regression models and cumulative logit models and how to interpret both for business impact.
What's next for App Purchasing Behavior
The main findings from this analysis were that people were more likely to spend money on game and entertainment apps, and that they were willing to spend the most money on professional apps. The next step in our analysis would be to look into the characteristics of these types of apps (gaming/entertainment and professional) that make them more desirable to purchase. This would allow app developers to build those features into future apps.
Built With
- dplyr
- r
- tidyverse
