Phone Matcher

Inspiration

Smartphones are everywhere, but battery technology has lagged far behind the rest of smartphone technology, so many users cannot get through an entire day on a single charge. This makes most modern smartphones unreliable. An average user's smartphone usage, in terms of apps, location, WiFi and mobile data, remains fairly static. Given that, we wondered whether some phones would handle the same day-to-day tasks better than others.

What it does

Using M2Catalyst's huge knowledge base, our program learns models that map usage log characteristics to the device energy consumed under those characteristics. Given a user's usage characteristics across all the apps they run, the program uses these models to recommend the phone (device) that would serve that usage best in terms of battery, or energy, consumption.

How we built it

For every device type and every app version, we pooled user logs across all users and timestamps and extracted features including CPU usage, run time, mobile data usage and memory used. For every device and app version combination, we fit a linear regression model that takes these features and estimates the energy, or battery, used.
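
The pooling and fitting step could look roughly like the sketch below. It is not our hackathon code: the record layout, the field names in FEATURES and the helper names are assumptions made for illustration, and it solves ordinary least squares through the normal equations in plain Python rather than with a library.

```python
from collections import defaultdict

# Hypothetical field names; the real log schema is not shown in this writeup.
FEATURES = ("cpu", "run_time", "mobile_data", "memory")


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: not enough varied data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations; returns [intercept, w1, ...]."""
    xs = [[1.0] + list(r) for r in rows]  # prepend a constant term for the intercept
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve(xtx, xty)


def fit_all(logs):
    """Pool log records by (device, app_version) and fit one model per group."""
    groups = defaultdict(list)
    for rec in logs:
        groups[(rec["device"], rec["app_version"])].append(rec)
    models = {}
    for key, recs in groups.items():
        rows = [[rec[f] for f in FEATURES] for rec in recs]
        targets = [rec["energy"] for rec in recs]
        models[key] = fit_linear(rows, targets)
    return models
```

Each entry of the returned dictionary holds the intercept followed by one weight per feature. A group needs more varied records than it has features, otherwise the system is singular and the sketch raises an error.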

Once we have these black box models for every device type and app combination, we can take a particular user's timestamped logs and estimate the total energy footprint on each device type. We then rank the devices in ascending order of total energy usage. The intuition is that these models estimate the total battery a user with certain app usage characteristics would have consumed had they been using a particular device. Since a user's characteristics don't change from device to device, the device with the lowest estimated energy usage is the best fit for that particular user.
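
A minimal sketch of this ranking step follows, assuming each model is stored as a list of weights (intercept first) keyed by a (device, app) pair, and each user log entry is an (app, feature vector) pair. Those structures and the function names are assumptions for illustration. Skipping devices that lack a model for one of the user's apps is a choice made in this sketch, not something described above.

```python
def estimate_energy(weights, features):
    """Linear model: intercept plus a weighted sum of the features."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def rank_devices(models, user_logs):
    """Return (device, total_energy) pairs sorted by ascending estimated energy.

    models: {(device, app): [intercept, w1, ...]}
    user_logs: list of (app, feature_vector) entries, one per logged interval.
    A device is skipped when it has no model for an app the user runs,
    because its total would otherwise look unfairly low.
    """
    devices = {device for device, _ in models}
    totals = {}
    for device in devices:
        total = 0.0
        for app, features in user_logs:
            weights = models.get((device, app))
            if weights is None:
                break  # incomplete coverage: leave this device out
            total += estimate_energy(weights, features)
        else:
            # Only reached when every log entry had a matching model.
            totals[device] = total
    return sorted(totals.items(), key=lambda item: item[1])
```

The first entry of the returned list is the recommended device, since it has the lowest estimated total for that user's logs.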

Challenges we ran into

Handling the sheer amount of log data. Parsing the logs and mining the features we use to build the models was a fairly obvious challenge for us.
Finding the most efficient serialization of feature data, so we could exchange information and run model-building scripts. A shoutout to MessagePack!
Malformed CSVs in M2C's dataset led to some loss of training data and features. We had to drop applications like Twitter merely because there was a stray ',' somewhere in there that threw off our CSV parsers.
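
One way to limit this kind of loss, sketched here in hindsight and not what we did during the hack, is to check each row's field count against the header and set aside only the rows that do not match, instead of dropping a whole application. The column names and sample rows in the usage note are made up, and the actual layout of M2C's files is not reproduced here.

```python
import csv
import io


def split_rows(text):
    """Parse CSV text; return (good_rows, bad_rows) judged by the header's field count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    good, bad = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) == len(header):
            good.append(dict(zip(header, row)))
        else:
            bad.append(row)  # keep for inspection or manual repair
    return good, bad
```

For example, with a header of app,cpu,energy, a row whose app name is quoted ("Twitter, Inc.",0.4,12) parses into three fields and is kept, while the unquoted form (Twitter, Inc.,0.4,12) splits into four fields and lands in the second list.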

Accomplishments that we're proud of

Our hunch that we could model the coupling of device and application usage in terms of phone battery metrics actually held.
That we got somewhere with this entire enterprise.

What we learned

That MessagePack is like the best serializer ever!
We learned some nifty ways to handle such large amounts of data. Seeing as this was our first ever data hack, that is probably the most useful thing we learnt.

What's next for Phone Matcher

Improve delivery of training data, and possibly clean up and use all of M2Catalyst's log data to build a more comprehensive model.
Build a web app or platform that lets a user query this model to get phone suggestions.
Revisit the models to find insight into how some phone manufacturers perform compared to others for some forms of usage. This could be useful information about what improvements can be scoped for some phones.
Explore models more advanced than linear regression to estimate energy usage, and perform feature engineering.