We were intrigued by Helsana's current reward system and by the problems the company faces with the program. We wanted to use the real dataset provided to add value for both the customer and the company, using our favourite data analytics techniques!

What it does

This client-facing web app shows a history of the activities the user has completed. To enrich this experience, we use data analytics and statistical inference to recommend which activity the user should do next, based on their past preferences. We also flag suspicious, potentially fraudulent activities uploaded by the user, using the entire database to detect abnormal entries. Once fraud is detected, we can deduct some points, in the hope that the user will avoid providing false information in the future.

This way, we want to encourage users to be honest while giving them a chance to correct an entry or provide additional proof. This mechanism promotes transparency and trust between the client and the company. Checking for potential fraud automatically makes the system more scalable and sustainable in the long term, so that more clients can benefit from the program.

Apart from the main features mentioned above, we also implemented a simple login page with a fully validated login form.

How I built it

Front end development

The front end was built using React.js. The tech stack included JavaScript, HTML, and CSS.

Back-end development

The back end was implemented using Django and Django REST framework, connecting directly to MSSQL.

Model analysis

Treating each participant's sequence of activity choices as a Markov chain, we obtain a transition matrix of the activities for each participant. By aggregating all of these transition matrices, we create a global activity transition matrix.
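The aggregation step could look roughly like the sketch below. It is not our production code: the activity names and user IDs are made up, and it assumes that aggregation means summing each participant's transition counts before normalizing every row into probabilities.

```python
from collections import defaultdict


def transition_counts(sequence):
    """Count activity-to-activity transitions in one user's ordered history."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return counts


def global_transition_matrix(histories):
    """Sum every user's transition counts, then normalize each row."""
    total = defaultdict(lambda: defaultdict(int))
    for sequence in histories.values():
        for prev, row in transition_counts(sequence).items():
            for nxt, n in row.items():
                total[prev][nxt] += n
    matrix = {}
    for prev, row in total.items():
        row_sum = sum(row.values())
        # Each row becomes a probability distribution over the next activity.
        matrix[prev] = {nxt: n / row_sum for nxt, n in row.items()}
    return matrix


# Hypothetical histories keyed by made-up user IDs (not from the real dataset).
histories = {
    "user_a": ["yoga", "running", "yoga", "swimming"],
    "user_b": ["running", "yoga", "running"],
}
matrix = global_transition_matrix(histories)
```

Summing counts rather than averaging per-user matrices gives users with longer histories more weight, which may or may not be desirable on a small dataset.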

We take into account a user's last activity type and focus on its corresponding row in the transition matrix.

We perform a multinomial logistic regression, using a softargmax function on the row, to recommend one of the activities to the user.

Note 1: The daily exercise activity is excluded from the matrix, since it is recommended to the user independently every day.

Note 2: Bonus-achieving activities are included in the matrix, as they correlate with other types of activities, but they are not recommended to users (they are excluded from the columns).
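To illustrate the recommendation step and the two notes, here is a minimal sketch. It assumes the row of the global matrix is a mapping from activity name to transition probability, that the softargmax output is used as sampling weights, and that bonus activities are filtered out by name; the function names and the example row are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softargmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recommend(row, excluded=frozenset(), rng=random):
    """Pick the next activity from one matrix row, skipping excluded columns."""
    candidates = [activity for activity in row if activity not in excluded]
    if not candidates:
        return None
    weights = softmax([row[activity] for activity in candidates])
    # Sampling (instead of always taking the maximum) keeps suggestions varied.
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical row for a user whose last activity was "yoga".
row = {"running": 0.5, "swimming": 0.3, "bonus_checkup": 0.2}
suggestion = recommend(row, excluded={"bonus_checkup"}, rng=random.Random(0))
```

Because the daily exercise activity never enters the matrix, it does not need to be listed in `excluded`; only the bonus columns do.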

We also used statistical analysis to detect outlier activities for the fraud detection feature.
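As one possible illustration of this kind of outlier check, the sketch below flags values using the modified z-score based on the median absolute deviation (MAD). This is an assumed technique chosen for the example, not necessarily the statistic we used; the threshold of 3.5 and the sample values are also illustrative.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return one boolean per value: True if its modified z-score is too large."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (nearly) identical: nothing can be called abnormal.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


# Hypothetical reported values for one activity type across the database.
reported = [5, 6, 5, 7, 6, 5, 60]
flags = flag_outliers(reported)
```

A median-based score is a reasonable fit when no labelled fraud examples exist and each participant has few entries, because a single extreme value barely shifts the median.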

Challenges I ran into

The first challenge was coordinating between team members, as three out of four of us were participating remotely in three different time zones! On the technical side, making use of the data provided was quite a challenge, as the dataset was very heterogeneous and the number of features was limited. It was also difficult to design the fraud detection system, as none of the activities were labelled as fraudulent or legitimate.

Accomplishments that I'm proud of

Being able to run statistical analysis on a relatively small dataset with few entries for each participant.

What I learned

We learned how to deal with raw data and extract value from it. We also sharpened our front-end, back-end, DevOps and database analysis skills!

What's next for DataSoundsNicetoMe

Suggestions for data-collection processes:

  • Group activities into categories, for example nutrition, fitness, recreation, and loyalty (including bonus programs), so that the data can be analyzed effectively.

  • Create a dataset of fraudulent activities

Deployment and login

The application was deployed to a micro EC2 instance on AWS. Since it's a demo project, it runs on Django's built-in development server (hence port 8000).

To log in, use one of the following user IDs (the password is always test):

  • 63803693: This user has a lot of activities
  • 586550651: This user has two fraudulent activities