Data Friend

Screenshot of our application
Data represented in a different, experimental view
Video walkthrough of our app

Inspiration

As a team with a lot of data-science experience, we could all relate to the black-box problem that comes with many feature-extraction, clustering and ML algorithms. The algorithms themselves can be excellent and do their job well, yet humans tend to struggle to understand why the final result works in the first place. A similar feeling arises in E-Commerce and advertising: the advertising algorithm seemingly maximizes revenue, but users never feel that their interests are properly understood. With that problem in mind, we sat down to create something that presents ML and statistical analysis of data in a human-friendly way. That is how the idea for "Data Friend" was born.

What it does

The tool "Data Friend" was designed to make feature extraction and spotting connections between seemingly unrelated data points as easy as possible for a human. The final product closely matches that goal: the visualisation software makes a lot of seemingly unrelated data fall into logical groups, which can then be analysed further both by software and by a human.

Even if human interaction is not welcomed by the client, the profiling software still tries to replace and improve upon human-like suggestion patterns. This might lead to higher overall client retention, because clients feel more welcome and cared for, at least while shopping at those specific shops.

How we built it

The project is divided into two central parts: one built with Python's scientific computing library SciPy, and the other with well-known frameworks such as MaterialUI and, most importantly, Vite for lightning-fast visualisations. The two parts were built completely separately from each other. This was an intentional decision by our team, to make sure that both parts work on their own and are self-contained. For the first part, Python seemed like the obvious choice for data analysis because of its broad palette of data-analysis tools, such as Pandas, which we used to preprocess and clean the provided dataset. For the second part, D3 seemed like the best tool for the job, since it offers a wide array of visualisation tools for the browser and leaves our foundation room to scale in the future.

Challenges we ran into

Continuing with the D3 journey, we soon realised that D3 and React are quite a tricky combination, because both want to control the DOM. To let D3 manipulate objects appended by React, we had to build a dedicated access route for D3: the DOM elements that D3 needs in order to display the data are not present beforehand, yet D3 needs them to build the data. D3 generates its elements at runtime, whereas React generates its components only after everything has been computed, so a workaround was needed for D3 to compute the data at all. We ended up creating the component as an element and replacing the reference in the DOM after the computation.
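A conflict like this is often avoided by splitting responsibilities: the D3-style code only computes positions from the data, without touching the DOM, and React alone creates the elements from the computed result. The following is a minimal, dependency-free sketch of that split, and not the workaround we actually shipped. The names `DataPoint`, `PlacedPoint`, `linearScale` and `layoutPoints` are hypothetical, and `linearScale` only imitates what a D3 linear scale does.

```typescript
// Hypothetical shape of one record to plot (not taken from the project).
interface DataPoint {
  id: string;
  x: number;
  y: number;
}

// A data point plus its computed pixel position.
interface PlacedPoint extends DataPoint {
  cx: number;
  cy: number;
}

// Maps a value from [d0, d1] onto [r0, r1], similar in spirit to a D3 linear scale.
function linearScale(d0: number, d1: number, r0: number, r1: number): (v: number) => number {
  const span = d1 - d0;
  // A degenerate domain (all values equal) maps to the middle of the range.
  if (span === 0) return () => (r0 + r1) / 2;
  return (v) => r0 + ((v - d0) / span) * (r1 - r0);
}

// Pure layout step: needs only the data and a target size, no DOM element.
function layoutPoints(points: DataPoint[], width: number, height: number): PlacedPoint[] {
  if (points.length === 0) return [];
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const sx = linearScale(Math.min(...xs), Math.max(...xs), 0, width);
  // Screen y grows downwards, so the output range is flipped.
  const sy = linearScale(Math.min(...ys), Math.max(...ys), height, 0);
  return points.map((p) => ({ ...p, cx: sx(p.x), cy: sy(p.y) }));
}
```

A React component could call `layoutPoints` while rendering and map every result to an SVG circle, so no element has to exist before the computation runs.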

Another challenge was balancing the two "goals" defined by the jury. With time so limited and energy levels sinking fast, it was clear that the team would have to focus on one of the expected functionalities. While the competitive excitement pushed me further into the depths of scientific papers about recommender systems, more answers brought more questions, and more information about the dataset brought more edge cases.

While I was struggling, the rest of the team was dealing with challenges of their own. One of the biggest was a moral one: how to balance competitive spirit with the will to help our teammates for whom HackKosice was their first hackathon. A decision was eventually made, and more time was spent coaching and sharing positive experiences at the cost of a feature or two... In the end, everyone is here to have fun :)

Accomplishments that we're proud of

My teammates and I are extremely proud of the visualisation software we have built, which could allow E-Commerce companies not only to use their data to its fullest, but also to understand the meaning of significant data features in as human a way as possible. A machine-learning algorithm only shines with a fitting explanation; without explanation or understanding, it is more of a black-box mystery and a headache!

I myself (Mantas) am happy to have tackled the point high-score problem and to have tried to predict future purchases as accurately as possible from the data provided. The most logical, scalable and professional solutions can definitely get the job done, but 24 hours looked like too little time to build something truly general-purpose without overfitting to the dataset. While the idea definitely crossed my mind (and the jury has probably spotted my period of 100+ submissions, when we considered turning the task into an optimization problem and were probably unintentionally DDoS'ing the server 😅), we all agreed that the focus should rather be on user experience, transparency and an information-rich data overview.

Gilles was the leading force behind the data visualisation and is very happy with the refresher on JS/React/D3, his old toolset, with which he has a lot of experience but which he hasn't actively used in recent years. His graphs and spatial ordering of the data are the heart of the project and also its main selling point.

Gilles also deployed the application to Google Cloud so that hackathon participants can see the application for themselves. First, download the sample_submission that we put in a Google Cloud Storage bucket, then open the second link, which was generated by Google Cloud Run, where the application is deployed. Then upload the data by clicking on IMPORT FILE and choose an example user to display the data in the graph.

What we learned

Gilles, working on the most important part of the project, needed assistance from his teammates. This quickly led the team to the React documentation and forced even those completely unfamiliar with JS to dip their toes into the world of component-based programming.

I myself felt as if I had absorbed a year's worth of lectures on user modeling and recommender systems. Both a well-known paper by Koen Verstrepen and the complete lecture slides from my own university proved to be invaluable sources of information.

When it came to team spirit and learning, Dalibor was the clear winner. He used the opportunity to ask the questions he had and to solve the bugs that came up, and he also got to play around with React with the help of Gilles, who selflessly took on the role of mentor in the group.

What's next for Data Friend

As mentioned above regarding the points and predictions, making a system like this truly scalable and more general is a big task that cannot be completed in under 24 hours. However, when it comes to plans that could be implemented within a week, a month or more, the team isn't lacking ideas: speed optimization for data cleaning, a professional UI and, most importantly, a mathematically sound prediction model that can withstand the test of time and changes in the data. The team spirit and experience were also good enough to let the team look further and dream about opportunities after the hackathon, most likely building upon the foundation that we have laid in the last 24 hours.