Finding like-minded people is challenging, whether on social media platforms or in real life. We try to address this by clustering instant messages and connecting people based on their messaging patterns.
What it does
It connects like-minded people. For example, a group of people interested in climate change could be connected so they can discuss ideas effectively and work toward making the world a better place.
How we built it
We used the publicly available 20_newsgroups dataset, whose documents span many different topics and resemble conversations. We cleaned and pre-processed the data, extracted features, and clustered the documents. Each resulting cluster contains conversations among like-minded people, which we can use to recommend groups.
We used Python as the primary programming language to build this pipeline.
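The pipeline described above (cleaning, pre-processing, feature extraction, clustering) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code we actually ran: the stop-word list, the cleaning rules (dropping quoted ">" reply lines and email addresses), TF-IDF as the feature extractor, and spherical k-means with a first-k initialization are all choices made for this example, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real pipelines use a fuller one).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on", "for", "we", "it"}


def clean(text):
    """Generic cleaning: drop quoted reply lines, emails, punctuation and stop words."""
    # Newsgroup posts often quote earlier messages with a leading ">".
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith(">")]
    text = " ".join(lines).lower()
    text = re.sub(r"\S+@\S+", " ", text)  # remove email addresses
    tokens = re.findall(r"[a-z]+", text)  # keep alphabetic tokens only
    return [t for t in tokens if len(t) > 2 and t not in STOP_WORDS]


def tfidf(docs_tokens):
    """Return one L2-normalized sparse TF-IDF vector (dict term -> weight) per document."""
    n = len(docs_tokens)
    df = Counter(t for toks in docs_tokens for t in set(toks))
    vectors = []
    for toks in docs_tokens:
        tf = Counter(toks)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, max_iter=20):
    """Spherical k-means: assign by cosine similarity; centroids are normalized sums."""
    centroids = [dict(v) for v in vectors[:k]]  # simple deterministic init (assumption)
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            acc = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    acc.update(v)  # adds weights term by term
            if acc:
                norm = math.sqrt(sum(w * w for w in acc.values())) or 1.0
                centroids[j] = {t: w / norm for t, w in acc.items()}
    return labels


def cluster_documents(docs, k):
    """Clean -> TF-IDF features -> k-means; returns one cluster label per document."""
    return kmeans(tfidf([clean(d) for d in docs]), k)


if __name__ == "__main__":
    docs = [
        "Climate change is warming the planet and raising sea levels.",
        "The hockey team won the game in overtime.",
        "Carbon emissions drive climate change and global warming.",
        "Fans cheered as the team scored a late goal to win the game.",
        "Rising sea levels and warming oceans are signs of climate change.",
        "The goalie made a save and the team held on to win.",
    ]
    print(cluster_documents(docs, 2))  # prints [0, 1, 0, 1, 0, 1]
```

On the real dataset, a library vectorizer and clustering implementation would be the more practical choice; the plain-Python version only makes each step explicit.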
Challenges we ran into
Each document had a different format, so data cleaning was challenging: we had to develop a generic cleaning method that could handle all of them. The data was also high-dimensional in both raw and vectorized form, which made training the model and clustering difficult and caused scalability issues.
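One common way to shrink the vectorized dimension (shown here as an illustration, not necessarily what we did) is to prune the vocabulary by document frequency before building features: drop terms that appear in too few documents or in nearly all of them, and optionally cap the vocabulary size. This mirrors the idea behind the min_df, max_df and max_features options of scikit-learn's TfidfVectorizer; the function name and default thresholds below are assumptions.

```python
from collections import Counter


def prune_vocabulary(docs_tokens, min_df=2, max_df_ratio=0.9, max_features=None):
    """Keep terms that are neither too rare nor too common, optionally capping vocab size.

    Returns the pruned token lists and the sorted list of kept terms.
    """
    n = len(docs_tokens)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in docs_tokens for t in set(toks))
    kept = [t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio]
    # Most frequent terms first (ties broken alphabetically), so a cap keeps common ones.
    kept.sort(key=lambda t: (-df[t], t))
    if max_features is not None:
        kept = kept[:max_features]
    vocab = set(kept)
    pruned = [[t for t in toks if t in vocab] for toks in docs_tokens]
    return pruned, sorted(vocab)
```

Terms seen in only one document rarely help group documents, and terms seen everywhere carry little signal, so removing both shrinks every vector without discarding much information.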
Accomplishments that we're proud of
Exploring different embedding techniques and finding the best one for the given problem in a short time. Coming up with qualitative strategies to visualize and evaluate the quality of clustering.
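A simple qualitative check of clustering quality (an illustrative example, not necessarily one of the strategies we used) is to list the most frequent terms in each cluster and read them: a coherent cluster should surface a recognizable topic. The function name and the tie-breaking rule are assumptions.

```python
from collections import Counter, defaultdict


def top_terms_per_cluster(docs_tokens, labels, n_terms=3):
    """Map each cluster label to its n_terms most frequent terms (ties alphabetical)."""
    counts = defaultdict(Counter)
    for toks, label in zip(docs_tokens, labels):
        counts[label].update(toks)
    return {
        label: [t for t, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]]
        for label, c in sorted(counts.items())
    }


if __name__ == "__main__":
    docs_tokens = [["climate", "change", "sea"], ["team", "game"],
                   ["climate", "warming"], ["team", "win", "game"]]
    print(top_terms_per_cluster(docs_tokens, [0, 1, 0, 1]))
    # prints {0: ['climate', 'change', 'sea'], 1: ['game', 'team', 'win']}
```

Raw frequency is the simplest choice; weighting terms by how much more often they occur inside a cluster than in the whole corpus would highlight more distinctive words.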
What we learned
Choosing an appropriate embedding matters more than the choice of model or clustering algorithm.
What's next for Recommending groups based on Instant message patterns
Developing an end-to-end messaging platform that connects like-minded people.