Similaripy

What we have done

We took the similarity scores of all possible pairs of vector representations of several texts (requirements) provided by the GESSI group project at UPC; each score is calculated using the cosine distance. Treating those scores as a matrix, we built an index with NMSLIB (source: https://github.com/nmslib/nmslib) and implemented a clustering algorithm that applies a threshold and selects a number of neighbours.
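As a rough illustration of the thresholding and neighbour-selection idea, here is a minimal sketch in plain Python. It is not our actual code: it swaps the NMSLIB index for a brute-force search so that it runs without dependencies, and the names `cluster_by_threshold`, `threshold` and `k`, the default values, and the choice to merge linked items into connected groups are all assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_by_threshold(vectors, threshold=0.8, k=5):
    """Group vectors whose nearest neighbours are similar enough.

    Hypothetical helper: each item is linked to at most k of its most similar
    items, and only when the similarity reaches `threshold`. Linked items are
    then merged into clusters with a small union-find structure.
    """
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Brute-force neighbour search; an NMSLIB index would replace this step.
        scored = sorted(
            ((cosine_similarity(vectors[i], vectors[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for score, j in scored[:k]:
            if score >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Calling `cluster_by_threshold(vectors, threshold=0.8, k=2)` returns a list of clusters, each one a list of indices into `vectors`. Raising `threshold` or lowering `k` gives smaller, tighter clusters.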

Challenges we ran into

We did not have much time to develop our ideas. The brainstorming was a bit rushed, and we are not used to it. Moreover, the dataset and the method for validating our model were somewhat difficult to work with.

What we learned

We had never used nmslib or written a clustering algorithm before, so almost everything we did was new to us.

What's next for Similaripy

Rethink how the model's accuracy is computed, and experiment with several parameters to get the best result. We could also try other ways of computing the distance and its similarity score instead of the cosine distance.
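To make the idea of trying other distances concrete, here is a small sketch that computes a few common ones for the same pair of vectors. The choice of Euclidean and Manhattan distance as alternatives, and the names `METRICS` and `compare_metrics`, are assumptions for illustration only; we have not evaluated any of them on the dataset.

```python
import math


def cosine_distance(u, v):
    """1 minus cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical registry of candidate metrics to experiment with.
METRICS = {
    "cosine": cosine_distance,
    "euclidean": euclidean_distance,
    "manhattan": manhattan_distance,
}


def compare_metrics(u, v):
    """Return every candidate distance for one pair of vectors."""
    return {name: metric(u, v) for name, metric in METRICS.items()}
```

Cosine distance ignores vector length while the other two do not, so the resulting clusters could differ noticeably depending on how the text vectors are normalised.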

Built With

java
nmslib
python

Submitted to

HackNLP
- Winner 1st prize - Amazon Echo / DJI Ryze Tello Drone

Created by

I worked on the creation of the nmslib index and the algorithm to create the clusters.

Adrià Cabeza
I worked on the small Java part that prepares the matrix. I also implemented the script for evaluating the results.

Albert Suarez
Principal Software Engineer at @restbai | Graduated at @UPC | Fall 2017 edition co-director at @hackupc | Hackathon enthusiast

Updates

Adrià Cabeza started this project — May 18, 2019 11:44 AM EDT
