Purpose: Bill.com presented a challenge relevant to their applications in the business domain. Businesses can be represented as nodes within a graph, and the connections between these businesses can be represented as edges. Bill.com is interested in helping their clients by using this graph structure to provide recommendation and searches in their database. Given a smaller emulation of their dataset, we can represent each node a webpage in a social media network. Each webpage has a description and type, and they have connections between each other. With this structure, we sought to predict whether or not a link should exist between nodes, when the data set was made more sparse.
We accomplished this primarily by use of the node2vec algorithm, on both the word desciptions and node connections within the graph itself. Given these embeddings for both the nodes and words, we can create a unique feature vectors for each pair of nodes by finding the square difference between the the embeddings. With this, we implemented both a KNN and neural net model to predict whether or not there should exist an edge between two nodes. The KNN was a baseline model that we compared the neural net models performance against.
Challenges: Representing the different information in a similar dimension vector space was a challenge. We had to learn the intricacies of node2vec to process all of our different data. Another difficulty was how to represent the information we gather about each node as a pair of nodes to give a binary label for whether or not a node exists, and whether each method we explored would be beneficial. Finally, our models still need improvements. The accuracy is nowhere near what we had wanted, and varied massively compared to our validation set. There must be some overfitting somewhere that must be explored in order to gain higher performance.
Log in or sign up for Devpost to join the conversation.