What it does

We use a graph convolutional network to predict whether two web pages in the graph are linked.

It is a mini-implementation of GraphSAGE, a popular learning algorithm for graph data.

It reaches ~91.5% classification accuracy on the test set!
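Treating link prediction as classification needs unconnected page pairs as negative examples alongside the real edges. The write-up does not say how negatives were chosen, so the following is only a sketch that assumes uniform random sampling; `sample_negative_pairs` and its parameters are hypothetical names used for illustration.

```python
import random


def sample_negative_pairs(num_nodes, positive_edges, num_samples, seed=0):
    """Sample undirected node pairs that are NOT in positive_edges.

    Assumption: negatives are drawn uniformly at random; the original
    project may have used a different strategy.
    """
    rng = random.Random(seed)
    # Store edges as unordered pairs so (u, v) and (v, u) count as one edge.
    existing = {frozenset((u, v)) for u, v in positive_edges if u != v}
    max_negatives = num_nodes * (num_nodes - 1) // 2 - len(existing)
    if num_samples > max_negatives:
        raise ValueError("not enough unconnected pairs to sample from")

    negatives = set()
    while len(negatives) < num_samples:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct node ids
        pair = frozenset((u, v))
        if pair not in existing:
            negatives.add(pair)
    # Sort for a stable, readable result.
    return sorted(tuple(sorted(pair)) for pair in negatives)


toy_positive_edges = [(0, 1), (1, 2), (2, 3)]
toy_negative_pairs = sample_negative_pairs(4, toy_positive_edges, 3)
```

Positive pairs get label 1 and sampled pairs label 0, which is what makes an accuracy figure meaningful for this task.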

How we built it

Data

  • Nodes (numeric ID): web pages
    • 22,470 linked, 1,655 isolated
  • Edges: an edge exists if two pages link to each other (132,039 edges)
  • Page’s text description
  • Page type (label)
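As an illustration of the data layout above, here is a minimal sketch that builds an undirected adjacency map from an edge list and counts linked versus isolated nodes. The toy edge list and the helper names are made up for the example; skipping self-loops is also an assumption, not something the dataset description states.

```python
def build_adjacency(num_nodes, edges):
    """Return an undirected adjacency map: node id -> set of neighbor ids."""
    adjacency = {node: set() for node in range(num_nodes)}
    for u, v in edges:
        if u == v:
            continue  # assumption: self-loops are ignored in this sketch
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def count_linked_and_isolated(adjacency):
    """Count nodes with at least one neighbor and nodes with none."""
    linked = sum(1 for neighbors in adjacency.values() if neighbors)
    return linked, len(adjacency) - linked


# Toy graph: nodes 0-4 have links, node 5 is isolated.
toy_edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
toy_adjacency = build_adjacency(6, toy_edges)
linked, isolated = count_linked_and_isolated(toy_adjacency)
```

Running the same count on the real edge list is a quick way to check the linked and isolated totals quoted above.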

Pre-processing

  • Node features
    • Labels: provided, 4 types
    • Embedding text one-hot vectors
    • Text: embedded with Doc2Vec, with the output feature dimension chosen based on the raw sentence length
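One way to combine these pieces into a single node feature vector is to one-hot encode the page type and append the text embedding. This is a sketch under assumptions: the page types are taken to be integer ids 0 to 3, and `text_embedding` stands in for whatever vector Doc2Vec produced (no Doc2Vec call is made here).

```python
NUM_PAGE_TYPES = 4  # the write-up states there are 4 label types


def one_hot(label, num_classes=NUM_PAGE_TYPES):
    """Encode an integer class id as a one-hot list of floats."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def node_feature(label, text_embedding):
    """Concatenate the one-hot label with a precomputed text embedding."""
    return one_hot(label) + list(text_embedding)


# Hypothetical page of type 2 with a 3-dimensional placeholder embedding.
feature = node_feature(2, [0.1, -0.3, 0.7])
```

The resulting vector has length 4 plus the embedding dimension, so a longer text embedding directly widens the model's input.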

Problem Abstraction: Link Prediction in Graph

  • The model is small; its complexity could be increased with:
    • A deeper GraphSAGE
    • A higher number of channels
    • A longer text embedding

Graph

  • Nodes: pages
  • Edges: connectivity of pages
  • Node feature: label + (embedded) text
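To make the graph formulation concrete, here is a rough pure-Python sketch of one GraphSAGE-style layer with mean aggregation, followed by a dot-product link score. The mean aggregator, the ReLU, the sigmoid decoder, and the identity weight matrices are all assumptions for illustration; the write-up does not state which variants the project used.

```python
import math


def sage_mean_layer(features, adjacency, w_self, w_neigh):
    """One GraphSAGE-style layer (assumed form):
    h_v' = ReLU(W_self @ h_v + W_neigh @ mean(h_u for u in N(v)))
    """
    def matvec(matrix, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

    updated = {}
    for node, h_self in features.items():
        neighbors = adjacency.get(node, ())
        if neighbors:
            mean = [
                sum(features[u][i] for u in neighbors) / len(neighbors)
                for i in range(len(h_self))
            ]
        else:
            mean = [0.0] * len(h_self)  # isolated node: no neighbor signal
        combined = [
            a + b for a, b in zip(matvec(w_self, h_self), matvec(w_neigh, mean))
        ]
        updated[node] = [max(0.0, x) for x in combined]  # ReLU
    return updated


def link_probability(h_u, h_v):
    """Score a candidate edge with a sigmoid over the dot product."""
    score = sum(a * b for a, b in zip(h_u, h_v))
    return 1.0 / (1.0 + math.exp(-score))


toy_features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
toy_graph = {0: {1}, 1: {0, 2}, 2: {1}}
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned weights
embeddings = sage_mean_layer(toy_features, toy_graph, identity, identity)
probability = link_probability(embeddings[0], embeddings[1])
```

In a trained model the weight matrices are learned and several such layers are stacked, so each page's embedding reflects a wider neighborhood.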
