A graph consists of nodes, edges, and keywords

Similarity

Nodes

  • Shortest distance
  • Random walk distance
    • Hitting time
      • Expected length of a random path, i.e.
    • Commute time
      • Sum of hitting time of round trip
      • Single trip time can be different if the graph is directional
    • Personalized PageRank
      • Initial value of in PageRank is personalized, e.g.
      • After iteration, we get the probability of random walker walks from to
      • Said probability is protectional to the similarity
  • Vertex embedding distance
    • Embed distance into nodes, i.e.
    • Similarity is calculated based on the euclidean distance

The shorter the distance, the more similar the nodes

Graphs

  • Maximum Common Subgraph (MCS)
    • The largest subgraph with the two graphs
  • Graph edit distance (GED)
    • We increment the distance when editing edge, keyword, or node
  • Graph embedding distance (GED)
    • Same as above

Partitioning

The “cut” is based on the number of edges and/or the workload of each partition

Graph Neural Network (GNN)

Types

  • Graph Attention Network (GAN)
    • Popular
    • Performs better than GCN
  • Graph Convolution Network (GCN)
  • GraphSAGE (Sample and aggregate)
    • Performs well in a very large graph
  • Message Passing Neural Network (MPNN)
    • Basic GNN model used in the earlier years
  • Recurrent Graph Neural Network (RecGNN)
    • Basic GNN model used in the earlier years

Tasks

  • Node-level task (grouping nodes)
  • Edge-level task (node correlation)
  • Graph-level task (graph category)

Embedding

  • Node embedding
  • Edge embedding
  • Graph embedding

Steps

  1. Initialize each node with an embedding vector (feature vector or randomized vector)
  2. Update the embedding vector layer by layer ()
    • The update is done by the aggregate and update operation ( could be MAX, MIN, AVG, or NN)
      • is the neighbors of
      • is the message from to
  3. Generate the graph embedding vector according to all node embedding vector
    • This is the readout operation (R), which could be MAX, MIN, AVG, or NN