HITS

Hubs and Authorities

The more good hubs point to a web page, the higher its authority weight; likewise, the more good authorities a page points to, the higher its hub weight

Sampling

Root Set

Web pages that contain the query terms (the query is split into terms by spaces)

Link pages: web pages that link to a page in the root set or are linked from one (either direction)

Base Set

Merge the root set and the link pages
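The sampling step can be sketched in Python on a made-up six-page web. Several things here are assumptions for illustration only: the page data, the helper names `root_set` and `base_set`, and the rule that a page qualifies for the root set if it contains any query term (rather than all of them).

```python
# Toy web: page id -> (page text, set of outgoing link targets). All data is made up.
PAGES = {
    "p1": ("cheap flights to tokyo", {"p2", "p4"}),
    "p2": ("hotel deals", {"p3"}),
    "p3": ("travel blog", set()),
    "p4": ("tokyo guide", {"p1"}),
    "p5": ("cooking recipes", {"p1"}),
    "p6": ("unrelated page", set()),
}


def root_set(pages, query):
    """Pages containing at least one query term (query split by spaces).

    Assumption: ANY term is enough; a stricter variant would require ALL terms.
    """
    terms = query.split(" ")
    return {pid for pid, (text, _) in pages.items()
            if any(t in text.split() for t in terms)}


def base_set(pages, query):
    """Root set plus every page linked to or from it (either direction)."""
    root = root_set(pages, query)
    linked = set()
    for pid, (_, out_links) in pages.items():
        if pid in root:
            linked |= out_links   # pages the root set points to
        elif out_links & root:
            linked.add(pid)       # pages pointing into the root set
    return root | linked
```

For the query `"tokyo flights"`, the root set is `{"p1", "p4"}`. The base set adds `p2` (linked from `p1`) and `p5` (links to `p1`). It leaves out `p3`, which is only reachable through the non-root page `p2`.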

Iteration

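  1. Construct the adjacency matrix $A$ of the base set (row-wise, entries 1 or 0): $A_{ij} = 1$ if page $i$ links to page $j$, otherwise 0
  2. Update the authority vector ($\mathbf{a} = A^T \mathbf{h}$) and the hub vector ($\mathbf{h} = A \mathbf{a}$) repeatedly until two successive iterations give the same output
  3. Rank the pages by the two resulting vectors: once by authority score and once by hub score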
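The iteration above can be sketched in plain Python with no libraries. This is a minimal sketch, and several details are assumptions rather than necessarily the exact procedure intended here:

- the function name `hits`;
- a tolerance-based stopping test standing in for "the same output";
- the update order (authorities from the previous hubs, then hubs from the new authorities);
- rescaling each vector so its elements sum to the number of pages $n$.

```python
def hits(adj, tol=1e-9, max_iter=1000):
    """HITS on a 0/1 adjacency matrix: adj[i][j] == 1 iff page i links to page j."""
    n = len(adj)
    auth = [1.0] * n  # start from all-ones vectors
    hub = [1.0] * n

    def rescale(vec):
        # Assumed normalization: scale so the elements sum to n (leave an all-zero vector as-is).
        total = sum(vec)
        return [n * x / total for x in vec] if total else vec

    for _ in range(max_iter):
        # a = A^T h: authority of page j sums the hub scores of pages linking to j.
        new_auth = rescale([sum(adj[i][j] * hub[i] for i in range(n)) for j in range(n)])
        # h = A a: hub of page i sums the authority scores of pages that i links to.
        new_hub = rescale([sum(adj[i][j] * new_auth[j] for j in range(n)) for i in range(n)])
        # Stop once two successive iterations agree (within tol) on both vectors.
        same = all(abs(x - y) < tol for x, y in zip(auth + hub, new_auth + new_hub))
        auth, hub = new_auth, new_hub
        if same:
            break
    return auth, hub
```

Consider three pages where page 0 links to pages 1 and 2 and page 1 links to page 2. Then `hits([[0, 1, 1], [0, 0, 1], [0, 0, 0]])` gives page 2 the highest authority score and page 0 the highest hub score. Dividing by the Euclidean norm instead of rescaling to sum $n$ only changes the scale of each vector, so the ranking is the same.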

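Without normalization the entries can grow without bound, so the authority and hub vectors are normalized after each iteration. Each vector is rescaled so that its elements sum to the number of web pages $n$:

$$a_i \leftarrow \frac{n\,a_i}{\sum_{j=1}^{n} a_j}, \qquad h_i \leftarrow \frac{n\,h_i}{\sum_{j=1}^{n} h_j}$$

With this normalization, the sum of the elements in the authority vector is always the number of web pages.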

PageRank

Iteration

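  1. Construct the stochastic matrix $M$ (column-wise, entries are probabilities): $M_{ij} = 1/d_j$ if page $j$ links to page $i$, where $d_j$ is the number of outgoing links of page $j$, otherwise 0
  2. Update the PageRank vector ($\mathbf{r} = M \mathbf{r}$) repeatedly until two successive iterations give the same output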
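A minimal sketch of these two steps in plain Python. The input format is hypothetical: `links` maps each page index to the list of pages it links to. The names `stochastic_matrix` and `pagerank`, and the tolerance used as the "same output" test, are also assumptions.

```python
def stochastic_matrix(links, n):
    """Column-stochastic matrix: m[i][j] = 1 / outdegree(j) if page j links to page i."""
    m = [[0.0] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            m[i][j] += 1.0 / len(targets)  # column j spreads page j's score over its links
    return m


def pagerank(m, r0, tol=1e-9, max_iter=1000):
    """Power iteration r <- M r, stopping when two successive vectors agree within tol."""
    n = len(m)
    r = list(r0)
    for _ in range(max_iter):
        new_r = [sum(m[i][j] * r[j] for j in range(n)) for i in range(n)]
        if all(abs(x - y) < tol for x, y in zip(r, new_r)):
            return new_r
        r = new_r
    return r
```

Take `links = {0: [1, 2], 1: [2], 2: [0]}` and an initial vector of `[1.0, 1.0, 1.0]`. The iteration settles near `[1.2, 0.6, 1.2]`, and the elements still sum to 3.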

Spider Trap

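A set of web pages with no outgoing links to pages outside the set; importance that flows into the set never flows out, so over the iterations the set takes over all the importance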

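It can be mitigated with the following update, where $\alpha$ and $\beta$ are constants that sum up to 1:

$$\mathbf{r} \leftarrow \alpha M \mathbf{r} + \beta \mathbf{e}$$

Here $M$ is the stochastic matrix and $\mathbf{e}$ is a uniform vector whose elements sum to the same total as the initial PageRank vector (e.g. all ones when the initial vector is all ones). With probability $\alpha$ the score follows the links; with probability $\beta$ it is spread evenly over all pages, so a trap can no longer absorb everything.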
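The effect on a spider trap can be sketched in plain Python. The block repeats the matrix helper so it is self-contained. Several choices are assumptions for illustration: the function name `damped_pagerank`, the default $\alpha = 0.8$, and the choice of $\mathbf{e}$ as the uniform vector whose elements sum to the same total as the initial vector.

```python
def stochastic_matrix(links, n):
    """Column-stochastic matrix: m[i][j] = 1 / outdegree(j) if page j links to page i."""
    m = [[0.0] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            m[i][j] += 1.0 / len(targets)
    return m


def damped_pagerank(m, r0, alpha=0.8, tol=1e-9, max_iter=1000):
    """Iterate r <- alpha * M r + beta * e, with beta = 1 - alpha.

    Assumptions: alpha = 0.8 is only an illustrative default, and e is the uniform
    vector whose elements sum to sum(r0), so the total of r is preserved.
    """
    n = len(m)
    beta = 1.0 - alpha
    e_i = sum(r0) / n  # every element of the uniform vector e
    r = list(r0)
    for _ in range(max_iter):
        new_r = [alpha * sum(m[i][j] * r[j] for j in range(n)) + beta * e_i
                 for i in range(n)]
        if all(abs(x - y) < tol for x, y in zip(r, new_r)):
            return new_r
        r = new_r
    return r
```

Consider the graph `{0: [1, 2], 1: [2], 2: [2]}`, where page 2 links only to itself and so forms a spider trap. With `alpha=1.0` the update reduces to plain $\mathbf{r} = M\mathbf{r}$, and all the importance ends up on page 2: `[0.0, 0.0, 3.0]`. With `alpha=0.8` the result is approximately `[0.2, 0.28, 2.52]`. The trap still ranks first, but the other pages keep a share.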

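The sum of the elements in the PageRank vector always stays the same as the sum in the initial vector, provided every page has at least one outgoing link, so that each column of the stochastic matrix sums to 1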
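A short check of why the sum is preserved. Assume one damped update is written as $\mathbf{r}^{(k+1)} = \alpha M \mathbf{r}^{(k)} + \beta \mathbf{e}$, with $\alpha + \beta = 1$, every column of $M$ summing to 1, and $\mathbf{e}$ uniform with elements summing to $S$, the sum of the initial vector. These symbol names are chosen here for illustration. If $\sum_j r^{(k)}_j = S$, then:

$$\sum_{i} r^{(k+1)}_i = \alpha \sum_{j} r^{(k)}_j \underbrace{\sum_{i} M_{ij}}_{=\,1} + \beta \sum_{i} e_i = \alpha S + \beta S = S$$

Setting $\alpha = 1$ covers the plain iteration $\mathbf{r} = M\mathbf{r}$. If a page has no outgoing links, its column of $M$ sums to 0 instead of 1, and the total leaks away at every iteration.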