HITS
Hubs and Authorities
The more good hubs point to a web page, the higher its authority weight; the more good authorities a web page points to, the higher its hub weight
Sampling
Root Set
Web pages that contain the query terms (split by spaces)
Link Pages
Web pages that link to, or are linked from, a page in the root set
Base Set
Merge the root set and the link pages
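The sampling step above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `build_base_set`, the toy `pages` and `links` data, and the choice to match a page when it contains at least one query term are all assumptions made for the example.

```python
def build_base_set(pages, links, query):
    """pages: dict page_id -> text; links: set of (source, target) pairs."""
    terms = query.lower().split()  # query terms are split on whitespace
    # Root set: pages containing at least one query term (assumption; requiring
    # every term to appear is an equally plausible reading of the notes).
    root = {p for p, text in pages.items()
            if any(t in text.lower().split() for t in terms)}
    # Link pages: pages that link to, or are linked from, the root set.
    linked = {s for s, t in links if t in root} | {t for s, t in links if s in root}
    linked -= root  # keep only pages not already in the root set
    # Base set: union of the root set and the link pages.
    return root, linked, root | linked


# Hypothetical toy web of four pages.
pages = {"a": "jaguar car", "b": "car dealer", "c": "big cats", "d": "cooking"}
links = {("c", "a"), ("a", "b"), ("b", "d"), ("d", "d")}
root, linked, base = build_base_set(pages, links, "jaguar")
```

Here page `d` is left out of the base set because it is two links away from the only root page.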
Iteration
- Construct the adjacency matrix (row-wise: entry (i, j) is 1 if page i links to page j, otherwise 0)
- Repeatedly update the authority vector and the hub vector until two consecutive iterations give the same output
- Ranking can be done based on the 2 resulting vectors
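With normalization, the sum of the elements in the authority vector (and likewise the hub vector) is always the number of web pages $N$
Normalization is applied to the authority/hub vector after each update, i.e. $a \leftarrow N a / \sum_i a_i$ and $h \leftarrow N h / \sum_i h_i$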
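The HITS iteration can be sketched in plain Python as below. Several details are assumptions rather than part of the notes: both vectors start at all ones, the hub update uses the freshly updated authority vector, each vector is rescaled so its elements sum to the number of pages, and "the same output" is tested with a small tolerance `tol`. The 3-page matrix is a made-up example.

```python
def hits(adj, max_iter=100, tol=1e-9):
    """adj[i][j] == 1 if page i links to page j (row-wise adjacency matrix)."""
    n = len(adj)
    auth = [1.0] * n
    hub = [1.0] * n

    def normalize(v):
        # Rescale so the elements sum to the number of pages (assumed convention).
        total = sum(v)
        return [x * n / total for x in v] if total else v

    for _ in range(max_iter):
        # Authority of page j: sum of hub weights of the pages linking to j.
        new_auth = normalize(
            [sum(adj[i][j] * hub[i] for i in range(n)) for j in range(n)])
        # Hub of page i: sum of authority weights of the pages i links to.
        new_hub = normalize(
            [sum(adj[i][j] * new_auth[j] for j in range(n)) for i in range(n)])
        # Stop when two consecutive iterations agree (within tol).
        converged = all(abs(x - y) < tol
                        for x, y in zip(new_auth + new_hub, auth + hub))
        auth, hub = new_auth, new_hub
        if converged:
            break
    return auth, hub


# Hypothetical graph: page 0 links to 1 and 2, page 1 links to 2.
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
auth, hub = hits(adj)
```

In this example page 2 ends up with the highest authority weight (it is pointed to by both other pages) and page 0 with the highest hub weight.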
PageRank
Iteration
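- Construct the stochastic matrix (column-wise: column j holds the probabilities of moving from page j to each page it links to, so each column sums to 1)
- Repeatedly update the PageRank vector until two consecutive iterations give the same output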
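A minimal sketch of these two steps in plain Python follows. The helper names `stochastic_matrix` and `pagerank`, the all-ones starting vector, the tolerance `tol`, and the 3-page link list are assumptions for illustration; the sketch also assumes every page has at least one outgoing link and that each link is listed once.

```python
def stochastic_matrix(links, n):
    """links: list of (source, target) page indices. Returns column-stochastic M."""
    out_degree = [0] * n
    for s, _ in links:
        out_degree[s] += 1
    m = [[0.0] * n for _ in range(n)]
    for s, t in links:
        # Column s spreads page s's importance evenly over its outgoing links.
        m[t][s] += 1.0 / out_degree[s]
    return m


def pagerank(m, max_iter=1000, tol=1e-12):
    n = len(m)
    v = [1.0] * n  # initial vector (assumption: every page starts at 1)
    for _ in range(max_iter):
        # One update: multiply the stochastic matrix by the current vector.
        new_v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Stop when two consecutive iterations agree (within tol).
        if all(abs(x - y) < tol for x, y in zip(new_v, v)):
            return new_v
        v = new_v
    return v


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
m = stochastic_matrix([(0, 1), (0, 2), (1, 2), (2, 0)], 3)
ranks = pagerank(m)
```

For this graph the vector settles at roughly 1.2, 0.6 and 1.2, which still sums to 3 like the starting vector.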
Spider Trap
A set of web pages with no links leaving the set (the pages may still link to each other), which would absorb all the importance over the iterations
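It can be mitigated with the following update, where $\alpha$ and $\beta$ are constants that sum to 1, $M$ is the stochastic matrix, $v$ is the PageRank vector and $u$ is a uniform vector with the same sum as the initial vector: $v' = \alpha M v + \beta u$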
The sum of the elements in the PageRank vector is always the same as the sum of the initial vector, provided every column of the stochastic matrix sums to 1
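The effect of a spider trap, and of the mitigation, can be seen in a small sketch. The function name `pagerank_teleport`, the value `alpha = 0.8`, the all-ones starting and uniform vectors, and the 3-page matrix (page 2 links only to itself) are assumptions chosen for illustration.

```python
def pagerank_teleport(m, alpha=0.8, max_iter=1000, tol=1e-12):
    """Iterate v' = alpha * M v + beta * u, with beta = 1 - alpha and u all ones."""
    n = len(m)
    beta = 1.0 - alpha
    v = [1.0] * n  # initial vector (assumption: every page starts at 1)
    for _ in range(max_iter):
        new_v = [alpha * sum(m[i][j] * v[j] for j in range(n)) + beta
                 for i in range(n)]
        # Stop when two consecutive iterations agree (within tol).
        if all(abs(x - y) < tol for x, y in zip(new_v, v)):
            return new_v
        v = new_v
    return v


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 2 (page 2 is a spider trap).
m = [[0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 1.0]]
trapped = pagerank_teleport(m, alpha=1.0)    # plain iteration, no mitigation
mitigated = pagerank_teleport(m, alpha=0.8)  # with the uniform term
```

Without the uniform term the trap ends up holding all the importance (0, 0, 3). With `alpha = 0.8` the other pages keep a non-zero share (about 0.2, 0.28 and 2.52), and the total is still 3.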