The Hyperlink-Induced Topic Search (HITS) algorithm is a link analysis algorithm. It discovers web pages and ranks them by their relevance to a search query. The idea behind HITS comes from the fact that websites link to one another: a website tends to link to other relevant websites and is, in turn, linked to by other authoritative websites.
To understand the HITS algorithm, first understand authorities and hubs. HITS uses these two roles to define the mutually recursive relationship between web pages.
- Authority: For a search engine query, the web pages highly relevant to the query are called roots and are potential authorities. A node is regarded as a high-quality authority if many high-quality hubs link to it.
- Hub: Hubs are web pages that point to the root pages but need not be highly relevant to the query themselves. A node is regarded as a high-quality hub if it links to many high-quality authorities.
Therefore, a hub is a web page that links to many authorities, while an authority is a web page that many hubs link to.
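The mutual dependence between hubs and authorities can be sketched as a simple iteration: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. The Python sketch below is illustrative only. The function name `hits`, the fixed iteration count, the Euclidean normalization after each round, and the tiny link graph are all assumptions made for this example, not part of the description above.

```python
import math


def hits(graph, iterations=50):
    """Return (hubs, auths) score dicts for a link graph.

    graph maps each page to the list of pages it links to.
    Hypothetical helper for illustration; runs a fixed number of rounds.
    """
    # Collect every page that appears as a link source or a link target.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Start with equal scores for every page.
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority update: add up the hub scores of the pages linking in.
        new_auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auths[dst] += hubs[src]

        # Hub update: add up the new authority scores of the pages linked to.
        new_hubs = {
            n: sum(new_auths[dst] for dst in graph.get(n, []))
            for n in nodes
        }

        # Normalize so the scores do not grow without bound
        # (Euclidean norm is an assumed, commonly used choice).
        a_norm = math.sqrt(sum(v * v for v in new_auths.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        auths = {n: v / a_norm for n, v in new_auths.items()}
        hubs = {n: v / h_norm for n, v in new_hubs.items()}

    return hubs, auths


# Made-up example graph: three hub-like pages linking to pages a, b, c.
links = {
    "hub1": ["a", "b"],
    "hub2": ["a", "b", "c"],
    "hub3": ["a"],
}
hubs, auths = hits(links)
best_authority = max(auths, key=auths.get)
best_hub = max(hubs, key=hubs.get)
```

In this toy graph, page `a` receives links from all three hub pages and ends up with the highest authority score, while `hub2`, which links to every authority, ends up with the highest hub score.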