1. How Search Engines Rank Web Pages Part I
After the search engine has crawled pages and the indexing process has built the inverted index,
the search engine is ready to handle user searches. When a user enters keywords in the search box,
the ranking program retrieves data from the index database, calculates the ranking, and displays
the results to the user. The ranking process interacts directly with the user.
1. Keyword Processing
After the search engine receives the keywords entered by the user, it must process them before
entering the ranking process. Even after all files containing every keyword have been found,
relevance still cannot be calculated for all of them, because the matching files often number in
the hundreds of thousands, the millions, or even the tens of millions. Calculating relevance for
so many files while the user waits would take too long. Users do not actually need to see every
matching page; the vast majority look only at the first two pages, that is, the first 20 results.
The search engine therefore does not need to calculate relevance for every page, only for the most
important ones. Regular search engine users will have noticed that search results run to at most
100 pages: by clicking “Next” at the bottom of the results page, a user can reach at most page 100,
that is, 1,000 search results. So a search engine only needs to calculate relevance for 1,000
results to meet demand. The problem is: before relevance has been calculated, how does the search
engine know which 1,000 documents are the most relevant? The initial subset of pages used for the
final relevance calculation must therefore be selected on the basis of other characteristics, the
most important of which is page weight. Because every matching file already has basic relevance
(it contains the search keywords), search engines usually use features unrelated to relevance to
select the initial subset. How large is the initial subset? Tens of thousands of pages? Perhaps
more; outsiders do not know. What is certain is that when the number of matching pages is huge,
the search engine will not run the calculation on all of them. It must select a subset of pages
with higher page weight and then calculate relevance only for the pages in that subset.
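The subset selection described above can be sketched roughly as follows. This is only an illustration: the page records, the `weight` field, and the function name `select_initial_subset` are assumptions made for the example, and real search engines do not publish how they perform this step.

```python
import heapq

def select_initial_subset(matches, subset_size):
    """Return the subset_size matching pages with the highest page weight."""
    return heapq.nlargest(subset_size, matches, key=lambda page: page["weight"])

# Hypothetical matching pages: each one already contains the search keywords.
matches = [
    {"url": "a.example", "weight": 0.2},
    {"url": "b.example", "weight": 0.9},
    {"url": "c.example", "weight": 0.5},
    {"url": "d.example", "weight": 0.7},
]

# Only the pages in this subset would go on to the relevance calculation.
subset = select_initial_subset(matches, subset_size=2)
subset_urls = [page["url"] for page in subset]
print(subset_urls)
```

Using a heap-based selection avoids sorting every matching page when only the top few are needed.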
2. How Search Engines Calculate Relevance
After the initial subset has been selected, keyword relevance is calculated for the pages in the
subset. Relevance calculation is the most important step in the ranking process, and it is the part
of the search engine algorithm that interests SEOs most.
The main factors affecting relevance include the following.
(1) How common the keywords are. After segmentation, the individual keywords do not contribute
equally to the meaning of the whole search string. The more common a word is, the less it
contributes to the meaning of the search term; the less common a word is, the more it contributes.
For example, suppose the user enters the search term “we Pluto”. The word “we” is used very
frequently and appears on a great many web pages, so it does little to identify the search term
“we Pluto” or to convey its meaning. Finding the pages that contain “we” has a negligible effect
on the ranking, because far too many pages contain the word. The word “Pluto” is used much less
often and contributes far more to the meaning of the search term “we Pluto”. Pages that contain the
word “Pluto” will be more relevant to the search term “we Pluto”. The most common words are stop
words, which have no effect on the meaning of a page. The search engine therefore does not treat
the keywords in a search term equally, but weights them according to how common they are. Uncommon
words receive a high weighting coefficient and common words a low one, so the ranking algorithm
pays more attention to uncommon words. Suppose two pages, A and B, both contain the words “we” and
“Pluto”. On page A, “we” appears in ordinary text and “Pluto” appears in the title tag. Page B is
the opposite: “we” appears in the title tag and “Pluto” appears in ordinary text. For the search
term “we Pluto”, page A will be more relevant.
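One common way to express "rarer words count more" is inverse document frequency (IDF), a standard information-retrieval measure. The sketch below uses it to replay the page A versus page B comparison. The page counts and the title/body boost values are invented for illustration, and the text does not say that any particular search engine uses this exact formula.

```python
import math

def rarity_weight(pages_containing_word, total_pages):
    """Inverse document frequency: the rarer the word, the larger the weight."""
    return math.log(total_pages / pages_containing_word)

# Hypothetical corpus statistics (assumed numbers).
total_pages = 1_000_000
doc_counts = {"we": 900_000, "pluto": 1_000}
weights = {word: rarity_weight(n, total_pages) for word, n in doc_counts.items()}

# Assumed boosts: a word in the title tag counts more than in ordinary text.
FIELD_BOOST = {"title": 3.0, "body": 1.0}

def page_score(word_fields):
    """Sum each word's rarity weight multiplied by the boost of its field."""
    return sum(weights[word] * FIELD_BOOST[field] for word, field in word_fields.items())

score_a = page_score({"we": "body", "pluto": "title"})  # page A
score_b = page_score({"we": "title", "pluto": "body"})  # page B
print(score_a > score_b)
```

Because “Pluto” carries nearly all of the weight, placing it in the title helps page A far more than placing “we” in the title helps page B.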
(2) Word frequency and density. It is generally believed that, in the absence of keyword stuffing,
the more often the search term appears on a page and the higher its density, the more relevant the
page is to the search term. Of course, this is only a rough rule and does not always hold, because
other factors also enter the relevance calculation. Frequency and density are only some of the
factors, and their importance keeps declining.
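Frequency and density can be made concrete with a small calculation. The function below is a minimal sketch that assumes whitespace-separated English text and defines density as occurrences divided by total words; the function name and sample sentence are made up for the example.

```python
def keyword_density(text, keyword):
    """Return (occurrence count, share of all words) for keyword in text."""
    words = text.lower().split()
    if not words:
        return 0, 0.0
    count = words.count(keyword.lower())
    return count, count / len(words)

count, density = keyword_density(
    "Pluto is a dwarf planet and Pluto has five moons", "pluto"
)
print(count, density)
```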
(3) Keyword format and position. As mentioned in the indexing section, the format and position of
keywords on a page are recorded in the index database. The more prominent the position in which a
keyword appears, such as the title tag, bold text, or an H1 heading, the more relevant the page is
to the keyword. This is what on-page SEO addresses.
Keyword distance also matters. When the segmented keywords appear as a complete, consecutive match,
the page is most relevant to the search term. For example, for the search “weight loss methods”, a
page on which the complete phrase “weight loss methods” appears consecutively is the most relevant.
If the two words “weight loss” and “methods” do not match consecutively but appear close to each
other, the search engine still considers the page somewhat more relevant.
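Keyword distance can be illustrated by measuring how far apart two terms sit in a page's word sequence. This sketch assumes single-word terms and whitespace tokenization, so it uses “diet” and “methods” as stand-ins for the two segmented words in the example above. A distance of 1 means the words are adjacent.

```python
def min_keyword_distance(text, term_a, term_b):
    """Smallest gap in word positions between the two terms, or None if one is absent."""
    words = text.lower().split()
    pos_a = [i for i, word in enumerate(words) if word == term_a]
    pos_b = [i for i, word in enumerate(words) if word == term_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)

exact = min_keyword_distance("proven diet methods for everyone", "diet", "methods")
near = min_keyword_distance("methods that make any diet work", "diet", "methods")
print(exact, near)
```

A ranking function could then give the adjacent match the largest bonus and smaller bonuses as the distance grows; the size of such bonuses is not stated in the text.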
(4) Link analysis and page weight. In addition to factors on the page itself, the links and weight
relationships between pages also affect keyword relevance. The most important of these is anchor
text: the more inbound links a page has whose anchor text contains the search term, the more
relevant the page is. Link analysis also covers the topic of the linking page and the text
surrounding the anchor text.
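The anchor text signal can be sketched as a count of inbound links whose anchor text contains the search term. The link records and the function name below are hypothetical, and a real link analysis would also weigh the linking page's topic and the surrounding text, which this example leaves out.

```python
def anchor_text_matches(inbound_links, search_term):
    """Count inbound links whose anchor text contains the search term."""
    term = search_term.lower()
    return sum(1 for link in inbound_links if term in link["anchor"].lower())

# Hypothetical inbound links pointing at one page.
inbound_links = [
    {"source": "x.example", "anchor": "Pluto facts"},
    {"source": "y.example", "anchor": "click here"},
    {"source": "z.example", "anchor": "all about Pluto"},
]

anchor_hits = anchor_text_matches(inbound_links, "pluto")
print(anchor_hits)
```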