SEOMoz - The Beginner's Guide to Search Engine Optimization
Search engines
1. SEARCH
A PRACTICAL GUIDE TO THE FUTURE
INFORMATION THAT’S HARD TO FIND WILL
REMAIN INFORMATION THAT’S HARDLY
FOUND.
Copyleft
2. “Even a blind squirrel finds a nut,
occasionally.” But few of us are determined
enough to search through millions, or
billions, of pages of information to find our
“nut.” So, to reduce the problem to a more
or less manageable one, web “search
engines” were introduced a few years ago.
3. Finding key information
on the gigantic World Wide
Web is similar to finding a
needle lost in a haystack. For
this purpose we would use a
special magnet that would
automatically, quickly and
effortlessly attract that
needle for us.
In this scenario the magnet is
the “Search Engine”.
5. Search
COMPUTING to examine a computer file, disk,
database, or network for particular information.
Engine
Something that supplies the driving force or energy
to a movement, system, or trend.
Search Engine
A computer program that searches for particular
keywords and returns a list of documents in which
they were found, especially a commercial service
that scans documents on the Internet.
6. Search is a Wicked Problem
• No definitive formulation.
• Considerable uncertainty. Complex interdependencies.
• Incomplete, contradictory, and changing requirements.
• Stakeholders have radically different world views and
different frames for understanding the project or process.
• The problem is never solved.
[Diagram of the search loop: User, Query, Search Interface, Search Engine, Content, Results; the user may then ask, browse, or search again. Factors listed across the diagram: roles, goals, tasks; language, vocabulary, syntax; input, interaction, feedback; index, algorithms, linguistics; metadata, controlled vocabulary, knowledge management; design, interaction, behavior.]
9. 1st Generation (ca 1994):
• AltaVista, Excite, Infoseek…
• Ranking based on Content:
Pure Information Retrieval
2nd Generation (ca 1996):
• Lycos
• Ranking based on Content + Structure
Site Popularity
3rd Generation (ca 1998):
• Google, Teoma, Yahoo
• Ranking based on Content + Structure + Value
Page Reputation
In the Works
• Ranking based on “the need behind the query”
10. Content Similarity Ranking:
The more rare words two documents share,
the more similar they are
Documents are treated as “bags of words”
(no effort to “understand” the contents)
Similarity is measured by vector angles
Query results are ranked by sorting the angles
between the query and the documents
[Figure: document vectors d1 and d2 in a term space with axes t1, t2, t3, and an angle θ]
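The vector-angle idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not any engine's actual ranking: the function names are hypothetical, and it uses raw word counts, so it does not give rare words the extra weight the slide mentions (a real system would typically add a rarity weighting on top).

```python
import math
from collections import Counter

def bag_of_words(text):
    # Treat a document as an unordered bag of lowercase words.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Cosine of the angle between two word-count vectors (1.0 = same direction).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, documents):
    # Sort documents by decreasing cosine, i.e. by increasing angle to the query.
    q = bag_of_words(query)
    scored = [(cosine_similarity(q, bag_of_words(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

ranked = rank("blind squirrel", [
    "the blind squirrel finds a nut",
    "a needle in a haystack",
    "squirrel squirrel",
])
```

Documents that share no words with the query get a score of 0.0 and sort last.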
11. A hyperlink
from a page in site A
to some page in site B
is considered a popularity vote
from site A to site B
Rank similar documents
according to popularity
[Figure: example sites with vote counts, as extracted: www.aa.com 1, www.bb.com 2, www.cc.com 1, www.dd.com 2, www.zz.com 0]
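A rough sketch of site-level popularity voting, assuming links are available as (source URL, target URL) pairs. The function names and the example URLs are hypothetical, and counting each voting site only once is a design choice made here, not something the slide specifies.

```python
from collections import defaultdict
from urllib.parse import urlparse

def site_of(url):
    # The host part of the URL stands in for "the site".
    return urlparse(url).netloc

def site_popularity(links):
    # Each distinct site that links to another site casts one vote for it.
    voters = defaultdict(set)
    for source, target in links:
        a, b = site_of(source), site_of(target)
        if a != b:  # links within a site are not votes
            voters[b].add(a)
    return {site: len(v) for site, v in voters.items()}

example_links = [
    ("http://www.aa.com/x", "http://www.bb.com/y"),
    ("http://www.aa.com/x2", "http://www.bb.com/y"),
    ("http://www.cc.com/", "http://www.bb.com/z"),
    ("http://www.bb.com/y", "http://www.bb.com/z"),
]
popularity = site_popularity(example_links)
```

Similar documents could then be ordered by the popularity of the site they belong to.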
12. The reputation “PageRank” of a page Pi =
the sum of a fraction of the reputations of all
pages Pj that point to Pi
Idea similar to academic co-citations
Beautiful Math behind it
• PR = principal eigenvector
of the web’s link matrix
• PR equivalent to the chance
of randomly surfing to the page
HITS algorithm tries to recognize
“authorities” and “hubs”
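The "sum of a fraction of the reputations" idea can be sketched with a simple power iteration. This is an illustrative version only: the damping value of 0.85, the fixed iteration count, and the handling of pages without outgoing links are assumptions added here, and the function name and example graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    # links maps each page to the list of pages it points to.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline (the "random jump" share).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes an equal fraction of its reputation to each target.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no outlinks spreads its rank evenly.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

example_graph = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["C"]}
ranks = pagerank(example_graph)
```

The ranks sum to 1, which matches the reading of PR as the chance of randomly surfing to a page.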
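14. [Diagram: search engine architecture. Crawl the web; check for duplicates and store the documents (DocIds); create an inverted index, held on inverted index servers. The user sends a query to the search engine servers, which consult the inverted index and show results to the user.]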
15. Crawling
Follow links to find information
Indexing
Record what words appear where
Ranking
What information is a good match to a user
query? What information is inherently good?
Displaying
Find a good format for the information
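The indexing step ("record what words appear where") is commonly done with an inverted index. A minimal sketch, assuming documents are already fetched and keyed by a DocId; the function names and sample documents are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(documents):
    # Map each word to the documents it appears in, with word positions.
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index

def lookup(index, word):
    # Return {doc_id: [positions]} for a word, or {} if it was never seen.
    return dict(index.get(word.lower(), {}))

sample_docs = {
    1: "search engines crawl the web",
    2: "the web is large",
}
sample_index = build_inverted_index(sample_docs)
```

Answering a query then becomes a lookup per word instead of a scan over every document.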
20. But Google is usually so good at finding info…
Why does it do that?
21. • I try another search engine.
• I try different keywords but if I still can't find
an answer, I just think real hard for an
answer.
• I focus on the encyclopedia.
23. don’t know how to form a sound search
query;
don’t have a strategy for dealing with poor
results;
can’t articulate how they know content is
credible;
don’t check the author or date of an article.
24. Step 1 – define the data you want
Step 2 – figure out where it’s likely to be
found
Step 3 – select the search tool most likely
to provide it
Step 4 – learn how to interpret your results
25. The most commonly used search tools are
• Search Engines
• Subject Directories
Other search tools include
• Targeted directories
• Focused Crawlers
• Portals
• Vortals
• Meta-tools
• Value-added search services
26. Search engines are the preferred tool
when you:
• Are looking for something very specific
• Need to pin down a quick fact or two
• Need to know if any information exists at all on a
subject
• Want mass quantities of links, but are not
concerned about quality control.
27. A subject directory is a database of titles,
citations, and websites organized by
category.
Advantage – Most directories are edited,
maintained and created by people.
• Usually they are carefully evaluated and annotated for
this reason.
Disadvantage – Typically include a smaller
number of sites than a search engine due
to the great amount of human effort
involved.
28. Open Directory Project - The largest, most
comprehensive human-edited directory of the
Web. It is constructed and maintained by a
vast, global community of volunteer editors.
Closed model directories such as Yahoo! and
LookSmart are pulled together by professional
editors who select the links and set up the
categories. The user generally gets high
quality results.
29. Subject directories are organized and
selective.
They are useful when you want to know
more about broad-based subjects, such as
• General topics
• Popular topics
• Targeted directories
• Current events
• Product information
30. Many search engines are now hybrids:
search tools that have an engine as well
as a directory.
Sometimes targeted directories are
matched with focused crawlers to produce
a very powerful hybrid search tool (e.g.
http://www.FirstGov.gov).
31. Metasearches use multiple engines to look for
your keywords.
Advantage – You have many search engines all
looking for what you need. Great when you are
looking for something that is hard to find.
Disadvantage – It’s hard to fine tune your search
and narrow things down. Also, Metasearches
can sometimes give you more information than
what you need.
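A rough sketch of what a metasearch tool does: send the same keywords to several engines and merge the answers. The two engine functions below are stand-ins that return fixed, hypothetical URLs; a real tool would call each engine's own interface.

```python
def metasearch(query, engines):
    # Ask every engine, keep first-seen order, and drop duplicate URLs.
    seen = set()
    merged = []
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

def engine_a(query):
    # Hypothetical engine returning canned results.
    return ["http://example.com/1", "http://example.com/2"]

def engine_b(query):
    # Hypothetical second engine with one overlapping result.
    return ["http://example.com/2", "http://example.com/3"]

merged_results = metasearch("hard to find", [engine_a, engine_b])
```

Because the merged list only grows as engines are added, this also shows why a metasearch can return more than you need.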
32. Beaucoup! – www.beaucoup.com
Clusty – http://clusty.com
Mamma, “the mother of all search
engines”- www.mamma.com
Ixquick – www.ixquick.com
33. Yahooligans – Made for ages 7-12, pages are
hand picked to be appropriate for children. Not
only will the content on these pages be
monitored, but so are the ads that are displayed.
Froogle – Made for the frugal shopper, this
offshoot of Google has engines that catalog
products and find you the cheapest price for a
given item on the internet. It is in its “beta”
version, so they are still working out some kinks.
34. Boolean Operators (AND, OR, and
NOT)
• AND:
Limits the number of ‘hits’ (results) you receive
In many search sites, this is implied (if you type
two or more words, it assumes you want x AND y
AND z, etc.)
• OR:
Increases the number of ‘hits’ you receive
Synonyms for words can be used
• NOT:
Limits the number of ‘hits’ you receive
Useful for getting rid of words that have more than
one meaning
Ex: Sun NOT Microsystems
Some sites use a minus sign (-) instead (Google, for example)
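One way to picture the three operators is as set operations on the documents that contain each word. A small sketch, assuming a hypothetical table that maps each word to a set of document numbers:

```python
# Hypothetical data: which documents contain which word.
postings = {
    "sun": {1, 2, 3},
    "microsystems": {2},
    "solar": {3, 4},
}

def docs_with(term):
    # Documents containing the term (empty set if the term is unknown).
    return postings.get(term.lower(), set())

# AND narrows: only documents containing both words.
sun_and_solar = docs_with("sun") & docs_with("solar")

# OR widens: documents containing either word.
sun_or_solar = docs_with("sun") | docs_with("solar")

# NOT narrows: documents with the first word but not the second.
sun_not_microsystems = docs_with("Sun") - docs_with("Microsystems")
```

AND and NOT can only shrink the result set, while OR can only grow it, which matches the effect on the number of hits described above.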
35. Phrase Search
Usually quotation marks are used: “ “
Useful for a specific search (song lyrics, part of a poem, etc.)
Ex: “fly me to the moon”
Truncation and Wildcards
Used as placeholders for additional characters - usually (*)
Truncation = finds any characters that come after the placeholder
• Ex: Red* --> red, reds, redwood, redding, etc.
Wildcards = finds different characters within a word
• Ex: Wom*n --> woman, women
Stop Words
Small words that are used often
Some stop words include: and, the, a, not, to, be, etc.
• Ex: Give me a cookie and Give me cookie would yield similar results
Most search engines and databases ignore these
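Truncation, wildcards, and stop words can be imitated with Python's standard fnmatch module. This is only an approximation of what a search site does: here `*` matches any run of characters (including none), the word lists are hypothetical, and the stop-word set is just the sample from the slide.

```python
from fnmatch import fnmatchcase

# The sample stop words listed on the slide.
STOP_WORDS = {"and", "the", "a", "not", "to", "be"}

def match_pattern(pattern, vocabulary):
    # Return the words that fit a pattern such as "red*" or "wom*n".
    p = pattern.lower()
    return [w for w in vocabulary if fnmatchcase(w.lower(), p)]

def strip_stop_words(query):
    # Drop stop words, as most engines do before searching.
    return [w for w in query.lower().split() if w not in STOP_WORDS]

truncated = match_pattern("Red*", ["red", "reds", "redwood", "blue", "redding"])
wildcard = match_pattern("Wom*n", ["woman", "women", "wombat"])
```

With stop words removed, "Give me a cookie" and "Give me cookie" reduce to the same list of words, which is why they yield similar results.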
36. Limiters
Most search engines and databases provide other ways to narrow your search
Often found under Advanced Search
Varies greatly!
• Search limiters
Keyword (usually default)
Title
Author
Subject
Multiple search boxes
• Other limiters
Date
Language
Type ( book, dvd, magazine, etc.) OR (web: .gov, .edu, .org)
• Google Advanced Search
• Wilson Select Plus
37. Power searching also uses math, the
universal language.
Uses symbols of + and – and “”.
Example: “Clinton – Lewinsky” on Yahoo!
38. Use these commands in the search
window.
• intitle: Find sites with one search term in the title.
• allintitle: Find sites with all search terms in the title.
• inurl: Find sites with one search term in the URL.
• allinurl: Find sites with all search terms in the URL.
• site: Limit your search to a specific web site.
• filetype: Specify a type of document to search.
8/2/2007
39. Find pages containing the term in the title:
intitle:[search term]
Find pages with terms in the text:
allintext:[search terms]
Find similar pages to a certain website:
related:[insert URL]
Find pages with the term in the URL:
inurl:[insert search term]
Try it out!
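A small helper for assembling such queries as text. The function name is hypothetical, and the accepted operator names are simply the ones listed on these slides; whether a given search site honors each of them is not checked here.

```python
# Operator names taken from the slides above.
KNOWN_OPERATORS = {
    "intitle", "allintitle", "inurl", "allinurl",
    "site", "filetype", "allintext", "related",
}

def build_query(terms="", **operators):
    # Combine plain search terms with operator:value pairs into one query string.
    parts = [terms] if terms else []
    for name, value in operators.items():
        if name not in KNOWN_OPERATORS:
            raise ValueError(f"unknown operator: {name}")
        parts.append(f"{name}:{value}")
    return " ".join(parts)

example_query = build_query("squirrel", intitle="nut", site="example.com")
```

The resulting string is what you would type into the search window.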
40. Find pages containing the term in the title:
title:[search term]
Find pages with the term in the URL:
url.all:[search term]
41. Also called the “deep web”, this consists of
materials search engines will not or cannot
index.
Usually consists of web-based databases
or pdf files.
Example: American Memory Project:
Jackie Robinson.
42. Google – The only traditional search
engine that can recognize .pdf and .doc
files.
Profusion – a Metasearch tool that lets you
search .pdf files.
43. Google
By far the most used search site (76% of searches on the Internet are done using Google).
Simple one line search box
Phrase completion function
Did you mean function
I’m Feeling Lucky!
Other search options
• Images, Videos, Maps, News, Shopping (limiters)
• Search strategies
TYPE                      INCLUDED?   HOW
Boolean operators         Yes         AND = [default]; OR = OR (capitalized); NOT = [-]
(AND, OR, NOT)
Phrase search             Yes         Quotation marks [“ ”]
Wildcards / Truncation    Some        No truncation (Google automatically searches other endings); wildcards = [*]
Advanced search           Yes         Limit by language, file type, domain, etc.
45. Bing (Microsoft’s latest search engine)
Starts out with a simple one box search, but becomes more complex
Phrase completion function
Web site review function
Related searches
Other search options
• Images, Videos, Maps (localized), News, Shopping, History (limiters)
• Search strategies
TYPE                      INCLUDED?   HOW
Boolean operators         Yes         AND = [default]; OR = OR (capitalized); NOT = NOT (capitalized)
(AND, OR, NOT)
Phrase search             Yes         Quotation marks [“ ”]
Wildcards / Truncation    No          No truncation or wildcard options
Advanced search           Yes         Limit by terms. [Under Preferences] domain, country/region, language, filter
47. Yahoo! Search
Much more than a search engine (search.yahoo.com for ONLY search)
Search Assist / Also try:
Sponsored results
Related searches
Other search options
• Images, video, local, shopping, jobs, news, sports, weather, etc. (limiters)
• Search strategies
TYPE                      INCLUDED?   HOW
Boolean operators         Yes         AND = [default]; OR = OR (capitalized); NOT = [-]
(AND, OR, NOT)
Phrase search             Yes         Quotation marks [“ ”]
Wildcards / Truncation    No          No truncation or wildcard options
Advanced search           Yes         Limit by terms, last updated, domain, country, language, filter
49. Dogpile
Meta search engines search multiple other search sites
Simple one line search box
Phrase complete function
Did you mean function
Other search options
• Images, video, news, white and yellow pages (limiters)
• Search strategies
TYPE                      INCLUDED?   HOW
Boolean operators         No *        * Advanced search terms function in a similar way
(AND, OR, NOT)
Phrase search             No *        * Advanced search terms function in a similar way
Wildcards / Truncation    No          No truncation or wildcard options
Advanced search           Yes         Limit by terms, domain. [Under Preferences] filter, bold search terms, # displays
51. Clusty
Simple one line search box
Clusters function (groups results into subjects)
Sources and Sites function
Did you mean function
Other search options
• News, Images, Wikipedia, Blogs, Jobs (limiters)
• Search strategies
TYPE                      INCLUDED?   HOW
Boolean operators         Yes         AND = [default]; OR = OR (capitalized); NOT = [-]
(AND, OR, NOT)
Phrase search             Yes         Quotation marks [“ ”]
Wildcards / Truncation    No          No truncation or wildcard options
Advanced search           Yes         Limit by host (domain), language, type, # results in a cluster, filter
52. Surfwax (meta search engine)
Can view contents of your search in a sidebar (Snap)
Is very cluttered / complex
Can broaden or narrow your search (Focus)
Sort by and results functions
Useful if you are ‘browsing’ the Web without a clear topic
Wikipedia (online encyclopedia)
Encyclopedia in which anyone can edit content
• Vast amount of information on practically any subject
• Reliability somewhat in question
• List of references
Best if you are looking for specific information or as a place to start a search
Useful if you are ‘browsing’ the Web without a clear topic
YouTube (videos posted by anyone)
Video of practically anything you can think of
Anyone can post a video clip
Difficult to find information. Cluttered.
Many others
Just search the words “search engines” in your favorite search
54. 1. Most search engines have vanished.
2. Google is a big player.
3. 63% of Internet users use a search engine in a
given session.
4. Approximately 94 million adults use the internet
on an average day.
5. This means approximately 59.22 MILLION people
use search engines in an average day.
6. Microsoft realized the Internet is here to stay
i. Dominates the browser market.
ii. Realizes search is critical.
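The figure in item 5 follows from items 3 and 4, treating the 63% share as if it applied to all daily users. A quick check of the arithmetic (the variable names are illustrative):

```python
# Figures quoted in items 3 and 4 of the list above.
search_share_percent = 63   # percent of Internet users who use a search engine
daily_users_millions = 94   # adults online on an average day, in millions

# 63% of 94 million, kept in millions.
daily_searchers_millions = search_share_percent * daily_users_millions / 100
```

This gives 59.22 million, the number stated in item 5.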