The Little Search Engine That Can!
How To Get Started on an Internet Search
Mrs. Cathleen Carpenter
Course: Searching and Researching on the Internet
How Search Engines Work
The term "search engine" is often used generically to describe both crawler-based search engines and human-powered directories. These two types of search engines gather their listings in very different ways.

Crawler-Based Search Engines
Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found. If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.

Human-Powered Directories
A human-powered directory, such as the Open Directory, depends on people for its listings. You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted. Changing your web pages has no effect on your listing.

Resource: Danny Sullivan, Search Engine Watch, Mar 14, 2007, http://searchenginewatch.com/showPage.html?page=2168031

If this student were paying attention, he might be able to do a search and find a job that allows him to sleep all day!
Search Engine Elements
Crawler-based search engines have three major elements: the crawler, the index, and the search engine software.

First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled." The spider returns to the site on a regular basis, such as every month or two, to look for changes.

Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.

Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.

Resource: Danny Sullivan, Search Engine Watch, Mar 14, 2007, http://searchenginewatch.com/showPage.html?page=2168031
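The three elements can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built: the PAGES dictionary stands in for the web, and the URLs, page text, and function names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("travel guides and travel tips", ["a.example/hotels"]),
    "a.example/hotels": ("cheap hotels for travel", ["a.example/home"]),
    "a.example/orphan": ("nobody links to this travel page", []),
}

def crawl(start):
    """Spider: visit a page, read it, then follow its links to other pages."""
    seen, queue = {}, [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        text, links = PAGES[url]
        seen[url] = text          # keep a copy of the page
        queue.extend(links)       # follow the links found on it
    return seen

def build_index(crawled):
    """Index: record, for every word, which pages contain it."""
    index = defaultdict(set)
    for url, text in crawled.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, crawled, word):
    """Software: find matching pages and rank them by keyword count."""
    word = word.lower()
    hits = index.get(word, set())
    return sorted(hits, key=lambda u: (-crawled[u].lower().split().count(word), u))

crawled = crawl("a.example/home")
index = build_index(crawled)
results = search(index, crawled, "travel")
print(results)
```

Note that the "orphan" page never shows up in the results: no crawled page links to it, so the spider never finds it.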
Remember, you are smarter than a computer. Use your intelligence.

Search engines are fast, but dumb. A search engine's ability to understand what you want is very limited. It will obediently look for occurrences of your keywords all over the Web, but it doesn't understand what your keywords mean or why they're important to you. To a search engine, a keyword is just a string of characters. It doesn't know the difference between cancer the crab and cancer the disease...and it doesn't care.

But you know what your query means (at least, we hope you do!). Therefore, you must supply the brains. The search engine will supply the raw computing power.

Key to Success...
1. Know where to look first.
2. Fine-tune your keywords.
3. Be refined.
4. Query (search) by example.
5. Anticipate the answers.

Resource: http://www.monash.com/spidap5.html
1. Know Where To Look First
Are you looking for information about a person? A company? A software product? A health-related problem? Do you want to find a job? Get a date? Plan a vacation? Do you need to research a term paper? Document a news story? There are various databases containing specific information that might be more useful to you than a general search engine.

2. Fine-tune your keywords
If you're searching on a noun (the name of a person, place or thing), remember that most nouns are subsets of other nouns. Enter the smallest possible subset that describes what you want. Be specific.
Example: If you want to buy a car, don't enter the keyword "car" if you can enter the keyword "Toyota." Better still, enter the phrase "Toyota Dealerships" AND the name of the city where you live.

3. Be Refined
Read the help files and take advantage of the available search refining options. Use phrases, if possible. Use the Boolean AND (or the character +) to include other keywords that you would expect to find in relevant documents. Learn to EXCLUDE with the Boolean NOT. Excluding is particularly important as the Web grows and more documents are posted.

Resource: http://www.monash.com/spidap5.html
4. Query by example
Take advantage of an option that many search engine sites now offer: you can "query by example," or "find similar sites" to the ones that come up on your initial hit list. Essentially what you're doing is telling the search engine, "yes, this looks promising, give me more like this one."

5. Anticipate the answers
Before searching, try to imagine what the ideal page you would like to access would look like. Think about the words its title would contain. Think about what words would be in the first couple of sentences of a webpage that you would consider useful. Use those words, or that phrase, when you enter your query.
Example: If you want to find out medical details about your grandmother's diagnosis of Alzheimer's Disease, try entering "Alzheimer's" AND "symptoms" AND "prognosis." If you want to find out about Alzheimer's care and community resources, query on "Alzheimer's" AND "support groups" AND "resources" AND NOT "symptoms."

Resource: http://www.monash.com/spidap5.html
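The Alzheimer's queries above follow a regular pattern, so they can be assembled from two lists: words you expect on the ideal page and words you want to rule out. The build_query helper below is a hypothetical sketch; the exact operator syntax differs between search engines, so check the help files of the one you use.

```python
def build_query(required, excluded=()):
    """Join quoted terms with AND, then append AND NOT for each excluded term.

    The AND / AND NOT spelling is an assumption taken from the slide's
    examples; individual search engines may use +, -, or other syntax.
    """
    query = " AND ".join(f'"{term}"' for term in required)
    for term in excluded:
        query += f' AND NOT "{term}"'
    return query

medical = build_query(["Alzheimer's", "symptoms", "prognosis"])
support = build_query(["Alzheimer's", "support groups", "resources"], ["symptoms"])
print(medical)
print(support)
```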
Why Your Search Results Come Back in a Certain Order
Search for anything using your favorite crawler-based search engine. Nearly instantly, the search engine will sort through the millions of pages it knows about and present you with ones that match your topic. The matches will even be ranked, so that the most relevant ones come first.

Of course, the search engines don't always get it right. Non-relevant pages make it through, and sometimes it may take a little more digging to find what you are looking for. But, by and large, search engines do an amazing job.

As WebCrawler founder Brian Pinkerton puts it, "Imagine walking up to a librarian and saying, 'travel.' They're going to look at you with a blank face." OK, a librarian's not really going to stare at you with a vacant expression. Instead, they're going to ask you questions to better understand what you are looking for.

Unfortunately, search engines don't have the ability to ask a few questions to focus your search, as a librarian can. They also can't rely on judgment and past experience to rank web pages, in the way humans can.

So, how do crawler-based search engines go about determining relevancy, when confronted with hundreds of millions of web pages to sort through? They follow a set of rules, known as an algorithm, and all major search engines follow some general rules.

Resource: Danny Sullivan, Search Engine Watch, Mar 14, 2007, http://searchenginewatch.com/showPage.html?page=2168031
Location, Location, Location...and Frequency
One of the main rules involves the location and frequency of keywords on a web page. Call it the location/frequency method, for short.

Remember the librarian mentioned before? They need to find books to match your request of "travel," so it makes sense that they first look at books with travel in the title. Search engines operate the same way. Pages with the search terms appearing in the HTML title tag are often assumed to be more relevant than others to the topic.

Search engines will also check to see if the search keywords appear near the top of a web page, such as in the headline or in the first few paragraphs of text. They assume that any page relevant to the topic will mention those words right from the beginning.

Frequency is the other major factor in how search engines determine relevancy. A search engine will analyze how often keywords appear in relation to other words in a web page. Pages with a higher keyword frequency are often deemed more relevant than other web pages.

Search engines may also penalize pages or exclude them from the index, if they detect search engine "spamming." An example is when a word is repeated hundreds of times on a page, to increase the frequency and propel the page higher in the listings. Search engines watch for common spamming methods in a variety of ways, including following up on complaints from their users.

Resource: Danny Sullivan, Search Engine Watch, Mar 14, 2007, http://searchenginewatch.com/showPage.html?page=2168031
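To make the location/frequency method concrete, here is a small Python sketch. The bonus values, the 20-word "top of page" window, and the 0.5 spam threshold are arbitrary numbers chosen for illustration; real search engines do not publish their weights, and the function names are invented for this example.

```python
def location_frequency_score(title, body, keyword,
                             title_bonus=2.0, top_bonus=1.0, top_words=20):
    """Score a page for one keyword using location and frequency.

    All weights are illustrative assumptions, not real engine values.
    """
    kw = keyword.lower()
    title_words = title.lower().split()
    body_words = body.lower().split()
    score = 0.0
    if kw in title_words:                  # location: keyword in the title
        score += title_bonus
    if kw in body_words[:top_words]:       # location: keyword near the top
        score += top_bonus
    if body_words:                         # frequency relative to other words
        score += body_words.count(kw) / len(body_words)
    return score

def looks_spammy(body, keyword, max_density=0.5):
    """Flag pages where one keyword makes up most of the text."""
    words = body.lower().split()
    return bool(words) and words.count(keyword.lower()) / len(words) > max_density

page_a = location_frequency_score("Travel Tips", "travel cheaply with these tips", "travel")
page_b = location_frequency_score("My Blog", "I like food and sometimes travel", "travel")
print(page_a, page_b)
print(looks_spammy("travel travel travel travel", "travel"))
```

Page A scores higher because the keyword appears in its title as well as in its opening words, which mirrors the librarian looking first at books with "travel" in the title.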
Keyword Searching
What is a keyword, exactly? It can simply be any word on a webpage. For example, I used the word "simply" in the previous sentence, making it one of the keywords for this particular webpage in some search engine's index. However, since the word "simply" has nothing to do with the subject of this webpage (i.e., how search engines work), it is not a very useful keyword. Useful keywords and key phrases for this page would be "search," "search engines," "search engine methods," "how search engines work," "ranking," "relevancy," "search engine tutorials," etc. Those keywords would actually tell a user something about the subject and content of this page.

Unless the author of the Web document specifies the keywords for her document, it's up to the search engine to determine them. Essentially, this means that search engines pull out and index words that appear to be significant. Since engines are software programs, not rational human beings, they work according to rules established by their creators for what words are usually important in a broad range of documents.

Resource: http://www.monash.com/spidap5.html
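One simple rule an engine's creators might establish is "ignore very common words, then count what is left." The sketch below assumes that rule; the STOP_WORDS list and the sample sentence are made up for the example, and real engines use far more elaborate rules.

```python
from collections import Counter

# Hypothetical list of words treated as too common to be useful keywords.
STOP_WORDS = {"a", "an", "and", "the", "how", "it", "is", "of", "to"}

def pick_keywords(text, how_many=2):
    """Return the most frequent words that are not stop words."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return Counter(words).most_common(how_many)

top = pick_keywords("how search engines work and how search engines rank pages")
print(top)
```

Here "how" is the most frequent word in the sentence, but it says nothing about the subject, so the rule throws it out and keeps "search" and "engines" instead.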
The title of a page, for example, usually gives useful information about the subject of the page (if it doesn't, it should!). Words that are mentioned towards the beginning of a document (think of the "topic sentence" in a high school essay, where you lay out the subject you intend to discuss) are given more weight by most search engines. The same goes for words that are repeated several times throughout the document.

Problems?
Keyword searches have a tough time distinguishing between words that are spelled the same way but mean something different (e.g., hard cider, a hard stone, a hard exam, and the hard drive on your computer). This often results in hits that are completely irrelevant to your query (search).

Search engines also cannot return hits on keywords that mean the same thing but are not actually entered in your query. A query on heart disease would not return a document that used the word "cardiac" instead of "heart."

Resource: http://www.monash.com/spidap5.html
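Both problems are easy to reproduce with a literal keyword match. In this sketch, the documents and the keyword_hits function are invented for the example; the function simply checks whether every query word appears in the document, which is the "string of characters" behavior described above.

```python
def keyword_hits(documents, query):
    """Return documents that contain every word of the query, literally."""
    terms = query.lower().split()
    return [doc for doc in documents
            if all(term in doc.lower().split() for term in terms)]

docs = [
    "heart disease risk factors",
    "cardiac disease risk factors",
    "a hard exam on heart anatomy",
    "replace the hard drive",
]

heart_results = keyword_hits(docs, "heart disease")
hard_results = keyword_hits(docs, "hard")
print(heart_results)
print(hard_results)
```

The "cardiac" document is missed even though it covers the same subject, and the query "hard" returns both the exam and the disk drive, because the match knows nothing about meaning.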
Refining Your Search
Most sites offer two different types of searches: "basic" and "refined" or "advanced." In a "basic" search, you just enter a keyword without sifting through any pull-down menus of additional options. Depending on the engine, though, "basic" searches can be quite complex.

Advanced search refining options differ from one search engine to another, but some of the possibilities include the ability to search on more than one word, to give more weight to one search term than you give to another, and to exclude words that might be likely to muddy the results. You might also be able to search on proper names, on phrases, and on words that are found within a certain proximity to other search terms.

Some search engines also allow you to specify what form you'd like your results to appear in, and whether you wish to restrict your search to certain fields on the internet (e.g., Usenet or the Web) or to specific parts of Web documents (e.g., the title or URL).

Resource: http://www.monash.com/spidap5.html
Many, but not all, search engines allow you to use so-called Boolean operators to refine your search. These are the logical terms AND, OR, NOT, and the so-called proximal locators, NEAR and FOLLOWED BY.

1. Boolean AND means that all the terms you specify must appear in the documents, e.g., "heart" AND "attack." You might use this if you wanted to exclude common hits that would be irrelevant to your query.
2. Boolean OR means that at least one of the terms you specify must appear in the documents, e.g., bronchitis, acute OR chronic. You might use this if you didn't want to rule out too much.
3. Boolean NOT means that the term you specify after NOT must not appear in the documents. You might use this if you anticipated results that would be totally off-base, e.g., nirvana AND Buddhism, NOT Cobain.
4. NEAR means that the terms you enter should be within a certain number of words of each other. FOLLOWED BY means that one term must directly follow the other.

Final Hints
Capitalization: This is essential for searching on proper names of people, companies or products. Unfortunately, many words in English are used both as proper and common nouns (Bill, bill, Gates, gates, Oracle, oracle, Lotus, lotus, Digital, digital); the list is endless.

All graphics: www.animationfactory.com
Resource: http://www.monash.com/spidap5.html
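The five operators can each be written as a short test on a document's words. The sketch below is one possible reading of the definitions above, not any particular search engine's implementation; the function names, the default NEAR distance of five words, and the sample sentences are assumptions made for the example.

```python
def tokens(text):
    """Split text into lowercase words (no punctuation handling)."""
    return text.lower().split()

def match_and(text, *terms):
    """AND: every term must appear."""
    return all(t.lower() in tokens(text) for t in terms)

def match_or(text, *terms):
    """OR: at least one term must appear."""
    return any(t.lower() in tokens(text) for t in terms)

def match_not(text, wanted, unwanted):
    """NOT: the wanted term appears and the unwanted term does not."""
    words = tokens(text)
    return wanted.lower() in words and unwanted.lower() not in words

def near(text, a, b, distance=5):
    """NEAR: the two terms occur within `distance` words of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

def followed_by(text, a, b):
    """FOLLOWED BY: term b comes directly after term a."""
    words = tokens(text)
    return any(words[i] == a.lower() and words[i + 1] == b.lower()
               for i in range(len(words) - 1))

religion = "nirvana is a central idea in buddhism"
music = "nirvana was a band fronted by kurt cobain"

print(match_and(religion, "nirvana", "buddhism"))
print(match_not(music, "nirvana", "cobain"))
print(followed_by("heart attack symptoms", "heart", "attack"))
```

With these definitions, the query nirvana AND Buddhism, NOT Cobain keeps the first sentence and rejects the second.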

  • 1. The Little Search Engine That Can! How To Get Started on an Internet Search Mrs. Cathleen Carpenter Course: Searching and Researching on the Internet
  • 2. Human-Powered Directories A human-powered directory, such as the Open Directory, depends on people for its listings. You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted. Changing your web pages has no effect on your listing. How Search Engines Work The term "search engine" is often used generically to describe both crawler-based search engines and human-powered directories. These two types of search engines gather their listings in very different ways. Crawler-Based Search Engines Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found. If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role. Resource: Danny Sullivan, Search Engine Watch, Mar 14, 2007, http://searchenginewatch.com/showPage.html?page=2168031 If this student were paying attention, he might be able to do a search and find a job that allows him to sleep all day!
  • 3. Search Engine Elements Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled." The spider returns to the site on a regular basis, such as every month or two, to look for changes. Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information. Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant. Resource: Danny Sullivan, Search Engine Watch, Mar 14, 2007, http://searchenginewatch.com/showPage.html?page=2168031 CRAWLER INDEX SOFTWARE
  • 4. Remember, you are smarter than a computer. Use your intelligence.  Search engines are fast, but dumb.   A search engine's ability to understand what you want is very limited.  It will obediently look for occurrences of your keywords all over the Web, but it doesn't understand what your keywords mean or why they're important to you. To a search engine, a keyword is just a string of characters.  It doesn't know the difference between cancer the crab and cancer the disease...and it doesn't care. But you know what you query means (at least, we hope you do!).  Therefore, you must supply the brains.  The search engine will supply the raw computing power. Key to Success... 1. Know where to look first. 2. Fine tune your key words. 3. Be refined. 4. Query (search) by example 5. Anticipate the answers. Resource: http://www.monash.com/spidap5.html
  • 5. 3. Be Refined Read the help files and take advantage of the available search refining options.  Use phrases, if possible.  Use the Boolean AND (or the character +) to include other keywords that you would expect to find in relevant documents.   Learn to EXCLUDE with the Boolean NOT.  Excluding is particularly important as the Web grows and more documents are posted.   Resource: http://www.monash.com/spidap5.html 1. Know Where To Look First Are you looking for information about a person?  A company?  A software product? A health-related problem?  Do you want to find a job?  Get a date? Plan a vacation?  Do you need to research a term paper?  Document a news story? There are various databases containing specific information that might be more useful to you than a general search engine. 2. Fine-tune your keywords If you're searching on a noun (the name of a person, place or thing), remember that most nouns are subsets of other nouns.  Enter the smallest possible subset that describes what you want.  Be specific.  Example:  If you want to buy a car, don't enter the keyword "car" if you can enter the keyword "Toyota."  Better still, enter the phrase "Toyota Dealerships" AND the name of the city where you live.
  • 6. 4. Query by example Take advantage of the option that many search engine sites are now offering: you can "query by example," or "find similar sites," to the ones that come up on your initial hit list.  Essentially what you're doing is telling the search engine, "yes, this looks promising, give me more like this one." 5.  Anticipate the answers Before searching, try to imagine what the ideal page you would like to access would look like.  Think about the words its title would contain.  Think about what words would be in the first couple of sentences of a webpage that you would consider useful.  Use those words, or that phrase, when you enter your query. Resource: http://www.monash.com/spidap5.html Example:   If you want to find out how medical details about your grandmother's diagnosis of Alzheimer's Disease, try entering "Alzheimer's" AND "symptoms" AND "prognosis."  If you want to find out about Alzheimer's care and community resources, query on "Alzheimer's" AND "support groups" AND "resources" AND NOT "symptoms."
  • 7. Search for anything using your favorite crawler-based search engine. Nearly instantly, the search engine will sort through the millions of pages it knows about and present you with ones that match your topic. The matches will even be ranked, so that the most relevant ones come first. Of course, the search engines don't always get it right. Non-relevant pages make it through, and sometimes it may take a little more digging to find what you are looking for. But, by and large, search engines do an amazing job. As WebCrawler founder Brian Pinkerton puts it, "Imagine walking up to a librarian and saying, 'travel.' They’re going to look at you with a blank face.“ OK -- a librarian's not really going to stare at you with a vacant expression. Instead, they're going to ask you questions to better understand what you are looking for. Unfortunately, search engines don't have the ability to ask a few questions to focus your search, as a librarian can. They also can't rely on judgment and past experience to rank web pages, in the way humans can. So, how do crawler-based search engines go about determining relevancy, when confronted with hundreds of millions of web pages to sort through? They follow a set of rules, known as an algorithm and all major search engines follow some general rules. Why Your Search Results Come Back in an Certain Order Resource: Danny Sullivan, Search Engine Watch, Mar 14, 2007, http://searchenginewatch.com/showPage.html?page=2168031
  • 8. Location, Location, Location...and Frequency One of the main rules involves the location and frequency of keywords on a web page. Call it the location/frequency method, for short. Remember the librarian mentioned before? They need to find books to match your request of "travel," so it makes sense that they first look at books with travel in the title. Search engines operate the same way. Pages with the search terms appearing in the HTML title tag are often assumed to be more relevant than others to the topic. Search engines will also check to see if the search keywords appear near the top of a web page, such as in the headline or in the first few paragraphs of text. They assume that any page relevant to the topic will mention those words right from the beginning. Frequency is the other major factor in how search engines determine relevancy. A search engine will analyze how often keywords appear in relation to other words in a web page. Pages with a higher keyword frequency are often deemed more relevant than other web pages. Search engines may also penalize pages or exclude them from the index, if they detect search engine "spamming." An example is when a word is repeated hundreds of times on a page, to increase the frequency and propel the page higher in the listings. Search engines watch for common spamming methods in a variety of ways, including following up on complaints from their users. Resource: Danny Sullivan, Search Engine Watch, Mar 14, 2007, http://searchenginewatch.com/showPage.html?page=2168031
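The location/frequency method on this slide can be illustrated with a small Python sketch. It is a toy model only, not how any real search engine scores pages: the function name `location_frequency_score`, the weights (3.0, 2.0, 100.0), the 50-word cutoff for "near the top," and the sample pages are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def location_frequency_score(title, body, keyword, top_words=50):
    """Toy relevance score. All weights below are arbitrary assumptions."""
    keyword = keyword.lower()
    title_words = tokenize(title)
    body_words = tokenize(body)
    score = 0.0
    # Location: a keyword in the title counts the most.
    if keyword in title_words:
        score += 3.0
    # Location: a keyword near the top of the body also counts.
    if keyword in body_words[:top_words]:
        score += 2.0
    # Frequency: share of body words that are the keyword.
    if body_words:
        score += 100.0 * body_words.count(keyword) / len(body_words)
    return score

# Hypothetical pages: name -> (title, body)
pages = {
    "travel-guide": ("Travel Tips for Europe", "Travel light and plan your travel early."),
    "cooking": ("Easy Weeknight Recipes", "These recipes travel well in a lunch box."),
}

ranked = sorted(
    pages,
    key=lambda name: location_frequency_score(*pages[name], "travel"),
    reverse=True,
)
print(ranked)  # ['travel-guide', 'cooking']
```

A score like this is easy to inflate by repeating a word many times, which is why the slide notes that search engines watch for and penalize "spamming."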
  • 9. Keyword Searching What is a keyword, exactly? It can simply be any word on a webpage. For example, I used the word "simply" in the previous sentence, making it one of the keywords for this particular webpage in some search engine's index. However, since the word "simply" has nothing to do with the subject of this webpage (i.e., how search engines work), it is not a very useful keyword. Useful keywords and key phrases for this page would be "search," "search engines," "search engine methods," "how search engines work," "ranking," "relevancy," "search engine tutorials," etc. Those keywords would actually tell a user something about the subject and content of this page. Unless the author of the Web document specifies the keywords for her document, it's up to the search engine to determine them. Essentially, this means that search engines pull out and index words that appear to be significant. Since engines are software programs, not rational human beings, they work according to rules established by their creators for what words are usually important in a broad range of documents. Resource: http://www.monash.com/spidap5.html
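To make the idea of "pulling out and indexing words" concrete, here is a minimal Python sketch of a keyword index. It assumes one very simple rule for what counts as significant: any word that is not on a short stop-word list. The `STOP_WORDS` set, the `build_index` function, and the sample pages are hypothetical; real engines follow their own, far more elaborate rules.

```python
import re
from collections import defaultdict

# Assumed stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "how"}

def build_index(pages):
    """Map each significant word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(name)
    return index

# Hypothetical pages: name -> text
pages = {
    "tutorial": "How search engines work and how ranking is decided",
    "recipes": "The best way to work with bread dough",
}

index = build_index(pages)
print(sorted(index["work"]))     # ['recipes', 'tutorial']
print(sorted(index["ranking"]))  # ['tutorial']
```

Looking up a keyword is then a single dictionary access, which is one reason a search can come back nearly instantly even when many pages have been indexed.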
  • 10. The title of a page, for example, usually gives useful information about the subject of the page (if it doesn't, it should!). Words that are mentioned towards the beginning of a document (think of the "topic sentence" in a high school essay, where you lay out the subject you intend to discuss) are given more weight by most search engines. The same goes for words that are repeated several times throughout the document. Problems? Keyword searches have a tough time distinguishing between words that are spelled the same way but mean something different (e.g., hard cider, a hard stone, a hard exam, and the hard drive on your computer). This often results in hits that are completely irrelevant to your query (search). Search engines also cannot return hits on keywords that mean the same thing as your search terms but were not actually entered in your query. A query on heart disease would not return a document that used the word "cardiac" instead of "heart." Resource: http://www.monash.com/spidap5.html
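Both problems described on this slide show up in even the simplest keyword matcher. The Python sketch below assumes strict literal matching (a page is a hit only if every query word appears in it); the `matches` function and the sample documents are hypothetical and chosen to mirror the slide's "hard" and "heart"/"cardiac" examples.

```python
import re

def matches(query, document):
    """Return True only if every query word appears literally in the document."""
    doc_words = set(re.findall(r"[a-z]+", document.lower()))
    return all(word in doc_words for word in query.lower().split())

# Hypothetical documents: name -> text
docs = {
    "cardiology": "Cardiac disease risk factors and treatment",
    "heart-health": "Heart disease prevention tips",
    "storage": "How to replace a hard drive",
    "cider": "Brewing hard cider at home",
}

# Synonym problem: the "cardiac" page is missed by a "heart disease" query.
print([name for name, text in docs.items() if matches("heart disease", text)])
# ['heart-health']

# Same-spelling problem: "hard" hits unrelated topics.
print([name for name, text in docs.items() if matches("hard", text)])
# ['storage', 'cider']
```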
  • 11. Refining Your Search Most sites offer two different types of searches: "basic" and "refined" or "advanced." In a "basic" search, you just enter a keyword without sifting through any pull-down menus of additional options. Depending on the engine, though, "basic" searches can be quite complex. Advanced search refining options differ from one search engine to another, but some of the possibilities include the ability to search on more than one word, to give more weight to one search term than you give to another, and to exclude words that might be likely to muddy the results. You might also be able to search on proper names, on phrases, and on words that are found within a certain proximity to other search terms. Some search engines also allow you to specify what form you'd like your results to appear in, and whether you wish to restrict your search to certain fields on the internet (e.g., Usenet or the Web) or to specific parts of Web documents (e.g., the title or URL). Resource: http://www.monash.com/spidap5.html
  • 12. Final Hints Many, but not all, search engines allow you to use so-called Boolean operators to refine your search. These are the logical terms AND, OR, and NOT, and the so-called proximal locators, NEAR and FOLLOWED BY. 1. Boolean AND means that all the terms you specify must appear in the documents, e.g., "heart" AND "attack." You might use this if you wanted to exclude common hits that would be irrelevant to your query. 2. Boolean OR means that at least one of the terms you specify must appear in the documents, e.g., bronchitis, acute OR chronic. You might use this if you didn't want to rule out too much. 3. Boolean NOT means that the term that follows NOT must not appear in the documents. You might use this if you anticipated results that would be totally off-base, e.g., nirvana AND Buddhism, NOT Cobain. 4. NEAR means that the terms you enter should be within a certain number of words of each other. FOLLOWED BY means that one term must directly follow the other. Capitalization: This is essential for searching on proper names of people, companies or products. Unfortunately, many words in English are used both as proper and common nouns (Bill, bill, Gates, gates, Oracle, oracle, Lotus, lotus, Digital, digital); the list is endless. Resource: http://www.monash.com/spidap5.html All graphics: www.animationfactory.com
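The operators on this slide can be sketched as small Python functions that test a single document. This is an illustrative model only: the function names, the default NEAR distance of 5 words, and the sample sentences are assumptions, and real search engines differ in both syntax and behavior.

```python
import re

def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def boolean_and(doc, *terms):
    """All terms must appear in the document."""
    found = set(words(doc))
    return all(t.lower() in found for t in terms)

def boolean_or(doc, *terms):
    """At least one term must appear in the document."""
    found = set(words(doc))
    return any(t.lower() in found for t in terms)

def boolean_not(doc, term):
    """The term must not appear in the document."""
    return term.lower() not in set(words(doc))

def near(doc, a, b, distance=5):
    """Both terms appear within `distance` words of each other (assumed default)."""
    w = words(doc)
    pos_a = [i for i, x in enumerate(w) if x == a.lower()]
    pos_b = [i for i, x in enumerate(w) if x == b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

def followed_by(doc, a, b):
    """Term b comes directly after term a."""
    w = words(doc)
    return any(x == a.lower() and y == b.lower() for x, y in zip(w, w[1:]))

# Hypothetical documents
religion = "Nirvana is a central goal of Buddhism"
music = "Kurt Cobain fronted the band Nirvana"

def query(doc):
    """nirvana AND Buddhism, NOT Cobain"""
    return boolean_and(doc, "nirvana", "Buddhism") and boolean_not(doc, "Cobain")

print(query(religion), query(music))  # True False
print(followed_by("She survived a heart attack", "heart", "attack"))  # True
```

Combining the functions with Python's own `and`, `or`, and `not` mirrors how the slide's example query chains operators together.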