Googling of Google: How the Google Search Engine Works
Introduction
What is a search engine?
WEB CRAWLER
Indexer
Word      Document
apple     Document 1, Document 2, Document 3
is        Document 2, Document 4
red       Document 5
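The word-to-document mapping on the Indexer slide is an inverted index. The following is a minimal sketch of how such an index could be built, not the indexer Google uses. The function name `build_inverted_index` and the document texts are hypothetical; the texts were invented so that the result reproduces the mapping on the slide.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

# Hypothetical documents, invented so the index matches the slide.
docs = {
    1: "apple",
    2: "apple is",
    3: "apple",
    4: "is",
    5: "red",
}

index = build_inverted_index(docs)
print(index["apple"])  # ids of the documents that contain "apple"
```

Looking up a query word is then a single dictionary access, which is why search engines build this structure ahead of time instead of scanning documents at query time.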
SEARCH ALGORITHM
Positive ranking factors
73%  Keyword-focused anchor text from external links
71%  External link popularity
64%  Diversity of link sources
56%  Keyword use anywhere in the title tag
51%  Trustworthiness of the domain based on link distance from Trusted
Negative ranking factors
68%  Cloaking with malicious intent
56%  Link acquisition from known link brokers
51%  Link from the page to web spam pages
51%  Cloaking by user agent
46%  Frequent server downtime and site inaccessibility
OVERALL RANKING FACTORS
 
Google architecture
Crawling deeply in Google's Architecture
Record fields: Doc Id, Ecode, URL len, Page len, URL, Page
Searching techniques
PageRank
Mathematical PageRanks
Page C has a higher PageRank than Page E even though fewer links point to it, because the one link it receives is of much higher value. A web surfer who follows a random link on every page, but with 15% likelihood jumps to a random page anywhere on the web, will be on Page E 8.1% of the time. (The 15% likelihood of jumping to an arbitrary page corresponds to a damping factor of 85%.) Without damping, all web surfers would eventually end up on Pages A, B, or C, and all other pages would have PageRank zero. Because Page A has no outgoing links, it is assumed to link to all pages in the web.
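The random-surfer description can be turned into a short power iteration. This is a sketch under stated assumptions, not Google's production algorithm: the function name `pagerank`, the iteration count, and the four-page graph are hypothetical (the graph is not the one in the slide's figure). The damping factor of 0.85 and the rule that a page without outgoing links is treated as linking to every page come from the text.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to; every link
    target must also appear as a key. A page with no outgoing links is
    treated as linking to every page.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        for p in pages:
            for target in links[p]:
                new_rank[target] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank

# Hypothetical graph for illustration only.
links = {
    "A": [],          # no outgoing links
    "B": ["C"],
    "C": ["B"],
    "D": ["A", "B"],
}
ranks = pagerank(links)
for page in sorted(ranks):
    print(page, round(100 * ranks[page], 1), "%")
```

The ranks always sum to 1, so each value can be read as the share of time the random surfer spends on that page.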
Trust rank
Google and Web Spam
Link-based web spam
Web spam detection and result
Based on these features, content-based spam pages can be detected by a naïve Bayesian classifier, which focuses on the number of times a word is repeated in the content of the page.
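A naïve Bayesian classifier driven by word repetition can be sketched as a multinomial model over word counts. The slides do not give the exact features or training data, so everything here is an assumption for illustration: the function names `train` and `classify`, the Laplace smoothing, and the four tiny training texts are all hypothetical.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label). Returns word counts and doc counts per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Return the label with the highest log-probability for the text."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        # Every repetition of a word contributes its own factor.
        for word in text.lower().split():
            # Laplace (add-one) smoothing avoids log(0) for unseen words.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, invented for this sketch.
examples = [
    ("cheap pills cheap pills cheap pills buy cheap", "spam"),
    ("win money win money win money now", "spam"),
    ("the crawler downloads pages and stores them", "non-spam"),
    ("the indexer builds an index of the words on each page", "non-spam"),
]
word_counts, doc_counts = train(examples)
print(classify("cheap cheap cheap money", word_counts, doc_counts))
```

Because the score adds one term per word occurrence, a page that repeats a spam-associated word many times is pushed further toward the spam label.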
Link-Based Features
The data set is obtained by using a web crawler.
For each page, its links and contents are obtained.
From the data set, a full graph is built.
For each host and page, certain features are computed.
Link-based features are extracted from the host graph.
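The steps above can be sketched as follows: crawled page-to-page links are collapsed into a host graph, and simple features are computed per host. The slides do not say which features are used, so in-degree and out-degree are assumptions chosen for illustration, and the function names and `*.example` URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

def build_host_graph(page_links):
    """Collapse page-level links into a host-level graph.

    page_links: iterable of (source_url, target_url) pairs from a crawl.
    Returns {source_host: set of target hosts}; links within a host are ignored.
    """
    graph = defaultdict(set)
    for source, target in page_links:
        src_host = urlparse(source).netloc
        dst_host = urlparse(target).netloc
        graph[src_host]  # touching the key registers the host as a node
        graph[dst_host]
        if src_host != dst_host:
            graph[src_host].add(dst_host)
    return dict(graph)

def degree_features(graph):
    """Per-host in-degree and out-degree (assumed example features)."""
    features = {host: {"in_degree": 0, "out_degree": len(targets)}
                for host, targets in graph.items()}
    for targets in graph.values():
        for host in targets:
            features[host]["in_degree"] += 1
    return features

# Hypothetical crawl output.
page_links = [
    ("http://a.example/index.html", "http://b.example/"),
    ("http://a.example/about.html", "http://b.example/news"),
    ("http://c.example/", "http://b.example/"),
    ("http://b.example/", "http://a.example/index.html"),
    ("http://a.example/index.html", "http://a.example/about.html"),
]
host_graph = build_host_graph(page_links)
features = degree_features(host_graph)
print(features["b.example"])
```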
It has been observed that for a normal web page the number of supporters grows roughly exponentially as the distance increases. For a spam page, the number of supporters rises suddenly over a short distance and then drops to zero beyond some distance. The figure shows the distribution of supporters over distance.
Figure: Distribution of supporters over distance for a spam and a non-spam page (panels: Non-spam, Spam).
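The supporter distribution can be computed with a breadth-first search that walks links backwards from the page under test. This sketch assumes that a supporter at distance d is a page whose shortest link path to the target has length d; the slides do not define the term, and the function name and both toy graphs are hypothetical.

```python
from collections import Counter, deque

def supporters_by_distance(links, target, max_distance):
    """Count pages whose shortest link path to `target` has each length d.

    links maps each page to the pages it links to.
    Returns a list whose entry d-1 is the number of supporters at distance d.
    """
    # Reverse the graph so the search can walk backwards from the target.
    incoming = {}
    for page, outs in links.items():
        for out in outs:
            incoming.setdefault(out, set()).add(page)
    distance = {target: 0}
    queue = deque([target])
    while queue:
        page = queue.popleft()
        if distance[page] == max_distance:
            continue  # do not expand beyond the requested distance
        for source in incoming.get(page, ()):
            if source not in distance:
                distance[source] = distance[page] + 1
                queue.append(source)
    counts = Counter(d for d in distance.values() if d > 0)
    return [counts[d] for d in range(1, max_distance + 1)]

# Hypothetical normal page "t": supporters keep growing with distance.
normal = {"t": [], "a": ["t"], "b": ["t"],
          "c": ["a"], "d": ["a"], "e": ["b"], "f": ["b"]}
# Hypothetical spam page "s": a small link farm and nothing behind it.
farm = {"s": [], "f1": ["s"], "f2": ["s"], "f3": ["s"]}

print(supporters_by_distance(normal, "t", 3))
print(supporters_by_distance(farm, "s", 3))
```

In the toy link farm all supporters sit at distance 1 and the count is zero afterwards, which mirrors the spam pattern described in the text.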
System performance
It is important for a search engine to crawl and index efficiently, so that information can be kept up to date and major changes to the system can be tested relatively quickly. In total it took roughly 9 days to download the 26 million pages (including errors). The last 11 million pages were downloaded in just 63 hours, an average of just over 4 million pages per day, or 48.5 pages per second. The indexer runs at roughly 54 pages per second. The sorters can be run completely in parallel; using four machines, the whole process of sorting takes about 24 hours.
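The quoted crawl rates follow directly from the figures for the last 11 million pages. A quick check of the arithmetic (variable names are illustrative):

```python
pages = 11_000_000   # the last 11 million pages
hours = 63

pages_per_second = pages / (hours * 3600)
pages_per_day = pages / hours * 24

print(f"{pages_per_second:.1f} pages per second")           # 48.5
print(f"{pages_per_day / 1e6:.2f} million pages per day")   # 4.19
```

Note that these rates describe only the final 63 hours; averaged over the whole 9-day crawl of 26 million pages, the rate is lower.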
Future work
Conclusion
Thank you all!
