WEB MINING
Presented by:
Gaurav Uniyal
161340101008
C.S.E.(Final Year)
Introduction
• Web mining is the application of data mining techniques to
extract and uncover knowledge from web documents and
services.
• Its aim is to use data mining techniques to make the web more
useful and more profitable, and to increase the efficiency
of our interaction with the web.
Web Mining Services
This technology has enabled e-commerce sites to perform personalized
marketing, which eventually results in higher trade volumes.
Describing the WWW
• Web: a huge, widely distributed, highly heterogeneous,
semi-structured, hypertext/hypermedia, interconnected
information repository.
• The web is a huge collection of documents, plus:
– hyperlink information;
– access and usage information.
Tasks to Conduct
• Resource Finding.
• Information Selection & Pre-processing.
• Generalization.
• Analysis.
Web Mining Classification
Web Mining
– Web Content Mining
   – Web Page Content Mining
   – Search Result Mining
– Web Structure Mining
– Web Usage Mining
   – General Access Pattern Tracking
   – Customized Usage Tracking
Web Content Mining
• Discovery of useful information from web contents, data, and
documents.
• Information Retrieval view.
• Database view.
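To make the Information Retrieval view concrete, here is a minimal sketch that treats each page as a bag of words and weights terms by TF-IDF (term frequency times inverse document frequency). The tokenizer, the sample pages, and the function name `tf_idf` are illustrative assumptions, not part of the original slides; real systems would also strip HTML, remove stop words, and stem terms.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs only (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Return a TF-IDF weight for every (document, term) pair."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    weights = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical sample pages.
pages = {
    "p1": "web mining extracts knowledge from web documents",
    "p2": "data mining finds patterns in data",
    "p3": "web usage mining analyses server logs",
}
scores = tf_idf(pages)
# "mining" appears in every page, so its IDF is log(3/3) = 0 and its weight is 0.
print(scores["p1"]["mining"])
```

Terms that occur everywhere carry no discriminating power, while rarer terms such as "extracts" receive higher weights than the common "web".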
Web Structure Mining
• Researchers proposed methods of using citations among
journal articles to evaluate the quality of research
papers.
• Customer behavior: evaluate the quality of a product
based on the opinions of other customers (instead of the
product’s description or advertisement).
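Web structure mining carries the citation idea over to hyperlinks: a link from one page to another is treated like a citation or an endorsement. As a minimal sketch (the link graph and the `in_degree` helper are hypothetical), counting inbound links from distinct pages gives the simplest citation-style measure of a page's importance; link-analysis algorithms such as HITS refine it.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
links = {
    "a.html": ["c.html", "d.html"],
    "b.html": ["c.html"],
    "c.html": ["d.html"],
    "d.html": ["c.html"],
}


def in_degree(graph: dict[str, list[str]]) -> Counter:
    """Count how many distinct pages link to each page (self-links ignored)."""
    counts = Counter({page: 0 for page in graph})
    for source, targets in graph.items():
        for target in set(targets):  # repeated links from one page count once
            if target != source:
                counts[target] += 1
    return counts


for page, count in in_degree(links).most_common():
    print(page, count)
```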
Web Usage Mining
• Also known as Web log mining.
• Definition: discovery of meaningful patterns from
data generated by client-server transactions or from
Web server logs.
• Typical sources of data:
– automatically generated data stored in server access
logs, referrer logs, agent logs, and client-side cookies;
– user profiles;
– metadata: page attributes, content attributes, usage
data.
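Server access logs are the most common of these sources. The sketch below parses one line in the Common Log Format (host, ident, user, timestamp, request, status, bytes), which is Apache's `common` log format; the regular expression, the `LogRecord` field names, and `parse_clf` are assumptions for illustration. Logs in the Combined format, which appends the referrer and user agent, would need a broader pattern.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


@dataclass
class LogRecord:
    host: str
    time: datetime
    method: str
    url: str
    status: int
    size: int


def parse_clf(line: str) -> Optional[LogRecord]:
    """Parse one Common Log Format line; return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    parts = match["request"].split()
    if len(parts) < 2:
        return None  # malformed request line
    return LogRecord(
        host=match["host"],
        time=datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=parts[0],
        url=parts[1],
        status=int(match["status"]),
        size=0 if match["size"] == "-" else int(match["size"]),  # "-" = no body
    )


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_clf(line)
print(record.url, record.status, record.size)
```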
• Generate simple statistical reports:
– A summary report of hits and bytes transferred.
– A list of the top requested URLs.
– A list of the top referrers.
– A list of the most common browsers used.
– Hits per hour/day/week/month reports.
– Hits per domain report.
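Once log lines are parsed, most of these reports are simple group-by counts. A minimal sketch, assuming each request has already been reduced to a hypothetical `Request` tuple and using invented sample data; the helper names are illustrative, not from any particular log analyzer.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class Request(NamedTuple):
    host: str        # client host name
    time: datetime
    url: str
    referrer: str    # "-" when no referrer was sent
    agent: str       # user agent, simplified here to a browser name
    size: int        # bytes transferred


requests = [
    Request("a.example.com", datetime(2024, 5, 1, 9, 15), "/index.html", "-", "Firefox", 5120),
    Request("b.example.org", datetime(2024, 5, 1, 9, 40), "/products.html", "/index.html", "Chrome", 8192),
    Request("a.example.com", datetime(2024, 5, 1, 10, 5), "/index.html", "-", "Firefox", 5120),
    Request("c.example.net", datetime(2024, 5, 1, 10, 20), "/contact.html", "/index.html", "Chrome", 2048),
]


def summary(reqs):
    """Total hits and total bytes transferred."""
    return len(reqs), sum(r.size for r in reqs)


def top(reqs, field, n=3):
    """Most frequent values of one field ("url", "referrer", "agent"), ignoring "-"."""
    values = (getattr(r, field) for r in reqs)
    return Counter(v for v in values if v != "-").most_common(n)


def hits_per_period(reqs, fmt):
    """Hits per period: "%Y-%m-%d %H:00" hourly, "%Y-%m-%d" daily, "%Y-%m" monthly."""
    return Counter(r.time.strftime(fmt) for r in reqs)


def hits_per_domain(reqs):
    # Top-level domain of the client host; raw IP addresses would need reverse DNS first.
    return Counter(r.host.rsplit(".", 1)[-1] for r in reqs)


print(summary(requests))
print(top(requests, "url"))
```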
• Learn:
– Who is visiting your site.
– The path visitors take through your pages.
– How much time visitors spend on each page.
– The most common starting page.
– What content your visitors are viewing.
– Where visitors are leaving your site.
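Most of these questions are answered by first grouping requests into sessions. A minimal sketch, assuming visitors are identified by client host alone and that a gap of more than 30 minutes starts a new session (a common heuristic, not something the slides specify). Time on a page is estimated as the gap until the visitor's next request, so the time spent on the last page of a session is unknown. All names and data are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed heuristic

# Hypothetical (visitor, time, url) requests, not necessarily in order.
hits = [
    ("10.0.0.1", datetime(2024, 5, 1, 9, 0), "/index.html"),
    ("10.0.0.1", datetime(2024, 5, 1, 9, 2), "/products.html"),
    ("10.0.0.1", datetime(2024, 5, 1, 9, 7), "/cart.html"),
    ("10.0.0.2", datetime(2024, 5, 1, 9, 1), "/products.html"),
    ("10.0.0.1", datetime(2024, 5, 1, 11, 0), "/index.html"),
]


def sessionize(requests):
    """Split each visitor's time-ordered requests into (visitor, [(time, url), ...]) sessions."""
    by_visitor = defaultdict(list)
    for visitor, time, url in requests:
        by_visitor[visitor].append((time, url))
    sessions = []
    for visitor, visits in by_visitor.items():
        visits.sort()
        current = [visits[0]]
        for prev, nxt in zip(visits, visits[1:]):
            if nxt[0] - prev[0] > SESSION_TIMEOUT:
                sessions.append((visitor, current))  # gap too long: close session
                current = []
            current.append(nxt)
        sessions.append((visitor, current))
    return sessions


def time_on_page(session):
    """Seconds spent on each page except the last, whose duration is unknown."""
    return [(url, (nxt_time - time).total_seconds())
            for (time, url), (nxt_time, _) in zip(session, session[1:])]


sessions = sessionize(hits)
paths = [[url for _, url in visits] for _, visits in sessions]  # paths through the site
entry_pages = Counter(path[0] for path in paths)   # most common starting pages
exit_pages = Counter(path[-1] for path in paths)   # where visitors leave
print(paths)
```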
Design of a Web Log Miner
• The web log is filtered to generate a relational database.
• A data cube is generated from the database.
• OLAP is used to drill down and roll up in the cube.
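The cube-and-OLAP step can be illustrated without a database engine. A minimal sketch, assuming the filtered log has been reduced to hypothetical (day, hour, URL) rows: a cuboid counts hits for every combination of the chosen dimensions, roll-up (here, by dimension reduction) aggregates a dimension away, and drill-down adds a finer dimension back. A real web log miner would use a relational database or an OLAP server instead.

```python
from collections import Counter

DIMENSIONS = ("day", "hour", "url")

# Hypothetical rows produced by filtering the web log: one row per hit.
rows = [
    {"day": "2024-05-01", "hour": 9, "url": "/index.html"},
    {"day": "2024-05-01", "hour": 9, "url": "/products.html"},
    {"day": "2024-05-01", "hour": 10, "url": "/index.html"},
    {"day": "2024-05-02", "hour": 9, "url": "/index.html"},
]


def cuboid(data, dims):
    """Count hits grouped by the given tuple of dimensions."""
    return Counter(tuple(row[d] for d in dims) for row in data)


def roll_up(data, dims, drop):
    """Aggregate away one dimension (move to a coarser level)."""
    return cuboid(data, tuple(d for d in dims if d != drop))


def drill_down(data, dims, add):
    """Add one dimension (move to a finer level)."""
    return cuboid(data, dims + (add,))


base = cuboid(rows, DIMENSIONS)                # finest level: (day, hour, url)
by_day_url = roll_up(rows, DIMENSIONS, "hour")  # (day, url)
by_day = roll_up(rows, ("day", "url"), "url")   # (day,)
by_day_hour = drill_down(rows, ("day",), "hour")  # back down to (day, hour)
print(by_day)
```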
Structures
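• Hubs.
• Authorities.
• Mutually reinforcing relationship: a good hub points to many
good authorities, and a good authority is pointed to by many
good hubs.
• Hyperlinks can be used to infer the notion of authority.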
HITS
• HITS stands for Hyperlink-Induced Topic Search.
• It explores interactions between hub pages and authoritative
pages.
• Expand the root set (pages returned by a text-based search for
the query) into a base set (the root set plus pages that link to
or are linked from it).
• Apply weight propagation.
• Systems based on the HITS algorithm: e.g., IBM's Clever. (Google
ranks pages with a different link-analysis algorithm, PageRank.)
• Difficulties arising from ignoring textual context:
– Drifting: when hubs contain multiple topics.
– Topic hijacking: when many pages from a single website
point to the same popular site.
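The weight-propagation step can be sketched as the iterative hub/authority update: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to. This is a minimal sketch under stated assumptions: the base-set graph is a hypothetical in-memory dictionary, scores are normalized with the Euclidean norm after each step, and a fixed iteration count stands in for a convergence test. Building the root and base sets from search results is omitted.

```python
import math


def hits(graph: dict[str, list[str]], iterations: int = 50):
    """Return (hub, authority) score dicts for a graph given as page -> outlinks."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for source, targets in graph.items():
            for target in targets:
                auth[target] += hub[source]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in graph.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical base set.
base_set = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2"],
    "hub3": ["auth1"],
    "auth1": [],
    "auth2": [],
}
hub, auth = hits(base_set)
print(max(auth, key=auth.get))
```

Because the update looks only at links, this sketch inherits the difficulties listed above: a hub covering several topics spreads its weight across all of them (drifting), and many links from one site inflate a single target (topic hijacking).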
Applications of Web Mining
• Improve web server system performance.
• Improve site design.
• Intrusion detection.
• Predict users’ actions.
• Enhance the quality and delivery of Internet
information services to the end user.
• Facilitate adaptive sites and personalization.
Thank You!
