10/9/2013 1
Web mining applies data mining techniques
to extract and uncover knowledge from web
documents and services.
The goal is to make the web more useful and
more profitable, and to make our interaction
with it more efficient.
Web: A huge, widely-distributed, highly
heterogeneous, semi-structured,
hypertext/hypermedia, interconnected
information repository.
The web is a huge collection of documents, plus:
– Hyperlink information
– Access and usage information
Resource Finding.
Information selection & Pre-processing.
Generalization.
Analysis.
Web mining
– Web content mining
   – Web page content mining
   – Search result mining
– Web structure mining
– Web usage mining
   – General access pattern tracking
   – Customized usage tracking
Web content mining: discovery of useful information
from web contents, data, and documents.
Information Retrieval view.
Database View.
Researchers have proposed using citations among
journal articles to evaluate the quality of
research papers.
Customer behavior: evaluate the quality of a product
based on the opinions of other customers (instead of
the product's description or advertisement).
Web usage mining is also known as web log mining.
DEFINITION
Discovery of meaningful patterns from data
generated by client-server transactions or from web
server logs.
Typical sources of data:
– Automatically generated data stored in server access logs,
referrer logs, agent logs, and client-side cookies
– User profiles
– Metadata: page attributes, content attributes, usage data
Generate simple statistical reports:
A summary report of hits and bytes transferred
A list of top requested URLs
A list of top referrers
A list of most common browsers used
Hits per hour/day/week/month reports
Hits per domain reports
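The reports listed above can be sketched in a few lines of Python. This is a minimal sketch, not a full log analyzer: it assumes the server writes the Apache Common/Combined Log Format, and the names LOG_PATTERN, parse_line, summarize, and SAMPLE_LOG, as well as the sample log lines themselves, are hypothetical and made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout (Combined Log Format):
# host ident user [time] "method url protocol" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Turn one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

def summarize(lines):
    """Build the simple statistical reports from an iterable of log lines."""
    records = [r for r in map(parse_line, lines) if r is not None]
    referrers = (r["referrer"] for r in records if r["referrer"] not in (None, "-"))
    return {
        "hits": len(records),
        "bytes": sum(r["bytes"] for r in records),
        "top_urls": Counter(r["url"] for r in records).most_common(3),
        "top_referrers": Counter(referrers).most_common(3),
        "top_browsers": Counter(r["agent"] for r in records if r["agent"]).most_common(3),
        "hits_per_hour": Counter(r["time"].hour for r in records),
    }

# Made-up sample data, for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [09/Oct/2013:10:15:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [09/Oct/2013:10:20:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "http://example.com/" "Mozilla/5.0"',
    '10.0.0.1 - - [09/Oct/2013:11:05:00 +0000] "GET /about.html HTTP/1.1" 200 512 "http://example.com/" "Mozilla/5.0"',
]

report = summarize(SAMPLE_LOG)
print(report)
```

Lines that do not match the assumed format are skipped rather than raising an error, which is one reasonable choice for noisy logs.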
Learn:
Who is visiting your site
The path visitors take through your pages
How much time visitors spend on each page
The most common starting page
Where visitors are leaving your site
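Paths, entry pages, exit pages, and time per page can be derived by grouping hits into sessions. The sketch below assumes each hit has already been reduced to a (visitor, timestamp in seconds, url) tuple and that a gap of more than 30 minutes starts a new session; that timeout is a common heuristic, not something the slides specify. The names sessionize, analyze, HITS_LOG, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed heuristic

def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group (visitor, timestamp, url) hits into per-visitor sessions."""
    by_visitor = defaultdict(list)
    for visitor, ts, url in sorted(hits, key=lambda h: (h[0], h[1])):
        by_visitor[visitor].append((ts, url))
    sessions = []
    for visits in by_visitor.values():
        current = [visits[0]]
        for prev, cur in zip(visits, visits[1:]):
            if cur[0] - prev[0] > timeout:  # long gap: close the session
                sessions.append(current)
                current = []
            current.append(cur)
        sessions.append(current)
    return sessions

def analyze(sessions):
    """Return entry pages, exit pages, paths, and average seconds per page."""
    entry = Counter(s[0][1] for s in sessions)
    exit_ = Counter(s[-1][1] for s in sessions)
    paths = Counter(tuple(url for _, url in s) for s in sessions)
    dwell = defaultdict(list)
    for s in sessions:
        # Time on a page is the gap until the next hit in the same session;
        # the last page of a session has no measurable dwell time.
        for (t0, url), (t1, _) in zip(s, s[1:]):
            dwell[url].append(t1 - t0)
    avg_dwell = {url: sum(v) / len(v) for url, v in dwell.items()}
    return entry, exit_, paths, avg_dwell

# Made-up sample data, for illustration only.
HITS_LOG = [
    ("a", 0, "/home"), ("a", 60, "/products"), ("a", 180, "/checkout"),
    ("b", 10, "/home"), ("b", 40, "/products"),
    ("a", 10000, "/home"),
]

sessions = sessionize(HITS_LOG)
entry, exit_, paths, avg_dwell = analyze(sessions)
print(entry.most_common(1), exit_, avg_dwell)
```

Identifying a visitor is itself an assumption here; in practice it might be an IP address, a cookie, or a login, each with its own inaccuracies.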
The web log is filtered to generate a relational database.
A data cube is generated from the database.
OLAP is used to drill down and roll up in the cube.
[Figure: web log → data cleaning → database → data cube creation → data cube → OLAP → sliced and diced cube → data mining → knowledge patterns]
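The cube operations can be illustrated without an OLAP server. The sketch below is a toy under stated assumptions: the cleaned log is taken to be a list of fact rows with the dimensions month, day, domain, and url and the measure bytes, all of which are hypothetical, as are the names FACTS, aggregate, and slice_. Rolling up means aggregating over fewer dimensions, drilling down means adding a finer one, and slicing fixes the value of a dimension.

```python
from collections import defaultdict

# Hypothetical fact rows, one per cleaned log record.
FACTS = [
    {"month": "2013-10", "day": "2013-10-09", "domain": "edu", "url": "/index.html", "bytes": 1024},
    {"month": "2013-10", "day": "2013-10-09", "domain": "com", "url": "/index.html", "bytes": 1024},
    {"month": "2013-10", "day": "2013-10-10", "domain": "edu", "url": "/about.html", "bytes": 512},
    {"month": "2013-11", "day": "2013-11-01", "domain": "com", "url": "/index.html", "bytes": 1024},
]

def aggregate(facts, dims):
    """Aggregate hits and bytes over the given list of dimensions."""
    cube = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for row in facts:
        key = tuple(row[d] for d in dims)
        cube[key]["hits"] += 1
        cube[key]["bytes"] += row["bytes"]
    return dict(cube)

def slice_(facts, **fixed):
    """Keep only rows whose dimensions equal the fixed values (slice/dice)."""
    return [r for r in facts if all(r[k] == v for k, v in fixed.items())]

by_month = aggregate(FACTS, ["month"])          # rolled up to month level
by_day = aggregate(FACTS, ["month", "day"])     # drilled down to day level
edu_by_url = aggregate(slice_(FACTS, domain="edu"), ["url"])  # slice on domain

print(by_month)
print(by_day)
print(edu_by_url)
```

A real deployment would more likely keep the facts in a relational database and let the OLAP layer precompute these aggregates; the dictionary here only shows the idea.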
Hubs: pages that link to many good authorities.
Authorities: pages linked to by many good hubs.
Mutually reinforcing
relationship.
Finding authoritative
web pages.
Hyperlinks can be used to infer
the notion of
authority.
[Figure: Hub-Authority Relations (hubs, authorities)]
HITS stands for Hyperlink-Induced Topic Search.
It explores interactions between hubs and authoritative
pages.
Expand the root set into a base set.
Apply weight propagation.
Systems based on link analysis:
– e.g. Google (its PageRank algorithm is related to HITS but distinct from it).
Difficulties from ignoring textual context:
– Drifting: when hubs contain multiple topics.
– Topic hijacking: when many pages from a single web
site point to the same popular site.
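The weight-propagation step can be sketched as follows. This is a minimal illustration that assumes the base set has already been built and is given as a dictionary mapping each page to the pages it links to; the function name hits, the fixed iteration count, and the page names in BASE_SET are hypothetical. Each round sets a page's authority score to the sum of the hub scores of the pages linking to it, sets its hub score to the sum of the authority scores of the pages it links to, and then normalizes both.

```python
import math

def hits(graph, iterations=50):
    """Return (hub, authority) score dicts for a link graph.

    graph maps each page to the list of pages it links to.
    """
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum hub scores of the pages linking to each page.
        auth = {p: 0.0 for p in pages}
        for p, targets in graph.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub update: sum authority scores of the pages each page links to.
        hub = {p: sum(auth[q] for q in graph.get(p, ())) for p in pages}
        # Normalize so the scores stay bounded across iterations.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Made-up base set: three hub-like pages pointing at two candidate authorities.
BASE_SET = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
    "a1": [],
    "a2": [],
}

hub, auth = hits(BASE_SET)
print(sorted(auth, key=auth.get, reverse=True)[:2])
print(sorted(hub, key=hub.get, reverse=True)[:3])
```

Because the scores depend only on links, the sketch shares the difficulties listed above: nothing in it looks at the text of the pages.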
Improve web server system performance.
Improve site design.
Detect intrusions.
Predict users' actions.
Enhance the quality and delivery of internet
information services to the end user.
Facilitate adaptive sites and personalization.
Applying Data Mining Techniques to Extract Knowledge from the Web
