Ain Shams University
College of Science
Dept. of Mathematics/Computer Science
Web Mining
Prepared By:
Ziyad Hazim Abid Al Jabbar
Content
1 Data Mining and Web Mining Definitions
2 Introduction and Motivations
3 Web Mining Categories
4 - Web Content Mining
5 - Web Usage Mining
6 - Web Structure Mining
7 Web Data Representation (Matrix Expression)
8 - Document-Keyword Co-occurrence Matrix
9 - Adjacency Matrix
10 - Usage Matrix
11 Similarity Functions
12 - Pearson Correlation Coefficient
13 - Cosine-Based Similarity
14 References
Data Mining and Web Mining
Data mining is the process (techniques and algorithms) of extracting information or knowledge from a data set for the purpose of decision making.
Web mining is the process of applying data mining techniques to pattern discovery in Web data.
Introduction and Motivations
"We simply get more memory and keep it all." (2011)
● Information technology developed rapidly at the beginning of the twenty-first century, especially in the field of web technologies.
● The expansion in the use of the internet, especially after the official birth of e-government (2001), led to the rapid development of e-management, e-learning, e-commerce, and e-health.
● Website content came to cover all of citizens' daily activities, such as:
- Community services
- E-commerce
- E-education (e-learning)
- Scientific research
- Strategic planning decisions for companies and institutions
● The spread of social networks (Facebook, February 2004; Twitter, July 2006) led to a huge increase in the use of websites.
● Ubiquitous electronics record our decisions, our choices in the supermarket, our financial habits, our comings and goings. Every swipe is a record in a database.
● The World Wide Web (WWW) overwhelms us with information; meanwhile, every choice we make is recorded. All of these are just personal choices, and they have countless counterparts in the world of commerce and industry.
A Single View to the Customer
[Figure: diagram of customer data sources. Labels: Customer, Social Media, Gaming, Entertain, TV, Animation, Banking, Finance, Our Known History, Purchase, E learning.]
The Model Has Changed
● The model of generating and consuming data has changed:
- Old model: a few companies generate data; all others consume it.
- New model: all of us generate data, and all of us consume it.
● With more than two billion pages created by millions of Web page authors, the World Wide Web is an extremely rich knowledge base and a vast resource of many types of information in varied formats.
● We could all testify to the growing gap between the generation of web data and our understanding of it. As the volume of web data increases inexorably, the proportion of it that people understand decreases alarmingly.
● Observers following the activities of U.S. President Barack Obama reported that on December 17th, 2013, he held a meeting with leaders of information technology companies (Apple, Microsoft, Google, Yahoo, Facebook, Twitter, LinkedIn, Salesforce, Netflix, Etsy, Dropbox, Zynga, Sherpa Global, Comcast) to discuss the issue of data mining.
● He returned to the matter in his speech on January 17th, 2014, calling for reforms to the data mining system.
● This confirms how much importance is being given to data mining internationally.
Web Mining Categories
- Web Content Mining (WCM)
- Web Usage Mining (WUM)
- Web Structure Mining (WSM)
1- Web Content Mining:
The process of extracting useful information from the content of web documents, which may consist of:
- Text.
- Images.
- Audio and video.
- Structured records.
- Lists.
- Tables.
Web content mining involves techniques for:
- Summarization.
- Classification.
- Clustering.
2- Web Usage Mining:
The process of identifying browsing patterns by analyzing users' navigation behavior, to answer questions such as:
- How people are using a site.
- Which pages are accessed most frequently.
- Frequency of visits per document.
- Most recent visit per document.
- How frequently each hyperlink is clicked.
- Who is visiting which document, and from which location.
- Most recent use of each hyperlink.
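Several of the questions above reduce to counting over log records. The following is a minimal Python sketch, not a full log parser: the `log` records, their `(user, page, referrer)` layout, and all page names are hypothetical and stand in for entries already parsed from a server log.

```python
from collections import Counter

# Hypothetical, already-parsed log records: (user, page, referring page).
# A referrer of None means the page was reached without following a link.
log = [
    ("u1", "/home", None),
    ("u1", "/products", "/home"),
    ("u2", "/home", None),
    ("u2", "/about", "/home"),
    ("u3", "/products", "/home"),
    ("u3", "/home", "/products"),
]

# Which pages are accessed most frequently?
page_hits = Counter(page for _, page, _ in log)

# How frequently is each hyperlink (referrer -> page) clicked?
link_clicks = Counter((ref, page) for _, page, ref in log if ref is not None)

# Who is visiting which document?
visitors = {}
for user, page, _ in log:
    visitors.setdefault(page, set()).add(user)

most_visited = page_hits.most_common(1)[0]
```

Recency questions (most recent visit, most recent use of a hyperlink) would additionally need a timestamp field, which this sketch leaves out.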
3- Web Structure Mining (Link Mining):
The process of extracting patterns from hyperlinks on the web. It generates a structural summary of websites and web pages by analyzing their links.
Web Data Representation
Matrix expression
● The basic units for web mining are the Web page set and the user session collection.
● A page set is the collection of all pages within a site.
● A user session is a sequence of Web pages clicked by a single user during a specific period.
● Matrix expression has been widely used to model co-occurrence activity such as Web data.
1- Document-Keyword Co-occurrence Matrix:
● In web content mining, the relationships between a set of documents (pages) and a set of keywords can be represented by a Document-Keyword Co-occurrence Matrix.
● The rows of the matrix represent the documents.
● The columns of the matrix correspond to the keywords.
[Figure: matrix with documents (pages) as rows and keywords as columns.]
1- Document-Keyword Co-occurrence Matrix: (Continued)
● If a keyword appears in a document, the corresponding matrix element is 1; otherwise it is 0.
● The element value can also be a weight rather than only 1 or 0, reflecting the degree to which the keyword occurs in the document.
● For example, the element value could represent the frequency of a specific keyword in a specific document.
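A minimal Python sketch of both variants of the document-keyword matrix, binary and frequency-weighted. The toy `documents`, the `keywords` list, and the whitespace tokenization are assumptions made for illustration; real pages would need HTML stripping and proper tokenization first.

```python
from collections import Counter

# Hypothetical toy corpus: each "page" is just a short string.
documents = [
    "web mining applies data mining to web data",
    "usage mining studies user sessions",
    "structure mining studies hyperlinks",
]
keywords = ["web", "mining", "data", "user", "hyperlinks"]


def document_keyword_matrix(docs, terms, binary=True):
    """Return one row per document and one column per keyword.

    binary=True  -> element is 1 if the keyword appears, otherwise 0
    binary=False -> element is the raw count of the keyword in the document
    """
    matrix = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        if binary:
            row = [1 if counts[t] > 0 else 0 for t in terms]
        else:
            row = [counts[t] for t in terms]
        matrix.append(row)
    return matrix


binary_matrix = document_keyword_matrix(documents, keywords)
count_matrix = document_keyword_matrix(documents, keywords, binary=False)
```

Raw counts are only one possible weight; a normalized frequency or another weighting scheme could replace them without changing the row and column layout.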
2- Adjacency Matrix:
● The relationships between pages via their hyperlinks, which represent the linkage information of a Web site, can be represented by an adjacency matrix.
● The element (aij) of the matrix indicates whether a hyperlink connects two pages.
● If there is a hyperlink from page i to page j (i ≠ j), then the value of the element (aij) is 1; otherwise it is 0.
[Figure: adjacency matrix with pages on both axes.]
2- Adjacency Matrix: (Continued)
● The linking relationship is directional.
● If a hyperlink is directed from page i to page j, the link is an out-link for i and an in-link for j.
● The ith row of the adjacency matrix, which is a page vector, represents the out-link relationships from page i to other pages.
● The jth column of the matrix represents the in-link relationships to page j from other pages.
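The row and column reading of the adjacency matrix can be sketched in a few lines of Python. The four-page site, its `links`, and the function name `adjacency_matrix` are hypothetical and chosen only for illustration.

```python
# Hypothetical site with four pages and a handful of directed hyperlinks.
pages = ["home", "products", "about", "contact"]
links = [
    ("home", "products"),
    ("home", "about"),
    ("products", "home"),
    ("about", "contact"),
]


def adjacency_matrix(page_names, hyperlinks):
    """Build a where a[i][j] = 1 if page i links to page j (i != j), else 0."""
    index = {name: k for k, name in enumerate(page_names)}
    n = len(page_names)
    a = [[0] * n for _ in range(n)]
    for source, target in hyperlinks:
        i, j = index[source], index[target]
        if i != j:  # self-links are skipped, matching the i != j condition
            a[i][j] = 1
    return a


a = adjacency_matrix(pages, links)

# Row i holds the out-links of page i; column j holds the in-links of page j.
out_degree = [sum(row) for row in a]
in_degree = [sum(col) for col in zip(*a)]
```

Because the relationship is directional, the matrix is generally not symmetric: here `home` links to `about`, but `about` does not link back.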
3- Usage Matrix:
● In Web usage mining, a user session can be modeled as a page vector; that is, a user session is the collection of pages visited by the user during the period, along with weights that reflect the varying degree of visits to different web pages.
● The total collection of user sessions can then be expressed as a usage matrix.
[Figure: usage matrix with users as rows and Web pages as columns.]
3- Usage Matrix: (Continued)
● The ith row represents the pages visited by user i during a period of time.
● The jth column shows which users have clicked page j, according to the server log file.
● The element (aij) reflects the access interest exhibited by user i in page j, which can be used to derive the underlying access patterns of users.
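A minimal Python sketch of a usage matrix built from user sessions. The `sessions` data is hypothetical, and using the hit count as the weight is an assumption; time spent on a page or another interest measure could serve equally well.

```python
# Hypothetical sessions: user -> pages clicked, in order, during one period.
sessions = {
    "u1": ["home", "products", "products", "contact"],
    "u2": ["home", "about"],
    "u3": ["products", "products", "products"],
}
pages = ["home", "products", "about", "contact"]


def usage_matrix(user_sessions, page_names):
    """Return (users, matrix) where matrix[i][j] counts visits of user i to page j."""
    users = sorted(user_sessions)
    matrix = [[user_sessions[u].count(p) for p in page_names] for u in users]
    return users, matrix


users, m = usage_matrix(sessions, pages)

# Reading column j: which users clicked the "products" page at least once?
j = pages.index("products")
visitors_of_products = [u for u, row in zip(users, m) if row[j] > 0]
```

Each row of `m` is the page vector of one user, so rows can be compared directly with a similarity function.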
Similarity Functions:
● Two well-known and widely used similarity functions in information retrieval and recommender systems are:
- Pearson correlation coefficient.
- Cosine similarity.
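1- Pearson Correlation Coefficient:
● The Pearson correlation coefficient is computed from the deviations of users' ratings on various items from their mean ratings on the rated items.
● The attribute weight is expressed by a feature vector of numeric ratings on various items; for example, ratings can range from 1 to 5, where 1 stands for the least-liked item and 5 for the most preferred one.
● Given two users i and j and their rating vectors Ri and Rj, the Pearson correlation coefficient is defined by:
\[
sim(i,j) = corr(R_i, R_j) = \frac{\sum_{k=1}^{n} (R_{i,k} - \bar{R}_i)(R_{j,k} - \bar{R}_j)}{\sqrt{\sum_{k=1}^{n} (R_{i,k} - \bar{R}_i)^2}\,\sqrt{\sum_{k=1}^{n} (R_{j,k} - \bar{R}_j)^2}}
\]
where R_{i,k} is the rating of user i on item k, \bar{R}_i is the mean rating of user i, and n is the number of items rated by both users.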
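A minimal Python sketch of the Pearson correlation between two users' ratings. The function name `pearson`, the dictionary representation of a rating vector, the sample users, and the choice to compute means only over items both users rated are illustrative assumptions; other conventions for the mean exist.

```python
from math import sqrt


def pearson(ratings_i, ratings_j):
    """Pearson correlation over the items rated by both users.

    Each argument maps an item name to a numeric rating (for example 1 to 5).
    Returns 0.0 when fewer than two items are shared or when either user's
    shared ratings have no variance (the coefficient is undefined there).
    """
    common = sorted(set(ratings_i) & set(ratings_j))
    if len(common) < 2:
        return 0.0
    mean_i = sum(ratings_i[k] for k in common) / len(common)
    mean_j = sum(ratings_j[k] for k in common) / len(common)
    dev_i = [ratings_i[k] - mean_i for k in common]
    dev_j = [ratings_j[k] - mean_j for k in common]
    numerator = sum(x * y for x, y in zip(dev_i, dev_j))
    denominator = sqrt(sum(x * x for x in dev_i)) * sqrt(sum(y * y for y in dev_j))
    return numerator / denominator if denominator else 0.0


# Hypothetical users rating pages on a 1-to-5 scale.
alice = {"page_a": 5, "page_b": 3, "page_c": 1}
bob = {"page_a": 4, "page_b": 3, "page_c": 2, "page_d": 5}
carol = {"page_a": 1, "page_b": 3, "page_c": 5}

agree = pearson(alice, bob)       # same ordering of preferences, close to +1
disagree = pearson(alice, carol)  # opposite ordering, close to -1
```

Because each user's ratings are centered on that user's own mean, a user who rates everything high and one who rates everything low can still correlate strongly if their relative preferences match.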
2- Cosine-Based Similarity:
● Since any vector can be considered a line in a multi-dimensional space, it is intuitive to define the similarity (or distance) between two vectors as the cosine of the angle between the two "lines".
● The cosine coefficient is the ratio of the dot product of the two vectors to the product of their norms. Given two vectors A and B, the cosine similarity is defined as:
\[
sim(A,B) = \cos(A,B) = \frac{A \cdot B}{\|A\| \, \|B\|}
\]
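A minimal Python sketch of cosine similarity for two equal-length numeric vectors, such as two rows of a usage matrix. The function name and the decision to return 0.0 for a zero vector (where the angle is undefined) are assumptions made for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical page vectors (visit counts over the same four pages).
same_direction = cosine_similarity([1, 2, 0, 1], [2, 4, 0, 2])
orthogonal = cosine_similarity([1, 0, 0, 0], [0, 3, 0, 0])
```

Only the direction of the vectors matters: a user with twice as many visits in the same proportions scores the same as the original, which is why `same_direction` comes out at (approximately) 1.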
References:
● Witten, Ian H., Eibe Frank, and Mark A. Hall. "Data Mining: Practical Machine Learning Tools and Techniques." (2011).
● Xu, Guandong, Yanchun Zhang, and Lin Li. "Web Mining and Social Networking: Techniques and Applications." (2011).
Thank you
