2. Web Mining
Web mining is the use of data mining techniques
to automatically discover and extract information
from Web documents and services.
1. Web usage mining
2. Web content mining
3. Web structure mining
3. Web usage mining
Web usage mining is the process of discovering patterns in large sets of usage
data, such as web server logs. These patterns help you predict user behavior.
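As a rough illustration of how usage patterns can support prediction, the sketch below counts page-to-page transitions in a tiny click log and suggests the most frequent next page. The log contents and the helper names `count_transitions` and `predict_next` are hypothetical, chosen only for this example. Real usage mining would start from parsed server logs and usually needs session splitting first.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (user_id, page) pairs, already sorted by time.
LOG = [
    ("u1", "/home"), ("u1", "/products"), ("u1", "/cart"),
    ("u2", "/home"), ("u2", "/products"), ("u2", "/about"),
    ("u3", "/home"), ("u3", "/products"), ("u3", "/cart"),
]

def count_transitions(log):
    """Count how often each page is followed by each other page, per user."""
    last_page = {}                      # user_id -> most recent page seen
    transitions = defaultdict(Counter)  # page -> Counter of next pages
    for user, page in log:
        if user in last_page:
            transitions[last_page[user]][page] += 1
        last_page[user] = page
    return transitions

def predict_next(transitions, page):
    """Return the most frequent next page after `page`, or None if unseen."""
    if page not in transitions:
        return None
    return transitions[page].most_common(1)[0][0]

transitions = count_transitions(LOG)
print(predict_next(transitions, "/products"))  # most common page after /products
```

Counting first-order transitions is about the simplest pattern one can mine. It ignores longer paths and time gaps, which is why it is only a starting point.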
➔Tableau offers a family of interactive data
visualization products focused on business intelligence
➔It transforms data into visualizations
➔This process takes only seconds or minutes
with the help of a drag-and-drop interface
Official Website : http://www.tableau.com/
➔R is a free programming language and
software environment for statistical computing
➔The R language is widely used among data miners
for developing statistical software and data analysis
➔Ease of use and extensibility have raised R’s
popularity substantially in recent years
6. Web content mining
Web content mining is a process of collecting useful data from websites.
This content includes news, comments, company information, product catalogs, etc.
➔Octoparse is a simple but powerful web data mining tool
that automates web data extraction.
➔It allows you to create highly accurate extraction rules
➔An extraction rule tells Octoparse:
➢which website to open;
➢where the data you plan to crawl is located;
➢what kind of data you want, etc.
Official Website : http://www.octoparse.com/
➔Scrapy is an open-source framework for collecting web data
➔It is written in Python, and you can
write your own rules to extract web data.
➔Supported Operating Systems:
Linux, Windows, Mac and BSD
Official Website : https://scrapy.org/
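To make the idea of an extraction rule concrete, here is a minimal sketch that uses only Python's standard library. It does not use Octoparse or Scrapy's own APIs. The HTML fragment, the class names `product`, `name` and `price`, and the `ProductExtractor` class are all assumptions made up for this example. The "rule" is simply the set of span classes to keep.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded product catalog.
PAGE = """
<ul>
  <li class="product"><span class="name">Kettle</span><span class="price">19.99</span></li>
  <li class="product"><span class="name">Toaster</span><span class="price">34.50</span></li>
</ul>
"""

class ProductExtractor(HTMLParser):
    """Collect the text of <span class="name"> and <span class="price">."""

    FIELDS = {"name", "price"}  # the "rule": which span classes to keep

    def __init__(self):
        super().__init__()
        self.current_field = None  # field whose text we are inside, if any
        self.records = []          # one dict per product

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self.records.append({})        # a new product starts here
        elif tag == "span" and css_class in self.FIELDS:
            self.current_field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the selected spans.
        if self.current_field and self.records:
            self.records[-1][self.current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None

extractor = ProductExtractor()
extractor.feed(PAGE)
print(extractor.records)
```

A dedicated tool adds what this sketch leaves out: fetching pages, following links, and coping with malformed markup.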
9. Web structure mining
Web structure mining is also known as link mining.
It is the process of discovering relationships between web pages that are
connected by information or by direct links.
1. HITS algorithm
2. PageRank Algorithm
10. Hyperlink-Induced Topic Search(HITS) algorithm
➔Also known as hubs and authorities, HITS is a link analysis algorithm that rates Web pages
➔Uses a root set (the most relevant pages returned by a text-based search algorithm)
➔Generates a base set = root set + web pages that are linked from it and pages
that link to it
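The hub and authority scores can be sketched as a simple iteration over the base set. The example below is a minimal illustration, not a reference implementation. The four-page `GRAPH`, the fixed iteration count and the Euclidean normalization are assumptions chosen for the example. A page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of the pages it links to.

```python
import math

# Hypothetical base set: page -> pages it links to.
GRAPH = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C"],
}

def hits(graph, iterations=50):
    """Return (hub, authority) score dicts after repeated HITS updates."""
    pages = list(graph)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking to p.
        auth = {p: 0.0 for p in pages}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub update: sum the authority scores of the pages p links to.
        hub = {p: sum(auth[dst] for dst in graph[p]) for p in pages}
        # Normalize so the scores stay bounded (Euclidean norm).
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

hub, auth = hits(GRAPH)
print(max(auth, key=auth.get))  # page with the highest authority score
print(max(hub, key=hub.get))    # page with the highest hub score
```

In this toy graph, C collects the most incoming links and A links to the most-cited pages, so they come out as the top authority and top hub respectively.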
11. PageRank Algorithm
➔PageRank is an algorithm used by Google Search
to rank web pages in its search engine results.
➔PageRank was named after Larry Page (one of
the founders of Google)
➔It assigns a numerical weighting to each element of
a hyperlinked set of documents with the purpose
of "measuring" its relative importance within the set
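The numerical weighting can be approximated with power iteration, as in the sketch below. This is a simplified illustration, not Google's production method. The four-page `GRAPH`, the damping factor of 0.85 (a commonly quoted value) and the rule of spreading a dangling page's rank evenly over all pages are assumptions made for the example.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank; `graph` maps each page to its outlinks."""
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same baseline share (random jump).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src, targets in graph.items():
            if targets:
                # Split this page's damped rank equally among its outlinks.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
        rank = new_rank
    return rank

# Hypothetical hyperlinked set of four documents.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(GRAPH)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

The scores sum to 1, so each one can be read as the relative importance of a page within the set. Page D has no incoming links and therefore keeps only the baseline share.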