KUMARAGURU COLLEGE OF TECHNOLOGY
           COIMBATORE




DATA WAREHOUSING AND DATA MINING

             Presented by


             K.Santhosh (07bcs43)
             E-Mail ID:ksanthoshselvam@gmail.com
             Contact No: 9788153199
             V.Siddharth (07bcs50)
             E-Mail ID:siddharthindian@yahoo.com
             Contact No: 9843286841


ABSTRACT:


       Fast, accurate and scalable data analysis techniques are needed to extract useful
information from huge piles of data. A data warehouse is a single, integrated source of
decision support information formed by collecting data from multiple sources, internal to
the organization as well as external, and transforming and summarizing this information
to enable improved decision making. A data warehouse is designed to give users easy access
to large amounts of information, and data access is typically supported by specialized
analytical tools and applications. Typical applications include decision support systems
and executive information systems.
      Data mining is the exploration and analysis of large quantities of data in order to
discover valid, novel, potentially useful, and ultimately understandable patterns in data. It
is “an information extraction activity whose goal is to discover hidden facts contained
in databases”. It has also been described as the process of extracting valid, previously
unknown, comprehensible and actionable information from large databases and using it to
make crucial business decisions.
      Data mining finds patterns and subtle relationships in data and infers rules that allow the
prediction of future results. A data mining model is a description of a specific aspect of a
dataset. It produces output values for an assigned set of input values. Typical applications
include market segmentation, customer profiling, fraud detection, evaluation of retail
promotions, and credit risk analysis.



Introduction:
Organizations are increasingly analyzing current and historical data to identify
useful patterns and support business strategies.
A large amount of the right information is the key to survival in today’s competitive
environment, and this kind of information can be made available only if there is a fully
integrated enterprise data warehouse.


What is data warehousing?


A data warehouse is a subject-oriented, integrated, non-volatile and time-variant
collection of data in support of management’s decisions.

NEED FOR A DATA WAREHOUSE :
• IT or business staff spend a lot of time developing special reports for decision-makers.
• Many PC-based or small server systems obtain extracts of data but are incapable of
presenting a holistic view of the entire gamut of information.
• The same data is present on different systems and in different departments, and users may be
unaware of this fact.
• It is difficult to get meaningful information in a timely manner.
• Multiple systems give different answers to the same business questions.
• Decision makers and policy planners do less analysis because sophisticated tools and
easily decipherable, timely and comprehensive information are not available.
PURPOSE OF A DATA WAREHOUSE :
• Better business intelligence for end users.
• Reduction in time to locate, access and analyze information.
• Consolidation of disparate information sources.
• Replacement of older, less-responsive decision support systems.
• Faster time to market for products and services.
• Strategic advantage over competitors.
Data Warehouse Characteristics:
   1.Subject-orientedWH is organized around the major subjects of             the enterprise
   rather than the major application areas. This is reflected in the need to store decision-
   support data rather than application-oriented data.

   2.Integratedbecause the source data come together from different enterprise-wide
   applications systems. The source data is often inconsistent using..The integrated data
   source must be made consistent to present a unified view of the data to the users

   3.Time-variantthe source data in the WH is only accurate and valid at some point in
   time or over some time interval. The time-variance of the data warehouse is also
   shown in the extended time that the data is held, the implicit or explicit association of
   time with all data, and the fact that the data represents a series of snapshots

   4.Non-volatiledata is not update in real time but is refresh from OS on a regular
   basis. New data is always added as a supplement to DB, rather than replacement.
   The DB continually absorbs this new data, incrementally integrating it with previous
   data
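
The time-variant and non-volatile properties can be illustrated with a small sketch. It assumes a hypothetical in-memory warehouse table (a plain list) and made-up field names such as account and balance; each refresh appends a dated snapshot instead of overwriting earlier rows.

```python
from datetime import date

# Hypothetical warehouse table: rows are only ever appended, never updated.
warehouse = []

def refresh(snapshot_date, operational_rows):
    """Append a dated snapshot of the operational data to the warehouse."""
    for row in operational_rows:
        warehouse.append({"snapshot_date": snapshot_date, **row})

def balance_as_of(account, as_of):
    """Return the most recent balance recorded on or before `as_of`."""
    candidates = [r for r in warehouse
                  if r["account"] == account and r["snapshot_date"] <= as_of]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["snapshot_date"])["balance"]

# Two periodic refreshes from the operational system (illustrative values).
refresh(date(2024, 1, 31), [{"account": "A1", "balance": 100}])
refresh(date(2024, 2, 29), [{"account": "A1", "balance": 250}])
```

Because earlier snapshots are kept, a question such as "what was the balance in mid-February?" can still be answered after later refreshes.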



DATA WAREHOUSE LIFE CYCLE:
Data warehousing is a concept. It is not a product that can be purchased off the shelf. It is
a set of hardware and software components integrated together, which can be used to
analyze massive amounts of stored data efficiently. Building a successful data warehouse
is a process, and the following are its five steps.

   1.JUSTIFICATION

   2.REQUIREMENT ANALYSIS

   3.DESIGN

   4.DEVELOPMENT AND IMPLEMENTATION

   5.DEPLOYMENT



Main Components:
   1Operational data sourcesfor the DW is supplied from mainframe operational data
   held in first generation hierarchical and network databases, departmental data held in
   proprietary file systems, private data held on workstaions and private serves and
   external systems such as the Internet, commercially available DB, or DB assoicated
   with and organization’s suppliers or customers
   2Operational datastore(ODS)is a repository of current and integrated operational
   data used for analysis. It is often structured and supplied with data in the same way as
   the data warehouse, but may in fact simply act as a staging area for data to be moved
   into the warehouse
   3load manageralso called the frontend component, it performance all the operations
   associated with the extraction and loading of data into the warehouse. These
   operations include simple transformations of the data to prepare the data for entry into
   the warehouse
   4warehouse managerperforms all the operations associated with the management of
   the data in the warehouse. The operations performed by this component include
   analysis of data to ensure consistency, transformation and merging of source data,
   creation of indexes and views, generation of denormalizations and aggregations, and
   archiving and backing-up data
5query manageralso called backend component, it performs all the operations
  associated with the management of user queries. The operations performed by this
  component include directing queries to the appropriate tables and scheduling the
  execution of queries
  6detailed, lightly and lightly summarized data,archive/backup data
  7meta-data
  8end-user access toolscan be categorized into five main groups: data reporting and
  query tools, application development tools, executive information system (EIS) tools,
  online analytical processing (OLAP) tools, and data mining tools


Data Flows
  1. Inflow: the processes associated with the extraction, cleansing, and loading of the
  data from the source systems into the data warehouse.
  2. Upflow: the processes associated with adding value to the data in the warehouse
  through summarizing, packaging, and distribution of the data.
  3. Downflow: the processes associated with archiving and backing up data in the
  warehouse.
  4. Outflow: the processes associated with making the data available to the end-users.
  5. Meta-flow: the processes associated with the management of the meta-data.
Tools and Technologies:
  1. The critical steps in the construction of a data warehouse are:
     a. Extraction
     b. Cleansing
     c. Transformation
  2. After the critical steps, loading the results into the target system can be carried out
  either by separate products or by a single product. The tools fall into the following
  categories:
     a. Code generators
     b. Database data replication tools
     c. Dynamic transformation engines
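
The extraction, cleansing, transformation and loading steps above can be sketched as a chain of small functions. This is a minimal illustration, not a description of any particular tool: the record fields (customer, amount), the cleansing rule and the in-memory target table are all assumptions made for the example.

```python
def extract(source_rows):
    """Extraction: copy raw records out of a source system (here, a list)."""
    return [dict(row) for row in source_rows]

def cleanse(rows):
    """Cleansing: drop records without a customer and trim stray whitespace."""
    cleaned = []
    for row in rows:
        name = (row.get("customer") or "").strip()
        if name:
            cleaned.append({**row, "customer": name})
    return cleaned

def transform(rows):
    """Transformation: convert amounts to numbers and standardize names."""
    return [{"customer": r["customer"].title(), "amount": float(r["amount"])}
            for r in rows]

def load(rows, target):
    """Loading: append the prepared rows to the target table."""
    target.extend(rows)
    return len(rows)

# Hypothetical source records with typical quality problems.
source = [
    {"customer": "  alice ", "amount": "10.50"},
    {"customer": "", "amount": "3.00"},
    {"customer": "BOB", "amount": "7"},
]
target_table = []
loaded = load(transform(cleanse(extract(source))), target_table)
```

Keeping each step as a separate function mirrors the option of using separate products for each task; a single integrated product would simply bundle the same steps.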
The importance of managing meta-data(integration):
   1. Meta-data, that is, ”data about data”, must be integrated across the warehouse.
   2. Meta-data is used for a variety of purposes, and managing it is a critical
   issue in achieving a fully integrated data warehouse.
   3. The major purpose of meta-data is to show the pathway back to where the data
   began, so that the warehouse administrators know the history of any item in the
   warehouse.
   4. The meta-data associated with data transformation and loading must describe the
   source data and any changes that were made to the data.
   5. The meta-data associated with data management describes the data as it is stored in
   the warehouse.
   6. The meta-data is also required by the query manager to generate appropriate queries,
   and is associated with the users of queries.
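
The idea of meta-data as a "pathway back to where the data began" can be made concrete with a small sketch. The Lineage class, its fields and the example source names below are hypothetical; they only illustrate recording the source of an item and each change made to it.

```python
from dataclasses import dataclass, field

@dataclass
class Lineage:
    """Hypothetical meta-data record describing where a warehouse item came from."""
    source_system: str
    source_field: str
    transformations: list = field(default_factory=list)

    def record(self, step):
        # Note each change made to the data on its way into the warehouse.
        self.transformations.append(step)

    def pathway(self):
        # Describe the route from the original source to the stored item.
        steps = " -> ".join(self.transformations) or "no changes"
        return f"{self.source_system}.{self.source_field}: {steps}"

meta = Lineage("orders_db", "total")
meta.record("converted to USD")
meta.record("rounded to 2 decimals")
```

Calling meta.pathway() then gives an administrator the history of the item in one line.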


Data Warehousing Issues
    1. Semantic Integration: When getting data from
      multiple sources, mismatches must be eliminated,
      e.g., different currencies and DB schemas.
    2. Heterogeneous Sources: Data must be accessed from
      a variety of source formats and repositories.
      Replication capabilities can be exploited here.
    3. Load, Refresh, Purge: Data must be loaded,
      periodically refreshed, and purged when too old.
    4. Metadata Management: The source, loading time,
      and other information must be tracked for
      all data in the warehouse.
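
A minimal sketch of semantic integration follows, assuming two hypothetical sources whose column names and currencies differ. The exchange rates are illustrative constants, not real rates, and the source and field names are invented for the example.

```python
# Assumed fixed exchange rates to a common currency (illustrative values only).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.1, "INR": 0.012}

# Each source uses its own column names; map them onto one shared schema.
FIELD_MAPS = {
    "sales_eu": {"kunde": "customer", "betrag": "amount", "waehrung": "currency"},
    "sales_us": {"cust_name": "customer", "amt": "amount", "cur": "currency"},
}

def integrate(source_name, row):
    """Rename source fields to the shared schema and convert the amount to USD."""
    mapping = FIELD_MAPS[source_name]
    unified = {mapping[key]: value for key, value in row.items()}
    rate = RATES_TO_USD[unified["currency"]]
    return {
        "customer": unified["customer"],
        "amount_usd": round(unified["amount"] * rate, 2),
        "source": source_name,  # kept as simple meta-data about the origin
    }

rows = [
    integrate("sales_eu", {"kunde": "Meyer", "betrag": 100.0, "waehrung": "EUR"}),
    integrate("sales_us", {"cust_name": "Smith", "amt": 50.0, "cur": "USD"}),
]
```

After integration, both rows share one schema and one currency, so they can be summed or compared directly.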
Star Schema:
        A logical structure that has a fact table containing factual data in the center,
surrounded by dimension tables containing reference data (which can be denormalized).
Snowflake Schema:
        A variant of the star schema where dimension tables do not contain denormalized
data.
Starflake Schema:
        A hybrid structure that contains a mixture of star and snowflake schemas.
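
A star schema can be sketched with Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product, dim_date) and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold reference data.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- The fact table holds measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
    INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO fact_sales VALUES (1, 1, 5.0), (1, 2, 7.0), (2, 2, 30.0);
""")

# A typical star-schema query: join the fact table to a dimension and aggregate.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Here the category is stored directly in dim_product, which is the denormalized form a star schema allows. A snowflake variant would move categories into their own table and reference it from dim_product.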




The benefits of data warehousing:
   1. Potentially high returns on investment
   2. Substantial competitive advantage
   3. Increased productivity of corporate decision-makers
   4. More cost-effective decision making
   5. Better enterprise intelligence
   6. Enhanced customer service
   7. Better asset/liability management
   8. Business process reengineering
   9. Empowerment of all employees
Applications:
On Line Transaction Processing:
   OLTP systems are one of the major kinds of enterprise applications.
   Examples: order entry systems, inventory control systems, reservation
   systems, point-of-sale systems, tracking systems, etc.


Executive information system (EIS) :
EIS tools present information at the highest level of summarization using corporate business
measures. They are designed for extreme ease of use and, in many cases, only a mouse is
required. Graphics are usually generously incorporated to provide at-a-glance indications
of performance.
Decision Support Systems (DSS) :
DSS tools ideally present information in graphical and tabular form, providing the user with
the ability to drill down on selected information. Compared with an EIS, a DSS presents
increased detail and more data manipulation options.
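
The difference between a summary view and a drill-down view can be shown in a few lines. The sales records, region names and the summarize helper below are hypothetical and serve only to illustrate moving from a coarse total to a more detailed one.

```python
from collections import defaultdict

# Hypothetical sales records: (region, city, amount).
sales = [
    ("South", "Coimbatore", 120),
    ("South", "Chennai", 200),
    ("North", "Delhi", 150),
    ("South", "Coimbatore", 80),
]

def summarize(records, level):
    """Total the amounts by the first `level` columns (1 = region, 2 = region and city)."""
    totals = defaultdict(int)
    for record in records:
        totals[record[:level]] += record[-1]
    return dict(totals)

by_region = summarize(sales, 1)   # summary view, as an EIS might show
by_city = summarize(sales, 2)     # drill-down view with more detail
```

The same records feed both views; drilling down only changes how many columns are used as the grouping key.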




                                   DATA MINING
What is data mining?
    Data mining refers to the process of analyzing data from different perspectives
and summarizing it into useful information. Data mining software is one of a number
of tools used for analyzing data. It allows users to analyze data from many different
dimensions or angles, categorize it, and summarize the relationships identified.
Data mining is about techniques for finding and describing structural patterns in
data.
Definition:
  Data mining is the process of finding correlations or patterns among fields in large
relational databases. It has also been defined as the process of extracting valid,
previously unknown, comprehensible, and actionable information from large databases
and using it to make crucial business decisions (Simoudis, 1996).


Different Types of Data Mining:


       1. Business Data Mining
       2. Scientific Data Mining
       3. Internet Data Mining


Five major elements of Data Mining:
   1. Extract, transform, and load transaction data onto the data warehouse system.
   2. Store and manage data in a multidimensional database system.
   3. Provide access to business analysts and information technology professionals.
   4. Analyze the data with application software.
   5. Present the data in a useful format such as a graph or table.




Requirements of Data Mining:
   1. Handling of different types of data
   2. Efficiency and scalability of algorithms
   3. Usefulness, certainty and expressiveness of results
   4. Expression of various kinds of mining results
   5. Interactive mining of knowledge at multiple levels
   6. Mining information from different sources of data
   7. Protection of privacy and data security


Various kinds of data on which Data Mining is applied :
   1. Relational database
   2. Data warehouse
   3. Transactional database
   4. Multimedia database
   5. Spatial and temporal data
   6. Object-relational database


Data mining applications:
  One major application of data mining is Web mining.
What is Web Mining?
        “Web mining can be broadly defined as the automated discovery
and analysis of useful information from the Web documents and services using data
mining techniques.”
Web mining is the application of data mining or other information processing
techniques to the WWW in order to find useful patterns. People can take advantage of
these patterns to access the WWW more efficiently.


NEED FOR WEB MINING:
        Nowadays, the World Wide Web is a popular and interactive medium, ideal for
publishing information. It is huge, diverse and dynamic, and thus raises issues of
scalability, multimedia data and temporal data respectively. As a result, users
are currently “drowning” in an information overload that expands at a rate that far outpaces
the human ability to process and exploit it.
Domains of Web Mining:
               There are three domains that pertain to Web mining:

              1. Web Content Mining

              2. Web Structure Mining

              3. Web Usage Mining

1. Web Content Mining
       Web content mining is an automatic process that extracts patterns from on-line
information, such as HTML files, images, or e-mails, and it goes well beyond
keyword extraction or simple statistics of words and phrases in documents. Web
content mining is the "process of information or resource discovery from millions of
sources across the World Wide Web". There are two approaches in Web content mining:

           1. Agent-based approaches

           2. Database approaches



 Agent-Based approaches:

       The agent-based approach involves artificial intelligence systems that can "act
autonomously or semi-autonomously on behalf of a particular user, to discover and
organize Web-based information". Some intelligent Web agents can use a user profile to
search for relevant information, then organize and interpret the discovered information
(e.g., Harvest).
  Database approaches:
        The database approach focuses on "integrating and organizing the heterogeneous
and semi-structured data on the Web into more structured and high-level collections of
resources." These metadata "are organized into structured collections (e.g., relational or
object-oriented databases) and can be analyzed".

2. Web Structure Mining
        Web structure mining works with the data that describes the organization of
content. Intra-page structure information includes the arrangement of various HTML or
XML tags within a given page. This can be represented as a tree structure, where the
<html> tag becomes the root of the tree. The principal kind of inter-page structure
information is the hyperlinks connecting one page to another.
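
Both kinds of structure information can be collected with the standard html.parser module. This is a minimal sketch: the sample page and the StructureParser class are made up for illustration, and the depth tracking assumes a well-formed page in which every tag is explicitly closed.

```python
from html.parser import HTMLParser

class StructureParser(HTMLParser):
    """Collect the nesting depth of each tag and every hyperlink target."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tree = []    # (depth, tag) pairs in document order
        self.links = []   # href values of <a> tags

    def handle_starttag(self, tag, attrs):
        # Intra-page structure: record where this tag sits in the tree.
        self.tree.append((self.depth, tag))
        self.depth += 1
        # Inter-page structure: record the hyperlink target, if any.
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

    def handle_endtag(self, tag):
        self.depth -= 1

page = "<html><body><h1>Home</h1><a href='/about.html'>About</a></body></html>"
parser = StructureParser()
parser.feed(page)
```

After parsing, parser.tree lists the tags with <html> at depth 0 as the root, and parser.links holds the outgoing links that would become edges in a graph of pages.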

3. Web Usage Mining
        Web servers record and accumulate data about user interactions whenever
requests for resources are received. Analyzing the Web access logs of different Web sites
can help to understand user behavior and the Web structure, and so improve the design of
this colossal collection of resources.
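
As a minimal sketch of working with access logs, the following counts successful requests per page. It assumes log lines in Common Log Format; the hosts, paths and the page_hits helper are invented for the example.

```python
import re
from collections import Counter

# Sample lines in Common Log Format (hosts and paths are made up).
LOG_LINES = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /products.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Jan/2024:10:01:00 +0000] "GET /products.html HTTP/1.1" 200 2048',
    '10.0.0.3 - - [01/Jan/2024:10:02:00 +0000] "GET /missing.html HTTP/1.1" 404 128',
]

# Groups: client host, request method, requested path, status code.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$')

def page_hits(lines):
    """Count successful (status 200) requests per path."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(4) == "200":
            hits[match.group(3)] += 1
    return hits

hits = page_hits(LOG_LINES)
```

Counts like these are the raw material for the techniques described in the next section.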

Web Mining Techniques
        The common techniques for Web mining are:

    1. Clustering/classification

    2. Association rules

    3. Path analysis

    4. Sequential patterns

  1. Clustering/classification

        This technique is used to develop profiles of items with similar characteristics.
This ability enhances the discovery of relationships that are otherwise not obvious. For
example, classification of Web access logs allows a company to discover the average age of
customers who order a certain product.
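
One common way to group items with similar characteristics is k-means clustering. The sketch below is a simplified one-dimensional version with fixed starting centroids; the session lengths are invented, and the choice of k-means is an assumption for illustration rather than a method named in the text.

```python
def k_means_1d(values, centroids, iterations=10):
    """Assign each number to its nearest centroid, then move each centroid to its group's mean."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical session lengths in minutes for visitors to a site.
session_minutes = [1, 2, 3, 20, 22, 25]
centers, groups = k_means_1d(session_minutes, centroids=[0, 10])
```

With this data the visitors separate into a short-visit group and a long-visit group, each of which could then be profiled separately.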

 2. Association rules

       Association rules apply to "databases of transactions where each transaction
consists of a set of items." This technique is used to predict the correlation of items
"where the presence of one set of items in a transaction implies (with a certain degree of
confidence) the presence of other items."
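
The "degree of confidence" of a rule is usually computed from its support. A minimal sketch, using made-up shopping transactions:

```python
# Hypothetical transactions: each is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

rule_support = support({"bread", "milk"})          # 2 of 4 transactions
rule_confidence = confidence({"bread"}, {"milk"})  # 2 of the 3 bread transactions
```

Here the rule "bread implies milk" holds in two of the three transactions that contain bread, so its confidence is about 0.67.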

 3. Path analysis

       Path analysis is a technique that involves the generation of some form of graph that
"represents relation[s] defined on Web pages." This can be the physical layout of a Web site,
in which the Web pages are nodes and the hypertext links between these pages are directed
edges. For example, it can answer which paths users travel before they reach a particular URL.
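
A site layout of this kind can be stored as a dictionary of pages and their outgoing links, and the paths leading to a target page can then be enumerated. The site below and the paths_to helper are hypothetical.

```python
# Hypothetical site layout: each page maps to the pages it links to.
site = {
    "/": ["/products", "/about"],
    "/products": ["/cart"],
    "/about": ["/products"],
    "/cart": [],
}

def paths_to(graph, start, target, visited=()):
    """Return every loop-free path of links from `start` to `target`."""
    path = visited + (start,)
    if start == target:
        return [list(path)]
    found = []
    for neighbor in graph.get(start, []):
        if neighbor not in path:  # avoid revisiting a page
            found.extend(paths_to(graph, neighbor, target, path))
    return found

routes = paths_to(site, "/", "/cart")
```

For this layout there are two routes from the home page to the cart, one direct through the product page and one that passes through the about page first.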

 4. Sequential patterns

       This technique is applied to Web server access transaction logs. The purpose is to
discover sequential patterns that indicate user visit patterns over a certain period.
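
A very simple form of sequential pattern is a pair of pages that visitors request one after the other. The sketch below counts such pairs across sessions; the sessions and the min_count threshold are assumptions for illustration, and real sequential-pattern mining handles longer and non-consecutive sequences.

```python
from collections import Counter

# Hypothetical sessions: the ordered pages each visitor requested.
sessions = [
    ["/", "/products", "/cart"],
    ["/", "/about", "/products"],
    ["/", "/products", "/cart", "/checkout"],
]

def frequent_sequences(visits, min_count=2):
    """Count consecutive page pairs and keep those seen in at least `min_count` sessions."""
    counts = Counter()
    for pages in visits:
        # Use a set so a pair repeated within one session is counted once.
        counts.update(set(zip(pages, pages[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_count}

patterns = frequent_sequences(sessions)
```

In this data, moving from the home page to the product page and from the product page to the cart each occur in two sessions, so both are reported as frequent.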

Web mining as a tool:
        Web mining can be a promising tool to address the shortcomings of search
engines, such as incomplete indexing and the unverified reliability of retrieved
information. Web mining not only discovers information from mounds of data on the WWW,
but also monitors and predicts user visit habits. This gives designers more reliable
information for structuring and designing a Web site. For example, in academic
librarianship, Web mining technology can help librarians design Web sites with paths
that can be traveled easily by end users, saving time and effort.
Conclusion:
    Data warehousing provides the means to change raw data into information for
making effective business decisions; the emphasis is on information, not data. The data
warehouse is the hub for decision support data.
    Data mining is a useful tool with multiple algorithms that can be tuned for specific
tasks. It can benefit business, medicine, and science. It needs more efficient algorithms to
speed up the data mining process. Web mining is a huge, interdisciplinary and very
dynamic scientific area, converging from several research communities such as databases,
information retrieval and artificial intelligence, especially machine learning and
natural language processing. This area is so broad today partly due to the interests of
various research communities.


ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP Module
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 

Data Mining

  • 1. KUMARAGURU COLLEGE OF TECHNOLOGY COIMBATORE DATA WAREHOUSING AND DATA MINING Presented by K.Santhosh (07bcs43) E-Mail ID:ksanthoshselvam@gmail.com Contact No: 9788153199 V.Siddharth (07bcs50) E-Mail ID:siddharthindian@yahoo.com Contact No: 9843286841
  • 2. DATA WAREHOUSING AND DATA MINING

    ABSTRACT:
    Fast, accurate and scalable data analysis techniques are needed to extract useful information from huge piles of data. A data warehouse is a single, integrated source of decision support information formed by collecting data from multiple sources, internal to the organization as well as external, and transforming and summarizing this information to enable improved decision making. A data warehouse is designed to give users easy access to large amounts of information, and data access is typically supported by specialized analytical tools and applications. Typical applications include decision support systems and executive information systems.

    Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in data. It is “an information extraction activity whose goal is to discover hidden facts contained in databases”. It is also described as the process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions.

    Data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. A data mining model is a description of a specific aspect of a dataset. It produces output values for an assigned set of input values. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis.
  • 3. DATA WAREHOUSING AND DATA MINING

    Introduction:
    Organizations increasingly analyze current and historical data to identify useful patterns and support business strategies. A large amount of the right information is the key to survival in today’s competitive environment, and this kind of information can be made available only if there is a fully integrated enterprise data warehouse.

    What is data warehousing?
    A data warehouse is a subject-oriented, integrated, non-volatile and time-variant collection of data in support of management’s decisions.

    NEED FOR A DATA WAREHOUSE:
    • IT or business staff spend a lot of time developing special reports for decision-makers.
    • Many PC-based or small server systems obtain extracts of data but cannot present a holistic view of the entire gamut of information.
    • The same data is present on different systems and in different departments, and users may be unaware of this fact.
    • It is difficult to get meaningful information in a timely manner.
    • Multiple systems give different answers to the same business questions.
    • Decision makers and policy planners do less analysis because sophisticated tools and easily decipherable, timely and comprehensive information are not available.
  • 4. PURPOSE OF A DATA WAREHOUSE:
    • Better business intelligence for end users
    • Reduction in time to locate, access and analyze information
    • Consolidation of disparate information sources
    • Replacement of older, less-responsive decision support systems
    • Faster time to market for products and services
    • Strategic advantage over competitors

    Data Warehouse Characteristics:
    1. Subject-oriented: the warehouse is organized around the major subjects of the enterprise rather than the major application areas. This is reflected in the need to store decision-support data rather than application-oriented data.
    2. Integrated: the source data comes together from different enterprise-wide application systems. The source data is often inconsistent, so the integrated data source must be made consistent to present a unified view of the data to the users.
    3. Time-variant: the source data in the warehouse is only accurate and valid at some point in time or over some time interval. The time-variance of the data warehouse is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
    4. Non-volatile: data is not updated in real time but is refreshed from the operational systems on a regular basis. New data is always added as a supplement to the database, rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with previous data.

    DATA WAREHOUSE LIFE CYCLE:
    Data warehousing is a concept. It is not a product that can be purchased off the shelf. It is a set of hardware and software components integrated together which can be used to
  • 5. analyze the massive amount of stored data in an efficient manner. It is a process through which one can build a successful data warehouse. The five steps towards building a successful data warehouse are:
    1. JUSTIFICATION
    2. REQUIREMENT ANALYSIS
    3. DESIGN
    4. DEVELOPMENT AND IMPLEMENTATION
    5. DEPLOYMENT

    Main Components:
    1. Operational data sources: data for the warehouse is supplied from mainframe operational data held in first-generation hierarchical and network databases, departmental data held in proprietary file systems, private data held on workstations and private servers, and external systems such as the Internet, commercially available databases, or databases associated with an organization’s suppliers or customers.
    2. Operational data store (ODS): a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.
    3. Load manager: also called the frontend component, it performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare it for entry into the warehouse.
    4. Warehouse manager: performs all the operations associated with the management of the data in the warehouse. These operations include analysis of data to ensure consistency, transformation and merging of source data, creation of indexes and views, generation of denormalizations and aggregations, and archiving and backing up data.
  • 6. 5. Query manager: also called the backend component, it performs all the operations associated with the management of user queries. These operations include directing queries to the appropriate tables and scheduling the execution of queries.
    6. Detailed data, lightly and highly summarized data, and archive/backup data
    7. Meta-data
    8. End-user access tools: these can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.

    Data Flows
    1. Inflow: the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
    2. Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
    3. Downflow: the processes associated with archiving and backing up the data in the warehouse.
    4. Outflow: the processes associated with making the data available to the end-users.
    5. Meta-flow: the processes associated with the management of the meta-data.

    Tools and Technologies:
    The critical steps in the construction of a data warehouse are:
    a. Extraction
    b. Cleansing
    c. Transformation
    After the critical steps, loading the results into the target system can be carried out either by separate products or by a single product. The tools fall into these categories:
    1. Code generators
    2. Database data replication tools
    3. Dynamic transformation engines
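    The critical steps above (extraction, cleansing, transformation) followed by loading can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any particular warehouse product: the sample CSV data, the `orders` table, and the function names are all assumptions made for the example, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical export from an operational source system (assumption: CSV).
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,10.50,2024-01-05
2,BOB,,2024-01-06
3,Carol,7.25,2024-01-06
3,Carol,7.25,2024-01-06
"""


def extract(text):
    """Extraction: read the raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleansing: drop rows with a missing amount and exact duplicate rows."""
    seen, clean = set(), []
    for row in rows:
        key = tuple(row.values())
        if not row["amount"].strip() or key in seen:
            continue
        seen.add(key)
        clean.append(row)
    return clean


def transform(rows):
    """Transformation: normalize customer names and convert column types."""
    return [
        (int(r["order_id"]), r["customer"].strip().title(),
         float(r["amount"]), r["order_date"])
        for r in rows
    ]


def load(records, conn):
    """Loading: append the prepared records to the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the steps in order and return what ended up in the target table."""
    load(transform(cleanse(extract(text))), conn)
    return conn.execute(
        "SELECT customer, amount FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(RAW_CSV, sqlite3.connect(":memory:")))
```

    New rows are appended rather than overwritten, which matches the non-volatile character of a warehouse; a real load manager would also record when and from which source each batch arrived.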
  • 7. The importance of managing meta-data (integration):
    1. Meta-data, that is “data about data”, must be integrated.
    2. Meta-data is used for a variety of purposes, and managing it is a critical issue in achieving a fully integrated data warehouse.
    3. The major purpose of meta-data is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse.
    4. The meta-data associated with data transformation and loading must describe the source data and any changes that were made to the data.
    5. The meta-data associated with data management describes the data as it is stored in the warehouse.
    6. The meta-data is required by the query manager to generate appropriate queries, and it is also associated with the users of those queries.

    Data Warehousing Issues
    1. Semantic Integration: when getting data from multiple sources, mismatches must be eliminated, e.g., different currencies or database schemas.
    2. Heterogeneous Sources: data must be accessed from a variety of source formats and repositories. Replication capabilities can be exploited here.
    3. Load, Refresh, Purge: data must be loaded, periodically refreshed, and purged when it is too old.
    4. Metadata Management: the source, loading time, and other information must be tracked for all data in the warehouse.

    Star Schema:
    A logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data (which can be denormalized).

    Snowflake Schema:
  • 8. A variant of the star schema where dimension tables do not contain denormalized data.

    Starflake Schema:
    A hybrid structure that contains a mixture of star and snowflake schemas.

    The benefits of data warehousing:
    The potential benefits of data warehousing are:
    1. High returns on investment
    2. Substantial competitive advantage
    3. Increased productivity of corporate decision-makers
    4. More cost-effective decision making
    5. Better enterprise intelligence
    6. Enhanced customer service
    7. Better asset/liability management
    8. Business process reengineering
    9. Empowerment of all employees

    Applications:
    On-Line Transaction Processing (OLTP):
    OLTP systems are a major kind of enterprise application. Examples: order entry systems, inventory control systems, reservation systems, point-of-sale systems, tracking systems, etc.

    Executive information system (EIS):
    These systems present information at the highest level of summarization using corporate business measures. They are designed for extreme ease of use and, in many cases, only a mouse is required. Graphics are usually generously incorporated to provide at-a-glance indications of performance.

    Decision Support Systems (DSS):
  • 9. They ideally present information in graphical and tabular form, providing the user with the ability to drill down on selected information. Compared with an EIS, they offer increased detail and more data manipulation options.

    DATA MINING

    What is data mining?
    Data mining refers to the process of analyzing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of tools used for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Data mining is about techniques for finding and describing structural patterns in data.

    Definition:
    Data mining is the process of finding correlations or patterns among fields in large relational databases. It is the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions. (Simoudis, 1996)

    Different Types of Data Mining:
    1. Business Data Mining
    2. Scientific Data Mining
    3. Internet Data Mining

    Five major elements of Data Mining:
  • 10. 1. Extract, transform, and load transaction data onto the data warehouse system.
    2. Store and manage the data in a multidimensional database system.
    3. Provide data access to business analysts and information technology professionals.
    4. Analyze the data with application software.
    5. Present the data in a useful format such as a graph or table.

    Requirements of Data Mining:
    1. Handling of different types of data
    2. Efficiency and scalability of algorithms
    3. Usefulness, certainty and expressiveness of results
    4. Expression of various kinds of mining results
    5. Interactive mining of knowledge at multiple levels
    6. Mining information from different sources of data
    7. Protection of privacy and data security

    Various kinds of data on which Data Mining is applied:
    1. Relational database
    2. Data warehouse
    3. Transactional database
    4. Multimedia database
    5. Spatial and temporal data
    6. Object-relational database

    Data mining applications:
    A major application of data mining is WEB MINING.

    What is Web Mining?
    “Web mining can be broadly defined as the automated discovery and analysis of useful information from the Web documents and services using data mining techniques.”
  • 11. Web mining is the application of data mining or other information processing techniques to the WWW in order to find useful patterns. People can take advantage of these patterns to access the WWW more efficiently.

    NEED FOR WEB MINING:
    Nowadays the World Wide Web is a popular and interactive medium, ideal for publishing information. It is huge, diverse and dynamic, and thus raises issues of scalability, multimedia data and temporal data respectively. As a result, users are currently “drowning” in an information overload that expands at a rate that far outpaces the human ability to process and exploit it.

    Domains of Web Mining:
    There are three domains that pertain to Web mining:
    1. Web Content Mining
    2. Web Structure Mining
    3. Web Usage Mining

    1. Web Content Mining
    Web content mining is an automatic process that extracts patterns from on-line information, such as HTML files, images, or e-mails, and it already goes beyond keyword extraction or simple statistics of words and phrases in documents. Web content mining is the "process of information or resource discovery from millions of sources across the World Wide Web".
    There are two approaches in Web content mining:
    1. Agent-based approaches
    2. Database approaches

    Agent-Based approaches:
    The agent-based approach involves artificial intelligence systems that can "act autonomously or semi-autonomously on behalf of a particular user, to discover and organize Web-based information". Some intelligent Web agents can use a user profile to
  • 12. search for relevant information, then organize and interpret the discovered information (e.g., Harvest).

    Database approaches:
    The database approach focuses on "integrating and organizing the heterogeneous and semi-structured data on the Web into more structured and high-level collections of resources." This metadata is organized into structured collections (e.g., relational or object-oriented databases) and can then be analyzed.

    2. Web Structure Mining
    Web structure mining works on the data that describes the organization of content. Intra-page structure information includes the arrangement of various HTML or XML tags within a given page. This can be represented as a tree structure, where the <html> tag becomes the root of the tree. The principal kind of inter-page structure information is the hyperlinks connecting one page to another.

    3. Web Usage Mining
    Web servers record and accumulate data about user interactions whenever requests for resources are received. Analyzing the Web access logs of different Web sites can help to understand user behavior and the Web structure, thereby improving the design of this colossal collection of resources.

    Web Mining Techniques
    The common techniques for Web mining are:
    1. Clustering/classification
    2. Association rules
    3. Path analysis
    4. Sequential patterns

    1. Clustering/classification
    This technique is used to develop profiles of items with similar characteristics. This ability enhances the discovery of relationships that are otherwise not obvious. Eg:
  • 13. Classification of Web access logs allows a company to discover the average age of customers who order a certain product.

    2. Association rules
    Rules that govern "databases of transactions where each transaction consists of a set of items." This technique is used to predict the correlation of items "where the presence of one set of items in a transaction implies (with a certain degree of confidence) the presence of other items."

    3. Path analysis
    A technique that involves the generation of some form of graph that "represents relation[s] defined on Web pages." This can be the physical layout of a Web site in which the Web pages are nodes and the hypertext links between these pages are directed edges. Eg: which paths do users travel before they reach a particular URL?

    4. Sequential patterns
    This technique is applied to Web server access logs. The purpose is to discover sequential patterns that indicate user visit patterns over a certain period.

    Web mining as a tool:
    Web mining can be a promising tool to address ineffective search engines, which suffer from incomplete indexing and unverified reliability of retrieved information. Web mining not only discovers information from mounds of data on the WWW, but also monitors and predicts user visit habits. This gives designers more reliable information for structuring and designing a Web site. Web mining technology can help librarians design Web sites with paths that can be traveled easily by end users, saving time and effort.
    Eg: Web mining technology and academic librarianship
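    The association-rule technique described above can be illustrated on Web usage data with a small Python sketch. It is only a toy under stated assumptions: the sessions, the thresholds, and the function name `pair_rules` are hypothetical, and only rules between pairs of pages are considered, whereas practical miners such as Apriori handle larger item sets. Support is the fraction of sessions containing both pages; the confidence of "A implies B" is the fraction of sessions containing A that also contain B.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the set of pages requested by each visitor.
SESSIONS = [
    {"/home", "/catalog", "/cart"},
    {"/home", "/catalog"},
    {"/home", "/help"},
    {"/catalog", "/cart"},
]


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for page pairs above both thresholds."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for pages in sessions:
        item_counts.update(pages)
        # Sorting gives each unordered pair one canonical key.
        pair_counts.update(combinations(sorted(pages), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for lhs, rhs, support, confidence in pair_rules(SESSIONS):
        print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

    In this sample every session that contains /cart also contains /catalog, so that rule has confidence 1.0, while the reverse rule holds in only two of the three sessions that contain /catalog.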
  • 14. Conclusion:
    Data warehousing provides the means to change raw data into information for making effective business decisions; the emphasis is on information, not data. The data warehouse is the hub for decision support data. Data mining is a useful tool with multiple algorithms that can be tuned for specific tasks. It can benefit business, medicine, and science, but it needs more efficient algorithms to speed up the mining process. Web mining is a huge, interdisciplinary and very dynamic scientific area, converging from several research communities such as databases, information retrieval and artificial intelligence, especially machine learning and natural language processing. The area is so broad today partly due to the interests of these various research communities.

    References:
    1. www.datawarehousingonline.com
    2. Data Base Systems - Elmasri, Navathe
    3. Data Mining Technologies - Arun K. Pujari
    4. Data Mining and Data Warehousing and OLAP - A. Berson, S.J. Smith
    5. Database Management System - Sylbardcards