Data Mining-PART I
By
M.Dhilsath Fathima
Topics to cover..
Introduction
Types of Data
 Data Mining Functionalities
Interestingness of Patterns
 Classification of Data Mining Systems
Data Mining Task Primitives
Integration of a Data Mining System with a Data Warehouse
Issues
Data Preprocessing.
What is a Database?
A database is any organized
collection of data.
Examples: co-workers, patient information, airline reservation system
DATABASE
• Database: Shared collection of logically
related data (and a description of this
data), designed to meet the information
needs of an organization.
• Database management System: A
software system that enables users to
define, create, and maintain the
Who does this, and how?
• A Database Management System (DBMS) does this job.
• Example software tools: Access, FileMaker, Lotus Notes, Oracle, SQL Server, and others.
• A DBMS includes tools to add, modify, or delete data in the database, ask questions (queries) about the stored data, and produce reports summarizing selected contents.
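The operations listed above (add, modify, delete, query, report) can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than any of the tools named on the slide, and the `staff` table and its rows are hypothetical, invented only for illustration.

```python
import sqlite3

# In-memory database; the "staff" table is a made-up example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# Add data.
conn.executemany(
    "INSERT INTO staff (name, dept) VALUES (?, ?)",
    [("Asha", "Sales"), ("Ben", "Sales"), ("Chen", "IT")],
)

# Modify data.
conn.execute("UPDATE staff SET dept = 'IT' WHERE name = 'Ben'")

# Delete data.
conn.execute("DELETE FROM staff WHERE name = 'Asha'")

# Ask a question (query) about the stored data.
names = [name for (name,) in conn.execute("SELECT name FROM staff ORDER BY name")]

# Produce a small report summarizing selected contents.
report = conn.execute(
    "SELECT dept, COUNT(*) FROM staff GROUP BY dept ORDER BY dept"
).fetchall()

print(names)
print(report)
```

Each statement corresponds to one of the DBMS capabilities in the bullet above; a production DBMS adds access control, concurrency, and recovery on top of these basics.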
Why do we need a database?
• To keep records of our:
– Clients
– Staff
– Volunteers
• To keep a record of activities
• To keep sales records
• To develop reports
• To run queries
Data vs. information
• What is data?
– Data is unprocessed information.
• What is information?
– Information is data that have been organized and communicated in a logical and meaningful manner.
Purpose of Database system/
Stages of Database System
– Data is converted into information, and information is converted into knowledge.
– Knowledge: information evaluated and organized so that it can be used purposefully.
The purpose is to transform: Data (unprocessed information) → Information (processed data) → Knowledge (information evaluated using measures) → Action (data analysis and future prediction)
Data Mining works with
Warehouse Data
• Data Warehousing provides the
Enterprise with a memory.
• Data Mining provides the
Enterprise with intelligence
What is Data Mining?
• Nowadays, huge data sets have become available due to advances in technology.
• As a result, there is increasing interest in various scientific communities in exploring emerging data mining techniques for the analysis of these large data sets.
• Data mining is the extraction of implicit, previously unknown, and potentially useful information, patterns, and associations from data.
• Data mining is the exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.
WHO USES DATA MINING?
• Banking
– Future prediction
• Amazon.com (online stores)
– Recommendation
• Facebook
– Predicting how active a user will be after 3 months
13/03/16 Seval Ünver | CENG 553 15
Data Mining is…
DATA MINING IS NOT…
• Data warehousing
• SQL / ad hoc queries / reporting
• Online Analytical Processing (OLAP)
• Data visualization
DATA MINING IS…
• Exploring data
• Finding patterns
• Performing prediction
KDD Process
• Knowledge discovery in databases (KDD) is a multi-step process of finding useful information and patterns in data.
• Data mining is the use of algorithms to extract the information and patterns derived by the KDD process.
• Many texts treat KDD and data mining as the same process, but data mining can also be viewed as the discovery step of KDD.
Steps of KDD Process
1. Selection
Data extraction: obtain data from heterogeneous data sources such as databases, data warehouses, the World Wide Web, or other information repositories.
2. Preprocessing
Data cleaning: incomplete, noisy, or inconsistent data are cleaned. Missing data may be ignored or predicted; erroneous data may be deleted or corrected.
3. Transformation
Data integration: combine data from multiple sources into a coherent store. Data can be encoded in common formats, normalized, and reduced.
4. Data mining
Apply algorithms to the transformed data and extract patterns.
5. Pattern interpretation/evaluation
Pattern evaluation: evaluate the interestingness of the resulting patterns, or apply interestingness measures to filter the discovered patterns.
Knowledge presentation: present the mined knowledge; visualization techniques can be used.
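The five steps can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the two sources (`store_a`, `store_b`) are hypothetical in-memory lists, and the "mining" step is a simple frequency count standing in for a real algorithm.

```python
from collections import Counter

# Hypothetical raw sources (invented for illustration, not from the slides).
store_a = [{"cust": "c1", "item": "Milk"}, {"cust": "c2", "item": None}]
store_b = [
    {"cust": "c3", "item": "milk "},
    {"cust": "c4", "item": "bread"},
    {"cust": "c5", "item": "Bread"},
]

def select(*sources):
    """1. Selection: gather records from several sources."""
    return [rec for src in sources for rec in src]

def preprocess(records):
    """2. Preprocessing: ignore records whose item value is missing."""
    return [r for r in records if r["item"] is not None]

def transform(records):
    """3. Transformation: encode item names in one common format."""
    return [{**r, "item": r["item"].strip().lower()} for r in records]

def mine(records):
    """4. Data mining: here, just count how often each item occurs."""
    return Counter(r["item"] for r in records)

def evaluate(patterns, min_count=2):
    """5. Evaluation: keep only patterns that meet a threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}

patterns = evaluate(mine(transform(preprocess(select(store_a, store_b)))))
print(patterns)
```

Keeping each step as its own function mirrors the slide: any stage (for example the cleaning rule or the threshold) can be swapped without touching the others.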
Types of Data / What Kinds of Data Can Be Mined
• Data mining should be applicable to any kind of information repository. However, algorithms and approaches may differ when applied to different types of data.
• Relational databases
• Data warehouses
• Transaction databases
• Advanced database systems and information repositories
– Spatial databases
– Time-series data
– Multimedia databases
– World Wide Web (WWW)
Relational Databases
– A relational database consists of a set of tables containing either values of entity attributes or values of attributes from entity relationships.
– Tables have columns and rows, where columns represent attributes and rows represent tuples.
– A tuple in a relational table corresponds to either an object or a relationship between objects, and is identified by a set of attribute values representing a unique key.
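As a minimal sketch of these ideas, the following uses Python's built-in sqlite3 module. The `customer` table, its columns, and its rows are hypothetical; the point is only that columns are attributes, rows come back as tuples, and the key value must be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Columns are the attributes; cust_id is the unique key.
conn.execute(
    """
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        age     INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Asha", 34), (2, "Ben", 27)],
)

# Each row is returned as a tuple of attribute values.
rows = conn.execute(
    "SELECT cust_id, name, age FROM customer ORDER BY cust_id"
).fetchall()
print(rows)

# A second tuple with the same key value is rejected by the database.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Duplicate', 50)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```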
Data Warehouse
• A data warehouse, as a storehouse, is a repository of data collected from multiple data sources (often heterogeneous) and is intended to be used as a whole under the same unified schema. A data warehouse makes it possible to analyze data from different sources under the same roof.
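The idea of a unified schema can be illustrated with a short sketch. The two sources below (`crm_rows`, `shop_rows`) and their field layouts are assumptions made up for this example; a real warehouse would load data through dedicated extraction and transformation tooling.

```python
# Two hypothetical sources that describe the same facts in different shapes.
crm_rows = [{"customer_name": "Asha", "total": 120.0}]   # dict records
shop_rows = [("Ben", "75.50")]                           # (name, amount as text)

def from_crm(row):
    """Map a CRM record onto the unified schema."""
    return {"customer": row["customer_name"], "amount": float(row["total"])}

def from_shop(row):
    """Map a shop record onto the unified schema."""
    name, amount = row
    return {"customer": name, "amount": float(amount)}

# The "warehouse": every record now follows the same schema.
warehouse = [from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows]

# Analysis can now treat all sources as a whole.
total_amount = sum(row["amount"] for row in warehouse)
print(warehouse)
print(total_amount)
```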
Transaction Databases
• A transaction database is a set of records representing transactions, each with a timestamp, an identifier, and a set of items. Descriptive data for the items may also be associated with the transaction files.
• Transactions are usually stored in flat files or in two normalized transaction tables, one for the transactions and one for the transaction items.
• Applications: airline reservation, railway reservation, log records, etc.
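The two-table layout described above can be sketched with plain Python lists. The transaction identifiers, timestamps, and items are hypothetical; the sketch only shows how the item set of each transaction is rebuilt from the two normalized tables.

```python
from collections import defaultdict

# Table 1: one row per transaction (identifier, timestamp).
transactions = [
    ("T100", "2016-03-13 10:15"),
    ("T200", "2016-03-13 10:42"),
]

# Table 2: one row per (transaction, item) pair.
transaction_items = [
    ("T100", "bread"),
    ("T100", "milk"),
    ("T200", "milk"),
]

# Rebuild the set of items belonging to each transaction.
items_by_trans = defaultdict(set)
for trans_id, item in transaction_items:
    items_by_trans[trans_id].add(item)

baskets = [
    (trans_id, timestamp, sorted(items_by_trans[trans_id]))
    for trans_id, timestamp in transactions
]
print(baskets)
```

Item sets in this form are the usual starting point for association analysis over transaction data.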
MULTIMEDIA DATABASE
• Multimedia databases include video, images, audio (such as sound clips), and text data. They can be stored in extended object-relational or object-oriented databases, or simply on a file system.
• Examples: digital music players, social media, electronic publishing.
Spatial Databases
• A spatial database is a database that is enhanced to store and access spatial data that defines a geometric space.
• These data are often associated with geographic locations and features, or with constructed features such as cities. Data in spatial databases are stored as coordinates, points, lines, polygons, and topology.
• Example: storing geographic information such as maps and global or regional positioning data.
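To make the point/polygon representation concrete, here is a minimal sketch. The features (`city_hall`, `station`, `park`) and the flat x/y coordinate system are assumptions for illustration; real spatial databases use geographic coordinate systems and spatial indexes.

```python
import math

# Hypothetical features stored as coordinates (planar x/y units assumed).
city_hall = (0.0, 0.0)                                   # a point
station = (4.0, 3.0)                                     # another point
park = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # a polygon

def distance(p, q):
    """Straight-line distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around at the end.
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

print(distance(city_hall, station))
print(polygon_area(park))
```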
Time Series Database
• A time-series database is a database that contains data for each point in time.
• Examples: weather data, stock market data, logged browser activity, ocean tides.
Time Series Database-Example
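As a small illustration, a time series can be held as (timestamp, value) pairs and smoothed with a moving average. The temperature readings below are invented, and the moving average is just one common way to look at such data, not something the slides prescribe.

```python
# Hypothetical daily temperature readings: one value per point in time.
readings = [
    ("2016-03-13", 20.0),
    ("2016-03-14", 22.0),
    ("2016-03-15", 24.0),
    ("2016-03-16", 26.0),
]

def moving_average(series, window):
    """Average of the last `window` values, reported at each timestamp."""
    values = [value for _, value in series]
    return [
        (series[i][0], sum(values[i - window + 1 : i + 1]) / window)
        for i in range(window - 1, len(series))
    ]

smoothed = moving_average(readings, window=2)
print(smoothed)
```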
World Wide Web
• The World Wide Web is the most heterogeneous and dynamic repository available.
• Data in the World Wide Web is organized in interconnected documents. These documents can be text, audio, video, raw data, and even applications.
Typical Architecture of Data Mining
System
Integration of a Data Mining System with a Database/Data Warehouse System
The integration schemes are as follows:
• No coupling: In this scheme, the data mining system does not use any database or data warehouse functions. It fetches data directly from a particular source, processes it with data mining algorithms, and stores the result in another file. (Example: data collected directly from a transactional database.)
• Loose coupling / semi-tight coupling: In this scheme, the data mining system may use some functions of the database or data warehouse system. It fetches data from the data repository managed by these systems, or directly from particular sources, and performs data mining on that data. (Example: data taken from a transactional DB plus a database/DWH.)
• Tight coupling: In this scheme, the data mining system is smoothly integrated into the database or data warehouse system. The data mining subsystem is treated as one functional component of an information system.
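The difference between the schemes can be hinted at with a rough sketch. Everything here is an assumption made for illustration: the `sales` data is invented, a simple item count stands in for a mining algorithm, and running the count as SQL inside SQLite is only an analogy for tight coupling, not a real integrated mining subsystem.

```python
import csv
import io
import sqlite3
from collections import Counter

# No coupling: read a flat file directly (simulated with StringIO)
# and do all the work in the mining program.
flat_file = io.StringIO("item\nmilk\nbread\nmilk\n")
no_coupling = Counter(row["item"] for row in csv.DictReader(flat_file))

# A shared database for the other two schemes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?)", [("milk",), ("bread",), ("milk",)]
)

# Loose coupling: let the database fetch the data, then mine outside it.
fetched = conn.execute("SELECT item FROM sales").fetchall()
loose_coupling = Counter(item for (item,) in fetched)

# Tight coupling (analogy only): the counting runs inside the database engine.
tight_coupling = dict(
    conn.execute("SELECT item, COUNT(*) FROM sales GROUP BY item").fetchall()
)

print(dict(no_coupling), dict(loose_coupling), tight_coupling)
```

All three paths give the same counts; what changes is how much of the work the database system does on behalf of the mining program.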
Integrated Architecture of Data Mining with a DWH / An OLAM System Architecture
Data Mining Task Primitives
• We can specify a data mining task in the form of a data mining
query.
• This query is input to the system.
• A data mining query is defined in terms of data mining task
primitives.
• Note: These primitives allow us to communicate interactively with the data mining system. The data mining task primitives are:
1. Kind of knowledge to be mined.
2. Set of task-relevant data to be mined.
3. Representation for visualizing the discovered patterns.
4. Background knowledge to be used in the discovery process.
5. Interestingness measures and thresholds for pattern evaluation.
Data Mining Task Primitives-
Example of Data mining query
• use database AllElectronics_db
  use state_location_hierarchy for B.address
  mine characteristics as customerPurchasing
  analyze count%
  in relevance to C.age, I.type, I.place_made
  from customer C, item I, purchase P, items_sold S, branch B
  where I.item_ID = S.item_ID and P.cust_ID = C.cust_ID
    and P.method_paid = "AmEx" and B.address = "Canada" and I.price ≥ 100
  with noise threshold = 5%
  display as table
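One way to read the query is to sort its clauses under the five task primitives. The dictionary below is a hypothetical representation written for this purpose (the key names and the `missing_primitives` helper are assumptions, not part of any query language), and the join conditions are left out for brevity.

```python
# The five task primitives, used as dictionary keys (names are assumptions).
PRIMITIVES = (
    "task_relevant_data",
    "kind_of_knowledge",
    "background_knowledge",
    "interestingness",
    "presentation",
)

# The example query, regrouped by primitive.
query = {
    "task_relevant_data": {
        "database": "AllElectronics_db",
        "attributes": ["C.age", "I.type", "I.place_made"],
        "tables": ["customer C", "item I", "purchase P", "items_sold S", "branch B"],
        # Selection conditions only; join conditions omitted here.
        "conditions": ['P.method_paid = "AmEx"', 'B.address = "Canada"', "I.price >= 100"],
    },
    "kind_of_knowledge": "characterization",            # "mine characteristics"
    "background_knowledge": "location hierarchy for B.address",
    "interestingness": {"noise_threshold": 0.05},       # "noise threshold = 5%"
    "presentation": "table",                            # "display as table"
}

def missing_primitives(q):
    """Return the primitives that a query specification leaves out."""
    return [p for p in PRIMITIVES if p not in q]

print(missing_primitives(query))
```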
Data Mining Task Primitives-cont..
1. Kind of knowledge to be mined
– This refers to the kind of functions to be performed. These functions are:
• Characterization
• Association and correlation analysis
• Classification
• Prediction
• Clustering
• Outlier analysis
2. Set of task-relevant data to be mined
– This is the portion of the database in which the user is interested. This portion includes the following:
• Database attributes
• Data warehouse dimensions of interest
Data Mining Task Primitives-cont..
3. Representation for visualizing the discovered
patterns
– This refers to the form in which discovered patterns
are to be displayed. These representations may
include the following −
• Rules
• Tables
• Charts
• Graphs
• Decision Trees
• Cubes
Data Mining Task Primitives-cont..
4. Background knowledge
– Background knowledge allows data to be mined at multiple levels of abstraction. For example, concept hierarchies are one form of background knowledge that allows data to be mined at multiple levels of abstraction.
5. Interestingness measures and thresholds for pattern evaluation
– These are used to evaluate the patterns discovered by the knowledge discovery process. There are different interestingness measures for different kinds of knowledge.
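The slides do not name specific measures. As an example, support and confidence are two widely used interestingness measures for association rules, and the sketch below applies them with thresholds. The baskets and the threshold values are hypothetical.

```python
# Hypothetical transactions, each a set of items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(lhs, rhs):
    """Of the transactions containing `lhs`, the fraction also containing `rhs`."""
    return support(lhs | rhs) / support(lhs)

def is_interesting(lhs, rhs, min_support=0.5, min_confidence=0.6):
    """Keep a rule lhs -> rhs only if it meets both thresholds."""
    return (
        support(lhs | rhs) >= min_support
        and confidence(lhs, rhs) >= min_confidence
    )

rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
print(rule_support, rule_confidence, is_interesting({"bread"}, {"milk"}))
```

Raising either threshold filters out more patterns, which is how thresholds control what is reported as interesting.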
Classification of Data mining System
Classification of Data mining
System(Cont..)
• Data to be mined
– Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW
• Knowledge to be mined
– Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
– Multiple/integrated functions and mining at multiple levels
• Techniques utilized
– Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
• Applications adapted
– Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

  • 2. Topics to cover.. Introduction Types of Data  Data Mining Functionalities Interestingness of Patterns  Classification of Data Mining Systems Data Mining Task Primitives Integration of a Data Mining System with a Data Warehouse Issues Data Preprocessing.
  • 3. What is Database? A database is any organized collection of data.
  • 7. DATABASE • Database: Shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization. • Database management System: A software system that enables users to define, create, and maintain the
  • 8. Who and How to do it ? • Database Management System (DBMS) does this job. • Using Software tools: Access, FileMaker, Lotus Notes, Oracle or SQL Server, ……. • It includes tools to add, modify or delete data from the database, ask questions (or queries) about the data stored in the database and produce reports summarizing selected contents.
  • 9. Why do we need a database? • Keep records of our: – Clients – Staff – Volunteers • To keep a record of activities. • Keep sales records • Develop reports • Perform Querying
  • 10. Data vs. information • What is data? –Data is unprocessed information. • What is information? –Information is data that have been organized and communicated in a logical and meaningful manner.
  • 11. Purpose of Database system/ Stages of Database System – Data is converted into information, and information is converted into knowledge. – Knowledge; information evaluated and organized so that it can be used purposefully. Data (Unprocessed information) Data (Unprocessed information) Information (processed Data) Information (processed Data) Knowledge (Evaluated Information using measures) Knowledge (Evaluated Information using measures) Action (Data Analysis & Future Prediction) Action (Data Analysis & Future Prediction) Is to transformIs to transform
  • 12. 12 Data Mining works with Warehouse Data • Data Warehousing provides the Enterprise with a memory. • Data Mining provides the Enterprise with intelligence
  • 13. Data Mining works with Warehouse Data
  • 14. What is data Mining? • Now a days, huge data sets have become available due to advances in technology. • As a result, there is an increasing interest in various scientific communities to explore the use of emerging data mining techniques for the analysis of these large data sets . • Data mining is the extraction of implicit, previously unknown and potentially useful information,patterns,associations from data . • Data mining is the Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns .
  • 15. WHO USES DATAMİNİNG? •Banking –future prediction •Amazon.com (Online Stores) –recommendation •Facebook –prediction how active a user will be after 3 months. 13/03/16 Seval Ünver | CENG 553 15
  • 16. Datamining is… 13/03/16 Seval Ünver | CENG 553 16
  • 17. DATAMİNİNG İS NOT… • Data warehousing • SQL / Ad Hoc Queries / Reporting • Online Analytical Processing (OLAP) • Data Visualization DATAMİNİNG İS … • Explores Data • Find Patterns • Performs Prediction 13/03/16 Seval Ünver | CENG 553 17
  • 18. KDD Process • Knowledge discovery in databases (KDD) is a multi step process of finding useful information and patterns in data • Data Mining is the use of algorithms to extract information and patterns derived by the KDD process. • Many texts treat KDD and Data Mining as the same process, but it is also possible to think of Data Mining as the discovery part of KDD.
  • 19. Steps of KDD Process
  • 20. STEPS OF KDD PROCESS 1. Selection- Data Extraction -Obtaining Data from heterogeneous data sources -Databases, Data warehouses, World wide web or other information repositories. 2. Preprocessing- Data Cleaning- Incomplete , noisy, inconsistent data to be cleaned- Missing data may be ignored or predicted, erroneous data may be deleted or corrected. 3. Transformation- Data Integration- Combines data from multiple sourcesCombines data from multiple sources into a coherent store -Data can be encoded in commoninto a coherent store -Data can be encoded in common formats, normalized, reduced.formats, normalized, reduced.
  • 21. Steps of KDD Process 4. D4. Data mining – Apply algorithms to transformed data an extract patterns. 5. Pattern Interpretation/evaluation Pattern Evaluation- Evaluate the interestingness of resulting patterns or apply interestingness measures to filter out discovered patterns. Knowledge presentation- present the mined knowledge- visualization techniques can be used.
  • 22. Types of Data / What kind of Data can be mined • Data mining should be applicable to any kind of information repository. However, algorithms and approaches may differ when applied to different types of data. • Relational Databases • Data Warehouse • Transaction Databases • Advanced DB systems and information repositories – Spatial databases – Time-series data – multimedia databases – WWW
  • 23. Relational Databases – A relational database consists of a set of tables containing either values of entity attributes, or values of attributes from entity relationships. – Tables have columns and rows, where columns represent attributes and rows represent tuples. – A tuple in a relational table corresponds to either an object or a relationship between objects and is identified by a set of attribute values representing a unique key.
  • 24. Data Warehouse • A data warehouse as a storehouse, is a repository of data collected from multiple data sources (often heterogeneous) and is intended to be used as a whole under the same unified schema. A data warehouse gives the option to analyze data from different sources under the same roof.
  • 25. Transaction Databases • A transaction database is a set of records representing transactions, each with a time stamp, an identifier and a set of items. Associated with the transaction files could also be descriptive data for the items. • Transactions are usually stored in flat files or stored in two normalized transaction tables, one for the transactions and one for the transaction items. • Applications: Airline reservation, Railway reservation, Log records etc.
  • 26. MULTIMEDIA DATABASE • Multimedia databases include video, images, audio, Sound clips, and text data. They can be stored on extended object- relational or object-oriented databases, or simply on a file system. • Ex: Digital Music Player, Social Media, Electronic publishing.
  • 27. Spatial Databases • A spatial database is a database that is enhanced to store and access spatial data that defines a geometric space. • These data are often associated with geographic locations and features, or constructed features like cities. Data on spatial databases are stored as coordinates, points, lines, polygons and topology. • Ex: store geographical information like maps, and global or regional positioning.
  • 28. Time Series Database • A Time-Series Database is a database that contains data for each point in time. • Examples: Weather Data, stock market data , Browser logged activities, ocean tides.
  • 30. World Wide Web • The World Wide Web is the most heterogeneous and dynamic repository available. • Data in the World Wide Web is organized in inter-connected documents. These documents can be text, audio, video, raw data, and even applications.
  • 31. Typical Architecture of Data Mining System
  • 32. Integration of a Data Mining System with a Database/Data Warehouse System The list of integration schemes is as follows − • No Coupling − In this scheme, the data mining system does not utilize any of the database or data warehouse functions. It fetches the data directly from a particular source and processes that data using some data mining algorithms. The data mining result is stored in another file. (Ex: Collect data directly from a transactional database) • Loose Coupling/Semi−tight Coupling − In this scheme, the data mining system may use some of the functions of the database and data warehouse system. It fetches the data from the data repository managed by these systems and performs data mining on that data, or fetches it directly from particular sources. (Ex: Taken from transactional DB + Database/DWH) • Tight Coupling − In this scheme, the data mining system is smoothly integrated into the database or data warehouse system. The data mining subsystem is treated as one functional component of an information system.
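The difference between the schemes comes down to where the data is fetched and where the computation runs. The sketch below is only an analogy, assuming a trivial "mining" task (counting item frequencies) and using `sqlite3` to stand in for the database system; the `sales` table and the CSV content are hypothetical.

```python
import csv
import io
import sqlite3
from collections import Counter

SALES_CSV = "item\nmilk\nbread\nmilk\n"  # hypothetical flat-file source

def no_coupling(csv_text):
    """Read straight from a file-like source; no database functions are used."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["item"] for row in reader)

def load_db(csv_text):
    """Load the same data into an in-memory database for the other two schemes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (item TEXT)")
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany("INSERT INTO sales VALUES (?)", [(r["item"],) for r in reader])
    return conn

def loose_coupling(conn):
    """Use the database only to fetch the data; count in the mining program."""
    return Counter(item for (item,) in conn.execute("SELECT item FROM sales"))

def tight_coupling(conn):
    """Push the counting into the database engine itself."""
    query = "SELECT item, COUNT(*) FROM sales GROUP BY item"
    return Counter(dict(conn.execute(query)))

conn = load_db(SALES_CSV)
```

All three functions return the same counts; what changes is how much of the work the database system does. A real tightly coupled system would integrate full mining operators, not just an aggregate query.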
  • 33. Integrated architecture of a Data Mining with DWH/ AN OLAM SYSTEM ARCHITECTURE
  • 34. Data Mining Task Primitives • We can specify a data mining task in the form of a data mining query. • This query is input to the system. • A data mining query is defined in terms of data mining task primitives. • Note − These primitives allow us to communicate in an interactive manner with the data mining system. Here is the list of Data Mining Task Primitives − 1. Kind of knowledge to be mined. 2. Set of task relevant data to be mined. 3. Representation for visualizing the discovered patterns. 4. Background knowledge to be used in discovery process. 5. Interestingness measures and thresholds for pattern evaluation.
  • 35. Data Mining Task Primitives - Example of a data mining query •
    use database AllElectronics_db
    use state_location_hierarchy for B.address
    mine characteristics as customerPurchasing
    analyze count%
    in relevance to C.age, I.type, I.place_made
    from customer C, item I, purchase P, items_sold S, branch B
    where I.item_ID = S.item_ID and P.cust_ID = C.cust_ID and P.method_paid = "AmEx" and B.address = "Canada" and I.price ≥ 100
    with noise threshold = 5%
    display as table
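The clauses of the query above map onto the five task primitives. The following sketch records that mapping in a small Python data structure. `MiningTask` and its field names are hypothetical and exist only for this illustration; they are not part of any real data mining query language or library.

```python
from dataclasses import dataclass, field

@dataclass
class MiningTask:
    """Hypothetical container for the five task primitives (not a real API)."""
    knowledge_kind: str        # 1. kind of knowledge to be mined
    relevant_attributes: list  # 2. task-relevant data
    display_as: str            # 3. representation of discovered patterns
    background_knowledge: dict = field(default_factory=dict)  # 4. e.g. concept hierarchies
    thresholds: dict = field(default_factory=dict)            # 5. interestingness thresholds

    def missing_primitives(self):
        """Return the names of primitives that were left empty."""
        return [name for name, value in vars(self).items() if not value]

# Values read off the example query; the dictionary keys are illustrative.
task = MiningTask(
    knowledge_kind="characterization",                        # mine characteristics
    relevant_attributes=["C.age", "I.type", "I.place_made"],  # in relevance to ...
    display_as="table",                                       # display as table
    background_knowledge={"B.address": "state_location_hierarchy"},
    thresholds={"noise": 0.05},                               # noise threshold = 5%
)
```

Calling `task.missing_primitives()` on an incomplete task lists which primitives the user still has to specify.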
  • 36. Data Mining Task Primitives-cont.. 1. Kind of knowledge to be mined – This refers to the kind of functions to be performed. These functions are − • Characterization • Association and Correlation Analysis • Classification • Prediction • Clustering • Outlier Analysis 2. Set of task relevant data to be mined – This is the portion of the database in which the user is interested. This portion includes the following − • Database Attributes • Data Warehouse dimensions of interest
  • 37. Data Mining Task Primitives-cont.. 3. Representation for visualizing the discovered patterns – This refers to the form in which discovered patterns are to be displayed. These representations may include the following − • Rules • Tables • Charts • Graphs • Decision Trees • Cubes
  • 38. Data Mining Task Primitives-cont.. 4. Background knowledge – Background knowledge allows data to be mined at multiple levels of abstraction. Concept hierarchies, for example, are one form of background knowledge that makes this possible. 5. Interestingness measures and thresholds for pattern evaluation – These are used to evaluate the patterns discovered by the knowledge discovery process. Different kinds of knowledge have different interestingness measures.
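As one illustration of an interestingness measure with thresholds, the sketch below computes support and confidence for an association rule. The slide does not name specific measures, so the choice of support and confidence, the sample transactions, and the threshold values are all assumptions made for this example.

```python
# Hypothetical market-basket data: each transaction is a set of items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never applies, so treat its confidence as zero
    return support(antecedent | consequent, transactions) / base

def is_interesting(antecedent, consequent, transactions, min_support, min_confidence):
    """Keep a rule only if it meets both user-specified thresholds."""
    rule_support = support(antecedent | consequent, transactions)
    rule_confidence = confidence(antecedent, consequent, transactions)
    return rule_support >= min_support and rule_confidence >= min_confidence
```

For the rule bread ⇒ milk, two of the four transactions contain both items (support 0.5) and two of the three bread transactions contain milk (confidence 2/3), so the rule passes thresholds of 0.4 and 0.6 but fails if the confidence threshold is raised to 0.7.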
  • 39. Classification of Data mining System
  • 40. Classification of Data mining System (Cont..) • Data to be mined: Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW • Knowledge to be mined: Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.; multiple/integrated functions and mining at multiple levels • Techniques utilized: Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc. • Applications adapted: Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

Editor's notes

  1. Data mining has become popular in many applications, especially in business. To name a few examples: CapitalOne bank uses data mining to predict whether a loan applicant will default on the loan, given information about his/her demographics, credit history, type of loan, etc. Netflix (the largest DVD-by-mail rental company) and Amazon.com use data mining to provide recommendations to their customers (“you might also be interested in ___”). British law enforcement and intelligence agencies use data mining to look for data patterns that might point to developing crime trends or security threats. Facebook uses data mining to predict how active a user will be after 3 months. Children's Hospital in Boston uses data mining to sift through emergency room patient records to detect domestic abuse. Pandora (an Internet music radio offering customized music) chooses the next song to play using data mining algorithms.
  2. Statistical methods: multivariate analysis (classification, discriminant analysis, regression, clustering, dimensionality reduction, hypothesis testing, variance analysis, association/dependency) (Rencher, 1995). Memory-based methods: instance-based methods, case-based reasoning, k-nearest neighbor (Mitchell, 1997). Artificial neural networks (Bishop, 1996). Decision trees (IF-THEN rules, rule extraction) (Mitchell, 1997).