SlideShare ist ein Scribd-Unternehmen logo
1 von 12
Submitted by,
M. Kavitha M.Sc.,
Nadar Saraswathi College of
Art & Science, Theni.
Data Mining
Data Integration and
Transformation
Data Integration
* Data Integration involves combining data from
several disparate source, which are stored using various
technologies and provide a unified view of the data.
* The later initiative is often called a data warehouse.
* It merges the data from multiple data stores (data
source).
* It includes multiple databases, data cubes or flat
files.
* Metadata, correlation analysis, data conflict detection
and resolution of semantic heterogeneity contribute towards
smooth data integration.
Advantages :
1. Independence.
2. Faster query processing.
3. Complex query processing.
4. Advanced data summarization & storage possible.
5. High volume data processing.
Disadvantages :
1. Latency (since data needs to be loaded using ETL).
2. Costlier (data localization, infrastructure, security).
There are a number of issues to consider during data integration.
1. Schema Integration.
2. Redundancy.
3. Detection and resolution of data value conflicts.
Schema integration :
The real-world entities from multiple source be matched
is referred to as the entity identification problem.
For example,
Data analyst or the computer be sure that customer_id in
one database and cust_number in another refer to the same
entity. Databases and data warehouses that is a data about the
data it’s a meta data.
Redundancy :
* It is another important issue.
* An attribute may be redundant if it can be “derived”
from another table, such as annual revenue.
* Some redundancies can be detected by correlation
analysis.
For example,
Two attributes, such analysis can measure how
strongly one attribute implies the other based on the
available data.
The correlation between attributes attribute A and B by
Detection and resolution of data value conflicts :
* A third important issue in data integration is the
detection and resolution of data value conflicts.
* The same real-world entity, attribute values from
different sources. This may be due to differences in
representation, scaling, or encoding.
* An attribute in one system may be recorded at a
lower level of abstraction than the “same” attribute in another.
* For example, the total sales in one database may
refer to one branch of All Electronics, an attribute of the same
name in another database may refer to the total sales for All
Electronics stores in a given region.
Data Transformation
* Data transformation the data are transformed or
consolidated into forms in appropriate for mining.
* Data transformation can involve
1. Smoothing.
2. Aggregation.
3. Generalization.
4. Normalization.
5. Attribute construction.
Smoothing :
Which works to remove the noise from data. Such
techniques include binning, clustering and regression.
Aggregation :
* Where summary or aggregation operations are applied
to the data.
* For example, the daily sales data may be aggregated so
as to compute monthly and annual total amounts.
Generalization :
* The data where low-level or “primitive” data are placed
by higher-level concepts through the use of concept through
the use of concept hierarchies.
* For example, the attributes like street can be
generalized to higher-level concept city or country when the
numeric attributes to higher-level concept young, middle-
aged and street.
Normalization :
Where the attribute data are scaled so as to fall within
a specified range, such as -1.0 to 1.0 or 0.0 to 1.0
Attribute construction :
Where new attribute are a constructed and added
from the given set of attributes to help the mining
process.
There are many method for data normalization.
* Min-Max normalization.
* Z-Score normalization.
* Normalization by decimal scaling.
Min – Max Normalization :
It performs a linear transformation on the original data.
Suppose that min A and max A are the minimum and
maximum values of attributes A. A Min – Max
normalization maps a value v of A to v’ in the range.
Z – Score Normalization :
The Z – Score normalization a value of an attribute A
are normalized based on the mean and standard deviation of
A. A value v of A is normalized to v’
Normalization by Decimal Scaling :
Normalization by decimal scaling normalizes by moving
the decimal point of values of attribute A.
The number of decimal points moved depends on the
maximum absolute value of A. A value v of A is normalized
to v’ by computing
where j is the smallest integer such that Max(|V’|) < 1.
Thank You

Weitere ähnliche Inhalte

Was ist angesagt?

4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data miningKrish_ver2
 
Data preprocessing in Data Mining
Data preprocessing in Data MiningData preprocessing in Data Mining
Data preprocessing in Data MiningDHIVYADEVAKI
 
Data warehouse architecture
Data warehouse architecture Data warehouse architecture
Data warehouse architecture janani thirupathi
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methodsProf.Nilesh Magar
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataSalah Amean
 
Relational Data Model Introduction
Relational Data Model IntroductionRelational Data Model Introduction
Relational Data Model IntroductionNishant Munjal
 
2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMS2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMSkoolkampus
 
Entity Relationship Diagram
Entity Relationship DiagramEntity Relationship Diagram
Entity Relationship DiagramShakila Mahjabin
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Harish Chand
 
10. XML in DBMS
10. XML in DBMS10. XML in DBMS
10. XML in DBMSkoolkampus
 
OLAP operations
OLAP operationsOLAP operations
OLAP operationskunj desai
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
13. Query Processing in DBMS
13. Query Processing in DBMS13. Query Processing in DBMS
13. Query Processing in DBMSkoolkampus
 
data generalization and summarization
data generalization and summarization data generalization and summarization
data generalization and summarization janani thirupathi
 
Distributed database management system
Distributed database management  systemDistributed database management  system
Distributed database management systemPooja Dixit
 

Was ist angesagt? (20)

4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data mining
 
Data preprocessing in Data Mining
Data preprocessing in Data MiningData preprocessing in Data Mining
Data preprocessing in Data Mining
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Data warehouse architecture
Data warehouse architecture Data warehouse architecture
Data warehouse architecture
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methods
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
Relational Data Model Introduction
Relational Data Model IntroductionRelational Data Model Introduction
Relational Data Model Introduction
 
2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMS2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMS
 
Entity Relationship Diagram
Entity Relationship DiagramEntity Relationship Diagram
Entity Relationship Diagram
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)
 
10. XML in DBMS
10. XML in DBMS10. XML in DBMS
10. XML in DBMS
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
OLAP operations
OLAP operationsOLAP operations
OLAP operations
 
Relational model
Relational modelRelational model
Relational model
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
13. Query Processing in DBMS
13. Query Processing in DBMS13. Query Processing in DBMS
13. Query Processing in DBMS
 
data generalization and summarization
data generalization and summarization data generalization and summarization
data generalization and summarization
 
Distributed database management system
Distributed database management  systemDistributed database management  system
Distributed database management system
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 

Ähnlich wie Data Integration and Transformation Techniques for Data Mining

Unit_2_Feature Engineering.pdf
Unit_2_Feature Engineering.pdfUnit_2_Feature Engineering.pdf
Unit_2_Feature Engineering.pdfPyingkodi Maran
 
Attribute oriented analysis
Attribute oriented analysisAttribute oriented analysis
Attribute oriented analysisHirra Sultan
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data PreprocessingT Kavitha
 
data processing.pdf
data processing.pdfdata processing.pdf
data processing.pdfDimpyJindal4
 
Data Preprocessing&tools
Data Preprocessing&toolsData Preprocessing&tools
Data Preprocessing&toolsAmandeep Gill
 
Data MIning: Data processing
Data MIning: Data processingData MIning: Data processing
Data MIning: Data processingDatamining Tools
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine LearningPyingkodi Maran
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data scienceTanujaSomvanshi1
 
Characterization and Comparison
Characterization and ComparisonCharacterization and Comparison
Characterization and ComparisonBenjamin Franklin
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.pptPalaniKumarR2
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.pptSamPrem3
 

Ähnlich wie Data Integration and Transformation Techniques for Data Mining (20)

Data integration
Data integrationData integration
Data integration
 
Data Mining: Data Preprocessing
Data Mining: Data PreprocessingData Mining: Data Preprocessing
Data Mining: Data Preprocessing
 
Unit_2_Feature Engineering.pdf
Unit_2_Feature Engineering.pdfUnit_2_Feature Engineering.pdf
Unit_2_Feature Engineering.pdf
 
Chapter 3.pdf
Chapter 3.pdfChapter 3.pdf
Chapter 3.pdf
 
Data preprocess
Data preprocessData preprocess
Data preprocess
 
Attribute oriented analysis
Attribute oriented analysisAttribute oriented analysis
Attribute oriented analysis
 
1234
12341234
1234
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
data processing.pdf
data processing.pdfdata processing.pdf
data processing.pdf
 
Data Preprocessing&tools
Data Preprocessing&toolsData Preprocessing&tools
Data Preprocessing&tools
 
Datapreprocessing
DatapreprocessingDatapreprocessing
Datapreprocessing
 
Data MIning: Data processing
Data MIning: Data processingData MIning: Data processing
Data MIning: Data processing
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
Analysis Of Attribute Revelance
Analysis Of Attribute RevelanceAnalysis Of Attribute Revelance
Analysis Of Attribute Revelance
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data science
 
Characterization and Comparison
Characterization and ComparisonCharacterization and Comparison
Characterization and Comparison
 
Unit 3-2.ppt
Unit 3-2.pptUnit 3-2.ppt
Unit 3-2.ppt
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt
 

Mehr von kavitha muneeshwaran (13)

Physical Security
Physical SecurityPhysical Security
Physical Security
 
Digital Audio
Digital AudioDigital Audio
Digital Audio
 
Java
JavaJava
Java
 
Data structure
Data structureData structure
Data structure
 
Internet Programming with Java
Internet Programming with JavaInternet Programming with Java
Internet Programming with Java
 
Digital image processing
Digital image processing  Digital image processing
Digital image processing
 
Staffing level estimation
Staffing level estimation Staffing level estimation
Staffing level estimation
 
Transaction Management - Deadlock Handling
Transaction Management - Deadlock HandlingTransaction Management - Deadlock Handling
Transaction Management - Deadlock Handling
 
Process
ProcessProcess
Process
 
Digital Logic circuit
Digital Logic circuitDigital Logic circuit
Digital Logic circuit
 
C and C++ functions
C and C++ functionsC and C++ functions
C and C++ functions
 
I/O system in intel 80386 microcomputer architecture
I/O system in intel 80386 microcomputer architectureI/O system in intel 80386 microcomputer architecture
I/O system in intel 80386 microcomputer architecture
 
narrow Band ISDN
narrow Band ISDNnarrow Band ISDN
narrow Band ISDN
 

Kürzlich hochgeladen

CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 

Kürzlich hochgeladen (20)

CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 

Data Integration and Transformation Techniques for Data Mining

  • 1. Submitted by, M. Kavitha M.Sc., Nadar Saraswathi College of Art & Science, Theni. Data Mining Data Integration and Transformation
  • 2. Data Integration * Data Integration involves combining data from several disparate source, which are stored using various technologies and provide a unified view of the data. * The later initiative is often called a data warehouse. * It merges the data from multiple data stores (data source). * It includes multiple databases, data cubes or flat files. * Metadata, correlation analysis, data conflict detection and resolution of semantic heterogeneity contribute towards smooth data integration.
  • 3. Advantages : 1. Independence. 2. Faster query processing. 3. Complex query processing. 4. Advanced data summarization & storage possible. 5. High volume data processing. Disadvantages : 1. Latency (since data needs to be loaded using ETL). 2. Costlier (data localization, infrastructure, security).
  • 4. There are a number of issues to consider during data integration. 1. Schema Integration. 2. Redundancy. 3. Detection and resolution of data value conflicts. Schema integration : The real-world entities from multiple source be matched is referred to as the entity identification problem. For example, Data analyst or the computer be sure that customer_id in one database and cust_number in another refer to the same entity. Databases and data warehouses that is a data about the data it’s a meta data.
  • 5. Redundancy : * It is another important issue. * An attribute may be redundant if it can be “derived” from another table, such as annual revenue. * Some redundancies can be detected by correlation analysis. For example, Two attributes, such analysis can measure how strongly one attribute implies the other based on the available data. The correlation between attributes attribute A and B by
  • 6. Detection and resolution of data value conflicts : * A third important issue in data integration is the detection and resolution of data value conflicts. * The same real-world entity, attribute values from different sources. This may be due to differences in representation, scaling, or encoding. * An attribute in one system may be recorded at a lower level of abstraction than the “same” attribute in another. * For example, the total sales in one database may refer to one branch of All Electronics, an attribute of the same name in another database may refer to the total sales for All Electronics stores in a given region.
  • 7. Data Transformation * Data transformation the data are transformed or consolidated into forms in appropriate for mining. * Data transformation can involve 1. Smoothing. 2. Aggregation. 3. Generalization. 4. Normalization. 5. Attribute construction. Smoothing : Which works to remove the noise from data. Such techniques include binning, clustering and regression.
  • 8. Aggregation : * Where summary or aggregation operations are applied to the data. * For example, the daily sales data may be aggregated so as to compute monthly and annual total amounts. Generalization : * The data where low-level or “primitive” data are placed by higher-level concepts through the use of concept through the use of concept hierarchies. * For example, the attributes like street can be generalized to higher-level concept city or country when the numeric attributes to higher-level concept young, middle- aged and street.
  • 9. Normalization : Where the attribute data are scaled so as to fall within a specified range, such as -1.0 to 1.0 or 0.0 to 1.0 Attribute construction : Where new attribute are a constructed and added from the given set of attributes to help the mining process. There are many method for data normalization. * Min-Max normalization. * Z-Score normalization. * Normalization by decimal scaling.
  • 10. Min – Max Normalization : It performs a linear transformation on the original data. Suppose that min A and max A are the minimum and maximum values of attributes A. A Min – Max normalization maps a value v of A to v’ in the range. Z – Score Normalization : The Z – Score normalization a value of an attribute A are normalized based on the mean and standard deviation of A. A value v of A is normalized to v’
  • 11. Normalization by Decimal Scaling : Normalization by decimal scaling normalizes by moving the decimal point of values of attribute A. The number of decimal points moved depends on the maximum absolute value of A. A value v of A is normalized to v’ by computing where j is the smallest integer such that Max(|V’|) < 1.