Introduction to Data Mining, Ch. 2: Data Preprocessing
Heon Gyu Lee ([email_address]), http://dblab.chungbuk.ac.kr/~hglee
DB/Bioinfo. Lab., http://dblab.chungbuk.ac.kr, Chungbuk National University
Why Data Preprocessing?
What is Data? (figure labels: Attributes, Objects)
Types of Attributes
Discrete and Continuous Attributes
Data Quality
Noise (figure panels: Two Sine Waves; Two Sine Waves + Noise)
Outliers
Missing Values
Duplicate Data
Major Tasks in Data Preprocessing
Forms of Data Preprocessing
Data Cleaning
Data Cleaning: How to Handle Missing Data?
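
The bullets of this slide did not survive extraction. One common way to handle a missing value, assumed here rather than taken from the slide, is to fill it with the mean of the observed values of the same attribute. The function name and the sample column below are hypothetical.

```python
from statistics import mean

def fill_with_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    m = mean(observed)
    return [m if v is None else v for v in values]

# Hypothetical attribute column with two missing entries.
incomes = [30, None, 50, 40, None]
filled = fill_with_mean(incomes)
print(filled)
```

Mean imputation keeps every record but shrinks the attribute's variance, so it is a sketch of one option, not a recommendation for every dataset.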
Data Cleaning: How to Handle Noisy Data?
Data Cleaning: Binning Methods
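
The slide's worked example was lost in extraction. As a minimal sketch of binning, assuming equal-frequency bins and the two usual smoothing variants (by bin means and by bin boundaries), the following uses an illustrative list of sorted values that is not taken from the slide.

```python
def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of equal size."""
    if len(values) % n_bins != 0:
        raise ValueError("this sketch assumes len(values) is divisible by n_bins")
    ordered = sorted(values)
    size = len(ordered) // n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]

def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's min and max."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are the boundaries
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

# Illustrative data (an assumption, not recovered from the slide).
prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equal_frequency_bins(prices, 3)
by_means = smooth_by_means(bins)
by_bounds = smooth_by_boundaries(bins)
print(by_means)
print(by_bounds)
```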
Data Cleaning: Regression (figure: axes x and y, line y = x + 1, labels X1, Y1, Y1’)
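
The figure labels suggest that an observed value Y1 at X1 is replaced by the value Y1’ on a fitted line; that reading is an assumption, since only the labels survive. A minimal sketch with an ordinary least-squares line, using hypothetical points scattered around y = x + 1:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def smooth(xs, ys):
    """Replace each observed y with the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in xs]

# Hypothetical noisy observations near y = x + 1.
xs = [0, 1, 2, 3]
ys = [1.2, 1.8, 3.1, 3.9]
slope, intercept = fit_line(xs, ys)
smoothed = smooth(xs, ys)
print(slope, intercept, smoothed)
```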
Data Cleaning: Cluster Analysis
Data Integration
Data Integration: Handling Redundancy in Data Integration
Data Integration: Correlation Analysis (Numerical Data)
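
The formula on this slide was lost. For numerical attributes, correlation analysis is usually done with the Pearson correlation coefficient, which is assumed here. Values near +1 or -1 hint that one attribute may be redundant given the other. The sample columns are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / sqrt(var_a * var_b)

# Hypothetical attributes: the second is an exact linear function of the first.
height_cm = [150, 160, 170, 180]
height_in = [h / 2.54 for h in height_cm]
r_redundant = pearson(height_cm, height_in)
r_negative = pearson([1, 2, 3, 4], [8, 6, 4, 2])
print(r_redundant, r_negative)
```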
Data Integration: Correlation Analysis (Categorical Data)
Chi-Square Calculation: An Example

                           Play chess   Not play chess   Sum (row)
Like science fiction       250 (90)     200 (360)        450
Not like science fiction   50 (210)     1000 (840)       1050
Sum (col.)                 300          1200             1500
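
The numbers in parentheses match the expected counts under independence (row sum times column sum, divided by the grand total; for example 450 * 300 / 1500 = 90). A minimal sketch of the chi-square statistic for the table's observed counts, with hypothetical function and variable names:

```python
def chi_square(observed):
    """Return (chi2, expected) for a contingency table given as a list of rows."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    # Expected count of a cell under independence of rows and columns.
    expected = [[r * c / total for c in col_sums] for r in row_sums]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return chi2, expected

# Rows: like / not like science fiction. Columns: play / not play chess.
observed = [[250, 200],
            [50, 1000]]
chi2, expected = chi_square(observed)
print(expected)
print(round(chi2, 2))
```

A large statistic relative to the chi-square distribution with one degree of freedom indicates that the two attributes are unlikely to be independent for this sample.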
Data Transformation
Data Transformation: Normalization (where j is the smallest integer such that max(|ν’|) < 1)
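
The surviving condition on j reads like normalization by decimal scaling, ν’ = ν / 10^j; that formula is an assumption, since the slide's equations were lost. The sketch below shows it next to min-max and z-score normalization, two other commonly taught variants. All function names and sample values are hypothetical.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max is undefined for a constant attribute")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("z-score is undefined for a constant attribute")
    return [(v - m) / s for v in values]

def decimal_scaling(values):
    """Divide by 10**j, with j the smallest j >= 0 such that max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

scaled, j = decimal_scaling([-986, 917])   # hypothetical attribute values
rescaled = min_max([10, 20, 30])
standardized = z_score([10, 20, 30])
print(j, scaled, rescaled, standardized)
```

This sketch restricts j to non-negative integers, so values already inside (-1, 1) are left unchanged.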
Data Reduction Strategies
Data Reduction: Aggregation
Data Reduction: Aggregation (figure: Variation of Precipitation in Australia; panels: Standard Deviation of Average Monthly Precipitation, Standard Deviation of Average Yearly Precipitation)
Data Reduction: Sampling
Data Reduction: Types of Sampling
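
The list of sampling types on this slide was lost. The sketch below assumes the three types most often listed in this context: simple random sampling without replacement, simple random sampling with replacement, and stratified sampling. The record layout and function names are hypothetical.

```python
import random
from collections import defaultdict

def srs_without_replacement(items, n, rng):
    """Each item can be drawn at most once."""
    return rng.sample(items, n)

def srs_with_replacement(items, n, rng):
    """An item may be drawn more than once."""
    return [rng.choice(items) for _ in range(n)]

def stratified_sample(items, key, fraction, rng):
    """Draw the same fraction from every stratum (at least one item each)."""
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical records: 90 from stratum "a", 10 from the rare stratum "b".
records = [("a", i) for i in range(90)] + [("b", i) for i in range(10)]

without = srs_without_replacement(records, 10, rng)
with_repl = srs_with_replacement(records, 10, rng)
strat = stratified_sample(records, key=lambda r: r[0], fraction=0.1, rng=rng)
print(len(without), len(with_repl), len(strat))
```

Stratified sampling guarantees that the rare stratum is represented, which a small simple random sample cannot promise.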
Data Reduction: Dimensionality Reduction
Dimensionality Reduction: PCA (figure: axes x1 and x2, direction e)
Dimensionality Reduction: PCA (figure: axes x1 and x2, direction e)
Data Reduction: Feature Subset Selection
Data Reduction: Feature Subset Selection
Data Reduction: Feature Creation
Data Reduction: Mapping Data to a New Space (figure panels: Two Sine Waves; Two Sine Waves + Noise; Frequency)
Data Reduction: Discretization Using Class Labels (figure panels: 3 categories for both x and y; 5 categories for both x and y)
Data Reduction: Discretization Without Using Class Labels (figure panels: Data; Equal interval width; Equal frequency; K-means)
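
Two of the unsupervised approaches named in the figure panels, equal interval width and equal frequency, can be sketched as follows. The function names and sample values are hypothetical, and the K-means variant is not shown.

```python
def equal_width(values, k):
    """Assign each value a bin index 0..k-1 using k intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency(values, k):
    """Assign bin indices so that each bin holds about the same number of values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins

# Hypothetical skewed attribute with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
by_width = equal_width(data, 3)
by_freq = equal_frequency(data, 3)
print(by_width)
print(by_freq)
```

With skewed data, equal width puts nearly all values in one bin, while equal frequency spreads them evenly at the cost of uneven interval widths.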
Data Reduction: Attribute Transformation
Question & Answer
