SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Downloaden Sie, um offline zu lesen
Dr. D. Sugumar
Associate Professor/ECE
Karunya
Overview: Analytics Problem Solving
• Goal:
• Understand the basic characteristics of the data
• Examples for characteristics:
• Structure
• Size
• Completeness
• Relationships
• Usually interactive and semi-automated
• Text editors, system calls (head/more/less), etc. to look at raw data
directly
• Helps to understand the structure
• Statistics and visualizations to learn about distributions and
relationships
• Exploration should also include meta data
• Feature names, trace links, etc.
• Overview
• Summary Statistics
• Visualization for Data Exploration
• Summary
• Summarize data through single value
• Do not predict anything about the data (→ inductive statistics)
• Common statistics covered in this course
• Central tendency (mean/median/mode)
• Variability (standard deviation, interquartile range)
• Range of data (min/max)
• Other important statistics
• Kurtosis and skewness for the shape of distributions
• More measures for central tendency, e.g., trimmed means, harmonic mean
• „Typical“ value of the data
• Arithmetic mean
• 𝑚𝑒𝑎𝑛 𝑥 =
1
𝑛
σ𝑖=1
𝑛
𝑥𝑖 with 𝑥 = 𝑥1, … , 𝑥𝑛 ∈ ℝ𝑛
• Median
• The value that separates the higher half from the data of the lower half
• Mode
• The value that appears most in the data
• Measure for the spread of the data
• Also called dispersion
• Standard deviation
• Measure for the difference of observation to the arithmetic mean
• 𝑠𝑑 𝑥 =
σ𝑖=1
𝑛 𝑥𝑖−𝑚𝑒𝑎𝑛 𝑥
2
𝑛−1
• Interquartile Range (IQR)
• Percentile: value below which a given percentage falls
• Difference between the 75% percentile and the 25% percentile
The median is the 50% percentile
• Range for which values are observed
• Can be infinite!
• Minimum
• Smallest observed value
• Maximum
• Largest observed value
• May be strongly distorted by invalid data
• Makes it also a good tool to discover invalid data
• Random typing on the keypad
• 𝑥 = (1,2,1,1,3,4,5,2,3,4,5,1,3,2,1,6,5,4,9,4,3,6,1,5,6,8,4,6,5,1,3,2,1,6,8,7,6,1,3,1,6,8,4,7,6,4,3,5,4,9,7,4,3,1,4,6,8,7,9, 1,4,6,1,3,8,6,7,4,9,6,5,1,3,6,8,7)
• central tendency:
• mean: 4.46052631579
• median: 4.0
• mode (count): 1 (14)
• variability
• sd: 2.41944311488
• IQR: 3.0
• range
• min: 1
• max: 9
• Overview
• Summary Statistics
• Visualization for Data Exploration
• Summary
Numbers are made up and pie charts should actually be avoided
i
X y
10.00 8.04
8.00 6.95
13.00 7.58
9.00 8.81
11.00 8.33
14.00 9.96
6.00 7.24
4.00 4.26
12.00 10.84
7.00 4.82
5.00 5.68
ii
x y
10.00 9.14
8.00 8.14
13.00 8.74
9.00 8.77
11.00 9.26
14.00 8.10
6.00 6.13
4.00 3.10
12.00 9.13
7.00 7.26
5.00 4.74
iii
x y
10.00 7.46
8.00 6.77
13.00 12.74
9.00 7.11
11.00 7.81
14.00 8.84
6.00 6.08
4.00 5.39
12.00 8.15
7.00 6.42
5.00 5.73
iv
x y
8.00 6.58
8.00 5.76
8.00 7.71
8.00 8.84
8.00 8.47
8.00 7.04
8.00 5.25
19.00 12.50
8.00 5.56
8.00 7.91
8.00 6.89
Have the same
• Mean
• standard
deviation
• correlation
between x and y
• linear regression
Anscombe‘s Quartet
Plots of the Boston house prices data set
http://archive.ics.uci.edu/ml/machine-learning-databases/housing/
Extremley skewed
Mixture of two
normals after
taking the
logarithm
Looks like an artificially high value
→ Groups all higher incomes
Median
75%
percentile
25%
percentile
Range of
data except
outliers
Outlier
The outlier definition can change. We
used „more than 1.5 times the IQR
away from the 25%/75% percentile.”
You should always check this in the
package you use.
No correlation
visible
Strong linear
correlation
Histogram of data in
the column
Good separation
of blue, but green
and orange are
overlapping
Good
separation of
all three
classes
Density plots of
data in the
column
separated by
classes
Colors show
strength of
correlation
Correlation
between
premiums and
losses
Correlation
between reasons
for accidents
There are different correlation
coefficients. We used Pearson‘s
coefficient, which measures linear
correlations.
Cannot see
structure due to
amount of data
Hexagonal bins
reveal the structure
Linear trend
Regular noise pattern →
Seasonal?
Range of values
• Overview
• Summary Statistics
• Visualization for Data Exploration
• Summary
• Important to understand the data available
• Summary statistics provide a good overview
• Can be deceptive!
• Visualization is a powerful way to understand data
• Understanding of meta data and how domain experts
understand data equally important!

Weitere ähnliche Inhalte

Ähnlich wie 07 Data-Exploration.pdf

Data Analysis & Interpretation and Report Writing
Data Analysis & Interpretation and Report WritingData Analysis & Interpretation and Report Writing
Data Analysis & Interpretation and Report WritingSOMASUNDARAM T
 
Biostatistics.pptx
Biostatistics.pptxBiostatistics.pptx
Biostatistics.pptxTawhid4
 
Real life application of statistics in engineering
Real life application of statistics in engineeringReal life application of statistics in engineering
Real life application of statistics in engineeringJannatulFerdous160
 
Applied statistics lecture_2
Applied statistics lecture_2Applied statistics lecture_2
Applied statistics lecture_2Daria Bogdanova
 
EXPLORATORY DATA ANALYSIS
EXPLORATORY DATA ANALYSISEXPLORATORY DATA ANALYSIS
EXPLORATORY DATA ANALYSISBabasID2
 
Data Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityData Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityIkbal Ahmed
 
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Data Science  & AI Road Map by Python & Computer science tutor in MalaysiaData Science  & AI Road Map by Python & Computer science tutor in Malaysia
Data Science & AI Road Map by Python & Computer science tutor in MalaysiaAhmed Elmalla
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data scienceTanujaSomvanshi1
 
data analysis in Statistics-2023 guide 2023
data analysis in Statistics-2023 guide 2023data analysis in Statistics-2023 guide 2023
data analysis in Statistics-2023 guide 2023ayesha455941
 
Exploring Data (1).pptx
Exploring Data (1).pptxExploring Data (1).pptx
Exploring Data (1).pptxgina458018
 
Role of Statistics In Research.pptx
Role of Statistics In Research.pptxRole of Statistics In Research.pptx
Role of Statistics In Research.pptxDipsanuPaul
 
Unit 2_ Descriptive Analytics for MBA .pptx
Unit 2_ Descriptive Analytics for MBA .pptxUnit 2_ Descriptive Analytics for MBA .pptx
Unit 2_ Descriptive Analytics for MBA .pptxJANNU VINAY
 
Graph Models for Deep Learning
Graph Models for Deep LearningGraph Models for Deep Learning
Graph Models for Deep LearningExperfy
 
PG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data AnalysisPG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data AnalysisAashish Patel
 
advance data analyst course in mohali
advance  data  analyst  course in mohaliadvance  data  analyst  course in mohali
advance data analyst course in mohaliseekhodigitalindia
 

Ähnlich wie 07 Data-Exploration.pdf (20)

Data Analysis & Interpretation and Report Writing
Data Analysis & Interpretation and Report WritingData Analysis & Interpretation and Report Writing
Data Analysis & Interpretation and Report Writing
 
Biostatistics.pptx
Biostatistics.pptxBiostatistics.pptx
Biostatistics.pptx
 
Real life application of statistics in engineering
Real life application of statistics in engineeringReal life application of statistics in engineering
Real life application of statistics in engineering
 
RM UNIT 6.pptx
RM UNIT 6.pptxRM UNIT 6.pptx
RM UNIT 6.pptx
 
Data in science
Data in science Data in science
Data in science
 
Applied statistics lecture_2
Applied statistics lecture_2Applied statistics lecture_2
Applied statistics lecture_2
 
EXPLORATORY DATA ANALYSIS
EXPLORATORY DATA ANALYSISEXPLORATORY DATA ANALYSIS
EXPLORATORY DATA ANALYSIS
 
statistical analysis
statistical analysisstatistical analysis
statistical analysis
 
Data Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityData Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & Normality
 
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Data Science  & AI Road Map by Python & Computer science tutor in MalaysiaData Science  & AI Road Map by Python & Computer science tutor in Malaysia
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data science
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
data analysis in Statistics-2023 guide 2023
data analysis in Statistics-2023 guide 2023data analysis in Statistics-2023 guide 2023
data analysis in Statistics-2023 guide 2023
 
Exploring Data (1).pptx
Exploring Data (1).pptxExploring Data (1).pptx
Exploring Data (1).pptx
 
Data Analysis, Intepretation
Data Analysis, IntepretationData Analysis, Intepretation
Data Analysis, Intepretation
 
Role of Statistics In Research.pptx
Role of Statistics In Research.pptxRole of Statistics In Research.pptx
Role of Statistics In Research.pptx
 
Unit 2_ Descriptive Analytics for MBA .pptx
Unit 2_ Descriptive Analytics for MBA .pptxUnit 2_ Descriptive Analytics for MBA .pptx
Unit 2_ Descriptive Analytics for MBA .pptx
 
Graph Models for Deep Learning
Graph Models for Deep LearningGraph Models for Deep Learning
Graph Models for Deep Learning
 
PG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data AnalysisPG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data Analysis
 
advance data analyst course in mohali
advance  data  analyst  course in mohaliadvance  data  analyst  course in mohali
advance data analyst course in mohali
 

Mehr von SugumarSarDurai (19)

Parking NYC.pdf
Parking NYC.pdfParking NYC.pdf
Parking NYC.pdf
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Power BI.pdf
Power BI.pdfPower BI.pdf
Power BI.pdf
 
Unit 6.pdf
Unit 6.pdfUnit 6.pdf
Unit 6.pdf
 
Unit 5.pdf
Unit 5.pdfUnit 5.pdf
Unit 5.pdf
 
06 Excel.pdf
06 Excel.pdf06 Excel.pdf
06 Excel.pdf
 
05 python.pdf
05 python.pdf05 python.pdf
05 python.pdf
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf03-Data-Analysis-Final.pdf
03-Data-Analysis-Final.pdf
 
UNit4.pdf
UNit4.pdfUNit4.pdf
UNit4.pdf
 
UNit4d.pdf
UNit4d.pdfUNit4d.pdf
UNit4d.pdf
 
Unit 4 Time Study.pdf
Unit 4 Time Study.pdfUnit 4 Time Study.pdf
Unit 4 Time Study.pdf
 
Unit 3 Micro and Memo motion study.pdf
Unit 3 Micro and Memo motion study.pdfUnit 3 Micro and Memo motion study.pdf
Unit 3 Micro and Memo motion study.pdf
 
02 Work study -Part_1.pdf
02 Work study -Part_1.pdf02 Work study -Part_1.pdf
02 Work study -Part_1.pdf
 
02 Method Study part_2.pdf
02 Method Study part_2.pdf02 Method Study part_2.pdf
02 Method Study part_2.pdf
 
01 Production_part_2.pdf
01 Production_part_2.pdf01 Production_part_2.pdf
01 Production_part_2.pdf
 
01 Production_part_1.pdf
01 Production_part_1.pdf01 Production_part_1.pdf
01 Production_part_1.pdf
 
01 Industrial Management_Part_1a .pdf
01 Industrial Management_Part_1a .pdf01 Industrial Management_Part_1a .pdf
01 Industrial Management_Part_1a .pdf
 
01 Industrial Management_Part_1 .pdf
01 Industrial Management_Part_1 .pdf01 Industrial Management_Part_1 .pdf
01 Industrial Management_Part_1 .pdf
 

Kürzlich hochgeladen

Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxAmita Gupta
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxcallscotland1987
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 

Kürzlich hochgeladen (20)

Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 

07 Data-Exploration.pdf

  • 1. Dr. D. Sugumar Associate Professor/ECE Karunya
  • 2.
  • 4. • Goal: • Understand the basic characteristics of the data • Examples for characteristics: • Structure • Size • Completeness • Relationships
  • 5. • Usually interactive and semi-automated • Text editors, system calls (head/more/less), etc. to look at raw data directly • Helps to understand the structure • Statistics and visualizations to learn about distributions and relationships • Exploration should also include meta data • Feature names, trace links, etc.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. • Overview • Summary Statistics • Visualization for Data Exploration • Summary
  • 12. • Summarize data through single value • Do not predict anything about the data (→ inductive statistics) • Common statistics covered in this course • Central tendency (mean/median/mode) • Variability (standard deviation, interquartile range) • Range of data (min/max) • Other important statistics • Kurtosis and skewness for the shape of distributions • More measures for central tendency, e.g., trimmed means, harmonic mean
  • 13. • „Typical“ value of the data • Arithmetic mean • 𝑚𝑒𝑎𝑛 𝑥 = 1 𝑛 σ𝑖=1 𝑛 𝑥𝑖 with 𝑥 = 𝑥1, … , 𝑥𝑛 ∈ ℝ𝑛 • Median • The value that separates the higher half from the data of the lower half • Mode • The value that appears most in the data
  • 14. • Measure for the spread of the data • Also called dispersion • Standard deviation • Measure for the difference of observation to the arithmetic mean • 𝑠𝑑 𝑥 = σ𝑖=1 𝑛 𝑥𝑖−𝑚𝑒𝑎𝑛 𝑥 2 𝑛−1 • Interquartile Range (IQR) • Percentile: value below which a given percentage falls • Difference between the 75% percentile and the 25% percentile The median is the 50% percentile
  • 15. • Range for which values are observed • Can be infinite! • Minimum • Smallest observed value • Maximum • Largest observed value • May be strongly distorted by invalid data • Makes it also a good tool to discover invalid data
  • 16. • Random typing on the keypad • 𝑥 = (1,2,1,1,3,4,5,2,3,4,5,1,3,2,1,6,5,4,9,4,3,6,1,5,6,8,4,6,5,1,3,2,1,6,8,7,6,1,3,1,6,8,4,7,6,4,3,5,4,9,7,4,3,1,4,6,8,7,9, 1,4,6,1,3,8,6,7,4,9,6,5,1,3,6,8,7) • central tendency: • mean: 4.46052631579 • median: 4.0 • mode (count): 1 (14) • variability • sd: 2.41944311488 • IQR: 3.0 • range • min: 1 • max: 9
  • 17. • Overview • Summary Statistics • Visualization for Data Exploration • Summary
  • 18. Numbers are made up and pie charts should actually be avoided
  • 19. i X y 10.00 8.04 8.00 6.95 13.00 7.58 9.00 8.81 11.00 8.33 14.00 9.96 6.00 7.24 4.00 4.26 12.00 10.84 7.00 4.82 5.00 5.68 ii x y 10.00 9.14 8.00 8.14 13.00 8.74 9.00 8.77 11.00 9.26 14.00 8.10 6.00 6.13 4.00 3.10 12.00 9.13 7.00 7.26 5.00 4.74 iii x y 10.00 7.46 8.00 6.77 13.00 12.74 9.00 7.11 11.00 7.81 14.00 8.84 6.00 6.08 4.00 5.39 12.00 8.15 7.00 6.42 5.00 5.73 iv x y 8.00 6.58 8.00 5.76 8.00 7.71 8.00 8.84 8.00 8.47 8.00 7.04 8.00 5.25 19.00 12.50 8.00 5.56 8.00 7.91 8.00 6.89 Have the same • Mean • standard deviation • correlation between x and y • linear regression Anscombe‘s Quartet
  • 20. Plots of the Boston house prices data set http://archive.ics.uci.edu/ml/machine-learning-databases/housing/ Extremley skewed Mixture of two normals after taking the logarithm Looks like an artificially high value → Groups all higher incomes
  • 21. Median 75% percentile 25% percentile Range of data except outliers Outlier The outlier definition can change. We used „more than 1.5 times the IQR away from the 25%/75% percentile.” You should always check this in the package you use.
  • 23. Good separation of blue, but green and orange are overlapping Good separation of all three classes Density plots of data in the column separated by classes
  • 24. Colors show strength of correlation Correlation between premiums and losses Correlation between reasons for accidents There are different correlation coefficients. We used Pearson‘s coefficient, which measures linear correlations.
  • 25. Cannot see structure due to amount of data Hexagonal bins reveal the structure
  • 26. Linear trend Regular noise pattern → Seasonal? Range of values
  • 27. • Overview • Summary Statistics • Visualization for Data Exploration • Summary
  • 28. • Important to understand the data available • Summary statistics provide a good overview • Can be deceptive! • Visualization is a powerful way to understand data • Understanding of meta data and how domain experts understand data equally important!