SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Downloaden Sie, um offline zu lesen
Why are descriptive statistics important?
In January 1986, the space shuttle Challenger broke apart shortly after liftoff. The
accident was caused by a part that was not designed to fly at the unusually cold
temperature of 29◦ F at launch.
Here are the launch-temperatures of the first 25 shuttle missions (in degrees F):
66,70,69,80,68,67,72,70,70,57,63,70,78,67,53,67,75,70,81,76,79,75,76,58,29
Temperature
Frequency
20 30 40 50 60 70 80 90
0
2
4
6
8
10
The two most important functions of descriptive statistics are:
I Communicate information
I Support reasoning about data
When exploring data of large size, it becomes essential to use summaries.
Graphical summaries of data
It is best to use a graphical summary to communicate information, because people
prefer to look at pictures rather than at numbers.
There are many ways to visualize data. The nature of the data and the goal of the
visualization determine which method to choose.
Pie chart and dot plot
California
Oregon
Washington
Other US
International
California
Oregon
Washington
Other US
International
0 10 20 30 40
Percent
The dot plot makes it easier to compare frequencies of various categories, while the pie
chart allows more easily to eyeball what fraction of the total a category corresponds to.
Bar graph
When the data are quantitative (i.e. numbers), then they should be put on a number
line. This is because the ordering and the distance between the numbers convey
important information.
The bar graph is essentially a dot plot put on its side.
3 4 5 6
Number of assignments completed
0
2
4
6
8
The histogram
The histogram allows to use blocks with different widths.
Key point: The areas of the blocks are
proportional to frequency.
So the percentage falling into a block can be figured without a vertical scale since the
total area equals 100%.
But it’s helpful to have a vertical scale (density scale). Its unit is ‘% per unit’, so in the
above example the vertical unit is ‘% per year’.
The histogram gives two kinds of information about the data:
1. Density (crowding): The height of the bar tells how many subjects there are for one
unit on the horizontal scale. For example, the highest density is around age 19 as
.04 = 4% of all subjects are age 19. In contrast, only about 0.7% of subjects fall into
each one year range for ages 60–80.
2. Percentages (relative frequences): Those are given by
area = height x width.
For example, about 14% of all subjects fall into the age range 60–80, because the
corresponding area is (20 years) x (0.7 % per year)=14 %. Alternatively, you can find
this answer by eyeballing that this area makes up roughly 1/7 of the total area of the
histogram, so roughly 1/7=14% of all subjects fall in that range.
The boxplot (box-and-whisker plot)
The boxplot depicts five key numbers of the data:
10
15
20
25
30
Miles
per
gallon
for
32
cars
The boxplot conveys less information than a histogram, but it takes up less space and
so is well suited to compare several datasets:
4 6 8
10
15
20
25
30
Number of Cylinders
Miles
per
gallon
The scatterplot
The scatterplot is used to depict data that come as pairs.
6 8 10 12 14 16
0
5000
10000
15000
20000
25000
Education
Income
The scatterplot visualizes the relationship
between the two variables.
Providing context is important
Statistical analyses typically compare the observed data to a reference. Therefore
context is essential for graphical integrity.
I ‘The Visual Display of Quantitative Information’ by Edward Tufte (p.74)
One way to provide context is by using small multiples. The compact design of the
boxplot makes it well suited for this task:
Providing context with small multiples
Pitfalls when visualizing data
Sophisticated software makes it tempting to produce showy but poor visualizations:
ale up the projects so it can
elf," Webb said.
gether experts from many-
se cooperation results in a
ogical breakthroughs.
is heading towards inter-
Webb said. "Having the
biologists all in the same
roughs. I think it is a good
erdisciplinary program at
ofcooperation between dis-
ing funding requirements.
fficult because it is general-
experiments. Often biolo-
ogy development is critical,
ding. You are going after
al sources that are used to
al way," Webb said.
fense Advanced Research
n more willing to take the
ayoff projects, according to
rtment of Defense funding
ects that are long-range,
ard, like those taken on by
over the 2000 figure.
"The total is surprisinglylarge in
light ofthe overall sharp decline in
stock market values over this peri-
od," RAND's Council forAid to Ed-
ucation said.The non-profit Coun-
cil forAid to Education has tracked
"1 expected growth in giving
by foundations to be there, but 1
didn't expect it to be nearly as
high as it was," Kaplan said.
"There was so much bad news in
the stock market, especially dur-
ing that fiscal period, with the
ority now is its Campaign fo
Undergraduate Education
launched by University Presi-
dent Hennessy in 2000 with th
goal to raise $ 1 billion over five
years.
This is "the largest campaign
specifically for undergraduat
education ever undertaken b
any university," Henness
wrote in a 2000 letter introduc-
ing the program.
As of March 31, Stanford
campaign had already raise
$733 million, although a portio
ofthat totalreflected as-yet unme
commitments to match dona
tions.
Columbia University, wi
$359 million in 2001, was just be
hind Stanford. Indiana Universi
ty was the most-funded pub
university, in seventh place wi
$301 million.
AARON STAPLE/The Stanford Daily
Steakburger"
I The ‘Ghettysburg Powerpoint Presentation’ by Peter Norvig
Numerical summary measures
For summerizing data with one number, use the mean (=average) or the median.
The median is the number that is larger than half the data and smaller than the other
half.
Mean vs. median
Mean and median are the same when the histogram is symmetric.
100 measurements of the speed of light
km/sec
600 700 800 900 1000 1100
0
5
10
15
20
25
30
Mean vs. median
When the histogram is skewed to the right, then the mean can be much larger than the
median.
So if the histogram is very skewed, then use the median.
Mean vs. median
If the median sales price of 10 homes is $ 1 million, then we know that 5 homes sold for
$ 1 million or more.
If we are told that the average sale price is $ 1 million, then we can’t draw such a
conclusion:
Percentiles
The 90th percentile of incomes is
$ 135,000: 90% of households report an
income of $ 135,000 or less, 10% report
more.
The 75th percentile is called 3rd quartile: $ 85,000
The 50th percentile is the median: $ 50,000
The 25th percentile is called 1st quartile.
Five-number summary
Recall that the boxplot gives a five-number summary of the data:
the smallest number, 1st quartile, median, 3rd quartile, largest number.
10
15
20
25
30
Miles
per
gallon
for
32
cars
The interquartile range = 3rd quartile − 1st quartile.
It measures how spread out the data are.
The standard deviation
A more commonly used measure of spread is the standard deviation.
x̄ stands for the average of the numbers x1, . . . , xn.
The standard deviation of these numbers is
s =
v
u
u
t 1
n
n
X
i=1
(xi − x̄)2 or
v
u
u
t 1
n − 1
n
X
i=1
(xi − x̄)2
The two numbers x̄ and s are often used to summarize data. Both are sensitive to a
few large or small data.
If that is a concern, use the median and the interquartile range.

Weitere ähnliche Inhalte

Ähnlich wie Why descriptive stats are important for understanding shuttle launch temps

As mentioned earlier, the mid-term will have conceptual and quanti.docx
As mentioned earlier, the mid-term will have conceptual and quanti.docxAs mentioned earlier, the mid-term will have conceptual and quanti.docx
As mentioned earlier, the mid-term will have conceptual and quanti.docxfredharris32
 
Statistics Review
Statistics ReviewStatistics Review
Statistics Reviewbpneill
 
BIOSTAT.pptx
BIOSTAT.pptxBIOSTAT.pptx
BIOSTAT.pptxDoiLoreto
 
Intro to statistics formatted
Intro to statistics   formattedIntro to statistics   formatted
Intro to statistics formattedaurrico
 
Type of data @ web mining discussion
Type of data @ web mining discussionType of data @ web mining discussion
Type of data @ web mining discussionCherryBerry2
 
Type of data @ Web Mining Discussion
Type of data @ Web Mining DiscussionType of data @ Web Mining Discussion
Type of data @ Web Mining DiscussionCherryBerry2
 
InstructionDue Date 6 pm on October 28 (Wed)Part IProbability a.docx
InstructionDue Date 6 pm on October 28 (Wed)Part IProbability a.docxInstructionDue Date 6 pm on October 28 (Wed)Part IProbability a.docx
InstructionDue Date 6 pm on October 28 (Wed)Part IProbability a.docxdirkrplav
 
TSTD 6251  Fall 2014SPSS Exercise and Assignment 120 PointsI.docx
TSTD 6251  Fall 2014SPSS Exercise and Assignment 120 PointsI.docxTSTD 6251  Fall 2014SPSS Exercise and Assignment 120 PointsI.docx
TSTD 6251  Fall 2014SPSS Exercise and Assignment 120 PointsI.docxnanamonkton
 
03 chapter 3 application .pptx
03 chapter 3 application .pptx03 chapter 3 application .pptx
03 chapter 3 application .pptxHendmaarof
 
Instructions and Advice · This assignment consists of six que.docx
Instructions and Advice · This assignment consists of six que.docxInstructions and Advice · This assignment consists of six que.docx
Instructions and Advice · This assignment consists of six que.docxdirkrplav
 
1 BBS300 Empirical Research Methods for Business .docx
1  BBS300 Empirical  Research  Methods  for  Business .docx1  BBS300 Empirical  Research  Methods  for  Business .docx
1 BBS300 Empirical Research Methods for Business .docxoswald1horne84988
 

Ähnlich wie Why descriptive stats are important for understanding shuttle launch temps (14)

As mentioned earlier, the mid-term will have conceptual and quanti.docx
As mentioned earlier, the mid-term will have conceptual and quanti.docxAs mentioned earlier, the mid-term will have conceptual and quanti.docx
As mentioned earlier, the mid-term will have conceptual and quanti.docx
 
Chapter 11
Chapter 11Chapter 11
Chapter 11
 
Statistics Review
Statistics ReviewStatistics Review
Statistics Review
 
Sofi Presentation
Sofi PresentationSofi Presentation
Sofi Presentation
 
BIOSTAT.pptx
BIOSTAT.pptxBIOSTAT.pptx
BIOSTAT.pptx
 
Intro to statistics formatted
Intro to statistics   formattedIntro to statistics   formatted
Intro to statistics formatted
 
Type of data @ web mining discussion
Type of data @ web mining discussionType of data @ web mining discussion
Type of data @ web mining discussion
 
Type of data @ Web Mining Discussion
Type of data @ Web Mining DiscussionType of data @ Web Mining Discussion
Type of data @ Web Mining Discussion
 
InstructionDue Date 6 pm on October 28 (Wed)Part IProbability a.docx
InstructionDue Date 6 pm on October 28 (Wed)Part IProbability a.docxInstructionDue Date 6 pm on October 28 (Wed)Part IProbability a.docx
InstructionDue Date 6 pm on October 28 (Wed)Part IProbability a.docx
 
TSTD 6251  Fall 2014SPSS Exercise and Assignment 120 PointsI.docx
TSTD 6251  Fall 2014SPSS Exercise and Assignment 120 PointsI.docxTSTD 6251  Fall 2014SPSS Exercise and Assignment 120 PointsI.docx
TSTD 6251  Fall 2014SPSS Exercise and Assignment 120 PointsI.docx
 
03 chapter 3 application .pptx
03 chapter 3 application .pptx03 chapter 3 application .pptx
03 chapter 3 application .pptx
 
Instructions and Advice · This assignment consists of six que.docx
Instructions and Advice · This assignment consists of six que.docxInstructions and Advice · This assignment consists of six que.docx
Instructions and Advice · This assignment consists of six que.docx
 
Sampling Technique
Sampling TechniqueSampling Technique
Sampling Technique
 
1 BBS300 Empirical Research Methods for Business .docx
1  BBS300 Empirical  Research  Methods  for  Business .docx1  BBS300 Empirical  Research  Methods  for  Business .docx
1 BBS300 Empirical Research Methods for Business .docx
 

Kürzlich hochgeladen

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Kürzlich hochgeladen (20)

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 

Why descriptive stats are important for understanding shuttle launch temps

  • 1. Why are descriptive statistics important? In January 1986, the space shuttle Challenger broke apart shortly after liftoff. The accident was caused by a part that was not designed to fly at the unusually cold temperature of 29◦ F at launch. Here are the launch-temperatures of the first 25 shuttle missions (in degrees F): 66,70,69,80,68,67,72,70,70,57,63,70,78,67,53,67,75,70,81,76,79,75,76,58,29 Temperature Frequency 20 30 40 50 60 70 80 90 0 2 4 6 8 10
  • 2. The two most important functions of descriptive statistics are: I Communicate information I Support reasoning about data When exploring data of large size, it becomes essential to use summaries.
  • 3. Graphical summaries of data It is best to use a graphical summary to communicate information, because people prefer to look at pictures rather than at numbers. There are many ways to visualize data. The nature of the data and the goal of the visualization determine which method to choose.
  • 4. Pie chart and dot plot California Oregon Washington Other US International California Oregon Washington Other US International 0 10 20 30 40 Percent The dot plot makes it easier to compare frequencies of various categories, while the pie chart allows more easily to eyeball what fraction of the total a category corresponds to.
  • 5. Bar graph When the data are quantitative (i.e. numbers), then they should be put on a number line. This is because the ordering and the distance between the numbers convey important information. The bar graph is essentially a dot plot put on its side. 3 4 5 6 Number of assignments completed 0 2 4 6 8
  • 6. The histogram The histogram allows to use blocks with different widths. Key point: The areas of the blocks are proportional to frequency. So the percentage falling into a block can be figured without a vertical scale since the total area equals 100%. But it’s helpful to have a vertical scale (density scale). Its unit is ‘% per unit’, so in the above example the vertical unit is ‘% per year’.
  • 7. The histogram gives two kinds of information about the data: 1. Density (crowding): The height of the bar tells how many subjects there are for one unit on the horizontal scale. For example, the highest density is around age 19 as .04 = 4% of all subjects are age 19. In contrast, only about 0.7% of subjects fall into each one year range for ages 60–80.
  • 8. 2. Percentages (relative frequences): Those are given by area = height x width. For example, about 14% of all subjects fall into the age range 60–80, because the corresponding area is (20 years) x (0.7 % per year)=14 %. Alternatively, you can find this answer by eyeballing that this area makes up roughly 1/7 of the total area of the histogram, so roughly 1/7=14% of all subjects fall in that range.
  • 9. The boxplot (box-and-whisker plot) The boxplot depicts five key numbers of the data: 10 15 20 25 30 Miles per gallon for 32 cars
  • 10. The boxplot conveys less information than a histogram, but it takes up less space and so is well suited to compare several datasets: 4 6 8 10 15 20 25 30 Number of Cylinders Miles per gallon
  • 11. The scatterplot The scatterplot is used to depict data that come as pairs. 6 8 10 12 14 16 0 5000 10000 15000 20000 25000 Education Income The scatterplot visualizes the relationship between the two variables.
  • 12. Providing context is important Statistical analyses typically compare the observed data to a reference. Therefore context is essential for graphical integrity. I ‘The Visual Display of Quantitative Information’ by Edward Tufte (p.74) One way to provide context is by using small multiples. The compact design of the boxplot makes it well suited for this task:
  • 13. Providing context with small multiples
  • 14. Pitfalls when visualizing data Sophisticated software makes it tempting to produce showy but poor visualizations: ale up the projects so it can elf," Webb said. gether experts from many- se cooperation results in a ogical breakthroughs. is heading towards inter- Webb said. "Having the biologists all in the same roughs. I think it is a good erdisciplinary program at ofcooperation between dis- ing funding requirements. fficult because it is general- experiments. Often biolo- ogy development is critical, ding. You are going after al sources that are used to al way," Webb said. fense Advanced Research n more willing to take the ayoff projects, according to rtment of Defense funding ects that are long-range, ard, like those taken on by over the 2000 figure. "The total is surprisinglylarge in light ofthe overall sharp decline in stock market values over this peri- od," RAND's Council forAid to Ed- ucation said.The non-profit Coun- cil forAid to Education has tracked "1 expected growth in giving by foundations to be there, but 1 didn't expect it to be nearly as high as it was," Kaplan said. "There was so much bad news in the stock market, especially dur- ing that fiscal period, with the ority now is its Campaign fo Undergraduate Education launched by University Presi- dent Hennessy in 2000 with th goal to raise $ 1 billion over five years. This is "the largest campaign specifically for undergraduat education ever undertaken b any university," Henness wrote in a 2000 letter introduc- ing the program. As of March 31, Stanford campaign had already raise $733 million, although a portio ofthat totalreflected as-yet unme commitments to match dona tions. Columbia University, wi $359 million in 2001, was just be hind Stanford. Indiana Universi ty was the most-funded pub university, in seventh place wi $301 million. AARON STAPLE/The Stanford Daily Steakburger" I The ‘Ghettysburg Powerpoint Presentation’ by Peter Norvig
  • 15. Numerical summary measures For summerizing data with one number, use the mean (=average) or the median. The median is the number that is larger than half the data and smaller than the other half.
  • 16. Mean vs. median Mean and median are the same when the histogram is symmetric. 100 measurements of the speed of light km/sec 600 700 800 900 1000 1100 0 5 10 15 20 25 30
  • 17. Mean vs. median When the histogram is skewed to the right, then the mean can be much larger than the median. So if the histogram is very skewed, then use the median.
  • 18. Mean vs. median If the median sales price of 10 homes is $ 1 million, then we know that 5 homes sold for $ 1 million or more. If we are told that the average sale price is $ 1 million, then we can’t draw such a conclusion:
  • 19. Percentiles The 90th percentile of incomes is $ 135,000: 90% of households report an income of $ 135,000 or less, 10% report more. The 75th percentile is called 3rd quartile: $ 85,000 The 50th percentile is the median: $ 50,000 The 25th percentile is called 1st quartile.
  • 20. Five-number summary Recall that the boxplot gives a five-number summary of the data: the smallest number, 1st quartile, median, 3rd quartile, largest number. 10 15 20 25 30 Miles per gallon for 32 cars The interquartile range = 3rd quartile − 1st quartile. It measures how spread out the data are.
  • 21. The standard deviation A more commonly used measure of spread is the standard deviation. x̄ stands for the average of the numbers x1, . . . , xn. The standard deviation of these numbers is s = v u u t 1 n n X i=1 (xi − x̄)2 or v u u t 1 n − 1 n X i=1 (xi − x̄)2 The two numbers x̄ and s are often used to summarize data. Both are sensitive to a few large or small data. If that is a concern, use the median and the interquartile range.