03.data presentation(2015) 2

Main purpose of data presentation or
displaying data
 To make the findings easy and clear to understand
 To provide comprehensive information in a succinct and
efficient way

Four ways of displaying data
 Text
 Tables
 Graph
 Statistical measures (see the descriptive statistics lecture)

Text
 The most common method of communication in both quantitative
and qualitative research studies
 Writing should be thematic, i.e. written around the various themes of
the report
 Text should place the most important and significant findings in the
context of short- and longer-term trends.
 It should explore relationships, causes and effects, to the extent that
they can be supported by evidence.
 It should show readers the significance of the most current
information.

Good example of text
Net profits of non-financial companies in the Netherlands
amounted to 19 billion euros in the second quarter of 2008. This
is the lowest level for three years. Profits were 11 percent lower
than in the second quarter of 2007. The drop in net profits is the
result of two main factors: higher interest costs - the companies
paid more net interest - and lower profits of foreign subsidiaries.
Source: Statistics Netherlands
 Findings should be integrated with the literature, citing references
using an accepted system of citation

Tables
 A table is the simplest means of summarizing a set of
observations.
 Tables are more informative when they are not overly
complex.
 As a general rule, tables and the columns within them should
always be clearly labeled.
 If units of measurement are involved, they should be
specified.
 Uses: It can be used for all types of numerical data.

Table
Structure of tables
1. Title: indicates the table number and describes the type of
data the table contains
2. Row stub: the subcategories of a variable, listed along the y-axis
3. Column headings: the subcategories of a variable, listed along
the x-axis
4. Body: the cells housing the analysed data
* Supplementary notes or footnotes

Structure of tables
Table (x) Attitudes towards uranium by age                      [Title]

Attitudes towards        Age of respondent                      [Column headings]
uranium mining           <25   25-34   35-44   45-54   55+   Total
Strongly favourable      cell
Favourable
Uncertain
Unfavourable
Strongly unfavourable
Total
Source: …………………………Hypothetical data                             [Supplementary notes]

 Row stub: the left-hand column listing the attitude subcategories
 Body: the cells in which the analysed data are placed
 Subcategory: each age group or attitude level
The table title
 should give a clear and accurate description
of the data.
 should answer the three questions “what”,
“where” and “when”.
 should be short and concise, and avoid using verbs.

Tables
Types of tables
1. Univariate (also known as frequency tables) –
containing information about one variable
2. Bivariate (also known as cross-tabulations) –
containing information about two variables
3. Polyvariate or multivariate – containing information
about more than two variables

Tables
Frequency distributions (Absolute frequency)
 One type of table that is commonly used to evaluate data
 For nominal and ordinal data, a frequency distribution
consists of a set of classes or categories together with the
number of observations in each
Table (1) Cases of Kaposi’s sarcoma for the first 2560 AIDS patients
reported to the Centers for Disease Control in Atlanta, Georgia
Kaposi’s sarcoma    Number of individuals
Yes                  246
No                  2314

Tables (Frequency table/Univariate)
Table (2) Cigarette consumption per person aged 18 or older, United
States, 1900-1960
Year    Number of cigarettes
1900      54
1910     151
1920     665
1930    1485
1940    1976
1950    3522
1960    4171

Tables
Frequency distributions
 For discrete or continuous data, the range of values of the
observations must be broken down into a series of distinct,
non-overlapping intervals of equal width

Tables (Frequency table/Univariate)
Table (3) Absolute frequencies of serum cholesterol levels for 1067 US
males aged 25 to 34 years, 1976-1980
Cholesterol level (mg/100ml)    Number of men
80-119                            13
120-159                          150
160-199                          442
200-239                          299
240-279                          115
280-319                           34
320-359                            9
360-399                            5
Total                           1067

Tables
Relative Frequency
 The proportion of the total number of observations that
appears in that interval
 Computed by dividing the number of values within an interval
by the total number of values in the table
 Relative frequencies are useful for comparing sets of data
that contain unequal numbers of observations
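
As a rough illustration of the calculation described above, the following sketch divides each interval's count by the total and expresses the result as a percentage. The counts are those of Table (3); the function name `relative_frequencies` is only an illustrative choice.

```python
# Counts from Table (3): serum cholesterol (mg/100ml), US males aged 25-34.
counts = {
    "80-119": 13, "120-159": 150, "160-199": 442, "200-239": 299,
    "240-279": 115, "280-319": 34, "320-359": 9, "360-399": 5,
}


def relative_frequencies(counts):
    """Return each interval's share of all observations, in percent."""
    total = sum(counts.values())
    return {interval: 100 * n / total for interval, n in counts.items()}


rel = relative_frequencies(counts)
for interval, pct in rel.items():
    # Print interval, absolute frequency, relative frequency (1 decimal).
    print(f"{interval:>8}  {counts[interval]:>4}  {pct:5.1f}")
```

Rounded to one decimal place, the 160-199 interval comes out at 41.4%, and the percentages add up to 100.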

Tables (Bivariate/cross-tabulations)
Table (4) Absolute and relative frequencies of serum cholesterol levels
for 2294 US males, 1976-1980
Cholesterol      Age 25-34                   Age 55-64
level            Number    Relative          Number    Relative
(mg/100ml)       of men    frequency (%)     of men    frequency (%)
80-119             13         1.2               5         0.4
120-159           150        14.1              48         3.9
160-199           442        41.4             265        21.6
200-239           299        28.0             458        37.3
240-279           115        10.8             281        22.9
280-319            34         3.2             128        10.4
320-359             9         0.8              35         2.9
360-399             5         0.5               7         0.6
Total            1067       100.0            1227       100.0

Tables
Cumulative relative Frequency
 The percentage of the total number of observations that
have a value less than or equal to the upper limit of the
interval
 Calculated by summing the relative frequencies for the
specified interval and all the previous ones
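
A minimal sketch of this running sum, using the counts for men aged 25-34 from Table (4). It accumulates the counts and divides by the total, which is equivalent to summing the relative frequencies and avoids rounding drift; the function name is an assumption made for the example.

```python
from itertools import accumulate

intervals = ["80-119", "120-159", "160-199", "200-239",
             "240-279", "280-319", "320-359", "360-399"]
counts_25_34 = [13, 150, 442, 299, 115, 34, 9, 5]  # Table (4), age 25-34


def cumulative_relative(counts):
    """Percentage of observations at or below each interval's upper limit."""
    total = sum(counts)
    return [100 * running / total for running in accumulate(counts)]


cum = cumulative_relative(counts_25_34)
for interval, pct in zip(intervals, cum):
    print(f"{interval:>8}  {pct:5.1f}")
```

The last interval always reaches 100%, which is a quick check that no observations were dropped.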

Tables (Bivariate/cross-tabulations)
Table (5) Relative and cumulative frequencies of serum cholesterol
levels for 2294 US males, 1976-1980
Cholesterol      Age 25-34                        Age 55-64
level            Relative        Cumulative       Relative        Cumulative
(mg/100ml)       frequency (%)   relative         frequency (%)   relative
                                 frequency (%)                    frequency (%)
80-119              1.2             1.2              0.4             0.4
120-159            14.1            15.3              3.9             4.3
160-199            41.4            56.7             21.6            25.9
200-239            28.0            84.7             37.3            63.2
240-279            10.8            95.5             22.9            86.1
280-319             3.2            98.7             10.4            96.5
320-359             0.8            99.5              2.9            99.4
360-399             0.5           100.0              0.6           100.0
Total             100.0                            100.0

Tables
Types of percentages
 The use of percentages is a common procedure in the
interpretation of data
 Three types of percentage
 Row percentage
 Column percentage
 Total percentage
Use of rounding and decimals: Numeric values should be right
justified
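
The three types of percentage differ only in the denominator. The sketch below assumes the usual definitions (row percentage: cell divided by its row total; column percentage: cell divided by its column total; total percentage: cell divided by the grand total) and applies them to the counts of Table (4). The function and variable names are illustrative.

```python
# Rows: cholesterol intervals; columns: age 25-34, age 55-64 (Table 4 counts).
cells = [
    [13, 5], [150, 48], [442, 265], [299, 458],
    [115, 281], [34, 128], [9, 35], [5, 7],
]


def percentages(cells):
    """Return (row %, column %, total %) tables for a grid of counts."""
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]
    grand_total = sum(row_totals)
    row_pct = [[100 * v / rt for v in row]
               for row, rt in zip(cells, row_totals)]
    col_pct = [[100 * v / ct for v, ct in zip(row, col_totals)]
               for row in cells]
    tot_pct = [[100 * v / grand_total for v in row] for row in cells]
    return row_pct, col_pct, tot_pct


row_pct, col_pct, tot_pct = percentages(cells)
# Third row (160-199), first column (age 25-34), under each denominator:
print(round(row_pct[2][0], 1), round(col_pct[2][0], 1), round(tot_pct[2][0], 1))
```

The column percentages here reproduce the relative frequencies of Table (4); the row and total percentages answer different questions about the same cell.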

What do you find out in this table?
Bad example
Good example

Graphs
 A graph is a pictorial representation of numerical data, used to
summarize and display the data
 Should be designed to convey the general patterns in a set of
observations at a single glance
 The most informative graphs are relatively simple and
self-explanatory.
 They should be clearly labeled and units of measurement
should be indicated.

Bar Charts
 A popular type of graph used to display a frequency
distribution for nominal or ordinal data.
 Horizontal axis: various categories into which the
observations fall
 Vertical axis (height of bar): the frequency or the relative
frequency of observations within the class
 Uses: It is used to compare frequencies or values for different
categories or groups.

Bar Charts
 The bars can be either vertically or horizontally oriented.
 In the horizontal orientation, the text is easier to read.
 It is also easier to compare the different values when the bars
are ordered by size from smallest to largest, rather than
displayed arbitrarily.
 The bars should be much wider than the gaps between them.
 The gaps should not exceed 40% of the bar width.

Simple Bar chart (vertical)
Figure (1) Cigarette consumption per person 18 years of age or older,
United States, 1900-1990
(vertical bar chart; horizontal axis: Year, 1900 to 1990 in ten-year
steps; vertical axis: Number of cigarettes, 0 to 4500)

Clustered Bar chart
Source: Fulfilling the Health Agenda for Women and Children: The 2014 Report

Clustered Bar chart
Figure 2. Use of yoga among adults in the past 12 months, by age group: United States,
2002, 2007, and 2012

Stacked bar
 Uses: A stacked bar chart can be used to show and compare
segments of totals.
 Caution should be exercised when using this type of chart.
 It can be difficult to analyze and compare, if there are too
many items in each stack or if many items are fairly close in
size.

Pie chart
 A pie chart can be used to show the percentage distribution
of one variable, but only a small number of categories can be
displayed, usually not more than six.
 There are 360 degrees in a circle and so the full circle can be
used to represent 100% or the total population.
 The circle or pie is divided into sections in accordance with
the magnitude of each category
 Each slice is proportionate to the size of each subcategory of a
frequency distribution
 Uses: Pies can be drawn for both qualitative data and
variables measured on a continuous scale but grouped into
categories
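
Because the full circle of 360 degrees stands for 100%, each slice's angle follows directly from its share. A small sketch, using the two percentages shown in the gender pie chart of this lecture; the function name is illustrative.

```python
def slice_angles(values):
    """Convert category sizes (counts or percentages) to slice angles in degrees."""
    total = sum(values)
    return [360 * v / total for v in values]


# The two slices of the gender pie chart: 53.0% and 47.0%.
angles = slice_angles([53.0, 47.0])
print([round(a, 1) for a in angles])
```

Dividing by the sum rather than by 100 means the same function also works on raw counts.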

Pie chart
Figure (1) Gender of study population
(pie chart; two slices, 53.0% and 47.0%, for the categories Male and
Female)

Pie chart
Figure (2) Government health expenditure by providers (2010-2011)
(pie chart; slices of 69.80%, 14.63%, 3.86%, 1.65%, 3.14%, 2.23% and
4.69%; legible labels: Hospitals, Ambulatory health care, Retail sale
and medical goods)

Histogram
 A special type of bar graph showing frequency distribution for
continuous data
 When we construct a histogram the values of the variable
under consideration are represented by the horizontal axis,
while the vertical axis has as its scale the frequency (or
relative frequency if desired) of occurrence.
 True class limits are used so that adjacent bars touch, reflecting the
continuity of the values or observations
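
As a sketch of how true class limits can be obtained from stated limits, the code below assumes equally spaced classes and places each boundary halfway across the gap between one class's upper limit and the next class's lower limit. The classes are the first few cholesterol intervals used in the tables of this lecture; the function name is illustrative.

```python
def true_class_limits(classes):
    """Widen stated (lower, upper) limits by half the gap between classes."""
    # Gap between consecutive classes, e.g. 120 - 119 = 1 for integer data.
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]


stated = [(80, 119), (120, 159), (160, 199)]
true_limits = true_class_limits(stated)
print(true_limits)  # each upper true limit equals the next lower true limit
```

With true limits the bars of the histogram share their edges, so no artificial gaps appear on the horizontal axis.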

Histogram
Figure (1) Histogram of leadership aptitude scores for n = 30 football coaches.

Frequency polygon
 Frequency distribution can be portrayed graphically by means
of a frequency polygon, which is a special kind of line graph.
 To draw a frequency polygon we first place a dot above the
midpoint of each class interval represented on the horizontal
axis of a graph.
 The height of a given dot above the horizontal axis corresponds
to the frequency of the relevant class interval.
 Connecting the dots by straight lines produces the frequency
polygon.

Frequency polygon
 Note that the polygon is brought down to the horizontal axis
at the ends at points that would be the midpoints if there
were an additional cell at each end of the corresponding
histogram.
 This allows for the total area to be enclosed.
 The total area under the frequency polygon equals the area under
the histogram.
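
The construction can be checked numerically. The sketch below builds the polygon's points for the cholesterol counts of Table (3), adds one empty class at each end, and compares the area under the polygon with the area of the histogram. It assumes equal class widths; the function names are illustrative.

```python
classes = [(80, 119), (120, 159), (160, 199), (200, 239),
           (240, 279), (280, 319), (320, 359), (360, 399)]
freqs = [13, 150, 442, 299, 115, 34, 9, 5]  # Table (3)


def polygon_points(classes, freqs):
    """Dots above class midpoints, closed on the axis by an extra class at each end."""
    width = classes[1][0] - classes[0][0]          # equal class width
    mids = [(lower + upper) / 2 for lower, upper in classes]
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))


def area_under(points):
    """Area under straight segments joining consecutive points (trapezoid rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


points = polygon_points(classes, freqs)
width = classes[1][0] - classes[0][0]
histogram_area = width * sum(freqs)
print(area_under(points), histogram_area)
```

The two areas agree because each frequency contributes half a class width to the trapezoid on either side of its dot.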

Frequency polygon
Figure (3) Histogram and frequency polygon of ages of 189 subjects
Figure (2) Frequency polygon of ages of 189 students

Ogive curve
The graph of a cumulative frequency distribution is called an ‘ogive’.
Figure (3) Cumulative percentage frequency polygon for leadership aptitude
scores for n = 30 football coaches

Stem-and-leaf
 A properly constructed stem-and-leaf display, like a histogram, provides
information regarding the range of the data set, shows the location of the
highest concentration of measurements, and reveals the presence or
absence of symmetry.
 An advantage of the stem-and-leaf display over the histogram is the fact
that it preserves the information contained in the individual
measurements.
 Such information is lost when measurements are assigned to the class
intervals of a histogram.
 Another advantage of stem-and-leaf displays is that they can be
constructed during the tallying process, so the intermediate step of
preparing an ordered array is eliminated.

Construction of Stem-and-leaf
 The first part is called the stem, and the second part is called the leaf.
 The stem consists of one or more of the initial digits of the measurement,
and the leaf is composed of one or more of the remaining digits.
 All partitioned numbers are shown together in a single display; the stems
form an ordered column with the smallest stem at the top and the largest
at the bottom.
 In the stem column all stems within the range of the data are listed,
even when a measurement with that stem is not in the data set.
 The rows of the display contain the leaves, ordered and listed to the right
of their respective stems.
 When leaves consist of more than one digit, all digits after the first may be
deleted. Decimals when present in the original data are omitted in the
stem-and-leaf display.
 The stems are separated from their leaves by a vertical line.
 Thus, a stem-and-leaf display is also an ordered array of the data.
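
The construction rules above can be sketched in a few lines. The example assumes non-negative whole numbers, takes the last digit as the leaf and the remaining leading digits as the stem; the data values are hypothetical and the function name is illustrative.

```python
def stem_and_leaf(values):
    """Return display lines: ordered stems, a vertical bar, ordered leaves."""
    leaves_by_stem = {}
    for v in sorted(values):                      # sorting orders the leaves
        leaves_by_stem.setdefault(v // 10, []).append(v % 10)
    lines = []
    # List every stem in the range, even if no measurement has that stem.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


scores = [38, 23, 60, 31, 25, 57, 38, 52]         # hypothetical measurements
display = stem_and_leaf(scores)
print("\n".join(display))
```

For these values the display has five rows, with stem 4 shown but empty, and reading the leaves row by row gives back the ordered data.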

Stem-and-leaf
 Stem-and-leaf displays are most effective with relatively
small data sets.
 As a rule, they are not suitable for use in annual reports or
other communications aimed at the general public.
 They are primarily of value in helping researchers and
decision makers understand the nature of their data.
 Histograms are more appropriate for externally circulated
publications

Box-and-Whisker Plots
 A useful way to display data in exploratory data analysis
 At the centre of the plot is the median, which is surrounded by a
box, the top and bottom of which are the limits within which the
middle 50% of observations fall.
 Sticking out of the top and bottom of the box are two whiskers,
which extend to the highest and lowest scores that lie within the
fences.
 The horizontal lines are called fences. The upper fence is at
Q3 + 1.5(IQR) or the largest X, whichever is lower.
 The lower fence is at Q1 - 1.5(IQR) or the smallest X, whichever is
higher.
 Values that are outside the fences are considered possible extreme
values, or outliers.

 In fairly symmetric data sets, the adjacent values (the most extreme
observations that lie no more than 1.5 IQR beyond the quartiles)
should contain approximately 99% of the measurements.
 All points outside this range are represented by circles: these
observations are considered to be outliers or data points that
are not typical of the rest of values
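
A minimal sketch of the fence rule described in this section. Quartiles can be computed by several conventions; this example assumes the "inclusive" method of Python's `statistics.quantiles`, so other software may give slightly different values. The data are hypothetical and the function name is illustrative.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, fences and possible outliers for a box-and-whisker plot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Fence: the 1.5 IQR limit or the extreme value, whichever is closer in.
        "lower_fence": max(lower_limit, min(values)),
        "upper_fence": min(upper_limit, max(values)),
        "outliers": [v for v in values if v < lower_limit or v > upper_limit],
    }


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]          # hypothetical data
summary = box_plot_summary(scores)
print(summary)
```

Here the value 30 lies above Q3 + 1.5(IQR) and is flagged as a possible outlier, while the lower fence stops at the smallest observation.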

Figure () Boxplot of hygiene scores on day 1 of the Download Festival split by gender

One way scatter plot
 Uses: One-way scatter plots are the simplest type of graph
that can be used to summarize a set of continuous
observations.
 A one-way scatter plot uses a single horizontal axis to display
the relative position of each data point.
 An advantage of a one-way scatter plot is that since each
observation is represented individually, no information is lost
 A disadvantage is that it may be difficult to read (and to
construct) if values are close to each other.
Figure 2.1 Crude death rates for the United States, 1988.

Scatter plot (Two way)
 Both the variables must be measured either on interval or
ratio scales
 The data on both variables need to be available as absolute values
for each observation
 Data for the two variables are taken in pairs and displayed as dots
in relation to their values on both axes

Scatter plot (Two way)
Figure (3) Scatter diagram reveals pattern of strong positive correlation

Line diagram or trend curve
 Most appropriate type of chart for time series
 A trend line can be drawn for data pertaining to either a specific
time (e.g. 1995, 1996, 1997) or a period (e.g. 1985-1989, 1990-1994,
1995-)
 A line diagram is a useful way of conveying changes when long-
term trends in a phenomenon or situation need to be studied
 For example, a line diagram would be useful for illustrating trends
in births or death rates and changes in population size

Line diagram or trend curve
 Uses: A set of data measured on a continuous interval or a ratio
scale can be displayed using a line diagram or trend curve

Area Chart
 For variables measured on an interval or a ratio scale,
information about the subcategories of a variable can be
presented in the form of an area chart.
 This is plotted in the same way as a line diagram but with the
area under each line shaded to highlight the total magnitude
of the subcategory in relation to other subcategories.

Area Chart
Figure (1) Attitudes towards uranium mining
(area chart; horizontal axis: Age group, <25, 25-34, 35-44, 45-54, 55+;
vertical axis: Number of respondents, 0 to 45; series: Female, Male)

Exploratory data analysis (EDA)
 Exploratory Data Analysis (EDA) was heavily promoted by John
Tukey, whose book on the topic is widely regarded as a
statistical classic.
 Exploring data, by summarising and plotting variables and the
relationships between them, is an important step in
subsequent modelling and analysis.
 Exploring the data gives insight into the nature of the data set
and helps to find errors and anomalies.

An approach to data analysis that emphasizes the use of informal
graphical procedures not based on prior assumptions about the
structure of the data or on formal models for the data.
The Cambridge Dictionary of Statistics, 4th edition
 The essence of this approach is that, broadly speaking, data are assumed
to possess the following structure
Data = Smooth + Rough
where the ‘Smooth’ is the underlying regularity or pattern in the data.
 The objective of the exploratory approach is to separate the smooth from
the ‘Rough’ with minimal use of formal mathematics or statistical
methods.
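
To make the decomposition Data = Smooth + Rough concrete, the sketch below uses a running median of three as the smoother. This particular smoother is an assumption chosen for illustration, not a method prescribed by the lecture, and the data values are hypothetical.

```python
from statistics import median


def running_median3(values):
    """Smooth a sequence with a running median of three; end points are kept."""
    smooth = list(values)
    for i in range(1, len(values) - 1):
        smooth[i] = median(values[i - 1:i + 2])
    return smooth


data = [4, 5, 20, 6, 7, 8]                        # hypothetical series
smooth = running_median3(data)
rough = [d - s for d, s in zip(data, smooth)]     # what the smooth leaves behind
print(smooth)
print(rough)
```

The isolated value 20 ends up almost entirely in the rough, which is the kind of anomaly exploratory analysis is meant to bring to attention.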

 Two forms of EDA : Numerical summaries and plots
 Numerical summaries: Measures of central tendency, Measures
of spread, Measures of correlation, confidence intervals
 Plots: Histogram, Stem and leaf display, Box plot, scatter plot,
Bar plot,……..

Maps
 A graph used to plot variables by geographic locations
 Geographic information is an integral part of all statistical
data.
 Geographic areas have boundaries, names and other
information that make it possible to locate them on the
ground and relate statistical information to them.
 This spatial relationship is particularly important for census
data.
 Maps are the most efficient tools to visualize spatial patterns

Choropleth maps
 The most common type of map is the choropleth map, in which areas
are shaded in proportion to the value of the variable being displayed.
 This kind of map provides an easy way to visualize patterns across
space.
 Only ratios (i.e. proportions, rates or densities) can be mapped
with this technique

Adjusting the chart parameters

Type of data vs commonly used graphical presentation
Scales of measurement        Graphical presentation
Nominal or ordinal scale     Bar graph
                             Pie diagram
Interval or ratio scale      Trend diagram
                             Box plot
                             Histogram
                             Frequency polygon
                             Ogive curve
                             Scatter plot

Take home message
OLIVE JEAN DUNN, VIRGINIA A. CLARK, “Basic statistics: A Primer for the Biomedical
Sciences”, Fourth Edition

References
1. Wayne W. Daniel, Chad L. Cross, “Biostatistics: A Foundation for
Analysis in the Health Sciences”, 10th edition
2. Marcello Pagano, Kimberlee Gauvreau, “Principles of
Biostatistics”, 2nd edition
3. Michael J. Campbell, “Statistics at Square One”, 2nd edition
4. Ranjit Kumar, “Research Methodology”, 3rd edition
5. Olive Jean Dunn, Virginia A. Clark, “Basic Statistics: A Primer for
the Biomedical Sciences”, 4th edition
6. United Nations, Geneva, 2009, “Making Data Meaningful Part 2: A
Guide to Presenting Statistics”
