2. Main purpose of data presentation or
displaying data
To make the findings easy and clear to understand
To provide comprehensive information in a succinct and
efficient way
3. Four ways of displaying data
Text
Tables
Graphs
Statistical measures (see the descriptive statistics
lecture)
4. Text
The most common method of communication in both quantitative
and qualitative research studies
Writing should be thematic, i.e. written around the various themes of
the report
Text should place the most important and significant findings in the
context of short- and longer-term trends.
It should explore relationships, causes and effects, to the extent that
they can be supported by evidence.
It should show readers the significance of the most current
information.
5. Good example of text
Net profits of non-financial companies in the Netherlands
amounted to 19 billion euros in the second quarter of 2008. This
is the lowest level for three years. Profits were 11 percent lower
than in the second quarter of 2007. The drop in net profits is the
result of two main factors: higher interest costs - the companies
paid more net interest - and lower profits of foreign subsidiaries.
Source: Statistics Netherlands
Findings should be integrated with the literature, citing references
using an accepted system of citation
6. Tables
A table is the simplest means of summarizing a set of
observations.
Tables are more informative when they are not overly
complex.
As a general rule, tables and the columns within them should
always be clearly labeled.
If units of measurement are involved, they should be
specified.
Uses: It can be used for all types of numerical data.
7. Table
Structure of tables
1. Title: indicates the table number and describes the type of
data the table contains
2. Row stub: the subcategories of a variable, listed along the y-
axis
3. Column headings: the subcategories of a variable, listed along
the x-axis
4. Body: the cells housing the analysed data
* Supplementary notes or footnotes
8. Structure of tables
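Table (x) Attitudes towards uranium by age

Attitudes towards        Age of respondent
uranium mining           <25    25-34   35-44   45-54   55+    Total
Strongly favourable      cell
Favourable
Uncertain
Unfavourable
Strongly unfavourable
Total

Source: …………………………Hypothetical data

Title: the "Table (x) ..." line above the table
Column headings: the age groups (<25 to 55+) and Total
Row stub: the attitude categories in the first column
Body: the cells (one is marked "cell")
Supplementary notes: the "Source: ..." line below the table
Subcategory: each individual row or column label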
The title should give a clear and accurate description
of the data.
It should answer the three questions “what”,
“where” and “when”.
It should be short and concise, and avoid using verbs.
9. Tables
Types of tables
1. Univariate (also known as frequency tables) –
containing information about one variable
2. Bivariate (also known as cross-tabulations) –
containing information about two variables
3. Polyvariate or multivariate – containing information
about more than two variables
10. Tables
Frequency distributions (Absolute frequency)
One type of table that is commonly used to evaluate data
For nominal and ordinal data, a frequency distribution
consists of a set of classes or categories together with the
number of observations in each
Table (1) Cases of Kaposi’s sarcoma for the first 2560 AIDS patients
reported to the Centers for Disease Control in Atlanta, Georgia
Kaposi’s sarcoma    Number of individuals
Yes                   246
No                   2314
11. Tables (Frequency table/Univariate)
Table (2) Cigarette consumption per person aged 18 or older, United
States, 1900-1960
Year    Number of cigarettes
1900      54
1910     151
1920     665
1930    1485
1940    1976
1950    3522
1960    4171
12. Tables
Frequency distributions
For discrete or continuous data, the range of values of the
observations must be broken down into a series of distinct,
non-overlapping intervals of equal width
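A minimal sketch of this grouping step, in Python. The readings, the starting value, the interval width and the function name frequency_table are illustrative assumptions, not data from the lecture.

```python
def frequency_table(values, start, width, n_classes):
    """Count integer values in equal-width, non-overlapping classes."""
    counts = [0] * n_classes
    for v in values:
        index = (v - start) // width      # which class the value falls in
        if 0 <= index < n_classes:
            counts[index] += 1
        else:
            raise ValueError(f"{v} is outside the table range")
    # Label each class by its lower and upper limits, e.g. "80-119".
    labels = [f"{start + i * width}-{start + (i + 1) * width - 1}"
              for i in range(n_classes)]
    return list(zip(labels, counts))

# Hypothetical serum cholesterol readings (mg/100ml), for illustration only.
readings = [95, 130, 142, 158, 161, 175, 188, 199, 204, 239, 251]
table = frequency_table(readings, start=80, width=40, n_classes=5)
for label, count in table:
    print(f"{label:>8} {count:>3}")
```

Because every value maps to exactly one class index, the counts always add up to the number of observations.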
13. Tables (Frequency table/Univariate)
Table (3) Absolute frequencies of serum cholesterol levels for 1067 US
males aged 25 to 34 years, 1976-1980
Cholesterol level    Number of
(mg/100ml)           men
80-119                 13
120-159               150
160-199               442
200-239               299
240-279               115
280-319                34
320-359                 9
360-399                 5
Total                1067
14. Tables
Relative Frequency
The proportion of the total number of observations that
appear in a given interval
Computed by dividing the number of values within an interval
by the total number of values in the table
Relative frequencies are useful for comparing sets of data
that contain unequal numbers of observations
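The calculation can be sketched in Python using the absolute frequencies from Table (3). The variable names are illustrative choices.

```python
# Absolute frequencies from Table (3): cholesterol level (mg/100ml) -> number of men.
counts = {
    "80-119": 13, "120-159": 150, "160-199": 442, "200-239": 299,
    "240-279": 115, "280-319": 34, "320-359": 9, "360-399": 5,
}

total = sum(counts.values())  # total number of observations in the table

# Relative frequency (%) = count in the interval / total count * 100
relative = {level: 100 * n / total for level, n in counts.items()}

for level, pct in relative.items():
    print(f"{level:>8} {counts[level]:>5} {pct:>6.1f}")
```

Rounded to one decimal place, these values match the relative frequencies reported for men aged 25-34 in Table (4).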
15. Tables (Bivariate/cross-tabulations)
Table (4) Absolute and relative frequencies of serum cholesterol levels
for 2294 US males, 1976-1980
Cholesterol     Age 25-34                   Age 55-64
level           Number    Relative          Number    Relative
(mg/100ml)      of men    frequency (%)     of men    frequency (%)
80-119              13         1.2               5         0.4
120-159            150        14.1              48         3.9
160-199            442        41.4             265        21.6
200-239            299        28.0             458        37.3
240-279            115        10.8             281        22.9
280-319             34         3.2             128        10.4
320-359              9         0.8              35         2.9
360-399              5         0.5               7         0.6
Total             1067       100.0            1227       100.0
16. Tables
Cumulative relative Frequency
The percentage of the total number of observations that
have a value less than or equal to the upper limit of the
interval
Calculated by summing the relative frequencies for the
specified interval and all the previous ones
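A short Python sketch of this calculation for the men aged 25-34 in Table (3). It accumulates the counts first and converts to percentages at the end, which is equivalent to summing the relative frequencies but avoids adding up rounded values; that ordering is a choice made here, not something the lecture prescribes.

```python
from itertools import accumulate

# Absolute frequencies for men aged 25-34, in interval order (Table 3).
levels = ["80-119", "120-159", "160-199", "200-239",
          "240-279", "280-319", "320-359", "360-399"]
counts = [13, 150, 442, 299, 115, 34, 9, 5]

total = sum(counts)
# Running total of counts up to and including each interval.
cumulative_counts = list(accumulate(counts))
# Express each running total as a percentage of all observations.
cumulative_pct = [100 * c / total for c in cumulative_counts]

for level, pct in zip(levels, cumulative_pct):
    print(f"{level:>8} {pct:>6.1f}")
```

The last interval always reaches 100%, which is a quick check that no observations were dropped.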
17. Tables (Bivariate/cross-tabulations)
Table (5) Relative and cumulative frequencies of serum cholesterol
levels for 2294 US males, 1976-1980
Cholesterol     Age 25-34                          Age 55-64
level           Relative         Cumulative        Relative         Cumulative
(mg/100ml)      frequency (%)    relative          frequency (%)    relative
                                 frequency (%)                      frequency (%)
80-119               1.2              1.2               0.4              0.4
120-159             14.1             15.3               3.9              4.3
160-199             41.4             56.7              21.6             25.9
200-239             28.0             84.7              37.3             63.2
240-279             10.8             95.5              22.9             86.1
280-319              3.2             98.7              10.4             96.5
320-359              0.8             99.5               2.9             99.4
360-399              0.5            100.0               0.6            100.0
Total              100.0                              100.0
18. Tables
Types of percentages
The use of percentages is a common procedure in the
interpretation of data
Three types of percentage
Row percentage
Column percentage
Total percentage
Use of rounding and decimals: numeric values should be
right-justified so that digits and decimal points line up
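The three types of percentage can be sketched in Python using the counts from Table (4), with cholesterol levels as rows and age groups as columns. The definitions in the comments are the standard ones; the helper name pct and the one-decimal rounding are assumptions made for this illustration.

```python
# Counts from Table (4): rows are cholesterol levels, columns are age groups.
levels = ["80-119", "120-159", "160-199", "200-239",
          "240-279", "280-319", "320-359", "360-399"]
ages = ["25-34", "55-64"]
counts = [
    [13, 5], [150, 48], [442, 265], [299, 458],
    [115, 281], [34, 128], [9, 35], [5, 7],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Row percentage: each cell as a share of its row total.
row_pct = [[pct(c, row_totals[i]) for c in row] for i, row in enumerate(counts)]
# Column percentage: each cell as a share of its column total.
col_pct = [[pct(c, col_totals[j]) for j, c in enumerate(row)] for row in counts]
# Total percentage: each cell as a share of the grand total.
total_pct = [[pct(c, grand_total) for c in row] for row in counts]

# Right-justify the numbers so that the decimal points line up.
for level, r, c, t in zip(levels, row_pct, col_pct, total_pct):
    print(f"{level:>8} {r[0]:>6.1f} {c[0]:>6.1f} {t[0]:>6.1f}")
```

Here the column percentages reproduce the relative frequencies in Table (4), while the row percentages answer a different question: of the men at a given cholesterol level, what share falls in each age group.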
19. What do you find out in this table?
Bad example (table image not reproduced)
20. What do you find out in this table?
Bad example and good example (table images not reproduced)
22. Graphs
A graph is a pictorial representation of numerical data, used to
summarize and display data
Should be designed to convey the general patterns in a set of
observations at a single glance
The most informative graphs are relatively simple and
self-explanatory.
They should be clearly labeled and units of measurement
should be indicated.
23. Bar Charts
A popular type of graph used to display a frequency
distribution for nominal or ordinal data.
Horizontal axis: various categories into which the
observations fall
Vertical axis (height of bar): the frequency or the relative
frequency of observations within the class
Uses: It is used to compare frequencies or values for different
categories or groups.
24. Bar Charts
The bars can be either vertically or horizontally oriented.
In the horizontal orientation, the text is easier to read.
It is also easier to compare the different values when the bars
are ordered by size from smallest to largest, rather than
displayed arbitrarily.
The bars should be much wider than the gaps between them.
The gaps should not exceed 40% of the bar width.
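A rough text-only sketch of the orientation and ordering advice, in Python. The categories and counts are hypothetical, and the function name text_bar_chart is an illustrative choice. The sketch shows horizontal bars sorted from smallest to largest; it does not attempt to model bar and gap widths.

```python
# Hypothetical category counts, for illustration only.
data = {"Walking": 42, "Cycling": 17, "Bus": 28, "Car": 63, "Train": 9}

# Order the bars by size, smallest to largest, rather than arbitrarily.
ordered = sorted(data.items(), key=lambda item: item[1])

def text_bar_chart(items, unit=2, symbol="#"):
    """Return one line per bar; each symbol stands for `unit` observations."""
    label_width = max(len(label) for label, _ in items)
    lines = []
    for label, value in items:
        bar = symbol * (value // unit)
        # Labels are right-justified so the bars share a common baseline.
        lines.append(f"{label:>{label_width}} | {bar} {value}")
    return lines

lines = text_bar_chart(ordered)
print("\n".join(lines))
```

With the bars laid out horizontally, the category labels read left to right, and sorting makes the ranking of categories visible at a glance.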
25. Simple Bar chart (vertical)
Figure (1) Cigarette consumption per person 18 years of age or older,
United States, 1900-1990
(Vertical axis: number of cigarettes, 0 to 4500; horizontal axis: year,
1900 to 1990 in ten-year steps)
28. Clustered Bar chart
Figure 2. Use of yoga among adults in the past 12 months, by age group: United States,
2002, 2007, and 2012
29. Stacked bar
Uses: A stacked bar chart can be used to show and compare
segments of totals.
Caution should be exercised when using this type of chart.
It can be difficult to analyze and compare, if there are too
many items in each stack or if many items are fairly close in
size.
32. Pie chart
A pie chart can be used to show the percentage distribution
of one variable, but only a small number of categories can be
displayed, usually not more than six.
There are 360 degrees in a circle and so the full circle can be
used to represent 100% or the total population.
The circle or pie is divided into sections in accordance with
the magnitude of each category
Each slice is proportionate to the size of each subcategory of a
frequency distribution
Uses: Pies can be drawn for both qualitative data and
variables measured on a continuous scale but grouped into
categories
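The slice sizes follow directly from the proportions. A minimal Python sketch using the counts from Table (1); the variable names are illustrative.

```python
# Counts from Table (1): Kaposi's sarcoma among the first 2560 AIDS patients.
counts = {"Yes": 246, "No": 2314}
total = sum(counts.values())

# Each slice gets a share of the 360 degrees equal to its share of the total.
angles = {category: 360 * n / total for category, n in counts.items()}
percentages = {category: 100 * n / total for category, n in counts.items()}

for category in counts:
    print(f"{category:>3} {percentages[category]:>5.1f}% "
          f"{angles[category]:>6.1f} degrees")
```

The angles add up to 360 degrees and the percentages to 100%, so the full circle represents the whole group.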
36. Histogram
A special type of bar graph showing frequency distribution for
continuous data
When we construct a histogram the values of the variable
under consideration are represented by the horizontal axis,
while the vertical axis has as its scale the frequency (or
relative frequency if desired) of occurrence.
True class limits are used so that the values are continuous, i.e. adjacent bars meet without gaps
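One way to obtain true class limits from the stated limits, sketched in Python with the intervals of Table (3). It assumes the same gap between every pair of adjacent classes (here 1 mg/100ml, i.e. values recorded to the nearest whole unit); the function name true_limits is an illustrative choice.

```python
# Stated class limits from Table (3), in mg/100ml.
classes = [(80, 119), (120, 159), (160, 199), (200, 239),
           (240, 279), (280, 319), (320, 359), (360, 399)]

def true_limits(classes):
    """Move each boundary to the midpoint of the gap between adjacent classes."""
    gap = classes[1][0] - classes[0][1]   # 120 - 119 = 1 unit of measurement
    half = gap / 2
    return [(low - half, high + half) for low, high in classes]

bars = true_limits(classes)
for low, high in bars:
    print(f"{low:>6.1f} to {high:>6.1f}")
```

After the adjustment, the upper limit of each class equals the lower limit of the next, so the histogram bars touch.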
38. Frequency polygon
Frequency distribution can be portrayed graphically by means
of a frequency polygon, which is a special kind of line graph.
To draw a frequency polygon we first place a dot above the
midpoint of each class interval represented on the horizontal
axis of a graph.
The height of a given dot above the horizontal axis corresponds
to the frequency of the relevant class interval.
Connecting the dots by straight lines produces the frequency
polygon.
39. Frequency polygon
Note that the polygon is brought down to the horizontal axis
at the ends at points that would be the midpoints if there
were an additional cell at each end of the corresponding
histogram.
This allows for the total area to be enclosed.
The total area under the frequency polygon equals the area under
the histogram.
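The construction and the area property can be checked numerically. The Python sketch below uses the classes and frequencies of Table (3) and, for simplicity, takes the stated class limits as given; the variable names are illustrative.

```python
# Class intervals (mg/100ml) and frequencies from Table (3).
classes = [(80, 119), (120, 159), (160, 199), (200, 239),
           (240, 279), (280, 319), (320, 359), (360, 399)]
freqs = [13, 150, 442, 299, 115, 34, 9, 5]
width = classes[1][0] - classes[0][0]     # common class width (40)

# One dot above the midpoint of each class interval.
midpoints = [(low + high) / 2 for low, high in classes]

# Close the polygon with zero-frequency points one class width beyond each end.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + freqs + [0]

# Area under the polygon: sum of trapezoids between consecutive dots.
polygon_area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
                   for i in range(len(xs) - 1))
# Area of the histogram: one bar of the common width per class.
histogram_area = width * sum(freqs)

print(polygon_area, histogram_area)
```

The two areas agree only because the polygon is brought down to the axis at the extra midpoints; leaving out those end points would lose part of the area.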
41. Ogive curve
The graph of a cumulative frequency distribution is called an ‘ogive’.
Figure (3) Cumulative percentage frequency polygon for leadership aptitude
scores for n= 30 football coaches
42. Stem-and-leaf
A properly constructed stem-and-leaf display, like a histogram, provides
information regarding the range of the data set, shows the location of the
highest concentration of measurements, and reveals the presence or
absence of symmetry.
An advantage of the stem-and-leaf display over the histogram is the fact
that it preserves the information contained in the individual
measurements.
Such information is lost when measurements are assigned to the class
intervals of a histogram.
Another advantage of stem-and-leaf displays is that they can be
constructed during the tallying process, so the intermediate step of
preparing an ordered array is eliminated.
43. Construction of Stem-and-leaf
Each measurement is partitioned into two parts: the first part is called the stem, and the second part is called the leaf.
The stem consists of one or more of the initial digits of the measurement,
and the leaf is composed of one or more of the remaining digits.
All partitioned numbers are shown together in a single display; the stems
form an ordered column with the smallest stem at the top and the largest
at the bottom.
The stem column lists all stems within the range of the data, even when a
measurement with that stem is not in the data set.
The rows of the display contain the leaves, ordered and listed to the right
of their respective stems.
When leaves consist of more than one digit, all digits after the first may be
deleted. Decimals when present in the original data are omitted in the
stem-and-leaf display.
The stems are separated from their leaves by a vertical line.
Thus, a stem-and-leaf display is also an ordered array of the data.
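The construction rules can be sketched in Python for two-digit or larger non-negative integers, taking the last digit as the leaf and the remaining digits as the stem. The scores are hypothetical, and the function name stem_and_leaf is an illustrative choice.

```python
def stem_and_leaf(values):
    """Build a stem-and-leaf display for non-negative integers.

    The stem is every digit except the last; the leaf is the last digit.
    """
    stems = {}
    for v in sorted(values):              # sorting orders the leaves in each row
        stems.setdefault(v // 10, []).append(v % 10)
    lines = []
    # List every stem in the range, even if no measurement has that stem.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical measurements, for illustration only.
scores = [62, 47, 55, 71, 58, 49, 74, 50, 78, 41, 93, 55]
display = stem_and_leaf(scores)
print("\n".join(display))
```

The empty row for stem 8 is kept on purpose: it shows the gap between the bulk of the scores and the single value in the nineties.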
44. Stem-and-leaf
Stem-and-leaf displays are most effective with relatively
small data sets.
As a rule, they are not suitable for use in annual reports or
other communications aimed at the general public.
They are primarily of value in helping researchers and
decision makers understand the nature of their data.
Histograms are more appropriate for externally circulated
publications
46. Box-and-Whisker Plots
Useful ways to display data (Exploratory data analysis)
At the centre of the plot is the median, which is surrounded by a
box the top and bottom of which are the limits within which the
middle 50% of observations fall.
Sticking out of the top and bottom of the box are two whiskers
which extend to the most and least extreme scores respectively.
The horizontal lines are called fences. The upper fence is at Q3 +
1.5(IQR) or the largest X, whichever is lower, where IQR = Q3 - Q1 is
the interquartile range.
The lower fence is at Q1 - 1.5(IQR) or the smallest X, whichever is
higher.
Values that are outside the fences are considered possible extreme
values, or outliers.
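The fence rule can be sketched in Python. The data are hypothetical, and the quartiles are computed with the standard library's linear-interpolation method ("inclusive"); other quartile conventions give slightly different values, so treat that choice as an assumption. The function name box_plot_summary is also illustrative.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return quartiles, fences and possible outliers for a box plot."""
    data = sorted(values)
    # Quartiles by linear interpolation (an assumed convention).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Upper fence: Q3 + 1.5(IQR) or the largest value, whichever is lower.
    upper_fence = min(q3 + 1.5 * iqr, data[-1])
    # Lower fence: Q1 - 1.5(IQR) or the smallest value, whichever is higher.
    lower_fence = max(q1 - 1.5 * iqr, data[0])
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "lower_fence": lower_fence, "upper_fence": upper_fence,
            "outliers": outliers}

# Hypothetical observations, for illustration only.
summary = box_plot_summary([2, 4, 5, 7, 8, 9, 10, 12, 13, 30])
print(summary)
```

In this example the largest value lies beyond Q3 + 1.5(IQR), so it is flagged as a possible outlier, while the lower fence simply sits at the smallest observation.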
47. Box-and-Whisker Plots
In fairly symmetric data sets, the adjacent values should
contain approximately 99% of the measurements.
All points outside this range are represented by circles: these
observations are considered to be outliers or data points that
are not typical of the rest of values
51. One way scatter plot
Uses: One-way scatter plots are the simplest type of graph
that can be used to summarize a set of continuous
observations.
A one-way scatter plot uses a single horizontal axis to display
the relative position of each data point.
An advantage of a one-way scatter plot is that since each
observation is represented individually, no information is lost
A disadvantage is that it may be difficult to read (and to
construct) if values are close to each other.
Figure 2.1 Crude death rates for the United States, 1988.
52. Scatter plot (Two way)
Both the variables must be measured either on interval or
ratio scales
The data on both the variables needs to be available in
absolute values for each observation
Data for both variables are taken in pairs and displayed as dots in
relation to their values on both axes
55. Line diagram or trend curve
Most appropriate type of chart for time series
A trend line can be drawn for data pertaining to either specific
times (e.g. 1995, 1996, 1997) or periods (e.g. 1985-1989, 1990-1994,
1995-)
A line diagram is a useful way of conveying changes when long-term
trends in a phenomenon or situation need to be studied
For example, a line diagram would be useful for illustrating trends
in births or death rates and changes in population size
56. Line diagram or trend curve
Uses: A set of data measured on a continuous interval or a ratio
scale can be displayed using a line diagram or trend curve
57. Area Chart
For variables measured on an interval or a ratio scale,
information about the subcategories of a variable can be
presented in the form of an area chart.
This is plotted in the same way as a line diagram but with the
area under each line shaded to highlight the total magnitude
of the subcategory in relation to other subcategories.
59. Exploratory data analysis (EDA)
Exploratory Data Analysis (EDA) was heavily promoted by John
Tukey, whose book on the topic is widely regarded as a
statistical classic.
Exploring data, by summarising and plotting variables and the
relationships between them, is an important step in
subsequent modelling and analysis.
Exploring the data gives insight into the nature of the data set
and helps to reveal errors and anomalies.
60. Exploratory data analysis (EDA)
An approach to data analysis that emphasizes the use of informal
graphical procedures not based on prior assumptions about the
structure of the data or on formal models for the data.
The Cambridge Dictionary of Statistics, 4th edition
The essence of this approach is that, broadly speaking, data are assumed
to possess the following structure
Data = Smooth + Rough
where the ‘Smooth’ is the underlying regularity or pattern in the data.
The objective of the exploratory approach is to separate the smooth from
the ‘Rough’ with minimal use of formal mathematics or statistical
methods.
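The decomposition can be illustrated in Python with one simple smoother. The definition above does not say how the smooth is obtained; a running median of three is used here as one resistant smoother from the EDA tradition, and both that choice and the data are assumptions made for this sketch.

```python
from statistics import median

def running_median_of_3(values):
    """Smooth a sequence with a running median of three; ends are copied."""
    smooth = list(values)
    for i in range(1, len(values) - 1):
        smooth[i] = median(values[i - 1:i + 2])
    return smooth

# Hypothetical series with one spike, for illustration only.
data = [4, 5, 5, 6, 20, 7, 8, 8, 9]
smooth = running_median_of_3(data)
# The rough is whatever is left after removing the smooth.
rough = [d - s for d, s in zip(data, smooth)]

print("smooth:", smooth)
print("rough: ", rough)
```

The smooth follows the steady upward pattern, while the spike ends up almost entirely in the rough, where it is easy to spot as an anomaly.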
61. Exploratory data analysis (EDA)
Two forms of EDA : Numerical summaries and plots
Numerical summaries: Measures of central tendency, Measures
of spread, Measures of correlation, confidence intervals
Plots: histogram, stem-and-leaf display, box plot, scatter plot,
bar plot, ...
62. Maps
A graph used to plot variables by geographic locations
Geographic information is an integral part of all statistical
data.
Geographic areas have boundaries, names and other
information that make it possible to locate them on the
ground and relate statistical information to them.
This spatial relationship is particularly important for census
data.
Maps are the most efficient tools to visualize spatial patterns
63. Choropleth maps
The most common type of map is the choropleth map, in which areas
are shaded in proportion to the value of the variable being displayed.
This kind of map provides an easy way to visualize patterns across
space.
Only ratios (i.e. proportions, rates or densities) can be mapped with
this technique.
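A small Python sketch of preparing data for a choropleth map: raw counts are first converted to a ratio (here population density) and each area is then assigned a shading class. The regions, figures, class boundaries and the function name shade_class are all hypothetical; grouping values into a few shading classes is one common simplification of shading in proportion to the value.

```python
# Hypothetical regions: (population, area in square km), for illustration only.
regions = {
    "North": (120_000, 2_400),
    "South": (450_000, 1_500),
    "East": (80_000, 4_000),
    "West": (300_000, 3_000),
}

# Map a ratio (people per square km), not the raw population count.
density = {name: pop / area for name, (pop, area) in regions.items()}

def shade_class(value, breaks):
    """Return the shading class index: 0 is the lightest shade."""
    return sum(value >= b for b in breaks)

breaks = [50, 100, 200]   # assumed class boundaries (people per square km)
shades = {name: shade_class(d, breaks) for name, d in density.items()}

for name in regions:
    print(f"{name:>5} {density[name]:>7.1f} shade {shades[name]}")
```

Mapping the raw population instead would mostly highlight which regions are large, which is why the technique is limited to ratios.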
73. Type of data Vs Commonly used graphical
presentation
Scales of measurement       Graphical presentation
Nominal or ordinal scale    Bar graph, pie diagram
Interval or ratio scale     Trend diagram, box plot, histogram,
                            frequency polygon, ogive curve, scatter plot
74. Take home message
OLIVE JEAN DUNN, VIRGINIA A. CLARK, “Basic statistics: A Primer for the Biomedical
Sciences”, Fourth Edition
75. References
1. Wayne W. Daniel, Chad L. Cross, “Biostatistics: A Foundation for
Analysis in the Health Sciences”, 10th edition
2. Marcello Pagano, Kimberlee Gauvreau, “Principles of
Biostatistics”, 2nd edition
3. Michael J. Campbell, “Statistics at Square One”, 2nd edition
4. Ranjit Kumar, “Research Methodology”, 3rd edition
5. Olive Jean Dunn, Virginia A. Clark, “Basic Statistics: A Primer for
the Biomedical Sciences”, 4th edition
6. United Nations, Geneva, 2009, “Making Data Meaningful Part 2: A
guide to presenting statistics”