• Research is any process by which information is
systematically and carefully gathered for the purpose of
answering questions, examining ideas, or testing
theories.
• Numerical information collected as part of any research
is called data. Depending on the nature of the problem,
the data may relate to individuals, families, houses, etc.
• The data collected are known as observations. The
individual subjects upon whom the data are collected
are known as statistical units.
• The characteristics or events that are measured on a
subject in a research study are called variables,
because they vary (i.e., they take different values in
different subjects, or vary from one subject to
another).
• Variables are measured according to two broad
types of measurement scales: Numerical &
Categorical (otherwise known as Quantitative &
Qualitative).
4. Types of datasets and their measures
• Population - dataset consisting of all outcomes,
measurements, or responses of interest.
• Sample - dataset which is a subset of the
population.
• Parameter - a numerical measurement made
using the population.
• Statistic - a numerical measurement made using
the sample.
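The four terms above can be sketched in a few lines of Python. This is only an illustration: the ages are made-up values for a hypothetical population of ten people, and simple random sampling is just one possible selection method.

```python
import random
import statistics

# Hypothetical population: ages of all 10 members of a small club (made-up values).
population = [23, 35, 41, 29, 52, 38, 47, 31, 26, 44]

# Parameter: a numerical measurement computed from the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population, drawn here by simple random sampling.
rng = random.Random(0)  # fixed seed so the draw is repeatable
sample = rng.sample(population, k=4)

# Statistic: the same measurement computed from the sample only.
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The parameter is fixed for a given population, while the statistic changes from one sample to the next.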
5. Properties of Measurement
• Difference – Different numerals stand for different
values the variable can take
• Magnitude – A larger numeral indicates more of the
attribute than a smaller one
• Equal Appearing Interval – Different numerals
have equal distances from the preceding & succeeding
numerals
• True Zero – Zero has an absolute meaning
6. Level of Measurement (Measurement Scales)
The measurement levels are considered in the
following order:
(LOWEST) Nominal – Ordinal – Interval – Ratio (HIGHEST)
7. Nominal Scale
• Numbers serve as labels.
• Numbers are used only for identification, in
one-to-one correspondence with the objects
• Only permissible operation is counting
• Statistical analysis based on frequency counts
such as percentage, mode.
Example: gender, religion, locality, party
8. Ordinal Scale
• Ranking scale: numbers are assigned to indicate the relative
extent to which the objects possess some
characteristic
• Can determine whether an object has more or less of
a characteristic than another object, but not how
much more or less
• Any series of numbers can be assigned that preserves
the ordered relationship among objects.
• In addition to the counting operation of the nominal scale, this
scale permits statistics based on percentiles, quartiles and
the median
Example: social class, severity of a behavior disorder
9. Interval Scale
• The distance between any two adjacent scale values is fixed and equal
• It allows comparison of the differences between two
objects
• Meaningful addition and subtraction of scale values
• The zero point and the unit of measurement are
arbitrary
• In addition to the statistical techniques applied to
nominal and ordinal data, the arithmetic mean and
standard deviation are used
Example: Temperature (Fahrenheit or Celsius)
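A short Python sketch of why ratios are not meaningful on an interval scale. The two temperatures are arbitrary illustrative values. The same pair gives a different ratio in Celsius and in Fahrenheit, because each scale places its zero point at a different, arbitrary position.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Two illustrative temperatures (arbitrary values).
a_c, b_c = 20.0, 10.0
a_f, b_f = celsius_to_fahrenheit(a_c), celsius_to_fahrenheit(b_c)

# The "ratio" depends on the unit, so 20 degrees C is not "twice as hot" as 10 degrees C.
print(a_c / b_c)  # 2.0
print(a_f / b_f)  # 1.36

# Differences remain comparable: a 10-degree gap in Celsius is always an 18-degree gap in Fahrenheit.
print(a_c - b_c, a_f - b_f)  # 10.0 18.0
```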
10. Ratio Scale
• Possesses all the properties of nominal, ordinal and
interval scales
• This has an absolute zero point
• It is meaningful to calculate ratios of scale values.
• All statistical techniques can be applied.
Examples: income, age, weight, height, and so on
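The cumulative nature of the four scales can be sketched as a small lookup in Python. The function name and the lists of statistics are a simplified summary made for this illustration, not an exhaustive catalogue. Each scale inherits the statistics of the scales below it.

```python
# Scales from lowest to highest level of measurement.
ORDER = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that each scale adds on top of the lower scales (simplified summary).
ADDED_STATISTICS = {
    "nominal": ["frequency count", "percentage", "mode"],
    "ordinal": ["percentile", "quartile", "median"],
    "interval": ["arithmetic mean", "standard deviation"],
    "ratio": ["ratio of values"],
}

def permissible_statistics(scale):
    """Return the statistics usable at `scale`, including those inherited from lower scales."""
    level = ORDER.index(scale)
    result = []
    for name in ORDER[: level + 1]:
        result.extend(ADDED_STATISTICS[name])
    return result

for scale in ORDER:
    print(scale, "->", ", ".join(permissible_statistics(scale)))
```

For example, `permissible_statistics("ordinal")` includes the mode and the median but not the arithmetic mean.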
11. Categorical variables
• They can be placed into one of two (dichotomous) or
more (polychotomous) categories.
• Examples of dichotomous categorical variables:
Male / Female
Pregnant / Not pregnant
Smoker / Non-smoker
Married / Single
• However, many classifications require more than two
categories, e.g., marital status: Married / Single / Divorced /
Separated / Widowed; blood group: A / B / AB / O;
religion: Hindu / Christian / Muslim, etc. There is no
ordering of the categories.
• These are examples of nominal scale, in which the
values fall into unordered categories or classes.
12. Categorical variables
• But often there is a natural order, as with the
varying stages of cancer and social class.
• Example: degree of smoking can be further
divided into non-smokers / ex-smokers / light
smokers / heavy smokers. This is an example of
an ordinal scale.
• In ordinal scales, the categories bear an ordered
relationship to one another.
13. Numerical variables
• Also called quantitative or interval variables. They are
expressed as integers, fractions or decimals, in which
equal distances exist between successive intervals. Age,
systolic & diastolic blood pressure, and height are
examples of continuous variables.
• Numerical variables can be further divided into discrete &
continuous. A discrete numerical variable can take only
separate values over a range: the values differ by a fixed
amount, and no intermediate values are possible.
• Examples of discrete numerical variables are the number of
children, the number of ectopic heart beats, etc.
14. Numerical variables
• Data that represent measurable quantities but are
not restricted to taking on specified values such as
integers are known as continuous data.
• If the values of the measurement take any number in
a range, the data are said to be continuous.
• The difference between any two possible data values
can be very small. Common examples include height,
weight, temperature, etc.
• Continuous data can be reduced to several
15. Discrete data -- Gaps between possible values
Continuous data -- no gaps between possible values
16. Derived Variables
• Used to measure diseases in epidemiological studies.
• Rate, ratio and proportion.
Ratio: quantifies the magnitude of one occurrence or
condition relative to another.
Expresses the relationship between two numbers
Example: The ratio of males to females in Ethiopia
Proportion: quantifies occurrences in relation to the
population in which these occurrences take place
Expressed as a percentage
Example: The proportion of all births that were male
17. Derived Variables…
• Rate: expresses probability or risk of disease in
a defined population over a specified period
Considered to be a basic measure of disease occurrence
Example: The number of newly diagnosed breast
cancer cases per 100,000 women.
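The three derived variables can be sketched with a short calculation in Python. All counts below are hypothetical and chosen only to make the arithmetic easy to follow. They are not real figures for any country or disease.

```python
# Hypothetical counts, invented for illustration only.
male_births = 5100
female_births = 4900
new_cases = 120          # newly diagnosed cases during one year
women_at_risk = 200_000  # women in the defined population during that year

# Ratio: one quantity relative to another; the numerator is not part of the denominator.
sex_ratio = male_births / female_births

# Proportion: a part divided by the whole, often expressed as a percentage.
proportion_male = male_births / (male_births + female_births)
percent_male = 100 * proportion_male

# Rate: new occurrences in a defined population over a specified period,
# scaled here to "per 100,000 women per year".
rate_per_100k = new_cases * 100_000 / women_at_risk

print(f"ratio of male to female births: {sex_ratio:.2f}")
print(f"proportion of births that were male: {percent_male:.1f}%")
print(f"rate: {rate_per_100k:.0f} new cases per 100,000 women per year")
```

With these counts the ratio is about 1.04 males per female, the proportion is 51%, and the rate is 60 new cases per 100,000 women per year.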
18. Data collection
• There are two sources of data:
• Primary Data
Data measured or collected by the investigator or
the user directly from the source.
Data collected first hand by the investigator.
• Secondary Data
Data gathered or compiled from published and
unpublished sources or files.
19. Planning & Measuring
• Identify source and elements of the data.
• Decide whether to consider sample or census.
• If sampling is preferred, decide on sample size,
selection method,… etc
• Decide measurement procedure.
• Set up the necessary organizational structure.
• There are different methods of data collection.
20. Methods of collecting primary data
• Survey method
- Investigator makes personal contact with the
informants, either directly or indirectly, and collects the
data (Telephone Interview, Mail Questionnaires)
- Collected information is more reliable/accurate
• Experimental method
- Determines whether, and in what manner, variables are
related to each other
- Used by large-scale organizations with R & D departments
to determine cause-and-effect relationships.
- Example: to study the effect of fertilizer on a crop
21. Methods of collecting primary data…
• Observation method
-Investigator observes the overall nature of the event
and collects the required data.
-devices used are automatic recorder, motion picture
-ex: individual doing research on growth of plants,
behavior of bats, keenly observes and finds out the
-Gives more accurate result and supplementary
information. Costly and time consuming.
22. Secondary data sources
• Official publications of Government
• Publications of research institutions
• Professional bodies
• Economic, trade, and scientific journals
23. When the source is secondary data check that:
• The type and objective of the situations.
• The purpose for which the data were collected is
compatible with the present problem.
• The nature and classification of the data are appropriate to our
problem.
• There are no biases or misreporting in the published
data.
Note: Data which are primary for one user may be secondary for
another.
24. Descriptive Vs Inferential Statistics
Depending on how data can be used, statistics is
sometimes divided into two main areas or
branches:
• Descriptive Statistics:
is concerned with summary calculations, graphs, charts
and tables.
Generally characterizes or describes a set of data
elements by graphically displaying the information or
describing its central tendencies and how it is
distributed.
25. • Inferential Statistics:
consists of generalizing from samples to populations,
performing estimations and hypothesis tests, determining
relationships among variables, and making predictions.
Statistical techniques based on probability theory are
required.
• Example: the following is the number of malaria patients who have
been treated in a Hospital from 2001 to 2005: 3645; 4568; 5432; 6751;
If we calculate the average number of malaria patients per year from 2001 to 2005, then our
work belongs to the domain of descriptive statistics.
If we predict the number of malaria patients in the year 2015 to be 9917,
then our work belongs to the domain of inferential statistics.
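The contrast can be sketched in Python with hypothetical yearly counts (not the hospital figures above). The average summarizes the observed data. The forecast extends a least-squares trend line beyond the data, which is only one possible prediction method and an assumption of this sketch. The slides do not say how the 2015 figure was obtained.

```python
import statistics

# Hypothetical yearly patient counts, invented for illustration only.
years = [2001, 2002, 2003, 2004, 2005]
patients = [100, 120, 140, 160, 180]

# Descriptive: summarize the data we actually observed.
average = statistics.mean(patients)

def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Prediction: go beyond the observed data to a year that was not measured.
slope, intercept = fit_line(years, patients)
forecast_2015 = slope * 2015 + intercept

print("average patients per year, 2001-2005:", average)
print("trend-line forecast for 2015:", forecast_2015)
```

The average describes only the five observed years. The forecast is a statement about an unobserved year and therefore carries uncertainty that the average does not.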