Data collection & management

Medicine & Society II

Collecting & Managing
Data
Dr Azmi Mohd Tamil
Dept. of Community Health,
Faculty of Medicine,
UKM

notes partially based on a lecture by
Assoc. Prof. Dr. Roslina Abd. Manap

Sampling

 Choosing a relatively small subset such
that it can adequately represent the
entire spectrum of population subjects
 Aim to extrapolate results back to a
substantially larger population
 Sampling saves time and money, and
improves efficiency and safety.

SAMPLING

PROBABILITY SAMPLING
 equal chance of being selected
• simple random,
• systematic,
• stratified,
• multistage,
• cluster

NON-PROBABILITY SAMPLING
• convenience,
• quota,
• purposive.

SAMPLING &
TYPE OF POPULATION

 Selection must be representative of the population
 Which sampling method?
- simple random sampling (may not be
practical in a national study)
- stratified random sampling
(in a heterogeneous population with distinct strata)
- multistage sampling
(national-state-district-sub district-village)
- cluster sampling
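
Three of the probability sampling methods listed above can be sketched in a few lines of Python. This is a minimal illustration only: the sampling frame of 100 subject IDs, the "urban"/"rural" strata, the 10% sampling fraction and the function names are all assumptions made for the example and do not come from the lecture.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n subjects so that every subject has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=0):
    """Pick a random start, then take every k-th subject (k = N // n)."""
    k = len(population) // n
    rng = random.Random(seed)
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        size = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical sampling frame: subject IDs 1 to 100.
subjects = list(range(1, 101))
# Hypothetical strata: 70 urban and 30 rural subjects.
strata = {"urban": subjects[:70], "rural": subjects[70:]}

srs = simple_random_sample(subjects, 10)
sys_sample = systematic_sample(subjects, 10)
strat = stratified_sample(strata, 0.1)
```

Fixing the seed makes the draw reproducible, which helps when the sampling procedure has to be documented or audited.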

Data Collection

 Data collection begins after
deciding on design of study and
the sampling strategy

Data Collection

 Sample subjects are identified and the
required individual information is
obtained in an item-wise and structured
manner.

Data Collection

 Information is collected on certain
characteristics, attributes and the
qualities of interest from the samples
 These data may be quantitative or
qualitative in nature.

Types of Variables

 Qualitative - categorised based on
characteristics which differentiate the subjects e.g.
ethnicity - Malay, Chinese, Indian etc.
Qualitative variables can be classed into
nominal & ordinal.
 Quantitative - numerical values collected
by observation, by measurement or by
counting. Can either be discrete or
continuous.

Variable Classification

Quantitative
 discrete - from counting e.g. no. of
children/wives
 continuous - can be in fractions, from
measurement e.g. blood pressure,
haemoglobin level.

Qualitative
 Nominal - no rank nor specific order
e.g. ethnicity; M, C, I & O (Malay,
Chinese, Indian & Others).
 Ordinal - has rank/order between
categories but the difference cannot be
measured.

Types of Data

Table 1.1 Examples of types of data

Quantitative
Continuous                             Discrete
Blood pressure, height, weight, age    Number of children
                                       Number of attacks of asthma per week

Categorical
Ordinal (ordered categories)           Nominal (unordered categories)
Grade of breast cancer                 Sex (male/female)
Better, same, worse                    Alive or dead
Disagree, neutral, agree               Blood group O, A, B, AB

http://www.bmj.com/collections/statsbk/

SO WHAT!

So what’s the big deal about data
types?

Statistical Tests - Qualitative

Type of Data Dictates Type of
Analysis - Quantitative
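
As one small illustration of how the type of data dictates the analysis, the sketch below chooses a descriptive summary according to data type. The mapping used here (counts for nominal, median for ordinal, mean and standard deviation for quantitative) is a common convention and is an assumption of this example, not a table from the lecture. The function name `summarise` and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean, median, stdev

def summarise(values, data_type):
    """Return a descriptive summary suited to the type of data."""
    if data_type == "nominal":
        # Unordered categories: only counts per category are meaningful.
        return dict(Counter(values))
    if data_type == "ordinal":
        # Ordered categories entered as rank codes: the median respects order.
        return {"median": median(values)}
    if data_type in ("discrete", "continuous"):
        # Numerical values: mean and standard deviation can be computed.
        return {"mean": mean(values), "sd": stdev(values)}
    raise ValueError(f"unknown data type: {data_type}")

# Hypothetical data for each type.
blood_groups = summarise(["A", "O", "O", "B"], "nominal")
outcome = summarise([1, 2, 2, 3, 3], "ordinal")  # 1=worse, 2=same, 3=better
systolic = summarise([120, 130, 140], "continuous")
```

Asking for a mean of blood groups makes no sense, which is the practical reason the data type has to be settled before the analysis is planned.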

Data Collection Techniques

 Use available information
 Observation
 Interviews
 Questionnaires
 Focus group discussion

Using Available
Information

 Existing Records
• Hospital records - case notes
• National registry of births & deaths
• Census data
• Data from other surveys

Disadvantages of using
existing records

 Incomplete records
 Cause of death may not be verified by a
physician/MD
 Missing vital information
 Difficult to decipher
 May not be representative of the target
group - only severe cases go to hospital

Disadvantages of using
existing records

 Delayed publication - obsolete data
 Different methods of data recording
between institutions, states, countries,
making comparison & pooling of data
difficult
 Comparisons across time difficult due to
difference in classification, diagnostic
tools etc

Advantages of using
existing records

 Cheap
 convenient
 in some situations, it is the only data
source e.g. accidents & suicides

Observation

 Involves systematically selecting,
watching & recording behaviour and
characteristics of living beings, objects
or phenomena
 Done using defined scales
 Participant observation e.g. PEF and
asthma symptom diary
 Non-participant observation e.g.
cholesterol levels

Interviews

 Oral questioning of respondents either
individually or as a group.
 Can be loosely structured, or highly
structured using a questionnaire

Administering Written
Questionnaires

 Self-administered
 via mail
 by gathering respondents in one place and
getting them to fill it in
 hand-delivering and collecting them later
 Large non-response can distort results

Questionnaires

 Influenced by education & attitude of
respondent esp. for self-administered
 Interviewers need to be trained
 Open-ended vs closed-ended questions
 The need for pre-testing or a pilot study

Issues at stake

 Content validity
 Structural validity
 Criterion validity

Focus group discussion

 Selecting relevant parties to the
research questions at hand and
discussing with them in focus groups
 examples in your own field of interest?

Source of biases during
data collection

 Defective instruments
• closed-ended questions with a poor choice of
options
• open-ended questions with no guidelines
• vaguely-phrased questions
• illogical sequences of questions
• weighing scales that are not standardised

Source of biases during
data collection

 Observer bias
• reporting of radiographs
 Effect of interview on respondent
 Attitude of respondent
• cough may be ignored by a smoker
• stigmatised diseases may not be disclosed

Plan for data collection

 Permission to proceed
 Logistics - who will collect what, when
and with what resources
 Quality control

Quality of Data

 How well do the variables designed for
the study represent the phenomena of
interest?
 E.g. how well does FBS (fasting blood
sugar) represent control of diabetes?

Accuracy & Reliability

 Accuracy - the degree to which a
measurement actually measures the
characteristic it is supposed to measure
 Reliability is the consistency of replicate
measures

Accuracy & Reliability

 Both are reduced by random error and
systematic error from the same sources
of variability;
• the data collectors
• the respondents
• the instrument
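
The difference between the two ideas can be made concrete with repeated readings of a known reference. In the sketch below, bias (mean reading minus true value) stands in for accuracy and the standard deviation of the replicates stands in for reliability. The two weighing scales, their readings and the 70 kg reference weight are hypothetical values chosen for illustration.

```python
from statistics import mean, stdev

def bias_and_spread(readings, true_value):
    """Return (bias, spread) for replicate readings of a known quantity.

    bias   = mean reading - true value (reflects systematic error, accuracy)
    spread = standard deviation of the readings (reflects random error,
             reliability)
    """
    return mean(readings) - true_value, stdev(readings)

TRUE_WEIGHT = 70.0  # hypothetical reference weight in kg

# Hypothetical scale A: readings agree closely but sit about 1 kg too high.
scale_a = [70.9, 71.1, 71.0, 70.8, 71.2]
# Hypothetical scale B: readings centre on the true value but scatter widely.
scale_b = [68.0, 72.5, 69.5, 71.0, 69.0]

bias_a, spread_a = bias_and_spread(scale_a, TRUE_WEIGHT)
bias_b, spread_b = bias_and_spread(scale_b, TRUE_WEIGHT)
```

Scale A is reliable but inaccurate, so recalibrating it would help. Scale B is accurate on average but unreliable, so repeating the measurement and averaging would help.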

Strategies to enhance
accuracy & reliability

 Standardise procedures and
measurement methods
 training & certifying the data collectors
 Repetition
 Blinding

Data handling

 Check the data gathered
 storing of data - backup, backup &
backup some more!

Data Management

 Data processing
• Categorising
• Coding
• Data entry
• Verification/validation

Variable Labels

• Unique
• Not more than 8 characters
• Consists of letters and numbers only
• Begins with a letter instead of a number.
• Try to give a label that means something
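
The naming rules above are easy to check automatically before data entry starts. The following is a minimal sketch: the function name `check_labels` and the example labels are hypothetical, and treating labels that differ only in letter case as duplicates is an assumption of this example.

```python
import re

# One letter followed by up to seven letters or digits: at most 8 characters,
# letters and numbers only, beginning with a letter.
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")

def check_labels(labels):
    """Return {label: problem} for every label that breaks a naming rule."""
    problems = {}
    seen = set()
    for label in labels:
        if not LABEL_PATTERN.fullmatch(label):
            problems[label] = "must be 1-8 letters/digits, starting with a letter"
        elif label.lower() in seen:
            # Assumption: uniqueness is judged ignoring letter case.
            problems[label] = "duplicate label"
        seen.add(label.lower())
    return problems

# Hypothetical variable labels for a study data set.
proposed = ["age", "sex", "bp_sys", "1stvisit", "haemoglobin", "age"]
problems = check_labels(proposed)
```

Here `bp_sys` fails because of the underscore, `1stvisit` because it begins with a number, `haemoglobin` because it is longer than 8 characters, and the second `age` because it is not unique.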

Coding

• Determine the coding to be used for each
variable.
• For qualitative variables, it is recommended
to use numerical codes to represent the
groups, e.g. 1 = male and 2 = female; this
will also simplify the data entry process.
The “danger” of using string/text is that a
lowercase “male” is treated as different from a capitalised “Male”.
• see Table I.
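
The problem with free-text entry can be shown with a short sketch. The codes 1 = male and 2 = female follow the example above. The raw responses and the variable names are hypothetical.

```python
from collections import Counter

SEX_CODES = {"male": 1, "female": 2}  # 1 = male, 2 = female

# Hypothetical responses typed in as text by different data entry clerks.
raw = ["male", "Male", "female", "MALE ", "Female"]

# Counted as text, every spelling variant becomes its own group.
raw_groups = Counter(raw)

# Tidy each response (trim spaces, ignore case), then convert it to its code.
coded = [SEX_CODES[response.strip().lower()] for response in raw]
coded_groups = Counter(coded)
```

Counting the raw text gives five separate "groups" for what are really two categories, while counting the codes gives three males and two females.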

Coding for Dichotomous
Variable

 It is advisable to use
1=present, 0=absent.
 Or 1=higher risk, 0=lower risk

Coding for Missing Value

 Also called blank responses
 Usually required only for qualitative
variables
 Conventionally coded using a value that
is not part of a valid response. For
example;
• Gender; M=1, F=2, MV (missing value)=9
• Ethnicity in East Malaysia; codes 1 to 14 for
races, MV=99
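
A missing-value code only works if it is assigned consistently at entry and excluded again at analysis. The sketch below uses the gender coding from the example above (M=1, F=2, MV=9). The function names and the sample responses are hypothetical.

```python
GENDER_CODES = {"M": 1, "F": 2}
MISSING = 9  # not a valid gender code, so it cannot be confused with a response

def code_gender(response):
    """Convert a questionnaire response to its numerical code."""
    if response is None or not response.strip():
        return MISSING  # blank response
    return GENDER_CODES[response.strip().upper()]

def valid_only(codes, missing=MISSING):
    """Drop missing-value codes before analysis."""
    return [code for code in codes if code != missing]

# Hypothetical responses, including blank ones.
responses = ["M", "f", "", "  ", None]
codes = [code_gender(r) for r in responses]
analysable = valid_only(codes)
```

If the missing-value code is left in by mistake, it is treated as a real category or, for a numerical variable, pulls the mean towards 9 or 99, so the exclusion step matters as much as the coding step.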

Advantage of Coding

 Reduce time for “data entry”.
 Make analysis possible e.g. SPSS won't
analyse string responses of more than 8
characters
 Need a proper coding manual
 Guides on defining variables and coding in
applications such as SPSS and Excel are
available at the dept website
http://161.142.92.104/spss/
http://161.142.92.104/excel/

Data Entry

 Independent operator verification
 Random check of data entered against
the original
 <5% error by convention
 Some checks are built in by the
software e.g. EpiInfo
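
Independent operator verification can be sketched as a comparison of two separately entered copies of the same records. This is an illustration under stated assumptions: the record layout, the function names and the data are hypothetical, and the error rate is taken here as mismatched values divided by all values entered, which is only one possible reading of the 5% convention.

```python
def compare_entries(first, second):
    """List every mismatch as (record index, field, first value, second value)."""
    mismatches = []
    for index, (a, b) in enumerate(zip(first, second)):
        for field in a:
            if a[field] != b.get(field):
                mismatches.append((index, field, a[field], b.get(field)))
    return mismatches

def error_rate(mismatches, n_records, n_fields):
    """Proportion of entered values on which the two operators disagree."""
    return len(mismatches) / (n_records * n_fields)

# Hypothetical records keyed in independently by two operators.
operator_1 = [
    {"id": 1, "sex": 1, "age": 25},
    {"id": 2, "sex": 2, "age": 34},
    {"id": 3, "sex": 1, "age": 51},
]
operator_2 = [
    {"id": 1, "sex": 1, "age": 25},
    {"id": 2, "sex": 2, "age": 43},  # digits transposed during entry
    {"id": 3, "sex": 1, "age": 51},
]

mismatches = compare_entries(operator_1, operator_2)
rate = error_rate(mismatches, n_records=3, n_fields=3)
acceptable = rate < 0.05
```

Each mismatch is then resolved by going back to the original form, which is the same check as the random comparison against the original described above.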
