Data collection & management
1. Medicine & Society II
Collecting & Managing
Data
Dr Azmi Mohd Tamil
Dept. of Community Health,
Faculty of Medicine,
UKM
notes partially based on a lecture by
Assc. Prof. Dr. Roslina Abd. Manap
2. Sampling
Choosing a relatively small subset such
that it can adequately represent the
entire spectrum of population subjects
Aim to extrapolate results back to a
substantially larger population
while saving time and money and
improving efficiency and safety.
3. SAMPLING
PROBABILITY SAMPLING
(equal chance of being selected)
• simple random,
• systematic,
• stratified,
• multistage,
• cluster
NON-PROBABILITY SAMPLING
• convenience,
• quota,
• purposive.
4. SAMPLING &
TYPE OF POPULATION
Selection must be representative of the
population; the sampling method depends
on the type of population:
- simple random sampling (may not be
practical in national study)
- stratified random sampling
(in heterogeneous pop./stratum)
- multistage sampling
(national-state-district-sub district-village)
- cluster sampling
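The probability sampling methods listed above can be sketched in a few lines. This is a minimal illustration, not part of the original lecture: the sampling frame of 100 subject IDs, the odd/even strata and the function names are all hypothetical, and only Python's standard `random` module is used.

```python
import random

def simple_random_sample(frame, n, rng):
    """Simple random sampling: every subject has the same chance."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Systematic sampling: random start, then every k-th subject."""
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Stratified sampling: a simple random sample within each stratum."""
    strata = {}
    for subject in frame:
        strata.setdefault(stratum_of(subject), []).append(subject)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(2024)      # fixed seed so the draw can be repeated
frame = list(range(1, 101))    # hypothetical sampling frame: IDs 1..100

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
# Hypothetical strata: odd and even IDs stand in for e.g. ethnic groups.
strat = stratified_sample(frame, lambda s: "odd" if s % 2 else "even", 5, rng)
```

Multistage and cluster sampling would repeat the same idea at each level (state, district, village), drawing units first and subjects within them afterwards.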
5. Data Collection
Data collection begins after
deciding on design of study and
the sampling strategy
6. Data Collection
Sample subjects are identified and the
required individual information is
obtained in an item-wise and structured
manner.
7. Data Collection
Information is collected on certain
characteristics, attributes and the
qualities of interest from the samples
These data may be quantitative or
qualitative in nature.
8. Types of Variables
Qualitative - categorised based on
characteristics which differentiate it e.g.
ethnic - Malay, Chinese, Indian etc.
Qualitative variables can be classed into
nominal & ordinal.
Quantitative - numerical values collected
by observation, by measurement or by
counting. Can either be discrete or
continuous.
9. Variable Classification
Qualitative
Nominal - no rank nor specific order
e.g. ethnic; M, C, I & O.
Ordinal - has rank/order between
categories but the difference cannot be
measured.
Quantitative
discrete - from counting i.e. no. of
children/wives
continuous - can be in fractions, from
measurement e.g. blood pressure,
haemoglobin level.
10. Types of Data
Table 1.1 Examples of types of data

Quantitative
Continuous                           | Discrete
Blood pressure, height, weight, age  | Number of children
                                     | Number of attacks of asthma per week

Categorical
Ordinal (ordered categories)         | Nominal (unordered categories)
Grade of breast cancer               | Sex (male/female)
Better, same, worse                  | Alive or dead
Disagree, neutral, agree             | Blood group O, A, B, AB

http://www.bmj.com/collections/statsbk/
13. Type of Data Dictates Type of
Analysis - Quantitative
14. Data Collection Techniques
Use available information
Observation
Interviews
Questionnaires
Focus group discussion
15. Using Available
Information
Existing Records
• Hospital records - case notes
• National registry of births & deaths
• Census data
• Data from other surveys
16. Disadvantages of using
existing records
Incomplete records
Cause of death may not be verified by a
physician/MD
Missing vital information
Difficult to decipher
May not be representative of the target
group - only severe cases go to hospital
17. Disadvantages of using
existing records
Delayed publication - obsolete data
Different methods of data recording
between institutions, states, countries,
making comparison & pooling of data
difficult
Comparisons across time difficult due to
difference in classification, diagnostic
tools etc
18. Advantages of using
existing records
Cheap
convenient
in some situations, it is the only data
source e.g. accidents & suicides
19. Observation
Involves systematically selecting,
watching & recording behaviour and
characteristics of living beings, objects
or phenomena
Done using defined scales
Participant observation e.g. PEF and
asthma symptom diary
Non-participant observation e.g.
cholesterol levels
20. Interviews
Oral questioning of respondents either
individually or as a group.
Can be done loosely or highly structured
using a questionnaire
21. Administering Written
Questionnaires
Self-administered
via mail
by gathering them in one place and
getting them to fill it in
hand-delivering and collecting them later
Large non-response can distort results
22. Questionnaires
Influenced by education & attitude of
respondent esp. for self-administered
Interviewers need to be trained
open ended vs close ended
the need for pre-testing or pilot study
27. Focus group discussion
Selecting relevant parties to the
research questions at hand and
discussing with them in focus groups
examples in your own field of interest?
28. Source of biases during
data collection
Defective instruments
• close ended questions with poor choice of
options
• open ended questions with no guidelines
• vaguely-phrased questions
• illogical sequences of questions
• weighing scales that are not standardised
29. Source of biases during
data collection
Observer bias
• reporting of radiographs
Effect of interview on respondent
Attitude of respondent
• cough may be ignored by a smoker
• stigmatised diseases may not be disclosed
30. Plan for data collection
Permission to proceed
Logistics - who will collect what, when
and with what resources
Quality control
31. Quality of Data
How well do the variables designed for
the study represent the phenomena of
interest?
E.g. How well does FBS represent
control of diabetes
33. Accuracy & Reliability
Accuracy - the degree to which a
measurement actually measures the
characteristic it is supposed to
measure
Reliability is the consistency of replicate
measures
36. Accuracy & Reliability
Both are reduced by random error and
systematic error from the same sources
of variability:
• the data collectors
• the respondents
• the instrument
37. Strategies to enhance
accuracy & reliability
Standardise procedures and
measurement methods
training & certifying the data collectors
Repetition
Blinding
38. Data handling
Check the data gathered
storing of data - backup, backup &
backup some more!
39. Data Management
Data processing
• Categorising
• Coding
• Data entry
• Verification/validation
41. Variable Labels
• Unique
• Not more than 8 characters
• Consists of letters and numbers only
• Begins with a letter instead of a number.
• Try to give a label that means something
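The naming rules above can be checked automatically before a dataset is set up. The sketch below is only an illustration under assumptions: the function name `check_names` and the example names are hypothetical, and treating names that differ only in letter case as duplicates is an assumption, not something the slide states.

```python
import re

# 1 to 8 characters, letters and digits only, starting with a letter.
NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")

def check_names(names):
    """Return a dict of {name: problem} for names that break the rules."""
    problems = {}
    seen = set()
    for name in names:
        if not NAME_RULE.fullmatch(name):
            problems[name] = "must be 1-8 letters/digits, starting with a letter"
        elif name.lower() in seen:
            # Assumption: uniqueness is checked ignoring letter case.
            problems[name] = "duplicate name"
        seen.add(name.lower())
    return problems

# Hypothetical variable names for a blood pressure study.
proposed = ["age", "sex", "2ndvisit", "bloodpressure", "bp_sys", "age"]
problems = check_names(proposed)
```

Here `2ndvisit` starts with a digit, `bloodpressure` is longer than 8 characters, `bp_sys` contains an underscore, and the second `age` is a duplicate.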
42. Coding
• Determine the coding to be used for each
variable.
• For qualitative variables, it is recommended
to use numerical-codes to represent the
groups; eg. 1 = male and 2 = female, this
will also simplify the data entry process.
• The “danger” of using string/text is that
software treats a lowercase “male” as a
different value from a capitalised “Male”.
• see Table I.
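The problem with free-text entries can be seen in a small sketch. The raw responses below are made up for illustration, and `encode_sex` is a hypothetical helper; the codes 1 = male and 2 = female follow the slide.

```python
from collections import Counter

SEX_CODES = {"male": 1, "female": 2}   # coding scheme from the slide

def encode_sex(raw):
    """Convert a text response to its numeric code, ignoring case and spaces."""
    return SEX_CODES[raw.strip().lower()]

# Hypothetical text entries typed by different data entry operators.
raw = ["male", "Male", "female", "MALE ", "Female"]

text_counts = Counter(raw)                          # each spelling is its own group
code_counts = Counter(encode_sex(r) for r in raw)   # only two groups remain
```

Counting the raw text gives five separate "groups", while counting the numeric codes gives the intended two.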
44. Coding for Dichotomous
Variable
It is advisable to use
1=present, 0=absent.
Or 1=higher risk, 0=lower risk
45. Coding for Missing Value
or blank responses
Usually required only for qualitative
variables
Conventionally coded using a value that
is not part of a valid response. For
example;
• Gender; M=1, F=2, MV=9
• Ethnic in East Malaysia; codes 1 to 14 for
races, MV=99
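A missing-value code only works if it is excluded again at analysis time. The sketch below uses the gender coding from the slide (M=1, F=2, MV=9); the responses and the helper names `code_gender` and `tabulate` are hypothetical, and mapping unrecognised entries to the missing code is an assumption made to keep the example short.

```python
from collections import Counter

GENDER_CODES = {"M": 1, "F": 2}
GENDER_MISSING = 9   # MV code from the slide; not a valid response

def code_gender(response):
    """Return 1 for M, 2 for F, and 9 for blank (or unrecognised) responses."""
    return GENDER_CODES.get(response.strip().upper(), GENDER_MISSING)

def tabulate(codes, missing_code):
    """Count valid codes only and report how many values were missing."""
    valid = [c for c in codes if c != missing_code]
    return Counter(valid), len(codes) - len(valid)

# Hypothetical questionnaire responses, two of them left blank.
responses = ["M", "F", "", "f", " ", "M"]
codes = [code_gender(r) for r in responses]
counts, n_missing = tabulate(codes, GENDER_MISSING)
```

If the code 9 were left in, it would appear as a third gender category in any frequency table.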
46. Advantage of Coding
Reduce time for “data entry”.
Make analysis possible e.g. SPSS won't
analyse string responses of more than 8
characters
Need a proper coding manual
Guides on how to define variables and
coding for applications such as SPSS and
Excel are available at the dept website
http://161.142.92.104/spss/
http://161.142.92.104/excel/
49. Data Entry
Independent operator verification
Random check of data entered against
the original
<5% error by convention
Some checks are built in by the
software e.g. Epi Info
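A random check of entered data against the original forms can be sketched as follows. Everything here is illustrative: the records, the field names and the function `field_error_rate` are hypothetical, and counting errors per field (rather than per record) is an assumption about how the 5% convention is applied.

```python
import random

def field_error_rate(entered, original, sample_size, rng):
    """Compare a random sample of entered records against the originals.

    Returns the fraction of checked fields whose values differ.
    """
    ids = rng.sample(sorted(original), sample_size)
    checked = wrong = 0
    for rid in ids:
        for field, true_value in original[rid].items():
            checked += 1
            if entered[rid].get(field) != true_value:
                wrong += 1
    return wrong / checked

# Hypothetical original forms: 20 records with 3 fields each.
original = {rid: {"age": 20 + rid, "sex": 1 + rid % 2, "sbp": 110 + rid}
            for rid in range(1, 21)}
# The entered copy contains one keying error: age 27 typed as 72.
entered = {rid: dict(fields) for rid, fields in original.items()}
entered[7]["age"] = 72

rng = random.Random(1)
spot_rate = field_error_rate(entered, original, 10, rng)   # random check
full_rate = field_error_rate(entered, original, 20, rng)   # check everything
acceptable = full_rate < 0.05                              # <5% by convention
```

The spot check may or may not catch the single error, depending on whether record 7 is drawn; the full check always finds it.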