Topic 5: Quality Checking and Data File Management

Srinivasulu Rajendran
Centre for the Study of Regional Development (CSRD)

Jawaharlal Nehru University (JNU)
New Delhi
India
r.srinivasulu@gmail.com

Objective of the session

To understand data file management, and how to check a dataset's quality and missing values using software packages.

1. What procedures should one follow before proceeding to statistical analysis in a software package?
2. How do we check the quality of data?
3. How do we organize a dataset in a software package?

Data sources

• International Food Policy Research Institute (IFPRI) – 2006-07
• Bangladesh Bureau of Statistics – Household Income and Expenditure Survey (HIES) – 2004/2005
• Bangladesh Demographic and Health Survey (BDHS) – 2007

IFPRI Dataset
Chronic Poverty Study (a resurvey of 3 earlier studies)

1. Micronutrients Gender/Agricultural Technology (1996-97) – 5 thanas

2. Food for Education/Cash for Education – 2000 (10 thanas) and 2003 (8 thanas)

3. Microfinance (1994) – 5 thanas

Institutes involved: IFPRI, Chronic Poverty Research Center, Data Analysis and Technical Assistance

In the 2006-07 resurvey, all thanas from the 1994, 1996-97 and 2003 rounds were resurveyed.

Micronutrients Gender/Agricultural Technology
• Hereafter we refer to this as the MCG study, also known as Agricultural Technology or Ag Tech.
• “A census of households was conducted in villages where the NGO had introduced the agricultural technology and comparable villages where NGO was operating, but where the new technologies had not yet been introduced”.

Two major types of households were selected from the census:

1. Households of NGO members who adopted the agricultural technology

2. Households of NGO members likely to adopt, in villages where the technology had not yet been introduced

AgTech sample: 330 households in the original survey; 1,304 HHs in the resurvey

“A” type villages (AgTech introduced):
• 110 NGO members who adopted the technology (“A” HHs)
• 55 non-adopters: non-NGO members and NGO members UNLIKELY to adopt (“C1” HHs)

“B” type villages (AgTech not introduced):
• 110 NGO members LIKELY to adopt (“B” HHs)
• 55 households not likely to adopt: non-NGO members and NGO members unlikely to adopt (“C2” HHs)

What procedures should one follow before proceeding to statistical analysis in a software package?

SPSS

1. Identify the data file format and convert it into the format of the relevant software (for SPSS, *.sav).
2. Make sure that the COMPLETE set of variables and observations has been converted into SPSS format.
3. Identify the characteristics of the variables needed for the analysis.
4. Keep file names short.
5. Preferably, avoid spaces in file names.
6. Keep the data files organized in one place, in a single folder.
7. Whenever we work on the data, append the new commands to the previous programme (syntax) file.
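SPSS itself performs the conversion, but steps 2, 4 and 5 can also be double-checked outside it. The Python sketch below is a minimal illustration under stated assumptions: both the original data and the converted file are assumed to have been exported to CSV text, the function names are hypothetical, and the 20-character limit for a "short" file name is an arbitrary choice of ours, not an SPSS rule.

```python
import csv
import io

# Hypothetical threshold for a "short" file name; not an SPSS requirement.
MAX_NAME_LENGTH = 20


def read_table(csv_text):
    """Parse CSV text (assumed export format) into variable names and rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def compare_conversion(original_csv, converted_csv):
    """Step 2: check that no variables or observations were lost in conversion."""
    orig_vars, orig_rows = read_table(original_csv)
    conv_vars, conv_rows = read_table(converted_csv)
    missing_vars = sorted(set(orig_vars) - set(conv_vars))
    return {
        "missing_variables": missing_vars,
        "original_observations": len(orig_rows),
        "converted_observations": len(conv_rows),
        "complete": not missing_vars and len(orig_rows) == len(conv_rows),
    }


def check_file_name(file_name):
    """Steps 4 and 5: flag file names that contain spaces or are long."""
    problems = []
    stem = file_name.rsplit(".", 1)[0]  # ignore the extension, e.g. ".sav"
    if " " in file_name:
        problems.append("contains spaces")
    if len(stem) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    return problems


if __name__ == "__main__":
    original = "hhid,income\n1,500\n2,700\n"
    converted = "hhid\n1\n2\n"  # the income variable was dropped
    print(compare_conversion(original, converted))
    print(check_file_name("ag tech household survey 2007.sav"))
    print(check_file_name("agtech07.sav"))
```

Comparing variable names and observation counts catches the most common conversion losses; it does not verify that individual values survived unchanged.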

How do we check the quality of data?

There are a few things that need to be checked before we proceed to any statistical analysis:

1. Missing values

2. Wrong coding system

3. Outliers

4. Number of digits in the variables (especially for variables measured in value terms)

5. Unique ID numbers for the observations

6. Relevant variable characteristics, i.e. string, numeric, etc.
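Several of these checks are mechanical enough to script. The Python sketch below is a hedged illustration, not an SPSS routine: it assumes each record is a Python dictionary, that missing values are stored as None or blank strings, and that the analyst supplies the allowed codes and the maximum expected number of digits. The function names, the coding scheme and the digit limit are hypothetical. Outliers (item 3) need their own tools and are left out here.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat None and blank strings as missing values."""
    return value is None or (isinstance(value, str) and not value.strip())


def to_number(value):
    """Return value as a finite float, or None if it is not numeric."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def quality_report(rows, id_var, valid_codes, numeric_vars, max_digits):
    """Run checks 1, 2, 4, 5 and 6 from the list above on dict records."""
    variables = sorted({name for row in rows for name in row})
    # 1. Missing values per variable
    missing = {v: sum(is_missing(row.get(v)) for row in rows) for v in variables}
    # 2. Wrong coding: non-missing values outside the allowed code list
    wrong_codes = {
        var: [row.get(id_var) for row in rows
              if not is_missing(row.get(var)) and row.get(var) not in allowed]
        for var, allowed in valid_codes.items()
    }
    # 5. Unique IDs: any ID that occurs more than once
    id_counts = Counter(row.get(id_var) for row in rows)
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)
    # 4 and 6. Numeric variables: non-numeric entries and too many digits
    non_numeric = {var: [] for var in numeric_vars}
    too_many_digits = {var: [] for var in numeric_vars}
    for var in numeric_vars:
        for row in rows:
            value = row.get(var)
            if is_missing(value):
                continue
            number = to_number(value)
            if number is None:
                non_numeric[var].append(row.get(id_var))
            elif len(str(int(abs(number)))) > max_digits.get(var, 15):
                too_many_digits[var].append(row.get(id_var))
    return {
        "missing": missing,
        "wrong_codes": wrong_codes,
        "duplicate_ids": duplicate_ids,
        "non_numeric": non_numeric,
        "too_many_digits": too_many_digits,
    }


if __name__ == "__main__":
    # Hypothetical records; sex is assumed to be coded 1 = male, 2 = female.
    households = [
        {"hhid": 101, "sex": 1, "income": "5200"},
        {"hhid": 102, "sex": 3, "income": "48000000"},
        {"hhid": 102, "sex": 2, "income": ""},
        {"hhid": 104, "sex": None, "income": "n.a."},
    ]
    report = quality_report(households, "hhid", {"sex": {1, 2}},
                            ["income"], {"income": 6})
    for check, result in report.items():
        print(check, result)
```

Reporting case IDs rather than just counts makes it easier to go back to the questionnaire and correct each problem at its source.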

SPSS has some good routines for detecting outliers:
• There is always the FREQUENCIES routine, of course.
• The PLOTS command can do scatterplots of 2 variables.
• The EXAMINE procedure includes an option for printing out the cases with the 5 lowest and 5 highest values.
• The REGRESSION command can print out scatterplots (particularly good is *ZRESID by *ZPRED, which is a plot of the standardized residuals against the standardized predicted values). In addition, the regression procedure will produce output on CASEWISE DIAGNOSTICS, which indicates which cases are extreme outliers.
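To make the EXAMINE idea concrete outside SPSS, the Python sketch below lists the 5 lowest and 5 highest values of one variable together with their case IDs. It illustrates the idea only and is not how SPSS implements it: records are assumed to be Python dictionaries, missing values are assumed to be stored as None, and the function name and income figures are hypothetical.

```python
import heapq


def extreme_cases(rows, id_var, var, k=5):
    """Return the k lowest and k highest (value, id) pairs for var.

    Cases whose value is None are skipped; ties are ordered by id.
    """
    valid = [(row[var], row[id_var]) for row in rows if row.get(var) is not None]
    lowest = heapq.nsmallest(k, valid)   # ascending order
    highest = heapq.nlargest(k, valid)   # descending order
    return lowest, highest


if __name__ == "__main__":
    incomes = {1: 5200, 2: 4800, 3: 5100, 4: 61000, 5: 4900, 6: 5300,
               7: 150, 8: 5000, 9: 5050, 10: 5150, 11: 4950, 12: 5250}
    rows = [{"hhid": h, "income": v} for h, v in incomes.items()]
    rows.append({"hhid": 13, "income": None})  # a missing value is ignored
    low, high = extreme_cases(rows, "hhid", "income")
    print("lowest:", low)
    print("highest:", high)
```

Here households 7 and 4 stand out at the two ends, and their IDs tell us exactly which records to check against the questionnaires.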

Detecting the problem
• Scatterplots and frequencies can reveal atypical cases.
• We can also look for cases with very large residuals.
• Suspicious correlations sometimes indicate the presence of outliers.
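The residual check can also be sketched outside SPSS. The Python sketch below assumes a simple regression with one predictor: it fits ordinary least squares, divides each residual by the root mean squared error (the standardized residual, which as we understand it is what SPSS labels *ZRESID), and flags cases beyond a conventional cutoff of 3 in absolute value. The function names, the cutoff and the artificial data are our own assumptions.

```python
import math


def standardized_residuals(x, y):
    """Fit y = a + b*x by OLS; return each residual divided by sqrt(MSE).

    Assumes at least 3 cases, a non-constant x and an imperfect fit.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - 2)  # n - 2 degrees of freedom
    root_mse = math.sqrt(mse)
    return [e / root_mse for e in residuals]


def flag_large_residuals(ids, x, y, cutoff=3.0):
    """Return (id, standardized residual) for cases beyond the cutoff."""
    z = standardized_residuals(x, y)
    return [(case, round(zi, 2)) for case, zi in zip(ids, z) if abs(zi) > cutoff]


if __name__ == "__main__":
    ids = list(range(1, 21))
    x = ids
    # Artificial data: y = 2x exactly, except case 10, which is shifted up by 40.
    y = [2 * xi + (40 if xi == 10 else 0) for xi in x]
    print(flag_large_residuals(ids, x, y))
```

Only case 10 is flagged. Note that a single extreme case also inflates the mean squared error it is divided by, so standardized residuals understate how unusual that case really is; this is one reason the robust routines mentioned next are useful.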

The difference between STATA and SPSS

Probably the most critical difference between SPSS and STATA is that STATA includes additional routines (e.g. rreg, qreg) for addressing the problem of outliers, which we will discuss in future classes.
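Stata's `qreg` estimates quantile regression, by default at the median (`rreg` uses a different, iteratively reweighted approach that is not covered here). The Python sketch below is only an analogy for the simplest, intercept-only case, using hypothetical incomes: OLS with just an intercept estimates the mean, while median regression with just an intercept estimates a sample median, so one miskeyed value drags the first far more than the second. The function names are hypothetical.

```python
import statistics


def ols_intercept_only(values):
    """Intercept-only OLS: the estimate minimizing squared errors is the mean."""
    return statistics.mean(values)


def lad_intercept_only(values):
    """Intercept-only median (least absolute deviation) regression.

    The sum of absolute deviations is minimized at an observed value, so it
    is enough to search over the data; for an odd number of cases the
    minimizer is the sample median.
    """
    return min(values, key=lambda c: sum(abs(v - c) for v in values))


if __name__ == "__main__":
    clean = [4800, 4900, 5000, 5100, 5200]          # hypothetical incomes
    contaminated = [4800, 4900, 5000, 5100, 52000]  # one miskeyed value
    for label, data in [("clean", clean), ("contaminated", contaminated)]:
        print(label, "OLS:", ols_intercept_only(data),
              "median regression:", lad_intercept_only(data))
```

With the miskeyed value, the OLS estimate jumps from 5000 to 14360 while the median-regression estimate stays at 5000.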
