Srinivasulu Rajendran
Centre for the Study of Regional Development (CSRD)
Jawaharlal Nehru University (JNU)
New Delhi, India
r.srinivasulu@gmail.com

Objective of the session

To understand data file management, quality checking of a dataset, and missing values using software packages

1. What procedures should one follow before proceeding to statistical analysis in a software package?
2. How do we check the quality of data?
3. How do we organize the dataset in a software package?

Data sources

- International Food Policy Research Institute (IFPRI), 2006-07
- Bangladesh Bureau of Statistics, Household Income and Expenditure Surveys (HIES), 2004/2005
- Bangladesh Demographic and Health Survey (BDHS), 2007

IFPRI Dataset

Chronic Poverty Study (resurvey of 3 studies)

1. Micronutrients Gender/Agricultural Technology (1996-97), 5 thanas
2. Food for Education/Cash for Education (2000, 10 thanas; 2003, 8 thanas)
3. Microfinance (1994), 5 thanas

Institutes involved: IFPRI, Chronic Poverty Research Center, Data Analysis and Technical Assistance

In the 2006-07 resurvey, all thanas from the 1994, 1996-97 & 2003 rounds were resurveyed.

Micronutrients Gender/Agricultural Technology

- Hereafter we refer to this as the MCG study, also known as Agricultural Technology or Ag Tech.
- “A census of households was conducted in villages where the NGO had introduced the agricultural technology and comparable villages where NGO was operating, but where the new technologies had not yet been introduced”.

There are two major types of households selected from the census

1. NGO-member households adopting the agricultural technology
2. NGO-member likely-adopter households in villages where the technology was not yet introduced

330 households
1304 HHs in the resurvey for AgrTech

AgriTech introduced (“A” type villages):
- 110 NGO-member adopter HHs (“A” HHs)
- 55 non-adopter HHs: non-NGO members & NGO members UNLIKELY to adopt (“C1” HHs)

AgriTech not introduced (“B” type villages):
- 110 NGO-member LIKELY adopter HHs (“B” HHs)
- 55 non-likely-adopter HHs: non-NGO members & NGO members unlikely to adopt (“C2” HHs)

What procedures should one follow before proceeding to statistical analysis in a software package?

SPSS
1. Identify the format of the data file and convert it into the data file format of the relevant software (for SPSS, *.sav).
2. Make sure that ALL variables and observations have been converted into SPSS format.
3. Identify the characteristics of the variables needed for the analysis.
4. Keep the file name short.
5. It is better to have no spaces in the file name.
6. Keep the data files together in one place and folder.
7. Whenever we work on the data, please append the new work to the previous programme file.

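Steps 2, 4 and 5 can be checked mechanically. The following is a minimal sketch in Python (standard library only), not an SPSS feature: it assumes the original and converted data have both been exported to CSV text, and the names `check_file_name`, `table_shape`, the 32-character limit and the sample rows are hypothetical, chosen only for illustration.

```python
import csv
import io

def check_file_name(name, max_length=32):
    """Return a list of problems with a data file name (empty list means OK).

    The 32-character limit is an arbitrary assumption for this sketch.
    """
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if len(name) > max_length:
        problems.append("longer than %d characters" % max_length)
    return problems

def table_shape(csv_text):
    """Return (observations, variables) for CSV text that has a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    return len(data), len(header)

# Hypothetical original export and its converted copy (one row lost on the way).
original = "hhid,hh_type,income\n1,A,1200\n2,B,950\n3,A,1430\n"
converted = "hhid,hh_type,income\n1,A,1200\n2,B,950\n"

print(check_file_name("agtech hh 2006.sav"))            # ['contains spaces']
print(table_shape(original))                            # (3, 3)
print(table_shape(converted))                           # (2, 3)
print(table_shape(original) == table_shape(converted))  # False: conversion incomplete
```

Comparing the two shapes only shows that nothing was dropped; it does not prove that the values themselves were converted correctly.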
How do we check the quality of data?

There are a few things that need to be checked before we proceed to any statistical analysis:

1. Missing values
2. Wrong coding
3. Outliers
4. Number of digits in the variables (especially for value-term variables)
5. Unique ID numbers for the observations
6. Appropriate variable types, i.e. string, numeric, etc.

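Several of these checks (missing values, wrong codes, unique IDs, variable types) are simple to express in code. The sketch below uses plain Python on a small made-up list of household records; the variable names `hhid`, `hh_type` and `income`, the codebook `VALID_HH_TYPES`, and the helper functions are all assumptions for illustration and are not taken from the IFPRI files.

```python
from collections import Counter

# Hypothetical household records; None marks a missing value.
records = [
    {"hhid": 101, "hh_type": "A", "income": 1200.0},
    {"hhid": 102, "hh_type": "B", "income": None},
    {"hhid": 103, "hh_type": "D", "income": "950.5"},  # bad code, income stored as text
    {"hhid": 101, "hh_type": "C1", "income": 1430.0},  # repeated household id
]
VALID_HH_TYPES = {"A", "B", "C1", "C2"}  # assumed codebook

def missing_counts(rows):
    """Count missing values (None or empty string) per variable."""
    counts = Counter()
    for row in rows:
        for name, value in row.items():
            if value is None or value == "":
                counts[name] += 1
    return dict(counts)

def invalid_codes(rows, variable, valid):
    """Return the ids of rows whose code is not in the codebook."""
    return [row["hhid"] for row in rows
            if row[variable] is not None and row[variable] not in valid]

def duplicate_ids(rows, id_variable="hhid"):
    """Return ids that appear more than once."""
    counts = Counter(row[id_variable] for row in rows)
    return sorted(i for i, n in counts.items() if n > 1)

def wrong_type(rows, variable, expected_type):
    """Return the ids of rows whose non-missing value has the wrong type."""
    return [row["hhid"] for row in rows
            if row[variable] is not None
            and not isinstance(row[variable], expected_type)]

print(missing_counts(records))                            # {'income': 1}
print(invalid_codes(records, "hh_type", VALID_HH_TYPES))  # [103]
print(duplicate_ids(records))                             # [101]
print(wrong_type(records, "income", float))               # [103]
```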
SPSS has some good routines for detecting outliers

- There is always the FREQUENCIES routine, of course.
- The PLOTS command can do scatterplots of 2 variables.
- The EXAMINE procedure includes an option for printing out the cases with the 5 lowest and 5 highest values.
- The REGRESSION command can print out scatterplots (particularly good is *ZRESID by *ZPRED, which is a plot of the standardized residuals by the standardized predicted values). In addition, the regression procedure will produce output on CASEWISE DIAGNOSTICS, which indicates which cases are extreme outliers.

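To make the idea behind the extreme-values listing and the standardized residuals concrete, here is a rough Python sketch of the same calculations for a one-predictor regression. It is not SPSS syntax and does not reproduce SPSS output; the functions `extreme_cases` and `standardized_residuals`, the made-up data, and the cutoff of 3 standard deviations are assumptions for illustration.

```python
import math

def extreme_cases(values, k=5):
    """Return the k lowest and k highest cases as (case index, value) pairs."""
    ordered = sorted(enumerate(values), key=lambda pair: pair[1])
    return ordered[:k], ordered[-k:]

def standardized_residuals(x, y):
    """Fit y = a + b*x by least squares and return each residual divided
    by the standard error of the estimate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [r / sigma for r in residuals]

# Made-up data: y = 2x + 1 for 20 cases, with case 9 pushed far off the line.
x = list(range(1, 21))
y = [2 * xi + 1 for xi in x]
y[9] += 30

lowest, highest = extreme_cases(y)
print(lowest[0])    # (0, 3): the smallest value and its case index
print(highest[-1])  # (9, 51): the largest value belongs to the altered case

z = standardized_residuals(x, y)
flagged = [i for i, zi in enumerate(z) if abs(zi) > 3]  # assumed cutoff of 3
print(flagged)      # [9]
```

With very few cases a single outlier cannot reach a standardized residual of 3, because it inflates the standard error it is divided by, so a fixed cutoff is less informative in small samples.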
Detecting the problem

- Scatterplots and frequencies can reveal atypical cases.
- We can also look for cases with very large residuals.
- Suspicious correlations sometimes indicate the presence of outliers.

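The last point can be illustrated with a small made-up example: one extreme case can create a strong correlation between two variables that are otherwise unrelated. The sketch below computes the Pearson correlation by hand in Python; the data and the function name `pearson` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: eight unrelated cases plus one extreme case at the end.
x = [1, 2, 3, 4, 5, 6, 7, 8, 50]
y = [5, 3, 6, 2, 7, 4, 6, 3, 60]

r_all = pearson(x, y)                   # dominated by the extreme case
r_without = pearson(x[:-1], y[:-1])     # the same data without it

print(r_all > 0.9)             # True
print(abs(r_without) < 0.1)    # True
```

A correlation that collapses when a single case is dropped is a sign that the case, not the relationship, is driving the result.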
The difference between STATA & SPSS

Probably the most critical difference between SPSS and STATA is that STATA includes additional routines (e.g. rreg, qreg) for addressing the problem of outliers, which we will discuss in future classes.

Topic 5 quality datafile_management
