2. Two Definitions of Quality
• Conformance to Requirements (Traditional, Producer-Oriented Definition)
• Fitness for Use (Modern, Client-Oriented Definition)
3. Definition of Process Quality
• Focus on Process Improvement (“Do It Right the First Time”)
• Can Be Reduced to Slogans
• Can Also Lead to Continuous Improvement (Kaizen)
4. Be Real: Four Quality Costs
• Cost of Damaged Reputation and Lost Business from Inaction
• Cost of Prevention to Avoid Errors
• Cost of Detection to Find Errors
• Cost of Repairing Errors Found
6. Repair Methods
• Goal is “Fixing” to Fit Use
• Data Editing
• Data Imputation
• Data Fabrication
• Raking at NSS
7. Data Editing
• Honest Differences of Opinion or Real Errors?
• Need for Redundancy in the System for Can’t-Fail Items
• Achieving Measurability to Frame Expectations and Improvements
9. Types of Edits Illustrated
• Range Test: Age Negative (Age < 0)
• Deterministic Tests: If Age ≤ 14, then Code as Child
• Probabilistic Tests: If Income > $1,000,000, Take a Look
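The three kinds of edit test can be sketched as simple record-level checks. This is a minimal sketch, assuming records are dicts with hypothetical fields `age`, `income`, and `age_group`; the age cutoff of 14 and the $1,000,000 review threshold follow the slide, while the function name `run_edits` and the flag wording are illustrative only. Range and deterministic tests mark a definite inconsistency, while the probabilistic test only flags a record for a human look.

```python
from typing import Any


def run_edits(record: dict[str, Any]) -> list[str]:
    """Return the flags raised by each kind of edit test (hypothetical rules)."""
    flags = []
    age = record.get("age")
    # Range test: a negative age is impossible, so it is always an error.
    if age is not None and age < 0:
        flags.append("range: negative age")
    # Deterministic test: one field's value implies how another must be coded.
    if age is not None and 0 <= age <= 14 and record.get("age_group") != "child":
        flags.append("deterministic: age <= 14 but not coded as child")
    # Probabilistic test: unusual but possible, so only flag it for review.
    income = record.get("income")
    if income is not None and income > 1_000_000:
        flags.append("probabilistic: income over $1,000,000, take a look")
    return flags


if __name__ == "__main__":
    # A 12-year-old coded as an adult, reporting $2.5 million of income.
    print(run_edits({"age": 12, "age_group": "adult", "income": 2_500_000}))
```

Keeping the probabilistic test as a flag rather than an automatic fix matches the slide's “take a look”: the value may be an error, or it may be a real but rare respondent.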
10. Practical Editing Tips
• Edit for Diagnosis, not just Correction
• Don’t Edit Outside Your Confidence Interval
• Preserve the Original Dataset as a Backup to Avoid Irreversible Changes
• Keep Tallies of All Errors Found
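Two of these tips, keeping the original data untouched and tallying every error found, can be built into the editing step itself. A minimal sketch, assuming each edit rule is a hypothetical `(name, check, fix)` triple: fixes are applied only to a deep copy, each failure is logged with a snapshot of the record before the fix, and a `Counter` keeps tallies by rule so they can later be used for diagnosis.

```python
import copy
from collections import Counter
from typing import Any, Callable

Record = dict[str, Any]
# Hypothetical rule shape: (name, check returning True when the record passes, fix).
Rule = tuple[str, Callable[[Record], bool], Callable[[Record], None]]


def edit_dataset(original: list[Record], rules: list[Rule]):
    """Edit a copy of the data; return (edited copy, tallies by rule, error log)."""
    edited = copy.deepcopy(original)  # the original records are never modified
    tallies = Counter()
    log = []
    for i, record in enumerate(edited):
        for name, check, fix in rules:
            if not check(record):
                tallies[name] += 1
                log.append((i, name, dict(record)))  # snapshot before the fix
                fix(record)
    return edited, tallies, log


if __name__ == "__main__":
    # Hypothetical rule: blank out a negative age so it can be imputed later.
    rules = [("negative age", lambda r: r["age"] >= 0, lambda r: r.update(age=None))]
    data = [{"age": -3}, {"age": 40}]
    edited, tallies, log = edit_dataset(data, rules)
    print(data, edited, tallies, log)
```

The tallies and log are what turn editing into diagnosis: a rule that fires often points to a problem in the collection or keying process rather than in individual records.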
11. Not All Errors Need to Be Corrected
• Resist Your Perfectionist Tendencies
12. More Practical Edit Tips
• Use Your Skilled Staff to Improve the System Rather than Just Edit Data
• Never Depend on Intuition Alone, but Still Use It Too!
• Employ Redundancy, Frugally!
13. Capture Recapture Methods
(Double Keying Example)
• Two-by-Two Table with Cells
A B
C D
• Comparing Data Keyed the Same Each Time (A) with Errors Detected (B and C)
• How to Estimate D?
• One Model: D = BC/A?
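The model D = BC/A follows from assuming the two keyings err independently. If the first keying is right with probability p1 and the second with probability p2, then over N fields the cells are A = N·p1·p2, B = N·p1(1 − p2), C = N(1 − p1)p2, and D = N(1 − p1)(1 − p2), so BC/A = N(1 − p1)(1 − p2) = D. This is the same independence argument behind the Lincoln-Petersen capture-recapture estimate of the unseen cell. A minimal sketch, assuming rows are the first keying (right, wrong) and columns the second (right, wrong), so A = both right, B = only the second wrong, C = only the first wrong, and D = both wrong; the function names and the counts in the usage line are hypothetical.

```python
def estimate_d(a: float, b: float, c: float) -> float:
    """Estimate the unseen cell D of the double-keying 2x2 table as B*C/A.

    Assumes the two keyings make errors independently of each other.
    """
    if a <= 0:
        raise ValueError("A must be positive to estimate D")
    return b * c / a


def estimated_error_rates(a: float, b: float, c: float) -> tuple[float, float]:
    """Return estimated error rates for (first keying, second keying), counting D.

    Assumed layout: B = only the second keying wrong, C = only the first wrong.
    """
    d = estimate_d(a, b, c)
    n = a + b + c + d
    return (c + d) / n, (b + d) / n


if __name__ == "__main__":
    # Hypothetical counts: 900 agreements, 60 and 30 detected discrepancies.
    print(estimate_d(900, 60, 30))  # 2.0 fields estimated wrong in both keyings
    print(estimated_error_rates(900, 60, 30))
```

The question mark on the slide is warranted: if both keyers tend to stumble on the same hard-to-read fields, their errors are positively correlated and BC/A will underestimate D.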
14. Bottom Line Take-Away
• Use Data Checking to Understand Data’s Fitness for Use
• Edit, but Don’t Over-Edit
• Use Edit Checks to Prevent Future Errors
15. Data Editing and Data
Imputation
• Joint Role of Imputation and Editing: No Clear Line?
• Editing “Fixes” Are Often Model-Based Hunches
• Data Quality (Editing)
• Information Quality (Imputation)
16. Imputation Versus Editing
• What Is Imputation?
• Handles Missing and Misreported Data
• Imputation Goal: Roughly Right (Information Quality)
• Editing Goal: Often “Correction,” Exactly Right? (Data Quality)
17. Data Imputation Techniques
• Imputation Needs More Justification when Data Quality Is the Goal
• It Must Be No More than Cosmetic in Nature, if Done at All
• It Can Be Applied Aggressively Only for an Information Quality Goal
18. Fellegi-Holt Example
• Identify Errors with Automated Edit Detection Software
• Hot-Deck Acceptable Values from Records that Pass Edits
• Can Be Worth Doing if Errors Are Minor or Cosmetic (e.g., Rounding)
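The Fellegi-Holt idea is to change the fewest fields needed to make a failing record pass every edit, then fill those fields with acceptable values. The sketch below is a deliberately simplified, brute-force version of that minimal-change idea, not the full Fellegi-Holt method (which derives implied edits to locate the fields to change without searching over donors): it tries every set of one field, then two, and so on, taking replacement values by hot deck from donor records that pass all edits. The edit rules, field names, and donors are hypothetical.

```python
from itertools import combinations
from typing import Any, Callable, Optional

Record = dict[str, Any]

# Hypothetical edits: each returns True when the record is acceptable.
EDITS: list[Callable[[Record], bool]] = [
    lambda r: r["age"] >= 0,
    lambda r: not (r["age"] < 15 and r["marital"] == "married"),
    lambda r: not (r["age"] < 15 and r["status"] == "retired"),
]


def passes(record: Record) -> bool:
    return all(edit(record) for edit in EDITS)


def localize_and_hot_deck(
    record: Record, donors: list[Record], fields: list[str]
) -> tuple[Record, Optional[tuple[str, ...]]]:
    """Change the fewest fields (tried in the order given) so the record passes.

    Returns the fixed copy and the changed fields; the tuple is empty if the
    record already passed, and None if no donor could fix it.
    """
    if passes(record):
        return record, ()
    clean_donors = [d for d in donors if passes(d)]  # hot deck: clean records only
    for k in range(1, len(fields) + 1):  # minimal change: 1 field, then 2, ...
        for subset in combinations(fields, k):
            for donor in clean_donors:
                candidate = dict(record)  # never modify the original record
                for field in subset:
                    candidate[field] = donor[field]
                if passes(candidate):
                    return candidate, subset
    return record, None
```

For a 12-year-old recorded as married, trying `marital` first finds a one-field fix from a clean donor who is single; the field order is a tie-breaking choice that a production system would make on reliability grounds.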
19. More on Imputation
• Treat Influential Errors Individually, not just Automatically
• That Said, Software Fixes Can Lead to Better Documentation (Paradata Matters)
• Need to Measure Variance Impacts
• Can Provide a Natural Brake on Overediting, but Seldom Used for This
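One standard way to measure the variance impact of imputation, not named on the slide, is multiple imputation: impute the data m times, compute the estimate and its variance on each completed dataset, and combine them with Rubin’s rules, T = W + (1 + 1/m)B, where W is the average within-imputation variance and B is the between-imputation variance. A minimal sketch, assuming the m estimates and their variances have already been computed; the function name is hypothetical.

```python
from statistics import mean, variance


def rubin_combine(estimates: list[float], variances: list[float]) -> tuple[float, float, float]:
    """Combine m multiply-imputed estimates with Rubin's rules.

    Returns (pooled estimate, total variance, share of total variance due to imputation).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)           # pooled point estimate
    w = mean(variances)               # within-imputation variance
    b = variance(estimates)           # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b           # total variance
    return q_bar, t, (1 + 1 / m) * b / t
```

The (1 + 1/m)B term is exactly the variance that a single imputation treated as real data would hide, which is why it is a natural quantity to report alongside imputed estimates.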
20. Edit/Imputation Summary
• Most Editing Mainly Eliminates the Bad
• Replacing It with a (Good?) Guess of Some Sort
• Imputation Emphasizes Guessing Even More
21. More Editing/Imputation
• Best Imputation Practice Tries to Quantify the Impact of Guessing on Information Quality
• Editing Has Not Improved as Much as Imputation
• Editing/Imputation Needs More Joint Theory, Especially to Measure and Use Mean Square Error Impacts
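Mean square error combines the two things editing and imputation can change, bias and variance, through MSE = variance + bias². A minimal sketch, assuming a simulation setting in which the true value is known and the estimator has been computed on several replicate datasets; the function name and the replicate values in the usage line are hypothetical.

```python
from statistics import mean, pvariance


def mse_decomposition(estimates: list[float], truth: float) -> dict[str, float]:
    """Split the empirical MSE of replicate estimates into variance and squared bias."""
    bias = mean(estimates) - truth
    var = pvariance(estimates)  # divisor n, so that mse == var + bias**2 exactly
    mse = mean((e - truth) ** 2 for e in estimates)
    return {"bias": bias, "variance": var, "mse": mse}


if __name__ == "__main__":
    # Hypothetical replicate estimates of a quantity whose true value is 10.
    print(mse_decomposition([9, 11, 10, 12], truth=10))
```

Comparing this decomposition with and without an editing or imputation step shows whether the step bought a bias reduction at the price of extra variance, or the reverse.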
22. First Illustrative Example
• Fabrication/Falsification
• Illustrate the General Points about Editing and Imputation
• Emphasize the Importance of the Fabrication Threat to Quality
24. Right Structure, Right Resources
• Examine Practice Elsewhere?
• www.amstat.org Website
• Key Is the Right Incentives
• Good Staff/Training
• But Eternal Vigilance
25. Second Illustration
• Raking Application at NSS
• To Link Up to the Next Talk
• To Illustrate Information Quality that Is Fit for Use despite Data Quality Shortfalls
26. Raking Quality “Fix”
• What Is Raking?
• How Does It Improve Quality? Not Data Quality, but Information Quality
• Sometimes Both: Better Point Estimates and More Stability (Smaller Variances)
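Raking adjusts weighted survey counts so that their margins match known control totals, one dimension at a time, repeating until all margins agree; this is iterative proportional fitting. A minimal two-way sketch, assuming a table of weighted counts with no all-zero rows or columns and hypothetical row and column control totals that share the same grand total. No individual record is corrected, which is why the gain is in information quality rather than data quality.

```python
def rake(table, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rake a two-way table of weighted counts to match row and column totals."""
    if abs(sum(row_targets) - sum(col_targets)) > tol * max(1.0, sum(row_targets)):
        raise ValueError("row and column targets must share a grand total")
    t = [[float(x) for x in row] for row in table]
    n_cols = len(col_targets)
    for _ in range(max_iter):
        # Step 1: scale each row to its control total.
        for i, row in enumerate(t):
            factor = row_targets[i] / sum(row)
            t[i] = [x * factor for x in row]
        # Step 2: scale each column to its control total.
        for j in range(n_cols):
            factor = col_targets[j] / sum(row[j] for row in t)
            for row in t:
                row[j] *= factor
        # Columns now match exactly; stop once the rows still match as well.
        if all(abs(sum(row) - target) <= tol for row, target in zip(t, row_targets)):
            return t
    raise RuntimeError("raking did not converge")


if __name__ == "__main__":
    # Hypothetical weighted counts and control totals (both sum to 100).
    print(rake([[10, 20], [30, 40]], row_targets=[40, 60], col_targets=[45, 55]))
```

In a survey setting the same ratio adjustments would be applied to each respondent’s weight in the corresponding cell, so that estimates agree with the controls and, when the controls are related to the survey variables, tend to have smaller variances.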
27. Quality Summary
• Editing: Data Quality
• Imputation: Information Quality
• Raking: Information Quality
• Fabrication Can Harm Both
• Must Always Be Guarded Against
28. Almost Done Now
• Tried to Stay Practical, with a Frank Discussion of Key Weaknesses in Current Practice
• Deeper Understanding of Data Quality
• But at an Applied Level