Clinical Data Munging
DO NOT ALTER THE RAW DATA
We must all accept that science is data and
that data are science, and thus provide
for, and justify the need for the support
of, much improved data curation.
Brooks Hanson, Andrew Sugden, Bruce Alberts (Science editorial, February 11th, 2011)
Data Munging?
• Manipulating raw data to achieve a final
form
• Parsing or filtering data, or the many steps
required for data recognition.
• Cleaning the raw data using algorithms
(e.g. sorting) or parsing the data into
predefined data structures.
Clinical Data Munging?
• Following clinical research ethics to
manipulate clinical data to achieve an
acceptable form
– Respect for Persons (Autonomy)
– Data Security and Storage
– Data Integrity / Data Quality
– Privacy and Confidentiality
Why Clinical Data Munging?
• Analysts devote up to 85% of their total time to data cleaning and preparation.
• Health science is driven more by data than by computation
• Identify missing data
Why data munging? Cont.
• Extreme Scores - data values falling outside the expected range
• Identify erroneous dates
• Confounders
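A check for erroneous dates can be sketched in Python as a handful of plausibility rules. This is a minimal sketch, not a complete validator: the field names (`birth`, `admitted`, `discharged`), the three rules, and the sample records are all assumptions made for illustration.

```python
from datetime import date

def find_date_errors(records, today):
    """Return (record id, problem) pairs for dates that cannot be right."""
    problems = []
    for rec in records:
        if rec["birth"] > today:
            problems.append((rec["id"], "birth date in the future"))
        if rec["admitted"] < rec["birth"]:
            problems.append((rec["id"], "admitted before birth"))
        if rec["discharged"] < rec["admitted"]:
            problems.append((rec["id"], "discharged before admission"))
    return problems

# Made-up records; only P1 is internally consistent.
records = [
    {"id": "P1", "birth": date(1980, 5, 1), "admitted": date(2014, 3, 2), "discharged": date(2014, 3, 9)},
    {"id": "P2", "birth": date(1975, 1, 20), "admitted": date(2014, 6, 10), "discharged": date(2014, 6, 1)},
    {"id": "P3", "birth": date(2015, 2, 3), "admitted": date(2014, 8, 1), "discharged": date(2014, 8, 4)},
]
errors = find_date_errors(records, today=date(2014, 12, 31))
for record_id, problem in errors:
    print(record_id, problem)
```

The function only reports problems; it does not change any record, so the raw data stay untouched.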
Phases in Clinical Data Munging
• Screening Phase:
– lack or excess of data;
– inconsistencies;
– strange patterns in distributions;
– unexpected analysis results and other types of inferences and abstractions
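The first screening item, lack or excess of data, can be sketched as a count of missing values per field plus a list of identifiers that occur more than once. The field names, the treatment of `None` and the empty string as "missing", and the sample rows are assumptions for illustration.

```python
from collections import Counter

def screen(rows, id_field):
    """Count missing values per field and list identifiers that appear more than once."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                missing[field] += 1
    id_counts = Counter(row[id_field] for row in rows)
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    return dict(missing), duplicates

# Made-up rows: two missing ages, one missing sex, one repeated identifier.
rows = [
    {"patient_id": "A01", "age": 34, "sex": "F"},
    {"patient_id": "A02", "age": None, "sex": "M"},
    {"patient_id": "A02", "age": 51, "sex": ""},
    {"patient_id": "A03", "age": None, "sex": "F"},
]
missing, duplicates = screen(rows, "patient_id")
print(missing, duplicates)
```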
Phases in Clinical Data Munging
• Diagnostic Phase: The purpose is to clarify the true nature of the worrisome data points, patterns, and statistics.
– Documentation should start at this point.
• Treatment Phase: Decide what to do with each problematic observation. The options are limited to correcting, deleting, or leaving unchanged.
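One way to combine the three treatment options with the rule DO NOT ALTER THE RAW DATA is to apply every decision to a copy and record it in a log. The sketch below assumes a hypothetical `decisions` mapping from row index to an action; the function name, the log format, and the sample values are illustrative, not a prescribed procedure.

```python
import copy

def treat(raw_rows, decisions):
    """Apply correct/delete decisions to a copy; rows without a decision stay unchanged."""
    cleaned, log = [], []
    for index, row in enumerate(raw_rows):
        decision = decisions.get(index)
        if decision is None:
            cleaned.append(copy.deepcopy(row))   # leave unchanged
            continue
        if decision["action"] == "delete":
            log.append((index, "delete", decision["reason"]))
            continue
        if decision["action"] == "correct":
            fixed = copy.deepcopy(row)
            field = decision["field"]
            fixed[field] = decision["value"]
            log.append((index, f"correct {field}: {row[field]} -> {decision['value']}", decision["reason"]))
            cleaned.append(fixed)
            continue
        raise ValueError(f"unknown action: {decision['action']}")
    return cleaned, log

# Made-up raw data and decisions.
raw = [
    {"id": "P1", "weight_kg": 72.0},
    {"id": "P2", "weight_kg": 720.0},
    {"id": "P3", "weight_kg": -1.0},
]
decisions = {
    1: {"action": "correct", "field": "weight_kg", "value": 72.0, "reason": "misplaced decimal point"},
    2: {"action": "delete", "reason": "negative weight cannot be recovered"},
}
cleaned, log = treat(raw, decisions)
for entry in log:
    print(entry)
```

The log doubles as the documentation that should start in the diagnostic phase: every change carries a reason, and `raw` can always be compared against `cleaned`.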
Phases in Clinical Data Munging
[Diagram: Data Warehouse, Core1, Core2, Core3]
Data screening?
• Understand the clinical data and the
different clinical data variables
• Categorise the data into groups/cores
• Determine the unique identifier
• Check data normality using frequency distributions, skewness and kurtosis, summary statistics and cross-tabulations
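Skewness and kurtosis can be computed directly from the central moments. The sketch below uses the simple moment-based definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3); statistical packages often apply small-sample corrections, so their figures may differ slightly. The sample values are made up.

```python
def shape_statistics(values):
    """Return (skewness, excess kurtosis) from the central moments of the data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0   # 0 for a normal distribution
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]          # e.g. lengths of stay with one long admission

skew_sym, kurt_sym = shape_statistics(symmetric)
skew_right, kurt_right = shape_statistics(right_skewed)
print(skew_sym, kurt_sym)
print(skew_right, kurt_right)
```

A skewness near zero suggests symmetry; a clearly positive value, as in the second list, points to a long right tail that is worth inspecting before assuming normality.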
Data visualization
Missing values
• Occur when respondents refuse to answer, tools malfunction, or subjects withdraw from studies
• Missing values are categorized as
– MCAR (missing completely at random), MAR (missing at random) or MNAR (missing not at random)
• Most modern stat packages require
complete data
Dealing with Missing Values
• Use analyses that can deal with incomplete data, e.g. Hierarchical Linear Modelling or survival analysis
• Adjust the denominator, e.g. leave unmarried respondents out of the denominator for a question that applies only to married respondents
• Delete cases with missing data: this can lead to misestimation of population parameters and lowers statistical power
• Mean substitution – reduces the variance
• Imputation via multiple regression
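The claim that mean substitution reduces the variance can be checked with a small worked example. The values below are made up; `None` marks a missing value.

```python
from statistics import mean, variance

with_gaps = [4.0, None, 6.0, 8.0, None, 10.0]

observed = [v for v in with_gaps if v is not None]
fill = mean(observed)                                   # mean of the observed values
imputed = [fill if v is None else v for v in with_gaps]

var_observed = variance(observed)   # sample variance of the 4 observed values
var_imputed = variance(imputed)     # sample variance after mean substitution

print(fill, var_observed, var_imputed)
```

The imputed values add nothing to the sum of squared deviations but increase the sample size, so the variance estimate shrinks (here from about 6.67 to 4.0).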
Erroneous dates
Extreme Scores (Fringelier,
Outlier)
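One common convention for flagging extreme scores is Tukey's rule: values more than 1.5 times the interquartile range below the first quartile or above the third quartile. The sketch below is one possible rule, not the only one; the multiplier `k`, the quartile method, and the sample readings are assumptions.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_extremes(values, k=1.5):
    """Return the values that fall outside the fences; the input is not modified."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]

# Made-up systolic blood pressure readings (mmHg).
readings = [110, 115, 118, 120, 122, 125, 128, 130, 240]
flagged = flag_extremes(readings)
print(tukey_fences(readings), flagged)
```

Flagged values still need the diagnostic phase: a reading of 240 may be a typing error or a genuine hypertensive crisis.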
Other Data Errors
• Duplicates: keep the first admission, ordered by time
• Biologically impossible results
– Robust estimation: Estimation of statistical
parameters, using methods that are less
sensitive to the effect of outliers than more
conventional methods
• Questionable values
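Keeping only the first admission per patient, ordered by time, can be sketched as follows. The field names `patient_id` and `admitted_at` and the sample rows are assumptions for illustration.

```python
from datetime import datetime

def first_admissions(rows):
    """Keep, for each patient, the row with the earliest admission time."""
    earliest = {}
    for row in rows:
        pid = row["patient_id"]
        if pid not in earliest or row["admitted_at"] < earliest[pid]["admitted_at"]:
            earliest[pid] = row
    return sorted(earliest.values(), key=lambda r: r["admitted_at"])

# Made-up admissions; patient A01 appears twice.
rows = [
    {"patient_id": "A01", "admitted_at": datetime(2014, 3, 2, 8, 0)},
    {"patient_id": "A02", "admitted_at": datetime(2014, 3, 1, 9, 30)},
    {"patient_id": "A01", "admitted_at": datetime(2014, 2, 20, 14, 15)},
]
kept = first_admissions(rows)
for row in kept:
    print(row["patient_id"], row["admitted_at"])
```

The original list is left as it was, so the dropped rows remain available for review.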
Given the rapid growth of the internet, such techniques will become increasingly important in organizing the growing amounts of data available.
The Large Synoptic Survey Telescope, at 40 TB of data per day, calls for a different approach…. 100+ PB of data in 10 years
Tools for a Clinical Data Munger
| Features | Stata | R | SPSS | SAS |
|---|---|---|---|---|
| Learning Curve | Steep/Gradual | Pretty Steep | Gradual/Flat | Pretty Steep |
| User Interface | Code/PnC | Code | Mostly PnC | Very Strong |
| Data Manipulation | Very Strong | Very Strong | Moderate | Very Strong |
| Data Analysis | Versatile | Versatile | Powerful | Powerful/Versatile |
| Graphics | Very good | Excellent | Very good | Good |
| Cost | Renewal on upgrade - affordable | Open Source | Expensive | Expensive (yearly renewal) |
PnC: point-and-click.
Other Important Tools
• Python - getting real-time data from social networks
• NVivo – for qualitative data
• Perl
Asante (thank you)
Q?



Editor's notes

  1. It is convenient to distinguish the following areas: lack or excess of data; outliers, including inconsistencies; strange patterns in (joint) distributions; and unexpected analysis results and other types of inferences and abstractions.
  2. During the diagnostic phase, the data munger may have to reconsider prior expectations and/or review quality assurance procedures.
  3. A data sink for storage, modelling or future use.
  4. Graphical exploration of distributions: box plots, histograms, and scatter plots. Plots of repeated measurements on the same individual, e.g., growth curves. Statistical outlier detection.
  5. In statistics, hierarchical linear modeling (HLM), also known as multi-level analysis, is a more advanced form of simple linear regressi...
  6. Transform or truncate. Robust estimation: estimation of statistical parameters using methods that are less sensitive to the effect of outliers than more conventional methods. Accommodate and reduce errors (LSE, trimmed mean, Winsorized mean: the mean computed after replacing extreme values with the closest remaining values).
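
The trimmed and Winsorized means named in note 6 can be sketched in a few lines. Here `k` is the number of values handled at each end of the sorted data; the function names and the sample values are assumptions for illustration.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= 2k < len(values)")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values with their nearest kept neighbours."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= 2k < len(values)")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    clipped = [min(max(v, low), high) for v in ordered]
    return sum(clipped) / len(clipped)

# Made-up lengths of stay in days, with one extreme value.
stays = [2, 4, 5, 5, 6, 7, 8, 9, 10, 94]
plain = sum(stays) / len(stays)
trimmed = trimmed_mean(stays, 1)
winsorized = winsorized_mean(stays, 1)
print(plain, trimmed, winsorized)
```

With these values the ordinary mean is 15.0, while the trimmed mean (6.75) and the Winsorized mean (6.8) stay close to the bulk of the data; neither function modifies `stays`.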