10th Annual Utah's Health Services Research Conference - Data Quality in Multi-Site Health Services and Comparative Effectiveness Research: Lessons from PHIS+ By: Ram Gouripeddi
This document summarizes a presentation about ensuring data quality in the PHIS+ consortium, which integrates clinical and administrative data across multiple children's hospitals for comparative effectiveness research. It describes the process of developing common data models, semantically mapping local data elements to standards, collecting data using a toolkit with validation, processing the data through a platform to standardize terminology and storage, and conducting various automated and manual checks for data quality issues. These included checks for missing or invalid data, relationships between test results and specimens/cultures, and study-specific assessments through chart review. The final database contained over 150 million records across laboratory, microbiology, and radiology domains with standardized coding to support health services research.
1. Data Quality in Multi-Site Health Services and Comparative Effectiveness Research: Lessons from PHIS+
Ram Gouripeddi
University of Utah
10th Annual Utah Health Services Research Conference
Considering Data Quality in Health Services Research
Monday, March 16, 2015
2. Acknowledgements
• Raj Srivastava, MD, MPH
• Ron Keren MD, MPH
• OpenFurther Team members
• PHIS+ Team members across multiple institutions
• Apelon
• FURTHeR development was supported by the NCRR and the NCATS, NIH, through
Grant UL1RR025764 and supplement 3UL1RR025764-02S2. This project was
funded under grant number R01 HS019862-01 from the AHRQ, U.S. Department of
Health and Human Services (HHS). The opinions expressed [in this document] are
those of the authors and do not reflect the official position of AHRQ or the HHS.
• PHIS+: www.childrenshospitals.org/phisplus/index.html
3. PHIS+
• Augment the Children’s Hospital Association’s (CHA) existing electronic database of administrative data, the Pediatric Health Information System (PHIS), with clinical data to conduct Comparative Effectiveness Research studies.
• The University of Utah Biomedical Informatics Core served as informatics partner.
• Funded by the Agency for Healthcare Research and Quality (AHRQ).
5. The PHIS+ Process
1. Cincinnati Children’s Hospital Medical Center (CCHMC)
2. Children’s Hospital Boston (CHB)
3. Children’s Hospital of Philadelphia (CHOP)
4. Children’s Hospital of Pittsburgh (CHP)
5. Primary Children’s Medical Center, Intermountain Healthcare (PCMC)
6. Seattle Children’s Hospital (SCH)
7. Developmental Process Overview
Narus et al., Federating Clinical Data from Six Pediatric Hospitals: Process and Initial Results from the PHIS+ Consortium. AMIA 2011
8. Modeling & Terminology Phase
• Data Model Harmonization
• Semantic Mapping
• These steps ensured data quality by limiting information losses arising from data transformations
9. Data Model Harmonization
• The informatics team worked with domain experts to create representative common data models for storing the different data domains.
• It then worked with each hospital’s IT staff to harmonize their local data models with the common data models.
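To make the idea concrete, a harmonized model for one domain could be sketched as a simple record type. The fields below are illustrative only, not the actual PHIS+ common data model:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LabResult:
    """One row of a (hypothetical) harmonized lab-result model."""
    patient_id: str                 # hospital-local patient identifier
    site: str                       # contributing hospital, e.g. "A".."F"
    loinc_code: str                 # standardized test code after mapping
    local_test_code: str            # original site code, kept for provenance
    value: Optional[float]          # numeric result, None if non-numeric
    unit: Optional[str]             # normalized unit, e.g. "mg/dL"
    specimen_snomed: Optional[str]  # SNOMED CT specimen code, if mapped
    collected_at: str               # ISO-8601 timestamp
```

Keeping the local code alongside the standard code preserves provenance, so a suspicious mapping can always be traced back to the source value.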
10. Semantic Mapping
• Obtained detailed information
about distinct local data
elements using a metadata
collection toolkit
• Mapped local data elements to
standard biomedical
terminologies.
• Doubtful mappings were discussed with the respective hospital team, including the site PI and lab and EHR personnel.
• All mappings were peer-reviewed within the informatics team and with the contributing hospital team, and also run through software checks.
Metadata Fields Example

Metadata Field                | Example
Local Battery/Panel Name/Code |
Battery/Panel Description     |
Local Test Name               | Glucose
Local Test Code               | Glu
Test Description              | Blood Glucose
LOINC Code                    | -
Test Value Type               | Numeric
Test Value Sample Data        | 86
Test Start Date Format        |
Test End Date Format          |
Specimen                      | Serum
Units of Measure              | mg/dL
Reference Range               | 80 – 120
Interpretation Codes          |
Test Status Codes             |
Comments                      |
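The metadata fields above can be captured as one record per local test, which can then be screened for completeness before mapping is attempted. This is a hypothetical sketch of such a record and check, not the actual toolkit's format:

```python
# One (hypothetical) metadata-toolkit entry, using the slide's example values.
glucose_metadata = {
    "local_test_name": "Glucose",
    "local_test_code": "Glu",
    "test_description": "Blood Glucose",
    "loinc_code": None,          # not yet mapped to LOINC
    "test_value_type": "Numeric",
    "sample_value": "86",
    "specimen": "Serum",
    "units_of_measure": "mg/dL",
    "reference_range": "80 - 120",
}

def missing_fields(entry,
                   required=("local_test_code", "test_value_type",
                             "units_of_measure")):
    """Return the required metadata fields that are absent or empty."""
    return [f for f in required if not entry.get(f)]
```

An entry with gaps in the required fields would be sent back to the site before any mapping work begins.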
11. Differences in Local Coding Schemas

Laboratory Test (C-reactive protein, as coded across sites):
• C Reactive Protein [Mass/volume] in Serum or Plasma (1988-5)
• C Reactive Protein (8726)
• C Reactive Protein (CRPT)
• CRP (CRP)
• CRP Test (700111)
• C-Reactive Protein (801582)
• C R Protein (801679)

Unit of Measure (nanograms per deciliter, as recorded across sites):
• Nanogram/Decilitre (258805003)
• NG/DL
• ng/dL
• ng per dL
• ng/Dl
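Variation like this is typically resolved with per-site crosswalks plus string normalization. The sketch below uses hypothetical site identifiers and covers only a handful of the variants above:

```python
# Hedged sketch: a per-site crosswalk maps each hospital's local C-reactive
# protein code to a single LOINC code, and free-text unit strings are
# normalized to one canonical form. Site identifiers are hypothetical.
CRP_LOINC = "1988-5"  # C reactive protein [Mass/volume] in Serum or Plasma

TEST_CROSSWALK = {
    ("site_b", "8726"): CRP_LOINC,
    ("site_c", "CRPT"): CRP_LOINC,
    ("site_d", "CRP"): CRP_LOINC,
    ("site_e", "700111"): CRP_LOINC,
}

# Unit synonyms keyed by a whitespace-collapsed, lowercased form.
UNIT_SYNONYMS = {
    "ng/dl": "ng/dL",
    "ng per dl": "ng/dL",
    "nanogram/decilitre": "ng/dL",
}

def normalize_unit(raw: str) -> str:
    """Return the canonical unit for a raw unit string, or the input unchanged."""
    key = " ".join(raw.lower().split())
    return UNIT_SYNONYMS.get(key, raw)
```

Unmatched codes or units fall through unchanged, so they surface during review rather than being silently mis-mapped.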
12. Data Processing Phase
• Data collection phase: each hospital used a combination of a data collection toolkit and data validation scripts to assess the data it submitted.
• Contributed data were then processed through the OpenFurther platform for translation to the selected standard terminologies and storage in the common data models.
• Each row of processed data was checked for data quality issues specific to its domain.
• Errors in the data were flagged with an error taxonomy and reviewed for fixes or resubmission.
13. Example Checks
• Is the lab test associated with a patient?
• Is there a valid lab test in each row of lab
result data?
• Does the lab test have a valid result?
• Are there proper relationships between
cultures, their test specimens and results?
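A minimal sketch of such row-level checks, with illustrative error-taxonomy tags (not the actual PHIS+ taxonomy):

```python
def check_lab_row(row, known_patients, known_loinc_codes):
    """Apply the example row-level checks; return a list of error tags.

    `row` is a dict-like harmonized lab-result row; the tag names are
    illustrative only.
    """
    errors = []
    if row.get("patient_id") not in known_patients:
        errors.append("ORPHAN_RESULT")      # result not linked to a patient
    if row.get("loinc_code") not in known_loinc_codes:
        errors.append("INVALID_TEST_CODE")  # no valid lab test in the row
    if row.get("value") is None and not row.get("text_value"):
        errors.append("MISSING_RESULT")     # test has no usable result
    return errors
```

Returning tags rather than raising lets a batch run flag every issue in a submission, which is what a fix-or-resubmit workflow needs.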
14. Study Specific Quality Assessment
• Individual studies have
different granularities and
specificities in their data
requirements.
• We undertook a second set
of data quality assessments
at the study cohort level.
• This included a chart review
of a significant sample
within each study cohort.
[Figure: example data quality plot, a histogram of result values for 2823-3: Potassium [Moles/volume] in Serum or Plasma (counts up to ~35,000 per bin over values 0.6–13.7), flagging out-of-range values (>9.0) and non-numeric entries such as "QNS" and "to repeat".]
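For a chart review, a reproducible random sample of each study cohort might be drawn as follows; this is a generic sketch, not the actual PHIS+ sampling procedure:

```python
import random

def chart_review_sample(cohort_ids, n=100, seed=42):
    """Draw a reproducible random sample of a study cohort for chart review.

    Sorting before sampling makes the draw independent of input order,
    and the fixed seed makes it repeatable across runs.
    """
    rng = random.Random(seed)
    k = min(n, len(cohort_ids))
    return rng.sample(sorted(cohort_ids), k)
```

Reproducibility matters here: if a reviewer's findings are questioned later, the identical sample can be regenerated from the cohort and seed.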
15. PHIS+ CER Database – 2007-11

Laboratory
Site  | Results     | LOINC Lab Test Codes
A     | 15,011,312  | 538
B     | 33,214,540  | 1,214
C     | 16,868,383  | 860
D     | 25,706,608  | 1,089
E     | 38,422,668  | 1,016
F     | 14,507,629  | 2,131
Total | 143,731,140 | *6,848 (2,992)

Microbiology
Site  | Culture Results | SNOMED Specimen Codes | SNOMED Culture Procedure Codes | SNOMED Organism Codes | RxNorm Antimicrobial Codes | Susceptibility Results | LOINC Susceptibility Test Codes
A     | 247,933   | 114        | 70        | 113        | 57        | 487,813   | 97
B     | 359,780   | 58         | 42        | 56         | 58        | 393,594   | 85
C     | 231,071   | 179        | 46        | 162        | 59        | 340,100   | 99
D     | 335,606   | 110        | 34        | 145        | 57        | 376,844   | 75
E     | 486,315   | 130        | 56        | 160        | 59        | 605,000   | 76
F     | 176,848   | 264        | 71        | 121        | 51        | 283,865   | 89
Total | 1,837,553 | *855 (451) | *319 (95) | *757 (203) | *341 (74) | 2,487,216 | *521 (136)

Radiology
Site  | Reports   | CPT Radiology Procedure Codes
A     | 445,681   | 280
B     | 1,151,383 | 349
C     | 635,458   | 296
D     | 980,740   | 482
E     | 1,098,693 | 497
F     | 201,708   | 477
Total | 4,513,663 | *2,381 (714)

Patients: 1,854,406 children

* The first number is the total number of standard codes summed across sites; the number in parentheses is the count of distinct standard codes across all sites.
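The footnote's distinction between total and distinct code counts can be illustrated with toy data: summing per-site counts double-counts codes used at several hospitals, while the distinct count takes the union of the per-site code sets.

```python
# Toy per-site code sets, not the real PHIS+ code lists.
site_codes = {
    "A": {"1988-5", "2823-3"},
    "B": {"1988-5", "2345-7"},
    "C": {"2823-3", "2345-7"},
}

# Summed across sites, shared codes are counted once per site that uses them.
total_codes = sum(len(codes) for codes in site_codes.values())

# The distinct count collapses those duplicates via a set union.
distinct_codes = len(set().union(*site_codes.values()))
```

This is why 6,848 LOINC lab test codes across sites collapse to 2,992 distinct codes in the table above.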
16. Discussion
• We developed an infrastructure that assesses the
quality of data being integrated from disparate data
sources.
• Using this infrastructure we populated a database with
high quality data to support HSR & CER.
• To ensure data quality, a combination of computerized data assessment checks within OpenFurther and manual checks was used.
• Both global and study-specific data quality assessments were required
– to address systemic issues in data integration and study-specific issues, respectively.
17. Discussion
• Informed by the framework developed by Kahn et al. in “A Pragmatic Framework for Single-site and Multisite Data Quality Assessment in Electronic Health Record-based Clinical Research”
• Inherent dimensions such as Accuracy, Objectivity, and Believability, and Contextual dimensions such as Timeliness and Appropriate amount of data, were measured.
• A software platform that complies with existing theoretical frameworks of data quality can assist this process and speed up the generation of new and reproducible study results.
– A Data Model for Representation and Storage of Biomedical Data Quality, Breakout Session 3 – Strategies for Identifying Data Quality Issues
Development process steps (from the overview diagram):
• Initial Data Analysis for Terminology & Modeling
• Harmonization of Site-Specific Models to FURTHeR Input Model
• Input Data Submission Specifications
• Trial Sample Submission
• PHIS+ Storage Model
• Software Development for Translations
• Mapping of Local Terminologies to Standards
• Evaluation with 1 Year of Data
• Installation of Infrastructure at CHA
• Terminology and Software Support – Current Status
This slide shows the hospital-wise and total numbers in the PHIS+ CER database. It currently stores all laboratory, microbiology, and radiology data for the years 2007–2011, with plans to add future-year data loads from these six pediatric hospitals. Using the FURTHeR infrastructure, data from each hospital have been converted to a central model that uses standard terminologies, which makes CER possible.
The database at this time consists of about 143 million lab results, 1.8 million culture results, 2.5 million susceptibility results, and 4.5 million radiology reports, with distinct standard codes for each of the data streams.