10th Annual Utah's Health Services Research Conference - Data Quality in Multi-Site Health Services and Comparative Effectiveness Research: Lessons from PHIS+ By: Ram Gouripeddi
This document summarizes a presentation about ensuring data quality in the PHIS+ consortium, which integrates clinical and administrative data across multiple children's hospitals for comparative effectiveness research. It describes the process of developing common data models, semantically mapping local data elements to standards, collecting data using a toolkit with validation, processing the data through a platform to standardize terminology and storage, and conducting various automated and manual checks for data quality issues. These included checks for missing or invalid data, relationships between test results and specimens/cultures, and study-specific assessments through chart review. The final database contained over 150 million records across laboratory, microbiology, and radiology domains with standardized coding to support health services research.
1. Data Quality in Multi-Site Health Services and Comparative Effectiveness Research: Lessons from PHIS+
Ram Gouripeddi
University of Utah
10th Annual Utah Health Services Research Conference
Considering Data Quality in Health Services Research
Monday, March 16, 2015
2. Acknowledgements
• Raj Srivastava, MD, MPH
• Ron Keren MD, MPH
• OpenFurther Team members
• PHIS+ Team members across multiple institutions
• Apelon
• FURTHeR development was supported by the NCRR and the NCATS, NIH, through
Grant UL1RR025764 and supplement 3UL1RR025764-02S2. This project was
funded under grant number R01 HS019862-01 from the AHRQ, U.S. Department of
Health and Human Services (HHS). The opinions expressed [in this document] are
those of the authors and do not reflect the official position of AHRQ or the HHS.
• PHIS+: www.childrenshospitals.org/phisplus/index.html
3. PHIS+
• Augment the Children’s Hospital Association’s (CHA) existing electronic database of administrative data, the Pediatric Health Information System (PHIS), with clinical data to conduct Comparative Effectiveness Research studies.
• The University of Utah Biomedical Informatics Core served as informatics partner.
• Funded by the Agency for Healthcare Research and Quality (AHRQ).
5. The PHIS+ Process
1. Cincinnati Children’s Hospital Medical Center (CCHMC)
2. Children’s Hospital Boston (CHB)
3. Children’s Hospital of Philadelphia (CHOP)
4. Children’s Hospital of Pittsburgh (CHP)
5. Primary Children’s Medical Center, Intermountain Healthcare (PCMC)
6. Seattle Children’s Hospital (SCH)
7. Developmental Process Overview
Narus et al., Federating Clinical Data from Six Pediatric Hospitals: Process and Initial Results from the PHIS+ Consortium. AMIA 2011
8. Modeling & Terminology Phase
• Data Model Harmonization
• Semantic Mapping
• These steps ensured data quality by limiting information losses arising from data transformations
9. Data Model Harmonization
• The informatics team worked with domain experts to create representative common data models for storing the different data domains.
• It then worked with each hospital’s IT staff to harmonize their local data models with the common data models.
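To make the idea concrete, a harmonized model for one domain could be sketched as a simple record type. The fields below are illustrative only, not the actual PHIS+ common data model:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LabResult:
    """One row of a (hypothetical) harmonized lab-result model."""
    patient_id: str                 # hospital-local patient identifier
    site: str                       # contributing hospital, e.g. "A".."F"
    loinc_code: str                 # standardized test code after mapping
    local_test_code: str            # original site code, kept for provenance
    value: Optional[float]          # numeric result, None if non-numeric
    unit: Optional[str]             # normalized unit, e.g. "mg/dL"
    specimen_snomed: Optional[str]  # SNOMED CT specimen code, if mapped
    collected_at: str               # ISO-8601 timestamp
```

Keeping the local code alongside the standard code preserves provenance, so a suspicious mapping can always be traced back to the source value.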
10. Semantic Mapping
• Obtained detailed information
about distinct local data
elements using a metadata
collection toolkit
• Mapped local data elements to
standard biomedical
terminologies.
• Doubtful mappings were discussed with the respective hospital team, including the site PI and lab and EHR personnel.
• All mappings were peer-reviewed within the informatics team and with the contributing hospital team, and also run through software checks.
Metadata Fields Example

Metadata Field                | Example
Local Battery/Panel Name/Code |
Battery/Panel Description     |
Local Test Name               | Glucose
Local Test Code               | Glu
Test Description              | Blood Glucose
LOINC Code                    | -
Test Value Type               | Numeric
Test Value Sample Data        | 86
Test Start Date Format        |
Test End Date Format          |
Specimen                      | Serum
Units of Measure              | mg/dL
Reference Range               | 80 – 120
Interpretation Codes          |
Test Status Codes             |
Comments                      |
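The metadata fields above can be captured as one record per local test, which can then be screened for completeness before mapping is attempted. This is a hypothetical sketch of such a record and check, not the actual toolkit's format:

```python
# One (hypothetical) metadata-toolkit entry, using the slide's example values.
glucose_metadata = {
    "local_test_name": "Glucose",
    "local_test_code": "Glu",
    "test_description": "Blood Glucose",
    "loinc_code": None,          # not yet mapped to LOINC
    "test_value_type": "Numeric",
    "sample_value": "86",
    "specimen": "Serum",
    "units_of_measure": "mg/dL",
    "reference_range": "80 - 120",
}

def missing_fields(entry,
                   required=("local_test_code", "test_value_type",
                             "units_of_measure")):
    """Return the required metadata fields that are absent or empty."""
    return [f for f in required if not entry.get(f)]
```

An entry with gaps in the required fields would be sent back to the site before any mapping work begins.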
11. Differences in Local Coding Schemas

Laboratory Test (C-reactive protein, as coded across sites):
• C Reactive Protein [Mass/volume] in Serum or Plasma (1988-5)
• C Reactive Protein (8726)
• C Reactive Protein (CRPT)
• CRP (CRP)
• CRP Test (700111)
• C-Reactive Protein (801582)
• C R Protein (801679)

Unit of Measure (nanograms per deciliter, as recorded across sites):
• Nanogram/Decilitre (258805003)
• NG/DL
• ng/dL
• ng per dL
• ng/Dl
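Variation like this is typically resolved with per-site crosswalks plus string normalization. The sketch below uses hypothetical site identifiers and covers only a handful of the variants above:

```python
# Hedged sketch: a per-site crosswalk maps each hospital's local C-reactive
# protein code to a single LOINC code, and free-text unit strings are
# normalized to one canonical form. Site identifiers are hypothetical.
CRP_LOINC = "1988-5"  # C reactive protein [Mass/volume] in Serum or Plasma

TEST_CROSSWALK = {
    ("site_b", "8726"): CRP_LOINC,
    ("site_c", "CRPT"): CRP_LOINC,
    ("site_d", "CRP"): CRP_LOINC,
    ("site_e", "700111"): CRP_LOINC,
}

# Unit synonyms keyed by a whitespace-collapsed, lowercased form.
UNIT_SYNONYMS = {
    "ng/dl": "ng/dL",
    "ng per dl": "ng/dL",
    "nanogram/decilitre": "ng/dL",
}

def normalize_unit(raw: str) -> str:
    """Return the canonical unit for a raw unit string, or the input unchanged."""
    key = " ".join(raw.lower().split())
    return UNIT_SYNONYMS.get(key, raw)
```

Unmatched codes or units fall through unchanged, so they surface during review rather than being silently mis-mapped.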
12. Data Processing Phase
• Data collection phase: each hospital used a combination of a data collection toolkit and data validation scripts to assess the data it submitted.
• Contributed data were then processed through the OpenFurther platform for translation to the selected standard terminologies and storage in the common data models.
• Each row of processed data was checked for data quality issues specific to its domain.
• Errors in the data were flagged with an error taxonomy and reviewed for fixes or resubmission.
13. Example Checks
• Is the lab test associated with a patient?
• Is there a valid lab test in each row of lab
result data?
• Does the lab test have a valid result?
• Are there proper relationships between
cultures, their test specimens and results?
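A minimal sketch of such row-level checks, with illustrative error-taxonomy tags (not the actual PHIS+ taxonomy):

```python
def check_lab_row(row, known_patients, known_loinc_codes):
    """Apply the example row-level checks; return a list of error tags.

    `row` is a dict-like harmonized lab-result row; the tag names are
    illustrative only.
    """
    errors = []
    if row.get("patient_id") not in known_patients:
        errors.append("ORPHAN_RESULT")      # result not linked to a patient
    if row.get("loinc_code") not in known_loinc_codes:
        errors.append("INVALID_TEST_CODE")  # no valid lab test in the row
    if row.get("value") is None and not row.get("text_value"):
        errors.append("MISSING_RESULT")     # test has no usable result
    return errors
```

Returning tags rather than raising lets a batch run flag every issue in a submission, which is what a fix-or-resubmit workflow needs.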
14. Study Specific Quality Assessment
• Individual studies have
different granularities and
specificities in their data
requirements.
• We undertook a second set
of data quality assessments
at the study cohort level.
• This included a chart review
of a significant sample
within each study cohort.
[Figure: example data quality plot, a histogram of result values for 2823-3: Potassium [Moles/volume] in Serum or Plasma (counts up to ~35,000 per bin over values 0.6–13.7), flagging out-of-range values (>9.0) and non-numeric entries such as "QNS" and "to repeat".]
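For a chart review, a reproducible random sample of each study cohort might be drawn as follows; this is a generic sketch, not the actual PHIS+ sampling procedure:

```python
import random

def chart_review_sample(cohort_ids, n=100, seed=42):
    """Draw a reproducible random sample of a study cohort for chart review.

    Sorting before sampling makes the draw independent of input order,
    and the fixed seed makes it repeatable across runs.
    """
    rng = random.Random(seed)
    k = min(n, len(cohort_ids))
    return rng.sample(sorted(cohort_ids), k)
```

Reproducibility matters here: if a reviewer's findings are questioned later, the identical sample can be regenerated from the cohort and seed.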
15. PHIS+ CER Database – 2007-11

Laboratory
Site  | Results     | LOINC Lab Test Codes
A     | 15,011,312  | 538
B     | 33,214,540  | 1,214
C     | 16,868,383  | 860
D     | 25,706,608  | 1,089
E     | 38,422,668  | 1,016
F     | 14,507,629  | 2,131
Total | 143,731,140 | *6,848 (2,992)

Microbiology
Site  | Culture Results | SNOMED Specimen Codes | SNOMED Culture Procedure Codes | SNOMED Organism Codes | RxNorm Antimicrobial Codes | Susceptibility Results | LOINC Susceptibility Test Codes
A     | 247,933   | 114        | 70        | 113        | 57        | 487,813   | 97
B     | 359,780   | 58         | 42        | 56         | 58        | 393,594   | 85
C     | 231,071   | 179        | 46        | 162        | 59        | 340,100   | 99
D     | 335,606   | 110        | 34        | 145        | 57        | 376,844   | 75
E     | 486,315   | 130        | 56        | 160        | 59        | 605,000   | 76
F     | 176,848   | 264        | 71        | 121        | 51        | 283,865   | 89
Total | 1,837,553 | *855 (451) | *319 (95) | *757 (203) | *341 (74) | 2,487,216 | *521 (136)

Radiology
Site  | Reports   | CPT Radiology Procedure Codes
A     | 445,681   | 280
B     | 1,151,383 | 349
C     | 635,458   | 296
D     | 980,740   | 482
E     | 1,098,693 | 497
F     | 201,708   | 477
Total | 4,513,663 | *2,381 (714)

Patients: 1,854,406 children

* The first number is the total number of standard codes summed across sites; the number in parentheses is the count of distinct standard codes across all sites.
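The footnote's distinction between total and distinct code counts can be illustrated with toy data: summing per-site counts double-counts codes used at several hospitals, while the distinct count takes the union of the per-site code sets.

```python
# Toy per-site code sets, not the real PHIS+ code lists.
site_codes = {
    "A": {"1988-5", "2823-3"},
    "B": {"1988-5", "2345-7"},
    "C": {"2823-3", "2345-7"},
}

# Summed across sites, shared codes are counted once per site that uses them.
total_codes = sum(len(codes) for codes in site_codes.values())

# The distinct count collapses those duplicates via a set union.
distinct_codes = len(set().union(*site_codes.values()))
```

This is why 6,848 LOINC lab test codes across sites collapse to 2,992 distinct codes in the table above.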
16. Discussion
• We developed an infrastructure that assesses the
quality of data being integrated from disparate data
sources.
• Using this infrastructure we populated a database with
high quality data to support HSR & CER.
• To ensure data quality, a combination of computerized data assessment checks within OpenFurther and manual checks was used.
• Both global and study-specific data quality assessments were required
– to address systemic issues in data integration and study-specific issues, respectively.
17. Discussion
• Informed by the framework developed by Kahn et al. in “A Pragmatic Framework for Single-site and Multisite Data Quality Assessment in Electronic Health Record-based Clinical Research”
• Inherent dimensions such as Accuracy, Objectivity, and Believability, and Contextual dimensions such as Timeliness and Appropriate amount of data, were measured.
• A software platform that complies with existing theoretical frameworks of data quality can assist this process and speed up the generation of new and reproducible study results.
– A Data Model for Representation and Storage of Biomedical Data Quality, Breakout Session 3 – Strategies for Identifying Data Quality Issues
Development process steps (from the overview diagram):
• Initial Data Analysis for Terminology & Modeling
• Harmonization of Site-Specific Models to FURTHeR Input Model
• Input Data Submission Specifications
• Trial Sample Submission
• PHIS+ Storage Model
• Software Development for Translations
• Mapping of Local Terminologies to Standards
• Evaluation with 1 Year of Data
• Installation of Infrastructure at CHA
• Terminology and Software Support – Current Status
This slide shows the hospital-wise and total numbers in the PHIS+ CER database. It currently stores all laboratory, microbiology, and radiology data for the years 2007–2011, with plans to add future-year data loads from these six pediatric hospitals. Using the FURTHeR infrastructure, data from each hospital have been converted to a central model that uses standard terminologies, which makes CER possible.
The database at this time consists of about 143 million lab results, 1.8 million culture results, 2.5 million susceptibility results, and 4.5 million radiology reports, with distinct standard codes for each of the data streams.