IRIDA's Genomic Epidemiology Application Ontology for data standardization, integration and sharing. Presented at IMMEM XI in Estoril, Portugal, March 11, 2016.
1. IRIDA’s Genomic Epidemiology Application Ontology
(GenEpiO): Genomic, Clinical and Epidemiological Data
Standardization and Integration
Emma Griffiths
Brinkman Lab
Simon Fraser University, Greater Vancouver, Canada
On behalf of the IRIDA Ontology WG
(Will Hsiao & Damion Dooley (BC Public Health Lab), Fiona Brinkman (SFU))
IMMEM XI, Estoril, Portugal
March 11, 2016
2. Contextual Information is Crucial for Interpreting Genomics Data.
Microbial genomics is a high-resolution tool for identification.
3. Contextual Information Needs to be Shared… So Keep the Next User in Mind.
International Partners Intervention Partners
6. “Ontologies are for the digital age what dictionaries were in the age of print.”
Ontology, A Way of Structuring Information.
• Standardized, well-defined terms in a hierarchy
• Interconnected with logical relationships
• “Knowledge-generation engine”
Logic, Vocabulary, Hierarchy, Knowledge Extraction = Ontology
7. Ontologies Standardize Vocabulary and Enable Complex Querying.
Simple Food Ontology Hierarchy
Animal Feed: Pellets, Whole Mice
Poultry: Nuggets, Deli Meats
Water: Bottled, Well
Produce: Spinach, Sprouts
Relationship labels shown on the diagram:
Transmission through_ingestion or contact
Treated by_filtration
Taxonomy_Spinacia oleracea
Preparation_Ready-to-Eat
Animal (Consumer)_Snake
Synonym_Cold Cuts
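As a rough illustration of how a hierarchy like the one on this slide enables querying, the sketch below stores the food terms as child-to-parent links and asks whether a term falls under a broader category. The parent assignments and the attachment of "Cold Cuts" to "Deli Meats" are read off the flattened diagram and are assumptions, as are the names `PARENT`, `SYNONYM`, `ancestors`, and `is_a`; none of this is GenEpiO's actual structure or API.

```python
# Hypothetical child -> parent links, inferred from the slide's diagram.
PARENT = {
    "Pellets": "Animal Feed",
    "Whole Mice": "Animal Feed",
    "Nuggets": "Poultry",
    "Deli Meats": "Poultry",
    "Bottled": "Water",
    "Well": "Water",
    "Spinach": "Produce",
    "Sprouts": "Produce",
}

# Synonym label from the slide; which term it belongs to is an assumption.
SYNONYM = {"Cold Cuts": "Deli Meats"}


def ancestors(term):
    """Return the chain of broader terms above `term`, nearest first."""
    term = SYNONYM.get(term, term)
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term, category):
    """True if `term` (after synonym lookup) is `category` or falls under it."""
    canonical = SYNONYM.get(term, term)
    return canonical == category or category in ancestors(canonical)


print(is_a("Cold Cuts", "Poultry"))
print(ancestors("Well"))
```

Because the query walks the hierarchy, a search for "Poultry" can pick up records entered as "Nuggets", "Deli Meats", or the synonym "Cold Cuts" without listing each label by hand.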
8. Case Studies: Ontology Can Help Resolve Issues of Taxonomy, Granularity and Specificity.
a) Taxonomy & Granularity
Leafy Greens: Spinach, Lettuce
Spinach: Spinacia oleracea (Taxonomy_species found in N. America), Amaranthus hybridus (Taxonomy_species found in S. Africa)
Lettuce: Endive, Iceberg (Equivalent Subtypes of Lettuce)
b) Specificity
Poultry
Chicken Nuggets
Breast
Processing_Ready-to-Eat
Composition_breading, spices, chicken breast
Location of Purchase_Retail (Grocery Store vs Butcher)
Preparation_marinated
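The taxonomy case study can be sketched as a lookup keyed on both the common name and the region. The two species entries come from the slide; the names `SPECIES_BY_REGION`, `resolve_species`, and `same_food` are hypothetical and only illustrate the idea.

```python
# Species meant by a common food name, keyed by (name, region).
# The two entries are taken from the slide; everything else is illustrative.
SPECIES_BY_REGION = {
    ("Spinach", "N. America"): "Spinacia oleracea",
    ("Spinach", "S. Africa"): "Amaranthus hybridus",
}


def resolve_species(common_name, region):
    """Return the species for a common name in a region, or None if unknown."""
    return SPECIES_BY_REGION.get((common_name, region))


def same_food(name_a, region_a, name_b, region_b):
    """Two reports match only if both resolve to the same known species."""
    a = resolve_species(name_a, region_a)
    b = resolve_species(name_b, region_b)
    return a is not None and a == b


print(resolve_species("Spinach", "S. Africa"))
print(same_food("Spinach", "N. America", "Spinach", "S. Africa"))
```

Comparing resolved species rather than free-text labels keeps two "spinach" exposures from different regions from being treated as the same food item.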
9. Ontology Acts Like A Rosetta Stone.
• Need a common language
• Humans AND computers need to read it
• Mapping allows interoperability AND customization
*Ontologies can be translated into different human languages as well
Rosetta Stone (Egypt, 196 BC)
• Stone tablet translating the same text into different ancient languages
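One way to picture the mapping idea: each lab keeps its own labels and maps them onto shared term IDs, so data can be exchanged without forcing everyone onto one vocabulary. The sketch below assumes made-up IDs (`EX:0001`, `EX:0002`) and made-up lab vocabularies; they are placeholders, not real GenEpiO identifiers.

```python
# Placeholder shared term IDs; NOT real GenEpiO identifiers.
SHARED_TERMS = {
    "EX:0001": "spinach",
    "EX:0002": "deli meat",
}

# Hypothetical lab vocabularies, each mapped onto the shared IDs.
LAB_A_MAP = {"Spinach (fresh)": "EX:0001", "Cold cuts": "EX:0002"}
LAB_B_MAP = {"épinards": "EX:0001"}


def to_shared(label, lab_map):
    """Translate a lab-specific label to a shared term ID."""
    if label not in lab_map:
        raise KeyError(f"no mapping for local label {label!r}")
    return lab_map[label]


def translate(label, source_map, target_map):
    """Re-express one lab's label in another lab's vocabulary via the shared ID."""
    term_id = to_shared(label, source_map)
    for target_label, target_id in target_map.items():
        if target_id == term_id:
            return target_label
    # The target lab has no local label: fall back to the shared label.
    return SHARED_TERMS[term_id]


print(translate("Spinach (fresh)", LAB_A_MAP, LAB_B_MAP))
print(translate("Cold cuts", LAB_A_MAP, LAB_B_MAP))
```

Routing every translation through the shared ID means adding a new lab requires one mapping table, not a pairwise mapping to every other lab.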
12. To Develop a Useful Gen Epi Ontology, Engaging the End Users is Your TOP Priority.
Interview users: Medical & Environmental Microbiologists; Bioinformaticians; Surveillance Analysts & Lab Personnel; Epidemiologists
Examine resources: Software and Work Flows; Investigation Tools; Instrumentation
Interview users + Examine resources = GenEpiO (Genomic Epidemiology Application Ontology)
13. GenEpiO Combines Different Epi, Lab, Genomics and Clinical Data Fields.
Lab Analytics: Genomics, PFGE, Serotyping, Phage typing, MLST, AMR
Sample Metadata: Isolation Source (Food, Host Body Product, Environmental), BioSample
Epidemiology Investigation: Exposures
Clinical Data: Patient demographics, Medical History, Comorbidities, Symptoms, Health Status
Reporting: Case/Investigation Status
GenEpiO (Genomic Epidemiology Application Ontology)
14. GenEpiO Will Help Integrate Genomics and Epidemiological Data in the IRIDA Platform.
Use computers to identify common exposures, symptoms, etc. among genomics clusters.
Example: Automating case definition generation
Correlate genomics Salmonella Cluster A cases between 01 Mar 2015 and 15 Mar 2015 with High-Risk Food Types (Spinach, Leafy Greens) and Geographical Location of Vancouver
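The case definition example on this slide could be automated roughly as follows. This is a minimal sketch under stated assumptions: the `Case` record, its field names, the use of symptom onset as the date being filtered, the `FOOD_PARENT` links, and all case records are invented for illustration and do not reflect IRIDA's data model.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical child -> parent food links (Spinach under Leafy Greens).
FOOD_PARENT = {"Spinach": "Leafy Greens", "Lettuce": "Leafy Greens"}


@dataclass
class Case:
    case_id: str
    cluster: str
    onset: date        # assumption: the date window applies to symptom onset
    exposures: tuple   # food exposure terms reported for the case
    location: str


def under(food, category):
    """True if `food` equals `category` or sits below it in the hierarchy."""
    while food is not None:
        if food == category:
            return True
        food = FOOD_PARENT.get(food)
    return False


def matches_definition(case, cluster, start, end, food_category, location):
    """Apply a cluster / time window / exposure / place case definition."""
    return (
        case.cluster == cluster
        and start <= case.onset <= end
        and case.location == location
        and any(under(f, food_category) for f in case.exposures)
    )


# Made-up records for illustration only.
cases = [
    Case("c1", "Salmonella Cluster A", date(2015, 3, 3), ("Spinach",), "Vancouver"),
    Case("c2", "Salmonella Cluster A", date(2015, 3, 20), ("Spinach",), "Vancouver"),
    Case("c3", "Salmonella Cluster A", date(2015, 3, 10), ("Nuggets",), "Vancouver"),
]

selected = [
    c.case_id
    for c in cases
    if matches_definition(
        c, "Salmonella Cluster A", date(2015, 3, 1), date(2015, 3, 15),
        "Leafy Greens", "Vancouver",
    )
]
print(selected)
```

Because the exposure check walks the food hierarchy, a case recorded with the specific term "Spinach" still satisfies a definition written against the broader "Leafy Greens" category.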
15. Integrated Rapid Infectious Disease Analysis Platform
Find out more about IRIDA from Will Hsiao (BC Public Health Lab) on Sat Mar 12 in the Molecular Epidemiology and Public Health session!
Website: IRIDA.ca
Email: IRIDA-mail@sfu.ca
GitHub: https://github.com/phac-nml/irida
16. GenEpiO has been Implemented in Different IRIDA Interfaces.
• Creates BioSample-Compliant Genome Submission Forms.
Metadata Manager: Data entry portal
• Implements GenEpiO terms
• Facilitates descriptive metadata
• Secure environment
• Selective sharing
17. IRIDA Offers Line List Visualizations of Selectable Data Based on GenEpiO Fields.
1. Line List View
2. Timeline View
Hideable cases
Selectable fields
Travel
Symptoms and Onset
Exposure Types
Hospitalization
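A line list with selectable fields and hideable cases can be pictured as a simple projection over case records. The sketch below is illustrative only: the function `line_list`, the field names, and the case records are assumptions and are not IRIDA's implementation.

```python
def line_list(cases, fields, hidden_ids=()):
    """Build line-list rows: one row per visible case, selected fields only."""
    hidden = set(hidden_ids)
    rows = []
    for case in cases:
        if case["case_id"] in hidden:
            continue  # hidden cases are skipped, not deleted
        row = {"case_id": case["case_id"]}
        for field in fields:
            # Fields a case lacks are shown as empty strings.
            row[field] = case.get(field, "")
        rows.append(row)
    return rows


# Made-up cases; field names echo the slide's selectable fields.
cases = [
    {"case_id": "c1", "travel": "none", "symptom_onset": "2015-03-03",
     "hospitalization": "no"},
    {"case_id": "c2", "travel": "Portugal", "symptom_onset": "2015-03-05",
     "exposure_types": "food"},
]

rows = line_list(cases, fields=["travel", "hospitalization"], hidden_ids=["c1"])
print(rows)
```

Keeping the underlying records intact and filtering only at display time lets an investigator hide or restore cases as the case definition changes.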
19. GenEpiO is Standardizing Terms for Reporting and Quality Control.
• Reproducibility
20. A Genomic Epidemiology Ontology has Advantages for Public Health.
Improved Public Health Investigation power!
1. Eliminates semantic ambiguity
2. Term-mapping allows customization
3. Faster data integration
4. Standardized quality control and result reporting trigger actionable events in the same way
5. Reproducibility (accreditation, validation)
21. The Future Ontology Development Will Focus On Three Key Areas.
Food
Antimicrobial Resistance
Epidemiology
22. Genomic Epidemiology Ontology is Like Instrumentation for Your Contextual Information… it Needs Maintenance and Improvements.
We’re forming a Genomic Epidemiology Ontology Consortium.
Join us!
24. Acknowledgements
Integrated Rapid Infectious Disease Analysis Project
www.IRIDA.ca
Primary Investigators
Fiona Brinkman – SFU
Will Hsiao – PHMRL
Gary Van Domselaar – NML
Co-Investigators
Dr. Rob Beiko - Dalhousie
Dr. Eduardo Taboada - LFZ
Dr. Morag Graham - NML
Dr. João Andre Carrico – University of Lisbon
National Microbiology Laboratory (NML)
Franklin Bristow
Aaron Petkau
Thomas Matthews
Josh Adam
Adam Olsen
Tara Lynch
Shaun Tyler
Philip Mabon
Philip Au
Celine Nadon
Matthew Stuart-Edwards
Chrystal Berry
Lorelee Tschetter
Aleisha Reimer
Laboratory for Foodborne Zoonoses (LFZ)
Eduardo Taboada
Peter Kruczkiewicz
Chad Laing
Vic Gannon
Matthew Whiteside
Ross Duncan
Steven Mutschall
Simon Fraser University (SFU)
Emma Griffiths
Geoff Winsor
Julie Shay
Bhav Dhillon
Claire Bertelli
BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC)
Natalie Prystajecky
Jennifer Gardy
Linda Hoang
Kim MacDonald
Yin Chang
Eleni Galanis
Marsha Taylor
Damion Dooley
Cletus D’Souza
University of Maryland
Lynn Schriml
Canadian Food Inspection Agency (CFIA)
Adam Koziol
Burton Blais
Catherine Carrillo
Dalhousie University
Alex Keddy
Editor's Notes
Ontology: a way of organizing information in a hierarchy of well defined terms that are interconnected with logical relationships
Well defined, reuse terms from different domains, IDs to disambiguate meaning and control for synonyms
Integrates different data types, extra information layer provides “knowledge-generation engine”
Taxonomy differences (domesticated vs wild types, between countries, e.g. spinach is not the same plant in Africa as in North America)
Relationships between consumers and food consumed
Relationships specifying food processing, preservation, distribution
Relationships describing how consumer and pathogen can interact, e.g. transmission routes
Provides means for automation of routine processes, improved querying
Genomic Epidemiology Requires a Lot of Different Types of Contextual Data.
Conducted interviews to create user profiles (to identify user capabilities, expectations and requirements) and understand information flow
To define the different users' needs and requirements:
bioinformatics training and expertise
types of software they use
daily activities and duties
issues and concerns regarding current systems
requirements for a WGS platform
PH Users include:
BC PHMRL
Epidemiologists
Environmental Microbiologists
Medical Microbiologists
Bioinformaticians
“Person, place, time”
Exposure, food items, geographical information, symptoms, onset of symptoms
Created (manually in Excel) on an ad hoc basis per investigation
Need to be shared between stakeholders, but data governance is an issue
The particularity of IRIDA, in addition to being a unique collaboration between different types of collaborators, is to use standards throughout the platform.
Much easier and more effective to collect metadata prospectively than to collect it retrospectively from different lab notebooks, databases, and health authorities (have to ask for permission)
Prompts user to input epidemiologically useful info at point of sample intake/prior to submission (benefitting NEXT user)
Facilitates use of common language that can be shared
Archiving, select cases as case definition changes
Create a smaller core (Lab, Epi exposure, and Food) ontology for line-list testing
Create a consortium for group to take on different domains of Genomic Epidemiology Application Ontology
Pursuing longer term funding for ontology