
Ontologies for big data


  1. ONTOLOGIES FOR BIG DATA. Asiyah Yu Lin, M.D., M.S., Ph.D.
  2. My profile
     Postdoc training: Ontologies
       <location>USA</location> <work>postdoctoral training</work> <work:company_type>University</work:company_type> <work:has_title>research fellow</work:has_title> <bioinformatics>ontology development</bioinformatics> <bioinformatics>social network analysis</bioinformatics> <bioinformatics>ontology applying data analysis</bioinformatics>
     Bioinformatician: NGS
       <location>Japan</location> <work>institution</work> <work:company_type>non profit organization</work:company_type> <work:has_title>bioinformatician</work:has_title> <bioinformatics>454 sequence assembly</bioinformatics> <bioinformatics>non model organism sequence analysis</bioinformatics>
     Ph.D. in Medical Informatics
       <location>Japan</location> <education:has_degree>Ph.D</education:has_degree> <education:has_major>medical informatics</education:has_major> <bioinformatics>ontology</bioinformatics> <bioinformatics>data integration</bioinformatics> <bioinformatics>biological pathway analysis</bioinformatics>
     Content Manager & Project Manager
       <location>China</location> <work>industry</work> <work:company_type>start_up IT</work:company_type> <work:has_title>content manager</work:has_title> <work:has_title>project manager</work:has_title> <IT_skill>web site building</IT_skill> <IT_skill>relational database</IT_skill>
     Master in Molecular Immunology
       <location>China</location> <education>Medical School</education> <education:has_degree>master</education:has_degree> <education:has_major>molecular immunology</education:has_major> <bioinformatics>sequencing</bioinformatics> <bioinformatics>protein 3D simulation</bioinformatics>
     M.D. in Pediatrics
       <location>China</location> <education>Medical School</education> <education:has_degree>bachelor</education:has_degree> <education:has_major>Pediatrics</education:has_major>
  3. Agenda
     • Introduction: ontologies, semantic web and big data
     • Selected projects:
       1. Informed Consent Ontology (ICO)
       2. miRNA and Aging Ontology (MIAGO)
       3. Ontology of Drug Neuropathy Adverse Event (ODNAE)
       4. LINCS-BD2K
       5. mebdo (Medicare and Census big data project); SOCR Data Dashboard
     • Conclusion
  4. What is ontology?
     Ontologies are a form of knowledge representation: structural frameworks for organizing terms hierarchically and defining relations between terms within a domain.
     1. A hierarchical vocabulary: class, subclass, instance
     2. Defined relations between terms to interlink the whole system
     3. Constraints and logical definitions
     4. Explicit specification of a conceptualization (Gruber, 1993)
  5. Why ontology?
     • Knowledge management: RDF, RDFS, OWL
     • Natural language processing: linguistic ontology (WordNet)
     • E-commerce
     • Intelligent information integration
     • Knowledge acquisition and discovery
     • Database design and integration
     • Medical decision making agent
     • Linked Open Data, Semantic Web
  6. Semantic Web Layer Cake
     • RDF: simple triples, graph-based queries, supports very large amounts of data.
       Example: (Bill, has_address, Location A)
     • OWL: a significantly more expressive language with strong axioms, inference capabilities, and consistency verification, but reasoning can be rather slow.
       Example (inverse relation): (Bill, has_address, Location A) implies (Location A, is_address_of, Bill)
  7. SELECTED PROJECTS
     1. Informed Consent Ontology (ICO)
     2. miRNA and Aging Ontology (MIAGO)
     3. Adverse event analysis: Ontology of Drug Neuropathy Adverse Event
     4. LINCS-BD2K
     5. mebdo (Medicare and Census big data project); SOCR Data Dashboard
  8. Informed Consent Ontology (ICO). ICBO 2014 poster.
  9. SELECTED PROJECTS
     1. Informed Consent Ontology (ICO)
     2. miRNA and Aging Ontology (MIAGO)
     3. Adverse event analysis: Ontology of Drug Neuropathy Adverse Event (ODNAE)
     4. LINCS-BD2K
     5. mebdo (Medicare and Census big data project); SOCR Data Dashboard
  10. The power of reasoning: miRNA and Aging Ontology (MIAGO). Database (in revision).
  11. SELECTED PROJECTS
     1. Informed Consent Ontology (ICO)
     2. miRNA and Aging Ontology (MIAGO)
     3. Adverse event analysis: Ontology of Drug Neuropathy Adverse Event (ODNAE)
     4. LINCS-BD2K
     5. mebdo (Medicare and Census big data project); SOCR Data Dashboard
  12. ODNAE: Linking knowledge together
     [Figure with two panels. Panel (A) shows the general design pattern and panel (B) a worked example for bupropion. Both panels use the relation labels is_a, preceded_by, has_specified_input, has_proper_part, has_role, is_realized_in, occurs in, has participant, and has_quality.]
     Panel (A) entities:
     • drug-associated neuropathy AE (ODNAE)
     • neuropathy AE (OAE_0000418)
     • drug administration (OAE_0000011)
     • a drug (DrON, linked to RxNORM, NDFRT)
     • drug product (DrON_00000005)
     • chemical element (ChEBI)
     • biological process (GO)
     • drug role in mechanism of action (NDFRT)
     • human (NCBITaxon_9606)
     • a quality (e.g., age) (PATO)
     Panel (B) entities:
     • bupropion (Aplezin, Wellbutrin, Zyban, Budeprion SR, Buproban, Forfivo XL)-associated neuropathy AE (ODNAE_0000043)
     • neuropathy AE (OAE_0000418)
     • drug administration (OAE_0000011)
     • Bupropion Oral Tablet (DRON_00026665)
     • drug product (DrON_00000005)
     • bupropion (CHEBI_3219)
     • negative regulation of dopamine uptake (GO_0051585)
     • negative regulation of neurotransmitter uptake (GO_0051581)
     • Dopamine Uptake Inhibitors [MoA] (N0000000114)
     • human (NCBITaxon_9606)
     • age (PATO_0000011)
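  13. ODNAE results: knowledge base of 215 neuropathy AE drugs, related AEs and 20 AE types. ICBO 2015 VDOS workshop.
     [Figure with panels (A) and (B): diagrams labeled "related chemical compounds" and "Mode of Action". The extracted counts (127, 127, 1, 18, 8, 7, 116, 1, 13, 20, 96, 15, 39, 1, 1, 21, 7, 14, 1, 139) cannot be matched to their regions from the text alone.]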
  14. What's missing in ODNAE
     • Only 13 GO biological processes were mapped to some MoA.
     • Holistic analytic methods are needed to understand the mechanism.
     We need more…
  15. 1. LINCS-BD2K 2. SOCR DASHBOARD
  16. NIH LINCS Program: Library of Integrated Network-based Cellular Signatures
     • University of Miami Computational LINCS Center
     • LINCS Data Coordinating Center: http://lifeKB.org
     • BD2K LINCS Data Coordination and Integration Center: http://lincs-dcic.org/
  17. Data types:
     • Drug and Gene Knockdown Followed by Genome-Wide Expression
     • KO and Mutant Genes and their Disease Phenotypes
     • Drug and Knockdown Effects on Cell Viability
     • Transcription Factors and Histone Modifications Profiled by ChIP-Seq
     • Protein-Protein Interactions and Cell- or Metabolic-Pathways
     • Gene Expression from Patient Cohorts with Genomics and Clinical Outcome Data
     • Drugs and Toxic Chemicals that Cause Adverse Events
     Representations: Networks, Bi-partite Graphs, Gene-Set Libraries, Hierarchical Trees
  18. Bi-Partite Relationships Between Data Types
     Drugs, Side Effects, Genes, Diseases, Proteins, Signatures, Patient Tumors, Cancer Cell Lines, Tissues, Mutations, Mouse Phenotypes
  19. Data integration and systems modeling
     • Data Sources
     • Metadata
     • Semantic model / ontology: protein, gene, cell, assay, disease, drug; application ontology; LIFE knowledge model
     • Sets, graphs, trees, networks: bit set libraries, bipartite graphs, networks, hierarchy tree
  20. SOCR Analytics Dashboard (Statistics Online Computational Resource)
     • Provides graphical querying, navigation and exploration of the multivariate associations in complex heterogeneous datasets.
     • Integrates dispersed multi-source data and serves the mashed information via human and machine interactions in a secure, scalable manner.
     http://socr.umich.edu/HTML5/Dashboard/
  21. Conclusion:
     1. Ontologies are important components for Big Data integration and manipulation.
     2. Reusing ontologies enables seamless integration with other resources.
     3. However, ontologies cannot solve all the problems in the biomedical world; they are tools to support science.
     4. Formalized ontologies can be used by humans and automated systems as a basis for communication and data exchange (such as RDF data).
     5. Ontology-based applications may go beyond reasoning alone and use statistical analyses (enrichment), semantic similarity, network analysis, graph algorithms, clustering, etc.
     6. There is much more to explore in the big-data era.
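
The inverse relation from the Semantic Web slide (Bill has_address Location A, hence Location A is_address_of Bill) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples rather than objects from an RDF library, and the `inverse_of` mapping and the `infer_inverses` helper are hypothetical names introduced here, not part of the presentation. A real OWL reasoner handles many more axiom types.

```python
# Triples are (subject, predicate, object) tuples, mirroring RDF statements.
triples = {("Bill", "has_address", "Location A")}

# Hypothetical inverse-property declarations, in the spirit of owl:inverseOf.
inverse_of = {"has_address": "is_address_of"}


def infer_inverses(triples, inverse_of):
    """Return the input triples plus every triple implied by an inverse relation."""
    # Make the mapping symmetric so that either direction may be asserted.
    both_ways = dict(inverse_of)
    both_ways.update({v: k for k, v in inverse_of.items()})

    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in both_ways:
            # Swap subject and object and use the inverse predicate.
            inferred.add((obj, both_ways[predicate], subject))
    return inferred


closed = infer_inverses(triples, inverse_of)
for triple in sorted(closed):
    print(triple)
```

The asserted triple is kept and the inverse one is added, which is the small-scale version of what the slide means by inference: facts that were never stored become available to queries.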

Editor's notes

  • The Library of Integrated Network-based Cellular Signatures (LINCS) is an NIH Common Fund project that was recently expanded to its second phase. The idea is to perturb different types of human cells with many different types of perturbations, such as drugs and other small molecules; genetic manipulations such as knockdown or overexpression of genes; manipulation of the extracellular microenvironment conditions, i.e., growing cells on different surfaces; and more. These perturbations are applied to various types of human cells, including induced pluripotent stem cells from patients, differentiated into various lineages such as neurons or cardiomyocytes. Then, to better understand the molecular networks that are affected by these perturbations, changes in the levels of many different variables are measured, including mRNA, protein, and metabolites, as well as cellular phenotypic changes such as changes in cell morphology. In most cases, the data collected are genome-wide and span different regulatory layers.
  • The seven data types can be converted into single-entity networks, gene-set libraries, hierarchical trees and bi-partite graphs.
  • LINCS is an important glue that connects various entities.
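
The conversions mentioned in these notes (bi-partite graph, gene-set library, single-entity network) can be illustrated with a small Python sketch. The drug and gene names are placeholders rather than LINCS data, and the helper names `to_gene_set_library` and `to_gene_network` are hypothetical; the projection rule used here (two genes are linked if one drug's set contains both) is one simple choice among several.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bi-partite edges between drugs and genes (placeholder names).
edges = [
    ("drugA", "gene1"),
    ("drugA", "gene2"),
    ("drugB", "gene2"),
    ("drugB", "gene3"),
]


def to_gene_set_library(edges):
    """Group the bi-partite edges into one gene set per drug."""
    library = defaultdict(set)
    for drug, gene in edges:
        library[drug].add(gene)
    return dict(library)


def to_gene_network(library):
    """Project onto genes: link two genes when some drug's set contains both."""
    network = set()
    for genes in library.values():
        for a, b in combinations(sorted(genes), 2):
            network.add((a, b))
    return network


library = to_gene_set_library(edges)
network = to_gene_network(library)
print(library)
print(sorted(network))
```

The same edge list thus yields both a gene-set library (useful for enrichment analysis) and a gene-gene network (useful for graph algorithms), which is the sense in which one data type supports several representations.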
