Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

http://2015.semantics.cc/carlo-trugenberger

Published in: Data & Analytics

  1. InfoCodex Semantic Technologies
     Turning Information into Knowledge
     Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research?
     Dr. Carlo A. Trugenberger, Co-Founder and Chief Scientific Officer
     InfoCodex Semantic Technologies AG, CH-9470 Buchs
     September 2, 2015, www.InfoCodex.com, Semantics 2015
  2. Big changes in pharmaceutical research: the end of the blockbuster era?
     Challenges
     • Costs are exploding
     • Regulatory pressure
     • Personalized medicine
     • Outsourcing of critical processes
     Critical for survival:
     • Shorten time-to-market
     • Early recognition of dead ends
     Opportunities
     • Genomics / proteomics
     • Big data / data mining ➪ structure-based design
     • Drugs are “computed” rather than discovered
     Critical to beat competition:
     • Data + data analysis power
     • Machine intelligence
  3. The data deluge as an opportunity for eDiscovery
     • Traditional bioinformatics: structured data (sequence alignment, gene finding, genome assembly, protein structure prediction, gene expression…)
     • New idea: exploit unstructured data (PubMed: 22 million citations, growing at a rate of 1.7 papers per minute)
     • Experiment: Merck + Thomson Reuters + InfoCodex. Is it possible to drive drug research by text mining large pools of biomedical documents?
  4. The Experiment of Merck & Co with InfoCodex
     The tasks:
     • Discover novel biomarkers for diabetes and obesity (D&O) by analyzing 120,000 medical publications (PubMed + ClinicalTrials.org + internal)
     • Blind experiment, no human feedback
     The aim:
     • Test pure machine intelligence for “semantic drug research”
     Biomarkers: $13.6 billion market in 2011, growing to $25 billion by 2016.
  5. Semantic technologies in the pharma industry
     • Most existing projects use NLP to extract triples “entity 1 - relation - entity 2” sentence by sentence ➪ this helps to curate ontologies / libraries
     • However, this is not a discovery approach: relations found this way have been explicitly written by human authors and are thus already known in one way or another
     • Going beyond triples: analyze text collections globally to identify small, seemingly unrelated and unnoticed facts dispersed over isolated texts, assembling the scattered pieces of a puzzle
     • Critical: machine intelligence
  6. The Technology: eDiscovery by InfoCodex
     Linguistics + Information Theory + Self-Organization
     • Completely automatic semantic analysis of content
     • Designed for uncovering unnoticed correlations among information distributed over document groups and collections (contrary to NLP)
     • “Assemble the pieces of a puzzle”
     • Knowledge discovery as opposed to information extraction
  7. (Figure slide; no text transcribed.)
  8. Step 1: establish reference models for biomarkers / phenotypes
     • Cluster documents describing known biomarkers (224 references found)
     • Reference model for each cluster → meanings for “biomarkers diabetes” …
     Step 2: determine the meaning of unknown words by machine inference
     Step 3: analyze documents and generate a list of potential D&O biomarkers / phenotypes by comparison with the reference models
     Step 4: establish confidence levels
     Encoded meanings
  9. Determination of the meaning of unknown words: machine inference
     • Example: “Hctz” is a “diuretic drug” and is a synonym of “hydrochlorothiazide”
     • Such relations are established solely by machine intelligence combined with an internal knowledge base
     • Co-occurrences with words in the internal knowledge base → most probable hypernym → “is a”, “has to do”
  10. The output
  11. The Results
      • Many uninteresting candidates: too much noise (the problem has been identified and corrected)
      • Lots of “needles in the haystack”: tens of extremely interesting and valuable candidates with very high potential
  12. Conclusion
      ✓ Approach has high potential for discovery
      ✓ Approach has potential to impact pharma research
        - Speed up time-to-market
        - Early recognition of dead ends
      ✗ Improvements in the process are needed: problems have been identified and corrected
      • Most promising is a hybrid approach
        - Human expertise in formulation of reference models
        - Human curation of candidates prior to passing to the laboratory
      ✓ Possibly inevitable development
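
The slides describe the pipeline only at a high level (slides 8 and 9), and the actual InfoCodex implementation is not shown. The following Python sketch is a toy illustration of two of the ideas under loudly stated assumptions: hypernym inference is approximated by majority voting over co-occurring knowledge-base terms, and "comparison with the reference models" is approximated by cosine similarity between bag-of-words vectors, with the similarity doubling as a stand-in confidence level. The knowledge base contents and all function names (`infer_hypernym`, `reference_model`, `rank_candidates`) are hypothetical and not part of any InfoCodex API.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: known term -> hypernym ("is a" category).
KNOWLEDGE_BASE = {
    "furosemide": "diuretic drug",
    "chlorthalidone": "diuretic drug",
    "metformin": "antidiabetic drug",
    "insulin": "hormone",
}


def infer_hypernym(unknown, documents, knowledge_base):
    """Guess the hypernym of `unknown` from co-occurring known terms.

    Each document containing `unknown` casts one vote per known term it
    also contains. Returns (hypernym, share_of_votes), or None if the
    unknown word never co-occurs with a known term.
    """
    target = unknown.lower()
    votes = Counter()
    for doc in documents:
        tokens = set(doc.lower().split())
        if target not in tokens:
            continue
        for term in tokens:
            if term != target and term in knowledge_base:
                votes[knowledge_base[term]] += 1
    if not votes:
        return None
    hypernym, count = votes.most_common(1)[0]
    return hypernym, count / sum(votes.values())


def bag_of_words(text):
    """Very crude stand-in for semantic encoding: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reference_model(cluster_docs):
    """Build a reference model as the summed token counts of one cluster."""
    model = Counter()
    for doc in cluster_docs:
        model.update(bag_of_words(doc))
    return model


def rank_candidates(candidate_docs, model):
    """Score each candidate's text against the model, best match first."""
    scored = [(cosine(bag_of_words(text), model), name)
              for name, text in candidate_docs.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    docs = [
        "patients received hctz or furosemide daily",
        "hctz and chlorthalidone lowered blood pressure",
        "hctz was combined with metformin",
    ]
    print(infer_hypernym("Hctz", docs, KNOWLEDGE_BASE))
```

A real system would need proper tokenization, synonym handling, and a far richer semantic representation than raw word counts; the sketch only shows how co-occurrence evidence and reference-model comparison can be turned into ranked, scored hypotheses.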
