Controlled vocabularies and VIVO

Assistant Director of Research and Digital Services at Weill Cornell Medical College of Cornell University
May 24, 2012


  1. Controlled vocabularies and VIVO. Paul Albert, Weill Cornell Medical College
  2. The problem: "We've seen 959 ways to refer to Proceedings of the National Academy of Sciences." Google Scholar Development Team
  3. The problem: "We've seen 959 ways to refer to Proceedings of the National Academy of Sciences." Google Scholar Development Team. "Oh, my stomach!"
  4. The main intent of the Semantic Web is to give machines much better access to information resources so they can be information intermediaries in support of humans. Michael Uschold
  5. Let’s Define Our Terms
                              Explicit list   Hierarchy   Associative relationships   Grammar
      controlled vocabulary   ✓
      taxonomy                ✓               ✓
      thesaurus               ✓               ✓           ✓
      ontology                ✓               ✓           ✓                           ✓
  6. Warning: Pursuit of controlled vocabulary tends to expose source systems for the quagmires they are.
  7. Which controlled vocabulary should I use?
  8. Selecting controlled vocabularies: when snobbery is a virtue
  9. “Desiderata” for Controlled Medical Vocabularies. J. J. Cimino, “Desiderata for Controlled Medical Vocabularies in the Twenty-First Century,” Methods of Information in Medicine, © F. K. Schattauer Verlagsgesellschaft mbH (1998). Department of Medical Informatics, Columbia University, New York, USA. Abstract: Builders of medical informatics applications need controlled medical vocabularies to support their applications and it is to their advantage to use available standards. In order to do so, however, these standards need to address the requirements of their intended users. Over the past decade, medical informatics researchers have begun to articulate some of these requirements. This paper brings together some of the common themes which have been described, including: vocabulary content, concept orientation, concept permanence, nonsemantic concept identifiers, polyhierarchy, formal definitions, rejection of "not elsewhere classified" terms, multiple granularities, multiple consistent views, context representation, graceful evolution, and recognized redundancy. Standards developers are beginning to recognize and address these desiderata and adapt their offerings to meet them. Keywords: Controlled Medical Terminology, Vocabulary, Standards, Review
  10. “Desiderata” for Controlled Medical Vocabularies 1. Content – formal editorial policy and methodology; provide breadth and depth; don’t just add terms 2. Concept orientation – exactly one meaning per concept and exactly one concept per meaning 3. Concept permanence – old concepts can't be deleted; names can be changed as long as meaning doesn't change
  11. “Desiderata” for Controlled Medical Vocabularies 4. Nonsemantic identifiers – use a meaningless integer 5. Polyhierarchy – employ multiple hierarchies to support need for tree walking and inferencing 6. Formal Definitions – structured descriptions that invoke relationships within the terminology
  12. “Desiderata” for Controlled Medical Vocabularies 7. Reject “not elsewhere classified” – terminology changes induce semantic drift 8. Graceful evolution – fix mistakes; account for changes in medical knowledge 9. Recognize redundancy – redundant expressions are inevitable, but redundant concepts are bad
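Several of the desiderata on slides 10 to 12 translate directly into data-structure choices. The sketch below is only an illustration, not anything taken from VIVO or from Cimino's paper: the `Concept` and `Vocabulary` classes and all of their method names are hypothetical. It shows nonsemantic integer identifiers, one concept per meaning with redundant names kept as synonyms, multiple parents (polyhierarchy), and concept permanence through retirement rather than deletion.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Concept:
    concept_id: int                                  # nonsemantic: the integer carries no meaning
    preferred_name: str
    synonyms: set = field(default_factory=set)       # redundant expressions of the same concept
    parents: set = field(default_factory=set)        # polyhierarchy: more than one parent allowed
    retired: bool = False                            # concept permanence: retire, never delete


class Vocabulary:
    """Hypothetical in-memory vocabulary; not a VIVO or UMLS API."""

    def __init__(self):
        self._ids = count(1)
        self._concepts = {}
        self._names = {}                             # case-folded name -> concept_id

    def add(self, name, parents=()):
        key = name.casefold()
        if key in self._names:
            # Concept orientation: one concept per meaning, so refuse a second concept.
            raise ValueError(f"{name!r} already names concept {self._names[key]}")
        cid = next(self._ids)
        self._concepts[cid] = Concept(cid, name, parents=set(parents))
        self._names[key] = cid
        return cid

    def add_synonym(self, cid, name):
        concept = self._concepts[cid]
        key = name.casefold()
        if self._names.get(key, cid) != cid:
            raise ValueError(f"{name!r} already names another concept")
        self._names[key] = cid
        concept.synonyms.add(name)

    def rename(self, cid, new_name):
        # The name may change as long as the meaning (and the identifier) does not.
        concept = self._concepts[cid]
        key = new_name.casefold()
        if self._names.get(key, cid) != cid:
            raise ValueError(f"{new_name!r} already names another concept")
        concept.synonyms.add(concept.preferred_name)
        concept.synonyms.discard(new_name)
        concept.preferred_name = new_name
        self._names[key] = cid

    def retire(self, cid):
        self._concepts[cid].retired = True           # still resolvable afterwards

    def lookup(self, name):
        return self._names.get(name.casefold())

    def get(self, cid):
        return self._concepts[cid]
```

A retired concept stays resolvable by identifier and by any of its names, which is what lets old data keep pointing at it.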
  13. Is Roz Chast’s ice cream ontology desiderata compliant?
  14. Is Roz Chast’s ice cream ontology desiderata compliant? Compliant: Reject "Not Elsewhere Classified"; Recognize Redundancy. Unclear or Non-Compliant: Content; Concept Permanence; Graceful Evolution; Concept Orientation; Nonsemantic Concept Identifiers; Polyhierarchy; Formal Definitions.
  15. What is the license of the controlled vocabulary? • Are the ontology codes copyrighted and can they be used in an open source application? • Need to account for the possibility that the data is reused for a commercial interest
  16. Externally maintained vocabularies are more sustainable. Who will maintain and host the vocabulary?
  17. Controlled vocabularies used in VIVO
  18. The Ontology Team is considering serving vocabularies for select domains “The VIVO community might be able to build services to serve controlled vocabularies for organizations and journals.”
  19. Food and Agriculture Organization (FAO) geopolitical ontology • master reference for geopolitical information in multiple languages • provides relations among territories (land borders, group membership, etc.) • tracks historical changes. Ships with the VIVO application.
  20. Academic Degrees Ships with VIVO application
  21. As of version 1.4, VIVO allows users to look up terms from UMLS and GEMET
  22. As of version 1.4, VIVO allows users to look up terms from UMLS and GEMET
  23. As of version 1.4, VIVO allows users to look up terms from UMLS and GEMET
  24. As of version 1.4, VIVO allows users to look up terms from UMLS and GEMET
  25. GEMET: controlled vocabulary for environmental topics: administration; agriculture; air; animal husbandry; biology; building; chemistry; climate; disasters, accidents, risk; economics; energy; environmental policy; fishery; food, drinking water; forestry; general; geography; human health; industry; information; legislation; materials; military aspects; natural areas, landscape, ecosystems; natural dynamics; noise, vibrations; physics; pollution; radiations; research; resources; social aspects, population; soil; space; tourism; trade, services; transport; urban environment, urban stress; waste; water
  26. Vocabularies actively being considered for VIVO • colleges and universities • journals - open source status (VIVOONT-433) • languages (VIVOONT-250) - model write, speak, proficiency • others?
  27. – one promising option for organizations
  28. Modeling medical terms in VIVO
  29. Types of Specialty: All Specialties; Board-Certified Specialties; Board-Certified Subspecialties
  30. Types of Medical Expertise: Feigned (GLG-20s masquerading as doctors for comic effect); Clinical (performed 100+ ECGs; board-certified in cardiology); Research (invented a better ECG)
  31. We use Intelligent Medical Objects (IMO)’s interface terminology • Maps medical expertise terms to SNOMED CT • Useful for returning relevant results to patients searching for a doctor • Enables the physician to enter more arcane areas of expertise (e.g., Asian American Community Health) • A commercial application
  32. Physician Admin View: Search for “chemotherapy” in IMO
  33. Physician Admin View: Search for “that” yields many terms not in SNOMED CT.
  34. Expertise exists in POPS. Board certification data exists in POPS and Intellicred.
  35. Export from Physicians Profile System contains specialty and expertise
  36. Board Certifications Problem #1: No indication of certifying board. At least 13 certifications including geriatric medicine, pain medicine, and urology are given by at least one ABMS board.
  37. Board Certifications Problem #2: Names of certifications are ambiguous. Colon and rectal surgery is listed in the following alternate ways: "Surgery, Colon and Rectal"; "Colon-Rectal Surgery"; "Colorectal Surgery"
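One way to tame the ambiguity on slide 37 is a lookup table that folds every known variant onto a single label. The sketch below is a minimal illustration under stated assumptions: the `VARIANTS` table, the choice of "Colon and Rectal Surgery" as the canonical label, and the function names are all hypothetical and do not come from POPS, Intellicred, or ABMS.

```python
import re

# Hypothetical variant table; the three alternates are the ones listed on slide 37.
VARIANTS = {
    "Colon and Rectal Surgery": [
        "Surgery, Colon and Rectal",
        "Colon-Rectal Surgery",
        "Colorectal Surgery",
    ],
}


def normalize_key(name):
    """Case-fold and collapse punctuation and whitespace so trivial differences vanish."""
    return re.sub(r"[^a-z0-9]+", " ", name.casefold()).strip()


def build_lookup(variants):
    """Map every normalized spelling (canonical or alternate) to its canonical label."""
    lookup = {}
    for canonical, alternates in variants.items():
        for name in (canonical, *alternates):
            lookup[normalize_key(name)] = canonical
    return lookup


LOOKUP = build_lookup(VARIANTS)


def canonical_certification(name, lookup=LOOKUP):
    """Return the canonical label, or None when the name is not in the table."""
    return lookup.get(normalize_key(name))
```

Returning `None` for unknown names, instead of guessing, keeps unmapped certifications visible for manual review.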
  38. Board Certifications Problem #3: No given date of certification.
  39. Board Certifications Problem #4: Which source vocabulary? [Figure panels: Prior to 1970; 1970-1979; 1970-1979]
  40. The National Uniform Claim Committee (NUCC) maintains a list of health care provider taxonomy codes, but this list seems to be exclusively for non-MDs.
  41. Change in number of ABMS Subspecialties/Specialties [Bar chart. Periods on the axis: Pre-1970, 1970-1979, 1980-1992, By 1996, By 1999, 2012. Bar values: 145, 84, 66, 74, 20, 10]
  42. The following 135 board certifications in our system are not recognized by ABMS: Cosmetic Dentistry; Cosmetic Dermatology; Cosmetic Surgery; Critical Care Neurology; Dermatology, General; Ear, Nose, and Throat, Pediatric; Echocardiography; Electrodiagnostic Medicine; Emergency Neurology; Endocrinology; Facial Plastic and Reconstructive Surgery; Facial Plastic Surgery; Family Psychology; Fetal Cardiology; Foot and Ankle Surgery; Foot Surgery; Gastroenterology Pathology; Gastrointestinal Pathology; Gastrointestinal Surgery; General Anesthesiology; General Cardiology; General Dentistry; General Dermatology; General Internal Medicine; General Neurology; General Neurosurgery; General Obstetrics and Gynecology; General Ophthalmology; General Pediatrics; General Psychiatry; General Surgery; General Urology; Genetics, Medical; Geriatric Cardiology; Geriatric Dermatology; Geriatric Psychotherapy; Gynecologic Endocrinology; Gynecologic Pathology; Gynecology; Hand Surgery; Heart Surgery; Hematology/Oncology; Hepatobiliary Surgery; Hepatology; High Risk Obstetrics; Hospitalist; Immunopathology; Infant Psychiatry; Intensive Care; Internal Medicine, General; International Medicine; International Travel Medicine; Interventional Neuroradiology; Interventional Oncology; Interventional Pain Management; Interventional Radiology; Invasive Cardiology; Laboratory Medicine; Laryngology; Liver Pathology; Maternal-Fetal Medicine; Medical Genetics; Molecular Genetics; Molecular Hematopathology; Molecular Infectious Disease; Molecular Pathology; Musculoskeletal Oncology; Musculoskeletal Radiology; Neonatal Neurology; Neonatal Surgery; Neonatal Thoracic Surgery; Neonatology; Neuro Critical Care; Neuro Radiology; Neuro-Ophthalmology; Neuro-Pathology; Nutrition; Oral and Maxillofacial Pathology; Oral and Maxillofacial Surgery; Orthodontics; Orthopedic Surgery; Orthopedics; Pain Medicine/Pain Management; Pathology; Pediatric Allergy and Immunology; Pediatric Behavior and Development; Pediatric Dentistry; Pediatric Neurological Surgery; Pediatric Neurology; Pediatric Neurosurgery; Pediatric Orthopedic Surgery; Pediatric Orthopedics; Periodontics; Plastic and Reconstructive Surgery; Psychology; Pulmonary Disease Medicine; Radiology; Radiology, Vascular/Interventional; Reproductive Endocrinology; Surgery, Critical Care; Surgery, Hand; Surgery, Oral and Maxillofacial; Thoracic Surgery; Vascular and Interventional
  43. Weill Game Plan for Board Certifications • Explore ingest from Intellicred (fewer certifications, less variability, may include certifying agency?) • Explore external vocabularies • Failing that, create our own
  44. Medical Expertise and Non-Certified Specialties
  45. How does a term of local clinical expertise map to UMLS using Stony Brook's API? Expertise terms from the Weill Cornell Physician Profile System (n = 2578). Diagram labels: Identical; Subtype; Compound term preserving original meaning; Union of two concepts; No Equivalent; Unclear.
      53% of terms from the source system correspond exactly to some representation in UMLS
        – Polycystic Ovary Syndrome
        – Anaphylaxis
        – Aortic Dissection
        – Chemoembolization
        – Dental Implant
        – Echocardiogram
      34% of terms from the source system have some equivalent in UMLS that is lexically different but semantically identical (Weill → UMLS)
        – Biopsy of Skin → Skin biopsy
        – Aneurysm of Popliteal Artery → Aneurysm Popliteal
        – Charcot-Marie-Tooth Disease → Charcot-Marie-Tooth
        – Cirrhosis of Liver → Cirrhosis
        – Coarctation of the Aorta → Coarctation
      5% of terms from the source system can only be represented as a combination of terms from UMLS (Weill → UMLS)
        – Asian American Community Health → Asian American | Community Health
        – Endoscopic Ultrasound of Esophagus → Endoscopic Ultrasound | Esophagus
        – Chronic Pelvic Pain In Female → Chronic Pelvic Pain | Female
        – Bronchoscopy With Biopsy →
      3% of terms from the source system can be represented by the joining (not intersection) of two concepts in UMLS (Weill → UMLS)
        – Billing and Coding → Billing | Coding
        – Bone and Mineral Metabolism → Bone Metabolism | Mineral Metabolism
        – Bladder and Prostate Cancer → Bladder Cancer | Prostate Cancer
      2% of terms from the source system can only be represented as a subtype of a concept in UMLS (Weill → UMLS)
        – Bipolar 1 Disorder → Bipolar Disorder
        – FAA Medical Exam → Medical Exam
      3% of terms from the source system lack or have an unclear equivalent in UMLS (Weill → UMLS)
        – In Vitro Fertilization Counseling → In Vitro Fertilization | Counseling
        – Adjustable Band → Band
        – Bowel-Sparing Strictureplasty → No Equivalent
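The line between an exact match and a match that is "lexically different but semantically identical" on slide 45 can be partly automated by comparing terms after dropping word order and a few function words. The sketch below is a rough heuristic for triage only. It does not call UMLS or Stony Brook's API, and the `STOPWORDS` set, the function names, and the three result labels are assumptions made for illustration.

```python
import re

# Assumed list of function words to ignore; tune for the local term set.
STOPWORDS = frozenset({"of", "the", "in"})


def token_key(term):
    """Order-insensitive set of content words in a term."""
    tokens = re.findall(r"[a-z0-9]+", term.casefold())
    return frozenset(t for t in tokens if t not in STOPWORDS)


def classify_match(local_term, candidate):
    """Label a local term against one candidate string returned by a lookup service."""
    if local_term.casefold() == candidate.casefold():
        return "exact"
    if token_key(local_term) == token_key(candidate):
        return "lexical variant"
    # Subtypes, unions, and compound terms need a person or richer semantics.
    return "needs review"
```

With these rules, "Biopsy of Skin" against "Skin biopsy" counts as a lexical variant, while "Cirrhosis of Liver" against "Cirrhosis" falls through to review, since deciding that the shorter term means the same thing requires domain knowledge.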
  46. Pre-coordination vs. post-coordination
      Pre-coordination
        Definition: Terms combined by a developer to denote a specific concept and its attributes more precisely.
        Benefits: Users who are not totally familiar with a controlled vocabulary and its structure.
        Examples: avian hypersensitivity pneumonitis; carrier sense multiple access
      Post-coordination
        Definition: Terms combined at the time of search and retrieval using Boolean or other operators.
        Benefits: Lazy or “busy” developers
        Examples: avian AND hypersensitivity AND pneumonitis; carrier sense AND multiple access
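The difference between the two approaches on slide 46 shows up in how records are indexed and searched. In the sketch below, the two small indexes and the document identifiers are made up for illustration. A pre-coordinated index stores the whole phrase as one term, while a post-coordinated index stores simpler terms that are combined with a Boolean AND at search time.

```python
# Hypothetical indexes: document id -> set of assigned vocabulary terms.
PRE_INDEX = {
    "doc1": {"avian hypersensitivity pneumonitis"},
    "doc2": {"carrier sense multiple access"},
}

POST_INDEX = {
    "doc1": {"avian", "hypersensitivity", "pneumonitis"},
    "doc2": {"carrier sense", "multiple access"},
    "doc3": {"avian", "influenza"},
}


def search_precoordinated(index, phrase):
    """Match documents tagged with the single, already combined term."""
    return sorted(doc for doc, terms in index.items() if phrase in terms)


def search_postcoordinated(index, *terms):
    """Boolean AND: match documents tagged with every one of the given terms."""
    wanted = set(terms)
    return sorted(doc for doc, doc_terms in index.items() if wanted <= doc_terms)
```

In this toy data both searches for the pneumonitis example return only `doc1`, but the post-coordinated index can also answer a plain query for `avian`, which the pre-coordinated one cannot without knowing the full phrase.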
  47. How do we semantically model post-coordinated terms? 1. Do not mess with post-coordination. User adds term from lookup service. That's it. (Existing method.) 2. User adds term from lookup service. Machine makes basic inferences based on similarity. (Everything is "related term.") 3. User adds term from lookup service. Administrator models terms. 4. User adds term from lookup service. User interface enables and guides end user.
  48. Option #3: User adds term from lookup service. Administrator models terms. Can we build on others' work? • The International Health Terminology Standards Development Organization (IHTSDO) in Denmark is working to develop and promote SNOMED to support sharing of modelling. • IMO, our terminology service, may help model coordinated terms.
  49. Need for post-coordination is widespread For example, many global health terms require coordination.
  50. UMLS vs. SNOMED CT
  51. UMLS’s rapid growth is somewhat at odds with desiderata compliance [Line chart: UMLS strings and concepts by year, 1999 to 2011; vertical axis from 0 to 12,000,000]
  52. Cimino’s Critique of Terminologies: Desiderata Adherence
                Cov   Conc  Perm  ID    Hier  Def   NEC   Evol  Redun
      ICD       +     -     -     -     +/-   -     -     -     -
      CPT       -     +     +     +     -     -     -     +     -
      DRG       -     +     +     +     -     -     -     +     -
      NDC       +     +     -     -     -     -     +     -     -
      RxNorm    +     +     +     +     +     +     +     +     +
      LOINC     +     +     +     +     +/-   +     +     +     +
      Nursing   +     +     +/-   +     +/-   -     -     +/-   +/-
      SNOMED    +     +     +     +     +     +/-   +     +     +
      MeSH      +     +/-   +     +     +/-   -     -     +     -
      UMLS      +     +     +     +     +/-   -     n/a   +     -
      Cov: Content coverage. Conc: Concept oriented. Perm: Concept permanence. ID: Meaningless identifiers. Hier: Multiple hierarchy. Def: Formal definitions. NEC: Rejected “Not Elsewhere Classified”. Evol: Graceful evolution. Redun: Detect redundancy.
  53. Why SNOMED CT may be better than UMLS at representing medical terms. UMLS has: • No formal conceptual model (near-synonymy) • No hierarchy • Lots of redundancy • Lots of ambiguity
  54. UMLS is good for helping you find terms in a specific terminology because all many-to-one term-to-concept mappings expand the synonyms you can match against. I recommend you use UMLS to find terms from a very limited set of terminologies - maybe SNOMED plus LOINC plus RxNorm, for example. Jim Cimino
  55. Proposed Role of SKOS. Classes: skos:Concept, with subclasses snomedct:Procedure, snomedct:Disorder, rxnorm:Drug, ... Properties: skos:related, with subproperties snomedct:equivalentTo, ...; skos:broader; skos:narrower
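The class and property layout on slide 55 can be tried out with a handful of triples before committing to an ontology file. The sketch below uses plain Python tuples in place of an RDF library. The class and property names come from the slide, the subclass and subproperty links are one reading of the slide's indentation, and the `ex:` individuals are invented purely for illustration.

```python
# (subject, predicate, object) triples written as prefixed names.
TRIPLES = {
    ("snomedct:Procedure", "rdfs:subClassOf", "skos:Concept"),
    ("snomedct:Disorder", "rdfs:subClassOf", "skos:Concept"),
    ("rxnorm:Drug", "rdfs:subClassOf", "skos:Concept"),
    ("snomedct:equivalentTo", "rdfs:subPropertyOf", "skos:related"),
    # Hypothetical individuals, not real SNOMED CT identifiers.
    ("ex:SkinBiopsy", "rdf:type", "snomedct:Procedure"),
    ("ex:Biopsy", "rdf:type", "snomedct:Procedure"),
    ("ex:SkinBiopsy", "skos:broader", "ex:Biopsy"),
    ("ex:Biopsy", "skos:broader", "ex:DiagnosticProcedure"),
}


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def ancestors(triples, node, predicate):
    """Follow one predicate transitively, as skos:broaderTransitive does for skos:broader."""
    seen, stack = set(), [node]
    while stack:
        for parent in objects(triples, stack.pop(), predicate):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_skos_concept(triples, individual):
    """True when any asserted type is skos:Concept or one of its subclasses."""
    for cls in objects(triples, individual, "rdf:type"):
        if cls == "skos:Concept" or "skos:Concept" in ancestors(triples, cls, "rdfs:subClassOf"):
            return True
    return False
```

Because the source-specific classes sit under `skos:Concept`, generic SKOS-aware code can treat a SNOMED CT procedure and an RxNorm drug uniformly while the more specific type stays available.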
  56. Read More • Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies • Desiderata for Controlled Medical Vocabularies
  57. Practice Robot Courtesy with Local Extensions Use classes/properties that are subclasses/subproperties of existing classes/properties in VIVO’s core ontology.