+   Computer-aided summarization        Query user-defined expansion

    Post-retrieval clustering           Extractive summarization




    Experiences on integrating explicit knowledge on
    information access tools in the medical domain

                                 Manuel de la Villa
                                 Department of Information Technologies
                                 University of Huelva
+                                                                                   2

    Index

      Brief CV
           Why a research stay? In Wolverhampton?
           Teaching

      Integrating explicit knowledge on information access tools
           Knowledge sources (UMLS & Freebase)
           Automatic Text Summarization
           Information Retrieval




Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
+                                                                                   3

    Brief CV




+                                                                                   6

    Teaching experience


     Software Engineering
        Process and Methodologies, Metrics,
         Requirements analysis, Design, …
        Software Engineering Lab (UML, NetBeans,
         Subversion, Java, JUnit, Persistence…)

     Multimedia applications development
        Adobe Director, Flash, Photoshop, Premiere
        Sony Sound Forge, Audacity

+                                                                                   7

    Knowledge integration




+ Specific Domain Knowledge source. UMLS (I)                                                                8




    UMLS: a homogeneous group of terminologies

    ICD-10, LOINC, SNOMED-CT, UK-Clinical Terms, MeSH, DSM-IV,
    Gene Ontology, RxNorm, …: a saturation of different terminologies

 UMLS aims to overcome a significant barrier: the variety of
 ways the same concepts are expressed in different
 machine-readable sources.
+ Specific Domain Knowledge source. UMLS (II)                                                  9




    NLM project: Unified Medical Language System (UMLS)

        Aim: to develop tools that help researchers with the knowledge
         representation, retrieval and integration of biomedical information.
           UMLS Knowledge Sources
           Software tools

    Three main components:

    SPECIALIST Lexicon: compilation of lexical elements (>200,000) with grammatical
    information and linguistic variants.

    “Anaesthetic” (noun)
    {base=anesthetic
    spelling_variant=anaesthetic
    entry=E0330018 cat=noun
    variants=reg variants=uncount }

    “Anaesthetic” (adjective)
    {base=anesthetic
    spelling_variant=anaesthetic
    entry=E0330019 cat=adj
    variants=inv position=attrib(3)
    position=pred stative }
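
The two lexicon entries above can be read into a simple structure. The following is a minimal sketch that assumes records look exactly as printed on the slide (whitespace-separated `key=value` tokens plus bare flags such as `stative`); it is not the official SPECIALIST Lexicon file format or tooling, and `parse_record` is a hypothetical helper name.

```python
from collections import defaultdict

# The adjective entry, copied as it appears on the slide.
RECORD = """{base=anesthetic
spelling_variant=anaesthetic
entry=E0330019 cat=adj
variants=inv position=attrib(3)
position=pred stative }"""


def parse_record(text):
    """Parse one brace-delimited record into {key: [values]} plus bare flags."""
    fields = defaultdict(list)
    flags = []
    for token in text.strip().strip("{}").split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key].append(value)  # keys such as "position" may repeat
        else:
            flags.append(token)
    return dict(fields), flags


fields, flags = parse_record(RECORD)
print(fields["cat"], fields["position"], flags)
```

Values are kept as lists because a key can occur more than once in the same entry, as `position` and `variants` do here.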
+ Specific Domain Knowledge source. UMLS (III)                                      10




     Metathesaurus: very large, multi-purpose, multilingual
        vocabulary database (compiles more than 100 source
        vocabularies), https://uts.nlm.nih.gov/metathesaurus.html
     Every term (>5M) is associated with a concept (>1.5M); related
        terms (e.g., synonyms) are linked (16M relations)
     Each concept is assigned to one or more of the 135 existing
        semantic types

    [Figure: different terms for the same concept, which is included in a semantic type]
+ Specific Domain Knowledge source. UMLS (IV)                                       11




                                     https://uts.nlm.nih.gov/semanticnetwork.html

    UMLS Semantic Network: an ontology with 135
       semantic types and 54 types of relationships
       between types

+ General Domain Knowledge Source: Freebase (I)


       Freebase is a large public database that collects three kinds of
       information (data, texts and media) that reference entities or
       topics (≈ 12 million).
          An entity is a single person, place or thing: a single concept
            or real-world thing.
          A topic could also be called an entity, resource, element or
            thing; it is a fundamental unit in Freebase (/common/topic).
          Each topic has a GUID, or globally unique ID:
             http://www.freebase.com/view/en/barack_obama
             http://www.freebase.com/guid/9202a8c04000641f800000000029c277
+ General Domain Knowledge Source: Freebase (II)
     Freebase connects entities together as a graph: it
        defines its data structure as a set of nodes and a set
        of links that establish relationships between the
        nodes.
     Most topics are associated with one or more types (such as
      people, places, books, films, etc.) and may have additional
      properties, like "date of birth" for a person or latitude and
      longitude for a location. These types, properties and related
      concepts are called the Schema.
+ General Domain Knowledge Source: Freebase (III)
  The Schema
  Schema (the way Freebase's data is laid out) is expressed through
  Types and Properties. Types are grouped together in Domains.
+ General Domain Knowledge Source: Freebase (IV)
  The Schema: Medicine
+ General Domain Knowledge Source: Freebase (V)
  How can we use it…


      As a reference or information source
      Create interesting Views and Visualizations and
       share them with others
      Embed Freebase data in your website
      Use the API or Acre, the hosted app development
       platform, to build apps that use Freebase data
      Download the data dumps
      Use Freebase's RDF for Semantic Web applications
+ General Domain Knowledge Source: Freebase (IV)
  The Freebase approach
+ MQL (Metaweb Query Language)
•  http://api.freebase.com/api/service/mqlread?query={"query":{"type":"/music/artist","name":"U2","album":[]}}
•  http://api.freebase.com/api/service/mqlread?query={"query":[{"type":"/medicine/disease","name":null,"symptoms":{"name":"Nausea"}}]}
•  Query Editor
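
The MQL examples above are JSON templates passed in the `query` parameter. A minimal sketch of how such a request URL could be assembled follows. The endpoint is the one written on the slide; the Freebase service has since been retired, so the sketch only builds the URL and does not send it. `mqlread_url` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode

# Endpoint as written on the slide; the service is no longer available.
MQLREAD = "http://api.freebase.com/api/service/mqlread"


def mqlread_url(query):
    """Wrap an MQL query in the {"query": ...} envelope and URL-encode it."""
    envelope = json.dumps({"query": query}, separators=(",", ":"))
    return MQLREAD + "?" + urlencode({"query": envelope})


# Second slide example: diseases that list "Nausea" among their symptoms.
diseases_with_nausea = [{
    "type": "/medicine/disease",
    "name": None,  # serialised as null, the slot the service fills in
    "symptoms": {"name": "Nausea"},
}]

url = mqlread_url(diseases_with_nausea)
print(url)
```

Encoding the JSON with `urlencode` avoids the raw braces and quotes that the slide shows for readability.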
+                                                                                   22

    Knowledge integration




+                                                                                   23

    Experiences in Automatic summarization (I)

  We developed a proposal with these main
  characteristics:
             Sentence extraction
             Document representation as a graph
             Centered on biomedical concepts
             Using concept frequency to measure relevance

+                                                                                   24

    Experiences in Automatic summarization (II)

    + Phase I: Graph generation
       Identification of sentences and UMLS concepts
    + Phase II: Similarity algorithm
       Concept overlap between sentences (edges)
       means “recommendation”
    + Phase III: Ranking algorithm
       The weight associated with each edge depends on
       similarity
    + Phase IV: Summary building
       The top-ranked sentences are selected
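
The four phases can be sketched end to end. This is an illustration under stated assumptions, not the method evaluated in the talk: the slides do not name the similarity measure or the ranking algorithm, so Jaccard overlap and a weighted PageRank-style iteration are assumed here. Phase I (UMLS concept identification) is taken as already done, and the sentences and concept labels are toy data.

```python
from itertools import combinations


def overlap(a, b):
    """Jaccard overlap between two concept sets (an assumed similarity measure)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_sentences(concepts, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank-style iteration (assumed ranking)."""
    n = len(concepts)
    # Phase II: weighted, undirected edges between sentences sharing concepts.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = overlap(concepts[i], concepts[j])
    out_weight = [sum(row) for row in weights]
    # Phase III: each sentence "recommends" its neighbours in proportion to similarity.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * weights[j][i] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, concepts, size=2):
    """Phase IV: keep the top-ranked sentences, restored to document order."""
    scores = rank_sentences(concepts)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:size]
    return [sentences[i] for i in sorted(top)]


# Phase I is assumed done: each sentence comes with its set of concept labels.
sentences = [
    "Melanoma is a skin tumour.",
    "Therapy for melanoma of the skin has improved.",
    "Therapy is costly.",
    "The weather was mild.",
]
concepts = [
    {"melanoma", "skin"},
    {"melanoma", "skin", "therapy"},
    {"therapy"},
    {"weather"},
]
summary = summarize(sentences, concepts)
print(summary)
```

A sentence that shares no concept with the rest, like the last one, receives only the baseline score and is never selected.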
+                                                                                                                    29

    Automatic Summarization. Evaluation




      Evaluation with ROUGE (based on n-grams) against generic
       summarizers
           Our method obtains good results, especially with small n-grams

      de la Villa, M., Maña, M.
      “Propuesta y evaluación de un método de generación de
      resúmenes extractivo basado en conceptos en el ámbito
      biomédico”. XXV edición del Congreso Anual de la Sociedad
      Española para el Procesamiento del Lenguaje Natural 2009
      (SEPLN´09), San Sebastián (Sept-2009).

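
To make "based on n-grams" concrete, here is a minimal sketch of ROUGE-N recall: the share of reference n-grams that also appear in the candidate summary, with counts clipped. It is a simplified illustration, not the official ROUGE toolkit; tokenisation is a plain whitespace split and the two sentences are toy data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram matches divided by the number of n-grams in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / total


reference = "the drug reduces nausea in most patients"
candidate = "the drug reduces nausea"
rouge_1 = rouge_n_recall(candidate, reference, n=1)
rouge_2 = rouge_n_recall(candidate, reference, n=2)
print(rouge_1, rouge_2)
```

Larger values of `n` require longer exact matches, which is why scores usually drop as `n` grows.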
+                                                                                   30

    Knowledge integration




+                                                                                    31


    Experiences in Computer-aided
    summarization (I)
      Computer-aided summarization (CAS) combines automatic
       and human summarization.
      The CAS system suggests an initial summary by
       selecting relevant sentences.
      The human can change the sentence selection and
       manually edit the summary.
      Purpose: construction of a Gold-Standard building
       assistant.
      Novelty: considering the distribution of biomedical concepts
       (Reeve et al., 2006)

+                                                                                   32


    Experiences in Computer-aided
    summarization (and II)
      Experience in the design and construction of a
       Gold-Standard building assistant (or Computer-aided
       summarization)
      Considering the distribution of biomedical concepts
       (Reeve et al., 2006)
      Client-server app
      Centralized repository
      Supports PDF, XML

+                                                                                   33

    Knowledge integration




+                                                                                   34

    Experiences in Information Retrieval
    and Post-retrieval clustering
      Experience in the design and construction of an information
      retrieval system with:
         •  Post-retrieval clustering,
         •  Orientation to biomedical documents and
         •  Mobile devices

+ Search and Information Retrieval. Our implementation                                 36

    Document sources: BioMed Central (web crawling in progress)
    Text processing: lowercasing, stemming, stop-words, …

                                        Lucene for indexing…

+ Search and Information Retrieval. Our implementation (and II)                        37

                                      … and Lucene for searching
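
The text-processing, indexing and searching steps can be illustrated with a toy inverted index. This is a sketch of the general idea only and is not Lucene, which handles analysis, indexing and scoring in the real system. The stop-word list, the crude `naive_stem` function and the three documents are all assumptions made for the example.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # tiny illustrative list


def naive_stem(word):
    """Crude suffix stripping; a placeholder for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, drop stop-words, then stem the remaining tokens."""
    tokens = (t.strip(".,;:()") for t in text.lower().split())
    return [naive_stem(t) for t in tokens if t and t not in STOP_WORDS]


def build_index(docs):
    """Map each analysed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every analysed query term."""
    postings = [index.get(term, set()) for term in analyze(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "Treatment of skin cancers in adults",
    2: "Ligament injuries and their treatment",
    3: "Screening for skin cancer",
}
index = build_index(docs)
hits = search(index, "skin cancer")
print(hits)
```

Running the same `analyze` function over documents and queries is what lets "cancers" in a document match "cancer" in a query.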
+ Clustering. Our implementation                                                       38

    Weka for clustering
          Post-retrieval clustering groups the documents retrieved for a
          query into different subsets according to their similarity

+ Clustering. Why Simple-K-Means?                                                      39

 Clustering algorithm:
  Simple-K-Means vs Expectation Maximization (EM)

    Time taken to perform the grouping, in seconds:

    Query (documents)     Simple-K-Means     EM
    Ligaments (10)              1             2
    Cancer Skin (25)            4            12
    Cancer (46)                 5            26
    Disease (62)                8            57

    K? It depends on the number of documents retrieved.
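
For readers unfamiliar with k-means, a minimal sketch of the algorithm follows. It is plain Lloyd's iteration in pure Python, not Weka's SimpleKMeans. The slide says only that K depends on the number of retrieved documents, so `choose_k` is a hypothetical rule of thumb, and the two-dimensional vectors are toy stand-ins for document term weights.

```python
import math
import random


def choose_k(n_docs):
    """Hypothetical rule of thumb: k grows with the square root of the result size."""
    return max(1, round(math.sqrt(n_docs / 2)))


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Toy term-weight vectors standing in for the retrieved documents.
points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = k_means(points, k=2)
print(labels, choose_k(10))
```

Each iteration costs time proportional to the number of documents times K, which fits the short timings reported in the table.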
  
+ Visualization on Mobile Devices. Our interface                                       40

    [Figure: mobile interface, example query “Cancer skin”]

+                                                                                   41

    Knowledge integration




+
    Experiences in Information Retrieval
    and Query user-defined expansion (I)

      Users have problems defining their information needs in a
      query string (Jansen, Spink and Koshman, 2007).
        Queries contain fewer than three terms (75.2%); most
        contain one (18.5%) or two (32.2%) terms.

      Methods to improve (expand) the query:
        Relevance feedback.
        Local analysis or global analysis.
        Natural Language Processing resources.

      Experiments with users show that they prefer to
      keep control over how the query is reformulated (Belkin
      et al., 2001).
+                                                                                   43

    Experiences in Information Retrieval
    and Query user-defined expansion (II)

      Experience in using ontologies to assist the definition of the
       search string… previously

+
    Experiences in Information Retrieval
    and Query user-defined expansion (II)
    How does it work?
      Pre-retrieval: construction of the graph
+                                                                                   45

    Research: Information Retrieval
    (and III)
      …  or using Ontologies to build an enriched concept graph that
       assist the definition of the search string




  http://www.uhu.es/manuel.villa/viewmed/
  de la Villa, M., Garcia, S., Maña, M.
  “¿De verdad sabes lo que quieres buscar? Expansión guiada visualmente
  de la cadena de búsqueda usando ontologías y grafos de conceptos”.
  XXVII edición del Congreso Anual de la Sociedad Española para el
  Procesamiento del Lenguaje Natural 2011 (SEPLN´11) Huelva (Sept-2011).



Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
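
User-defined query expansion over a concept graph can be sketched as follows. This is an illustration under assumptions, not the ViewMed implementation: `CONCEPT_GRAPH` is a hand-written stand-in for what an ontology lookup would return, and the OR-joined quoted-phrase syntax of the expanded string is assumed for the example.

```python
# Hypothetical concept graph: each concept maps to related concepts, as an
# ontology lookup might return them. The labels are illustrative only.
CONCEPT_GRAPH = {
    "skin cancer": ["melanoma", "basal cell carcinoma"],
    "melanoma": ["skin cancer"],
}


def suggest(query, graph):
    """Pre-retrieval step: list the neighbours of the query concept."""
    return graph.get(query.lower(), [])


def expand(query, chosen):
    """Build the expanded search string from the terms the user kept."""
    terms = [query] + [t for t in chosen if t.lower() != query.lower()]
    return " OR ".join(f'"{t}"' for t in terms)


suggestions = suggest("skin cancer", CONCEPT_GRAPH)
# The user keeps control: only the suggestions they pick are added.
expanded = expand("skin cancer", chosen=suggestions[:1])
print(expanded)
```

Separating `suggest` from `expand` mirrors the point made earlier in the talk: the system proposes related concepts, and the user decides which ones enter the query.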
+                                                                                                       46

    Known tools

      UMLS:
           Metathesaurus, Semantic Network
           Tools:
              MetaMap,
              MMTx API,
              SemRep,
              UTS Web Services, …
      Freebase
           MQL (Metaweb Query Language)
      Newbie with UIMA & GATE

    Expectations

      I offer my collaboration if you’re interested in using
       any of these resources
      I’m open to collaborating on whatever task you
       consider related and…
      … to receive some guidelines to improve the
       summarization method

    Any questions?

Weitere ähnliche Inhalte

Ähnlich wie Experiences on integrating explicit knowledge on information access tools in the medical domain

ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using OntologiesESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologieseswcsummerschool
 
20111022 ontologiescomeofageocas germanymcguinnessfinal
20111022 ontologiescomeofageocas germanymcguinnessfinal20111022 ontologiescomeofageocas germanymcguinnessfinal
20111022 ontologiescomeofageocas germanymcguinnessfinalDeborah McGuinness
 
The Semantic Web: status and prospects
The Semantic Web: status and prospectsThe Semantic Web: status and prospects
The Semantic Web: status and prospectsGuus Schreiber
 
Eswcsummerschool2010 ontologies final
Eswcsummerschool2010 ontologies finalEswcsummerschool2010 ontologies final
Eswcsummerschool2010 ontologies finalElena Simperl
 
download
downloaddownload
downloadbutest
 
download
downloaddownload
downloadbutest
 
Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04Rinke Hoekstra
 
SMalL - Semantic Malware Log Based Reporter
SMalL  - Semantic Malware Log Based ReporterSMalL  - Semantic Malware Log Based Reporter
SMalL - Semantic Malware Log Based ReporterStefan Prutianu
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONIJwest
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION dannyijwest
 
Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK

Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK
Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK

Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK
NeISSProject
 
Multilingual Knowledge Organization Systems Management: Best Practices
Multilingual Knowledge Organization Systems Management: Best PracticesMultilingual Knowledge Organization Systems Management: Best Practices
Multilingual Knowledge Organization Systems Management: Best PracticesMauro Dragoni
 
Describe and Publish data sets on the web: vocabularies, catalogues, data por...
Describe and Publish data sets on the web: vocabularies, catalogues, data por...Describe and Publish data sets on the web: vocabularies, catalogues, data por...
Describe and Publish data sets on the web: vocabularies, catalogues, data por...Franck Michel
 
An Incremental Method For Meaning Elicitation Of A Domain Ontology
An Incremental Method For Meaning Elicitation Of A Domain OntologyAn Incremental Method For Meaning Elicitation Of A Domain Ontology
An Incremental Method For Meaning Elicitation Of A Domain OntologyAudrey Britton
 
Development of the database, the website and the online transcription platfor...
Development of the database, the website and the online transcription platfor...Development of the database, the website and the online transcription platfor...
Development of the database, the website and the online transcription platfor...Itinera Nova
 

Ähnlich wie Experiences on integrating explicit knowledge on information access tools in the medical domain (20)

Semantic annotation of biomedical data
Semantic annotation of biomedical dataSemantic annotation of biomedical data
Semantic annotation of biomedical data
 
Recommandation sociale : filtrage collaboratif et par le contenu
Recommandation sociale : filtrage collaboratif et par le contenuRecommandation sociale : filtrage collaboratif et par le contenu
Recommandation sociale : filtrage collaboratif et par le contenu
 
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using OntologiesESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
 
20111022 ontologiescomeofageocas germanymcguinnessfinal
20111022 ontologiescomeofageocas germanymcguinnessfinal20111022 ontologiescomeofageocas germanymcguinnessfinal
20111022 ontologiescomeofageocas germanymcguinnessfinal
 
Larflast
LarflastLarflast
Larflast
 
The Semantic Web: status and prospects
The Semantic Web: status and prospectsThe Semantic Web: status and prospects
The Semantic Web: status and prospects
 
Eswcsummerschool2010 ontologies final
Eswcsummerschool2010 ontologies finalEswcsummerschool2010 ontologies final
Eswcsummerschool2010 ontologies final
 
The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...
 
download
downloaddownload
download
 
download
downloaddownload
download
 
Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04
 
SMalL - Semantic Malware Log Based Reporter
SMalL  - Semantic Malware Log Based ReporterSMalL  - Semantic Malware Log Based Reporter
SMalL - Semantic Malware Log Based Reporter
 
The basics of ontologies
The basics of ontologiesThe basics of ontologies
The basics of ontologies
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
 
Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK

Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK
Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK

Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK

 
Multilingual Knowledge Organization Systems Management: Best Practices
Multilingual Knowledge Organization Systems Management: Best PracticesMultilingual Knowledge Organization Systems Management: Best Practices
Multilingual Knowledge Organization Systems Management: Best Practices
 
Describe and Publish data sets on the web: vocabularies, catalogues, data por...
Describe and Publish data sets on the web: vocabularies, catalogues, data por...Describe and Publish data sets on the web: vocabularies, catalogues, data por...
Describe and Publish data sets on the web: vocabularies, catalogues, data por...
 
An Incremental Method For Meaning Elicitation Of A Domain Ontology
An Incremental Method For Meaning Elicitation Of A Domain OntologyAn Incremental Method For Meaning Elicitation Of A Domain Ontology
An Incremental Method For Meaning Elicitation Of A Domain Ontology
 
Development of the database, the website and the online transcription platfor...
Development of the database, the website and the online transcription platfor...Development of the database, the website and the online transcription platfor...
Development of the database, the website and the online transcription platfor...
 

Mehr von Manuel de la Villa

Presentación TFG Informes de Alta Automáticos
Presentación TFG Informes de Alta AutomáticosPresentación TFG Informes de Alta Automáticos
Presentación TFG Informes de Alta AutomáticosManuel de la Villa
 
Presentación programa Social Media UHU
Presentación programa Social Media UHUPresentación programa Social Media UHU
Presentación programa Social Media UHUManuel de la Villa
 
Marca personal para community managers
Marca personal para community managersMarca personal para community managers
Marca personal para community managersManuel de la Villa
 
Taller Facebook #SMUHU parte 2
Taller Facebook #SMUHU parte 2Taller Facebook #SMUHU parte 2
Taller Facebook #SMUHU parte 2Manuel de la Villa
 
Taller Facebook #SMUHU parte 1
Taller Facebook #SMUHU parte 1Taller Facebook #SMUHU parte 1
Taller Facebook #SMUHU parte 1Manuel de la Villa
 
Taller de Presentaciones efectivas
Taller de Presentaciones efectivasTaller de Presentaciones efectivas
Taller de Presentaciones efectivasManuel de la Villa
 
Presentacion Grado en Ingeniería Informática UHU
Presentacion Grado en Ingeniería Informática UHUPresentacion Grado en Ingeniería Informática UHU
Presentacion Grado en Ingeniería Informática UHUManuel de la Villa
 
Curso personal branding profesores

Experiences on integrating explicit knowledge on information access tools in the medical domain

  • 1. Experiences on integrating explicit knowledge on information access tools in the medical domain. Manuel de la Villa, Department of Information Technologies, University of Huelva. Topics: computer-aided summarization, user-defined query expansion, post-retrieval clustering, extractive summarization.
  • 2. Index. Brief CV: Why a research stay? In Wolverhampton?; Teaching. Integrating explicit knowledge on information access tools: Knowledge sources (UMLS & Freebase); Automatic Text Summarization; Information Retrieval. Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  • 3. Brief CV
  • 4. Teaching experience. Software Engineering: Process and Methodologies, Metrics, Requirements analysis, Design, …; Software Engineering Lab (UML, NetBeans, Subversion, Java, JUnit, Persistence…). Multimedia applications development: Adobe Director, Flash, Photoshop, Premiere; Sony Sound Forge, Audacity.
  • 5. Knowledge integration
  • 6. Specific Domain Knowledge Source: UMLS (I). Source terminologies: ICD-10, LOINC, SNOMED-CT, UK Clinical Terms, MeSH, DSM-IV, Gene Ontology, RxNorm, … A saturation of different terminologies versus a homogeneous group of terminologies: UMLS aims to overcome a significant barrier, the variety of ways the same concepts are expressed in different machine-readable sources.
  • 7. Specific Domain Knowledge Source: UMLS (II). The NLM Unified Medical Language System (UMLS) project aims to develop tools that help researchers with the knowledge representation, retrieval and integration of biomedical information. It comprises the UMLS Knowledge Sources and software tools. Three main components, the first being the SPECIALIST Lexicon: a compilation of lexical elements (>200,000) with grammatical information and linguistic variants. Example entries for “Anaesthetic”: as a noun, {base=anesthetic spelling_variant=anaesthetic entry=E0330018 cat=noun variants=reg variants=uncount}; as an adjective, {base=anesthetic spelling_variant=anaesthetic entry=E0330019 cat=adj variants=inv position=attrib(3) position=pred stative}.
  • 8. Specific Domain Knowledge Source: UMLS (III). Metathesaurus: a very large, multi-purpose, multi-lingual vocabulary database that compiles more than 100 source vocabularies (https://uts.nlm.nih.gov/metathesaurus.html). Every term (>5M) is associated with a concept (>1.5M), and terms are related to each other, e.g. as synonyms (16M relations). Each concept is assigned to one or more of the 135 existing semantic types. In short: different terms… for the same concept… included in a semantic type.
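The term, concept and semantic type layering described on this slide can be pictured with a small Python sketch. Everything in it is illustrative: the concept identifiers (`CONCEPT_1`, `CONCEPT_2`), the three terms and the helper names are assumptions made for the example and are not taken from the Metathesaurus files or from any UMLS API.

```python
# Hypothetical miniature of the term -> concept -> semantic type layering.
# Identifiers and entries are made up for illustration only.
TERM_TO_CONCEPT = {
    "heart attack": "CONCEPT_1",
    "myocardial infarction": "CONCEPT_1",
    "aspirin": "CONCEPT_2",
}
CONCEPT_TO_TYPES = {
    "CONCEPT_1": ["Disease or Syndrome"],
    "CONCEPT_2": ["Pharmacologic Substance", "Organic Chemical"],
}

def semantic_types(term):
    """Semantic types of the concept a term belongs to (empty if unknown)."""
    concept = TERM_TO_CONCEPT.get(term.lower())
    return CONCEPT_TO_TYPES.get(concept, [])

def synonyms(term):
    """Other terms attached to the same concept as the given term."""
    key = term.lower()
    concept = TERM_TO_CONCEPT.get(key)
    if concept is None:
        return []
    return sorted(t for t, c in TERM_TO_CONCEPT.items() if c == concept and t != key)

print(synonyms("Heart attack"))
print(semantic_types("aspirin"))
```

The point of the structure is that lookups go through the concept, so two different surface terms resolve to the same semantic types.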
  • 9. Specific Domain Knowledge Source: UMLS (IV). UMLS Semantic Network (https://uts.nlm.nih.gov/semanticnetwork.html): an ontology with 135 semantic types and 54 types of relationships between those types.
  • 10. General Domain Knowledge Source: Freebase (I). Freebase is a large public database that collects three kinds of information (data, texts and media) that reference entities or topics (≈ 12 million). An entity is a unique single person, place, or thing: a single concept or real-world thing. A topic could also be called an entity, resource, element or thing; it is the fundamental unit in Freebase (/common/topic). Each topic has a GUID, or globally unique ID. Examples: http://www.freebase.com/view/en/barack_obama and http://www.freebase.com/guid/9202a8c04000641f800000000029c277
  • 11. General Domain Knowledge Source: Freebase (II). Freebase connects entities together as a graph: it defines its data structure as a set of nodes and a set of links that establish relationships between the nodes. Most topics are associated with one or more types (such as people, places, books, films, etc.) and may have additional properties, like "date of birth" for a person or latitude and longitude for a location. These types, properties and related concepts are called the Schema.
  • 12. General Domain Knowledge Source: Freebase (III). The Schema (the way Freebase's data is laid out) is expressed through Types and Properties. Types are grouped together in Domains.
  • 16. General Domain Knowledge Source: Freebase (IV). The Schema: Medicine
  • 17. General Domain Knowledge Source: Freebase (V). How can we use it? As a reference or information source; to create interesting Views and Visualizations and share them with others; to embed Freebase data in your website; through the API or Acre, the hosted app development platform, to build apps that use Freebase data; by downloading the data dumps; by using Freebase's RDF for Semantic Web applications.
  • 18. General Domain Knowledge Source: Freebase (IV). The Freebase approach
  • 19. MQL (Metaweb Query Language). Example queries: http://api.freebase.com/api/service/mqlread?query={"query":{"type":"/music/artist","name":"U2","album":[]}} and http://api.freebase.com/api/service/mqlread?query={"query":[{"type":"/medicine/disease","name":null,"symptoms":{"name":"Nausea"}}]}. Tool: Query Editor.
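The request format shown on this slide can be sketched in Python. This is a minimal sketch that only builds the request URL from the endpoint and the `{"query": ...}` envelope that appear on the slide; `build_mqlread_url` is a hypothetical helper name, and nothing is sent over the network, because the public Freebase API has since been shut down.

```python
import json
from urllib.parse import quote, unquote

# Endpoint copied from the slide; the service is no longer online.
MQLREAD_ENDPOINT = "http://api.freebase.com/api/service/mqlread"

def build_mqlread_url(query):
    """Wrap an MQL query in the {"query": ...} envelope and URL-encode it."""
    envelope = json.dumps({"query": query}, separators=(",", ":"))
    return MQLREAD_ENDPOINT + "?query=" + quote(envelope, safe="")

# Second example from the slide: diseases that list "Nausea" as a symptom.
# None is serialized as JSON null, the placeholder MQL uses for a value to return.
disease_query = [{
    "type": "/medicine/disease",
    "name": None,
    "symptoms": {"name": "Nausea"},
}]
disease_url = build_mqlread_url(disease_query)

# Round trip: decoding the URL parameter recovers the original envelope.
decoded = json.loads(unquote(disease_url.split("?query=", 1)[1]))
print(disease_url)
```

Building the query as a Python structure and serializing it avoids the hand-typed spacing and quoting problems that are easy to introduce when the JSON is written directly into a URL.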
  • 20. Knowledge integration
  • 21. Experiences in Automatic summarization (I). We developed a proposal with these main characteristics: sentence extraction; document representation as a graph; centered on biomedical concepts; concept frequency used to measure relevance.
  • 22. Experiences in Automatic summarization (II). Phase I, graph generation: identification of sentences and UMLS concepts. Phase II, similarity algorithm: concepts overlapping between sentences (edges) mean “recommendation”. Phase III, ranking algorithm: the weight associated with each edge depends on similarity. Phase IV, summary building: the top-ranked sentences are selected.
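The four phases can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: concept identification is replaced by a hypothetical hand-made lookup table (`TOY_CONCEPTS`) instead of UMLS, the edge weight is assumed to be the Jaccard overlap of the two concept sets, and a sentence's rank is simply the sum of the weights of its edges. The slides do not specify the actual similarity or ranking formulas.

```python
from itertools import combinations

# Hypothetical stand-in for UMLS concept identification (Phase I).
TOY_CONCEPTS = {
    "aspirin": "C_ASPIRIN",
    "pain": "C_PAIN",
    "headache": "C_HEADACHE",
    "fever": "C_FEVER",
}

def concepts_in(sentence):
    """Return the set of toy concept IDs mentioned in a sentence."""
    words = sentence.lower().replace(".", "").replace(",", "").split()
    return {TOY_CONCEPTS[w] for w in words if w in TOY_CONCEPTS}

def summarize(sentences, n_top=2):
    # Phase I: one node per sentence, annotated with its concepts.
    nodes = [concepts_in(s) for s in sentences]
    scores = [0.0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        # Phase II: an edge links two sentences that share concepts;
        # its weight (assumed: Jaccard overlap) acts as a "recommendation".
        shared = nodes[i] & nodes[j]
        if shared:
            weight = len(shared) / len(nodes[i] | nodes[j])
            # Phase III: a sentence's rank is the total weight of its edges.
            scores[i] += weight
            scores[j] += weight
    # Phase IV: keep the top-ranked sentences, restored to document order.
    ranked = sorted(range(len(sentences)), key=lambda k: -scores[k])[:n_top]
    return [sentences[k] for k in sorted(ranked)]

doc = [
    "Aspirin relieves pain and fever.",
    "Headache is a common kind of pain.",
    "Aspirin is often taken for headache and fever.",
    "The clinic opens at nine.",
]
summary = summarize(doc, n_top=2)
print(summary)
```

In this toy document the last sentence mentions no concept, so it has no edges and can never be selected, which is the intended effect of centering the graph on biomedical concepts.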
  • 27. Automatic Summarization: Evaluation. Evaluation with ROUGE (based on n-grams) against generic summarizers. Our method obtains good results, especially with small n-grams. de la Villa, M., Maña, M. “Propuesta y evaluación de un método de generación de resúmenes extractivo basado en conceptos en el ámbito biomédico”. XXV edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural 2009 (SEPLN´09), San Sebastián (Sept-2009).
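For reference, the n-gram recall at the core of ROUGE-N can be sketched as below. This is a simplified illustration (a single reference summary, whitespace tokenization, no stemming or stop-word handling), not the official ROUGE toolkit; `rouge_n_recall` and the example sentences are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams also present in the candidate (clipped counts)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total

reference = "aspirin reduces fever and pain"
candidate = "aspirin reduces pain"
r1 = rouge_n_recall(candidate, reference, n=1)  # unigram recall
r2 = rouge_n_recall(candidate, reference, n=2)  # bigram recall
print(r1, r2)
```

An extractive summary that picks the right content words but in different contexts scores higher on unigrams than on longer n-grams, which is one way to read "good results, especially with small n-grams".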
  • 28. Knowledge integration
  • 29. Experiences in Computer-aided summarization (I). Computer-aided summarization (CAS) combines automatic and human summarization. The CAS system suggests an initial summary by selecting relevant sentences. The human can change the sentence selection and manually edit the summary. Purpose: construction of a Gold-Standard building assistant. Novelty: it considers the distribution of biomedical concepts (Reeve et al., 2006).
  • 30. Experiences in Computer-aided summarization (and II). Experience in the design and construction of a Gold-Standard building assistant (a computer-aided summarization tool) that considers the distribution of biomedical concepts (Reeve et al., 2006). Features: client-server application; centralized repository; supports PDF and XML.
  • 31. Knowledge integration
  • 32. Experiences in Information Retrieval and Post-retrieval clustering. Experience in the design and construction of an information retrieval system with post-retrieval clustering, orientation to biomedical documents, and mobile devices.
  • 33. Search and Information Retrieval: our implementation. Document sources: BioMed Central (web crawling in progress). Text processing: lowercasing, stemming, stop-words, … Lucene for indexing…
  • 34. Search and Information Retrieval: our implementation (and II). … and Lucene for searching.
  • 35. Clustering: our implementation. Weka for clustering. Post-retrieval clustering takes the set of documents retrieved for a query and groups them into different subsets according to their similarity.
  • 36. Clustering: why Simple-K-Means? Clustering algorithm: Simple-K-Means vs Expectation Maximization (EM). Time in seconds taken to perform the grouping, per query (number of documents), given as Simple-K-Means / EM: Ligaments (10): 1 / 2; Cancer Skin (25): 4 / 12; Cancer (46): 5 / 26; Disease (62): 8 / 57. K? It depends on the number of documents retrieved.
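As an illustration of what the clustering step does, here is a minimal k-means sketch in plain Python. It is not Weka's SimpleKMeans: the documents are represented by hypothetical two-dimensional vectors rather than real term vectors, and taking the first k points as initial centroids is an assumption made to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; initial centroids are the first k points (an assumption)."""
    centroids = list(points[:k])
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignment

# Hypothetical document vectors forming two clearly separate groups.
docs = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]
labels = k_means(docs, k=2)
print(labels)
```

Each iteration is only a nearest-centroid assignment followed by an averaging step, which helps explain why this family of algorithms tends to be cheap enough for grouping results at query time.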
  • 37. Visualization on Mobile Devices: our interface. Example query: “Cancer skin”.
  • 38. Knowledge integration
  • 39. Experiences in Information Retrieval and Query user-defined expansion (I). Users have problems defining their information needs in a query string (Jansen, Spink and Koshman, 2007). Queries contain fewer than three terms (75.2%), and the majority of queries contained one (18.5%) or two (32.2%) terms. Methods to improve (expand) the query: relevance feedback; local analysis or global analysis; Natural Language Processing resources. Experiments with users show that they prefer to maintain control over how the query is reformulated (Belkin et al., 2001).
  • 40. Experiences in Information Retrieval and Query user-defined expansion (II). Experience in using ontologies to assist the definition of the search string… previously.
  • 41. Experiences in Information Retrieval and Query user-defined expansion (II). How does it work? Pre-retrieval; construction of the graph.
  • 42. Research: Information Retrieval (and III). … or using ontologies to build an enriched concept graph that assists the definition of the search string. http://www.uhu.es/manuel.villa/viewmed/ de la Villa, M., Garcia, S., Maña, M. “¿De verdad sabes lo que quieres buscar? Expansión guiada visualmente de la cadena de búsqueda usando ontologías y grafos de conceptos”. XXVII edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural 2011 (SEPLN´11), Huelva (Sept-2011).
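The idea of user-guided expansion over a concept graph can be sketched as follows. This is a hypothetical illustration, not the implementation behind the URL above: the tiny `CONCEPT_GRAPH`, its relation names and the OR-based query syntax are all assumptions made for the example.

```python
# Hypothetical concept graph: term -> list of (relation, related term).
CONCEPT_GRAPH = {
    "skin cancer": [
        ("narrower", "melanoma"),
        ("narrower", "basal cell carcinoma"),
        ("synonym", "skin neoplasm"),
    ],
    "melanoma": [("broader", "skin cancer")],
}

def suggest(term):
    """Related concepts the interface could display for a query term."""
    return [related for _, related in CONCEPT_GRAPH.get(term.lower(), [])]

def expand(term, chosen):
    """Build an expanded query from the terms the user explicitly picked."""
    allowed = set(suggest(term))
    picked = [c for c in chosen if c in allowed]  # ignore choices not in the graph
    return " OR ".join(f'"{t}"' for t in [term] + picked)

options = suggest("skin cancer")
query = expand("skin cancer", ["melanoma", "skin neoplasm"])
print(options)
print(query)
```

Only terms the user selects are added to the query, which keeps the reformulation under the user's control rather than expanding automatically.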
  • 43. Known tools. Expectations. Known tools: UMLS (Metathesaurus, Semantic Network; tools: MetaMap, MMTx API, SemRep, UTS Web Services, …); Freebase (MQL, Metaweb Query Language); newbie with UIMA & GATE. Expectations: I offer my collaboration if you're interested in using any of these resources; I'm open to collaborating on whatever task you consider related; and… to receive some guidelines to improve the summarization method. Any questions?