Multimodal Information Extraction:
Disease, DateTime, and Location Retrieval
             Laboratory for Knowledge Discovery in Databases
            Department of Computing and Information Sciences
                          Kansas State University

    Dr. William H. Hsu, Associate Professor of Computing and Information Sciences
                   Svitlana O. Volkova, Graduate Research Assistant
                       Timothy E. Weninger, Research Associate
                         Jing Xia, Graduate Research Assistant
                  Surya Teja Kallumadi, Graduate Research Assistant
                   Wesam S. Elshamy, Graduate Research Assistant
AGENDA
- Overview
- Document Extraction
  - Document Level Analysis: Entity Recognition Task
  - Disease Extractor Module: Disease Recognition Task & Future Improvements
- Temporal Tagging
  - Date/Time Extractor Module: Date Recognition Task
  - Future Improvements for Date/Time Extractor Module
- Spatial Tagging
  - Location Extractor Module: Location Recognition Task
  - Future Improvements for Location Extractor Module
- Event Classification Task
  - Events Representation by Date/Time: Timeline View
  - Events Representation by Location: Map View
MAIN STEPS

1. Extend the basic document-level IE, temporal annotation, and spatial annotation components with more state-of-the-field analytical functions.
2. Perform collection-level analysis and interactive visualization of timelines and maps.
3. Assist the integrator (Elder Research, Inc.) in incorporating these into a single system.
HOW CAN WE GET DATA?
Sources: WWW, EMAIL, LITERATURE
- WWW: Information Retrieval (IR) from the Web by crawling news, blogs, reports, etc.
  CRAWLER → DB → QUERY → DOCUMENTS COLLECTION
- LITERATURE: LITERATURE DOCUMENTS COLLECTION

DOMAIN-SPECIFIC KNOWLEDGE
- Medical ontology containing names of diseases, viruses, animal species, etc., organized in a conceptual hierarchy.

DOMAIN-INDEPENDENT KNOWLEDGE
- Location hierarchy containing names of countries, states or provinces, cities, etc.
- Canonical date and time representation.
A TWO-LEVEL ANALYTICAL FRAMEWORK IN THE DOMAIN OF EPIZOOTICS
Document Level Analysis
- Web document content extraction:
  - Named entity recognition (NER)
  - Co-reference & association resolution, relation extraction
Collection Level Analysis
- Semi-supervised document clustering & linking by finding similarities by keywords
- Document categorization as a topic summarization task (pLSA, LDA)

- Geotagging: location extraction, map view
- Temporal tagging: date/time extraction, timeline view
- Event identification <…what, where, when, …>
HIGH-LEVEL SYSTEM ARCHITECTURE
- User Access Control API (Java) with access privileges
- Data Search and Query
- Temporal Tagging: Timeline View
- Spatial Tagging: Map View
- Event Detection and Deduplication
- Data Storage: Data Store (MSSQL)
- Web Server and IAAC Server
- Internet Browser (IE/Mozilla/…) for researchers, public health professionals, governmental health agencies, and other users
DOCUMENT LEVEL ANALYSIS
   Entity Recognition Task
EXTENSION OF ENTITIES FOR MULTIMODAL INFORMATION EXTRACTION SYSTEM
Stanford NER Entities
- Person (e.g. “John Lenin”, “William K. Smith”)
- Organization (e.g. “U.K. Department for Environment, Food and Rural Affairs”)
- Location (e.g. “Europe”, “Canada”)
- Miscellaneous (e.g. “African”, researcher, etc.)
KDD Group’s NER Entities
- Animal diseases (e.g. “rift valley fever”, “fmd”)
- Date and time (e.g. “May 24 2001”, “last year”)
- Location (e.g. “London, Great Britain”, “Manhattan, KS, USA”)
- Animal species (e.g. “cow”, “horse”, “mammals”)
- Quantities (e.g. number of animals that died, amount of money spent, $)
INFORMATION EXTRACTION TASK
Goal: Extract structured information (facts and entities related to events) from unstructured or semi-structured sources.
Result: The US saw its latest FMD outbreak in Montebello, California in 1929, where 3,600 animals were slaughtered.
DOCUMENTS COLLECTION → Animal Disease Names, Locations, Dates/Times, Quantities
NAMED ENTITY REPRESENTATION FOR NER TASK
- Disease → Multi-Faceted Quantitative Summary
- Location → Map View
- Date and time → Timeline View
Timeline View Example: http://press.jrc.it/NewsExplorer/timelineedition/en/timeline.html
Map View Example: http://www.healthmap.org/promed/en
DISEASE EXTRACTOR MODULE
INPUT AND OUTPUT
Input: text from a file → Disease Extractor Module
Output:
- Index of the first character
- Index of the last character
- Length of the matched text
- Matched text
- Canonical disease name
Disease Extraction Task
- Disease recognition can be treated as a named entity recognition (NER) / information extraction (IE) task. The main purpose is to retrieve tokens that match at least one term from a list of disease names.
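A minimal sketch of dictionary matching that produces the output fields listed above. It assumes a small hypothetical vocabulary (`VOCABULARY`, keyed by lower-case surface form) and case-insensitive, whole-word, longest-term-first regular expressions; `DiseaseMatch` and `extract_diseases` are illustrative names, not the module's actual API.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabulary: lower-case surface form -> canonical disease name.
VOCABULARY = {
    "foot and mouth disease": "foot-and-mouth disease",
    "foot-and-mouth disease": "foot-and-mouth disease",
    "fmd": "foot-and-mouth disease",
    "rift valley fever": "Rift Valley fever",
    "rvf": "Rift Valley fever",
}


@dataclass
class DiseaseMatch:
    first: int      # index of the first character
    last: int       # index of the last character (inclusive)
    length: int     # length of the matched text
    text: str       # matched text as it appears in the input
    canonical: str  # canonical disease name


def build_pattern(vocabulary):
    # Longest terms first, so a long name wins over a shorter term it contains;
    # \b stops "fmd" from matching inside a longer word.
    terms = sorted(vocabulary, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def extract_diseases(text, vocabulary=VOCABULARY):
    pattern = build_pattern(vocabulary)
    results = []
    for m in pattern.finditer(text):
        matched = m.group(0)
        results.append(DiseaseMatch(
            first=m.start(),
            last=m.end() - 1,
            length=len(matched),
            text=matched,
            canonical=vocabulary[matched.lower()],
        ))
    return results


for hit in extract_diseases("Foot and mouth disease is one of the most contagious diseases"):
    print(hit)
```

On that input the sketch reports a single match covering characters 0 to 21 (length 22), mapped to the canonical name "foot-and-mouth disease"; the plural "diseases" is not matched because it is not in the vocabulary.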
DISEASE EXTRACTOR MODULE DEMO
    iiac.ksu.edu/DiseaseExtractor
RESULTS FOR DISEASE EXTRACTOR MODULE

Input A: Foot and mouth disease is one of the most contagious diseases of cloven-hooved mammals…
Output A: [screenshot]

Input B: Rift Valley Fever | CDC Special Pathogens Branch Mission Statement Disease …
Output B: [screenshot]
VOCABULARY CONSTRUCTION FOR DISEASE EXTRACTOR
1. Disease names and fact sheets from the Iowa State University Center for Food Security and Public Health (CFSPH):
     http://www.cfsph.iastate.edu/diseaseinfo/animaldiseaseindex.htm
2. World Organisation for Animal Health (OIE) animal disease data:
     http://www.oie.int/eng/maladies/en_alpha.htm
3. Department for Environment, Food and Rural Affairs, UK (DEFRA):
     http://www.defra.gov.uk/animalh/diseases/vetsurveillance/az_index.htm
4. United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service (APHIS):
     http://www.aphis.usda.gov/animal_health/animal_diseases/
5. MedlinePlus, a service of the National Library of Medicine and the National Institutes of Health:
     http://www.nlm.nih.gov/medlineplus/animaldiseasesandyourhealth.html
6. Wikipedia:
     http://en.wikipedia.org/wiki/Animal_diseases
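Combining the six sources above into one vocabulary needs name normalization, since the same disease can appear as "Foot-and-Mouth Disease" in one list and "foot and mouth disease" in another. A hypothetical sketch, assuming each source has already been collected into a plain list of names; `normalize`, `merge_vocabularies`, and the normalization rules are illustrative choices.

```python
from collections import defaultdict


def normalize(term):
    # Lower-case, treat hyphens as spaces, collapse runs of whitespace.
    return " ".join(term.lower().replace("-", " ").split())


def merge_vocabularies(sources):
    """sources: mapping of source label -> iterable of disease names.

    Returns {normalized name: sorted list of source labels that list it}."""
    merged = defaultdict(set)
    for label, names in sources.items():
        for name in names:
            key = normalize(name)
            if key:  # skip blank entries
                merged[key].add(label)
    return {key: sorted(labels) for key, labels in sorted(merged.items())}


print(merge_vocabularies({
    "CFSPH": ["Foot-and-Mouth Disease", "Rift Valley Fever"],
    "OIE": ["foot and mouth disease"],
}))
```

Keeping the source labels makes it easy to see how many independent lists confirm each name.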
RESULTS FOR DISEASE EXTRACTOR MODULE




              ClearForest Gnosis Software: http://www.clearforest.com/
COMPARATIVE RESULTS FOR DISEASE EXTRACTORS: KDD GROUP’S VS. GNOSIS
[Figure: line charts comparing Gnosis Soft. with KDD Group's Disease Extractor. "Disease Extraction 'FMD'" and "Disease Extraction 'RVF'": quantities of extracted diseases vs. number of seeds. "Non-unique Animal Disease Extraction": non-unique extracted diseases vs. number of seeds.]
COMPARATIVE RESULTS FOR UNIQUE DISEASE EXTRACTORS: KDD GROUP’S VS. GNOSIS
[Figure: line charts comparing Gnosis Soft. with KDD Group's Disease Extractor. "Unique Disease Extraction": extracted unique diseases vs. number of seeds. "Random Permutation of Extracted Diseases": number of extracted animal diseases vs. run number.]
CUMULATIVE COMPARATIVE RESULTS FOR DISEASE EXTRACTORS: KDD GROUP’S VS. GNOSIS
[Figure: "Cumulative Disease Extraction": number of extracted animal diseases vs. number of seeds for Gnosis Soft. and KDD Group's Disease Extractor, each with a polynomial trendline. The two fitted trendlines are
  y = 2.7283x² + 14.914x − 4.4336 (R² = 0.9762)
  y = 4.1708x² − 29.864x + 48.364 (R² = 0.9831)]
[Figure: "KDD Group's Extractor: Results" and "Gnosis Software: Extraction Results": number of unique extracted diseases vs. seed permutation (1 to 7).]
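The cumulative chart reports quadratic trendlines with R² values, the usual output of a spreadsheet polynomial trendline. Assuming they are ordinary least-squares fits, this dependency-free sketch shows the computation: it solves the 3×3 normal equations for y = ax² + bx + c and reports R² = 1 − SS_res/SS_tot. `fit_quadratic` is a hypothetical helper, not part of the original tooling.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs")
    s = [sum(x ** k for x in xs) for k in range(5)]                  # s[k] = sum of x^k
    t = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of x^k * y
    # Augmented normal-equation matrix for the unknowns (c, b, a).
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("need at least three distinct x values")
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        rest = sum(m[r][k] * coef[k] for k in range(r + 1, 3))
        coef[r] = (m[r][3] - rest) / m[r][r]
    c, b, a = coef
    # Coefficient of determination.
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, c, r_squared
```

Feeding it the (number of seeds, cumulative count) pairs of one series would yield coefficients and an R² directly comparable to the trendline labels on the chart.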
FUTURE IMPROVEMENTS FOR DISEASE EXTRACTOR MODULE

Intermediate Functionality
- Add species extraction and construct a species vocabulary.
- Enrich the dictionary with animal diseases by species:
  - National Center for Infectious Diseases (CDC):
      http://www.cdc.gov/healthypets/browse_by_animal.htm
  - United States Department of Agriculture (APHIS), Animal Health:
      http://www.aphis.usda.gov/animal_health/animal_dis_spec/
- Construct a disease ontology with the Protégé software.

Advanced Functionality
- Apply a “seed set expansion” approach to improve disease extraction.
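The slides do not spell out the seed set expansion method. One common form is context-pattern bootstrapping: learn the words that surround known seed disease names, then propose new terms that appear in the same contexts. A one-round sketch under that assumption; `tokenize`, `expand_seeds`, `min_support`, and `max_len` are hypothetical names and parameters.

```python
import re
from collections import Counter


def tokenize(text):
    # Lower-case word tokens; hyphens stay inside tokens ("foot-and-mouth").
    return re.findall(r"[a-z0-9-]+", text.lower())


def expand_seeds(sentences, seeds, min_support=2, max_len=3):
    """One round of context-pattern bootstrapping.

    Learns (previous word, next word) contexts seen around at least
    `min_support` seed occurrences, then returns a Counter of new phrases
    (up to `max_len` tokens) that occur inside any learned context."""
    seed_set = {tuple(tokenize(s)) for s in seeds}
    docs = [tokenize(s) for s in sentences]

    # Step 1: count the contexts that surround seed occurrences.
    contexts = Counter()
    for toks in docs:
        for i in range(1, len(toks)):
            for n in range(1, max_len + 1):
                if i + n < len(toks) and tuple(toks[i:i + n]) in seed_set:
                    contexts[(toks[i - 1], toks[i + n])] += 1
    patterns = {ctx for ctx, count in contexts.items() if count >= min_support}

    # Step 2: collect non-seed phrases that appear inside a learned context.
    candidates = Counter()
    for toks in docs:
        for i in range(1, len(toks)):
            for n in range(1, max_len + 1):
                if i + n < len(toks) and (toks[i - 1], toks[i + n]) in patterns:
                    phrase = tuple(toks[i:i + n])
                    if phrase not in seed_set:
                        candidates[" ".join(phrase)] += 1
    return candidates
```

A full bootstrapping loop would add the best-supported candidates to the seeds and repeat; `min_support` keeps a context seen around only one seed from generating candidates.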
DOCUMENT LEVEL ANALYSIS
     Temporal Tagging
DATE/TIME EXTRACTOR AND EVENT TAGGER MODULE
INPUT AND OUTPUT
Input: text from a file → Date/Time Extractor
Output:
- Disease name
- Event trigger
- Location
- Canonical date/time
Temporal Extraction and Events Tagging Task
- The main purpose is to extract temporal quantities associated with events from text, identify events and their semantic relatedness, and summarize them.
- Extracting temporal events involves identifying dates and times and the entities associated with these events.
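A minimal sketch of normalizing the two date examples from the entity list (“May 24 2001”, “last year”) to a canonical form. It assumes ISO 8601 strings as the canonical representation and a reference date (for example, the document's publication date) for resolving relative expressions; only these two patterns are handled, and `extract_dates` is a hypothetical name.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# "May 24 2001" or "May 24, 2001"
ABSOLUTE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),?\s+(\d{4})\b", re.IGNORECASE)
# "last year", "this year", "next year"
RELATIVE_YEAR = re.compile(r"\b(last|this|next)\s+year\b", re.IGNORECASE)
YEAR_OFFSET = {"last": -1, "this": 0, "next": 1}


def extract_dates(text, reference):
    """Return (matched text, canonical value) pairs in order of appearance.

    Absolute dates become ISO 8601 "YYYY-MM-DD"; relative years are resolved
    against `reference` (a datetime.date) and become "YYYY"."""
    found = []
    for m in ABSOLUTE.finditer(text):
        month = MONTHS[m.group(1).lower()]
        day, year = int(m.group(2)), int(m.group(3))
        try:
            value = date(year, month, day).isoformat()
        except ValueError:
            continue  # not a real calendar date, e.g. "February 30 2001"
        found.append((m.start(), m.group(0), value))
    for m in RELATIVE_YEAR.finditer(text):
        year = reference.year + YEAR_OFFSET[m.group(1).lower()]
        found.append((m.start(), m.group(0), f"{year:04d}"))
    return [(matched, value) for _, matched, value in sorted(found)]


print(extract_dates("Cases rose last year; the first was on May 24 2001.", date(2009, 6, 9)))
```

The choice of reference date matters: the same "last year" resolves differently in a report from 2009 than in one from 2001.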
COMPONENTS OF DATE/TIME EXTRACTOR AND EVENT TAGGER MODULE
- Date/Time Extractor: based on a quantities-and-units chunker and a standard time data structure.
- Pattern-Based Event Extractor: built through analysis of disease outbreak reports, e.g. “a report has been confirmed that …”.
- Named Entity Recognition Tool: extracts the named entities Location, Person, Organization, and Disease.

Goal: Extracting facts and entity relations associated with events.

Disease outbreaks: disease, organisms, victim, symptoms, location, country, date, containment measures …
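The pattern-based event extractor is described only by its example trigger. A hypothetical sketch of the idea, assuming a handful of regular-expression triggers and a naive sentence splitter; the trigger list and the names `TRIGGERS` and `tag_events` are illustrative, not the module's actual pattern set.

```python
import re

# Hypothetical triggers in the spirit of "a report has been confirmed that ...".
TRIGGERS = {
    "confirmation": re.compile(r"\b(?:has|have) been confirmed\b", re.IGNORECASE),
    "report": re.compile(r"\b(?:was|were|has been) reported\b", re.IGNORECASE),
    "death": re.compile(r"\b(?:killed|died)\b", re.IGNORECASE),
    "outbreak": re.compile(r"\boutbreaks?\b", re.IGNORECASE),
}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tag_events(text):
    """Return (sentence, sorted trigger labels) for each sentence with a trigger."""
    tagged = []
    for sentence in split_sentences(text):
        labels = sorted(name for name, rx in TRIGGERS.items() if rx.search(sentence))
        if labels:
            tagged.append((sentence, labels))
    return tagged


for sentence, labels in tag_events(
        "A report has been confirmed that FMD killed 15 hogs. "
        "Officials met on Monday. An outbreak was reported on 9 June."):
    print(labels, sentence)
```

Sentences without a trigger ("Officials met on Monday.") are dropped; the surviving sentences would then be passed to the NER and date/time steps to fill in the event's attributes.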
RESULTS FOR DATE/TIME EXTRACTOR AND EVENT TAGGER MODULE

                 iiac.ksu.edu/Event Extractor
EVENT REPRESENTATION BY DATE/TIME: TIMELINE VIEW
Advanced functionality of the Date/Time Extractor Module includes mapping events onto a timeline. A representative example can be found on EMM News Explorer:
       http://press.jrc.it/NewsExplorer/timelineedition/en/timeline.html
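Once each event carries a canonical date, a timeline view reduces to sorting and bucketing. A small sketch, assuming month-level buckets and events given as `(datetime.date, label)` pairs; `timeline` and the event labels are hypothetical.

```python
from collections import defaultdict
from datetime import date


def timeline(events):
    """Group (date, label) events into month buckets, ordered in time."""
    buckets = defaultdict(list)
    for when, label in events:
        buckets[(when.year, when.month)].append((when, label))
    return [(f"{year:04d}-{month:02d}", [label for _, label in sorted(items)])
            for (year, month), items in sorted(buckets.items())]


print(timeline([
    (date(2009, 6, 9), "FMD, Taiwan"),
    (date(2009, 5, 2), "RVF, Kenya"),
    (date(2009, 6, 1), "FMD, Korea"),
]))
```

Month buckets keep the view readable for large collections; a finer or coarser granularity only changes the bucket key.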
FUTURE IMPROVEMENTS FOR DATE/TIME EXTRACTOR MODULE

Intermediate Functionality
- Implement event extraction as an event tuple <what[Disease], where[Location], when[DateTime]> built from the individual entities obtained by the Disease, Temporal, and Spatial Extraction Modules in the Basic Phase.

Advanced Functionality
- Spatiotemporal clustering, extraction of qualitative and quantitative details about events from documents, and relationship extraction among events.
- Integrate the information extraction and information visualization components.
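A minimal sketch of assembling <what, where, when> tuples from the three extractors' outputs. It assumes each extractor reports `(character offset, canonical value)` pairs and that a disease, location, and date in the same sentence belong to the same event; this same-sentence heuristic and the names `Event` and `build_events` are assumptions. Missing attributes stay `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    what: Optional[str]   # canonical disease name
    where: Optional[str]  # location name
    when: Optional[str]   # canonical date/time


def build_events(sentence_spans, diseases, locations, dates):
    """sentence_spans: (start, end) character offsets of each sentence.
    diseases/locations/dates: (offset, canonical value) pairs from the
    three extractors. Emits one event per sentence that names a disease."""
    def first_in(items, start, end):
        inside = sorted(item for item in items if start <= item[0] < end)
        return inside[0][1] if inside else None

    events = []
    for start, end in sentence_spans:
        what = first_in(diseases, start, end)
        if what is None:
            continue  # no disease, so no event for this sentence
        events.append(Event(what, first_in(locations, start, end), first_in(dates, start, end)))
    return events
```

Taking the first mention of each type per sentence is the simplest rule; sentences that name two diseases or two places would need the relation extraction listed under Advanced Functionality.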
DOCUMENT LEVEL ANALYSIS
      Spatial Tagging
LOCATION EXTRACTOR MODULE
INPUT AND OUTPUT
NGA GEOnet Names Server (GNS): http://earth-info.nga.mil/gns/html/
Input: text from a file → Location Extractor Module
Output:
- Matched text (location)
- Location’s latitude
- Location’s longitude
- Location’s radius
Location Extraction Task
- The goal is to extract and tag mentions of geographical locations in the given text as part of the multimodal event extraction application. Locations extracted from the text are presented to the user with their latitude and longitude coordinates.
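A sketch of the lookup step, assuming a tiny in-memory gazetteer in place of GNS. The coordinates are approximate and the radii are illustrative placeholders, not GNS values; `Place`, `GAZETTEER`, and `extract_locations` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Place:
    lat: float
    lon: float
    radius_km: float


# Hypothetical mini-gazetteer: approximate coordinates, illustrative radii.
GAZETTEER = {
    "kansas": Place(38.5, -98.0, 330.0),
    "topeka": Place(39.05, -95.68, 15.0),
    "wichita": Place(37.69, -97.34, 20.0),
    "leavenworth": Place(39.31, -94.92, 10.0),
}


def extract_locations(text, gazetteer=GAZETTEER):
    """Return (matched text, latitude, longitude, radius_km) for each gazetteer hit."""
    names = sorted(gazetteer, key=len, reverse=True)  # longest names first
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        place = gazetteer[m.group(0).lower()]
        hits.append((m.group(0), place.lat, place.lon, place.radius_km))
    return hits
```

On the Kansas example report, this tags Kansas, Topeka, Wichita, and Leavenworth in order of appearance. A full gazetteer such as GNS lists many places that share a name, so a real extractor must also disambiguate.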
RESULTS FOR LOCATION EXTRACTOR MODULE
Input: A third case of Foot-and-Mouth Disease in Kansas was reported yesterday in a small farm North East of Topeka. Roger Pride, who owns the farm where foot-and-mouth was discovered, said the financial hardship of losing his cattle was not as devastating as the impact on his reputation. It is to be noted that the previous two cases were reported earlier this month in Wichita and Leavenworth.
Output: [screenshot]

                                  iiac.ksu.edu/LocationExtractor
FUTURE IMPROVEMENTS FOR LOCATION EXTRACTOR MODULE

Intermediate Functionality
- Improve on the Basic Phase results by filtering out outliers, deduplicating, and possibly clustering the extracted locations.

Advanced Functionality
- Consider implicit spatial relationships and independent observations, which would add richness to the data presented to the user and help in detecting patterns among them.
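A hypothetical sketch of the deduplication and outlier-filtering step. It assumes each hit is a `(name, latitude, longitude)` triple, measures great-circle (haversine) distance, and treats as an outlier any point more than `max_km` from the median latitude/longitude of the document's hits; the 500 km default is arbitrary.

```python
import math
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def clean_locations(hits, max_km=500.0):
    """Drop repeated names (case-insensitive), then drop points farther than
    max_km from the median latitude/longitude of the remaining hits."""
    seen, unique = set(), []
    for name, lat, lon in hits:
        if name.lower() not in seen:
            seen.add(name.lower())
            unique.append((name, lat, lon))
    if not unique:
        return []
    c_lat = median(lat for _, lat, _ in unique)
    c_lon = median(lon for _, _, lon in unique)
    return [h for h in unique if haversine_km(h[1], h[2], c_lat, c_lon) <= max_km]
```

The median is used instead of the mean so that a single far-away false hit does not drag the reference point toward itself.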
EVENT REPRESENTATION BY LOCATION: MAP VIEW
Advanced functionality of the Location Extractor Module includes solving the geotagging task, i.e. mapping events extracted from different sources. A representative example can be found on
                              http://www.healthmap.org/promed/en
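To draw a map view, extracted events have to be handed to a mapping client. One common interchange format is GeoJSON (RFC 7946); the sketch below assumes a hypothetical event schema with `disease`, `date`, `name`, `lat`, and `lon` keys, and `to_geojson` is an illustrative name. GeoJSON orders point coordinates as longitude first, then latitude.

```python
import json


def to_geojson(events):
    """Serialize events (dicts with disease, date, name, lat, lon) as a
    GeoJSON FeatureCollection of Point features."""
    features = [
        {
            "type": "Feature",
            # RFC 7946: coordinates are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [e["lon"], e["lat"]]},
            "properties": {"disease": e["disease"], "date": e["date"], "location": e["name"]},
        }
        for e in events
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Most web mapping libraries can load such a FeatureCollection directly and use the properties for pop-ups or filtering by disease.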
DOCUMENT LEVEL ANALYSIS
Event Classification/Identification Task
ESSENTIAL TASKS FOR EVENT TRACKING
- Automatic population of large databases with factual information from many text sources
- Rapid semantic processing of large volumes of unstructured text
- Automatic merging of facts and entity relationships across sets of documents
- Innovative techniques for extracting, summarizing, and tracking information about events and their progression over time from unstructured text

- Identification of events and outbreaks includes the constituent tasks of date, time, and quantity extraction and timeline visualization, while geospatial IE includes location disambiguation (in later stages) and map view visualization.
EVENT FORMAL REPRESENTATION
- An event is an occurrence of a disease within a particular time and space range, so the attributes of a single event are a specific disease, a date and time, and a location:

- Event examples with missing values:
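One way to write this definition as a formula, assuming ⊥ denotes an attribute the text does not supply; the sets $\mathcal{D}$ (canonical disease names), $\mathcal{T}$ (canonical date/time values), and $\mathcal{L}$ (locations) are notation introduced here:

```latex
e = \langle d,\ t,\ \ell \rangle, \qquad
d \in \mathcal{D} \cup \{\bot\},\quad
t \in \mathcal{T} \cup \{\bot\},\quad
\ell \in \mathcal{L} \cup \{\bot\}
```

For instance, "Foot-and-mouth disease killed 15 hog on farm in Taiwan" read on its own, before any date is resolved, gives ⟨foot-and-mouth disease, ⊥, Taiwan⟩.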
ADDITIONAL ASPECTS OF EVENT/OUTBREAK
- Outbreak status: confirmed
- Date of event’s report: 12.18.2007
- Reported source: www.dafra.gov/reports
- Affected species: cattle
- Morbidity/mortality: 155 infected/12 died
- Damage measure, $: $155,000

Standard features for event identification:
   <disease, location, date/time, …>
 + <…, person, organization, …>
 + <…, length of sentence, quantities, temporal/spatial term occurrences, …>
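A sketch of turning one sentence into the standard features listed above for a sentence classifier. It assumes the entity labels for the sentence come from the NER step and uses small hypothetical word lists for temporal and spatial terms; the feature names, regular expressions, and `sentence_features` are illustrative.

```python
import re

QUANTITY = re.compile(r"\$?\d[\d,]*(?:\.\d+)?")  # "15", "3,600", "$155,000"
TEMPORAL = re.compile(
    r"\b(?:yesterday|today|week|month|year|monday|tuesday|wednesday|"
    r"thursday|friday|saturday|sunday)\b", re.IGNORECASE)
SPATIAL = re.compile(r"\b(?:north|south|east|west|near|border)\b", re.IGNORECASE)


def sentence_features(sentence, entity_types):
    """entity_types: NER labels found in this sentence, e.g. ["DIS", "LOC"].
    Returns a flat feature dict suitable for a sentence classifier."""
    features = {f"has_{t}": True for t in sorted(set(entity_types))}
    features["n_tokens"] = len(sentence.split())
    features["n_quantities"] = len(QUANTITY.findall(sentence))
    features["n_temporal_terms"] = len(TEMPORAL.findall(sentence))
    features["n_spatial_terms"] = len(SPATIAL.findall(sentence))
    return features


print(sentence_features(
    "Foot-and-mouth disease killed 15 hogs on a farm north of Taipei yesterday, costing $155,000.",
    ["DIS", "LOC"]))
```

A flat dictionary like this can be vectorized for any standard classifier; entity-presence flags carry the <disease, location, date/time> part, and the counts carry the sentence-level part.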
OUTBREAK FORMAL REPRESENTATION
- An outbreak is a collection of events connected by a common disease that happened within a restricted space and time:

- For outbreak identification, events should be similar in their temporal features (time overlap) and in their spatial features (space overlap).
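A minimal sketch of this grouping rule. It assumes each event carries a date interval and a point location, that "time overlap" means intersecting intervals, and that "space overlap" can be approximated by a distance threshold (`max_km`, arbitrary here). Events linked by a chain of related pairs are merged with union-find (single-link grouping); this is one possible choice, not the authors' method, and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    disease: str
    start: date
    end: date
    lat: float
    lon: float


def distance_km(a, b):
    # Equirectangular approximation; adequate for the short distances compared here.
    x = math.radians(b.lon - a.lon) * math.cos(math.radians((a.lat + b.lat) / 2))
    y = math.radians(b.lat - a.lat)
    return 6371.0 * math.hypot(x, y)


def related(a, b, max_km):
    same_disease = a.disease == b.disease
    time_overlap = a.start <= b.end and b.start <= a.end
    return same_disease and time_overlap and distance_km(a, b) <= max_km


def group_outbreaks(events, max_km=100.0):
    """Single-link grouping: events connected by a chain of related pairs
    end up in the same outbreak."""
    parent = list(range(len(events)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            if related(events[i], events[j], max_km):
                parent[find(i)] = find(j)
    groups = {}
    for i, e in enumerate(events):
        groups.setdefault(find(i), []).append(e)
    return list(groups.values())
```

The pairwise loop is quadratic in the number of events, which is fine for one collection of reports; larger collections would first bucket events by disease and time window.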
DATA FLOW FOR EVENT IDENTIFICATION BASED ON SENTENCE CLASSIFICATION
OUTBREAK
Disease: foot-and-mouth disease
Species: hog
Location: Taiwan
DateTime: 06/09/2009
Status: N/A
NLP TASKS
- Named Entity Recognition:
    Foot-and-mouth disease[DIS] killed 15 hog on farm in Taiwan[LOC]
- Syntactic Analysis:
    Foot-and-mouth disease [SUBJ] killed[VP] 15 hog on farm in Taiwan [PP]
- Extraction:
    Fact:     killed
    Disease:  foot-and-mouth disease
    Location: Taiwan
    Species:  hog
    Quantity: 15
- Co-reference Resolution:
    Foot-and-mouth disease killed 15 hog on farm in Taiwan. Outbreak was reported on 9 June.
- Template Generation:
    Event:    outbreak
    Species:  15 hog
    Disease:  foot-and-mouth disease
    Location: Taiwan
    DateTime: 9 June
Demo: http://L2R.cs.uiuc.edu/~cogcomp




SEMANTIC ROLE LABELING TASK: EXAMPLE 1
Outbreak identification, as an event identification task, can be treated as a Semantic Role Labeling (SRL) task: who did what to whom, when, where, why, …
http://l2r.cs.uiuc.edu/~cogcomp/srl-demo.php




SEMANTIC ROLE LABELING TASK: EXAMPLE 2
Ecuador[LOC] - The Ecuadorian government[ORG] on Tuesday[DT] confirmed 48[QT] cases of foot-and-mouth disease[DIS] in domestic animals, which prompted neighboring Colombia[LOC] and Peru[LOC] to take preventive measures on their meat imports
Thank you for your attention!

A survey of_eigenvector_methods_for_web_information_retrieval
 
Open Information Extraction 2nd
Open Information Extraction 2ndOpen Information Extraction 2nd
Open Information Extraction 2nd
 
Information Retrieval and Extraction
Information Retrieval and ExtractionInformation Retrieval and Extraction
Information Retrieval and Extraction
 
Algorithm Name Detection & Extraction
Algorithm Name Detection & ExtractionAlgorithm Name Detection & Extraction
Algorithm Name Detection & Extraction
 
ATI Courses Professional Development Short Course Remote Sensing Information ...
ATI Courses Professional Development Short Course Remote Sensing Information ...ATI Courses Professional Development Short Course Remote Sensing Information ...
ATI Courses Professional Development Short Course Remote Sensing Information ...
 
2 13
2 132 13
2 13
 
Information Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesInformation Extraction with UIMA - Usecases
Information Extraction with UIMA - Usecases
 
Information Extraction from the Web - Algorithms and Tools
Information Extraction from the Web - Algorithms and ToolsInformation Extraction from the Web - Algorithms and Tools
Information Extraction from the Web - Algorithms and Tools
 
Enterprise information extraction: recent developments and open challenges
Enterprise information extraction: recent developments and open challengesEnterprise information extraction: recent developments and open challenges
Enterprise information extraction: recent developments and open challenges
 
Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment Analysis
 
Information Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram DataInformation Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram Data
 
Information Extraction with Linked Data
Information Extraction with Linked DataInformation Extraction with Linked Data
Information Extraction with Linked Data
 
Crowdsourcing for Information Retrieval: Principles, Methods, and Applications
Crowdsourcing for Information Retrieval: Principles, Methods, and ApplicationsCrowdsourcing for Information Retrieval: Principles, Methods, and Applications
Crowdsourcing for Information Retrieval: Principles, Methods, and Applications
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the Web
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Using the Web of Data for Information Extraction
Using the Web of Data for Information ExtractionUsing the Web of Data for Information Extraction
Using the Web of Data for Information Extraction
 
INTRODUCTION INFORMATION RETRIEVAL EVALUVATION
 INTRODUCTION INFORMATION RETRIEVAL EVALUVATION INTRODUCTION INFORMATION RETRIEVAL EVALUVATION
INTRODUCTION INFORMATION RETRIEVAL EVALUVATION
 
seminar topic
seminar topicseminar topic
seminar topic
 

Ähnlich wie Multimodal Information Extraction: Disease, Date and Location Retrieval

Presentation from Code Camp 2017
Presentation from Code Camp 2017Presentation from Code Camp 2017
Presentation from Code Camp 2017Mitch Miller
 
InSTEDD: Collaboration in Disease Surveillance & Response
InSTEDD: Collaboration in Disease Surveillance & ResponseInSTEDD: Collaboration in Disease Surveillance & Response
InSTEDD: Collaboration in Disease Surveillance & ResponseInSTEDD
 
CONFidence 2014: Davi Ottenheimer Protecting big data at scale
CONFidence 2014: Davi Ottenheimer Protecting big data at scaleCONFidence 2014: Davi Ottenheimer Protecting big data at scale
CONFidence 2014: Davi Ottenheimer Protecting big data at scalePROIDEA
 
From billing codes to expertise: mining, representing and sharing clinical re...
From billing codes to expertise: mining, representing and sharing clinical re...From billing codes to expertise: mining, representing and sharing clinical re...
From billing codes to expertise: mining, representing and sharing clinical re...Carlo Torniai
 
Biosurveillance: Machine Learning And Disease Surveillance by Kass-Hout Di Tada
Biosurveillance: Machine Learning And Disease Surveillance by Kass-Hout Di TadaBiosurveillance: Machine Learning And Disease Surveillance by Kass-Hout Di Tada
Biosurveillance: Machine Learning And Disease Surveillance by Kass-Hout Di TadaTaha Kass-Hout, MD, MS
 
Using the Semantic Web to Support Ecoinformatics
Using the Semantic Web to Support EcoinformaticsUsing the Semantic Web to Support Ecoinformatics
Using the Semantic Web to Support Ecoinformaticsebiquity
 
Exploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease InformaticsExploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease InformaticsNigel Collier
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"GigaScience, BGI Hong Kong
 
Big data in the research life cycle: technologies, infrastructures, policies
Big data in the research life cycle: technologies, infrastructures, policiesBig data in the research life cycle: technologies, infrastructures, policies
Big data in the research life cycle: technologies, infrastructures, policiesBigData_Europe
 
Data mining and data linking
Data mining and data linkingData mining and data linking
Data mining and data linkingRoderic Page
 
Scott Edmunds at DataCite 2012: Adventures in Data Citation
Scott Edmunds at DataCite 2012: Adventures in Data CitationScott Edmunds at DataCite 2012: Adventures in Data Citation
Scott Edmunds at DataCite 2012: Adventures in Data CitationGigaScience, BGI Hong Kong
 
Science Seminar Series 4 Norman Johnson
Science Seminar Series 4 Norman JohnsonScience Seminar Series 4 Norman Johnson
Science Seminar Series 4 Norman JohnsonUniversity of Adelaide
 
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impedimentDonat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impedimentICZN
 

Ähnlich wie Multimodal Information Extraction: Disease, Date and Location Retrieval (20)

MS Thesis Short
MS Thesis ShortMS Thesis Short
MS Thesis Short
 
Master Thesis
Master ThesisMaster Thesis
Master Thesis
 
Presentation from Code Camp 2017
Presentation from Code Camp 2017Presentation from Code Camp 2017
Presentation from Code Camp 2017
 
InSTEDD: Collaboration in Disease Surveillance & Response
InSTEDD: Collaboration in Disease Surveillance & ResponseInSTEDD: Collaboration in Disease Surveillance & Response
InSTEDD: Collaboration in Disease Surveillance & Response
 
InSTEDD HISA Conference
InSTEDD HISA ConferenceInSTEDD HISA Conference
InSTEDD HISA Conference
 
CONFidence 2014: Davi Ottenheimer Protecting big data at scale
CONFidence 2014: Davi Ottenheimer Protecting big data at scaleCONFidence 2014: Davi Ottenheimer Protecting big data at scale
CONFidence 2014: Davi Ottenheimer Protecting big data at scale
 
From billing codes to expertise: mining, representing and sharing clinical re...
From billing codes to expertise: mining, representing and sharing clinical re...From billing codes to expertise: mining, representing and sharing clinical re...
From billing codes to expertise: mining, representing and sharing clinical re...
 
WiML Poster
WiML PosterWiML Poster
WiML Poster
 
MedEx'10
MedEx'10MedEx'10
MedEx'10
 
Dr David Schindel and Mike Trizna - BOL Data Portal
Dr David Schindel and Mike Trizna - BOL Data PortalDr David Schindel and Mike Trizna - BOL Data Portal
Dr David Schindel and Mike Trizna - BOL Data Portal
 
Biosurveillance: Machine Learning And Disease Surveillance by Kass-Hout Di Tada
Biosurveillance: Machine Learning And Disease Surveillance by Kass-Hout Di TadaBiosurveillance: Machine Learning And Disease Surveillance by Kass-Hout Di Tada
Biosurveillance: Machine Learning And Disease Surveillance by Kass-Hout Di Tada
 
Using the Semantic Web to Support Ecoinformatics
Using the Semantic Web to Support EcoinformaticsUsing the Semantic Web to Support Ecoinformatics
Using the Semantic Web to Support Ecoinformatics
 
Exploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease InformaticsExploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease Informatics
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"
 
Big data in the research life cycle: technologies, infrastructures, policies
Big data in the research life cycle: technologies, infrastructures, policiesBig data in the research life cycle: technologies, infrastructures, policies
Big data in the research life cycle: technologies, infrastructures, policies
 
Data mining and data linking
Data mining and data linkingData mining and data linking
Data mining and data linking
 
Scott Edmunds at DataCite 2012: Adventures in Data Citation
Scott Edmunds at DataCite 2012: Adventures in Data CitationScott Edmunds at DataCite 2012: Adventures in Data Citation
Scott Edmunds at DataCite 2012: Adventures in Data Citation
 
Science Seminar Series 4 Norman Johnson
Science Seminar Series 4 Norman JohnsonScience Seminar Series 4 Norman Johnson
Science Seminar Series 4 Norman Johnson
 
Biosurveillance 2.0
Biosurveillance 2.0Biosurveillance 2.0
Biosurveillance 2.0
 
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impedimentDonat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment
 

Mehr von Svitlana volkova

Mehr von Svitlana volkova (14)

EACL'12 Poster
EACL'12 PosterEACL'12 Poster
EACL'12 Poster
 
Grace Hopper Celebration 2010
Grace Hopper Celebration 2010Grace Hopper Celebration 2010
Grace Hopper Celebration 2010
 
Web Intelligence 2010
Web Intelligence 2010Web Intelligence 2010
Web Intelligence 2010
 
IEEE ISI'10
IEEE ISI'10IEEE ISI'10
IEEE ISI'10
 
Multilingual Ner Using Wiki
Multilingual Ner Using WikiMultilingual Ner Using Wiki
Multilingual Ner Using Wiki
 
Topics Modeling
Topics ModelingTopics Modeling
Topics Modeling
 
Project Proposal Topics Modeling (Ir)
Project Proposal    Topics Modeling (Ir)Project Proposal    Topics Modeling (Ir)
Project Proposal Topics Modeling (Ir)
 
Social Networks
Social NetworksSocial Networks
Social Networks
 
Methods Of Reliability Analysis
Methods Of Reliability AnalysisMethods Of Reliability Analysis
Methods Of Reliability Analysis
 
Ohio Project
Ohio ProjectOhio Project
Ohio Project
 
Ukraine Presentation
Ukraine PresentationUkraine Presentation
Ukraine Presentation
 
Ukraine Presentation at Kansas State University
Ukraine Presentation at Kansas State UniversityUkraine Presentation at Kansas State University
Ukraine Presentation at Kansas State University
 
Communicatons Fulbright
Communicatons FulbrightCommunicatons Fulbright
Communicatons Fulbright
 
Communications Ternopil
Communications TernopilCommunications Ternopil
Communications Ternopil
 

Kürzlich hochgeladen

mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 

Kürzlich hochgeladen (20)

mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 

Multimodal Information Extraction: Disease, Date and Location Retrieval

  • 1. Multimodal Information Extraction: Disease, DateTime, and Location Retrieval Laboratory for Knowledge Discovery in Databases Department of Computing and Information Sciences Kansas State University Dr. William H. Hsu, Associate Professor of Computing and Information Sciences Svitlana O. Volkova, Graduate Research Assistant Timothy E. Weninger, Research Associate Jing Xia, Graduate Research Assistant Surya Teja Kallumadi, Graduate Research Assistant Wesam S. Elshamy, Graduate Research Assistant
  • 2.
  • 3. AGENDA
    - Overview
    - Document Extraction
        - Document Level Analysis: Entity Recognition Task
        - Disease Extractor Module: Disease Recognition Task & Future Improvements
    - Temporal Tagging
        - Date/Time Extractor Module: Date Recognition Task
        - Future Improvements for Date/Time Extractor Module
    - Spatial Tagging
        - Location Extractor Module: Location Recognition Task
        - Future Improvements for Location Extractor Module
    - Event Classification Task
        - Events Representation by Date/Time: Timeline View
        - Events Representation by Location: Map View
  • 4. MAIN STEPS
    - Assist the integrator (Elder Research, Inc.) in incorporating these into a single system
    - Perform collection-level analysis and interactive visualization of timelines and maps
    - Extend the basic document-level IE, temporal annotation, and spatial annotation components with more state-of-the-field analytical functions
  • 5. HOW CAN WE GET DATA? Information Retrieval (IR) from the Web by crawling news, blogs, reports, etc. Diagram components: WWW, email and literature sources; crawler; database (DB); query; documents collection.
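The crawling step on slide 5 can be illustrated with a minimal breadth-first crawler. This is a sketch, not the group's crawler. The `fetch` callable is a hypothetical stand-in for an HTTP client, so the example runs offline, and the returned dictionary stands in for the document DB. Link extraction uses the standard-library `html.parser`.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds: List[str], fetch: Callable[[str], str], max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl from seed URLs; returns {url: html} for every fetched page.

    `fetch` (url -> html) is a hypothetical callable. A real crawler would also
    honour robots.txt, rate limits and content types.
    """
    store: Dict[str, str] = {}      # stands in for the document database
    frontier = deque(seeds)
    seen = set(seeds)
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue                # skip pages that cannot be fetched
        store[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return store
```

For a quick offline check, pass a dictionary's `__getitem__` as `fetch`. Missing pages raise `KeyError` and are skipped.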
  • 6. DOCUMENTS COLLECTION
    - Domain-specific knowledge: a medical ontology containing names of diseases, viruses, animal species, etc., organized in a conceptual hierarchy.
    - Domain-independent knowledge: a location hierarchy containing names of countries, states or provinces, cities, etc.; a canonical date and time representation.
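Slide 6 describes knowledge that is organized as hierarchies: a disease ontology, and a location hierarchy running from country to state to city. A child-to-parent map is one simple way to hold such a hierarchy. The sketch below assumes that representation and uses a toy, hypothetical set of places; the same structure would work for disease concepts.

```python
from typing import Dict, List, Optional

# Hypothetical toy location hierarchy: each place maps to its parent (None at the top level).
PARENT: Dict[str, Optional[str]] = {
    "USA": None,
    "Kansas": "USA",
    "Topeka": "Kansas",
    "Wichita": "Kansas",
    "California": "USA",
    "Montebello": "California",
}


def ancestors(place: str, parent: Dict[str, Optional[str]] = PARENT) -> List[str]:
    """Path from `place` up to the top of the hierarchy, e.g. city -> state -> country."""
    path = [place]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def within(place: str, region: str, parent: Dict[str, Optional[str]] = PARENT) -> bool:
    """True if `place` is `region` itself or lies inside it in the hierarchy."""
    return region in ancestors(place, parent)
```

For example, `ancestors("Topeka")` walks up to `["Topeka", "Kansas", "USA"]`. This lets a mention of a city be rolled up to its state or country for map aggregation.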
  • 7. A TWO-LEVEL ANALYTICAL FRAMEWORK IN THE DOMAIN OF EPIZOOTICS
    - Document Level Analysis:
        - Web document content extraction
        - Named entity recognition (NER)
        - Co-reference & association resolution, relation extraction
        - Geotagging: location extraction, map view
        - Temporal tagging: date/time extraction, timeline view
        - Event identification <…what, where, when, …>
    - Collection Level Analysis:
        - Semi-supervised document clustering & linking by finding similarities by keywords
        - Document categorization as a topic summarization task (pLSA, LDA)
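The collection-level step "clustering & linking by finding similarities by keywords" can be sketched as follows. Documents are linked when the cosine similarity of their keyword counts passes a threshold, and the connected components of the resulting link graph are treated as clusters. This is an unsupervised simplification. The stop-word list and the 0.3 threshold are arbitrary assumptions, and the semi-supervised part of the slide is not shown.

```python
import math
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set

STOPWORDS = {"the", "a", "an", "of", "in", "and", "was", "were", "is", "on", "to", "for"}


def keywords(text: str) -> Counter:
    """Lower-cased word counts with a tiny, illustrative stop-word list removed."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link_documents(docs: Dict[str, str], threshold: float = 0.3) -> List[Set[str]]:
    """Link document pairs whose keyword cosine similarity >= threshold and
    return the connected components of the link graph as clusters."""
    vecs = {d: keywords(t) for d, t in docs.items()}
    parent = {d: d for d in docs}            # union-find over document ids

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if cosine(vecs[a], vecs[b]) >= threshold:
            parent[find(a)] = find(b)
    clusters: Dict[str, Set[str]] = {}
    for d in docs:
        clusters.setdefault(find(d), set()).add(d)
    return list(clusters.values())
```

Union-find keeps the linking step simple. A pair only needs to pass the threshold once for both documents to end up in the same cluster, which is single-linkage behaviour.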
  • 8. HIGH-LEVEL SYSTEM ARCHITECTURE
    - Diagram components: Internet Browser (IE/Mozilla/…); Web Server; User Access and Query Control; Access Privilege; Data Search; API (Java); Temporal Tagging: TimeLine View; Spatial Tagging: Map View; Event Detection; Deduplication; IAAC Server; Data Storage: Data Store (MSSQL)
    - Users: researchers, public health professionals, governmental health agencies, and other users
  • 9. DOCUMENT LEVEL ANALYSIS Entity Recognition Task
  • 10. EXTENSION OF ENTITIES FOR MULTIMODAL INFORMATION EXTRACTION SYSTEM
    - Stanford NER entities:
        - Person (e.g. “John Lenin”, “William K. Smith”)
        - Organization (e.g. “U.K. Department for Environment, Food and Rural Affairs”)
        - Location (e.g. “Europe”, “Canada”)
        - Miscellaneous (e.g. “African”, researcher, etc.)
    - KDD Group’s NER entities:
        - Animal diseases (e.g. “rift valley fever”, “fmd”)
        - Date and time (e.g. “May 24 2001”, “last year”)
        - Location (e.g. “London, Great Britain”, “Manhattan, KS, USA”)
        - Animal species (e.g. “cow”, “horse”, “mammals”)
        - Quantities (e.g. number of animals that died, amount of money spent, $)
  • 11. INFORMATION EXTRACTION TASK
    - Goal: extract structured information (facts and entities related to events) from unstructured or semi-structured sources.
    - Result: The US saw its latest FMD outbreak in Montebello, California in 1929, where 3,600 animals were slaughtered.
    - Entity types extracted from the documents collection: animal disease names, locations, dates/times, quantities.
  • 12. NAMED ENTITY REPRESENTATION FOR THE NER TASK
    - Disease: multi-faceted quantitative summary
    - Location: map view
    - Date and time: timeline view
    - Timeline view example: http://press.jrc.it/NewsExplorer/timelineedition/en/timeline.html
    - Map view example: http://www.healthmap.org/promed/en
  • 13. DISEASE EXTRACTOR MODULE INPUT AND OUTPUT
    - Input: text from file
    - Output: index of the first character, index of the last character, length of the matched text, matched text, canonical disease name
    - Disease Extraction Task: disease recognition can be treated as a named entity recognition (NER) / information extraction (IE) task. The main purpose is to retrieve tokens that match at least one term from the list of disease names.
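The input/output contract on slide 13 can be made concrete with a small dictionary matcher. The sketch assumes a surface-form-to-canonical-name vocabulary (the entries shown are hypothetical). It matches case-insensitively on word boundaries, tries longer terms first, and reports the five output fields listed on the slide. It illustrates the interface only and is not the module's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical mini-vocabulary: lower-cased surface form -> canonical disease name.
VOCAB: Dict[str, str] = {
    "foot and mouth disease": "Foot-and-mouth disease",
    "foot-and-mouth disease": "Foot-and-mouth disease",
    "fmd": "Foot-and-mouth disease",
    "rift valley fever": "Rift Valley fever",
    "rvf": "Rift Valley fever",
}


@dataclass
class DiseaseMatch:
    first: int       # index of the first character of the match
    last: int        # index of the last character (inclusive)
    length: int      # length of the matched text
    text: str        # matched text as it appears in the input
    canonical: str   # canonical disease name from the vocabulary


def build_pattern(vocab: Dict[str, str]) -> re.Pattern:
    # Longest terms first, so a full name wins over a shorter term that overlaps it.
    terms = sorted(vocab, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)


def extract_diseases(text: str, vocab: Dict[str, str] = VOCAB) -> List[DiseaseMatch]:
    pattern = build_pattern(vocab)   # rebuilt per call for simplicity
    return [
        DiseaseMatch(m.start(), m.end() - 1, m.end() - m.start(),
                     m.group(0), vocab[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]
```

Applied to "Foot and mouth disease (FMD) was confirmed.", this yields two matches. Both map to the same canonical name, at offsets 0 to 21 and 24 to 26.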
  • 14. DISEASE EXTRACTOR MODULE DEMO iiac.ksu.edu/DiseaseExtractor
  • 15. RESULTS FOR DISEASE EXTRACTOR MODULE
    - Input A / Output A: “Foot and mouth disease is one of the most contagious diseases of cloven-hooved mammals…”
    - Input B / Output B: “Rift Valley Fever | CDC Special Pathogens Branch Mission Statement Disease …”
  • 16. VOCABULARY CONSTRUCTION FOR DISEASE EXTRACTOR
    1. Disease names and fact sheets from the Iowa State University Center for Food Security and Public Health (CFSPH): http://www.cfsph.iastate.edu/diseaseinfo/animaldiseaseindex.htm
    2. World Organisation for Animal Health (OIE) animal disease data: http://www.oie.int/eng/maladies/en_alpha.htm
    3. Department for Environment, Food and Rural Affairs, UK (DEFRA): http://www.defra.gov.uk/animalh/diseases/vetsurveillance/az_index.htm
    4. United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service: http://www.aphis.usda.gov/animal_health/animal_diseases/
    5. MedlinePlus, a service of the National Library of Medicine and the National Institutes of Health: http://www.nlm.nih.gov/medlineplus/animaldiseasesandyourhealth.html
    6. Wikipedia: http://en.wikipedia.org/wiki/Animal_diseases
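Slide 16 lists several sources for the disease vocabulary. Merging them raises the question of which spelling variants refer to the same disease. The sketch below assumes each source has already been parsed into hypothetical `(canonical name, aliases)` pairs; scraping the sites is not shown. Surface forms are normalized (lower case, hyphens treated as spaces, whitespace collapsed), and the first source to define a form keeps its canonical name.

```python
import re
from typing import Dict, Iterable, List, Tuple


def normalize(name: str) -> str:
    """Lower-case, treat hyphens as spaces, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", name.lower().replace("-", " ")).strip()


def merge_vocabularies(sources: Iterable[List[Tuple[str, List[str]]]]) -> Dict[str, str]:
    """Merge (canonical name, aliases) lists from several sources into a single
    normalized-surface-form -> canonical-name map. Earlier sources take precedence,
    so pass them in order of trust."""
    vocab: Dict[str, str] = {}
    for source in sources:
        for canonical, aliases in source:
            for surface in [canonical, *aliases]:
                vocab.setdefault(normalize(surface), canonical)
    return vocab
```

With this precedence rule, "Foot-and-mouth disease" from one source and "Foot and mouth disease" from another collapse to a single entry. The matcher therefore sees one canonical name.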
  • 17. RESULTS FOR DISEASE EXTRACTOR MODULE ClearForest Gnosis Software: http://www.clearforest.com/
  • 18. COMPARATIVE RESULTS FOR DISEASE EXTRACTORS: KDD GROUP’S VS. GNOSIS
    [Charts comparing Gnosis Soft. and KDD Group's Disease Extractor: Disease Extraction “FMD” and Disease Extraction “RVF” (quantities of extracted diseases vs. number of seeds), and Non-unique Animal Disease Extraction (non-unique extracted diseases vs. number of seeds).]
  • 19. COMPARATIVE RESULTS FOR UNIQUE DISEASE EXTRACTORS: KDD GROUP’S VS. GNOSIS
    [Charts comparing Gnosis Soft. and KDD Group's Disease Extractor: Unique Disease Extraction (extracted unique diseases vs. number of seeds, 1 to 13) and Random Permutation of Extracted Diseases (# of extracted animal diseases vs. run number, 1 to 7).]
  • 20. CUMULATIVE COMPARATIVE RESULTS FOR DISEASE EXTRACTORS: KDD GROUP’S VS. GNOSIS
    [Chart: Cumulative Disease Extraction (# of extracted animal diseases vs. number of seeds) for Gnosis Soft. and KDD Group's Disease Extractor, each with a quadratic trend line (“Poly.”). Fitted equations shown: y = 2.7283x^2 + 14.914x - 4.4336 (R² = 0.9762) and y = 4.1708x^2 - 29.864x + 48.364 (R² = 0.9831).]
    [Charts: KDD Group's Extractor: Results and Gnosis Software: Extraction Results (# of unique extracted diseases vs. # of seeds' permutations, 1 to 7).]
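Slide 20 summarizes cumulative extraction counts with quadratic trend lines and R² values. The sketch below shows how such a least-squares quadratic fit and its coefficient of determination can be computed from (x, y) points, by solving the 3x3 normal equations. It does not reproduce the slide's numbers, because the underlying data points are not available in the text.

```python
from typing import List, Sequence, Tuple


def solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]   # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def quadratic_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float, float]:
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c, r_squared)."""
    s = [sum(x ** k for x in xs) for k in range(5)]                 # s[k] = sum of x^k
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of x^k * y
    a, b, c = solve3([[s[4], s[3], s[2]],
                      [s[3], s[2], s[1]],
                      [s[2], s[1], s[0]]], [t[2], t[1], t[0]])
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, c, r2
```

R² here is 1 - SS_res / SS_tot, the usual definition reported by spreadsheet trend lines.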
  • 21. FUTURE IMPROVEMENTS FOR DISEASE EXTRACTOR MODULE
    - Intermediate functionality:
        - add functionality for species extraction and construct its vocabulary;
        - enrich the dictionary with animal diseases by species, using:
            - National Center for Infectious Diseases: http://www.cdc.gov/healthypets/browse_by_animal.htm
            - United States Department of Agriculture (APHIS), Animal Health: http://www.aphis.usda.gov/animal_health/animal_dis_spec/
        - construct a disease ontology with the Protégé software.
    - Advanced functionality:
        - apply a “seed set expansion” approach to improve disease extraction.
  • 22. DOCUMENT LEVEL ANALYSIS Temporal Tagging
  • 23. DATE/TIME EXTRACTOR AND EVENT TAGGER MODULE INPUT AND OUTPUT
    - Input: text from file
    - Output: disease name, date, time, event trigger, location, canonical date/time
    - Temporal Extraction and Events Tagging Task: the main purpose is to extract temporal quantities associated with events from text, identify events and the semantic relatedness among them, and summarize them. Extracting temporal events involves identifying dates and times and the entities associated with these events.
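Producing a "canonical date/time" means mapping both absolute expressions ("May 24 2001") and relative ones ("last year") to one representation. The sketch below is a deliberately small, hypothetical rule set. Absolute dates become ISO `YYYY-MM-DD`. A few relative expressions are resolved against a reference date (assumed to be the document's publication date) at their own granularity. Production temporal taggers cover far more patterns.

```python
import re
from datetime import date, timedelta
from typing import List, Tuple

MONTHS = {m: i for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
MONTH_RE = "|".join(MONTHS)

# "May 24 2001" or "May 24, 2001"
ABSOLUTE = re.compile(
    rf"\b(?P<month>{MONTH_RE})\s+(?P<day>\d{{1,2}}),?\s+(?P<year>\d{{4}})\b", re.IGNORECASE)
RELATIVE = re.compile(r"\b(yesterday|today|last year|this year|last month)\b", re.IGNORECASE)


def extract_dates(text: str, reference: date) -> List[Tuple[str, str]]:
    """Return (matched text, canonical value) pairs in text order."""
    found = []
    for m in ABSOLUTE.finditer(text):
        try:
            d = date(int(m["year"]), MONTHS[m["month"].lower()], int(m["day"]))
        except ValueError:           # e.g. "February 30, 2001"
            continue
        found.append((m.start(), m.group(0), d.isoformat()))
    for m in RELATIVE.finditer(text):
        expr = m.group(0).lower()
        if expr == "yesterday":
            value = (reference - timedelta(days=1)).isoformat()
        elif expr == "today":
            value = reference.isoformat()
        elif expr == "last year":
            value = str(reference.year - 1)
        elif expr == "this year":
            value = str(reference.year)
        else:                        # "last month": step back from the 1st of this month
            prev = reference.replace(day=1) - timedelta(days=1)
            value = f"{prev.year:04d}-{prev.month:02d}"
        found.append((m.start(), m.group(0), value))
    return [(t, v) for _, t, v in sorted(found)]
```

Keeping relative expressions at their natural granularity (a year, a month, a day) avoids inventing precision the text does not contain.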
  • 24. COMPONENTS OF DATE/TIME EXTRACTOR AND EVENT TAGGER MODULE
    - Date/Time Extractor: based on a quantities-and-units chunker and the Standard Time data structure.
    - Pattern-Based Event Extractor: built through analysis of disease outbreak reports, e.g. “a report has been confirmed that …”.
    - Named Entity Recognition Tool: extracts the named entities Location, Person, Organization and Disease.
    - Goal: extract facts and entity relations associated with events. For disease outbreaks: disease, organisms, victim, symptoms, location, country, date, containment measures …
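A pattern-based event extractor of the kind described on slide 24 can be approximated by trigger phrases matched per sentence. The first pattern below mirrors the slide's example phrase. The other two patterns, and the naive sentence splitter, are hypothetical additions for illustration.

```python
import re
from typing import List, Tuple

# Trigger patterns: the first mirrors the slide's example; the others are hypothetical.
TRIGGERS = [
    r"a report has been confirmed that",
    r"(?:was|were|has been|have been) (?:reported|confirmed)",
    r"outbreaks? of",
]
TRIGGER_RE = re.compile("|".join(f"(?:{p})" for p in TRIGGERS), re.IGNORECASE)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_events(text: str) -> List[Tuple[str, str]]:
    """Return (trigger phrase, sentence) for each sentence containing an event trigger."""
    events = []
    for sentence in split_sentences(text):
        m = TRIGGER_RE.search(sentence)   # leftmost trigger in the sentence
        if m:
            events.append((m.group(0), sentence))
    return events
```

Sentences tagged this way are natural places to look for the disease, location and date entities that make up an event.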
  • 25. RESULTS FOR DATE/TIME EXTRACTOR AND EVENT TAGGER MODULE iiac.ksu.edu/Event Extractor
  • 26. EVENT REPRESENTATION BY DATE/TIME: TIMELINE VIEW
    The advanced functionality of the Date/Time Extractor Module includes mapping events onto a timeline. A representative example can be found in the EMM News Explorer: http://press.jrc.it/NewsExplorer/timelineedition/en/timeline.html
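A timeline view needs events grouped and ordered along the time axis. The minimal Python sketch below buckets (date, label) events by week, keyed by the Monday that starts each week. Weekly granularity and the `build_timeline` name are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_timeline(events):
    """Group (date, label) events into weekly buckets ordered by week start (Monday)."""
    weeks = defaultdict(list)
    for day, label in events:
        week_start = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
        weeks[week_start].append(label)
    return [(start, weeks[start]) for start in sorted(weeks)]
```

For instance, events dated 9 and 10 June 2009 fall into the bucket starting Monday, 8 June 2009, and an event on 16 June 2009 falls into the bucket starting 15 June 2009. Each bucket can then be drawn as one tick on the timeline.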
  • 27. FUTURE IMPROVEMENTS FOR DATE/TIME EXTRACTOR MODULE
    Intermediate Functionality
    - implement event extraction as an event tuple <what[Disease], where[Location], when[DateTime]> built from the individual entities obtained by the Disease, Temporal, and Spatial Extraction Modules in the Basic Phase.
    Advanced Functionality
    - spatiotemporal clustering, extraction of qualitative and quantitative details about events from documents, and extraction of relationships among events;
    - integrate the information extraction and information visualization components.
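One simple way to build the <what, where, when> tuple from the separate modules' outputs is to pair each disease with the first location and the first date/time found in the same sentence, leaving a slot empty (None) when no entity is found. The Python sketch below shows this under that same-sentence assumption. The `Entity` and `EventTuple` types are hypothetical; the tag names DIS, LOC, and DT follow the tagging used on the NLP slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    kind: str      # "DIS", "LOC" or "DT"
    text: str
    sentence: int  # index of the sentence the entity was found in


@dataclass(frozen=True)
class EventTuple:
    what: str              # disease
    where: Optional[str]   # location, None if missing
    when: Optional[str]    # date/time, None if missing


def assemble_events(entities):
    """Pair each disease with the first location and date/time in its sentence."""
    by_sentence = {}
    for e in entities:
        by_sentence.setdefault(e.sentence, []).append(e)
    events = []
    for idx in sorted(by_sentence):
        group = by_sentence[idx]
        where = next((e.text for e in group if e.kind == "LOC"), None)
        when = next((e.text for e in group if e.kind == "DT"), None)
        for e in group:
            if e.kind == "DIS":
                events.append(EventTuple(e.text, where, when))
    return events
```

Tuples with empty slots correspond to the "event examples with missing values" discussed later; co-reference resolution across sentences could fill some of them.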
  • 28. DOCUMENT LEVEL ANALYSIS Spatial Tagging
  • 29. LOCATION EXTRACTOR MODULE INPUT AND OUTPUT
    [Diagram: Input: text from file -> Location Extractor Module (gazetteer: NGA GEOnet Names Server (GNS), http://earth-info.nga.mil/gns/html/) -> Output: matched text (location), location's latitude, location's longitude, location's radius]
    Location Extraction Task
    - The goal is to extract and tag mentions of geographical locations in the given text as part of the multimodal event extraction application. The locations extracted from the text are presented to the user with their latitude and longitude coordinates.
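The gazetteer lookup behind this module can be sketched in a few lines of Python. The tiny in-memory `GAZETTEER` is a hypothetical stand-in for data derived from GNS. Its coordinates are approximate and its radii are invented for illustration. Names are matched on word boundaries, with longer names tried first.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    lat: float
    lon: float
    radius_km: float


# Hypothetical stand-in for a GNS-derived gazetteer (approximate coordinates,
# invented radii).
GAZETTEER = {
    "Kansas": Place(38.5, -98.0, 330.0),
    "Topeka": Place(39.05, -95.68, 15.0),
    "Wichita": Place(37.69, -97.34, 20.0),
    "Leavenworth": Place(39.31, -94.92, 8.0),
}


def extract_locations(text, gazetteer=GAZETTEER):
    """Return (matched text, latitude, longitude, radius) for each gazetteer hit."""
    names = sorted(gazetteer, key=len, reverse=True)  # try longer names first
    rx = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    results = []
    for m in rx.finditer(text):
        place = gazetteer[m.group(0)]
        results.append((m.group(0), place.lat, place.lon, place.radius_km))
    return results
```

On the sample input of slide 30 this finds Kansas, Topeka, Wichita, and Leavenworth in order of appearance. Exact string matching cannot resolve ambiguous names (many places share a name), which is why location disambiguation appears among the later-stage tasks.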
  • 30. RESULTS FOR LOCATION EXTRACTOR MODULE
    INPUT: A third case of Foot-and-Mouth Disease in Kansas was reported yesterday in a small farm North East of Topeka. Roger Pride, who owns the farm where foot-and-mouth was discovered, said the financial hardship of losing his cattle was not as devastating as the impact on his reputation. It is to be noted that the previous two cases were reported earlier this month in Wichita and Leavenworth.
    OUTPUT: iiac.ksu.edu/LocationExtractor
  • 31. FUTURE IMPROVEMENTS FOR LOCATION EXTRACTOR MODULE
    Intermediate Functionality
    - improve on the results of the Basic Phase by filtering out outliers, deduplicating locations, and possibly clustering them.
    Advanced Functionality
    - consider implicit spatial relationships and independent observations, which would add richness to the data presented to the user and help detect patterns among them.
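The deduplication and outlier filtering steps might look like the following Python sketch. It keeps the first occurrence of each location name, takes the coordinate-wise median of the remaining points as a robust center, and drops points more than `max_km` great-circle kilometres from that center. The median-center rule and the 500 km default are assumptions chosen for illustration.

```python
import math
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (R = 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def clean_locations(locations, max_km=500.0):
    """Deduplicate (name, lat, lon) mentions, then drop points far from the median point."""
    unique = {}
    for name, lat, lon in locations:
        unique.setdefault(name, (lat, lon))  # keep the first occurrence of each name
    if not unique:
        return []
    center_lat = median(lat for lat, _ in unique.values())
    center_lon = median(lon for _, lon in unique.values())
    return [
        (name, lat, lon)
        for name, (lat, lon) in unique.items()
        if haversine_km(lat, lon, center_lat, center_lon) <= max_km
    ]
```

The median is used instead of the mean so that a single badly geocoded mention (for example, the wrong sense of an ambiguous place name thousands of kilometres away) does not drag the center toward itself.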
  • 32. EVENT REPRESENTATION BY LOCATION: MAP VIEW
    The advanced functionality of the Location Extractor Module includes geotagging, i.e. placing events extracted from different sources on a map. A representative example can be found at http://www.healthmap.org/promed/en
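Many web map libraries can display point data supplied as GeoJSON, so one way to feed a map view is to serialize geotagged events as a GeoJSON FeatureCollection. The minimal Python sketch below does this. The record layout (disease, place, lat, lon, date) and the property names are assumptions. The coordinate order follows the GeoJSON specification (RFC 7946), which lists longitude before latitude.

```python
import json


def events_to_geojson(events):
    """Convert (disease, place, lat, lon, date) records into a GeoJSON FeatureCollection string."""
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"disease": disease, "location": place, "date": day},
        }
        for disease, place, lat, lon, day in events
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Swapping latitude and longitude is a common mistake. For Topeka (latitude about 39, longitude about -95.7) the swapped point would be plotted in the Southern Ocean rather than in Kansas.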
  • 33. DOCUMENT LEVEL ANALYSIS Event Classification/Identification Task
  • 34. ESSENTIAL TASKS FOR EVENT TRACKING
    - Automatic population of large databases with factual information from many text sources
    - Rapid semantic processing of large volumes of unstructured text
    - Automatic merging of facts and entity relationships across sets of documents
    - Innovative techniques for extracting, summarizing, and tracking information about events and their progression over time from unstructured text
    - Identification of events and outbreaks includes the constituent tasks of date, time, and quantity extraction and timeline visualization, while geospatial IE includes location disambiguation (in later stages) and map view visualization.
  • 35. EVENT FORMAL REPRESENTATION
    - An event is an occurrence of a disease within a particular time and space range, so the attributes of a single event are a specific disease, a date/time, and a location:
    - Event examples with missing values:
  • 36. ADDITIONAL ASPECTS OF EVENT/OUTBREAK
    - Outbreak status: confirmed
    - Date of the event's report: 12.18.2007
    - Reporting source: www.dafra.gov/reports
    - Affected species: cattle
    - Morbidity/mortality: 155 infected / 12 died
    - Damage measure ($): $155,000
    - Standard features for event identification: <disease, location, date/time, …> + <…, person, organization, …> + <…, sentence length, quantities, occurrences of temporal/spatial terms, …>
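The standard features listed above can be turned into a feature vector for classifying sentences as event-bearing or not. The Python sketch below computes them from a tokenized sentence with parallel entity tags. The tag set (DIS, LOC, DT, PER, ORG, or O for none), the quantity regular expression, and the small temporal/spatial cue-word lists are hypothetical; a real system would take them from its extractors and lexicons.

```python
import re

# Hypothetical cue-word lists.
TEMPORAL_TERMS = {"yesterday", "today", "month", "week", "monday", "tuesday", "june"}
SPATIAL_TERMS = {"north", "south", "east", "west", "near", "farm", "province"}
QUANTITY_RX = re.compile(r"\d+(?:[.,]\d+)?")


def sentence_features(tokens, tags):
    """Build a feature dict from tokens and parallel entity tags."""
    lowered = [t.lower() for t in tokens]
    return {
        "has_disease": int("DIS" in tags),
        "has_location": int("LOC" in tags),
        "has_datetime": int("DT" in tags),
        "n_person": tags.count("PER"),
        "n_organization": tags.count("ORG"),
        "length": len(tokens),
        "n_quantities": sum(bool(QUANTITY_RX.fullmatch(t)) for t in tokens),
        "n_temporal_terms": sum(t in TEMPORAL_TERMS for t in lowered),
        "n_spatial_terms": sum(t in SPATIAL_TERMS for t in lowered),
    }
```

Any standard classifier (for example, logistic regression or a decision tree) could be trained on these dictionaries once they are vectorized. The binary entity-presence features correspond to the <disease, location, date/time> part of the feature list, and the counts correspond to the remaining parts.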
  • 37. OUTBREAK FORMAL REPRESENTATION
    - An outbreak is a collection of events that are connected by a common disease and that happened within a restricted space and time:
    - For outbreak identification, events should be similar in their temporal features (time overlap) and in their spatial features (space overlap).
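The outbreak definition above can be sketched as single-linkage grouping. Two events are linked when they share a disease, occur within `max_days` of each other (time overlap), and lie within `max_km` of each other (space overlap). Outbreaks are the connected components of these links, found here with union-find. The thresholds, the `Event` type, and the use of great-circle distance are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    disease: str
    lat: float
    lon: float
    day: date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (R = 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def group_outbreaks(events, max_days=14, max_km=100.0):
    """Return lists of events linked by same disease, time overlap and space overlap."""
    parent = list(range(len(events)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, a in enumerate(events):
        for j in range(i + 1, len(events)):
            b = events[j]
            if (a.disease == b.disease
                    and abs((a.day - b.day).days) <= max_days
                    and haversine_km(a.lat, a.lon, b.lat, b.lon) <= max_km):
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i, e in enumerate(events):
        groups.setdefault(find(i), []).append(e)
    return list(groups.values())
```

Because linkage is transitive, a chain of nearby events can grow into one large outbreak. This is usually the intended behavior for spreading diseases, but it should be kept in mind when choosing the thresholds.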
  • 38. DATA FLOW FOR EVENT IDENTIFICATION BASED ON SENTENCE CLASSIFICATION
    OUTBREAK: Disease: foot-and-mouth disease; Species: hog; Location: Taiwan; DateTime: 06/09/2009; Status: N/A
  • 39. NLP TASKS
    Foot-and-mouth disease[DIS] killed 15 hog on farm in Taiwan[LOC]
    Syntactic Analysis: Foot-and-mouth disease[SUBJ] killed[VP] 15 hog on farm in Taiwan[PP]
    Extraction: Fact: killed; Disease: foot-and-mouth disease; Location: Taiwan; Species: hog; Quantity: 15
    Co-reference Resolution: Foot-and-mouth disease killed 15 hog on farm in Taiwan. Outbreak was reported on 9 June.
    Template Generation: Event: outbreak; Species: 15 hog; Disease: foot-and-mouth disease; Location: Taiwan; DateTime: 9 June
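The template generation step can be illustrated by merging per-sentence slot dictionaries into one outbreak template. The Python sketch uses a deliberately crude co-reference assumption: every sentence in the group describes the same event, and the first non-empty value seen for each slot wins. The slot names follow the slide; `generate_template` and its merging rule are hypothetical.

```python
def generate_template(extractions, event_type="outbreak"):
    """Merge per-sentence slot dicts into one template (first non-empty value wins).

    Assumes, as a naive co-reference rule, that all sentences describe one event.
    """
    template = {"Event": event_type}
    for slots in extractions:
        for key, value in slots.items():
            if value is not None:
                template.setdefault(key, value)
    return template
```

Applied to the two sentences above, the first sentence supplies Fact, Disease, Location, Species, and Quantity, and the second ("Outbreak was reported on 9 June") supplies only the DateTime slot, yielding the template shown on the slide. A real co-reference component would decide whether the two sentences describe the same event instead of assuming it.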
  • 40. SEMANTIC ROLE LABELING TASK: EXAMPLE 1
    Outbreak identification, as an event identification task, can be treated as a Semantic Role Labeling (SRL) task: who did what to whom, when, where, why, …
    Demo: http://L2R.cs.uiuc.edu/~cogcomp
  • 41. SEMANTIC ROLE LABELING TASK: EXAMPLE 2
    Ecuador[LOC] - The Ecuadorian government[ORG] on Tuesday[DT] confirmed 48[QT] cases of foot-and-mouth disease[DIS] in domestic animals, which prompted neighboring Colombia[LOC] and Peru[LOC] to take preventive measures on their meat imports.
    Demo: http://l2r.cs.uiuc.edu/~cogcomp/srl-demo.php
  • 42. Thank you for your attention!