SemEval 2012 task 6
        A pilot on
Semantic Textual Similarity
   http://www.cs.york.ac.uk/semeval-2012/task6/


     Eneko Agirre (University of the Basque Country)
            Daniel Cer (Stanford University)
           Mona Diab (Columbia University)
                  Bill Dolan (Microsoft)
Aitor Gonzalez-Agirre (University of the Basque Country)
Outline

    Motivation

    Description of the task

    Source Datasets

    Definition of similarity and annotation

    Results

    Conclusions, open issues




                      STS task - SemEval 2012   2
Motivation
●   Word similarity and relatedness scores correlate
    highly with human judgments
      Move to longer text fragments (STS)
       –   Li et al. (2006) 65 pairs of glosses
       –   Lee et al. (2005) 50 documents on news
●   Paraphrase datasets judge semantic equivalence
    between text fragments
●   Textual entailment (TE) judges whether one
    fragment entails another
      Move to graded notion of semantic equivalence (STS)

Motivation
●   STS is a core component in the implementation
    of TE and paraphrase systems
●   Algorithms for STS have been extensively
    applied
    ●   MT, MT evaluation, Summarization, Generation,
        Distillation, Machine Reading, Textual Inference,
        Deep QA
●   Interest from application side confirmed in a
    recent STS workshop:
    ●   http://www.cs.columbia.edu/~weiwei/workshop/

Motivation
●   STS as a unified framework to combine and evaluate
    semantic (and pragmatic) components
      word sense disambiguation and induction
      lexical substitution
      semantic role labeling
      multiword expression detection and handling
      anaphora and coreference resolution
      time and date resolution
      named-entity handling
      underspecification
      hedging
      semantic scoping
      discourse analysis

Motivation
●   Start with a pilot task, with the following goals
    1. To set a definition of STS as a graded notion which
       can be easily communicated to non-expert
       annotators, beyond the Likert scale
    2. To gather a substantial amount of sentence pairs
       from diverse datasets, and to annotate them with
       high quality
    3. To explore evaluation measures for STS
    4. To explore the relation of STS to paraphrase and
       Machine Translation evaluation exercises

Description of the task
●   Given two sentences, s1 and s2
    ●   Return a similarity score
        and an optional confidence score
●   Evaluation
    ●   Correlation (Pearson)
        with average of human scores




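The evaluation just described can be sketched in a few lines: the gold score for a pair is the mean of its human annotations, and a system is scored by Pearson correlation against those gold scores. A minimal sketch (not the official scorer; all numbers are illustrative):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Each pair is scored 0-5 by several annotators; gold = their mean.
human_scores = [[5, 4, 5], [2, 3, 2], [0, 1, 0], [4, 4, 5]]  # illustrative
gold = [sum(a) / len(a) for a in human_scores]
system = [4.8, 2.5, 0.3, 4.1]                                # illustrative
score = pearson(gold, system)
```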
Data sources
●   MSR paraphrase: train (750), test (750)
●   MSR video: train (750), test (750)
●   WMT 07–08 (EuroParl): train (734), test (499)
●   Surprise datasets
    ●   WMT 2007 news: test (399)
    ●   Ontonotes – WordNet glosses: test (750)




Definition of similarity
Likert scale with definitions
(figure: the 0–5 scale, with a definition for each score)
Annotation
●   Pilot with 200 pairs annotated by three authors
    ●   Pairwise: 0.84r to 0.87r; with the average of the
        other two: 0.87r to 0.89r
●   Amazon Mechanical Turk
    ●   5 annotations per pair, averaged
    ●   Remove turkers with very low correlations with pilot
    ●   Correlation with us 0.90r to 0.94r
    ●   MSR: mean 2.76, std. dev. 0.66
    ●   SMT: mean 4.05, std. dev. 0.66



Results
●   Baselines: random, cosine of tokens
●   Participation: 120 hours to submit three runs.
    ●   35 teams, 88 runs
●   Evaluation
    ●   Pearson for each dataset
    ●   Concatenate all 5 datasets: ALL
        –   Some systems do well on individual datasets yet score low on ALL
    ●   Weighted mean over 5 datasets (micro-average): MEAN
        –   Allows statistical significance testing
    ●   Normalize each dataset and concatenate (least squares): ALLnorm
        –   Corrects scale errors (even random scores would get 0.59r)

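The three aggregate measures can be sketched as follows; `lstsq_fit` finds the per-dataset linear mapping used for ALLnorm. This is a minimal sketch of the idea, not the official evaluation script:

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def lstsq_fit(xs, ys):
    """Least-squares a, b such that a*x + b approximates y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def aggregate(datasets):
    """datasets: list of (gold, system) score-list pairs, one per dataset."""
    gold_all = [g for gold, _ in datasets for g in gold]
    sys_all = [s for _, system in datasets for s in system]
    # ALL: Pearson over the concatenation of all datasets
    r_all = pearson(gold_all, sys_all)
    # MEAN: per-dataset Pearson, weighted by dataset size (micro-average)
    sizes = [len(gold) for gold, _ in datasets]
    r_mean = sum(pearson(g, s) * n
                 for (g, s), n in zip(datasets, sizes)) / sum(sizes)
    # ALLnorm: fit each dataset's system scores to its gold scores,
    # then take Pearson over the concatenation
    sys_norm = []
    for g, s in datasets:
        a, b = lstsq_fit(s, g)
        sys_norm.extend(a * x + b for x in s)
    r_allnorm = pearson(gold_all, sys_norm)
    return r_all, r_mean, r_allnorm
```

A system that is perfectly correlated within each dataset but on different scales gets a perfect MEAN and ALLnorm, while ALL penalizes the scale mismatch.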
Results
●   Large majority better than both baselines
●   Best three runs
    ●   ALL: 0.82r UKP, TAKELAB, TAKELAB
    ●   MEAN: 0.67r TAKELAB, UKP, TAKELAB
    ●   ALLnorm: 0.86r UKP, TAKELAB, SOFT-CARDINALITY
●   Statistical significance (ALL 95% confidence interval)
    ●   1st 0.824r [0.812,0.835]
    ●   2nd 0.814r [0.802,0.825]



Results
●   Datasets (ALL)
    ●   MSRpar 0.73r TAKELAB
    ●   MSRvid 0.88r TAKELAB
    ●   SMT-eur 0.57r SRANJANS
    ●   SMT-news 0.61r FBK
    ●   On-WN 0.73r WEIWEI




Results
●   Evaluation using confidence scores
    ●   Weighted Pearson correlation
    ●   Some systems improve results (IRIT, TIANTIANZHU7)
        –   IRIT: 0.48r => 0.55r
    ●   Others did not (UNED)
●   Unfortunately, only a few teams submitted
    confidence scores
●   Promising direction, potentially useful in
    applications (Watson)

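Weighted Pearson can be sketched by letting each pair contribute proportionally to the system's confidence in its own score. A hedged sketch of the idea, not necessarily the exact formula used in the task:

```python
import math

def weighted_pearson(xs, ys, ws):
    """Pearson correlation in which pair i contributes with weight ws[i],
    e.g. the system's reported confidence in its own score."""
    total = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / total
    my = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    vx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    vy = sum(w * (y - my) ** 2 for w, y in zip(ws, ys))
    return cov / math.sqrt(vx * vy)
```

With all weights equal this reduces to plain Pearson; down-weighting the pairs a system is unsure about can raise its score, which is how confidence-aware systems improved their results.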
Tools used
●   WordNet, corpora and Wikipedia most used
●   Knowledge-based and distributional methods used equally
●   Machine learning widely used for combination
    and tuning
●   Best systems used most resources
    ●   Exception: SOFT-CARDINALITY




Conclusions
●   Pilot worked!
    ●   Define STS as likert scale with definitions
    ●   Produce a wealth of high-quality data (~3,750 pairs)
    ●   Very successful participation
    ●   All data and system outputs are publicly available
●   Started to explore evaluation of STS
●   Started to explore relation to paraphrase and
    MT evaluation
●   Planning for STS 2013
Open issues
●   Data sources, alternatives to the opportunistic method
    ●   New pairs of sentences
    ●   Possibly related to specific phenomena, e.g. negation
●   Definition of task
    ●   Agreement for definitions
    ●   Compare to Likert scale with no definitions
    ●   Define multiple dimensions of similarity
        (polarity, sentiment, modality, relatedness, entailment, etc.)
●   Evaluation
    ●   Spearman, Kendall's Tau
    ●   Significance tests over multiple datasets (Bergmann & Hommel, 1989)
●   And more!! Join the STS-semeval Google group

STS presentations
●   The three best systems will be presented in the
    last session of SemEval today (4:00pm)
●   An analysis of the runs and some thoughts on
    evaluation will also be presented
●   Tomorrow in the poster sessions




Thanks for your attention!

And thanks to all participants, especially
those contributing to the evaluation discussion (Yoan Gutierrez,
Michael Heilman, Sergio Jimenez, Nitin Madnani, Diana
McCarthy and Shrutiranjan Satpathy)
Eneko Agirre was partially funded by the European Community's Seventh Framework Programme
(FP7/2007-2013) under grant agreement no. 270082 (PATHS project) and the Ministry of Economy
under grant TIN2009-14715-C04-01 (KNOW2 project). Daniel Cer gratefully acknowledges the support
of the Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air
Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0181 and the support of the
DARPA Broad Operational Language Translation (BOLT) program through IBM. The STS annotations
were funded by an extension to DARPA GALE subcontract to IBM # W0853748 4911021461.0 to Mona
Diab. Any opinions, findings, and conclusion or recommendations expressed in this material are those
of the author(s) and do not necessarily reflect the view of the DARPA, AFRL, or the US government.



SemEval 2012 task 6
        A pilot on
Semantic Textual Similarity
   http://www.cs.york.ac.uk/semeval-2012/task6/


     Eneko Agirre (University of the Basque Country)
             Daniel Cer (Stanford University)
             Mona Diab (Columbia University)
                  Bill Dolan (Microsoft)
Aitor Gonzalez-Agirre (University of the Basque Country)
MSR paraphrase corpus
●   Widely used to evaluate text similarity algorithms
●   Gleaned over a period of 18 months from
    thousands of news sources on the web.
●   5801 pairs of sentences
    ●   70% train, 30% test
    ●   67% yes, 33% no
        –   ranging from completely unrelated, to partially overlapping,
            to almost-but-not-quite semantically equivalent
    ●   IAA 82%-84%
●   (Dolan et al. 2004)
MSR paraphrase corpus
●   The Senate Select Committee on Intelligence is preparing a
    blistering report on prewar intelligence on Iraq.
●   American intelligence leading up to the war on Iraq will be
    criticised by a powerful US Congressional committee due to
    report soon, officials said today.

●   A strong geomagnetic storm was expected to hit Earth today
    with the potential to affect electrical grids and satellite
    communications.
●   A strong geomagnetic storm is expected to hit Earth
    sometime %%DAY%% and could knock out electrical grids
    and satellite communications.

MSR paraphrase corpus
●   Methodology:
    ●   Rank pairs according to string similarity
        –   "Algorithms for Approximate String Matching", E.
            Ukkonen, Information and Control, Vol. 64, 1985, pp.
            100–118.
    ●   Five bands (0.8 – 0.4 similarity)
    ●   Sample equal number of pairs from each band
    ●   Repeat for paraphrases / non-paraphrases
    ●   50% from each
●   750 pairs for train, 750 pairs for test
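The banded sampling above can be sketched as follows. Here `difflib.SequenceMatcher` stands in for Ukkonen's approximate string matching, and the band boundaries and counts are illustrative, not the exact ones used:

```python
import random
from difflib import SequenceMatcher

def sample_by_bands(pairs, bands, per_band, seed=0):
    """Rank sentence pairs by string similarity and keep an equal number
    from each similarity band, as in the MSR dataset construction.
    pairs: list of (s1, s2); bands: list of (low, high) score intervals."""
    rng = random.Random(seed)
    scored = [(SequenceMatcher(None, a, b).ratio(), (a, b)) for a, b in pairs]
    sample = []
    for low, high in bands:
        in_band = [p for score, p in scored if low <= score < high]
        rng.shuffle(in_band)
        sample.extend(in_band[:per_band])
    return sample
```

The same banding is applied separately to the paraphrase and non-paraphrase pairs, with 50% of the final set taken from each.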
MSR Video Description Corpus
●   Show a segment of a YouTube video
    ●   Ask for a one-sentence description of the main
        action/event in the video (AMT)
    ●   120K sentences, 2,000 videos
    ●   Roughly parallel descriptions (not only in English)
●   (Chen and Dolan, 2011)




MSR Video Description Corpus
                       ●   A person is slicing a cucumber into
                           pieces.
                       ●   A chef is slicing a vegetable.
                       ●   A person is slicing a cucumber.
                       ●   A woman is slicing vegetables.
                       ●   A woman is slicing a cucumber.
                       ●   A person is slicing cucumber with
                           a knife.
                       ●   A person cuts up a piece of
                           cucumber.
                       ●   A man is slicing cucumber.
                       ●   A man cutting zucchini.
                       ●   Someone is slicing fruit.

MSR Video Description Corpus
●   Methodology:
    ●   All possible pairs from the same video
    ●   1% of all possible pairs from different videos
    ●   Rank pairs according to string similarity
    ●   Four bands (0.8 – 0.5 similarity)
    ●   Sample equal number of pairs from each band
    ●   Repeat for same video / different video
    ●   50% from each
●   750 pairs for train, 750 pairs for test

WMT: MT evaluation
●   Pairs of segments (~ sentences) that had been part
    of the human evaluation for WMT systems
    ●   a reference translation
    ●   a machine translation submission
●   To keep things consistent, we used only French-to-
    English system submissions
●   Train contains pairs from WMT 2007
●   Test contains pairs with fewer than 16 tokens from
    WMT 2008
●   Train and test come from Europarl

WMT: MT evaluation
●   The only instance in which no tax is levied is
    when the supplier is in a non-EU country and
    the recipient is in a Member State of the EU.
●   The only case for which no tax is still perceived
    "is an example of supply in the European
    Community from a third country.
●   Thank you very much, Commissioner.
●   Thank you very much, Mr Commissioner.


Surprise datasets
●   The first set comprises human-ranked fr-en system
    submissions from the WMT 2007 news conversation
    test set, resulting in 351 unique system-reference pairs.
●   The second set is radically different: it comprises
    750 pairs of glosses from OntoNotes 4.0 (Hovy et
    al., 2006) and WordNet 3.1 (Fellbaum, 1998) senses.




Pilot
●   Mona, Dan, Eneko
●   ~200 pairs from three datasets
●   Pairwise agreement:
    ●   GS:dan     SYS:eneko     N:188 Pearson: 0.874
    ●   GS:dan     SYS:mona      N:174 Pearson: 0.845
    ●   GS:eneko   SYS:mona      N:184 Pearson: 0.863
●   Agreement with average of rest of us:
    ●   GS:average  SYS:dan   N:188 Pearson: 0.885
    ●   GS:average  SYS:eneko N:198 Pearson: 0.889
    ●   GS:average  SYS:mona  N:184 Pearson: 0.875

Pilot with turkers
●   Average turkers with our average:
    ●   N:197 Pearson: 0.959
●   Each of us with average of turkers:
    ●   dan        N:187 Pearson: 0.937
    ●   eneko      N:197 Pearson: 0.919
    ●   mona       N:183 Pearson: 0.896




Working with AMT
●   Requirements:
    ●   95% approval rating for their other HITs on AMT.
    ●   To pass a qualification test with 80% accuracy.
        –   6 example pairs
        –   answers were marked correct if within ±1 of our
            annotations
    ●   Targeting US workers, but accepted all origins
●   HIT: 5 pairs of sentences, $ 0.20, 5 turkers per HIT
●   114.9 seconds per HIT on the most recent data we
    submitted.


Working with AMT
●   Quality control
    ●   Each HIT contained one pair from our pilot
    ●   After tagging, we checked the correlation of individual
        turkers with our scores
    ●   Remove annotations of low correlation turkers
        –   A2VJKPNDGBSUOK N:100 Pearson: -0.003
    ●   Later realized that we could use correlation with
        average of other Turkers




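The quality-control step can be sketched as: correlate each turker's scores with the embedded pilot gold, drop annotators below a threshold, and average the rest per pair. A minimal sketch; the turker names and the 0.5 threshold are illustrative:

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def filter_and_average(annotations, gold, min_r=0.5):
    """annotations: {turker: {pair_id: score}}; gold: {pair_id: score} for
    the pilot pairs embedded in the HITs. Drop turkers whose correlation
    with gold on shared pairs falls below min_r, then average the
    surviving annotations per pair."""
    kept = {}
    for turker, scores in annotations.items():
        shared = [p for p in scores if p in gold]
        if len(shared) >= 2 and pearson([scores[p] for p in shared],
                                        [gold[p] for p in shared]) >= min_r:
            kept[turker] = scores
    pair_ids = {p for scores in kept.values() for p in scores}
    return {p: sum(s[p] for s in kept.values() if p in s)
               / sum(1 for s in kept.values() if p in s)
            for p in pair_ids}
```

The variant noted on the slide, correlating each turker with the average of the other turkers instead of with the pilot gold, fits the same skeleton.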
Assessing quality of annotation
(figure slide)
Assessing quality of annotation
●   MSR datasets
    ●   Average 2.76
    ●   Score distribution (score: count):
        0: 2228   1: 1456   2: 1895
        3: 4072   4: 3275   5: 2126


Average (MSR data)
(figure: average similarity score per pair, y-axis 0–6)
Standard deviation (MSR data)
(figure: standard deviation per pair)
Standard deviation (MSR data)
(figure: standard deviation per pair, y-axis 0–2.5)
Average SMTeuroparl
(figure slide)

Weitere ähnliche Inhalte

Andere mochten auch

PATHS: User Requirements Analysis v1.0
PATHS: User Requirements Analysis v1.0PATHS: User Requirements Analysis v1.0
PATHS: User Requirements Analysis v1.0pathsproject
 
PATHS Functional specification first prototype
PATHS Functional specification first prototypePATHS Functional specification first prototype
PATHS Functional specification first prototypepathsproject
 
IND-2012-252 A G Matic Hr sec School -Kids Birthday Garden
IND-2012-252 A G Matic Hr sec School -Kids Birthday GardenIND-2012-252 A G Matic Hr sec School -Kids Birthday Garden
IND-2012-252 A G Matic Hr sec School -Kids Birthday Gardendesignforchangechallenge
 
The old exchange environment versus modern exchange environment part 02#36
The old exchange environment versus modern exchange environment  part 02#36The old exchange environment versus modern exchange environment  part 02#36
The old exchange environment versus modern exchange environment part 02#36Eyal Doron
 
PATHS Second prototype-functional-spec
PATHS Second prototype-functional-specPATHS Second prototype-functional-spec
PATHS Second prototype-functional-specpathsproject
 
PATHS at PATCH 2011
PATHS at PATCH 2011PATHS at PATCH 2011
PATHS at PATCH 2011pathsproject
 
My E-mail appears as spam - Troubleshooting path | Part 11#17
My E-mail appears as spam - Troubleshooting path | Part 11#17My E-mail appears as spam - Troubleshooting path | Part 11#17
My E-mail appears as spam - Troubleshooting path | Part 11#17Eyal Doron
 
Autodiscover flow in active directory based environment part 15#36
Autodiscover flow in active directory based environment  part 15#36Autodiscover flow in active directory based environment  part 15#36
Autodiscover flow in active directory based environment part 15#36Eyal Doron
 
Exchange In-Place eDiscovery & Hold | Introduction | 5#7
Exchange In-Place eDiscovery & Hold | Introduction  | 5#7Exchange In-Place eDiscovery & Hold | Introduction  | 5#7
Exchange In-Place eDiscovery & Hold | Introduction | 5#7Eyal Doron
 
PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013pathsproject
 

Andere mochten auch (16)

PATHS: User Requirements Analysis v1.0
PATHS: User Requirements Analysis v1.0PATHS: User Requirements Analysis v1.0
PATHS: User Requirements Analysis v1.0
 
PATHS Functional specification first prototype
PATHS Functional specification first prototypePATHS Functional specification first prototype
PATHS Functional specification first prototype
 
IND-2012-252 A G Matic Hr sec School -Kids Birthday Garden
IND-2012-252 A G Matic Hr sec School -Kids Birthday GardenIND-2012-252 A G Matic Hr sec School -Kids Birthday Garden
IND-2012-252 A G Matic Hr sec School -Kids Birthday Garden
 
Anafrank
AnafrankAnafrank
Anafrank
 
The old exchange environment versus modern exchange environment part 02#36
The old exchange environment versus modern exchange environment  part 02#36The old exchange environment versus modern exchange environment  part 02#36
The old exchange environment versus modern exchange environment part 02#36
 
PATHS Second prototype-functional-spec
PATHS Second prototype-functional-specPATHS Second prototype-functional-spec
PATHS Second prototype-functional-spec
 
IND-2012-290 Anando -REUNION
IND-2012-290 Anando -REUNIONIND-2012-290 Anando -REUNION
IND-2012-290 Anando -REUNION
 
PATHS at PATCH 2011
PATHS at PATCH 2011PATHS at PATCH 2011
PATHS at PATCH 2011
 
Extra unit 2
Extra unit 2Extra unit 2
Extra unit 2
 
My E-mail appears as spam - Troubleshooting path | Part 11#17
My E-mail appears as spam - Troubleshooting path | Part 11#17My E-mail appears as spam - Troubleshooting path | Part 11#17
My E-mail appears as spam - Troubleshooting path | Part 11#17
 
Renaissance
RenaissanceRenaissance
Renaissance
 
Canciones del colegio
Canciones del colegioCanciones del colegio
Canciones del colegio
 
Autodiscover flow in active directory based environment part 15#36
Autodiscover flow in active directory based environment  part 15#36Autodiscover flow in active directory based environment  part 15#36
Autodiscover flow in active directory based environment part 15#36
 
IND-2012-289 Anando SIGNATURE CAMPAIGN
IND-2012-289 Anando SIGNATURE CAMPAIGNIND-2012-289 Anando SIGNATURE CAMPAIGN
IND-2012-289 Anando SIGNATURE CAMPAIGN
 
Exchange In-Place eDiscovery & Hold | Introduction | 5#7
Exchange In-Place eDiscovery & Hold | Introduction  | 5#7Exchange In-Place eDiscovery & Hold | Introduction  | 5#7
Exchange In-Place eDiscovery & Hold | Introduction | 5#7
 
PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013
 

Ähnlich wie SemEval 2012 task 6 pilot on Semantic Textual Similarity

Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...Jinho Choi
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...Anubhav Jain
 
Reference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptxReference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptxChimezie Ogbuji
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Anubhav Jain
 
IA3_presentation.pptx
IA3_presentation.pptxIA3_presentation.pptx
IA3_presentation.pptxKtonNguyn2
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia VoulibasiISSEL
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Quinsulon Israel
 
a deep reinforced model for abstractive summarization
a deep reinforced model for abstractive summarizationa deep reinforced model for abstractive summarization
a deep reinforced model for abstractive summarizationJEE HYUN PARK
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT Lifeng (Aaron) Han
 
Lepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLifeng (Aaron) Han
 
Principles of effort estimation
Principles of effort estimationPrinciples of effort estimation
Principles of effort estimationCS, NcState
 
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...広樹 本間
 
A literature survey of benchmark functions for global optimisation problems
A literature survey of benchmark functions for global optimisation problemsA literature survey of benchmark functions for global optimisation problems
A literature survey of benchmark functions for global optimisation problemsXin-She Yang
 
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...Lifeng (Aaron) Han
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingeXascale Infolab
 
.NET Fest 2017. Игорь Кочетов. Классификация результатов тестирования произво...
.NET Fest 2017. Игорь Кочетов. Классификация результатов тестирования произво....NET Fest 2017. Игорь Кочетов. Классификация результатов тестирования произво...
.NET Fest 2017. Игорь Кочетов. Классификация результатов тестирования произво...NETFest
 

Ähnlich wie SemEval 2012 task 6 pilot on Semantic Textual Similarity (20)

Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding D...
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Reference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptxReference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptx
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
 
IA3_presentation.pptx
IA3_presentation.pptxIA3_presentation.pptx
IA3_presentation.pptx
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia Voulibasi
 
Test for AI model
Test for AI modelTest for AI model
Test for AI model
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
 
a deep reinforced model for abstractive summarization
a deep reinforced model for abstractive summarizationa deep reinforced model for abstractive summarization
a deep reinforced model for abstractive summarization
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
ICSE20_Tao_slides.pptx
ICSE20_Tao_slides.pptxICSE20_Tao_slides.pptx
ICSE20_Tao_slides.pptx
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
 
Lepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metric
 
Principles of effort estimation
Principles of effort estimationPrinciples of effort estimation
Principles of effort estimation
 
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
 
A literature survey of benchmark functions for global optimisation problems
A literature survey of benchmark functions for global optimisation problemsA literature survey of benchmark functions for global optimisation problems
A literature survey of benchmark functions for global optimisation problems
 
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
 
.NET Fest 2017. Игорь Кочетов. Классификация результатов тестирования произво...
.NET Fest 2017. Игорь Кочетов. Классификация результатов тестирования произво....NET Fest 2017. Игорь Кочетов. Классификация результатов тестирования произво...
.NET Fest 2017. Игорь Кочетов. Классификация результатов тестирования произво...
 

SemEval 2012 task 6 pilot on Semantic Textual Similarity

  • 1. SemEval 2012 task 6 A pilot on Semantic Textual Similarity http://www.cs.york.ac.uk/semeval-2012/task6/ Eneko Agirre (University of the Basque Country) Daniel Cer (Stanford University) Mona Diab (Columbia University) Bill Dolan (Microsoft) Aitor Gonzalez-Agirre (University of the Basque Country)
  • 2. Outline  Motivation  Description of the task  Source Datasets  Definition of similarity and annotation  Results  Conclusions, open issues STS task - SemEval 2012 2
  • 3. Motivation ● Word similarity and relatedness highly correlated with humans Move to longer text fragments (STS) – Li et al. (2006) 65 pairs of glosses – Lee et al. (2005) 50 documents on news ● Paraphrase datasets judge semantic equivalence between text fragments ● Textual entailment (TE) judges whether one fragment entails another Move to graded notion of semantic equivalence (STS) STS task - SemEval 2012 3
  • 4. Motivation ● STS has been part of the core implementation of TE and paraphrase systems ● Algorithms for STS have been extensively applied ● MT, MT evaluation, Summarization, Generation, Distillation, Machine Reading, Textual Inference, Deep QA ● Interest from application side confirmed in a recent STS workshop: ● http://www.cs.columbia.edu/~weiwei/workshop/ STS task - SemEval 2012 4
  • 5. Motivation ● STS as a unified framework to combine and evaluate semantic (and pragmatic) components: word sense disambiguation and induction, lexical substitution, semantic role labeling, multiword expression detection and handling, anaphora and coreference resolution, time and date resolution, named-entity handling, underspecification, hedging, semantic scoping, discourse analysis STS task - SemEval 2012 5
  • 6. Motivation ● Start with a pilot task, with the following goals: 1. To set a definition of STS as a graded notion which can be easily communicated to non-expert annotators, beyond the Likert scale 2. To gather a substantial amount of sentence pairs from diverse datasets, and to annotate them with high quality 3. To explore evaluation measures for STS 4. To explore the relation of STS to paraphrase and Machine Translation evaluation exercises STS task - SemEval 2012 6
  • 7. Description of the task ● Given two sentences, s1 and s2 ● Return a similarity score and an optional confidence score ● Evaluation ● Correlation (Pearson) with average of human scores STS task - SemEval 2012 7
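The official metric on the slide above, Pearson correlation between the system's scores and the average of the human scores, can be sketched in a few lines of plain Python; the sentence-pair scores below are invented for illustration, not task data:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Gold standard: the average of the human annotations for each sentence pair
# (invented scores on the task's 0-5 scale).
human = [[5, 4, 5], [1, 0, 1], [3, 3, 2], [4, 5, 4]]
gold = [sum(scores) / len(scores) for scores in human]
system = [4.5, 0.8, 2.9, 4.2]   # hypothetical system output
print(round(pearson(gold, system), 3))
```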
  • 8. Data sources ● MSR paraphrase: train (750), test (750) ● MSR video: train (750), test (750) ● WMT 07–08 (EuroParl): train (734), test (499) ● Surprise datasets ● WMT 2007 news: test (399) ● Ontonotes – WordNet glosses: test (750) STS task - SemEval 2012 8
  • 9. Definition of similarity Likert scale with definitions STS task - SemEval 2012 9
  • 10. Annotation ● Pilot with 200 pairs annotated by three authors ● Pairwise (0.84r to 0.87r), with average (0.87r to 0.89r) ● Amazon Mechanical Turk ● 5 annotations per pair, averaged ● Remove turkers with very low correlations with pilot ● Correlation with us 0.90r to 0.94r ● MSR: 2.76 mean, 0.66 sdv. ● SMT: 4.05 mean, 0.66 sdv. STS task - SemEval 2012 10
  • 11. Results ● Baselines: random, cosine of tokens ● Participation: 120 hours to submit three runs. ● 35 teams, 88 runs ● Evaluation ● Pearson for each dataset ● Concatenate all 5 datasets: ALL – Some systems doing well in each dataset, low results ● Weighted mean over 5 datasets (micro-average): MEAN – Statistical significance ● Normalize each dataset and concatenate (least square): ALLnorm – Corrects errors (random would get 0.59r) STS task - SemEval 2012 11
  • 12. Results ● Large majority better than both baselines ● Best three runs ● ALL: 0.82r UKP, TAKELAB, TAKELAB ● Mean: 0.67r TAKELAB, UKP, TAKELAB ● ALLnorm: 0.86r UKP, TAKELAB, SOFT-CARDINALITY ● Statistical significance (ALL 95% confidence interval) ● 1st 0.824r [0.812,0.835] ● 2nd 0.814r [0.802,0.825] STS task - SemEval 2012 12
  • 13. Results ● Datasets (ALL) ● MSRpar 0.73r TAKELAB ● MSRvid 0.88r TAKELAB ● SMT-eur 0.57r SRANJANS ● SMT-news 0.61r FBK ● On-WN 0.73r WEIWEI STS task - SemEval 2012 13
  • 14. Results ● Evaluation using confidence scores ● Weighted Pearson correlation ● Some systems improve results (IRIT, TIANTIANZHU7) – IRIT: 0.48r => 0.55r ● Others did not (UNED) ● Unfortunately only a few teams sent out confidence scores ● Promising direction, potentially useful in applications (Watson) STS task - SemEval 2012 14
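A weighted Pearson of the kind used for the confidence-score evaluation might look like the sketch below. The exact weighting used by the task is not reproduced here; this is a standard weighted-correlation formula where each pair's weight is the system's confidence, on invented scores:

```python
def weighted_pearson(xs, ys, ws):
    """Pearson correlation where each (x, y) pair is weighted by w."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    vx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    vy = sum(w * (y - my) ** 2 for w, y in zip(ws, ys))
    return cov / (vx * vy) ** 0.5

gold = [4.7, 0.7, 2.7, 4.3]              # invented gold averages
system = [4.5, 3.5, 2.9, 4.2]            # pair 2 is badly mis-scored
confidence = [0.9, 0.2, 0.8, 0.9]        # hypothetical per-pair confidences
# Down-weighting the low-confidence (and worst) pair raises the correlation:
print(round(weighted_pearson(gold, system, [1.0] * 4), 3),
      round(weighted_pearson(gold, system, confidence), 3))
```

This illustrates why the slide calls confidence scores promising: a system that knows which of its scores are unreliable can be rewarded for that self-assessment.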
  • 15. Tools used ● WordNet, corpora and Wikipedia most used ● Knowledge-based and distributional equally ● Machine learning widely used for combination and tuning ● Best systems used most resources ● Exception: SOFT-CARDINALITY STS task - SemEval 2012 15
  • 16. Conclusions ● Pilot worked! ● Define STS as Likert scale with definitions ● Produce a wealth of data of high quality (~3750 pairs) ● Very successful participation ● All data and system outputs are publicly available ● Started to explore evaluation of STS ● Started to explore relation to paraphrase and MT evaluation ● Planning for STS 2013 STS task - SemEval 2012 16
  • 17. Open issues ● Data sources, alternatives to the opportunistic method ● New pairs of sentences ● Possibly related to specific phenomena, e.g. negation ● Definition of task ● Agreement for definitions ● Compare to Likert scale with no definitions ● Define multiple dimensions of similarity (polarity, sentiment, modality, relatedness, entailment, etc.) ● Evaluation ● Spearman, Kendall's Tau ● Significance tests over multiple datasets (Bergmann & Hommel, 1989) ● And more!! Join STS-semeval google group STS task - SemEval 2012 17
  • 18. STS presentations ● Three best systems will be presented in last session of Semeval today (4:00pm) ● Analysis of runs and some thoughts on evaluation will be also presented ● Tomorrow in the posters sessions STS task - SemEval 2012 18
  • 19. Thanks for your attention! And thanks to all participants, especially those contributing to the evaluation discussion (Yoan Gutierrez, Michael Heilman, Sergio Jimenez, Nitin Madnani, Diana McCarthy and Shrutiranjan Satpathy) Eneko Agirre was partially funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 270082 (PATHS project) and the Ministry of Economy under grant TIN2009-14715-C04-01 (KNOW2 project). Daniel Cer gratefully acknowledges the support of the Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0181 and the support of the DARPA Broad Operational Language Translation (BOLT) program through IBM. The STS annotations were funded by an extension to DARPA GALE subcontract to IBM # W0853748 4911021461.0 to Mona Diab. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA, AFRL, or the US government. STS task - SemEval 2012 19
  • 20. SemEval 2012 task 6 A pilot on Semantic Textual Similarity http://www.cs.york.ac.uk/semeval-2012/task6/ Eneko Agirre (University of the Basque Country) Daniel Cer (Stanford University) Mona Diab (Columbia University) Bill Dolan (Microsoft) Aitor Gonzalez-Agirre (University of the Basque Country)
  • 21. MSR paraphrase corpus ● Widely used to evaluate text similarity algorithms ● Gleaned over a period of 18 months from thousands of news sources on the web. ● 5801 pairs of sentences ● 70% train, 30% test ● 67% yes, 33% no – ranging from completely unrelated semantically, through partially overlapping, to almost-but-not-quite semantically equivalent. ● IAA 82%-84% ● (Dolan et al. 2004) STS task - SemEval 2012 21
  • 22. MSR paraphrase corpus ● The Senate Select Committee on Intelligence is preparing a blistering report on prewar intelligence on Iraq. ● American intelligence leading up to the war on Iraq will be criticised by a powerful US Congressional committee due to report soon, officials said today. ● A strong geomagnetic storm was expected to hit Earth today with the potential to affect electrical grids and satellite communications. ● A strong geomagnetic storm is expected to hit Earth sometime %%DAY%% and could knock out electrical grids and satellite communications. STS task - SemEval 2012 22
  • 23. MSR paraphrase corpus ● Methodology: ● Rank pairs according to string similarity – "Algorithms for Approximate String Matching", E. Ukkonen, Information and Control, Vol. 64, 1985, pp. 100–118. ● Five bands (0.8 – 0.4 similarity) ● Sample equal number of pairs from each band ● Repeat for paraphrases / non-paraphrases ● 50% from each ● 750 pairs for train, 750 pairs for test STS task - SemEval 2012 23
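The band-and-sample procedure on the slide above can be sketched as follows. `difflib`'s ratio stands in for Ukkonen's approximate string matching, and the per-band quota is a made-up default, so the selection will not reproduce the original dataset exactly:

```python
import difflib
import random

def band_sample(pairs, n_bands=5, lo=0.4, hi=0.8, per_band=150, seed=0):
    """Bucket sentence pairs into equal-width string-similarity bands over
    [lo, hi) and sample the same number of pairs from each band."""
    width = (hi - lo) / n_bands
    bands = [[] for _ in range(n_bands)]
    for s1, s2 in pairs:
        sim = difflib.SequenceMatcher(None, s1, s2).ratio()
        if lo <= sim < hi:                       # too-similar / too-different pairs are dropped
            bands[int((sim - lo) / width)].append((s1, s2))
    rng = random.Random(seed)
    return [p for band in bands
              for p in rng.sample(band, min(per_band, len(band)))]

demo = [("abcdef", "abcdxy"),                    # mid-similarity: kept
        ("aaaa", "zzzz"),                        # similarity 0.0: dropped
        ("same string", "same string")]          # similarity 1.0: dropped
print(band_sample(demo))
```

In the task the same procedure was run separately for paraphrase and non-paraphrase pairs and the two halves merged, giving 750 pairs per split.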
  • 24. MSR Video Description Corpus ● Show a segment of YouTube video ● Ask for one-sentence description of the main action/event in the video (AMT) ● 120K sentences, 2,000 videos ● Roughly parallel descriptions (not only in English) ● (Chen and Dolan, 2011) STS task - SemEval 2012 24
  • 25. MSR Video Description Corpus ● A person is slicing a cucumber into pieces. ● A chef is slicing a vegetable. ● A person is slicing a cucumber. ● A woman is slicing vegetables. ● A woman is slicing a cucumber. ● A person is slicing cucumber with a knife. ● A person cuts up a piece of cucumber. ● A man is slicing cucumber. ● A man cutting zucchini. ● Someone is slicing fruit. STS task - SemEval 2012 25
  • 26. MSR Video Description Corpus ● Methodology: ● All possible pairs from the same video ● 1% of all possible pairs from different videos ● Rank pairs according to string similarity ● Four bands (0.8 – 0.5 similarity) ● Sample equal number of pairs from each band ● Repeat for same video / different video ● 50% from each ● 750 pairs for train, 750 pairs for test STS task - SemEval 2012 26
  • 27. WMT: MT evaluation ● Pairs of segments (~ sentences) that had been part of the human evaluation for WMT systems ● a reference translation ● a machine translation submission ● To keep things consistent, we just used French-to-English system submission translations ● Train contains pairs from WMT 2007 ● Test contains pairs with fewer than 16 tokens from WMT 2008 ● Train and test come from Europarl STS task - SemEval 2012 27
  • 28. WMT: MT evaluation ● The only instance in which no tax is levied is when the supplier is in a non-EU country and the recipient is in a Member State of the EU. ● The only case for which no tax is still perceived "is an example of supply in the European Community from a third country. ● Thank you very much, Commissioner. ● Thank you very much, Mr Commissioner. STS task - SemEval 2012 28
  • 29. Surprise datasets ● Human-ranked fr-en system submissions from the WMT 2007 news commentary test set, resulting in 351 unique system-reference pairs. ● The second set is radically different, as it comprised 750 pairs of glosses from OntoNotes 4.0 (Hovy et al., 2006) and WordNet 3.1 (Fellbaum, 1998) senses. STS task - SemEval 2012 29
  • 30. Pilot ● Mona, Dan, Eneko ● ~200 pairs from three datasets ● Pairwise agreement: ● GS:dan     SYS:eneko     N:188 Pearson: 0.874 ● GS:dan     SYS:mona      N:174 Pearson: 0.845 ● GS:eneko   SYS:mona      N:184 Pearson: 0.863 ● Agreement with average of rest of us: ● GS:average  SYS:dan   N:188 Pearson: 0.885 ● GS:average  SYS:eneko N:198 Pearson: 0.889 ● GS:average  SYS:mona  N:184 Pearson: 0.875 STS task - SemEval 2012 30
  • 31. STS task - SemEval 2012 31
  • 32. Pilot with turkers ● Average turkers with our average: ● N:197 Pearson: 0.959 ● Each of us with average of turkers: ● dan        N:187 Pearson: 0.937 ● eneko      N:197 Pearson: 0.919 ● mona       N:183 Pearson: 0.896 STS task - SemEval 2012 32
  • 33. Working with AMT ● Requirements: ● 95% approval rating for their other HITs on AMT. ● To pass a qualification test with 80% accuracy. – 6 example pairs – answers were marked correct if they were within +1/-1 of our annotations ● Targeting US, but used all origins ● HIT: 5 pairs of sentences, $0.20, 5 turkers per HIT ● 114.9 seconds per HIT on the most recent data we submitted. STS task - SemEval 2012 33
  • 34. Working with AMT ● Quality control ● Each HIT contained one pair from our pilot ● After the tagging we check correlation of individual turkers with our scores ● Remove annotations of low correlation turkers – A2VJKPNDGBSUOK N:100 Pearson: -0.003 ● Later realized that we could use correlation with average of other Turkers STS task - SemEval 2012 34
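The quality-control step on the slide above, removing Turkers whose annotations correlate poorly with the rest, can be sketched as a leave-one-out filter; the threshold and the annotation data are invented for illustration, not the values used by the organisers:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sqrt(sum((x - mx) ** 2 for x in xs)) *
                  sqrt(sum((y - my) ** 2 for y in ys)))

def filter_annotators(annotations, threshold=0.5):
    """Keep annotators whose scores correlate, leave-one-out, with the
    average of the remaining annotators at or above `threshold`.
    `annotations` maps annotator id -> list of scores over the same pairs."""
    keep = {}
    for name, scores in annotations.items():
        others = [s for other, s in annotations.items() if other != name]
        avg = [sum(col) / len(col) for col in zip(*others)]
        if pearson(scores, avg) >= threshold:
            keep[name] = scores
    return keep

# Three consistent annotators and one that anti-correlates (invented data):
ann = {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, 4, 4],
       "c": [2, 2, 3, 4, 5], "bad": [5, 4, 3, 2, 1]}
print(sorted(filter_annotators(ann)))   # the "bad" annotator is dropped
```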
  • 35. Assessing quality of annotation STS task - SemEval 2012 35
  • 36. Assessing quality of annotation ● MSR datasets ● Average 2.76 ● 0:2228 ● 1:1456 ● 2:1895 ● 3:4072 ● 4:3275 ● 5:2126 STS task - SemEval 2012 36
  • 37. Average (MSR data) [chart; y-axis "ave", 0–6] STS task - SemEval 2012 37
  • 38. Standard deviation (MSR data) [chart; y-axis -2–7] STS task - SemEval 2012 38
  • 39. Standard deviation (MSR data) [chart; y-axis "sdv", 0–2.5] STS task - SemEval 2012 39
  • 40. Average SMTeuroparl STS task - SemEval 2012 40