Parsing Italian texts together is better
than parsing them alone!
Oronzo Antonelli, Fabio Tamburini
University of Bologna
CLiC-it 2018, 10 December 2018
Two Goals
Test the effectiveness of dependency parsers based on deep neural
networks on Italian.
We collected nine different state-of-the-art parsers.
All parser hyper-parameters were set following the developers'
recommendations to obtain the best performance.
Propose ensemble systems able to further improve the neural parsers'
performance on Italian texts.
We focus on ensemble systems that can be built from pre-trained
parsing models.
Antonelli and Tamburini Parsing Italian texts together is better than parsing them alone! 10/12/2018 1 / 12
Parsers
All parsers considered in this study are based on two popular
approaches:
Transition-based: train a classifier to predict the next transition given
the previous ones.
Graph-based: learn the score of each arc and then find the
dependency tree using a maximum spanning tree (MST) algorithm.
Parser                          Approach    Architecture                       Optimizer
Chen & Manning (2014)           T-based     MLP                                AdaGrad
Ballesteros et al. (2015)       T-based     Stack LSTM                         SGD
Kiperwasser & Goldberg (2016)   T/G-based   Deep BiLSTM with MLP               Adam
Andor et al. (2016)             T-based     MLP                                Momentum
Cheng et al. (2016)             G-based     BiGRU attention with MLP           AdaGrad
Dozat & Manning (2017)          G-based     Deep Biaffine attention with MLP   Adam
Shi et al. (2017)               T/G-based   Deep Biaffine attention with MLP   Adam
Nguyen et al. (2017)            G-based     Deep BiLSTM with MLP               Adam
Setups and Evaluation Metrics
Two Italian corpora from the Universal Dependencies (UD) project
have been used to train/evaluate the models:
UD Italian 2.1, composed of generic-domain texts, contains 13,884
sentences with a train/dev/test split of 12,838/564/482;
UD Italian PoSTWITA 2.2, composed of social media texts, contains
6,713 sentences with a train/dev/test split of 5,368/671/674.
The train/validation/test cycle was executed 5 times for each of the
9 parsers, by considering three different setups:
Setup0: uses only the UD Italian 2.1 dataset (generic);
Setup1: uses only the UD Italian PoSTWITA 2.2 dataset (domain);
Setup2: uses the UD Italian 2.1 dataset joined with the UD Italian
PoSTWITA 2.2 dataset, keeping only the PoSTWITA test set (mixed).
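The three setups can be summarised by their split sizes. The sketch below is illustrative only: the function name `setup_sizes` is hypothetical, and it assumes that Setup2 concatenates the training and validation portions of the two treebanks while testing on PoSTWITA alone. The resulting 1235 validation sentences match the Setup2 validation denominator reported later in the deck.

```python
# Split sizes (train, dev, test) as reported on the slide.
UD_ITALIAN = (12838, 564, 482)
POSTWITA = (5368, 671, 674)


def setup_sizes(name):
    """Return (train, dev, test) sentence counts for a setup.

    Assumption: in Setup2 the train and dev portions of the two treebanks
    are concatenated, and only the PoSTWITA test set is kept.
    """
    if name == "Setup0":
        return UD_ITALIAN
    if name == "Setup1":
        return POSTWITA
    if name == "Setup2":
        train = UD_ITALIAN[0] + POSTWITA[0]
        dev = UD_ITALIAN[1] + POSTWITA[1]
        return (train, dev, POSTWITA[2])
    raise ValueError(f"unknown setup: {name}")


for setup in ("Setup0", "Setup1", "Setup2"):
    print(setup, setup_sizes(setup))
```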
Two standard accuracy metrics were selected to evaluate the models
with respect to the gold standard:
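UAS (Unlabeled Attachment Score): percentage of words whose predicted head matches the gold head.
LAS (Labeled Attachment Score): percentage of words whose predicted head and dependency relation both match the gold standard.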
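As a minimal sketch of the two metrics, assuming each word is represented by a (head, relation) pair and that every word is scored (official evaluation scripts may differ in details), the function name `attachment_scores` and the toy sentence below are hypothetical:

```python
def attachment_scores(gold, pred):
    """Compute (UAS, LAS) as percentages.

    gold, pred: lists of (head, relation) pairs, one per word, same order.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same words")
    if not gold:
        raise ValueError("empty input")
    # UAS: only the head index has to match.
    same_head = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both head index and relation label have to match.
    same_both = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * same_head / n, 100.0 * same_both / n


# Toy 4-word sentence: word 2 has the right head but the wrong label,
# word 4 has the wrong head.
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "obj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.1f}% LAS={las:.1f}%")  # UAS=75.0% LAS=50.0%
```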
Results on UD Italian texts (Setup0), µ ± σ.
                            Validation (UD Italian)       Test (UD Italian)
Parser                      UAS           LAS             UAS           LAS
C&M (2014)                  88.20±0.18%   85.46±0.14%     89.33±0.17%   86.85±0.22%
Ballesteros et al. (2015)   91.15±0.11%   88.55±0.23%     91.57±0.38%   89.15±0.33%
K&G (2016) – T              91.17±0.29%   88.42±0.24%     91.21±0.33%   88.72±0.24%
K&G (2016) – G              91.85±0.27%   89.23±0.31%     92.04±0.18%   89.65±0.10%
Andor et al. (2016)         85.52±0.34%   77.67±0.30%     87.70±0.31%   79.48±0.24%
Cheng et al. (2016)         92.42±0.00%   89.60±0.00%     92.82±0.00%   90.26±0.00%
D&M (2017)                  93.37±0.27%   91.37±0.24%     93.72±0.14%   91.84±0.18%
Shi et al. (2017)           89.67±0.24%   85.05±0.24%     89.89±0.29%   84.55±0.30%
Nguyen et al. (2017)        90.37±0.12%   87.19±0.21%     90.67±0.15%   87.58±0.11%
The best results in Italian dependency parsing were obtained at EVALITA
2014, with UAS 93.55% and LAS 88.76% on a subset of UD Italian.
Results on UD PoSTWITA texts (Setup1), µ ± σ.
                            Validation (UD PoSTWITA)      Test (UD PoSTWITA)
Parser                      UAS           LAS             UAS           LAS
C&M (2014)                  81.03±0.17%   75.24±0.30%     81.50±0.28%   76.07±0.17%
Ballesteros et al. (2015)   83.44±0.20%   77.70±0.25%     84.06±0.38%   78.64±0.44%
K&G (2016) – T              77.38±0.14%   68.81±0.25%     77.41±0.43%   69.13±0.43%
K&G (2016) – G              78.81±0.23%   70.14±0.33%     78.78±0.44%   70.52±0.51%
Andor et al. (2016)         77.74±0.25%   66.63±0.16%     77.78±0.33%   67.21±0.30%
Cheng et al. (2016)         84.78±0.00%   78.51±0.00%     86.12±0.00%   79.89±0.00%
D&M (2017)                  85.01±0.16%   78.80±0.09%     86.26±0.16%   80.40±0.19%
Shi et al. (2017)           80.52±0.18%   73.71±0.14%     81.11±0.29%   74.53±0.26%
Nguyen et al. (2017)        82.02±0.11%   75.20±0.24%     82.74±0.39%   76.22±0.41%
Results on UD Italian + PoSTWITA texts (Setup2), µ ± σ.
                            Validation (UD Italian+PoSTWITA)   Test (UD PoSTWITA)
Parser                      UAS           LAS                  UAS           LAS
C&M (2014)                  85.52±0.13%   81.51±0.05%          82.62±0.24%   77.45±0.23%
Ballesteros et al. (2015)   87.85±0.13%   83.80±0.12%          85.15±0.29%   80.12±0.27%
K&G (2016) – T              83.89±0.23%   77.77±0.26%          80.47±0.36%   72.92±0.46%
K&G (2016) – G              84.70±0.14%   78.41±0.14%          81.41±0.37%   73.49±0.19%
Andor et al. (2016)         82.95±0.33%   73.46±0.37%          79.81±0.27%   69.19±0.19%
Cheng et al. (2016)         89.16±0.00%   84.56±0.00%          86.85±0.00%   80.93±0.00%
D&M (2017)                  89.72±0.10%   85.85±0.13%          87.22±0.24%   81.65±0.21%
Shi et al. (2017)           85.85±0.36%   80.00±0.39%          83.12±0.50%   76.38±0.38%
Nguyen et al. (2017)        86.81±0.04%   82.13±0.09%          84.09±0.07%   78.02±0.11%
Ensemble systems: Theoretical Gain
Let us consider two oracles (Choi et al. 2015):
Micro chooses, for each word, the best dependency relation among the m
dependency relations proposed by the parsers in the ensemble;
Macro chooses, for each sentence, the best tree among the m dependency
trees produced by the parsers in the ensemble.
Results for an ensemble system using the Micro and Macro oracles, considering all parsers.
                  Validation          Test
Setup    Oracle   UAS      LAS        UAS      LAS
Setup0   Micro    98.30%   97.82%     98.08%   97.72%
Setup0   Macro    96.62%   95.10%     96.31%   94.82%
Setup2   Micro    97.08%   96.02%     96.32%   94.73%
Setup2   Macro    94.62%   91.29%     93.27%   88.50%
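One way to read the two oracles is sketched below. This is an interpretation, not the exact procedure of Choi et al. (2015): it assumes the Macro oracle picks, per sentence, the parser output with the most correct words, and the function name `oracle_scores` and the toy data are hypothetical.

```python
def oracle_scores(gold_sents, parser_outputs, labeled=True):
    """Return (micro, macro) oracle accuracy in percent.

    gold_sents: list of sentences, each a list of (head, relation) pairs.
    parser_outputs: one list of sentences per parser, aligned with gold_sents.
    labeled: True scores head and relation (LAS), False scores the head only (UAS).
    """
    def ok(g, p):
        return g == p if labeled else g[0] == p[0]

    total = micro = macro = 0
    for i, gold in enumerate(gold_sents):
        preds = [out[i] for out in parser_outputs]
        total += len(gold)
        # Micro: a word counts as correct if at least one parser got it right.
        micro += sum(any(ok(g, p[j]) for p in preds) for j, g in enumerate(gold))
        # Macro: keep the single parser tree with the most correct words.
        macro += max(sum(ok(g, p[j]) for j, g in enumerate(gold)) for p in preds)
    if total == 0:
        raise ValueError("no words to score")
    return 100.0 * micro / total, 100.0 * macro / total


gold = [[(2, "det"), (3, "nsubj"), (0, "root"), (3, "obj")]]
parser_a = [[(2, "det"), (3, "nsubj"), (0, "root"), (2, "obj")]]  # word 4 wrong
parser_b = [[(3, "det"), (3, "nsubj"), (0, "root"), (3, "obj")]]  # word 1 wrong
print(oracle_scores(gold, [parser_a, parser_b]))  # (100.0, 75.0)
```

Under this reading the Micro score can never be lower than the Macro score, which agrees with the table.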
Tested Ensemble Techniques
Voting. Each parser contributes by assigning a vote to every
dependency edge.
Majority: for each word, the edge with the highest number of votes is
taken; in case of a draw, the choice of the first parser is taken.
Switching: with majority voting the resulting dependency tree could be
ill-formed; in this case the tree is replaced with the output of the first parser.
Reparsing. An MST algorithm is used to reparse a graph built using
each word in the sentence as a node, the edges from all the parses as
edges, and the number of votes as edge weights.
cle: Chu-Liu/Edmonds algorithm.
eisner: Eisner algorithm.
Distilling. Train a distillation parser using a loss function with a cost
that incorporates ensemble uncertainty estimates for each possible
attachment.
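The two voting variants can be sketched as follows. This is a simplified illustration, not the authors' implementation: it votes on unlabeled heads only, treats a tree as well-formed when it has exactly one root and no cycles, and resolves a draw in favour of the earliest parser whose choice is among the top-voted heads. All function names are hypothetical. Reparsing and distilling are not covered here.

```python
from collections import Counter


def majority_vote(parses):
    """Pick, for each word, the head with the most votes.

    parses: one head list per parser; parses[k][i] is the head of word i+1
    (0 is the artificial root). Assumption: a draw goes to the earliest
    parser whose choice is among the top-voted heads.
    """
    heads = []
    for i in range(len(parses[0])):
        votes = Counter(p[i] for p in parses)
        best = max(votes.values())
        heads.append(next(p[i] for p in parses if votes[p[i]] == best))
    return heads


def is_well_formed(heads):
    """True if heads encode a tree: exactly one root and no cycles."""
    if heads.count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False  # walked into a cycle
            seen.add(node)
            node = heads[node - 1]
    return True


def switching(parses):
    """Majority vote, falling back to the first parser if the result is not a tree."""
    voted = majority_vote(parses)
    return voted if is_well_formed(voted) else list(parses[0])


# Three well-formed parses of a 3-word sentence whose majority vote is not a tree.
p1, p2, p3 = [2, 0, 2], [0, 3, 1], [3, 3, 0]
print(majority_vote([p1, p2, p3]))  # [2, 3, 2]: no root, words 2 and 3 form a cycle
print(switching([p1, p2, p3]))      # [2, 0, 2]: output of the first parser
```

The example shows why switching is needed: each input parse is a valid tree, yet the word-by-word vote is not.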
Setup
The best models on the validation set were taken from Setup0 (UD Italian)
and Setup2 (UD Italian + PoSTWITA).
For the voting approach, the following parser combinations were
used:
The best three (DM17+CH16+BA15);
The worst three (AN16+CM14+SH17);
The best plus those with the lowest agreement (DM17+CM14+SH17);
The worst plus all the others (AN16+ALL);
The best plus all the others (DM17+ALL).
For the reparsing approach, the following parser combinations were
used:
The best three (DM17+CH16+BA15);
All parsers (ALL).
For the distilling approach, we considered the combination of all
parsers together (ALL).
Comparing the Ensemble Results
Differences in performance on the test set with respect to
the best single parser (DM17).
Setup0
Ensemble strategy              UAS               LAS
Voting: majority (DM17+ALL)    93.94% (+0.19%)   92.41% (+0.38%)
Voting: switching (DM17+ALL)   93.91% (+0.16%)   92.37% (+0.34%)
Reparsing: cle (ALL)           94.00% (+0.25%)   92.48% (+0.45%)
Reparsing: eisner (ALL)        93.95% (+0.20%)   92.35% (+0.32%)
Distilling (ALL)               92.50% (-1.25%)   89.93% (-2.10%)
Setup2
Ensemble strategy              UAS               LAS
Voting: majority (DM17+ALL)    88.51% (+0.92%)   84.42% (+2.47%)
Voting: switching (DM17+ALL)   88.50% (+0.91%)   84.20% (+2.25%)
Reparsing: cle (ALL)           88.36% (+0.77%)   84.25% (+2.30%)
Reparsing: eisner (ALL)        88.31% (+0.72%)   84.08% (+2.13%)
Distilling (ALL)               86.73% (-0.86%)   81.39% (-0.56%)
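The DM17 baseline behind the deltas can be recovered by subtracting each delta from its score. The check below is a sketch: the helper name `implied_baselines` is hypothetical and the figures are copied from the two tables above. Every row in a column implies the same baseline: 93.75% UAS / 92.03% LAS for Setup0 and 87.59% UAS / 81.95% LAS for Setup2. These differ slightly from the DM17 mean test scores in the per-setup tables, which would be consistent with the deltas being measured against the single DM17 model selected on the validation set; that reading is an inference, not something the slides state.

```python
# (score, delta) pairs copied from the Setup0 and Setup2 tables.
ROWS = {
    "Setup0": {
        "UAS": [(93.94, 0.19), (93.91, 0.16), (94.00, 0.25), (93.95, 0.20), (92.50, -1.25)],
        "LAS": [(92.41, 0.38), (92.37, 0.34), (92.48, 0.45), (92.35, 0.32), (89.93, -2.10)],
    },
    "Setup2": {
        "UAS": [(88.51, 0.92), (88.50, 0.91), (88.36, 0.77), (88.31, 0.72), (86.73, -0.86)],
        "LAS": [(84.42, 2.47), (84.20, 2.25), (84.25, 2.30), (84.08, 2.13), (81.39, -0.56)],
    },
}


def implied_baselines(pairs):
    """Return the set of baselines implied by score - delta, rounded to 2 decimals."""
    return {round(score - delta, 2) for score, delta in pairs}


for setup, metrics in ROWS.items():
    for metric, pairs in metrics.items():
        print(setup, metric, sorted(implied_baselines(pairs)))
```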
Voting-Majority Side Effects
Even though the voting-majority strategy exhibits good results, we have to
consider that it may produce some ill-formed dependency trees.
The numbers of ill-formed trees obtained by using the majority
strategy for both setups are reported in the following table:
                 Setup0            Setup2
Voters           Valid    Test     Valid     Test     Average
DM17+CH16+BA15   9/564    7/482    31/1235   31/674   2.5%
AN16+CM14+SH17   45/564   25/482   88/1235   77/674   7.9%
DM17+CM14+SH17   6/564    6/482    19/1235   23/674   1.8%
AN16+ALL         18/564   17/482   73/1235   63/674   5.5%
DM17+ALL         17/564   11/482   75/1235   57/674   5.0%
For tasks that do not involve a subsequent manual correction,
the majority strategy is not the recommended choice.
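The Average column appears to be the unweighted mean of the four per-split ill-formed rates. That reading is an assumption, but it reproduces every value in the table, as the short check below shows (the helper name `average_rate` is hypothetical).

```python
# (ill-formed trees, sentences) per split: Setup0 valid/test, Setup2 valid/test.
ILL_FORMED = {
    "DM17+CH16+BA15": [(9, 564), (7, 482), (31, 1235), (31, 674)],
    "AN16+CM14+SH17": [(45, 564), (25, 482), (88, 1235), (77, 674)],
    "DM17+CM14+SH17": [(6, 564), (6, 482), (19, 1235), (23, 674)],
    "AN16+ALL": [(18, 564), (17, 482), (73, 1235), (63, 674)],
    "DM17+ALL": [(17, 564), (11, 482), (75, 1235), (57, 674)],
}


def average_rate(counts):
    """Unweighted mean of the per-split ill-formed rates, in percent."""
    return 100.0 * sum(bad / total for bad, total in counts) / len(counts)


for voters, counts in ILL_FORMED.items():
    print(f"{voters}: {average_rate(counts):.1f}%")
```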
Conclusions
The experiments show that recent neural parsers achieve results that
define the new state of the art for Italian (both on UD Italian and
UD PoSTWITA).
The proposed ensemble models increased single-parser performance,
especially when using in-domain data (PoSTWITA), with relevant
improvements (∼1% in UAS and ∼2.5% in LAS).
The performance of the ensemble models increases as the number of
parsers grows.
Thank you!
