SlideShare ist ein Scribd-Unternehmen logo
1 von 1
Downloaden Sie, um offline zu lesen
Tracing armed conicts with diachronic word embedding models
Andrey Kutuzov, Erik Velldal and Lilja Øvrelid
Language Technology Group, Department of Informatics, University of Oslo {andreku | erikve | liljao } @ifi.uio.no
General overview
We employ diachronic word embedding models in the task of predicting the events of armed conflicts
escalating or calming down in various geographical locations, spanning over 16 years (1994–2010).
The task is similar to that of detecting semantic shifts [Kulkarni et al., 2015], [Hamilton et al., 2016a],
but focused more on subtle changes of perspective instead of full-scale meaning changes.
Classification task
We monitor changes in the local semantic neighborhoods of country names, applying it to the
downstream task of predicting changes in the state of conflict:
1. Nothing has changed in the country conflict state year-to-year (class ‘stable’);
2. Armed conflicts have escalated in the country year-to-year (class ‘war’);
3. Armed conflicts have calmed down in the country year-to-year (class ‘peace’).
Example: given two distributional models trained on news texts from 2002 and 2003, predict in what
direction the conflicts state moved in these years in Senegal (the correct answer is ‘it escalated’)?
UCDP data Data representation
Set of data points equal to the differences (δ) between the location’s
conflict state in the current year and in the previous year, 832 points in
total (52 locations × 16 years).
As an example, for Congo, the transition from 2001 to 2002 was
accompanied by the ending of armed conflicts. Thus, for the data point
‘congo_2002’, δ = 0 − 1 = −1. Then, there were no changes (each
new δ is 0) until 2006, when armed conflicts resumed with the intensity
of 1. Thus, for the ‘congo_2006’ data point, δ = 1 − 0 = 1.
Then, δ values were transformed to classes:
class =



war if δ ≥ 0.5
peace if δ ≤ −0.5
stable otherwise
Gold standard
As a ground truth, we use the UCDP/PRIO Armed Conflict Dataset (http://ucdp.uu.se/) maintained by the Uppsala Conflict
Data Program and the Peace Research Institute Oslo: a manually annotated geographical and temporal dataset with information on
armed conflicts, in the time period from 1946 to the present [Gleditsch et al., 2002]. The resulting test set mentions 52 unique
locations and 673 unique armed conflicts. The task was to predict the conflict state difference for these locations year-to-year.
Diachronic models
To train distributional word embedding models, we used the Continuous Bag-of-Words algorithm proposed in [Mikolov et al., 2013].
Models were trained on Gigaword [Parker et al., 2011] texts in 3 time-representation modes:
1. yearly models, each trained from scratch on the corpora containing news texts from a particular year only (separate);
2. yearly models trained from scratch on the texts from the particular year and all the previous years (cumulative);
3. incrementally trained models (incremental).
Methods
To actually detect semantic shifts for the word wq, one can either:
1. align two models (Mcur and Mprev) using the orthogonal Procrustes transformation, and then measure cosine similarity
between the wq vectors in both models, as proposed in [Hamilton et al., 2016b];
2. alternatively, define a set of anchor words related to the semantic categories we are interested in, and then measure the ‘drift’ of
wq towards or away from these ‘anchors’ in Mcur compared against Mprev.
The anchor words method can provide information about the exact direction of the shift. It can be quantified in 2 ways:
1. for each anchor, calculate its cosine similarity against wq in Mcur and Mprev (Sim);
2. as above, but use the position of each anchor in the models’ vocabulary sorted by similarity to wq (Rank).
These methods produce two vectors Rprev and Rcur, corresponding to the models Mcur and Mprev, with the size is equal to
the number of the anchor words. Then we can either:
1. calculate the cosine distance between these ‘second-order vectors’ (SimDist or RankDist);
2. element-wise subtract Rprev from Rcur to get the idea of whether wq drifted towards or away from the anchors (SimSub or
RankSub).
These features are then fed to SVM classifier.
Anchor words
(adopted from the search strings UCDP uses to filter news texts):
kill, die, injury, dead, death, wound, massacre.
Expanded list adds the nearest neighbors of these words in the CBOW
model trained on the full Gigaword (26 words total).
Overall results
Approach Separate Cumulative Incremental
Procrustes 0.15 0.24 0.29
Basic word list
SimDist 0.27 0.17 0.25
SimSub 0.31 0.26 0.26
RankDist 0.28 0.19 0.23
RankSub 0.26 0.22 0.21
Expanded word list
SimDist 0.25 0.18 0.23
SimSub 0.35 0.31 0.29
RankDist 0.24 0.20 0.28
RankSub 0.36 0.30 0.32
Macro-F1 measure of ternary classification
The test set is available
http://ltr.uio.no/~andreku/armedconflicts/
ucdp_conflicts_1994_2010_testset.tsv
Best model detailed results
Class Precision Recall F1
Peace 0.13 (0.06) 0.29 (0.06) 0.18 (0.06)
Stable 0.80 (0.79) 0.58 (0.82) 0.67 (0.80)
War 0.17 (0.12) 0.33 (0.08) 0.22 (0.10)
Detailed performance of the best model (results of weighted random
guess baseline in parenthesis).
The ‘shifting’ classes War and Peace constitute 10% and 11% of the
data points respectively.
Conclusion
Tracing actual real-world events by detecting ‘cultural’ semantic shifts
in distributional semantic models is a difficult task.
The approaches proposed in the previous work for large-scale shifts
observed over decades or even centuries are not very successful in this
more fine-grained task.
[Hamilton et al., 2016b] report almost perfect accuracy for the Procrustes transformation when
detecting the direction of semantic change. However, our time periods are much more granular
and we attempt to detect subtle associative drifts (often pendulum-like) rather than full-scale
shifts of the meaning.
The proposed ‘anchor words’ method outperforms previous work
approaches by large margin, but it still achieves a macro F1 measure
of only 0.36 on the task of ternary classification (‘stable’, ‘escalating’,
‘calming down’).
See also our forthcoming EMNLP’17 paper on tracing semantic
relations in diachronic models.
Events and Stories in the News (EventStory) 4 August 2017

Weitere ähnliche Inhalte

Ähnlich wie Andrey Kutuzov - 2017 - Tracing armed conflicts with diachronic word embedding models

Spatial GAS Models for Systemic Risk Measurement - Blasques F., Koopman S.J.,...
Spatial GAS Models for Systemic Risk Measurement - Blasques F., Koopman S.J.,...Spatial GAS Models for Systemic Risk Measurement - Blasques F., Koopman S.J.,...
Spatial GAS Models for Systemic Risk Measurement - Blasques F., Koopman S.J.,...SYRTO Project
 
Essentials of monte carlo simulation
Essentials of monte carlo simulationEssentials of monte carlo simulation
Essentials of monte carlo simulationSpringer
 
Session 2 b inter temporal equivalence scales based on stochastic indifferenc...
Session 2 b inter temporal equivalence scales based on stochastic indifferenc...Session 2 b inter temporal equivalence scales based on stochastic indifferenc...
Session 2 b inter temporal equivalence scales based on stochastic indifferenc...IARIW 2014
 
Deflection characteristics of cross ply laminates under different loading con...
Deflection characteristics of cross ply laminates under different loading con...Deflection characteristics of cross ply laminates under different loading con...
Deflection characteristics of cross ply laminates under different loading con...IAEME Publication
 
Cointegration and Long-Horizon Forecasting
Cointegration and Long-Horizon ForecastingCointegration and Long-Horizon Forecasting
Cointegration and Long-Horizon Forecastingمحمد إسماعيل
 
Physique et Chimie de la Terre Physics and Chemistry of the .docx
Physique et Chimie de la Terre  Physics and Chemistry of the .docxPhysique et Chimie de la Terre  Physics and Chemistry of the .docx
Physique et Chimie de la Terre Physics and Chemistry of the .docxLacieKlineeb
 
Transition for regular to mach reflection hysteresis
Transition for regular to mach reflection hysteresisTransition for regular to mach reflection hysteresis
Transition for regular to mach reflection hysteresiseSAT Publishing House
 
Jgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging componentJgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging componentNiccolò Tubini
 
Application of stochastic modeling in geomagnetism
Application of stochastic modeling in geomagnetismApplication of stochastic modeling in geomagnetism
Application of stochastic modeling in geomagnetismeSAT Journals
 
Application of stochastic modeling in geomagnetism
Application of stochastic modeling in geomagnetismApplication of stochastic modeling in geomagnetism
Application of stochastic modeling in geomagnetismeSAT Publishing House
 
Normality_assumption_for_the_log_re.pdf
Normality_assumption_for_the_log_re.pdfNormality_assumption_for_the_log_re.pdf
Normality_assumption_for_the_log_re.pdfVasudha Singh
 
Analysis of Variance1
Analysis of Variance1Analysis of Variance1
Analysis of Variance1Mayar Zo
 
Partial stabilization based guidance
Partial stabilization based guidancePartial stabilization based guidance
Partial stabilization based guidanceISA Interchange
 
The general theory of einstein field equations a gesellschaft gemeinschaft model
The general theory of einstein field equations a gesellschaft gemeinschaft modelThe general theory of einstein field equations a gesellschaft gemeinschaft model
The general theory of einstein field equations a gesellschaft gemeinschaft modelAlexander Decker
 
EXPERIMENTAL VERIFICATION OF MULTIPLE CORRELATIONS BETWEEN THE SOIL YOUNG MOD...
EXPERIMENTAL VERIFICATION OF MULTIPLE CORRELATIONS BETWEEN THE SOIL YOUNG MOD...EXPERIMENTAL VERIFICATION OF MULTIPLE CORRELATIONS BETWEEN THE SOIL YOUNG MOD...
EXPERIMENTAL VERIFICATION OF MULTIPLE CORRELATIONS BETWEEN THE SOIL YOUNG MOD...IAEME Publication
 

Ähnlich wie Andrey Kutuzov - 2017 - Tracing armed conflicts with diachronic word embedding models (20)

Climate Extremes Workshop - Assessing models for estimation and methods for u...
Climate Extremes Workshop - Assessing models for estimation and methods for u...Climate Extremes Workshop - Assessing models for estimation and methods for u...
Climate Extremes Workshop - Assessing models for estimation and methods for u...
 
Spatial GAS Models for Systemic Risk Measurement - Blasques F., Koopman S.J.,...
Spatial GAS Models for Systemic Risk Measurement - Blasques F., Koopman S.J.,...Spatial GAS Models for Systemic Risk Measurement - Blasques F., Koopman S.J.,...
Spatial GAS Models for Systemic Risk Measurement - Blasques F., Koopman S.J.,...
 
Essentials of monte carlo simulation
Essentials of monte carlo simulationEssentials of monte carlo simulation
Essentials of monte carlo simulation
 
Session 2 b inter temporal equivalence scales based on stochastic indifferenc...
Session 2 b inter temporal equivalence scales based on stochastic indifferenc...Session 2 b inter temporal equivalence scales based on stochastic indifferenc...
Session 2 b inter temporal equivalence scales based on stochastic indifferenc...
 
Deflection characteristics of cross ply laminates under different loading con...
Deflection characteristics of cross ply laminates under different loading con...Deflection characteristics of cross ply laminates under different loading con...
Deflection characteristics of cross ply laminates under different loading con...
 
Cointegration and Long-Horizon Forecasting
Cointegration and Long-Horizon ForecastingCointegration and Long-Horizon Forecasting
Cointegration and Long-Horizon Forecasting
 
Forecasting_GAMLSS
Forecasting_GAMLSSForecasting_GAMLSS
Forecasting_GAMLSS
 
Physique et Chimie de la Terre Physics and Chemistry of the .docx
Physique et Chimie de la Terre  Physics and Chemistry of the .docxPhysique et Chimie de la Terre  Physics and Chemistry of the .docx
Physique et Chimie de la Terre Physics and Chemistry of the .docx
 
Transition for regular to mach reflection hysteresis
Transition for regular to mach reflection hysteresisTransition for regular to mach reflection hysteresis
Transition for regular to mach reflection hysteresis
 
Jgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging componentJgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging component
 
Application of stochastic modeling in geomagnetism
Application of stochastic modeling in geomagnetismApplication of stochastic modeling in geomagnetism
Application of stochastic modeling in geomagnetism
 
Application of stochastic modeling in geomagnetism
Application of stochastic modeling in geomagnetismApplication of stochastic modeling in geomagnetism
Application of stochastic modeling in geomagnetism
 
Normality_assumption_for_the_log_re.pdf
Normality_assumption_for_the_log_re.pdfNormality_assumption_for_the_log_re.pdf
Normality_assumption_for_the_log_re.pdf
 
1607.01152.pdf
1607.01152.pdf1607.01152.pdf
1607.01152.pdf
 
Analysis of Variance1
Analysis of Variance1Analysis of Variance1
Analysis of Variance1
 
Fractal Analyzer
Fractal AnalyzerFractal Analyzer
Fractal Analyzer
 
Partial stabilization based guidance
Partial stabilization based guidancePartial stabilization based guidance
Partial stabilization based guidance
 
The general theory of einstein field equations a gesellschaft gemeinschaft model
The general theory of einstein field equations a gesellschaft gemeinschaft modelThe general theory of einstein field equations a gesellschaft gemeinschaft model
The general theory of einstein field equations a gesellschaft gemeinschaft model
 
Steed Variance
Steed VarianceSteed Variance
Steed Variance
 
EXPERIMENTAL VERIFICATION OF MULTIPLE CORRELATIONS BETWEEN THE SOIL YOUNG MOD...
EXPERIMENTAL VERIFICATION OF MULTIPLE CORRELATIONS BETWEEN THE SOIL YOUNG MOD...EXPERIMENTAL VERIFICATION OF MULTIPLE CORRELATIONS BETWEEN THE SOIL YOUNG MOD...
EXPERIMENTAL VERIFICATION OF MULTIPLE CORRELATIONS BETWEEN THE SOIL YOUNG MOD...
 

Mehr von Association for Computational Linguistics

Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Association for Computational Linguistics
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Association for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Association for Computational Linguistics
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Association for Computational Linguistics
 

Mehr von Association for Computational Linguistics (20)

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 

Kürzlich hochgeladen

AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 

Kürzlich hochgeladen (20)

AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 

Andrey Kutuzov - 2017 - Tracing armed conflicts with diachronic word embedding models

  • 1. Tracing armed conicts with diachronic word embedding models Andrey Kutuzov, Erik Velldal and Lilja Øvrelid Language Technology Group, Department of Informatics, University of Oslo {andreku | erikve | liljao } @ifi.uio.no General overview We employ diachronic word embedding models in the task of predicting the events of armed conflicts escalating or calming down in various geographical locations, spanning over 16 years (1994–2010). The task is similar to that of detecting semantic shifts [Kulkarni et al., 2015], [Hamilton et al., 2016a], but focused more on subtle changes of perspective instead of full-scale meaning changes. Classification task We monitor changes in the local semantic neighborhoods of country names, applying it to the downstream task of predicting changes in the state of conflict: 1. Nothing has changed in the country conflict state year-to-year (class ‘stable’); 2. Armed conflicts have escalated in the country year-to-year (class ‘war’); 3. Armed conflicts have calmed down in the country year-to-year (class ‘peace’). Example: given two distributional models trained on news texts from 2002 and 2003, predict in what direction the conflicts state moved in these years in Senegal (the correct answer is ‘it escalated’)? UCDP data Data representation Set of data points equal to the differences (δ) between the location’s conflict state in the current year and in the previous year, 832 points in total (52 locations × 16 years). As an example, for Congo, the transition from 2001 to 2002 was accompanied by the ending of armed conflicts. Thus, for the data point ‘congo_2002’, δ = 0 − 1 = −1. Then, there were no changes (each new δ is 0) until 2006, when armed conflicts resumed with the intensity of 1. Thus, for the ‘congo_2006’ data point, δ = 1 − 0 = 1. Then, δ values were transformed to classes: class =    war if δ ≥ 0.5 peace if δ ≤ −0.5 stable otherwise Gold standard As a ground truth, we use the UCDP/PRIO Armed Conflict Dataset (http://ucdp.uu.se/) maintained by the Uppsala Conflict Data Program and the Peace Research Institute Oslo: a manually annotated geographical and temporal dataset with information on armed conflicts, in the time period from 1946 to the present [Gleditsch et al., 2002]. The resulting test set mentions 52 unique locations and 673 unique armed conflicts. The task was to predict the conflict state difference for these locations year-to-year. Diachronic models To train distributional word embedding models, we used the Continuous Bag-of-Words algorithm proposed in [Mikolov et al., 2013]. Models were trained on Gigaword [Parker et al., 2011] texts in 3 time-representation modes: 1. yearly models, each trained from scratch on the corpora containing news texts from a particular year only (separate); 2. yearly models trained from scratch on the texts from the particular year and all the previous years (cumulative); 3. incrementally trained models (incremental). Methods To actually detect semantic shifts for the word wq, one can either: 1. align two models (Mcur and Mprev) using the orthogonal Procrustes transformation, and then measure cosine similarity between the wq vectors in both models, as proposed in [Hamilton et al., 2016b]; 2. alternatively, define a set of anchor words related to the semantic categories we are interested in, and then measure the ‘drift’ of wq towards or away from these ‘anchors’ in Mcur compared against Mprev. The anchor words method can provide information about the exact direction of the shift. It can be quantified in 2 ways: 1. for each anchor, calculate its cosine similarity against wq in Mcur and Mprev (Sim); 2. as above, but use the position of each anchor in the models’ vocabulary sorted by similarity to wq (Rank). These methods produce two vectors Rprev and Rcur, corresponding to the models Mcur and Mprev, with the size is equal to the number of the anchor words. Then we can either: 1. calculate the cosine distance between these ‘second-order vectors’ (SimDist or RankDist); 2. element-wise subtract Rprev from Rcur to get the idea of whether wq drifted towards or away from the anchors (SimSub or RankSub). These features are then fed to SVM classifier. Anchor words (adopted from the search strings UCDP uses to filter news texts): kill, die, injury, dead, death, wound, massacre. Expanded list adds the nearest neighbors of these words in the CBOW model trained on the full Gigaword (26 words total). Overall results Approach Separate Cumulative Incremental Procrustes 0.15 0.24 0.29 Basic word list SimDist 0.27 0.17 0.25 SimSub 0.31 0.26 0.26 RankDist 0.28 0.19 0.23 RankSub 0.26 0.22 0.21 Expanded word list SimDist 0.25 0.18 0.23 SimSub 0.35 0.31 0.29 RankDist 0.24 0.20 0.28 RankSub 0.36 0.30 0.32 Macro-F1 measure of ternary classification The test set is available http://ltr.uio.no/~andreku/armedconflicts/ ucdp_conflicts_1994_2010_testset.tsv Best model detailed results Class Precision Recall F1 Peace 0.13 (0.06) 0.29 (0.06) 0.18 (0.06) Stable 0.80 (0.79) 0.58 (0.82) 0.67 (0.80) War 0.17 (0.12) 0.33 (0.08) 0.22 (0.10) Detailed performance of the best model (results of weighted random guess baseline in parenthesis). The ‘shifting’ classes War and Peace constitute 10% and 11% of the data points respectively. Conclusion Tracing actual real-world events by detecting ‘cultural’ semantic shifts in distributional semantic models is a difficult task. The approaches proposed in the previous work for large-scale shifts observed over decades or even centuries are not very successful in this more fine-grained task. [Hamilton et al., 2016b] report almost perfect accuracy for the Procrustes transformation when detecting the direction of semantic change. However, our time periods are much more granular and we attempt to detect subtle associative drifts (often pendulum-like) rather than full-scale shifts of the meaning. The proposed ‘anchor words’ method outperforms previous work approaches by large margin, but it still achieves a macro F1 measure of only 0.36 on the task of ternary classification (‘stable’, ‘escalating’, ‘calming down’). See also our forthcoming EMNLP’17 paper on tracing semantic relations in diachronic models. Events and Stories in the News (EventStory) 4 August 2017