SlideShare ist ein Scribd-Unternehmen logo
1 von 1
Downloaden Sie, um offline zu lesen
Controlled and Balanced Dataset for 

Japanese Lexical Simplification
Tomonori Kodaira, Tomoyuki Kajiwara, and Mamoru Komachi
Tokyo Metropolitan University, Japan. https://github.com/KodairaTomonori/EvaluationDataset
Example of dataset
Lexical Simplification
Substitutes a complex word or phrase in
sentence with simpler synonym
1.Extracting Sentences
2,100 sentences from Balanced Corpus of Contemporary Written
Japanese (Maekawa et al., 2010):
Including only one complex word.
10 contexts of occurrence were collected for each complex words.
Difficult words:
“High Level” words in the Lexicon for Japanese Language Education
(Sunakawa et al., 2012).
Content words (7 parts of speech):
nouns, verbs, adjectives, adverbs, adjectival nouns, sahen nouns, and
sahen verbs.
B&M
(2012)
Specia et al.
(2012)
K&Y
(2015)
This

work
# of

sentences
430 2,010 2,330 2,010
lang En En Ja Ja
balanced

dataset
Yes Yes No Yes
complex
word
multi multi multi one
ties 

allowed
Yes No No Yes
outlier
excluded
Yes No No Yes
He is a sincere man.
He is a honest man.
Dataset for Japanese Lexical Simplification
We propose a new method of constructing a dataset for Japanese
lexical simplification.
Each sentence in our dataset includes only one difficult word.
It is the first controlled and balanced dataset for Japanese lexical
simplification with high correlation with human judgment.
Comparison of Datasets
(1) Our dataset is more consistent than the
previous datasets.
(2) Lexical simplification methods using our
dataset correlate with human annotation
better than the previous datasets.
Future work includes increasing the number of
sentences, so as to leverage the dataset for
machine learning-based simplification methods.
Conclusion
sentence
もっとも 安上がり に サーファ を 装う 方法
The most simplest method that is imitating safer.
simp

ranking
の ふり を する に 見せ かける の 真似 を する, の 振り を する を 真似る を 装う
1. professing 2.counterfeiting 3.playing, professing 4.playing 5.imitating
5. Ranking Integration
To remove extraordinary annotators,we used Maximum Likelihood
Estimation method (Matsui et al., 2012).
Integrate rankings were made
by average score after removing
extraordinary annotators.
Correlation of ranking
baseline outlier removal
Spearman’s score 0.541 0.580
Flow of Constructing Dataset
1. Given Sentence
はるかに変化に富む (Far more varied)
2. Extracting Substitutes
が多い(numerous), が豊富(rich), が多い(wealthy)
3. Evaluating substitutes
が多い(numerous),が豊富(rich)
4. Ranking substitutes
substitutes rank
に富む (varied) 2
が豊富 (wealthy) 3
が多い (numerous) 1
Example of annotation
B&M: Belder and Moens (2015), K&Y:Kajiwara and Yamamoto (2015)
Annotation Using Crowdsourcing
2. Extracting Substitutes
For each complex word, five annotators wrote substitutes that didn’t
change the sense of the sentence.
※ These substitutes could include particles in context. (Kajiwara and
Yamamoto (2015) didn’t include them.)
3. Evaluating Substitutes
Five annotators selected an appropriate word to include as a substitution
that didn’t change the sense of the sentence.
Substitutes that won a majority were defined as correct.
4. Ranking Substitutes
Five annotators arranged substitutes and complex word according to the
simplification ranking.
These ranking could include ties.
Inter annotator agreement
Spearman’s score
K&Y 0.332
Our work 0.522

Weitere ähnliche Inhalte

Ähnlich wie Poster: Controlled and Balanced Dataset for Japanese Lexical Simplification

Sentence analysis
Sentence analysisSentence analysis
Sentence analysis
krukob9
 
Word Segmentation in Sentence Analysis
Word Segmentation in Sentence AnalysisWord Segmentation in Sentence Analysis
Word Segmentation in Sentence Analysis
Andi Wu
 
feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015
Conor McGrory
 
Chinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLPChinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLP
Andi Wu
 

Ähnlich wie Poster: Controlled and Balanced Dataset for Japanese Lexical Simplification (20)

Sentence analysis
Sentence analysisSentence analysis
Sentence analysis
 
Word Segmentation in Sentence Analysis
Word Segmentation in Sentence AnalysisWord Segmentation in Sentence Analysis
Word Segmentation in Sentence Analysis
 
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
 
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
IRJET - Response Analysis of Educational Videos
IRJET - Response Analysis of Educational VideosIRJET - Response Analysis of Educational Videos
IRJET - Response Analysis of Educational Videos
 
DESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZER DESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZER
 
DESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZERDESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZER
 
Design of a rule based hindi lemmatizer
Design of a rule based hindi lemmatizerDesign of a rule based hindi lemmatizer
Design of a rule based hindi lemmatizer
 
SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...
SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...
SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...
 
A^2_Poster
A^2_PosterA^2_Poster
A^2_Poster
 
Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11
 
Deep network notes.pdf
Deep network notes.pdfDeep network notes.pdf
Deep network notes.pdf
 
feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015
 
Chinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLPChinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLP
 
Word Tagging using Max Entropy Model and Feature selection
Word Tagging using Max Entropy Model and Feature selection Word Tagging using Max Entropy Model and Feature selection
Word Tagging using Max Entropy Model and Feature selection
 
A hybrid approach toward Natural Language Understanding (by Daisuke Bekki)
A hybrid approach toward Natural Language Understanding (by Daisuke Bekki)A hybrid approach toward Natural Language Understanding (by Daisuke Bekki)
A hybrid approach toward Natural Language Understanding (by Daisuke Bekki)
 
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMEROPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
 
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMEROPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
 
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector SpaceBeyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
 

Mehr von Kodaira Tomonori

Simp lex rankng based on contextual and psycholinguistic features
Simp lex rankng based on contextual and psycholinguistic featuresSimp lex rankng based on contextual and psycholinguistic features
Simp lex rankng based on contextual and psycholinguistic features
Kodaira Tomonori
 
Improving text simplification language modeling using unsimplified text data
Improving text simplification language modeling using unsimplified text dataImproving text simplification language modeling using unsimplified text data
Improving text simplification language modeling using unsimplified text data
Kodaira Tomonori
 

Mehr von Kodaira Tomonori (20)

Deep recurrent generative decoder for abstractive text summarization
Deep recurrent generative decoder for abstractive text summarizationDeep recurrent generative decoder for abstractive text summarization
Deep recurrent generative decoder for abstractive text summarization
 
Selective encoding for abstractive sentence summarization
Selective encoding for abstractive sentence summarizationSelective encoding for abstractive sentence summarization
Selective encoding for abstractive sentence summarization
 
Abstractive Text Summarization @Retrieva seminar
Abstractive Text Summarization @Retrieva seminarAbstractive Text Summarization @Retrieva seminar
Abstractive Text Summarization @Retrieva seminar
 
AttSum: Joint Learning of Focusing and Summarization with Neural Attention
AttSum: Joint Learning of Focusing and Summarization with Neural AttentionAttSum: Joint Learning of Focusing and Summarization with Neural Attention
AttSum: Joint Learning of Focusing and Summarization with Neural Attention
 
障害情報レポートに対する同時関連文章圧縮
障害情報レポートに対する同時関連文章圧縮障害情報レポートに対する同時関連文章圧縮
障害情報レポートに対する同時関連文章圧縮
 
Neural Summarization by Extracting Sentences and Words
Neural Summarization by Extracting Sentences and WordsNeural Summarization by Extracting Sentences and Words
Neural Summarization by Extracting Sentences and Words
 
[Introduction] Neural Network-Based Abstract Generation for Opinions and Argu...
[Introduction] Neural Network-Based Abstract Generation for Opinions and Argu...[Introduction] Neural Network-Based Abstract Generation for Opinions and Argu...
[Introduction] Neural Network-Based Abstract Generation for Opinions and Argu...
 
[ポスター]均衡コーパスを用いた語彙平易化データセットの構築
[ポスター]均衡コーパスを用いた語彙平易化データセットの構築[ポスター]均衡コーパスを用いた語彙平易化データセットの構築
[ポスター]均衡コーパスを用いた語彙平易化データセットの構築
 
Noise or additional information? Leveraging crowdsource annotation item agree...
Noise or additional information? Leveraging crowdsource annotation item agree...Noise or additional information? Leveraging crowdsource annotation item agree...
Noise or additional information? Leveraging crowdsource annotation item agree...
 
語彙平易化システム評価のためのデータセット改良[ブースター]
語彙平易化システム評価のためのデータセット改良[ブースター]語彙平易化システム評価のためのデータセット改良[ブースター]
語彙平易化システム評価のためのデータセット改良[ブースター]
 
語彙平易化システム評価のためのデータセットの改良[ポスター]
語彙平易化システム評価のためのデータセットの改良[ポスター]語彙平易化システム評価のためのデータセットの改良[ポスター]
語彙平易化システム評価のためのデータセットの改良[ポスター]
 
PPDB 2.0: Better paraphrase ranking, 
fine-grained entailment relations,
word...
PPDB 2.0: Better paraphrase ranking, 
fine-grained entailment relations,
word...PPDB 2.0: Better paraphrase ranking, 
fine-grained entailment relations,
word...
PPDB 2.0: Better paraphrase ranking, 
fine-grained entailment relations,
word...
 
WordNet-Based Lexical Simplification of Document
WordNet-Based Lexical Simplification of DocumentWordNet-Based Lexical Simplification of Document
WordNet-Based Lexical Simplification of Document
 
文レベルの機械翻訳評価尺度に関する調査
文レベルの機械翻訳評価尺度に関する調査文レベルの機械翻訳評価尺度に関する調査
文レベルの機械翻訳評価尺度に関する調査
 
Simp lex rankng based on contextual and psycholinguistic features
Simp lex rankng based on contextual and psycholinguistic featuresSimp lex rankng based on contextual and psycholinguistic features
Simp lex rankng based on contextual and psycholinguistic features
 
Aligning sentences from standard wikipedia to simple wikipedia
Aligning sentences from standard wikipedia to simple wikipediaAligning sentences from standard wikipedia to simple wikipedia
Aligning sentences from standard wikipedia to simple wikipedia
 
日本語の語彙平易化評価セットの構築
日本語の語彙平易化評価セットの構築日本語の語彙平易化評価セットの構築
日本語の語彙平易化評価セットの構築
 
Improving text simplification language modeling using unsimplified text data
Improving text simplification language modeling using unsimplified text dataImproving text simplification language modeling using unsimplified text data
Improving text simplification language modeling using unsimplified text data
 
言い換えを用いたテキスト要約の自動評価
言い換えを用いたテキスト要約の自動評価言い換えを用いたテキスト要約の自動評価
言い換えを用いたテキスト要約の自動評価
 
聾者向け文章読解支援における構文的言い換えの効果について
聾者向け文章読解支援における構文的言い換えの効果について聾者向け文章読解支援における構文的言い換えの効果について
聾者向け文章読解支援における構文的言い換えの効果について
 

Kürzlich hochgeladen

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
Lokesh Kothari
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 

Kürzlich hochgeladen (20)

Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 

Poster: Controlled and Balanced Dataset for Japanese Lexical Simplification

  • 1. Controlled and Balanced Dataset for 
 Japanese Lexical Simplification Tomonori Kodaira, Tomoyuki Kajiwara, and Mamoru Komachi Tokyo Metropolitan University, Japan. https://github.com/KodairaTomonori/EvaluationDataset Example of dataset Lexical Simplification Substitutes a complex word or phrase in sentence with simpler synonym 1.Extracting Sentences 2,100 sentences from Balanced Corpus of Contemporary Written Japanese (Maekawa et al., 2010): Including only one complex word. 10 contexts of occurrence were collected for each complex words. Difficult words: “High Level” words in the Lexicon for Japanese Language Education (Sunakawa et al., 2012). Content words (7 parts of speech): nouns, verbs, adjectives, adverbs, adjectival nouns, sahen nouns, and sahen verbs. B&M (2012) Specia et al. (2012) K&Y (2015) This
 work # of
 sentences 430 2,010 2,330 2,010 lang En En Ja Ja balanced
 dataset Yes Yes No Yes complex word multi multi multi one ties 
 allowed Yes No No Yes outlier excluded Yes No No Yes He is a sincere man. He is a honest man. Dataset for Japanese Lexical Simplification We propose a new method of constructing a dataset for Japanese lexical simplification. Each sentence in our dataset includes only one difficult word. It is the first controlled and balanced dataset for Japanese lexical simplification with high correlation with human judgment. Comparison of Datasets (1) Our dataset is more consistent than the previous datasets. (2) Lexical simplification methods using our dataset correlate with human annotation better than the previous datasets. Future work includes increasing the number of sentences, so as to leverage the dataset for machine learning-based simplification methods. Conclusion sentence もっとも 安上がり に サーファ を 装う 方法 The most simplest method that is imitating safer. simp
 ranking の ふり を する に 見せ かける の 真似 を する, の 振り を する を 真似る を 装う 1. professing 2.counterfeiting 3.playing, professing 4.playing 5.imitating 5. Ranking Integration To remove extraordinary annotators,we used Maximum Likelihood Estimation method (Matsui et al., 2012). Integrate rankings were made by average score after removing extraordinary annotators. Correlation of ranking baseline outlier removal Spearman’s score 0.541 0.580 Flow of Constructing Dataset 1. Given Sentence はるかに変化に富む (Far more varied) 2. Extracting Substitutes が多い(numerous), が豊富(rich), が多い(wealthy) 3. Evaluating substitutes が多い(numerous),が豊富(rich) 4. Ranking substitutes substitutes rank に富む (varied) 2 が豊富 (wealthy) 3 が多い (numerous) 1 Example of annotation B&M: Belder and Moens (2015), K&Y:Kajiwara and Yamamoto (2015) Annotation Using Crowdsourcing 2. Extracting Substitutes For each complex word, five annotators wrote substitutes that didn’t change the sense of the sentence. ※ These substitutes could include particles in context. (Kajiwara and Yamamoto (2015) didn’t include them.) 3. Evaluating Substitutes Five annotators selected an appropriate word to include as a substitution that didn’t change the sense of the sentence. Substitutes that won a majority were defined as correct. 4. Ranking Substitutes Five annotators arranged substitutes and complex word according to the simplification ranking. These ranking could include ties. Inter annotator agreement Spearman’s score K&Y 0.332 Our work 0.522