ANALYSIS OF IMAGES, SOCIAL NETWORKS,
AND TEXTS
Single-sentence Readability Prediction in Russian
Nikolay Karpov, Julia Baranova, Fedor Vitugin
National Research University Higher School of Economics
Ekaterinburg, Russia
Structure

Motivation

Text readability prediction

Single-sentence readability prediction

Single-sentence readability prediction with syntactic features

Conclusion
Motivation
We present part of a project whose aim is to develop a system with
text simplification functionality.

The system should adapt a text to a target level of readability for
learners of Russian as a foreign language (RFL).

We address the problem of identifying the source level of
difficulty (readability) of sentences or texts.

The next step will be their lexical and syntactic simplification.
In this study we report the results of an application that identifies the
level of difficulty of a single sentence and of a whole text using
different statistical and syntactic features.
Text readability prediction
The first task was to prototype the retrieval of Russian texts
with a required readability level. The main goal was to find
which variables and which classification algorithm would give
the highest precision and recall of readability
prediction.
We conducted a series of experiments training
different classification algorithms:

naive Bayes;

k-nearest neighbors;

classification tree;

random forests;

SVM.
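As an illustration of one of the listed algorithms, here is a minimal k-nearest neighbors sketch in Python with leave-one-out accuracy. The toy feature vectors, the labels and the choice k=3 are assumptions made for illustration only; the slides do not state which toolkit or parameters were used.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training examples (Euclidean distance). `train` is a list of
    (feature_vector, label) pairs."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Classify each example using all the other examples as training data."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)


# Hypothetical toy data: (average sentence length in words,
# share of words outside the A1 vocabulary) -> readability level.
toy_texts = [
    ((6.0, 0.05), "A1"), ((7.0, 0.08), "A1"), ((6.5, 0.04), "A1"),
    ((18.0, 0.45), "C2"), ((20.0, 0.50), "C2"), ((17.0, 0.40), "C2"),
]

print(leave_one_out_accuracy(toy_texts, k=3))
```

In this sketch the features are used on their raw scales; with real data they would normally be normalized before distances are computed.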
We extract 25 variables from the texts, as proposed in
previous work:
Average number of words per sentence in the text.
Average length of one word in a sentence.
Text length in letters.
Text length in words.
Average sentence length in syllables.
Average word length in syllables.
Percentage of words with N or more syllables, for each N from 3 to 6.
Average sentence length in letters.
Average word length in letters.
Percentage of words with N or more letters, for each N from 5 to 13.
Percentage of words in a sentence not included in the active vocabulary of
the A1 level.
Percentage of words in a sentence not included in the active vocabulary of
the A2 level.
Percentage of words in a sentence not included in the active vocabulary of
the B1 level.
Occurrence of specific parts of speech in the sentence.
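The length-based variables in this list can be sketched in Python as follows. This is only an illustration: the naive sentence splitter, the regular-expression tokenizer, the vowel-counting syllable approximation and all function and key names are assumptions of this sketch, not the authors' implementation. The vocabulary-based and part-of-speech variables are left out.

```python
import re

RUSSIAN_VOWELS = set("аеёиоуыэюя")


def count_syllables(word):
    """Approximate the syllables of a Russian word by counting vowel letters."""
    return sum(1 for ch in word.lower() if ch in RUSSIAN_VOWELS)


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(text):
    """Extract runs of Cyrillic or Latin letters as words."""
    return re.findall(r"[A-Za-zА-Яа-яЁё]+", text)


def text_features(text):
    """Length-based readability variables for a non-empty text."""
    sentences = split_sentences(text)
    words = tokenize(text)
    n_words = len(words)
    letters = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    features = {
        "text_length_words": n_words,
        "text_length_letters": letters,
        "avg_sentence_length_words": n_words / len(sentences),
        "avg_sentence_length_letters": letters / len(sentences),
        "avg_sentence_length_syllables": syllables / len(sentences),
        "avg_word_length_letters": letters / n_words,
        "avg_word_length_syllables": syllables / n_words,
    }
    # Percentage of words with N or more syllables, N = 3..6.
    for n in range(3, 7):
        share = sum(count_syllables(w) >= n for w in words) / n_words
        features[f"pct_words_syllables_ge_{n}"] = 100 * share
    # Percentage of words with N or more letters, N = 5..13.
    for n in range(5, 14):
        share = sum(len(w) >= n for w in words) / n_words
        features[f"pct_words_letters_ge_{n}"] = 100 * share
    return features


sample = "Мама мыла раму. Студенты читают интересную книгу."
features = text_features(sample)
print(features["avg_sentence_length_words"])
```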
Text readability prediction
For evaluation we used a collection of 219 texts divided into
four groups. The distribution by level is as follows: A1 (elementary): 52,
A2 (basic): 57, B1 (first): 60, C2 (difficult): 50, according to the levels
described in the Common European Framework of Reference for
Languages (CEFR). The A1, A2 and B1 texts were created specially for
students by language teachers on the basis of news. The C2 texts were
original news articles in Russian.
The first experiment was a binary classification of readability:

A1 versus C2,

A2 versus C2,

B1 versus C2.
With the Classification Tree, SVM and Logistic Regression
algorithms, the accuracy was very high, almost equal to 1.
Text classification into four levels of readability
Method               Classification accuracy  F-measure  Precision  Recall
SVM                  0.8092                   0.7965     0.8491     0.75
Classification Tree  0.9905                   0.9916     1          0.9833
kNN                  0.8131                   0.7333     0.7333     0.7333
Random Forest        0.9818                   0.9667     0.9667     0.9667
Naive Bayes          0.8726                   0.7890     0.8776     0.7167
An example of precision and recall for text retrieval with B1 level of
readability
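The F-measure values reported in the table are consistent with the harmonic mean of precision and recall. The short Python sketch below recomputes it from the reported precision and recall; the function and variable names are chosen here for illustration.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the four-level classification table.
reported = {
    "SVM": (0.8491, 0.75),
    "Classification Tree": (1.0, 0.9833),
    "kNN": (0.7333, 0.7333),
    "Random Forest": (0.9667, 0.9667),
    "Naive Bayes": (0.8776, 0.7167),
}

for method, (p, r) in reported.items():
    print(method, round(f_measure(p, r), 4))
```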
Classification variables ranked by information gain ratio
Variable name                                                                            Information gain ratio
Percentage of words in a sentence not included in the active vocabulary of the A1 level  0.105141
Percentage of words in a sentence not included in the active vocabulary of the A2 level  0.105141
Percentage of words in a sentence not included in the active vocabulary of the B1 level  0.084211
Percentage of words with 8 letters or more                                               0.040098
Percentage of words with 9 letters or more                                               0.038431
Percentage of words with 7 letters or more                                               0.036923
Average sentence length in syllables                                                     0.034359
Average length of one word in a text                                                     0.034359
Percentage of words with 10 letters or more                                              0.033689
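For reference, the information gain ratio of a discrete variable can be computed as the information gain divided by the entropy of the variable itself. The Python sketch below assumes that each variable has already been discretized into bins (the slides do not say how continuous variables were handled), and the toy data is hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of the feature about the labels, divided by the
    entropy of the feature (the split information)."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == value]
        conditional += len(subset) / total * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical binned feature: share of out-of-vocabulary words.
levels = ["A1", "A1", "C2", "C2"]
informative = ["low", "low", "high", "high"]
uninformative = ["low", "high", "low", "high"]

print(gain_ratio(informative, levels))
print(gain_ratio(uninformative, levels))
```

A variable that separates the levels perfectly gets a ratio of 1, and a variable that is independent of the levels gets 0.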
Single-sentence readability prediction

We prototype sentence classification with respect to readability.
For evaluating the results we use the SynTagRus corpus.

The B1 level suits the majority of our students. Therefore, we created a
binary sentence markup:
1) B1 or lower than B1;
2) Higher than B1.

We manually tagged 3500 sentences in this corpus to mark their
structural level of perception complexity.

The lexical readability of each sentence is obtained on the basis of
the B1-level vocabulary. Sentences in which more
than 33% of the words are not in the active vocabulary are defined as
lexically difficult.
Thus, we have two kinds of markup: structural complexity and
lexical difficulty. As an intersection we obtained a total level of
readability.
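The lexical part of this markup can be sketched in Python as below. The 33% threshold comes from the slide; the toy vocabulary, the function names and the simplification that the vocabulary contains inflected word forms (a real system would lemmatize first) are assumptions of this sketch.

```python
import re

THRESHOLD = 0.33  # share of out-of-vocabulary words, as stated on the slide


def out_of_vocabulary_share(sentence, vocabulary):
    """Share of the sentence's words that are missing from `vocabulary`."""
    words = [w.lower() for w in re.findall(r"[А-Яа-яЁё]+", sentence)]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in vocabulary)
    return unknown / len(words)


def is_lexically_difficult(sentence, vocabulary, threshold=THRESHOLD):
    """A sentence is lexically difficult if more than `threshold` of its
    words are outside the vocabulary."""
    return out_of_vocabulary_share(sentence, vocabulary) > threshold


# Hypothetical stand-in for the B1 active vocabulary (word forms, lowercase).
toy_vocabulary = {"мама", "мыла", "раму", "я", "читаю", "книгу"}

print(is_lexically_difficult("Мама мыла раму.", toy_vocabulary))
print(is_lexically_difficult("Я читаю фундаментальную монографию.", toy_vocabulary))
```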
Results of total readability prediction using all kinds of variables and syntactic links
Method               Classification accuracy  F-measure (difficult/simple)  Precision (difficult/simple)  Recall (difficult/simple)
Naive Bayes          0.8191                   0.8906/0.4767                 0.8354/0.6975                 0.9537/0.3621
kNN                  0.8224                   0.8893/0.5501                 0.8571/0.6493                 0.9241/0.4772
Random Forest        0.9443                   0.9640/0.8768                 0.9620/0.8832                 0.9661/0.8705
Classification Tree  0.9364                   0.9584/0.8648                 0.9679/0.8380                 0.9491/0.8933
SVM                  0.8633                   0.9125/0.6875                 0.9679/0.7165                 0.9491/0.6607
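The per-class figures (difficult/simple) are obtained by treating each class in turn as the target of retrieval. A minimal Python sketch, with hypothetical labels for six sentences and names chosen for illustration:

```python
def per_class_metrics(true_labels, predicted_labels, positive):
    """Precision, recall and F-measure with `positive` as the target class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f


# Hypothetical gold and predicted labels.
true_labels = ["difficult", "difficult", "difficult", "difficult",
               "simple", "simple"]
predicted_labels = ["difficult", "difficult", "difficult", "simple",
                    "simple", "difficult"]

for target in ("difficult", "simple"):
    print(target, per_class_metrics(true_labels, predicted_labels, target))
```

When one class dominates the data, overall accuracy can stay high while the smaller class is recognized poorly, which is why both classes are reported separately.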
Recall of complex-sentence retrieval using different feature sets
[Figure: recall (axis from 0.75 to 1) for kNN, Logistic regression, Random Forest and Classification Tree with four feature sets: Dale-Chall, Flesch-Kincaid, syntactic links for structural complexity, and the total set.]
Classification variables of sentences ranked by information gain ratio
Variable name                                                                            Information gain ratio
Percentage of words in a sentence not included in the active vocabulary of the B1 level  0.318
Sentence length in letters                                                               0.122
Percentage of words with 3 syllables or more                                             0.119
Sentence length in syllables                                                             0.118
Sentence length in words                                                                 0.098
Syntactic predicative link                                                               0.095
Average word length in syllables                                                         0.092
Average length of one word in a text                                                     0.092
Percentage of words with 7 letters or more                                               0.069
Percentage of words with 5 letters or more                                               0.069
Top 10 classification variables
Conclusion

For text readability prediction the obtained accuracy reached 98-99%, so
we can say that the results met our needs.

We adapted features from traditional readability prediction
techniques to identify the lexical and structural complexity of
single sentences in Russian.

We tested the readability prediction of Russian sentences using
syntactic links.

The single-sentence readability prediction algorithm was tested on a
set of sentences from SynTagRus in which readability was manually
marked (a binary classification).

The total set of features, combining statistical, lexical and syntactic ones, can
predict sentence readability with a recall of 0.9661 using the
Random Forest algorithm.

The most important features for this classification are the lexical ones.
Thank you for your attention
nkarpov@hse.ru

Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...AIST
 
Valeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamediumValeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamediumAIST
 
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...AIST
 
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...AIST
 
Artyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation DenoisingArtyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation DenoisingAIST
 

Nikolay Karpov - Single-sentence readability prediction in Russian

  • 1. ANALYSIS OF IMAGES, SOCIAL NETWORKS, AND TEXTS Single-sentence Readability Prediction in Russian Nikolay Karpov, Julia Baranova, Fedor Vitugin National Research University Higher School of Economics Ekaterinburg, Russia
  • 2. Structure
      - Motivation
      - Text readability prediction
      - Single-sentence readability prediction
      - Single-sentence readability prediction with syntactic features
      - Conclusion
  • 3. Motivation. We present part of a project whose aim is to develop a system with text simplification functionality.
      - The system should adapt a text to a target readability level for learners of Russian as a foreign language (RFL).
      - We address the problem of identifying the source level of difficulty (readability) of sentences or texts.
      - The next step will be their lexical and syntactic simplification.
    In this study we report the results of an application that identifies the level of difficulty of a single sentence and of a whole text using different statistical and syntactic features.
  • 4. Text readability prediction. The first task was to prototype the retrieval of Russian texts with a required readability level. The main goal was to find which variables and which classification algorithm give the highest precision and recall of readability prediction. We conducted a series of experiments training different classification algorithms:
      - naive Bayes;
      - k-nearest neighbors;
      - classification tree;
      - random forests;
      - SVM.
  • 5. We extract from the texts 25 variables proposed in previous work:
      - Average number of words per sentence in the text.
      - Average length of one word in a sentence.
      - Text length in letters.
      - Text length in words.
      - Average sentence length in syllables.
      - Average word length in syllables.
      - Percentage of words with N or more syllables, for each N from 3 to 6.
      - Average sentence length in letters.
      - Average word length in letters.
      - Percentage of words with N or more letters, for each N from 5 to 13.
      - Percentage of words in a sentence not included in the active vocabulary of A1 level.
      - Percentage of words in a sentence not included in the active vocabulary of A2 level.
      - Percentage of words in a sentence not included in the active vocabulary of B1 level.
      - Occurrence in the sentence of specific parts of speech.
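A few of the surface variables listed on slide 5 can be sketched in Python as follows. This is only an illustration, not the authors' implementation: the function names, the regex-based sentence and word splitting, and the rule "one vowel letter = one syllable" are all assumptions made here.

```python
import re

# Assumption: a Russian syllable is counted as one vowel letter.
VOWELS = set("аеёиоуыэюя")


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption, not a real tokenizer)."""
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]


def split_words(sentence):
    """Lowercased runs of letters; digits and punctuation are dropped."""
    return re.findall(r"[^\W\d_]+", sentence.lower())


def count_syllables(word):
    return sum(ch in VOWELS for ch in word)


def extract_features(text, min_letters=7, min_syllables=3):
    """Compute a handful of the surface variables for one text."""
    sentences = split_sentences(text)
    words = [w for s in sentences for w in split_words(s)]
    if not sentences or not words:
        raise ValueError("text contains no words")
    n = len(words)
    return {
        "avg_words_per_sentence": n / len(sentences),
        "avg_word_length_letters": sum(len(w) for w in words) / n,
        "avg_word_length_syllables": sum(count_syllables(w) for w in words) / n,
        "share_words_min_letters": sum(len(w) >= min_letters for w in words) / n,
        "share_words_min_syllables": sum(
            count_syllables(w) >= min_syllables for w in words
        ) / n,
    }


features = extract_features("Мама мыла раму. Папа читал интересную книгу.")
print(features)
```

Calling `extract_features` once per N (3 to 6 syllables, 5 to 13 letters) would give the threshold-based variables; the vocabulary and part-of-speech variables need external resources and are not covered by this sketch.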
  • 6. Text readability prediction. For evaluation we used a collection of 219 texts divided into four groups. The distribution of levels is: A1 (elementary): 52, A2 (basic): 57, B1 (first): 60, C2 (difficult): 50, following the levels described in the Common European Framework of Reference for Languages (CEFR). The A1, A2, and B1 texts were created specially for students by language teachers on the basis of news. The C2 texts were original news in Russian. The first experiment was a binary classification of readability:
      - A1 versus C2,
      - A2 versus C2,
      - B1 versus C2.
    With the Classification Tree, SVM, and Logistic Regression algorithms the accuracy was very high, almost equal to 1.
  • 7. Text classification into four levels of readability

      Method               Classification accuracy   F-measure   Precision   Recall
      SVM                  0.8092                    0.7965      0.8491      0.75
      Classification Tree  0.9905                    0.9916      1           0.9833
      kNN                  0.8131                    0.7333      0.7333      0.7333
      Random Forest        0.9818                    0.9667      0.9667      0.9667
      Naive Bayes          0.8726                    0.7890      0.8776      0.7167

    An example of precision and recall for text retrieval with B1 level of readability.
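The F-measure column on slide 7 is consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The slides do not state the formula, so that is an assumption; the short check below recomputes F from the reported precision and recall values and compares it with the reported F-measure.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) copied from the slide 7 table.
reported = {
    "SVM": (0.8491, 0.75, 0.7965),
    "Classification Tree": (1.0, 0.9833, 0.9916),
    "kNN": (0.7333, 0.7333, 0.7333),
    "Random Forest": (0.9667, 0.9667, 0.9667),
    "Naive Bayes": (0.8776, 0.7167, 0.7890),
}

for method, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    print(f"{method}: recomputed {f:.4f}, reported {f_reported:.4f}")
```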
  • 8. Classification variables ranked by information gain ratio

      Variable name                                                                  Information gain ratio
      Percentage of words in a sentence not in the active vocabulary of A1 level    0.105141
      Percentage of words in a sentence not in the active vocabulary of A2 level    0.105141
      Percentage of words in a sentence not in the active vocabulary of B1 level    0.084211
      Percentage of words with 8 letters or more                                     0.040098
      Percentage of words with 9 letters or more                                     0.038431
      Percentage of words with 7 letters or more                                     0.036923
      Average sentence length in syllables                                           0.034359
      Average length of one word in a text                                           0.034359
      Percentage of words with 10 letters or more                                    0.033689
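For readers unfamiliar with the ranking criterion on slide 8, information gain ratio is the information gain of a split divided by the entropy of the split itself. The sketch below computes it for a single binary threshold split of a numeric variable. The slides do not say how continuous variables were discretized, so the threshold split, the function names, and the toy data are assumptions for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` versus `value > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    conditional = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    info_gain = entropy(labels) - conditional
    split_info = entropy(["left"] * len(left) + ["right"] * len(right))
    return info_gain / split_info


# Toy data: share of out-of-vocabulary words per text and its level.
shares = [0.1, 0.2, 0.8, 0.9]
levels = ["A1", "A1", "C2", "C2"]
print(gain_ratio(shares, levels, threshold=0.5))
```

A variable that separates the levels perfectly gets a ratio of 1, and one that is unrelated to the levels gets 0; the values on slide 8 fall between these extremes.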
  • 9. Single-sentence readability prediction
      - We prototype sentence classification with respect to readability. For evaluation we use the SynTagRus corpus.
      - Level B1 suits the majority of our students, so we created a binary sentence markup: 1) B1 or lower; 2) higher than B1.
      - We manually tagged 3500 sentences in this corpus to mark their structural level of perception complexity.
      - Lexical readability of each sentence is obtained from the B1-level vocabulary. Sentences in which more than 33% of the words are not in the active vocabulary are defined as lexically difficult.
    Thus, we have two kinds of markup: structural complexity and lexical difficulty. As an intersection we obtained a total level of
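The lexical difficulty rule from slide 9 (more than 33% of the words outside the B1 active vocabulary) can be written as a small predicate. This is a sketch under assumptions: the function name and the tiny vocabulary are hypothetical, and the input is assumed to be already lemmatized, which a real system for Russian would need to do before the lookup.

```python
def is_lexically_difficult(lemmas, active_vocabulary, threshold=0.33):
    """True if the share of lemmas outside the vocabulary exceeds the threshold."""
    if not lemmas:
        return False
    unknown = sum(lemma not in active_vocabulary for lemma in lemmas)
    return unknown / len(lemmas) > threshold


# Hypothetical stand-in for the B1 active vocabulary.
b1_vocabulary = {"мама", "мыть", "рама", "читать", "книга"}

print(is_lexically_difficult(["мама", "мыть", "рама"], b1_vocabulary))
print(is_lexically_difficult(["синхрофазотрон", "ускорять", "книга"], b1_vocabulary))
```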
  • 10. Results of total readability prediction using all kinds of variables and syntactic links

      Method               Classification accuracy   F-measure (difficult / simple)   Precision (difficult / simple)   Recall (difficult / simple)
      Naive Bayes          0.8191                    0.8906 / 0.4767                  0.8354 / 0.6975                  0.9537 / 0.3621
      kNN                  0.8224                    0.8893 / 0.5501                  0.8571 / 0.6493                  0.9241 / 0.4772
      Random Forest        0.9443                    0.9640 / 0.8768                  0.9620 / 0.8832                  0.9661 / 0.8705
      Classification Tree  0.9364                    0.9584 / 0.8648                  0.9679 / 0.8380                  0.9491 / 0.8933
      SVM                  0.8633                    0.9125 / 0.6875                  0.9679 / 0.7165                  0.9491 / 0.6607
  • 11. Recall of complex-sentence retrieval using different feature sets (figure). Feature sets: Dale-Chall, Flesch-Kincaid, syntactic links for structural complexity, total set. Classifiers: kNN, Logistic Regression, Random Forest, Classification Tree. Recall axis: 0.75 to 1.
  • 12. Classification variables of sentences ranked by information gain ratio (top 10)

      Variable name                                                                  Information gain ratio
      Percentage of words in a sentence not in the active vocabulary of B1 level    0.318
      Sentence length in letters                                                     0.122
      Percentage of words with 3 syllables or more                                   0.119
      Sentence length in syllables                                                   0.118
      Sentence length in words                                                       0.098
      Syntactic predicative link                                                     0.095
      Average word length in syllables                                               0.092
      Average length of one word in a text                                           0.092
      Percentage of words with 7 letters or more                                     0.069
      Percentage of words with 5 letters or more                                     0.069
  • 13. Conclusion
      - For text readability prediction the results reached 98-99%, which meets our needs.
      - We adapted features from traditional readability prediction techniques to identify the lexical and structural complexity of single sentences in Russian.
      - We tested readability prediction for Russian sentences using syntactic links.
      - The single-sentence readability prediction algorithm was tested on a set of sentences from SynTagRus in which readability was manually marked (a binary classification).
      - The total set of statistical, lexical, and syntactic features predicts sentence readability with a recall of 0.9661 using the Random Forest algorithm.
      - The most important features for this classification are the lexical ones.
  • 14. Thank you for your attention nkarpov@hse.ru