Lexical Simplification - University of Manchester Postgraduate Summer Research Showcase

Lexical Simplification
Matthew Shardlow
http://lexicalsimplification.blogspot.com/
Abstract
We live in an information-based society where text is ubiquitous.
However, public information is often too difficult for its intended
audience. More and more information is presented via digital media,
where automatic processes can be used to improve the readability of
a text. Lexical simplification makes text easier to understand by
replacing difficult words with easier alternatives. This can be done
before a user ever sees the original difficult text. This PhD
focusses on the errors that arise during simplification and
introduces novel evaluation measures. A variety of areas will
benefit from automated simplification.
The Pipeline
Input Text
“The campaigner was arrested”
Complex Word Discovery
The campaigner was. . .
• Difficult words identified.
• Depends on context.
Substitution Generation
Campaigner: Protestor, Activist, Advocate
• Find suitable replacements.
• Thesaurus look up.
Sense Disambiguation
Campaigner: Protestor, Activist, Advocate
• Decide which words will fit.
• Must consider context.
Substitution Ranking
1) Protestor
2) Activist
• Simplest synonym selected.
• Treated as a ranking task.
Output Text
“The protestor was arrested”
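The four stages described above can be sketched as a chain of small functions. This is a minimal illustration only, not the system used in this research: the frequency table, the thesaurus, the context filter and the threshold are all hypothetical toy data, and removing "advocate" at the disambiguation stage is an assumption made so that the example reproduces the poster's output sentence.

```python
# Toy data: every entry below is a hypothetical stand-in, not a real resource.
WORD_FREQUENCY = {
    "the": 1000, "was": 900, "arrested": 40,
    "campaigner": 3, "protestor": 12, "activist": 9, "advocate": 5,
}
THESAURUS = {"campaigner": ["protestor", "activist", "advocate"]}
# Hypothetical context filter: candidates judged to fit next to a context word.
FITS_CONTEXT = {"arrested": {"protestor", "activist"}}
COMPLEXITY_THRESHOLD = 5  # assumed cut-off: rarer words count as complex


def find_complex_words(tokens):
    """Stage 1: flag tokens whose toy frequency is below the threshold."""
    return [t for t in tokens
            if WORD_FREQUENCY.get(t.lower(), 0) < COMPLEXITY_THRESHOLD]


def generate_substitutions(word):
    """Stage 2: look the word up in the toy thesaurus."""
    return THESAURUS.get(word.lower(), [])


def disambiguate(candidates, tokens):
    """Stage 3: keep candidates that fit at least one word of the context."""
    allowed = set()
    for token in tokens:
        allowed |= FITS_CONTEXT.get(token.lower(), set())
    return [c for c in candidates if c in allowed]


def rank_substitutions(candidates):
    """Stage 4: order candidates from most to least frequent (simplest first)."""
    return sorted(candidates, key=lambda c: WORD_FREQUENCY.get(c, 0), reverse=True)


def simplify(sentence):
    """Run all four stages; keep the original word if no candidate survives."""
    tokens = sentence.split()
    complex_words = set(find_complex_words(tokens))
    output = []
    for token in tokens:
        if token in complex_words:
            candidates = generate_substitutions(token)
            ranked = rank_substitutions(disambiguate(candidates, tokens))
            output.append(ranked[0] if ranked else token)
        else:
            output.append(token)
    return " ".join(output)


print(simplify("The campaigner was arrested"))
```

Keeping each stage as a separate function mirrors the pipeline layout and makes it possible to inspect the intermediate result of any single stage.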
The Applications
Usage              Description
Language Learners  Easy-to-read material in the target language.
Stroke Victims     Access to easy-to-read information promotes rehabilitation and self-confidence.
Medical Patients   Improved access to medical information improves patient knowledge and care.
Consumers          Better understanding of technical legal language in licence agreements.
Academics          Support when reading material from outside of their main discipline.
Public Engagement  Tools to help authors produce jargon-free text for a lay audience.
The Problem
• Errors occur in the pipeline, affecting text quality.
• Low text quality results in poor understandability.
• The process can turn text into nonsense.
• My research has categorised the errors as follows:
Type 2: A complex or a simple word may be assigned to the wrong category.
Type 3: No substitutions which would result in a simplification of the target word are available.
Type 4: Sense disambiguation error. The meaning of the sentence has changed significantly.
Type 5: Ranking error. A replacement which does not simplify the sentence has been selected.
• In a recent study [2] I found the frequency of each error to be:
[Figure: bar chart of error frequency (%) by error code. Type 2: 65.03%; Type 3: 42.19%; Type 4: 29.73%; Type 5: 26.92%.]
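One simple way to tally annotated errors by type is sketched below. The enum member names and the example annotations are hypothetical, and the poster does not state which denominator was used for its figures; dividing by all annotated instances is an assumption of this sketch, not necessarily the method of the study [2].

```python
from collections import Counter
from enum import Enum


class ErrorType(Enum):
    """Error codes from the poster; the member names are illustrative."""
    COMPLEX_WORD_MISCLASSIFIED = 2
    NO_SIMPLER_SUBSTITUTION = 3
    SENSE_DISAMBIGUATION = 4
    RANKING = 5


def error_percentages(annotations):
    """Return each error type's share of all annotations, as a percentage."""
    counts = Counter(annotations)
    total = len(annotations)
    return {
        error: (100.0 * counts[error] / total if total else 0.0)
        for error in ErrorType
    }


# Hypothetical annotations, one label per inspected simplification.
labels = [
    ErrorType.COMPLEX_WORD_MISCLASSIFIED,
    ErrorType.COMPLEX_WORD_MISCLASSIFIED,
    ErrorType.SENSE_DISAMBIGUATION,
    ErrorType.RANKING,
]

for error, share in error_percentages(labels).items():
    print(f"Type {error.value}: {share:.2f}%")
```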
The Research
Pipeline and Errors
• A literature survey has identified focus areas [1].
• An error study has highlighted the importance of each area [2].
• Ongoing work will refine the error study.
Complex Word Identification
• The CW corpus has been developed using Simple Wikipedia [3].
• Techniques to identify complex words have been evaluated [4].
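Evaluating a complex word identifier against a labelled corpus can be sketched as follows. This is an illustration under stated assumptions: the frequency-threshold identifier, the toy frequencies and the gold labels are all hypothetical, and precision, recall and F1 are used here as standard measures rather than as a statement of what was reported in [4].

```python
def threshold_identifier(frequencies, threshold):
    """Build a predicate that calls a word complex if it is rarer than threshold."""
    def is_complex(word):
        return frequencies.get(word.lower(), 0) < threshold
    return is_complex


def precision_recall_f1(predict, labelled_words):
    """Score a predicate against (word, gold_is_complex) pairs."""
    tp = fp = fn = 0
    for word, gold_is_complex in labelled_words:
        predicted = predict(word)
        if predicted and gold_is_complex:
            tp += 1
        elif predicted and not gold_is_complex:
            fp += 1
        elif not predicted and gold_is_complex:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical frequencies and gold labels, for illustration only.
FREQ = {"arrested": 40, "campaigner": 3, "protestor": 12, "ubiquitous": 2, "text": 300}
GOLD = [
    ("campaigner", True),
    ("ubiquitous", True),
    ("protestor", True),
    ("arrested", False),
    ("text", False),
]

identifier = threshold_identifier(FREQ, threshold=5)
precision, recall, f1 = precision_recall_f1(identifier, GOLD)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because the identifier is passed in as a plain function, other techniques can be scored with the same routine by swapping the predicate.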
Substitution Generation
• Initial research has shown problems with traditional thesauri.
• Thesaurus augmentation depends on the specific domain.
Word Sense Disambiguation
• Many systems exist for the task of disambiguation.
• Several top disambiguation systems have been evaluated for simplification.
• This research is awaiting submission.
Substitution Ranking
• Depends heavily on the context and the user.
• Research will look at the needs of individual users.
Applications
• Simplification will target academic literature.
• Target audience will be lay readers.
References
[1] Shardlow, M. 2014. A Survey of Automated Text Simplification. IJACSA Special Issue on Natural Language Processing.
[2] Shardlow, M. 2014. Out in the Open: Finding and Categorising Errors in the Lexical Simplification Pipeline. LREC, Reykjavik, Iceland, May. ELRA.
[3] Shardlow, M. 2013. The CW Corpus: Evaluating the Identification of Complex Words. PITR, Sofia, Bulgaria, ACL.
[4] Shardlow, M. 2013. A Comparison of Techniques to Automatically Identify Complex Words. ACL Student Research Workshop, Sofia, Bulgaria, ACL.
