Using regular expressions in online language learning tools to help learners identify particular features and receive feedback on them as necessary, e.g. finding errors and suggesting how to rewrite them
3. Probabilistic parsing
• Dynamic algorithms
• Machine learning
• Training sets
(e.g. Stanford parser, Stanford POS tagger)
Extremely powerful, but requires significant knowledge of computational linguistics and a huge time investment, so…
4. Rule-based pattern matching
True/false statements
1. There is a man on your left. T / F
   If true, a man is on your left. Stop.
   If false, proceed to 2.
2. There is a woman on your left. T / F
   If true, a woman is on your left. Stop.
   If false, there is nobody on your left. Stop.
5. Rule-based pattern matching
Decision-tree algorithm
There is a man on your left.
  Yes. STOP
  No. → There is a woman on your left.
    Yes. STOP
    No. → There is nobody on your left. STOP
Assumptions:
1. Only adults are present
2. There is no third gender
6. Rule-based pattern matching
There is a man.    /\bman\b/;
There is a woman.  /\bwoman\b/;
Regular expressions (regexp|regex)
The discrete words “man” and “woman” will
be identified, generating a “true” result.
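As a minimal sketch, the two-question decision tree can be written with these two patterns, assuming the regexes are meant to carry `\b` word boundaries. The function name `whoIsOnYourLeft` is hypothetical and only for illustration.

```javascript
// Hypothetical helper: walks the two-question decision tree from the slides.
function whoIsOnYourLeft(sentence) {
  if (/\bman\b/i.test(sentence)) return "man";     // question 1 is true: stop
  if (/\bwoman\b/i.test(sentence)) return "woman"; // question 2 is true: stop
  return "nobody";                                 // both false: stop
}

console.log(whoIsOnYourLeft("There is a man."));   // "man"
console.log(whoIsOnYourLeft("There is a woman.")); // "woman"

// Without the word boundaries, "man" is also found inside "woman".
console.log(/man/i.test("There is a woman."));     // true
```

The word boundaries are what make the first question safe to ask first: without them, every sentence about a woman would stop at question 1.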
7. Regular expressions (Regex)
e.g. /\bmaybe\b/gi;
\ – escape (from normal characters)
i – case insensitive
\b – word boundary
g – global (find every match, not only the first)
1. I think that maybe he can understand. T/F
2. He may be able to understand T/F
3. Maybe, he can understand. T/F
4. Maybelline is a company name. T/F
5. Maybe, he said maybe. T/F
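The five test sentences can be checked directly. The sketch below assumes the pattern is `/\bmaybe\b/gi` and counts the matches in each sentence; the helper name `countMatches` is hypothetical.

```javascript
const sentences = [
  "I think that maybe he can understand.",
  "He may be able to understand",
  "Maybe, he can understand.",
  "Maybelline is a company name.",
  "Maybe, he said maybe.",
];

// match() with the g flag returns every match, or null when there is none.
const countMatches = (text) => (text.match(/\bmaybe\b/gi) || []).length;

const counts = sentences.map(countMatches);
console.log(counts); // [1, 0, 1, 0, 2]
```

Sentences 1, 3 and 5 are true. Sentence 2 fails because "may be" is two words, and sentence 4 fails because there is no word boundary after "Maybe" in "Maybelline".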
8. Pedagogic applications
Modality detector
Online error detectors
- Common error detector (Morrall, 2000-14)
- Corpus-based error detector (Blake, 2012-15)
Other applications
- Annotation highlighter
- Ideas for pronunciation, grammar and vocab
10. Tentative language & approximation
Type              Examples
Modal verbs       may, might, would, can
Lexical verbs     seem, appear, suggest
Modal adverbs     perhaps, probably, possibly
Modal adjectives  probable, possible, uncertain
Modal nouns       assumption, claim, possibility

#                 Approximation
49%               almost a half, nearly 50%, less than 1 in 2
App. 1
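A modality detector along these lines could be sketched by joining the example words into one alternation. This is an assumption about how such a tool might work, not the actual implementation; `HEDGES` and `findHedges` are hypothetical names, and inflected forms such as "seems" are deliberately not covered.

```javascript
// Example words taken from the table of tentative language.
const HEDGES = [
  "may", "might", "would", "can",
  "seem", "appear", "suggest",
  "perhaps", "probably", "possibly",
  "probable", "possible", "uncertain",
  "assumption", "claim", "possibility",
];

// One pattern for all words: \b(may|might|...)\b, global and case insensitive.
const hedgePattern = new RegExp("\\b(" + HEDGES.join("|") + ")\\b", "gi");

// Returns the hedging words found in the text, lowercased, in text order.
function findHedges(text) {
  return (text.match(hedgePattern) || []).map((w) => w.toLowerCase());
}

console.log(findHedges("The results suggest that the method may perhaps be faster."));
// [ 'suggest', 'may', 'perhaps' ]
```

Because learners paste in their own text, the same word list works for any faculty's writing.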
11. Material mismatch
Students from different faculties studying
tentative language (hedging) and
approximation in academic writing use
generic materials prepared by the teacher.
App. 1
12. Lack of face validity
Some students do not want to “waste
time” dealing with materials not
appropriate to their major. They expect
materials tailored to their exact needs.
App. 1
16. Piles of unmarked homework
Responding to written work takes too
much time, and is repetitive since many
students make the same surface-level
mistakes.
App. 2
17. No time to respond
Teachers are expected to:
• Identify the location of errors
• Explain the errors (if necessary)
• Correct the errors (if necessary)
All of which take lots of time.
App. 2
18. Solution: Error detector
Identification
• Student inputs own work
• Regex identifies expected errors
Explanation
• Execute command selects and displays prepared explanation
Correction
• Student corrects work and submits improved version
App. 2
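The identification and explanation steps might be sketched as a list of rules, each pairing a regex with a prepared explanation. The two rules, their wording, and the name `detectErrors` are hypothetical examples chosen for illustration; they are not taken from the actual error detectors.

```javascript
// Hypothetical rules: each regex is paired with a prepared explanation.
const rules = [
  {
    type: "Formality",
    pattern: /\b\w+n't\b/gi, // contractions such as don't, can't (straight apostrophe only)
    explanation: "Write contractions out in full, e.g. 'do not'.",
  },
  {
    type: "Formality",
    pattern: /\ba lot of\b/gi,
    explanation: "Prefer a more formal quantifier, e.g. 'many' or 'much'.",
  },
];

// Identification: run every rule over the student's text.
// Explanation: attach the prepared explanation to each match.
function detectErrors(text) {
  const findings = [];
  for (const rule of rules) {
    for (const m of text.matchAll(rule.pattern)) {
      findings.push({ index: m.index, found: m[0], type: rule.type, explanation: rule.explanation });
    }
  }
  return findings.sort((a, b) => a.index - b.index); // report in reading order
}

for (const f of detectErrors("We don't have a lot of data.")) {
  console.log(`${f.found} (${f.type}): ${f.explanation}`);
}
```

Correction stays with the student: the tool only points to the location and shows the explanation.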
19. Error classification
App. 2
Type         Description
Accuracy     factual and language errors
Brevity      too many words
Clarity      vague or ambiguous terms
Objectivity  emotive language
Formality    abbreviations, contractions, & informal terms
An ethnographic survey of the literature on writing scientific research articles revealed five key criteria (Blake & Blake, 2015).
21. Specific example
Error
• One of the + singular noun
Regex
• /\bone of the\b/gi;
Execute
• Check that the phrase “one of the” is followed by a plural noun
App. 2
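A sketch of this example, assuming the pattern carries `\b` word boundaries: the regex locates each occurrence and the prepared advice is shown next to it. Capturing up to two following words is an added convenience so the learner can see the noun; `checkOneOfThe` is a hypothetical name.

```javascript
// The phrase itself plus up to two following words for context.
const ONE_OF_THE = /\bone of the\b(?:\s+\w+){0,2}/gi;

function checkOneOfThe(text) {
  return Array.from(text.matchAll(ONE_OF_THE), (m) => ({
    index: m.index,
    context: m[0],
    advice: "Check that the phrase 'one of the' is followed by a plural noun.",
  }));
}

const draft = "This is one of the reason. One of the main problems is cost. None of the data was lost.";
for (const hit of checkOneOfThe(draft)) {
  console.log(`"${hit.context}": ${hit.advice}`);
}
```

The tool does not decide whether the noun is plural; it flags every occurrence and leaves the check to the student. The leading `\b` keeps "None of the" from being flagged.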
23. Difficult-to-read tags
Introduction Purpose Method Results Discussion
<segment features='problem;introduction;rhetorical_moves' state='active'>We
address the problem of model-based object recognition.</segment> <segment
features='purpose;rhetorical_moves' state='active'>Our aim is to localize and
recognize road vehicles from monocular images or videos in calibrated traffic
scenes.</segment> <segment features='method;rhetorical_moves' state='active'>A
3-D deformable vehicle model with 12 shape parameters is set up as prior
information, and its pose is determined by three parameters, which are its position
on the ground plane and its orientation about the vertical axis under ground-plane
constraints.</segment> <segment features='purpose;rhetorical_moves'
state='active'>An efficient local gradient-based method is proposed to evaluate the
fitness between the projection of the vehicle model and image data, which is
combined into a novel evolutionary computing framework to estimate the 12 shape
parameters and three pose parameters by iterative evolution.</segment> <segment
features='background;introduction;rhetorical_moves' state='active'>The recovery of
pose parameters achieves vehicle localization, whereas the shape parameters are
used for vehicle recognition.</segment> <segment
features='method;rhetorical_moves' state='active'>Numerous experiments are
App. 3
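One way an annotation highlighter could make such tags readable is to pull out the first feature of each segment and show it as a short label. This is a sketch under that assumption; a regex is not a general XML parser, and `readableSegments` is a hypothetical name.

```javascript
// Captures the features attribute and the text of each complete segment.
const SEGMENT = /<segment\s+features='([^']*)'\s+state='[^']*'>([\s\S]*?)<\/segment>/g;

function readableSegments(tagged) {
  return Array.from(tagged.matchAll(SEGMENT), (m) => {
    const label = m[1].split(";")[0];               // assumption: the first feature names the move
    const text = m[2].replace(/\s+/g, " ").trim();  // undo line wrapping inside the segment
    return `[${label.toUpperCase()}] ${text}`;
  });
}

const sample = [
  "<segment features='problem;introduction;rhetorical_moves' state='active'>We",
  "address the problem of model-based object recognition.</segment> <segment",
  "features='purpose;rhetorical_moves' state='active'>Our aim is to localize and",
  "recognize road vehicles from monocular images or videos in calibrated traffic",
  "scenes.</segment>",
].join("\n");

console.log(readableSegments(sample).join("\n"));
// [PROBLEM] We address the problem of model-based object recognition.
// [PURPOSE] Our aim is to localize and recognize road vehicles from monocular images or videos in calibrated traffic scenes.
```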
26. Ideas for you and your students
Pronunciation:
• Regular “ed”: /t/, /d/, /id/
• th [voiced or voiceless]
Grammar:
• Tenses: e.g. perfect continuous: been + ing
• Quantifiers: [U] much, little; [C] many, few; [U/C] lots of, a lot of
Vocabulary:
• Colours: red, blue → crimson red, cobalt blue
• Body parts: hand, eyes, leg → hand out, eye up, leg it
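As an illustration of the grammar idea, the "been + ing" cue for perfect continuous tenses could be sketched as follows. The pattern is an assumption based on that cue alone, so it will also catch adjectives such as "been interesting"; `findPerfectContinuous` is a hypothetical name.

```javascript
// "been" followed by a word ending in -ing.
const BEEN_ING = /\bbeen\s+\w+ing\b/gi;

function findPerfectContinuous(text) {
  return text.match(BEEN_ING) || [];
}

console.log(findPerfectContinuous("She has been studying since noon, and they had been waiting."));
// [ 'been studying', 'been waiting' ]
```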
27. Regular “ed”
False positives:
• learned: /d/ as a verb, /id/ as an adjective
Pron   Preceding sound        Potential regex
/id/   d, t                   /(d|t)ed\b/gi;
/t/    voiceless consonants   /(s|f)ed\b/gi;
/d/    voiced consonants      /(z|v)ed\b/gi;
/d/    vowel                  /(ow|i|ay)ed\b/gi;
Pronunciation of “ed” is dictated by the sound of the preceding letter(s).
| – Boolean “or”, so x|y means either x or y
d|ted means d or ted, but by adding brackets (d|t)ed means ded or ted
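The table can be sketched as a small classifier, assuming each pattern ends in a `\b` word boundary. The patterns work on spelling rather than sound, so they are only an approximation (for example, the "s" in "raised" is pronounced /z/ but is matched by the /t/ rule); `classifyEd` is a hypothetical name.

```javascript
// Patterns from the table, tried in order.
const ED_RULES = [
  { pron: "/id/", pattern: /(d|t)ed\b/i },
  { pron: "/t/", pattern: /(s|f)ed\b/i },
  { pron: "/d/", pattern: /(z|v)ed\b/i },
  { pron: "/d/", pattern: /(ow|i|ay)ed\b/i },
];

// Returns the predicted pronunciation of "ed", or null if no rule applies.
function classifyEd(word) {
  const rule = ED_RULES.find((r) => r.pattern.test(word));
  return rule ? rule.pron : null;
}

console.log(classifyEd("wanted"));  // "/id/"
console.log(classifyEd("missed"));  // "/t/"
console.log(classifyEd("played"));  // "/d/"
console.log(classifyEd("learned")); // null (not covered by these four patterns)
```

Extending the alternations, e.g. adding more voiceless consonants to `(s|f)`, is a natural exercise for students.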
28. Pronunciation of “th”
Pron   Feature                Potential regex
/ð/    Voiced initial th      /\btha(n|t)\b/gi;
                              /\bthe(\b|ir|m|re|se|y)\b/gi;
                              /\bthis\b/gi;
                              /\btho(se|ugh|)\b/gi;
                              /\bthus\b/gi;
/θ/    Voiceless initial th   /\bth/gi;
/t/    th pronounced as t     /\b(thomas|thames|thyme)\b/gi;
Pronunciation of “th” can be predicted by the rule that, in function words,
the initial th is voiced.
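A sketch of how these patterns could be combined, assuming they are meant to read with `\b` word boundaries: exceptions pronounced /t/ are checked first, then the voiced function words, and any remaining initial th is treated as voiceless. The ordering and the name `classifyTh` are assumptions, and function words outside the listed patterns (e.g. "therefore") would be misclassified.

```javascript
const TH_AS_T = /\b(thomas|thames|thyme)\b/i;
const VOICED = [
  /\btha(n|t)\b/i,
  /\bthe(\b|ir|m|re|se|y)\b/i,
  /\bthis\b/i,
  /\btho(se|ugh|)\b/i,
  /\bthus\b/i,
];
const ANY_INITIAL_TH = /\bth/i;

// Returns "/t/", "/ð/", "/θ/", or null when the word does not start with th.
function classifyTh(word) {
  if (TH_AS_T.test(word)) return "/t/";
  if (VOICED.some((p) => p.test(word))) return "/ð/";
  if (ANY_INITIAL_TH.test(word)) return "/θ/";
  return null;
}

console.log(classifyTh("they"));   // "/ð/"
console.log(classifyTh("theory")); // "/θ/"
console.log(classifyTh("Thames")); // "/t/"
```

The trailing `\b` is what separates "than" from "thank" and "though" from "thought".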
29. References
Blake, J. (2012, November 28-30). Corpus-based academic written error
detector. Conference proceedings of the 20th International Conference on
Computers in Education. Nanyang Technological University, Singapore.
Blake, X. and Blake, J. (2015, January 29-31). Academic literacy: Mentor and
mentee perspectives. Poster presented at 35th International Conference of
ThaiTESOL, Bangkok, Thailand.
Morrall, A. (2000-2014). Common Error Detector. [Online tool]
http://www2.elc.polyu.edu.hk/cill/errordetector.htm