Presentation given at the 2015/16 ETJ English Language Teaching Expo at Kanda Institute of Foreign Languages, Tokyo, Japan (Jan 30-31).
Note: Slide 26 should say "Sentence Corpus of Remedial English", not "Score Corpus of Remedial English"
2. What is Data-Driven Learning?
It is an approach where “the language-learner is also, essentially, a research worker whose learning
needs to be driven by access to linguistic data - hence the term ‘data-driven learning’ (DDL) to
describe the approach” (Johns 1991, p.2).
It is an approach where real language data are investigated by learners, and learner-centered
activities focus on language discovery (Smart, 2014).
It can be conceived of as corpus linguistics being applied to second- and foreign-language learning.
3. What is corpus linguistics and what are corpora?
Types of corpora
There are many types of corpora, including, but not limited to:
○ General
○ Specialized (academic, business, literary, etc.)
○ Pedagogic
○ Learner
Corpus linguistics is the study of language based on analysis of corpora. Corpora are systematically
compiled collections of language data.
Corpus linguistics is an approach or methodology, rather than a theory or branch of linguistics.
4. How are corpora analyzed?
Corpora are analyzed through the use of specialized software that can search them quickly for
specific kinds of information, and the information can be presented as concordances, frequency
counts, mutual information scores, etc.
Some software is browser-based, and sometimes associated with particular corpora, such as the
BYU-COCA interface.
Some software is stand-alone, such as WordSmith and AntConc.
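To make the three outputs named above more concrete, here is a minimal Python sketch of a keyword-in-context concordance, a frequency count, and a simple mutual information score for adjacent word pairs. It is an illustration only: the sample text, the function names, and the log-base-2 pointwise formula are assumptions made for this example, and tools such as WordSmith or AntConc may compute their scores differently.

```python
import math
import re
from collections import Counter

# Tiny made-up sample text standing in for a real corpus (an assumption).
SAMPLE = (
    "She made a decision quickly. He made a mistake yesterday. "
    "They made a decision together. We did our homework. "
    "He did a favor for me. She made a mistake again."
)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on each side of a hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left.rjust(25)}  [{tok}]  {right}")
    return lines


def pmi(tokens, w1, w2):
    """Pointwise mutual information (log base 2) of w1 immediately followed by w2."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair = bigrams[(w1, w2)]
    if pair == 0:
        return float("-inf")  # the pair never occurs in this sample
    p_pair = pair / (n - 1)
    return math.log2(p_pair / ((unigrams[w1] / n) * (unigrams[w2] / n)))


tokens = tokenize(SAMPLE)
frequencies = Counter(tokens)

for line in concordance(tokens, "decision"):
    print(line)
print(frequencies.most_common(3))
print(pmi(tokens, "made", "a"))
```

Aligning the keyword in the middle of each line is what lets a learner scan down the column and notice what tends to appear to its left and right.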
5. What’s involved in DDL?
The basic components:
- Authentic language data
+ Corpora
+ Analysis
- Constructivist approach
- Inductive learning
- Learner-centered
Especially good for:
- Vocabulary depth and usage (Shaw 2011)
+ morphology
+ semantic prosody, semantic preference
+ part of speech knowledge
- Collocations (O’Keeffe et al. 2007)
- Phraseology (Römer 2009)
- Common or recurring error correction (Frankenberg-Garcia 2014; Smart 2014)
- Improving retention and recall (Cobb 1999; Sonbul & Schmitt 2010)
6. The ‘traditional’ view of DDL
“At the heart of the approach is the use of the machine not as a surrogate teacher or tutor, but as a
rather special type of informant. The difference between teacher and informant can best be defined in
terms of the flow of questions and answers. The teacher typically asks a question (answer already
known) to check that learning has taken place: the learner attempts to answer that question: and the
teacher gives feedback on whether the question has been successfully answered. ... The informant, on
the other hand is passive - and silent - until a question (answer unknown) is asked by the learner. The
informant responds to that question as best he (or she) can: and the learner then tries to make sense
of that response (possibly asking other questions in order to do so) and to integrate it with what is
already known.”
(Johns 1991, p.1)
7. The ‘traditional’ view of DDL, ctd.
“If we wish to use the computer as an informant there is, however, an alternative to a rule-based
approach which attempts to encapsulate linguistic ‘competence’, and that is a data-driven
approach which gives the learner access to the facts of linguistic ‘performance’. If we take this
second approach we do not attempt to make the system intelligent: we simply provide the
evidence needed to answer the learner's questions, and rely on the learner's intelligence to find
answers.”
(Johns 1991, p.2)
20. The tools and software can be problematic
- most corpora are designed for linguists
- the format of menus and the presentation of data can be overwhelming
- how to formulate queries and searches that will give fruitful results is not intuitive
- concordances and other outputs of most software are not very readable without training
- understanding the relevance of various corpora, search options, and analysis tools requires
practice and familiarity
- general technological anxiety can be exacerbated by the complexity of the tools and processes
21. Other concerns
- unfamiliarity with inductive learning in the classroom
- unfamiliarity or discomfort with learner-centered and learner-directed activity
- lack of motivation to try new learning strategies, techniques, or tools
- lack of confidence in technological and computer know-how
- expectation that answers should be ‘given’
22. Managing issues
Some ideas for ameliorating the difficulties novices face in DDL:
- guided induction
- introduce learners to corpora in their native language(s)
- parallel corpora
- corpora and software specially designed for learners (not linguists); pedagogic corpora
- paper-based DDL or hands-off DDL
- blend DDL activity with other kinds of activity
- pre-set search parameters
- edit concordances and other software output for readability purposes
- alternative sources of linguistic data (data-driven, not corpus-driven)
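Two of the ideas above, pre-set search parameters and editing concordance output for readability, can be sketched together in a few lines of Python. This is a hypothetical illustration, not a description of any existing tool: the settings dictionary, the example sentences, and the function name are all invented here. The teacher fixes the search in advance, and the script keeps only short sentences and lines up the keyword so the result could be printed as a handout.

```python
import re

# Hypothetical teacher-chosen settings; learners do not need to edit these.
PRESET = {"keyword": "made", "max_words": 8, "max_lines": 3}

# Made-up example sentences standing in for raw corpus search results.
SENTENCES = [
    "She made a decision quickly.",
    "The committee, which had been meeting for several hours, finally made "
    "a recommendation to the board.",
    "He made a mistake yesterday.",
    "We did our homework.",
    "They made a promise to each other.",
    "I made an effort to arrive early.",
]


def prepare_handout(sentences, keyword, max_words, max_lines):
    """Keep short sentences containing the keyword and align the keyword in a column."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = []
    for sentence in sentences:
        match = pattern.search(sentence)
        # Drop sentences without the keyword and sentences that are too long.
        if match and len(sentence.split()) <= max_words:
            hits.append((sentence[:match.start()], match.group(0), sentence[match.end():]))
    hits = hits[:max_lines]  # limit how many lines learners have to read
    pad = max((len(left) for left, _, _ in hits), default=0)
    # Right-align the left context so the capitalized keyword forms one column.
    return [f"{left.rjust(pad)}{kw.upper()}{right}" for left, kw, right in hits]


handout = prepare_handout(SENTENCES, **PRESET)
for line in handout:
    print(line)
```

Limiting sentence length and line count is one simple way to reduce the amount of data novices face; a teacher could also hand-pick or simplify lines instead.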
23. A closer look
+ Guided induction
+ Pedagogic corpora
+ Paper-based or hands-off DDL
+ Blending DDL activity with other activity
24. Guided induction
GI is “an approach that provides a structured, scaffolded framework for inductive learning, places the
learner at the center of the learning task, with the learner seeking to discover the nature of the
grammar structure through interacting with the language.”
(Smart 2014, p.187)
+ Teacher-facilitated discovery learning
+ Learners develop generalizable abilities
+ Highly interactive
25. Guided induction steps
1. Illustration: looking at data.
2. Interaction: discussion and sharing observations and opinions.
3. Intervention: optional step to provide learners with hints or clearer guides for induction.
4. Induction: making one’s own rule for a particular feature.
(Flowerdew 2009)
26. Pedagogic corpora
- designed for Japanese learners of English
- simple interface
- appropriate for hands-on DDL
- individual questions, individual look-ups
- needs-driven corpus (Braun 2007)
SCoRE (Sentence Corpus of Remedial English) is a corpus and browsing tool developed to align with the
proficiency level(s) of learners. It focuses on helping learners understand basic grammar items.
(Chujo & Oghigian 2015; Chujo, Oghigian, & Akasegawa 2015)
33. Paper-based or hands-off DDL
“eliminating the computer from the equation, far from fatally undermining
the conceptual basis of DDL, can in fact make the learners’ task
considerably easier. In particular, it alleviates a number of methodological
difficulties … Learners such as ours need the scaffolding that prepared
materials can provide and may also initially feel that paper-based
resources are more relevant or efficient”.
(Boulton 2010, p.559)
37. Blend DDL activity with other activity
- Treat DDL as just another tool or option
- Supplement and complement other kinds of activity
- Relevance and engagement
- Hands-on and hands-off DDL
DDL “can be used to boost incidental learning, to promote learner autonomy and to create customized
exercises for a specific group of learners on the fly, as the need arises. ... it is perfectly feasible
to use corpora for teaching languages without disrupting the normal classroom routine.”
(Frankenberg-Garcia 2012, p.46)
40. References
Braun, S., 2007. ‘Integrating corpus work into secondary education: From data-driven learning to needs-driven corpora’. ReCALL, 19(03), pp.307-328.
Boulton, A., 2010. ‘Data‐driven learning: Taking the computer out of the equation’. Language Learning, 60(3), pp.534-572.
Chujo, K. and Oghigian, K., 2015. ‘Modified authenticity: A sentence corpus and grammar search tool for L2 beginners’. Available at http://www.decode.waseda.ac.jp/announcement/documents-for-2015-12-11-12/Chujo&Oghigian.pdf
Chujo, K., Oghigian, K. and Akasegawa, S., 2015. ‘A corpus and grammatical browsing system for remedial EFL learners’ in Lenko-Szymanska, A. and Boulton, A. (eds.) Multiple Affordances of Language Corpora for Data-driven Learning. Amsterdam: John Benjamins, pp.109-128.
Cobb, T., 1999. ‘Breadth and depth of lexical acquisition with hands-on concordancing’. Computer Assisted Language Learning, 12(4), pp.345-360.
Flowerdew, L., 2009. ‘Applying corpus linguistics to pedagogy: A critical evaluation’. International Journal of Corpus Linguistics, 14(3), pp.393-417.
Frankenberg-Garcia, A., 2012. ‘Integrating corpora with everyday language teaching’ in Thomas, J.E. and Boulton, A. (eds.) Input, Process and Product: Development in Teaching and Language Corpora. Brno: Masaryk University Press, pp.36-53.
Frankenberg-Garcia, A., 2014. ‘The use of corpus examples for language comprehension and production’. ReCALL, 26(02), pp.128-146.
Johns, T., 1991. ‘Should you be persuaded: Two examples of data-driven learning’ in Johns, T. and King, P. (eds.) Classroom Concordancing. Birmingham: English Language Research Journal, 4, pp.1-13.
O'Keeffe, A., McCarthy, M. and Carter, R., 2007. From corpus to classroom: Language use and language teaching. Cambridge: Cambridge University Press.
Römer, U., 2009. ‘Corpus research and practice: What help do teachers need and what can we offer?’ in Aijmer, K. (ed.) Corpora and Language Teaching. Amsterdam: John Benjamins, pp.83-98.
Shaw, E.M., 2011. Teaching vocabulary through data-driven learning. Available at http://corpus.byu.edu/coca/files/Teaching_Vocabulary_Through_DDL.pdf
Smart, J., 2014. ‘The role of guided induction in paper-based data-driven learning’. ReCALL, 26(02), pp.184-201.
Sonbul, S. and Schmitt, N., 2010. ‘Direct teaching of vocabulary after reading: Is it worth the effort?’ ELT Journal 64(3), pp.253-260.