Developing corpus-based resources for language learning: looking back in "hope"
1. Developing corpus-based
resources for language learning:
looking back in "hope"
Pascual Pérez-Paredes
English Studies, U. Murcia
www.perezparedes.es
@perezparedes
2. The future of Corpus Linguistics will
largely be determined by its
practical applications.
Guy Aston (2011:8)
Perspectives on Corpus Linguistics
3. More corpus and applied linguistics
presentations like this on
www.perezparedes.es
5. Developing corpus-based resources for
language learning: looking back in "hope"
1. Early steps
2. Growing up
3. Young adults
4. Maturity
6. Corpus linguistics & language
learning
1. Attested uses of language v. made-up, invented
language
2. Combinatory nature of language v. isolated accounts
of lexis and grammar
3. CL favours active learning and discovery
9. Corpus linguistics: L1 & L2
corpora
1. British National Corpus (BNC)
2. Corpus of Contemporary American English (COCA)
1. MICUSP corpus of essays: micusp.elicorpora.info/
12. Berglund, Y., Braun, S. & Pérez-Paredes, P. 2007. Multimedia
Corpora for Applied Linguistic Contexts. Birmingham Corpus
Linguistics Conference, 27-30 July 2007, Birmingham, UK.
Pérez-Paredes, P. 2010. Corpus Linguistics and Language Education
in Perspective: Appropriation and the Possibilities Scenario. In
Corpus Linguistics in Language Teaching, pp. 53-73.
13. The challenges (1)
CORPUS DESIGN
Traditional reference corpora
(content, size, data format,
transcription, annotation, query)
CORPUS EXPLOITATION
Data-Driven Learning
(focus on non-linear reading:
concordances and co-texts)
• Corpora contain textual records of discourse; their interpretation requires
(re-)contextualisation.
• Learners may have difficulties analysing corpus data; they require
pedagogical mediation.
• Pedagogical corpus uses differ from linguistic description; this requires e.g.
pedagogically motivated query options.
• Corpora need to be integrated with curricula; this requires e.g.
complementarity of content and effective delivery.
Traditional corpus design and exploitation do not fully support pedagogical requirements.
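The "non-linear reading" of concordances mentioned on this slide can be illustrated with a minimal keyword-in-context (KWIC) sketch. This is not the concordancer used in any of the projects discussed here; the function name `kwic`, the `width` parameter and the sample sentences are assumptions made up for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = r"\b%s\b" % re.escape(keyword)
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left co-text so the keyword forms a vertical column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text, not taken from any corpus.
sample = (
    "I hope to study abroad next year. My parents hope I will stay, "
    "but there is little hope of that."
)

for line in kwic(sample, "hope", width=20):
    print(line)
```

Reading down the keyword column rather than across each line is what makes patterns such as "hope to" and "hope of" visible to the learner.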
14. The challenges (2)
CORPUS DESIGN
Traditionally: representation
in written format
CORPUS EXPLOITATION
Work with text-only data and e.g.
conversational markup
• Spoken discourse is more dependent on shared physical contexts.
• It is adjusted to aural and online perception (e.g. chunking)
• and affected by limitations of processing capacity (false starts, repair).
• It is marked by accents.
• It is multimodal.
Again, this does not fully support
pedagogical requirements.
15. Requirements
• Format: multimedia to retain multimodal character of spoken
language
• Content: complementary with curriculum topics, more
coherence than in traditional corpora
• Pedagogically motivated transcription, annotation and
alignment (transcript-video)
• Combination of query methods: text-based exploration and
application of corpus techniques
• Pedagogical enrichment of corpora with complementary
resources (e.g. exercises, explanations)
• Effective delivery of corpora and additional resources to
learners/teachers
16. Corpus creation (1)
ELISA
• Professional English
• Accounts of professional life
• Different varieties
SACODEYL
• 7 European languages
• Youth language corpora
• Two age groups
• Examples: ELISA and SACODEYL
• Interview format
• Video clips with transcript
• Communicatively relevant topics, e.g. in SACODEYL topics outlined in the
Common European Framework
• Elicitation process: briefing informants and prompting them during the
interview, ensuring naturally flowing discourse
17. Corpus creation (1)
Topic | Interview question | Age | CEF | Grammatical functions
Holidays | 1. Where did you spend your last holidays? | 13-15, 16-18 | A2: can describe past activities, personal experiences | Past tense
Holidays | 2. What are your plans for the next holidays? | 13-15 | B1: can describe dreams, hopes and ambitions | Future, Conditional, Modal verbs
Plans for the future | 1. What are your plans for your career? | 16-18 | B1: can explain/give reasons for my plans, intentions and actions | Future
Plans for the future | 2. On what grounds do you decide? | 16-18 | B2: can speculate about causes, consequences, hypothetical situations | Conditional, Modal verbs
Example of topics in SACODEYL
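A topic grid like the one above can be held as simple records so that interview questions are selected by age group or target grammar. The sketch below is only an illustration: the field names, the `prompts_for` helper and the assignment of age groups to rows are assumptions, not part of the SACODEYL tools.

```python
# Each record mirrors one row of the topic grid (field names are hypothetical).
PROMPTS = [
    {"topic": "Holidays",
     "question": "Where did you spend your last holidays?",
     "ages": ("13-15", "16-18"), "cef": "A2",
     "grammar": ("Past tense",)},
    {"topic": "Holidays",
     "question": "What are your plans for the next holidays?",
     "ages": ("13-15",), "cef": "B1",
     "grammar": ("Future", "Conditional", "Modal verbs")},
    {"topic": "Plans for the future",
     "question": "What are your plans for your career?",
     "ages": ("16-18",), "cef": "B1",
     "grammar": ("Future",)},
    {"topic": "Plans for the future",
     "question": "On what grounds do you decide?",
     "ages": ("16-18",), "cef": "B2",
     "grammar": ("Conditional", "Modal verbs")},
]

def prompts_for(age_group, grammar=None):
    """Select prompts for an age group, optionally filtered by grammar focus."""
    return [
        p for p in PROMPTS
        if age_group in p["ages"] and (grammar is None or grammar in p["grammar"])
    ]

for p in prompts_for("16-18", grammar="Conditional"):
    print(p["cef"], p["question"])
```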
21. [METADATA]
Title: La Unión Europea une a los ciudadanos
Date Recording: 2006-11-05
Date Transcription: 2007-02-02
Locale: I.E.S. Floridablanca, Murcia, España
Principal Investigator: Pascual Perez-Paredes
Researcher: Pascual Perez-Paredes
Transcriber: Encarnación Tornero Valero
Editor:
Autority: SACODEYL Project
ID:
Language: ES
MediaFileName: ES02.avi
Participants:
person: Chico
name:
role: Entrevistado
sex: Hombre
age: 16
description:
person: E
name: Andrés Mercader Rodríguez
role: Entrevistador
sex: Hombre
age: 32
description:
[/METADATA]
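A header of this kind can be read into a structured record so that interviews are filtered by language, participant age and so on. The sketch below parses only the flattened "Key: value" layout shown on this slide. It assumes wrapped values have been joined onto one line and says nothing about the file format the SACODEYL tools actually store. `parse_metadata` is a hypothetical helper.

```python
def parse_metadata(block):
    """Parse 'Key: value' lines between [METADATA] and [/METADATA]."""
    record = {"Participants": []}
    current = record  # dict that receives the next key/value pair
    for raw in block.splitlines():
        line = raw.strip()
        if not line or line in ("[METADATA]", "[/METADATA]"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # wrapped continuation lines are not handled in this sketch
        key, value = key.strip(), value.strip()
        if key == "Participants":
            continue  # section marker only
        if key == "person":
            # Each 'person' line starts a new participant record.
            current = {"person": value}
            record["Participants"].append(current)
        else:
            current[key] = value
    return record

# Abridged copy of the header shown on the slide.
sample_header = """[METADATA]
Title: La Unión Europea une a los ciudadanos
Date Recording:2006-11-05
Language:ES
MediaFileName:ES02.avi
Participants:
person:Chico
role: Entrevistado
age: 16
person: E
role: Entrevistador
age: 32
[/METADATA]"""

meta = parse_metadata(sample_header)
print(meta["Language"], [p["role"] for p in meta["Participants"]])
```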
26. Corpus query
• Query options will support text- and corpus-based exploration
and include e.g.
– Easy access to entire interviews
– A topic index supporting the analysis of similar sections
across interviews ("topic concordances")
– Other indices based on the annotation categories
– Ready-made data (e.g. frequency lists of each interview;
selective concordances)
– A concordancer for extended/advanced search; adapted to
pedagogical requirements
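Two of the query options listed above, the topic index and the per-interview frequency list, can be sketched in a few lines. The interview IDs, the transcript snippets and the function names below are invented for illustration and do not reproduce the SACODEYL query interface.

```python
import re
from collections import Counter, defaultdict

# Hypothetical annotated sections: (interview id, topic, transcript text).
SECTIONS = [
    ("EN01", "Holidays", "Last summer we went to Spain and we stayed near the beach."),
    ("EN02", "Holidays", "We stayed at home, but next year we hope to travel."),
    ("EN01", "Plans for the future", "I hope to study medicine."),
]

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def topic_index(sections):
    """Group sections by topic so similar passages can be compared across interviews."""
    index = defaultdict(list)
    for interview_id, topic, text in sections:
        index[topic].append((interview_id, text))
    return dict(index)

def frequency_list(sections, interview_id):
    """Return (word, count) pairs for one interview, most frequent first."""
    counts = Counter()
    for sid, _topic, text in sections:
        if sid == interview_id:
            counts.update(tokenize(text))
    return counts.most_common()

for interview_id, text in topic_index(SECTIONS)["Holidays"]:
    print(interview_id, text)
print(frequency_list(SECTIONS, "EN01")[:3])
```

Grouping by topic first and counting afterwards keeps the ready-made data tied to whole interviews, which fits the emphasis on coherent, text-based exploration.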
27. Corpus delivery
• Effective delivery as a further prerequisite for
integration into curriculum
• In SACODEYL, use of Moodle platform, giving
access to:
– Corpora (query interfaces)
– Resources (exercises and activities)
28. Summary
• Method outlined is transferable to other
pedagogical contexts, topics, languages
• Method helps to use corpora more efficiently
in pedagogical contexts – from sporadically
used resource to systematic exploitation
• Corpus creation complies with standards to
facilitate reuse of corpora for other contexts
(research)
35. TELL-OP is a Strategic Partnership that seeks to promote the
take-up of innovative practices in European language learning
(Data-Driven Learning, DDL) by supporting personalised learning
approaches that rely on ICT and OER. It brings together the
knowledge and expertise of European stakeholders in language
education, corpus and applied linguistics, e-learning and
knowledge engineering. The aim is to foster cooperation and to
help unlock the potential of already available web 2.0 services
for the personalised e-learning of languages in higher and adult
education, in particular through mobile devices.