A presentation by Dr. Xiaofei Lu on the Graphic Online Language Diagnostic (GOLD) tool developed by the Center for Advanced Language Proficiency Education and Research (CALPER) at The Pennsylvania State University.
1. A Corpus-based Approach to Tracking L2 Development
Xiaofei Lu
Center for Advanced Language Proficiency Education and Research
The Pennsylvania State University
November 20, 2009 CALPER at Penn State
2. Outline
Corpora and learner corpora
Graphic Online Language Diagnostic (GOLD)
3. Corpora and learner corpora
What is a corpus
Types of corpora
Learner corpus design
Learner corpora and L2 development
4. What is a corpus
Leech (1992):
an unexciting phenomenon, a helluva lot of text, stored on a computer
Sinclair (1991, 2004):
a collection of naturally-occurring language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research
5. Types of corpora
General-purpose vs. specialized corpora
The British National Corpus
Michigan Corpus of Academic Spoken English
Synchronic vs. diachronic corpora
Spoken vs. written corpora
Native vs. learner corpora
International Corpus of Learner English
6. Learner corpus design
Purpose and type of corpus
Cross-sectional vs. longitudinal
Spoken vs. written
Representativeness and size
7. Learner corpus design (cont.)
External criteria for text selection
Communicative function of the text
Mode, medium, interaction, genre
Encoding meaningful metadata information
Learner: L1, gender, program level, discipline …
Sample: date, mode, task, genre, rating …
Facilitates contrastive and longitudinal studies
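As a rough illustration of the metadata categories above, the sketch below stores learner and sample attributes alongside each text and filters on them. All field names and sample values are invented for illustration; they are assumptions, not GOLD's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Sample:
    # Learner-level metadata (hypothetical field names)
    learner_id: str
    l1: str
    program_level: str
    # Sample-level metadata (hypothetical field names)
    written_on: date
    mode: str
    genre: str
    text: str


def select(samples, **conditions):
    """Return the samples whose metadata match every given condition."""
    return [s for s in samples
            if all(getattr(s, key) == value for key, value in conditions.items())]


# Invented example data, not taken from any real learner corpus.
samples = [
    Sample("s01", "Chinese", "intermediate", date(2009, 9, 1), "written", "essay",
           "I like read books."),
    Sample("s01", "Chinese", "intermediate", date(2009, 11, 1), "written", "essay",
           "I like reading books."),
    Sample("s02", "Korean", "advanced", date(2009, 9, 1), "written", "essay",
           "Books are my favorite."),
]

# Longitudinal view: one learner's samples in chronological order.
longitudinal = sorted(select(samples, learner_id="s01"), key=lambda s: s.written_on)

# Contrastive view: samples grouped by the learners' L1.
by_l1 = {l1: select(samples, l1=l1) for l1 in {s.l1 for s in samples}}
```

Keeping the metadata next to the text is what makes both views a matter of filtering rather than re-collecting data.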
8. Learner corpora and L2 development
Samples from same students at different times
Did (targeted) language development take place?
Was a particular pedagogical intervention effective?
Samples from different students
In what areas do students show different levels of development?
What factors affect students’ language development?
9. Graphic Online Language Diagnostic
A free online tool for teachers to assess their students’ language development
Developed at CALPER, Penn State, funded by DOE
Project co-directors: Xiaofei Lu and Michael McCarthy
Teachers can use GOLD to
Compile, upload, and manage their own corpora
Share corpora with each other
Search and analyze corpora
10. Graphic Online Language Diagnostic
Please note: GOLD is free for language educators to use. Teachers need to register and apply for access, providing the name of their institution. We will verify that your name appears in the school’s directory. GOLD is explicitly not for commercial use.
13. Corpus compilation
A user can compile a corpus by
Directly compiling and uploading an XML file
Using the easy-to-use guided XML creation interface
An uploaded corpus can be easily managed
Documents can be added or deleted
The whole corpus can be deleted
Content and metadata of individual documents can be easily accessed
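For teachers who prefer to prepare the XML file themselves, the sketch below shows one way to assemble a corpus file with Python's standard library. The element and attribute names (corpus, document, metadata, field, text) are hypothetical placeholders; GOLD's actual XML format is not described in these slides and should be checked before uploading.

```python
import xml.etree.ElementTree as ET


def build_corpus_xml(name, documents):
    """Build a corpus XML string from (metadata dict, text) pairs.

    Element and attribute names are hypothetical, not GOLD's format.
    """
    corpus = ET.Element("corpus", {"name": name})
    for metadata, text in documents:
        doc = ET.SubElement(corpus, "document")
        meta = ET.SubElement(doc, "metadata")
        for key, value in metadata.items():
            ET.SubElement(meta, "field", {"name": key}).text = str(value)
        ET.SubElement(doc, "text").text = text
    return ET.tostring(corpus, encoding="unicode")


# Invented example documents.
xml_string = build_corpus_xml(
    "fall_2009_essays",
    [
        ({"l1": "Chinese", "genre": "essay"}, "I like reading books."),
        ({"l1": "Korean", "genre": "essay"}, "Books are my favorite."),
    ],
)

# Round-trip check: the string parses back into a tree with two documents.
parsed = ET.fromstring(xml_string)
```

Building the tree with ElementTree rather than by string concatenation means special characters in learner texts are escaped automatically.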
14. Corpus sharing
GOLD facilitates easy data sharing
A corpus may be set to be
Private, shared, or public
The corpus owner may give other users the right to
View, add, edit, or delete corpora
34. Basic corpus information
Word count
Alphabetic or numeric order
Can be downloaded as a text file
Corpus and document statistics
Mean sentence length
Mean word length
Type-token ratio
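The three statistics above can be sketched in a few lines. This is a minimal illustration, assuming that sentence length is measured in words, word length in characters, and that tokens are found with a naive regular expression; GOLD's own tokenization and definitions may differ.

```python
import re


def corpus_statistics(text):
    """Compute mean sentence length, mean word length, and type-token ratio.

    Tokenization is deliberately naive (regex-based); a real tool would
    likely use a language-aware tokenizer.
    """
    # Split into sentences after ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Word tokens: runs of word characters, optionally with an apostrophe part.
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    if not tokens or not sentences:
        return {"mean_sentence_length": 0.0,
                "mean_word_length": 0.0,
                "type_token_ratio": 0.0}
    return {
        # Average number of word tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
        # Average number of characters per word token.
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
        # Distinct word forms (types) divided by running words (tokens).
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }


stats = corpus_statistics("The cat sat. The cat ran away.")
```

Note that the type-token ratio falls as texts get longer, so it is most informative when comparing samples of similar length.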
41. Corpus search
Select one or more corpora to search
Specify key words or phrases
May use the wildcard character, e.g. book*
Specify contexts
Size of context window
Context words and their positions
Specify metadata conditions
42. Corpus search results
Display of search results
Sortable KWIC display of search results
Sortable graphic display of search results
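To make the search and KWIC (key word in context) display concrete, here is a minimal sketch of a wildcard keyword search with a configurable context window. It is an illustration only, not GOLD's implementation: the function name kwic and the example sentence are invented, and the standard-library fnmatch module is used as a stand-in for the wildcard matching (it also gives special meaning to ? and [ ]).

```python
import fnmatch
import re


def kwic(text, pattern, window=3):
    """Return (left context, keyword, right context) for each match.

    `pattern` may contain the wildcard *, e.g. "book*".
    `window` is the number of context words kept on each side.
    """
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    hits = []
    for i, token in enumerate(tokens):
        # Case-insensitive match of the token against the wildcard pattern.
        if fnmatch.fnmatchcase(token.lower(), pattern.lower()):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


text = "She booked a room and read two books about bookkeeping in the library."
hits = kwic(text, "book*", window=2)

# Sort concordance lines by right-hand context, as a sortable KWIC view might.
hits_sorted = sorted(hits, key=lambda hit: hit[2].lower())
for left, keyword, right in hits_sorted:
    print(f"{left:>20}  {keyword:<12}  {right}")
```

Sorting by the left or right context groups similar usages together, which is what makes recurring patterns visible in a concordance.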
71. Lexical bundle/collocation search
Procedure
Select one or more corpora to search
Specify search word
Specify contexts
Specify metadata conditions
Search results
Sortable list of n-grams found in selected corpora
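The procedure above can be approximated as follows: collect every n-gram that contains the search word and rank the results by frequency. This is a simplified sketch with invented example texts; the function name is hypothetical, and unlike a production tool it lets n-grams cross punctuation and sentence boundaries.

```python
import re
from collections import Counter


def ngrams_with_word(texts, search_word, n=3):
    """Count the n-grams across `texts` that contain `search_word`."""
    counts = Counter()
    target = search_word.lower()
    for text in texts:
        tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
        # Slide a window of n tokens over the text.
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if target in gram:
                counts[" ".join(gram)] += 1
    return counts


# Invented example texts.
texts = [
    "On the other hand, the results are mixed.",
    "On the other hand, we found no effect.",
    "The other group improved.",
]
counts = ngrams_with_word(texts, "other", n=3)

# Most frequent first; ties broken alphabetically for a stable listing.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Recurring sequences such as "on the other" rise to the top of the ranked list, which is the basic idea behind identifying lexical bundles.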
81. Summary of features
Difference from other online tools
Can create, share, and search multiple corpora
Can easily search subsets of data
Can work with any language
Summary of corpus analysis functions
Word list
Corpus and document statistics: mean sentence length, mean word length, type-token ratio
Corpus search and collocation search
82. Sample questions to ask
With data from an individual student, one can either describe or track development in
Patterns of usages of words and phrases – frequency, underuse, overuse, etc.
Lexical and syntactic complexity
Appropriate usage of words and phrases in context
Patterns of usages of lexical bundles
83. Sample questions to ask (cont.)
With data from different (groups of) students, one can compare similarities or differences among different (groups of) students in terms of
Patterns of usages of words and phrases – frequency, underuse, overuse, etc.
Lexical and syntactic complexity
Appropriate usage of words and phrases in context
Patterns of usages of lexical bundles
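One simple way to quantify overuse or underuse is to compare a word's normalized frequency in two sets of texts, for example one group of students against another or against a reference corpus. The sketch below assumes frequencies per 1,000 tokens and a plain ratio; the function names and example texts are invented, and no significance test is applied.

```python
import re
from collections import Counter


def per_thousand(text):
    """Return each word's frequency per 1,000 tokens in `text`."""
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {word: 1000 * count / total for word, count in Counter(tokens).items()}


def usage_ratio(word, learner_text, reference_text):
    """Ratio of normalized frequencies: above 1 suggests overuse, below 1 underuse.

    Returns None when the word does not occur in the reference text.
    """
    learner = per_thousand(learner_text).get(word, 0.0)
    reference = per_thousand(reference_text).get(word, 0.0)
    return learner / reference if reference else None


# Invented example texts.
learner_text = "very good very nice very big house"
reference_text = "a very large house with a nice garden"

ratio_very = usage_ratio("very", learner_text, reference_text)
ratio_garden = usage_ratio("garden", learner_text, reference_text)
ratio_big = usage_ratio("big", learner_text, reference_text)
```

Normalizing by text length matters here because learner samples and reference texts rarely contain the same number of words.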
84. Future enhancements
Corpora for benchmarking
Multilingual natural language processing
Suggestions on desirable functions welcome
85. How to learn more about GOLD
CALPER’s Corpus Portal
http://calper.la.psu.edu/corpus_portal/
Links to GOLD and other corpus-related resources
Follow us on Facebook
http://www.facebook.com/CALPERPA
Follow us on Twitter
http://www.twitter.com/CALPERPA