A Corpus-based Approach to Tracking L2 Development 
Xiaofei Lu 
Center for Advanced Language Proficiency 
Education and Research 
The Pennsylvania State University 
November 20, 2009 CALPER at Penn State

Outline
 
Corpora and learner corpora 
 
Graphic Online Language Diagnostic (GOLD)

Corpora and learner corpora
 
What is a corpus 
 
Types of corpora 
 
Learner corpus design 
 
Learner corpora and L2 development

What is a corpus
 
Leech (1992): 
 
an unexciting phenomenon, a helluva lot of text, stored on a computer 
 
Sinclair (1991, 2004): 
 
a collection of naturally-occurring language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research

Types of corpora
 
General-purpose vs. specialized corpora 
 
The British National Corpus 
 
Michigan Corpus of Academic Spoken English 
 
Synchronic vs. diachronic corpora 
 
Spoken vs. written corpora 
 
Native vs. learner corpora 
 
International Corpus of Learner English

Learner corpus design
 
Purpose and type of corpus 
 
Cross-sectional vs. longitudinal 
 
Spoken vs. written 
 
Representativeness and size

Learner corpus design (cont.)
 
External criteria for text selection 
 
Communicative function of the text 
 
Mode, medium, interaction, genre 
 
Encoding meaningful metadata information 
 
Learner: L1, gender, program level, discipline … 
 
Sample: date, mode, task, genre, rating … 
 
Facilitates contrastive and longitudinal studies
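The learner and sample metadata listed above could be modeled roughly as follows. This is a minimal sketch, not GOLD's actual data model: the class names, field names, and the helper functions `longitudinal_series` and `contrastive_groups` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Learner:
    # Hypothetical learner-level metadata, following the slide's list.
    learner_id: str
    l1: str
    gender: str
    program_level: str
    discipline: str


@dataclass(frozen=True)
class Sample:
    # Hypothetical sample-level metadata plus the text itself.
    learner: Learner
    collected: date
    mode: str
    task: str
    genre: str
    rating: float
    text: str


def longitudinal_series(samples, learner_id):
    """Return one learner's samples in chronological order."""
    own = [s for s in samples if s.learner.learner_id == learner_id]
    return sorted(own, key=lambda s: s.collected)


def contrastive_groups(samples, attribute):
    """Group samples by a learner attribute such as 'l1' or 'program_level'."""
    groups = {}
    for s in samples:
        groups.setdefault(getattr(s.learner, attribute), []).append(s)
    return groups
```

Recording the collection date on every sample is what makes a longitudinal view possible, and recording learner attributes is what makes contrastive grouping possible.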
Learner corpora and L2 development 
 
Samples from same students at different times 
 
Did (targeted) language development take place? 
 
Was a particular pedagogical intervention effective? 
 
Samples from different students 
 
In what areas do students show different levels of development?
 
What factors affect students’ language development?

Graphic Online Language Diagnostic
 
A free online tool for teachers to assess their students’ language development 
 
Developed at CALPER, Penn State; funded by the U.S. Department of Education (DOE)
 
Project co-directors: Xiaofei Lu and Michael McCarthy 
 
Teachers can use GOLD to 
 
Compile, upload, and manage their own corpora 
 
Share corpora with each other 
 
Search and analyze corpora

Graphic Online Language Diagnostic
Please note: GOLD is free for language educators to use. Teachers need to register and apply for access, providing the name of their institution. We will verify that your name appears in the school’s directory. GOLD is explicitly not for commercial use.

Corpus compilation
 
A user can compile a corpus by 
 
Directly compiling and uploading an XML file 
 
Using the easy-to-use guided XML creation interface 
 
An uploaded corpus can be easily managed 
 
Documents can be added or deleted 
 
The whole corpus can be deleted 
 
Content and metadata of individual documents can be easily accessed
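For users who compile the XML file themselves, the sketch below shows one way to generate such a file with Python's standard library. The slides do not give GOLD's XML schema, so the element and attribute names used here (`corpus`, `document`, `metadata`, `field`, `text`) are assumptions for illustration only; the real format should be taken from GOLD's own documentation.

```python
import xml.etree.ElementTree as ET


def build_corpus_xml(corpus_name, documents):
    """Serialize documents to an XML string (hypothetical schema, not GOLD's)."""
    root = ET.Element("corpus", {"name": corpus_name})
    for doc in documents:
        doc_el = ET.SubElement(root, "document", {"id": doc["id"]})
        # One <field name="..."> element per metadata key.
        meta_el = ET.SubElement(doc_el, "metadata")
        for key, value in doc["metadata"].items():
            ET.SubElement(meta_el, "field", {"name": key}).text = str(value)
        # ElementTree escapes characters such as & and < on output.
        ET.SubElement(doc_el, "text").text = doc["text"]
    return ET.tostring(root, encoding="unicode")
```

Building the file through an XML library rather than by string concatenation keeps learner texts that contain characters such as `&` or `<` well-formed.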
Corpus sharing 
 
GOLD facilitates easy data sharing 
 
A corpus may be set to be 
 
Private, shared, or public 
 
The corpus owner may give other users the right to
 
View, add, edit, or delete corpora

Basic corpus information
 
Word count 
 
Alphabetic or numeric order 
 
Can be downloaded as a text file 
 
Corpus and document statistics 
 
Mean sentence length 
 
Mean word length 
 
Type-token ratio
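The word list and the three statistics named above can be computed along these lines. This is a rough sketch under simplifying assumptions, not GOLD's implementation: the tokenizer is a plain regular expression over letters, sentences are split on `.`, `!`, and `?`, and all function names are hypothetical.

```python
import re
from collections import Counter

# Simplifying assumptions: a word is a run of letters (with optional internal
# apostrophes or hyphens); a sentence ends at ., ! or ?.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")
SENTENCE_END_RE = re.compile(r"[.!?]+")


def tokenize(text):
    return [w.lower() for w in WORD_RE.findall(text)]


def split_sentences(text):
    # Keep only pieces that contain at least one word.
    return [s for s in SENTENCE_END_RE.split(text) if tokenize(s)]


def word_list(text, by="frequency"):
    """Word counts in alphabetic order or in descending numeric order."""
    counts = Counter(tokenize(text))
    if by == "alphabetic":
        return sorted(counts.items())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def corpus_statistics(text):
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no words")
    sentences = split_sentences(text)
    types = set(tokens)
    return {
        "tokens": len(tokens),
        "types": len(types),
        "mean_sentence_length": len(tokens) / len(sentences),  # words per sentence
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),  # letters per word
        "type_token_ratio": len(types) / len(tokens),
    }
```

Type-token ratio tends to fall as texts get longer, so comparisons are most meaningful between samples of similar length.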
Corpus search 
 
Select one or more corpora to search 
 
Specify key words or phrases 
 
May use the wildcard character, e.g. book* 
 
Specify contexts 
 
Size of context window 
 
Context words and their positions 
 
Specify metadata conditions
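The search steps above can be sketched as a small keyword-in-context (KWIC) function. This is an illustration under stated assumptions rather than GOLD's search engine: documents are plain dictionaries with hypothetical `id`, `metadata`, and `text` keys, tokens are split on whitespace, only single-word queries are handled, and `*` is taken to match any run of characters inside a word.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a query such as 'book*' into a regex; '*' matches any characters."""
    parts = [re.escape(p) for p in pattern.lower().split("*")]
    return re.compile(".*".join(parts))


def kwic_search(documents, pattern, window=4, metadata=None):
    """Return (left context, keyword, right context, document id) tuples."""
    regex = wildcard_to_regex(pattern)
    hits = []
    for doc in documents:
        # Skip documents that fail any of the metadata conditions.
        if metadata and any(doc["metadata"].get(k) != v for k, v in metadata.items()):
            continue
        tokens = doc["text"].split()
        for i, token in enumerate(tokens):
            # Compare against the token without surrounding punctuation.
            if regex.fullmatch(token.lower().strip(".,;:!?\"'()")):
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, token, right, doc["id"]))
    return hits
```

Because each hit is a tuple, the result can be sorted by left context, keyword, or right context with `sorted(hits, key=...)`, which is the usual way a KWIC display is made sortable.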
Corpus search results 
 
Display of search results 
 
Sortable KWIC display of search results 
 
Sortable graphic display of search results

Lexical bundle/collocation search
 
Procedure 
 
Select one or more corpora to search 
 
Specify search word 
 
Specify contexts 
 
Specify metadata conditions 
 
Search results 
 
Sortable list of n-grams found in selected corpora
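A sortable n-gram list of the kind described above could be produced as follows. This is a minimal sketch, assuming the text has already been tokenized into a list of words; the function name `ngrams_with_word` is hypothetical and the slides do not describe how GOLD itself extracts n-grams.

```python
from collections import Counter


def ngrams_with_word(tokens, word, n):
    """Count the n-grams that contain `word`, most frequent first."""
    word = word.lower()
    tokens = [t.lower() for t in tokens]
    counts = Counter()
    # Slide a window of n tokens across the text.
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if word in gram:
            counts[gram] += 1
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Running it with several values of `n` (for example 2 through 5) gives candidate lexical bundles of different lengths around the same search word.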
Summary of features 
 
Difference from other online tools 
 
Can create, share, and search multiple corpora 
 
Can easily search subsets of data 
 
Can work with any language 
 
Summary of corpus analysis functions 
 
Word list 
 
Corpus and document statistics: mean sentence length, mean word length, type-token ratio 
 
Corpus search and collocation search

Sample questions to ask
 
With data from an individual student, one can either describe or track development in 
 
Patterns of usage of words and phrases – frequency, underuse, overuse, etc.
 
Lexical and syntactic complexity 
 
Appropriate usage of words and phrases in context 
 
Patterns of usage of lexical bundles

Sample questions to ask (cont.)
 
With data from different (groups of) students, one can compare similarities or differences among different (groups of) students in terms of 
 
Patterns of usage of words and phrases – frequency, underuse, overuse, etc.
 
Lexical and syntactic complexity 
 
Appropriate usage of words and phrases in context 
 
Patterns of usage of lexical bundles
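One common way to compare how often two groups use a word is to normalize raw counts to a rate per 1,000 words, so that corpora of different sizes can be set side by side. The sketch below illustrates that idea only; the function names are hypothetical, the inputs are assumed to be pre-tokenized word lists, and no significance testing is attempted.

```python
from collections import Counter


def per_thousand(tokens, targets):
    """Frequency of each target word per 1,000 tokens."""
    if not tokens:
        raise ValueError("token list is empty")
    counts = Counter(t.lower() for t in tokens)
    return {w: 1000 * counts[w] / len(tokens) for w in targets}


def compare_usage(group_a, group_b, targets):
    """Map each target word to (rate in A, rate in B, difference A - B)."""
    rates_a = per_thousand(group_a, targets)
    rates_b = per_thousand(group_b, targets)
    return {w: (rates_a[w], rates_b[w], rates_a[w] - rates_b[w]) for w in targets}
```

A positive difference only shows that group A uses the word more often than group B; calling it overuse or underuse presupposes a reference corpus to benchmark against.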
Future enhancements 
 
Corpora for benchmarking 
 
Multilingual natural language processing 
 
Suggestions on desirable functions welcome

How to learn more about GOLD
 
CALPER’s Corpus Portal 
 
http://calper.la.psu.edu/corpus_portal/ 
 
Links to GOLD and other corpus-related resources 
Follow us on Facebook 
http://www.facebook.com/CALPERPA 
Follow us on Twitter 
http://www.twitter.com/CALPERPA