June 14, 2013 | IALLT Conference
A Video Corpus for Language Learning
Open Source Tools & Materials from the Corpus-to-Classroom Project
Who we are
• Rachael Gilg
• Project Manager / Web Developer
• Arthur Wendorf
• Educational Technologist / Developer / Spanish Instructor
• Martí Quixal
• Computational Linguist / Developer / Spanish Instructor
• Almeida Jacqueline Toribio & Barbara E. Bullock
• Project Co-Directors
• Carl Blyth
• Director of COERLL
Agenda
1. Introduction to the Corpus-to-Classroom Project
2. Project results:
• The SpinTX Video Archive: a pedagogically-friendly
interface to the Spanish in Texas Corpus.
• Involving teachers in the development of open
educational resources.
• A model for open source corpus development.
Introduction to the Corpus-to-Classroom Project
Corpora in the Classroom: the promise
• Corpus = a large, structured collection of language
• Benefits for language learning:
• Naturalistic language use
• Motivation
• ‘Real’ language
• Discovery learning
Example: CORPUS DEL ESPAÑOL
Pros:
• View examples of language in context.
• Linguistic annotations enable searching
by part-of-speech, etc.
Cons:
• Designed for researchers, not educators.
• Limited utility to untrained end users.
• Content not openly licensed.
Example: YouTube
Pros:
• Engaging video content, much of it captioned.
• Many videos are openly licensed (CC-BY).
Cons:
• Searching is time-consuming.
• Content can disappear without warning.
• Sometimes blocked by K-12 schools.
Our two-pronged approach
• SpinTX: Corpus-to-Classroom
• Grant from the University of Texas Longhorn Innovation Fund for Technology (2012-2013)
• Spanish in Texas Video Corpus
• A project of COERLL, a National Foreign Language Resource Center (2010-2014)
Spanish in Texas Corpus
• Goals:
• make publicly available authentic data about variation in
Spanish as spoken in Texas
• for education
• for research
• encourage teachers/students/public to view local varieties
as a resource
A collection of sociolinguistic video interviews that
provide rich content for language learning.
Corpus-to-Classroom
• Goals:
• develop a pedagogically friendly interface for the Spanish in
Texas Corpus
• involve teachers and learners in the development of open
educational resources based on the corpus
• create a model for using open source tools and a pedagogical
interface that can be adapted for any language corpus
A searchable collection of pre-selected, corrected, annotated clips from the larger corpus
About the Corpus
Spanish in Texas Corpus                                 SpinTX Video Archive
92 sociolinguistic interview videos (avg. 30–45 min)    327 video clips from 33 speakers (avg. 1–4 min)
Transcribed (approx. 650,000 words)                     Transcribed (approx. 80,000 words)
Time-synced video caption files                         Time-synced video caption files
Tagged for linguistic features                          Tagged for linguistic and pedagogical features
Completely open (no registration
required, open CC license)
Teacher-friendly interface
The SpinTX Video Archive: a
pedagogically-friendly interface
to the Spanish in Texas Corpus
Needs assessment with educators
• How do you use authentic video in your teaching?
• How do you find videos to use? What problems do
you encounter?
• How can you imagine using the Spanish in Texas
videos in your classes?
Primary goals of the interface
• Enable educators to easily find and use videos that suit
the curriculum.
• Search by grammar point, theme, vocabulary, etc.
• Enable accessibility and content openness.
• Downloadable from open site with a license enabling remixing
• Enable educators to curate sets of videos for comparison
and study.
• Favoriting and tagging videos
• Provide access to supporting materials (lesson
plans, activity templates, etc.).
• Develop a community to share ready-made materials and
templates
Secondary goals of the interface
• Use the archive to develop materials for teacher training.
• Engage students as co-researchers.
Technical Overview of SpinTX Archive
• Drupal 7
• Taxonomy module integration
• Community tags module
• Apache Solr search engine
• Keyword search
• Faceted browsing
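A minimal sketch of how a keyword search with faceted browsing could be expressed as an Apache Solr request. The parameters q, fq, facet, facet.field and wt are standard Solr query parameters; the core URL and the field names grammar_tag and theme are assumptions made for illustration, not the actual SpinTX schema.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, keywords, facet_fields, filters=None):
    """Build a Solr select URL: keyword search plus facet counts."""
    params = [
        ("q", keywords),     # the keyword search itself
        ("wt", "json"),      # ask for a JSON response
        ("facet", "true"),   # switch faceting on
    ]
    # One facet.field parameter per field to be counted.
    for field in facet_fields:
        params.append(("facet.field", field))
    # Each selected facet value becomes a filter query (fq).
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"


url = build_solr_query(
    "http://localhost:8983/solr/spintx",  # hypothetical Solr core
    "familia",
    ["grammar_tag", "theme"],             # hypothetical field names
    {"theme": "food"},
)
print(url)
```

Using fq for the facet selections keeps them separate from the keyword query, so a user can narrow the results without changing what was typed in the search box.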
Ideas for future development
• Advanced search capability
• support for wildcards
• improved phrase searching
• improved “keyword in context” result view
• Data visualizations
• word and/or tag clouds
• language maps
• Enhanced word-level annotations
• hover over a word in a transcript and see all annotations
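The “keyword in context” result view mentioned above can be sketched as follows. This is only an illustration of the general technique, not the project's code; the sample sentence is made up and does not come from the corpus.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, match, right context) for each hit of keyword."""
    hits = []
    target = keyword.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Made-up sample sentence, split on whitespace for simplicity.
sample = "Yo trabajo para mi familia y por eso vivo aquí".split()
for left, match, right in kwic(sample, "para", window=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>20} [{match}] {right}")
```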
Formative evaluation of Beta version
Data collection methods:
• Online user survey (http://goo.gl/4Lbbg)
• Web analytics (navigation patterns, popular content)
• Search analytics
• User observation and feedback through ongoing
workshops and focus groups
Results of formative evaluation will drive future
development of the interface.
Involving Teachers in the
Development of OER
Workshops with Educators
• Summer 2012 Workshop
• ~100 secondary and college Spanish teachers
• Fall 2012 Working Group
• ~10 Univ. of Texas Spanish teachers
• Spring 2013 Workshops
• Multiple conferences & Univ. of Texas Spanish teachers
• Summer 2013 Working Group
• ~10 secondary and college Spanish teachers
Sample materials from the community (1)
Sample materials from the community (2)
• Idea from teacher workshop: Use videos for grammar
lessons to develop students' metalinguistic and critical
thinking skills as they pertain to language.
• Searched and selected clips for lesson on “por vs. para”.
• Lesson tested in heritage learners class.
• Anecdotal evidence that video lessons were effective and
motivating to students.
Current Templates
• Four templates:
• Cloze
• Data-Driven Learning (DDL)
• Variation
• Schema
Cloze Template
Cloze Template: Activity
Data-Driven Learning (DDL) Template
Data-Driven Learning (DDL) Template: Activity
Variation Template: Pre-class Preparation
Variation Template: Activity
Schema Template: Pre-class Preparation
Schema Template: Activity
Publication of OER
• Templates and community-developed lesson plans will be
available on the SpinTX website by August 2013
• We encourage the publication of videos on third-party
platforms for remixing educational content.
A Model for Open Source
Corpus Development
Sharing development practices and code
• Use of open source software and open APIs
• Custom code developed for the project
• Public GitHub repository: http://github.com/coerll
• Project documentation (research protocols, development
processes and methodologies, etc.):
• Corpus-to-Classroom Blog: http://sites.la.utexas.edu/corpus-to-classroom/
• “For Researchers” page on spanishintexas.org:
http://spanishintexas.org/for-researchers/
Recruit ‘locally’
• Recruit and train interns
• Institutional Review Board (IRB) training
• Video shooting and audio recording
• Practice interviews on site
• Recruit family, friends, acquaintances
• Any Spanish-speaking resident of TX
• Conduct interviews in their home communities
Interview protocol
• Sampling of a large set of questions (~75)
• from NPR StoryCorps (Historias)
• biographical information
• Average Length: 30-45 min.
• Language: Spanish and mixed
• Consent form and talent release
• Metadata on speaker and interviewer
• Google docs
Interview Metadata
Processing the Videos
• Intake interview materials
• create unique ID for video and forms
• archive raw video and remove from camera
• Video and transcript preparation
• Edit and export videos using Final Cut Pro
• Sound and image correction
• Upload to Automatic Sync to be transcribed by bilingual transcriber
• 3-5 day turnaround
• Approx $85 per hour of video
Original Transcript from Automatic Sync
Upload video and transcript to YouTube for syncing
Download SRT file
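Once the SRT file is downloaded, it needs to be read into a form that later steps can use. A minimal sketch, assuming the standard SubRip layout (a caption number, a "start --> end" timing line, then one or more text lines, with blank lines between captions). The Caption class and function names are hypothetical and the sample captions are invented.

```python
import re
from dataclasses import dataclass

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Caption:
    index: int
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def to_seconds(stamp):
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    h, m, s, ms = (int(g) for g in TIME_RE.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(content):
    """Split SRT content into Caption records, skipping malformed blocks."""
    captions = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->")
        text = " ".join(line.strip() for line in lines[2:])
        captions.append(
            Caption(int(lines[0].strip()), to_seconds(start), to_seconds(end), text)
        )
    return captions


# Invented sample, not taken from the corpus.
sample_srt = (
    "1\n00:00:01,000 --> 00:00:03,500\nHola, me llamo Ana.\n\n"
    "2\n00:00:03,500 --> 00:00:06,000\nVivo en Texas.\n"
)
captions = parse_srt(sample_srt)
```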
Prepare Transcript for TreeTagger
Run through TreeTagger
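TreeTagger's usual output is one token per line with tab-separated columns for the word form, the part-of-speech tag and the lemma. A minimal sketch for reading that output, assuming this three-column layout. The tags in the sample are illustrative and the names are hypothetical.

```python
from typing import NamedTuple


class TaggedToken(NamedTuple):
    word: str
    pos: str
    lemma: str


def parse_treetagger(output):
    """Read three-column, tab-separated tagger output into TaggedToken records."""
    tokens = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank lines and any markup lines passed through
        tokens.append(TaggedToken(*parts))
    return tokens


# Illustrative sample; the tags are assumptions, not verified tagger output.
sample_output = "Vivo\tVLfin\tvivir\nen\tPREP\ten\nTexas\tNP\tTexas\n"
tagged = parse_treetagger(sample_output)
```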
Combine Data from SRT File and
TreeTagger File, and add additional Tags
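One way to combine the two files is to walk through the captions in order and give each tagged token the timing of the caption it belongs to. The sketch below assumes the tagger tokenizes text the same way as the regular expression used here (words and single punctuation marks); a real pipeline would need a more careful alignment. Column names and sample data are hypothetical.

```python
import csv
import io
import re


def combine(captions, tagged_tokens):
    """captions: (start, end, text) tuples; tagged_tokens: (word, pos, lemma) tuples."""
    rows = []
    position = 0
    for start, end, text in captions:
        # Count the tokens in this caption, then take that many tagged tokens.
        count = len(re.findall(r"\w+|[^\w\s]", text))
        for word, pos, lemma in tagged_tokens[position:position + count]:
            rows.append(
                {"start": start, "end": end, "word": word, "pos": pos, "lemma": lemma}
            )
        position += count
    return rows


def to_csv(rows):
    """Serialize the combined rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["start", "end", "word", "pos", "lemma"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample data with illustrative tags.
captions = [(1.0, 3.5, "Vivo en Texas."), (3.5, 6.0, "Trabajo aquí.")]
tagged = [
    ("Vivo", "VLfin", "vivir"), ("en", "PREP", "en"), ("Texas", "NP", "Texas"),
    (".", "FS", "."), ("Trabajo", "VLfin", "trabajar"), ("aquí", "ADV", "aquí"),
    (".", "FS", "."),
]
rows = combine(captions, tagged)
print(to_csv(rows))
```

Additional tags could then be added as extra columns on the same rows.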
Manual clip selection and description
Divide CSV Files and Videos into Clips and
adjust Timings and Numberings
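Adjusting timings and numberings for a clip amounts to keeping the captions that fall inside the clip, shifting their times so the clip starts at zero, and renumbering them from 1. A minimal sketch under those assumptions; the function name and sample data are hypothetical.

```python
def cut_clip(captions, clip_start, clip_end):
    """captions: (index, start, end, text) tuples timed against the full interview.

    Returns the captions overlapping the clip, renumbered from 1, with times
    relative to the start of the clip.
    """
    clip = []
    for _, start, end, text in captions:
        if end <= clip_start or start >= clip_end:
            continue  # caption lies entirely outside the clip
        # Trim captions that straddle a clip boundary, then shift to clip time.
        new_start = max(start, clip_start) - clip_start
        new_end = min(end, clip_end) - clip_start
        clip.append((len(clip) + 1, round(new_start, 3), round(new_end, 3), text))
    return clip


# Invented sample captions.
full = [
    (1, 0.0, 4.0, "a"),
    (2, 4.0, 9.5, "b"),
    (3, 9.5, 15.0, "c"),
    (4, 15.0, 20.0, "d"),
]
clip = cut_clip(full, 4.0, 15.0)
```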
Automatic Pedagogical Annotation of Clips
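The slides do not describe how the pedagogical annotation is computed. As an illustration only, a rule-based approach could count tokens that match simple conditions on the word, tag or lemma and attach the matching labels to the clip. The rules, labels and tags below are hypothetical.

```python
from collections import Counter

# Hypothetical rules: pedagogical label -> test on (word, pos, lemma).
RULES = {
    "por_para": lambda word, pos, lemma: lemma in {"por", "para"},
    "finite_verb": lambda word, pos, lemma: pos == "VLfin",
}


def annotate_clip(tagged_tokens, min_count=1):
    """Return {label: count} for every rule matched at least min_count times."""
    counts = Counter()
    for word, pos, lemma in tagged_tokens:
        for label, matches in RULES.items():
            if matches(word.lower(), pos, lemma.lower()):
                counts[label] += 1
    return {label: n for label, n in counts.items() if n >= min_count}


# Invented sample clip with illustrative tags.
tokens = [
    ("Trabajo", "VLfin", "trabajar"), ("para", "PREP", "para"),
    ("mi", "PPO", "mi"), ("familia", "NC", "familia"),
    ("por", "PREP", "por"), ("eso", "DM", "eso"),
]
labels = annotate_clip(tokens)
```

Labels produced this way could feed a "search by grammar point" facet, for example to find clips for a lesson on "por vs. para".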
SpinTX Clip Data Published on GitHub
http://www.github.com/coerll
Questions?
Links
• SpinTX Video Archive:
http://www.spintx.org
• Spanish in Texas Corpus:
http://www.spanishintexas.org
• Slides from this Presentation will be posted at:
http://www.slideshare.net/spanish_in_texas
60

More Related Content

Similar to A Video Corpus for Language Learning: Open Source Tools & Materials from the Corpus-to-Classroom Project

OER: insights into a multilingual landscape - EUROCALL 2014 conference
OER: insights into a multilingual landscape - EUROCALL 2014 conference  OER: insights into a multilingual landscape - EUROCALL 2014 conference
OER: insights into a multilingual landscape - EUROCALL 2014 conference LangOER
 
ukas Bleichenbacher & Richard Rossner: Towards a Common European Framework fo...
ukas Bleichenbacher & Richard Rossner: Towards a Common European Framework fo...ukas Bleichenbacher & Richard Rossner: Towards a Common European Framework fo...
ukas Bleichenbacher & Richard Rossner: Towards a Common European Framework fo...eaquals
 
Designing for Diversity: Creating Learning Experiences that Travel the Globe
Designing for Diversity: Creating Learning Experiences that Travel the GlobeDesigning for Diversity: Creating Learning Experiences that Travel the Globe
Designing for Diversity: Creating Learning Experiences that Travel the GlobeUna Daly
 
AudioVisuals In the Disciplines: Developing libraries of recommended TV and r...
AudioVisuals In the Disciplines: Developing libraries of recommended TV and r...AudioVisuals In the Disciplines: Developing libraries of recommended TV and r...
AudioVisuals In the Disciplines: Developing libraries of recommended TV and r...Chris Willmott
 
Two Hot Topics in Online Language Learning: Corpus Linguistics and Telecollab...
Two Hot Topics in Online Language Learning: Corpus Linguistics and Telecollab...Two Hot Topics in Online Language Learning: Corpus Linguistics and Telecollab...
Two Hot Topics in Online Language Learning: Corpus Linguistics and Telecollab...acornrevolution
 
CCCOER Sept 24 advisory
CCCOER Sept 24 advisoryCCCOER Sept 24 advisory
CCCOER Sept 24 advisoryUna Daly
 
Kirsten Holt The material writer’s toolkit for success
Kirsten Holt The material writer’s toolkit for successKirsten Holt The material writer’s toolkit for success
Kirsten Holt The material writer’s toolkit for successeaquals
 
The Open Science training hub FOSTER Plus - new resources and courses
The Open Science training hub FOSTER Plus - new resources and coursesThe Open Science training hub FOSTER Plus - new resources and courses
The Open Science training hub FOSTER Plus - new resources and coursesMaria Antónia Correia
 
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...Alannah Fitzgerald
 
How Open Education Practices Support Student Centered Design & Accessibility
How Open Education Practices Support Student Centered Design & AccessibilityHow Open Education Practices Support Student Centered Design & Accessibility
How Open Education Practices Support Student Centered Design & AccessibilityUna Daly
 
Catering for linguistic domain specialisations through computer-assisted lang...
Catering for linguistic domain specialisations through computer-assisted lang...Catering for linguistic domain specialisations through computer-assisted lang...
Catering for linguistic domain specialisations through computer-assisted lang...Ana Gimeno-Sanz
 
Developing corpus-based resources for language learning: looking back in "hope"
Developing corpus-based resources for language learning: looking back in "hope"Developing corpus-based resources for language learning: looking back in "hope"
Developing corpus-based resources for language learning: looking back in "hope"Pascual Pérez-Paredes
 
The Open Education Handbook
The Open Education HandbookThe Open Education Handbook
The Open Education HandbookMarieke Guy
 
Pedagogy, technology and training for language learning and teaching: the ECM...
Pedagogy, technology and training for language learning and teaching: the ECM...Pedagogy, technology and training for language learning and teaching: the ECM...
Pedagogy, technology and training for language learning and teaching: the ECM...LangOER
 

Similar to A Video Corpus for Language Learning: Open Source Tools & Materials from the Corpus-to-Classroom Project (20)

OER: insights into a multilingual landscape - EUROCALL 2014 conference
OER: insights into a multilingual landscape - EUROCALL 2014 conference  OER: insights into a multilingual landscape - EUROCALL 2014 conference
OER: insights into a multilingual landscape - EUROCALL 2014 conference
 
Testing
TestingTesting
Testing
 
ukas Bleichenbacher & Richard Rossner: Towards a Common European Framework fo...
ukas Bleichenbacher & Richard Rossner: Towards a Common European Framework fo...ukas Bleichenbacher & Richard Rossner: Towards a Common European Framework fo...
ukas Bleichenbacher & Richard Rossner: Towards a Common European Framework fo...
 
Designing for Diversity: Creating Learning Experiences that Travel the Globe
Designing for Diversity: Creating Learning Experiences that Travel the GlobeDesigning for Diversity: Creating Learning Experiences that Travel the Globe
Designing for Diversity: Creating Learning Experiences that Travel the Globe
 
Training Heritage Speakers: A Journey Worth Taking
Training Heritage Speakers: A Journey Worth TakingTraining Heritage Speakers: A Journey Worth Taking
Training Heritage Speakers: A Journey Worth Taking
 
AudioVisuals In the Disciplines: Developing libraries of recommended TV and r...
AudioVisuals In the Disciplines: Developing libraries of recommended TV and r...AudioVisuals In the Disciplines: Developing libraries of recommended TV and r...
AudioVisuals In the Disciplines: Developing libraries of recommended TV and r...
 
Two Hot Topics in Online Language Learning: Corpus Linguistics and Telecollab...
Two Hot Topics in Online Language Learning: Corpus Linguistics and Telecollab...Two Hot Topics in Online Language Learning: Corpus Linguistics and Telecollab...
Two Hot Topics in Online Language Learning: Corpus Linguistics and Telecollab...
 
CCCOER Sept 24 advisory
CCCOER Sept 24 advisoryCCCOER Sept 24 advisory
CCCOER Sept 24 advisory
 
Kirsten Holt The material writer’s toolkit for success
Kirsten Holt The material writer’s toolkit for successKirsten Holt The material writer’s toolkit for success
Kirsten Holt The material writer’s toolkit for success
 
The Open Science training hub FOSTER Plus - new resources and courses
The Open Science training hub FOSTER Plus - new resources and coursesThe Open Science training hub FOSTER Plus - new resources and courses
The Open Science training hub FOSTER Plus - new resources and courses
 
Using pedagogic corpora in ELT
Using pedagogic corpora in ELTUsing pedagogic corpora in ELT
Using pedagogic corpora in ELT
 
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
 
Blended Learning-Best Practices
Blended Learning-Best PracticesBlended Learning-Best Practices
Blended Learning-Best Practices
 
OER Workshop
OER Workshop OER Workshop
OER Workshop
 
How Open Education Practices Support Student Centered Design & Accessibility
How Open Education Practices Support Student Centered Design & AccessibilityHow Open Education Practices Support Student Centered Design & Accessibility
How Open Education Practices Support Student Centered Design & Accessibility
 
Catering for linguistic domain specialisations through computer-assisted lang...
Catering for linguistic domain specialisations through computer-assisted lang...Catering for linguistic domain specialisations through computer-assisted lang...
Catering for linguistic domain specialisations through computer-assisted lang...
 
Target Your Training: Techniques to Adapt Your Content to Meet Your Students ...
Target Your Training: Techniques to Adapt Your Content to Meet Your Students ...Target Your Training: Techniques to Adapt Your Content to Meet Your Students ...
Target Your Training: Techniques to Adapt Your Content to Meet Your Students ...
 
Developing corpus-based resources for language learning: looking back in "hope"
Developing corpus-based resources for language learning: looking back in "hope"Developing corpus-based resources for language learning: looking back in "hope"
Developing corpus-based resources for language learning: looking back in "hope"
 
The Open Education Handbook
The Open Education HandbookThe Open Education Handbook
The Open Education Handbook
 
Pedagogy, technology and training for language learning and teaching: the ECM...
Pedagogy, technology and training for language learning and teaching: the ECM...Pedagogy, technology and training for language learning and teaching: the ECM...
Pedagogy, technology and training for language learning and teaching: the ECM...
 

Recently uploaded

MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleCeline George
 

Recently uploaded (20)

MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP Module
 

A Video Corpus for Language Learning: Open Source Tools & Materials from the Corpus-to-Classroom Project

  • 1. June 14, 2013 | IALLT Conference A Video Corpus for Language Learning Open Source Tools & Materials from the Corpus-to-Classroom Project
  • 2. Who we are • Rachael Gilg • Project Manager / Web Developer • Arthur Wendorf • Educational Technologist / Developer / Spanish Instructor • Martí Quixal • Computational Linguist / Developer / Spanish Instructor • Almeida Jacqueline Toribio & Barbara E. Bullock • Project Co-Directors • Carl Blyth • Director of COERLL 2
  • 3. 3
  • 4. Agenda 1. Introduction to the Corpus-to-Classroom Project 2. Project results: • The SpinTX Video Archive: a pedagogically-friendly interface to the Spanish in Texas Corpus. • Involving teachers in the development of open educational resources. • A model for open source corpus development. 4
  • 5. Introduction to the Corpus-to- Classroom Project 5
  • 6. Corpora in the Classroom: the promise • Corpus = a large, structured, collection of language • Benefits for language learning: • Naturalistic language use • Motivation • „Real‟ language • Discovery learning 6
  • 7. Example: CORPUS DEL ESPAÑOL 7
  • 8. Example: CORPUS DEL ESPAÑOL 8 Pros: • View examples of language in context. • Linguistic annotations enable searching by part-of-speech, etc.
  • 9. Example: CORPUS DEL ESPAÑOL 9 Cons: • Designed for researchers, not educators. • Limited utility to untrained end users. • Content not openly licensed.
  • 11. Example: YouTube 11 Pros: • Engaging video content, many with captions. • Many videos are openly licensed (CC-BY).
  • 12. Example: YouTube 12 Cons • Searching is time-consuming. • Content can disappear without warning. • Sometimes blocked by K12 schools.
  • 13. Our two-pronged approach SpinTX: Corpus-to-Classroom Grant from the University of Texas Longhorn Innovation Fund for Technology (2012-2013) 13 Spanish in Texas Video Corpus A project of COERLL, a National Foreign Language Resource Center (2010-2014)
  • 14. Spanish in Texas Corpus • Goals: • make publically available authentic data about variation in Spanish as spoken in Texas • for education • for research • encourage teachers/students/public to view local varieties as a resource 14 A collection of sociolinguistic video interviews that provide rich content for language learning.
  • 15. Corpus-to-Classroom • Goals: • develop a pedagogically friendly interface for the Spanish in Texas Corpus • involve teachers and learners in the development of open educational resources based on the corpus • create a model for using open source tools and a pedagogical interface that can be adapted for any language corpus 15 A searchable collection of pre- selected, corrected, annotated clips from the larger corpus
  • 16. About the Corpus 16 Spanish in Texas Corpus SpinTX Video Archive 92 sociolinguistic interview videos (avg. 30–45 min) 327 video clips from 33 speakers (avg. 1-4 min) Transcribed (approx. 650,000 words) Transcribed (approx. 80,000 words) Time-synced video caption files Time-synced video caption files Tagged for linguistic features Tagged for linguistic and pedagogical features Completely open (no registration required, open CC license) Teacher-friendly interface
  • 17. 17
  • 18. The SpinTX Video Archive: a pedagogically-friendly interface to the Spanish in Texas Corpus 18
  • 19. Needs assessment with educators 19
  • 20. Needs assessment with educators 20 • How do you use authentic video in your teaching? • How do you find videos to use? What problems do you encounter? • How can you imagine using the Spanish in Texas videos in your classes?
  • 21. Primary goals of the interface • Enable educators to easily find and use videos that suit the curriculum. • Search by grammar point, theme, vocabulary, etc. • Enable accessibility and content openness. • Downloadable from open site with a license enabling remixing • Enable educators to curate sets of videos for comparison and study. • Favoriting and tagging videos • Provide access to supporting materials (lesson plans, activity templates, etc). • Develop a community to share ready-made materials and templates 21
  • 22. Secondary goals of the interface • Employ in the development of materials for teacher training. • Engage students as co-researchers. 22
  • 23. 23
  • 24. Technical Overview of SpinTX Archive • Drupal 7 • Taxonomy module integration • Community tags module • Apache Solr search engine • Keyword search • Faceted browsing 24
  • 25. Ideas for future development • Advanced search capability • support for wildcards • improved phrase searching • improved “keyword in context” result view • Data visualizations • word and/or tag clouds • language maps • Enhanced word-level annotations • hover over a word in a transcript and see all annotations 25
  • 26. Formative evaluation of Beta version Data collection methods: • Online user survey (http://goo.gl/4Lbbg) • Web analytics (navigation patterns, popular content) • Search analytics • User observation and feedback through ongoing workshops and focus groups 26
  • 27. Formative evaluation of Beta version Data collection methods: • Online user survey (http://goo.gl/4Lbbg) • Web analytics (navigation patterns, popular content) • Search analytics • User observation and feedback through ongoing workshops and focus groups 27 Results of formative evaluation will drive future development of the interface.
  • 28. Involving Teachers in the Development of OER 28
  • 29. Workshops with Educators • Summer 2012 Workshop • ~100 secondary and college Spanish teachers • Fall 2012 Working Group • ~10 Univ. of Texas Spanish teachers • Spring 2013 Workshops • Multiple conferences & Univ. of Texas Spanish teachers • Summer 2013 Working Group • ~10 secondary and college Spanish teachers 29
  • 30. Sample materials from the community (1) 30
  • 31. 31
  • 32. Sample materials from the community (2) • Idea from teacher workshop: Use videos for grammar lessons to develop the student‟s metalinguistic and critical thinking skills as they pertain to language. • Searched and selected clips for lesson on “por vs. para”. • Lesson tested in heritage learners class. • Anecdotal evidence that video lessons were effective and motivating to students. 32
  • 33. Current Templates • Four templates: • Cloze • Data-Driven Learning (DDL) • Variation • Schema 33
  • 37. Data-Driven Learning (DDL) Template: Activity 37
  • 40. Schema Template: Pre-class Preparation 40
  • 42. Publication of OER • Templates and community-developed lesson plans will be available on the SpinTX website by August 2013. • We encourage the publication of videos on third-party platforms for remixing educational content. 42
  • 43. A Model for Open Source Corpus Development 43
  • 44. Sharing development practices and code • Use of open source software and open APIs • Custom code developed for the project • Public GitHub repository: http://github.com/coerll • Project documentation (research protocols, development processes and methodologies, etc.): • Corpus-to-Classroom Blog: http://sites.la.utexas.edu/corpus-to-classroom/ • “For Researchers” page on spanishintexas.org: http://spanishintexas.org/for-researchers/ 44
  • 45. Recruit ‘locally’ • Recruit and train interns • Institutional Review Board training • Video shooting and audio recording • Practice interviews on site • Recruit family, friends, acquaintances • Any Spanish-speaking resident of TX • Conduct interviews in their home communities 45
  • 46. Interview protocol • Sampling of a large set of questions (~75) • from NPR StoryCorps (Historias) • biographical information • Average length: 30-45 min. • Language: Spanish and mixed • Consent form and talent release • Metadata on speaker and interviewer • Google Docs 46
  • 48. Processing the Videos • Intake interview materials • create unique ID for video and forms • archive raw video and remove from camera • Video and transcript preparation • Edit and export videos using Final Cut Pro • Sound and image correction • Upload to Automatic Sync to be transcribed by bilingual transcriber • 3-5 day turnaround • Approx $85 per hour of video 48
  • 49. Original Transcript from Automatic Sync
  • 50. Upload video and transcript to YouTube for syncing
  • 54. Combine Data from SRT File and TreeTagger File, and add additional Tags
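The step of combining the SRT file with the TreeTagger file can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's published code: it assumes the tagger output is tab-separated `word`, `tag`, `lemma` lines, that each tagged word appears verbatim and in order in the subtitle text, and it adds one made-up extra tag (`is_verb`). The tag labels in the usage example are placeholders rather than the actual Spanish tagset.

```python
import re


def parse_srt(srt_text):
    """Parse SRT text into cue dicts with index, start, end, and text."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append({"index": int(lines[0]), "start": start, "end": end,
                     "text": " ".join(lines[2:])})
    return cues


def parse_tagged(tagged_text):
    """Parse tab-separated word/tag/lemma lines into tuples."""
    rows = []
    for line in tagged_text.strip().splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            rows.append(tuple(parts))
    return rows


def merge(cues, tagged):
    """Give each tagged token the timing of the cue it occurs in.

    Tokens are consumed in order; a token belongs to the current cue
    as long as the unread part of the cue text starts with it.
    """
    merged, pos = [], 0
    for cue in cues:
        rest = cue["text"].lstrip()
        while pos < len(tagged) and rest.startswith(tagged[pos][0]):
            word, tag, lemma = tagged[pos]
            merged.append({
                "cue": cue["index"], "start": cue["start"], "end": cue["end"],
                "word": word, "tag": tag, "lemma": lemma,
                "is_verb": tag.startswith("V"),  # hypothetical extra tag
            })
            rest = rest[len(word):].lstrip()
            pos += 1
    return merged
```

A possible use, with invented data: `merge(parse_srt(srt_text), parse_tagged(tagged_text))` returns one row per token, which could then be written out with `csv.DictWriter`. Tokens that the tagger splits or normalizes differently from the subtitle text would stop the alignment for that cue, so a real pipeline would need a more tolerant matching step.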
  • 55. Manual clip selection and description
  • 56. Divide CSV Files and Videos into Clips and adjust Timings and Numberings
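Adjusting timings and numberings when a long interview is divided into clips amounts to three operations: keep the rows that overlap the clip, shift their timestamps so the clip starts at zero, and renumber from 1. The sketch below shows that logic on rows held as dictionaries (for example, as read with `csv.DictReader`); the column names `start`, `end`, and `text` and the function names are assumptions for illustration, and cutting the video file itself is not shown.

```python
def to_ms(timestamp):
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to milliseconds."""
    hms, millis = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def to_timestamp(ms):
    """Convert milliseconds back to 'HH:MM:SS,mmm'."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def cut_clip(cues, clip_start, clip_end):
    """Select cues overlapping the clip, shift timings, renumber from 1."""
    start_ms, end_ms = to_ms(clip_start), to_ms(clip_end)
    clip = []
    for cue in cues:
        cue_start, cue_end = to_ms(cue["start"]), to_ms(cue["end"])
        if cue_end <= start_ms or cue_start >= end_ms:
            continue  # cue lies entirely outside the clip
        clip.append({
            "index": len(clip) + 1,  # new numbering within the clip
            # Clamp to the clip boundaries, then make times clip-relative.
            "start": to_timestamp(max(cue_start, start_ms) - start_ms),
            "end": to_timestamp(min(cue_end, end_ms) - start_ms),
            "text": cue["text"],
        })
    return clip
```

Cues that straddle a clip boundary are clamped rather than dropped here; whether that or manual adjustment is preferable depends on how the clip boundaries were chosen.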
  • 58. SpinTX Clip Data Published on GitHub http://www.github.com/coerll 58
  • 60. Links • SpinTX Video Archive: http://www.spintx.org • Spanish in Texas Corpus: http://www.spanishintexas.org • Slides from this Presentation will be posted at: http://www.slideshare.net/spanish_in_texas 60

Editor's Notes

  1. Results: still in progress!
  2. Will introduce corpora in general, our source corpus, and the pedagogical corpus
  3. Discuss examples briefly, one at a time. How frequently do teachers use them? How easy are they to use? Emphasis on YouTube as probably the most popular in language classes, but hard to use.
  4. Considering the pros and cons of these types of corpus interfaces, we took a two-pronged approach to developing a pedagogically friendly corpus. On the one hand, there is the Spanish in Texas project, which has been collecting sociolinguistic video interviews since 2010. On the other, we recently got a grant focused on developing a pedagogically friendly interface to this existing corpus.
  5. Dual purpose: both for research and for education. Show that language is alive, and encourage viewing local varieties positively rather than negatively.
  6. To give you a sense of the scope of the corpus we are working with.
  7. Will introduce corpora in general, our source corpus, and the pedagogical corpus
  8. We asked teachers how they use videos and how they would like to use videos. (interviews and focus groups)
  9. We asked teachers how they use videos and how they would like to use videos. (interviews and focus groups)
  10. Teachers of heritage learners can learn about local variation. Interviews collected by students can be contributed to the corpus.
  11. (1) Anonymous user: watch intro video; show search criteria (topics, grammar, pragmatics, keywords, etc.); show video page (related items, transcripts with highlighting, sharing & downloading tabs). (2) Registered user: how to favorite and tag a video; tagged video lists.
  12. We asked teachers how they use videos and how they would like to use videos. Here is how we have met their needs.
  13. Observe how teachers are using the system to develop OER
  14. Observe how teachers are using the system to develop OER
  15. But that’s not all!
  16. This will be an ongoing process that will hopefully eventually be taken over by the users.
  17. This will be an ongoing process that will hopefully eventually be taken over by the users.
  18. This will be an ongoing process that will hopefully eventually be taken over by the users.
  19. This will be an ongoing process that will hopefully eventually be taken over by the users.
  20. Pull up favorited video. Hide target words. Project video and cloze text in front of class.
  21. Discuss prescriptive rules for target as a class. Students pull up worksheet (example). Students complete worksheet by finding and recording examples, and then indicating whether they think each is a standard or non-standard use.
  22. This will be an ongoing process that will hopefully eventually be taken over by the users.
  23. 5 guidelines for developing open corpora. Will also illustrate how we have implemented each guideline.