Presentation at CALICO 2013: Corpora provide a promising way of creating language learning materials that accurately depict languages, but corpus search interfaces typically aren't designed with this goal in mind. The SpinTX Corpus-to-Classroom project is developing a website for educators to search and adapt authentic video for the teaching of Spanish. This presentation will describe the main results to date: (1) a pedagogically friendly interface to search over 300 tagged video clips from the Spanish in Texas Corpus; (2) tools for educators to easily create lessons and activities based on the videos; (3) an open source model for developing video corpora for language learning.
2. Who we are
• Barbara E. Bullock & Almeida Jacqueline Toribio
• Project Directors / Sociolinguistics Researchers
• Rachael Gilg
• Project Manager / Web Developer
• Arthur Wendorf
• Corpus Linguist / Developer
• Martí Quixal
• Computational Linguist / Developer
• Carl Blyth
• Director of COERLL
3. Agenda
• Part 1: Introduction to the Corpus-to-Classroom Project
• Part 2: Project Results
• The SpinTX Video Archive: a pedagogically-friendly interface to the
Spanish in Texas Corpus
• Involving teachers in the development of open educational
resources
• A model for open source corpus development
5. Corpora in the Classroom: the promise
• Corpus: a large, structured collection of language
• Benefits:
• Naturalistic language use
• Motivation
• “Real” language
• Discovery learning
• Examples:
6. Corpora in the Classroom: the reality
• Large linguistic corpora are of limited utility to untrained
end users.
• Designed for researchers, not educators.
• Collections such as YouTube are popular for language
classes, but can present problems
• Searching for appropriate content is time-consuming using
available search methods.
• Content is not necessarily openly licensed and can disappear
without warning.
7. Our two-pronged approach
Spanish in Texas Corpus Project
A project of COERLL, a National Foreign Language
Resource Center (2010-2014)
• Video interviews provide rich content
SpinTX: Corpus-to-Classroom Project
Grant from the University of Texas Longhorn
Innovation Fund for Technology (2012-2013)
• Collection of pre-selected, corrected, annotated
clips from the larger corpus
• Open-source, pedagogically-friendly search and
authoring tools
8. Spanish in Texas Corpus: Goals
• To make publicly available authentic data about
variation in Spanish as spoken in Texas
• for education
• for research
• Encourage teachers/students/public to view local
varieties as a resource
9. Corpus-to-Classroom: Goals
• develop a pedagogically friendly interface for using
the Spanish in Texas corpus
• involve teachers and learners, via crowd-sourcing,
social networking, and workshops, in the
development of open educational resources
• create a model for using open source tools and a
pedagogical interface that can be adapted for any
language corpus collection
10. Corpus Overview
Spanish in Texas corpus
• Approx. 92 videos of sociolinguistic interviews (avg.
30–45 min)
• Transcribed (approx. 600,000 words)
• Time-synced video caption files
• Tagged for linguistic features
SpinTX Video Archive corpus
• Approx. 327 video clips from 33 speakers (avg. 1-4
min)
• Transcribed (approx. 80,000 words)
• Time-synced video caption files
• Tagged for linguistic and pedagogical features
• Completely open (no registration required, open CC
license)
• Teacher-friendly interface
21. Combine Data from SRT File and
TreeTagger File, and add additional Tags
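A minimal sketch of what this combining step could look like, not the project's actual code. It assumes the caption file uses standard SRT cues, that the tagger output is the usual tab-separated `token`, `POS`, `lemma` line per token, and that tokens appear in the same order as the caption text. The function names, the sample POS tags, and the `extra` column (a hypothetical stand-in for "additional tags") are illustrative assumptions.

```python
import csv
import io
import re

TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")

# Illustrative sample data; the POS tags here are assumptions, not project data.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,000
Yo nací en Texas.

2
00:00:03,500 --> 00:00:05,000
Hablo español.
"""
SAMPLE_TAGGED = ("Yo\tPPX\tyo\nnací\tVLfin\tnacer\nen\tPREP\ten\n"
                 "Texas\tNP\tTexas\n.\tFS\t.\n"
                 "Hablo\tVLfin\thablar\nespañol\tNC\tespañol\n.\tFS\t.\n")


def parse_srt(text):
    """Return one (cue_number, start, end, caption) tuple per SRT cue."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        match = TIMING.match(lines[1].strip()) if len(lines) >= 3 else None
        if match:
            cues.append((int(lines[0]), match.group(1), match.group(2),
                         " ".join(lines[2:])))
    return cues


def parse_tagged(text):
    """Return (token, pos, lemma) triples from tab-separated tagger output."""
    rows = [line.split("\t") for line in text.splitlines()]
    return [tuple(row) for row in rows if len(row) == 3]


def coarse_tag(pos):
    """Hypothetical extra tag: flag verbs, assuming verb tags start with 'V'."""
    return "verb" if pos.startswith("V") else ""


def combine(srt_text, tagged_text):
    """Attach each tagged token to the caption cue it came from."""
    tokens = iter(parse_tagged(tagged_text))
    rows = []
    for cue, start, end, caption in parse_srt(srt_text):
        remaining = re.sub(r"\s+", "", caption)  # caption text without spaces
        while remaining:
            token, pos, lemma = next(tokens, (None, None, None))
            if token is None or not remaining.startswith(token):
                raise ValueError(f"token stream out of sync at cue {cue}")
            remaining = remaining[len(token):]
            rows.append({"cue": cue, "start": start, "end": end, "token": token,
                         "pos": pos, "lemma": lemma, "extra": coarse_tag(pos)})
    return rows


def to_csv(rows):
    """Serialize the combined rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_csv(combine(SAMPLE_SRT, SAMPLE_TAGGED))` yields one CSV row per token, each carrying the cue number and timestamps of the caption it belongs to.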
22. Divide CSV Files and Videos into Clips and
adjust Timings and Numberings
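The timing and numbering adjustment for a clip can be sketched as follows. This is an assumption-laden illustration rather than the project's script: it assumes each row is a dict with `cue`, `start`, and `end` fields in SRT timestamp format, keeps only cues that fall entirely inside the clip window, shifts their times so the clip starts at zero, and renumbers cues from 1. Cutting the video file itself is not shown.

```python
def to_ms(stamp):
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to milliseconds."""
    hours, minutes, rest = stamp.split(":")
    seconds, millis = rest.split(",")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def to_stamp(ms):
    """Convert milliseconds back to an SRT timestamp."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def cut_clip(rows, clip_start, clip_end):
    """Return the rows inside [clip_start, clip_end], re-timed and renumbered."""
    start_ms, end_ms = to_ms(clip_start), to_ms(clip_end)
    renumber = {}  # original cue number -> new cue number within the clip
    clip_rows = []
    for row in rows:
        cue_start, cue_end = to_ms(row["start"]), to_ms(row["end"])
        if cue_start < start_ms or cue_end > end_ms:
            continue  # cue lies (partly) outside the clip window
        new_cue = renumber.setdefault(row["cue"], len(renumber) + 1)
        clip_rows.append({**row,
                          "cue": new_cue,
                          "start": to_stamp(cue_start - start_ms),
                          "end": to_stamp(cue_end - start_ms)})
    return clip_rows
```

Dropping cues that straddle a clip boundary is one possible design choice; trimming them to the boundary would be an equally reasonable alternative.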
23. The SpinTX Video Archive: a
pedagogically-friendly interface
to the Spanish in Texas Corpus
24. Needs assessment: teacher interviews
• How do you use authentic video in your teaching?
• Describe searches you have done in the past for video
content. What were you looking for and were you able to
find it?
• How can you imagine using clips from the Spanish in
Texas video corpus in your classes?
25. Needs assessment results: primary goals
• Enable teachers to easily find videos that suit the
curriculum/work plan
• Search by grammar, theme, vocabulary, etc.
• Provide open, non-ephemeral content
• Downloadable from open site with a license enabling remixing
• Curating sets of videos for comparison and study
• Favoriting and tagging videos
• Provide access to supporting materials.
• Creating a “community of practice” around the videos so materials
can be shared among educators.
26. Needs assessment results: secondary goals
• Materials for teacher trainers
• Teachers of heritage learners can learn about local variation
• Video recording as a cross-competence task
• Interviews collected by students can be contributed to the corpus
28. Ideas for future development
• Advanced search capability
• support for wildcards
• improved phrase searching
• improved “keyword in context” result view
• Data visualizations
• word and/or tag clouds
• language maps
• Enhanced word-level annotations
• hover over a word in a transcript and see all annotations
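As a rough illustration of the wildcard and “keyword in context” ideas listed here, the sketch below uses Python's standard `fnmatch` module for shell-style wildcards over a list of transcript tokens. The function name `kwic` and the `window` parameter are hypothetical; nothing here reflects how the SpinTX site implements search.

```python
from fnmatch import fnmatchcase


def kwic(tokens, pattern, window=3):
    """Return (left context, keyword, right context) for each wildcard match."""
    pattern = pattern.lower()
    hits = []
    for i, token in enumerate(tokens):
        # Shell-style wildcards: '*' matches any run of characters, '?' one.
        if fnmatchcase(token.lower(), pattern):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits
```

For example, `kwic("Yo trabajo para la escuela".split(), "par*", window=2)` returns the single hit `("Yo trabajo", "para", "la escuela")`.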
29. Formative evaluation of Beta version
Data collection methods:
• Online user survey
• Web analytics (navigation patterns, popular content)
• Search analytics
• User observation and feedback through ongoing
workshops and focus groups
Results will drive future development of the interface.
31. Workshops with Educators
• Summer 2012 Workshop
• ~100 secondary and college Spanish teachers
• Fall 2012 Working Group
• ~10 Univ. of Texas Spanish teachers
• Spring 2013 Workshops
• Multiple conferences & Univ. of Texas Spanish teachers
• Summer 2013 Working Group
• ~10 secondary and college Spanish teachers
34. Sample materials from the community (2)
• Idea from teacher workshop: Use videos for grammar
lessons to develop the student's metalinguistic and critical
thinking skills as they pertain to language.
• Searched and selected clips for lesson on “por vs. para”.
• Lesson tested in heritage learners class.
• Anecdotal evidence that video lessons were effective and
motivating to students.
35. Template development ideas
• Using video clips from the SpinTX video archive, create
an activity for classroom use (at any level).
• Focus on Topics: Familia, Idioma, Identidad
• Focus on Grammar: Por vs. Para, Gustar, Ser vs. Estar
• Four steps
• Predict: Before watching
• Observe: While watching
• Discuss: After watching
• Produce: Follow-up activity
36. Publication of OER
• Community-developed lesson plans will be available on
the SpinTX website by August 2013
• We encourage the publication of videos on third-party
platforms for remixing educational content, such as TED-Ed
(http://www.ed.ted.com)
38. Open source development
• Open Source Software
• TreeTagger (part-of-speech tagger)
• Drupal
• Open APIs
• YouTube Captioning API
• Google Fusion Tables API
• Custom code developed for the project
• Freely available in our GitHub repository: http://github.com/coerll
39. Enable sharing of content and data
• With educators:
• SpinTX interface allows embedding, downloading, & social sharing
of videos and transcripts.
• With researchers:
• Source tagged data in our GitHub repository
https://github.com/coerll/SpinTXCorpusData
• Documentation of data in our GitHub wiki
https://github.com/coerll/SpinTXCorpusData/wiki
40. Open content licenses
• Creative Commons provides licenses for Open
Educational Resources
• We use CC BY-NC-SA (Attribution, Non-Commercial, Share-Alike)
41. Open Project Documentation
• Research protocols, development processes and
methodologies, and other project documentation
publicly available:
• Corpus-to-Classroom Blog: http://sites.la.utexas.edu/corpus-to-classroom/
• “For Researchers” page on spanishintexas.org: http://spanishintexas.org/for-researchers/
43. Links
• SpinTX Video Archive:
http://www.spintx.org
• Spanish in Texas Corpus:
http://www.spanishintexas.org
Editor's notes
Will introduce corpora in general, our source corpus, and the pedagogical corpus
Discuss examples briefly one at a time. How frequently do teachers use them? How easy are they to use? Emphasis on YouTube as probably the most popular in language classes, but hard to use.
Describe original corpus. This is similar to the other corpora we looked at earlier. Introduce SpinTX corpus and highlight differences.
We asked teachers how they use videos and how they would like to use videos (interviews and focus groups).
We asked teachers how they use videos and how they would like to use videos. Here is how we have met their needs.
Demo, anonymous user: watch intro video; show search criteria (topics, grammar, pragmatics, keywords, etc.); show video page (related items, transcripts with highlighting, sharing & downloading tabs).
Demo, registered user: how to favorite and tag a video; tagged video lists.
But that’s not all!
This will be an ongoing process that will hopefully eventually be taken over by the users.
5 guidelines for developing open corpora. Will also illustrate how we have implemented each guideline.