4. Agenda
1. Introduction to the Corpus-to-Classroom Project
2. Project results:
• The SpinTX Video Archive: a pedagogically-friendly
interface to the Spanish in Texas Corpus.
• Involving teachers in the development of open
educational resources.
• A model for open source corpus development.
6. Corpora in the Classroom: the promise
• Corpus = a large, structured collection of language
• Benefits for language learning:
• Naturalistic language use
• Motivation
• ‘Real’ language
• Discovery learning
8. Example: CORPUS DEL ESPAÑOL
Pros:
• View examples of language in context.
• Linguistic annotations enable searching
by part-of-speech, etc.
9. Example: CORPUS DEL ESPAÑOL
Cons:
• Designed for researchers, not educators.
• Limited utility to untrained end users.
• Content not openly licensed.
13. Our two-pronged approach
SpinTX: Corpus-to-Classroom
Grant from the University of Texas Longhorn Innovation Fund for Technology (2012-2013)
Spanish in Texas Video Corpus
A project of COERLL, a National Foreign Language Resource Center (2010-2014)
14. Spanish in Texas Corpus
• Goals:
• make publicly available authentic data about variation in
Spanish as spoken in Texas
• for education
• for research
• encourage teachers/students/public to view local varieties
as a resource
A collection of sociolinguistic video interviews that
provide rich content for language learning.
15. Corpus-to-Classroom
• Goals:
• develop a pedagogically friendly interface for the Spanish in
Texas Corpus
• involve teachers and learners in the development of open
educational resources based on the corpus
• create a model for using open source tools and a pedagogical
interface that can be adapted for any language corpus
A searchable collection of pre-selected, corrected, annotated clips
from the larger corpus
16. About the Corpus
Spanish in Texas Corpus:
• 92 sociolinguistic interview videos (avg. 30–45 min)
• Transcribed (approx. 650,000 words)
• Time-synced video caption files
• Tagged for linguistic features
SpinTX Video Archive:
• 327 video clips from 33 speakers (avg. 1–4 min)
• Transcribed (approx. 80,000 words)
• Time-synced video caption files
• Tagged for linguistic and pedagogical features
• Completely open (no registration required, open CC license)
• Teacher-friendly interface
20. Needs assessment with educators
• How do you use authentic video in your teaching?
• How do you find videos to use? What problems do
you encounter?
• How can you imagine using the Spanish in Texas
videos in your classes?
21. Primary goals of the interface
• Enable educators to easily find and use videos that suit
the curriculum.
• Search by grammar point, theme, vocabulary, etc.
• Enable accessibility and content openness.
• Downloadable from open site with a license enabling remixing
• Enable educators to curate sets of videos for comparison
and study.
• Favoriting and tagging videos
• Provide access to supporting materials (lesson
plans, activity templates, etc).
• Develop a community to share ready-made materials and
templates
22. Secondary goals of the interface
• Use the interface to develop materials for teacher
training.
• Engage students as co-researchers.
25. Ideas for future development
• Advanced search capability
• support for wildcards
• improved phrase searching
• improved “keyword in context” result view
• Data visualizations
• word and/or tag clouds
• language maps
• Enhanced word-level annotations
• hover over a word in a transcript and see all annotations
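The wildcard and “keyword in context” ideas above can be sketched in a few lines of Python. This is only an illustration of the concept, not the SpinTX implementation: the function name `kwic`, the shell-style `*` wildcard, and the sample sentence are all assumptions made for the example.

```python
import re
from fnmatch import fnmatchcase


def kwic(text, pattern, width=3):
    """Return (left, keyword, right) tuples for every word matching `pattern`.

    `pattern` uses shell-style wildcards (e.g. "habl*"); `width` is the
    number of context words kept on each side of the keyword.
    """
    # Lowercase and split into words; \w is Unicode-aware, so accented
    # letters stay inside their words.
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if fnmatchcase(token, pattern.lower()):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Made-up sample sentence (not taken from the corpus).
sample = "Yo hablo español en casa y mis padres hablan inglés en el trabajo"
for left, keyword, right in kwic(sample, "habl*", width=2):
    print(f"{left:>12} [{keyword}] {right}")
```

A real search would run over the time-synced transcripts and link each hit back to its video clip; the sketch only shows how a wildcard match and its surrounding context fit together.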
26. Formative evaluation of Beta version
Data collection methods:
• Online user survey (http://goo.gl/4Lbbg)
• Web analytics (navigation patterns, popular content)
• Search analytics
• User observation and feedback through ongoing
workshops and focus groups
Results of formative evaluation will drive future
development of the interface.
29. Workshops with Educators
• Summer 2012 Workshop
• ~100 secondary and college Spanish teachers
• Fall 2012 Working Group
• ~10 Univ. of Texas Spanish teachers
• Spring 2013 Workshops
• Multiple conferences & Univ. of Texas Spanish teachers
• Summer 2013 Working Group
• ~10 secondary and college Spanish teachers
32. Sample materials from the community (2)
• Idea from teacher workshop: Use videos for grammar
lessons to develop students’ metalinguistic and critical
thinking skills as they pertain to language.
• Searched and selected clips for lesson on “por vs. para”.
• Lesson tested in heritage learners class.
• Anecdotal evidence that video lessons were effective and
motivating to students.
33. Current Templates
• Four templates:
• Cloze
• Data-Driven Learning (DDL)
• Variation
• Schema
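As a rough illustration of the Cloze template, the sketch below blanks out target words in a transcript and keeps the hidden words as an answer key. It assumes a plain-text transcript and a teacher-chosen list of targets; the function name `make_cloze` and the sample sentence are hypothetical and are not taken from the SpinTX code or corpus.

```python
import re


def make_cloze(transcript, targets, blank="_____"):
    """Replace whole-word occurrences of `targets` with a blank.

    Returns (cloze_text, answers), where `answers` lists the hidden
    words in the order they appear.
    """
    if not targets:
        return transcript, []
    # \b keeps "por" from matching inside longer words.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in targets) + r")\b",
        re.IGNORECASE,
    )
    answers = []

    def hide(match):
        answers.append(match.group(0))
        return blank

    cloze_text = pattern.sub(hide, transcript)
    return cloze_text, answers


# Made-up sentence for a "por vs. para" exercise.
text, key = make_cloze("Trabajo para mi tío y paso por el centro.", ["por", "para"])
print(text)
print(key)
```

The answer key makes it easy to project the cloze text alongside the video and reveal the target words afterwards.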
42. Publication of OER
• Templates and community-developed lesson plans will be
available on the SpinTX website by August 2013
• We encourage the publication of videos on third-party
platforms for remixing educational content.
44. Sharing development practices and code
• Use of open source software and open APIs
• Custom code developed for the project
• Public GitHub repository: http://github.com/coerll
• Project documentation (research protocols, development
processes and methodologies, etc):
• Corpus-to-Classroom Blog: http://sites.la.utexas.edu/corpus-to-classroom/
• “For Researchers” page on spanishintexas.org:
http://spanishintexas.org/for-researchers/
45. Recruit ‘locally’
• Recruit and train interns
• Institutional Review Board (IRB) training
• Video shooting and audio recording
• Practice interviews on site
• Recruit family, friends, acquaintances
• Any Spanish-speaking resident of TX
• Conduct interviews in their home communities
46. Interview protocol
• Sampling of a large set of questions (~75)
• from NPR StoryCorps (Historias)
• biographical information
• Average Length: 30-45 min.
• Language: Spanish and mixed
• Consent form and talent release
• Metadata on speaker and interviewer
• Google docs
48. Processing the Videos
• Intake interview materials
• create unique ID for video and forms
• archive raw video and remove from camera
• Video and transcript preparation
• Edit and export videos using Final Cut Pro
• Sound and image correction
• Upload to Automatic Sync to be transcribed by bilingual transcriber
• 3-5 day turnaround
• Approx $85 per hour of video
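For a sense of scale, the per-hour rate above can be combined with the corpus size given earlier (92 interviews averaging 30–45 minutes). The sketch below is a back-of-the-envelope estimate only: it assumes the edited videos match the stated average lengths and that the rate is flat, neither of which the slides confirm. The function name `transcription_cost` is hypothetical.

```python
def transcription_cost(num_videos, avg_minutes, rate_per_hour=85.0):
    """Estimate total transcription cost in dollars.

    Assumes every video has the average length and that the vendor
    charges a flat rate per hour of video.
    """
    hours = num_videos * avg_minutes / 60
    return hours * rate_per_hour


low = transcription_cost(92, 30)   # 46 hours of video
high = transcription_cost(92, 45)  # 69 hours of video
print(f"Estimated range: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions the full corpus would cost roughly $3,900 to $5,900 to transcribe.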
60. Links
• SpinTX Video Archive:
http://www.spintx.org
• Spanish in Texas Corpus:
http://www.spanishintexas.org
• Slides from this presentation will be posted at:
http://www.slideshare.net/spanish_in_texas
Editor's Notes
Results: still in progress!
Will introduce corpora in general, our source corpus, and the pedagogical corpus
Discuss examples briefly, one at a time. How frequently do teachers use them? How easy are they to use? Emphasis on YouTube as probably the most popular in language classes, but hard to use.
Considering the pros and cons of these types of corpus interfaces, we took a two-pronged approach to developing a pedagogically friendly corpus. On the one hand, there is the Spanish in Texas project, which has been collecting sociolinguistic video interviews since 2010. On the other, we recently received a grant focused on developing a pedagogically friendly interface to this existing corpus.
Dual purpose: both for research and for education. Show that language is alive, and encourage viewing local varieties positively rather than negatively.
To give you a sense of the scope of the corpus we are working with.
We asked teachers how they use videos and how they would like to use videos. (interviews and focus groups)
Teachers of heritage learners can learn about local variation. Interviews collected by students can be contributed to the corpus.
1. Anonymous user: Watch intro video. Show search criteria: topics, grammar, pragmatics, keywords, etc. Show video page: related items, transcripts with highlighting, sharing & downloading tabs. 2. Registered user: How to favorite and tag a video. Tagged video lists.
We asked teachers how they use videos and how they would like to use videos. Here is how we have met their needs.
Observe how teachers are using the system to develop OER
But that’s not all!
This will be an ongoing process that will hopefully eventually be taken over by the users.
Pull up favorited video. Hide target words. Project video and cloze text in front of class.
Discuss prescriptive rules for target as a class. Students pull up worksheet (example). Students complete worksheet by finding and recording examples, and then indicating whether they think it is a standard or non-standard use.
5 guidelines for developing open corpora. Will also illustrate how we have implemented each guideline.