A corpus is a large collection of authentic written or spoken texts that is representative of a particular language or language variety. The corpus data can be computer readable and used to verify hypotheses about language or provide linguistic descriptions. Corpora can be general, aiming to represent language broadly across many contexts, or specialized, focusing on a specific register, time period, topic, or other limiting factors. Examples of general corpora include the Brown Corpus, LOB Corpus, and British National Corpus, while the Michigan Corpus of Academic Spoken English is a specialized corpus of university speech. Corpus analysis provides a more objective view of language patterns than intuition alone.
2. A collection of authentic spoken or written texts that
is representative of a particular area of language
use by virtue of its size and composition.
A corpus may be representative of language use in
general or of a specific language variety, since
the data set may be very specialized, and it may
not always be based on samples of complete
texts.
3. A corpus is usually computer readable.
A collection of linguistic data, either written texts or
transcriptions of recorded data, which can be used
as a starting point for linguistic description or as a
means of verifying hypotheses about language.
4. General Corpora
o The texts do not belong to a single text type, subject field, or register.
o May include written or spoken language, or both.
o May include texts produced in one country or many.
o They aim to represent language in its broadest sense and to serve as a widely
available resource for baseline or comparative studies of general linguistic
features.
5. General Corpora: Examples
o Brown Corpus – 1 million words
o LOB Corpus – 1 million words
o BNC (British National Corpus) – 100 million words
6. Specialized Corpora
o The texts are designed with more specific research goals in mind, such as
register-specific descriptions.
o A specialized corpus aims to be representative of a given type of text.
o The kinds of texts included are limited by:
1. A time frame – such as a particular century
2. A social setting – such as conversations taking place in a bookshop
3. A given topic – such as newspaper articles dealing with a particular subject
o May include written or spoken language, or both.
7. Specialized Corpora: Michigan Corpus of Academic Spoken English (MICASE)
o A specialized corpus of contemporary speech recorded at the University of
Michigan between 1997 and 2001.
o It is freely available via the Web.
o It contains 197 hours of recorded speech, totaling about 1.7 million words in
152 speech events.
o These speech events include large lectures, etc.
9. The Benefits of Corpus Analysis
• Corpus linguistics provides a more objective view of language than that of
introspection, intuition and anecdotes.
• A corpus-based analysis can investigate almost any language pattern:
lexical, structural, lexico-grammatical, discourse, phonological, and
morphological.
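As a rough illustration of a lexical analysis, the sketch below counts word frequencies in two tiny invented text samples and normalizes them per 1,000 words so that samples of different sizes can be compared. The sample sentences and the helper names `tokenize` and `per_thousand` are hypothetical, chosen only for this example; a real study would draw on a full corpus and a more careful tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def per_thousand(tokens):
    """Return each word's frequency normalized to occurrences per 1,000 tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * 1000 / total for word, count in counts.items()}


# Two invented samples standing in for a spoken and a written register.
spoken = "well I think you know I mean it is sort of okay I think"
written = "the results indicate that the method is reliable and the data support this"

spoken_freq = per_thousand(tokenize(spoken))
written_freq = per_thousand(tokenize(written))

# Compare normalized frequencies of a few words across the two samples.
for word in ("i", "the", "is"):
    print(word, round(spoken_freq.get(word, 0.0), 1), round(written_freq.get(word, 0.0), 1))
```

Normalizing by sample size is what makes the comparison meaningful: raw counts from a 1-million-word corpus and a 100-million-word corpus cannot be compared directly.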
10. Applying Corpus Linguistics to Teaching
• The teacher would act as a research facilitator rather than the more
traditional imparter of knowledge.
• According to Willis (1998), students may be able to determine:
• the potential different meanings and uses of common words
• useful phrases and typical collocations they might use themselves
• that certain language features are more typical of some kinds of text than
others
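The kind of exploration described above can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the sample sentences are invented, and the functions `tokenize`, `collocates`, and `concordance` are hypothetical helpers written for this sketch, not part of any corpus tool. It lists the words that occur near a chosen word and prints each occurrence in its context, which is roughly what students would inspect when looking for typical collocations and different uses of a common word.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens of each occurrence of `node`.

    Sentence boundaries are ignored in this simplified version.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


def concordance(tokens, node, window=3):
    """Return keyword-in-context lines for each occurrence of `node`."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{node}] {right}".strip())
    return lines


# Invented sample text; a real classroom activity would query an actual corpus.
sample = (
    "She made a strong argument. He drinks strong tea every morning. "
    "They made a strong case for strong tea."
)
tokens = tokenize(sample)

print(collocates(tokens, "strong").most_common(3))
for line in concordance(tokens, "strong"):
    print(line)
```

With a larger corpus, the same counts would usually be filtered to drop very frequent function words such as "a" or "for", so that content-word collocates stand out.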