1. What can a corpus tell us about multi-word units? Chris Greaves and Martin Warren. GROUP 2
2. Background. The context is the best way to know the meaning of a word. These contexts provide us with associations of words, which are called “collocations”. It is possible to apply the test of “collocability” to these associations. It refers to the idea that words collocate when they are associated with a frequency that excludes the possibility of chance co-occurrence. This test of collocability shows that words have preferences for combination. Lexical repulsion is the tendency for words not to be associated.
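The collocability test can be illustrated with a small sketch. It assumes a window-based measure (the observed number of co-occurrences compared with the number expected by chance, summarised as pointwise mutual information). This is one common choice, not necessarily the measure Greaves and Warren use; the function name `collocation_scores` and the window size are illustrative assumptions.

```python
import math
from collections import Counter

def collocation_scores(tokens, node, span=4):
    """Score each word found within +/- `span` tokens of `node` (hypothetical helper).

    Returns {collocate: (observed, expected, pmi)}. `expected` is the count we
    would see if the collocate were spread through the text purely by chance.
    """
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # words to the left and to the right of this occurrence of the node
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        observed.update(window)
    scores = {}
    for word, obs in observed.items():
        # chance expectation: collocate's relative frequency times the number
        # of window slots opened around the node (edge effects ignored)
        expected = freq[word] / n * freq[node] * 2 * span
        scores[word] = (obs, expected, math.log2(obs / expected))
    return scores
```

A positive score means the pair co-occurs more often than chance would predict; an observed count well below the expected count would point towards lexical repulsion. In this sketch, words that never co-occur with the node are simply absent from the result.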
3. In the 1960s, thanks to computers, a study of English collocations described three fundamental statements: 1. The primacy of lexis over grammar in terms of meaning creation. 2. Meaning is created through the co-selection of words. 3. By virtue of the way in which meaning is created, language is phraseological in nature, which Sinclair embodied in his famous “idiom principle”.
4. What is a multi-word unit? The study of these units, their extent and their meaning (in pattern grammar, phraseology, etc.) is concerned with words which share pattern features but which may differ in other respects in their phraseologies. Most studies of multi-word units have focused on n-grams.
5. N-grams. N-grams are frequently occurring contiguous words that constitute a phrase or a pattern of use. They are grouped together based on the number of words they contain. The frequency of n-grams decreases dramatically as their size increases. This phenomenon implies that the undoubted prevalence of phraseology in language does not mean that language is not unique or creative: a sequence as short as ten running words has a very high chance of being a unique occurrence. This confirms the phraseological tendency in language as well as its uniqueness and creativity.
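The decline in frequency as n-grams get longer can be checked with a short sketch that counts contiguous sequences of each size. The helper names are hypothetical, and the tokenisation (a plain whitespace split) is an assumption.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens (an n-gram)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def recurring_by_size(tokens, max_n=10):
    """For n = 1..max_n, count how many distinct n-grams occur more than once."""
    return {n: sum(1 for c in ngram_counts(tokens, n).values() if c > 1)
            for n in range(1, max_n + 1)}

tokens = "at the end of the day we went home and at the end of the week we rested".split()
print(recurring_by_size(tokens, 6))
# {1: 5, 2: 4, 3: 3, 4: 2, 5: 1, 6: 0}
```

Even in this toy text, each extra word reduces the number of repeated sequences until the longest ones are unique.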
6. Studying n-grams. Researchers may restrict the size of n-grams, for example by excluding two-word n-grams. Depending on the study: A. A less inclusive approach ignores strings that are incomplete or that span two syntactic units. B. A broad approach keeps all recurring contiguous groupings of words in the lists of data as long as they meet the threshold frequency level.
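The broad approach (B), combined with a restriction on size, might be sketched as follows. The minimum size of three (which drops two-word n-grams), the maximum of five and the raw threshold of two occurrences are assumptions for illustration; published studies usually set thresholds per million words. The less inclusive approach (A) would additionally need syntactic information and is not modelled here.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def broad_ngram_list(tokens, min_n=3, max_n=5, threshold=2):
    """Keep every contiguous n-gram with min_n <= n <= max_n whose frequency
    meets `threshold`, with no syntactic filtering (hypothetical helper)."""
    kept = {}
    for n in range(min_n, max_n + 1):
        for gram, count in ngram_counts(tokens, n).items():
            if count >= threshold:
                kept[" ".join(gram)] = count
    # most frequent first, then alphabetical
    return dict(sorted(kept.items(), key=lambda kv: (-kv[1], kv[0])))
```

Because nothing is filtered syntactically, fragments such as “end of the” are kept alongside complete phrases such as “at the end of the”.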
12. REGISTER/GENRE SPECIFICITY. WRITTEN: n-grams that express time and place relations (in the middle of the night); n-grams used when describing possession, agency, purpose, goal and direction (to the); linking (as a result of). SPOKEN: n-grams that reflect interpersonal meanings (you know); being vague (something like that).
13. REGISTER/GENRE SPECIFICITY. Academic genres show genre-specific features in language use. N-grams characterise the conventions of academic spoken and written discourse, allowing a better appreciation of the differences between disciplines (for example, the importance of, in the case of).
14. REGISTER/GENRE SPECIFICITY. Scott & Tribble (2006:132), British National Corpus (BNC): clusters provide insights into the phraseology used in different contexts. Certain structures occur with differing frequencies across the four corpora: 1. one of the most; 2. one of the main; 3. one of the major; 4. one of the first; 5. and one of the reasons...
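Comparing a cluster across corpora of different sizes requires normalising raw counts, commonly to a rate per million words. The sketch below assumes lowercased, whitespace-tokenised corpora and uses hypothetical function names and corpus labels; it does not reproduce Scott and Tribble's data.

```python
def count_phrase(tokens, phrase):
    """Count contiguous occurrences of `phrase` (a space-separated string)."""
    words = phrase.split()
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)

def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count / corpus_size * 1_000_000

def compare_clusters(corpora, clusters):
    """corpora: {label: token list}. Returns {cluster: {label: freq per million}}."""
    return {c: {label: per_million(count_phrase(toks, c), len(toks))
                for label, toks in corpora.items()}
            for c in clusters}

clusters = ["one of the most", "one of the main", "one of the major", "one of the first"]
```

Normalised rates, rather than raw counts, are what make “differing frequencies across the four corpora” comparable when the corpora differ in size.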
15. IDIOMS. Certain words are “idiom-prone”; they are associated with “basic cognitive metaphors”: parts of the body, money, and light and colour (let’s face it, face to face). Texts from the corpus are studied qualitatively to identify idioms (fair enough, at the end of the day). Idioms can be described in terms of their functions and their register- and genre-specificity.
17. Sinclair defines the n-grams as the nearest complete grammatical structure. He states that any attempt to relate n-grams to grammatical units is doomed to fail because «A grammar must remain aware of lexis, and that the patterns of lexis cannot be reconciled with those of a traditional grammar».
18. Biber and his group have extended the methods used to identify lexical bundles to allow for variations on a pattern. However, this can bring complications, because we would need to identify all the lexical bundles across a large corpus of texts. Sinclair created a model to identify and describe lexical items. Such an item is analysed in its co-text and context, and it is usually phrasal.
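One way to allow for variation on a pattern is to treat a bundle as a frame with one variable slot, such as “in the * of”, and to collect the words that fill that slot. This is a simplified sketch under that assumption, not a reproduction of Biber and colleagues' procedure; the function name, slot position and threshold are illustrative.

```python
from collections import Counter, defaultdict

def frames(tokens, n=4, slot=2, threshold=2):
    """Group n-grams that are identical except at position `slot`
    (e.g. 'in the * of'). Returns {frame: Counter of slot fillers}
    for frames whose total frequency meets `threshold` (hypothetical helper)."""
    fillers = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        # replace the variable position with a wildcard to form the frame
        key = " ".join(gram[:slot] + ["*"] + gram[slot + 1:])
        fillers[key][gram[slot]] += 1
    return {k: v for k, v in fillers.items() if sum(v.values()) >= threshold}
```

The cost mentioned on the slide shows up directly here: every position in every n-gram of the corpus must be turned into a candidate frame before the recurring ones can be kept.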
20. Neither single-word frequencies nor n-grams are a very reliable guide when we want to identify lexical items. Cheng distinguished between «co-occurring» and «associated» words. Studies of concgrams suggest that they help in the identification of three kinds of multi-word units. The study of concgrams is useful to identify and describe these units.
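A concgram groups co-occurring words regardless of constituency variation (other words may intervene, as in play a key role) and positional variation (the order may change, as in role they play). The sketch below handles only the two-word case within a fixed window; the function name and window size are assumptions, and it is not the procedure of any particular concordancing program.

```python
def concgram_instances(tokens, a, b, span=4):
    """Find every co-occurrence of two different words a and b within `span`
    tokens, in either order and with any words in between.
    Returns a list of (start, end, order) with order 'AB' or 'BA'."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != a:
            continue
        lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == b:
                order = "AB" if i < j else "BA"
                hits.append((min(i, j), max(i, j), order))
    return hits

tokens = "we play a key role and the role they play is small".split()
for start, end, order in concgram_instances(tokens, "play", "role"):
    print(order, " ".join(tokens[start:end + 1]))
# AB play a key role
# BA role they play
```

A contiguous n-gram search would miss both of these instances, since neither is the bare sequence “play role”.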
21. «Clause collocations» exhibit extreme variation. They are the product of the tendency for particular types of clause to co-occur in discourse. Multi-word units are pervasive in language, and concgrams can be used to extend the notion of keyness.
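Keyness compares how often an item occurs in a target corpus with how often it occurs in a reference corpus. A common statistic for this is log-likelihood (G2), sketched below; the item counted could be a single word or, as the slide suggests, a concgram. The function name and the example counts in the usage note are hypothetical.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of one item in a target vs a reference corpus."""
    total = freq_target + freq_ref
    # expected counts if the item were equally frequent in both corpora
    expected_t = size_target * total / (size_target + size_ref)
    expected_r = size_ref * total / (size_target + size_ref)
    ll = 0.0
    if freq_target:  # a zero observed count contributes nothing
        ll += freq_target * math.log(freq_target / expected_t)
    if freq_ref:
        ll += freq_ref * math.log(freq_ref / expected_r)
    return 2 * ll
```

For example, `log_likelihood(50, 1000, 10, 1000)` is well above 3.84, the value conventionally read as significant at p < 0.05 with one degree of freedom. G2 alone does not say whether the item is over- or under-used; comparing the two relative frequencies gives the direction.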
22. What has corpus research into multi-word units told us about phraseology that we did not know before?
25. Hoey’s hypotheses. Every word is primed to occur with particular other words, semantic sets, pragmatic functions and grammatical positions. Words which are either co-hyponyms or synonyms differ with respect to their collocations, semantic associations… Words are primed for use in one or more grammatical roles and to participate in, or avoid, particular types of cohesive relation in a discourse.