The Old Bailey Corpus
  Spoken English in the 18th and
         19th centuries
   The use of historical court records in
   the investigation of language change
           Digital History Seminar, 21 February 2012

Magnus Huber
Department of English
University of Giessen
Otto-Behaghel-Str. 10B
D-35394 Giessen, Germany
magnus.huber@anglistik.uni-giessen.de
Structure
1. Introduction
  1.1 Corpus linguistics, sociolinguistics and
      sociohistorical linguistics
  1.2 The Proceedings of the Old Bailey
  1.3 Turning the Proceedings into a linguistic corpus
2. How linguistically accurate is OBC?
  2.1   Comparison with alternative accounts
  2.2   Language event and its representation
  2.3   Internal consistency: negative contraction
  2.4   Sociolinguistic potential: relative clauses
3. Brief summary
1. Introduction
1.1 Corpus linguistics, sociolinguistics and
     sociohistorical linguistics
Definition of linguistic corpus
Generally speaking, a (usually large) collection of machine-readable texts used as a database in linguistic analyses
Importance of spoken language
Spoken language precedes written language
Peter Trudgill (1974)
The social differentiation of English in Norwich
[Chart: percentage of (ng):[n] by social class and sex (female, male); y-axis 0-100; x-axis MMC, LMC, UWC, MWC, LWC]
      MMC   middle middle class
      LMC   lower middle class
      UWC   upper working class
      MWC   middle working class
      LWC   lower working class
Example: drinking with (ng):[n] = [drɪnkɪn]
Historical linguistics: language change
ye > you in subject position
when ye come set it in sech rewle as ye seeme best (1465)

And thus in hast fare you hartely well (1545)
Sociohistorical linguistics
Gender-related change: ye > you
1.2 The Proceedings of the Old Bailey


•   Old Bailey = London's Central Criminal Court
•   meets 8 times/year, from 1830s 10 times/year
•   "Proceedings" published 1674-1913
•   start as a commercial enterprise: publishers
    send scribes into courtroom
•   proceedings taken down in shorthand
•   sold privately by publishers
•   City of London gains more and more control
    during 18th century
• 2100+ volumes
• ca. 200,000 trials
• ca. 134 million words
www.oldbaileyonline.org
Original computerized Proceedings (Sheffield)
<unit id="t17330510-1"><trial><info><identifier>t17330510-1</identifier>
<source>173305100002</source>
<header>Sarah Sanders, theft: specified place, 10 May 1733.</header>
<pfro>17330510</pfro><ntrial>2</ntrial><psession>17330404</psession>
<nsession>17330628</nsession></info>
<p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>, was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of Holland, the Goods of <person gender="m"><victim gender="m"><given>John </given><surname>Underwood </surname></victim> </person>, in his House</theft></off>, <cd>March 4</cd>.</p>
<p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>, she came to me very well recommended, but had not staid above ten Weeks before several [. . .]
Sociolinguistically useful XML-tags
in Sheffield Proceedings
• name
   <given>Sarah</given> <surname>Sanders</surname>
• year
   <identifier>t17180110-1</identifier>
• gender
   <defend gender="f">
• age
   <age>43</age>
• profession
   <deflabel>Servant</deflabel>
• origin
   <crimeloc>Tottenham</crimeloc>
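As a hedged illustration of how the tags listed above could be read out, the following minimal Python sketch parses a small well-formed fragment with the standard library. The fragment is assembled for illustration from the tag names on the slides (it is not a verbatim trial record), and the function name `extract_metadata` and the assumed identifier shape `tYYYYMMDD-n` are assumptions, not part of the Sheffield markup documentation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment built from the tag names shown above;
# real Proceedings files are larger and more deeply nested.
SAMPLE = """<trial>
  <identifier>t17330510-1</identifier>
  <p><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend>
  was my <deflabel>Servant</deflabel>.</p>
</trial>"""


def extract_metadata(xml_text):
    """Collect sociolinguistically useful fields from one trial fragment."""
    root = ET.fromstring(xml_text)

    def text_of(tag):
        # Stripped text of the first matching element, or None if absent/empty.
        node = root.find(f".//{tag}")
        if node is None or not (node.text or "").strip():
            return None
        return node.text.strip()

    identifier = text_of("identifier")
    # Assumption: identifiers look like "t17330510-1" (t + YYYYMMDD + "-" + n).
    match = re.match(r"t(\d{4})\d{4}-\d+$", identifier or "")
    defend = root.find(".//defend")
    return {
        "given": text_of("given"),
        "surname": text_of("surname"),
        "year": int(match.group(1)) if match else None,
        "gender": defend.get("gender") if defend is not None else None,
        "age": text_of("age"),
        "profession": text_of("deflabel"),
        "origin": text_of("crimeloc"),
    }


print(extract_metadata(SAMPLE))
```

Fields that are missing from a record (here age and origin) come back as `None`, which keeps gaps in the sociobiographical data visible instead of silently filling them.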
1.3 Turning the Proceedings
    into a linguistic corpus of
    early spoken English




[Slide repeats the XML excerpt of trial t17330510-1 shown above, with a <speech> tag overlaid]
Tagging spoken language
• Need for automatic annotation
• Perl script identifying non-linguistic
  patterns indicating spoken language
  in the original proceedings
  – layout
  – metalinguistic information
• Linguistic markers indicating spoken
  language? > 1st and 2nd person pronouns
Automatic speech tagging
  e.g. "Q. – A."-sequences
    Q. <speech>Did you see him on Sunday night?</speech> - A.
    <speech>Yes, at Walworth, on Sunday night, the
    12th of January, at one o'clock - I am sure
    of that.</speech></p>
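The "Q. – A." case can be sketched in a few lines. The slides mention a Perl script; the version below is a Python stand-in written for illustration only, not the project's actual tagger. The pattern `MARKER` and the function `tag_qa` are hypothetical names, and the heuristic assumes that a standalone "Q." or "A." always introduces an utterance, which will misfire on initials such as "A. Smith".

```python
import re

# A standalone "Q." or "A.", optionally preceded by a dash separator.
MARKER = re.compile(r"\s*(?:-\s*)?\b([QA])\.\s+")


def tag_qa(text):
    """Wrap each question and answer in <speech>...</speech>."""
    # With one capture group, split() yields: prefix, label, utterance, label, ...
    parts = MARKER.split(text)
    prefix, labels, utterances = parts[0], parts[1::2], parts[2::2]
    tagged = [
        f"{label}. <speech>{utterance.strip()}</speech>"
        for label, utterance in zip(labels, utterances)
    ]
    body = " - ".join(tagged)
    return f"{prefix} {body}".strip() if prefix else body


sample = (
    "Q. Did you see him on Sunday night? - A. Yes, at Walworth, on Sunday "
    "night, the 12th of January, at one o'clock - I am sure of that."
)
print(tag_qa(sample))
```

The dash inside the answer ("at one o'clock - I am sure") is left alone because it is not followed by a "Q." or "A." marker; only layout cues, not the wording of the utterance, drive the tagging.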
Sociobiographical speech event annotation
The New Bailey Tag Assistant




Social data file
• XML format
• attributes of every speaker in OBC
• plus: scribe, printer, publisher

<xml>
  <document name="19100426">
    ...
    <speaker id="271">
      <sex>m</sex>
      <age></age>
      <given>Thomas</given>
      <surname>Tuckey</surname>
      <occupation>Warder</occupation>
      <occupation2></occupation2>
      <hiscolabel>Prison Guard</hiscolabel>
      <hiscocode>58930</hiscocode>
      <hiscolabel2></hiscolabel2>
      <hiscocode2></hiscocode2>
      <crimescene></crimescene>
      <birthplace></birthplace>
      <workplace>Wormwood Scrubs Prison</workplace>
      <placeofresidence></placeofresidence>
      <role>witness</role>
    </speaker>
    ...
  </document>
</xml>
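A speaker record of this shape can be loaded into a lookup table keyed by speaker id. The sketch below is a minimal illustration under stated assumptions: the sample string reproduces only some of the fields from the slide, the elided parts ("...") are simply left out, and `load_speakers` is a hypothetical helper name rather than part of any OBC tooling.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the speaker record shown above (elided parts omitted).
SOCIAL_DATA = """<xml>
  <document name="19100426">
    <speaker id="271">
      <sex>m</sex>
      <age></age>
      <given>Thomas</given>
      <surname>Tuckey</surname>
      <occupation>Warder</occupation>
      <hiscolabel>Prison Guard</hiscolabel>
      <hiscocode>58930</hiscocode>
      <workplace>Wormwood Scrubs Prison</workplace>
      <role>witness</role>
    </speaker>
  </document>
</xml>"""


def load_speakers(xml_text):
    """Return {speaker id: {field: value or None}} for every speaker."""
    root = ET.fromstring(xml_text)
    speakers = {}
    for document in root.iter("document"):
        for speaker in document.iter("speaker"):
            record = {"document": document.get("name")}
            for field in speaker:
                # Empty elements such as <age></age> become None.
                value = (field.text or "").strip()
                record[field.tag] = value or None
            speakers[speaker.get("id")] = record
    return speakers


speakers = load_speakers(SOCIAL_DATA)
witnesses = [s for s in speakers.values() if s["role"] == "witness"]
print(witnesses)
```

Keeping empty fields as `None` makes it straightforward to restrict an analysis to speakers whose sex, age, or occupation is actually known.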
2. How linguistically accurate is OBC?
2.1. Comparison with alternative accounts, e.g.
     trial of John Ayliffe, 17591024-27, vs. alternative
     account The tryal at large of John Ayliffe

Proceedings (718 words)           Tryal (1290 words)
Thomas. I am clerk to Mr Jones,   Henry Thomas. I am clerk to Mr
a Stationer in the Temple.        Jones, a Stationer, in the Temple.
Hargrave. By Mr Ayliffe: I saw    Walter Hargrave. By Mr Ayliffe. – I
him seal and deliver it.          saw him sign, seal, and deliver it, as
                                  his act and deed.
./.                               John Fannen. I am not sure; but to
                                  the best of my remembrance, it was
                                  sometime the beginning of
                                  December last, at Mr Fox's house.
Hargrave. Because he said he         Walter Hargrave. The reason Mr
was not willing Mr Fox should        Ayliffe gave, was, that he would not
know of it?                          on any account have it come to Mr
                                     Fox's ears.
Thomas. I can't particularly say     Henry Thomas. I cannot positively
that; sometimes we leave a           say. – We sometimes leave out the
blank by the gentlemens desire,      conclusion by gentlemen's desire, in
perhaps they may add another         order that they may add a covenant,
covenant, or something of that       or some such thing, if it should be
sort, I can't recollect the reason   thought necessary; but I cannot
for that.                            particularly recollect the reason why
                                     the conclusion was omitted in this
                                     case.


2.2 Language event ↔ written representation


Letters
  formulation → writing

Trial proceedings (e.g. Old Bailey Proceedings)
  speech event → perception by scribe → shorthand script → expanding shorthand → proof reading → type setting
Gurney (1752)
Brachygraphy: or short-writing
'to take a Speech, or Sermon verbatim, as a Person talks in common' (p. 3)

Scribes
Thomas Gurney (1749-1770)
Joseph Gurney (1770-1782)
Recording linguistic details
• no distinction between inflected and
  uninflected auxiliaries
         [shorthand sign] = 'may' or 'mayst'
         [shorthand sign] = 'can' or 'canst'
         [shorthand sign] = 'should' or 'shouldst'
• dot placed on the top left of the noun phrase
  = allomorphs a and an
• auxiliary contractions
  [shorthand sign] 'you will' (you w-il) vs. [shorthand sign] 'you'll' (you-l)
  but │ 'it will' ~ 'twill' (│ = <t> and it)
2.3 Internal consistency:
     negative contraction
     e.g. do not > don't, need not > needn't, was not > wasn't
     N = 1,344,244
[Chart: NEG contraction in %, y-axis 0-18, by period: 1732-1759, 1760-1789, 1790-1819, 1820-1849, 1850-1879, 1880-1913]
Negative contraction in the
OBC, 1732-1912
1. Lexeme? 2. Frequency? 3. Tense?
AUX form    % contr.       N   AUX form % contr.        N
do not       28.9    189,776   is not     0.2      47,142
will not     27.7     17,302   must not   0.2       1,620
shall not    20.6      4,172   would not  0.2      52,123
cannot       13.3    106,005   had not    0.1      72,395
are not       3.2     11,552   has not    0.1       9,244
dare not      3.1        260   should not 0.1      20,192
need not      0.6      2,136   was not    0.1      64,574
did not       0.4    429,143   may not    0.0       1,271
does not      0.4      9,539   might not  0.0       2,404
have not      0.4     44,038   ought not  0.0       1,221
could not     0.2     85,361
Explaining the absence of
negative contraction
• combination of phonology and genre
• n't is phonetically reduced, less salient than not
• do-don't [u - o(u)] vs. did-didn't [ɪ - ɪ]
  can-can't            vs. could-couldn't
  will-won't           vs. would-wouldn't
  shall-shan't         vs. should-shouldn't
• negative contraction is (near) absent where the
  context (e.g. change in the stem vowel in the
  negative) does not allow disambiguation
Hierarchy of perceptive difference
      between positive and negative
             contracted forms
                 V change   C change/addition   Score
do-don('t)           1             1              2
will-won('t)         1             1              2
shall-shan('t)       0.5           1              1.5
can-can('t)          0.5           0              0.5
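The score in the table is the sum of the vowel-change and consonant-change values. A minimal sketch of that arithmetic follows, set next to the contraction rates reported for do not, will not, shall not and cannot in the negative contraction table. The dictionary names are hypothetical, and pairing the two tables in one listing is an illustrative choice rather than a statistical test.

```python
# (vowel change, consonant change/addition), as given in the hierarchy table.
PERCEPTUAL_DIFFERENCE = {
    "do-don('t)": (1.0, 1.0),
    "will-won('t)": (1.0, 1.0),
    "shall-shan('t)": (0.5, 1.0),
    "can-can('t)": (0.5, 0.0),
}

# Contraction rates in % from the negative contraction table.
CONTRACTION_RATE = {
    "do-don('t)": 28.9,
    "will-won('t)": 27.7,
    "shall-shan('t)": 20.6,
    "can-can('t)": 13.3,
}


def score(pair):
    """Score = vowel change + consonant change/addition."""
    return sum(PERCEPTUAL_DIFFERENCE[pair])


# Highest perceptual difference first; ties keep their table order.
ranked = sorted(PERCEPTUAL_DIFFERENCE, key=score, reverse=True)
for pair in ranked:
    print(f"{pair:<16} score {score(pair):<4} contracted {CONTRACTION_RATE[pair]}%")
```

For these four auxiliaries the ordering by score matches the ordering by contraction rate, which is the pattern the hierarchy is meant to capture.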
2.4 Sociolinguistic potential: relative
     clauses
 • random extracts of speech events from OBC:
   20,000 words/decade (10,000 w. each for m + f)
 • 2500+ relative clauses, of which 1533 restrictive
        1720-1779     %   1780-1839     %   1840-1913     %      ∑      %
that       259     53.8      240     45.4      136     26.0    635   41.4
zero       107     22.2      118     22.3      201     38.4    426   27.8
which       70     14.6       97     18.3       92     17.6    259   16.9
who         38      7.9       69     13.0       89     17.0    196   12.8
whom         6      1.2        2      0.4        5      1.0     13    0.8
whose        1      0.2        3      0.6        0      0.0      4    0.3
∑          481               529               523            1533
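The percentages in the table are column percentages: each relativizer's count divided by the total number of restrictive relative clauses in that period. A short sketch of the calculation, using the counts from the table (the variable and function names are illustrative):

```python
PERIODS = ["1720-1779", "1780-1839", "1840-1913"]

# Restrictive relative clauses per relativizer and period (counts from the table).
COUNTS = {
    "that": [259, 240, 136],
    "zero": [107, 118, 201],
    "which": [70, 97, 92],
    "who": [38, 69, 89],
    "whom": [6, 2, 5],
    "whose": [1, 3, 0],
}


def column_percentages(counts):
    """Return per-period totals and each row's share of its period total in %."""
    totals = [sum(column) for column in zip(*counts.values())]
    shares = {
        relativizer: [round(100 * n / total, 1) for n, total in zip(row, totals)]
        for relativizer, row in counts.items()
    }
    return totals, shares


totals, shares = column_percentages(COUNTS)
for relativizer, row in shares.items():
    print(relativizer, dict(zip(PERIODS, row)))
print("totals", dict(zip(PERIODS, totals)))
```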
Diagram 1 Distribution of that with regard to
          animacy of the head

[Bar chart: y-axis 0%-100%; counts per period below]
                 1720-1779   1780-1839       1840-1913
       non-human    121         164             105
       human        137         76              31
                             1720-1779 vs 1780-1839 p = 0.000
                             1720-1779 vs 1840-1913 p = 0.000
                             1780-1839 vs 1840-1913 p = 0.070
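The slides do not say which significance test produced these p-values. As a hedged illustration of how two periods can be compared, the sketch below applies a Pearson chi-square test without continuity correction to a 2x2 table of non-human vs. human heads, using only the standard library. Because the test is an assumption, the results need not match the p-values on the slide exactly; `chi_square_2x2` is an illustrative name.

```python
import math

# Counts of "that" by animacy of the head: (non-human, human), from Diagram 1.
THAT_BY_ANIMACY = {
    "1720-1779": (121, 137),
    "1780-1839": (164, 76),
    "1840-1913": (105, 31),
}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def compare(period_1, period_2):
    """Test whether the animacy split differs between two periods."""
    return chi_square_2x2(*THAT_BY_ANIMACY[period_1], *THAT_BY_ANIMACY[period_2])


for pair in [("1720-1779", "1780-1839"),
             ("1720-1779", "1840-1913"),
             ("1780-1839", "1840-1913")]:
    chi2, p = compare(*pair)
    print(f"{pair[0]} vs {pair[1]}: chi2 = {chi2:.3f}, p = {p:.3f}")
```

An exact test (for example Fisher's) or a continuity correction would give somewhat different values for the borderline comparison between the two later periods.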
Diagram 2 Distribution of that and pronominal
          relativizers with human heads

[Bar chart: y-axis 0%-100%; counts per period below]
                1720-1779   1780-1839        1840-1913
         PRN        49         72               93
         that      137         76               31

                             1720-1779 vs 1780-1839: p = 0.000
                             1720-1779 vs 1840-1913: p = 0.000
                             1780-1839 vs 1840-1913: p = 0.000
Diagram 3 Relativizers by gender (excl. genitives)
                              p = 0.135       p = 0.001         p = 0.000
[Bar chart: y-axis 0%-100%; counts per period and gender below]
                         f     m             f     m           f     m
                        1720-1779           1780-1839         1840-1913
                   PRN 43      71           56    112         66    119
                   zero 53     54           66     52         110    73
                   that 124   134           108   132         72     64
      f 1720-1779 vs 1780-1839: p = 0.135   m 1720-1779 vs 1780-1839: p = 0.033
      f 1720-1779 vs 1840-1913: p = 0.000   m 1720-1779 vs 1840-1913: p = 0.000
      f 1780-1839 vs 1840-1913: p = 0.000   m 1780-1839 vs 1840-1913: p = 0.000
Diagram 4 Zero relativizer by gender (excl. genitives)

[Bar chart: y-axis 0%-100%; counts per period and gender below]
                          f     m            f     m           f     m
                         1720-1779          1780-1839         1840-1913
                   other 167   205          164 244           138   173
                   zero 53      54          66     52         110    73
      f 1720-1779 vs 1780-1839: p = 0.268   m 1720-1779 vs 1780-1839: p = 0.326
      f 1720-1779 vs 1840-1913: p = 0.000   m 1720-1779 vs 1840-1913: p = 0.022
      f 1780-1839 vs 1840-1913: p = 0.000   m 1780-1839 vs 1840-1913: p = 0.001
Thank you




References
• Gurney, Thomas. 1752. Brachygraphy: or short-writing.
  2nd ed. London: [no publisher].
• Nevalainen, Terttu & Raumolin-Brunberg, Helena (eds).
  1996. Sociolinguistics and language history: studies
  based on the corpus of early English correspondence.
  Amsterdam: Rodopi.
• Trudgill, Peter. 1974. The Social Differentiation of
  English in Norwich. Cambridge: Cambridge University
  Press.
• van Leeuwen, Marco H.D., Ineke Maas and Andrew
  Miles. 2002. HISCO: Historical international standard
  classification of occupations. Leuven: Leuven University
  Press.

Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 

Magnus Huber - The Old Bailey Corpus: Spoken English in the 18th and 19th Centuries

  • 1. The Old Bailey Corpus: Spoken English in the 18th and 19th centuries
      The use of historical court records in the investigation of language change
      Digital History Seminar, 21 February 2012
      Magnus Huber, Department of English, University of Giessen, Otto-Behaghel-Str. 10B, D-35394 Giessen, Germany
      magnus.huber@anglistik.uni-giessen.de
  • 2. Structure
      1. Introduction
        1.1 Corpus linguistics, sociolinguistics and sociohistorical linguistics
        1.2 The Proceedings of the Old Bailey
        1.3 Turning the Proceedings into a linguistic corpus
      2. How linguistically accurate is OBC?
        2.1 Comparison with alternative accounts
        2.2 Language event and its representation
        2.3 Internal consistency: negative contraction
        2.4 Sociolinguistic potential: relative clauses
      3. Brief summary
  • 3. 1. Introduction
      1.1 Corpus linguistics, sociolinguistics and sociohistorical linguistics
      Definition of linguistic corpus: generally speaking, a (usually large) collection of machine-readable texts used as a database in linguistic analyses
      Importance of spoken language: spoken language precedes written language
  • 4. Peter Trudgill (1974) The social differentiation of English in Norwich
      [Figure: percentage of (ng):[n] by social class and sex (female, male); y-axis 0-100; x-axis MMC, LMC, UWC, MWC, LWC]
      MMC = middle middle class, LMC = lower middle class, UWC = upper working class, MWC = middle working class, LWC = lower working class
      Example: drinking, (ng):[n] = [drɪnkɪn]
  • 5. Historical linguistics: language change
      ye > you in subject position
      "when ye come set it in sech rewle as ye seeme best" (1465)
      "And thus in hast fare you hartely well" (1545)
  • 7. 1.2 The Proceedings of the Old Bailey
      • Old Bailey = London's Central Criminal Court
      • meets 8 times/year, from 1830s 10 times/year
      • "Proceedings" published 1674-1913
      • start as a commercial enterprise: publishers send scribes into courtroom
      • proceedings taken down in shorthand
      • sold privately by publishers
      • City of London gains more and more control during 18th century
  • 8. • 2100+ volumes
      • ca. 200,000 trials
      • ca. 134 million words
  • 10. Original computerized Proceedings (Sheffield)
      <unit id="t17330510-1"><trial><info><identifier>t17330510-1</identifier>
      <source>173305100002</source>
      <header>Sarah Sanders, theft: specified place, 10 May 1733.</header>
      <pfro>17330510</pfro><ntrial>2</ntrial><psession>17330404</psession><nsession>17330628</nsession></info>
      <p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>, was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of Holland, the Goods of <person gender="m"><victim gender="m"><given>John </given><surname>Underwood </surname></victim> </person>, in his House</theft></off>, <cd>March 4</cd>.</p>
      <p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>, she came to me very well recommended, but had not staid above ten Weeks before several [. . .]
  • 12. Sociolinguistically useful XML-tags in Sheffield Proceedings
      • name        <given>Sarah</given> <surname>Sanders</surname>
      • year        <identifier>t17180110-1</identifier>
      • gender      <defend gender="f">
      • age         <age>43</age>
      • profession  <deflabel>Servant</deflabel>
      • origin      <crimeloc>Tottenham</crimeloc>
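A minimal sketch of how such tags might be read out for sociolinguistic use. It uses Python's standard `xml.etree.ElementTree` on an abbreviated, well-formed sample modelled on the Sarah Sanders trial quoted above. The function name `defendant_metadata`, the sample string, and the idea of taking the year from the first four digits of the identifier are assumptions made for illustration, not part of the Sheffield or OBC tooling.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated, well-formed sample modelled on the trial excerpt in the slides
# (hypothetical shortening; the real files contain much more markup).
TRIAL_SAMPLE = """<unit id="t17330510-1"><trial>
<info><identifier>t17330510-1</identifier></info>
<p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>, was indicted for stealing.</p>
<p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>.</p>
</trial></unit>"""


def defendant_metadata(xml_text):
    """Collect name, year, gender and profession label for the first defendant."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext(".//identifier", default="")
    # Assumption: identifiers look like "tYYYYMMDD-n", so the year follows the "t".
    year_match = re.match(r"t(\d{4})", identifier)
    defend = root.find(".//defend")
    return {
        "given": (defend.findtext("given") or "").strip(),
        "surname": (defend.findtext("surname") or "").strip(),
        "year": int(year_match.group(1)) if year_match else None,
        "gender": defend.get("gender"),
        "profession": root.findtext(".//deflabel"),
    }


print(defendant_metadata(TRIAL_SAMPLE))
```

The sketch only looks at the first `<defend>` element; trials with several defendants would need a loop over `root.iter("defend")`.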
  • 13. 1.3 Turning the Proceedings into a linguistic corpus of early spoken English
  • 14. <unit id="t17330510-1"><trial><info><identifier>t17330510-1</identifier>
      <source>173305100002</source>
      <header>Sarah Sanders, theft: specified place, 10 May 1733.</header>
      <pfro>17330510</pfro><ntrial>2</ntrial><psession>17330404</psession><nsession>17330628</nsession></info>
      <p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>, was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of <speech> Holland, the Goods of <person gender="m"><victim gender="m"><given>John </given><surname>Underwood </surname></victim> </person>, in his House</theft></off>, <cd>March 4</cd>.</p>
      <p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>, she came to me very well recommended, but had not staid above ten Weeks before several [. . .]
  • 15. Tagging spoken language
      • Need for automatic annotation
      • Perl script identifying non-linguistic patterns indicating spoken language in the original proceedings
          – layout
          – metalinguistic information
      • Linguistic markers indicating spoken language? > 1st and 2nd person pronouns
  • 16. Automatic speech tagging
      e.g. "Q. – A."-sequences
      Q. <speech>Did you see him on Sunday night?</speech> - A. <speech>Yes, at Walworth, on Sunday night, the 12th of January, at one o'clock - I am sure of that.</speech></p>
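The slides say the tagging was done with a Perl script that is not reproduced here. As a rough illustration of the idea only, the following Python sketch wraps the question and the answer of a "Q. ... - A. ..." sequence in `<speech>` tags. The regular expression, the function name `tag_qa`, and the assumption that an answer ends at the next "Q.", at `</p>`, or at the end of the text are all hypothetical simplifications, not the OBC rules.

```python
import re

# Assumed layout cue: "Q." opens a question, "- A." opens the answer, and the
# answer runs until the next "Q.", a closing </p>, or the end of the text.
QA_PATTERN = re.compile(
    r"Q\.\s*(?P<q>.+?)\s*-\s*A\.\s*(?P<a>.+?)(?=\s*Q\.\s|\s*</p>|\s*$)",
    re.S,
)


def tag_qa(text):
    """Wrap the question and answer of each Q./A. pair in <speech> tags."""
    return QA_PATTERN.sub(
        lambda m: f"Q. <speech>{m['q']}</speech> - A. <speech>{m['a']}</speech>",
        text,
    )


example = (
    "Q. Did you see him on Sunday night? - A. Yes, at Walworth, on Sunday "
    "night, the 12th of January, at one o'clock - I am sure of that.</p>"
)
print(tag_qa(example))
```

Note that the hyphen inside the answer ("at one o'clock - I am sure") is left alone, because only a hyphen followed by "A." is treated as a boundary.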
  • 17. Sociobiographical speech event annotation: The New Bailey Tag Assistant
  • 18. Social data file
      • XML format
      • attributes of every speaker in OBC
      • plus: scribe, printer, publisher
      <xml>
        <document name="19100426">
          ...
          <speaker id="271">
            <sex>m</sex>
            <age></age>
            <given>Thomas</given>
            <surname>Tuckey</surname>
            <occupation>Warder</occupation>
            <occupation2></occupation2>
            <hiscolabel>Prison Guard</hiscolabel>
            <hiscocode>58930</hiscocode>
            <hiscolabel2></hiscolabel2>
            <hiscocode2></hiscocode2>
            <crimescene></crimescene>
            <birthplace></birthplace>
            <workplace>Wormwood Scrubs Prison</workplace>
            <placeofresidence></placeofresidence>
            <role>witness</role>
          </speaker>
          ...
        </document>
      </xml>
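A small sketch of how a social data file of this shape could be loaded for analysis, again with Python's standard `xml.etree.ElementTree`. The sample below is a shortened, hypothetical stand-in for the record shown on the slide (the outer `<xml>` wrapper is left out), and `load_speakers` is an illustrative name; empty elements such as `<age></age>` are mapped to `None`.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the speaker record shown on the slide.
SOCIAL_SAMPLE = """<document name="19100426">
  <speaker id="271">
    <sex>m</sex>
    <age></age>
    <given>Thomas</given>
    <surname>Tuckey</surname>
    <occupation>Warder</occupation>
    <hiscolabel>Prison Guard</hiscolabel>
    <hiscocode>58930</hiscocode>
    <workplace>Wormwood Scrubs Prison</workplace>
    <role>witness</role>
  </speaker>
</document>"""


def load_speakers(xml_text):
    """Return {speaker id: {attribute: value or None}} for every <speaker>."""
    root = ET.fromstring(xml_text)
    speakers = {}
    for speaker in root.iter("speaker"):
        # Empty elements become None so that missing data is explicit.
        record = {child.tag: (child.text or "").strip() or None for child in speaker}
        speakers[speaker.get("id")] = record
    return speakers


speakers = load_speakers(SOCIAL_SAMPLE)
print(speakers["271"]["role"], speakers["271"]["hiscocode"])
```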
  • 19. 2. How linguistically accurate is OBC?
      2.1 Comparison with alternative accounts
      e.g. trial of John Ayliffe, 17591024-27, vs. the alternative account "The tryal at large of John Ayliffe"

      Proceedings (718 words): Thomas. I am clerk to Mr Jones, a Stationer in the Temple.
      Tryal (1290 words):      Henry Thomas. I am clerk to Mr Jones, a Stationer, in the Temple.

      Proceedings: Hargrave. By Mr Ayliffe: I saw him seal and deliver it.
      Tryal:       Walter Hargrave. By Mr Ayliffe. – I saw him sign, seal, and deliver it, as his act and deed.

      Proceedings: ./.
      Tryal:       John Fannen. I am not sure; but to the best of my remembrance, it was sometime the beginning of December last, at Mr Fox's house.
  • 20. Proceedings (718 words) vs. Tryal (1290 words)

      Proceedings: Hargrave. Because he said he was not willing Mr Fox should know of it?
      Tryal:       Walter Hargrave. The reason Mr Ayliffe gave, was, that he would not on any account have it come to Mr Fox's ears.

      Proceedings: Thomas. I can't particularly say that; sometimes we leave a blank by the gentlemens desire, perhaps they may add another covenant, or something of that sort, I can't recollect the reason for that.
      Tryal:       Henry Thomas. I cannot positively say. – We sometimes leave out the conclusion by gentlemen's desire, in order that they may add a covenant, or some such thing, if it should be thought necessary; but I cannot particularly recollect the reason why the conclusion was omitted in this case.
  • 21. 2.2 Language event ↔ written representation
      Letters: formulation → writing
      Trial proceedings (e.g. Old Bailey Proceedings): speech event → perception by scribe → shorthand script → expanding shorthand → proof reading → type setting
  • 22. Gurney (1752) Brachygraphy: or short-writing
      'to take a Speech, or Sermon verbatim, as a Person talks in common' (p. 3)
      Scribes: Thomas Gurney (1749-1770), Joseph Gurney (1770-1782)
  • 23. Recording linguistic details
      • no distinction between inflected and uninflected auxiliaries: a single sign = 'may' or 'mayst', 'can' or 'canst', 'should' or 'shouldst'
      • dot placed on the top left of the noun phrase = allomorphs a and an
      • auxiliary contractions: 'you will' (you w-il) vs. 'you'll' (you-l), but │ 'it will' ~ 'twill' (│ = <t> and it)
  • 24. 2.3 Internal consistency: negative contraction
      e.g. do not > don't, need not > needn't, was not > wasn't
      N = 1,344,244
      [Figure: NEG contraction in % (y-axis 0-18) by period: 1732-1759, 1760-1789, 1790-1819, 1820-1849, 1850-1879, 1880-1913]
  • 25. Negative contraction in the OBC, 1732-1912
      1. Lexeme?

      AUX form     % contr.        N
      do not         28.9      189,776
      will not       27.7       17,302
      shall not      20.6        4,172
      cannot         13.3      106,005
      are not         3.2       11,552
      dare not        3.1          260
      need not        0.6        2,136
      did not         0.4      429,143
      does not        0.4        9,539
      have not        0.4       44,038
      could not       0.2       85,361
      is not          0.2       47,142
      must not        0.2        1,620
      would not       0.2       52,123
      had not         0.1       72,395
      has not         0.1        9,244
      should not      0.1       20,192
      was not         0.1       64,574
      may not         0.0        1,271
      might not       0.0        2,404
      ought not       0.0        1,221
  • 26. Negative contraction in the OBC, 1732-1912
      2. Frequency? (same table as on slide 25)
  • 27. Negative contraction in the OBC, 1732-1912
      3. Tense? (same table as on slide 25)
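To make the "% contr." column concrete: it can be read as contracted tokens divided by all tokens (full plus contracted) of a given auxiliary. The sketch below counts both variants in a text with Python's `re` module. The pair list, the function names, and the invented sample sentence are assumptions for illustration; the slides do not say how the OBC counts were actually extracted.

```python
import re

# Hypothetical subset of full form -> contracted form pairs.
PAIRS = {
    "do not": "don't",
    "cannot": "can't",
    "did not": "didn't",
    "will not": "won't",
    "shall not": "shan't",
}


def count_form(form, text):
    """Count whole-word, case-insensitive occurrences of a form."""
    return len(re.findall(rf"\b{re.escape(form)}\b", text, flags=re.IGNORECASE))


def contraction_rates(text):
    """Percentage of contracted tokens per auxiliary (auxiliaries not found are skipped)."""
    rates = {}
    for full, contracted in PAIRS.items():
        n_full = count_form(full, text)
        n_contr = count_form(contracted, text)
        total = n_full + n_contr
        if total:
            rates[full] = round(100 * n_contr / total, 1)
    return rates


# Invented sample, not taken from the corpus.
sample = ("I do not know him. I don't know. I did not see it. "
          "I do not recollect. I can't say; I cannot tell.")
print(contraction_rates(sample))
```

The sketch only recognises the plain apostrophe; printed sources with typographic apostrophes or spellings such as "can not" would need extra patterns.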
  • 28. Explaining the absence of negative contraction
      • combination of phonology and genre
      • n't is phonetically reduced, less salient than not
      • do-don't [u - o(u)] vs. did-didn't [ɪ - ɪ]
        can-can't vs. could-couldn't
        will-won't vs. would-wouldn't
        shall-shan't vs. should-shouldn't
      • negative contraction is (near) absent where the context (e.g. change in the stem vowel in the negative) does not allow disambiguation
  • 29. Hierarchy of perceptive difference between positive and negative contracted forms

      Form             V change   C change/addition   Score
      do-don('t)          1              1              2
      will-won('t)        1              1              2
      shall-shan('t)      0.5            1              1.5
      can-can('t)         0.5            0              0.5
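The score in this hierarchy is simply the vowel-change value plus the consonant change/addition value. A tiny Python sketch of that arithmetic, with the values copied from the slide (the dictionary and function names are illustrative):

```python
# (vowel change, consonant change/addition) per pair, as given on the slide.
FEATURES = {
    "do-don('t)": (1, 1),
    "will-won('t)": (1, 1),
    "shall-shan('t)": (0.5, 1),
    "can-can('t)": (0.5, 0),
}


def perceptual_scores(features):
    """Score = vowel change + consonant change/addition."""
    return {pair: v + c for pair, (v, c) in features.items()}


scores = perceptual_scores(FEATURES)
# Rank from most to least distinct positive/negative pair.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```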
  • 30. 2.4 Sociolinguistic potential: relative clauses
      • random extracts of speech events from OBC: 20,000 words/decade (10,000 words each for m + f)
      • 2500+ relative clauses, of which 1533 restrictive

               1720-1779     %    1780-1839     %    1840-1913     %      ∑      %
      that        259      53.8      240      45.4      136      26.0    635   41.4
      zero        107      22.2      118      22.3      201      38.4    426   27.8
      which        70      14.6       97      18.3       92      17.6    259   16.9
      who          38       7.9       69      13.0       89      17.0    196   12.8
      whom          6       1.2        2       0.4        5       1.0     13    0.8
      whose         1       0.2        3       0.6        0       0.0      4    0.3
      ∑           481                529                523             1533
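The percentages in this table are column percentages: each relativizer count divided by the total of restrictive relative clauses in that period. A short Python sketch that recomputes them from the raw counts on the slide (variable and function names are illustrative):

```python
PERIODS = ("1720-1779", "1780-1839", "1840-1913")

# Raw counts of restrictive relativizers per period, as given on the slide.
COUNTS = {
    "that": (259, 240, 136),
    "zero": (107, 118, 201),
    "which": (70, 97, 92),
    "who": (38, 69, 89),
    "whom": (6, 2, 5),
    "whose": (1, 3, 0),
}


def column_percentages(counts):
    """Return per-period totals and each relativizer's share of its period (in %)."""
    totals = [sum(column) for column in zip(*counts.values())]
    shares = {
        name: tuple(round(100 * n / total, 1) for n, total in zip(row, totals))
        for name, row in counts.items()
    }
    return totals, shares


totals, shares = column_percentages(COUNTS)
for name, row in shares.items():
    print(name, dict(zip(PERIODS, row)))
```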
  • 31. Diagram 1: Distribution of that with regard to animacy of the head
      [Stacked bar chart, 0-100%, by period]

                   1720-1779   1780-1839   1840-1913
      non-human       121         164         105
      human           137          76          31

      1720-1779 vs 1780-1839: p = 0.000
      1720-1779 vs 1840-1913: p = 0.000
      1780-1839 vs 1840-1913: p = 0.070
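The slides do not state which significance test produced these p-values. As one plausible way to compare two periods, the sketch below runs a Pearson chi-square test (1 degree of freedom, no continuity correction) on a 2x2 table of non-human vs. human heads, using only the standard library. The choice of test and the function name are assumptions, so the results need not match the slide values exactly.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] with its p-value (1 df)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: period; columns: non-human, human (counts from Diagram 1).
chi2, p = chi_square_2x2(164, 76, 105, 31)   # 1780-1839 vs 1840-1913
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")

chi2_early, p_early = chi_square_2x2(121, 137, 164, 76)  # 1720-1779 vs 1780-1839
print(f"chi2 = {chi2_early:.2f}, p = {p_early:.6f}")
```

With a continuity correction or an exact test the borderline 1780-1839 vs 1840-1913 comparison would come out somewhat differently, which is why the test choice is flagged as an assumption.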
  • 32. Diagram 2: Distribution of that and pronominal relativizers with human heads
      [Stacked bar chart, 0-100%, by period]

              1720-1779   1780-1839   1840-1913
      PRN         49          72          93
      that       137          76          31

      1720-1779 vs 1780-1839: p = 0.000
      1720-1779 vs 1840-1913: p = 0.000
      1780-1839 vs 1840-1913: p = 0.000
  • 33. Diagram 3: Relativizers by gender (excl. genitives)
      [Stacked bar chart, 0-100%, by period and gender]

                        1720-1779      1780-1839      1840-1913
                         f     m        f     m        f     m
      PRN               43    71       56   112       66   119
      zero              53    54       66    52      110    73
      that             124   134      108   132       72    64
      p (per period)    0.135          0.001          0.000

      f 1720-1779 vs 1780-1839: p = 0.135     m 1720-1779 vs 1780-1839: p = 0.033
      f 1720-1779 vs 1840-1913: p = 0.000     m 1720-1779 vs 1840-1913: p = 0.000
      f 1780-1839 vs 1840-1913: p = 0.000     m 1780-1839 vs 1840-1913: p = 0.000
  • 34. Diagram 4: Zero relativizer by gender (excl. genitives)
      [Stacked bar chart, 0-100%, by period and gender]

               1720-1779      1780-1839      1840-1913
                f     m        f     m        f     m
      other   167   205      164   244      138   173
      zero     53    54       66    52      110    73

      f 1720-1779 vs 1780-1839: p = 0.268     m 1720-1779 vs 1780-1839: p = 0.326
      f 1720-1779 vs 1840-1913: p = 0.000     m 1720-1779 vs 1840-1913: p = 0.022
      f 1780-1839 vs 1840-1913: p = 0.000     m 1780-1839 vs 1840-1913: p = 0.001
  • 35. Thank you
  • 36. References
      • Gurney, Thomas. 1752. Brachygraphy: or short-writing. 2nd ed. London: [no publisher].
      • Nevalainen, Terttu & Raumolin-Brunberg, Helena (eds). 1996. Sociolinguistics and language history: studies based on the corpus of early English correspondence. Amsterdam: Rodopi.
      • Trudgill, Peter. 1974. The Social Differentiation of English in Norwich. Cambridge: Cambridge University Press.
      • van Leeuwen, Marco H.D., Ineke Maas and Andrew Miles. 2002. HISCO: Historical international standard classification of occupations. Leuven: Leuven University Press.