Corpus Protocols:
digital transformations of commercial
newspaper collections for text and data
mining to support academic research
IFLA 2014 Pre-Conference: Digital Transformation and the Changing Role of News
Media in the 21st Century
August 13-14, 2014
International Telecommunication Union, Geneva, Switzerland
Conceptualising new research
Working together: building research relationships
Corpus Protocols Project Team
• Seth Cayley: Head of Research Solutions, Cengage Learning EMEA
• Mike Gardner: Analyst and Web Developer in Web Technologies,
University of Nottingham
• Kat Gupta: Corpus Protocols Researcher, University of Nottingham
• Michaela Mahlberg: Professor of English Language and Linguistics,
University of Nottingham
• Neil Smyth: Faculty Team Leader – Arts, University of Nottingham
• Stella Wisdom: Digital Curator at the British Library
Context
• Focusing on textual data (a corpus), we review the
relationships between institutional infrastructure,
external business partners, and the research
questions and methods that deal with textual
data
• Exploring opportunities for future developments
(protocols)
Problem (practical and research)
• Some University of Nottingham owned data sets are
not available for research for several reasons,
including:
– licence agreements restrict copying for data analytics;
– additional licences incur additional charges;
– technical issues: copying, storage and accessibility of data;
– and a lack of shared understanding of research trends
in applied linguistics, corpus linguistics and corpus
stylistics.
Aims
• To articulate the relationship between data and tools
for text and data mining
• To prototype a methodology for analysing locally stored
data with existing corpus linguistic tools
• To use an existing corpus linguistic tool to analyse a
University-owned data set that will be made available
on the University network.
Corpus Protocols: didn’t start with discourse
research questions
• Declassified Documents as opportunistic data
• The Declassified Documents corpus
– 197,837,328 words in total
– divided into 30 yearly subcorpora (between 3.3 and
8.3 million words each)
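The per-year split described above can be sketched as follows. This is a minimal illustration, not the project's actual tooling: it assumes the documents are already available as (year, text) pairs, counts words by splitting on whitespace, and uses a hypothetical function name, `subcorpus_sizes`. Real corpus tools tokenise more carefully, so their totals may differ.

```python
from collections import Counter


def subcorpus_sizes(documents):
    """Return a {year: word_count} mapping for (year, text) pairs.

    Assumption: a "word" is any whitespace-separated string.
    """
    sizes = Counter()
    for year, text in documents:
        sizes[year] += len(text.split())
    # Sort by year so the subcorpora can be read chronologically.
    return dict(sorted(sizes.items()))


if __name__ == "__main__":
    # Toy documents standing in for the real collection.
    sample = [
        (1975, "memorandum for the record"),
        (1976, "cable from the embassy"),
        (1975, "briefing note"),
    ]
    for year, size in subcorpus_sizes(sample).items():
        print(year, size)
```

Summing the values of the returned mapping gives the size of the whole corpus under the same word-counting assumption.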
UK Copyright legislation: 2014
"the making of a copy of a work by a person who has lawful
access to the work does not infringe copyright in the work
provided that (a) the copy is made in order that a person who
has lawful access to the work may carry out a computational
analysis of anything recorded in the work for the sole
purpose of research for a non-commercial purpose, and (b)
the copy is accompanied by a sufficient acknowledgement."
(The Copyright and Rights in Performances (Research,
Education, Libraries and Archives) Regulations 2014)
Corpus methods:
• Discourse as representation: labelling, naming,
creating reality (e.g. weapons of mass
destruction).
• Lexis in context
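One common way to look at lexis in context is a keyword-in-context (KWIC) concordance, which lists every occurrence of a word or phrase with a few words on either side. The sketch below is illustrative only and is not the tool used in the project: the function names `tokenize` and `kwic`, the simple lowercase tokeniser and the default context width are all assumptions.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a deliberately simple assumption."""
    return re.findall(r"[a-z0-9']+", text.lower())


def kwic(text, phrase, width=4):
    """Return (left context, node, right context) for each hit of phrase."""
    tokens = tokenize(text)
    node = tokenize(phrase)
    n = len(node)
    if n == 0:
        return []
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(node), right))
    return lines


if __name__ == "__main__":
    sample = (
        "Reports claimed weapons of mass destruction were hidden. "
        "Inspectors found no weapons of mass destruction there."
    )
    # Align the node in a central column so the lines can be read vertically.
    for left, node, right in kwic(sample, "weapons of mass destruction", width=2):
        print(f"{left:>20}  {node}  {right}")
```

Aligning the node phrase in one column lets the analyst scan down the surrounding words and spot recurring patterns across many texts.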
Reading vertically
Getting behind the
interface
TIDAL: TImes Data Archive Lab project
• Continuing partnership with Cengage
• Times Digital Archive
• University of Michigan
• Investigating social reality in the 19th century, based
on news discourse (to complement the picture of
Dickensian London)
British Library News Data: Future Plans
The British Library has one of the world's greatest
news archives. Our collection of UK, Irish and world
newspapers numbers over 60 million issues, from the
17th century to the present day, and we have growing
collections of television, radio and web news.
http://britishlibrary.typepad.co.uk/thenewsroom/
https://twitter.com/BL_newsroom
British Library News Data: Future Plans
Print newspapers from Colindale moved to a new
newspaper storage building in Boston Spa in Yorkshire
British Library: Newsroom Opened 7th April 2014
British Library News Data: Future Plans
Broadcast News:
Daily television and radio news programmes broadcast
in the UK since May 2010 are available through an instant
access service in the Reading Rooms, with new
programmes available within hours of broadcast.
Over 60 hours of news are recorded every day from 22
channels including BBC, ITV, Channel 4, Sky News, Al
Jazeera English and CNN.
British Library News Data: Future Plans
Many recorded news
programmes come
with subtitles
A corpus of data and
metadata with
potential for
innovative research

BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

Corpus Protocols: digital transformations for TDM

  • 1. Corpus Protocols: digital transformations of commercial newspaper collections for text and data mining to support academic research • IFLA 2014 Pre-Conference: Digital Transformation and the Changing Role of News Media in the 21st Century • August 13-14, 2014 • International Telecommunication Union, Geneva, Switzerland
  • 2. Conceptualising new research • Working together: building research relationships
  • 3. Corpus Protocols Project Team • Seth Cayley: Head of Research Solutions, Cengage Learning EMEA • Mike Gardner: Analyst and Web Developer in Web Technologies, University of Nottingham • Kat Gupta: Corpus Protocols Researcher, University of Nottingham • Michaela Mahlberg: Professor of English Language and Linguistics, University of Nottingham • Neil Smyth: Faculty Team Leader – Arts, University of Nottingham • Stella Wisdom: Digital Curator at the British Library
  • 4. Context • Focusing on textual data (corpus), we review relationships between institutional infrastructure, external business partners, and research questions and methods that deal with textual data • Exploring opportunities for future developments (protocols)
  • 5. Problem (practical and research) • Some data sets owned by the University of Nottingham are not available for research for several reasons, including: – licence agreements restrict copying for data analytics; – additional charges for additional licences; – technical issues: copying, storage, accessibility of data; – and a lack of shared understanding of research trends in applied linguistics, corpus linguistics and corpus stylistics.
  • 6. Aims • To articulate the relationship between data and tools for text and data mining • To prototype a methodology for working with locally stored data using existing corpus linguistic tools • To use an existing corpus linguistic tool to analyse a University-owned data set that will be made available on the University network.
  • 7. Corpus Protocols: didn’t start with discourse research questions • Declassified Documents as opportunistic data • The Declassified Documents corpus – 197,837,328 words in total – divided into 30 yearly subcorpora (between 3.3 and 8.3 million words each)
  • 8. UK Copyright legislation: 2014 • "the making of a copy of a work by a person who has lawful access to the work does not infringe copyright in the work provided that (a) the copy is made in order that a person who has lawful access to the work may carry out a computational analysis of anything recorded in the work for the sole purpose of research for a non-commercial purpose, and (b) the copy is accompanied by a sufficient acknowledgement." (The Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations 2014)
  • 9. Corpus methods: • Discourse as representation: labelling, naming, creating reality (e.g. weapons of mass destruction). • Lexis in context
  • 12. TIDAL: TImes Data Archive Lab project • Continuing partnership with Cengage • Times Digital Archive • University of Michigan • Investigating social reality in the 19th century based on news discourse (to complement the picture of Dickensian London)
  • 13. British Library News Data: Future Plans • The British Library has one of the world's greatest news archives. Our collection of UK, Irish and world newspapers numbers over 60 million issues, from the 17th century to the present day, and we have growing collections of television, radio and web news. http://britishlibrary.typepad.co.uk/thenewsroom/ https://twitter.com/BL_newsroom
  • 14. British Library News Data: Future Plans • Print newspapers from Colindale moved to a new newspaper storage building in Boston Spa in Yorkshire
  • 16. British Library: Newsroom • Opened 7th April 2014
  • 17. British Library News Data: Future Plans • Broadcast News: Daily television and radio news programmes broadcast in the UK since May 2010 are available through an instant access service in the Reading Rooms, with new programmes available within hours of broadcast. Over 60 hours of news are recorded every day from 22 channels including BBC, ITV, Channel 4, Sky News, Al Jazeera English and CNN.
  • 19. British Library News Data: Future Plans • Many recorded news programmes come with subtitles • A corpus of data and metadata with potential for innovative research

Editor's notes

  1. Title: Corpus Protocols: digital transformations of commercial newspaper collections for text and data mining to support academic research IFLA 2014 Pre-Conference Digital Transformation and the Changing Role of News Media in the 21st Century Dates: August 13-14, 2014 Venue: International Telecommunication Union, Geneva, Switzerland
  2. Probably an intro needed, especially as ‘corpus’ and ‘protocols’ will not make sense to everyone easily.
  3. Had ‘problem’ twice
  4. Add a text box with some info on the ‘consistent’