Are new technical literacy
skills needed?
Remarrying research and collection services around
access to corpora and text mining.
INGRID MASON
DEPLOYMENT STRATEGIST
I think yes,
and I’m a bit
excited about
it.
2
Text & data mining
(TDM) is being used by a
range of researchers, both
to target relevant literature
and in HASS research.
More
research
support will
need to be
provided.
3
Should TDM
services be
coordinated
nationally?
Many, many questions
What library technical skills are needed (if there is a growing research support need)?
Where do researchers go if they want to find, use, move, store or create a corpus?
How do researchers learn to build, evaluate, and text mine a corpus?
Where can/does/should this specialist service sit (in Library Research Support or in
eResearch or in Faculty or in national research infrastructure services)?
Psst. I don’t have answers, just the questions at this point. Sorry!
4
O M G O M G O M G
OK, what’s a corpus? Find a definition, somewhere reliable [searches the web].
What does a corpus look like? Linguists will know this [searches the web].
How on earth do you “make that blob of stuff accessible”? [compute/storage?]
How big is that text blob and what’s it made of? Corpus analyst? [new job title?]
Who do I know that knows how to build a corpus? Ah, Steve Cassidy from Alveo VL.
What makes for a well balanced/formed corpus? Breathe, reach for library skills.
What about commercially hosted text blobs? Read: Kylie Poulton’s VALA 2018 paper.
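One way to get a first feel for "how big is that text blob and what's it made of" is to count documents, word tokens, and distinct word types. The sketch below is only an illustration under stated assumptions: the two-document corpus, the `tokenize` rule, and the `profile` helper are all invented for this example and are not taken from Alveo or any other service named in these slides.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: document id -> raw text (invented for illustration).
corpus = {
    "doc1": "The library holds the archive.",
    "doc2": "The archive holds share lists and the registers.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def profile(corpus):
    """Return basic size measures for a corpus of plain-text documents."""
    tokens = [tok for text in corpus.values() for tok in tokenize(text)]
    counts = Counter(tokens)
    return {
        "documents": len(corpus),      # how many texts
        "tokens": len(tokens),         # running words
        "types": len(counts),          # distinct words
        "top_words": counts.most_common(3),
    }

summary = profile(corpus)
print(summary)
```

A real corpus would be read from files or an API rather than typed inline, and the tokenising rule would need to suit the language and period of the texts.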
5
I’m a corpus
building &
TDM novice -
I feel like an
imposter.
6
I’m old style
but I’d like to
give this a go.
Would you?
Schonfeld, Roger C & Christine Wolff-Eisenberg (2017). Taking a Closer Look at Talent Management: Findings from the US Library Survey, 10
April 2017. Ithaka S+R Blog. http://www.sr.ithaka.org/blog/taking-a-closer-look-at-talent-management/ Last accessed: 18/04/2017
7
8
9
Digital Humanities Australasia 2016
Hobart, Australia
10
Digital Humanities Australasia 2016
Hobart, Australia
11
Alan Liu’s DH Toychest
Data Collections and Datasets
Question: How does this arrangement of resources in Liu’s DH Toychest change my
understanding of collecting resources for research and supporting research?
Answer: Quite a lot, I feel out of my depth, but also very intrigued and my fingers are
tingling. Why?
Challenge: I need to start looking into corpora and have a go at constructing a corpus
(hint: two projects this year).
12
13
14
Library Technical Skills
Research support in:
Research Data Management / Digital Scholarship / Digital Curation / Research
Techniques
Using:
iPython (now Jupyter) notebook - Natural Language Toolkit / Library Carpentry or Data
Carpentry or Software Carpentry / Text Mining with R (O’Reilly)
Psst: we aim for Jupyter notebooks connected to CloudStor (one notebook per person to play with).
A Trend
Expertise lies in the university to
support text mining for research
and scholarly literature searches.
Biomedical Text Mining
An important problem that text mining attempts to address is
information overload and overlook. Examples of solutions to this
problem include Information Extraction, Document Summarisation,
and Document Classification. In the following example we
demonstrate the use of Text Mining to classify sentences in
biomedical articles and extract key units of information. This
provides a way for busy professionals to reduce the amount of
information to which they are exposed and focus only on salient
aspects in which they are interested.
From Text Mining Collaboration - UNSW
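To make "classify sentences" concrete, here is a deliberately simple rule-based sketch. It is not the UNSW method: the cue-word lists, the category names, and the sample abstract are all assumptions invented for illustration, and real biomedical classifiers are usually trained on labelled data.

```python
import re

# Hypothetical cue words for each sentence category (invented for illustration).
CUES = {
    "method": {"measured", "randomised", "sampled", "analysed"},
    "result": {"increased", "reduced", "significant", "showed"},
}

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def classify(sentence):
    """Label a sentence with the category whose cue words it matches most."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {label: len(words & cues) for label, cues in CUES.items()}
    best = max(scores, key=scores.get)
    # Sentences that match no cue word at all fall into "other".
    return best if scores[best] > 0 else "other"

# Invented sample text, not drawn from a real article.
abstract = (
    "Patients were randomised to two groups. "
    "Blood pressure was measured weekly. "
    "The treatment showed a significant benefit. "
    "Further work is planned."
)

labelled = [(classify(s), s) for s in split_sentences(abstract)]
for label, sentence in labelled:
    print(f"{label:7} {sentence}")
```

A busy reader could then skim only the sentences labelled "result", which is the kind of overload reduction the passage describes.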
15
Learn More
Some history and definition of the
terms (and more) is offered.
Text mining & Text analysis - what
is the difference?
Text mining began with the computational and information
management fields (e.g. database searching and information
retrieval), whereas Text analysis began in the humanities with the
manual analysis of text (e.g. Bible concordances and newspaper
indexes). More recently, the two terms have become synonymous,
and now generally refer to the use of computational methods to
search, retrieve, and analyse text data.
"Text mining or text analytics is an umbrella term describing a
range of techniques that seek to extract useful information
from document collections through the identification and
exploration of interesting patterns in the unstructured textual
data of various types of documents – such as books, web pages,
emails, reports or product descriptions." (Truyens & van Eecke,
2014)
From: Text Mining and Text Analysis - UQ (Research Techniques)
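The concordances mentioned above have a direct computational descendant, the keyword-in-context (KWIC) listing. The following is a minimal sketch, assuming a plain-text input and a simple word-splitting rule; the `concordance` function and its `width` parameter are hypothetical names chosen for this example.

```python
import re

def concordance(text, keyword, width=3):
    """Return keyword-in-context rows: (left words, keyword, right words)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Take up to `width` words on each side of the hit.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

sample = (
    "In the beginning was the Word, and the Word was with God, "
    "and the Word was God."
)
rows = concordance(sample, "word")
for left, hit, right in rows:
    print(f"{left:>20} | {hit} | {right}")
```

Toolkits such as the Natural Language Toolkit offer more complete concordance views; this version only shows the idea of searching, retrieving, and lining up hits for analysis.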
16
Digital
Scholarship
How can research support for
corpus building and text mining be
scaled up?
17
Text and data mining
Analyse large scale datasets in your research
Data mining is the process of applying open-ended
computational methods to large scale datasets to
discover new insights that may not be revealed through
targeted smaller scale analyses. When the datasets used
are bodies of text, this process is often termed text
mining and can provide a complementary approach to
traditional close readings of texts. Text and data mining
(TDM) approaches can open up new areas of scholarly
enquiry.
Research Data Management - USYD (RDM)
18
Institutional vs National
Services for Corpus
Building & TDM?
More library
minds and
coordination
are needed in
this space.
What overlap
is there with
CAUL/CEIRC
& NCRIS?
19
Sydney Stock Exchange Records - Institutional
Digitisation for research. AARNet partnership with ANU Library and Noel Butlin Archive.
Stock and Share Lists include ~199 registers of printed and written (copperplate)
information that requires format conversion and automated translation. Records
include company names, stock prices, and share transactions from 1901 to 1950.
An archival series that can be delivered for search and browse via an interface.
A corpus that can be built and text mined and analysed via an interface.
HASS DEVL - National
The Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab (DEVL) will
bring together fragmented data, tools and services into a shared workspace.
Key outcomes from the project will be:
● Lowering barriers to entry for HASS infrastructure
● Increased interoperability between existing HASS platforms
● A more joined-up data landscape
● Data curation for better reuse, reproduction, and publishing of research data sets
● Game-changing skills and training activities
Funding and co-investment via NCRIS and institutional partners.
https://www.ands-nectar-rds.org.au/
20
21
HASS DEVL
Data curation package
- Datasets sourced from Prosecution Project, NLA/TROVE, SLQ and APO
- Datasets processed via Alveo and AURIN
- Data curation framework between UoM, Alveo, AURIN, and NLA/TROVE
Will these composites of digital objects be a digital collection, a dataset, a data collection,
a series, a demo corpus, a text corpus, or a linguistic corpus?
We will need to explore this question together [please all don your curator’s hat].
22
Digital Collections
● AU government gazettes (NLA)
● QLD records of railway workers / publicans / government workers (SLQ)
● Court records from various states and territories (PP)
● Historical census data (ADA)
● Grey literature (APO)
Trick question: which of these collections could be text mined and/or become a corpus?
23
UL
Research
Support?
24
Want to know more?
#datawhodunnit
#datalibs
25
AARNet
http://eepurl.com/xmnpn
eRSA
http://bit.ly/2nBvBun
INGRID MASON
DEPLOYMENT STRATEGIST
Read: Kylie
Poulton’s
VALA 2018
TDM Paper
Definitions and
Examples
26
Text Mining
Identifying linguistic patterns in text (as data)
Categorising, clustering, or identifying named entities
Abstracting, analysing and summarising (the textual content)
Constrained by the extent and scope of the textual data
Using programming languages like R or tools like Voyant
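As a rough illustration of "categorising, clustering", the sketch below compares short texts by the words they share, using bag-of-words vectors and cosine similarity. This is one common technique rather than anything prescribed in these slides, and the three sample texts, along with the `vectorise`, `cosine`, and `nearest` helpers, are invented for the example.

```python
import math
import re
from collections import Counter

def vectorise(text):
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical snippets standing in for documents from different collections.
docs = {
    "gazette": "government notice of land sale and land grant",
    "court": "court record of trial verdict and sentence",
    "notice": "government notice of land grant",
}
vectors = {name: vectorise(text) for name, text in docs.items()}

def nearest(name):
    """Return the name of the other document most similar to `name`."""
    others = [n for n in vectors if n != name]
    return max(others, key=lambda n: cosine(vectors[name], vectors[n]))

for name in docs:
    print(name, "is closest to", nearest(name))
```

As the slide notes, results are constrained by the extent and scope of the text: with snippets this short, a single shared word can dominate the similarity score.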
27
Text Corpora
The selection, extraction and processing of the text may involve linguistic methods but
may not be for the purpose of studying language, rather to investigate the nature of text
as semantic content.
Take a look at Visualising Raynal - three editions of Guillaume-Thomas Raynal’s Histoire des
deux Indes (1770, 1774, 1780).
Part of the ANU Digitizing Raynal project led by Glenn Roe (working with Centre for
Literary and Linguistic Computing (UoN)).
PDFs from BNF (1770 + 1780) and Bodleian (1774).
28
Corpus (Corpora)
If in doubt - dictionary time!
a : all the writings or works of a particular kind or on a particular subject; especially : the
complete works of an author
b : a collection or body of knowledge or evidence; especially : a collection of recorded
utterances used as a basis for the descriptive analysis of a language
https://www.merriam-webster.com/dictionary/corpus
29
Linguistic Corpora
Australian National Corpus
June Farris (Subject Specialist) at University of Chicago Library
Linguistic Data Consortium (UPenn)
30

 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxFamilyWorshipCenterD
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@vikas rana
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝soniya singh
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...NETWAYS
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesPooja Nehwal
 

Kürzlich hochgeladen (20)

No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 

Are New Digital Literacies Skills Needed
rscd2018

  • 1. Are new technical literacy skills needed? Remarrying research and collection services around access to corpora and text mining. INGRID MASON DEPLOYMENT STRATEGIST
  • 2. I think yes, and I’m a bit excited about it.
  • 3. Text & data mining (TDM) is being used by a range of researchers to target relevant literature and in HASS research. More research support will need to be provided. Should TDM services be coordinated nationally?
  • 4. Many many questions: What library technical skills are needed (if there is a growing research support need)? Where do researchers go if they want to find, use, move, store or create a corpus? How do researchers learn to build, evaluate, and text mine a corpus? Where can/does/should this specialist service sit (in Library Research Support or in eResearch or in Faculty or in national research infrastructure services)? Psst. I don’t have answers, just the questions at this point. Sorry!
  • 5. O M G O M G O M G OK, what’s a corpus? Find a definition, somewhere reliable [searches the web]. What does a corpus look like? Linguists will know this [searches the web]. How on earth do you “make that blob of stuff accessible”? [compute/storage?] How big is that text blob and what’s it made of? Corpus analyst? [new job title?] Who do I know that knows how to build a corpus? Ah, Steve Cassidy from Alveo VL. What makes for a well-balanced/formed corpus? Breathe, reach for library skills. What about commercially hosted text blobs? Read: Kylie Poulton’s VALA 2018 paper.
  • 6. I’m a corpus building & TDM novice - I feel like an imposter. I’m old style but I’d like to give this a go. Would you? Schonfeld, Roger C & Christine Wolff-Eisenberg (2017). Taking a Closer Look at Talent Management: Findings from the US Library Survey, 10 April 2017. Ithaka S+R Blog. http://www.sr.ithaka.org/blog/taking-a-closer-look-at-talent-management/ Last accessed: 18/04/2017
  • 7. 7
  • 8. 8
  • 9. 9
  • 10. Digital Humanities Australasia 2016 Hobart, Australia
  • 11. Digital Humanities Australasia 2016 Hobart, Australia
  • 12. Alan Liu’s DH Toychest: Data Collections and Datasets. Question: How does this arrangement of resources in Liu’s DH Toychest change my understanding of collecting resources for research and supporting research? Answer: Quite a lot, I feel out of my depth, but also very intrigued and my fingers are tingling. Why? Challenge: I need to start looking into corpora and have a go at constructing a corpus (hint: two projects this year).
  • 13. 13
  • 14. Library Technical Skills. Research support in: Research Data Management / Digital Scholarship / Digital Curation / Research Techniques. Using: IPython (now Jupyter) notebook - Natural Language Toolkit / Library Carpentry or Data Carpentry or Software Carpentry / Text Mining with R (O’Reilly). Psst: we aim for Jupyter notebooks connected to CloudStor (1 notebook per person to play with).
  • 15. A Trend: Expertise lies in the university to support text mining for research and scholarly literature searches. Biomedical Text Mining: An important problem that text mining attempts to address is information overload and overlook. Examples of solutions to this problem include Information Extraction, Document Summarisation, and Document Classification. In the following example we demonstrate the use of Text Mining to classify sentences in biomedical articles and extract key units of information. This provides a way for busy professionals to reduce the amount of information to which they are exposed and focus only on salient aspects in which they are interested. From: Text Mining Collaboration - UNSW
  • 16. Learn More: Some history and definition of the terms (and more) is offered. Text mining & Text analysis - what is the difference? Text mining began with the computational and information management fields (e.g. database searching and information retrieval), whereas Text analysis began in the humanities with the manual analysis of text (e.g. Bible concordances and newspaper indexes). More recently, the two terms have become synonymous, and now generally refer to the use of computational methods to search, retrieve, and analyse text data. "Text mining or text analytics is an umbrella term describing a range of techniques that seek to extract useful information from document collections through the identification and exploration of interesting patterns in the unstructured textual data of various types of documents – such as books, web pages, emails, reports or product descriptions." (Truyens & van Eecke, 2014) From: Text Mining and Text Analysis - UQ (Research Techniques)
  • 17. Digital Scholarship: How can research support for corpus building and text mining be scaled up? Text and data mining: Analyse large scale datasets in your research. Data mining is the process of applying open-ended computational methods to large scale datasets to discover new insights that may not be revealed through targeted smaller scale analyses. When the datasets used are bodies of text, this process is often termed text mining and can provide a complementary approach to traditional close readings of texts. Text and data mining (TDM) approaches can open up new areas of scholarly enquiry. From: Research Data Management - USYD (RDM)
  • 18. Institutional vs National Services for Corpus Building & TDM? More library minds and coordination are needed in this space. What overlap is there with CAUL/CEIRC & NCRIS?
  • 19. Sydney Stock Exchange Records - Institutional Digitisation for research. AARNet partnership with ANU Library and Noel Butlin Archive. Stock and Share Lists include ~199 registers of printed and written (copperplate) information that requires format conversion and automated translation. Records include company names, price of stocks, and share transactions from 1901-1950. An archival series that can be delivered for search and browse via an interface. A corpus that can be built and text mined and analysed via an interface.
  • 20. HASS DEVL - National. The Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab (DEVL) will bring together fragmented data, tools and services into a shared workspace. Key outcomes from the project will be: ● Lowering barriers to entry for HASS infrastructure ● Increased interoperability between existing HASS platforms ● More joined up data landscape ● Data curation for better reuse, reproduction, and publishing of research data sets ● Game-changing skills and training activities. Funding and co-investment via NCRIS and institutional partners. https://www.ands-nectar-rds.org.au/
  • 21. 21
  • 22. HASS DEVL Data curation package - Datasets sourced from Prosecution Project, NLA/TROVE, SLQ and APO - Datasets processed via Alveo and AURIN - Data curation framework between UoM, Alveo, AURIN, and NLA/TROVE. Will these composites of digital objects be a digital collection, a dataset, a data collection, a series, a demo corpus, a text corpus, or a linguistic corpus? We will need to explore this question together [please all don your curator’s hat].
  • 23. Digital Collections ● AU government gazettes (NLA) ● QLD records of railway workers / publicans / government workers (SLQ) ● Court records from various states and territories (PP) ● Historical census data (ADA) ● Grey literature (APO). Trick question: which of these collections could be text mined and/or become a corpus?
  • 25. Want to know more? #datawhodunnit #datalibs AARNet http://eepurl.com/xmnpn eRSA http://bit.ly/2nBvBun INGRID MASON DEPLOYMENT STRATEGIST Read: Kylie Poulton’s VALA 2018 TDM Paper
  • 27. Text Mining: Identifying linguistic patterns in text (as data). Categorising, clustering, or identifying named entities. Abstracting, analysing and summarising (the textual content). Constrained by the extent and scope of the textual data. Using programming languages like R or tools like Voyant.
  • 28. Text Corpora: The selection, extraction and processing of the text may involve linguistic methods but may not be for the purpose of studying language, rather to investigate the nature of text as semantic content. Take a look at Visualising Raynal - three editions of Guillaume-Thomas Raynal’s Histoire des deux Indes (1770, 1774, 1780). Part of the ANU Digitizing Raynal project led by Glenn Roe (working with Centre for Literary and Linguistic Computing (UoN)). PDFs from BNF (1770 + 1780) and Bodleian (1774).
  • 29. Corpus (Corpora): If in doubt - dictionary time! a : all the writings or works of a particular kind or on a particular subject; especially : the complete works of an author b : a collection or body of knowledge or evidence; especially : a collection of recorded utterances used as a basis for the descriptive analysis of a language https://www.merriam-webster.com/dictionary/corpus
  • 30. Linguistic Corpora: Australian National Corpus / June Farris (Subject Specialist) at University of Chicago Library / Linguistic Data Consortium (UPenn)
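
For readers who want to try the "identifying linguistic patterns in text (as data)" idea from slide 27, here is a minimal sketch in Python using only the standard library. It is an illustration, not the method used in the talk: the two sample sentences, the function names (`tokenize`, `term_frequencies`, `wildcard_count`) and the simple letters-only tokenizer are all assumptions made for this example. It counts raw term frequencies across a tiny corpus and sums the counts for a wildcard pattern such as `librar*`.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical miniature corpus: stand-in sentences, not real project data.
SAMPLE_DOCS = [
    "Digital collections in libraries support digital research.",
    "Librarians and archivists curate collections of digital archives.",
]


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Count raw term frequencies across every document in the corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def wildcard_count(counts, pattern):
    """Sum the counts of all terms matching a shell-style pattern, e.g. 'librar*'."""
    return sum(n for term, n in counts.items() if fnmatchcase(term, pattern))


if __name__ == "__main__":
    freqs = term_frequencies(SAMPLE_DOCS)
    print(freqs.most_common(3))
    print(wildcard_count(freqs, "librar*"))
```

A real corpus would need decisions this sketch skips (stop words, non-English characters, OCR noise), which is where toolkits such as NLTK or R's text mining packages come in.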

Editor's notes

  1. 12.20pm-12.30pm (10 mins)
  2. https://media.giphy.com/media/10jKq6FU0AOdHO/giphy.gif
  3. http://text.mine.unsw.edu.au/tutorials?id=12002
  4. https://www.ifla.org/publications/node/8225 https://ota.ox.ac.uk/documents/creating/dlc/index.htm https://www.ifla.org/files/assets/newspapers/Geneva_2014/s6-smyth_wisdom-en.pdf http://library.ifla.org/930/1/119-leonard-en.pdf https://guides.library.uq.edu.au/research-techniques/text-mining-analysis/language-corpora http://libguides.usc.edu/textmining/databases http://library.ifla.org/233/1/153-cheney-en.pdf http://istl.org/17-spring/internet.html http://libguides.libraries.wsu.edu/c.php?g=388821&p=2638273 http://libguides.usc.edu/textmining/tools https://www2.fgw.vu.nl/werkbanken/dighum/data_analysis/text_analysis/corpus_analysis.php https://www.datacamp.com/courses/intro-to-text-mining-bag-of-words https://www.youtube.com/watch?v=4vuw0AsHeGw https://www.tidytextmining.com/
  5. https://media.giphy.com/media/wru2b6eWKaY9i/giphy.gif
  6. https://media.giphy.com/media/JIX9t2j0ZTN9S/giphy.gif Some further questions… I investigated library involvement and participation in the biennial aaDH conference over the last few years. Has there been a history of collaboration? Yes. Is there mutual benefit through collaboration? Yes. Let’s take a quick look at some rudimentary results from treating a corpus of text #asData
  7. https://voyant-tools.org/
  8. 2016 aaDH DHA Conference Programme https://voyant-tools.org/?corpus=0fb9b2d0a2a09dede1f2cc0609c52f69 https://voyant-tools.org/?corpus=0fb9b2d0a2a09dede1f2cc0609c52f69&mode=corpus&view=CollocatesGraph https://voyant-tools.org/?corpus=0fb9b2d0a2a09dede1f2cc0609c52f69&query=data&view=Contexts Context (11 words) - raw frequencies Top term in the corpus: “digital” Addition of the term “librar*” The word “collections” is highly connected to: “digital” “librar*” “archives”. There is clustering here… common work terrain around collections … of digital stuff.
  9. Core community strength lies across the GLAM sector; librarians and archivists emerge in association with discussions about collections and domain expertise.
  10. Where a distinction was made between demo corpora and linguistic corpora and datasets and collections.
  11. Fraser Corpus and NLTK - VALA Tech Camp 2017
  12. http://www.jstor.org/analyze/
  13. http://subjectguides.library.westernsydney.edu.au/openaccess/oar https://library.sydney.edu.au/research/data-management/text-data-mining.html
  14. Sydney Stock Exchange Stock and Share Lists include ~199 registers of records written in copperplate that require format conversion and automated translation. Records include company names, price of stocks, and share transactions from 1901-1950. Deposit N193 in Noel Butlin Archive.
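
The collocates and contexts views mentioned in note 8 can be approximated in a few lines of Python. This is only a sketch of the general idea, not Voyant's actual algorithm: the sample text, the function names (`collocates`, `contexts`) and the window size of two tokens either side are assumptions chosen for illustration, and the text below is not the 2016 conference programme.

```python
import re
from collections import Counter

# Hypothetical sample text, used purely to exercise the functions below.
SAMPLE_TEXT = (
    "Digital collections connect libraries and archives. "
    "Libraries build digital collections for research."
)


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def collocates(tokens, keyword, window=2):
    """Count the words found within `window` tokens either side of each keyword hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts


def contexts(tokens, keyword, window=2):
    """Return keyword-in-context lines, with the keyword marked in square brackets."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


if __name__ == "__main__":
    toks = tokenize(SAMPLE_TEXT)
    print(collocates(toks, "collections").most_common(3))
    for line in contexts(toks, "collections"):
        print(line)
```

Widening `window` trades precision for recall: more neighbouring words are counted as collocates, but more of them are incidental.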