The PLOS Thesaurus: the first year
Rachel Drysdale – Taxonomy Manager, PLOS
DHUG 2014
11th February, 2014
Public Library of Science - evolution
2000 PLOS founded
2003 PLOS Biology
2004 PLOS Medicine
2005 PLOS Computational Biology (June)
PLOS Genetics (July)
PLOS Pathogens (September)
2006 PLOS ONE
2007 PLOS Neglected Tropical Diseases
Journal                            Article Count
PLOS Biology                       3,450
PLOS Medicine                      2,626
PLOS Computational Biology         3,112
PLOS Genetics                      4,048
PLOS Pathogens                     3,639
PLOS ONE                           87,296
PLOS Neglected Tropical Diseases   2,444
beautiful monster….
Overview – today’s talk
The Solution: Good Thesaurus + Machine Aided Indexing
• Building the new Thesaurus with Access Innovations (AI)
• The initial implementation at plos.org
• MAIstro integration into Publishing workflow
• Thesaurus maintenance
The Service:
• Content Discovery
• Article Analysis
• Relative Metrics
Starting point
2011 – the old Taxonomy
Inadequate in content – just over 3,100 specific terms
Inflexible in structure – terms in pre-defined paths
Housed in Editorial Manager – ossified and difficult to update
Author-chosen terms – association with article
PLOS delivered to Access Innovations….
A copy of the old PLOS Taxonomy
Over 2,000 suggested changes
“Research analysis and methods” branch request
Use cases:
Subject Area-based searches
Hierarchy-based exploration of our corpus
Email Alerts based on Subject Area searches
RSS Feeds based on Subject Areas
Access Innovations added:
• STEM vocabulary
• Broader/Narrower term relationships
• Rules for the Machine Aided Indexing
• Synonyms
• Analysis with respect to the PLOS corpus
... to and fro with PLOS ...
Result:
Vastly improved NISO Z39.19-compliant thesaurus
Statistics
            Old Taxonomy   A.I. Thesaurus
Terms       3,132          10,156
Synonyms    0              3,291
Tiers       5              7
Rules       0              14,798
Top-level Terms
1. Biology and life sciences
2. Computer and information sciences
3. Earth sciences
4. Engineering and technology
5. Environmental sciences and ecology
6. Medicine and health sciences
7. Physical sciences
8. Research and analysis methods
9. Science policy
10. Social sciences
Infrastructure
PLOS Taxonomy server:
Thesaurus – plos2012thes
Data Harmony Thesaurus Master and MAI Rule Builder
Corpus fed to the Taxonomy Server for MAI, article by article
Initial implementation:
Title – Abstract – Results – Methods
Top 8 hits selected
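The "top 8 hits" step can be pictured with a minimal Python sketch. It assumes the indexer hands back a mapping of thesaurus terms to frequencies; the function name `select_top_terms`, the sample counts and the alphabetical tie-break are hypothetical and are not taken from Data Harmony.

```python
from typing import Dict, List, Tuple


def select_top_terms(term_counts: Dict[str, int], limit: int = 8) -> List[Tuple[str, int]]:
    """Return up to `limit` (term, count) pairs, most frequent first.

    Ties are broken alphabetically so the result is deterministic
    (an assumption; the real tie-breaking rule is not stated in the talk).
    """
    ranked = sorted(term_counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical term frequencies for a single article.
counts = {
    "Immune cells": 12,
    "White blood cells": 9,
    "Cell biology": 9,
    "Immunology": 3,
}
top = select_top_terms(counts, limit=3)
```

With `limit=8` left at its default, an article that matches fewer than eight terms simply keeps all of them.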
Elapsed time from project kick-off until terms appeared on published articles: 9 months
Learning curve – teething troubles
Not all articles had Subject Area terms – why not?
Initial implementation – text to index:
Title + Abstract* + Results + Methods
Upon consideration – text to index:
Full Text (though not references)
Implementation of “all paths”
Polyhierarchy implications
Consider “White blood cells” – four paths lead to the same term:
Biology and life sciences > Immunology > Immune cells > White blood cells
Medicine and health sciences > Immunology > Immune cells > White blood cells
Biology and life sciences > Cell biology > Cellular types > Animal cells > Blood cells > White blood cells
Biology and life sciences > Cell biology > Cellular types > Animal cells > Immune cells > White blood cells
The polyhierarchy and Search
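A small Python sketch of why "all paths" matters for search: in a polyhierarchy a term can have several parents, so a search on any ancestor should be able to reach it. The parent map below covers only the "White blood cells" fragment of the thesaurus; the function names and the idea of expanding a term to its ancestors are assumptions for illustration, not a description of the PLOS search implementation.

```python
from typing import Dict, List, Set

# Child -> parents. More than one parent per term makes this a polyhierarchy.
PARENTS: Dict[str, List[str]] = {
    "White blood cells": ["Immune cells", "Blood cells"],
    "Immune cells": ["Immunology", "Animal cells"],
    "Blood cells": ["Animal cells"],
    "Immunology": ["Biology and life sciences", "Medicine and health sciences"],
    "Animal cells": ["Cellular types"],
    "Cellular types": ["Cell biology"],
    "Cell biology": ["Biology and life sciences"],
}


def all_paths(term: str) -> List[List[str]]:
    """Return every path from a top-level term down to `term`, root first."""
    parents = PARENTS.get(term, [])
    if not parents:  # a top-level term is its own (single) path
        return [[term]]
    return [path + [term] for parent in parents for path in all_paths(parent)]


def terms_on_paths(term: str) -> Set[str]:
    """The term itself plus every ancestor found on any of its paths."""
    return {t for path in all_paths(term) for t in path}


paths = all_paths("White blood cells")
for path in paths:
    print(" > ".join(path))
```

Under this sketch, an article tagged "White blood cells" would be retrievable from either top-level branch.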
Establishing update cycle - articles:
Initial implementation:
Entire back-corpus indexed at once
New Papers:
PLOS submits text to MAIstro at publication
MAI returns terms and term frequencies
PLOS stores terms in search engine
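The three steps for new papers can be sketched in Python as follows. Everything here is a stand-in: `toy_indexer` only counts literal mentions of two terms and is not the MAIstro API, and `SubjectIndex` is a hypothetical in-memory substitute for the search engine.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class SubjectIndex:
    """In-memory stand-in for the search engine's Subject Area field."""

    def __init__(self) -> None:
        # term -> {article id -> term frequency}
        self._by_term: Dict[str, Dict[str, int]] = defaultdict(dict)

    def store(self, article_id: str, term_counts: Dict[str, int]) -> None:
        for term, count in term_counts.items():
            self._by_term[term][article_id] = count

    def search(self, term: str) -> List[str]:
        """Article ids tagged with `term`, highest term frequency first."""
        hits = self._by_term.get(term, {})
        return sorted(hits, key=lambda article: (-hits[article], article))


def toy_indexer(text: str) -> Dict[str, int]:
    """Placeholder for the MAI call: counts literal mentions of two terms."""
    lowered = text.lower()
    vocabulary = ["T cells", "Obesity"]
    return {t: lowered.count(t.lower()) for t in vocabulary if t.lower() in lowered}


def publish(article_id: str, text: str,
            indexer: Callable[[str], Dict[str, int]], index: SubjectIndex) -> None:
    """Submit text at publication, receive terms and frequencies, store them."""
    index.store(article_id, indexer(text))


index = SubjectIndex()
publish("a1", "T cells and more T cells in obesity", toy_indexer, index)
publish("a2", "Memory T cells", toy_indexer, index)
```

Keeping the frequency alongside each term lets the search side rank or threshold results later without re-indexing the article.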
Establishing update cycle - thesauri:
Separate instances (nerves):
Production server – plosthes.2013-6
Working version – plosthes.2013-7
When ready to release a new version:
Load onto test server – MAI corpus - Index
Test: new/changed/deleted terms
rule changes
structural changes
any implementation changes
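The "new/changed/deleted terms" check between a production and a working version can be sketched as a simple comparison. This Python example assumes each version can be exported as a mapping from term to its broader terms and synonyms; the `Term` record, the function name and the sample entries are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Term(NamedTuple):
    broader: Tuple[str, ...]   # parent terms
    synonyms: Tuple[str, ...]


def diff_thesauri(old: Dict[str, Term], new: Dict[str, Term]) -> Dict[str, List[str]]:
    """List terms that are new, changed or deleted between two versions."""
    old_names, new_names = set(old), set(new)
    return {
        "new": sorted(new_names - old_names),
        "changed": sorted(t for t in old_names & new_names if old[t] != new[t]),
        "deleted": sorted(old_names - new_names),
    }


# Hypothetical extracts standing in for plosthes.2013-6 and plosthes.2013-7.
production = {
    "T cells": Term(("Immune cells",), ()),
    "Webs": Term((), ()),
}
working = {
    "T cells": Term(("Immune cells",), ("T lymphocytes",)),
    "Memory T cells": Term(("T cells",), ()),
}
report = diff_thesauri(production, working)
```

A report like this gives the tester a concrete list of terms to spot-check on the test server; rule changes and implementation changes would need their own checks.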
Thesaurus updates – why?
More terms : Memory T cells, Monocotyledons
Errrm… : Report gene detection
What? : Webs
Hierarchy changes deemed desirable:
Geographical locations
Organisms
(Un)Rule(y) : snails, fabrication, pumas
Thesaurus updates – how?
Rule-Building in MAIstro – Pumas before...
Puma: “p53 upregulated modifier of apoptosis” or the animal?
Rule-Building in MAIstro – Pumas after…
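The kind of context rule that separates the two meanings of "puma" can be illustrated in Python. This is not MAIstro rule syntax: the cue words, the output labels "PUMA (gene)" and "Pumas", and the simple counting logic are all assumptions chosen for the sketch.

```python
import re
from typing import Optional

PUMA = re.compile(r"\bpumas?\b", re.IGNORECASE)
# Hypothetical context cues for each sense of the word.
GENE_CUES = re.compile(r"\b(apoptosis|p53|upregulat\w*)\b", re.IGNORECASE)
ANIMAL_CUES = re.compile(r"\b(habitat|predator\w*|prey|wildlife)\b", re.IGNORECASE)


def index_puma(text: str) -> Optional[str]:
    """Pick a sense for 'puma' from nearby vocabulary, or None if unclear."""
    if not PUMA.search(text):
        return None
    gene_hits = len(GENE_CUES.findall(text))
    animal_hits = len(ANIMAL_CUES.findall(text))
    if gene_hits > animal_hits:
        return "PUMA (gene)"
    if animal_hits > gene_hits:
        return "Pumas"
    return None  # no evidence either way: leave the term unassigned


gene_label = index_puma("PUMA mediates p53-dependent apoptosis")
animal_label = index_puma("Pumas compete with other predators for prey in this habitat")
```

Returning `None` on a tie reflects a cautious design choice: a missed term is usually easier to spot and report than a wrong one.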
Thesaurus updates – prioritisation?
Mis-hits and missed term reports:
Ourselves:
article pages
Our readers:
in email
complaints on Twitter
in correspondence with our editorial staff
via Journal and Saved Search alerts
via article pages – Flagged Term reports
Things we learned – Thesaurus editorial
Tension:
strict and rigorous taxonomy/ontology construction
vs
user utility
Abbreviations and Synonyms
Issues that continue to exercise us:
T cells/Memory T cells
Obesity/Childhood obesity
When should we make both explicit?
Rule work – working to top 8
Building a new project - exports
Building a new project - import
Content Discovery
How has having the thesaurus changed the way that
users interact with PLOS web sites?
• Journal alerts
• Saved Searches
• RSS feeds
• Hierarchy exploration
Problem:
How to keep up?
Solution:
Current Awareness Tools
Journal alerts
Saved search
RSS feeds
Hierarchy exploration
Relative Metrics
Relative Metrics:
Defining a Paper’s Peer Group
1. Group papers by Subject Area
Accommodate multiple topics per paper
2. Group papers by age
Important for comparison of cumulative measures like total downloads or citations
3. Determine norms for peer group
The average usage of each paper is compared with the median usage of its peer group
More on Relative Metrics at:
http://www.plosone.org/static/almInfo#relativeMetrics
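The three steps for defining a peer group can be sketched in Python. The sketch assumes that "age" means publication year and that "usage" is a single view count; the `Paper` record, the function names and the sample numbers are hypothetical, and the actual PLOS calculation may differ.

```python
from statistics import median
from typing import Dict, FrozenSet, List, NamedTuple


class Paper(NamedTuple):
    paper_id: str
    subject_areas: FrozenSet[str]
    year: int
    views: int


def peer_group(paper: Paper, corpus: List[Paper], subject_area: str) -> List[Paper]:
    """Steps 1 and 2: same Subject Area and same age (publication year)."""
    return [p for p in corpus if subject_area in p.subject_areas and p.year == paper.year]


def relative_usage(paper: Paper, corpus: List[Paper]) -> Dict[str, float]:
    """Step 3: ratio of the paper's views to the median of each peer group."""
    ratios: Dict[str, float] = {}
    for area in sorted(paper.subject_areas):  # one peer group per topic
        norm = median(p.views for p in peer_group(paper, corpus, area))
        if norm > 0:
            ratios[area] = paper.views / norm
    return ratios


a = Paper("paper-a", frozenset({"Immunology", "Cell biology"}), 2012, 300)
corpus = [
    a,
    Paper("paper-b", frozenset({"Immunology"}), 2012, 100),
    Paper("paper-c", frozenset({"Immunology"}), 2012, 200),
    Paper("paper-d", frozenset({"Cell biology"}), 2012, 600),
    Paper("paper-e", frozenset({"Immunology"}), 2010, 5000),  # older, so excluded
]
ratios = relative_usage(a, corpus)
```

Because a paper can carry several Subject Areas, the result is one ratio per area rather than a single score.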
Area of development - Editorial Workflow
The PLOS Thesaurus and Peer Review
Maintaining a copy of the PLOS thesaurus in Editorial Manager helps with editor and reviewer matching
Classifications for People
Classifications for Papers
The PLOS Thesaurus and Peer Review
• Authors select Subject Area terms related to their article submissions
• Editors and Reviewers select terms that represent their areas of expertise
• Staff and Editors use these terms to help ensure editors and reviewers are well matched to the submissions they are handling
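One way to picture term-based matching is to score the overlap between a submission's terms and each reviewer's terms. The Python sketch below uses Jaccard similarity, which is a technique chosen here for illustration and not a method described in the talk or a feature of Editorial Manager; the reviewer names and term sets are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared terms divided by all terms; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_reviewers(submission_terms: Set[str],
                   reviewers: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Sort reviewers by term overlap with the submission, best match first."""
    scored = [(name, jaccard(submission_terms, terms)) for name, terms in reviewers.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


submission = {"Immunology", "T cells", "Obesity"}
reviewers = {
    "Reviewer A": {"Immunology", "T cells"},
    "Reviewer B": {"Obesity", "Social sciences"},
    "Reviewer C": {"Earth sciences"},
}
ranking = rank_reviewers(submission, reviewers)
```

A score like this would only be a first filter; staff and editors would still judge suitability themselves.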
Planned Enhancements
• Automate the application of terms associated with Editors, Reviewers and submitted articles with MAIstro
• Provide Editors and Staff with detailed terms to assist with reviewer selection and vetting
– Academic disciplines help Editors gauge Subject Area relevance of potential Reviewers
– Methods, protocols and model organisms help Editors gauge technical suitability of potential Reviewers
Dramatis personae:
Jonas Dupuich, Product Manager
Patrick Polischuk, Product Manager
Sebastian Toomey, Interaction Designer
Jennifer Lin, Senior Product Manager
Martin Fenner, ALM Technical Lead
Kallie Huss, Senior Publications Assistant
John Chodacki, Director - Product Management

Case Study: Public Library of Science Thesaurus: Year One

  • 1. The PLOS Thesaurus: the first year Rachel Drysdale – Taxonomy Manager, PLOS DHUG 2014 11th February, 2014
  • 2. Public Library of Science - evolution 2000 PLOS founded 2003 PLOS Biology 2004 PLOS Medicine 2005 PLOS Computational Biology (June) PLOS Genetics (July) PLOS Pathogens (September) 2006 PLOS ONE 2007 PLOS Neglected Tropical Diseases 2
  • 3. Journal Article Count PLOS Biology 3,450 PLOS Medicine 2,626 PLOS Computational Biology 3,112 PLOS Genetics 4,048 PLOS Pathogens 3,639 PLOS ONE 87,296 PLOS Neglect Trop Diseases 2,444
  • 4. Journal Article Count PLOS Biology 3,450 PLOS Medicine 2,626 PLOS Computational Biology 3,112 PLOS Genetics 4,048 PLOS Pathogens 3,639 PLOS ONE 87,296 PLOS Neglect Trop Diseases 2,444 beautiful monster….
  • 5. Overview – today’s talk The Solution: Good Thesaurus + Machine Aided Indexing  Building the new Thesaurus with AI  The initial implementation at plos.org  MAIstro integration into Publishing workflow  Thesaurus maintenance The Service:  Content Discovery Article Analysis  Relative Metrics 5
  • 6. Starting point 2011 – the old Taxonomy Inadequate in content – just over 3100 specific terms Inflexible in structure – terms in pre-defined paths Housed in Editorial Manager ossified and difficult to update Author-chosen terms - association with article 6
  • 7. PLOS delivered to Access Innovations…. A copy of the old PLOS Taxonomy Over 2,000 suggested changes “Research analysis and methods” branch request Use cases: Subject Area-based searches Hierarchy-based exploration of our corpus Email Alerts based on Subject Area searches RSS Feeds based on Subject Areas 7
  • 8. Access Innovations added:  STEM vocabulary  Broader/Narrower term relationships  Rules for the Machine Aided Indexing  Synonyms  Analysis with respect to the PLOS corpus .....to and fro with PLOS …. Result: Vastly improved NISO Z-39.19-compliant thesaurus 8
  • 9. Statistics (Old Taxonomy vs A. I. Thesaurus): Terms 3,132 vs 10,156; Synonyms 0 vs 3,291; Tiers 5 vs 7; Rules 0 vs 14,798
  • 10. Top-level Terms: 1. Biology and life sciences; 2. Computer and information sciences; 3. Earth sciences; 4. Engineering and technology; 5. Environmental sciences and ecology; 6. Medicine and health sciences; 7. Physical sciences; 8. Research and analysis methods; 9. Science policy; 10. Social sciences
  • 11. Infrastructure. PLOS Taxonomy server: Thesaurus (plos2012thes); Data Harmony Thesaurus Master and MAI Rule Builder. Corpus fed to the Taxonomy Server for MAI, article by article. Initial implementation: Title, Abstract, Results, Methods; top 8 hits selected
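
The "top 8 hits selected" step can be pictured as a simple ranking. The sketch below is not the Data Harmony implementation; it assumes the indexer reports a hit count per matched term, and the function name `top_hits`, the alphabetical tie-break and the example counts are all hypothetical.

```python
from typing import Dict, List, Tuple


def top_hits(term_counts: Dict[str, int], limit: int = 8) -> List[Tuple[str, int]]:
    """Return up to `limit` (term, count) pairs, most frequent first."""
    # Sort by descending count; break ties alphabetically (an assumption)
    # so that the selection is deterministic.
    ranked = sorted(term_counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical indexer output for one article: term -> number of hits.
example_counts = {
    "White blood cells": 12, "Immune cells": 9, "Immunology": 7,
    "Cell biology": 5, "Blood cells": 5, "Animal cells": 4,
    "Apoptosis": 3, "Gene expression": 2, "Cytokines": 1, "Mice": 1,
}
selected = top_hits(example_counts)
```

With a fixed cut-off like this, any term outside the top eight is dropped from the article, however relevant it may be.
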
  • 12. Elapsed time from project kick-off until terms appeared on published articles: 9 months
  • 13. Learning curve: teething troubles. Not all articles had Subject Area terms. Why not? Initial implementation, text to index: Title + Abstract* + Results + Methods. Upon consideration, text to index: Full Text (though not references). Implementation of “all paths”. Polyhierarchy implications
  • 14. The polyhierarchy and Search. Consider “White blood cells”: (1) Biology and life sciences > Immunology > Immune cells > White blood cells; (2) Medicine and health sciences > Immunology > Immune cells > White blood cells; (3) Biology and life sciences > Cell biology > Cellular types > Animal cells > Blood cells > White blood cells; (4) Biology and life sciences > Cell biology > Cellular types > Animal cells > Immune cells > White blood cells
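
A minimal sketch of what “all paths” means for a polyhierarchical term: every route from a top-level term down to the term is enumerated. The `BROADER` mapping below is a toy fragment rebuilt from the “White blood cells” slide, not an export of the real thesaurus, and `all_paths` is a hypothetical helper that assumes the hierarchy has no cycles.

```python
from typing import Dict, List

# Term -> its broader (parent) terms; toy fragment based on slide 14.
BROADER: Dict[str, List[str]] = {
    "White blood cells": ["Immune cells", "Blood cells"],
    "Immune cells": ["Immunology", "Animal cells"],
    "Blood cells": ["Animal cells"],
    "Animal cells": ["Cellular types"],
    "Cellular types": ["Cell biology"],
    "Cell biology": ["Biology and life sciences"],
    "Immunology": ["Biology and life sciences", "Medicine and health sciences"],
}


def all_paths(term: str, broader: Dict[str, List[str]]) -> List[List[str]]:
    """List every path from a top-level term down to `term` (acyclic input assumed)."""
    parents = broader.get(term, [])
    if not parents:
        # A term with no broader term is a top-level term: the path starts here.
        return [[term]]
    paths = []
    for parent in parents:
        for path in all_paths(parent, broader):
            paths.append(path + [term])
    return paths


paths = all_paths("White blood cells", BROADER)
for path in paths:
    print(" > ".join(path))
```

For this fragment the function yields the four paths shown on the slide, which is why a search on either “Immunology” or “Cell biology” can reach the same article.
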
  • 15. Establishing update cycle, articles. Initial implementation: entire back-corpus indexed at once. New papers: PLOS submits text to MAIstro at publication; MAI returns terms and term frequencies; PLOS stores terms in search engine
  • 16. Establishing update cycle, thesauri. Separate instances (nerves): production server, plosthes.2013-6; working version, plosthes.2013-7. When ready to release a new version: load onto test server, MAI corpus, index. Test: new/changed/deleted terms; rule changes; structural changes; any implementation changes
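
One way to draw up the list of new, deleted and structurally changed terms to test before a release is to compare the two thesaurus versions directly. This is a sketch under stated assumptions: each version is represented as a mapping from term to its set of broader terms, `diff_thesauri` is a hypothetical helper, and the two toy versions below are invented for illustration rather than taken from the real plosthes releases.

```python
from typing import Dict, List, Set


def diff_thesauri(old: Dict[str, Set[str]], new: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Compare two versions given as term -> set of broader terms."""
    old_terms, new_terms = set(old), set(new)
    return {
        "new": sorted(new_terms - old_terms),
        "deleted": sorted(old_terms - new_terms),
        # Terms present in both versions whose broader terms differ.
        "moved": sorted(t for t in old_terms & new_terms if old[t] != new[t]),
    }


# Toy data only: not the real production or working versions.
production = {
    "T cells": {"White blood cells"},
    "Webs": {"Animal behavior"},
    "Snails": {"Animals"},
}
working = {
    "T cells": {"White blood cells"},
    "Memory T cells": {"T cells"},
    "Snails": {"Molluscs"},
}
report = diff_thesauri(production, working)
```

The resulting report gives a checklist of terms whose indexing should be inspected on the test server; rule changes would need a separate comparison.
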
  • 17. Thesaurus updates: why? More terms: Memory T cells, Monocotyledons. Errrm…: Report gene detection. What?: Webs. Hierarchy changes deemed desirable: Geographical locations, Organisms. (Un)Rule(y): snails, fabrication, pumas
  • 22. Rule-Building in MAIstro: Pumas before...
  • 23. Rule-Building in MAIstro: Pumas before... p53 upregulated modifier of apoptosis or
  • 24. Rule-Building in MAIstro: Pumas after…
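
The transcript does not reproduce the rule text from these screenshots, so the following only illustrates the kind of conditional rule that could separate the animal from the gene product PUMA (“p53 upregulated modifier of apoptosis”). It is plain Python, not MAIstro rule syntax; the cue list and the function `suggest_pumas` are assumptions.

```python
import re

# Phrases suggesting that "PUMA" refers to the protein, not the animal.
# Assumed for illustration; the real rule conditions are not in the transcript.
GENE_CUES = ("p53 upregulated modifier of apoptosis", "apoptosis")


def suggest_pumas(text: str) -> bool:
    """Suggest the Subject Area term "Pumas" only when no gene cue is present."""
    lowered = text.lower()
    if not re.search(r"\bpumas?\b", lowered):
        return False  # the trigger word does not occur at all
    return not any(cue in lowered for cue in GENE_CUES)


animal_text = "Pumas in Patagonia prey mainly on guanacos."
gene_text = "PUMA (p53 upregulated modifier of apoptosis) promotes cell death."
```

Here `suggest_pumas(animal_text)` is true and `suggest_pumas(gene_text)` is false, which is the distinction an unconditional text match cannot make.
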
  • 26. Thesaurus updates: prioritisation? Mis-hits and missed term reports. Ourselves: article pages. Our readers: in email complaints; on Twitter; in correspondence with our editorial staff; via Journal and Saved Search alerts; via article pages (Flagged Term reports)
  • 28. Things we learned: Thesaurus editorial. Tension: strict and rigorous taxonomy/ontology construction vs user utility. Abbreviations and Synonyms. Issues that continue to exercise us: T cells/Memory T cells; Obesity/Childhood obesity. When should we make both explicit? Rule work: working to top 8
  • 29. Building a new project: exports
  • 30. Building a new project: import
  • 31. Content Discovery: How has having the thesaurus changed the way that users interact with PLOS web sites?
  • 32. Problem: How to keep up? Solution: Current Awareness Tools: Journal alerts; Saved Searches; RSS feeds; Hierarchy exploration
  • 50. Relative Metrics: Defining a Paper’s Peer Group. 1. Group papers by Subject Area: accommodate multiple topics per paper. 2. Group papers by age: important for comparison of cumulative measures like total downloads or citations. 3. Determine norms for peer group: the average usage of each paper is compared with the median usage of its peer group. More on Relative Metrics at: http://www.plosone.org/static/almInfo#relativeMetrics
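
The three steps can be sketched as follows. This is not the PLOS ALM implementation: it assumes usage is a single number per paper, that age is approximated by publication year, and that the comparison is expressed as a ratio to the peer-group median. The field names, helper names and figures are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional, Tuple

# Invented example data: each paper may carry several Subject Areas.
papers = [
    {"id": "a", "year": 2012, "areas": ["Immunology"], "views": 100},
    {"id": "b", "year": 2012, "areas": ["Immunology"], "views": 200},
    {"id": "c", "year": 2012, "areas": ["Immunology", "Genetics"], "views": 300},
    {"id": "d", "year": 2012, "areas": ["Genetics"], "views": 900},
    {"id": "e", "year": 2013, "areas": ["Immunology"], "views": 50},
]


def peer_medians(items: List[dict]) -> Dict[Tuple[str, int], float]:
    """Steps 1-3: group by (Subject Area, year) and take the median usage."""
    groups = defaultdict(list)
    for paper in items:
        for area in paper["areas"]:  # a paper joins every group it belongs to
            groups[(area, paper["year"])].append(paper["views"])
    return {key: median(values) for key, values in groups.items()}


def relative_usage(paper: dict, medians: Dict[Tuple[str, int], float]) -> Dict[str, Optional[float]]:
    """Ratio of a paper's usage to its peer-group median, per Subject Area."""
    result = {}
    for area in paper["areas"]:
        norm = medians[(area, paper["year"])]
        result[area] = paper["views"] / norm if norm else None
    return result


medians = peer_medians(papers)
paper_c = relative_usage(papers[2], medians)
```

In this toy data paper "c" sits above the median of its Immunology peers and below the median of its Genetics peers, which shows why a paper with several Subject Areas needs one comparison per peer group.
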
  • 55. Area of development - Editorial Workflow
  • 56. The PLOS Thesaurus and Peer Review. Maintaining a copy of the PLOS thesaurus in Editorial Manager helps with editor and reviewer matching: Classifications for People; Classifications for Papers
  • 57. The PLOS Thesaurus and Peer Review. Authors select Subject Area terms related to their article submissions. Editors and Reviewers select terms that represent their areas of expertise. Staff and Editors use these terms to help ensure editors and reviewers are well matched to the submissions they are handling
  • 58. Planned Enhancements. Automate the application of terms associated with Editors, Reviewers and submitted articles with MAIstro. Provide Editors and Staff with detailed terms to assist with reviewer selection and vetting: academic disciplines help Editors gauge Subject Area relevance of potential Reviewers; methods, protocols and model organisms help Editors gauge technical suitability of potential Reviewers
  • 59. Dramatis personae: Jonas Dupuich, Product Manager; Patrick Polischuk, Product Manager; Sebastian Toomey, Interaction Designer; Jennifer Lin, Senior Product Manager; Martin Fenner, ALM Technical Lead; Kallie Huss, Senior Publications Assistant; John Chodacki, Director - Product Management