David Kuilman, Gina Donato, Dr. Rinke Hoekstra
A content standard for data-platform use cases:
Content Profiles
& linked documents
NISO Diversity of formats
February 10, 2021 11:00am
Working Group initiative to create a NISO standard for the interchange
of academic, research, and professional content, data, and semantics
Elsevier Data Platform vision
entity-driven processes
Business requirement: from a content perspective
• (Early) access and visibility
• Expedite shapes
• Lineage
• Provenance
• Policy / license
• Priority of content and authorship
• Content is data: content and data operate seamlessly; content structure follows document entity structure; rich HTML5 literals for UI/UX use cases
• Role-based processing
• Content typology: granular; context-based, using process and purpose intelligence
• Content is shared: all content can be leveraged throughout the platform by all contributor/consumer roles using a common vocabulary; zero organisational boundaries; policies for compliance
• Continuous flow and hydration: partial and complete resources; extensible types and enrichments; optimisation of formats; machine learning; human interaction
• Agile, extensible and resilient: fast services development; nimble models; extensible models; arbitrary content (types); service level agreements; exception flows handled gracefully and in an informed way
Anatomy of content entity processes on a data platform
• Entity-driven workflow: Source data → Harvesting → Normalisation → Extraction → Matching → Linking → Curation → Publishing
• Classic document-driven workflow: Manuscript → Internal format → Copyedit → Master copy → Product, with format mappings between stages
The Content Profiles & Linked Document standard (CP/LD) is the result of adopting content platform principles to provide the flexibility, extensibility and connectivity required on a data platform for academic, research and professional content.
Let's consider a few critical design considerations first:
• From pipeline to cyclic processing
• Human-in-the-loop
• Merging data entities and content entities on demand
Key concept: think cyclic, not linear
Sourcing → Harvesting → Normalisation → Extraction → Matching → Linking → Curation → Publishing, then back to Sourcing; several such cycles run in parallel workflows.
Human-in-the-loop: author, review, approve, connect, edit, recommend, annotate
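The cyclic model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of CP/LD: the stage and action names come from the slide, while `Entity`, `next_stage`, `run_cycle` and the rule that a human action returns an entity to Sourcing are hypothetical.

```python
from dataclasses import dataclass, field

# Stage names from the slide; after Publishing the cycle wraps back to Sourcing.
STAGES = ["Sourcing", "Harvesting", "Normalisation", "Extraction",
          "Matching", "Linking", "Curation", "Publishing"]

# Human-in-the-loop actions listed on the slide.
HUMAN_ACTIONS = {"author", "review", "approve", "connect", "edit", "recommend", "annotate"}


@dataclass
class Entity:
    """Hypothetical content entity that records each stage and human action it sees."""
    entity_id: str
    history: list = field(default_factory=list)


def next_stage(current: str) -> str:
    """Return the stage after `current`, wrapping from Publishing to Sourcing."""
    return STAGES[(STAGES.index(current) + 1) % len(STAGES)]


def run_cycle(entity: Entity, start: str, steps: int, human_events: dict) -> Entity:
    """Advance `entity` through `steps` stages, starting at `start`.

    `human_events` maps a step number to a human action. By assumption, a human
    action sends the entity back to Sourcing so the cycle reprocesses it.
    """
    stage = start
    for step in range(steps):
        entity.history.append(stage)
        action = human_events.get(step)
        if action is None:
            stage = next_stage(stage)
            continue
        if action not in HUMAN_ACTIONS:
            raise ValueError(f"unknown human action: {action}")
        entity.history.append(f"human:{action}")
        stage = "Sourcing"
    return entity


# Each entity runs its own cycle independently; a platform would run these concurrently.
doc = run_cycle(Entity("ms-1"), start="Sourcing", steps=10, human_events={3: "annotate"})
other = run_cycle(Entity("ms-2"), start="Publishing", steps=2, human_events={})
print(doc.history)
print(other.history)  # ['Publishing', 'Sourcing']
```

The point of the wrap-around in `next_stage` is that Publishing is not an end state: a published entity can be picked up again when new sources or human input arrive.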
Key concept: think human-in-the-loop and machine learning
• Pipeline: Sourcing → Harvesting → Normalizing → Extraction → Matching → Linking → Publishing
• Human curation within content-centric workflows, involving contributors and consumers
• Human curation within machine learning, using a gold set and test sets
• Continuous improvement: content operations, platform operations and model operations, with continuous deployment
• Content artefacts → enhanced content artefacts (human supervised), measured with content usage metrics
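One way a human-curated gold set can feed model operations is as a yardstick for machine extraction before deployment. The sketch below assumes extractions are (text span, label) pairs and scores them with precision and recall; the slide names neither the data shape nor the metrics, and `should_deploy` with its thresholds is a hypothetical release gate.

```python
def precision_recall(predicted: set, gold: set) -> tuple:
    """Score machine-extracted items against a human-curated gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def should_deploy(predicted: set, gold: set,
                  min_precision: float = 0.9, min_recall: float = 0.8) -> bool:
    """Hypothetical release gate: deploy a model only if it meets both thresholds."""
    precision, recall = precision_recall(predicted, gold)
    return precision >= min_precision and recall >= min_recall


# Items are (text span, label) pairs; the labels are illustrative.
gold = {("Alba Grifoni", "Author"), ("COVID-19", "Concept"),
        ("SARS-CoV2", "Concept"), ("common cold", "Concept")}
predicted = {("Alba Grifoni", "Author"), ("COVID-19", "Concept"),
             ("Abstract", "Author")}

print(precision_recall(predicted, gold))  # (0.6666666666666666, 0.5)
print(should_deploy(predicted, gold))     # False
```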
The CP/LD standard uses established standards to create the format framework that supports data platform content operations without compromise.
Linked data and HTML5 unite the syntax, structure and semantics needed on the platform.
Content Profile standard & Linked Document
• HTML5 (XHTML dialect): structured narrative
• JSON-LD (Linked Data): semantic data layer
• Usage standard and guidelines, independent of any particular use case
• Schemas: XML Schema, RDF Schema, SHACL
Business roles:
• RDF: discovery
• XML: consistency
• JSON: messaging
• JSON-LD: knowledge infusion
• HTML5: representation
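To make the split between representation (HTML5) and knowledge infusion (JSON-LD) concrete, the sketch below pulls the semantic layer out of an HTML5 document using only the Python standard library. It assumes the JSON-LD is embedded in a `<script type="application/ld+json">` element, which is the embedding the JSON-LD 1.1 specification defines for HTML; the slide does not say CP/LD uses that mechanism, and the example.org IRI and schema.org terms are placeholders.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the JSON-LD blocks embedded in an HTML5 document."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.graphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.graphs.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# HTML5 carries the narrative; the JSON-LD block carries the semantic layer.
LINKED_DOCUMENT = """<!DOCTYPE html>
<html>
<head>
<script type="application/ld+json">
{"@context": {"@vocab": "https://schema.org/"},
 "@id": "https://example.org/doc/1",
 "@type": "ScholarlyArticle",
 "author": {"@type": "Person", "name": "Alba Grifoni"}}
</script>
</head>
<body><h1>Title</h1><p>Abstract text.</p></body>
</html>"""

parser = JsonLdExtractor()
parser.feed(LINKED_DOCUMENT)
print(parser.graphs[0]["author"]["name"])  # Alba Grifoni
```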
The anatomy of a Linked Document
Annotated elements of the example document:
• A part of text that has a specific style (italic)
• A paragraph
• The paragraph that is the abstract of the paper
• The paragraph that is the title of the paper
• The author Alba Grifoni (“55425663600”)
• A citation of another paper (doi:10.1126/sciimunol.aan5393)
• A result reported in this paper
• A mention of the “COVID-19” concept
• A mention of the “SARS-CoV2” concept
• A statement that “SARS-CoV2” reactive “CD4+ T-cells” exist in ~40%-60% of unexposed individuals, suggesting cross-reactive T-cell recognition with “common cold”
Linked concepts:
• hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd (COVID-19)
• hgraph:id-2ab6cd87-e543-3229-85ff-c862a90f415c (SARS-CoV2)
• hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd (T-CD4+)
• hgraph:id-a28e7725-1919-34f0-a648-45721d8bd6a2 (common cold)
Relationships: two “reactive to” links connect these concepts.
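The statement's semantic layer could be serialised as JSON-LD roughly as follows. This is a sketch, not the CP/LD serialisation: the `hgraph:` identifiers are copied from the slide, but the namespace IRI behind the prefix is not given, so example.org is a placeholder; `reactiveTo` is a hypothetical predicate; and the direction (T-CD4+ reactive to SARS-CoV2 and to common cold) is an assumption. Because the slide shows the same identifier for COVID-19 and T-CD4+, the sketch uses it only for T-CD4+.

```python
import json

CONTEXT = {
    "hgraph": "https://example.org/hgraph/",  # placeholder namespace
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    # Hypothetical predicate standing in for the slide's "reactive to" relationship.
    "reactiveTo": {"@id": "https://example.org/vocab/reactiveTo", "@type": "@id"},
}

SARS_COV2 = "hgraph:id-2ab6cd87-e543-3229-85ff-c862a90f415c"
T_CD4 = "hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd"
COMMON_COLD = "hgraph:id-a28e7725-1919-34f0-a648-45721d8bd6a2"


def assertion_graph() -> dict:
    """Build a JSON-LD document for the example statement's concepts and links."""
    return {
        "@context": CONTEXT,
        "@graph": [
            {"@id": SARS_COV2, "rdfs:label": "SARS-CoV2"},
            {"@id": COMMON_COLD, "rdfs:label": "common cold"},
            {
                "@id": T_CD4,
                "rdfs:label": "T-CD4+",
                # Assumed direction: the T-cells are reactive to both concepts.
                "reactiveTo": [SARS_COV2, COMMON_COLD],
            },
        ],
    }


print(json.dumps(assertion_graph(), indent=2))
```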
Content Topics blueprint for data platform
• Topics: assertions, documents, resources, aggregations, products, each connected to multiple services
• Processing: bespoke normalizers, Linked Data processors, query
• Progression: harvesting → harvested manuscript → normalized document → enriched article → a finished article
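The topic blueprint resembles publish/subscribe messaging. The slide does not name a messaging system, so the sketch below uses a hypothetical in-memory `TopicBus`; the topic names and article states come from the slide, while the wiring of each service to a particular topic is an assumption for illustration.

```python
from collections import defaultdict
from typing import Callable

# Topic names from the slide.
TOPICS = ("assertions", "documents", "resources", "aggregations", "products")


class TopicBus:
    """Minimal in-memory publish/subscribe bus keyed by topic name."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)
        self.log = []  # (topic, state) pairs in publication order

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._check(topic)
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        self._check(topic)
        self.log.append((topic, message["state"]))
        for handler in self._subscribers[topic]:
            handler(message)

    @staticmethod
    def _check(topic: str) -> None:
        if topic not in TOPICS:
            raise ValueError(f"unknown topic: {topic}")


bus = TopicBus()


# Hypothetical wiring: each service listens on one topic and publishes to the next.
def bespoke_normalizer(msg: dict) -> None:
    bus.publish("documents", {"id": msg["id"], "state": "normalized document"})


def linked_data_processor(msg: dict) -> None:
    bus.publish("aggregations", {"id": msg["id"], "state": "enriched article"})


def product_service(msg: dict) -> None:
    bus.publish("products", {"id": msg["id"], "state": "finished article"})


bus.subscribe("resources", bespoke_normalizer)
bus.subscribe("documents", linked_data_processor)
bus.subscribe("aggregations", product_service)

# Harvesting drops a manuscript onto the resources topic; the services do the rest.
bus.publish("resources", {"id": "ms-1", "state": "harvested manuscript"})
for entry in bus.log:
    print(entry)
```

Because services only know topic names, a new service can subscribe to an existing topic without changing the others, which is the kind of extensibility the blueprint aims for.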
Activating the platform: listen and merge application
Services each contribute a partial view of the same content, and a merge step combines them:
• An author manuscript, with abstract and conclusion
• An author mention: the author string within the document
• The author as a Person entity, with author attributes
• The author as both entity and representation, linking article, author and document
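The merge step can be illustrated as combining partial views that share an identifier. The `merge` function and its rules are assumptions, not the platform's algorithm: fragments share an `@id`, list values are unioned in order, and any other key must not be given conflicting values.

```python
def merge(*fragments: dict) -> dict:
    """Merge partial views of one resource into a single view.

    Assumed rules: list values are concatenated without duplicates; any other
    key may appear in several fragments only with the same value.
    """
    merged: dict = {}
    for fragment in fragments:
        for key, value in fragment.items():
            if key not in merged:
                merged[key] = value
            elif isinstance(merged[key], list) and isinstance(value, list):
                merged[key] = merged[key] + [v for v in value if v not in merged[key]]
            elif merged[key] != value:
                raise ValueError(f"conflicting values for {key!r}")
    return merged


# Three hypothetical partial views, one per listening service.
manuscript = {"@id": "doc-1", "@type": ["Manuscript"],
              "positions": {"Abstract": 0, "Conclusion": 7}}
mention = {"@id": "doc-1", "@type": ["Document"], "authorString": "Alba Grifoni"}
person = {"@id": "doc-1", "author": {"@type": "Person", "name": "Alba Grifoni"}}

view = merge(manuscript, mention, person)
print(view["@type"])  # ['Manuscript', 'Document']
```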
Activating the platform: merge topics and create a product view
After merging the topics, the finished view offers:
• A manuscript becomes a Document
• The positions of an abstract and a conclusion
• A person has been identified as the author
• The author string has been identified within the document
• The author has entity attributes
• The document assembly is a scientific article of type ‘Finished’ because it satisfies the above criteria
Diagram, “A finished article”: Article, Author, Author attributes, Abstract, Author string and Conclusion, combined by merge. The relationships legend distinguishes relationships outside and inside the document, and HTML5 vocabulary from JSON-LD predicates.
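The ‘Finished’ criteria listed above translate directly into a predicate over a merged view. The field names (`@type`, `positions`, `text`, `authorString`, `author`, `attributes`, `authorId`, `articleType`) are hypothetical; the slide lists the criteria, not a data model.

```python
def is_finished(view: dict) -> bool:
    """Return True if a merged view meets the slide's criteria for a 'Finished' article."""
    author = view.get("author", {})
    author_string = view.get("authorString", "")
    return (
        "Document" in view.get("@type", [])                                 # manuscript became a Document
        and {"Abstract", "Conclusion"} <= set(view.get("positions", {}))   # both positions known
        and author.get("@type") == "Person"                                # a person identified as author
        and bool(author_string) and author_string in view.get("text", "")  # author string found in text
        and bool(author.get("attributes"))                                 # author has entity attributes
    )


def classify(view: dict) -> dict:
    """Label the assembly as a scientific article of type 'Finished' when it qualifies."""
    return {**view, "articleType": "Finished"} if is_finished(view) else view


view = {
    "@type": ["Manuscript", "Document"],
    "positions": {"Abstract": 0, "Conclusion": 7},
    "text": "Text by Alba Grifoni.",
    "authorString": "Alba Grifoni",
    "author": {"@type": "Person", "attributes": {"authorId": "55425663600"}},
}
print(classify(view).get("articleType"))  # Finished
```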
Key takeaways
• Content is data; treat it as data, not as documents
• Normalization is the great divide between files on one side and entities, items and assertions on the other
• Entity-designed data and author-designed data become blended
• Machine learners and researchers forge an alliance
On standards & formats
• RDF and XML schema technologies remain the backbone of information modelling
• JSON, JSON-LD and HTML5 serialisations dominate content standards
Further information:
Editor's notes
1. XML DTD 5.6 (OPS), XOCS
Common Index Profile (CIP) -> structure & metadata
NLP: CM2, FPE, Leadmine, MedScan, Termite (SciBite)
Linking: Parity, FPE,