This presentation was provided by David Kuilman of Elsevier during the NISO event "Content Presentation: Diversity of Formats." The webinar was held on February 10, 2021.
1. David Kuilman, Gina Donato, Dr. Rinke Hoekstra
A content standard for data-platform use cases:
Content Profiles
& linked documents
NISO Diversity of formats
February 10, 2021 11:00am
Working Group initiative to create a NISO standard for the interchange
of academic, research, and professional content, data, and semantics
3. (Early) access
and visibility
Expedite shapes
Lineage
Provenance
Policy / license
Priority of
content and
authorship
Content is data
Content and data
operate seamlessly
Content structure
follows document
entity structure
Rich HTML5 literals
for UI/UX use cases
Role based
processing
Content typology
Granular
Context-based
using process
and purpose
intelligence
Content is
shared
All content can be
leveraged throughout
the platform by all
contributor/consumer
roles using a common
vocabulary
Zero organisational
boundaries
Policies for compliance
Continuous
flow and
hydration
Partial and
complete resources
Extensible types
and enrichments
Optimisation
of formats
Machine
learning
Human
interaction
Agile, extensible
and resilient
Fast services development
Nimble models
Extensible models
Arbitrary content (types)
Service level agreement
Handle exception flows
gracefully and informed
Business requirement: from a content perspective
4. Anatomy of content entity processes on a data platform
Source
Data
Harvesting Normalisation Extraction matching Linking Curation Publishing
…entity driven workflow
Classic document driven workflow…
manuscript Internal format copyedit Mastercopy Product
mappings mappings
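The entity driven workflow on this slide can be read as an ordered list of stages applied to one entity record. Below is a minimal Python sketch of that idea, not part of the CP/LD standard: the stage names come from the slide, while `run_workflow`, `make_stage`, the dictionary layout and the `lineage` field are assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Stage names taken from the slide; everything else here is hypothetical.
STAGES: List[str] = [
    "harvesting", "normalisation", "extraction", "matching",
    "linking", "curation", "publishing",
]

Entity = Dict[str, object]


def make_stage(name: str) -> Callable[[Entity], Entity]:
    """Return a placeholder stage that only records that it ran."""
    def stage(entity: Entity) -> Entity:
        updated = dict(entity)  # shallow copy so the input record is not mutated
        updated["lineage"] = list(entity.get("lineage", [])) + [name]
        return updated
    return stage


def run_workflow(entity: Entity, stages: List[str] = STAGES) -> Entity:
    """Apply each stage in order and return the enriched entity."""
    for name in stages:
        entity = make_stage(name)(entity)
    return entity


if __name__ == "__main__":
    result = run_workflow({"id": "entity-1"})
    print(result["lineage"])
```

In a real platform each stage would do actual work (for example, normalising a source file into entities). The sketch only shows how lineage could be accumulated as an entity moves through the stages.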
5. The Content Profiles & Linked Document standard (CP/LD) is the result of
adopting content platform principles to provide the flexibility, extensibility and
connectivity required on a
data platform for academic, research and professional content
Let's consider a few critical design considerations first…
Pipeline to cyclic
Human-in-the-loop
Merging data entities and content entities on demand
7. Key concept: think human-in-the-loop and machine learning
Sourcing
Harvesting
Normalizing
Extraction
Matching
linking
Publishing
Gold set
Test sets
Human curation within
content centric workflows
Human curation within
Machine Learning
Contributor
Consumer
Continuous improvement
Content operations
Platform operations
Continuous deployment
Model operations
Content
artefacts
Enhanced
Content
artefacts
Human supervised
Content usage metrics
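One possible reading of the human-in-the-loop cycle on this slide is sketched below in Python: machine annotations with low confidence are routed to human curation, and the curated results feed a gold set. The `Annotation` class, the `route` and `curate` functions and the 0.8 threshold are all assumptions for illustration; the slide does not specify any of them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Annotation:
    text: str
    label: str
    confidence: float  # hypothetical model score in [0, 1]


def route(annotations: List[Annotation],
          threshold: float = 0.8) -> Tuple[List[Annotation], List[Annotation]]:
    """Split annotations into (accepted, needs human review)."""
    accepted = [a for a in annotations if a.confidence >= threshold]
    review = [a for a in annotations if a.confidence < threshold]
    return accepted, review


def curate(review_queue: List[Annotation],
           decide: Callable[[Annotation], str]) -> List[Annotation]:
    """Apply a human decision (the `decide` callback) to each queued annotation.

    The returned annotations carry confidence 1.0 and could extend a gold set.
    """
    return [Annotation(a.text, decide(a), 1.0) for a in review_queue]


if __name__ == "__main__":
    machine_output = [
        Annotation("SARS-CoV2", "virus", 0.95),
        Annotation("common cold", "virus", 0.40),
    ]
    accepted, review = route(machine_output)
    gold_set = curate(review, lambda a: "disease")
    print(len(accepted), len(review), gold_set[0].label)
```

The curated gold set would then be used to retrain or test the model, which is what turns the pipeline into a cycle.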
8. The CP/LD standard uses established standards to create the
format framework that supports data platform content
operations without compromise
Linked data and HTML5 unite syntax, structure and semantics
needed on the platform
9. HTML5
JSON-LD +
Structured narrative
Semantic data layer
XHTML dialect
Linked Data
Usage standard and guidelines
Independent of any particular use case
Content Profile standard & Linked Document
XML Schema
RDF Schema
SHACL
XML
Schema
RDF: Discovery
XML: consistency
JSON: messaging
JSON-LD: knowledge infusion
HTML5: representation
Business roles
10. This is a part of text that has a specific style (italic)
This is a paragraph
This paragraph is the abstract of the paper
This paragraph is the title of the paper
This is author Alba Grifoni
This is a citation of another paper
This is a result reported on in this paper
This is a mention of the "COVID-19" concept
This is a mention of the "SARS-CoV2" concept
This states that "SARS-CoV2" reactive "CD4+ T-cells" exist in ~40%-60% of unexposed individuals, suggesting cross-reactive T-cell recognition with "common cold"
doi:10.1126/sciimunol.aan5393
"55425663600"
hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd (COVID-19)
hgraph:id-2ab6cd87-e543-3229-85ff-c862a90f415c (SARS-CoV2)
hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd (T-CD4+)
hgraph:id-2ab6cd87-e543-3229-85ff-c862a90f415c (SARS-CoV2)
hgraph:id-a28e7725-1919-34f0-a648-45721d8bd6a2 (common cold)
reactive to
reactive to
The anatomy of a Linked
Document
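The pairing of an HTML5 narrative with a JSON-LD semantic layer can be illustrated with a small Python sketch. It is not the CP/LD serialisation itself: the `ex:` vocabulary, the namespace URLs, the element ids and the `unresolved_anchors` helper are hypothetical, and the `hgraph` identifiers are copied as they appear on the slide.

```python
import json
import re
from typing import Dict, List

# Hypothetical HTML5 narrative layer; the ids and markup are illustrative only.
HTML_FRAGMENT = (
    '<p id="p1"><span id="m1">SARS-CoV2</span> reactive '
    '<span id="m2">CD4+ T-cells</span> exist in ~40%-60% of '
    'unexposed individuals</p>'
)

# Hypothetical JSON-LD semantic layer pointing back into the HTML by id.
SEMANTIC_LAYER: Dict[str, object] = {
    "@context": {
        "ex": "https://example.org/cpld#",        # assumed vocabulary
        "hgraph": "https://example.org/hgraph/",  # assumed namespace
    },
    "@graph": [
        {"@id": "#m1", "@type": "ex:Mention",
         "ex:refersTo": {"@id": "hgraph:id-2ab6cd87-e543-3229-85ff-c862a90f415c"}},
        {"@id": "#m2", "@type": "ex:Mention",
         "ex:refersTo": {"@id": "hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd"}},
    ],
}


def unresolved_anchors(html: str, layer: Dict[str, object]) -> List[str]:
    """Return the @id values in the semantic layer with no matching HTML id."""
    html_ids = set(re.findall(r'id="([^"]+)"', html))
    return [node["@id"] for node in layer["@graph"]
            if node["@id"].lstrip("#") not in html_ids]


if __name__ == "__main__":
    print(json.dumps(SEMANTIC_LAYER, indent=2))
    print(unresolved_anchors(HTML_FRAGMENT, SEMANTIC_LAYER))
```

Keeping the semantic layer separate from the markup lets the same narrative serve both representation (HTML5) and knowledge infusion (JSON-LD), as long as every anchor resolves.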
13. Activating the platform: merge topics and create a product view
After merging the topics, the
finished view offers:
• A manuscript becomes a Document
• The position of an abstract and a conclusion
• A person has been identified as author
• The author string has been identified within the document
• The author has entity attributes
• The document assembly is a scientific article of type "Finished" because it satisfies the above criteria
merge
Article Author
Author
attributes
Abstract
Author
String
Conclusion
Outside document
Inside document
HTML5 vocabulary
JSON-LD predicates
Relationships legend
A finished article
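The criteria listed on this slide for a "Finished" article can be expressed as a simple check. The Python sketch below is an illustration under assumptions: the `DocumentAssembly` and `Author` classes, their field names and the `missing_criteria` helper are hypothetical and do not come from the standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Author:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class DocumentAssembly:
    sections: Set[str] = field(default_factory=set)  # e.g. {"abstract", "conclusion"}
    author: Optional[Author] = None                  # person identified as author
    author_string_found: bool = False                # author string located in the document


def missing_criteria(doc: DocumentAssembly) -> List[str]:
    """List which of the slide's criteria the assembly does not yet satisfy."""
    missing: List[str] = []
    for section in ("abstract", "conclusion"):
        if section not in doc.sections:
            missing.append(section)
    if doc.author is None:
        missing.append("author")
    elif not doc.author.attributes:
        missing.append("author attributes")
    if not doc.author_string_found:
        missing.append("author string")
    return missing


def is_finished(doc: DocumentAssembly) -> bool:
    """An assembly counts as "Finished" when no criterion is missing."""
    return not missing_criteria(doc)


if __name__ == "__main__":
    draft = DocumentAssembly(sections={"abstract"})
    print(missing_criteria(draft))
```

Returning the list of missing criteria, rather than only a boolean, makes it easier to see which topic still has to be merged before the product view is complete.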
14. Key takeaways
• Content is data; treat it as data, not as documents
• Normalization is the great divider from files to entities, items and assertions
• Entity-designed data and author-designed data become blended
• Machine learner and researcher forge an alliance
On standards & formats…
• RDF and XML schema technology (remain) the backbone for information modelling
• JSON, JSON-LD and HTML5 serialisations are dominant for content standards
Working Group initiative to create a NISO standard for the interchange
of academic, research, and professional content, data, and semantics
Further information:
Editor's notes
XML DTD 5.6 (OPS), XOCS… Common Index Profile (CIP) -> structure & metadata
NLP: CM2, FPE, Leadmine, MedScan, Termite (SciBite) …
Linking: Parity, FPE, …