Scientific Knowledge Graphs: An Overview
Dr Angelo Salatino
Knowledge Media Institute
The Open University
United Kingdom
Université Libre de Bruxelles - 12th May 2021
About me – Angelo Salatino
Research Associate and Associate Lecturer at the Open University
Research Interests: i) new technologies for classifying scientific
papers according to their relevant research topics, and ii) how the
research output of academia fosters innovation in the industry
At the SKM3 team we produce innovative approaches leveraging
large-scale data mining, semantic technologies, machine learning,
and visual analytics to extract meaning from scholarly data and
shed light on research dynamics
angelo.salatino@open.ac.uk https://salatino.org @angelosalatino
This work is licensed under a Creative Commons Attribution 4.0
International License.
Agenda
• Scientific Knowledge Graphs
• AIDA
• Use cases of AIDA
• Practical tests
Why do we need Scientific Knowledge Graphs?
Science of Science
Picture from the cover of Science Vol 361, Issue 6408
Science of Science
• Science of Science is a multidisciplinary field that helps us
understand, in a quantitative fashion, the evolution of science.
• This is possible by capitalising on the large amounts of data that scientists
produce nowadays:
• Research articles
• Pre-prints
• Grant proposals
• Patents
• Spoiler: SKGs come in quite handy for structuring and collecting such
data
Scientific Knowledge Graphs
Research dissemination
Scholarly Data
Improving Editorial Workflow and
Metadata Quality at Springer Nature.
Identifying the research topics that best describe the scope of a scientific publication is a
crucial task for editors, in particular because the quality of these annotations determines how
effectively users are able to discover the right content in online libraries. For this reason,
Springer Nature, the world’s largest academic book publisher, has traditionally entrusted this
task to their most expert editors. These editors manually analyse all new books, possibly
including hundreds of chapters, and produce a list of the most relevant topics. Hence, this
process has traditionally been very expensive, time-consuming, and confined to a few senior
editors. For these reasons, back in 2016 we developed Smart Topic Miner (STM), an ontology-
driven application that assists the Springer Nature editorial team in annotating the volumes of
all books covering conference proceedings in Computer Science. Since then STM has been
regularly used by editors in Germany, China, Brazil, India, and Japan, …
Angelo Salatino
Francesco Osborne
Aliaksandr Birukou
Enrico Motta
The Open University
Springer Nature
The 18th International Semantic Web Conference (ISWC 2019)
Affiliations
Authors
Citations
References
Conference/Journal
Text: Title, Abstract
Keywords
Scholarly data, Bibliographic metadata, Topic classification, Topic detection, …
Keywords
Topic Detection, Science Of Science, Topic classification, Semantic Web, …
23rd International Conference on Theory and Practice of Digital Libraries (TPDL 2019)
Conference/Journal
The 17th International Semantic Web Conference (ISWC 2018)
Conference/Journal
Scholarly Data
[Figure: scholarly entities (Researcher, Institution, Funder, Project, Project A, Project B, Community, Other literature) connected by relations labelled <publish>, <co-op>, <cite>, and <$$$>.]
Courtesy of Andrea Mannocci from “Big Scholarly Data and Applications”
Scientific Knowledge Graph
Scientific Knowledge Graphs (SKGs) are a way
of representing scholarly knowledge in a
structured, interlinked, and semantically rich
manner.
Scientific Knowledge Graph - Definition
Given a set of entities E, and a set of relations R, a Scientific Knowledge Graph is a
directed multi-relational graph G that comprises triples (subject, predicate, object)
and is a subset of the cross product G ⊆ E ⨉ R ⨉ E.
Nodes and edges have well-defined meanings
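As a minimal sketch of this definition, the graph can be modelled in Python as a set of (subject, predicate, object) tuples. The entity and relation names below are illustrative assumptions, not taken from any real SKG.

```python
from itertools import product

# Illustrative entities E and relations R (hypothetical names).
E = {"paper_1", "author_a", "author_b", "org_x"}
R = {"has_author", "has_affiliation"}

# G is a set of (subject, predicate, object) triples.
G = {
    ("paper_1", "has_author", "author_a"),
    ("paper_1", "has_author", "author_b"),
    ("author_a", "has_affiliation", "org_x"),
}

# G is a subset of the cross product E x R x E.
assert G <= set(product(E, R, E))


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


authors = [o for _, _, o in match(G, s="paper_1", p="has_author")]
print(authors)
```

Because edges are directed and labelled, the same pair of nodes can be linked by several different relations, which is what "multi-relational" refers to.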
Representation through Resource Description Framework
RDF is a standard for data interchange that is used for representing highly interconnected data.
Each RDF statement is a three-part structure (subject, predicate, object) in which resources are
identified by URIs and objects may also be literal values. Representing data in RDF allows information to be easily identified,
disambiguated and interconnected by AI systems.
Previous graph in RDF (NT format):
<https://skg.org/paper_635219> <https://skg.org/sc#title> "Detection, Analysis, …"@en .
<https://skg.org/paper_635219> <https://skg.org/sc#abstract> "Analysing rese …"@en .
<https://skg.org/paper_635219> <https://skg.org/sc#has_keyword> "Scholarly Communication"@en .
<https://skg.org/paper_635219> <https://skg.org/sc#type> <https://skg.org/sc#paper> .
<https://skg.org/paper_635219> <https://skg.org/sc#has_author> <https://skg.org/angelo_salatino> .
<https://skg.org/paper_635219> <https://skg.org/sc#has_author> <https://skg.org/francesco_osborne> .
<https://skg.org/paper_635219> <https://skg.org/sc#has_author> <https://skg.org/andrea_mannocci> .
<https://skg.org/angelo_salatino> <https://skg.org/sc#has_affiliation> <https://skg.org/open_university> .
<https://skg.org/francesco_osborne> <https://skg.org/sc#has_affiliation> <https://skg.org/open_university> .
<https://skg.org/andrea_mannocci> <https://skg.org/sc#has_affiliation> <https://skg.org/italian_research_council> .
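To show how such statements can be consumed, here is a small sketch that reads a few of the lines above with the Python standard library and follows two hops (paper to authors to affiliations). The regular expression is an assumption that covers only IRIs and simple language-tagged literals; a real project would more likely use a full RDF parser.

```python
import re

NT = '''
<https://skg.org/paper_635219> <https://skg.org/sc#has_author> <https://skg.org/angelo_salatino> .
<https://skg.org/paper_635219> <https://skg.org/sc#has_author> <https://skg.org/andrea_mannocci> .
<https://skg.org/paper_635219> <https://skg.org/sc#has_keyword> "Scholarly Communication"@en .
<https://skg.org/angelo_salatino> <https://skg.org/sc#has_affiliation> <https://skg.org/open_university> .
<https://skg.org/andrea_mannocci> <https://skg.org/sc#has_affiliation> <https://skg.org/italian_research_council> .
'''

# Handles only IRIs and plain or language-tagged literals without escapes.
LINE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)"(?:@[\w-]+)?)\s*\.$'
)


def parse(text):
    """Turn each recognised line into a (subject, predicate, object) tuple."""
    triples = []
    for line in text.strip().splitlines():
        m = LINE.match(line.strip())
        if m:
            s, p, iri, literal = m.groups()
            triples.append((s, p, iri if iri is not None else literal))
    return triples


SC = "https://skg.org/sc#"
triples = parse(NT)


def objects(s, p):
    return [o for s2, p2, o in triples if s2 == s and p2 == p]


# Two-hop traversal: paper -> authors -> affiliations.
paper = "https://skg.org/paper_635219"
affiliations = {
    author.rsplit("/", 1)[-1]: objects(author, SC + "has_affiliation")
    for author in objects(paper, SC + "has_author")
}
print(affiliations)
```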
However …
Not all SKGs are published in RDF. E.g. sample from Dimensions
{
"format":3,
"status":"active",
"id":"pub.1009237776",
"publication_type":"article",
"doi":"10.1093/bybil/49.1.221",
"version_of_record":"https://doi.org/10.1093/bybil/49.1.221",
"pmid":"2650905",
"pmcid":"5381240",
"title":"Does Penicillin Kill Bacteria?",
"year":2017,
"concepts":{
"structure":0.3,
"fire":0.34,
"case study":0.01,
"serviceability":0.6,
"damage":0.12,
"residual mechanical properties":0.67
},
"publication_date":"2017-12-25",
"volume":"32",
"issue":"4",
"pages":"330-333",
….
"journal":{
"id":"jour.1138253",
"title":"The Journal of Clinical Evidence",
"issn":"0068-2691",
"eissn":"2044-9437"
},
"publisher":{
"id":"pblshr.1001577",
"name":"Radiological Society of North America (RSNA)"
},
"journal_lists":[
"ERA 2015",
"Norwegian register level 2",
"PubMed"
],
"clinical_trials":[
"NCT00605345"
],
"open_access_categories":[
"closed"
],
"author_affiliations":[{
"first_name":"Ian",
"last_name":"Bobbington",
"grid_ids":["grid.5335.0", "grid.1001.0"]},
….
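A record like this can still be viewed as a graph. The sketch below flattens an abridged copy of the sample into triples. The predicate names (`published_in`, `has_author`, `has_affiliation`) are hypothetical and chosen only for illustration; they are not part of the Dimensions format.

```python
import json

# Abridged record using fields from the sample above.
record = json.loads("""
{
  "id": "pub.1009237776",
  "title": "Does Penicillin Kill Bacteria?",
  "year": 2017,
  "journal": {"id": "jour.1138253", "title": "The Journal of Clinical Evidence"},
  "author_affiliations": [
    {"first_name": "Ian", "last_name": "Bobbington",
     "grid_ids": ["grid.5335.0", "grid.1001.0"]}
  ]
}
""")


def to_triples(rec):
    """Flatten one record into (subject, predicate, object) tuples.

    Predicate names are hypothetical, chosen for illustration.
    """
    pub = rec["id"]
    journal = rec["journal"]
    triples = [
        (pub, "title", rec["title"]),
        (pub, "year", rec["year"]),
        (pub, "published_in", journal["id"]),
        (journal["id"], "title", journal["title"]),
    ]
    for author in rec.get("author_affiliations", []):
        name = f'{author["first_name"]} {author["last_name"]}'
        triples.append((pub, "has_author", name))
        for grid in author.get("grid_ids", []):
            triples.append((name, "has_affiliation", grid))
    return triples


triples = to_triples(record)
for triple in triples:
    print(triple)
```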
Big Scholarly Datasets
• Web of Science
• Scopus
• Google Scholar
• Microsoft Academic Graph
• MA-KG, ma-graph.org
• PubMed
• Dimensions
• Semantic Scholar
• DBLP
• Open Academic Graph
• ScholarlyData
• PID Graph
• Open Research Knowledge Graph
• OpenCitations
• OpenAIRE research graph
• Crossref
• Academia/Industry DynAmics Knowledge Graph (AIDA)
Disclaimer: this is far from being exhaustive
Differences between datasets
All these datasets are different from
each other:
• size
• scope
• quality
• mistakes, author disambiguation
• index vs. scraping
• comprehensiveness
• integration with other sources
• format
• access to data: license
“The comparison considers all scientific
documents from the period 2008–2017
covered by these data sources.”
Picture from Martijn Visser, Nees Jan van Eck, and Ludo Waltman. "Large-scale comparison of bibliographic
data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic." (2021).
Differences between datasets
• MAG, Dimensions, Scopus, WOS cover
all areas of Science
• DBLP covers Computer Science
• PubMed covers the field of Medicine
• Semantic Scholar covers Computer
Science and Medicine
Differences between datasets
• Cleaning data.
• Removing duplicates (same document
can appear in multiple places on the
web)
• Disambiguating authors
• Disambiguating affiliations
• Disambiguating references
Overall data quality: WoS > Scopus > MAG
Example of paper formats
Automatic tools such as GROBID keep getting better at
extracting metadata, but they still make
errors
Differences between datasets
• Web Of Science, Scopus, PubMed index
papers:
• Papers are added to the collection in a
controlled way (metadata are curated)
• MAG, Dimensions, Google Scholar scrape
the web (journal websites) and PDFs.
• This also leads to other quality issues like
identifying the correct metadata
Differences between datasets
Columns (left to right): MAG, Dimensions, Crossref, Scopus, WOS, Semantic Scholar, OpenAIRE RG, OpenAG, PubMed
title x x x x x x x x x
id x x x x x x x x x
abstract x p x x x x x x x
year x x x x x x x x x
references x x x x x x x x
authors x x x x x x x x x
doi x x x x x x x x x
topics x x x x x
citationcount x x x x x x x
conferences x x x x p x x
journal x x x x x p x x x
authors keywords x x
Legend
x = ok
p = partial information
Differences between datasets
Description New Scopus RIS tag
Abbreviated source title J2
Abstract AB
Affiliations AD
Article number C7
Article title TI
Authors AU
Chemical name and CAS registry number N1
Cited by count N1
Conference Code N1
CODEN N1
Conference name T2
Correspondence name N1
Conference date Y2
DOI DO
Editor A2
End tag ER
Export date N1
First page SP
Funding Details N1
ISSN/ISBN/EISSN SN
Issue IS
Keywords KW
Language LA
Last page EP
Conference Location CY
Manufacturers N1
PMID/PMCID C2
Proceedings title C3
Publication year PY
Publisher PB
References N1
Scopus database DB
Scopus URL UR
Second article title ST
Sequence database accession number N1
Source title T2
Source type TY
Document type M3
Conference sponsors A4
Scopus TAGS
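To make the tag table concrete, here is a rough sketch of reading a RIS export with the standard library. It assumes the usual RIS line layout (a two-character tag, two spaces, a hyphen, then the value) and that `ER` closes a record. The sample record is made up for illustration and is not a real Scopus export.

```python
import re
from collections import defaultdict

# Made-up record for illustration; only the tag layout follows the RIS convention.
RIS = """TY  - CONF
TI  - An example article title
AU  - Doe, J.
AU  - Roe, R.
PY  - 2020
KW  - knowledge graphs
KW  - scholarly data
ER  -
"""

TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Group tagged lines into records; repeated tags become lists."""
    records, current = [], defaultdict(list)
    for line in text.splitlines():
        m = TAG_LINE.match(line)
        if not m:
            continue  # skip blank or continuation lines in this sketch
        tag, value = m.groups()
        if tag == "ER":
            records.append(dict(current))
            current = defaultdict(list)
        else:
            current[tag].append(value.strip())
    return records


records = parse_ris(RIS)
print(records[0]["AU"])
```

Note that several distinct fields in the table share the `N1` tag, so a parser like this cannot tell them apart without inspecting the values.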
Differences between datasets
Open Academic Graph integrates MAG
and AMiner
OpenAIRE research graph integrates
Crossref, Unpaywall, ORCID and MAG
AIDA integrates MAG, GRID, DBpedia,
and CSO
Differences between datasets
• Dimensions, PubMed, Semantic
Scholar, Crossref are distributed in
JSON
• WOS, MAG in TSV files (MAKG as RDF)
• AIDA, ScholarlyData, OpenCitations in
RDF
• DBLP in XML
Differences between datasets
Some datasets are available for free:
• Semantic Scholar [ODC-BY]
• Dimensions.ai (if you are a scholar)
Manageable fee
• MAG ($50 to download) [ODC-BY]
Costly
• Scopus
• Web Of Science
Not available to buy
• Google Scholar
Academia/Industry DynAmics KG
Academia/Industry DynAmics (AIDA) Knowledge Graph
• 14M papers and 8M patents annotated with research topics from the Computer
Science Ontology (CSO)
• 4M papers and 5M patents classified according to the type of the author’s
affiliations (academia, industry, or collaborative) and 66 industrial sectors from
Industrial Sectors Ontology (INDUSO)
• Released as an RDF Graph and available via SPARQL or as a dump
http://w3id.org/aida
AIDA pipeline
[Figure: AIDA pipeline. Inputs: research papers and patents. Steps: filtering documents, CSO Classifier, extraction of affiliation types and industry sectors, RDF Generator. Resources: Computer Science Ontology, INDUSO.ttl, AIDA Schema. Output: Academia/Industry DynAmics (AIDA) Knowledge Graph.]
pip install cso-classifier
Salatino, A.A., Osborne, F., Thanapalasingam, T., Motta, E.: The CSO Classifier: Ontology-Driven
Detection of Research Topics in Scholarly Articles. In: TPDL 2019: 23rd International Conference
on Theory and Practice of Digital Libraries. Springer.
CSO Classifier
Uses state-of-the-art technologies to parse documents and recognise
research concepts/topics. As input, it takes the metadata associated with a
research paper (title, abstract, keywords) and returns a selection of
research concepts drawn from the Computer Science Ontology
Salatino, Angelo A., et al. "The CSO classifier:
Ontology-driven detection of research topics in
scholarly articles." International Conference on Theory
and Practice of Digital Libraries. Springer, Cham, 2019.
Syntactic Module
• We split the text into unigrams, bigrams and trigrams
• For each n-gram we measure the Levenshtein similarity with the topics in CSO
• We select CSO topics whose similarity with an n-gram is at least 0.94
• This helps handle plurals, hyphenated topics, and American vs. British spelling, such as:
• “knowledge based systems” and “knowledge-based systems”
• “database” and “databases”
• “data visualisation” and “data visualization”
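The steps above can be sketched in a few lines of Python. This is not the CSO Classifier's code: the slide does not say how the Levenshtein similarity is normalised, so the sketch assumes a ratio of the form (len(a) + len(b) - d) / (len(a) + len(b)), where d is an edit distance in which a substitution costs 2. Under that assumption all three example pairs clear the 0.94 threshold.

```python
def edit_distance(a, b):
    """Edit distance with insert/delete cost 1 and substitution cost 2."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 2
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical strings, 0.0 for disjoint ones."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else (total - edit_distance(a, b)) / total


def ngrams(tokens, n_max=3):
    """Yield all unigrams, bigrams and trigrams as space-joined strings."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def syntactic_match(text, topics, threshold=0.94):
    """Return the topics that are similar enough to some n-gram of the text."""
    tokens = text.lower().split()
    return {
        topic
        for gram in ngrams(tokens)
        for topic in topics
        if similarity(gram, topic) >= threshold
    }


topics = {"knowledge based systems", "database", "data visualization"}
text = "We study knowledge-based systems and databases for data visualisation"
found = syntactic_match(text, topics)
print(sorted(found))
```

Comparing every n-gram with every topic is quadratic; an implementation working on a full ontology would presumably index the topics first.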
Semantic Module
• We used a Word Embedding model to capture semantics of words.
• We process the documents
• for each relevant word
• we retrieve from the model its related words
• then we check if those words are in the Computer Science Ontology.
Word Embedding model
“king” = [0.32, 0.76,…]
“queen” = [0.42, 0.76,…]
“woman” = [0.56, 0.43,…]
“man” = [0.59, 0.42,...]
king + (woman – man) = queen
It locates synonyms
(related topics) close to
each other in this vector
space: high cosine
similarity
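The analogy above can be reproduced with a toy example. The two-dimensional vectors below are made up so that the arithmetic works out exactly (the slide's vectors are truncated, so they are not reused); real embedding models use a hundred or more dimensions.

```python
import math

# Made-up 2-D vectors (axes: "royalty", "maleness"), for illustration only.
vectors = {
    "king": [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man": [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [0.0, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c):
    """Return the word closest to a + (b - c), excluding the three inputs."""
    target = [x + y - z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "woman", "man"))
```

Excluding the input words matters: in practice the vector for "king" itself is often the nearest neighbour of the computed target.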
Semantic Module
Word Embedding model
• We used titles and abstracts from 4.5M papers in Computer Science
• Pre-processed text:
• Topic replacement – “digital libraries” → “digital_libraries”
• Collocation analysis – “highest_accuracies”, “highly_cited_journals”
• Trained word embeddings model (word2vec)
method: skipgram; emb. size: 128; window size: 10; negative: 5; max iter.: 5; min-count cutoff: 10
Semantic Module
Entity Extraction
• POS tagger, and grammar-based chunk parser <JJ.*>*<NN.*>+
“digital libraries”
CSO concept identification
• Selects all CSO topics found in the top-10 similar words of the resulting n-grams
(with cosine similarity > 0.7)
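A simplified sketch of these two steps follows. It assumes the text has already been POS-tagged (the tags are supplied by hand here), that multi-word terms are stored in the embedding model with underscores, and it uses made-up vectors and a two-topic stand-in for CSO. It illustrates the idea only and is not the classifier's actual code.

```python
import math


def chunk_noun_phrases(tagged):
    """Collect maximal runs matching <JJ.*>*<NN.*>+ from (word, tag) pairs."""
    chunks, adjectives, nouns = [], [], []

    def flush():
        if nouns:
            chunks.append(" ".join(adjectives + nouns))
        adjectives.clear()
        nouns.clear()

    for word, tag in tagged:
        if tag.startswith("NN"):
            nouns.append(word)
        elif tag.startswith("JJ"):
            if nouns:  # an adjective after nouns starts a new chunk
                flush()
            adjectives.append(word)
        else:
            flush()
    flush()
    return chunks


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def cso_topics_for(chunk, vectors, cso_topics, top_k=10, min_sim=0.7):
    """Topics among the top_k neighbours of the chunk with cosine > min_sim."""
    key = chunk.replace(" ", "_")
    if key not in vectors:
        return set()
    scored = sorted(
        ((cosine(vectors[key], vec), word)
         for word, vec in vectors.items() if word != key),
        reverse=True,
    )
    neighbours = {w.replace("_", " ") for s, w in scored[:top_k] if s > min_sim}
    return neighbours & cso_topics


# Hand-tagged tokens and made-up vectors, for illustration only.
tagged = [("we", "PRP"), ("study", "VBP"), ("digital", "JJ"), ("libraries", "NNS")]
vectors = {
    "digital_libraries": [0.9, 0.1],
    "digital_library": [0.88, 0.15],
    "semantic_web": [0.1, 0.9],
    "protein_folding": [0.0, 1.0],
}
cso = {"digital library", "semantic web"}

chunks = chunk_noun_phrases(tagged)
found = set().union(*(cso_topics_for(c, vectors, cso) for c in chunks))
print(chunks, found)
```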
Semantic Module
Concept ranking
• We assign a score to each identified topic:
• Frequency – number of times it was inferred
• Diversity – number of unique text chunks from which it was inferred
Concept Selection
• Elbow method
CSO Topic score
domain ontologies 40
semantic web 40
ontology learning 40
data mining 40
heterogeneous resources 24
semantics 24
world wide web 10
network architecture 6
scholarly communication 6
ontology matching 6
… …
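The slide does not say which variant of the elbow method is used. One common heuristic, sketched here as an assumption, picks the point farthest from the straight line joining the first and last scores and keeps everything up to it. The scores are the ten rows visible in the table above.

```python
# Scores from the table above, already sorted in descending order.
ranked = [
    ("domain ontologies", 40), ("semantic web", 40), ("ontology learning", 40),
    ("data mining", 40), ("heterogeneous resources", 24), ("semantics", 24),
    ("world wide web", 10), ("network architecture", 6),
    ("scholarly communication", 6), ("ontology matching", 6),
]


def elbow_index(scores):
    """Index of the point farthest (vertically) from the straight line
    joining the first and last scores."""
    n = len(scores)
    if n < 3:
        return n - 1
    first, last = scores[0], scores[-1]

    def gap(i):
        on_line = first + (last - first) * i / (n - 1)
        return abs(scores[i] - on_line)

    return max(range(n), key=gap)


cut = elbow_index([score for _, score in ranked])
selected = [topic for topic, _ in ranked[: cut + 1]]
print(selected)
```

With these scores the cut falls after the four topics scoring 40; a different elbow criterion could place it elsewhere.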
Post Processing
Combination of output
Semantic enhancement
• We use the superTopicOf relation to enhance the output set
• E.g., if “machine learning” then also “artificial intelligence”
• Provides wider context for the analysed paper
• Enables analytics on high-level abstract topics (e.g., digital libraries)
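As a rough illustration of this enhancement step, the sketch below walks a toy child-to-parent mapping. Only the "machine learning" and "artificial intelligence" pair comes from the slide; the other entries and the `all_levels` switch are assumptions made for the example.

```python
# Toy ontology fragment: topic -> direct super topics.
# Only the first pair comes from the slide; the rest are illustrative.
SUPER_TOPICS = {
    "machine learning": {"artificial intelligence"},
    "artificial intelligence": {"computer science"},
    "neural networks": {"machine learning"},
}


def enhance(topics, all_levels=False):
    """Add direct super topics, or every ancestor when all_levels is True."""
    result = set(topics)
    frontier = set(topics)
    while frontier:
        parents = set()
        for topic in frontier:
            parents |= SUPER_TOPICS.get(topic, set())
        new = parents - result
        result |= new
        frontier = new if all_levels else set()
    return result


print(sorted(enhance({"machine learning"})))
print(sorted(enhance({"neural networks"}, all_levels=True)))
```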
Scholarly Data++
Improving Editorial Workflow and Metadata Quality at Springer Nature.
The 18th International Semantic Web Conference (ISWC 2019)
Authors: Angelo Salatino, Francesco Osborne, Aliaksandr Birukou, Enrico Motta
Affiliations: The Open University, Springer Nature
Other metadata: Citations, References, Text (Title, Abstract, Keywords)
Keywords: Scholarly data, Bibliographic metadata, Topic classification, …
Topics: scholarly data, semantic web, data mining, ontology, digital libraries, …
Affiliation Types: Academia, Industry
Industrial Sectors: Publishing
What can we do with it?
Research Flow: Understanding the Knowledge Flow between Academia and
Industry
Each research topic is represented through 4 signals:
Papers from Academia (RA)
Papers from Industry (RI)
Patents from Academia (PA)
Patents from Industry (PI)
A. Salatino, F. Osborne, E. Motta. ResearchFlow:
Understanding the Knowledge Flow between
Academia and Industry. In Knowledge Engineering and
Knowledge Management – 22nd International
Conference, EKAW 2020, Springer, 2020
Diachronic analysis of topics
• First, we normalized all signals with respect to those of the main topic,
Computer Science
• We devised two indices: RP and AI
$RP_t = \frac{R_t - P_t}{R_t + P_t}$ ; $AI_t = \frac{A_t - I_t}{A_t + I_t}$
• We performed a global analysis in 2007-18
• Topic evolution: we split the time period into 4 windows of 3 years each,
computed RP and AI for each window, and used the slope $\alpha$ of the fitted line $f(x) = \alpha x + \beta$ to assess
the topic's evolution
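A small worked example may help. The sketch below assumes that R and P aggregate papers (RA + RI) and patents (PA + PI), and that A and I aggregate academia (RA + PA) and industry (RI + PI). The slides do not spell this out, so treat it as an assumption. The counts are made up, the normalisation step is skipped, and the slope is obtained with an ordinary least-squares fit over the four windows.

```python
def index(x, y):
    """Normalised difference (x - y) / (x + y), which lies in [-1, 1]."""
    return (x - y) / (x + y) if (x + y) else 0.0


def rp_ai(ra, ri, pa, pi):
    """Return (RP, AI) for one topic in one time window."""
    r, p = ra + ri, pa + pi  # papers vs patents (assumed aggregation)
    a, i = ra + pa, ri + pi  # academia vs industry (assumed aggregation)
    return index(r, p), index(a, i)


def slope(ys):
    """Least-squares slope of ys against x = 0, 1, 2, ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Made-up (RA, RI, PA, PI) counts for four 3-year windows.
windows = [(80, 10, 2, 8), (70, 15, 3, 12), (60, 20, 4, 16), (50, 25, 5, 20)]
rp_series, ai_series = zip(*(rp_ai(*w) for w in windows))

print("RP per window:", rp_series, "slope:", slope(rp_series))
print("AI per window:", ai_series, "slope:", slope(ai_series))
```

In this toy case both slopes are negative: the topic drifts from papers towards patents and from academia towards industry.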
Diachronic analysis of topics
Distribution of topics according to RP and AI in 2007-18
Topic evolution in 2007-2018 - examples
Forecasting Topic Impact on Industry
• We created a new approach for predicting the impact of a topic on industry.
• It uses four time series: i) publications from academia, ii) publications from
industry, iii) patents from academia, and iv) patents from industry.
• We tested it on the task of predicting whether an emergent research topic will have a
significant impact on industry (> 50 patents) in the following 10 years.
• This evaluation supports the hypothesis that considering the four time series
separately leads to higher-quality predictions, and suggests that RI and RA
are good indicators for PI.
Machine Learning approach
We used:
• Logistic Regression (LR)
• Random Forest (RF)
• AdaBoost (AB)
• Convolutional Neural Network (CNN)
• Long Short-term Memory Neural Network (LSTM)
On several combinations of time-series: RA, RI, PA and PI
Forecasting Topic Impact on Industry
Analysing Conferences
Conference Dashboard
Angioni, Simone, et al. "The AIDA Dashboard: Analysing Conferences with Semantic Technologies."
Let’s get our hands dirty
AIDA35K – A similar but not-so-similar version of AIDA
Download: http://aida.kmi.open.ac.uk/aida35k/downloads/aida35k.ttl.zip
AIDA35K – Stats
• Contains 35 thousand papers in the fields of Semantic Web and Neural Networks
• 249,969 facts (triples)
• 26 different relationships
AIDA35K
Relationships from paper
• hasAuthor states the author of the paper
• hasConfName and hasConfSeries provide details
about the conference: “The 21st World Wide
Web Conference” and “webconf”
• hasCsoEnhancedTopic, topics extracted with the
CSO Classifier
• hasEntityType defines the type of document
“paper”
• hasJourName states the name of the journal
• hasReference points to all referenced papers
• hasType defines whether the paper is from
academia, industry or collaborative
• hasIndustrialSector describes, for an industry
paper, the company’s industrial sector
• hasYear states the publishing year
Additional relationships from paper with reification
• hasAffiliationDistribution describes the
affiliation of authors. The object of this
relationship is another statement: reified
object.
• This reified object then contains
hasAffiliation and hasAffiliation-weight
identifying the affiliation of the paper and
the percentage of authors belonging to
that affiliation.
To better understand reification
• Imagine three authors co-author a paper: Angelo and Francesco from The Open University, and
Dimitris from the Université Libre de Bruxelles.
• In simple RDF:
@prefix sc: <http://aida.kmi.open.ac.uk/aida35k/ontology#>.
<https://aida35k.org/p_654> sc:hasEntityType sc:paper .
<https://aida35k.org/p_654> sc:hasAuthor <https://aida35k.org/angelo_salatino> .
<https://aida35k.org/p_654> sc:hasAuthor <https://aida35k.org/francesco_osborne> .
<https://aida35k.org/p_654> sc:hasAuthor <https://aida35k.org/dimitris_sacharidis> .
<https://aida35k.org/p_654> sc:hasAffiliation "The Open University" .
<https://aida35k.org/p_654> sc:hasAffiliation "Université Libre De Bruxelles" .
<https://aida35k.org/p_654> sc:hasAffiliation-weight 0.66 .
<https://aida35k.org/p_654> sc:hasAffiliation-weight 0.33 .
Well. How do we tell which affiliation has weight 0.33?
A revised version with reification
• Imagine three authors co-author a paper: Angelo and Francesco from The Open University, and
Dimitris from the Université Libre de Bruxelles.
• With reification:
@prefix sc: <http://aida.kmi.open.ac.uk/aida35k/ontology#>.
@prefix re: <https://aida35k.org/> .
re:p_654 sc:hasEntityType sc:paper .
re:p_654 sc:hasAuthor re:angelo_salatino .
re:p_654 sc:hasAuthor re:francesco_osborne .
re:p_654 sc:hasAuthor re:dimitris_sacharidis .
re:p_654 sc:hasAffiliationDistribution re:AffiliationDistribution_p_654_open_university .
re:p_654 sc:hasAffiliationDistribution re:AffiliationDistribution_p_654_universite_libre_de_bruxelles .
re:AffiliationDistribution_p_654_open_university sc:hasAffiliation "The Open University" .
re:AffiliationDistribution_p_654_open_university sc:hasAffiliation-weight 0.66 .
re:AffiliationDistribution_p_654_universite_libre_de_bruxelles sc:hasAffiliation "Université Libre De Bruxelles" .
re:AffiliationDistribution_p_654_universite_libre_de_bruxelles sc:hasAffiliation-weight 0.33 .
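To see why the intermediate node resolves the ambiguity, here is a short sketch over the reified statements, written as Python tuples. The node identifiers are shortened (`dist_open_university`, `dist_ulb`) purely for readability; they stand in for the `re:AffiliationDistribution_…` resources above.

```python
# The reified triples from the example above, with shortened node names.
triples = [
    ("p_654", "hasAffiliationDistribution", "dist_open_university"),
    ("p_654", "hasAffiliationDistribution", "dist_ulb"),
    ("dist_open_university", "hasAffiliation", "The Open University"),
    ("dist_open_university", "hasAffiliation-weight", 0.66),
    ("dist_ulb", "hasAffiliation", "Université Libre De Bruxelles"),
    ("dist_ulb", "hasAffiliation-weight", 0.33),
]


def value(subject, predicate):
    """Return the first object found for a (subject, predicate) pair."""
    return next(o for s, p, o in triples if s == subject and p == predicate)


def affiliation_weights(paper):
    """Pair each affiliation with its weight via the intermediate node."""
    nodes = [
        o for s, p, o in triples
        if s == paper and p == "hasAffiliationDistribution"
    ]
    return {
        value(node, "hasAffiliation"): value(node, "hasAffiliation-weight")
        for node in nodes
    }


weights = affiliation_weights("p_654")
print(weights)
```

Each affiliation and its weight now hang off the same node, so the pairing that was lost in the flat version is explicit.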
Additional relationships from paper with reification
• hasCitationDistribution describes the
received citations. The reified object then
contains hasCitationYear and
hasCitationYear-weight identifying the
year and the percentage of total citations
received.
• hasCountryDistribution describes the
countries of the affiliations. Similar to
hasAffiliationDistribution
• hasGridTypeDistribution describes the
grid types of the paper. The reified object
contains hasGridType and hasGridType-
weight identifying the type and the
percentage of affiliations with such type.
Relationships from author
• hasPaper states the paper written by
the author
• hasNetworkInDistribution describes
the affiliation of authors. Similar to
hasAffiliationDistribution
• hasWorkedInDistribution describes
the countries of the affiliations. Similar
to hasCountryDistribution
How do we interact with such data?
Set up a Triple Store
GraphDB
GraphDB – Select file
GraphDB – Import (leave default values)
GraphDB – Write SPARQL query
Running SPARQL queries
• Describe
• Select papers by year
• Identify types
• Get all ‘industry’ papers and their affiliations
• Get 100 ‘academia’ papers and their affiliations
• Get papers written by Carnegie Mellon University
• Count papers written by United States researchers
• Count citations of a paper
• Count papers of a topic
• Get journals containing the word ‘semantic’
• ASK
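As one possible way to run such queries from a script, the sketch below builds a "select papers by year" query and the corresponding HTTP request. Several details are assumptions: the endpoint URL (GraphDB on its default port 7200, with a repository hypothetically named `aida35k`), and the use of `STR(?year)` because the datatype of the `hasYear` literal is not shown in the slides. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

PREFIX = "PREFIX sc: <http://aida.kmi.open.ac.uk/aida35k/ontology#>\n"


def papers_by_year(year, limit=10):
    """Build a SPARQL query selecting papers published in a given year."""
    return PREFIX + (
        "SELECT ?paper WHERE {\n"
        "  ?paper sc:hasYear ?year .\n"
        f'  FILTER(STR(?year) = "{int(year)}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(query, endpoint="http://localhost:7200/repositories/aida35k"):
    """Wrap the query in a GET request; the endpoint is a hypothetical default."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


query = papers_by_year(2015)
request = build_request(query)
print(query)
print(request.full_url)
# To execute: urllib.request.urlopen(request), once the triple store is running.
```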
References
• Simone Angioni, Angelo Salatino, Francesco Osborne, Diego Reforgiato Recupero, and
Enrico Motta. Integrating Knowledge Graphs for Analysing Academia and Industry
Dynamics. Scientific Knowledge Graph Workshop at TPDL 2020.
• Simone Angioni, Angelo Salatino, Francesco Osborne, Diego Reforgiato Recupero, and
Enrico Motta. Integrating Knowledge Graphs for Comparing the Scientific Output of
Academia and Industry. In ISWC 2019 Posters & Demonstrations and Industry Tracks @
The Semantic Web – ISWC 2019, 26-30 October 2019, Auckland, New Zealand, CEUR
Workshop, 2019.
Francesco Osborne, Angelo Salatino, Simone Angioni, Enrico Motta, Diego Reforgiato Recupero

Towards an Open Research Knowledge Graph
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
 
Trusted data and services from the GDML
Trusted data and services from the GDMLTrusted data and services from the GDML
Trusted data and services from the GDML
 
Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...
 
Introduction to data warehouse
Introduction to data warehouseIntroduction to data warehouse
Introduction to data warehouse
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 

Scientific Knowledge Graphs: an Overview

  • 1. Scientific Knowledge Graphs: An Overview Dr Angelo Salatino Knowledge Media Institute The Open University United Kingdom Université Libre de Bruxelles - 12th May 2021
  • 2. About me – Angelo Salatino Research Associate and Associate Lecturer at the Open University Research Interests: i) new technologies for classifying scientific papers according to their relevant research topics, and ii) how the research output of academia fosters innovation in the industry At the SKM3 team we produce innovative approaches leveraging large-scale data mining, semantic technologies, machine learning, and visual analytics to extract meaning from scholarly data and shed light on the research dynamic angelo.salatino@open.ac.uk https://salatino.org @angelosalatino
  • 3. This work is licensed under a Creative Commons Attribution 4.0 International License.
  • 4. Agenda • Scientific Knowledge Graphs • AIDA • Use cases of AIDA • Practical tests
  • 5. Why do we need Scientific Knowledge Graphs?
  • 6. Science of Science Picture from the cover of Science Vol 361, Issue 6408
  • 7. Science of Science • Science of Science is a multidisciplinary field which helps us to understand, in a quantitative fashion, the evolution of science. • This is possible by capitalising on the large amounts of data that scientists produce nowadays: • Research articles • Pre-prints • Grant proposals • Patents • Spoiler: SKGs come in quite handy for structuring and collecting such data
  • 10. Scholarly Data Improving Editorial Workflow and Metadata Quality at Springer Nature. Identifying the research topics that best describe the scope of a scientific publication is a crucial task for editors, in particular because the quality of these annotations determines how effectively users are able to discover the right content in online libraries. For this reason, Springer Nature, the world’s largest academic book publisher, has traditionally entrusted this task to their most expert editors. These editors manually analyse all new books, possibly including hundreds of chapters, and produce a list of the most relevant topics. Hence, this process has traditionally been very expensive, time-consuming, and confined to a few senior editors. For these reasons, back in 2016 we developed Smart Topic Miner (STM), an ontology-driven application that assists the Springer Nature editorial team in annotating the volumes of all books covering conference proceedings in Computer Science. Since then STM has been regularly used by editors in Germany, China, Brazil, India, and Japan, … Angelo Salatino Francesco Osborne Aliaksandr Birukou Enrico Motta The Open University Springer Nature The 18th International Semantic Web Conference (ISWC 2019) Affiliations Authors Citations References Conference/Journal Text: Title, Abstract Keywords Scholarly data, Bibliographic metadata, Topic classification, Topic detection, …
  • 11. Scholarly Data Improving Editorial Workflow and Metadata Quality at Springer Nature. Identifying the research topics that best describe the scope of a scientific publication is a crucial task for editors, in particular because the quality of these annotations determines how effectively users are able to discover the right content in online libraries. For this reason, Springer Nature, the world’s largest academic book publisher, has traditionally entrusted this task to their most expert editors. These editors manually analyse all new books, possibly including hundreds of chapters, and produce a list of the most relevant topics. Hence, this process has traditionally been very expensive, time-consuming, and confined to a few senior editors. For these reasons, back in 2016 we developed Smart Topic Miner (STM), an ontology-driven application that assists the Springer Nature editorial team in annotating the volumes of all books covering conference proceedings in Computer Science. Since then STM has been regularly used by editors in Germany, China, Brazil, India, and Japan, … Angelo Salatino Francesco Osborne Aliaksandr Birukou Enrico Motta The 18th International Semantic Web Conference (ISWC 2019) Authors Conference/Journal Text: Title, Abstract Keywords Scholarly data, Bibliographic metadata, Topic classification, Topic detection, … Keywords Topic Detection, Science Of Science, Topic classification, Semantic Web, … 23rd International Conference on Theory and Practice of Digital Libraries (TPDL 2019) Conference/Journal The 17th International Semantic Web Conference (ISWC 2018) Conference/Journal
  • 12. Scholarly Data [Figure: graph of scholarly entities (Researcher, Institution, Funder, Project, Community, Project A, Project B, Other literature) connected by the relations <publish>, <co-op>, <cite>, and <$$$>.] Courtesy of Andrea Mannocci, from “Big Scholarly Data and Applications”
  • 13. Scientific Knowledge Graph Scientific Knowledge Graphs (SKGs) are a way of representing scholarly knowledge in a structured, interlinked, and semantically rich manner.
  • 14. Scientific Knowledge Graph - Definition Given a set of entities E and a set of relations R, a Scientific Knowledge Graph is a directed multi-relational graph G that comprises triples (subject, predicate, object) and is a subset of the cross product: G ⊆ E × R × E. Nodes and edges have well-defined meanings.
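
The definition above can be illustrated with a minimal Python sketch. The entity and relation names are hypothetical toy values chosen for illustration (they are not taken from any real SKG), and plain Python sets stand in for E, R, and G.

```python
from itertools import product

# Hypothetical toy entities (E) and relations (R); illustrative names only.
E = {"paper_1", "angelo_salatino", "open_university"}
R = {"has_author", "has_affiliation"}

# G is a set of (subject, predicate, object) triples.
G = {
    ("paper_1", "has_author", "angelo_salatino"),
    ("angelo_salatino", "has_affiliation", "open_university"),
}


def is_valid_graph(graph, entities, relations):
    """Return True if graph is a subset of entities x relations x entities."""
    # Building the full cross product is only feasible for toy-sized sets.
    return graph <= set(product(entities, relations, entities))


def objects(graph, subject, predicate):
    """Return every object reachable from `subject` via `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


print(is_valid_graph(G, E, R))
print(objects(G, "paper_1", "has_author"))
```

A triple whose predicate is not in R, or whose subject or object is not in E, makes the subset check fail, which is one simple reading of "nodes and edges have well-defined meanings".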
  • 15. Representation through Resource Description Framework RDF is a standard for data interchange that is used for representing highly interconnected data. Each RDF statement is a three-part structure consisting of resources, where every resource is identified by a URI. Representing data in RDF allows information to be easily identified, disambiguated and interconnected by AI systems. Previous graph in RDF (N-Triples format):
      <https://skg.org/paper_635219> <https://skg.org/sc#title> "Detection, Analysis, …"@en .
      <https://skg.org/paper_635219> <https://skg.org/sc#abstract> "Analysing rese …"@en .
      <https://skg.org/paper_635219> <https://skg.org/sc#has_keyword> "Scholarly Communication"@en .
      <https://skg.org/paper_635219> <https://skg.org/sc#type> <https://skg.org/sc#paper> .
      <https://skg.org/paper_635219> <https://skg.org/sc#has_author> <https://skg.org/angelo_salatino> .
      <https://skg.org/paper_635219> <https://skg.org/sc#has_author> <https://skg.org/francesco_osborne> .
      <https://skg.org/paper_635219> <https://skg.org/sc#has_author> <https://skg.org/andrea_mannocci> .
      <https://skg.org/angelo_salatino> <https://skg.org/sc#has_affiliation> <https://skg.org/open_university> .
      <https://skg.org/francesco_osborne> <https://skg.org/sc#has_affiliation> <https://skg.org/open_university> .
      <https://skg.org/andrea_mannocci> <https://skg.org/sc#has_affiliation> <https://skg.org/italian_research_council> .
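
As a rough illustration of how such statements can be consumed, the sketch below reads a few of the N-Triples lines from the slide using only the Python standard library. The regular expression is a deliberately simplified assumption: it handles URI subjects and predicates, and objects that are either URIs or plain language-tagged literals, and it is not a full N-Triples parser. The helper name `parse_ntriples` is hypothetical.

```python
import re

# Simplified pattern for one statement: <s> <p> (<o> | "literal"@lang) .
# Assumption: no blank nodes, datatypes, or escaped quotes appear in the data.
TRIPLE = re.compile(
    r'^<(?P<s>[^>]+)>\s+<(?P<p>[^>]+)>\s+'
    r'(?:<(?P<o_uri>[^>]+)>|"(?P<o_lit>[^"]*)"(?:@(?P<lang>[A-Za-z-]+))?)\s*\.$'
)


def parse_ntriples(text):
    """Parse simple N-Triples text into a list of (s, p, o) string tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = TRIPLE.match(line)
        if m is None:
            raise ValueError(f"unsupported statement: {line}")
        uri = m.group("o_uri")
        obj = uri if uri is not None else m.group("o_lit")
        triples.append((m.group("s"), m.group("p"), obj))
    return triples


DATA = """
<https://skg.org/paper_635219> <https://skg.org/sc#has_keyword> "Scholarly Communication"@en .
<https://skg.org/paper_635219> <https://skg.org/sc#has_author> <https://skg.org/angelo_salatino> .
<https://skg.org/paper_635219> <https://skg.org/sc#has_author> <https://skg.org/andrea_mannocci> .
<https://skg.org/angelo_salatino> <https://skg.org/sc#has_affiliation> <https://skg.org/open_university> .
<https://skg.org/andrea_mannocci> <https://skg.org/sc#has_affiliation> <https://skg.org/italian_research_council> .
"""

SC = "https://skg.org/sc#"
triples = parse_ntriples(DATA)

# Follow has_author, then has_affiliation, to find the paper's institutions.
authors = [o for s, p, o in triples if p == SC + "has_author"]
affiliations = {
    o for s, p, o in triples if p == SC + "has_affiliation" and s in authors
}
print(sorted(affiliations))
```

For real datasets a dedicated RDF library is the safer choice; the regular expression here only serves to show that each statement decomposes into subject, predicate, and object.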
  • 16. However … Not all SKGs are published in RDF. E.g. sample from Dimensions { "format":3, "status":"active", "id":"pub.1009237776", "publication_type":"article", "doi":"10.1093/bybil/49.1.221", "version_of_record":"https://doi.org/10.1093/bybil/49.1.221", "pmid":"2650905", "pmcid":"5381240", "title":"Does Penicillin Kill Bacteria?", "year":2017, "concepts":{ "structure":0.3, "fire":0.34, "case study":0.01, "serviceability":0.6, "damage":0.12, "residual mechanical properties":0.67 }, "publication_date":"2017-12-25", "volume":"32", "issue":"4", "pages":"330-333", …. "journal":{ "id":"jour.1138253", "title":"The Journal of Clinical Evidence", "issn":"0068-2691", "eissn":"2044-9437" }, "publisher":{ "id":"pblshr.1001577", "name":"Radiological Society of North America (RSNA)" }, "journal_lists":[ "ERA 2015", "Norwegian register level 2", "PubMed" ], "clinical_trials":[ "NCT00605345" ], "open_access_categories":[ "closed" ], "author_affiliations":[{ "first_name":"Ian", "last_name":"Bobbington", "grid_ids":["grid.5335.0", "grid.1001.0"]}, ….
  • 17. Big Scholarly Datasets • Web of Science • Scopus • Google Scholar • Microsoft Academic Graph • MA-KG, ma-graph.org • PubMed • Dimensions • Semantic Scholar • DBLP • Open Academic Graph • ScholarlyData • PID Graph • Open Research Knowledge Graph • OpenCitations • OpenAIRE research graph • Crossref • Academy/Industry Dynamics KG (AIDA) Disclaimer: this is far from being exhaustive
  • 18. Differences between datasets All these datasets are different from each other: • size • scope • quality • mistakes, author disambiguation • index vs. scraping • comprehensiveness • integration with other sources • format • access to data: license “The comparison considers all scientific documents from the period 2008–2017 covered by these data sources.” Picture from Martijn Visser, Nees Jan van Eck, and Ludo Waltman. "Large-scale comparison of bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic." (2021).
  • 19. Differences between datasets: scope • MAG, Dimensions, Scopus and WoS cover all areas of science • DBLP covers Computer Science • PubMed covers the field of Medicine • Semantic Scholar covers Computer Science and Medicine
  • 20. Differences between datasets: quality • Data need cleaning • Removing duplicates (the same document can appear in multiple places on the web) • Disambiguating authors • Disambiguating affiliations • Disambiguating references • WoS > Scopus > MAG
  • 21. Example of paper formats Automatic tools such as GROBID keep getting better at extracting metadata, but they still make errors.
  • 22. Differences between datasets: index vs. scraping • Web of Science, Scopus and PubMed index papers: papers are added to the collection in a controlled way (metadata are curated) • MAG, Dimensions and Google Scholar scrape the web (journal websites) and PDFs • Scraping also leads to other quality issues, such as identifying the correct metadata
  • 23. Differences between datasets: comprehensiveness (x = ok, p = partial information)
    Sources: MAG, Dimensions, Crossref, Scopus, WOS, Semantic Scholar, OpenAIRE RG, OpenAG, PubMed
    title: x x x x x x x x x
    id: x x x x x x x x x
    abstract: x p x x x x x x x
    year: x x x x x x x x x
    references: x x x x x x x x
    authors: x x x x x x x x x
    doi: x x x x x x x x x
    topics: x x x x x
    citationcount: x x x x x x x
    conferences: x x x x p x x
    journal: x x x x x p x x x
    authors keywords: x x
  • 24. Differences between datasets: Scopus tags
    Description | New Scopus RIS tag
    Abbreviated source title | J2
    Abstract | AB
    Affiliations | AD
    Article number | C7
    Article title | TI
    Authors | AU
    Chemical name and CAS registry number | N1
    Cited by count | N1
    Conference Code | N1
    CODEN | N1
    Conference name | T2
    Correspondence name | N1
    Conference date | Y2
    DOI | DO
    Editor | A2
    End tag | ER
    Export date | N1
    First page | SP
    Funding Details | N1
    ISSN/ISBN/EISSN | SN
    Issue | IS
    Keywords | KW
    Language | LA
    Last page | EP
    Conference Location | CY
    Manufacturers | N1
    PMID/PMCID | C2
    Proceedings title | C3
    Publication year | PY
    Publisher | PB
    References | N1
    Scopus database | DB
    Scopus URL | UR
    Second article title | ST
    Sequence database accession number | N1
    Source title | T2
    Source type | TY
    Document type | M3
    Conference sponsors | A4
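To give a feel for what working with such tagged exports involves, here is a minimal sketch of a RIS reader. It assumes the common RIS line layout (a two-character tag, two spaces, a hyphen, a space, then the value) and that every record is closed by the ER tag; the sample record is invented for illustration and is not a real Scopus export.

```python
import re
from collections import defaultdict

# Assumed RIS line layout: two-character tag, two spaces, hyphen, optional space, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

# Invented record, for illustration only.
SAMPLE = """TY  - CONF
TI  - A hypothetical paper title
AU  - Doe, J.
AU  - Roe, R.
PY  - 2019
KW  - scholarly data
ER  - 
"""


def parse_ris(text):
    """Parse RIS text into a list of records, each mapping a tag to its values."""
    records, current = [], defaultdict(list)
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank or malformed lines
        tag, value = match.groups()
        if tag == "ER":  # end-of-record tag closes the current record
            records.append(dict(current))
            current = defaultdict(list)
        else:
            current[tag].append(value)  # tags such as AU or N1 can repeat
    return records


records = parse_ris(SAMPLE)
print(records[0]["AU"])
```

Values are kept as lists because several descriptions in the table share one tag (N1 in particular), so a single tag can legitimately occur many times per record.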
  • 25. Differences between datasets: integration with other sources • Open Academic Graph integrates MAG and AMiner • OpenAIRE research graph integrates Crossref, Unpaywall, ORCID and MAG • AIDA integrates MAG, GRID, DBpedia and CSO
  • 26. Differences between datasets: format • Dimensions, PubMed, Semantic Scholar and Crossref are distributed in JSON • WoS and MAG in TSV files (MAKG as RDF) • AIDA, ScholarlyData and OpenCitations in RDF • DBLP in XML
  • 27. Differences between datasets: access to data and license • Available for free: Semantic Scholar [ODC-BY], Dimensions.ai (if you are a scholar) • Manageable fee: MAG ($50 for downloading it) [ODC-BY] • Costly: Scopus, Web of Science • Not available to buy: Google Scholar
  • 29. Academia/Industry DynAmics (AIDA) Knowledge Graph • 14M papers and 8M patents annotated with research topics from the Computer Science Ontology (CSO) • 4M papers and 5M patents classified according to the type of the author’s affiliations (academia, industry, or collaborative) and 66 industrial sectors from Industrial Sectors Ontology (INDUSO) • Released as an RDF Graph and available via SPARQL or as a dump http://w3id.org/aida
  • 30. AIDA pipeline [Figure: pipeline diagram. Inputs: Research Papers, Patents. Steps: Filtering documents; CSO Classifier; Extraction of affiliation types and industry sectors; RDF Generator. Resources: Computer Science Ontology, INDUSO.ttl, AIDA Schema. Output: Academia/Industry DynAmics (AIDA) Knowledge Graph.]
  • 32. AIDA pipeline: CSO Classifier. pip install cso-classifier. Salatino, A.A., Osborne, F., Thanapalasingam, T., Motta, E.: The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles. In: TPDL 2019: 23rd International Conference on Theory and Practice of Digital Libraries. Springer.
  • 33. CSO Classifier Uses state-of-the-art technologies to parse documents and recognise research concepts/topics. As input, it takes the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the Computer Science Ontology Salatino, Angelo A., et al. "The CSO classifier: Ontology-driven detection of research topics in scholarly articles." International Conference on Theory and Practice of Digital Libraries. Springer, Cham, 2019.
  • 34. Syntactic Module • We split the text into unigrams, bigrams and trigrams • For each n-gram we measure the Levenshtein similarity with the topics in CSO • We select the CSO topics whose similarity with an n-gram is at least 0.94 • This helps handle plurals, hyphenated topics, and American vs. British spelling, such as: • “knowledge based systems” and “knowledge-based systems” • “database” and “databases” • “data visualisation” and “data visualization”
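The matching step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CSO Classifier's own code: the similarity is taken to be a Levenshtein ratio, (len(a) + len(b) - d) / (len(a) + len(b)), where the distance d charges 1 for an insertion or deletion and 2 for a substitution; the topic set and the whitespace tokenisation are simplified stand-ins.

```python
def levenshtein_ratio(a, b):
    """Similarity in [0, 1] based on an edit distance where substitutions cost 2."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else 2)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitute))
        prev = curr
    total = len(a) + len(b)
    return (total - prev[-1]) / total


def ngrams(text, max_n=3):
    """Unigrams, bigrams and trigrams from a whitespace tokenisation."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def syntactic_match(text, topics, threshold=0.94):
    """Return the topics whose similarity with some n-gram reaches the threshold."""
    return {
        topic
        for gram in ngrams(text)
        for topic in topics
        if levenshtein_ratio(gram, topic) >= threshold
    }


# Toy topic list standing in for the ontology.
TOPICS = {"knowledge based systems", "database", "data visualization"}
TEXT = "We store knowledge-based systems output in databases for data visualisation"
print(sorted(syntactic_match(TEXT, TOPICS)))
```

With this ratio, “databases” vs. “database” scores 16/17 ≈ 0.941, which just clears the 0.94 threshold, while an unrelated pair such as “data” vs. “database” stays far below it.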
  • 35. Semantic Module • We used a Word Embedding model to capture semantics of words. • We process the documents • for each relevant word • we retrieve from the model its related words • then we check if those words are in the Computer Science Ontology.
  • 36. Word Embedding model “king” = [0.32, 0.76,…] “queen” = [0.42, 0.76,…] “woman” = [0.56, 0.43,…] “man” = [0.59, 0.42,...] king + (woman – man) = queen It locates synonyms (related topics) close to each other in this vector space: high cosine similarity
  • 37. Semantic Module Word Embedding model • We used titles and abstracts from 4.5M papers in Computer Science • Pre-processed text: • Topic replacement – “digital libraries” → “digital_libraries” • Collocation analysis – “highest_accuracies”, “highly_cited_journals” • Trained word embeddings model (word2vec) with parameters: method: skipgram; embedding size: 128; window size: 10; negative: 5; max iterations: 5; min-count cutoff: 10
  • 38. Semantic Module Entity Extraction • POS tagger, and grammar-based chunk parser <JJ.*>*<NN.*>+ “digital libraries” CSO concept identification • Selects all CSO topics found in the top-10 similar words of the resulting n-grams (with cosine similarity > 0.7)
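The concept identification step can be illustrated with a toy embedding table. The sketch below is a simplification under loud assumptions: the two-dimensional vectors and the topic set are invented, and each chunk is assumed to already be a single vocabulary token (as after topic replacement), whereas a real model has 128 dimensions and a far larger vocabulary.

```python
import math

# Invented 2-d vectors standing in for a trained word2vec model.
EMBEDDINGS = {
    "digital_libraries": (1.0, 0.0),
    "information_retrieval": (0.9, 0.1),
    "metadata": (0.8, 0.6),
    "bibliometrics": (0.6, 0.8),
    "protein_folding": (0.0, 1.0),
}
# Toy stand-in for the set of topics in the ontology.
CSO_TOPICS = {"information_retrieval", "bibliometrics", "protein_folding"}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, embeddings, top_n=10):
    """The top_n words closest to `word`, as (word, similarity) pairs."""
    target = embeddings[word]
    scored = [(w, cosine(target, vec)) for w, vec in embeddings.items() if w != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


def infer_topics(chunk, embeddings, topics, top_n=10, min_sim=0.7):
    """Ontology topics among the top_n neighbours with similarity above min_sim."""
    if chunk not in embeddings:
        return set()
    return {
        w for w, sim in most_similar(chunk, embeddings, top_n)
        if sim > min_sim and w in topics
    }


print(infer_topics("digital_libraries", EMBEDDINGS, CSO_TOPICS))
```

Here “metadata” is similar enough but is not a topic in the toy ontology, and “bibliometrics” is a topic but falls below the 0.7 cut-off, so only “information_retrieval” is returned.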
  • 39. Semantic Module Concept ranking • We assign a score to each identified topic: • Frequency – number of times it was inferred • Diversity – number of unique text chunks from which it was inferred Concept Selection • Elbow method Example (CSO topic: score): domain ontologies: 40; semantic web: 40; ontology learning: 40; data mining: 40; heterogeneous resources: 24; semantics: 24; world wide web: 10; network architecture: 6; scholarly communication: 6; ontology matching: 6; …
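A rough sketch of ranking and selection follows. Two things here are assumptions rather than the slide's method: the score is taken to be frequency × diversity (the slide does not say how the two are combined), and the elbow is approximated by cutting at the largest drop between consecutive ranked scores, which is a simple stand-in for a proper elbow/knee detection.

```python
from collections import defaultdict


def score_topics(inferences):
    """Score topics from (topic, source_chunk) pairs.

    Assumption: score = frequency * diversity.
    """
    frequency = defaultdict(int)
    chunks = defaultdict(set)
    for topic, chunk in inferences:
        frequency[topic] += 1
        chunks[topic].add(chunk)
    return {topic: frequency[topic] * len(chunks[topic]) for topic in frequency}


def select_topics(scores):
    """Keep the topics ranked before the largest drop in score (elbow stand-in)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return [topic for topic, _ in ranked]
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = drops.index(max(drops)) + 1
    return [topic for topic, _ in ranked[:cut]]


# Scores as listed on the slide.
SLIDE_SCORES = {
    "domain ontologies": 40, "semantic web": 40, "ontology learning": 40,
    "data mining": 40, "heterogeneous resources": 24, "semantics": 24,
    "world wide web": 10, "network architecture": 6,
    "scholarly communication": 6, "ontology matching": 6,
}
print(select_topics(SLIDE_SCORES))
```

On the slide's scores the largest drop is between 40 and 24, so this stand-in keeps the four topics scored 40; a different elbow criterion could reasonably cut elsewhere, for instance after the topics scored 24.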
  • 40. Post Processing Combination of output Semantic enhancement • We use the superTopicOf relationship to enhance the output set • E.g., if “machine learning” then also “artificial intelligence” • Provides wider context for the analysed paper • Enables analytics on high-level abstract topics (e.g., digital libraries)
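The semantic enhancement can be sketched as a walk up a topic hierarchy. The hierarchy fragment below is a toy, hypothetical stand-in for the ontology's superTopicOf links (stored here as child → parents), and the choice between adding only direct super topics or all ancestors is an assumption exposed as a parameter.

```python
# Toy fragment of a topic hierarchy, stored as child -> parents (hypothetical).
PARENTS = {
    "machine learning": {"artificial intelligence"},
    "artificial intelligence": {"computer science"},
}


def enhance(topics, parents=PARENTS, all_ancestors=False):
    """Add super topics to a set of topics.

    With all_ancestors=False only direct super topics are added;
    with all_ancestors=True the hierarchy is followed up to the root.
    """
    enhanced = set(topics)
    frontier = set(topics)
    while frontier:
        found = set()
        for topic in frontier:
            found |= parents.get(topic, set())
        new = found - enhanced
        enhanced |= new
        frontier = new if all_ancestors else set()
    return enhanced


print(sorted(enhance({"machine learning"})))
print(sorted(enhance({"machine learning"}, all_ancestors=True)))
```

Adding all ancestors gives the widest context but also makes very generic topics appear on almost every paper, so which variant is preferable depends on the analysis being run.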
  • 43. Scholarly Data++ [Figure: the paper “Improving Editorial Workflow and Metadata Quality at Springer Nature” (Angelo Salatino, Francesco Osborne, Aliaksandr Birukou, Enrico Motta; The Open University, Springer Nature; The 18th International Semantic Web Conference, ISWC 2019), shown with its metadata. Labels: Affiliations, Authors, Citations, References, Conference/Journal, Text (Title, Abstract, Keywords). Keywords: Scholarly data, Bibliographic metadata, Topic classification. Topics: scholarly data, semantic web, data mining, ontology, digital libraries, … Affiliation Types: Academia, Industry. Industrial Sectors: Publishing.]
  • 44. What can we do with it?
  • 45. Research Flow: Understanding the Knowledge Flow between Academia and Industry Each research topic is represented through 4 signals: Papers from Academia (RA) Papers from Industry (RI) Patents from Academia (PA) Patents from Industry (PI) A. Salatino, F. Osborne, E. Motta. ResearchFlow: Understanding the Knowledge Flow between Academia and Industry. In Knowledge Engineering and Knowledge Management – 22nd International Conference, EKAW 2020, Springer, 2020
  • 46. Diachronic analysis of topics • First, we normalized all signals with respect to those of the main topic, Computer Science • We devised two indices, RP and AI: $RP_t = \frac{R_t - P_t}{R_t + P_t}$; $AI_t = \frac{A_t - I_t}{A_t + I_t}$ • We performed a global analysis over 2007-18 • Topic evolution: we split the time period into 4 windows of 3 years each, computed RP and AI for each window, and used the slope $\alpha$ of the line $f(x) = \alpha \cdot x + \beta$ to assess the evolution
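The two indices and the slope can be worked through numerically. The sketch assumes that R, P, A and I are sums of the four signals (R = RA + RI, P = PA + PI, A = RA + PA, I = RI + PI), skips the normalisation against Computer Science, and fits the line by ordinary least squares over window positions 0..3; the input numbers are invented.

```python
def rp_ai(ra, ri, pa, pi):
    """Return (RP, AI) for one topic from its four signals.

    Assumption: R = RA + RI, P = PA + PI, A = RA + PA, I = RI + PI.
    """
    total = ra + ri + pa + pi
    if total == 0:
        return 0.0, 0.0
    research, patents = ra + ri, pa + pi
    academia, industry = ra + pa, ri + pi
    return (research - patents) / total, (academia - industry) / total


def slope(values):
    """Least-squares slope of values against their positions 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Invented signals for one topic: mostly papers, mostly from academia.
rp, ai = rp_ai(ra=30, ri=10, pa=5, pi=15)
print(round(rp, 3), round(ai, 3))

# Invented AI values over four 3-year windows.
print(round(slope([0.1, 0.2, 0.3, 0.4]), 3))
```

Both indices range from -1 to 1: RP near 1 means the topic appears mostly in papers and near -1 mostly in patents, while AI near 1 means mostly academia and near -1 mostly industry. A positive slope of AI across the windows indicates a shift towards academia over time.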
  • 47. Diachronic analysis of topics Distribution of topics according to RP and AI in 2007-18
  • 48. Topic evolution in 2007-2018 - examples
  • 49. Forecasting Topic Impact on Industry • We created a new approach for predicting the impact of a topic on industry. • It uses four time series: i) publications from academia, ii) publications from industry, iii) patents from academia, and iv) patents from industry. • We tested it on the task of predicting whether an emergent research topic will have a significant impact on industry (> 50 patents) in the following 10 years. • This evaluation substantiates the hypothesis that considering the four time series separately leads to higher-quality predictions, and suggests that RI and RA are good indicators for PI.
  • 50. Machine Learning approach We used: • Logistic Regression (LR) • Random Forest (RF) • AdaBoost (AB) • Convolutional Neural Network (CNN) • Long Short-term Memory Neural Network (LSTM) on several combinations of time series: RA, RI, PA and PI
  • 53. Conference Dashboard Angioni, Simone, et al. "The AIDA Dashboard: Analysing Conferences with Semantic Technologies."
  • 62. Let’s get our hands dirty
  • 63. AIDA35K – A similar but not-so-similar version of AIDA Download: http://aida.kmi.open.ac.uk/aida35k/downloads/aida35k.ttl.zip
  • 64. AIDA35K – Stats • Contains 35 thousand papers in the field of Semantic Web and Neural Networks • 249,969 facts (triples) • 26 different relationships Download: http://aida.kmi.open.ac.uk/aida35k/downloads/aida35k.ttl.zip
  • 66. Relationships from paper • hasAuthor states the author of the paper • hasConfName and hasConfSeries provide details about the conference: “The 21st World Wide Web Conference” and “webconf” • hasCsoEnhancedTopic gives the topics extracted with the CSO Classifier • hasEntityType defines the type of document: “paper” • hasJourName states the name of the journal • hasReference points to all referenced papers • hasType defines whether the paper is from academia, industry or collaborative • hasIndustrialSector describes the industrial sector of the company, if the paper is industrial • hasYear states the publishing year
  • 67. Additional relationships from paper with reification • hasAffiliationDistribution describes the affiliation of authors. The object of this relationship is another statement: reified object. • This reified object then contains hasAffiliation and hasAffiliation-weight identifying the affiliation of the paper and the percentage of authors belonging to that affiliation.
  • 68. To better understand reification • Imagine a paper co-authored by three authors: Angelo and Francesco from The Open University, and Dimitris from the Université Libre De Bruxelles. • In simple RDF:
    @prefix sc: <http://aida.kmi.open.ac.uk/aida35k/ontology#> .
    <https://aida35k.org/p_654> sc:hasEntityType sc:paper .
    <https://aida35k.org/p_654> sc:hasAuthor <https://aida35k.org/angelo_salatino> .
    <https://aida35k.org/p_654> sc:hasAuthor <https://aida35k.org/francesco_osborne> .
    <https://aida35k.org/p_654> sc:hasAuthor <https://aida35k.org/dimitris_sacharidis> .
    <https://aida35k.org/p_654> sc:hasAffiliation "The Open University" .
    <https://aida35k.org/p_654> sc:hasAffiliation "Université Libre De Bruxelles" .
    <https://aida35k.org/p_654> sc:hasAffiliation-weight 0.66 .
    <https://aida35k.org/p_654> sc:hasAffiliation-weight 0.33 .
    Well. How do we tell which affiliation has weight 0.33?
  • 69. A revised version with reification • The same paper: Angelo and Francesco from The Open University, and Dimitris from the Université Libre De Bruxelles. • With reification:
    @prefix sc: <http://aida.kmi.open.ac.uk/aida35k/ontology#> .
    @prefix re: <https://aida35k.org/> .
    re:p_654 sc:hasEntityType sc:paper .
    re:p_654 sc:hasAuthor re:angelo_salatino .
    re:p_654 sc:hasAuthor re:francesco_osborne .
    re:p_654 sc:hasAuthor re:dimitris_sacharidis .
    re:p_654 sc:hasAffiliationDistribution re:AffiliationDistribution_p_654_open_university .
    re:p_654 sc:hasAffiliationDistribution re:AffiliationDistribution_p_654_universite_libre_de_bruxelles .
    re:AffiliationDistribution_p_654_open_university sc:hasAffiliation "The Open University" .
    re:AffiliationDistribution_p_654_open_university sc:hasAffiliation-weight 0.66 .
    re:AffiliationDistribution_p_654_universite_libre_de_bruxelles sc:hasAffiliation "Université Libre De Bruxelles" .
    re:AffiliationDistribution_p_654_universite_libre_de_bruxelles sc:hasAffiliation-weight 0.33 .
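To see why the intermediate node resolves the ambiguity, the reified triples can be held as plain Python tuples and read back as (affiliation, weight) pairs. This is only an in-memory illustration with prefixed names kept as strings; it does not parse Turtle, and the helper name is hypothetical.

```python
PAPER = "re:p_654"
OU_NODE = "re:AffiliationDistribution_p_654_open_university"
ULB_NODE = "re:AffiliationDistribution_p_654_universite_libre_de_bruxelles"

# The reified statements from the slide, as (subject, predicate, object) tuples.
TRIPLES = [
    (PAPER, "sc:hasAffiliationDistribution", OU_NODE),
    (PAPER, "sc:hasAffiliationDistribution", ULB_NODE),
    (OU_NODE, "sc:hasAffiliation", "The Open University"),
    (OU_NODE, "sc:hasAffiliation-weight", 0.66),
    (ULB_NODE, "sc:hasAffiliation", "Université Libre De Bruxelles"),
    (ULB_NODE, "sc:hasAffiliation-weight", 0.33),
]


def affiliation_weights(triples, paper):
    """Map each affiliation of `paper` to its weight via the reified nodes."""
    nodes = [
        o for s, p, o in triples
        if s == paper and p == "sc:hasAffiliationDistribution"
    ]
    weights = {}
    for node in nodes:
        # Each reified node groups exactly one affiliation with its weight.
        props = {p: o for s, p, o in triples if s == node}
        weights[props["sc:hasAffiliation"]] = props["sc:hasAffiliation-weight"]
    return weights


print(affiliation_weights(TRIPLES, PAPER))
```

In the flat version all four values hang directly off the paper, so no query can pair a name with its weight; here the pairing is carried by the shared subject of each reified node.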
  • 70. Additional relationships from paper with reification • hasCitationDistribution describes the received citations. The reified object then contains hasCitationYear and hasCitationYear-weight identifying the year and the percentage of total citations received. • hasCountryDistribution describes the countries of the affiliations. Similar to hasAffiliationDistribution • hasGridTypeDistribution describes the grid types of the paper. The reified object contains hasGridType and hasGridType-weight identifying the type and the percentage of affiliations with such type.
  • 71. Relationships from author • hasPaper states the paper written by the author • hasNetworkInDistribution describes the affiliation of authors. Similar to hasAffiliationDistribution • hasWorkedInDistribution describes the countries of the affiliations. Similar to hasCountryDistribution
  • 72. How do we interact with such data?
  • 73. Set up a Triple Store
  • 77. GraphDB – Import (leave default values)
  • 78. GraphDB – Write SPARQL query
  • 79. Running SPARQL queries • Describe • Select papers by year • Identify types
  • 80. Running SPARQL queries • Get all ‘industry’ papers and their affiliations • Get 100 ‘academia’ papers and their affiliations
  • 81. Running SPARQL queries • Get papers written by Carnegie Mellon University • Count papers written by United States researchers
  • 82. Running SPARQL queries • Count citation of a paper • Count papers of a topic
  • 83. Running SPARQL queries • Get Journals containing the word ‘semantic’ • ASK
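The query slides can be made concrete with a small Python helper that builds one of the listed queries (“select papers by year”) and sends it to a SPARQL endpoint using only the standard library. Several things are assumptions: the namespace is the one shown on the reification slides, the year is compared through STR() because the literal's datatype is not stated, and the endpoint URL in the usage comment is a hypothetical local GraphDB repository.

```python
import json
import urllib.parse
import urllib.request

# Namespace as shown on the reification slides.
SC = "http://aida.kmi.open.ac.uk/aida35k/ontology#"


def papers_by_year_query(year, limit=100):
    """Build a SELECT query for the papers published in a given year."""
    return (
        f"PREFIX sc: <{SC}>\n"
        "SELECT ?paper WHERE {\n"
        "  ?paper sc:hasYear ?year .\n"
        # STR() avoids assuming a datatype for the year literal.
        f'  FILTER(STR(?year) = "{int(year)}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def run_select(endpoint, query):
    """POST a SELECT query (SPARQL protocol) and return the result bindings."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


query = papers_by_year_query(2015)
print(query)

# Usage, assuming a local repository (the repository id is hypothetical):
# rows = run_select("http://localhost:7200/repositories/aida35k", query)
# for row in rows:
#     print(row["paper"]["value"])
```

The same two functions cover the other listed queries by swapping the query text, for example replacing the year pattern with sc:hasType or sc:hasJourName; the exact form of the values stored for those properties would need to be checked against the data first.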
  • 84. References • Simone Angioni, Angelo Salatino, Francesco Osborne, Diego Reforgiato Recupero, and Enrico Motta. Integrating Knowledge Graphs for Analysing Academia and Industry Dynamics. Scientific Knowledge Graph Workshop at TPDL 2020. • Simone Angioni, Angelo Salatino, Francesco Osborne, Diego Reforgiato Recupero, and Enrico Motta. Integrating Knowledge Graphs for Comparing the Scientific Output of Academia and Industry. In ISWC 2019 Posters & Demonstrations and Industry Tracks @ The Semantic Web – ISWC 2019, 26-30 October 2019, Auckland, New Zealand, CEUR Workshop, 2019. Francesco Osborne, Angelo Salatino, Simone Angioni, Enrico Motta, Diego Ref. Recupero