[Unclear] words are denoted in brackets
Webinar: Tracking Research Data Footprints via
Integration with Research Graph
1 March 2018
Video & slides available from ANDS website
START OF TRANSCRIPT
Facilitator: Good afternoon everyone, thanks for coming to the webinar today. We
have a talk today on the topic of tracking the footprint of research data
across infrastructures, using the Research Graph API. The speakers
today are Doctor Ben Evans from NCI, Associate Director of NCI, and
Doctor Jingbo Wang who's a Collection Manager in NCI. So with that
introduction, I'll actually hand over the talk to Ben for starting the talk.
Ben Evans: So we're going to be talking about work that's going on to help track
research data and how it's used in a broader setting. I should mention,
NCI's got a lot of partners as a part of this that have been backing and
worked with us in this, including from NCRIS and Bureau of
Meteorology, Geoscience Australia, CSIRO, the ANU and a host of
other partners and collaborators, including ANDS in particular, for this
work.
So some of the open questions, motivating questions, beyond just
getting data management in place is - so say you publish data and
datasets, is how is the research community actually connecting with
that data? After you've put it into a public arena they could be
connecting with it in various ways and making [use], so how do you
track that? Also, how do you track the impact of that investment of that
research data for other derived products downstream? So that's a
challenging question that we can't answer fully [with inside] of a single
centre; you're really into an international world. That motivated us a lot
to be working on this particular project, which is part of the solution.
So I should say that the standing of this work and this piece of
infrastructure that we'll be going through on Research Graph started
with a fairly small partnership. But now it's grown quite a bit and RDA,
Research Data Alliance, have picked it up as this Registry
Interoperability Working Group. It's got a number of players. You can
see some of the players who've been strongly supporting this work
over a period of time listed there and you can follow that link on RD
Alliance website to track this. But, furthermore, now really through
Amir's good work and others, the European Commission have picked
this up and said, yes, this needs to now be pushed into an ICT
specification. So all that is to say that this work is now on a pretty
strong pathway and well worth paying attention to now as it goes
forward.
So there's four types of what we call nodes in this graph network when
you're publishing data and using data. So one is the researcher, one's
the dataset, one's the publication, one's grants. There could be other
nodes as well, but the status of these whole graphs at the moment is
basically built up of those fundamental areas. When we get down to it
inside of the tool, you can see the attributes through that graphic on
the right-hand side. Researchers are always in green and datasets are in
orange, publication is blue and grants in yellow. You can see some of
the attributes that are listed there and we'll talk about that.
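
The four node types described above can be sketched as a small data structure. This is only an illustration, not the Research Graph schema itself: the class names, the `key` field and the `relate` and `neighbours` helpers are assumptions made for the example. The colours are the ones mentioned in the talk.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    RESEARCHER = "researcher"
    DATASET = "dataset"
    PUBLICATION = "publication"
    GRANT = "grant"


# Colour coding as described for the visualisation in the talk.
NODE_COLOURS = {
    NodeType.RESEARCHER: "green",
    NodeType.DATASET: "orange",
    NodeType.PUBLICATION: "blue",
    NodeType.GRANT: "yellow",
}


@dataclass
class Node:
    key: str                       # hypothetical unique identifier
    node_type: NodeType
    attributes: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)   # key -> Node
    edges: set = field(default_factory=set)     # undirected: frozenset({a, b})

    def add_node(self, node: Node) -> None:
        self.nodes[node.key] = node

    def relate(self, a: str, b: str) -> None:
        # Only connect nodes that are already in the graph.
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both nodes must exist before they are related")
        self.edges.add(frozenset((a, b)))

    def neighbours(self, key: str) -> list:
        # Collect the other end of every edge that touches `key`.
        found = []
        for edge in self.edges:
            if key in edge:
                found.extend(k for k in edge if k != key)
        return sorted(found)


# Placeholder data, not taken from the NCI catalogue.
graph = Graph()
graph.add_node(Node("researcher-1", NodeType.RESEARCHER, {"name": "Example Researcher"}))
graph.add_node(Node("dataset-1", NodeType.DATASET, {"title": "Example dataset"}))
graph.relate("researcher-1", "dataset-1")
```
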
The other thing is that this graph network that's been built up
understands very well-known metadata standards like ISO 19115-4;
that's geospatial data, a lot of geospatial data fits into that. But also
things like RIF-CS that's used in the librarian world, and inside of
Research Data Australia - if you know that catalogue - uses RIF-CS,
and MARC 21 and there are others as well. So just to say that this
graph system is already supporting that framework.
For NCI, we make a number of major national reference datasets
available on NCI. We've curated them and put them into a certain
form. They come, principally, from a lot of the science agencies,
being Bureau of Meteorology and Geoscience Australia and so forth,
also sometimes from our research community itself. But they're being
classified as really the major national reference collections that are
associated with NCI. You can see some of the things listed there,
climate, weather and satellite imagery, bathymetry, elevation, all of
these earth systems, geospatial data in particular.
As an example of a dataset now, we've got this thing called the
Bluelink ReANalysis dataset. On the left-hand side it gives you a
summary of what it is. On the right-hand side many people are familiar
and work with catalogue systems, so we're using GeoNetwork as part
of our core catalogue system. So you get the title, so that's the blue -
you can see on the right-hand side it's circled there and an abstract
about it. You can see points of contact. So this is all part of this ISO
19115 standard, that's how all of this is recorded, how to get hold of
that data.
So the question that you've got off something like this is what
researchers are working on that, or related datasets, how they're
publishing, is there anything else connected to it. So you end up with
this little graph of stuff. Just down on the bottom right-hand side here,
just off this basic diagram here, you can see [Peter Oak], who's the
main contact for that dataset, is somehow associated with this BRAN -
Bluelink ReANalysis - dataset. So they're somehow associated with
that even off our local information. So you can find out a little bit more
about Peter. We have other information systems that have got Peter's
details, so what project he's working on, publications somehow linked
to him, his contact detail and a pretty picture there of Peter looking
very spritely.
So we have that information in NCI. So on the left-hand side, in this
dotted line that you can see with the NCI logo around it, we know a
fair bit about Peter, that's the number one with the green, there he is.
There he is with his - as a researcher and an identity and attributes
inside of our local information. We know various things about datasets
that Peter is associated with. But there's other things that live outside
of NCI. In particular, on the right-hand side there, you can say out in
the real world, or out in the external world, Peter Oak has what's
called an ORCID ID, and many of you know this. Inside of - associated
with his ORCID ID we know things about his publication record.
So the trick for all of this stuff is to try and associate our internal
information to the external information. There's a number of steps that
we go through here. Number one, let's have the information recorded
inside of a little graph that we'll go through in a second. Then we can
augment the graph with how it gets connected up with the ORCID ID.
Then we can find out further information, in particular about other
external records like his publication record.
So, almost redescribing this same [step], in a fundamental way what
we do is this: we've got a GeoNetwork catalogue with a lot of this
information, and the utilities in the Research Graph system harvest
that and put it into Neo4j, which is a type of graph
database, just the one that we happen to be using for this. That Neo4j
is just hosted inside of the cloud. That has our information, it's just a
recasting of the local information and put inside of this system. Then
what we do is go out into a broader Research Graph on the outside
world, and we augment then the local graph database with that extra
information.
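
The augmentation step described above can be sketched in a few lines. This is a sketch under stated assumptions, not the Research Graph utilities themselves: plain dictionaries stand in for the Neo4j database, and the `augment` function, the record fields and the identifiers are all hypothetical. It matches local researchers to external records by ORCID iD and adds the external publications to the local graph.

```python
def augment(local_nodes, local_edges, external_records):
    """Return a copy of the local graph extended with external publications.

    local_nodes:      dict of key -> attribute dict (with a "type" entry)
    local_edges:      set of (key_a, key_b) tuples
    external_records: dict of ORCID iD -> list of {"doi": ..., "title": ...}
    """
    nodes = {key: dict(attrs) for key, attrs in local_nodes.items()}
    edges = set(local_edges)
    for key, attrs in local_nodes.items():
        if attrs.get("type") != "researcher":
            continue
        # Researchers without an ORCID iD simply find no external match.
        for pub in external_records.get(attrs.get("orcid"), []):
            pub_key = "doi:" + pub["doi"]
            nodes.setdefault(
                pub_key,
                {"type": "publication", "title": pub["title"], "source": "external"},
            )
            edges.add((key, pub_key))
    return nodes, edges


# Placeholder identifiers, not real ORCID iDs or DOIs.
local_nodes = {
    "researcher-1": {"type": "researcher", "orcid": "0000-0000-0000-0000"},
    "dataset-1": {"type": "dataset", "title": "Example dataset"},
}
local_edges = {("researcher-1", "dataset-1")}
external_records = {
    "0000-0000-0000-0000": [{"doi": "10.0000/example", "title": "Example paper"}],
}

nodes, edges = augment(local_nodes, local_edges, external_records)
```

The local graph is copied rather than modified in place, so the same harvest can be augmented again from a different external source.
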
Then we can visualise it in various ways. So that's what this image -
and there is a graphical tool that comes along with this, to start seeing
a whole bunch of connected things to do with this data that can start
to be exploited. So if we just had the local information of various
datasets, then all we would have is the left-hand side of this. Through
that extra augmentation, going and querying in the international
Research Graph and then augmenting for the local data, we end up
with a much richer set of information about what each of the individual
datasets and researchers and what they're doing and their
associations. So that's pretty simply what's going on.
The Research Graph system that's been put in place really by the
partners, and particularly Amir driving this, interoperates with a whole
bunch of different services; ORCID, DataCite, Scholix has come on
board, and other major datacentres like [ASIS] and so on and so forth.
So there's a list there, and a growing list, of information being put into
an interoperable graph system. So now there's richer and deeper
details that we can start harvesting. We did the
simplest augmentation, which is the one described on the previous page. But,
actually, you can run several levels of augmentation and we're still, I
guess, trying to explore what's the best way of augmenting the data for
the questions that we're trying to face.
So, look, I'm going to hand over now to Jingbo who's going to take us
a little bit more through some of the details of Research Graph and
where it's going.
Jingbo Wang: Thank you, Ben. Hi, from this point of time I wanted to go through a
couple of slides, in the next 10 minutes or so, to demonstrate how we
implement the Research Graph [pack line]. Also, report what are we
currently working on, plus some future plans going forward. So in this
slide, it shows you what is the input and what is the output. The input
is NCI's metadata database. As you saw in the previous slides by Ben,
our datasets are available in GeoNetwork in various formats - it could be
CSV or XML or JSON. They are the input, so the Jenkins server takes
that input from the [data hub] and builds the NCI graph. So the output
will be NCI graph.
On the right-hand side, the bottom screenshot just shows you how
easy it is to maintain and update the database with only one click of the
button. The five different modules, in green colour, show you the
step-by-step process inside of the Jenkins server to build the NCI graph and
also the augmentation with other databases such as a geo - [ORCID]. So
what we get eventually is an NCI graph [ML]. There are different ways
to visualise the graph. One way, which was not presented here, is we
can use the [GAVI] software to visualise. But a more popular way
would be we present our graph in a web-based format.
So if you click that link or type this link in your browser, you can
actually see this is online. I'm going to show you three screenshots on
this webpage, followed by a little live demo afterwards. Basically, this
is the interesting part, once we get the graph and we're going to
analyse the graph and try to tell the story from the graph. The first
screenshot just really gives you an overview of how many publications
in our augmented graph and how many datasets and how many
researchers here. I'm going to run a little live demo to repeat the story
that Ben told you about Peter Oak. If you type this,
researchgraph.org/NCI.
Jingbo Wang: Alright, in the web browser you can see a webpage about NCI's
graph. Click that orange button, it'll open a new tab to show the graph.
This is what the actual graph looks like. If I find Peter Oak as a researcher
and click that one, it only shows the connection with this researcher.
The colour code of the dot is that this is the dataset which is the
Bluelink ReANalysis data associated with Peter Oak. If you notice,
there is another green dot over here and this is the augmented part
from ORCID. The blue dot represents the publication associated with
this researcher. So this really demonstrates that, through the
augmentation, our own database with the dataset and researcher are
connected to the rest of the world.
Let me go back to my presentation again. I should say that we did
play around with the different analytics and this is the most interesting
part. We demonstrate a few cases that we think people are interested in.
For example, which researcher has the most publications, and
this researcher is always identified with the ORCID ID. Also, which
researcher has the most datasets associated with him, with his
affiliation. On the right-hand side, if you are still with the web browser,
you can actually put your mouse onto some of the names. It will only
show the connections between this researcher and other researchers.
So it's more like an interactive mode.
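
The kinds of questions mentioned above (which researcher has the most publications or datasets, and what one researcher is connected to) can be sketched over a plain node and edge list. This is a minimal illustration with made-up data. The function names and record fields are assumptions and are not taken from the NCI notebooks.

```python
from collections import Counter


def linked_counts(nodes, edges, target_type):
    """Count, per researcher, the linked nodes of the given type."""
    counts = Counter()
    for a, b in edges:
        # Edges are undirected, so check both orientations.
        for researcher, other in ((a, b), (b, a)):
            if (nodes[researcher]["type"] == "researcher"
                    and nodes[other]["type"] == target_type):
                counts[researcher] += 1
    return counts


def connections_of(key, edges):
    """Return every node directly connected to `key`."""
    linked = set()
    for a, b in edges:
        if a == key:
            linked.add(b)
        elif b == key:
            linked.add(a)
    return sorted(linked)


# Placeholder graph: two researchers, one dataset, three publications.
nodes = {
    "r1": {"type": "researcher"},
    "r2": {"type": "researcher"},
    "d1": {"type": "dataset"},
    "p1": {"type": "publication"},
    "p2": {"type": "publication"},
    "p3": {"type": "publication"},
}
edges = [("r1", "d1"), ("r1", "p1"), ("r1", "p2"), ("r2", "p3")]

most_publications = linked_counts(nodes, edges, "publication").most_common(1)
most_datasets = linked_counts(nodes, edges, "dataset").most_common(1)
r1_connections = connections_of("r1", edges)
```

`connections_of` corresponds to the interactive view, where selecting one researcher shows only that researcher's connections.
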
I should also say that this augmentation is still work in progress. It
means that we can augment with other databases, such as DataCite
or other European data repository, and we can actually make our
graph bigger and bigger. The last screenshot is just showing the
number of publications by year. As I said, this is not a static
graph because we can always augment with other databases and we
can introduce more publications if they are not in the ORCID database. So
behind the scene we use the Jupyter Notebook to generate this web
interactive format. We plan to play around more by providing maybe
predefined query, so that people can put the person's name on
ORCID, find out what is the connection between this researcher and
the publication and the dataset and, in the future, even the grants if it's
available in our database.
So next is we think that Research Graph can be useful for a number
of different groups of people. We think also providing Research Graph
in the linked-data format would be beneficial for people who want to
work with a more machine-searchable and actionable approach. So
what we've done is a bit of proof-of-concept work by extending
our current format of the Research Graph in JSON to JSON-LD, using
schema.org to enhance the semantic feature of the Research Graph.
We have a publication last year talking about the approach and the
ideas, so the reference is at the bottom of the slide.
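
As a rough illustration of moving from plain JSON to JSON-LD with schema.org, the sketch below maps a record to a JSON-LD document. The input field names (`type`, `name`, `url`, `creators`) and the `to_jsonld` helper are assumptions for the example, and the mapping is not the one from the publication mentioned above. `Person`, `Dataset`, `ScholarlyArticle`, `name`, `creator` and `sameAs` are existing schema.org terms.

```python
import json

# Assumed mapping from graph node types to schema.org types.
SCHEMA_ORG_TYPES = {
    "researcher": "Person",
    "dataset": "Dataset",
    "publication": "ScholarlyArticle",
}


def to_jsonld(record: dict) -> dict:
    """Convert a plain JSON record into a schema.org JSON-LD document."""
    doc = {
        "@context": "https://schema.org",
        "@type": SCHEMA_ORG_TYPES[record["type"]],
        "name": record["name"],
    }
    if record.get("url"):
        doc["@id"] = record["url"]
    creators = []
    for person in record.get("creators", []):
        entry = {"@type": "Person", "name": person["name"]}
        if person.get("orcid"):
            # Link the person to their ORCID record.
            entry["sameAs"] = "https://orcid.org/" + person["orcid"]
        creators.append(entry)
    if creators:
        doc["creator"] = creators
    return doc


# Placeholder record, not an actual catalogue entry.
record = {
    "type": "dataset",
    "name": "Example dataset",
    "url": "https://example.org/dataset/1",
    "creators": [{"name": "Example Researcher", "orcid": "0000-0000-0000-0000"}],
}
jsonld = to_jsonld(record)
print(json.dumps(jsonld, indent=2))
```
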
The other thing is, once we build the Research Graph there are a lot
of interesting analysis that we can do. So we are currently exploring
the new ways of analysing the information in the Research Graph and
trying to pick up the good stories about what Research Graph can tell
us. The other thing is, because we are the national data repository we
actually encourage people to do the cross-disciplinary research based
on our high-performance platform. If we can demonstrate the value of
[cross-system] and disciplinary research, by showing that when
different types of dataset are available on the same platform, there is more
research, more publications and more funding granted, it will be
quite good to demonstrate the impact of our data management
practice.
So in summary, I think Research Graph really means a couple of
things for different groups of users. For example, the users of the
data repository themselves can understand the dynamic research integration
through these analytics. I remember when some researchers submit an
ARC grant, they sometimes show their publication citations by
year getting better and better. But with the Research
Graph they can actually show more information, not just publication
but also their contribution of the dataset and their award on other
additional funding using the Research Graph.
For the higher-level executive and board, as a data repository we can
demonstrate the value of our good data management practice and
provide the interoperability of the data services through these more
advanced services. We also advance the science research by having
more publications and more impact in the metrics. Finally, for the funding
body, since they invested a good amount of money in the data
repository, we can demonstrate the impact of the investment on the
data repository by showing the quantitative analysis of the impact
metrics within the research community.
So if you want to learn more about the graph, we have the GitHub
source code and we also have the interactive demo of the graph, and
there is Twitter also if you wanted to socialise it. I think that's it.
Facilitator: Okay, thanks Jingbo. I'd like to thank Ben and Jingbo for giving this
talk and thank you, everyone, for attending the webinar. Thank you.
END OF TRANSCRIPT

TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 

Kürzlich hochgeladen (20)

Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
