FAIR Data webinar series #2: A for Accessible –
ANDS Webinar
6 September 2017
Video & slides available from ANDS website
START OF TRANSCRIPT
Keith Russell: Welcome everybody to the second in this series of webinars about the
FAIR data principles. Today we are up to A for accessible. Last week
we talked about the first one, findable and now accessible and next
week we'll talk about interoperable and the week after that about
reusable. First of all, I'd like to introduce myself. My name is Keith
Russell. I'm from the Australian National Data Service. I'm your host
for today.
A big thank you to Susannah, Susannah Sabine in the background,
she's organising and co-hosting this webinar with me. Just as a bit of
background, the Australian National Data Service works with research
organisations around the country to establish trusted partnerships,
reliable services and to enhance capability around the sector to add
value to research data and to enhance the capability in the research
sector.
We are working together with two other NCRIS funded projects. So
that's RDS and Nectar, to create an aligned set of joint investments to
deliver transformation in the research sector. There you are. We
have three speakers for today. I'll do a quick kick off and just give a
very brief introduction to what the FAIR data principles say about
accessible.
[Unclear] words are denoted in square brackets
transcript-fair-webinar-2-accessable06-06-2017-170929031312.doc
Page 2 of 16
Then I'm really excited and very grateful for two of our speakers today.
David, David Fitzgerald. He is in this webinar and he doesn't have a
webcam, so that's why you don't see him at present. David is a data
manager at the Australian Longitudinal Study of Women's Health.
David is going to be talking about how in the study and how in the
data that's being provided, they make the data accessible.
I was especially interested in this perspective from the angle of
sensitive data and making sensitive data accessible. The other
speaker for today is Jingbo, Jingbo Wang, from NCI. I've asked
Jingbo to talk a bit about how NCI makes their data
accessible using services over the data, which can be interrogated and
used by humans and machines. First of all, I'd like to give a brief
introduction about the A in the FAIR data principles.
The A stands for accessible. The way it's been described and the way
FORCE11 described the principles is that metadata, so data and the
metadata, both of them, are retrieved by their identifier, using a
standardised communications protocol. When we talk - when
retrieved by their identifier, that's the identifier we talked about last
week. That can be a DOI, a handle, a PURL, something that's
persistent. By using the DOI, handle or PURL, you should be able to get
access to the data or the metadata.
The protocol to get there should be open, free and universally
implementable. The thing to think about there is that it's something
that is a protocol which is standardised and used by - can be used by
anybody. It's not something that is bespoke. Not something that is
home built or badly documented. The classic example is just HTTP.
That's the very normal way of accessing materials and accessing data
over the internet.
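As a rough illustration of retrieval by identifier over a standard protocol, the sketch below turns a DOI or a handle into an ordinary HTTP address using the public doi.org and hdl.handle.net resolvers. The function name and the example identifier are made up for illustration and were not shown in the webinar.

```python
from urllib.parse import quote

# Public HTTP resolvers for two common persistent identifier schemes.
RESOLVERS = {
    "doi": "https://doi.org/",
    "handle": "https://hdl.handle.net/",
}


def resolver_url(scheme: str, identifier: str) -> str:
    """Return the HTTP URL that resolves a persistent identifier.

    `scheme` is "doi" or "handle". `identifier` is the bare identifier,
    for example "10.1234/example" (a made-up DOI, used only as an example).
    """
    try:
        base = RESOLVERS[scheme.lower()]
    except KeyError:
        raise ValueError(f"unsupported identifier scheme: {scheme!r}") from None
    # Keep "/" readable but escape characters that are unsafe in a URL path.
    return base + quote(identifier.strip(), safe="/")


if __name__ == "__main__":
    print(resolver_url("doi", "10.1234/example"))
```

Any HTTP client, from a browser to a script, can then follow that address, which is the point of using an open and universally implementable protocol.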
It should not require some specialised expensive software. Another
point they make in the data principles is that the protocol should allow
for authentication and authorisation procedure where necessary. A
common misunderstanding is that when people read accessible
they think that means I have to make my data open. If you actually
read the FAIR data principles, that's not what they're saying.
What they're saying is accessible does not actually have to be open or
free. But you are expected to give exact conditions under which the
data are accessible. Even heavily protected and private data can be
made FAIR. If you implement the FAIR data principles properly, then a
human being can see that the data is maybe not openly available, but
also what steps they need to take to get access to the data. The FAIR
data principles also talk about machine access to data. If a machine
goes hunting around looking for the data, the machine should be able
to recognise that the data is not open and what steps need to be taken
to get to the data. I'll talk about that a little further. If
the user, so that's either the human or the machine, has been granted
access to the data, then it should be accessible through some sort of
authentication and authorisation procedure, standard procedure.
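One way to picture access conditions that both a human and a machine can read is a small structured "access" section in the metadata record. The sketch below is only an assumption about what such a section could look like: the field names, the three access levels and the `next_steps` helper are hypothetical and are not part of the FAIR principles or of any ANDS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AccessInfo:
    """Hypothetical access section of a metadata record."""
    level: str                  # "open", "mediated" or "closed"
    conditions: str = ""        # human-readable conditions of access
    steps: List[str] = field(default_factory=list)  # how to request access
    contact: str = ""           # contact point for access requests


def next_steps(access: AccessInfo) -> List[str]:
    """Tell a person or a program what to do to reach the data."""
    if access.level == "open":
        return ["Retrieve the data directly via its identifier."]
    if access.level == "mediated":
        steps = list(access.steps)
        if access.contact:
            steps.append(f"Contact {access.contact} with any questions.")
        return steps
    if access.level == "closed":
        return ["Data are not available; only the metadata record can be read."]
    raise ValueError(f"unknown access level: {access.level!r}")
```

A harvester that reads such a record can tell straight away that a dataset is not open and report the steps needed to apply for it, rather than failing on a login page.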
The last point they make under the FAIR data principles about being
accessible is in the case, the case in which data is no longer
available, at least the metadata should be accessible. This is of
course not ideal. But in some cases it is necessary to take the data
down. That could be if consent for use was only for a limited period of
time or maybe there has been a legal takedown notice or something
along those lines that really makes it impossible to keep the
data available.
In that case, it is valuable to still keep up a metadata record describing
the data and explaining that the data is no longer available. Now just
to reinforce that accessible does not always have to be open, there
are clear cases in which data cannot be made openly available.
Obvious example is where data refers to human beings and specific
characteristics of those human beings, like information about their
health, their income or religion, attitudes, political persuasion, all that
sort of stuff.
That's not the sort of information you can make publicly available.
Other examples, and that's probably worth remembering, is that there
are other sets of data, for example, a threatened species. The
location of where threatened species are can be data which is not
something you want to make openly available, because that could
mean that the last few of those species are hunted down or collected.
A famous example is the Wollemi Pine: the location of that
species needs to be protected. Finally, another example
where data cannot always be made openly available is where there are
commercial interests in the data. Maybe the metadata can be
shared, but there are commercial interests around the data itself.
In that case, it would not be appropriate for that to be made
openly available.
When considering making data accessible, we do argue to make it as
accessible as possible and as openly available as possible. One possible
angle there is just to provide the metadata as a starting point. If the
rest cannot be made available, at least provide the metadata. Slightly
more useful perhaps is making it available through mediated access and
in that case, it's valuable to be clear about how the user can actually
get access. That can be by providing an email address, name or
telephone number.
If, for example, the user has to go through an ethics procedure to get
access to the data, then clearly describe that ethics procedure and
what sort of information is required to apply for that ethics procedure.
I was talking about the mediated access and about providing
information about who to contact if you want to get access to the data.
One thing to keep in mind there is if you are - if you list a person or a
person within the organisation, have a think about whether that person
is ever going to leave.
If that's a researcher, they may be going to another organisation. Have
a fallback, some sort of mechanism for that, or maybe
a more general email address. So when that data custodian leaves,
somebody else can at least answer the question and grant access to
the data. Another possible angle in making data accessible is
creating a de-identified version of the data and making that public, as
long as it's properly de-identified.
That can be useful for certain data users. At least they have a better view
of what's in the dataset. For some purposes, a de-identified version
can be enough. Finally, a good point to keep in mind is if you do want
to make the data accessible, plan for this in your consent forms,
because coming back afterwards and trying to get consent is not
easy. Another angle worth keeping in mind, and that's something I've
invited Jingbo to talk about more, is that making data accessible
can be through various routes and various protocols. In some cases
it doesn't make sense to have a large dataset available through
download. In some cases it can make much more sense to have
services over the data which allow the users to interrogate parts of the
data and pull in the parts of that data that are much more specific and
answer their requests. That can be for a human being, but
especially for a machine, that can be extremely useful.
One thing to keep in mind there, you need some sort of community
agreed standards around that. But Jingbo is going to talk much more
about that. So that was all from a much more theoretical perspective.
I'm very grateful that I have two speakers today to talk about
accessible in practice and how they've actually tackled making data
accessible.
The first speaker for today is David, David Fitzgerald. He is the data
manager at the Australian Longitudinal Study of Women's Health. I'm
very grateful that David is available to talk about what ALSWH has
done to make quite sensitive data still accessible for others to reuse.
David is on the line and I would like to hand over to David, who
can talk about how the Australian Longitudinal
Study of Women's Health has made data accessible.
David Fitzgerald: Thank you Keith. Okay. I am David Fitzgerald, the data manager for
ALSWH, that's how I pronounce it, the Australian Longitudinal Study
on Women's Health. I'll be talking about the accessibility issues for
this. I'm going to first of all explain and give background to our study
and then talk about the accessibility issues and try and relate them to
the FAIR data principles, which I've just listed here.
These are the exact ones which Keith showed earlier. I won't go
through them in detail. But I'll try and relate these to our study. Okay,
so what is the ALSWH study? It's a collaborative effort, project from
the two universities of Newcastle and Queensland. In fact, the two
universities there, related to keeping the sensitive data, which I'll talk
about briefly. It's one of Australia's longest running longitudinal
epidemiological studies.
It's been going since 1996 and is ongoing. We hope to go further into
the future, funded by the Australian Government. We started off with
over 40,000 women and a few years ago we got a new cohort of
17,000 women. I'll show you the four cohorts we work with. Here
they are. The four cohorts are age based and we define them by the
years of birth. You can see there is one - the oldest one born 1921 to
1926 and there are three other ones of various ages.
As you can imagine, each cohort has their own health issues and
that's what we're interested in and indeed, the Australian Government
is interested in. What are we collecting and our methodology, so
health issues, in particular mental, physical, reproductive, social
health. There is more and also life transitions, the different ages of
women obviously going through different life transitions, life events
and things which are related to health and employment, health service
use and more.
I'll just mention a bit of data linkage. I don't want to stress this
because it's a big area with lots of issues. But we have actually linked
our survey data with some administrative datasets. In fact, they're
listed there: the MBS, PBS, Cancer Registries and admitted
patient hospital data. The linked data are particularly sensitive and we
treat them quite differently in how we make the data accessible.
The data is used extensively. In particular, more than 680 peer
reviewed papers have been published using our data and also we report
back to the Government frequently and national health policies have
been informed by reports and use of our data. I'll go on to the aspects
of accessibility and to see how it relates to our data. So that one there
about being retrievable by an identifier using standard
communications protocol.
All the datasets from our survey which are analysed and are used
have an identifier, the same identifier and I'll just stress here, it's de-
identified but with a consistent new identifier. That's across all
surveys. Anyone using our survey data - I'll just put the caveat. As
long as it's not part of the linked data. But anyone using this survey
data has one and only one identifier for use.
We say this has been de-identified because there are no personal
names on the data. No addresses. No postcodes. No dates of birth,
although the year and month of birth are actually given, obviously to
do things like age analysis. Any - they're the main ones. But any
other data which is deemed to be identifiable is stripped off. The
identifier is - we call it the ID alias.
It's actually not the administrative ID, which a respondent would see or
somebody working in an office in Newcastle who is communicating
with our respondents. They would not know what the identifier - the
analysable identifier is. They would have a different administrative ID.
Just on this point. Any small cell sizes which we think are identifiable
are grouped into larger groups.
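The de-identification steps described here can be sketched roughly as follows. This is not ALSWH's actual procedure: the random alias table, the field names and the `coarsen` helper are assumptions made purely for illustration, and the lookup table that links administrative IDs to aliases would have to be stored securely and separately from the released data.

```python
import secrets

# Fields treated as direct identifiers in this sketch (hypothetical names).
DIRECT_IDENTIFIERS = {"name", "address", "postcode", "date_of_birth"}


def make_alias_table(admin_ids):
    """Give every administrative ID one random alias that never changes."""
    table = {}
    used = set()
    for admin_id in admin_ids:
        if admin_id in table:
            continue  # same respondent seen again: keep the existing alias
        alias = secrets.token_hex(8)
        while alias in used:  # extremely unlikely, but keep aliases unique
            alias = secrets.token_hex(8)
        used.add(alias)
        table[admin_id] = alias
    return table


def deidentify(record, alias_table):
    """Drop direct identifiers and replace the administrative ID by its alias."""
    clean = {key: value for key, value in record.items()
             if key not in DIRECT_IDENTIFIERS and key != "admin_id"}
    clean["id_alias"] = alias_table[record["admin_id"]]
    return clean


def coarsen(value, broad_groups, default="Other"):
    """Map a detailed category to a broader group to avoid small cells."""
    return broad_groups.get(value, default)
```

Because the same alias is reused for every survey wave, records for one respondent can still be joined across surveys without exposing the administrative ID.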
For example, country of birth we group into broad continental,
geographical areas to avoid particular countries of birth coming up.
Anyone using the data must, along with a number of other
conditions, not identify a respondent. Although we
go to lengths to make that very difficult, it's conceivable that
something could come up.
But they promise and sign that they will not identify respondents if
they ever have that possibility. I was also asked to look at legal and
ethical issues. We do have a legal contract with the Australian
Government Department of Health. This is ongoing and
we didn't get a 20 year one: they are regularly updated, short
term contracts. Also, the ethics committees from the two universities
there have approved usage.
In fact, every time we do a new survey, because it's longitudinal, every
year we're actually going back to at least one of the cohorts to survey
them. Each new survey which is not identical to previous surveys is
subject to ethics committee oversight and approval. We do have
extensive legal and ethical issues there. I want to talk about how an
investigator or a re-user would get access to our survey data.
As we explained, this is all on the website. They must
first complete an expression of interest form and in particular they would
say who they are, why they are a serious researcher and what they want
to find out from the data. That would be reviewed by our Publications,
Substudies and Analyses committee, that's the PSA committee.
Then if their EOI, expression of interest, is approved, they will sign the
confidentiality data use documents, statements, before receiving the
de-identified data. They also must report back to us about their
progress and we expect some sort of - some immediate work on the
data and for them to continue with that access. But if their expression
of interest is successful, the data are sent to them and this is an area
which I'm directly involved in.
We do, before sending it out, encrypt it. We use 7z software and
that's compressed as well. We use the AARNet CloudStor system to
send data to the approved researchers and reusers, and an email is
sent to them as well with passwords, but also to establish contact with
the management here, for future correspondence. I just put a note
there about we have linked data, but we never send this out. Anyone
using this has to come to our offices.
Or there is the Sax Institute SURE facility which also can have it. But
we don't own the linked data. We have agreed not to send it
anywhere. Public metadata, this refers back to protocol being open.
We have a website which lists the above procedure, in fact, that I went
through. But also has a lot of metadata on it, including a data
dictionary which lists all the variables and the many datasets we have,
a data dictionary supplement, which is a description of the frequently
used variables with some detail, a data map that shows how the
variables are used across the different surveys and cohorts.
When I say different surveys, the longitudinal, we have up to eight
surveys for some of our cohorts. Each one is deemed a different
survey and has slight differences from other surveys. We have a list
of all the variables used and spreadsheets for easy access. We also
have data books which list the frequency summaries of the variables.
The questionnaires that the respondents filled in, technical reports
which we produce that go into detail on many of our reports and a
frequently asked question page on exactly that. So, making metadata
accessible. In fact, we make data - although our data is not
completely open, we do want to make it accessible. We do archive
both the metadata and the data and we do that annually with the
Australian Data Archives.
Although they are not releasing it yet, the plan is in the future for them
to take over release of our data, perhaps when we're not doing it
ourselves. That will be a role to keep our data useful and used in the
long term. That's what I've got to say. I'd just like to acknowledge the
women in our study who fill in the surveys and of course the
Government Department of Health for funding us and the Universities
of Queensland and Newcastle for doing the job. Thank you.
That's what I have to say.
Keith Russell: Thank you David. Thanks, just really interesting presentation.
Interesting to hear how you've made data accessible in practice and
what it means to make sensitive data accessible to researchers.
Thanks for that perspective and thanks for that view on how quite
sensitive data can still be made accessible through various routes.
I think it's really interesting to hear that you have both the route of
de-identified data through appropriate channels, and also linked data, so a
much richer version, but then through either the Sax Institute SURE facility
or through coming to the ALSWH, the Australian Longitudinal Study
of Women's Health. I've got to work on that one. Thanks.
Okay, I would like to now move on to Jingbo. I've
asked Jingbo to talk also about making data accessible, from a very
different perspective. Jingbo works at NCI, and NCI does all
sorts of elements around making data findable, accessible,
interoperable, reusable. Today I've asked Jingbo to focus on the
accessible side of things. But I do want to note that NCI also does a
whole bunch of other things in this space.
Jingbo Wang: Thank you Keith. I think I will just turn off my camera, because I can't
see my presentation. My name is Jingbo Wang. I work at the National
Computational Infrastructure, which is a supercomputer centre
located on the Australian National University campus. Today I'm going to
address a different flavour of data accessibility practice at NCI. Before I
go further, I just want to make a comment that the FAIR principles are quite
useful to govern our data management practice.
We use it a lot in every single aspect in our data management. This is
a quick overview of the datasets we have. As you can see, I've listed
here the main data types that we store at NCI. They are national collections
of climate models, satellite images, bathymetry, elevation,
hydrology and geophysics. Those data are quite geospatially focussed.
But we also have other social science data and genomics sequencing
data and astronomy data. We aim to provide a user with data as a
service, as many digital repositories will do. In our data management,
we catalogue data so that people can query the metadata database to
find what we have here. We also publish data through various data
services. That's a focus I'm going to talk about in the next few slides.
We offer data quality assurance, data quality control and
benchmarking use cases. We provide data through virtual
laboratories. We also provide help on data visualisation. If I wanted
to name something that makes us different from other
repositories, it is that we are co-located with an HPC facility, high
performance computing. Given the large scale of our
data, we host more than 10 petabytes of research data.
We really want to make good use of the high-performance computing
here to advance science research. These are the six dot points that I
wanted to address today about data access. I put the red colour
words to show the difference for each point. Initially, I will talk about
how we control the data access and then I'm going to present
one example of how we use persistent identifiers to manage data
access.
Then I will talk about two main data services that we offer at NCI for
our users. One is THREDDS and the other one is GSKY, which
is a more fancy and scalable distributed data server. Finally, I'm going
to cover very quickly the data versioning and the quality of the
data. The first point is about how we control the data access.
Most of our data come from our stakeholders, such as
Geoscience Australia, the Bureau of Meteorology, CSIRO and universities.
Much of the data has been funded by the Australian Government, so it
naturally falls under the CC BY 4.0 licence. Some owners also impose that
the data should be non-commercial, non-derivative or share-alike variants
of CC BY.
We also have international partners, such as in Europe and the US,
and they impose even stricter terms and conditions if people want to
access the data. This is the legal perspective on how we
control data access through licences. On the file system
we hard-code the data access control using ACLs.
This is how we separate different groups of people
accessing the same data. Basically, for each collection, we
have two access groups. The first group has a read and write
permission, which means those are data managers who are able to
generate data, write that and modify data. The second group is read
only group.
For those people who are in the read only group, they can access the
data on the file system. But they can't really modify anything. This
way we actually protect the integrity of the data. We only give access
to an authorised person, who really can manage the data. There is
also a social aspect of data access. For a research project, we often
see the embargo period that maybe after two years of the project the
data can be made available.
Also, some researchers say I want to share my data after my journal
article about this dataset is published. Another example is from the
Bureau of Meteorology. We have data where there is a six
month delay between the data being developed and verified and it
becoming operational and available on our THREDDS server.
The second point I wanted to raise is our practice of implementing
a persistent identifier. Often we experience some frustration that
when we give people the URL to access the data, it is only valid for a
certain period of time or only valid while somebody can
maintain it. Afterwards, we can't really guarantee it. Look at the
original URLs on the left-hand side of the slide.
Those are the metadata catalogue URL and the service endpoint URL.
Let's look at the second one, which is the service endpoint. From this
URL [unclear], you can tell the later part includes the project
code, file path and file name. If anything in this path changes, for
example the project code changes, you rename the file or we shuffle the
file around, this link will be broken.
The original URL that we provided here is not a very stable one. We
adopted a product that CSIRO developed some time ago, a
persistent identifier service that acts as a broker. Most of the time we
now give the external user the right-hand side, the name combination. As you
can see, we have four main categories after pid.nci.org.au. We
have dataset, we have services, we have documentation and we
have vocabularies.
The only thing that has to be unique is the file identifier or UUID.
Basically, as long as the identifier stays the same, the URL on the
right-hand side is pretty consistent. If anything changes in the original
URL on the left-hand side, what we need to do is update the mapping
inside the PID service broker, without interrupting the URL that we
give to the external user.
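The broker idea can be shown with a very small in-memory model. This is a sketch of the general technique only, not the CSIRO PID service or NCI's deployment: the class, the category labels, the example.org addresses and the identifier are all invented for illustration.

```python
# Category labels loosely following the four groups named in the talk;
# the exact path names used by the real service are not known here.
CATEGORIES = ("dataset", "service", "documentation", "vocabulary")


class PidBroker:
    """Maps a stable public URL to whatever the current real location is."""

    def __init__(self, base="https://pid.example.org"):
        self.base = base.rstrip("/")
        self._targets = {}  # (category, identifier) -> current target URL

    def register(self, category, identifier, target_url):
        """Store the target and return the persistent URL to hand out."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        self._targets[(category, identifier)] = target_url
        return f"{self.base}/{category}/{identifier}"

    def update(self, category, identifier, new_target_url):
        """Point an existing persistent URL at a new location."""
        key = (category, identifier)
        if key not in self._targets:
            raise KeyError(key)
        self._targets[key] = new_target_url

    def resolve(self, persistent_url):
        """Return the current target for a persistent URL."""
        prefix = self.base + "/"
        if not persistent_url.startswith(prefix):
            raise ValueError("not a URL issued by this broker")
        category, _, identifier = persistent_url[len(prefix):].partition("/")
        return self._targets[(category, identifier)]
```

When a file is renamed or moved, only `update` is called; the URL that external users hold stays the same and simply resolves to the new location.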
We have the technical implementation published in the Data
Science Journal, so you are welcome to have a look. Now I'm going
to talk about the main data services that Keith really wanted me to
address from NCI's perspective. I divided our types of data service
into two main groups. One is the OGC services. I'm going to talk
more about what OGC is in a second.
The other type of data service is more project specific. For example, we
are one of the largest nodes in the southern hemisphere of the
Earth System Grid Federation, which is the aggregation of climate
models from global research institutes. The way we provide services
is we copy the main part of the model data to serve Australian users.
Another fancy data service I am going to show you a bit more of is
GSKY. It's a scalable data server that directly interacts with our file
system. What is OGC? OGC is the Open Geospatial Consortium. It is
an international non-profit organisation that makes quality open
standards for the global geospatial community. We find the OGC standards
quite useful for us because we have a lot of geospatial data.
OGC has all sorts of standards for different types of mapping, feature,
coverage and processing services for us to use. Because they are so common
and free for people to use, if we make data available through OGC
standards, a lot of people can naturally access our data. That's the
motivation. What are OGC services? It's actually an API in the middle
between the data store and the user.
The user can request whatever is available on OGC services. Let's say
I want a map of the anomaly across the whole Australian continent.
NCI hosts this data. But we host the data. We don't host images.
What the OGC web services do is they actually extract the image
and return it to the user. The user can take the URL, which
contains the image of the data, and put it on their own web portal. For
example, you can get the URL and copy and paste it into the National
Map to show the grades.
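A map request of this kind is just a URL with standard query parameters. The sketch below assembles a WMS 1.3.0 GetMap request; the endpoint and layer name are placeholders rather than real NCI addresses, and the bounding box is only a rough outline of Australia in longitude/latitude order (CRS:84).

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width=800, height=600,
                   image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL.

    `bbox` is (min_lon, min_lat, max_lon, max_lat) because CRS:84 uses
    longitude/latitude axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",          # empty value asks for the default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and layer, for illustration only.
    print(wms_getmap_url("https://example.org/wms", "anomaly",
                         (112, -44, 154, -10)))
```

The resulting address can be pasted into a browser or registered in a web portal, and the server renders the image from the underlying data on request.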
NCI has two main production data services. One is
THREDDS. You can often find the THREDDS link on our data
catalogue. This is the interface of GeoNetwork. The red circled
link is NCI's web server. You can open and click it. A second
interface is the data catalogue. They more or less contain the same
information, but serve different purposes. GeoNetwork is mainly
for data harvesters, for machine access.
The data catalogue is for human readers. THREDDS, in very
simple terms, is a data service which allows you to browse and
access the data. I've listed here six main types of services that
THREDDS offers. The very first two, OPeNDAP and NetCDF Subset, are
for sub-setting the data. We have a lot of very large data. But in practice
when scientists access the data they don't necessarily have to access
all the data.
They might just need a very small piece of data from this big pool.
What THREDDS can offer is that you can define your query and only
get the part of the data that you want. It really saves a lot of traffic on
the internet. The other two are standard OGC services, the Web Map Service
and the Web Coverage Service, which are very popular for people to access
maps and coverages directly out of our data.
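To make the sub-setting idea concrete, the sketch below builds an OPeNDAP (DAP2) request that asks for an index range of one variable instead of the whole file. The dataset address and variable name are placeholders, and the helper function is an illustration rather than part of any THREDDS client library.

```python
def opendap_subset_url(dataset_url, variable, index_ranges, response="ascii"):
    """Build a DAP2 URL requesting a hyperslab of one variable.

    `index_ranges` holds one entry per dimension, either (start, stop) or
    (start, stride, stop). Indices are zero-based and the stop is inclusive.
    """
    parts = []
    for index_range in index_ranges:
        if len(index_range) == 2:
            start, stop = index_range
            stride = 1
        else:
            start, stride, stop = index_range
        if start < 0 or stop < start or stride < 1:
            raise ValueError(f"invalid index range: {index_range!r}")
        parts.append(f"[{start}:{stride}:{stop}]")
    # DAP2 constraint expression: variable name followed by one [] per dimension.
    return f"{dataset_url}.{response}?{variable}{''.join(parts)}"


if __name__ == "__main__":
    # Hypothetical dataset and variable: first time step, a small spatial window.
    print(opendap_subset_url("https://example.org/thredds/dodsC/demo.nc",
                             "temp", [(0, 0), (10, 20), (30, 2, 40)]))
```

Only the requested slab travels over the network, which is where the saving in traffic comes from. Some HTTP clients expect the square brackets to be percent-encoded.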
Of course, THREDDS offers a very quick data viewer. If you don't
know what this data is, you can have a quick look at what it is on the
web, without downloading it. Of course, THREDDS also offers
direct download, if you really want to download the data. Another
fancy scalable distributed data server that I was talking about is called
GSKY. GSKY is a product developed in-house at NCI.
What it does is this. We have a lot of data on a file system, millions and
millions of files on the system. If we wanted people to query this data,
how? It's going to be very hard to create millions of metadata
records for every single file. What we've done is we use a crawler to
crawl the file system, get the header of each file and formulate that as a
metadata database.
Then the database is a query window for people to hand in a
request, such as give me some images in this polygon at some
time. The metadata database actually includes that essential
geospatial information. It returns to the user what they requested.
We have recently published technical details of the GSKY
implementation. You're more than welcome to have a look.
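The crawl-then-query pattern can be sketched with a small SQLite index. This is not how GSKY itself is implemented: the table layout, the use of bounding boxes instead of polygons, and the `read_header` callback (standing in for whatever reads a file's header) are all simplifying assumptions.

```python
import sqlite3


def build_index(paths, read_header):
    """Index file headers in an in-memory SQLite database.

    `read_header(path)` is supplied by the caller and must return a dict with
    "bbox" = (min_lon, min_lat, max_lon, max_lat) and ISO 8601 date strings
    under "t_start" and "t_end".
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, "
        "min_lon REAL, min_lat REAL, max_lon REAL, max_lat REAL, "
        "t_start TEXT, t_end TEXT)"
    )
    for path in paths:
        header = read_header(path)
        conn.execute(
            "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?, ?)",
            (path, *header["bbox"], header["t_start"], header["t_end"]),
        )
    conn.commit()
    return conn


def find_files(conn, bbox, t_start, t_end):
    """Return paths whose extent overlaps the query box and time window."""
    min_lon, min_lat, max_lon, max_lat = bbox
    rows = conn.execute(
        "SELECT path FROM files "
        "WHERE min_lon <= ? AND max_lon >= ? "
        "AND min_lat <= ? AND max_lat >= ? "
        "AND t_start <= ? AND t_end >= ? "
        "ORDER BY path",
        (max_lon, min_lon, max_lat, min_lat, t_end, t_start),
    )
    return [row[0] for row in rows]
```

A request for a region and period then touches only the index, and only the matching files need to be opened to produce the result.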
Keith Russell: Jingbo? Sorry, Jingbo. I think you're getting close to the end. I just
wanted to ask you - there is only about a minute or two left, so if you
could work towards the end that would be…
Jingbo Wang: Sure, I'll quickly go through. The last two points are about versioning
and quality. Again, because of the scale of the data, we can't really store every
single step of the data. What we can do is we store the raw data and
the final version and we keep the URI of the metadata for the middle
steps. In that way, the provenance information is kept and storage is
also saved.
The last point is data quality. Some users say we
can't really assume that we can access data and the data is flawless. By
publishing data alongside a quality report, we want to provide
data access with a certain type of assurance. We also have a
publication on that which is going to be out very soon. Thank you for your
attention. That's our experience so far with data access.
Keith Russell: Thanks. Thanks Jingbo. That was really - a very quick overview of all
the work you've been doing there around services and all the work
you've been doing there about making data accessible, not only for
humans, but also for machines. First of all, I would like to thank David
and Jingbo again for providing an insight into what it means in practice
in making data accessible from different perspectives.
Those were very interesting presentations. In case you are interested in
learning more about making your data accessible and things you can
think about there, this slide provides you with some resources. The
med.data data project has got a number of materials around sensitive
data. There is a link here to the Australian Data Archive and the
access conditions there. The ANDS website also has some materials
on sensitive data.
Another piece of work we're doing together with the community is
looking at data services. This is the work that Jingbo also talked
about, making sure that the services over the data are discoverable.
There is an interest group working in this space. If you are interested
in learning more about it and also engaging more around that, please
follow the link and there is more information in there about that data
services interest group.
Last year we also did 23 (research data) Things, and two of the
research data Things are relevant to the topics discussed today. Have
a look at Thing 10 and Thing 19 if you want to learn more and also want
to get your hands dirty and try out a little bit what it means to make
data accessible. The link at the bottom is just a general link about the
FAIR data principles on the ANDS website.
This week we talked about accessible. Next week we are going to be
talking about interoperable. Thank you all for your attention. Finally, I
would like to acknowledge and thank first of all our speakers for today,
but I would also like to thank the NCRIS (National Collaborative Research
Infrastructure Strategy) program for funding ANDS and making this all
possible. Thank you all for your time and I look forward to seeing you
next week.
END OF TRANSCRIPT

Transcript FAIR webinar #2: A for Accessable-06-06-2017

  • 1. FAIR Data webinar series #2: A for Accessible – ANDS Webinar 6 September 2017 Video & slides available from ANDS website START OF TRANSCRIPT Keith Russell: Welcome everybody to the second in this series of webinars about the FAIR data principles. Today we are up to A for accessible. Last week we talked about the first one, findable and now accessible and next week we'll talk about interoperable and the week after that about reusable. First of all, I'd like to introduce myself. My name is Keith Russell. I'm from the Australian National Data Service. I'm your host for today. A big thank you to Susannah, Susannah Sabine in the background, she's organising and co-hosting this webinar with me. Just as a bit of background, the Australian National Data Service works with research organisations around the country to establish trusted partnerships, reliable services and to enhance capability around the sector to add value to research data and to enhance the capability in the research sector. We are working together with two other NCRIS funded projects. So that's RDS and Nectar, to create an aligned set of joint investments to deliver transformation in the research sector. There you are. We have three speakers for today. I'll do a quick kick off and just give a very brief introduction to what the FAIR data principles say about accessible. [Unclear] words are denoted in square brackets
  • 2. transcript-fair-webinar-2-accessable06-06-2017-170929031312.doc Page 2 of 16 Then I'm really excited and very grateful for two of our speakers today. David, David Fitzgerald. He is in this webinar and he doesn't have a webcam, so that's why you don't see him at present. David is a data manager at the Australian Longitudinal Study of Women's Health. David is going to be talking about how in the study and how in the data that's being provided, they make the data accessible. I was especially interested in this perspective from the angle of sensitive data and making sensitive data accessible. The other speaker for today is Jingbo, Jingbo Wang, from NCI. I've asked Jingbo to talk a bit about where - how NCI makes their data accessible using services for the data. They can be interrogated used by humans and machines. First of all, I'd like to give a brief introduction about the A in the FAIR data principles. The A stands for accessible. The way it's been described and the way FORCE11 described the principles is that metadata, so data and the metadata, both of them, are retrieved by their identifier, using a standardised communications protocol. When we talk - when retrieved by their identifier, that's the identifier we talked about last week. That can be a DOI, a handle, a perl, something that's persistent. By using the DOI, handle or perl, you should be able to get access to the data or the metadata. The protocol to get there should be open, free and universally implementable. The thing to think about there is that it's something that is a protocol which is standardised and used by - can be used by anybody. It's not something that is bespoke. Not something that is home built or badly documented. The classic example is just htttp. That's the very normal way of using it through internet accessing materials and accessing data. It should not require some specialised expensive software. 
Another point they make in the data principles is that the protocol should allow for authentication and authorisation procedure where necessary. This is a common misunderstanding, is that when people read accessible
  • 3. transcript-fair-webinar-2-accessable06-06-2017-170929031312.doc Page 3 of 16 they think that means I have to make my data open. If you actually read the FAIR data principles, that's not what they're saying. What they're saying is accessible does not actually have to be open or free. But you are expected to give exact conditions under which the data are accessible. Even heavily protected and private data can be made fair. If you implement it properly, implement the FAIR data principles properly, then a human being can see that the data is maybe not openly available, but then what steps they need to take to get access to the data and because in the FAIR data principles they also talk about machine access to data. If a machine goes hunting around looking for the data, the machine should be able to recognise that the data is not open and what steps need to be taken to get to the data. I'll talk about that a little further. If the user, so that's either the human or the machine, has been granted access to the data, then it should be accessible through some sort of authentication and authorisation procedure, standard procedure. The last point they make under the FAIR data principles about being accessible is in the case, the case in which data is no longer available, at least the metadata should be accessible. This is of course not ideal. But in some cases it is necessary to take the data down. That could be if consent for use was only for a limited period of time or maybe there has been a legal takedown notice or something along those lines that really make it impossible to no longer make the data available. In that case, it is valuable to still keep up a metadata record describing the data and explaining that the data is no longer available. Now just to reinforce that accessible does not always have to be open, there are clear cases in which data cannot be made openly available. 
An obvious example is where data refers to human beings and specific characteristics of those human beings, like information about their health, their income or religion, attitudes, political persuasion, all that sort of stuff. That's not the sort of information you can make publicly available.
Other examples, and that's probably worth remembering, are other sets of data such as data on threatened species. The location of threatened species can be data which you do not want to make openly available, because that could mean that the last few of that species are hunted down or collected. A famous example is the Wollemi Pine: the location of that specific species needs to be protected.
Finally, another example where data cannot always be made openly available is where there are commercial interests in the data. Maybe the metadata can be shared, but there are commercial interests around the data itself. In that case, it would not be appropriate for it to be made openly available.
When considering making data accessible, we do argue to make it as accessible as possible and as openly available as possible. One possible angle there is just to provide the metadata as a starting point. If the rest cannot be made available, at least the metadata. Slightly more useful perhaps is making it available through mediated access, and in that case it's valuable to be clear about how the user can actually get access. That can be by providing an email address, name, telephone number. If, for example, the user has to go through an ethics procedure to get access to the data, then clearly describe that ethics procedure and what sort of information is required to apply for it.
I was talking about mediated access and about providing information about who to contact if you want to get access to the data. One thing to keep in mind there is that if you list a person within the organisation, have a think about whether that person is ever going to leave, for example if that's a researcher who may be going to another organisation.
Have a fall back, some sort of mechanism such as a more general email address, so that when that data custodian leaves, somebody else can at least answer the question and grant access to the data.
Another possible angle in making data accessible is creating a de-identified version of the data and making that public, as long as it's properly de-identified. That can be useful for certain data users, who at least get a better view of what's in the dataset. For some purposes, a de-identified version can be enough. Finally, a good point to keep in mind is that if you do want to make the data accessible, plan for this in your consent forms, because coming back afterwards and trying to get consent is not easy.
Another angle worth keeping in mind, and that's something I've invited Jingbo to talk about more, is that making data accessible can be done through various routes and various protocols. In some cases it doesn't make sense to have a large dataset available through download. In some cases it can make much more sense to have services over the data which allow the users to interrogate parts of the data and pull in parts of that data that are much more specific and answer their requests. That can be for a human being, but especially for a machine that can be extremely useful. One thing to keep in mind there is that you need some sort of community agreed standards around that. But Jingbo is going to talk much more about that.
So that was all from a much more theoretical perspective. I'm very grateful that I have two speakers today to talk about accessible in practice and how they've actually tackled making data accessible. The first speaker for today is David Fitzgerald. He is the data manager at the Australian Longitudinal Study of Women's Health. I'm very grateful that David is available to talk about what ALSWH has done to make quite sensitive data still accessible for others to reuse. David is on the line and I would like to hand over to David, who can talk about how they have made data accessible in the Australian Longitudinal Study of Women's Health.
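Keith's points about stating exact access conditions, offering mediated access and keeping a metadata record after data is withdrawn can be sketched as a small machine-readable access record. This is only an illustration under stated assumptions: the `AccessInfo` fields, the three level names and the example.org addresses are hypothetical and do not come from the FAIR principles or from any ANDS schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessInfo:
    """Hypothetical access section of a metadata record (field names are illustrative)."""
    level: str                      # "open", "mediated" or "metadata_only"
    conditions: str = ""            # human-readable statement of the exact conditions
    contact: Optional[str] = None   # ideally a role-based address, not one person's
    data_url: Optional[str] = None  # only present when the data can be fetched directly


def next_step(access: AccessInfo) -> str:
    """Tell a person or a harvesting program what to do to reach the data."""
    if access.level == "open" and access.data_url:
        return f"Download from {access.data_url}"
    if access.level == "mediated":
        return f"Apply via {access.contact}: {access.conditions}"
    if access.level == "metadata_only":
        return f"Data no longer available: {access.conditions}"
    raise ValueError(f"Unknown or incomplete access information: {access}")


# A sensitive dataset offered through mediated access with a general mailbox.
sensitive = AccessInfo(
    level="mediated",
    conditions="ethics approval and a signed data use agreement are required",
    contact="data-access@example.org",
)
print(next_step(sensitive))
```

Using a general mailbox rather than a named researcher in `contact` reflects the fall back Keith recommends for when a data custodian leaves.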
David Fitzgerald: Thank you Keith. Okay. I am David Fitzgerald, the data manager for ALSWH, that's how I pronounce it, the Australian Longitudinal Study on Women's Health. I'll be talking about the accessibility issues for this. I'm going to first of all explain and give background to our study and then talk about the accessibility issues and try and relate them to the FAIR data principles, which I've just listed here. These are the A ones, which Keith showed earlier. I won't go through them in detail. But I'll try and relate these to our study.
Okay, so what is the ALSWH study? It's a collaborative project from the two universities of Newcastle and Queensland. In fact, the two universities there are related to keeping the sensitive data, which I'll talk about briefly. It's one of Australia's longest running longitudinal epidemiological studies. It's been going since 1996 and is ongoing. We hope to go further into the future, funded by the Australian Government. We started off with over 40,000 women and a few years ago we got a new cohort of 17,000 women.
I'll show you the four cohorts we work with. Here they are. The four cohorts are age based and we define them by their years of birth. You can see the oldest one was born 1921 to 1926 and there are three other ones of various ages. As you can imagine, each cohort has their own health issues and that's what we're interested in and indeed, the Australian Government is interested in.
What are we collecting, and our methodology: health issues, in particular mental, physical, reproductive and social health. There is more, and also life transitions, with women of different ages obviously going through different life transitions, life events and things which are related to health, and employment, health service use and more.
I'll just mention a bit of data linkage. I don't want to stress this because it's a big area with lots of issues. But we have actually linked our survey data with some administrative datasets. In fact, they're listed there.
The MBS, PBS, Cancer Registries and admitted patient hospital data. The linked data is particularly sensitive and we treat it quite differently in how we make the data accessible.
The data is used extensively. In particular, more than 680 peer reviewed papers have been published using our data. We also report back to the Government frequently, and national health policies have been informed by reports and use of our data.
I'll go on to the aspects of accessibility and see how they relate to our data. So that one there is about being retrievable by an identifier using a standard communications protocol. All the datasets from our survey which are analysed and used have an identifier, the same identifier, and I'll just stress here, the data is de-identified but with a consistent new identifier. That's across all surveys. Anyone using our survey data, and I'll just put the caveat, as long as it's not part of the linked data, has one and only one identifier for use.
We say this has been de-identified because there are no personal names on the data. No addresses. No postcodes. No dates of birth, although the year and month of birth are actually given, obviously to do things like age analysis. They're the main ones. But any other data which is deemed to be identifiable is stripped off. The identifier we call the ID alias. It's actually not the administrative ID, which a respondent would see, or somebody working in an office in Newcastle who is communicating with our respondents. They would not know what the analysable identifier is. They would have a different administrative ID.
Just on this point, any small cell sizes which we think are identifiable are grouped into larger groups. For example, country of birth we group into broad continental, geographical areas to avoid particular countries of birth coming up. Anyone using the data must, along with a number of other conditions, not identify a respondent. Although we go to lengths to make that very difficult, it's conceivable that something could come up.
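The de-identification steps David describes (a consistent ID alias that differs from the administrative ID, direct identifiers stripped, date of birth reduced to year and month, country of birth grouped into broad regions) can be sketched as follows. This is a minimal illustration, not the ALSWH procedure: the field names, the region lookup and the random alias scheme are all assumptions made for the example.

```python
import secrets

# Hypothetical field names; a real study would have many more identifying fields.
DIRECT_IDENTIFIERS = {"admin_id", "name", "address", "postcode"}

# Illustrative lookup only; a real grouping would cover every country.
REGION_OF_COUNTRY = {"New Zealand": "Oceania", "Italy": "Europe", "Vietnam": "Asia"}


def make_alias_map(admin_ids):
    """Give each administrative ID a random alias that cannot be derived from it."""
    aliases, used = {}, set()
    for admin_id in admin_ids:
        alias = secrets.token_hex(4)
        while alias in used:          # re-draw on the rare collision
            alias = secrets.token_hex(4)
        used.add(alias)
        aliases[admin_id] = alias
    return aliases


def deidentify(record, alias_map):
    """Return a copy of one survey record that is safe to send to analysts."""
    dropped = DIRECT_IDENTIFIERS | {"date_of_birth", "country_of_birth"}
    out = {key: value for key, value in record.items() if key not in dropped}
    out["id_alias"] = alias_map[record["admin_id"]]
    out["birth_year_month"] = record["date_of_birth"][:7]  # "YYYY-MM-DD" -> "YYYY-MM"
    out["region_of_birth"] = REGION_OF_COUNTRY.get(record["country_of_birth"], "Other")
    return out


alias_map = make_alias_map(["A001", "A002"])  # kept apart from the analysis data
raw = {"admin_id": "A001", "name": "Jane Citizen", "address": "1 Example St",
       "postcode": "2300", "date_of_birth": "1975-03-14",
       "country_of_birth": "Italy", "survey": 3, "general_health": 4}
print(deidentify(raw, alias_map))
```

Because the same alias map is reused for every survey wave, a respondent keeps one identifier across all surveys, while staff who only see the administrative ID cannot tell which analysis records belong to whom.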
But they promise and sign that they will not identify respondents if they ever have that possibility.
I was also asked to look at legal and ethical issues. We do have a legal contract with the Australian Government Department of Health. This is ongoing and we didn't get a 20 year one: they are short term contracts that are regularly updated. Also, the ethics committees from the two universities have approved usage. In fact, every time we do a new survey, and because it's longitudinal, every year we're actually going back to at least one of the cohorts to survey them, each new survey which is not identical to previous surveys is subject to ethics committee oversight and approval. So we do have extensive legal and ethical arrangements there.
I want to talk about how an investigator or a re-user would get access to our survey data. As we explained, this is all on the website. They must first complete an expression of interest form, and in particular they would say who they are, why they are a serious researcher and what they want to find out from the data. That would be reviewed by our Publications, Substudies and Analyses committee, that's the PSA committee. Then if their EOI, expression of interest, is approved, they will sign the confidentiality and data use statements before receiving the de-identified data. They also must report back to us about their progress, and we expect some immediate work on the data for them to continue with that access.
If their expression of interest is successful, the data are sent to them, and this is an area which I'm directly involved in. Before sending it out, we encrypt it. We use 7z software, and that's compressed as well.
We use the AARNet CloudStor system to send data to the approved researchers and reusers, and an email is sent to them as well with passwords, and also to establish contact with the management here for future correspondence. I just put a note there that we have linked data, but we never send this out. Anyone using this has to come to our offices.
Or there is the Sax Institute SURE facility, which also can have it. But we don't own the linked data. We have agreed not to send it anywhere.
Public metadata: this refers back to the protocol being open. We have a website which lists the above procedure that I went through. But it also has a lot of metadata on it, including a data dictionary which lists all the variables and the many datasets we have; a data dictionary supplement, which is a description of the frequently used variables with some detail; and a data map that shows how the variables are used across the different surveys and cohorts. When I say different surveys, being longitudinal, we have up to eight surveys for some of our cohorts. Each one is deemed a different survey and has slight differences from other surveys. We have a list of all the variables used, in spreadsheets for easy access. We also have data books which list the frequency summaries of the variables, the questionnaires that the respondents filled in, technical reports which we produce that go into detail on many of our reports, and a frequently asked questions page.
So, making metadata accessible. Although our data is not completely open, we do want to make it accessible. We do archive both the metadata and the data, and we do that annually with the Australian Data Archive. Although they are not releasing it yet, the plan is in the future for them to take over release of our data, perhaps when we're not doing it ourselves. That will be a role to keep our data useful and used in the long term.
That's what I've got to say. I'd just like to acknowledge the women in our study who fill in the surveys, and of course the Government Department of Health for funding us and the Universities of Queensland and Newcastle for doing the job. Thank you. That's what I have to say.
Keith Russell: Thank you David.
Thanks, that was a really interesting presentation. Interesting to hear how you've made data accessible in practice and what it means to make sensitive data accessible to researchers. Thanks for that perspective and thanks for that view on how quite sensitive data can still be made accessible through various routes. I think it's really interesting to hear that you have both the route of de-identified data through appropriate channels, and also linked data, so a much richer version, but then through either the Sax Institute SURE facility or through coming to the ALSWH, the Australian Longitudinal Study of Women's Health. I've got to work on that one. Thanks.
Okay, I would like to now move on to Jingbo. I've asked Jingbo to talk also about making data accessible, from a very different perspective. Jingbo works at NCI, and NCI does all sorts of things around making data findable, accessible, interoperable, reusable. Today I've asked Jingbo to focus on the accessible side of things. But I do want to note that NCI also does a whole bunch of other things in this space.
Jingbo Wang: Thank you Keith. I think I will just turn off my camera, because I can't see my presentation. My name is Jingbo Wang. I work at the National Computational Infrastructure, which is a supercomputer centre located on the Australian National University campus. Today I'm going to address a different flavour of data accessibility practice at NCI. Before I go further, I just want to make a comment that the FAIR principles are quite useful to govern our data management practice. We use them a lot in every single aspect of our data management.
This is a quick overview of the datasets we have. As you can see, I've listed here the main data types that we store at NCI: national collections of climate models, satellite images, bathymetry and elevation, hydrology, geophysics. Those data are quite geospatially focussed. But we also have other social science data, genomics sequencing data and astronomy data. We aim to provide users with data as a service, as many digital repositories do.
In our data management, we catalogue data so that people can query the metadata database to find what we have here. We also publish data through various data services. That's a focus I'm going to talk about in the next few slides.
We offer data quality assurance, data quality control and benchmarking use cases. We provide data through virtual laboratories. We also provide help on data visualisation. If I wanted to name something that makes us different from other repositories, it is that we are co-located with an HPC facility, high performance computing. Given the large scale of our data, we host more than 10 petabytes of research data, we really want to make good use of the high performance computing here to advance science research.
These are the six dot points that I wanted to address today about data access. I put words in red to show the difference for each point. Initially, I will talk about how we control data access, and then I'm going to present one example of how we use persistent identifiers to manage data access. Then I will talk about two main data services that we offer at NCI for our users: one is THREDDS, and the other one is GSKY, which is a more fancy and scalable distributed data server. Finally, I'm going to cover very quickly data versioning and the quality of the data.
The first point is about how we control data access. Most of our data come from our stakeholders, such as Geoscience Australia, the Bureau of Meteorology, CSIRO and universities. Much of the data has been funded by the Australian Government, so it naturally falls under a CC BY 4.0 licence. Some owners also impose that the data should be non-commercial, non-derivative or share-alike types of CC BY. We also have international partners, such as in Europe and the US, and they impose even stricter terms and conditions if people want to access the data. This is the legal perspective on how we control data access through licences.
On the file system we hardcoded the data access control using [ACLs].
This is how we separate different groups of people accessing the same data. Basically, for each collection we have two access groups. The first group has read and write permission, which means those are the data managers who are able to generate, write and modify data. The second group is a read only group. People who are in the read only group can access the data on the file system, but they can't modify anything. This way we protect the integrity of the data. We only give access to authorised people, who really can manage the data.
There is also a social aspect of data access. For a research project, we often see an embargo period, so that maybe two years after the project the data can be made available. Also, some researchers say I want to share my data after my journal article about this dataset is published. Another example is from the Bureau of Meteorology. We have data where there is a six month delay between the data being developed and verified and it becoming operational and available on our THREDDS server.
The second point I wanted to raise is our practice in implementing persistent identifiers. Often we experience some frustration that when we give people a URL to access the data, it is only valid for a certain period of time, or only valid while somebody can maintain it. Afterwards, we can't really guarantee it. Look at the original URLs on the left-hand side of the slide. Those are the metadata catalogue URL and the service endpoint URL. Let's look at the second one, the service endpoint. From this URL [unclear], you can tell that the later part includes the project code, file path and file name. If anything in this path changes, for example the project code changes, you rename the file or we shuffle the files around, this link will be broken. The original URL that we provided here is not a very stable one.
We adopted a product that CSIRO developed some time ago, a persistent identifier service that acts as a broker. Now, most of the time we give the external user the right-hand side, the name combination. As you can see, we have four main categories after pid.nci.org.au: dataset, services, documentation and vocabularies. The only thing that has to be unique is the file identifier or UUID. Basically, as long as the identifier stays the same, the URL on the right-hand side is consistent. If anything changes in the original URL on the left-hand side, what we need to do is update the mapping inside the PID service broker, without interrupting the URL that we give to the external user. We have the technical implementation published in the [Data Science Journal], so you are welcome to have a look.
Now I'm going to talk about the main data services, which Keith really wanted me to address from NCI's perspective. I divided our data services into two main groups. One is the OGC services. I'm going to talk more about what OGC is in a second. The other type of data service is more project specific. For example, we are one of the largest nodes in the southern hemisphere of the Earth System Grid Federation, which is the aggregation of climate models from global research institutes. The way we provide services is that we copy the main part of the model data to serve Australian users. Another fancy data service I am going to show you a bit more of is GSKY. It's a scalable data server that directly interacts with our file system.
What is OGC? OGC is the Open Geospatial Consortium. It is an international non-profit organisation that makes quality open standards for the global geospatial community. We find OGC standards quite useful for us because we have a lot of geospatial data. OGC has all sorts of standards for different types of mapping, feature, coverage and processing services for us to use. Because they are so common and free for people to use, if we make data available through OGC standards, a lot of people can naturally access our data. That's the motivation.
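Going back to the persistent identifier broker Jingbo described a moment earlier, the idea of a stable public URL that is remapped when files move can be sketched with a toy in-memory class. This is not the CSIRO PID service or NCI's implementation: the `PidBroker` class, the exact category names, the URL layout after pid.nci.org.au and the example.org targets are assumptions for illustration only.

```python
class PidBroker:
    """Toy in-memory stand-in for a persistent identifier (PID) broker."""

    # Category names modelled on the four mentioned in the talk; spelling is assumed.
    CATEGORIES = {"dataset", "service", "documentation", "vocabulary"}

    def __init__(self, base="https://pid.nci.org.au"):
        self.base = base
        self._targets = {}  # (category, identifier) -> current real URL

    def register(self, category, identifier, target_url):
        """Record where the resource lives now and return its persistent URL."""
        if category not in self.CATEGORIES:
            raise ValueError(f"Unknown category: {category}")
        self._targets[(category, identifier)] = target_url
        return f"{self.base}/{category}/{identifier}"

    def update_target(self, category, identifier, new_target_url):
        """Point an existing persistent URL at a new location."""
        if (category, identifier) not in self._targets:
            raise KeyError(f"Not registered: {category}/{identifier}")
        self._targets[(category, identifier)] = new_target_url

    def resolve(self, persistent_url):
        """Look up the current real URL behind a persistent URL."""
        prefix = self.base + "/"
        if not persistent_url.startswith(prefix):
            raise ValueError(f"Not a URL managed by this broker: {persistent_url}")
        category, identifier = persistent_url[len(prefix):].split("/", 1)
        return self._targets[(category, identifier)]


broker = PidBroker()
pid = broker.register("dataset", "hypothetical-uuid-1234",
                      "https://example.org/thredds/catalog/projectA/file.nc")
# The project code changes, so only the mapping inside the broker is updated.
broker.update_target("dataset", "hypothetical-uuid-1234",
                     "https://example.org/thredds/catalog/projectB/file.nc")
print(pid, "->", broker.resolve(pid))
```

The URL handed to external users never changes; only the broker's internal mapping does, which is the property Jingbo highlights.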
What are OGC services? They are actually an API in the middle, between the data store and the user. The user can request whatever is available on the OGC services. Let's say I want a map of an anomaly across the whole Australian continent. NCI hosts this data. But we host the data, we don't host images. What the OGC web services do is actually extract the image and return it to the user. The user can take the URL, which contains the image of the data, and put it on their own web portal. For example, you can get the URL and copy and paste it into the National Map to show the grades.
NCI has two main production data services. One is THREDDS. You can often find the THREDDS link on our data catalogue. This is the interface of the GeoNetwork. The red circled link is NCI's web server. You can open and click it. A second interface is the data catalogue. They more or less contain the same information, but serve different purposes. GeoNetwork is mainly for data harvesters, for machine access. The data catalogue is for human readers.
THREDDS, in very simple terms, is a data service which allows you to browse and access the data. I've listed here six main types of services that THREDDS offers. The very first two, OPeNDAP and NetCDF Subset, are for subsetting the data. We have a lot of very large data. But in practice, when scientists access the data they don't necessarily have to access all of it. They might just need a very small piece of data from this big pool. What THREDDS can offer is that you can define your query and only get the part of the data that you want. It really saves a lot of traffic on the internet. The other two are the standard OGC services, Web Map Service and Web Coverage Service, which are very popular for people to access maps and coverages directly out of our data. Of course, THREDDS offers a very quick data viewer. If you don't know what this data is, you can have a quick look at what it is on the web, without downloading it.
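To make the two request styles above concrete, the sketch below builds an OGC WMS 1.3.0 GetMap URL (the server renders an image of a layer) and an OPeNDAP request for only a slice of one variable. The endpoint addresses, layer name and index ranges are hypothetical, and the functions only construct URLs; they do not contact any server.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width=512, height=512):
    """Build an OGC WMS 1.3.0 GetMap URL that asks for a PNG image of one layer.

    bbox is (min_lat, min_lon, max_lat, max_lon): WMS 1.3.0 uses latitude-first
    axis order for EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{endpoint}?{urlencode(params)}"


def opendap_subset_url(dataset_url, variable, index_ranges):
    """Build an OPeNDAP (DAP2) ASCII request for a hyperslab of one variable.

    index_ranges holds one (start, stop) index pair per dimension, both ends
    inclusive, written as [start:1:stop] in the constraint expression.
    """
    hyperslab = "".join(f"[{start}:1:{stop}]" for start, stop in index_ranges)
    return f"{dataset_url}.ascii?{variable}{hyperslab}"


# Hypothetical endpoints and layer name; a bounding box roughly covering Australia.
print(wms_getmap_url("https://example.org/thredds/wms/demo.nc",
                     "temp_anomaly", (-44.0, 112.0, -10.0, 154.0)))
print(opendap_subset_url("https://example.org/thredds/dodsC/demo.nc",
                         "temp_anomaly", [(0, 0), (100, 199), (200, 299)]))
```

The first URL can be pasted into a web portal to show a map without the portal holding the data; the second asks for one time step and a 100 by 100 window instead of the whole file, which is where the saving in network traffic comes from.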
Of course, THREDDS also offers direct download, if you really want to download the data.
Another fancy, scalable, distributed data server that I was talking about is called GSKY. GSKY is a product developed in-house at NCI. We have a lot of data on the file system, millions and millions of files. If we want people to query this data, how? It's going to be very hard to create millions of metadata records for every single file. What we've done is use a crawler to crawl the file system, get the header of each file and put it into a metadata database. Then the database will be a clear window for people to hand in a request, such as give me some images in this polygon at some time. The metadata database actually includes that essential geospatial information. It returns to the user what they requested. We have recently published technical details of the GSKY implementation. You're more than welcome to have a look.
Keith Russell: Jingbo? Sorry, Jingbo. I think you're getting close to the end. I just wanted to ask you, there is only about a minute or two left, so if you could work towards the end that would be…
Jingbo Wang: Sure, I'll quickly go through. The last two points start with data versioning. Again, because of the scale of the data, we can't really store every single step of the data. What we can do is store the raw data and the final version, and keep the URI of the metadata for the steps in the middle. In that way, the provenance information is kept and storage is also saved.
The last point is data quality. Some users say we can't really assume that we can access data and that the data is flawless. By publishing data alongside a quality report, we want to provide data access with a certain type of assurance. We also have a publication on that which is going to be in place very soon. Thank you for your attention. Those are our experiences so far with data access.
Keith Russell: Thanks. Thanks Jingbo.
That was a very quick overview of all the work you've been doing there around services and about making data accessible, not only for humans, but also for machines.
First of all, I would like to thank David and Jingbo again for providing an insight into what it means in practice to make data accessible, from different perspectives. Those were very interesting presentations.
In case you are interested in learning more about making your data accessible and things you can think about there, this slide provides you with some resources. The med.data data project has got a number of materials around sensitive data. There is a link here to the Australian Data Archive and the access conditions there. The ANDS website has some materials on sensitive data. Another piece of work we're doing together with the community is looking at data services. This is the work that Jingbo also talked about, making sure that the services over the data are discoverable. There is an interest group working in this space. If you are interested in learning more about it and also engaging more around that, please follow the link, and there is more information there about that data services interest group.
Last year we also did 23 things, research data things, and two of the research data things are relevant to the topics discussed today. Have a look at thing 10 and thing 19 if you want to learn more and also want to get your hands dirty and try out a little bit what it means to make data accessible. The link at the bottom is just a general link about the FAIR data principles on the ANDS website.
This week we talked about accessible. Next week we are going to be talking about interoperable. Thank you all for your attention. Finally, I would like to acknowledge and thank first of all our speakers for today, but I would also like to thank NCRIS, the National Collaborative Research Infrastructure Strategy program, for funding ANDS and making this all possible. Thank you all for your time and I look forward to seeing you next week.
END OF TRANSCRIPT