Open science in RIKEN-KI doctoral course on March 20, 2019
1. Open Science
RIKEN Center for Integrative Medical Sciences (IMS)
Takeya Kasukawa, Unit Leader
takeya.kasukawa@riken.jp
2. Purpose of the lecture
• Learn about the recent “open science” movement
• Consider how to manage your research processes, especially data
analysis, in line with “open science”
3. Open science
• What is “Open Science”?
• “Open science is the movement to make scientific research
(including publications, data, physical samples, and software) and
its dissemination accessible to all levels of an inquiring society,
amateur or professional.”, Wikipedia
• https://en.wikipedia.org/wiki/Open_science
• “It is, however, commonly referred to as an umbrella term covering
different aspects of research activities that are made more open
and given more potential, thanks primarily to the digital age.”,
RCOS web site
• https://rcos.nii.ac.jp/en/openscience/
• “Open Science is about extending the principles of openness to the
whole research cycle”, FOSTER web site
• https://www.fosteropenscience.eu/content/what-open-science-introduction
4. Open science
[Figure: Process of research, with the stages: Start of research, Training / Learning, Data production, Data analysis, Publications]
5. Open science
[Figure: Opened process of research. The stages (Start of research, Training / Learning, Data production, Data analysis, Publications) are annotated with: Open educational resources, Open data, Open methodology, Open sources, Open access, Open peer-review]
6. Open science
• The components of “open science”
• Open data
• Published data is distributed and can be reused in other studies
with few restrictions
• Open source / open methodology
• Software and pipelines are freely accessible and reproducible
• Open access
• Peer-reviewed papers are accessible on the Internet free of charge
• Open peer review
• The peer-review process and the reviewers are transparent
• Open educational resources
• Resources for teaching are freely available
• ..... Int. J. Technology Enhanced Learning, Vol. 3, No. 6, 2011
FOSTER, https://www.fosteropenscience.eu/resources
7. Open science
• What can “open science” achieve?
• Providing “evidence” of scientific findings to everyone
• More reproducibility for evaluation and validation
• More than 2/3 of studies cannot be reproduced
(Nature 500, 14–16 (01 August 2013) doi:10.1038/500014a)
• Reuse of published achievements and data
• More development of industrial/medical applications
• New studies by reusing published data (i.e. data science)
• Visibility of research scientists
• More credit to scientists
• More chances for wider networking
• Transparency to society and taxpayers
• More citizens taking part in science
• Filling various gaps among researchers
• More chances to learn what other researchers are doing
• More resources obtained by other groups become available
9. Open data
• Open data: making “data”:
• accessible without any restrictions
• reusable for any purposes (in academia and industry)
• redistributable by anyone
• Declarations on open (research) data
• OECD Declaration on Access to Research Data from Public Funding
(2003)
• “Work towards the establishment of access regimes for digital
research data from public funding” – openness
• OECD Principles and Guidelines for Access to Research Data from
Public Funding (2007)
• G8 Science Ministers Statement (2013)
• “we approved a statement which proposes to the G8 for
consideration new areas for ..., open scientific research data,...”
10. Open data
Could you give your data?
OK, I will send raw data files.
Question: Is this enough for open data?
11. Open data
• The answer is usually “No”.
• How is the data described?
• data format, relationships among files, ...
• How was the data produced?
• experimental design, sample information, experimental
protocols, data processing methods, ...
• How can the data (or its components) be uniquely identified?
• unique identifiers, references to other resources
• How can the data be obtained in the long term?
• repository
12. Open data
• How should data be opened?
• FAIR principles for general scientific data
• MIBBI for biological data
13. Open data - guidelines
• FAIR data principles
• Principles for sharing scientific data
• Findable, Accessible, Interoperable and Reusable
• Published in 2016:
• “The FAIR Guiding Principles for scientific data management
and stewardship”, Scientific Data 3, 160018 (2016)
• https://www.force11.org/group/fairgroup/fairprinciples
14. Open data - guidelines
• TO BE FINDABLE:
• F1. (meta)data are assigned a globally unique and persistent identifier.
• F2. data are described with rich metadata (defined by R1 below).
• F3. metadata clearly and explicitly include the identifier of the data it describes.
• F4. (meta)data are registered or indexed in a searchable resource.
• TO BE ACCESSIBLE:
• A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
• A1.1. the protocol is open, free, and universally implementable.
• A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
• A2. metadata are accessible, even when the data are no longer available.
• TO BE INTEROPERABLE:
• I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
• I2. (meta)data use vocabularies that follow FAIR principles.
• I3. (meta)data include qualified references to other (meta)data.
• TO BE RE-USABLE:
• R1. (meta)data have a plurality of accurate and relevant attributes.
• R1.1. (meta)data are released with a clear and accessible data usage license.
• R1.2. (meta)data are associated with their provenance.
• R1.3. (meta)data meet domain-relevant community standards.
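As a rough illustration of the principles above, the sketch below checks whether a metadata record carries the fields that several of them ask for (identifier, license, provenance, and so on). The field names, the example record, and the mapping from fields to principles are assumptions made for this example; the FAIR principles themselves do not prescribe any particular field names.

```python
# Hypothetical field names, each loosely tied to one FAIR principle.
# This mapping is an illustrative assumption, not part of the FAIR text.
FAIR_FIELDS = {
    "identifier": "F1: globally unique and persistent identifier",
    "description": "F2: rich metadata",
    "access_url": "A1: retrievable by identifier over a standard protocol",
    "license": "R1.1: clear and accessible data usage license",
    "provenance": "R1.2: provenance",
    "format": "R1.3: domain-relevant community standard",
}


def missing_fair_fields(record):
    """Return the names of the expected fields that are absent or empty."""
    return sorted(name for name in FAIR_FIELDS if not record.get(name))


# A made-up record for demonstration only.
example_record = {
    "identifier": "https://example.org/dataset/0001",
    "description": "RNA sequencing of a hypothetical sample set",
    "access_url": "https://example.org/dataset/0001/files",
    "license": "CC BY 4.0",
    "provenance": "Produced in a hypothetical laboratory in 2019",
    "format": "FASTQ",
}

print(missing_fair_fields(example_record))
print(missing_fair_fields({"identifier": "https://example.org/dataset/0002"}))
```

A check like this only tests for the presence of fields; whether the metadata is actually rich and accurate still has to be judged by a person.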
15. Open data - guidelines
• MIBBI (Minimum Information for Biological and
Biomedical Investigations)
• https://fairsharing.org/collection/MIBBI
• Defining “minimum” information for various biological experiments
• 39 standards are available (as of March 2019)
• MINSEQE - Minimum Information about a high-throughput
SEQuencing Experiment
1. The description of the biological system, samples, and the
experimental variables being studied
2. The sequence read data for each assay
3. The ‘final’ processed (or summary) data for the set of assays
in the study
4. General information about the experiment and sample-data
relationships
5. Essential experimental and data processing protocols
17. Open data - metadata
• Metadata format
• ISA model (http://isa-tools.org/)
• https://isa-tools.org/format/specification.html
• MAGE-TAB (http://fged.org/)
• https://www.ddbj.nig.ac.jp/gea/metadata.html
18. Open data - ontology
• Ontology
• “An ontology is a formal representation of a body of knowledge
within a given domain. Ontologies usually consist of a set of classes
(or terms or concepts) with relations that operate between them.”
• http://geneontology.org/docs/ontology-documentation/
[Figure: an example ontology with the terms “brain”, “neuron”, and “dopaminergic neuron”, connected by the relationships “is-a” and “part-of”]
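The idea of terms connected by relationships can be sketched as a small graph. The example below is a toy: the three terms come from the slide, but the two edges (dopaminergic neuron is-a neuron, neuron part-of brain) are assumptions made for illustration and are not taken from a real ontology file.

```python
# Toy ontology: (child term, relationship, parent term).
# The edges are illustrative assumptions, not copied from a real ontology.
RELATIONS = [
    ("dopaminergic neuron", "is-a", "neuron"),
    ("neuron", "part-of", "brain"),
]


def ancestors(term, relations=RELATIONS):
    """Return every term reachable from `term` by following relationships upward."""
    found = []
    stack = [term]
    while stack:
        current = stack.pop()
        for child, _relationship, parent in relations:
            if child == current and parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


print(ancestors("dopaminergic neuron"))  # ['neuron', 'brain']
```

Following the relationships upward is what lets data annotated with a specific term also be found under a more general term.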
19. Open data - ontology
• Ontology
• Ontologies make it possible to:
• control vocabularies based on the terms in ontologies
• classify data based on the associated terms in ontologies

ID | sample | sample by Cell Ontology
A | neural cell | CL:0000540 (neuron)
B | neuron | CL:0000540 (neuron)
C | neuronal cells | CL:0000540 (neuron)
D | nerve cell sample | CL:0000540 (neuron)
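The mapping in the table can be sketched as a lookup from free-text sample labels to a single ontology term. The synonym table below is written by hand from the four labels on the slide; a real workflow would take synonyms from the ontology itself, which is not shown here.

```python
# Hand-written synonym table based on the four labels above (an assumption
# for illustration; real synonyms would come from the Cell Ontology).
SYNONYM_TO_CL = {
    "neural cell": "CL:0000540",
    "neuron": "CL:0000540",
    "neuronal cells": "CL:0000540",
    "nerve cell sample": "CL:0000540",
}


def to_ontology_id(label):
    """Map a free-text sample label to an ontology ID, or None if unknown."""
    return SYNONYM_TO_CL.get(label.strip().lower())


samples = {
    "A": "neural cell",
    "B": "neuron",
    "C": "neuronal cells",
    "D": "nerve cell sample",
}
annotated = {sample_id: to_ontology_id(label) for sample_id, label in samples.items()}
print(annotated)
```

Once all four samples carry the same identifier, they can be grouped or searched together regardless of how each was originally described.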
20. Open data - ontology
• Gene Ontology (http://geneontology.org/)
• “The Gene Ontology resource provides a computational
representation of our current scientific knowledge about the
functions of genes (or, more properly, the protein and non-coding
RNA molecules produced by genes) from many different organisms,
from humans to bacteria.”
• http://geneontology.org/docs/introduction-to-go-resource/
[Figure: Example of Gene Ontology (Ashburner M et al., Nat Genet, 2000, doi:10.1038/75556)]
21. Open data - ontology
• More ontologies
• cell types (CL)
• anatomy (UBERON)
• organisms (NCBI Taxonomy)
• sequence annotation (SO)
• and so on ....
• Where to find ontologies
• The OBO Foundry (http://obofoundry.org/)
• NCBO BioPortal (https://bioportal.bioontology.org/)
22. Open data – data identification
• (Unique) identifiers
• URI – Uniform Resource Identifier
• An “address” of a resource on the web (or Internet)
• e.g. http://fantom.gsc.riken.jp/5/datafiles/
• DOI – Digital Object Identifier
• Managed by the International DOI Foundation (IDF).
• Persistent identifiers assigned to digital objects, which
solve the following problem of URIs (URLs):
• A URI can become invalid or change when a web service is
closed or moved.
• e.g. 10.18908/lsdba.nbdc01389-000.V002
• Public repository ID (or accession number)
• Identifiers in an (established) public repository
• e.g. DRA000991 in the INSDC SRA repository
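The three kinds of identifiers can be told apart by their shape. The sketch below is a simplified classifier; the regular expressions are assumptions that cover the examples on this slide and common cases, not complete definitions of DOI or INSDC accession syntax. A DOI is turned into a resolvable URL by prefixing it with https://doi.org/.

```python
import re

# Simplified patterns (assumptions for illustration, not full specifications).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
SRA_ACCESSION_PATTERN = re.compile(r"^[DES]R[APRSXZ]\d{6,}$")


def classify_identifier(text):
    """Return 'URI', 'DOI', 'accession', or 'unknown' for an identifier string."""
    if text.startswith(("http://", "https://")):
        return "URI"
    if DOI_PATTERN.match(text):
        return "DOI"
    if SRA_ACCESSION_PATTERN.match(text):
        return "accession"
    return "unknown"


def doi_to_url(doi):
    """Build the resolver URL for a DOI."""
    return "https://doi.org/" + doi


for identifier in (
    "http://fantom.gsc.riken.jp/5/datafiles/",
    "10.18908/lsdba.nbdc01389-000.V002",
    "DRA000991",
):
    print(identifier, "->", classify_identifier(identifier))
print(doi_to_url("10.18908/lsdba.nbdc01389-000.V002"))
```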
23. Open data - repositories
• Public repositories for specific data
• INSDC (International Nucleotide Sequence Database Collaboration)
• Repositories of sequence data
• NCBI (U.S.), ENA/EBI (Europe), DDBJ (Japan)
• Exchanging data or metadata among the repositories
• Sequence data used in an article must be deposited in one of the
INSDC repositories
• https://www.ddbj.nig.ac.jp/insdc.html
24. Open data - repositories
• Repositories for general data
• figshare (http://figshare.com/)
• Dryad (http://datadryad.org/)
• zenodo (http://zenodo.org/)
• Mendeley (https://data.mendeley.com/)
25. Open data
• How should you produce your data so that it is useful to other
researchers? (some tips)
• Format data files in the standard way
• e.g. Sequence data: FASTQ format
• Design and prepare adequate metadata with controlled
vocabularies, including ontologies
• Assign proper identifiers to all entities in the dataset and proper
names to all files
• e.g. dataset ID, sample ID, assay ID
• e.g. rules for file names
• Keep records of the relationships among all IDs and data files
• Build a “data management plan (DMP)” before data collection, if
possible
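The tips on identifiers, file names, and ID records can be sketched as follows. The naming rule (dataset ID, sample ID, and assay ID joined by underscores), the ID values, and the manifest columns are all hypothetical choices made for this example; any consistent rule would serve the same purpose.

```python
import csv
import io


def data_file_name(dataset_id, sample_id, assay_id, extension="fastq"):
    """Build a file name from the IDs using one fixed (hypothetical) rule."""
    return f"{dataset_id}_{sample_id}_{assay_id}.{extension}"


def build_manifest(rows):
    """Return tab-separated text recording which IDs belong to which file.

    `rows` is a list of (dataset_id, sample_id, assay_id) tuples.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["dataset_id", "sample_id", "assay_id", "file_name"])
    for dataset_id, sample_id, assay_id in rows:
        file_name = data_file_name(dataset_id, sample_id, assay_id)
        writer.writerow([dataset_id, sample_id, assay_id, file_name])
    return buffer.getvalue()


# Made-up IDs for demonstration only.
manifest = build_manifest([("DS001", "S01", "A01"), ("DS001", "S02", "A01")])
print(manifest)
```

Keeping such a manifest next to the data files means the relationship between IDs and files does not depend on anyone's memory.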
26. Open data
• Data Management Plan (DMP)
• NSF guidance for biological sciences (https://www.nsf.gov/bio/biodmp.jsp)
• Types of data, physical samples or collections, software, curriculum
materials, and other materials to be produced
• The standards and formats of data and metadata
• Roles and responsibilities with respect to the management of the data
• Dissemination methods that will be used to make the data and metadata
available to others
• Policies for data sharing, public access and re-use, including re-distribution
by others and the production of derivatives. Where appropriate, include
provisions for protection of privacy, confidentiality, security, intellectual
property rights and other rights.
• Plans for archiving data, samples, and other research products. Consider
which data (or research products) will be deposited for long-term access
and where.
28. Open data
• Example: FANTOM5 – metadata (samples in the SDRF format)

Source Name | Characteristics [ff_ontology] | Characteristics [description] | Characteristics [catalog_id] | Characteristics [Category] | Characteristics [Species] | Characteristics [Se
10000-101A1 | FF:10000-101A1 | Clontech Human Universal Reference Total RNA, pool1 | 9052727A | tissues | Human (Homo sapiens) | mixed
10002-101A5 | FF:10002-101A5 | SABiosciences XpressRef Human Universal Total RNA, pool1 | B208251 | tissues | Human (Homo sapiens) | mixed
10007-101B4 | FF:10007-101B4 | Universal RNA - Human Normal Tissues Biochain, pool1 | B208251 | tissues | Human (Homo sapiens) | mixed
10010-101C1 | FF:10010-101C1 | adipose tissue, adult, pool1 | 0910061 -1 | tissues | Human (Homo sapiens) | mixed
10011-101C2 | FF:10011-101C2 | bladder, adult, pool1 | 0910061 -2 | tissues | Human (Homo sapiens) | mixed
10012-101C3 | FF:10012-101C3 | brain, adult, pool1 | 0910061 -3 | tissues | Human (Homo sapiens) | mixed
10013-101C4 | FF:10013-101C4 | cervix, adult, pool1 | 0910061 -4 | tissues | Human (Homo sapiens) | female
10014-101C5 | FF:10014-101C5 | colon, adult, pool1 | 0910061 -5 | tissues | Human (Homo sapiens) | mixed
10015-101C6 | FF:10015-101C6 | esophagus, adult, pool1 | 0910061 -6 | tissues | Human (Homo sapiens) | mixed
10016-101C7 | FF:10016-101C7 | heart, adult, pool1 | 0910061 -7 | tissues | Human (Homo sapiens) | mixed
10017-101C8 | FF:10017-101C8 | kidney, adult, pool1 | 0910061 -8 | tissues | Human (Homo sapiens) | female
10018-101C9 | FF:10018-101C9 | liver, adult, pool1 | 0910061 -9 | tissues | Human (Homo sapiens) | mixed
10019-101D1 | FF:10019-101D1 | lung, adult, pool1 | 0910061 -10 | tissues | Human (Homo sapiens) | mixed
10020-101D2 | FF:10020-101D2 | ovary, adult, pool1 | 0910061 -11 | tissues | Human (Homo sapiens) | female
10021-101D3 | FF:10021-101D3 | placenta, adult, pool1 | 0910061 -12 | tissues | Human (Homo sapiens) | female
10022-101D4 | FF:10022-101D4 | prostate, adult, pool1 | 0910061 -13 | tissues | Human (Homo sapiens) | male
10023-101D5 | FF:10023-101D5 | skeletal muscle, adult, pool1 | 0910061 -14 | tissues | Human (Homo sapiens) | mixed
10024-101D6 | FF:10024-101D6 | small intestine, adult, pool1 | 0910061 -15 | tissues | Human (Homo sapiens) | mixed
10025-101D7 | FF:10025-101D7 | spleen, adult, pool1 | 0910061 -16 | tissues | Human (Homo sapiens) | male
10026-101D8 | FF:10026-101D8 | testis, adult, pool1 | 0910061 -17 | tissues | Human (Homo sapiens) | male
10027-101D9 | FF:10027-101D9 | thymus, adult, pool1 | 0910061 -18 | tissues | Human (Homo sapiens) | male
10028-101E1 | FF:10028-101E1 | thyroid, adult, pool1 | 0910061 -19 | tissues | Human (Homo sapiens) | mixed
10029-101E2 | FF:10029-101E2 | trachea, adult, pool1 | 0910061 -20 | tissues | Human (Homo sapiens) | mixed
10030-101E3 | FF:10030-101E3 | retina, adult, pool1 | 9100123A | tissues | Human (Homo sapiens) | mixed
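Because SDRF is a tab-delimited text format, a table like the one above can be read with standard tools. The sketch below uses Python's csv module on a two-row excerpt typed in by hand; the excerpt keeps only four of the columns, and the assumption that the real file uses exactly these header strings should be checked against the file itself.

```python
import csv
import io

# Hand-typed excerpt of the table above (tab-delimited, four columns only).
SDRF_TEXT = (
    "Source Name\tCharacteristics [ff_ontology]\t"
    "Characteristics [description]\tCharacteristics [Species]\n"
    "10012-101C3\tFF:10012-101C3\tbrain, adult, pool1\tHuman (Homo sapiens)\n"
    "10016-101C7\tFF:10016-101C7\theart, adult, pool1\tHuman (Homo sapiens)\n"
)


def read_sdrf(text):
    """Parse tab-delimited SDRF text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = read_sdrf(SDRF_TEXT)
descriptions = {
    row["Source Name"]: row["Characteristics [description]"] for row in rows
}
print(descriptions)
```

Note that the commas inside the description column are harmless here because the delimiter is a tab, which is one reason tab-delimited formats suit free-text metadata.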
29. Open data
• Data journals
• Recently, several journals focusing on data have been launched.
• The journals publish “data descriptors” about data with its detailed
explanation (methods, validation, usage, etc.).
• Examples of data journals:
• Scientific Data (https://www.nature.com/sdata/)
• F1000Research (https://f1000research.com/)
• GigaScience (https://academic.oup.com/gigascience/)
• Data in Brief (https://www.journals.elsevier.com/data-in-brief/)
• Format (e.g. Scientific Data)
• Background & Summary
• Methods
• Data Records
• Technical Validation
• Usage Notes
• Machine-readable metadata (ISA-Tab)
31. Open source / methodology
• Open methodology: enabling anyone to know and access
all research processes (methods, protocols,
computations, ...)
• by documentation
• by sharing notebooks or records
• by opening source code (as open source software)
• Why is open methodology required?
• For reproducibility and replicability of studies
• For reusability of methods in other research
• For education
32. Open source / methodology
• “Reproducibility crisis”
• In 2012, Amgen researchers made headlines when they declared
that they had been unable to reproduce the findings in 47 of 53
'landmark' cancer papers (Nature news, 2016,
doi:10.1038/nature.2016.19269)
• “More than 70% of researchers have tried and failed to reproduce
another scientist's experiments, and more than half have failed to
reproduce their own experiments” (Nature news, 2016,
doi:10.1038/533452a)
• and many other reports indicating the low reproducibility of
published articles
33. Open source / methodology
• What can you do (for bioinformatics analysis)?
• Keep records of what you did in your analysis
• Jupyter Notebook, R Markdown
• Open your source code if you developed your own software
• GitHub
• Provide the same computational environment so that others can run
the same analysis that you did
• Virtual machine, container
34. Open source / methodology - records
• Jupyter Notebook (https://jupyter.org/)
• a web application in which you can write your code and keep it
together with its outputs, including text, values, and graphs.
• Jupyter Notebook supports over 40 programming languages
• Python, R, Julia, Ruby, Perl, Octave, MATLAB, and so on.
35. Open source / methodology - records
[Figure: An example of a notebook in R, showing a chunk of R code and the output produced by the R code (https://hub.mybinder.org/user/binder-examples-r-dqbolkur/notebooks/index.ipynb)]
36. Open source / methodology - records
• R Markdown
• a format for writing a document that combines R code and its
outputs (based on Markdown)
• R Markdown text can be rendered to HTML, PDF, Word, and so on.
---
title: "R test"
author: "Takeya Kasukawa"
date: "2019/3/13"
---
# Example
This is an example of R Markdown.
```{r}
quantile(rnorm(10000,0,1))
```
[Figure: the R Markdown source above and its formatted output]
37. Open source / methodology - sources
• GitHub (https://github.com/)
• You can develop and host your source code.
[Figure: GitHub screenshots. Make a new repository for your source code; on the site for your repository you can put source code and write your pages.]
39. Open source / methodology - environment
• Container
• packages a piece of software, together with the libraries and
programs it depends on, into an image (container).
• https://www.docker.com/resources/what-container
40. Open source / methodology
• More tips for your data analysis
• Write scripts rather than typing commands
• You will not remember what you did forever
• Manage the versions or modification dates of all your scripts and
programs
• Simple idea: add the date to your script file name
• Sophisticated idea: use a version control system (git,
subversion, ...)
• Keep the dates or versions of all scripts/programs used in each
analysis
• This is necessary to set up the same environment for
reproducing your analysis
• This information is also necessary when you write your paper
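One way to keep the versions used in each analysis is to write them out automatically at the start of every run. The sketch below collects the date, the Python version, the platform, and the versions of selected packages using only the standard library. The package names passed in are placeholders; a package that is not installed is recorded as None rather than guessed.

```python
import json
import platform
from datetime import date
from importlib import metadata


def environment_record(packages):
    """Collect the date, interpreter, platform, and package versions for a run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "date": date.today().isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# "numpy" and "pandas" are example package names; replace with what you use.
record = environment_record(["numpy", "pandas"])
print(json.dumps(record, indent=2))
```

Saving this output next to the results of each analysis gives you the information needed both to rebuild the environment and to write the methods section.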
42. Open access
• Closed access articles
• (Gold) open access articles
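[Figure: flow of an article from author to publisher to reader. Closed access: the reader pays a subscription fee. Gold open access: an APC (article processing charge) is paid and the reader has free access. The diagram also carries the label “some fees may be required”.]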
43. Open access
• (Green) open access articles
[Figure: flow of an article from author to publisher to reader, with a subscription fee, plus self-archiving (e.g. in an institutional repository) that gives readers free access, usually after several months]
44. Open access
• Open access journals
• Journals only for open access articles
• ~13,000 journals (according to https://www.doaj.org/)
• PLoS journals
• BioMed Central journals
• Nature Communications
• Science Advances and so on.
• Open access options in regular journals
• In some journals, authors can choose either the “closed access” or
the “open access” option, the latter with an APC payment (hybrid journal)
• Some journals make articles open access and/or allow authors to
post the published peer-reviewed article to an open repository after an
embargo period (e.g. 6 months) (delayed open access journal)
45. Open access
• NIH Public Access Policy
• Peer-reviewed articles funded by NIH are required to be made
publicly available in “PubMed Central” no later than 12 months
after publication.
• “PubMed Central® (PMC) is a free full-text archive of
biomedical and life sciences journal literature at the U.S.
National Institutes of Health's National Library of Medicine
(NIH/NLM).” (in the PubMed Central web site)
• https://publicaccess.nih.gov/
46. Open access
• Wellcome Trust, UK
• “require electronic copies of any research papers that have been
accepted for publication in a peer-reviewed journal, and are
supported in whole or in part by Wellcome Trust funding, to be
made available through PubMed Central (PMC) and Europe PMC as
soon as possible and in any event within six months of the journal
publisher's official date of final publication”
• https://wellcome.ac.uk/funding/guidance/open-access-policy
• cOAlition S, EU
• “requires that, from 2020, scientific publications that result from
research funded by public grants must be published in compliant
Open Access journals or platforms.”
• https://www.coalition-s.org/
49. Open peer-review
• Standard reviewing process (blind review)
[Figure: standard reviewing process among the author, the journal editor, and anonymous reviewers, with the steps: manuscript submission, reviewing request, review comments, review results. The process is invisible from outside.]
50. Open peer-review
• Open pre-publication peer-review
• Open reviewers’ names, comments and responses by authors after
publication
• e.g. several BMC journals
• Open all processes during reviewing
• e.g. F1000Research
• Open post-publication peer-review
• Reviews and comments on the journal site
• e.g. PLoS One, Scientific Reports
• Reviews and comments on independent sites
• e.g. PubPeer, Publons
51. Open peer-review – open pre-publication peer-review
• https://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-019-1251-7
52. Open peer-review – open pre-publication peer-review
53. Open peer-review – open pre-publication peer-review
• https://f1000research.com/about
54. Open peer-review – open pre-publication peer-review
• https://f1000research.com/articles/7-1352/v2
55. Open peer-review – open post-publication peer-review
• https://www.nature.com/articles/s41598-017-13282-7
56. Open peer-review – open post-publication peer-review
• https://pubpeer.com/publications/B02C5ED24DB280ABD0FCC59B872D04#278
57. Open peer-review
• Recent topics
• Some journals review only the scientific validity of manuscripts. The
impact of articles is left to be evaluated by the community.
• e.g. PLoS One, Scientific Reports, ...
• Acknowledging reviewing activities
• Publons (http://publons.com/)
• Users can show a verified list of their reviewing activities
• Reviewing activities can be used in reviewers' evaluations and
to promote their work
59. Open educational resources
• Open educational resource: “teaching, learning and research
materials in any medium – digital or otherwise – that reside in the
public domain or have been released under an open license that
permits no-cost access, use, adaptation and redistribution by
others with no or limited restrictions”, UNESCO
• https://en.unesco.org/themes/building-knowledge-societies/oer
• In this lecture, we look at various web resources for sharing and
obtaining educational resources for biology and bioinformatics.
• If you want to learn more about “open educational resources”
• UNESCO web site
• https://en.unesco.org/themes/building-knowledge-societies/oer
• OER Commons
• https://www.oercommons.org/
• Open Education Consortium
• https://www.oeconsortium.org/
60. Open Educational Resources
• JoVE (http://www.jove.com/)
• Journal of Visualized Experiments
• Peer-reviewed video articles for scientific protocols
61. Open educational resources
• Learn bioinformatics -- an online resource guide
• https://github.com/smangul1/online.bioinformatics/wiki
62. Open educational resources
• TogoTV (http://togotv.dbcls.jp/)
• Video tutorials for bioinformatics resources
• Originally in Japanese, but English videos are also available
63. Open educational resources
• SlideShare (https://www.slideshare.net/)
• You can share your presentation slides publicly
65. Summary
• Open science – open the overall research processes
• Data production – open data
• Data processing / analysis – open source/methodology
• Reviewing – open peer review
• Article publishing – open access
• Training/education – open educational resource
• Although open science may sometimes be hard to follow, the
practices and tips used in open science are quite useful for
your research and analysis.