6. Research Data
"Research data is defined as recorded factual
material commonly retained by and accepted in
the scientific community as necessary to
validate research findings; although the majority
of such data is created in digital format,
all research data is included irrespective of the
format in which it is created."
https://www2.le.ac.uk/services/research-data/rdm/what-is-rdm/research-data
9. Fake Data, Fake Research
http://www.bbc.com/news/science-environment-39357819
11. Open Science (incl. Data) Defined
“Open Science is the practice of science in such a
way that others can collaborate and contribute,
where research data, lab notes and other
research processes are freely available, under
terms that enable reuse, redistribution and
reproduction of the research and its
underlying data and methods.” - FOSTER Project,
funded by the European Commission
13. Open Notebook Science
“A laboratory notebook (lab notebook/lab book) is a
primary record of research. Researchers use a lab
notebook to document their hypotheses, experiments
and initial analysis or interpretation of these
experiments.”
https://en.wikipedia.org/wiki/Lab_notebook
15. Original Research Data Lifecycle image from University of California, Santa Cruz
http://guides.library.ucsc.edu/datamanagement/
Figure labels: Plan, Tools, Repositories, Research Output
17. Working with Data
• Using R, Python, ggplot and more
• Collection e.g. Survey
• Normalisation & Cleaning e.g. OpenRefine
• Analysis
• Visualisation
• Preservation
• Mining
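The normalisation, cleaning and analysis steps listed above can be sketched in a few lines of Python. This is only an illustration: the survey records and the field names `institution` and `datasets` are invented for the example, and a tool such as OpenRefine would do the same kind of work interactively.

```python
# Hypothetical raw survey responses; field names and values are illustrative only.
raw_responses = [
    {"institution": " University of Cape Town ", "datasets": "12"},
    {"institution": "university of cape town", "datasets": "7"},
    {"institution": "Sol Plaatje University", "datasets": ""},
    {"institution": "SOL PLAATJE UNIVERSITY", "datasets": "3"},
]


def clean(record):
    """Collapse whitespace, normalise case, and convert counts to integers."""
    name = " ".join(record["institution"].split()).casefold()
    count_text = record["datasets"].strip()
    count = int(count_text) if count_text else None  # keep missing values explicit
    return {"institution": name, "datasets": count}


cleaned = [clean(r) for r in raw_responses]

# Analysis step: total datasets per institution, skipping missing values.
totals = {}
for row in cleaned:
    if row["datasets"] is not None:
        totals[row["institution"]] = totals.get(row["institution"], 0) + row["datasets"]

print(totals)
```

Variant spellings of the same institution collapse into one key, which is the main point of the cleaning step before any analysis or visualisation.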
21. Data Mining
• Set of methods to analyse data from various
dimensions and perspectives, finding previously
unknown hidden patterns, classifying and grouping
the data and summarizing the identified
relationships
The tasks of data mining are twofold:
• Predictive: use known features to predict
unknown or future values of the same or another
feature
• Descriptive: find interesting,
human-interpretable patterns that describe the
data
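To make the two tasks concrete, here is a minimal sketch using only the Python standard library. The observations are made up, a per-group mean stands in for a descriptive pattern, and a least-squares line stands in for a predictive model; real data mining would use far richer methods.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (group label, x, y). Values are invented.
observations = [
    ("A", 1.0, 2.1),
    ("A", 2.0, 3.9),
    ("B", 3.0, 6.2),
    ("B", 4.0, 7.8),
]

# Descriptive task: summarise y per group in a human-readable form.
by_group = defaultdict(list)
for group, _, y in observations:
    by_group[group].append(y)
summary = {group: mean(values) for group, values in by_group.items()}

# Predictive task: fit y = slope * x + intercept by ordinary least squares.
xs = [x for _, x, _ in observations]
ys = [y for _, _, y in observations]
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar


def predict(x):
    """Predict an unseen y value from x using the fitted line."""
    return slope * x + intercept


print("Group means:", summary)
print("Predicted y at x = 5:", round(predict(5.0), 2))
```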
26. Research Methodology
“It is a science of studying how research is to be
carried out. Essentially, the procedures by
which researchers go about their work of
describing, explaining and predicting
phenomena are called research methodology.
It is also defined as the study of methods by
which knowledge is gained.”
https://arxiv.org/pdf/physics/0601009.pdf
28. Benefits of Open Research Data (1)
• Predicts trends, supports informed decisions, informs
policy
• Collaboration advances science, discovery
• Drives development, improves livelihoods of citizens
of countries
• Increases return on investment (funders), avoids
duplication – research is expensive
• More and more entrepreneurs are using data in
innovative ways, creating more jobs, which are much
needed on our continent
29. Benefits of Open Research Data (2)
• Helps improve service delivery e.g. mobile apps,
robots, artificial intelligence (AI)
• Provides evidence for research conducted
• Data potentially has far more outcomes when open,
with a higher impact
• Use for tenure/promotion/measure contribution of
researchers (data citation)
• Open data reduces redundancy
• And more …
30. Fears Researchers Experience
• Getting scooped
• Time & effort required of the researcher
• Someone else finding a path-breaking application
of the data that the researcher hasn’t considered
• Fear of problems/errors in the measurement
process being exposed
• Confidentiality/privacy of respondents - ethics
clearance
• Intellectual Property Rights – signed away, little
understanding, no IP in place
31. Research Data in Support of SDGs
• Protecting banana farmers’ livelihoods (Uganda)
• Using maps to increase access to education (Kenya)
• Monitoring child malnutrition (Uganda)
35. Square Kilometre Array (SKA)
• Data collection on a massive scale
• Telescope array to consist of 250,000 radio
antennas between Australia & SA
• Investment in machine learning and artificial
intelligence software tools to enable data analysis
• 400+ engineers and technicians in infrastructure,
fibre optics, data collection
• Supercomputers to process data (IBM)
• To come: a supercomputer with 3x the power of the
world’s current fastest computer (Tianhe-2) to cope
with SKA data
36. Testing Albert Einstein’s general theory of relativity; imaging
neutral hydrogen (the building block of stars) in the distant
universe; and examining galaxies that were formed billions of
years ago.
“Construction of the SKA is due to begin in 2018 and finish
sometime in the middle of the next decade. Data acquisition
will begin in 2020, requiring a level of processing power
and data management know-how that outstretches current
capabilities.
Astronomers estimate that the project will generate 35,000-DVDs-worth
of data every second. This is equivalent to “the
whole world wide web every day,” said Fanaroff.”
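A rough worked version of the figure in the quote above. The quote does not say which DVD capacity is meant, so this sketch assumes a single-layer DVD of 4.7 GB and decimal units (1 TB = 1,000 GB, 1 EB = 1,000,000,000 GB); treat the result as an order-of-magnitude estimate only.

```python
DVD_CAPACITY_GB = 4.7        # assumption: single-layer DVD
DVDS_PER_SECOND = 35_000     # figure taken from the quote
SECONDS_PER_DAY = 24 * 60 * 60

gb_per_second = DVDS_PER_SECOND * DVD_CAPACITY_GB
tb_per_second = gb_per_second / 1_000
eb_per_day = gb_per_second * SECONDS_PER_DAY / 1_000_000_000

print(f"{tb_per_second:.1f} TB per second")   # about 164.5 TB/s
print(f"{eb_per_day:.1f} EB per day")         # about 14.2 EB/day
```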
37. Data Activity 2: Ornithology
• Go to https://www.movebank.org/
• Browse Tracks
• Search studies that contain data sets for:
Hooded Vulture Africa
• Open in Studies Page.
• What can you do with the study & related data?
• Download the data.
• Sort according to ground speed.
• How many were spotted in Northern Kruger?
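Once the data is downloaded as CSV, the last two steps of the activity could be done in Python roughly as follows. This is a sketch under assumptions: the column names (`ground-speed`, `location-lat`, `location-long`, `individual-local-identifier`) should be checked against the header of the file you actually download, the sample rows are invented, and the bounding box for Northern Kruger is an illustrative guess, not an official boundary.

```python
import csv
import io

# Invented sample rows standing in for a downloaded Movebank CSV file.
SAMPLE_CSV = """individual-local-identifier,timestamp,location-lat,location-long,ground-speed
HV01,2017-03-01 06:00:00,-22.95,31.20,0.4
HV01,2017-03-01 09:00:00,-23.10,31.35,11.2
HV02,2017-03-01 06:00:00,-25.05,31.60,3.7
HV03,2017-03-01 06:00:00,-23.50,31.45,
"""

# Assumption: a rough bounding box for Northern Kruger, for illustration only.
NORTH_KRUGER = {"lat_min": -24.0, "lat_max": -22.3, "lon_min": 30.8, "lon_max": 32.0}


def load_rows(handle):
    """Read tracking rows, converting coordinates and speed to numbers."""
    rows = []
    for row in csv.DictReader(handle):
        speed = row["ground-speed"].strip()
        rows.append({
            "bird": row["individual-local-identifier"],
            "lat": float(row["location-lat"]),
            "lon": float(row["location-long"]),
            "speed": float(speed) if speed else None,  # some fixes lack a speed
        })
    return rows


def in_box(row, box):
    """Return True if the row's position falls inside the bounding box."""
    return (box["lat_min"] <= row["lat"] <= box["lat_max"]
            and box["lon_min"] <= row["lon"] <= box["lon_max"])


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Sort by ground speed, fastest first, ignoring rows without a speed value.
by_speed = sorted(
    (r for r in rows if r["speed"] is not None),
    key=lambda r: r["speed"],
    reverse=True,
)

# Count distinct birds with at least one location inside the box.
birds_in_north = {r["bird"] for r in rows if in_box(r, NORTH_KRUGER)}

print([r["speed"] for r in by_speed])
print(len(birds_in_north), "birds recorded in the box")
```

To run this on a real download, replace `io.StringIO(SAMPLE_CSV)` with `open("your_file.csv", newline="")`.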
39. Data Stakeholders
• Governments (policy)
• Institutions (policy & strategy)
• Research Offices (reporting, impact)
• Researchers (collecting data in an ethical and trusted way
so that it can be re-used)
• Research Ethics Committees (safeguard the dignity, rights,
safety, and well-being of all trial participants)
• Statisticians (processing, analysing and visualising data)
• System engineers (to maintain a network and allow for
data to be digitally transmitted)
• Librarians (managing and organizing the data, and making
sure it is digitally preserved for the unforeseeable future)
40. Why Librarians as Data Partners?
• Information standards
• Organizational skills
• Setting up file structures (organizing
information)
• Knowledge of workflows
• Knowledge of collection management
• Describing data using established metadata
schemes & controlled vocabulary
• Collection curation/preservation
41. Data Skills for Librarians (1)
• Data terminology
• Unix-style command line interface, allowing librarians to
efficiently work with directories and files, and find and manipulate
data
• Cleaning and enhancing data in OpenRefine and spreadsheets
• Git version control system and the GitHub collaboration tool
• Web scraping and extracting data from websites
• Scientific writing in useful, powerful, and open mark-up
languages such as LaTeX, XML, and Markdown
• Formulating and managing citation data, publication lists, and
bibliographies in open formats such as BibTeX, JSON, XML and
using open source reference management tools such as JabRef
and Zotero
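As a small illustration of managing citation data in open formats, the sketch below writes one dataset citation as both JSON and a BibTeX entry. The record, its citation key and its DOI are placeholders invented for the example, and the `@misc` entry type is just one common choice for datasets.

```python
import json

# Hypothetical dataset citation; every value here is a placeholder.
dataset = {
    "key": "doe2018survey",
    "author": "Doe, Jane",
    "title": "Example Survey Dataset",
    "year": "2018",
    "publisher": "Example Data Repository",
    "doi": "10.1234/example",
}


def to_bibtex(record):
    """Format a citation record as a BibTeX @misc entry."""
    fields = [
        f"  {name} = {{{value}}}"
        for name, value in record.items()
        if name != "key"
    ]
    return "@misc{" + record["key"] + ",\n" + ",\n".join(fields) + "\n}"


as_json = json.dumps(dataset, indent=2)
as_bibtex = to_bibtex(dataset)

print(as_json)
print(as_bibtex)
```

Both outputs are plain text, so they can be kept under version control and imported into reference managers such as JabRef or Zotero.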
42. Data Skills for Librarians (2)
• Transforming metadata documenting research outputs into open plain
text formats for easy reuse in research information systems in support of
funder compliance mandates and institutional reporting
• Scholarly identity with ORCID and managing reputation with
ORCID-enabled scholarly sharing platforms such as ScienceOpen
• Authorship, contributorship, and copyright ownership in collaborative
research projects
• Demonstrating best practices in attribution, acknowledgement, and
citation, particularly for non-traditional research outputs (software,
datasets)
• Identifying reputable Open Access publications and Open
Institutional/Open Data repositories
• Scholarly annotation and open peer review
• Investigating and managing copyright status of a work, and evaluating
conditions for Fair Use
43. Role of Librarians
• Initiate conversation on Open Science and Open Data
policy & strategy, and help implement them
• Develop own data skills (and stay informed on
copyright, licensing, citation)
• Advocate for transparency, openness in research,
access to data & provide support
• Recommend trusted data repositories
• Manage & register trusted data repositories
• Increase visibility of research data
• Promote & support proper research data
management planning among researchers
47. Example: British Library Data Strategy
http://blogs.bl.uk/files/britishlibrarydatastrategyoutline.pdf
High-level plan to achieve one or more
goals under conditions of uncertainty
Where are you? Where do you want to
be? And how will you get there?
Data Management Planning, Data
Curation, Data Archiving & Preservation,
Data Access, Discovery and Reuse
48. Example: UCT Research Data
Management Policy
http://www.digitalservices.lib.uct.ac.za/sites/default/files/image_tool/images/346/TGO_Policy_Research_Data_Management_2018_V6.pdf
Introduction – Purpose Statement – Definitions
– Objectives of the Policy: Benefits of Data
Availability & Reuse – Scope of the Policy –
Criteria for Selection of Research Data –
Stakeholder Roles & Responsibilities – Provision
of Research Data Management Infrastructure –
Data Management Planning – Discovery &
Reuse – Recognition & Reward for Data
Providers – Monitoring & Reporting
Requirements – Related Policies
49. Open Science Open Data Policy
http://learn-rdm.eu/wp-content/uploads/red_LEARN_Elements_of_the_Content_of_a_RDM_Policy.pdf
50. Job Description/Work
Agreement/KPAs
“developing a flexible curriculum on data management;
meeting with researchers in individual and group settings to
consult on projects, planning, and best practices; exploring
and piloting base-line services in curation practices and
techniques; and creating documentation and guidelines
related to scholars’ emerging data management needs. Other
activities may include ongoing assessment and monitoring of
researcher needs, proactive development of knowledge and
expertise in data management issues across disciplines and
domains, and advising researchers on how to meet the data
management and open data requirements of publishers and
federal funding agencies. This individual will be central to
efforts to design appropriate data repository and storage
infrastructure for researchers across the University.”
http://www.arl.org/component/jsfsubmit/showAttachment?tmpl=raw&id=00Pd000000FAxNkEAL
51. Business Plan
• How will the service be aligned &
implemented?
• Describe service
• How will it be marketed?
• Financial forecasting
• Etc.
• Pilot with champions
• Budget
53. Self- & Lifelong Learning
• Bachelor of Science in Data Science, Sol Plaatje University
(South Africa)
• Coursera Data Science
• Coursera Research Data Management and Sharing*
• Foster Open Science Courses*
• MANTRA for Researchers
• MANTRA for Librarians*
• Author Carpentry
• Data Carpentry
• Library Carpentry
• WDS Training Resources
• UCT eResearch
66. Data Repositories vs Social Media
• Social media sites/3rd party software:
• Connect researchers sharing interests
• Marketing data
• Sites, and the data on them, belong to third parties
• Repository:
• Supports export/harvesting of metadata
• Offers long-term preservation
• Non-profit – no advertisements
• Uses open standards and protocols
• Copyright
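Metadata harvesting from a repository is often done over OAI-PMH, one widely used open protocol (the slide does not name a specific one). The sketch below only builds a harvesting request URL; the repository address is hypothetical, and whether a given repository offers an OAI-PMH endpoint should be checked in its documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the OAI-PMH base URL of a real repository.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    `oai_dc` (unqualified Dublin Core) is the metadata format every
    OAI-PMH repository is required to support.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict harvesting to one set
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL)
print(url)
```

Fetching that URL from a real endpoint returns an XML document of metadata records, which is what aggregators and discovery services harvest.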
68. “At Princeton we maintain several data collections
in our DataSpace instance. With the help of our
librarians we devised a custom submission form
tailored towards collecting metadata for data sets.
In addition we have best practice
recommendations, like: add a README file, stick to
formats commonly used in your discipline. The
library developed a Research Data Management
Guide with a section on file formats and data
organization.”
76. Register & Recommend Data
Repositories
• re3data.org
https://www.re3data.org/
• Open Data Barometer
https://opendatabarometer.org/
• Global Open Data Index
https://index.okfn.org/
• African Open Science Platform
http://africanopenscience.org.za/
• Dataverse, and more …
77. Data Activity 3: Find Data Repositories
Find data repositories in a specific discipline and
list at:
https://tinyurl.com/ycx3q2mz
81. What is a Research Data Management
Plan (DMP)?
• Document that outlines what the researcher will do
with the data during & after the research project
• Avoid duplication of effort, plan how to collect
data, address ethical issues, preserve data as
evidence & for re-use
• Comply with funder requirements
82. Types of data - What is the source of your data? In what formats are
your data? Will your data be fixed or will it change over time? How
much data will your project produce?
Contextual details (metadata) - How will you document and describe
your data?
Storage, backup and security - How and where will you store and
secure your data?
Provisions for protection/privacy - What privacy and confidentiality
issues must you address?
Policies for re-use - How may other researchers use your data?
Access and sharing - How will you provide access to your data by
other researchers? How will others discover your data?
Archiving and providing access - What are your plans for preserving
the data and providing long-term access?
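The seven headings above can be treated as a simple checklist. The sketch below is one possible way to flag sections of a draft plan that are still empty; the draft answers are invented, and real funders may require different or additional sections.

```python
# Section headings taken from the list above.
DMP_SECTIONS = [
    "Types of data",
    "Contextual details (metadata)",
    "Storage, backup and security",
    "Provisions for protection/privacy",
    "Policies for re-use",
    "Access and sharing",
    "Archiving and providing access",
]


def missing_sections(plan):
    """Return the DMP sections that have no answer yet."""
    return [s for s in DMP_SECTIONS if not plan.get(s, "").strip()]


# Hypothetical draft plan with only some sections filled in.
draft_plan = {
    "Types of data": "Survey responses in CSV, about 2 GB, fixed after collection.",
    "Contextual details (metadata)": "README file plus Dublin Core record.",
    "Storage, backup and security": "   ",
}

for section in missing_sections(draft_plan):
    print("Still to complete:", section)
```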
89. African Open Science Platform (AOSP)
• Platform = opportunity to engage in dialogue,
create awareness, connect all, provide continental
view
• Funded by SA Dept. of Science & Technology
through National Research Foundation
• 3 years (1 Nov. 2016 – 31 Oct. 2019)
• Managed by Academy of Science of South Africa
(ASSAf)
• Through ASSAf hosting ICSU Regional Office for Africa
(ICSU ROA)
• Direction from CODATA
http://africanopenscience.org.za/
90. Accord on Open Data in a
Big Data World
• Proposes
comprehensive set of
principles
• FAIR Principles
• Data as open as possible,
as closed as necessary
• Provides framework &
plan for African data
science capacity
mobilization initiative –
AOSP
Call to Endorse
93. Phase 1 Deliverables
• Frameworks & Roadmaps
• Open Science & RDM Policy
• Open Science & RDM Research & ICT Infrastructure
• Open Science & RDM Incentives
• Open Science & RDM Capacity Building
• Library Framework
94. Rationale for a Library Framework
• Research is becoming increasingly data-driven
• There is a push towards science and research data
being open and accessible, to advance science in
support of the SDGs
• Librarians increasingly play a role in managing
research output through institutional research
repositories – in a FAIR way (findable, accessible,
interoperable, re-usable)
• In addition, the growing volume of research data must
be managed/curated in a trusted way, and librarians
have the necessary skills to add value – and to
remain relevant
96. Conclusion
Only if research and data are open and
democratized, so that all have equal access,
will it be possible to work towards achieving
the 2030 Sustainable Development Goals.
Librarians need to adapt service delivery to the new
way of doing research (systemic changes), providing
data-related support to researchers.
97. Thank you
Ina Smith
Project Manager, African Open Science Platform Project, Academy of
Science of South Africa (ASSAf)
ina@assaf.org.za
Visit http://africanopenscience.org.za