London Open Data Meetup - March 2014
The Open Knowledge Foundation
Wednesday, March 5, 2014, London, United Kingdom
http://www.eventbrite.co.uk/e/london-open-data-meetup-march-2014-tickets-10574052275
sciforge - Publication and Citation of Scientific Software with Persistent Identifiers
1. Publication and Citation of Scientific Software
with Persistent Identifiers
London Open Data Meetup - The Open Knowledge Foundation
Wednesday, March 5, 2014, London, United Kingdom
Martin Hammitzsch, Helmholtz Centre Potsdam, GFZ German Research Centre for Geosciences
3. Why?
Software development in general is not perceived as a scientific
achievement, similar to the situation of research data years ago.
However, the development of software accounts for an increasingly
prominent space in research; especially in the natural sciences,
software has become an indispensable commodity.
Software has become an integral part of science,
yet software is not properly integrated into the
scientific discourse.
4. The missing link
“Establish the missing link between papers and data publications.”
• Findings, papers, data … and software?
– Data is already professionally published, either with papers or as self-contained publications
– This is not yet standard practice for the related software
– Findings are not only based on raw data, they are also based on data obtained in
analyses most likely supported by software
• Software is the link between the findings presented in papers
and the data the findings are based on.
– Software used to gain findings plays a crucial role in the scientific work
– However, software is rarely seen as publishable in terms of scientific publications
– Researchers may be unable to reproduce the findings without the software, which conflicts
with the principle of reproducibility in the natural sciences
• The provision of software lacks solutions serving researchers’
needs.
– Software publications would fix the missing link between data and papers of findings
– Software publications would foster their interplay
5. Scientific achievement
“Make software recognized as scientific achievement.”
• Disciplinary journals require that articles discuss scientific problems.
– Software is often seen only as a contribution to the solution of a question or problem
– Software is not perceived as an independent contribution to science
– Authors of software must first find a question to motivate the publication in a desired journal
• A direct release of software as a scientific publication in its own right is not
possible.
– The scientific achievements of software and its contributions to science are poorly perceived and
hardly measurable
• The resulting gap in interdisciplinary communication regarding
scientific software might be closed by software publications.
– It requires common understanding of how to handle scientific software with defined processes
– It requires commonly accepted and adopted metrics
– Thus software could be valued and assessed as a contribution to science
6. Open science
“Leverage open access and open science.”
• Scientific software development often implies that the software and code are
not written for others to use.
– Code is kept and maintained on the developers' own computers and servers
– If the code grows or groups work together, code repositories and version control systems are set up
– In many cases these systems are available for internal use only and are usually not reachable from the outside
• Reuse mainly happens informally or anonymously, even in the sciences.
– Scientists use existing software and code from open source software repositories
– Only a few contribute their code back into the repositories
• For cooperation and reuse of software, there are already a number of
software platforms.
– SourceForge and GitHub are already used by scientists
– These platforms partly fulfill scientific needs to provide software and code as part of the scientific tradition
– It is unclear whether these platforms can be augmented for scientific purposes or whether special
repositories must be created
• Subsequent users have to be able to run the code.
– This requires the provision of sufficient documentation, sample data sets, tests and comments, which in
turn can be verified by adequate and qualified reviews
– This assumes that scientists learn to write and release code and software as they learn to write and
publish papers
7. Best practices
“Establish standard software engineering rules, best practices and processes in science.”
• The treatment of source code is associated with additional work that is not
covered in the primary research task.
– Most scientists have little incentive to improve code
– They do not publish code, either with their papers or as self-contained publications
– Software engineering habits are rarely practised by faculty and research facility staff, postdocs,
doctoral and graduate students, and thus by undergraduate students
– Software engineering skills are not passed on to successors the way paper-writing skills are
• It is often felt that the software or code produced is not publishable.
• Adoption of software engineering rules and best practices has to be
recognized and accepted as part of the scientific performance.
– This includes code design, version control, documentation, and testing …
– To safeguard traceability and reusability, this scientific work has to be planned and supported
– This includes the adoption of processes following the software development life cycle
– The quality of software and its source code has a decisive influence on the quality of research results
• Establishing best practices from software engineering, not only adopted but
also adapted to serve scientific needs, is crucial for the success of software
publications.
8. Where is it going?
Find and implement solutions serving researchers’ needs regarding
software used in a scientific context so that software development
can be part of the academic tradition and thus is regarded as a
scientific achievement of its authors.
Recognize, create, and act upon opportunities
for the development of concepts establishing
defined processes and a reference platform.