The life-sciences as a pathfinder in data-intensive research practice
Dr Andrew Treloar, Director of Technology
11 July 2014 CC-BY-SA, @atreloar 1
Structure of presentation
• Research Lifecycles
• Functions of Scholarly Communication
• Pointers to the future
• Characterising the future
• Pathfinder problems
• Conclusions
So many lifecycles…
11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 3
Minimal Research Lifecycle
Think
Do
Share
Sharing: Scholarly Communication System and its Functions
• Registration
• Certification
• Awareness
• Archiving
(Roosendaal and Geurts, 1997)
System of Journals
• Registration
  • submission of manuscript
• Certification
  • peer-review (pre-publication)
  • commentary (post-publication)
• Awareness
  • discovery services
• Archiving
  • libraries (print)
  • publishers (electronic)
  • special purpose organisations (e.g. Portico)
Pointers to the future
“the future is already here – it’s just not very evenly distributed”
William Gibson, NPR interview
Registration: BioRxiv
Registration: Github
Registration: WikiPathways
Registration: NeuroLex
Registration: Nanopublications
Registration: some observations
• Decoupling registration from certification
• Timestamping, versioning
• Registration of various types of objects
• Machines as creators and contributors
Certification: PubMed Commons
Certification: PubPeer
Certification: Publons
Certification: some observations
• Peer-review decoupled from publication process
• Certification of various types of objects
• Machines validating form
• Social endorsement
Awareness: myExperiment
Awareness: eLabNotebook RSS
Awareness: Twitter
Awareness: some observations
• Awareness for various types of objects
• Real-time awareness
• Awareness support targeted at machines
• Awareness through social media
Archiving: PDB
Archiving: GenBank
Characterising the future
• Research process: Hidden → Visible
• Nature of object: Fixed → Varying
• Process of making public: Discrete → Continuous
• Speed of communication: Delayed → Instant
• Atomicity of object: Atomic → Compound
• Communicated object: Publication + data proxies → Publication + linked data + linked models
• Nature of process: Formal → Informal
Fundamental changes
• The research process (objects, social dimension) is becoming more exposed
• Articles, books are no longer the only relevant objects for research communication
• Objects are no longer static
• Machines are joining humans as (co-)creators and consumers of research objects
Pathfinder problems
• Integrity of the scholarly record
• The three obsolescences
  • hardware
  • file format
  • software
System of Journals: Archiving
Web of Objects: Archiving?
Not just citation relationships
The problem of obsolescence
• The life-sciences research environment can be viewed as undergoing a process of accelerated evolution
• Other disciplines will hit these problems in time
Cambrian explosion
Hardware obsolescence: Roche 454
Software obsolescence: too much choice, not enough support
Abandonware
• “Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”
• Sam Jaffe, Scientists Abandon their Software, The Scientist, Feb 16, 2004
File format obsolescence: Illumina
• Probability of error in basecalling is encoded using ASCII codes to reduce file size
• The meaning of the ASCII codes changed along the life cycle, so data generated at different points in time might encode quality differently
• “If you get an error like "Invalid quality score value", your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You'll need to add the option "-Q33" to your FASTX Toolkit arguments”. Obviously…
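To make the offset problem on this slide concrete, here is a minimal Python sketch of how the same quality characters decode under the two offsets the quoted error message mentions (33 and 64). The function names and the thresholds in `guess_offset` are illustrative assumptions, not part of the FASTX Toolkit or of any Illumina software; the error-probability formula is the standard Phred definition.

```python
from typing import List, Optional

SANGER_OFFSET = 33        # "Sanger (offset 33)" in the quoted message
OLD_ILLUMINA_OFFSET = 64  # "Illumina (ASCII offset 64)" in the quoted message


def decode_quality(quality: str, offset: int) -> List[int]:
    """Turn a FASTQ quality string into scores for a given ASCII offset."""
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        raise ValueError(f"quality string is not valid for offset {offset}")
    return scores


def error_probability(score: int) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


def guess_offset(quality: str) -> Optional[int]:
    """Hypothetical heuristic; the two thresholds are assumptions."""
    if not quality:
        return None
    codes = [ord(ch) for ch in quality]
    if min(codes) < 59:
        # Characters this low cannot occur in an offset-64 file.
        return SANGER_OFFSET
    if max(codes) > 74:
        # Above 'J' (score 41 at offset 33); assumed unusual for offset 33.
        return OLD_ILLUMINA_OFFSET
    return None  # ambiguous: both readings are plausible
```

Under these definitions the string "IIII" decodes to a score of 40 per base at offset 33 but only 9 per base at offset 64, and `guess_offset("IIII")` returns `None`. The file itself does not say which reading is right, which is the obsolescence problem the slide describes.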
Everett Rogers, Diffusion of Innovations, 1962
Conclusions
• Need to move to a smaller number of standard file formats
• Need to move to a more sustainable model of software development and maintenance
• Need to encourage platform manufacturers to innovate around the hardware, not the software
• NOTE: other disciplines are looking to the life-sciences to work out how to solve some of these problems
On best practices in the development of bioinformatics software, Front. Genet., 02 Jul 14
• Source code available to reviewers
• Software indexed, citable, available
• Source code documented
• Source code managed
• Test libraries, sample data and dataset repositories available
Questions?
• andrew.treloar@ands.org.au
• @atreloar
• https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice

Editor's Notes

  1. The story being told here might initially seem to be in pieces, but there is a common thread. The point of the first section is to give broad context for the two case studies
  2. Increasingly, Share is bleeding into Do, so let’s zoom in on this
  3. Want to provide a series of snapshots of the future drawn from lifesciences
  4. Sourceforge is another example
  5. DNA variant of NG_000007.3 (hemoglobin) Sardinian population Provenance: authors of the article from which the nanopub was mined
  6. Content: Post-publication peer review of pubs
  7. Content: Post-publication peer review of pubs
  8. Publons aims to change all that. Members of the site can import papers, rate them, and discuss them. In ongoing discussions, members can endorse reviews. When the endorsements reach a certain threshold, the review gains a digital object identifier (DOI), turning it into an object that can be cited in more traditional academic literature.
  9. Content: Multiple sources checking the validity/classification of data
  10. Content: Multiple sources checking the validity/classification of data
  11. Content: Multiple sources checking the validity/classification of data
  12. Could also have had this for Registration, of course
  13. Content: Multiple sources checking the validity/classification of data
  14. Problem of reproducibility is just part of the problem
  15. Integrity used to be based on reliable archives
  16. Accelerated evolution (again, like Cambrian explosion)
  17. Not supported after 2016
  18. Omictools, Seqanswers. I am reminded a bit of the early days of computing and the proliferation of word processors
  19. One way to think about this problem is in terms of diffusion of innovation
  20. So no pressure then…