Second Open Economics Workshop - Thoughts from the Biosciences
1. Thoughts from the Biomedical Sciences
Philip E. Bourne
UCSD
pbourne@ucsd.edu
Second Open Economics Workshop, June 11, 2013
2. My Perspective is Drawn from Being:
– A data producer and a data user*
– An overseer of data curation efforts
– A database provider (PDB & IEDB)
– Suspicious of workshop reports, data standards bodies …
– A supporter of data publication
– An open access journal founder
– Opinionated
3. The Big Picture
The Good News:
– NLM (Entrez): a great job
– Open data/software/papers have spawned science and jobs
– Success stories: ENCODE, PDB
– D2K?
The Bad News:
– We have resources, but they are now perceived as silos
– Lack of reproducibility revealed
– Sustainability is unsolved
– Failures: caBIG, DataNet
– D2K?
4. The Big Picture: What Is the Way Forward?
Driven by scientific outcomes, not "build it and they will come"
Community, community, which means:
– A simple vision that many stakeholders can buy into
– Transparency
– Shared ownership
– A code of conduct
– A reward system for individuals and teams
– Strategic policies, e.g., open access and data sharing plans
– Use resources as drivers: funding bodies, societies, and institutions have a role here
– Building trust through quality data and software
5. Personal Experiences to Support My Big Picture View
Worldwide Protein Data Bank, www.wwpdb.org
6. It's All About Trust
PDB: trust in the data is perhaps our biggest achievement
7. It's All About Trust
Trust is like compound interest
Comes from listening
Comes from engaging the community in every aspect of the process
Comes from data consistency and the level of annotation
Comes from responsiveness
Comes from the quality of the delivery service
8. Data Quality Begets Trust
About 25% of our budget has been spent on data remediation
Support for versioning, hence the copy of record
Our ontology/data model has been a critical component of our workflow and of data accuracy
Until recently, that same data model was too complex to facilitate wide adoption by others who use our data
10. It's All About People
The Users
Constantly striving to have users distinguish raw from derived data
Not all data are created equal, but the user thinks they are
11. It's All About People
The Global Personalities
12. It's NOT All About Institutions
As far as I am aware, no data standards body has directly influenced anything we have done in 15 years of running the PDB
The structural biology community created a very successful data sharing plan long before funding bodies did
13. It Is About Openness
There are no restrictions on the use of the data beyond attribution
The PDB runs exclusively on open source software
We maintain and contribute to the BioJava repository
We need to be transparent about data usage
14. So What Needs to Change Regarding Data?
Worldwide Protein Data Bank, www.wwpdb.org
15. The View That All Data Are Created Equal Must End
We need to understand how data are used
Sustainability is not about more money from the funding agencies; it's about business models
Reductionism is not a dirty word: reference data!
We need to do more with the long tail
On the Future of Genomic Data. Science, 11 February 2011, vol. 331, no. 6018, pp. 728-729
16. Institutions That Generate Data Must Play a Greater Role
We need institutional data sharing plans
We need data scientists to be better recognized by institutions: it's not all about papers, which implies new metrics
17. Acknowledgements
www.force11.org
– Tim Clark
– Ivan Herman
– Paul Groth
– Ed Hovy
– Maryann Martone
– Cameron Neylon
– David Shotton
– Anita de Waard
www.plos.org
Beyond the PDF
Many others
Funding Agencies:
NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK
18. The (Lack of) Distinction Between Data and Knowledge Needs to Be Better Appreciated
• The PDB paper has been cited 14,000 times
• No one has ever read it
• Some PDB datasets have thousands of downloads
• These data are not associated with publications