3. http://www.flickr.com/photos/jsmjr/62443357/
Most of us would agree that science progresses by standing on the shoulders of those who came before. Or by kneeling on their backs. Or clambering up their work any
other way we can.
11. This is a great story, right? And why we are all here.
But it is also a great metaphor for the problem.
12. http://www.flickr.com/photos/davemurr/4592014327/
What exactly do broad shoulders get the individual researcher?
Pain!
Because a few citations, as much as we'd like to think otherwise, aren't enough to offset the hard work and the Fear, Uncertainty, and Doubt that accompany the costs of uploading
a dataset in the current culture.
19. http://www.flickr.com/photos/commissariat/4829261601/in/faves-30112411@N02/
somebody else gets to be top dog. And I think a lot of researchers actually believe that by
making their shoulders broader they enable others to become top dog at their expense.
21. Gleditsch et al. 2003. Posting Your Data: Will You Be Scooped or Will You Be Famous? International Studies Perspectives 4(1): 89–97.
Piwowar et al. 2007. Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE.
Ioannidis et al. 2009. Repeatability of Published Microarray Gene Expression Analyses. Nature Genetics 41: 149–155.
Pienta et al. 2010. NSR Social Science Secondary Use. Michigan IR.
Henneken et al. 2011. Linking to Data – Effect on Citation Rates in Astronomy. ESO.
Sears 2011. Data Sharing Effect on Article Citation Rate in Paleoceanography. AGU.
Don't get me wrong, I'm a fan of studies that show a citation benefit for sharing data :). But it won't be enough.
28. http://www.flickr.com/photos/davemurr/4592014327/
What exactly do broad shoulders get the individual researcher?
Pain!
Because a few citations, as much as we'd like to think otherwise, aren't enough to offset the hard work and the Fear, Uncertainty, and Doubt that accompany the costs of uploading
a dataset in the current culture.
30. We need to facilitate
deep recognition of the
labour of dataset creation.
We need to facilitate deep recognition of the labour of dataset creation. Hat tip John Wilbanks.
Ok let me say that again because it is so important
We need to facilitate deep recognition of the labour of dataset creation.
38. http://total-impact.org
A CV is sort of bland, don't you think? It has no context of use.
One version of a more useful future comes from a tool called total-impact. Continuing a project that started as a hackathon at the Open Society Foundation
workshop Beyond Impact, organized by Cameron Neylon here in the UK last spring, Jason Priem, me, and a few other people have been building total-impact.
http://total-impact.org
40. http://total-impact.org
You can drill in.
The metrics are citations, but also altmetrics. PLoS has done some of the groundbreaking work in this space with article-level metrics, but a lot of other metrics are available
too...various indications that others have found your research worth bookmarking, or blogging, or referencing on Wikipedia.
41. http://total-impact.org
Also non-traditional research products like datasets.
total-impact doesn't currently look for dataset identifiers in public R packages, but it could, for example, as an indication of use.
This makes a “live CV”, if you will, giving post-publication context to research output.
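As a hypothetical sketch (not something total-impact does today), scanning a package's code or documentation for dataset identifiers could be as simple as a regular-expression pass. The example below looks for GEO series accessions (the `GSE` + digits format used by the Gene Expression Omnibus, the repository mentioned later in this talk); the function name and sample text are invented for illustration.

```python
import re

# GEO series accessions look like "GSE" followed by digits, e.g. GSE2109.
GEO_ACCESSION = re.compile(r"\bGSE\d+\b")

def find_geo_accessions(text):
    """Return the unique GEO series accessions mentioned in text, in order."""
    seen = []
    for match in GEO_ACCESSION.findall(text):
        if match not in seen:
            seen.append(match)
    return seen

# Invented sample standing in for an R package's documentation:
sample = "This package bundles expression data from GSE2109 and GSE10072."
print(find_geo_accessions(sample))  # → ['GSE2109', 'GSE10072']
```

Each accession found in a third-party package would be one small, machine-countable signal that the dataset has been reused.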
48. I'll splash by a few graphs of preliminary research findings... come find me or my blog if you want more info.
Using manual annotation we are starting to be able to estimate third-party reuse, in terms of raw numbers, with extrapolations.
49. Teasing out use by the original authors from use by third parties who probably only got access to the data because of the repository. Tools that support data citation will help
with this.
50. We have observed reuse of at least 35%
of GEO datasets submitted in 2005.
And the distribution of data use across all of the datasets in the repository. Is it 1% of the datasets that
drive all the use? Nope, it looks like use is often distributed across a broad population of datasets.
51. Piwowar, Vision, Whitlock (2011)
Data archiving is a good investment.
Nature letter to the editor: 473, p285.
http://researchremix.wordpress.com/2011/05/19/nature-letter/
This sort of information is very valuable for repositories when they want to make their case.
As I said, right now we can get some of this information through a lot of painful manual searching
across the internet. Data citations will help reduce some of this burden.
52. Indispensable
What repositories really want, though -- correct me if I’m wrong -- is to show that they are indispensable. That they generate new, profound science not otherwise
possible. That they are a great financial investment in scientific progress. This requires knowing more than just a citation count; it requires knowing the context of reuse. This
means we need access to the full text of the paper that cites the data.
54. http://www.flickr.com/photos/n2artscapes/3527520456/
They want to know the impact the data had on society. Did it facilitate innovation, reduce discrimination, create jobs, save the rainforest, increase our GDP?
That kind of tracking is beyond what any of us know how to do yet :)
We're going to need digital tracking technology that, as far as I know, isn't available yet, but I'm sure people are working on it. Google Analytics meets digital RFID tags... I
dunno... but I do know we need it. Furthermore, we need these digital tracking mechanisms to be affordable and open, to facilitate mashups.
55. Ok, so with that sort of future vision for tracking, what do we, as a scholarly ecosystem, need to power this future world?
56. innovation and
experimentation
We need innovation and experimentation.
58. open access to citation data
We can't just rely on Scopus, Thomson, and Google Scholar.
Those are only three players. They are good at what they do and have been invaluable, but they can't possibly be as nimble as a whole bunch of startups.
It is taking them a long time to come out with a data tracking tool. Why? Probably because they have an ambitious vision and need time to fit it into their other product
offerings. That isn’t a bad thing... but at the same time, some of the rest of us would be happy with iterating on a quick and dirty solution.
We need more competition in this space. The barrier to entry is extraordinarily high because, of course, reference lists are almost all behind copyright and paywalls... but open
access publications give us a toehold.
59. open access to full text
Open access to full text.
Open access also gives us a toehold into citation context information.
A citation to a dataset tells us that the dataset played some role in that new research paper. What role? Was it used to validate a new method? Detect errors? Was it combined
with other datasets to solve a problem that was otherwise intractable? The answers to these questions are fundamental to what funders and others need to know about impact.
It won't be easy to derive them from the text of the paper, but I strongly believe it is possible.
60. open access to other metrics
Open access to other use.
We need broad-based metrics... not just citations, but blog posts about data, slides that include R and Stata tutorials about data, bookmarks to data on bookmarking sites:
altmetrics. If you run a data repository, make your download stats publicly available. We frankly don't know what all of this info means yet, but we didn't know what citations
to papers meant 50 years ago either. We'll all figure it out; the more data the better.
74. thank you
Todd Vision,
Jonathan Carlson, Estephanie Sta Maria,
Jason Priem, total-Impact and Beyond Impact
Dryad and DataONE teams
The open science online community and those who
release their articles, datasets and photos openly
blog: ResearchRemix.wordpress.com
@researchremix
thank you
75. 1. raise our expectations
2. raise our voices
3. get excited and make things