Presented at Strategic Conversations at Harvard Library, 9 June 2016
Details are here: http://library.harvard.edu/hlsc
In this talk, Ixchel Faniel from OCLC discussed data reuse practices within academic communities as a means to inform data curation. Knowledge of data reuse and curation processes can shape the activities and services of researchers, librarians, and other information professionals in order to enhance data reuse and accelerate research discoveries.
Ixchel M. Faniel is a Research Scientist at OCLC Research.
Improving Support for Researchers: How Data Reuse Can Inform Data Curation
1. Strategic Conversations at Harvard Library, Boston MA, June 9, 2016
Improving Support for Researchers: How Data Reuse Can Inform Data Curation
Ixchel M. Faniel, Ph.D.
Research Scientist, OCLC
fanieli@oclc.org
@imfaniel
2. Data reuse lets researchers do cool things.
“In 2005, a team of marine biologists…used
inflation-adjusted pricing data from the New
York Public Library’s (NYPL) collection of
45,000 restaurant menus, among other
sources, to confirm the commercial
overharvesting of abalone stocks along the
California coast beginning in the 1920s…”.
(Enis, 2015)
3. “[It] is a lot harder than a lot of people think
because it’s not just about getting the data and
getting some kind of file that tells you what it
is, you really have to understand all the detail
of an actual experiment that took place in order
to make proper use of it usually. And so it’s
usually pretty involved…”.
- Earthquake Engineering Researcher, 10
But data reuse is hard.
(Faniel & Jacobsen, 2010)
4. Dissemination Information Packages for Information Reuse (DIPIR)
The DIPIR Project was made possible by a National Leadership Grant from the Institute of Museum and Library Services, LG-06-10-0140-10, “Dissemination Information Packages for Information Reuse” and support from OCLC Online Computer Library Center, Inc., and the University of Michigan.
5. Research Questions
Our Interest
1. What are the significant properties of social science, archaeological, and zoological data that facilitate reuse?
2. Can data reuse and curation practices be generalized across disciplines?
(Faniel & Yakel, 2011)
6. DIPIR Methodology
                                  ICPSR   Open Context   UMMZ
Phase 1: Project start-up
  Interview staff                    10              4     10
Phase 2: Collecting and analyzing user data
  Interview reusers                  43             22     27
  Survey reusers                  1,480
  Web analytics (server logs)
  Observe reusers                                          13
Phase 3: Mapping data’s context to reusers’ needs
7. Today’s Focus: The Data Reuser
“I’m sort of transitioning from…hunting and herding
…to look at how animals are incorporated into
increasingly complex societies…so the role they
play in the emergence of wealth and elites,
particularly domestic animals, commodity
production and the use of wool as a major
foundation for urban economies in the Bronze
Age…”.
- Archaeologist 13
8. Today’s Focus: The Data Reuser
• Context information needed
– Its direct vs. indirect relationship with the data
• Reasons context information needed
– Role of data quality in the data reuse process
• Sources of context information
– Tracking, linking, and curating varied sources
• Implications for academic libraries
– Shaping data curation activities and services
10. Interviews and Observations
Data Collection
• 92 interviews
• 13 researchers observed at the University of Michigan Museum of Zoology
Data Analysis
• 1st cycle coding
– based on interview protocol
– more codes added as necessary
• 2nd cycle coding for context
– Detailed context needed
– Place get context
– Reason need context
14. “Sometimes they'll simply declare we were only interested
in broad-based information. We were only collecting broad-
based artifacts...So, they're walking huge tracts of land, but
they're only hitting big things…I've heard of things like
shoulder surveys, where they literally walk side by side and
pick those little things, but then, again, you've only, you're
doing a very narrow tract. So there are procedures”.
- Archaeologist 01
Data collection information
15. “People have looked at
morphometrics of lizards
before, but usually they're
not at the skeletal level...
And then…Nobody
measures teeth…I'm
very interested in teeth
and dentition”.
- Zoologist 30
Specimen Information
16. “So I was contacting him for other specific
information. Where was this found, what period did
it date to, and what artifacts were found with it
because that's often not cataloged along [with] this
primary zooarchaeological data nor do you have
access to field notes or anything like that”.
- Archaeologist 02
Artifact Information
17. “But even within the codebook they may tell you
how it's coded. I had this a lot of times and I still
don't know…they just say if this variable belongs to
this category, we coded it as six. But it really doesn't
tell, didn't tell me how they coded it…I wanted to
make my own judgment whether this variable is
exactly what I want. But that won't give me that
indication”.
- Social Scientist 16
Data Analysis Information
18. “At least, if I know other people are
using it in criminology and publishing
in it, and it seems to be a pretty reliable
data source and something that's pretty
useful for criminology, then I know,
‘Okay, let's see if it's got the
information that we specifically want
for our project’”.
- Social Scientist 33
Prior Reuse Information
19. “If it is a tissue sample that's associated
with a voucher specimen, in other words,
it's a tissue sample that was taken from an
animal that wound up in a museum, I would
like to know that. I think that there should
be a field for that, or at least I should be
able to extract that data easily enough, so
that I know whether I can confirm the
taxonomic status of the fish from whence
this tissue came”.
- Zoologist 02
Digitization/Curation Information
20. “There was a relationship
already between the museum
and the university. And having
to be related to a famous
museum that has a reputation, it
does make the source more
reliable…”.
- Archaeologist 04
Repository Information
21. Establishing Trust in Repositories
Role of repository functions and classic trust factors
24. Improving Support for Researchers
• Evaluate data deposit requirements against reusers’ needs
• Shape reusers’ perceptions when and where possible
• Capture and share context information generated
beyond data producers
27. “And part of it is not even about trust. It's about
how much that dataset fits your concept of what
your theory is, it's your operationalization”.
– Social Scientist 03
Data Quality – Relevance
28. “If some fish is identified as X, and it's from Y, and
X doesn't occur in Y, then I would say, ‘Okay, well
that's wrong’. So he's got that... He or she's got that
wrong”.
- Zoologist 08
Data Quality – Credibility
29. “But understanding them in context and sort of
defining them. I realized that there was a lot of
potential for the data and for the site itself. And
that's in large part, thanks go to the diligence of
the original excavators, because without the
accompanying documentation…”.
- Archaeologist 20
Data Quality – Interpretability
30. Data Quality – Ease of operation
“What I really noticed was that almost every
survey asked the questions slightly
differently…some of the studies that were done in
America actually copied the exact wording and
the scaling of a particular question…some
countries decided to do it just a little bit
differently…so it was difficult to compile and
harmonize the data”.
- Social Scientist 30
31. “Tails are kind of tough to study…The last several
tail vertebrae are very, very small, might get left
behind when the specimen's prepared…So
actually a lot of the skeletons in here would not
work for us. And that's something you really don't
know until you get the specimen”.
- Zoologist 36
Data Quality – Completeness
32. “…that [aggregator repository] targets so many
different collections that once you have access
you know pretty much…You can identify very
quickly what you need”.
- Zoologist 13
Data Quality – Accessibility
33. “…that Germans in Munich tradition is one
of the respected traditions for
zooarchaeology in the Old World. So, those
senior scholars and then their students are
the ones that you trust”.
– Archaeologist 13
Data Quality – Data Producer Reputation
35. Survey of Social Science Data Reusers
• Used ICPSR’s bibliography of data-related literature
• Surveyed 1,480 data reusers
• First authors on journal articles published 2009-2012
• 16.8% response rate
(Faniel, Kriesberg, & Yakel 2016)
36. What data quality attributes influence data reusers’ satisfaction after controlling for journal rank?
Variable                      B
Constant                      -.030
Data relevancy                .066
Data completeness             .245***
Data accessibility            .320***
Data ease of operation        .134*
Data credibility              .148*
Documentation quality         .204**
Data producer reputation      .008
Journal rank                  .030
Model Statistics
N                             237
R2                            55.5%
Adjusted R2                   54.0%
Model F                       35.59***
*p < .05, **p < .01, ***p < .001
(Faniel, Kriesberg, & Yakel 2016)
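The model statistics on slide 36 can be cross-checked with a little arithmetic. The sketch below is not from the talk: it assumes an ordinary least squares regression with eight predictors (the seven data quality attributes plus journal rank, not counting the constant), which the slide does not state explicitly. The function names are illustrative. Under that assumption, the textbook formulas for adjusted R2 and the overall F statistic should land close to the reported 54.0% and 35.59, with small differences expected because the reported R2 is itself rounded.

```python
# Values reported on slide 36 (Faniel, Kriesberg, & Yakel 2016).
n = 237      # respondents in the model (N)
r2 = 0.555   # reported R2, already rounded on the slide
# ASSUMPTION: k = 8 predictors, i.e. the eight rows of the table
# other than the constant. The slide does not state this directly.
k = 8


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R2 for a linear model with n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def model_f(r2: float, n: int, k: int) -> float:
    """Overall F statistic implied by R2, n cases, and k predictors."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


adj = adjusted_r2(r2, n, k)
f_stat = model_f(r2, n, k)

print(f"Adjusted R2 from rounded inputs: {adj:.3f} (slide reports .540)")
print(f"Model F from rounded inputs: {f_stat:.2f} (slide reports 35.59)")
```

Both recomputed values agree with the slide to within rounding, which suggests the table is internally consistent under the eight-predictor assumption.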
37. Improving Support for Researchers
• Meet changing needs throughout the data reuse process
• Evaluate repository success in different ways
• Shape documentation quality to meet reusers’
expectations
38. Sources of context information
Tracking, linking, and curating varied sources
39. Seven Key Sources of Contextual Information
(Faniel & Yakel, forthcoming)
40. Improving Support for Researchers
• Shape documentation during data creation
• Recognize people as an important source of context
information
• Use reuse to inform potential reusers (e.g. reuse
metrics, DOIs, data citations, bibliographies of data
reuse)
41. Related Work
• DIPIR
– http://www.oclc.org/research/themes/user-studies/dipir.html
• E-research and Data: Opportunities for Library
Engagement
– http://www.oclc.org/research/themes/user-studies/e-research.html
• Beyond Management: Data Curation as Scholarship in
Archaeology Project Description
– http://alexandriaarchive.org/projects/bridging-creation-and-reuse/
42. Acknowledgements
• Institute of Museum and Library Services
• Co-PI: Elizabeth Yakel (University of Michigan)
• Partners: Nancy McGovern, Ph.D. (MIT), Eric Kansa, Ph.D. (Open Context),
William Fink, Ph.D. (University of Michigan Museum of Zoology)
• OCLC Fellow: Julianna Barrera-Gomez
• Doctoral Students: Rebecca Frank, Adam Kriesberg, Morgan Daniels, Ayoung
Yoon
• Master’s Students: Alexa Hagen, Jessica Schaengold, Gavin Strassel,
Michele DeLia, Kathleen Fear, Mallory Hood, Annelise Doll, Monique Lowe
• Undergraduates: Molly Haig
43. References
Enis, Matt. 2015. “Wisdom of the Crowd | Digital Collections.” Library Journal, July 13.
http://lj.libraryjournal.com/2015/07/technology/wisdom-of-the-crowd-digital-collections/#_.
Faniel, Ixchel M., and T.E. Jacobsen. 2010. “Reusing Scientific Data: How Earthquake Engineering Researchers Assess the
Reusability of Colleagues’ Data.” Computer Supported Cooperative Work 19 (3–4): 355–75.
doi:10.1007/s10606-010-9117-8.
Faniel, Ixchel M., and Elizabeth Yakel. 2011. “Significant Properties as Contextual Metadata.” Journal of Library Metadata 11 (3–4):
155–65.
Faniel, Ixchel M., Adam Kriesberg, and Elizabeth Yakel. 2012. “Data Reuse and Sensemaking among Novice Social Scientists.”
Proceedings of the American Society for Information Science and Technology 49 (1): 1–10.
doi:10.1002/meet.14504901068.
Faniel, Ixchel M., Eric Kansa, Sarah Whitcher Kansa, Julianna Barrera-Gomez, and Elizabeth Yakel. 2013. “The Challenges of Digging
Data: A Study of Context in Archaeological Data Reuse.” In Proceedings of the 13th ACM/IEEE-CS Joint Conference on
Digital Libraries, 295–304. JCDL ’13. New York, NY, USA: ACM. doi:10.1145/2467696.2467712.
Yakel, Elizabeth, Ixchel Faniel, Adam Kriesberg, and Ayoung Yoon. 2013. “Trust in Digital Repositories.” International Journal of
Digital Curation 8 (1): 143–56. doi:10.2218/ijdc.v8i1.251.
Faniel, Ixchel M., Adam Kriesberg, and Elizabeth Yakel. 2016. “Social Scientists’ Satisfaction with Data Reuse.” Journal of the
Association for Information Science and Technology 67 (6): 1404–16. doi:10.1002/asi.23480.
Faniel, Ixchel M., and Elizabeth Yakel. forthcoming. “Practices Do Not Make Perfect: Disciplinary Data Sharing and Reuse
Practices and Their Implications for Repository Data Curation.” In Curating Research Data Volume 1: Practical Strategies
for Your Digital Repository. Chicago, IL: Association of College and Research Libraries Press.
Additional references for the DIPIR project: http://www.oclc.org/research/themes/user-studies/dipir/publications.html