Presentation given at the International Digital Curation Conference (#IDCC16) in Amsterdam, at the "A Context-driven Approach to Data Curation for Reuse" workshop (organized by Ixchel Faniel and Elizabeth Yakel) on Monday, February 22, 2016
2. Data often discussed
using language of
compliance
(Taylorist perspectives)
3.
● Linked: Links with other systems & data (tDAR, ORCID, etc.)
● Open: Code, data (mainly CC-BY) on GitHub, machine-readable formats, APIs
● Long-term: NSF, NEH data management. California Digital Library archiving
● Global: Mirroring, collaboration with the German Archaeological Institute (DAI)
4.
Role: Publication (editorial & peer-review) and exhibition (like an online museum)
Promote Data Reuse: Attempt to document context and annotate data with common
vocabularies. Increasing emphasis on intervening earlier in the research data “life-cycle”.
5.
Spectrum of Less and More Structure
1. More structured: classification, quantification
2. Less structured: images, field-notes
3. Structured and less structured information need to
cross-reference each other (URIs are useful for this); all of it provides context
6.
Open Context ≠ A conventional digital repository
7.
8.
9.
Information | Stable URI
300m wall circumference (estimated based on geomagnetic sounding, approximate) | http://arcserver.usc.edu/reports/reports/TAA_2000_to_2007.pdf
Wall foundation about 1.8m thick | http://opencontext.org/media/BF565965-98A8-4E84-2318-AFFA983277E1
Brick dimensions: 34 x 31 x 9 cm | http://opencontext.org/subjects/975143F2-B80E-436B-B078-1D67FD848352
Surviving wall height: 1.2 meters | http://opencontext.org/subjects/02B9D6E6-D6AD-4138-7FCC-3EF6F8BD5722
Specific Citation Promotes Reproducibility
1. Look at lots of pictures, read field notes.
2. URIs facilitate reproducibility, link assertions with
specific information sources
URIs & Unstructured Data
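The pairing of assertions with stable URIs on this slide can be sketched as a small data structure. This is a minimal illustration in Python, not Open Context's implementation: the `Assertion` class and the `citations` and `uncited` helpers are hypothetical names, while the claims and URIs are the ones listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One interpretive claim plus the stable URI of its evidence (hypothetical class)."""
    claim: str
    source_uri: str


# Claims and URIs taken from the slide.
ASSERTIONS = [
    Assertion(
        "300m wall circumference (estimated, approximate)",
        "http://arcserver.usc.edu/reports/reports/TAA_2000_to_2007.pdf",
    ),
    Assertion(
        "Wall foundation about 1.8m thick",
        "http://opencontext.org/media/BF565965-98A8-4E84-2318-AFFA983277E1",
    ),
    Assertion(
        "Brick dimensions: 34 x 31 x 9 cm",
        "http://opencontext.org/subjects/975143F2-B80E-436B-B078-1D67FD848352",
    ),
    Assertion(
        "Surviving wall height: 1.2 meters",
        "http://opencontext.org/subjects/02B9D6E6-D6AD-4138-7FCC-3EF6F8BD5722",
    ),
]


def citations(assertions):
    """Format each claim with its source so a reader can check the evidence."""
    return [f"{a.claim} <{a.source_uri}>" for a in assertions]


def uncited(assertions):
    """Return claims that lack a source URI and so cannot be traced back."""
    return [a.claim for a in assertions if not a.source_uri.strip()]
```

Keeping the URI next to each claim means a check such as `uncited(ASSERTIONS)` can flag statements that a later reader would be unable to verify.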
10. APIs (Machine-Readable Data) make it easier to reuse, analyze, visualize, and interpret less structured data.
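As a rough sketch of why machine-readable data helps with less structured material, the Python below tallies keywords across field-note records delivered as JSON. The payload shape (`records`, `uri`, `notes`), the sample text, and the example.org URIs are all assumptions made for illustration; they do not describe Open Context's actual API responses.

```python
import json
from collections import Counter

# Hypothetical API response; real services define their own JSON structure.
SAMPLE_RESPONSE = """
{"records": [
  {"uri": "http://example.org/notes/1", "notes": "Mudbrick wall foundation exposed in trench"},
  {"uri": "http://example.org/notes/2", "notes": "Wall collapse above floor; mudbrick fragments"}
]}
"""


def keyword_counts(response_text, keywords):
    """Count how many records mention each keyword (case-insensitive)."""
    records = json.loads(response_text)["records"]
    counts = Counter()
    for record in records:
        text = record["notes"].lower()
        for word in keywords:
            if word.lower() in text:
                counts[word] += 1
    return counts


def records_mentioning(response_text, keyword):
    """Return the URIs of records whose notes mention the keyword."""
    records = json.loads(response_text)["records"]
    return [r["uri"] for r in records if keyword.lower() in r["notes"].lower()]
```

Because every record carries its own URI, a simple text query like this still leads back to a citable source.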
11.
Open Context ≠ A conventional digital repository
12. Image Credit: Mark Skipper via Flickr (CC-BY)
https://www.flickr.com/photos/bitterjug/7670055210
Challenge of Complexity
13. Entity Relation Diagram:
Anglo-Saxon Graves and Grave Goods of
the 6th and 7th Centuries AD: A
Chronological Framework
John Hines (2013)
http://dx.doi.org/10.5284/1018290
14.
Aspect | Open Context | Digital Repository
Citation | Cite archaeological entities (sites, coins, bones, etc.) | Cite digital files (can contain thousands of items)
Granularity | High (“1 URI per potsherd”) | Low (information aggregated in big files)
Discovery, Querying | Common schema, common index for content, not just metadata | Index metadata only; content is more opaque
Cost | Expensive “Boutique Publishing” | Cheaper, easier to scale. Self-service models.
15. Managing Complexity:
Data about this coin came
from several different files
(relational databases,
spreadsheets)
Some archaeological
projects can have dozens of
different spreadsheets and
databases!
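One way to picture this integration problem is merging rows about a single object from several source tables through a shared identifier. The Python sketch below is hypothetical: the table names, field names, values, and the `coin-001` identifier are invented for illustration and are not taken from any project data.

```python
from collections import defaultdict

# Hypothetical extracts from three separate project files.
FINDS_SPREADSHEET = [{"id": "coin-001", "material": "bronze"}]
CONTEXT_DATABASE = [{"id": "coin-001", "trench": "T4", "locus": "12"}]
PHOTO_LOG = [{"id": "coin-001", "photo": "IMG_0042.jpg"}]


def merge_by_id(*tables):
    """Combine rows from many tables into one record per identifier.

    Each merged record also notes which table (by position) supplied each
    field, so the provenance of every value stays visible.
    """
    merged = defaultdict(lambda: {"fields": {}, "provenance": {}})
    for table_index, table in enumerate(tables):
        for row in table:
            record = merged[row["id"]]
            for key, value in row.items():
                if key == "id":
                    continue
                record["fields"][key] = value
                record["provenance"][key] = table_index
    return dict(merged)


coin = merge_by_id(FINDS_SPREADSHEET, CONTEXT_DATABASE, PHOTO_LOG)["coin-001"]
```

With dozens of source files the merge logic stays the same; the hard part in practice is agreeing on the identifier that links the rows.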
18. Large scale data sharing &
integration for exploring the
origins of farming.
Funded by EOL / NEH
22. Limitations
• Diverse recovery, sampling, identification methods…
• Data modeling problems in sources (esp. teeth)
• Researchers need to understand how to make data better suited for reuse
23. Bootstrapping Problem
• (Linked) Data can feel like having a telephone with nobody to call
• Links with other data can help build context. But relevance can have a very narrow scope
24. Pelagios:
Geographic context emerging as
key way to aggregate multiple
datasets
(PIs: Leif Isaksen, Elton Barker)
25.
● Digital Index of North American Archaeology (DINAA): David G. Anderson, Joshua Wells (PIs). NSF-funded.
● Publishes a gazetteer of archaeological “site” records (from state agencies). (A site is a key concept in archaeology.)
26.
● Cross-referenced site URIs with relevant records in tDAR and other public databases
27. PeriodO (http://perio.do)
• Led by Adam Rabinowitz, Ryan Shaw, Eric Kansa (NEH funding)
• Sometimes little consensus in context (time periods)
28. PeriodO Gazetteer of Periods, modeling:
(1) Temporal scope
(2) Geographic coverage
(3) Scholarly authority [because of disagreements about High, Middle, and Low Chronologies]
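The three modeled aspects can be pictured as fields of a record, with one record per authority's definition of a period. This is a hedged sketch in Python and not PeriodO's actual schema: the class, field names, and placeholder values ("Period X", "Region A", "Source 1") are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodDefinition:
    """One authority's definition of a named period (hypothetical model)."""
    label: str
    start_year: int   # temporal scope; negative values are BCE
    stop_year: int
    regions: tuple    # geographic coverage
    authority: str    # scholarly source asserting this definition

    def covers(self, year, region):
        """True if the year and region fall inside this definition."""
        return self.start_year <= year <= self.stop_year and region in self.regions


# Placeholder values: two sources disagree on the dates of the same label.
DEFINITIONS = [
    PeriodDefinition("Period X", -2000, -1500, ("Region A",), "Source 1"),
    PeriodDefinition("Period X", -1900, -1400, ("Region A",), "Source 2"),
]


def authorities_for(year, region, definitions):
    """List the authorities whose definition includes the year and region."""
    return [d.authority for d in definitions if d.covers(year, region)]
```

Storing the authority alongside the dates lets competing chronologies coexist: the same label resolves differently depending on whose definition is used.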
29. New Publishing Services
1. Open Context will publish
citable, formally modeled
(SKOS) controlled vocabularies
2. Context-informed
reconciliation services to help
researchers / curators link
data
3. Offer a recommendation
service for relevant
vocabularies for researchers
(especially seeking DMP help)
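A context-informed reconciliation step might look like the following sketch, which matches a local term against vocabulary labels and uses the data's context (here, a category) to narrow the candidates. The vocabulary entries, example.org URIs, and the `reconcile` function are hypothetical; fuzzy matching uses Python's standard `difflib` module as a stand-in, which is not a method named in the slides.

```python
import difflib

# Hypothetical controlled vocabulary: label -> (concept URI, category).
VOCABULARY = {
    "sheep": ("http://example.org/vocab/sheep", "fauna"),
    "goat": ("http://example.org/vocab/goat", "fauna"),
    "sherd": ("http://example.org/vocab/sherd", "ceramics"),
}


def reconcile(term, category, vocabulary=VOCABULARY, cutoff=0.6):
    """Suggest concept URIs for a local term, limited to one category."""
    # Context filter: only labels from the same category are candidates.
    labels = [label for label, (_, cat) in vocabulary.items() if cat == category]
    # Fuzzy match tolerates plurals and small spelling differences.
    matches = difflib.get_close_matches(term.lower(), labels, n=3, cutoff=cutoff)
    return [vocabulary[label][0] for label in matches]
```

A curator would still review the suggestions; the category filter simply keeps a term from being linked to a similar-looking label in an unrelated domain.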
30. Final Thoughts
(Finally) some examples of data
reuse and integration (in
archaeology).
In many cases, reuse is still
aspirational. Need long time
scales to develop context.
“Context” is a hard research
problem (including theoretical);
requires better practice at each
stage of the data life-cycle.