Rescue of Long-Tail Data
from the Ocean Bottom to the Moon

Leslie Hsu, Kerstin Lehnert, Suzanne Carbotte, Vicki Ferrini,
John Delano (1), James B. Gill (2), Maurice Tivey (3)

Lamont-Doherty Earth Observatory, Columbia University
(1) University of Albany, (2) University of California, Santa Cruz, (3) Woods Hole Oceanographic Institution

IN12A. Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-source Science
Fall AGU 2013

IEDA

iedadata.org

Data at Risk
• "Data at Risk" is scientific data that are not in formats that permit full electronic access to the information they contain.
• Data at Risk may be
  - non-digital (e.g., handwritten or photographic),
  - on near-obsolete digital media (such as floppy disks),
  - or insufficiently described (lacking metadata).
• Some born-digital data are considered "at risk" if they cannot be ingested into managed databases because they lack adequate formatting or metadata.

Definition from the ICSU CODATA Data at Risk Task Group (DARTG)
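
As a rough illustration, the criteria above can be written as a checklist. This is only a sketch: the function `risk_reasons`, its parameters, and the set of near-obsolete media are assumptions made for this example and are not part of the DARTG definition.

```python
# Assumed list of near-obsolete media; the definition above names only
# floppy disks as an example.
NEAR_OBSOLETE_MEDIA = {"floppy disk"}


def risk_reasons(is_digital, medium="", metadata=None):
    """Return the reasons (possibly none) why a data set counts as at risk.

    All three parameters are hypothetical descriptors of a data set:
    whether it is digital, what medium it is stored on, and a dict of
    whatever descriptive metadata accompanies it.
    """
    reasons = []
    if not is_digital:
        reasons.append("non-digital")
    elif medium.lower() in NEAR_OBSOLETE_MEDIA:
        reasons.append("near-obsolete digital media")
    if not metadata:
        reasons.append("insufficiently described")
    return reasons
```

An empty list means none of the three criteria applies. Treating any non-empty metadata dict as "sufficient" is a simplification; in practice the repository decides what is sufficient.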

Data Rescue
• A "Data Rescue Mission" is any effort to preserve data at risk. Rescue missions can come in the form of digitization, format migration, treating damaged materials (e.g., water or mold), adding metadata, or any action taken to make data accessible in the long term.

M. Tivey
Definition from the ICSU CODATA Data at Risk Task Group (DARTG)

Long Tail Data are often Data at Risk

The Head: Astronomy, Climate, High Energy Physics, Genomics
Long Tail: Environmental and Earth sciences

Long Tail Characteristics
• More specialised
• Low volume
• On C drives
• Hard to find
• Heterogeneous
• Collected by many people
• Citizen science
• Etc.

http://juliegood.wordpress.com/tag/long-tail/
L. Wyborn

IEDA Data Rescue Mini-Awards
• Established to preserve valuable legacy data sets that are endangered by impending retirement or degradation
  - Evaluated for highest impact on future research, based on quality, size, rarity, unique location, or data type
  - Made accessible to the community for re-use by inclusion in the IEDA data collections (EarthChem, MGDS, SESAR)
  - $7000 award to support proper compilation, documentation, and transfer
  - 3 awardees chosen from 11 entries covering a wide range of geochemical and geophysical data

1: Geologic samples and geochemistry
• WHAT: Compilation of sample metadata and geochemical analyses from three areas – Fiji, Izu Arc, and Endeavour segment (James B. Gill)
• WHY: Study of intra-ocean arcs and spreading centers
• HOW: Check data and fill in what is incomplete, digitize data, add persistent identifiers, link between related resources
• Major challenge: Physical sample management

Maps made with GeoMapApp

The importance of sample identification
• Individual samples can play a large role in scientific conclusions, so accurate documentation of sample metadata is critical.
• The key measurement was the one backarc basalt called "PPTUW"... Subsequent efforts to confirm the observation ran into problems. The apparently-same sample was variously called PPTU, PPTUW/5, PPTUW-1, and TVZ19 in four other papers. None of those papers gave its latitude and longitude. (J. Gill and E. Todd)
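
The naming problem above can be made concrete with a small sketch. It assumes a hand-curated alias table in which every published name for a sample points to one persistent identifier. The identifier `EXAMPLE-0001` and the functions `resolve_sample` and `group_by_identifier` are hypothetical; they are not a real IGSN or an IEDA tool. Whether the names really refer to the same sample is a curator's judgment, not something the code can establish.

```python
# Hypothetical alias table: each name used for the sample in the literature
# maps to a single persistent identifier. "EXAMPLE-0001" is a placeholder,
# not a real IGSN.
ALIASES = {
    "PPTUW": "EXAMPLE-0001",
    "PPTU": "EXAMPLE-0001",
    "PPTUW/5": "EXAMPLE-0001",
    "PPTUW-1": "EXAMPLE-0001",
    "TVZ19": "EXAMPLE-0001",
}


def resolve_sample(name, aliases=ALIASES):
    """Return the persistent identifier for a published sample name, or None."""
    return aliases.get(name.strip().upper())


def group_by_identifier(names, aliases=ALIASES):
    """Group published names by the identifier they resolve to.

    Names with no known identifier are collected under the key None.
    """
    groups = {}
    for name in names:
        groups.setdefault(resolve_sample(name, aliases), []).append(name)
    return groups
```

With such a table, measurements reported under different names in different papers can be gathered under one identifier, and names that resolve to `None` flag samples that still need curation.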

2: Near-bottom magnetics
• WHAT: Compilation of near-bottom magnetometer data, including raw, merged, processed, and navigation metadata (Maurice Tivey)
• WHY: Study of magnetic reversals, effect of tectonics on magnetic field
• HOW: Gather data from different formats, add complete metadata and workflow
• Challenge: Over three decades of technology and file formats

Evolution of equipment: 1985, 1992, 2004, 2011

Evolution of storage media

M. Tivey

Addition of "sufficient" metadata

3: Lunar sample geochemistry
• WHAT: Compilation of lunar sample geochemistry (John W. Delano et al.)
• WHY: Composition of the Moon
• HOW: Digitize photos, label specific grains, compile geochemistry in data templates
• Challenge: Nothing was digital

LPI

Use of IEDA EarthChem templates

Common needs addressed
• Accessibility – web access, links between systems
• Documentation – README files, additional descriptions
• Standardization – IEDA EarthChem geochemical templates
• Persistent links – DOIs and IGSNs
• Citability – DOIs, example citations
• Guidance/Training – calls and emails with disciplinary repository staff

Lessons learned: investigator
• Take ownership of your own legacy
  - Data curation by others may not be complete or correct
• Data rescue of an entire career does not need to be overwhelming
  - Start with small steps
  - Disciplinary repositories will help and guide you to what is needed
• Despite the time investment, data rescue is worth it
  - Others will now be able to re-use the data
  - Notes taken years ago actually explain anomalies

Lessons learned: repository
• For Long Tail Data, every project is different
  - There is not an established workflow – just past experience
  - Time commitment from staff is nontrivial
• Disciplinary training helps a great deal
  - Investigators need help determining the best products
• A small incentive will motivate investigators
• Data Rescue missions help the repository determine next steps for development of tools and services

Summary of Long-tail Data Rescue
• Three Data Rescue efforts this past year by IEDA have made data that were at risk
  - digitized from analog data and near-obsolete media
  - sufficiently described for reuse
  - in formats that permit full electronic access
  - citable, with persistent identifiers, and ready for reuse
• The projects also helped IEDA identify improvements in data rescue workflow, and future tools and services

More Data Rescue Activities

• Elsevier-IEDA Data Rescue Process Study
  - A data entry tool for lunar geochemistry: MoonDB
• Elsevier-IEDA International Data Rescue Award
  - Winner announced at reception tonight, Monday Dec 9th, 2013
  - Intercontinental Hotel, Twin Peaks Room, 7:00-8:30pm

