Connecting European Archaeology datasets: prospects and challenges

Kate Fernie, 2Culture Associates
Big Data in Archaeology: Practicalities and Possibilities
27-28 March 2019

• CARARE
• A brief history
• Datasets and their diversity
• Metadata and schemas
• Challenges
• Possibilities
Introduction

CARARE
Connecting Archaeology and Architecture in Europe
• Began as an EU-funded best practice network in 2010
• Established as a membership association in 2016
• Objective: Advancing professional practice and fostering appreciation
of the digital archaeological and architectural heritage
• Areas:
• Good practices, advice and guidance
• Services to enable data sharing
• CARARE metadata schema
• Promoting re-use
http://www.carare.eu/

Steps on the way to CARARE
• A shared vision
• International collaborations on
heritage data (CIDOC, Arena,
Aquarelle, DARIAH, INSPIRE,
Europeana, etc.)
• Digitisation and use of digital
technologies
• GIS
• Technical infrastructures
A brief history

Who is collecting archaeological and architectural heritage data?
• State agencies
• inventories of protected sites, monuments and buildings
• conservation records, field investigations, surveys
• Museums – finds and excavation archives
• Research Institutions & researchers
• Libraries
Datasets
Image: Swedish National Heritage Board

CARARE and related projects have aggregated over 6 million digital
objects from 20+ countries for Europeana.eu
Many different types of object
• Inventory records, reports, photographs, drawings, books, videos, objects,
aerial photos, GIS datasets, 3D datasets, models, reconstructions, and more
Many different ways of recording objects
• Heritage agencies, museums, archives, libraries, researchers all have
different ways of describing objects
Many different languages, vocabularies, time periods and map systems
Rather diverse

Tournoi royal de motos à Londres changement
d'une roue de side-car en marche, 1932
Agence de presse Mondial Photo-Presse.
We work with
the metadata
that’s provided

CARARE defined a metadata model for metadata aggregation
• Standards based: CIDOC core standards, MIDAS Heritage, LIDO and EDM
• Distinguishes between “heritage assets” (monument, building, painting, book,
image, film, 3D) and digital representations found online
• Allows for events (field activities, lab work) and collections
• Supports objects that are composed of other objects (complexes and
hierarchies)
• Is rich where the domain calls for it (e.g. time, space, monument character)
The schema meets the need to mediate between native data exports and to enable
their transformation into a common format
Combining datasets

Let’s see an example
MINT
• Metadata mapping (from
native to target schema)
• Preview
• Statistics
• Transformation (to target
schema(s))
Rijksdienst voor het Cultureel Erfgoed:
Rijksmonumenten
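The mapping and transformation steps listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not how MINT works internally: the native field names, the target field names and the sample record are all made up for the example.

```python
# Hypothetical mapping: native export field -> common-schema field.
# None of these names are taken from MINT or from the CARARE schema.
FIELD_MAP = {
    "monumentnummer": "identifier",
    "naam": "title",
    "plaats": "place",
}


def transform(native_record, field_map=FIELD_MAP):
    """Return (mapped_record, unmapped_fields) for one native record."""
    mapped = {}
    unmapped = []
    for source_field, value in native_record.items():
        target = field_map.get(source_field)
        if target is None:
            # Keep track of fields the mapping does not cover yet.
            unmapped.append(source_field)
        else:
            mapped[target] = value
    return mapped, sorted(unmapped)


# Made-up record standing in for one row of a native export.
native = {
    "monumentnummer": "12345",
    "naam": "Example windmill",
    "plaats": "Example town",
    "bouwjaar": "1850",
}
mapped, unmapped = transform(native)
print(mapped)
print("not mapped:", unmapped)
```

Reporting the unmapped fields is a cheap stand-in for a preview step: it shows what would be lost before the whole dataset is transformed.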

Making connections
Heritage asset
Has
representation
Images: Instituto Universitario de Investigación en Arqueología Ibérica
“Hornos de Peal, Jaén”
Has
representation
is related
Relationships between the main CARARE classes:
• Heritage asset, digital resources and events
Has Met
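The relationships between the main classes can be pictured as a small data structure. The sketch below is an assumption-laden illustration, not the CARARE schema itself: the class and attribute names are invented, and the URL is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalResource:
    url: str          # where the representation can be found online
    media_type: str   # e.g. "image", "3D"


@dataclass
class Event:
    label: str        # e.g. a field activity or lab work


@dataclass
class HeritageAsset:
    name: str
    representations: List[DigitalResource] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)
    parts: List["HeritageAsset"] = field(default_factory=list)

    def all_representations(self):
        """Collect representations of this asset and of every nested part."""
        found = list(self.representations)
        for part in self.parts:
            found.extend(part.all_representations())
        return found


# Placeholder data: the URLs are hypothetical.
tomb = HeritageAsset(
    "Tomb (part of the site)",
    representations=[DigitalResource("https://example.org/tomb.jpg", "image")],
)
site = HeritageAsset(
    "Hornos de Peal, Jaén",
    representations=[DigitalResource("https://example.org/site.jpg", "image")],
    events=[Event("field survey")],
    parts=[tomb],
)
print(len(site.all_representations()))
```

Nesting assets inside `parts` is one simple way to model complexes and hierarchies.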

Enriching metadata during mapping
Heritage asset
Images: Instituto Universitario de Investigación en Arqueología Ibérica
“Hornos de Peal, Jaén”
<car:heritageAssetType>http://vocab.getty.edu/aat/300054328</car:heritageAssetType>
Adding constants: LOD
AAT concepts
<car:heritageAssetType lang="es">Necrópolis</car:heritageAssetType>
Language identification
Mapping the metadata gives an opportunity to
make some simple enrichments, by adding:
• Language of the metadata
• Name of the provider
• Country of provider
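A minimal sketch of adding such constants during mapping is shown below. The dictionary keys and the function name are assumptions made for the example; only the AAT URI, the "es" language code and the provider name are taken from the slide.

```python
# URI shown on the slide; attached to every record as a constant.
AAT_TYPE_URI = "http://vocab.getty.edu/aat/300054328"


def enrich(record, language, provider, country, type_uri=None):
    """Return a copy of the record with provider-level constants added."""
    enriched = dict(record)
    enriched["metadata_language"] = language
    enriched["provider"] = provider
    enriched["provider_country"] = country
    if type_uri is not None:
        # Build a new list so the input record is never modified.
        existing = list(record.get("asset_type_uris", []))
        enriched["asset_type_uris"] = existing + [type_uri]
    return enriched


record = {"title": "Hornos de Peal, Jaén", "asset_type": "Necrópolis"}
enriched = enrich(
    record,
    language="es",
    provider="Instituto Universitario de Investigación en Arqueología Ibérica",
    country="Spain",
    type_uri=AAT_TYPE_URI,
)
print(enriched)
```

Because the values are constant for a whole dataset, they can be set once in the mapping rather than edited record by record.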

There’s a difference between doing a schema mapping and a mapping to
transform real data.
Data issues can include:
• Data that doesn’t conform entirely to the scope of an element
• Multiple values within a single element (separators)
• Placeholder data inserted in mandatory elements (e.g. "n/a")
• Lack of unique values
A good mapping can address some of these issues, e.g. by splitting
multiple subject concepts into separate elements.
(Issues can also be fixed at source, but this is time-consuming for datasets that
include hundreds of thousands of records.)
Quality issues
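Three of the issues above (multiple values in one element, placeholder values, non-unique identifiers) can be handled with very little code. The sketch below is illustrative only: the separator characters, the placeholder list and the field name "id" are assumptions that would need adjusting for each real dataset.

```python
import re
from collections import Counter

# Assumed placeholders and separators; real datasets will differ.
PLACEHOLDERS = {"", "n/a", "na", "unknown", "-"}
SEPARATORS = re.compile(r"\s*[;|]\s*")


def split_values(raw):
    """Split a multi-valued string and drop placeholder values."""
    parts = SEPARATORS.split(raw.strip())
    return [p for p in parts if p.casefold() not in PLACEHOLDERS]


def duplicate_ids(records, key="id"):
    """Return the identifiers that occur more than once."""
    counts = Counter(r[key] for r in records)
    return sorted(i for i, n in counts.items() if n > 1)


print(split_values("necropolis; tomb | Iron Age"))
print(split_values("n/a"))
print(duplicate_ids([{"id": "A1"}, {"id": "A2"}, {"id": "A1"}]))
```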

Transformation: some semantic gains
Through transformation to a
common schema, we achieve
interoperability between
disparate datasets
• Enabling cross searches
(what, when, where, who)
• Open licensing of the
metadata and APIs enables
reuse in various applications
http://eculturemap.eculturelab.eu/eCulture14m/Map.html?
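Once records share a common shape, a cross search over what, when, where and who becomes a simple filter. The sketch below assumes a flat record layout with invented field names and made-up sample records; years before the common era are written as negative numbers.

```python
# Made-up records from three hypothetical providers, already in a common shape.
RECORDS = [
    {"provider": "A", "what": "necropolis", "where": "Spain",
     "who": "Iberian", "start": -400, "end": -100},
    {"provider": "B", "what": "windmill", "where": "Netherlands",
     "who": "unknown", "start": 1850, "end": 1850},
    {"provider": "C", "what": "necropolis", "where": "Greece",
     "who": "Mycenaean", "start": -1600, "end": -1100},
]


def cross_search(records, what=None, where=None, who=None, when=None):
    """Filter records; `when` is an inclusive (start_year, end_year) range."""
    results = []
    for record in records:
        if what and what.casefold() not in record["what"].casefold():
            continue
        if where and where.casefold() not in record["where"].casefold():
            continue
        if who and who.casefold() not in record["who"].casefold():
            continue
        if when:
            start, end = when
            # Keep the record only if its date range overlaps the query range.
            if record["end"] < start or record["start"] > end:
                continue
        results.append(record)
    return results


hits = cross_search(RECORDS, what="necropolis", when=(-500, 0))
print([r["provider"] for r in hits])
```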

• Metadata mapping is rarely easy
• Metadata models are complex, with subtle differences in world view
• Statistical metrics can reveal divergent recording practices and other
quality issues
• Native metadata is designed to serve specific purposes
• Local context, audiences and questions
• Merging metadata from various organisations in different
countries/languages poses special challenges
Some challenges
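One simple statistical metric is field completeness: the share of records in which an element is actually filled in. The sketch below compares two made-up providers; the field names and records are invented for the example.

```python
def completeness(records, fields):
    """Return the share of records with a non-empty value for each field."""
    total = len(records)
    shares = {}
    for name in fields:
        filled = 0
        for record in records:
            value = record.get(name)
            if value is not None and str(value).strip():
                filled += 1
        shares[name] = filled / total if total else 0.0
    return shares


# Made-up sample records from two hypothetical providers.
provider_a = [
    {"title": "Site 1", "period": "Iron Age"},
    {"title": "Site 2", "period": "Roman"},
]
provider_b = [
    {"title": "Site 3", "period": ""},
    {"title": "Site 4", "period": "Medieval"},
]
stats_a = completeness(provider_a, ["title", "period"])
stats_b = completeness(provider_b, ["title", "period"])
print(stats_a, stats_b)
```

A gap between providers on the same element is a hint that recording practices diverge and that the mapping needs a closer look.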

Aggregators like CARARE enable transformation of metadata into a
common model and have some services to enable further work
• Language labelling
• Adding Linked Open Data
• Automatic enrichment
• Crowdsourcing
Aggregating and enriching
MORe

One of the big challenges in searching across datasets in Europe is
dealing with data in different languages
Linguistic resources and translation tools are increasingly available, but to
work they first need to identify which language is involved
• Language labels are often missing
• Language identification and labelling microservices
Interfaces, displays and search services can adapt to users’ preferred
language and in this way return results which are relevant but which have
been catalogued in unfamiliar languages.
Why add language information to data?
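To give a feel for what a language labelling step does, here is a deliberately naive sketch that guesses a language by counting common function words. It is not how the CARARE microservices work; a production service would use a proper language identification model. The word lists, threshold and field names are assumptions.

```python
import re

# Tiny, assumed stopword lists; far too small for real use.
STOPWORDS = {
    "en": {"the", "and", "of", "in", "with", "is"},
    "es": {"el", "la", "de", "y", "en", "los", "con", "del"},
    "fr": {"le", "la", "de", "et", "les", "des", "en", "du", "une"},
}


def guess_language(text, min_hits=2):
    """Return a language code, or None when the evidence is weak or tied."""
    tokens = re.findall(r"\w+", text.casefold())
    scores = {
        lang: sum(1 for t in tokens if t in words)
        for lang, words in STOPWORDS.items()
    }
    ranked = sorted(scores.values(), reverse=True)
    best = max(scores, key=scores.get)
    if scores[best] < min_hits or ranked[0] == ranked[1]:
        return None
    return best


def label_record(record, text_field="title"):
    """Add a language label only when one is missing and a guess exists."""
    labelled = dict(record)
    if "language" not in labelled:
        guess = guess_language(labelled.get(text_field, ""))
        if guess is not None:
            labelled["language"] = guess
    return labelled


caption = ("Tournoi royal de motos à Londres changement "
           "d'une roue de side-car en marche, 1932")
print(label_record({"title": caption}))
print(label_record({"title": "Necrópolis"}))
```

Returning no label for short or ambiguous text is intentional: a wrong label is harder to spot later than a missing one.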

CARARE microservices include:
• Natural language processing techniques to enable subject concepts
and names to be extracted from text
• Geocoding services to add coordinates for named places
• Vocabulary matching services
• Geo conversion, inversion and normalization services
Automated enrichment
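Vocabulary matching can be sketched as a lookup after normalising the local term. The example below is a simplification under stated assumptions: the one-entry vocabulary is invented, and pairing "necropolis" with this AAT URI is inferred from the earlier enrichment slide rather than checked against the Getty vocabulary.

```python
import unicodedata


def normalise(term):
    """Lower-case a term and strip accents and surrounding whitespace."""
    decomposed = unicodedata.normalize("NFKD", term.casefold())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.strip()


# Assumed toy vocabulary: normalised label -> concept URI.
VOCABULARY = {
    "necropolis": "http://vocab.getty.edu/aat/300054328",
}


def match_term(term, vocabulary=VOCABULARY):
    """Return the concept URI for a local term, or None if there is no match."""
    return vocabulary.get(normalise(term))


print(match_term("Necrópolis"))
print(match_term("villa"))
```

Exact matching after normalisation only catches spelling variants; synonyms and translations need a richer vocabulary with alternative labels.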

Location case study
• Location is important for archaeology but place information is often
missing, especially for content from library, archive and museum
collections
• Automated extraction techniques can identify place names in data, but
place names are not unique
• The process requires quality control
• Crowd sourcing is one way of harnessing the knowledge of individuals
to check the results of automated enrichment and place objects
correctly on the map
• One such service was developed by the LoCloud project
Crowd sourcing
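The need for quality control can be shown with a toy geocoder: an unambiguous place name is placed automatically, while an ambiguous one is queued for human review. This is a sketch and not the LoCloud service; the gazetteer is a hypothetical three-entry table with approximate coordinates, and the status values are invented.

```python
# Toy gazetteer: place name -> candidate (label, latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "york": [("York, England", 53.96, -1.08)],
    "newport": [
        ("Newport, Wales", 51.58, -3.00),
        ("Newport, Isle of Wight", 50.70, -1.29),
    ],
}


def locate(record, gazetteer=GAZETTEER):
    """Add coordinates when unambiguous, otherwise flag the record."""
    located = dict(record)
    key = record.get("place", "").strip().casefold()
    candidates = gazetteer.get(key, [])
    if len(candidates) == 1:
        _, lat, lon = candidates[0]
        located.update(lat=lat, lon=lon, location_status="auto")
    elif candidates:
        # Several places share the name: leave the choice to a person.
        located.update(candidates=candidates, location_status="needs_review")
    else:
        located["location_status"] = "unresolved"
    return located


print(locate({"title": "Photograph", "place": "York"}))
print(locate({"title": "Postcard", "place": "Newport"}))
```

Records marked `needs_review` are the natural input for a crowdsourcing step, where volunteers pick the right candidate or place the object on the map themselves.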

The content aggregated by CARARE is in Europeana
Take a look: www.europeana.eu

Is it big data?
• Volume – 2-4 million assets aggregated by CARARE
• Includes the national heritage inventories for several
countries, which are individually quite large datasets
• Europeana includes another 1 million+ assets relevant for
archaeology aggregated by other projects
• Includes museum and library collections, film archives,
newspaper reports
• Quite big?
• New research would be great!

kfernie27@gmail.com
Any questions?
www.carare.eu
