1. @DM2Europeana
Introduction to DM2E (Digitised
Manuscripts to Europeana)
Sam Leon
(Open Knowledge Foundation)
Eva Minerva 2012
2. Lower the barriers for humanities
scholars to re-use digital heritage
in their teaching and research
3. Division of labour
1. Provide data and digital content to
Europeana with a focus on digitised
manuscripts
2. Build tools for the aggregation of this data
and content so that it can be re-used and
connected
3. Develop tools for the re-use of Europeana
Linked Data in humanities research contexts
4. Content providers
• European Association for Jewish Culture (Judaica)
• Max-Planck-Institut für Wissenschaftsgeschichte (ECHO)
• Österreichische Nationalbibliothek (Google)
• Staatsbibliothek zu Berlin (Kalliope)
• University of Bergen (Wittgenstein)
• CNRS ITEM (Nietzsche)
• National Library of Israel (Judaica)
• Berlin-Brandenburgische Akademie der Wissenschaften (German Text Archive)
• Humboldt-Universität zu Berlin (Polytechnisches Journal)
7. Linked Data & RDF
• Resource Description Framework (RDF)
– Conceptualizes information as web-resources identified by a URI
– Models information as statements about resources in the form of
triples (subject – predicate – object)
– RDF makes meaningful connections between resources very easy
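As a rough illustration of the triple model described on this slide, the sketch below stores a few statements as (subject, predicate, object) tuples in plain Python and looks them up by pattern. The example.org URIs, the predicate names and the `match` helper are assumptions made for illustration; they are not DM2E or Europeana identifiers, and a real project would normally use an RDF library and a published vocabulary.

```python
# A minimal sketch of RDF-style triples using plain Python tuples.
# All URIs below are hypothetical placeholders under example.org.
EX = "http://example.org/"

triples = [
    (EX + "BrownBook", EX + "authoredBy", EX + "Wittgenstein"),
    (EX + "BrownBook", EX + "heldBy", EX + "SomeArchive"),
    (EX + "Wittgenstein", EX + "alsoWrote", EX + "Tractatus"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


# Every statement whose subject is the (hypothetical) Brown Book resource.
about_brown_book = match(triples, s=EX + "BrownBook")
```

Because each resource is named by a URI, statements made by different people about the same resource can be combined simply by pooling their triples.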
8. Linked Open Data
• Allow teachers, students and researchers to
explore the connections others make
between works, authors and collections
• Lower the barrier to entry for others to build
the tools that will define 21st century
scholarship
10. Beyond infrastructure
“Research infrastructure is not research just as roads are not
economic activity. We tend to forget when confronted by large
infrastructure projects that they are not an end in themselves.
[...] Infrastructure projects can become ends in themselves by
developing into an industry that promotes continued
investment. To sustain infrastructure there develops a class of
people whose jobs are tied to infrastructure investment.”
Rockwell, 2010
11. [Diagram: example graph of linked resources, with the labels “also wrote”, “authored by”, “refers to”, “held by” and “a theory of language”]
12. Scholarly primitives
• “Scholarly Primitives: what methods do humanities researchers
have in common, and how might our tools reflect this?” (John
Unsworth, 2000)
• Unsworth’s scholarly primitives:
– discovering
– annotating
– comparing
– referring
– sampling
– illustrating
– representing
• Professor Stefan Gradmann is co-authoring a paper with the DM2E
Digital Humanities Advisory Board that develops this set of primitives
to further model the scholarly domain
15. Pundit Demo
Hands-on Pundit demo 15:45-17:00 in the
Auditorium with Christian Morbidoni (Net7)
16. Wittgenstein Case Study
10 scholars from the University
of Bergen using DM2E tools to
annotate and curate
Wittgenstein’s Brown Book
Their feedback feeds into
further development of the
platform
17. Useful Links
• http://dm2e.eu
• http://thepund.it
• Follow @DM2Europeana
• Get in touch: sam.leon@okfn.org
Editor’s notes
Introduction: Project Manager at the Open Knowledge Foundation. Twitter: @DM2Europeana and @OKFN.
The Open Knowledge Foundation is an NGO founded in 2004 to promote information sharing, especially between governments and their citizens, but it also works with GLAMs to promote access.
Together with my colleagues here from Judaica Europeana, the National Library of Israel and the Staatsbibliothek zu Berlin, I am part of a three-year EC project called DM2E.
* Like the names of many technology projects, DM2E is an acronym. Unpacked, it stands for Digitised Manuscripts to Europeana, a fitting project for the annual conference on digitisation.
* But it is important to recognise that DM2E is not a digitisation project per se. While it has a multitude of content partners providing digitised manuscripts, the focus of DM2E is the provision of tools to enable re-use of this material in scholarly environments.
* As was very forcefully remarked during yesterday’s Museum Track, the true advantage of the digital is that it promotes access. I wholly concur with that sentiment, not only as a representative of the DM2E project but also as a representative of the Open Knowledge Foundation, an organisation founded to promote digital access, be it to government data or cultural resources.
* But DM2E is about taking the digital access movement beyond just making things digital and available online. DM2E aims to make digital heritage available in a re-usable form and, in tandem, to provide the tools that will actually enable re-use of that digital content.
In a nutshell
In what follows I will look at the three main areas of work currently going on in DM2E. Roughly speaking, these break down as follows.
First, to content provision. We are proud to have such a distinguished set of cultural heritage institutions providing openly licensed data and content, all of which will eventually be integrated into Europeana.
Among them are institutions present here: the National Library of Israel, and Thorsten from the Staatsbibliothek zu Berlin.
For those who don’t know, Europeana is the closest thing we have to a European digital library.
There are over 20 million records of objects there.
Around 2000 data and content providers.
A vast dataset, with metadata that is more reliable and richer than in other existing aggregation portals such as the Internet Archive.
It has the potential to make our European cultural heritage more discoverable and better connected.
* Digital and online is not enough. We need to address the technical challenge: the data has to be available in the right form so that it can be re-used, and we also need tools that are capable of re-using it.
Tim Berners-Lee, inventor of the World Wide Web, recognised this: it is not just about making data available online, but about making it available online in a form that maximises the potential for re-use.
Without going into too much detail, this involves two primary things: identifying the things in your dataset with URIs, and linking them to other collections. The best way to achieve this is to store your data as Linked Open Data.
* The fantastic news is that the smart people at Europeana have acknowledged this and are currently prototyping a Linked Data version of their portal.
DM2E is part of the next generation of Europeana projects that utilise this exciting technology.
More specifically, they use RDF, the data model behind Linked Data, which conceptualises information as web resources identified by a URI and stores information as statements called triples.
Talk through the example of the Mona Lisa.
Why is RDF important, and why is it important for DM2E, which aims to lower the barriers to the re-use of cultural heritage material in the humanities?
It is important for two main reasons.
First, it facilitates the linking that Tim Berners-Lee characterised as the 5-star, top-grade level of Linked Open Data, as it were.
Secondly, RDF is very expressive. It approximates a human language, and it allows you to capture the thought processes of those interacting with digital heritage.
Scholarly methods and interactions with texts and images can be captured in RDF.
A scholar’s annotation of a given piece of text can be captured in RDF, linking that very section of text to a concept and perhaps to another web resource.
Before I go into that in more detail, I would like to point out one further thing about the work being done as part of DM2E.
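To make the idea of an annotation captured as triples a little more concrete, here is a minimal sketch in plain Python. It links a section of text to a concept and to the scholar who made the link, then writes the statements out in N-Triples style. Every URI and predicate name is a hypothetical placeholder chosen for this example; it is not the vocabulary used by Pundit or Europeana.

```python
# A sketch of one scholarly annotation expressed as triples.
# Every URI and predicate name here is a hypothetical placeholder.
EX = "http://example.org/"

annotation = [
    (EX + "annotation/1", EX + "annotates", EX + "BrownBook#section-1"),
    (EX + "annotation/1", EX + "linksToConcept", EX + "concept/language-game"),
    (EX + "annotation/1", EX + "createdBy", EX + "scholar/a"),
]


def to_ntriples(triples):
    """Serialise URI-only triples in N-Triples style: <s> <p> <o> ."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


serialised = to_ntriples(annotation)
print(serialised)
```

Since the annotation is itself a resource with a URI, other scholars can refer to it, agree with it or dispute it using further triples.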
It is open: all metadata submitted as part of DM2E conforms to the Data Exchange Agreement.
Why is this important? Two reasons (read slide).
Think about the release of open data by the London Mayor: hundreds of navigation apps.
We want the energy that goes into developing tools for interacting with music and film, in apps like Spotify and Netflix, to go into the development of tools for use within research. Opening data up is one way to do this.
Having spoken about content provision to DM2E and the need to aggregate this content as Linked Open Data, I want to look at what I identified as the third and final part of the work being undertaken in DM2E: the building of tools that let scholarly communities re-use Linked Open Data.
The primary goal is to enable researchers to do what they’ve done for millennia, and to do things they’ve never done before.
Fundamental to this is the idea that in DM2E we are not just trying to build infrastructure. We are trying to build something that will directly lead to an improvement in research and is not just technology as an end in itself.
We want to understand the existing methods of scholars, and to support and enrich those methods in close consultation with them.
On the screen is a quote often referred to by Professor Stefan Gradmann, the Project Coordinator of DM2E, which goes to the heart of what we are doing…
In the spirit of this, let me take an example close to my heart.
As an MA student I was an avid reader and admirer of the work of Ludwig Wittgenstein.
What are the kinds of things I might have said in my thesis or in a paper I was preparing for publication?
In the course of my work on this given section of text I’ve done a number of things: I’ve compared, I’ve referred to works and authors, and the likelihood is that I’ve also annotated.
Capturing these different kinds of activity is something that researchers within the Digital Humanities have been working on for some time.
In many ways this work begins with John Unsworth’s seminal paper, “Scholarly Primitives”.
There he listed the key primitives that most scholarly methods have in common, in an attempt to improve the tools that were being made for scholars.
Very much in this line, Professor Stefan Gradmann and the DM2E Advisory Board are preparing a paper for publication in 2013 that will elaborate on Unsworth’s work. To enumerate these activities and further model the scholarly domain is the first step towards going beyond infrastructure, or technology for the sake of technology, as these primitives will be the basis.
But you’ll be pleased to hear that the DM2E team has not simply modelled the scholarly domain. It has made the first steps towards building tools that respond to these primitives, utilising the expressivity of RDF triples to capture scholarly work.
The tool is Pundit, developed by the brilliant team at Net7 led by my colleague Christian Morbidoni.
In a nutshell, Pundit is an open-source semantic annotation tool that can be used in your browser.
More importantly, Pundit is the tool through which I could capture the thought processes I outlined in my Wittgenstein illustration, in the form of RDF annotations.
More than this, Pundit is the tool that would enable me to share the annotations that I created on a given work.
You might think that this is not so exceptional in my case.
But what if these tools were adopted by the wider scholarly community? Think about the richness of the links that would be created between web resources.
This tapestry of annotations and web resources could become the lifeblood of scholarly debate. It would allow the student or researcher to survey the connections made between all previous works, with easy access to all the resources required at little to no cost, and in turn to build on this knowledge.
This is a process that has of course been going on for many centuries in the humanities, but one that can be enriched tremendously and made easier by Linked Open Data tools like Pundit.
But what of the things scholars haven’t been doing for years? Surely we want to innovate and find the technologies that will take us beyond the methodologies of yesterday.
What is exciting is that we already see the Pundit team making steps in this direction.
EdgeMaps is a visualisation engine, and when it is combined with Linked Data annotations made in Pundit you can generate some startling graphs.
Perhaps you want to visualise assertions of influence made by various authors. With a wide enough group, you could generate interesting and unexpected results, perhaps identifying clusters or connections between authors that would have been impossible, or at least very difficult, to discover in a human lifetime using conventional reading methodologies alone.
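As a small sketch of how pooled influence assertions could reveal clusters, the Python below counts how many annotators support each link and groups authors into connected clusters. The author and annotator names are placeholders, and the grouping is an ordinary connected-components search chosen for illustration. It is not a description of how EdgeMaps or Pundit work internally.

```python
from collections import defaultdict

# Hypothetical "influenced" assertions pooled from several annotators:
# (annotator, influencer, influenced). All names are placeholders.
assertions = [
    ("scholar-a", "AuthorA", "AuthorB"),
    ("scholar-b", "AuthorA", "AuthorB"),
    ("scholar-b", "AuthorB", "AuthorC"),
    ("scholar-c", "AuthorX", "AuthorY"),
]


def edge_support(assertions):
    """Count how many distinct annotators assert each influence link."""
    supporters = defaultdict(set)
    for annotator, source, target in assertions:
        supporters[(source, target)].add(annotator)
    return {edge: len(who) for edge, who in supporters.items()}


def clusters(assertions):
    """Group authors into connected clusters, ignoring link direction."""
    neighbours = defaultdict(set)
    for _, source, target in assertions:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk over one cluster
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


support = edge_support(assertions)
groups = clusters(assertions)
```

With many annotators, links supported by several independent scholars could be drawn more heavily, and the clusters give a first hint of which authors belong to the same web of influence.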
I will leave the discussion of Pundit there…
As a way of wrapping up, I’ll hint at one of the most exciting next steps in the DM2E project.
Report at Digital Humanities 2013.