Keynote at Online Information 2009, delivered on 3rd December. I discuss hype and reality and focus on linked data as the dominant design for publishing data on the web.
The Reality of Linked Data
1. The Reality of Linked Data Ian Davis, CTO, Talis Online Information 2009
2. “ A significant change in the computer field in the last five to eight years has been made in the way we treat and handle data. In the early days of our field, data was intimately tied to the application programs that used it. Now we see that we want to break that tie. We want data that is independent of the application programs that use it – that is, data that is organized and structured to serve many applications and many users. What we seek is the...”
5. “ Copernicus completely reoriented our view of astronomical phenomena when he suggested that the earth revolves around the sun. There is a growing feeling that data processing people would benefit if they were to accept a radically new point of view, one that would liberate the application programmer's thinking from the centralism of core storage and allow him the freedom to act as a navigator within a database.”
6. “ Both software and the hardware needed remain immature, that little experience so far existed in its use and that the generalized features offered by the DBMS brought a hefty performance penalty”
28. Find out more http://www.talis.com/platform http://blogs.talis.com/nodalities [email_address]
Editor's notes
The title of my talk today is "The Reality of Linked Data", and I want to show you what is possible today with linked data, who else is using it and how you can get started. But first, I'd like to read this quote that I came across recently.
A data base. Two words: data base. This isn't a software system; this is a base of data.
Those words are from Richard G. Canning in his introduction to the 1973 Turing Award. 1973! The sentiment is very familiar today, nearly four decades later. The recipient of that year's Turing Award was Charles W. Bachman, a pioneer in the field of databases. In his acceptance lecture, Bachman compared the change in thinking needed for information systems to that of Copernicus.
Bachman was speaking against a background of a decade of hype for database management systems. The technology was seen as a means of enabling everyone in an organisation to have access to information “at their fingertips”. Even senior managers would be using the technological marvel of the database. This myth was brought down to earth in the mid-seventies by a series of damning reports.
One stated: In addition, no survey of the early 1970s was able to find any firms where the database was used directly by managers or even by analysts. By 1981 the market-leading database system, TOTAL, had only 4000 installations, while IBM's IMS was in second place with around 1500. But in the same year, in the midst of a severe recession, RSI renamed itself Oracle, Sequoia Capital provided growth investment and the rest is history. Today even our managers can access the data they need.
This process, this technology adoption process, is well understood these days and is best illustrated by this famous diagram. There is this crucial period as the technology starts up the slope of enlightenment. That's the point at which Oracle got started: after the hype had died away and people started taking a serious look at the reality of the technology. I think this is where we are with the Semantic Web today.
This is also about the time that the industry converges on dominant designs. This is an accepted pattern for a technology, like the pedals in a car. Dominant designs don't stifle innovation but they drive adoption. Massively.
Linked Data is a dominant design for the Semantic Web. It lays down a standard pattern for publishing data so it can be found and reused.
One of the things that Linked Data teaches us is that your website is your API. What does that mean? It means that with a little extra effort to publish data alongside your normal HTML, you can enable people to use your site to build other services and applications. Making your site into an API is simple.
The most important thing you can do also happens to be the simplest. Look at your data and think about what it is about – the places, people and things. Then give each of those things an identifier, a URI, just like you do with your web pages. By assigning URIs to things you enable other people to talk about them. You enable people to link to them.
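A minimal sketch of this first step, assuming a hypothetical site root of http://example.org and an illustrative "/id/<kind>/<key>" path layout. Neither comes from the talk; the point is only that each thing gets one stable identifier built from data you already hold.

```python
from urllib.parse import quote

BASE = "http://example.org"  # hypothetical site root, not from the talk


def mint_uri(kind: str, key: str) -> str:
    """Build a stable identifier for a thing itself, not for a page about it."""
    # Percent-encode both parts so any local key is safe inside a URI path.
    return f"{BASE}/id/{quote(kind, safe='')}/{quote(key, safe='')}"


# Illustrative catalogue: kinds of thing mapped to the local keys already in use.
catalogue = {
    "person": ["p1"],
    "place": ["plymouth"],
    "book": ["isbn 9780000000000"],
}

# One URI per thing, derived mechanically from the existing keys.
uris = {
    key: mint_uri(kind, key)
    for kind, keys in catalogue.items()
    for key in keys
}
```

Deriving the URI from an existing key keeps the identifier stable for as long as the key itself does not change, which is what lets other people link to it safely.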
The next most important thing you can do is to describe those things using RDF. Your descriptions don't have to be sophisticated. Do as much or as little work as you can afford. The better the descriptions are though, the more useful they will be for other people. Including links to other things gives your description context.
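As a rough illustration of how small such a description can be, the sketch below writes three statements about a hypothetical person as N-Triples, one of the plain-text RDF syntaxes, using only the Python standard library. The example.org URI and the name are made up; the FOAF and RDF vocabulary terms are real, and the DBpedia URI stands in for a link to somebody else's data. A real publisher would more likely use an RDF library than hand-rolled helpers like these.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


def iri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string, escaping backslashes, quotes and newlines."""
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


person = "http://example.org/id/person/p1"  # hypothetical identifier

triples = [
    (iri(person), iri(RDF_TYPE), iri(FOAF + "Person")),
    (iri(person), iri(FOAF + "name"), literal("Example Person")),
    # A link out to another dataset gives the description context.
    (iri(person), iri(FOAF + "based_near"),
     iri("http://dbpedia.org/resource/Plymouth")),
]

doc = to_ntriples(triples)
```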
Finally you should respond to requests on your identifiers by sending your description of that thing. You can just serve the plain old RDF, or to be more helpful you can provide HTML versions of the descriptions too. If you use RDFa then you can do both in a single document.
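One common way to serve both audiences from the same identifier is HTTP content negotiation: the server looks at the request's Accept header and picks HTML or an RDF syntax accordingly. The sketch below shows only that decision, under simplifying assumptions: the `SUPPORTED` list, the function names and the fall-back-to-HTML behaviour are illustrative choices, not anything prescribed in the talk, and media-type wildcards other than `*/*` are ignored.

```python
# Representations this hypothetical server can produce.
SUPPORTED = ("text/html", "text/turtle", "application/rdf+xml")


def parse_accept(header: str):
    """Return (media_type, q) pairs, highest preference first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media, q))
    # sorted() is stable, so equal q values keep the client's own order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose(header: str, default: str = "text/html") -> str:
    """Pick the representation to send for a given Accept header."""
    for media, q in parse_accept(header or "*/*"):
        if q <= 0:
            continue
        if media in SUPPORTED:
            return media
        if media == "*/*":
            return default
    # Simplification: fall back to HTML rather than answering 406.
    return default
```

A browser sending `text/html,application/xhtml+xml;q=0.9,*/*;q=0.8` gets the HTML page, while a data client asking for `text/turtle` gets the RDF description of the same thing.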
With these three steps you have turned your website into an API. In fact it's the best kind of API because its users don't need any special software to use it. Also they don't need to learn a new API for every site they want to use. This talk is about the reality of linked data, not the hype. So which real companies and organisations are doing this today?
The BBC for one. They are publishing their programme catalogue as linked data. And they don't compromise on style or usability.
The data for all these BBC programmes is right there behind the page. Every programme has an identifier, a URI. Every segment of a programme, every brand, every person. In fact every important thing in the BBC data has a URI.
When you turn your website into an API using linked data you find that people start building new things that reuse your data in new and interesting ways. This is fanhu.bz, a prototype service that uses linked data from the BBC programmes pages and remixes it with Twitter to build a social space for fans of BBC programmes.
The BBC also expose linked data for their music site. Interestingly this site reuses linked data from two other sources: DBpedia and MusicBrainz.
This is LIBRIS, the Swedish Union Catalogue, publishing linked data in exactly the same way.
Here is the UK government doing exactly the same, this time with education data.
The Library of Congress Subject Headings
The New York Times name subject headings. Incidentally, the New York Times have a wonderful metaphor to describe their linked data: they call it their treasure map.
All the sites I have shown so far have been read-only. But you can use linked data for fully interactive web apps too. This is Talis Aspire, one of our products, used by the University of Plymouth. This is a reading list for a module in a mathematics course. All of this is, of course, available as linked data. Because it is also an API the university can reuse this data in lots of different contexts with very little effort.
But this is a powerful interactive application with full editing capabilities. Talis Aspire allows teaching staff to build reading lists using a simple bookmarklet that detects the page being viewed and saves it to a reading list.
Today, to obtain the metadata for that journal, we have to screen scrape the page to look for text that looks like a DOI (if we are lucky). That is then looked up in a separate repository. Just think how much simpler and less error-prone it would be if the publisher's website were its API. It could be if they just published linked data.
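To make the fragility concrete, here is a rough sketch of what "looking for text that looks like a DOI" can amount to. It is not Talis Aspire's actual code: the regular expression is a commonly used heuristic for modern DOIs, and the function name is invented for illustration.

```python
import re

# Heuristic only: "10.", a 4 to 9 digit registrant code, a slash, then a
# run of characters that cannot be whitespace, quotes or angle brackets.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi_candidates(html: str) -> list[str]:
    """Return DOI-looking strings found in a page, in order, without repeats."""
    seen: list[str] = []
    for match in DOI_PATTERN.findall(html):
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        candidate = match.rstrip(".,;)")
        if candidate not in seen:
            seen.append(candidate)
    return seen
```

Every step here is a guess: the page may not print a DOI at all, may print several, or may print something that merely resembles one, which is the error-proneness described above.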
So what I have shown you is the reality of linked data. Forget the hype and don't be disillusioned. You can be productive today and turn your website into your API.
Remember to identify the important things with URIs, describe them using RDF and respond with those descriptions when people request your identifiers.