At the 2009 Seminar on Innovative Approaches to Turn Statistics into Knowledge (http://www.oecd.org/progress/ict/statknowledge), jointly organized by the OECD, US Census Bureau and World Bank, we proposed and demoed a proof of concept for data sharing between international organizations. We demonstrated how open source tools could sit on top of existing infrastructure, and we reused visualization tools to show how data could be pulled and combined from the various organizations on the fly.
11. History
• WHO: Openhealth prototype - global disease incidence reporting platform
• OECD: QWIDS - Query Wizards for International Development Statistics
• Bill & Melinda Gates Foundation
• International Aid Transparency Initiative
69. Take Homes
• build shared software for Int'l Orgs
• sits on existing infrastructure
• end users can answer the harder questions
• query and combine across organizations
• accessibility, usability
70. Where are we going?
• Funding from foundations
• Expressions of Interest
• WHO, OECD, IMF, WB, UNESCO, UNCTAD, FAO
71. Come talk to us!
www.2paths.com/conf/tsik2009
aaron@2paths.com
michal@2paths.com
Editor's Notes
my name is Aaron Gladders, that’s me on the bottom left
my colleague, Michal Urbanski and his recent addition
we work @ 2Paths
we’re from Canada, yo
specifically Vancouver, where we’re known for our rain
but it’s also a beautiful city
http://www.flickr.com/photos/tommyauphoto/2579810410/sizes/o/
Our Focus: making it easier for people (and machines!) to find data
At 2Paths we’re not about world domination. We want to feed our kids....
and ideally make the world a better place
So what is the problem
There are lots of UIs out there, some great ones that we’ve seen here, and we’ve also heard from some great storytellers. But they need access to the data in an accessible form to tell their stories. As we heard yesterday, much of the North American focus is on just getting access to the data.
So a little history. This began for us with the WHO Openhealth prototype. It was a global disease incidence reporting platform, meant to assemble data from all over the world on the diseases occurring there. Ultimately this was to show up on portals, with the associated grids, graphs and maps. Later we worked on making it easier for people to dig for the data they wanted, with the OECD Query Wizard for Int’l Development Statistics. This was built on top of .STAT, which you saw yesterday in Trevor Fletcher’s gangster flick. More recently we’ve been helping the BMGF to classify their information and ultimately to share it. And our latest effort has been with IATI, helping to define which tech should be used for donor governments to share their aid activities, to improve aid transparency and effectiveness.
Really though, we’re trying to answer questions. Such as...
Where are flu pandemics erupting? Generally you would go to the WHO for this.
How much is being spent on HIV/AIDS by Japan? Here you would go to the OECD.
What if we want to get a little more complicated? You’d have to go to both the WHO and the OECD.
And even more complicated: when data comes from places all over the world, as with the Millennium Development Goals.
So where is this data?
You’d go to NSOs (national statistical offices)
Or the websites of Int’l Orgs. Note that some of these orgs are providing data feeds - the OECD’s aren’t advertised, but they are there; we used them, and we even built a nice RESTful one. It’s very exciting that the World Bank has a public API.
More recently you can go to data.un.org
And now with the United States, data.gov
Really though, you go to as many sources as you can find (or even just one)
download it
combine it, chart it
and maybe map it (and with tools like Google Fusion Tables, it’s a snap)
Our goal was to make it easy to answer these questions. That required cross-org data mashups. But we wanted to leverage the existing tools out there, and more importantly, keep it simple.
We quickly zeroed in on semantic web tech - flexibility for each org to define their information as they needed, while allowing for mappings between them, in a queryable way.
I’m here to present a quick primer on Linked Data for the uninitiated. We’re going to quickly go over what it is, and why we should be paying attention to it.
The main technology behind linked data is one we should already be familiar with: the semantic web. It is, essentially, an extension of our current web. It grafts some new standards onto existing ones, in order to give meaning to content.
The technology driving the semantic web has been around for a while. While it may have started out as a highly academic exercise, it has evolved into a very compelling platform for the sharing of data. Semantic technologies have also benefitted from a lot of different areas, from better XML support in languages to the emergence of a new class of semantic-specific vendors. It’s a technology that has “escaped the lab”, so to speak, and is being used to solve actual problems, today.
If you had to summarize semantic technology in one sentence, it would be “Anyone can say anything, about anything”. However, what’s novel about it is that the way you say things is standardized, because ...
... the “meaning” of a statement isn’t intended only for us; it’s also for machines, for tools and agents that can act on _behalf_ of people. Here we see a statement, “Japan is a DAC country”, represented in an example form that a machine would understand.
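As a rough sketch of what that machine-readable form looks like, here is the statement modeled as a single subject-predicate-object triple. The URIs and the predicate name are invented for illustration; they are not terms from any real vocabulary.

```python
# Hypothetical sketch: "Japan is a DAC country" as an RDF-style
# (subject, predicate, object) triple. All URIs here are invented
# examples, not real vocabulary terms.
statement = (
    "http://example.org/country/Japan",   # subject: the thing being described
    "http://example.org/terms/memberOf",  # predicate: the relationship
    "http://example.org/group/DAC",       # object: what it relates to
)

subject, predicate, obj = statement
print(subject, "--[", predicate, "]-->", obj)
```

Because subject, predicate and object are all identified by URIs, a machine can look each one up and connect this statement to others that use the same identifiers.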
If we represent all our information in these semantic formats, we can leverage a significant number of tools that already understand them. We still have to do some work, just like we used to rolling out own XML formats for moving data around, except now we’re more likely to use other people’s vocabularies or produce vocabularies that others can use. These vocabularies are the “link” of linked data.
Because the semantic web has been in development for some time, there are already a number of existing vocabularies you can use to describe your data. Using them is the normal and natural way of participating in the world of linked data. In the case where you need to define your vocabulary because a suitable one doesn’t already exist, you will want others to use it as well, making the entire semantic ecosystem naturally extensible. Once you release and describe your data, others can easily say things about it, or use your vocabulary to describe *their* data, or link their data to yours.
A large amount of linked data is of limited utility if we have no way to find what we’re looking for. Relatively recently, the semantic web gained a nice, shiny query language called SPARQL. It became an official W3C recommendation at the beginning of 2008, so it’s pretty new, but it’s already become quite a popular tool for making complex queries into distributed stores of linked data.
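A real SPARQL query runs against a triple store, which is more setup than fits on a slide. But the heart of it, matching a graph pattern with variables against a set of triples, can be mimicked in a few lines of plain Python. The data and names below are invented toy examples, not a real endpoint or engine.

```python
# Toy triple store plus a matcher that mimics a SPARQL basic graph
# pattern. Variables start with "?"; everything else must match exactly.
# All data here is invented for illustration.
triples = [
    ("Japan",  "memberOf", "DAC"),
    ("Japan",  "spentOn",  "HIV/AIDS"),
    ("France", "memberOf", "DAC"),
]

def match(pattern, store):
    """Return one bindings dict per triple that fits the pattern."""
    results = []
    for triple in store:
        bindings = {}
        for pat, val in zip(pattern, triple):
            if pat.startswith("?"):
                bindings[pat] = val   # variable: bind it
            elif pat != val:
                break                 # constant mismatch: reject triple
        else:
            results.append(bindings)
    return results

# Rough analogue of: SELECT ?country WHERE { ?country memberOf DAC }
print(match(("?country", "memberOf", "DAC"), triples))
# → [{'?country': 'Japan'}, {'?country': 'France'}]
```

A real SPARQL engine does much more (joins across patterns, filters, federation across endpoints), but the variable-binding idea is the same.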
There’s another aspect of the semantic web that we should find very interesting as we examine the world of linked data. The idea of “provenance”, which lets you trace where, and from whom, a piece of data comes, will be highly useful to organizations that are concerned about data accuracy.
Suppose you are looking at a chart of data you’ve found online. With a proper provenance system in place, you would be able to tell where that chart’s data came from, and more importantly, you could determine whether or not you can ...
... trust it. By enumerating your trusted data sources, a system can automatically determine whether the data you’re looking at is trustworthy by examining its provenance.
Suppose that chart had used data from Wikipedia in addition to officially published figures from the OECD and the WHO. In this case, you might want to be a bit more careful using the data from that chart.
Of course, if Wikipedia also has a provenance system in place, ideally your software will follow that chain and perhaps you can trust that data after all.
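The trust-by-provenance idea above can be sketched in a few lines: enumerate your trusted publishers, then walk each dataset's provenance chain. Every dataset name and publisher below is invented for illustration; a real system would read this from provenance metadata attached to the data.

```python
# Toy provenance check. A dataset is trustworthy if its publisher is on
# our trusted list, or if everything it derives from is itself
# trustworthy. All names are invented examples.
provenance = {                       # dataset -> list of source datasets
    "chart_data":      ["oecd_figures", "wikipedia_table"],
    "oecd_figures":    [],           # published directly
    "wikipedia_table": ["who_report"],
    "who_report":      [],
    "blog_post":       [],           # no provenance, unknown author
}
publishers = {
    "chart_data":      "unknown",
    "oecd_figures":    "OECD",
    "wikipedia_table": "Wikipedia",
    "who_report":      "WHO",
    "blog_post":       "unknown",
}
trusted_publishers = {"OECD", "WHO"}

def trustworthy(dataset):
    if publishers.get(dataset) in trusted_publishers:
        return True
    sources = provenance.get(dataset, [])
    # Follow the chain: trusted only if it has sources and all check out.
    return bool(sources) and all(trustworthy(s) for s in sources)

print(trustworthy("chart_data"))  # Wikipedia data traces back to the WHO
print(trustworthy("blog_post"))   # no trusted publisher, no chain to follow
```

This is exactly the Wikipedia case: the table itself comes from an untrusted publisher, but because its own provenance points back to the WHO, the chain checks out.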
The last thing that I’d like to touch on with respect to semantic technology is the idea of inference. This basically means that a semantic system is able to derive “new knowledge” based on things it already knows.
A basic example would be, given a system that knows “Japan is a DAC country” and “DAC countries are donors”, it would be able to infer that Japan is a donor country. This is one of the keys for semantic technology... it’s arguably what’s driving widespread semantic adoption. Once the data and metadata are there, this is the tool that will really drive innovation.
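That inference step can also be sketched in plain Python: repeatedly apply a rule to known facts until nothing new appears (a crude form of forward chaining). The predicate names here are invented; a real reasoner would work over OWL/RDFS semantics rather than this one hard-coded rule.

```python
# Toy forward-chaining inference over invented predicates. Given
# "Japan memberOf DAC" and "DAC membersAre Donor", derive
# "Japan isA Donor".
facts = {
    ("Japan", "memberOf", "DAC"),
    ("DAC", "membersAre", "Donor"),
}

def infer(known):
    """Apply the membership rule until no new facts are produced."""
    derived = set(known)
    while True:
        new = set()
        for s, p, o in derived:
            if p != "memberOf":
                continue
            # Rule: X memberOf G, and G membersAre C  =>  X isA C
            for s2, p2, o2 in derived:
                if s2 == o and p2 == "membersAre":
                    new.add((s, "isA", o2))
        new -= derived
        if not new:
            return derived
        derived |= new

print(("Japan", "isA", "Donor") in infer(facts))  # → True
```

The point is that the derived fact was never stated anywhere; it falls out of the data plus the rule, which is what lets a semantic system answer questions nobody explicitly encoded.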
And that’s it. Hopefully that gives you a reasonable picture as to what linked data and the semantic web are, and why we ought to be interested in them.
And now we’re going to walk through a little prototype we’ve built, in order to demonstrate a working system that can make use of distributed, linked data. I’m presenting screenshots because we’re paranoid about the demo curse, but we do have a running system available here today, which we’re willing to show you under less stressful circumstances.