Kasabi, an online data market based on linked data principles, offers data publishers an easy way to publish, link and monetise data, while giving developers of data-centric applications access to this data in different formats and through a number of different interfaces.
14. What’s so special?
• Kasabi is based on linked data principles
- data in graph structure (RDF)
- URIs identify data items
- data links to other datasets (context)
- linked data views
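The linked data principles on this slide can be sketched with a toy graph: data as triples, each item identified by a URI, with links into other datasets providing context. This is a minimal illustration, not Kasabi code; all URIs are assumptions.

```python
# Minimal sketch of the linked data idea: data as a graph of
# (subject, predicate, object) triples (the RDF model), each item
# identified by a URI. All URIs here are illustrative assumptions.

graph = [
    # a plain literal property of a URI-identified item
    ("http://example.com/book/1", "http://purl.org/dc/terms/title", "Linked Data"),
    # a link pointing into a *different* dataset puts the item into context
    ("http://example.com/book/1", "http://purl.org/dc/terms/creator",
     "http://dbpedia.org/resource/Tom_Heath"),
]

# follow the cross-dataset links: objects that live in another namespace
external_links = [o for s, p, o in graph if o.startswith("http://dbpedia.org/")]
```

Because objects can themselves be URIs, the graph is open-ended: any dataset can point at any other.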
42. Summary
• Kasabi is a platform to publish, link, find and consume data
• based on linked data principles
• Linked Data as a Service
• APIs over your data
• data in different flavours (turtle, json, rdf/xml)
43. Keep in touch!
• http://kasabi.com
• http://blog.kasabi.com/
• Twitter: @kasabi
• IRC: #kasabi (freenode.net)
• this presentation:
http://www.slideshare.net/dunken69/the-kasabi-information-marketplace
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
44. Under the Hood
• Cohodo
• (used to be Talis Platform)
• distributed data platform
• load balancing, data replication, etc.
Editor’s notes
- Kasabi is an information marketplace
- or, to use a different buzzword that you might have come across, a “data market”
- so, what does that mean, “information marketplace”?
- well, it means different things depending on your perspective
- if you are a data provider, that is, an organisation, a company or an individual that has data and wants to make it available, then Kasabi is a place for you to publish your data
- a place to integrate your data: link it, put it into context with other data
- it can also be a place for you to monetise your data, if that’s what you want
- on the other hand, if you’re a data consumer, for example a developer, a journalist or an analyst, then Kasabi is a place for you to find data and use it
Ok, just a couple of basic facts to set the stage:
- Kasabi is a web-based platform; in other words, you store and consume data “in the cloud”, with no need to install any software locally
- Kasabi is a horizontal marketplace, meaning that it is not specialised for any particular domain; instead, you will find data from any domain in there
- the way you interact with Kasabi as a developer is through a set of APIs (I’ll come to that)
- or you can use a number of language bindings
- there is also a Python-based command-line tool called “pytassium”
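The API key from the dashboard is passed along with every API call. As a rough sketch of what a call looks like, assuming a host name and path layout for illustration (the real endpoints are listed on each dataset’s page):

```python
# Sketch: building a request URL for a Kasabi dataset API. The host and
# path layout are assumptions for illustration only; the API key is the
# one shown on your dashboard.
from urllib.parse import urlencode

API_KEY = "your-api-key"  # from your Kasabi dashboard

def api_url(dataset, api, **params):
    """Build a request URL for one of a dataset's APIs, passing the API key."""
    params["apikey"] = API_KEY
    return "http://api.kasabi.com/dataset/%s/apis/%s?%s" % (
        dataset, api, urlencode(params))

url = api_url("example-dataset", "search", query="linked data")
```

The same pattern applies to every per-dataset API; only the API name and parameters change.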
- there are a number of data markets out there, each with their own distinguishing features: Infochimps, Factual, MS Azure, ...
- one of Kasabi’s main differentiators is that it’s built on linked data principles
- in other words, your data in Kasabi will have a graph structure rather than a table structure
- each individual data item is identified by a URI
- this enables you to link your data not just within your own dataset, but also to any other dataset
- so you can integrate your data and put it into context
- once you publish your data, it doesn’t just sit there waiting to be downloaded as a CSV file, or even an RDF file
- instead, you get a number of basic APIs that provide various ways of accessing your data
- the most powerful of these is the SPARQL API; in other words, you get a SPARQL endpoint out of the box to allow rich, structured queries over your data
- you also get keyword search over your data, a lookup API to get descriptions of individual data items, a reconciliation API and an attribution API
- probably most interesting: you can define your own custom APIs to provide specialised access to your data
OK, I’m just going to give you a little tour of the Kasabi web app.
Your starting point as a registered user is always the dashboard, which provides you with some usage statistics, shows your own datasets and the datasets you have subscribed to, lets you change your profile, etc.
You also find your API key here, which you need to access any dataset’s APIs.
If you’re a data publisher, this is also the place where you create new datasets.
- creating a new dataset is a relatively simple process:
- you enter some basic metadata, maybe a logo and a short description
- you can assign categories, specify a license for your data, and give typical example resources to illustrate what kind of data people can find in your dataset
- for uploading your data, the current requirement is that you have it available in graph form (i.e., RDF)
- obviously the RDF requirement is a bottleneck
- internally we use a tool called “Vertere” to convert tabular data into RDF
- the snippet shown is from the MS AdventureWorks dataset
- it’s a relatively simple approach: declarative conversion
- in a nutshell, you define mapping rules for each column
- these can be quite simple (the #category rule) or more complex (the #weight rule)
- similar to XLWrap
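The per-column mapping idea can be illustrated with a toy sketch. This is NOT Vertere’s actual syntax (Vertere rules are written declaratively, not in Python); it only shows the approach: one rule per column, simple or with a conversion step, turning each row into triples. All URIs are assumptions.

```python
# Toy illustration of declarative table-to-RDF conversion: one mapping
# rule per column turns each row into triples. Not Vertere syntax; the
# namespace and predicates are illustrative assumptions.

BASE = "http://example.com/product/"  # assumed namespace for minted URIs

# mapping: column name -> (predicate URI, value converter)
mapping = {
    "Name":   ("http://www.w3.org/2000/01/rdf-schema#label", str),    # simple rule
    "Weight": ("http://example.com/ns/weight", float),                # complex rule: cast to number
}

def row_to_triples(row):
    subject = BASE + row["ProductID"]  # subject URI minted from the key column
    for col, (predicate, convert) in mapping.items():
        if row.get(col):
            yield (subject, predicate, convert(row[col]))

triples = list(row_to_triples(
    {"ProductID": "42", "Name": "Mountain Bike", "Weight": "11.4"}))
```

Each row becomes a small star-shaped graph around its minted subject URI, which is exactly the shape the dataset upload expects.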
- here is the output data, ready to upload to Kasabi
- right, once your data is in Kasabi, you get a dataset page which gives consumers an overview of what it has to offer
- this starts with very high-level overviews (the types of things in the dataset)
- ... and continues with detailed structured metadata about the dataset
- another feature of Kasabi is that any data the platform provides is usually available in different flavours
- here we are looking at an HTML page showing the dataset metadata
- ... this is the same data as a Turtle (RDF) file
- for those of you who know a little about linked data best practices: this is a VoID dataset description
- ... and you can also get the same data as JSON
- these different flavours are all available under their own URIs
- however, HTTP content negotiation is also supported, as are URL parameters
- this way of getting data in different flavours works across the board, for each data item in a dataset
- by the way: yes, we distinguish information and non-information resources...
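The content-negotiation route can be sketched as follows: the client asks for a flavour via the `Accept` header and the server picks the matching representation. The dataset URI is an illustrative assumption; the media types are the standard ones for each format.

```python
# Sketch: requesting the same resource in different flavours via HTTP
# content negotiation. The dataset URI is an assumption for illustration;
# the Accept media types are the standard ones for each format.
from urllib.request import Request

def metadata_request(resource_uri, flavour="turtle"):
    """Build a request for a resource as Turtle, RDF/XML or JSON."""
    accept = {
        "turtle": "text/turtle",
        "rdfxml": "application/rdf+xml",
        "json":   "application/json",
    }[flavour]
    return Request(resource_uri, headers={"Accept": accept})

req = metadata_request("http://data.kasabi.com/dataset/example", "turtle")
```

The URL-suffix and URL-parameter routes mentioned in the talk deliver the same representations; content negotiation is just the header-based way to pick one.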
- I told you about APIs, so I’m just going to give you a quick glimpse of each of them
- the query API is basically a SPARQL endpoint that you get out of the box when you publish a dataset
- connected to that is the possibility of adding example queries to your dataset page, which act as additional documentation for users
- in fact, anyone who has subscribed to your dataset can add queries here, so the dataset page acts a little like a community hub around your data
- obviously the SPARQL API follows the regular SPARQL HTTP protocol, but like all the APIs it also has a web interface where you can just try it out in your browser
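Following the standard SPARQL HTTP protocol means a query is just a `query` parameter on a GET request, with an `Accept` header choosing the result format. The endpoint URL and the `apikey` parameter name below are assumptions for illustration:

```python
# Sketch: querying the out-of-the-box SPARQL endpoint per the SPARQL
# HTTP protocol. Endpoint URL and apikey parameter are assumptions.
from urllib.parse import urlencode
from urllib.request import Request

def sparql_request(endpoint, query, apikey):
    """Build a GET request for a SPARQL query, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query, "apikey": apikey})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = sparql_request(
    "http://api.kasabi.com/dataset/example/apis/sparql",
    "SELECT ?s WHERE { ?s a <http://xmlns.com/foaf/0.1/Person> } LIMIT 10",
    "your-api-key",
)
```

Any SPARQL client library that speaks the standard protocol should work against such an endpoint as well.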
- a basic keyword search API
- you can also use Solr syntax by appending /solr to the path
- the lookup API is useful for publishing existing RDF data in Kasabi, where the URIs in the dataset are not in the namespace of the dataset itself
- otherwise, Kasabi will just serve these descriptions directly at the URI of the resource, following linked data principles
- the reconciliation API can be used to find the identifier of a data item in your dataset
- e.g., I know there is an author called “Tom Heath” in the WWW dataset, but I don’t know his URI
- this can be very useful for entity resolution and for linking between datasets
- if you are familiar with Google Refine, you can use the API within Google Refine to align data
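Since the API works with Google Refine, a reasonable assumption is that it follows the Google Refine reconciliation protocol, where a JSON object of named queries is sent in a `queries` parameter. A sketch of building such a payload (endpoint details omitted):

```python
# Sketch: a reconciliation request payload in the shape used by the
# Google Refine reconciliation protocol (named queries in a `queries`
# parameter). Whether Kasabi accepts exactly this shape is an assumption.
import json

def reconcile_payload(*names):
    """Batch several name lookups into one reconciliation request body."""
    queries = {"q%d" % i: {"query": name} for i, name in enumerate(names)}
    return {"queries": json.dumps(queries)}

payload = reconcile_payload("Tom Heath")
```

The response, per that protocol, maps each query id back to candidate identifiers with match scores, which is exactly what you need to resolve “Tom Heath” to his URI.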
- a very useful feature is the ability to create custom APIs, in particular stored SPARQL procedures
- you could say that SPARQL itself is for power users; not all users of your dataset will want to learn SPARQL
- so you can anticipate typical queries that users of your data might want to run, and wrap them in a simple API call
- what you would do is define a SPARQL query like this, to get papers by subject
- and then bind one or more of the variables in the query to an API parameter
- so finally, to get all papers for the subject “online communities”, users have a simple API call like this, rather than having to write the query themselves
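The stored-procedure idea boils down to a canned query template with a placeholder bound to an API parameter. A minimal sketch, where the query shape, predicates and parameter name are illustrative assumptions:

```python
# Sketch: a stored-SPARQL-procedure style custom API. A canned query
# template with a placeholder that the API binds to a `subject`
# parameter, so callers never write SPARQL. Query shape is assumed.

TEMPLATE = """
SELECT ?paper WHERE {
  ?paper <http://purl.org/dc/terms/subject> ?subject .
  ?subject <http://www.w3.org/2000/01/rdf-schema#label> "%(subject)s" .
}
"""

def stored_query(subject):
    """Bind the API's subject parameter into the canned query."""
    return TEMPLATE % {"subject": subject}

q = stored_query("online communities")
```

The caller then just hits something like `...?subject=online+communities`, and the platform runs the bound query on their behalf. (A production version would also need to escape quotes in the bound value to prevent query injection.)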
- there are more APIs; you can find a complete list here
- for example, there are various data management APIs, which will be used by the data publishers themselves
- does it cost anything?
- at the moment we are still in beta, so everything is free
- there will be different pricing plans once we get out of beta (no date set yet)