2. Contents
• Introduction
• Data Marketplaces
– Factual, InfoChimps, Azure DataMarket, Freebase, Socrata,
Kasabi
– DataMarket, Timetric, Xignite
• Data Marketplaces for Linked Data
(Linked) Data Marketplaces Jan 2011 #2
3. INTRODUCTION
4. Definitions
• Data-as-a-Service (DaaS)
– “Like all members of the "as a Service" (XaaS) family, DaaS is based on
the concept that the product, data in this case, can be provided on
demand to the user regardless of geographic or organizational
separation of provider and consumer. Additionally, the emergence of
service-oriented architecture (SOA) has rendered the actual platform
on which the data resides also irrelevant” (Wikipedia)
• Data Marketplaces
– “Services that make it easy to find data from a range of secondary
data sources, then consume the data in a usable and unified format.
Several of these services are trying to create marketplaces for data,
envisioning that data providers can offer their data sets for sale to
data seekers” (DataMarket.com)
5. Data marketplace properties
• Proposed classification by Bauereiss & Fensel
1. Data domain
2. Population of content
3. Community management
4. Operating party
5. Pricing models
6. Data exchange
• Some additional differentiating characteristics
– Data model, Data size, Data export
– Branded marketplaces, SLA
– Query languages, Data tools
8. Factual (2)
• Data domain
– Travel, finance, sports, autos, movies, music, TV, books,
health, food, politics, education, science, arts, …
– High quality local data
• USA, Germany, France, Italy, UK, Japan, Switzerland, Australia, …
• Used by Facebook Places
• Data population
– Crawling the web
– Public data sources
– Community contributions
• Upload XLS/ODS, CSV
9. Factual (3)
• Data model
– tabular
– Taxonomy of 400 categories
• 13 Level 1 categories: Arts, Automotive, Business, Government, …
• Data size – 500,000 datasets
• Company info
– Factual Inc. (USA)
– $27M VC funding so far
10. Factual (4)
• Monetization model
– Pricing model not finalised yet (currently free)
– Planned: pay-per-use pricing (per API call) with subscriptions
• Companies that contribute data will have a fee reduction
• Data access options
– REST API
• Read from table, Add/Write to table, Get schema info
– Web applications
• Read/write raw data from a web page (JavaScript)
• Web widgets for visualising, filtering and sorting data
11. Factual (5)
• Data tools
– AutoClipper – find tables on the web
– PageClipper – extract tabular data from a web page
– FactClipper – find individual facts (query templates)
13. InfoChimps (2)
• Data domain
– All purpose
• Including data from Freebase, Wikipedia infoboxes, CKAN, Twitter,
Data.gov, Data.gov.uk, GeoNames, …
• Data population
– Public datasets
– User submitted datasets
• Data model is dataset specific
• 10,000+ datasets organised in 13 collections
14. InfoChimps (3)
• Company info
– InfoChimps (USA)
– $1.6M VC funding so far
– Acquired DataMarketplace in 12/2010
• Monetization model
– Charge data sellers
• Data sellers choose the price & licensing of their data
• Charge for data storage
• 30% commission for InfoChimps on each sale
15. InfoChimps (4)
• Monetization model (2)
– Charge data buyers
• Baboon – free, 100K API calls / mo
• Brass Monkey – $20/mo, 500K API calls / mo
• Silverback – $250/mo, 2M API calls / mo
• Golden Ape – $4,000/mo, 15M API calls / mo
• Data access options
– REST API
• api.infochimps.com/DATASET/METHOD.json?PARAM=VALUE
– YQL tables
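The REST URL pattern above can be filled in mechanically. A minimal sketch, assuming an `http://` scheme and using made-up dataset, method and parameter names (none of them come from the InfoChimps documentation):

```python
from urllib.parse import urlencode


def build_infochimps_url(dataset, method, params):
    """Fill in the DATASET/METHOD.json?PARAM=VALUE pattern from the slide."""
    base = "http://api.infochimps.com"  # scheme is an assumption
    return f"{base}/{dataset}/{method}.json?{urlencode(params)}"


# Hypothetical dataset, method and parameter, for illustration only.
print(build_infochimps_url("some/dataset", "search", {"q": "berlin"}))
```

Only the URL is built here; sending the request and handling the API key are left out because the slide does not describe them.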
17. Azure DataMarket (2)
• Data domain
– All purpose, incl. Data.gov, UN data, Wolfram|Alpha, ESRI
• Data population
– Data publishers (need prior approval)
• Data can be stored on SQL Azure, Azure Storage or 3rd party clouds
(via Data Access Layers)
• Data model
– Depends on the dataset and the storage, but always
presented as OData to consumers
• Data size – 90 datasets
19. Azure DataMarket (4)
• Company info
– Microsoft
• Monetization model
– Subscription for data buyers (limited/unlimited API calls)
• Access options
– OData (feeds, queries, updates)
• Data tools
– Service Explorer
– Excel add-in (find, purchase, consume data)
– Integration with SQL Server Reporting Services /
Integration Services
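Since every dataset is exposed as OData, a consumer query is essentially a URL with system query options such as `$filter` and `$top`. A minimal sketch, assuming a hypothetical service root, entity set and property name (not an actual Azure DataMarket feed):

```python
from urllib.parse import quote, urlencode


def odata_query_url(service_root, entity_set, **options):
    """Build an OData query URL; each keyword becomes a $-prefixed option."""
    params = {f"${name}": value for name, value in options.items()}
    # Keep "$" readable and encode spaces as %20 rather than "+".
    query = urlencode(params, safe="$", quote_via=quote)
    return f"{service_root.rstrip('/')}/{entity_set}?{query}"


# Hypothetical feed: service root, entity set and "Year" are placeholders.
print(odata_query_url("https://example.org/odata", "Observations",
                      filter="Year eq 2010", top=10))
```

The same helper covers other standard options such as `$orderby` or `$select`; authentication is omitted.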
21. DataMarket (2)
• Data domain
– Statistical data from 2,000 providers, incl. UN, Eurostat,
World Bank, US agencies, BP, FIFA, …
• Data population
– Data aggregation (2,000 data providers)
• Data size
– 13K datasets, 100M time series, 600M facts
• Company info
– DataMarket (Iceland)
22. DataMarket (3)
• Monetization model
– Charge data sellers
• Free datasets – $249/mo; Paid datasets – 25% commission;
Branded datasets – $699/mo + commission
– Charge data buyers
• Free – 50 API calls/mo; $99 – 500 API calls/mo; $299 – 10K API
calls/mo; $799 – 100K API calls/mo
• Data access
– REST API
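The buyer tiers above amount to a simple lookup from monthly call volume to price. A minimal sketch using the four tiers listed on the slide; the function name and the "no tier fits" behaviour are assumptions:

```python
# (monthly price in USD, API calls included per month), as listed on the slide
TIERS = [(0, 50), (99, 500), (299, 10_000), (799, 100_000)]


def cheapest_tier(calls_per_month):
    """Return (price, quota) of the cheapest tier covering the volume, else None."""
    for price, quota in TIERS:  # ordered by ascending price
        if calls_per_month <= quota:
            return price, quota
    return None  # volume exceeds the largest listed tier


for volume in (40, 400, 5_000, 50_000, 500_000):
    print(volume, cheapest_tier(volume))
```

A buyer just above a tier boundary pays the full price of the next tier, which is why effective cost per call varies so much with volume.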
24. Socrata (2)
• Data domain
– Business, education, government data
• Data population
– Uploads from data publishers
• Data size
– 13K datasets
• Data model
– tabular
25. Socrata (3)
• Company info
– Socrata (USA)
• Monetization model
– Charge data buyers (“Plans starting at $499 per month”)
• Basic – 100K API calls/mo + 50GB traffic; Plus – 250K API calls/mo
+ 250GB traffic; Premium – 1M API calls/mo + 1.2TB traffic;
Ultimate – 10M API calls/mo + 5TB traffic
• Data access
– REST API (Socrata Open Data API)
– Data export (XLS, CSV, RDF, XML)
– RSS updates
27. Kasabi (2)
• Data domain
– All purpose, incl. DBpedia, GeoNames, BBC Linked Data, …
• Data population
– Public datasets
– User submitted datasets
• Data size
– 55 datasets
• Data model
– RDF
28. Kasabi (3)
• Company info
– Talis (UK)
• Monetization model
– Charge data consumers
– Data hosting is free
• Data access
– SPARQL / Linked Data endpoint
– REST API
– Additional APIs
– PHP & Ruby client libraries
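Access through a SPARQL endpoint follows the standard SPARQL protocol, where the query text is passed in a `query` parameter. A minimal sketch, assuming a placeholder endpoint URL (not a real Kasabi address) and leaving out any API key handling:

```python
from urllib.parse import urlencode

# List a few labelled resources; rdfs:label is a standard RDFS property.
QUERY = """
SELECT ?s ?label
WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }
LIMIT 5
"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL protocol GET request URL for the given endpoint."""
    return endpoint + "?" + urlencode({"query": query.strip()})


# Placeholder endpoint, for illustration only.
print(sparql_get_url("http://example.org/sparql", QUERY))
```

The resulting URL can be fetched with any HTTP client; the result format is usually negotiated via the `Accept` header.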
30. Freebase (2)
• Data domain
– General purpose
• Data model
– Graph (RDF dumps available)
• Data population
– Community curated data (licensed as CC-BY)
– Import of public data sources (Wikipedia, MusicBrainz,
WordNet, LoC, …)
• Data size
– 20M entities
31. Freebase (3)
• Company info
– Metaweb (USA), now Google
• Monetization model
– Free for 100K read API calls per day (10K write)
– Paid for higher volumes
• Data access
– REST API
– Linked Data endpoint (http://rdf.freebase.com)
– Triple uploader / RDF dumps
– Acre (application hosting platform)
32. Freebase (4)
• Data tools
– Web-based – schema editor, review queue, viewers, …
– GridWorks (Google Refine)
• Exploring, data cleaning, transformation of tabular data
• Map data to Freebase schema & RDF export (3rd party extension)
– Acre
• Application hosting platform
– User contributed JavaScript code (converted to Java with Rhino)
• Access & store data directly into Freebase
34. timetric (2)
• Data domain
– Economic data
• Data population
– Aggregated data from the world's leading sources of
economic data (World Bank, Eurostat, …)
– User uploaded data
• Data size
– 2.5M public statistics
35. timetric (3)
• Company info
– Timetric Ltd. (UK)
• Monetization model
– Free public datasets
– Paid exclusive datasets
• Data access
– REST API
37. Xignite (2)
• Data domain
– Financial data
• Data population
– Aggregated data from leading sources (Dow Jones, Thomson
Reuters, stock exchanges, …)
– Public datasets (national banks, SEC, Federal Reserve, …)
– User uploaded data
• Company info
– Xignite (USA)
38. Xignite (3)
• Monetization model
– Paid subscriptions
• Data access
– Web services (REST/SOAP)
39. Coming soon…
• BuzzData
– www.buzzdata.com / @buzzdata
– Company: BuzzData
40. Data marketplaces – features summary
• Data
– Data model, domain, export options
• Monetization
– Charge buyers / sellers
– Free API call quota
– Branded marketplaces & Service Level Agreements
• For developers
– REST API; query language
– Tools for data management / integration
– Application hosting
42. LINKED DATA + MARKETPLACES
43. Linked Data cloud (Sep 2010)
(c) R. Cyganiak and A. Jentzsch
44. Benefits of Linked Data for Data Marketplaces
• Unified data representation model (RDF)
– Easy consumption of the data
• Global identifiers for all objects (URI)
– Makes incremental data integration & federation easier
• Interlinked datasets
– New data added to the marketplace can be integrated
with existing data
– Network effects
• Data marketplace interoperability
– Data from different marketplaces can be easily integrated
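Why shared URIs make integration incremental can be seen with plain sets of triples: when two datasets use the same identifier for an entity, merging them is a set union with no key matching. A minimal sketch; all URIs, property names and values below are made-up placeholders:

```python
SOFIA = "http://example.org/id/Sofia"  # hypothetical shared URI

# Two independently published datasets (placeholder content).
DATASET_A = {
    (SOFIA, "ex:country", "http://example.org/id/Bulgaria"),
    (SOFIA, "ex:label", "Sofia"),
}
DATASET_B = {
    (SOFIA, "ex:timezone", "ex:EET"),
}


def merge(*graphs):
    """Merging RDF-style graphs is a set union of their triples."""
    return set().union(*graphs)


def describe(graph, subject):
    """Return all (property, value) pairs known about one URI."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge(DATASET_A, DATASET_B)
for prop, value in describe(merged, SOFIA):
    print(prop, value)
```

Adding a third dataset later is another union, so existing data does not need to be reloaded or re-keyed.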
45. Benefits of Linked Data for Data Marketplaces (2)
• Derived knowledge / facts
– RDF inference of additional implicit facts
– (see FactForge and LinkedLifeData)
• Rich queries
– SPARQL offers unmatched query expressivity
• Easy import of existing LOD datasets
– Linked Open Data cloud already includes 200+ datasets
with 20+ billion RDF triples
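The "inference of implicit facts" point can be illustrated with two standard RDFS entailment rules: rdfs11 (subclass transitivity) and rdfs9 (instances of a subclass are instances of its superclass). A minimal forward-chaining sketch over made-up triples; production RDF stores use far more efficient strategies than this naive loop:

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Apply rules rdfs9 and rdfs11 repeatedly until no new triples appear."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in closure:
                # rdfs11: (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: (s2 type s), (s subClassOf o) => (s2 type o)
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if new <= closure:
            return closure
        closure |= new


# Placeholder data: three explicit statements.
explicit = {
    ("ex:Sofia", TYPE, "ex:City"),
    ("ex:City", SUBCLASS, "ex:Place"),
    ("ex:Place", SUBCLASS, "ex:Thing"),
}
inferred = infer(explicit) - explicit
for triple in sorted(inferred):
    print(triple)
```

Here three explicit statements yield three additional implicit ones, which is the same explicit/inferred distinction used in the FactForge and LinkedLifeData figures.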
46. Linked Data for marketplaces – challenges
• Quality of data
– Different (public) datasets may come with inconsistent or
controversial data
– Quality more important than quantity
• Large scale data integration
– Ontology (schema) mapping of different datasets &
vocabularies
• Licensing
– Some datasets come with “CC-BY-NC” or unclear licensing
• Billing
– API calls / SPARQL queries with varying computational
cost
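One way to approach the billing challenge is to meter abstract cost units instead of raw call counts, so that an expensive SPARQL query consumes more quota than a cheap lookup. A minimal sketch; the cost formula, its constants and the class name are illustrative assumptions, not any marketplace's actual scheme:

```python
from dataclasses import dataclass


@dataclass
class Meter:
    """Tracks cost units consumed by one account against a monthly quota."""
    quota_units: int
    used_units: int = 0

    def charge(self, elapsed_ms, rows):
        # Assumed formula: 1 base unit, +1 per 100 ms, +1 per 1,000 result rows.
        units = 1 + elapsed_ms // 100 + rows // 1000
        if self.used_units + units > self.quota_units:
            raise RuntimeError("quota exceeded")
        self.used_units += units
        return units


meter = Meter(quota_units=10)
print(meter.charge(elapsed_ms=50, rows=10))     # cheap lookup
print(meter.charge(elapsed_ms=450, rows=2500))  # heavier query
print(meter.used_units)
```

The cost is only known after execution here; a real service would likely also need an upfront estimate or a timeout to stop runaway queries.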
47. Linked Data for marketplaces – challenges (2)
• Operations
– Service Level guarantees
– Availability & scalability challenges
• Most Linked Data endpoints at present are neither scalable nor
reliably available
49. LinkedLifeData & FactForge
• FactForge
– Integrates some of the most central LOD datasets
– General-purpose information (not specific to a domain)
– 1.2 billion explicit and 1 billion inferred statements
– The largest upper-level knowledge base
– http://www.FactForge.net
• Linked Life Data
– 25 of the most popular life-science datasets
– 2.7 billion explicit and 1.4 billion inferred statements
– http://www.LinkedLifeData.com
50. Strategic questions
• Monetization strategy
– Which (linked) datasets can be monetized?
– Charge buyers / charge sellers / free quota
– Branded marketplaces
• Community building
– Crowdsource the data curation to the community
– How to provide incentives to data curators?
51. Strategic questions (2)
• Operations
– How to ensure Service Level guarantees?
– How to deal with licensing issues?
– Account management, metering, billing
• Platform
– RDF database – data volume, query volume
– ETL tools
– Curation tools
– Data export & consumption
52. Data monetization with WebServius
(c) WebServius
• Benefits
– user management, quotas & restrictions
– Metering, pricing, billing
– Security, scalability, SLAs
53. Q&A
Questions?
@ontotext