Data Mining with Background Knowledge from the Web
Introducing the RapidMiner Linked Open Data Extension
08/20/14 Paulheim, Ristoski, Mitichkin, Bizer 1 
Heiko Paulheim, Petar Ristoski, Evgeny Mitichkin, Christian Bizer
Motivation: An Example Data Mining Task 
• Analyzing book sales 
ISBN            City        Sold
3-2347-3427-1   Darmstadt   124
3-43784-324-2   Mannheim    493
3-145-34587-0   Roßdorf     14

With background knowledge added:

ISBN            City        Population   ...   Genre    Publisher      ...   Sold
3-2347-3427-1   Darmstadt   144402       ...   Crime    Bloody Books   ...   124
3-43784-324-2   Mannheim    291458       ...   Crime    Guns Ltd.      ...   493
3-145-34587-0   Roßdorf     12019        ...   Travel   Up&Away        ...   14
...
→ Crime novels sell better in larger cities 
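The enrichment in this example amounts to joining the sales table with background tables keyed by city and by ISBN. A minimal Python sketch, using only the values shown in the example; the dictionary names and the `enrich` helper are illustrative assumptions, not part of any tool:

```python
# Sales records as given in the example: (ISBN, city, copies sold).
sales = [
    ("3-2347-3427-1", "Darmstadt", 124),
    ("3-43784-324-2", "Mannheim", 493),
    ("3-145-34587-0", "Roßdorf", 14),
]

# Background knowledge keyed by city and by ISBN (values from the example).
city_population = {"Darmstadt": 144402, "Mannheim": 291458, "Roßdorf": 12019}
book_genre = {
    "3-2347-3427-1": "Crime",
    "3-43784-324-2": "Crime",
    "3-145-34587-0": "Travel",
}


def enrich(rows):
    """Join each sales row with its city's population and its book's genre."""
    return [
        {
            "isbn": isbn,
            "city": city,
            "population": city_population.get(city),
            "genre": book_genre.get(isbn),
            "sold": sold,
        }
        for isbn, city, sold in rows
    ]


enriched = enrich(sales)

# Crime novels ordered by city size, to compare sales against population.
crime = sorted(
    (row for row in enriched if row["genre"] == "Crime"),
    key=lambda row: row["population"],
)
for row in crime:
    print(row["city"], row["population"], row["sold"])
```

With three rows this only illustrates the mechanics; the pattern "crime novels sell better in larger cities" would need far more data to be supported.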
Motivation 
• Many data mining problems are solved better 
– when you have more background knowledge 
(leaving scalability aside) 
• Problems with collecting background knowledge by hand: 
– Tedious work 
– Selection bias: what to include? 
Linked Open Data in a Nutshell 
• Started in 2007 
• A collection of ~1,000 open datasets 
– from various domains, e.g., general knowledge, government data, … 
– using semantic web standards (HTTP, RDF, SPARQL,…) 
• Machine processable 
• Free of charge 
• Sophisticated tool stacks 
http://lod-cloud.net/ 
Example: DBpedia 
The RapidMiner LOD Extension 
• Automatic discovery of links to Linked Open Data 
– for local data objects 
– e.g., the database entry Boston is linked to 
http://dbpedia.org/resource/Boston 
• Automatic generation of attributes 
– e.g., add all numeric values found for Boston (and other cities) 
• Plus 
– Feature selection algorithms optimized for LOD 
– Automatic following of links to other datasets 
– Schema matching (coming soon) 
• No need to know Semantic Web technologies! 
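The two core steps, linking a local entry to a LOD resource and collecting its numeric values, can be sketched in Python as follows. This is a naive illustration under stated assumptions: the helper names `link_to_dbpedia` and `numeric_values_query` are hypothetical, the linking is a simple URI pattern, and nothing here reflects how the extension's own operators are implemented.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def link_to_dbpedia(label: str) -> str:
    """Naive pattern-based linking: map a local label to a candidate DBpedia URI."""
    # DBpedia resource names follow Wikipedia titles, which use underscores for spaces.
    return DBPEDIA_RESOURCE + quote(label.strip().replace(" ", "_"))


def numeric_values_query(uri: str) -> str:
    """Build a SPARQL query selecting all numeric literal values of an entity."""
    return (
        "SELECT ?property ?value WHERE { "
        f"<{uri}> ?property ?value . "
        "FILTER(isNumeric(?value)) }"
    )


uri = link_to_dbpedia("Boston")
print(uri)
print(numeric_values_query(uri))
```

The query string could be sent to a SPARQL endpoint such as DBpedia's public one; the sketch stops short of that so it runs offline. A pattern-based link is only a candidate and may point to the wrong resource or to none at all for ambiguous labels.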
Example: the Auto MPG Dataset 
• A well-known UCI dataset 
– Goal: predict fuel consumption of cars 
• Hypothesis: background knowledge → more accurate predictions 
• Used background knowledge: 
– Entity types and categories from DBpedia (=Wikipedia) 
• Result: M5Rules down to almost half the prediction error 
– i.e., on average, we are wrong by 1.6 instead of 2.9 MPG 
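As a quick check of "almost half", the two error values quoted above can be compared directly. The variable names are illustrative; the numbers are the ones given in the text.

```python
error_without = 2.9  # MPG, average error with the original attributes only
error_with = 1.6     # MPG, average error with DBpedia types and categories added

ratio = error_with / error_without  # share of the original error that remains
reduction = 1 - ratio               # share of the original error that was removed

print(f"remaining error: {ratio:.0%}, reduction: {reduction:.0%}")
```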
Example: the Auto MPG Dataset 
• The original attributes are 
– cylinders, displacement, horsepower, weight, acceleration, model year, origin 
– plus name (unique string) and mpg (target) 
• Models built are, e.g., 
– high horsepower/weight → high consumption 
• Additional attributes lead to further insights, e.g. 
– front-wheel drives have a lower consumption than rear-wheel drives 
– hatchbacks have a lower consumption than station wagons 
– rally cars generally have a low consumption 
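One plausible way to turn entity types and categories into attributes a learner can use is to encode each of them as a 0/1 feature per car. The Python sketch below assumes that encoding; the car names and category labels are hypothetical placeholders, not real DBpedia data and not taken from the original experiment.

```python
# Hypothetical categories per car (placeholders, not real DBpedia data).
car_categories = {
    "car_a": {"Front-wheel-drive vehicles", "Hatchbacks"},
    "car_b": {"Rear-wheel-drive vehicles", "Station wagons"},
    "car_c": {"Front-wheel-drive vehicles", "Station wagons"},
}


def binary_features(entity_categories):
    """Encode every category as a 0/1 attribute for every entity."""
    all_categories = sorted(set().union(*entity_categories.values()))
    table = {
        entity: [1 if category in cats else 0 for category in all_categories]
        for entity, cats in entity_categories.items()
    }
    return all_categories, table


columns, features = binary_features(car_categories)
print(columns)
for car, row in features.items():
    print(car, row)
```

Each resulting column can then be appended to the original attributes, which is what makes rules such as "front-wheel drive → lower consumption" learnable.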
Example: Analyzing Statistics 
• As shown, e.g., at ESWC 2012, SemStats 2013 
• Statistics found on the web often 
contain only a few attributes 
– extreme case: only entity + target 
• Examples: 
– Quality of living in cities 
– Corruption by country 
– Fertility rate by country 
– Suicide rate by country 
– Box office revenue of films 
– ... 
Example: Analyzing Statistics 
• Process in RapidMiner: 
– load statistic 
– link entities (cities, countries, etc.) to LOD cloud 
– collect additional attributes 
– analyze for correlations with target attribute of statistic 
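The four steps above can be sketched outside RapidMiner in plain Python. Everything in this sketch is an assumption made for illustration: the city names, the `example.org` URIs, and the attribute values are hypothetical stand-ins for what linking and attribute collection would return, and Pearson correlation is just one possible way to carry out the analysis step.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# 1. Load statistic: entity name plus target value (hypothetical numbers).
statistic = {"city_a": 1.0, "city_b": 2.0, "city_c": 3.0, "city_d": 4.0}

# 2. Link entities to LOD (hypothetical URIs standing in for real links).
links = {name: f"http://example.org/resource/{name}" for name in statistic}

# 3. Collect additional attributes per linked entity (hypothetical values
#    standing in for what queries against the linked datasets would return).
attributes = {
    "area": {"city_a": 10.0, "city_b": 20.0, "city_c": 30.0, "city_d": 40.0},
    "elevation": {"city_a": 8.0, "city_b": 6.0, "city_c": 4.0, "city_d": 2.0},
}

# 4. Analyze each attribute for correlation with the target attribute.
names = sorted(statistic)
target = [statistic[name] for name in names]
correlations = {
    attr: pearson([values[name] for name in names], target)
    for attr, values in attributes.items()
}

# Report attributes ordered by the strength of their correlation.
for attr, r in sorted(correlations.items(), key=lambda item: -abs(item[1])):
    print(f"{attr}: {r:+.2f}")
```

Ranking by absolute correlation surfaces both positive and negative indicators, but a strong correlation found this way is a hypothesis to examine, not evidence of a causal link.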
Example: Analyzing Statistics 
• Quality of living in cities worldwide: indicators for low quality 
– too hot (highest temperature in June exceeds 27°C) 
– too cold (highest temperature in January below 16°C) 
– too big (total area exceeds 334 km²) 
– poor cultural life (no music recordings made in this city) 
– or simply: wrong place on the map (latitude<24, longitude<47) 
• All of these attributes come from LOD! 
Example: Analyzing Statistics 
• Corruption Perception Index (CPI) by Transparency International 
• Indicators for low corruption: 
– high HDI (human development index) 
– large number of companies 
– large number of NGOs 
– small number of cargo airlines?! 
• Burnout rates in German DAX companies 
– Positive correlation between turnover and burnout rates 
– Car manufacturers are less prone to burnout 
– Local companies are less prone to burnout than international ones 
• Exception: Frankfurt 
Example: Analyzing Statistics 
• Sexual activity (based on Durex survey 2005-2009) 
– Higher in French-speaking than in English-speaking countries 
– High GDP per capita → low activity 
– High unemployment rate → high activity 
– High number of ISPs → low activity 
http://xkcd.com/552/ 
Further Usage Examples 
• Classification of Twitter messages (SMILE, 2013) 
– given a target, e.g., messages related to car traffic 
– annotate message, extract abstract features for concepts 
– e.g. “I-90” → highway 
• Prediction of user location for Twitter (ICWSM, 2013) 
– useful, e.g., for market research 
– combination with sentiment analysis: public opinion maps 
• Identifying disputed topics in the news (LD4KD, 2014) 
– on a corpus of different online newspapers 
– identified, e.g., competing opinions on drug legislation and gay marriage 
• Debugging Linked Open Data as such 
– e.g., identifying wrong links and axioms 
– combination with outlier detection 
Conclusions 
• Many data mining tasks are better solved 
with more background knowledge 
– better predictive models 
– more insights from additional attributes 
• A lot of such knowledge exists as Linked Open Data 
• The Linked Open Data extension grants easy access to that data 
– from within RapidMiner 
– without the need to know anything about RDF, SPARQL, etc. 
• Try it out! 
– find “Linked Open Data” on the marketplace 
– Google Group: https://groups.google.com/forum/#!forum/rmlod 
08/20/14 Paulheim, Ristoski, Mitichkin, Bizer 18
Data Mining with Background Knowledge 
from the Web 
Introducing the RapidMiner 
Linked Open Data Extension 
08/20/14 Paulheim, Ristoski, Mitichkin, Bizer 19 
Heiko Paulheim, Petar Ristoski, Evgeny Mitichkin, Christian Bizer

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (16)

Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash Course
 
Immigration Reporting: Data, Tools and Sources
Immigration Reporting: Data, Tools and SourcesImmigration Reporting: Data, Tools and Sources
Immigration Reporting: Data, Tools and Sources
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open Data
 
WCIT2010
WCIT2010WCIT2010
WCIT2010
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)
 
New information for new journalists pt2: data
New information for new journalists pt2: dataNew information for new journalists pt2: data
New information for new journalists pt2: data
 
Datasets slidesrachel kotarski
Datasets slidesrachel kotarskiDatasets slidesrachel kotarski
Datasets slidesrachel kotarski
 
Nathan Rosen SLA presentation Case law statutes and regulations for the non l...
Nathan Rosen SLA presentation Case law statutes and regulations for the non l...Nathan Rosen SLA presentation Case law statutes and regulations for the non l...
Nathan Rosen SLA presentation Case law statutes and regulations for the non l...
 
NTC16 - Open Data and Open Source Data Science
NTC16 - Open Data and Open Source Data ScienceNTC16 - Open Data and Open Source Data Science
NTC16 - Open Data and Open Source Data Science
 
Martin Kaltenböck - OGD Linked Open Government Data
Martin Kaltenböck - OGD Linked Open Government DataMartin Kaltenböck - OGD Linked Open Government Data
Martin Kaltenböck - OGD Linked Open Government Data
 
Engr185 fall 2011
Engr185 fall 2011Engr185 fall 2011
Engr185 fall 2011
 
Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...
 
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfSparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
 
Guest Lecture: Linked Open Data for the Humanities and Social Sciences
Guest Lecture: Linked Open Data for the Humanities and Social SciencesGuest Lecture: Linked Open Data for the Humanities and Social Sciences
Guest Lecture: Linked Open Data for the Humanities and Social Sciences
 

Andere mochten auch

Andere mochten auch (20)

NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
DBpedia InsideOut
DBpedia InsideOutDBpedia InsideOut
DBpedia InsideOut
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia Entities
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
DBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataDBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of Data
 
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsEvaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
 
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
 
Linked Data Fragments
Linked Data FragmentsLinked Data Fragments
Linked Data Fragments
 
NLP todo
NLP todoNLP todo
NLP todo
 
Federated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataFederated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of Data
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 
LDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataLDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked Data
 
Applying Linked Open Data to Public Procurement
Applying Linked Open Data to Public ProcurementApplying Linked Open Data to Public Procurement
Applying Linked Open Data to Public Procurement
 
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
 
Exploiting the query structure for efficient join ordering in SPARQL queries
Exploiting the query structure for efficient join ordering in SPARQL queriesExploiting the query structure for efficient join ordering in SPARQL queries
Exploiting the query structure for efficient join ordering in SPARQL queries
 
A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data CloudA Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
 
Unsupervised Extraction of Attributes and Their Values from Product Description
Unsupervised Extraction of Attributes and Their Values from Product DescriptionUnsupervised Extraction of Attributes and Their Values from Product Description
Unsupervised Extraction of Attributes and Their Values from Product Description
 
FedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and ExecutionFedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and Execution
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
 
RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031
 

Ähnlich wie Data Mining with Background Knowledge from the Web - Introducing the RapidMiner Linked Open Data Extension

PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016
PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016
PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016
Anastasia Zagorodnaya (Amosova)
 
Improving decisionmaking with GIS by Bjorgulf Torjussen
Improving decisionmaking with GIS by Bjorgulf TorjussenImproving decisionmaking with GIS by Bjorgulf Torjussen
Improving decisionmaking with GIS by Bjorgulf Torjussen
Opening-up.eu
 
open-data-presentation.pptx
open-data-presentation.pptxopen-data-presentation.pptx
open-data-presentation.pptx
DennicaRivera
 

Ähnlich wie Data Mining with Background Knowledge from the Web - Introducing the RapidMiner Linked Open Data Extension (20)

Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
 
Intro to Big Data in Urban GIS Research
Intro to Big Data in Urban GIS ResearchIntro to Big Data in Urban GIS Research
Intro to Big Data in Urban GIS Research
 
Big Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaBig Data presentation for Statistics Canada
Big Data presentation for Statistics Canada
 
From E-Government to Open Government
From E-Government to Open GovernmentFrom E-Government to Open Government
From E-Government to Open Government
 
Data & Society Taxi Privacy Talk
Data & Society Taxi Privacy TalkData & Society Taxi Privacy Talk
Data & Society Taxi Privacy Talk
 
CKX: Wellbeing Toronto - More Than Just a Map
CKX: Wellbeing Toronto - More Than Just a MapCKX: Wellbeing Toronto - More Than Just a Map
CKX: Wellbeing Toronto - More Than Just a Map
 
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
A Data-driven Approach for Internet of Things Applications: Methods and Case ...A Data-driven Approach for Internet of Things Applications: Methods and Case ...
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
 
Sneak preview: Local News Engine (Will Perrin)
Sneak preview: Local News Engine (Will Perrin)Sneak preview: Local News Engine (Will Perrin)
Sneak preview: Local News Engine (Will Perrin)
 
Approach to Open Data in Vienna
Approach to Open Data in ViennaApproach to Open Data in Vienna
Approach to Open Data in Vienna
 
Big Data & Smart City Applications
Big Data & Smart City ApplicationsBig Data & Smart City Applications
Big Data & Smart City Applications
 
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
 
#Cedem2017 Smart Cities of Self-Determined Data Subjects
#Cedem2017  Smart Cities of Self-Determined Data Subjects  #Cedem2017  Smart Cities of Self-Determined Data Subjects
#Cedem2017 Smart Cities of Self-Determined Data Subjects
 
#CeDEM2017 Smart Cities of Self-Determined Data Subjects
#CeDEM2017 Smart Cities of Self-Determined Data Subjects#CeDEM2017 Smart Cities of Self-Determined Data Subjects
#CeDEM2017 Smart Cities of Self-Determined Data Subjects
 
Large-scale data analytics for smart cities
Large-scale data analytics for smart citiesLarge-scale data analytics for smart cities
Large-scale data analytics for smart cities
 
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
 
PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016
PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016
PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016
 
Improving decisionmaking with GIS by Bjorgulf Torjussen
Improving decisionmaking with GIS by Bjorgulf TorjussenImproving decisionmaking with GIS by Bjorgulf Torjussen
Improving decisionmaking with GIS by Bjorgulf Torjussen
 
open-data-presentation.pptx
open-data-presentation.pptxopen-data-presentation.pptx
open-data-presentation.pptx
 
SoBigData. European Research Infrastructure for Big Data and Social Mining
SoBigData. European Research Infrastructure for Big Data and Social MiningSoBigData. European Research Infrastructure for Big Data and Social Mining
SoBigData. European Research Infrastructure for Big Data and Social Mining
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...
 

Mehr von Heiko Paulheim

Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Heiko Paulheim
 

Mehr von Heiko Paulheim (20)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the Web
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly Detection
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge Discovery
 
Detecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpediaDetecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpedia
 
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionIdentifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
 
Type Inference on Noisy RDF Data
Type Inference on Noisy RDF DataType Inference on Noisy RDF Data
Type Inference on Noisy RDF Data
 

Kürzlich hochgeladen

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 

Kürzlich hochgeladen (20)

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Data Mining with Background Knowledge from the Web - Introducing the RapidMiner Linked Open Data Extension

  • 1. Data Mining with Background Knowledge from the Web Introducing the RapidMiner Linked Open Data Extension 08/20/14 Paulheim, Ristoski, Mitichkin, Bizer 1 Heiko Paulheim, Petar Ristoski, Evgeny Mitichkin, Christian Bizer
  • 2. Motivation: An Example Data Mining Task
      • Analyzing book sales

          ISBN            City       Sold
          3-2347-3427-1   Darmstadt  124
          3-43784-324-2   Mannheim   493
          3-145-34587-0   Roßdorf    14

          ISBN            City       Population  ...  Genre   Publisher     ...  Sold
          3-2347-3427-1   Darmstadt  144402      ...  Crime   Bloody Books  ...  124
          3-43784-324-2   Mannheim   291458      ...  Crime   Guns Ltd.     ...  493
          3-145-34587-0   Roßdorf    12019       ...  Travel  Up&Away       ...  14
          ...

      → Crime novels sell better in larger cities
  • 3. Motivation
      • Many data mining problems are solved better
        – when you have more background knowledge (leaving scalability aside)
      • Problems:
        – Tedious work
        – Selection bias: what to include?
  • 4. Linked Open Data in a Nutshell
      • Started in 2007
      • A collection of ~1,000 open datasets
        – from various domains, e.g., general knowledge, government data, ...
        – using semantic web standards (HTTP, RDF, SPARQL, ...)
      • Machine processable
      • Free of charge
      • Sophisticated tool stacks
  • 5. Linked Open Data in a Nutshell
      http://lod-cloud.net/
  • 6. Example: DBpedia
  • 7. The RapidMiner LOD Extension
  • 8. The RapidMiner LOD Extension
      • Automatic discovery of links to Linked Open Data
        – for local data objects
        – e.g., the database entry Boston is linked to http://dbpedia.org/resource/Boston
      • Automatic generation of attributes
        – e.g., add all numeric values found for Boston (and other cities)
      • Plus
        – Feature selection algorithms optimized for LOD
        – Automatic following of links to other datasets
        – Schema matching (coming soon)
      • No need to know Semantic Web technologies!
  • 10. Example: the Auto MPG Dataset
      • A well-known UCI dataset
        – Goal: predict fuel consumption of cars
      • Hypothesis: background knowledge → more accurate predictions
      • Used background knowledge:
        – Entity types and categories from DBpedia (=Wikipedia)
      • Result: with M5Rules, the prediction error drops to almost half
        – i.e., on average, we are wrong by 1.6 instead of 2.9 MPG
  • 11. Example: the Auto MPG Dataset
      • The original attributes are
        – cylinders, displacement, horsepower, weight, acceleration, model, origin
        – plus name (unique string) and mpg (target)
      • Models built are, e.g.,
        – high horsepower/weight → high consumption
      • Additional attributes lead to further insights, e.g.,
        – front-wheel drives have a lower consumption than rear-wheel drives
        – hatchbacks have a lower consumption than station wagons
        – rally cars generally have a low consumption
  • 12. Example: Analyzing Statistics
      • As shown, e.g., at ESWC 2012, SemStats 2013
      • Statistics found on the web often contain only a few attributes
        – extreme case: only entity + target
      • Examples:
        – Quality of living in cities (right)
        – Corruption by country
        – Fertility rate by country
        – Suicide rate by country
        – Box office revenue of films
        – ...
  • 13. Example: Analyzing Statistics
      • Process in RapidMiner:
        – load statistic
        – link entities (cities, countries, etc.) to LOD cloud
        – collect additional attributes
        – analyze for correlations with target attribute of statistic
  • 14. Example: Analyzing Statistics
      • Quality of living in cities worldwide: indicators for low quality
        – too hot (highest temperature in June exceeds 27°C)
        – too cold (highest temperature in January below 16°C)
        – too big (total area exceeds 334 km²)
        – poor cultural life (no music recordings made in this city)
        – or simply: wrong place on the map (latitude < 24, longitude < 47)
      All those attributes come from LOD!
  • 15. Example: Analyzing Statistics
      • Corruption Perception Index (CPI) by Transparency International
      • Indicators for low corruption:
        – high HDI (human development index)
        – large number of companies
        – large number of NGOs
        – small number of cargo airlines?!
      • Burnout rates in German DAX companies
        – Positive correlation between turnover and burnout rates
        – Car manufacturers are less prone to burnout
        – Local companies are less prone to burnout than international ones
          • Exception: Frankfurt
  • 16. Example: Analyzing Statistics
      • Sexual activity (based on Durex survey 2005-2009)
        – Higher in French-speaking than in English-speaking countries
        – High GDP per capita → low activity
        – High unemployment rate → high activity
        – High number of ISPs → low activity
      http://xkcd.com/552/
  • 17. Further Usage Examples
      • Classification of Twitter messages (SMILE, 2013)
        – given a target, e.g., messages related to car traffic
        – annotate message, extract abstract features for concepts
        – e.g., “I-90” → highway
      • Prediction of user location for Twitter (ICWSM, 2013)
        – useful, e.g., for market research
        – combination with sentiment analysis: public opinion maps
      • Identifying disputed topics in the news (LD4KD, 2014)
        – on a corpus of different online newspapers
        – identified, e.g., concurrent opinions on drug legislation and gay marriage
      • Debugging Linked Open Data as such
        – e.g., identifying wrong links and axioms
        – combination with outlier detection
  • 18. Conclusions
      • Many data mining tasks are better solved with more background knowledge
        – better predictive models
        – more insights from additional attributes
      • A lot of such knowledge exists as Linked Open Data
      • The Linked Open Data extension grants easy access to that data
        – from within RapidMiner
        – without the need to know anything about RDF, SPARQL, etc.
      • Try it out!
        – find “Linked Open Data” on the marketplace
        – Google Group: https://groups.google.com/forum/#!forum/rmlod
  • 19. Data Mining with Background Knowledge from the Web
      Introducing the RapidMiner Linked Open Data Extension
      Heiko Paulheim, Petar Ristoski, Evgeny Mitichkin, Christian Bizer
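
The process described on slide 13 (load a statistic, link its entities to Linked Open Data, collect additional attributes, look for correlations with the target) can be illustrated with a small standalone sketch. It is not how the RapidMiner extension is implemented and it does not query any real endpoint. The function names `link_entity`, `collect_attributes`, and `pearson` are hypothetical, the "linker" is a naive string concatenation, and the `BACKGROUND` dictionary is an in-memory stand-in for a LOD lookup that reuses only the book-sales figures from slide 2.

```python
from math import sqrt

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def link_entity(label):
    """Naive linker (assumption): build a DBpedia-style URI from a label."""
    return DBPEDIA_PREFIX + label.strip().replace(" ", "_")


# In-memory stand-in for a LOD lookup (assumption, no network access).
# The population values are the ones shown on slide 2.
BACKGROUND = {
    link_entity("Darmstadt"): {"population": 144402},
    link_entity("Mannheim"): {"population": 291458},
    link_entity("Roßdorf"): {"population": 12019},
}


def collect_attributes(uri):
    """Return the background attributes known for a linked entity."""
    return dict(BACKGROUND.get(uri, {}))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Step 1: load the statistic (city, copies sold), taken from slide 2.
sales = [("Darmstadt", 124), ("Mannheim", 493), ("Roßdorf", 14)]

# Steps 2 and 3: link each entity and attach the collected attributes.
rows = []
for city, sold in sales:
    uri = link_entity(city)
    rows.append({"city": city, "uri": uri, "sold": sold, **collect_attributes(uri)})

# Step 4: correlate a generated attribute with the target attribute.
r = pearson([row["population"] for row in rows], [row["sold"] for row in rows])
print(f"correlation(population, sold) = {r:.2f}")
```

With only three rows the coefficient is purely illustrative. A realistic analysis would need far more entities, and a real linker would have to handle ambiguous or misspelled labels rather than concatenate strings.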