Data Mining with Background Knowledge 
from the Web 
Introducing the RapidMiner 
Linked Open Data Extension 
08/20/14 Paulheim, Ristoski, Mitichkin, Bizer 1 
Heiko Paulheim, Petar Ristoski, Evgeny Mitichkin, Christian Bizer
Motivation: An Example Data Mining Task 
• Analyzing book sales 
ISBN           City       Sold
3-2347-3427-1  Darmstadt  124
3-43784-324-2  Mannheim   493
3-145-34587-0  Roßdorf    14

ISBN           City       Population  ...  Genre   Publisher     ...  Sold
3-2347-3427-1  Darmstadt  144402      ...  Crime   Bloody Books  ...  124
3-43784-324-2  Mannheim   291458      ...  Crime   Guns Ltd.     ...  493
3-145-34587-0  Roßdorf    12019       ...  Travel  Up&Away       ...  14
...
→ Crime novels sell better in larger cities 
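The enrichment step in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not how the extension works internally: the lookup dictionaries `city_info` and `book_info` and the helper `enrich` are hypothetical names, and they hold just the population and genre values shown in the tables above.

```python
# Local sales data, as in the sales table above.
sales = [
    {"isbn": "3-2347-3427-1", "city": "Darmstadt", "sold": 124},
    {"isbn": "3-43784-324-2", "city": "Mannheim", "sold": 493},
    {"isbn": "3-145-34587-0", "city": "Roßdorf", "sold": 14},
]

# Background knowledge (hypothetical lookup tables, values from the slide).
city_info = {
    "Darmstadt": {"population": 144402},
    "Mannheim": {"population": 291458},
    "Roßdorf": {"population": 12019},
}
book_info = {
    "3-2347-3427-1": {"genre": "Crime"},
    "3-43784-324-2": {"genre": "Crime"},
    "3-145-34587-0": {"genre": "Travel"},
}


def enrich(rows, city_info, book_info):
    """Return copies of the rows with background attributes merged in.

    Rows whose city or ISBN is unknown are kept and simply gain no attributes.
    """
    enriched = []
    for row in rows:
        merged = dict(row)
        merged.update(city_info.get(row["city"], {}))
        merged.update(book_info.get(row["isbn"], {}))
        enriched.append(merged)
    return enriched


enriched = enrich(sales, city_info, book_info)

# Crime novels only, ordered by city size, to compare population with sales.
crime = sorted(
    (r for r in enriched if r.get("genre") == "Crime"),
    key=lambda r: r["population"],
)
for r in crime:
    print(r["city"], r["population"], r["sold"])
```

With only the original three columns, no model could relate sales to city size or genre; after the merge, both attributes are available to the learner.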
Motivation 
• Many data mining problems are solved better 
– when you have more background knowledge 
(leaving scalability aside) 
• Problems: 
– Tedious work 
– Selection bias: what to include? 
Linked Open Data in a Nutshell 
• Started in 2007 
• A collection of ~1,000 open datasets 
– from various domains, e.g., general knowledge, government data, … 
– using semantic web standards (HTTP, RDF, SPARQL,…) 
• Machine processable 
• Free of charge 
• Sophisticated tool stacks 
http://lod-cloud.net/ 
Example: DBpedia 
The RapidMiner LOD Extension 
• Automatic discovery of links to Linked Open Data 
– for local data objects 
– e.g., the database entry Boston is linked to 
http://dbpedia.org/resource/Boston 
• Automatic generation of attributes 
– e.g., add all numeric values found for Boston (and other cities) 
• Plus 
– Feature selection algorithms optimized for LOD 
– Automatic following of links to other datasets 
– Schema matching (coming soon) 
• No need to know Semantic Web technologies! 
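What the two automatic steps amount to can be sketched as follows. This is a minimal illustration under stated assumptions, not the extension's actual code: `link_by_label` and `numeric_attributes_query` are hypothetical helpers, the linking is a naive guess from the label (it ignores ambiguous names), and the query is only built as a string, not sent to any endpoint. `isNumeric` is a standard SPARQL 1.1 filter function.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def link_by_label(label: str) -> str:
    """Guess a DBpedia URI from a local label (naive pattern-based linking)."""
    name = label.strip().replace(" ", "_")
    return DBPEDIA_RESOURCE + quote(name, safe="_")


def numeric_attributes_query(uri: str) -> str:
    """Build a SPARQL query asking for all numeric values of one entity."""
    return (
        "SELECT ?property ?value WHERE {\n"
        f"  <{uri}> ?property ?value .\n"
        "  FILTER(isNumeric(?value))\n"
        "}"
    )


uri = link_by_label("Boston")
print(uri)
print(numeric_attributes_query(uri))
```

Each `?property` returned for a linked entity would become one candidate attribute column in the data table.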
Example: the Auto MPG Dataset 
• A well-known UCI dataset 
– Goal: predict fuel consumption of cars 
• Hypothesis: background knowledge → more accurate predictions 
• Used background knowledge: 
– Entity types and categories from DBpedia (=Wikipedia) 
• Result: with M5Rules, the prediction error drops to almost half 
– i.e., on average, we are wrong by 1.6 instead of 2.9 MPG 
Example: the Auto MPG Dataset 
• The original attributes are 
– cylinders, displacement, horsepower, weight, acceleration, model year, origin 
– plus name (unique string) and mpg (target) 
• Models built are, e.g., 
– high horsepower/weight → high consumption 
• Additional attributes lead to further insights, e.g. 
– front-wheel drives have a lower consumption than rear-wheel drives 
– hatchbacks have a lower consumption than station wagons 
– rally cars generally have a low consumption 
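Insights like these become possible once entity types and categories are turned into attributes. One common way to do that is one binary column per type or category, sketched below. The helper `binary_features` is a hypothetical name, and the car identifiers and labels are made up for illustration; they are not taken from DBpedia or from the Auto MPG data.

```python
def binary_features(entity_labels):
    """Turn {entity: set of type/category labels} into rows of 0/1 attributes.

    Returns the sorted list of column names and a {entity: {column: 0 or 1}} map.
    """
    columns = sorted(set().union(*entity_labels.values()))
    rows = {
        entity: {col: int(col in labels) for col in columns}
        for entity, labels in entity_labels.items()
    }
    return columns, rows


# Made-up labels for illustration only.
labels = {
    "car_a": {"Front-wheel-drive vehicles", "Hatchbacks"},
    "car_b": {"Rear-wheel-drive vehicles", "Station wagons"},
    "car_c": {"Front-wheel-drive vehicles", "Station wagons"},
}

columns, rows = binary_features(labels)
for entity, row in rows.items():
    print(entity, [row[col] for col in columns])
```

A rule learner can then split on a column such as "Hatchbacks" in the same way it splits on horsepower or weight.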
Example: Analyzing Statistics 
• As shown, e.g., at ESWC 2012, SemStats 2013 
• Statistics found on the web often contain only a few attributes 
– extreme case: only entity + target 
• Examples: 
– Quality of living in cities (right) 
– Corruption by country 
– Fertility rate by country 
– Suicide rate by country 
– Box office revenue of films 
– ... 
Example: Analyzing Statistics 
• Process in RapidMiner: 
– load statistic 
– link entities (cities, countries, etc.) to LOD cloud 
– collect additional attributes 
– analyze for correlations with target attribute of statistic 
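The four steps above can be sketched outside RapidMiner as well. The following is a minimal illustration, assuming the linking and attribute collection have already produced a dictionary of numeric attributes per entity; `analyze`, `pearson`, and all data values are hypothetical and made up for the example. Pearson correlation is used here as one possible correlation measure; the slides do not say which one the process uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def analyze(statistic, background, target):
    """Correlate every collected attribute with the target of the statistic.

    statistic:  list of {"entity": name, target: number}       (step 1)
    background: {entity name: {attribute: number}}, standing in
                for linking and attribute collection            (steps 2-3)
    """
    rows = [
        (background[r["entity"]], r[target])
        for r in statistic
        if r["entity"] in background  # entities that could not be linked are skipped
    ]
    attributes = sorted(set().union(*(attrs.keys() for attrs, _ in rows)))
    result = {}
    for attr in attributes:  # step 4
        pairs = [(attrs[attr], y) for attrs, y in rows if attr in attrs]
        if len(pairs) >= 2:
            xs, ys = zip(*pairs)
            result[attr] = pearson(xs, ys)
    return result


# Made-up toy data: entity plus target only, as in the "extreme case".
statistic = [
    {"entity": "A", "score": 10},
    {"entity": "B", "score": 20},
    {"entity": "C", "score": 30},
    {"entity": "D", "score": 40},
    {"entity": "E", "score": 50},  # has no background data, so it is skipped
]
background = {
    "A": {"x": 1, "y": 8},
    "B": {"x": 2, "y": 6},
    "C": {"x": 3, "y": 4},
    "D": {"x": 4, "y": 2},
}

correlations = analyze(statistic, background, "score")
for attr, r in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(attr, round(r, 3))
```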
Example: Analyzing Statistics 
• Quality of living in cities worldwide: indicators for low quality 
– too hot (highest temperature in June exceeds 27°C) 
– too cold (highest temperature in January below 16°C) 
– too big (total area exceeds 334km²) 
– poor cultural life (no music recordings made in this city) 
– or simply: wrong place on the map (latitude<24, longitude<47) 
All of these attributes come from LOD! 
Example: Analyzing Statistics 
• Corruption Perception Index (CPI) by Transparency International 
• Indicators for low corruption: 
– high HDI (human development index) 
– large number of companies 
– large number of NGOs 
– small number of cargo airlines?! 
• Burnout rates in German DAX companies 
– Positive correlation between turnover and burnout rates 
– Car manufacturers are less prone to burnout 
– Local companies are less prone to burnout than international ones 
• Exception: Frankfurt 
Example: Analyzing Statistics 
• Sexual activity (based on Durex survey 2005-2009) 
– Higher in French speaking than in English speaking countries 
– High GDP per capita → low activity 
– High unemployment rate → high activity 
– High number of ISPs → low activity 
http://xkcd.com/552/ 
Further Usage Examples 
• Classification of Twitter messages (SMILE, 2013) 
– given a target, e.g., messages related to car traffic 
– annotate message, extract abstract features for concepts 
– e.g. “I-90” → highway 
• Prediction of user location for Twitter (ICWSM, 2013) 
– useful, e.g., for market research 
– combination with sentiment analysis: public opinion maps 
• Identifying disputed topics in the news (LD4KD, 2014) 
– on a corpus of different online newspapers 
– identified, e.g., conflicting opinions on drug legislation and gay marriage 
• Debugging Linked Open Data as such 
– e.g., identifying wrong links and axioms 
– combination with outlier detection 
Conclusions 
• Many data mining tasks are better solved 
with more background knowledge 
– better predictive models 
– more insights from additional attributes 
• A lot of such knowledge exists as Linked Open Data 
• The Linked Open Data extension grants easy access to that data 
– from within RapidMiner 
– without the need to know anything about RDF, SPARQL, etc. 
• Try it out! 
– find “Linked Open Data” on the marketplace 
– Google Group: https://groups.google.com/forum/#!forum/rmlod 
08/20/14 Paulheim, Ristoski, Mitichkin, Bizer 18
Data Mining with Background Knowledge 
from the Web 
Introducing the RapidMiner 
Linked Open Data Extension 
08/20/14 Paulheim, Ristoski, Mitichkin, Bizer 19 
Heiko Paulheim, Petar Ristoski, Evgeny Mitichkin, Christian Bizer

Weitere ähnliche Inhalte

Was ist angesagt?

Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseGiorgio Orsi
 
Immigration Reporting: Data, Tools and Sources
Immigration Reporting: Data, Tools and SourcesImmigration Reporting: Data, Tools and Sources
Immigration Reporting: Data, Tools and Sourcesborderzine
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)Sammy Fung
 
New information for new journalists pt2: data
New information for new journalists pt2: dataNew information for new journalists pt2: data
New information for new journalists pt2: dataPaul Bradshaw
 
Datasets slidesrachel kotarski
Datasets slidesrachel kotarskiDatasets slidesrachel kotarski
Datasets slidesrachel kotarskiRobin Saklatvala
 
Nathan Rosen SLA presentation Case law statutes and regulations for the non l...
Nathan Rosen SLA presentation Case law statutes and regulations for the non l...Nathan Rosen SLA presentation Case law statutes and regulations for the non l...
Nathan Rosen SLA presentation Case law statutes and regulations for the non l...Nathan Rosen
 
NTC16 - Open Data and Open Source Data Science
NTC16 - Open Data and Open Source Data ScienceNTC16 - Open Data and Open Source Data Science
NTC16 - Open Data and Open Source Data ScienceSteph Nagoski
 
Engr185 fall 2011
Engr185 fall 2011Engr185 fall 2011
Engr185 fall 2011echeneyl
 
Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...LIBER Europe
 
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfSparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfHarsh Thakkar
 
Guest Lecture: Linked Open Data for the Humanities and Social Sciences
Guest Lecture: Linked Open Data for the Humanities and Social SciencesGuest Lecture: Linked Open Data for the Humanities and Social Sciences
Guest Lecture: Linked Open Data for the Humanities and Social SciencesLaura Hollink
 

Was ist angesagt? (16)

Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash Course
 
Immigration Reporting: Data, Tools and Sources
Immigration Reporting: Data, Tools and SourcesImmigration Reporting: Data, Tools and Sources
Immigration Reporting: Data, Tools and Sources
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open Data
 
WCIT2010
WCIT2010WCIT2010
WCIT2010
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)Use of Open Data in Hong Kong (LegCo 2014)
Use of Open Data in Hong Kong (LegCo 2014)
 
New information for new journalists pt2: data
New information for new journalists pt2: dataNew information for new journalists pt2: data
New information for new journalists pt2: data
 
Datasets slidesrachel kotarski
Datasets slidesrachel kotarskiDatasets slidesrachel kotarski
Datasets slidesrachel kotarski
 
Nathan Rosen SLA presentation Case law statutes and regulations for the non l...
Nathan Rosen SLA presentation Case law statutes and regulations for the non l...Nathan Rosen SLA presentation Case law statutes and regulations for the non l...
Nathan Rosen SLA presentation Case law statutes and regulations for the non l...
 
NTC16 - Open Data and Open Source Data Science
NTC16 - Open Data and Open Source Data ScienceNTC16 - Open Data and Open Source Data Science
NTC16 - Open Data and Open Source Data Science
 
Martin Kaltenböck - OGD Linked Open Government Data
Martin Kaltenböck - OGD Linked Open Government DataMartin Kaltenböck - OGD Linked Open Government Data
Martin Kaltenböck - OGD Linked Open Government Data
 
Engr185 fall 2011
Engr185 fall 2011Engr185 fall 2011
Engr185 fall 2011
 
Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...
 
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfSparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
 
Guest Lecture: Linked Open Data for the Humanities and Social Sciences
Guest Lecture: Linked Open Data for the Humanities and Social SciencesGuest Lecture: Linked Open Data for the Humanities and Social Sciences
Guest Lecture: Linked Open Data for the Humanities and Social Sciences
 

Andere mochten auch

Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesHeiko Paulheim
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataSören Auer
 
DBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataDBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataSebastian Hellmann
 
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsEvaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsMarieke van Erp
 
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...Stefan Dietze
 
Federated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataFederated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataMuhammad Saleem
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine LearningHeiko Paulheim
 
LDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataLDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataOlaf Hartig
 
Applying Linked Open Data to Public Procurement
Applying Linked Open Data to Public ProcurementApplying Linked Open Data to Public Procurement
Applying Linked Open Data to Public ProcurementJindřich Mynarz
 
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...vtunotesbysree
 
Exploiting the query structure for efficient join ordering in SPARQL queries
Exploiting the query structure for efficient join ordering in SPARQL queriesExploiting the query structure for efficient join ordering in SPARQL queries
Exploiting the query structure for efficient join ordering in SPARQL queriesLuiz Henrique Zambom Santana
 
A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data CloudA Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data CloudSyed Muhammad Ali Hasnain
 
Unsupervised Extraction of Attributes and Their Values from Product Description
Unsupervised Extraction of Attributes and Their Values from Product DescriptionUnsupervised Extraction of Attributes and Their Values from Product Description
Unsupervised Extraction of Attributes and Their Values from Product DescriptionRakuten Group, Inc.
 
FedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and ExecutionFedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and ExecutionSyed Muhammad Ali Hasnain
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Olaf Hartig
 
RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031kwangsub kim
 

Andere mochten auch (20)

NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
DBpedia InsideOut
DBpedia InsideOutDBpedia InsideOut
DBpedia InsideOut
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia Entities
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
DBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataDBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of Data
 
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsEvaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
 
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
 
Linked Data Fragments
Linked Data FragmentsLinked Data Fragments
Linked Data Fragments
 
NLP todo
NLP todoNLP todo
NLP todo
 
Federated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataFederated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of Data
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 
LDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataLDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked Data
 
Applying Linked Open Data to Public Procurement
Applying Linked Open Data to Public ProcurementApplying Linked Open Data to Public Procurement
Applying Linked Open Data to Public Procurement
 
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
 
Exploiting the query structure for efficient join ordering in SPARQL queries
Exploiting the query structure for efficient join ordering in SPARQL queriesExploiting the query structure for efficient join ordering in SPARQL queries
Exploiting the query structure for efficient join ordering in SPARQL queries
 
A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data CloudA Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
 
Unsupervised Extraction of Attributes and Their Values from Product Description
Unsupervised Extraction of Attributes and Their Values from Product DescriptionUnsupervised Extraction of Attributes and Their Values from Product Description
Unsupervised Extraction of Attributes and Their Values from Product Description
 
FedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and ExecutionFedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and Execution
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
 
RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031
 

Ähnlich wie Data Mining with Background Knowledge from the Web - Introducing the RapidMiner Linked Open Data Extension

Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge GraphsHeiko Paulheim
 
Intro to Big Data in Urban GIS Research
Intro to Big Data in Urban GIS ResearchIntro to Big Data in Urban GIS Research
Intro to Big Data in Urban GIS ResearchRobert Goodspeed
 
Big Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaBig Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaPiet J.H. Daas
 
From E-Government to Open Government
From E-Government to Open GovernmentFrom E-Government to Open Government
From E-Government to Open GovernmentJohann Höchtl
 
Data & Society Taxi Privacy Talk
Data & Society Taxi Privacy TalkData & Society Taxi Privacy Talk
Data & Society Taxi Privacy Talkcwhong
 
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
A Data-driven Approach for Internet of Things Applications: Methods and Case ...A Data-driven Approach for Internet of Things Applications: Methods and Case ...
A Data-driven Approach for Internet of Things Applications: Methods and Case ...Suparna De
 
Sneak preview: Local News Engine (Will Perrin)
Sneak preview: Local News Engine (Will Perrin)Sneak preview: Local News Engine (Will Perrin)
Sneak preview: Local News Engine (Will Perrin)DataJournalismUK
 
Big Data & Smart City Applications
Big Data & Smart City ApplicationsBig Data & Smart City Applications
Big Data & Smart City ApplicationsAmit Sheth
 
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSHarsh Thakkar
 
#Cedem2017 Smart Cities of Self-Determined Data Subjects
#Cedem2017  Smart Cities of Self-Determined Data Subjects  #Cedem2017  Smart Cities of Self-Determined Data Subjects
#Cedem2017 Smart Cities of Self-Determined Data Subjects Malgorzata Zofia Goraczek
 
Large-scale data analytics for smart cities
Large-scale data analytics for smart citiesLarge-scale data analytics for smart cities
Large-scale data analytics for smart citiesPayamBarnaghi
 
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...Mindtrek
 
PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016
PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016
PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016Anastasia Zagorodnaya (Amosova)
 
Improving decisionmaking with GIS by Bjorgulf Torjussen
Improving decisionmaking with GIS by Bjorgulf TorjussenImproving decisionmaking with GIS by Bjorgulf Torjussen
Improving decisionmaking with GIS by Bjorgulf TorjussenOpening-up.eu
 
open-data-presentation.pptx
open-data-presentation.pptxopen-data-presentation.pptx
open-data-presentation.pptxDennicaRivera
 
SoBigData. European Research Infrastructure for Big Data and Social Mining
SoBigData. European Research Infrastructure for Big Data and Social MiningSoBigData. European Research Infrastructure for Big Data and Social Mining
SoBigData. European Research Infrastructure for Big Data and Social MiningResearch Data Alliance
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...Piet J.H. Daas
 

Ähnlich wie Data Mining with Background Knowledge from the Web - Introducing the RapidMiner Linked Open Data Extension (20)

Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
 
Intro to Big Data in Urban GIS Research
Intro to Big Data in Urban GIS ResearchIntro to Big Data in Urban GIS Research
Intro to Big Data in Urban GIS Research
 
Big Data presentation for Statistics Canada
Big Data presentation for Statistics CanadaBig Data presentation for Statistics Canada
Big Data presentation for Statistics Canada
 
From E-Government to Open Government
From E-Government to Open GovernmentFrom E-Government to Open Government
From E-Government to Open Government
 
Data & Society Taxi Privacy Talk
Data & Society Taxi Privacy TalkData & Society Taxi Privacy Talk
Data & Society Taxi Privacy Talk
 
CKX: Wellbeing Toronto - More Than Just a Map
CKX: Wellbeing Toronto - More Than Just a MapCKX: Wellbeing Toronto - More Than Just a Map
CKX: Wellbeing Toronto - More Than Just a Map
 
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
A Data-driven Approach for Internet of Things Applications: Methods and Case ...A Data-driven Approach for Internet of Things Applications: Methods and Case ...
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
 
Sneak preview: Local News Engine (Will Perrin)
Sneak preview: Local News Engine (Will Perrin)Sneak preview: Local News Engine (Will Perrin)
Sneak preview: Local News Engine (Will Perrin)
 
Approach to Open Data in Vienna
Approach to Open Data in ViennaApproach to Open Data in Vienna
Approach to Open Data in Vienna
 
Big Data & Smart City Applications
Big Data & Smart City ApplicationsBig Data & Smart City Applications
Big Data & Smart City Applications
 
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
 
#Cedem2017 Smart Cities of Self-Determined Data Subjects
#Cedem2017  Smart Cities of Self-Determined Data Subjects  #Cedem2017  Smart Cities of Self-Determined Data Subjects
#Cedem2017 Smart Cities of Self-Determined Data Subjects
 
#CeDEM2017 Smart Cities of Self-Determined Data Subjects
#CeDEM2017 Smart Cities of Self-Determined Data Subjects#CeDEM2017 Smart Cities of Self-Determined Data Subjects
#CeDEM2017 Smart Cities of Self-Determined Data Subjects
 
Large-scale data analytics for smart cities
Large-scale data analytics for smart citiesLarge-scale data analytics for smart cities
Large-scale data analytics for smart cities
 
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...Paulo Canas Rodrigues - The role of Statistics  in the  Internet of Things - ...
Paulo Canas Rodrigues - The role of Statistics in the Internet of Things - ...
 
PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016
PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016
PLB Conference_Doing Business in Russia_Privacy Law Risk Update_July 5 2016
 
Improving decisionmaking with GIS by Bjorgulf Torjussen
Improving decisionmaking with GIS by Bjorgulf TorjussenImproving decisionmaking with GIS by Bjorgulf Torjussen
Improving decisionmaking with GIS by Bjorgulf Torjussen
 
open-data-presentation.pptx
open-data-presentation.pptxopen-data-presentation.pptx
open-data-presentation.pptx
 
SoBigData. European Research Infrastructure for Big Data and Social Mining
SoBigData. European Research Infrastructure for Big Data and Social MiningSoBigData. European Research Infrastructure for Big Data and Social Mining
SoBigData. European Research Infrastructure for Big Data and Social Mining
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...
 

Data Mining with Background Knowledge from the Web - Introducing the RapidMiner Linked Open Data Extension

  • 1. Data Mining with Background Knowledge from the Web: Introducing the RapidMiner Linked Open Data Extension
      Heiko Paulheim, Petar Ristoski, Evgeny Mitichkin, Christian Bizer (08/20/14)
  • 2. Motivation: An Example Data Mining Task
      • Analyzing book sales

        ISBN           City       Sold
        3-2347-3427-1  Darmstadt  124
        3-43784-324-2  Mannheim   493
        3-145-34587-0  Roßdorf    14

        ISBN           City       Population  ...  Genre   Publisher     ...  Sold
        3-2347-3427-1  Darmstadt  144402      ...  Crime   Bloody Books  ...  124
        3-43784-324-2  Mannheim   291458      ...  Crime   Guns Ltd.     ...  493
        3-145-34587-0  Roßdorf    12019       ...  Travel  Up&Away       ...  14

      → Crime novels sell better in larger cities
  • 3. Motivation
      • Many data mining problems are solved better when you have more background knowledge (leaving scalability aside)
      • Problems:
          – Tedious work
          – Selection bias: what to include?
  • 4. Linked Open Data in a Nutshell
      • Started in 2007
      • A collection of ~1,000 open datasets
          – from various domains, e.g., general knowledge, government data, …
          – using semantic web standards (HTTP, RDF, SPARQL, …)
      • Machine processable
      • Free of charge
      • Sophisticated tool stacks
  • 5. Linked Open Data in a Nutshell: http://lod-cloud.net/
  • 6. Example: DBpedia
  • 7. The RapidMiner LOD Extension
  • 8. The RapidMiner LOD Extension
      • Automatic discovery of links to Linked Open Data
          – for local data objects
          – e.g., the database entry Boston is linked to http://dbpedia.org/resource/Boston
      • Automatic generation of attributes
          – e.g., add all numeric values found for Boston (and other cities)
      • Plus
          – Feature selection algorithms optimized for LOD
          – Automatic following of links to other datasets
          – Schema matching (coming soon)
      • No need to know Semantic Web technologies!
  • 10. Example: the Auto MPG Dataset
      • A well-known UCI dataset
          – Goal: predict fuel consumption of cars
      • Hypothesis: background knowledge → more accurate predictions
      • Used background knowledge:
          – Entity types and categories from DBpedia (=Wikipedia)
      • Result: M5Rules down to almost half the prediction error
          – i.e., on average, we are wrong by 1.6 instead of 2.9 MPG (about 45% lower)
  • 11. Example: the Auto MPG Dataset
      • The original attributes are
          – cylinders, displacement, horsepower, weight, acceleration, model, origin
          – plus name (unique string) and mpg (target)
      • Models built are, e.g.,
          – high horsepower/weight → high consumption
      • Additional attributes lead to further insights, e.g.,
          – front-wheel drives have a lower consumption than rear-wheel drives
          – hatchbacks have a lower consumption than station wagons
          – rally cars generally have a low consumption
  • 12. Example: Analyzing Statistics
      • As shown, e.g., at ESWC 2012, SemStats 2013
      • Statistics found on the web often contain only a few attributes
          – extreme case: only entity + target
      • Examples:
          – Quality of living in cities
          – Corruption by country
          – Fertility rate by country
          – Suicide rate by country
          – Box office revenue of films
          – ...
  • 13. Example: Analyzing Statistics
      • Process in RapidMiner:
          – load statistic
          – link entities (cities, countries, etc.) to LOD cloud
          – collect additional attributes
          – analyze for correlations with target attribute of statistic
  • 14. Example: Analyzing Statistics
      • Quality of living in cities worldwide: indicators for low quality
          – too hot (highest temperature in June exceeds 27°C)
          – too cold (highest temperature in January below 16°C)
          – too big (total area exceeds 334 km²)
          – poor cultural life (no music recordings made in this city)
          – or simply: wrong place on the map (latitude<24, longitude<47)
      • All those attributes come from LOD!
  • 15. Example: Analyzing Statistics
      • Corruption Perception Index (CPI) by Transparency International
      • Indicators for low corruption:
          – high HDI (human development index)
          – large number of companies
          – large number of NGOs
          – small number of cargo airlines?!
      • Burnout rates in German DAX companies
          – Positive correlation between turnover and burnout rates
          – Car manufacturers are less prone to burnout
          – Local companies are less prone to burnout than international ones (exception: Frankfurt)
  • 16. Example: Analyzing Statistics
      • Sexual activity (based on Durex survey 2005-2009)
          – Higher in French-speaking than in English-speaking countries
          – High GDP per capita → low activity
          – High unemployment rate → high activity
          – High number of ISPs → low activity
      • http://xkcd.com/552/
  • 17. Further Usage Examples
      • Classification of Twitter messages (SMILE, 2013)
          – given a target, e.g., messages related to car traffic
          – annotate message, extract abstract features for concepts
          – e.g., “I-90” → highway
      • Prediction of user location for Twitter (ICWSM, 2013)
          – useful, e.g., for market research
          – combination with sentiment analysis: public opinion maps
      • Identifying disputed topics in the news (LD4KD, 2014)
          – on a corpus of different online newspapers
          – identified, e.g., concurrent opinions on drug legislation and gay marriage
      • Debugging Linked Open Data as such
          – e.g., identifying wrong links and axioms
          – combination with outlier detection
  • 18. Conclusions
      • Many data mining tasks are better solved with more background knowledge
          – better predictive models
          – more insights from additional attributes
      • A lot of such knowledge exists as Linked Open Data
      • The Linked Open Data extension grants easy access to that data
          – from within RapidMiner
          – without the need to know anything about RDF, SPARQL, etc.
      • Try it out!
          – find “Linked Open Data” on the marketplace
          – Google Group: https://groups.google.com/forum/#!forum/rmlod
  • 19. Data Mining with Background Knowledge from the Web: Introducing the RapidMiner Linked Open Data Extension
      Heiko Paulheim, Petar Ristoski, Evgeny Mitichkin, Christian Bizer
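
The process listed on slide 13 (load statistic, link entities, collect additional attributes, analyze for correlations) can be sketched in plain Python, outside RapidMiner. This is only an illustrative sketch under stated assumptions, not the extension's implementation: `link_to_dbpedia` guesses a URI from a simple pattern, `BACKGROUND` is an offline stand-in for a real LOD lookup (filled with the population figures from the book sales table on slide 2), and all function names are hypothetical.

```python
from math import sqrt
from urllib.parse import quote

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def link_to_dbpedia(label: str) -> str:
    """Guess a DBpedia URI from a plain label (pattern-based assumption).

    A real linker would verify that the resource exists and disambiguate.
    """
    return DBPEDIA_PREFIX + quote(label.strip().replace(" ", "_"))


# Step 1: the local statistic (city -> copies sold, from slide 2).
SALES = {"Darmstadt": 124, "Mannheim": 493, "Roßdorf": 14}

# Offline stand-in for Linked Open Data (assumption: no network access).
# Population values are the ones shown in the enriched table on slide 2.
BACKGROUND = {
    link_to_dbpedia("Darmstadt"): {"population": 144402},
    link_to_dbpedia("Mannheim"): {"population": 291458},
    link_to_dbpedia("Roßdorf"): {"population": 12019},
}


def collect_attribute(labels, attribute):
    """Steps 2 and 3: link each label, then fetch one additional attribute."""
    return [BACKGROUND[link_to_dbpedia(label)][attribute] for label in labels]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def correlation_with_target(statistic, attribute):
    """Step 4: correlate the collected attribute with the target column."""
    labels = list(statistic)
    values = collect_attribute(labels, attribute)
    targets = [statistic[label] for label in labels]
    return pearson(values, targets)


if __name__ == "__main__":
    print(correlation_with_target(SALES, "population"))
```

With only three rows, the coefficient illustrates the mechanics rather than supporting any conclusion; as the xkcd link on slide 16 suggests, such correlations call for careful interpretation.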