SlideShare ist ein Scribd-Unternehmen logo
1 von 52
Downloaden Sie, um offline zu lesen
An Ambitious Wikidata Tutorial
Emw
WikiConference USA
Washington, D.C.
2015-10-10 (Updated 2015-10-13)
Wikidata is a free knowledge base that can be
read and edited by
humans and machines.
Wikidata's goals
● Centralize interwiki links
● Centralize infoboxes
● Provide an interface for rich queries
● Structure the sum of all human knowledge
What you'll learn from this talk
● How to edit Wikidata
● Projects built with Wikidata
● Wikidata API programming
● Ontologies
● Wikidata vocabulary
● Where to find things
● Querying with SPARQL, etc.
● RDF and OWL exports
Elements of a Wikidata statement
Example: Washington, D.C. (Q61)
Items and properties
●
Each item and property has its own page
● Items
– Represent subjects: Barbara McClintock, Challenger disaster
– Have identifiers like Q199654, Q921090
– 14,875,838 items as of 2015-10-05
● Properties
– Represent attribute names: occupation, cause of
– Have identifiers like P106, P828
– 1,805 properties as of 2015-10-05
Statements and claims
● Claims
– Claims are “triplets”
● Formally: subject, predicate, object
● In Wikidata: item, property, value
● Example: Barbara McClintock, occupation, scientist
● Statements
– A claim is only part of a statement
– Statements also include:
● References
● Ranks
Qualifiers, ranks, references
● Qualifiers
– Qualifiers are properties used on claims rather than items
– “Bethesda population 56,527 point in time (P585) 1960”
● Ranks
– Preferred, normal, deprecated
– Useful to mark outdated claims
● References
– Source of claim; provenance
– “... stated in (P248) 1960 United States Census”
More on Wikidata vocabulary
https://www.wikidata.org/wiki/Wikidata:Glossary
Wikipedia articles have a Wikidata item link in the left
navigation panel.
Wikidata link on Wikipedia
Getting to Wikidata from Wikipedia
Instant search suggests items that have labels or
aliases matching your keyword.
Wikidata search
Search by label
Search by alias: “flu” -> influenza
Finding properties
● Is there a property for “number of windows”?
● What was the ID of that property, again?
● Search
– In main site search box, prefix search term with “P:”
– “P:number of”, “P:occupation”
– Instant search doesn't work for properties, only items
● Browse
– https://www.wikidata.org/wiki/Wikidata:List_of_properties
^ bookmark this!
Let's edit Wikidata.
Barbara McClintock
https://www.wikidata.org/wiki/Q199654
●
employer (P108):
Cold Spring Harbor Laboratory
start time (P580): December 1941
● member of (P463):
National Academy of Sciences
start time (P580): 1944
American Academy of Arts and Sciences
start time: 1959
Royal Society
start time: 1989
● award received (P166):
Nobel Prize in Physiology or Medicine
for work (P1686): mobile genetic elements
National Medal of Science
point in time (P585): 1971
● birth name (P1477): Eleanor McClintock
Area? Height? GDP per capita?
● Quantities with units recently made possible!
● area (P2046)
● height (P2048)
● mass (P2067)
● cost (P2130)
● GDP per capita (P2132)
● total debt (P2133)
Quantities: Lots of low-hanging fruit
● Not yet on Wikidata:
– Area of Washington, D.C.
– Height of Abraham Lincoln
– Height of United States Capitol
– Length of Mississippi River
– GDP per capita of the United States of America
^ Add these!
Built on Wikidata
● Histropedia
300,000 timelines and 1.5 million events
http://histropedia.com/timeline/1fr22b0p8s/Empires
● Reasonator
Wikidata knowledge tailored for readers
J.S. Bach: http://tools.wmflabs.org/reasonator/?q=Q1339
● Gene Wiki
Every human gene is now on Wikidata
http://blog.wikimedia.de/2014/10/22/establishing-wikidata-as-the-central-hub-for-linked-open-life-
science-data/
(More info today in Open Biomedical Knowledge session at 2:15 PM)
Histropedia
http://histropedia.com/timeline/1fr22b0p8s/Empires
Tools
– Wikidata API
● https://www.wikidata.org/w/api.php
– Querying:
● Autolist:
http://tools.wmflabs.org/autolist/autolist1.html
● Wikidata Query Service (new!):
https://query.wikidata.org
– Software framework: Wikidata Toolkit
● https://www.mediawiki.org/wiki/Wikidata_Toolkit
● https://github.com/Wikidata/Wikidata-Toolkit
Wikidata API
Quick Python demo
Querying in Wikidata
List of politicians who died of cancer
Pseudo-query:
occupation: politician AND cause of death: cancer
occupation: P106
politician: Q82955
cause of death: P509
cancer: Q12078
Wikidata query in Autolist:
claim[106:82955] AND claim[509:12078]
http://tools.wmflabs.org/autolist/autolist1.html?q=claim[106:82955]%20AND%20claim[509:12078]
Only 56 politicians have died of cancer?
Seems rather low...
Classification on Wikidata
● Taxonomy of knowledge
● Enables powerful inference, novel applications
● Interesting philosophical, design, and engineering issues
Tree of Porphyry
User:VoiceOfTheCommons, CC-BY-SA 3.0
Classes and instances
● Plato is a human is a animal
● Plato instance of human subclass of animal
● Instance: concrete object, individual
● Class: abstract object
Classification on Wikidata
● instance of (P31)
– rdf:type in RDF and OWL
– Most popular Wikidata property
● subclass of (P279)
– “all instances of A are also instances of B”
– rdfs:subClassOf in RDF and OWL
Examples
● USS Nimitz instance of Nimitz-class aircraft carrier
Nimitz-class aircraft carrier subclass of aircraft carrier
● 2012 Cannes Film Festival instance of Cannes Film Festival
Cannes Film Festival subclass of film festival
● an individual charm quark instance of charm quark
charm quark subclass of quark
^ Many “leaf nodes” in Wikidata's taxonomic hierarchy are not instances.
(There are no items about individual quarks on Wikidata!)
https://www.wikidata.org/wiki/Help:Basic_membership_properties
Bad smells
Item has many instance of or subclass of claims
Items typically satisfy a huge number of instance of claims:
● Fido instance of dog
● Fido instance of English Pointer
● Fido instance of faithful animal
● …
Solution: use one class for instance of, put other class knowledge into
normal properties
● Fido instance of dog
● Fido breed: English Pointer
● Fido known for: faithfulness
● ...
Bad smells
subclass of claim that is nonsensical when interpreted as “All instances of A
are also instances of B”
Example:
dog subclass of pet
But not all dogs are pets!
feral dog subclass of dog true
feral dog subclass of pet false
:. dog subclass of pet false
Solution: put “pet” knowledge about dogs into claim that does not apply to all
instances of dog. E.g. “dog has role pet”. (Has role would not be transitive.)
Classification on Wikidata
● Last but not least: part of (P361)
– Third basic membership property
– Top-level “part-whole” relation
● subclass of and part of are both transitive; instance of is not transitive
● Transitive relation:
A subclass of B
B subclass of C
:. A subclass of C
https://www.wikidata.org/wiki/Help:Basic_membership_properties
subclass of (P279)
enables machines to infer
conceptual hierarchy
Recall
Query reports that only 56 politicians have died of cancer
Problem
Only matches the precise claim “cause of death: cancer”
Omits results that have:
– cause of death: lung cancer
– cause of death: lymphoma
– cause of death: leukemia
Solution
Include causes of death that are a subclass of cancer
– In Autolist: http://tinyurl.com/ovgjqd8
– Also possible in SPARQL in new Wikidata Query Service
SPARQL
● SPARQL: semantic query language for databases
● Wikidata recently added official support
– Query UI: https://query.wikidata.org
– Examples:
https://www.mediawiki.org/wiki/Wikibase/Indexing
/SPARQL_Query_Examples
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?politician ?cause ?politician_label ?cause_of_death_label WHERE {
?politician wdt:P106 wd:Q82955 . # find items that have "occupation (P106): politician (Q82955)"
?politician wdt:P509 ?cause . # with a P509 (cause of death) claim
?cause wdt:P279* wd:Q12078 . # ... where the cause is a subclass of (P279*) cancer (Q12078)
# ?politician wdt:P39 wd:Q11696 . # Uncomment this line to include only U.S. Presidents
OPTIONAL {?politician rdfs:label ?politician_label filter (lang(?politician_label) = "en") .}
OPTIONAL {?cause rdfs:label ?cause_of_death_label filter (lang(?cause_of_death_label) = "en").}
}
ORDER BY ASC (?politician)
Example SPARQL query
List of politicians who died of cancer:
Live demo: http://tinyurl.com/nh7jc2p
http://tinyurl.com/nh7jc2p
?cause wdt:P279* wd:Q12078 .
Be sure to add an asterisk (*) to get the subclass tree!
Autolist vs. Wikidata Query Service
Autolist Wikidata Query Service
URL https://tools.wmflabs.org/autolist/ https://query.wikidata.org
Syntax WDQ (custom, but succinct) SPARQL (W3C standard)
Support status Unofficial Official, beta
Release date 2013-09-17 2015-09-07
Developer Magnus Manske Stas Malyshev
Source code https://bitbucket.org/magnusmanske
/wikidataquery
https://github.com/wikimedia/wikidata-
query-rdf/
License GPL 2+ Apache 2.0
Technology C++ Java, Blazegraph
How to:
Explore RDF/OWL dumps locally
● Get the most recent dumps:
http://tools.wmflabs.org/wikidata-exports/rdf/
● Small, interesting: wikidata-taxonomy.nt.gz
● Download and install Protege:
http://protege.stanford.edu/
Protege
● http://protege.stanford.edu/
● The browser of the Semantic Web
● Good for small- to medium-sized ontologies
Querying cancer types in wikidata-taxonomy.nt.gz
Open questions: Modeling causes
What caused or causes:
– The Space Shuttle Challenger explosion?
– The dinosaurs to die?
– Malaria? Cancer?
– The American Civil War?
Causation on Wikidata
● https://www.wikidata.org/wiki/Help:Modeling_causes
● has cause (P828) (alias underlying cause): thing that ultimately resulted in
the effect
● has immediate cause (P1478): thing that proximately resulted in the effect
● has contributing factor (P1479): thing that significantly influenced the
effect, but did not directly cause it
American Civil War (Q8676)
has cause:
slavery in the United States (Q118382) (preferred rank)
states' rights (Q48527) (deprecated rank)
...
has immediate cause:
Battle of Fort Sumter (Q543165)
United States presidential election, 1860 (Q698842)
...
has contributing factor:
caning of Charles Sumner (Q5032419)
Dred Scott v. Sandford (Q690462)
Bleeding Kansas (Q331377)
Uncle Tom's Cabin (Q2222)
...
What caused the
American Civil War?
Thank you!
https://www.wikidata.org/wiki/User:Emw

Weitere ähnliche Inhalte

Was ist angesagt?

Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Databricks
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsPeter Haase
 
Knowledge Graphs and Generative AI
Knowledge Graphs and Generative AIKnowledge Graphs and Generative AI
Knowledge Graphs and Generative AINeo4j
 
Building and using ontologies
Building and using ontologies Building and using ontologies
Building and using ontologies Elena Simperl
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleDatabricks
 
Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020Enterprise Knowledge
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for ElasticsearchFlorian Hopf
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Mayank Shrivastava
 
Semantic Web for Cultural Heritage valorisation
Semantic Web for Cultural Heritage valorisationSemantic Web for Cultural Heritage valorisation
Semantic Web for Cultural Heritage valorisationValentina Carriero
 
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityBuilding an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityJoshua Shinavier
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with WeaviateNETWAYS
 
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerCloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerDatabricks
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 

Was ist angesagt? (20)

Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge Graphs
 
Knowledge Graphs and Generative AI
Knowledge Graphs and Generative AIKnowledge Graphs and Generative AI
Knowledge Graphs and Generative AI
 
Building and using ontologies
Building and using ontologies Building and using ontologies
Building and using ontologies
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 
Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for Elasticsearch
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020
 
Semantic Web for Cultural Heritage valorisation
Semantic Web for Cultural Heritage valorisationSemantic Web for Cultural Heritage valorisation
Semantic Web for Cultural Heritage valorisation
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityBuilding an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviate
 
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerCloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 

Andere mochten auch

Wikidata for libraries and archives
Wikidata for libraries and archivesWikidata for libraries and archives
Wikidata for libraries and archives_Emw
 
Up and running with Wikidata
Up and running with WikidataUp and running with Wikidata
Up and running with Wikidata_Emw
 
Wikidata presentation at SemTechBiz Berlin 2012
Wikidata presentation at SemTechBiz Berlin 2012Wikidata presentation at SemTechBiz Berlin 2012
Wikidata presentation at SemTechBiz Berlin 2012Denny Vrandecic
 
2014-02-27 Wikidata talk Cambridge
2014-02-27 Wikidata talk Cambridge2014-02-27 Wikidata talk Cambridge
2014-02-27 Wikidata talk CambridgeMagnus Manske
 
Linked Open Data Publications through Wikidata & Persistent Identification...
Linked Open Data  Publications through  Wikidata &  Persistent Identification...Linked Open Data  Publications through  Wikidata &  Persistent Identification...
Linked Open Data Publications through Wikidata & Persistent Identification...PACKED vzw
 
위키데이터 개론
위키데이터 개론위키데이터 개론
위키데이터 개론Jeongmin KIM
 
The Knowledge Discovery Quest
The Knowledge Discovery Quest The Knowledge Discovery Quest
The Knowledge Discovery Quest Ontotext
 
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...Simon Knight
 
Wikipedia
WikipediaWikipedia
Wikipediasimondw
 
Open Monuments Project at GLAM Wiki Conference 2015
Open Monuments Project at GLAM Wiki Conference 2015Open Monuments Project at GLAM Wiki Conference 2015
Open Monuments Project at GLAM Wiki Conference 2015Aleksandra Janus
 
A Primer on Wikipedia Editathons (MCN 2014)
A Primer on Wikipedia Editathons (MCN 2014)A Primer on Wikipedia Editathons (MCN 2014)
A Primer on Wikipedia Editathons (MCN 2014)Dominic McDevitt-Parks
 
wikipedia event_HerstoryEditathon 3-2016_v1.3
wikipedia event_HerstoryEditathon 3-2016_v1.3wikipedia event_HerstoryEditathon 3-2016_v1.3
wikipedia event_HerstoryEditathon 3-2016_v1.3raypun101
 
Wikimedia for civil servants
Wikimedia for civil servantsWikimedia for civil servants
Wikimedia for civil servantsSara Thomas
 
Philippe Beaudette - How To Motivate Legions of Volunteers
Philippe Beaudette - How To Motivate Legions of VolunteersPhilippe Beaudette - How To Motivate Legions of Volunteers
Philippe Beaudette - How To Motivate Legions of VolunteersFeverBee Limited
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Duncan Hull
 
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins Presentation
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins PresentationQuick Start Guide to Editing Wikipedia - #WikiHumanOrigins Presentation
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins PresentationSara Snyder
 
Présentation des projets Wikimedia
Présentation des projets WikimediaPrésentation des projets Wikimedia
Présentation des projets WikimediaWikimédia France
 
20160601 Wiki Loves Earth 2016 in Bulgaria
20160601 Wiki Loves Earth 2016 in Bulgaria20160601 Wiki Loves Earth 2016 in Bulgaria
20160601 Wiki Loves Earth 2016 in BulgariaVassia Atanassova
 

Andere mochten auch (20)

Wikidata for libraries and archives
Wikidata for libraries and archivesWikidata for libraries and archives
Wikidata for libraries and archives
 
Up and running with Wikidata
Up and running with WikidataUp and running with Wikidata
Up and running with Wikidata
 
Wikidata presentation at SemTechBiz Berlin 2012
Wikidata presentation at SemTechBiz Berlin 2012Wikidata presentation at SemTechBiz Berlin 2012
Wikidata presentation at SemTechBiz Berlin 2012
 
2014-02-27 Wikidata talk Cambridge
2014-02-27 Wikidata talk Cambridge2014-02-27 Wikidata talk Cambridge
2014-02-27 Wikidata talk Cambridge
 
Linked Open Data Publications through Wikidata & Persistent Identification...
Linked Open Data  Publications through  Wikidata &  Persistent Identification...Linked Open Data  Publications through  Wikidata &  Persistent Identification...
Linked Open Data Publications through Wikidata & Persistent Identification...
 
위키데이터 개론
위키데이터 개론위키데이터 개론
위키데이터 개론
 
The Knowledge Discovery Quest
The Knowledge Discovery Quest The Knowledge Discovery Quest
The Knowledge Discovery Quest
 
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...
 
Wikipedia
WikipediaWikipedia
Wikipedia
 
Music scoring on Wikipedia
Music scoring on WikipediaMusic scoring on Wikipedia
Music scoring on Wikipedia
 
Wikidata
WikidataWikidata
Wikidata
 
Open Monuments Project at GLAM Wiki Conference 2015
Open Monuments Project at GLAM Wiki Conference 2015Open Monuments Project at GLAM Wiki Conference 2015
Open Monuments Project at GLAM Wiki Conference 2015
 
A Primer on Wikipedia Editathons (MCN 2014)
A Primer on Wikipedia Editathons (MCN 2014)A Primer on Wikipedia Editathons (MCN 2014)
A Primer on Wikipedia Editathons (MCN 2014)
 
wikipedia event_HerstoryEditathon 3-2016_v1.3
wikipedia event_HerstoryEditathon 3-2016_v1.3wikipedia event_HerstoryEditathon 3-2016_v1.3
wikipedia event_HerstoryEditathon 3-2016_v1.3
 
Wikimedia for civil servants
Wikimedia for civil servantsWikimedia for civil servants
Wikimedia for civil servants
 
Philippe Beaudette - How To Motivate Legions of Volunteers
Philippe Beaudette - How To Motivate Legions of VolunteersPhilippe Beaudette - How To Motivate Legions of Volunteers
Philippe Beaudette - How To Motivate Legions of Volunteers
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia
 
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins Presentation
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins PresentationQuick Start Guide to Editing Wikipedia - #WikiHumanOrigins Presentation
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins Presentation
 
Présentation des projets Wikimedia
Présentation des projets WikimediaPrésentation des projets Wikimedia
Présentation des projets Wikimedia
 
20160601 Wiki Loves Earth 2016 in Bulgaria
20160601 Wiki Loves Earth 2016 in Bulgaria20160601 Wiki Loves Earth 2016 in Bulgaria
20160601 Wiki Loves Earth 2016 in Bulgaria
 

Ähnlich wie An Ambitious Wikidata Tutorial

A Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsA Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsDr. Neil Brittliff
 
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...Onando Mulang'
 
Wikipedia and Civic Engagement
Wikipedia and Civic EngagementWikipedia and Civic Engagement
Wikipedia and Civic EngagementAndrew Lih
 
Entity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph CompletionEntity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph CompletionJennifer D'Souza
 
Mon domingue key_introduction to semantic
Mon domingue key_introduction to semanticMon domingue key_introduction to semantic
Mon domingue key_introduction to semanticeswcsummerschool
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Searchkrisztianbalog
 
Repo fringe - Wikidata
Repo fringe  - WikidataRepo fringe  - Wikidata
Repo fringe - WikidataEwan McAndrew
 
R, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsR, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsKrishna Sankar
 
Web Driven Revolution For Library Data
Web Driven Revolution For Library DataWeb Driven Revolution For Library Data
Web Driven Revolution For Library DataRichard Wallis
 
VALA Tech Camp 2017: Intro to Wikidata & SPARQL
VALA Tech Camp 2017: Intro to Wikidata & SPARQLVALA Tech Camp 2017: Intro to Wikidata & SPARQL
VALA Tech Camp 2017: Intro to Wikidata & SPARQLJane Frazier
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligencekrisztianbalog
 

Ähnlich wie An Ambitious Wikidata Tutorial (11)

A Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsA Little SPARQL in your Analytics
A Little SPARQL in your Analytics
 
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...
 
Wikipedia and Civic Engagement
Wikipedia and Civic EngagementWikipedia and Civic Engagement
Wikipedia and Civic Engagement
 
Entity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph CompletionEntity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph Completion
 
Mon domingue key_introduction to semantic
Mon domingue key_introduction to semanticMon domingue key_introduction to semantic
Mon domingue key_introduction to semantic
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Search
 
Repo fringe - Wikidata
Repo fringe  - WikidataRepo fringe  - Wikidata
Repo fringe - Wikidata
 
R, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsR, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science Competitions
 
Web Driven Revolution For Library Data
Web Driven Revolution For Library DataWeb Driven Revolution For Library Data
Web Driven Revolution For Library Data
 
VALA Tech Camp 2017: Intro to Wikidata & SPARQL
VALA Tech Camp 2017: Intro to Wikidata & SPARQLVALA Tech Camp 2017: Intro to Wikidata & SPARQL
VALA Tech Camp 2017: Intro to Wikidata & SPARQL
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligence
 

Kürzlich hochgeladen

Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Introductio to Data Science and types of data
Introductio to Data Science and types of dataIntroductio to Data Science and types of data
Introductio to Data Science and types of dataManishaPatil932723
 
Báo cáo Connected Consumer Quý 4 năm 2023
Báo cáo Connected Consumer Quý 4 năm 2023Báo cáo Connected Consumer Quý 4 năm 2023
Báo cáo Connected Consumer Quý 4 năm 2023MarketingTrips
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...ThinkInnovation
 
testingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdftestingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdfDSP Mutual Fund
 
Unlocking Anticipatory Text Generation- A Constrained Approach for Large Lan...
Unlocking Anticipatory Text Generation-  A Constrained Approach for Large Lan...Unlocking Anticipatory Text Generation-  A Constrained Approach for Large Lan...
Unlocking Anticipatory Text Generation- A Constrained Approach for Large Lan...Ingeol Baek
 
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYMANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYFreelance
 
Film cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrkFilm cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrk494f574xmv
 
prediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approachprediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approachAdekunleJoseph4
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
concept of soil quality & soil health.pptx
concept of soil quality & soil health.pptxconcept of soil quality & soil health.pptx
concept of soil quality & soil health.pptxpranavmishrafzd
 
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsbaAdobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsbas73678sri
 
INTRODUCTION TO BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
INTRODUCTION TO BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYINTRODUCTION TO BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
INTRODUCTION TO BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYFreelance
 
Adobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvwAdobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvws73678sri
 
Data Discovery With Power Query in excel
Data Discovery With Power Query in excelData Discovery With Power Query in excel
Data Discovery With Power Query in excelKapilSidhpuria3
 

Kürzlich hochgeladen (20)

Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Introductio to Data Science and types of data
Introductio to Data Science and types of dataIntroductio to Data Science and types of data
Introductio to Data Science and types of data
 
Báo cáo Connected Consumer Quý 4 năm 2023
Báo cáo Connected Consumer Quý 4 năm 2023Báo cáo Connected Consumer Quý 4 năm 2023
Báo cáo Connected Consumer Quý 4 năm 2023
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
 
testingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdftestingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdf
 
Unlocking Anticipatory Text Generation- A Constrained Approach for Large Lan...
Unlocking Anticipatory Text Generation-  A Constrained Approach for Large Lan...Unlocking Anticipatory Text Generation-  A Constrained Approach for Large Lan...
Unlocking Anticipatory Text Generation- A Constrained Approach for Large Lan...
 
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYMANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
 
Film cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrkFilm cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrk
 
prediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approachprediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approach
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
concept of soil quality & soil health.pptx
concept of soil quality & soil health.pptxconcept of soil quality & soil health.pptx
concept of soil quality & soil health.pptx
 
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsbaAdobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
 
INTRODUCTION TO BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
INTRODUCTION TO BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYINTRODUCTION TO BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
INTRODUCTION TO BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Adobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvwAdobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvw
 
Data Discovery With Power Query in excel
Data Discovery With Power Query in excelData Discovery With Power Query in excel
Data Discovery With Power Query in excel
 

An Ambitious Wikidata Tutorial

  • 1. An Ambitious Wikidata Tutorial Emw WikiConference USA Washington, D.C. 2015-10-10 (Updated 2015-10-13)
  • 2. Wikidata is a free knowledge base that can be read and edited by humans and machines.
  • 3. Wikidata's goals ● Centralize interwiki links ● Centralize infoboxes ● Provide an interface for rich queries ● Structure the sum of all human knowledge
  • 4. What you'll learn from this talk ● How to edit Wikidata ● Projects built with Wikidata ● Wikidata API programming ● Ontologies ● Wikidata vocabulary ● Where to find things ● Querying with SPARQL, etc. ● RDF and OWL exports
  • 5. Elements of a Wikidata statement
  • 7. Items and properties ● Each item and property has its own page ● Items – Represent subjects: Barbara McClintock, Challenger disaster – Have identifiers like Q199654, Q921090 – 14,875,838 items as of 2015-10-05 ● Properties – Represent attribute names: occupation, cause of – Have identifiers like P106, P828 – 1,805 properties as of 2015-10-05
  • 8. Statements and claims ● Claims – Claims are “triplets” ● Formally: subject, predicate, object ● In Wikidata: item, property, value ● Example: Barbara McClintock, occupation, scientist ● Statements – A claim is only part of a statement – Statements also include: ● References ● Ranks
  • 9. Qualifiers, ranks, references ● Qualifiers – Qualifiers are properties used on claims rather than items – “Bethesda population 56,527 point in time (P585) 1960” ● Ranks – Preferred, normal, deprecated – Useful to mark outdated claims ● References – Source of claim; provenance – “... stated in (P248) 1960 United States Census”
  • 10. More on Wikidata vocabulary https://www.wikidata.org/wiki/Wikidata:Glossary
  • 11. Wikipedia articles have a Wikidata item link in the left navigation panel. Wikidata link on Wikipedia
  • 12. Getting to Wikidata from Wikipedia
  • 13. Instant search suggests items that have labels or aliases matching your keyword. Wikidata search
  • 15. Search by alias: “flu” -> influenza
  • 16. Finding properties ● Is there a property for “number of windows”? ● What was the ID of that property, again? ● Search – In main site search box, prefix search term with “P:” – “P:number of”, “P:occupation” – Instant search doesn't work for properties, only items ● Browse – https://www.wikidata.org/wiki/Wikidata:List_of_properties ^ bookmark this!
  • 18. Barbara McClintock https://www.wikidata.org/wiki/Q199654 ● employer (P108): Cold Spring Harbor Laboratory start time (P580): December 1941 ● member of (P463): National Academy of Sciences start time (P580): 1944 American Academy of Arts and Sciences start time: 1959 Royal Society start time: 1989 ● award received (P166): Nobel Prize in Physiology or Medicine for work (P1686): mobile genetic elements National Medal of Science point in time (P585): 1971 ● birth name (P1477): Eleanor McClintock
  • 19. Area? Height? GDP per capita? ● Quantities with units recently made possible! ● area (P2046) ● height (P2048) ● mass (P2067) ● cost (P2130) ● GDP per capita (P2132) ● total debt (P2133)
  • 20. Quantities: Lots of low-hanging fruit ● Not yet on Wikidata: – Area of Washington, D.C. – Height of Abraham Lincoln – Height of United States Capitol – Length of Mississippi River – GDP per capita of the United States of America ^ Add these!
  • 21. Built on Wikidata ● Histropedia 300,000 timelines and 1.5 million events http://histropedia.com/timeline/1fr22b0p8s/Empires ● Reasonator Wikidata knowledge tailored for readers J.S. Bach: http://tools.wmflabs.org/reasonator/?q=Q1339 ● Gene Wiki Every human gene is now on Wikidata http://blog.wikimedia.de/2014/10/22/establishing-wikidata-as-the-central-hub-for-linked-open-life- science-data/ (More info today in Open Biomedical Knowledge session at 2:15 PM)
  • 23. Tools – Wikidata API ● https://www.wikidata.org/w/api.php – Querying: ● Autolist: http://tools.wmflabs.org/autolist/autolist1.html ● Wikidata Query Service (new!): https://query.wikidata.org – Software framework: Wikidata Toolkit ● https://www.mediawiki.org/wiki/Wikidata_Toolkit ● https://github.com/Wikidata/Wikidata-Toolkit
  • 25. Querying in Wikidata List of politicians who died of cancer Pseudo-query: occupation: politician AND cause of death: cancer occupation: P106 politician: Q82955 cause of death: P509 cancer: Q12078 Wikidata query in Autolist: claim[106:82955] AND claim[509:12078]
  • 27. Only 56 politicians have died of cancer? Seems rather low...
  • 28. Classification on Wikidata ● Taxonomy of knowledge ● Enables powerful inference, novel applications ● Interesting philosophical, design, and engineering issues
  • 29.
  • 31. Classes and instances ● Plato is a human is a animal ● Plato instance of human subclass of animal ● Instance: concrete object, individual ● Class: abstract object
  • 32. Classification on Wikidata ● instance of (P31) – rdf:type in RDF and OWL – Most popular Wikidata property ● subclass of (P279) – “all instances of A are also instances of B” – rdfs:subClassOf in RDF and OWL
  • 33. Examples ● USS Nimitz instance of Nimitz-class aircraft carrier Nimitz-class aircraft carrier subclass of aircraft carrier ● 2012 Cannes Film Festival instance of Cannes Film Festival Cannes Film Festival subclass of film festival ● an individual charm quark instance of charm quark charm quark subclass of quark ^ Many “leaf nodes” in Wikidata's taxonomic hierarchy are not instances. (There are no items about individual quarks on Wikidata!) https://www.wikidata.org/wiki/Help:Basic_membership_properties
  • 34. Bad smells Item has many instance of or subclass of claims Items typically satisfy a huge number of instance of claims: ● Fido instance of dog ● Fido instance of English Pointer ● Fido instance of faithful animal ● … Solution: use one class for instance of, put other class knowledge into normal properties ● Fido instance of dog ● Fido breed: English Pointer ● Fido known for: faithfulness ● ...
  • 35. Bad smells subclass of claim that is nonsensical when interpreted as “All instances of A are also instances of B” Example: dog subclass of pet But not all dogs are pets! feral dog subclass of dog true feral dog subclass of pet false :. dog subclass of pet false Solution: put “pet” knowledge about dogs into claim that does not apply to all instances of dog. E.g. “dog has role pet”. (Has role would not be transitive.)
  • 36. Classification on Wikidata ● Last but not least: part of (P361) – Third basic membership property – Top-level “part-whole” relation ● subclass of and part of are both transitive; instance of is not transitive ● Transitive relation: A subclass of B B subclass of C :. A subclass of C https://www.wikidata.org/wiki/Help:Basic_membership_properties
  • 37. subclass of (P279) enables machines to infer conceptual hierarchy
  • 38. Recall Query reports that only 56 politicians have died of cancer Problem Only matches the precise claim “cause of death: cancer” Omits results that have: – cause of death: lung cancer – cause of death: lymphoma – cause of death: leukemia Solution Include causes of death that are a subclass of cancer – In Autolist: http://tinyurl.com/ovgjqd8 – Also possible in SPARQL in new Wikidata Query Service
  • 39. SPARQL ● SPARQL: semantic query language for databases ● Wikidata recently added official support – Query UI: https://query.wikidata.org – Examples: https://www.mediawiki.org/wiki/Wikibase/Indexing /SPARQL_Query_Examples
  • 40. PREFIX wd: <http://www.wikidata.org/entity/> PREFIX wdt: <http://www.wikidata.org/prop/direct/> SELECT ?politician ?cause ?politician_label ?cause_of_death_label WHERE { ?politician wdt:P106 wd:Q82955 . # find items that have "occupation (P106): politician (Q82955)" ?politician wdt:P509 ?cause . # with a P509 (cause of death) claim ?cause wdt:P279* wd:Q12078 . # ... where the cause is a subclass of (P279*) cancer (Q12078) # ?politician wdt:P39 wd:Q11696 . # Uncomment this line to include only U.S. Presidents OPTIONAL {?politician rdfs:label ?politician_label filter (lang(?politician_label) = "en") .} OPTIONAL {?cause rdfs:label ?cause_of_death_label filter (lang(?cause_of_death_label) = "en").} } ORDER BY ASC (?politician) Example SPARQL query List of politicians who died of cancer: Live demo: http://tinyurl.com/nh7jc2p
  • 42. ?cause wdt:P279* wd:Q12078 . Be sure to add an asterisk (*) to get the subclass tree!
  • 43. Autolist vs. Wikidata Query Service Autolist Wikidata Query Service URL https://tools.wmflabs.org/autolist/ https://query.wikidata.org Syntax WDQ (custom, but succinct) SPARQL (W3C standard) Support status Unofficial Official, beta Release date 2013-09-17 2015-09-07 Developer Magnus Manske Stas Malyshev Source code https://bitbucket.org/magnusmanske /wikidataquery https://github.com/wikimedia/wikidata- query-rdf/ License GPL 2+ Apache 2.0 Technology C++ Java, Blazegraph
  • 44. How to: Explore RDF/OWL dumps locally ● Get the most recent dumps: http://tools.wmflabs.org/wikidata-exports/rdf/ ● Small, interesting: wikidata-taxonomy.nt.gz ● Download and install Protege: http://protege.stanford.edu/
  • 45.
  • 46. Protege ● http://protege.stanford.edu/ ● The browser of the Semantic Web ● Good for small- to medium-sized ontologies
  • 47.
  • 48. Querying cancer types in wikidata-taxonomy.nt.gz
  • 49. Open questions: Modeling causes What caused or causes: – The Space Shuttle Challenger explosion? – The dinosaurs to die? – Malaria? Cancer? – The American Civil War?
  • 50. Causation on Wikidata ● https://www.wikidata.org/wiki/Help:Modeling_causes ● has cause (P828) (alias underlying cause): thing that ultimately resulted in the effect ● has immediate cause (P1478): thing that proximately resulted in the effect ● has contributing factor (P1479): thing that significantly influenced the effect, but did not directly cause it
  • 51. American Civil War (Q8676) has cause: slavery in the United States (Q118382) (preferred rank) states' rights (Q48527) (deprecated rank) ... has immediate cause: Battle of Fort Sumter (Q543165) United States presidential election, 1860 (Q698842) ... has contributing factor: caning of Charles Sumner (Q5032419) Dred Scott v. Sandford (Q690462) Bleeding Kansas (Q331377) Uncle Tom's Cabin (Q2222) ... What caused the American Civil War?