A Preliminary Study on Wikipedia, DBpedia and Wikidata
1. Andrea Wei-Ching Huang
Institute of Information Science, Academia Sinica, Taipei, Taiwan
June 1, 2015 @ IIS R101
1. Why & What
2. Semantic Enrichment
3. What if : Digital Archives Taiwan
A Preliminary Study on Wikipedia : DBpedia : Wikidata
7. Collaboratively-generated, semi-structured
information is made up of content which is
(a) semantified,
(b) wide-coverage,
(c) up-to-date,
(d) multilingual,
(e) free in nature.
Hovy et al., Collaboratively built semi-structured content and Artificial
Intelligence: The story so far, Artificial Intelligence (2012)
8. Source: Arnold, P., & Rahm, E. (2015). Automatic Extraction of Semantic Relations from Wikipedia.
International Journal on Artificial Intelligence Tools, 24(2), 1540010.
9. Basic Information Table
Website
  Wikipedia: wikipedia.org
  DBpedia: dbpedia.org
  Wikidata: www.wikidata.org
Release Time
  Wikipedia: January 15, 2001
  DBpedia: 23 January 2007
  Wikidata: 30 October 2012
Description
  Wikipedia: "the free encyclopedia"
  DBpedia: “the Semantic Web mirror of Wikipedia”
  Wikidata: “Wikipedia for data”
Host
  Wikipedia: Wikimedia Foundation, Inc.
  DBpedia: University of Leipzig; University of Mannheim; OpenLink Software
  Wikidata: Wikimedia Foundation, Inc.
Creators
  Wikipedia: Jimmy Wales, Larry Sanger
  Wikidata: Wikimedia community
Data mainly from
  Wikipedia: Wikipedians
  DBpedia: Wikipedia
  Wikidata: Wikipedia and sister projects
Generation method
  Wikipedia: manual/community-created
  DBpedia: automatic/semi-automatic
  Wikidata: semi-automatic; manual/community-created
Advantage
  Wikipedia: Free text / Ease of access and contribution
  DBpedia: LOD Hub, Semantic coverage & depth
  Wikidata: Quality (accuracy): URI / Provenance / Contextual representation
Operation
  Wikipedia: MediaWiki
  DBpedia: Virtuoso Universal Server
  Wikidata: MediaWiki Extension: Wikibase
URI/IRI Schemes
  Wikipedia: (language).wikipedia.org/wiki/Name
  DBpedia: Wikipedia-like IRIs, (language).dbpedia.org/resource/Name
  Wikidata: Language-independent number IDs, http://wikidata.org/wiki/Qxxx or Pxxx
Data structure
  Wikipedia: Mostly unstructured texts; Semi-structured: infobox, category…
  DBpedia: RDF; Named Graphs; DBpedia ontology (dynamic structure)
  Wikidata: Wikibase data model; Wikibase system ontology; Wikidata:WikiProject Ontology
Data Access
  Wikipedia: Wikipedia Data Dumps; Free Text Search; MediaWiki API
  DBpedia: DBpedia dumps; SPARQL endpoint; DBpedia Spotlight (annotating mentions of DBpedia resources in text)
  Wikidata: RDF dumps; Wikidata Query (WDQ); Wikidata API
License
  Wikipedia: CC Attribution / Share-Alike 3.0; text dual-licensed under GFDL; media licensing varies.
  DBpedia: GNU General Public License
  Wikidata: CC0 1.0
Language(s) Support
  Wikipedia: 288 (as of May 2015)
  DBpedia: 111 (extraction of Wikipedia); 119 (see DBpedia dumps); >125 (as of Mar. 2015); 27 (DBpedia Ontology)
  Wikidata: 358 (as of Aug. 2014)
Mutual Relation
  Wikipedia: Wikipedia:Wikidata
  DBpedia: DBpedia:Wikidata
  Wikidata: Wikidata:DBpedia
10. 1 Arts and culture
1.1 Award
1.2 Book
1.3 Comic book
1.4 Fictional character
1.5 Fictional element
1.6 Film
1.7 Game
1.8 Language
1.8.1 Styles
1.8.2 Other language
1.9 Music
1.10 Publishing
1.11 Radio
1.12 Television
1.13 Other arts and culture
2 Geography and place
2.1 Geography
2.2 Place
2.3 Buildings and structures
2.3.1 Entertainment venues and structures
2.3.2 Historic sites and structures
2.3.3 Other buildings and structures
3 Health and fitness
3.1 Medicine
3.2 Other health and fitness
4 History and events
4.1 Event
4.2 History
5 Mathematics and abstraction
6 Person
6.1 Religious person
6.2 Royalty and nobility
6.3 Sportsperson
6.3.1 American football person
6.3.2 Baseball person
6.3.3 Basketball person
6.3.4 Motorsports person
6.3.5 Other sportsperson
6.4 Other person
7 Religion and belief
7.1 Religious building
7.2 Other religion
8 Science and nature
8.1 Biology
8.1.1 Botany
8.1.2 Animal
8.1.3 Other biology
8.2 Astronomy
8.2.1 Spaceflight
8.2.2 Other astronomy
8.3 Geology
8.4 Weather
8.5 Other science and nature
9 Society and social science
9.1 Business and economics
9.2 Education
9.3 Food and drinks
9.4 Law
9.5 Military and war
9.6 Numismatics
9.7 Organization
9.8 Politics and government
9.8.1 Cabinet
9.8.2 Constituency
9.8.3 Legislature
9.8.4 Party
9.9 Other politics and government
9.10 Transport
9.10.1 Air transport
9.10.2 Automotive
9.10.3 Highway and street
9.10.4 Public transport
9.10.5 Rail transport
9.10.6 Water transport
9.10.7 Other transport
9.11 Sports
9.11.1 American football
9.11.2 Association football (soccer)
9.11.3 Athletics
9.11.4 Australian rules football
9.11.5 Canadian football
9.11.6 Badminton
9.11.7 Baseball
9.11.8 Basketball
9.11.9 Boxing
9.11.10 Cricket
9.11.11 Curling
9.11.12 Cycling
9.11.13 Field hockey
9.11.14 Figure skating
9.11.15 Floorball
9.11.16 Gaelic games
9.11.17 Golf
9.11.18 Handball
9.11.19 Horse racing
9.11.20 Ice hockey
9.11.21 Lacrosse
9.11.22 Martial arts
9.11.23 Motorsports
9.11.24 Multi-sport competition
9.11.25 Netball
9.11.26 Roller hockey
9.11.27 Rowing
9.11.28 Rugby league
9.11.29 Rugby union
9.11.30 Sailing
9.11.31 Skiing
9.11.32 Softball
9.11.33 Squash
9.11.34 Swimming
9.11.35 Tennis
9.11.36 Volleyball
9.11.37 Wrestling
9.11.38 Other sports
9.12 Other society and social sciences
10 Technology and applied science
10.1 Computing
10.1.1 Hardware
10.1.2 Software
10.1.3 Other computing
10.2 Photography
10.3 Other technology
11 Other
11.1 Shimming
11.2 Parent templates
11.3 Internal use
11.4 Not infoboxes
11.4.1 Subtemplates
11.4.2 Cleanup
11.4.3 Documentation
11.5 Pre-filled
11.6 Unsorted
12 See also
13 External links
http://en.wikipedia.org/wiki/Wikipedia:List_of_infoboxes
Wikipedia InfoBoxes
A
► Agriculture (43 C, 189 P)
► Architecture (41 C, 90 P)
▼ Arts (36 C, 72 P)
▼ Arts by culture (7 C)
► Artists by culture (11 C, 1 P)
► Celtic art (4 C, 62 P)
► Cinema by culture (16 C, 2 P)
▼ Painting by culture (12 C)
► Paintings by nationality (36 C)
► Ancient Greek pottery (7 C, 19 P)
► Bangladeshi painting (1 C, 2 P)
► Brazilian painting (1 C, 3 P)
▼ Chinese painting (10 C, 30 P)
► Art movements in Chinese painting (5 P)
► Banhua (1 P)
► Chinese ink brush (6 P)
► Ming dynasty painting (1 C, 4 P)
▼ Chinese painters (42 C, 3 P)
► Painters from Anhui (1 C, 13 P)
► Painters from Beijing (1 C, 10 P)
► Chinese landscape painters (8 C, 1 P)
► Painters from Chongqing (1 C)
► Five Dynasties and Ten Kingdoms painters (5 C)
► Painters from Fujian (1 C, 11 P)
► Painters from Gansu (1 C, 1 P)
► Painters from Guangdong (1 C, 12 P)
► Painters from Guangxi (1 C, 1 P)
► Painters from Guizhou (1 C)
► Painters from Hebei (1 C, 1 P)
► Painters from Heilongjiang (1 C)
► Painters from Henan (1 C, 14 P)
► Hong Kong painters (6 P)
► Painters from Hubei (1 C, 3 P)
► Painters from Hunan (1 C, 3 P)
► Painters from Jiangsu (1 C, 75 P)
► Painters from Jiangxi (1 C, 12 P)
► Painters from Jilin (1 C)
► Jin dynasty (1115–1234) painters (1 P)
► Jin dynasty (265–420) painters (2 P)
► Painters from Liaoning (1 C)
► Ming dynasty painters (1 C, 38 P)
► People's Republic of China painters (24 C, 6 P)
► Chinese portrait painters (14 P)
► Qing dynasty painters (1 C, 63 P)
▼ Republic of China painters (1 C, 52 P)
► Republic of China landscape painters (2 P)
► Painters from Shaanxi (1 C, 6 P)
► Painters from Shandong (1 C, 9 P)
► Painters from Shanghai (1 C, 14 P)
► Painters from Shanxi (2 P)
► Painters from Sichuan (1 C, 7 P)
► Song dynasty painters (1 C, 29 P)
► Southern and Northern Dynasties painters (6 C)
► Sui dynasty painters (2 P)
► Tang dynasty painters (1 C, 10 P)
► Three Kingdoms painters (1 C)
► Painters from Tianjin (3 P)
► Yuan dynasty painters (1 C, 23 P)
► Painters from Yunnan (1 C, 1 P)
► Painters from Zhejiang (1 C, 61 P)
► Chinese painter stubs (191 P)
► Chinese paintings (3 C, 14 P)
► Qing dynasty painting (1 P)
► Song dynasty painting (1 C, 1 P)
► Tang dynasty painting (2 C, 2 P)
► Tibetan painting (1 C, 10 P)
► Radio by culture (2 C)
► Television by culture (7 C)
► Theatre by culture (9 C)
► Arts by period (1 C, 1 P)
► Arts by place (5 C, 1 P)
► Aesthetics (19 C, 130 P)
► Artists (39 C, 67 P, 2 F)
► Audiovisual art (2 C, 1 P)
► Arts awards (13 C, 41 P)
► Art competitions (1 C, 3 P)
► Crafts (31 C, 97 P)
► Creative works (21 C, 2 P)
► Culinary arts (2 C, 19 P)
► Arts databases (1 C, 8 P)
► Disability in the arts (5 C, 19 P)
► Arts districts (2 C, 57 P)
► Arts events (9 C, 10 P)
► Funerary art (2 C, 14 P)
► Arts genres by country or nationality (20 C)
► Artistic incompetence (17 P)
► Art and culture law (2 C, 13 P)
► Arts-related lists (20 C, 42 P)
► Literature (52 C, 86 P)
► Arts occupations (9 C, 34 P)
► Arts organizations (24 C, 61 P)
► People associated with the arts (11 C, 5 P)
► Performing arts (44 C, 107 P)
► Perfumery (6 C, 44 P)
► Plastic arts (8 C, 3 P)
► The arts and politics (7 C, 7 P)
► Religion and the arts (9 C, 2 P)
▼ Topics in the arts (8 C, 2 P)
► Topics in popular culture (41 C, 96 P)
► Angels in art (2 C, 116 P)
► Animals in art (22 C, 102 P)
► Anti-fascist works (3 C, 4 P)
▼ Art by subject (22 C, 6 P)
► Statues by subject (11 C, 1 P)
▼ Paintings by subject (7 C, 2 P)
▼ Portraits by subject (6 C, 5 P)
► Portraits of monarchs (3 C, 13 P, 1 F)
► Portraits of popes (8 P, 1 F)
► Portraits of historical figures (2 P)
► Self-portraits (1 C, 54 P, 3 F)
► Portraits of William Shakespeare (15 P)
▼ Portraits of women (1 C, 2 P)
► Mona Lisa (13 P, 3 F)
► Paintings set in cabarets (4 P)
► Landscape paintings (56 P)
► Maritime paintings (1 C, 29 P)
► Paintings depicting myths (1 C, 5 P)
► Paintings of people (5 C, 19 P)
► War paintings (1 C, 68 P)
► Angels in art (2 C, 116 P)
► Animals in art (22 C, 102 P)
► Black people in art (13 P)
► Botanical art (1 C, 17 P)
► Dacia in art (1 C, 11 P)
► Death in art (4 C, 14 P)
► Depictions of kneeling (5 P)
► Environmental art (3 C, 26 P)
► Marine art (5 C, 11 P, 1 F)
► Mathematics and art (3 C, 5 P)
► Military art (7 C, 74 P, 1 F)
► Moon in art (3 P)
► Native Americans in art (19 P, 1 F)
► Depictions of people (22 C)
► Political art (10 C, 61 P)
► Religious art (7 C, 12 P)
► Science in art (1 C, 12 P)
► Sexuality in arts (4 C, 1 P)
► Slavery in art (11 P, 1 F)
► Vodou art (2 P)
► Censorship in the arts (3 C, 52 P)
► Military of the United States in art (5 C, 1 P)
► Virtual reality in fiction (6 C, 82 P)
► Arts venues (4 C, 4 P)
► Visual arts (45 C, 69 P)
► Women and the arts (17 C, 26 P)
► Works about the arts (20 C, 1 P)
► Wikipedia books on arts (7 C, 7 P)
► Art stubs (21 C, 379 P)
B
► Behavior (24 C, 50 P)
C
► Chronology (20 C, 52 P)
► Creativity (18 C, 63 P)
► Culture (46 C, 62 P)
D
► Disciplines (8 C, 1 P)
E
► Education (59 C, 197 P)
► Environment (47 C, 75 P)
G
► Geography (28 C, 79 P)
► Government (66 C, 113 P)
H
► Health (40 C, 4 P)
► History (34 C, 37 P)
► Humanities (33 C, 80 P)
► Humans (25 C, 43 P)
I
► Industry (37 C, 101 P)
► Information (25 C, 33 P)
K
► Knowledge (31 C, 96 P)
L
► Language (26 C, 69 P)
► Law (27 C, 75 P)
M
► Mathematics (19 C, 9 P)
► Medicine (23 C, 18 P)
► Mind (37 C, 18 P)
N
► Nature (23 C, 9 P)
O
► Objects (6 C, 2 P)
P
► People (13 C, 4 P)
► Politics (36 C, 50 P)
S
► Science (38 C, 32 P)
► Sports (36 C, 9 P)
► Structure (24 C, 13 P)
► Systems (7 C, 23 P)
T
► Technology (52 C, 134 P)
U
► Universe (10 C, 24 P)
W
► World (13 C, 12 P)
Wikipedia Category
http://en.wikipedia.org/wiki/Category:Main_topic_classifications
35 subcategories; 13 topics
Semi-structured
12. "Wiki" is a Hawaiian word meaning…
http://www.wikidata.org/wiki/Q128736
The state of the Wikipedia article and the
Wikidata item on John Nash’s car accident,
17 hours after the related news was released.
13. Language(s) Support
  Wikipedia: 288 (as of May 2015)
  DBpedia: 111 (extraction of Wikipedia); 119 (see DBpedia dumps); >125 (as of Mar. 2015); 27 (DBpedia Ontology)
  Wikidata: 358 (as of Aug. 2014)
16. Infobox
Categories
Structured information is hidden in the article
wikitext, in templates such as infoboxes and in
categories.
Source: Broughton, J. (2008). Wikipedia: The Missing Manual. O'Reilly Media, Inc.
19. WikiTaxonomy is generated by
traversing the category network and deciding,
for each pair of categories, whether
the sub-category stands in an is-a relation to the super-category.
Hovy et al., Collaboratively built semi-structured content and Artificial
Intelligence: The story so far, Artificial Intelligence (2012)
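The pairwise is-a decision can be illustrated with a small Python sketch. It uses a deliberately naive head-noun match as a stand-in heuristic; this is an assumption made for illustration and not the actual WikiTaxonomy procedure. The function names and the preposition list are hypothetical; the category pairs come from the Wikipedia category tree shown earlier.

```python
# Naive stand-in heuristic: a sub-category "is-a" its super-category when
# both share the same lexical head (the last word before any preposition).
PREPOSITIONS = {"from", "of", "in", "by", "for", "on"}


def lexical_head(category: str) -> str:
    """Return the lowercased head word of a category name."""
    tokens = category.lower().split()
    for i, token in enumerate(tokens):
        if i > 0 and token in PREPOSITIONS:
            tokens = tokens[:i]  # drop the prepositional phrase
            break
    return tokens[-1]


def is_a(sub: str, sup: str) -> bool:
    """Decide whether the (sub, sup) category link looks like an is-a link."""
    return lexical_head(sub) == lexical_head(sup)


def label_edges(edges):
    """Traverse (sub-category, super-category) links and label each one."""
    return [(sub, sup, "is-a" if is_a(sub, sup) else "not-is-a")
            for sub, sup in edges]


EDGES = [
    ("Chinese landscape painters", "Chinese painters"),
    ("Painters from Anhui", "Chinese painters"),
    ("Chinese painters", "Chinese painting"),
]

for sub, sup, label in label_edges(EDGES):
    print(f"{sub} -> {sup}: {label}")
```

Under this heuristic, "Chinese painters" is not treated as a kind of "Chinese painting", while the two painter sub-categories are kept as is-a links.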
20. Main Reference: Lehmann, Jens, et al. (2015) "DBpedia–A large-scale, multilingual
knowledge base extracted from Wikipedia." Semantic Web, Vol 6. No.2
21. Data Extraction and Mapping
Data Dumps
Extractors turn a specific type of wiki markup into triples.
http://dbpedia.org/resource/Academia_Sinica
http://en.wikipedia.org/wiki/Academia_Sinica
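As a rough illustration of what an extractor does, the Python sketch below turns the key/value lines of an infobox template into triples whose subject is the DBpedia resource URI derived from the Wikipedia article URL. It is a toy under stated assumptions, not the DBpedia extraction framework: the function names, the regular expression and the sample wikitext are made up for this example.

```python
import re

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
DBPEDIA_PROPERTY = "http://dbpedia.org/property/"

# Matches infobox lines of the form "| key = value".
INFOBOX_LINE = re.compile(r"^\s*\|\s*([^=|]+?)\s*=\s*(.+?)\s*$")


def wikipedia_to_dbpedia(url: str) -> str:
    """Map a Wikipedia article URL to a DBpedia resource URI (same Name part)."""
    name = url.rsplit("/wiki/", 1)[1]
    return DBPEDIA_RESOURCE + name


def extract_infobox_triples(article_url: str, wikitext: str):
    """Turn each 'key = value' infobox line into a (subject, property, value) triple."""
    subject = wikipedia_to_dbpedia(article_url)
    triples = []
    for line in wikitext.splitlines():
        match = INFOBOX_LINE.match(line)
        if match:
            key, value = match.groups()
            triples.append((subject, DBPEDIA_PROPERTY + key.replace(" ", "_"), value))
    return triples


# Illustrative wikitext only; not copied from the real article.
SAMPLE_WIKITEXT = """{{Infobox organization
| name = Academia Sinica
| location = Taipei, Taiwan
}}"""

TRIPLES = extract_infobox_triples(
    "http://en.wikipedia.org/wiki/Academia_Sinica", SAMPLE_WIKITEXT
)
for triple in TRIPLES:
    print(triple)
```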
23. DBpedia Thematic Overview
Revised from: Valsecchi, F., Abrate, M., Bacciu, C., Tesconi, M., & Marchetti, A. DBpedia Atlas: Mapping
the Uncharted Lands of Linked Data. Linked Data on the Web (LDOW2015)
DBpedia Atlas, online at http://wafi.iit.cnr.it/lod/dbpedia/atlas.
The largest classes of the ontology: Agent, Place, Work, Species, and TimePeriod.
The deepest levels of the ontology are in Place: the Diocese class (5 superclasses),
and OverseasDepartment, HistoricalDistrict, FormerMunicipality and HistoricalProvince (6 superclasses each).
The highest average outdegree: Soccer Manager, Jockey and Horse Trainer (bottom right).
The lowest depth/average outdegree: CareerStation, PersonFunction and TimePeriod.
31. Academia Sinica (Q337266)
Statements
Freebase identifier: /m/0216tk
stated in: Freebase Data Dumps
publication date: 28 October 2013
Contextual information (ternary relations) is
represented by the “qualifier”.
32. Wikibase database content can be summarized as follows:
Entity is one of the following three types of Wikibase pages, each with database content:
1. Item
   1. Item identifier (number prefixed with Q)
   2. Fingerprint, consisting of:
      1. Multilingual label*
      2. Multilingual description*
      3. Multilingual aliases
   3. Statements, each consisting of:
      1. Claim, consisting of:
         1. Property
         2. Value
         3. Qualifiers (additional property-value pairs)
      2. References (each consisting of one or more property-value pairs)
      3. Rank
   4. Site links
2. Property
   1. Property identifier (number prefixed with P)
   2. Fingerprint, consisting of:
      1. Multilingual label*
      2. Multilingual description*
      3. Multilingual aliases
   3. Statements, each consisting of:
      1. Claim, consisting of:
         1. Property
         2. Value
         3. Qualifiers (additional property-value pairs)
      2. References (each consisting of one or more property-value pairs)
      3. Rank
   4. Datatype
3. Query**
*) Provided the label and/or description of an entity are not empty, an entity's combination of label and description
in a given language must be unique within the scope of its entity type.
**) Under development.
http://www.mediawiki.org/wiki/Wikibase/DataModel/Primer
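The Item outline above can be mirrored with Python dataclasses. This is only a sketch: the class and field names are chosen for this example and are not Wikibase's own API, the values reuse the Academia Sinica example from the previous slide, and placing "stated in" and "publication date" inside a reference is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

PropertyValue = Tuple[str, str]  # (property label or ID, value)


@dataclass
class Claim:
    property_id: str
    value: str
    qualifiers: List[PropertyValue] = field(default_factory=list)


@dataclass
class Statement:
    claim: Claim
    # Each reference is a list of one or more property-value pairs.
    references: List[List[PropertyValue]] = field(default_factory=list)
    rank: str = "normal"  # Wikibase ranks: preferred, normal, deprecated


@dataclass
class Item:
    item_id: str                                                  # prefixed with Q
    labels: Dict[str, str] = field(default_factory=dict)          # language -> label
    descriptions: Dict[str, str] = field(default_factory=dict)    # language -> description
    aliases: Dict[str, List[str]] = field(default_factory=dict)   # language -> aliases
    statements: List[Statement] = field(default_factory=list)
    site_links: Dict[str, str] = field(default_factory=dict)      # site -> page title


academia_sinica = Item(
    item_id="Q337266",
    labels={"en": "Academia Sinica"},
    statements=[
        Statement(
            claim=Claim(property_id="Freebase identifier", value="/m/0216tk"),
            references=[[
                ("stated in", "Freebase Data Dumps"),
                ("publication date", "28 October 2013"),
            ]],
        )
    ],
    site_links={"enwiki": "Academia Sinica"},
)

print(academia_sinica.statements[0].claim)
```

A Property entity would have the same shape, with a P-prefixed identifier and a datatype field in place of the site links.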
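35. dat.digitalarchives.tw can answer questions like:
Q1: What semantic concepts does the copper enamel square vase (銅琺瑯方瓶) have?
General form: What concepts are represented in Artifact A?
Q2: Which artifacts are described by the concept "flared mouth" (侈口, a vessel mouth that flares outward)?
General form: What artifacts have been described by Concept X?
Q3: What characteristics do Artifact 1 and Artifact 2 have in common?
General form: What relations exist between A and B (or more)?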
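The three question patterns can be sketched over a handful of triples in plain Python. Everything in this example is hypothetical: the `ex:` identifiers, the `ex:hasConcept` property and the toy data are placeholders and are not taken from dat.digitalarchives.tw, which would normally be queried with SPARQL rather than with list comprehensions.

```python
# Toy (subject, predicate, object) triples; all identifiers are hypothetical.
TRIPLES = [
    ("ex:artifactA", "ex:hasConcept", "ex:flaredMouth"),
    ("ex:artifactA", "ex:hasConcept", "ex:enamel"),
    ("ex:artifactB", "ex:hasConcept", "ex:flaredMouth"),
    ("ex:artifactB", "ex:hasConcept", "ex:porcelain"),
]


def concepts_of(artifact: str) -> list:
    """Q1 pattern: which concepts are represented in a given artifact?"""
    return sorted({o for s, p, o in TRIPLES if s == artifact and p == "ex:hasConcept"})


def artifacts_with(concept: str) -> list:
    """Q2 pattern: which artifacts are described by a given concept?"""
    return sorted({s for s, p, o in TRIPLES if o == concept and p == "ex:hasConcept"})


def shared_concepts(a: str, b: str) -> list:
    """Q3 pattern: which concepts do two artifacts have in common?"""
    return sorted(set(concepts_of(a)) & set(concepts_of(b)))


print(concepts_of("ex:artifactA"))
print(artifacts_with("ex:flaredMouth"))
print(shared_concepts("ex:artifactA", "ex:artifactB"))
```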
36. Data Profiling: 25 artifacts in dat.digitalarchives.tw
1. 25 artifacts: 374 triples
2. 6 classes (details)
3. Core properties: 10 of 11 in use; dat:ceramicCharacteristics [description of ceramic characteristics]
has not been used yet.
4. Concepts: 148 dat concepts + 39 AAT
5. 24 of 25 artifacts use AAT; the main properties relating to AAT are
dat:ArtifactType [artifact type], dct:created [period of creation] and
dct:medium
6. 181 instances (details): 148 concepts + 25 artifacts + 8 meta (4
datasets + 3 reusing + 1 article), using 40 properties (details)
7. Total triples: 641
55. 1. Use Wikidata URIs for disambiguation?
2. Enrichment by embedding Wikidata
information into our interfaces? (no
extraction and maintenance tasks)
3. Logical reasoning over Wikidata,
DBpedia, or both (Wikidata + DBpedia) to infer
new knowledge?
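For points 1 and 2, a minimal Python sketch of how an interface might refer to a Wikidata item by its ID and read a label from the item's JSON. The helper names are hypothetical, no network request is made, and the sample document is a hand-written fragment that assumes the `entities` / `labels` / `value` layout of Wikidata's entity JSON.

```python
import json
from typing import Optional


def entity_data_url(entity_id: str) -> str:
    """Build the Special:EntityData URL for an item (Q...) or property (P...)."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json"


def label_of(entity_json: str, entity_id: str, language: str = "en") -> Optional[str]:
    """Read one language's label from an entity JSON document, if present."""
    data = json.loads(entity_json)
    labels = data["entities"][entity_id]["labels"]
    entry = labels.get(language)
    return entry["value"] if entry else None


# Hand-written sample in the assumed layout; not fetched from Wikidata.
SAMPLE = json.dumps({
    "entities": {
        "Q337266": {
            "labels": {"en": {"language": "en", "value": "Academia Sinica"}}
        }
    }
})

print(entity_data_url("Q337266"))
print(label_of(SAMPLE, "Q337266"))
```

Because the identifier is language independent, the same ID can key labels in any language, which is what makes it usable for disambiguation.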
60. A new move towards a possible partnership between
Europeana and Wikidata
http://pro.europeana.eu/files/Europeana_Professional/Europeana_Network/europeana_wikimedia_taskforce_report_2015.pdf
64. Thank you
This document is made available under the Creative Commons Licence CC-BY-SA 4.0
Citation Information: Andrea Wei-Ching Huang (2015) A Preliminary Study on Wikipedia, DBpedia
and Wikidata. URL: http://andrea-index.blogspot.tw/2015/06/wikipedia-dbpedia-wikidata.html