Metadata in Wikipedia
1. Metadata in Wikipedia
Metadata in Wikipedia: data in, data out
Daniel Kinzler
Wikimedia Deutschland e.V.
September 26, 2008
Wikipedia
Traditional Metadata
  Document and Revision
  Media Metadata
  Accessing Metadata
Link Structure
  Hyperlinks
  Categories
  Inter-Language Links
  WikiWord
Structured Data
  Records
  Infoboxes
  DBPedia
  Semantic MediaWiki
  WikiData
Conclusion
  We Have
  We Need
  Thank You
2. Metadata in Wikipedia
Wikipedia
Wikipedia is the free encyclopedia anyone can edit
Founded in 2001
Has become the standard online reference
Number 8 website (Alexa), 50K requests per second
Exists in 250 languages, has 10 million articles
Run by Wikimedia, runs on MediaWiki
Free content, free software
3. Metadata in Wikipedia
Document Metadata
Traditional (document) metadata is available throughout Wikipedia
  Document information
    Title
    URL
  Revision information
    Author
    Timestamp
4. Metadata in Wikipedia
Document Metadata
Metadata for media files is maintained on-page, as content:
  Source, License, Contributors, ...
5. Metadata in Wikipedia
Image Metadata
Metadata for image formats
  Resolution
  EXIF
    Author, Copyright
    Timestamp
    Exposure, Aperture, Flash
    Camera model
    ...
Metadata for audio and video formats is not yet supported.
6. Metadata in Wikipedia
Online Export Interface
MediaWiki’s page export facility provides limited metadata
  Special:Export
    Pages and revisions
    XML wrapper around wikitext
    Some basic metadata
7. Metadata in Wikipedia
Online Export Interface (XML)
http://en.wikipedia.org/wiki/Special:Export/Berlin
<page>
  <title>Berlin</title>
  <id>3354</id>
  <revision>
    <id>240627831</id>
    <timestamp>2008-09-24T06:44:58Z</timestamp>
    <contributor>
      <username>Ling.Nut</username>
      <id>1929579</id>
    </contributor>
    <minor/>
    <comment>clean up, typos fixed</comment>
    <text xml:space="preserve">
{{pp-semi-protected|small=yes}}
{{otheruses1|the capital of Germany}}
{{Infobox German Bundesland
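Because Special:Export wraps each revision's metadata in plain XML elements, any XML parser can read it. The following is a minimal sketch using only Python's standard library: it pulls the metadata fields out of an abbreviated copy of the Berlin export above and skips the wikitext. The `<mediawiki>` wrapper, the closing tags and the shortened `<text>` element are assumptions added to make the sample well-formed; real exports also declare an XML namespace, which the hypothetical `local()` helper strips.

```python
import xml.etree.ElementTree as ET

# Abbreviated, well-formed copy of the Special:Export/Berlin output.
# The <mediawiki> wrapper and the shortened <text> are assumptions.
SAMPLE = """<mediawiki>
  <page>
    <title>Berlin</title>
    <id>3354</id>
    <revision>
      <id>240627831</id>
      <timestamp>2008-09-24T06:44:58Z</timestamp>
      <contributor>
        <username>Ling.Nut</username>
        <id>1929579</id>
      </contributor>
      <minor/>
      <comment>clean up, typos fixed</comment>
      <text xml:space="preserve">{{pp-semi-protected|small=yes}}</text>
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip a '{namespace}' prefix so namespaced and plain tags look alike."""
    return tag.rsplit("}", 1)[-1]


def revision_metadata(xml_text):
    """Yield one metadata dict per <revision>; the wikitext itself is skipped."""
    root = ET.fromstring(xml_text)
    for page in root.iter():
        if local(page.tag) != "page":
            continue
        page_fields = {local(child.tag): child for child in page}
        for rev in page:
            if local(rev.tag) != "revision":
                continue
            fields = {local(child.tag): child for child in rev}
            contributor = fields.get("contributor")
            who = {}
            if contributor is not None:
                who = {local(c.tag): c.text for c in contributor}
            yield {
                "title": page_fields["title"].text,
                "page_id": int(page_fields["id"].text),
                "rev_id": int(fields["id"].text),
                "timestamp": fields["timestamp"].text,
                # anonymous edits carry <ip> instead of <username>
                "user": who.get("username") or who.get("ip"),
                "minor": "minor" in fields,
                "comment": fields["comment"].text if "comment" in fields else None,
            }


for meta in revision_metadata(SAMPLE):
    print(meta)
```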
8. Metadata in Wikipedia
MediaWiki Web API
MediaWiki’s web API for bots/scripts
  api.php
    supports complex queries
    lots of properties
    several output formats (JSON, YAML, WDDX, ...)
    but no RDF
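An api.php request is an ordinary HTTP GET whose query-string parameters select what is returned; several values for one parameter are joined with `|`. As a minimal sketch, the hypothetical helper below builds the same Berlin request that is used on the next slide. It only constructs the URL and performs no network access.

```python
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"


def query_url(title, props=("info", "revisions"), rvlimit=5, fmt="xml"):
    """Build an action=query URL; multiple prop values are joined with '|'."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "|".join(props),
        "rvlimit": rvlimit,
        "format": fmt,
    }
    # keep '|' unescaped for readability; '%7C' would work equally well
    return API + "?" + urlencode(params, safe="|")


print(query_url("Berlin"))
```

Switching `fmt` to `"json"` selects another of the output formats listed above without changing the query itself.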
9. Metadata in Wikipedia
MediaWiki Web API (XML)
http://en.wikipedia.org/w/api.php?action=query&titles=Berlin&prop=info|revisions&rvlimit=5&format=xml
<page pageid="3354"
      ns="0"
      title="Berlin"
      touched="2008-09-24T06:44:58Z"
      lastrevid="240627831"
      counter="2317"
      length="91446">
  <revisions>
    <rev revid="240627831"
         minor=""
         user="Ling.Nut"
         timestamp="2008-09-24T06:44:58Z"
         comment="clean up, typos fixed" />
    <rev revid="239984512"
         user="Lear 21"
         timestamp="2008-09-21T12:03:45Z"
         comment="/* Transportation */ ref" />
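Unlike the export format, the API puts the metadata into attributes, so reading it is mostly a matter of `Element.get()`. The sketch below parses the response above with Python's standard library. The enclosing `<api><query><pages>` elements are an assumption made to form a complete document; the slide shows only the `<page>` part.

```python
import xml.etree.ElementTree as ET

# The <api><query><pages> wrapper is assumed; the slide shows only <page>.
SAMPLE = """<api><query><pages>
<page pageid="3354" ns="0" title="Berlin" touched="2008-09-24T06:44:58Z"
      lastrevid="240627831" counter="2317" length="91446">
  <revisions>
    <rev revid="240627831" minor="" user="Ling.Nut"
         timestamp="2008-09-24T06:44:58Z" comment="clean up, typos fixed" />
    <rev revid="239984512" user="Lear 21"
         timestamp="2008-09-21T12:03:45Z" comment="/* Transportation */ ref" />
  </revisions>
</page>
</pages></query></api>"""


def page_history(xml_text):
    """Return (page info, list of revisions) for the first <page> element."""
    page = ET.fromstring(xml_text).find(".//page")
    info = {
        "pageid": int(page.get("pageid")),
        "title": page.get("title"),
        "lastrevid": int(page.get("lastrevid")),
        "length": int(page.get("length")),
    }
    revisions = [
        {
            "revid": int(rev.get("revid")),
            "user": rev.get("user"),
            "timestamp": rev.get("timestamp"),
            # minor edits are flagged by an empty minor="" attribute
            "minor": "minor" in rev.attrib,
            "comment": rev.get("comment"),
        }
        for rev in page.iter("rev")
    ]
    return info, revisions


info, revisions = page_history(SAMPLE)
print(info)
for rev in revisions:
    print(rev)
```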
10. Metadata in Wikipedia
MediaWiki RDF Extension
The RDF Extension provides access to metadata
  Per-page RDF output
    Document info mainly in the Dublin Core (DC) and Creative Commons (CC) vocabularies
    Also links, categories, images, etc.
  Output in XML, Turtle or NTriples
  Supports custom RDF embedded in wiki pages
  Compare http://www.communitywiki.org/en/DublinCoreForWiki
  Not on Wikipedia, used by WikiTravel
11. Metadata in Wikipedia
MediaWiki RDF Extension (XML)
http://wikitravel.org/en/Special:Rdf/Berlin
<rdf:Description
    rdf:about="http://wikitravel.org/en/Berlin">
  <dc:date
      rdf:datatype="http://purl.org/dc/elements/1.1/W3CDTF">
    2008-09-23T18:04:01Z
  </dc:date>
  <dc:rights>
    Creative Commons Attribution-ShareAlike 1.0
  </dc:rights>
  <dc:title xml:lang="en">
    Berlin
  </dc:title>
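N-Triples, one of the output formats the extension offers, writes each statement as one line: subject, predicate, object, then a full stop. The sketch below is a hand-rolled Python illustration of that format, not the extension's own code: it emits the Dublin Core title, date and rights shown above. Typing the date as `xsd:dateTime` (rather than the W3CDTF datatype in the XML above) and the minimal escaping rules are assumptions of this sketch; the helper names are hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def literal(value, lang=None, datatype=None):
    """Quote a string as an N-Triples literal with an optional tag or type."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    if lang:
        return f'"{escaped}"@{lang}'
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'


def dc_ntriples(subject, title, date, rights, lang="en"):
    """One N-Triples line per Dublin Core property of a page."""
    s = f"<{subject}>"
    return [
        f"{s} <{DC}title> {literal(title, lang=lang)} .",
        f"{s} <{DC}date> {literal(date, datatype=XSD_DATETIME)} .",
        f"{s} <{DC}rights> {literal(rights)} .",
    ]


for line in dc_ntriples("http://wikitravel.org/en/Berlin", "Berlin",
                        "2008-09-23T18:04:01Z",
                        "Creative Commons Attribution-ShareAlike 1.0"):
    print(line)
```

Because every line is a complete statement, N-Triples output can be concatenated or split across files without any parsing, which makes it convenient for bulk loading.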
12. Metadata in Wikipedia
Structural Information
Wiki pages contain several types of links
  The structure of hyperlinks encodes relations
  Links connect pages on a textual and a conceptual level
  Links are maintained by users; the relations are implicit
13. Metadata in Wikipedia
Page Links
Hyperlinks cross-reference pages
  Navigational, but also conceptual
  Mutually linked pages → related concepts
  Link label and link target → word and meaning
  Beware of the identity crisis when choosing URIs
[[Berlin Wall|The Wall]]
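Both ideas on this slide can be made concrete in a few lines: a link like `[[Berlin Wall|The Wall]]` yields a (label, target) pair, that is, a word and its meaning, and a link graph yields the mutually linked pairs. The sketch below is a simplified Python illustration; the regular expression covers only plain `[[target]]`, `[[target|label]]` and `[[target#section|label]]` links, and skipping every target that contains a colon is a crude stand-in for proper namespace handling. All function names are hypothetical.

```python
import re

# [[target]], [[target|label]] and [[target#section|label]]
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")


def normalize(title):
    """Rough MediaWiki-style normalization: underscores become spaces and
    the first letter is upper-cased."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def page_links(wikitext):
    """Return (label, target) pairs; prefixed links such as categories,
    files and inter-language links are skipped by a crude ':' test."""
    pairs = []
    for m in LINK.finditer(wikitext):
        raw_target, label = m.group(1), m.group(2)
        if ":" in raw_target:
            continue
        shown = label if label is not None else raw_target
        pairs.append((shown.strip(), normalize(raw_target)))
    return pairs


def mutual_links(graph):
    """Pairs of pages that link to each other: candidate related concepts."""
    return {tuple(sorted((a, b)))
            for a, targets in graph.items()
            for b in targets
            if a != b and a in graph.get(b, ())}


print(page_links("[[Berlin Wall|The Wall]] divided [[berlin]]."))
```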
14. Metadata in Wikipedia
Category Links
Pages are assigned to one or more categories.
  Categories form a poly-hierarchy (by convention)
  Categories of pages → subsumption of concepts
  Structure often unclear or broken
  No intersection, no transitive inclusion
[[Category:Capitals in Europe]]
[[Category:States of Germany]]
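Since MediaWiki offers neither intersection nor transitive inclusion, an external tool has to compute both. The sketch below shows one way to do it in Python: extract the category links, walk the (possibly cyclic) parent graph breadth-first, and intersect member sets. The `parents` and `members` data are hypothetical toy values, not Wikipedia's actual category tree, and the function names are illustrative.

```python
import re
from collections import deque

# [[Category:Name]] or [[Category:Name|sort key]]
CATEGORY = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+)(?:\|[^\]]*)?\]\]",
                      re.IGNORECASE)


def page_categories(wikitext):
    """Category names assigned on a page, in order of appearance."""
    return [m.group(1).strip() for m in CATEGORY.finditer(wikitext)]


def ancestors(category, parents):
    """Transitive super-categories, computed outside MediaWiki.
    The 'seen' set stops the walk on cycles a broken hierarchy may contain."""
    seen, queue = set(), deque([category])
    while queue:
        for parent in parents.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def intersect(members, *names):
    """Pages that belong to all of the given categories."""
    sets = [set(members.get(name, ())) for name in names]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy data for illustration only.
parents = {
    "Capitals in Europe": ["Capitals", "Cities in Europe"],
    "Cities in Europe": ["Cities"],
    "Cities": ["Settlements"],
    "Settlements": ["Cities"],  # a deliberate cycle
}
members = {
    "Capitals in Europe": ["Berlin", "Paris", "Vienna"],
    "States of Germany": ["Berlin", "Bavaria", "Hamburg"],
}
print(ancestors("Capitals in Europe", parents))
print(intersect(members, "Capitals in Europe", "States of Germany"))
```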
15. Metadata in Wikipedia
Inter-Language Links
Inter-language links refer to the same page in a different language (on another wiki)
  Granularity and coverage differ greatly
  Mutually linked pages probably describe the same concept
  Maintained manually and by bots
  Would a centralized system be better?
[[de:Berliner Mauer]]
[[fr:Mur de Berlin]]
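The heuristic "mutually linked pages probably describe the same concept" amounts to grouping pages into connected components over bidirectional links. The following is a minimal Python sketch using union-find, assuming the inter-language links have already been collected into a dict keyed by `(language, title)`; the `("de", "Mauer")` entry is a hypothetical one-way link included to show that unconfirmed links are not merged.

```python
def concept_clusters(langlinks):
    """Group (language, title) pages into concepts, joining two pages only
    when each links to the other (union-find with path halving)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for page, targets in langlinks.items():
        find(page)  # register pages that may end up alone
        for target in targets:
            if page in langlinks.get(target, ()):  # only mutual links count
                parent[find(page)] = find(target)

    clusters = {}
    for page in list(parent):
        clusters.setdefault(find(page), set()).add(page)
    return list(clusters.values())


LANGLINKS = {
    ("en", "Berlin Wall"): {("de", "Berliner Mauer"), ("fr", "Mur de Berlin")},
    ("de", "Berliner Mauer"): {("en", "Berlin Wall"), ("fr", "Mur de Berlin")},
    ("fr", "Mur de Berlin"): {("en", "Berlin Wall")},
    ("de", "Mauer"): {("en", "Berlin Wall")},  # hypothetical one-way link
}
for cluster in concept_clusters(LANGLINKS):
    print(sorted(cluster))
```

Requiring links in both directions trades coverage for precision; with differing granularity across languages, one-way links are common and often point to a broader or narrower concept.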
16. Metadata in Wikipedia
WikiWord
WikiWord builds a thesaurus by mining the link structure
  Every page describes a concept
  Link labels are terms referring to those concepts
  Links and categories define relations
  Multilingual thesaurus by merging languages
  Export to SKOS
  No web interface yet
http://brightbyte.de/page/WikiWord
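The core mapping can be illustrated in a few lines: count how often each link label is used for each target, then publish a concept with its page title as preferred label and the other terms as alternative labels. This is a hypothetical Python sketch of the idea, not WikiWord's implementation; the `example.org` namespace and the function names are placeholders, and labels are not escaped for Turtle.

```python
from collections import Counter, defaultdict

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def term_meanings(link_pairs):
    """Map each term (link label) to a Counter of the concepts (link targets)
    it is used for; the counts approximate how likely each meaning is."""
    meanings = defaultdict(Counter)
    for label, target in link_pairs:
        meanings[label][target] += 1
    return meanings


def skos_concept(concept, meanings, base="http://example.org/concept/", lang="en"):
    """Turtle for one concept: its page title becomes skos:prefLabel, every
    other term used for it becomes skos:altLabel (quotes are not escaped)."""
    uri = base + concept.replace(" ", "_")
    alt = sorted(t for t, targets in meanings.items()
                 if concept in targets and t != concept)
    lines = [SKOS_PREFIX, f"<{uri}> a skos:Concept ;",
             f'    skos:prefLabel "{concept}"@{lang}']
    for term in alt:
        lines[-1] += " ;"
        lines.append(f'    skos:altLabel "{term}"@{lang}')
    lines[-1] += " ."
    return "\n".join(lines)


PAIRS = [
    ("The Wall", "Berlin Wall"),
    ("Berlin Wall", "Berlin Wall"),
    ("the Wall", "Berlin Wall"),
    ("The Wall", "The Wall (album)"),
]
meanings = term_meanings(PAIRS)
print(meanings["The Wall"])
print(skos_concept("Berlin Wall", meanings))
```

The ambiguous term "The Wall" keeps both meanings with their counts, which is the information a thesaurus needs to rank candidate concepts for a word.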
17. Metadata in Wikipedia
Data Records
Wikipedia uses templates to present structured data records
  Maintained directly by users
  Template parameters can be extracted
  MediaWiki stores them as plain text
  External mining tools needed
{{Infobox German Bundesland
|Name = Berlin
|image_photo = Cityscapeberlin2006.JPG
|area = 891.82
|population = 3416300
|elevation = 34 - 115
|GDP = 81.7
...
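Because MediaWiki stores template calls as plain text, extracting parameters means scanning the wikitext yourself. The main pitfall is that `|` also occurs inside links and nested templates. This minimal Python sketch splits only at nesting depth 0; the function name is hypothetical. Positional parameters and `{{{...}}}` syntax are ignored, and the sample closes the infobox after the last parameter shown above, which is an assumption.

```python
def template_params(wikitext, name):
    """Named parameters of the first {{name ...}} call, or None if absent.

    Splits on '|' only at nesting depth 0, so links like [[a|b]] and nested
    templates like {{x|y}} inside a value stay intact.
    """
    start = wikitext.find("{{" + name)
    if start < 0:
        return None
    i = start + 2 + len(name)
    depth = 0
    parts, buf = [], []
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            buf.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            buf.append(pair)
            i += 2
        elif pair == "}}":  # closes the template we are reading
            parts.append("".join(buf))
            break
        elif wikitext[i] == "|" and depth == 0:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(wikitext[i])
            i += 1
    else:
        return None  # no closing braces: unterminated template
    params = {}
    for part in parts[1:]:  # parts[0] is whatever follows the name
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


INFOBOX = """{{Infobox German Bundesland
|Name = Berlin
|image_photo = Cityscapeberlin2006.JPG
|area = 891.82
|population = 3416300
|elevation = 34 - 115
|GDP = 81.7
}}"""
print(template_params(INFOBOX, "Infobox German Bundesland"))
```

All values come back as strings; turning "34 - 115" or "891.82" into typed data is a separate step.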
18. Metadata in Wikipedia
Infoboxes
Infoboxes present a terse overview of properties
  Used for cities, animals, bands, books, chemicals, ...
  Qualifiers are problematic: date of measurement, error margin, unit, source, etc.
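The qualifier problem shows up as soon as infobox strings are turned into numbers: "34 - 115" is a range, and neither the unit nor the date of measurement appears in the value itself. The following Python sketch uses a hypothetical `Quantity` type in which those qualifiers must be supplied by the caller. The accepted formats (a single number or a hyphenated range) are assumptions that cover only the Berlin values from the previous slide.

```python
import re
from dataclasses import dataclass
from typing import Optional

NUMBER = r"-?\d+(?:\.\d+)?"
RANGE = re.compile(rf"\s*({NUMBER})\s*-\s*({NUMBER})\s*")
SINGLE = re.compile(rf"\s*({NUMBER})\s*")


@dataclass
class Quantity:
    """An infobox value plus the qualifiers the wikitext usually leaves out."""
    low: float
    high: float
    unit: Optional[str] = None     # e.g. "m" or "km²"; supplied by the caller
    as_of: Optional[str] = None    # date of measurement
    source: Optional[str] = None   # citation, if any


def parse_quantity(raw, **qualifiers):
    """Parse '891.82' or '34 - 115'; return None for anything else."""
    m = RANGE.fullmatch(raw)
    if m:
        return Quantity(float(m.group(1)), float(m.group(2)), **qualifiers)
    m = SINGLE.fullmatch(raw)
    if m:
        value = float(m.group(1))
        return Quantity(value, value, **qualifiers)
    return None


print(parse_quantity("34 - 115", unit="m"))
print(parse_quantity("891.82", unit="km²"))
```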
19. Metadata in Wikipedia
Personendaten
“Personendaten” (“personal data”) are biographical records on the German Wikipedia
  Work like a hidden infobox
  Contain date/place of birth/death, aliases, etc.
  Maintained by a WikiProject
  Extracted automatically (every now and then)
{{Personendaten
|NAME=Einstein, Albert
|ALTERNATIVNAMEN=
|KURZBESCHREIBUNG=Physiker
|GEBURTSDATUM=14. März 1879
|GEBURTSORT=[[Ulm]]
|STERBEDATUM=18. April 1955
|STERBEORT=[[Princeton (New Jersey)|Princeton]], [[USA]]
}}
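An extractor for such records has to undo two layers of wiki formatting: links in the place fields and German-language dates. The following is a naive Python sketch, assuming one `|KEY=value` pair per line as in the Einstein record above; it handles only the "14. März 1879" date style. The function names and the `MONATE` month table are illustrative, not part of any existing tool.

```python
import re
from datetime import date

MONATE = {"Januar": 1, "Februar": 2, "März": 3, "April": 4, "Mai": 5,
          "Juni": 6, "Juli": 7, "August": 8, "September": 9,
          "Oktober": 10, "November": 11, "Dezember": 12}
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def link_text(value):
    """Replace [[target|label]] by label and [[target]] by target."""
    return LINK.sub(r"\1", value)


def personendaten(wikitext):
    """Naive reader: assumes one |KEY=value pair per line."""
    record = {}
    for line in wikitext.splitlines():
        if line.startswith("|") and "=" in line:
            key, _, value = line[1:].partition("=")
            record[key.strip()] = link_text(value.strip())
    return record


def german_date(text):
    """'14. März 1879' -> date(1879, 3, 14); None if the format differs."""
    m = re.fullmatch(r"\s*(\d{1,2})\.\s*(\w+)\s+(\d{1,4})\s*", text)
    if not m or m.group(2) not in MONATE:
        return None
    return date(int(m.group(3)), MONATE[m.group(2)], int(m.group(1)))


EINSTEIN = """{{Personendaten
|NAME=Einstein, Albert
|ALTERNATIVNAMEN=
|KURZBESCHREIBUNG=Physiker
|GEBURTSDATUM=14. März 1879
|GEBURTSORT=[[Ulm]]
|STERBEDATUM=18. April 1955
|STERBEORT=[[Princeton (New Jersey)|Princeton]], [[USA]]
}}"""
record = personendaten(EINSTEIN)
print(record["STERBEORT"], german_date(record["GEBURTSDATUM"]))
```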
20. Metadata in Wikipedia
DBPedia
DBPedia is a project that mines RDF triples from infoboxes
  Allows SPARQL queries
  Multiple languages
  100 million triples
  Web interface
http://dbpedia.org
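A SPARQL endpoint accepts the query text in a `query` parameter of an HTTP GET request. The sketch below builds such a request for "who was born in Berlin?" in Python, without sending it. The endpoint address `http://dbpedia.org/sparql` and the `birthPlace` property URI (taken from the 2008 output on the next slide) are assumptions that may not match current DBPedia releases; the helper names are hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"
PROPERTY = "http://dbpedia.org/property/"
RESOURCE = "http://dbpedia.org/resource/"


def sparql_url(query):
    """GET URL for a SPARQL endpoint using the standard 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


def people_born_in(city, limit=10):
    """SPARQL text asking which resources have birthPlace = <city>."""
    return (f"SELECT ?person WHERE {{\n"
            f"  ?person <{PROPERTY}birthPlace> <{RESOURCE}{city}> .\n"
            f"}} LIMIT {limit}")


query = people_born_in("Berlin")
print(query)
print(sparql_url(query))
```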
21. Metadata in Wikipedia
DBPedia (XML)
http://dbpedia.org/data/Berlin
<rdf:Description
    rdf:about="http://dbpedia.org/resource/Lothar_Bolz">
  <n0pred:deathPlace xmlns:n0pred="http://dbpedia.org/property/"
      rdf:resource="http://dbpedia.org/resource/Berlin"/>
</rdf:Description>
<rdf:Description
    rdf:about="http://dbpedia.org/resource/Alfred_Wegener">
  <n0pred:birthPlace xmlns:n0pred="http://dbpedia.org/property/"
      rdf:resource="http://dbpedia.org/resource/Berlin"/>
</rdf:Description>
<rdf:Description
    rdf:about="http://dbpedia.org/resource/Untoten">
  <n0pred:origin xmlns:n0pred="http://dbpedia.org/property/"
      rdf:resource="http://dbpedia.org/resource/Berlin"/>
</rdf:Description>
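Each `rdf:Description` above states one triple whose object is Berlin. For this particular pattern (a property element with an `rdf:resource` attribute), the standard library's ElementTree is enough to recover the triples, as the Python sketch below shows. It is not a general RDF/XML parser: literals, nested descriptions and other RDF/XML shorthand are ignored, and a library such as rdflib would be the usual choice for real data. The `rdf:RDF` root and moving the `n0pred` namespace declaration onto it are assumptions made to form one well-formed document.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# The three descriptions above, wrapped in an rdf:RDF root (an assumption).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:n0pred="http://dbpedia.org/property/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Lothar_Bolz">
    <n0pred:deathPlace rdf:resource="http://dbpedia.org/resource/Berlin"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://dbpedia.org/resource/Alfred_Wegener">
    <n0pred:birthPlace rdf:resource="http://dbpedia.org/resource/Berlin"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://dbpedia.org/resource/Untoten">
    <n0pred:origin rdf:resource="http://dbpedia.org/resource/Berlin"/>
  </rdf:Description>
</rdf:RDF>"""


def resource_triples(xml_text):
    """Yield (subject, predicate, object) for property elements that point to
    another resource via rdf:resource; all other RDF/XML forms are ignored."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(RDF + "Description"):
        subject = desc.get(RDF + "about")
        for prop in desc:
            obj = prop.get(RDF + "resource")
            if obj is not None and prop.tag.startswith("{"):
                namespace, _, name = prop.tag[1:].partition("}")
                yield subject, namespace + name, obj


for s, p, o in resource_triples(SAMPLE):
    print(s, p, o)
```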
22. Metadata in Wikipedia
Semantic MediaWiki
Semantic MediaWiki is a MediaWiki extension:
  Builds an RDF structure
  Allows SPARQL queries
  Users enter semantic relations in wiki syntax
  More complex syntax
  semantic-mediawiki.org
  Not supported by Wikipedia
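In Semantic MediaWiki, a relation is entered inline as `[[Property::Value]]`, optionally with display text after a `|`. This is the extra syntax the slide refers to. The Python sketch below extracts such annotations and shows the text a reader would see. The regular expression is a simplification of what SMW's own parser accepts; the function names and the sample sentence are hypothetical, with property names borrowed from the export on the next slide.

```python
import re

# [[Property::Value]] or [[Property::Value|shown text]]
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def annotations(wikitext):
    """(property, value) pairs in the order they appear."""
    return [(m.group(1).strip(), m.group(2).strip())
            for m in ANNOTATION.finditer(wikitext)]


def rendered_text(wikitext):
    """What a reader sees: the shown text if given, otherwise the value."""
    return ANNOTATION.sub(
        lambda m: m.group(3) if m.group(3) is not None else m.group(2),
        wikitext)


TEXT = ("Berlin is the capital of [[Capital of::Germany]] and has "
        "[[Population::3391407|about 3.4 million]] inhabitants. [[Category:City]]")
print(annotations(TEXT))
print(rendered_text(TEXT))
```

Ordinary links such as `[[Category:City]]` have a single colon and are left untouched, so annotated and plain wikitext can coexist on one page.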
23. Metadata in Wikipedia
Semantic MediaWiki (XML)
http://semantic-mediawiki.org/wiki/Special:ExportRDF/Berlin
<swivt:Subject rdf:about="&wiki;Berlin">
  <rdfs:label>Berlin</rdfs:label>
  <swivt:page rdf:resource="&wikiurl;Berlin"/>
  <rdfs:isDefinedBy rdf:resource="&wikiurl;Special:ExportRDF/Berlin"/>
  <rdf:type rdf:resource="&wiki;Category-3ACity"/>
  <property:Capital_of rdf:resource="&wiki;Germany"/>
  <property:Coordinates
      rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    52° 31′ 0″ N, 13° 24′ 0″ E
  </property:Coordinates>
  <property:Located_in rdf:resource="&wiki;Germany"/>
  <property:Population
      rdf:datatype="http://www.w3.org/2001/XMLSchema#double">
    3391407
  </property:Population>
</swivt:Subject>
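Population is exported as a typed `xsd:double`, but the coordinates arrive as an untyped `xsd:string`, so a consumer has to parse them before doing any geographic computation. The following Python sketch converts degree/minute/second notation to signed decimal degrees. It assumes the `D° M′ S″ N` layout; the minute and second marks are optional, and ASCII `'` and `"` are accepted as substitutes. Other coordinate notations are not handled, and the function name is hypothetical.

```python
import re

# degrees, minutes, seconds, hemisphere; the minute/second marks are optional
DMS = re.compile(r"(\d+)\s*°\s*(\d+)\s*[′']?\s*(\d+(?:\.\d+)?)\s*[″\"]?\s*([NSEW])")


def dms_to_decimal(text):
    """Convert each D° M′ S″ group to signed decimal degrees (S and W negative)."""
    values = []
    for deg, minutes, seconds, hemisphere in DMS.findall(text):
        value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
        values.append(-value if hemisphere in "SW" else value)
    return tuple(values)


print(dms_to_decimal("52° 31′ 0″ N, 13° 24′ 0″ E"))
```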
24. Metadata in Wikipedia
WikiData
WikiData is a MediaWiki extension:
  Stores structured data separate from wikitext
  Reusable across wikis
  Form-based structured data entry
  No export interface
  omegawiki.org
  Not used by Wikipedia, active on OmegaWiki
25. Metadata in Wikipedia
We Have
We have...
  Document metadata
  Structural data
  Structured data records
  Lots of people maintaining this
Thank You
26. Metadata in Wikipedia
We Need
We need ways to...
  maintain the data easily.
  store structured data sensibly.
  query the data efficiently.
  access the data conveniently.
We need people to make it happen.
Thank You
27. Metadata in Wikipedia
Thank You
The End
http://brightbyte.de/repos/papers/2008/