Data-Hacking with
Wikimedia Projects:
Learn by Example, Including
Wikipedia, Wikidata, and beyond!
@notconfusing (Max Klein)
@wrought (Matt Senate)
Who is in the room?
●
Data hackers?
●
Programmers?
●
Artists/designers?
●
Open Access folks?
●
Academics?
●
Wikipedians?
●
Wikimedians?
●
Wiki-people?
Wikimedia Movement
●
wikipedia.org
●
commons.wikimedia.org
●
wikisource.org
●
wikidata.org
Wiki Context
●
Wikipedia: by far the largest
in size and user base.
●
Projects often
organized by language.
●
Each language-project
has an independent
user community.
Wiki Context
●
See Wikimedia
projects as a form of:
“curated database”
●
Web's Least Common
Denominator for data.
●
Wiki Paradox:
– Low-barrier-to-entry
– High-barrier-to-entry
Buneman, et al. Curated Databases.
https://peerlibrary.org/p/rxQ6WBd89XviMF4Tk
How do Wikimedia projects
work?
●
Community
●
Opt-in
●
Reputation
●
Cultural Protocol
●
Bureaucracy
●
Adhocracy
●
Coordination
– WikiProjects
History of Wiki Data-
Hacking
●
Rambot
– First “bot”
– 2002
●
US Census Data
●
Created 200,000
articles
●
More than doubled
Wikipedia's size at
the time
●
No permissions
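As a rough illustration of this style of data-hacking (not Rambot's actual code, which the slides do not show), a bot can fill a text template from each row of a tabular dataset. The field names, the sample rows, and the template wording below are all made up for the example.

```python
from string import Template

# Hypothetical article template; the wording is illustrative only.
STUB = Template(
    "'''$name''' is a town in $county, $state. "
    "As of the $year census, the population was $population."
)

def make_stub(row):
    """Render one wikitext stub from one row of tabular data."""
    return STUB.substitute(row)

# Made-up rows standing in for a census extract.
rows = [
    {"name": "Exampleville", "county": "Sample County", "state": "Ohio",
     "year": 2000, "population": 1234},
    {"name": "Testburg", "county": "Demo County", "state": "Iowa",
     "year": 2000, "population": 567},
]

stubs = {row["name"]: make_stub(row) for row in rows}
print(stubs["Exampleville"])
```

A real bot would also need community approval before saving anything, which is the point of the slides that follow.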
“Ignore All Rules” -->
Calcification
●
To get things done:
– Need to issue a
“Request for
Comment” or RFC
●
Everybody,
regardless of
expertise, has
trouble with this.
– Not everyone always
acts in good faith,
but try to assume it;
it will help.
Häskell und Grepl
●
Hänsel und Gretel
●
Wikidata
https://commons.wikimedia.org/wiki/File:Hansel_and_Gretel.jpg
●
Rural Hunger
Problems
●
The Wikimedia
datascape has
cross-language data
sharing problems.
●
Häskell and Grepl
are sent to the
Forest alone.
●
Data is sent to live in
Templates, living
alone.
●
Häskell and Grepl
first invent a
successful
breadcrumb system
●
Wikimedia Commons
allows images to be
shared across wikis.
●
Lucky if their
breadcrumbs are not
eaten, wherever they
were put.
●
Lucky if we knew on
which pages the
Data was stored.
http://commons.wikimedia.org/wiki/File:Flickr_-_Per_Ola_Wiberg_~_mostly_away_-_-_%22No_bread,_just_a_camera...huh,_quack_%22.jpg
●
A magical
gingerbread house
●
A magical data store
– Called “Wikidata”
– Interwiki data
sharing
– Plus extra
sweeties
http://commons.wikimedia.org/wiki/Gingerbread_house#mediaviewer/File:Pepparkakshus.JPG
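To make "interwiki data sharing" concrete: each Wikidata item carries sitelinks, one per wiki that has a page on the subject. The sketch below reads them from a hand-written, heavily trimmed sample shaped like the entity JSON that Wikidata serves. The item ID "Q0" and the page titles are placeholders, not real data, and the function name is invented for this example.

```python
import json

# Hand-written sample in the shape of Wikidata entity JSON (trimmed).
# "Q0" and the page titles are placeholders, not real data.
SAMPLE = json.loads("""
{
  "entities": {
    "Q0": {
      "id": "Q0",
      "labels": {"en": {"language": "en", "value": "Hansel and Gretel"}},
      "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Hansel and Gretel"},
        "frwiki": {"site": "frwiki", "title": "Hansel et Gretel"}
      }
    }
  }
}
""")

def sitelink_titles(data, item_id):
    """Map each wiki's site code to the title of its page for one item."""
    sitelinks = data["entities"][item_id].get("sitelinks", {})
    return {site: link["title"] for site, link in sitelinks.items()}

titles = sitelink_titles(SAMPLE, "Q0")
print(titles)
```

Because every language edition points at the same item, a fact stored once on that item is reachable from all of them.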
●
And the house
includes many
different sweeties.
●
Semantic Triples
●
Qualifiers
●
Ranks
http://commons.wikimedia.org/wiki/Category:Liquorice_candy#mediaviewer/File:Flickr_-_cyclonebill_-_Slik_%281%29.jpg
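A minimal sketch of how these three ideas fit together, using plain Python data classes rather than any Wikidata library. The class and function names are invented here, and the town and its population figures are made up. The selection rule (use preferred-rank statements if any exist, otherwise normal-rank ones, and never deprecated ones) follows the usual description of how Wikidata ranks are meant to be read.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    subject: str            # item the claim is about
    prop: str               # property
    value: object           # object of the triple
    qualifiers: dict = field(default_factory=dict)  # context for the claim
    rank: str = "normal"    # "preferred", "normal", or "deprecated"

def best_statements(statements, subject, prop):
    """Return preferred statements if any exist, else normal ones."""
    matching = [s for s in statements
                if s.subject == subject and s.prop == prop]
    for rank in ("preferred", "normal"):
        chosen = [s for s in matching if s.rank == rank]
        if chosen:
            return chosen
    return []  # only deprecated statements, or none at all

# Made-up figures for a made-up town.
claims = [
    Statement("Exampleville", "population", 1200, {"point in time": 2000}),
    Statement("Exampleville", "population", 1350, {"point in time": 2010},
              rank="preferred"),
    Statement("Exampleville", "population", 9999, {"point in time": 2010},
              rank="deprecated"),
]

best = best_statements(claims, "Exampleville", "population")
print([(s.value, s.qualifiers) for s in best])
```

The triple is (subject, property, value); qualifiers say when or how the claim holds; the rank says which of several competing claims a consumer should use by default.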
●
Häskell and Grepl
eat the roof
hungrily.
●
In this story, the
users start adding
to Wikidata.
– Importing Wikipedia
– Foreign Database
– Manual adds.
●
The evil witch trap
●
The evil data
witches normally
keep information
in silos.
●
Grepl's cunning
defeat of the witch
●
Identifiers
– Think of this as a foreign
database key (Brian
Jacobs)
– Max imported 400,000
biographical identifiers.
– This started an identifier
craze on Wikidata
●
Tennis Player
●
Swiss Parliament
●
Danish Companies
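The foreign-key idea can be sketched as a join: once a Wikidata item stores an external database's identifier, records from that database can be matched to items by looking up the identifier. Everything below (item IDs, identifier values, field names, the function name) is placeholder data invented for the example.

```python
# Wikidata side: item -> external identifier (placeholder values).
wikidata_ids = {
    "Q_A": "ext-001",
    "Q_B": "ext-002",
    "Q_C": None,        # item with no identifier yet
}

# External database side, keyed by its own identifier.
external_db = {
    "ext-001": {"name": "Player One", "ranking": 12},
    "ext-002": {"name": "Player Two", "ranking": 48},
    "ext-003": {"name": "Player Three", "ranking": 77},
}

def join_on_identifier(items, records):
    """Pair each item with its external record; list unmatched record keys."""
    matched = {item: records[ext_id]
               for item, ext_id in items.items()
               if ext_id in records}
    unmatched = sorted(set(records) - set(items.values()))
    return matched, unmatched

matched, unmatched = join_on_identifier(wikidata_ids, external_db)
print(matched)
print(unmatched)   # external records with no Wikidata item pointing at them
```

The unmatched list is where an identifier import would start: each entry is a record that some item could link to.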
●
Grepl's cunning
defeat of the witch
●
All Wikis can
transclude arbitrary
data (eventually).
●
And Citations can be
represented as
semantic properties.
See how sources are handled with “FRBR” format:
http://www.wikidata.org/wiki/Help:Sources#Scientific.2C_newspaper_or_magazine_article
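As a sketch of what "citations as semantic properties" can look like, a reference becomes a small set of property-value pairs attached to a statement. The property IDs used here (P248 "stated in", P854 "reference URL", P813 "retrieved") are real Wikidata properties; the helper function, the item ID, and the sample values are assumptions made for illustration.

```python
from datetime import date

# Real Wikidata property IDs commonly used in references.
STATED_IN = "P248"
REFERENCE_URL = "P854"
RETRIEVED = "P813"

def make_reference(stated_in=None, url=None, retrieved=None):
    """Build a reference as property -> value pairs, skipping empty fields."""
    pairs = {
        STATED_IN: stated_in,
        REFERENCE_URL: url,
        RETRIEVED: retrieved.isoformat() if retrieved else None,
    }
    return {prop: value for prop, value in pairs.items() if value is not None}

# Placeholder item ID and URL, not a real source.
ref = make_reference(
    stated_in="Q_JOURNAL_ARTICLE",
    url="https://example.org/article",
    retrieved=date(2014, 5, 1),
)
print(ref)
```

In this shape the cited work is itself an item ("stated in" points at it), so its own metadata, including its license, lives in one place.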
Signalling Open Access
●
WikiProject Open
Access
– On English Wikipedia
●
(Data) Problem:
Signalling “Open
Access” is hard!
●
Solution: Use clear
signals directly to
the relevant data.
– Copyright license
– Source content
– Metadata
What?
How?
●
Text → Wikisource
●
Media → Commons
●
Metadata → Wikidata
●
Signals → Wikipedia
– Including license!
●
Public domain, CC0, CC-BY,
CC-BY-SA, etc.
●
RFC → RecitationBot
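The What/How mapping above can be sketched as a small routing table plus a license check. The record fields, the function name, and the exact license strings are assumptions for this example; it is not the code in the OA-signalling or recitation-bot repositories.

```python
# Destination for each part of an article, following the slide's mapping.
DESTINATIONS = {
    "text": "Wikisource",
    "media": "Wikimedia Commons",
    "metadata": "Wikidata",
    "signals": "Wikipedia",
}

# Licenses named on the slide (which ends in "etc", so this set is not
# exhaustive); the string spellings are an assumption.
REUSABLE_LICENSES = {"public domain", "cc0", "cc-by", "cc-by-sa"}

def plan_import(record):
    """Return (part, destination) pairs for a reusable article, else []."""
    license_name = record.get("license", "").strip().lower()
    if license_name not in REUSABLE_LICENSES:
        return []  # leave it for human review rather than guessing
    return [(part, DESTINATIONS[part])
            for part in ("text", "media", "metadata", "signals")
            if part == "signals" or record.get(part)]

# Hypothetical article record with no media attached.
article = {
    "license": "CC-BY",
    "text": "full text",
    "media": [],
    "metadata": {"doi": "10.0000/example"},
}
plan = plan_import(article)
print(plan)
```

Returning an empty plan for unknown licenses keeps the automated path conservative and leaves the judgment call to a person.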
In the Wikimedia Universe
●
Where does
“Signalling Open
Access” fit in the
Wikimedia narrative?
●
(Data) Problem:
Managing citations &
references is hard!
●
Possible Solutions:
– Templates (Many)
– Categories (Many)
– Namespace (FR)
– WikiScholar (Dead)
– VisualEditor (Zotero?)
“Signalling OA”
Opportunities
●
Take a pass at citation
management
– Quality & experience
●
Integrate metadata
with Wikidata
– Where it belongs
– Sensitivity and rigor
●
Forge deep
knowledge resources
on Wikipedia
– Snapshots of sources,
deeper linking
●
Automate, but use
human judgment
– Save time and energy,
improve accuracy.
Paths for Data Hackers
●
Reputation
– Create a user account
– Contribute in good faith
to Wikimedia projects
●
Cultural protocol
– Identify the scope,
concerns, and nature of
a given project
– Learn to navigate
●
History and Context
– Form a narrative for
your hack based on
past endeavors
– Seek consent and
build consensus
●
Community
– Reach out, on IRC
and mailing lists!
Of course we're Open Source
github.com/wpoa/OA-signalling
github.com/wpoa/recitation-bot
Thanks!
@notconfusing (Max)
@wrought (Matt)

Häskell und Grepl: Data Hacking Wikimedia Projects Exampled With Open Access Signalling Project

  • 1. Data-Hacking with Wikimedia Projects: Learn by Example, Including Wikipedia, Wikidata, and beyond! @notconfusing (Max Klein) @wrought (Matt Senate)
  • 2. Who is in the room? ● Data hackers? ● Programmers? ● Artists/designers? ● Open Access folks? ● Academics? ● Wikipedians? ● Wikimedians? ● Wiki-people?
  • 4. Wiki Context ● Wikipedia: by far the largest in size and user base. ● Projects often organized by language. ● Each language-project has an independent user community.
  • 5. Wiki Context ● See Wikimedia projects as a form of: “curated database” ● Web's Least Common Denominator for data. ● Wiki Paradox: – Low-barrier-to-entry – High-barrier-to-entry Buneman, et al. Curated Databases. https://peerlibrary.org/p/rxQ6WBd89XviMF4Tk
  • 6. How do Wikimedia projects work? ● Community ● Opt-in ● Reputation ● Cultural Protocol ● Bureaucracy ● Adhocracy ● Coordination – WikiProjects
  • 7. History of Wiki Data-Hacking ● Rambot – First “bot” – 2002 ● US Census Data ● Created 200,000 articles ● More than doubled Wikipedia's size at the time ● No permissions
  • 8. “Ignore All Rules” --> Calcification ● To get things done: – Need to issue a “Request for Comment” or RFC ● Everybody, regardless of expertise, has trouble with this. – Sometimes, not everyone is acting in good faith, but try to assume it; it will help.
  • 9. Häskell und Grepl ● Hänsel und Gretel ● Wikidata https://commons.wikimedia.org/wiki/File:Hansel_and_Gretel.jpg
  • 10. ● Rural Hunger Problems ● The Wikimedia datascape having cross-language data sharing problems.
  • 11. ● Häskell and Grepl are sent to the Forest alone. ● Data is sent to live in Templates, living alone.
  • 12. ● Häskell and Grepl first invent a successful breadcrumb system ● Wikimedia commons allows images to be shared across Wikis.
  • 13. ● Lucky if their breadcrumbs are not eaten / or whatever put. ● Lucky if we knew on which pages the Data was stored. http://commons.wikimedia.org/wiki/File:Flickr_-_Per_Ola_Wiberg_~_mostly_away_-_-_%22No_bread,_just_a_camera...huh,_quack_%22.jpg
  • 14. ● A magical gingerbread house ● A magical data store – Called “Wikidata” – Interwiki data sharing – Plus with extra sweeties http://commons.wikimedia.org/wiki/Gingerbread_house#mediaviewer/File:Pepparkakshus.JPG
  • 15. ● And the house includes many different sweeties. ● Semantic Triples ● Qualifiers ● Ranks http://commons.wikimedia.org/wiki/Category:Liquorice_candy#mediaviewer/File:Flickr_-_cyclonebill_-_Slik_%281%29.jpg
  • 16. ● Häskell and Grepl eat the roof hungrily. ● In this story, the users start adding to Wikidata. – Importing Wikipedia – Foreign Database – Manual adds.
  • 17. ● The evil witch trap ● The evil data witches normally keep the information as a silo.
  • 18. ● Grepl's cunning defeat of the witch ● Identifiers – Think of this as a foreign database key (Brian Jacobs) – Max imported 400,000 biographical identifiers. – This started an Identifier craze on Wikidata ● Tennis Player ● Swiss Parliament ● Danish Companies
  • 19. ● Grepl's cunning defeat of the witch ● All Wikis can transclude arbitrary data (eventually). ● And Citations can be represented as semantic properties. See how sources are handled with “FRBR” format: http://www.wikidata.org/wiki/Help:Sources#Scientific.2C_newspaper_or_magazine_article
  • 20. Signalling Open Access ● WikiProject Open Access – On English Wikipedia ● (Data) Problem: Signalling “Open Access” is hard! ● Solution: Use clear signals directly to the relevant data. – Copyright license – Source content – Metadata
  • 21. What?
  • 22. How? ● Text → WikiSource ● Media → Commons ● Metadata → Wikidata ● Signals → Wikipedia – Including license! ● Public domain, CC0, CC-BY, CC-BY-SA, etc. ● RFC → RecitationBot
  • 23. In the Wikimedia Universe ● Where does “Signalling Open Access” fit in the Wikimedia narrative? ● (Data) Problem: Managing citations & references is hard! ● Possible Solutions: – Templates (Many) – Categories (Many) – Namespace (FR) – WikiScholar (Dead) – VisualEditor (Zotero?)
  • 24. “Signalling OA” Opportunities ● Take a pass at citation management – Quality & experience ● Integrate metadata with Wikidata – Where it belongs – Sensitivity and rigor ● Forge deep knowledge resources on Wikipedia – Snapshots of sources, deeper linking ● Automate, but use human judgment – Save time and energy, improve accuracy.
  • 25. Paths for Data Hackers ● Reputation – Create a user account – Contribute in good faith to Wikimedia projects ● Cultural protocol – Identify the scope, concerns, and nature of a given project – Learn to navigate ● History and Context – Form a narrative for your hack based on past endeavors – Seek consent and build consensus ● Community – Reach out, on IRC and mailing lists!
  • 26. Of course we're Open Source github.com/wpoa/OA-signalling github.com/wpoa/recitation-bot Thanks! @notconfusing (Max) @wrought (Matt)
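
Slides 15 and 18 name the building blocks of Wikidata (semantic triples, qualifiers, ranks) and describe identifiers as a foreign database key. A minimal Python sketch of those ideas follows. It is an illustration only, not code from the talk or from the wpoa repositories: the `Statement` class, both helper functions, and every ID in `EXAMPLE` (such as `Q_CITY` and `P_POPULATION`) are hypothetical placeholders, and no real Wikidata API is called.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Simplified, hypothetical model of a Wikidata-style statement."""
    subject: str                                     # item the claim is about
    prop: str                                        # property (the predicate)
    value: str                                       # object of the triple
    qualifiers: dict = field(default_factory=dict)   # extra context for the claim
    rank: str = "normal"                             # "preferred", "normal" or "deprecated"


def best_statements(statements, subject, prop):
    """Return preferred-rank statements if any exist, otherwise normal-rank ones.

    Deprecated statements are never returned.
    """
    matching = [s for s in statements if s.subject == subject and s.prop == prop]
    preferred = [s for s in matching if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s for s in matching if s.rank == "normal"]


def index_by_identifier(statements, id_prop):
    """Map an external identifier value to its item, like a foreign-key lookup."""
    return {
        s.value: s.subject
        for s in statements
        if s.prop == id_prop and s.rank != "deprecated"
    }


# Placeholder data: all IDs and values below are invented for illustration.
EXAMPLE = [
    Statement("Q_CITY", "P_POPULATION", "3400000", {"P_POINT_IN_TIME": "2010"}),
    Statement("Q_CITY", "P_POPULATION", "3600000", {"P_POINT_IN_TIME": "2020"},
              rank="preferred"),
    Statement("Q_CITY", "P_POPULATION", "1", rank="deprecated"),
    Statement("Q_PERSON", "P_EXTERNAL_ID", "12345"),
]

if __name__ == "__main__":
    for s in best_statements(EXAMPLE, "Q_CITY", "P_POPULATION"):
        print(s.subject, s.prop, s.value, s.qualifiers)
    print(index_by_identifier(EXAMPLE, "P_EXTERNAL_ID"))
```

The qualifier dictionary carries the context that makes two population claims compatible, the rank decides which claim a reader gets by default, and the identifier index shows why importing external IDs lets other databases join against Wikidata items.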