DATA LIBERATION
Opening Up Data by Hook
or by Crook - Data
Scraping, Linkage and the
Value of a Good Identifier
                               Tony Hirst
                      Department of Communication
                             and Systems
                          The Open University
data NOT
information
              by Vick
[Disruptive
Innovation?]
“First” generation:
 data catalogues
Breathing life
 into data…
=importData("CSV_URL")
the spreadsheet becomes

A DATABASE
“Second” generation:
 data management
      systems
There’s lots more
data that’s locked
up in web pages…
Scraping…
“grabbing web content
in a machine readable
   format and then
 processing it for your
    own purposes”
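
As a rough illustration of that definition, the sketch below pulls the rows out of an HTML table and turns them into a list of records. It uses only the Python standard library; the class name `TableScraper`, the function `scrape_table` and the sample page fragment are all invented for this example and are not taken from the talk.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped into rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return one dict per data row, keyed by the cells of the first row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment, invented for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>Council</th><th>Spend</th></tr>
  <tr><td>Anytown</td><td>1200</td></tr>
  <tr><td>Otherville</td><td>850</td></tr>
</table>
"""

records = scrape_table(SAMPLE_HTML)
```

Real pages are rarely this tidy, so a scraper like this would normally need page-specific tweaks (nested tables, merged cells, footnote markers and so on).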
Original HTML web page -> Extract information -> data -> Accessible web page
Recreating the
database that was used
     to populate a
   (templated) page
…quick’n’dirty
Scrapers
Scraper -> SQLite database
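
One way to picture the scraper-to-database step is a small helper that saves each scraped record into SQLite and overwrites the row if the same record is scraped again. This is a minimal sketch using Python's built-in `sqlite3` module; the `save` function, the table name and the sample records are assumptions made for illustration, not the API of any particular scraping platform.

```python
import sqlite3


def save(conn, table, unique_keys, record):
    """Insert a record, replacing any existing row with the same unique keys."""
    columns = list(record)
    col_sql = ", ".join(f'"{c}"' for c in columns)
    key_sql = ", ".join(f'"{k}"' for k in unique_keys)
    # Create the table on first use; the UNIQUE constraint defines "same record".
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql}, UNIQUE ({key_sql}))'
    )
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f'INSERT OR REPLACE INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        [record[c] for c in columns],
    )


# Hypothetical scraped records, invented for illustration.
conn = sqlite3.connect(":memory:")
for rec in [
    {"council": "Anytown", "year": 2012, "spend": 1200},
    {"council": "Otherville", "year": 2012, "spend": 850},
    {"council": "Anytown", "year": 2012, "spend": 1250},  # re-scraped, corrected value
]:
    save(conn, "spending", ["council", "year"], rec)
conn.commit()

rows = conn.execute(
    "SELECT council, spend FROM spending ORDER BY council"
).fetchall()
```

Keying on a stable identifier means the scraper can be re-run on a schedule without piling up duplicate rows.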




Views
   SQLite database
                 Scraper
Sometimes the
 data is spread
across different
     files…
Row based
aggregation
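
Row based aggregation can be sketched as stacking the rows of several files that share the same columns, while remembering where each row came from. The example below uses the standard `csv` module; the function name `stack_rows`, the source names and the figures are invented for illustration.

```python
import csv
import io


def stack_rows(sources):
    """Concatenate CSV texts row-wise, recording which source each row came from."""
    combined = []
    for name, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["source"] = name
            combined.append(row)
    return combined


# Hypothetical extracts from two funders, invented for illustration.
SOURCES = {
    "funder_a": "institution,grant\nUniversity X,100\nUniversity Y,250\n",
    "funder_b": "institution,grant\nUniversity X,75\n",
}

all_rows = stack_rows(SOURCES)

# With everything in one list, a per-institution total is a simple filter and sum.
x_total = sum(int(r["grant"]) for r in all_rows if r["institution"] == "University X")
```

This only works cleanly when the files really do use the same column names, which is one reason a normalisation step tends to follow.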
Sometimes the
 data is spread
across different
  websites…
…   Normalisation…
Data
Enrichment
Column
Additions/Annotations
Sometimes the
  data is split
across different
     files…
Column
based merge
-> Data
cleansing
Clustering…
http://mashe.hawksey.info/2012/11/mining-and-openrefineing-jiscmail-a-look-at-oer-discuss/

/via Martin Hawksey/@mhawksey
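
The clustering idea can be sketched with a simple "fingerprint" key: strip punctuation and case, sort the words, and treat values with the same fingerprint as probable variants of one another. This is written in the spirit of key-collision clustering as found in tools such as OpenRefine, but it is a simplified assumption rather than a copy of any tool's algorithm; the function names and sample names are invented.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Normalise a string so that trivially different spellings collide."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with several variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


# Hypothetical name variants, invented for illustration.
names = ["Smith, Jane", "Jane Smith", "jane smith ", "John Doe"]
clusters = cluster(names)
```

A person would still want to review each cluster before merging, since different things can share a fingerprint.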
“Finessing” a
  common
  identifier
Common identifiers
 (common KEYS) make
it MUCH easier to JOIN
   datasets by column
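
To make the JOIN concrete, here is a minimal sketch that merges two lists of records on a shared key column and reports the rows that found no partner. The function `join_on`, the column names and the placeholder key values are assumptions for illustration; the sketch also assumes each key appears at most once in the right-hand dataset.

```python
def join_on(key, left, right):
    """Inner-join two lists of dicts on a shared key column.

    Returns (joined_rows, unmatched_left_rows).
    """
    index = {row[key]: row for row in right}
    joined, unmatched = [], []
    for row in left:
        match = index.get(row[key])
        if match is None:
            unmatched.append(row)
        else:
            joined.append({**row, **match})
    return joined, unmatched


# Hypothetical datasets sharing an "isbn" column; the key values are placeholders.
titles = [
    {"isbn": "A1", "title": "First Book"},
    {"isbn": "B2", "title": "Second Book"},
    {"isbn": "C3", "title": "Third Book"},
]
loans = [
    {"isbn": "A1", "loans": 12},
    {"isbn": "B2", "loans": 3},
]

joined, unmatched = join_on("isbn", titles, loans)
```

The unmatched list is often the most useful output, because it shows exactly where the identifiers fail to line up.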
Book Title
-> ISBN
I am “psychemedia”
            on
Twitter, delicious, slideshare,
  flickr, etc etc
Reconciliation…
Linked
Data™
So who speaks SPARQL?




     Diners - Journal Canteen
     by avlxyz
You DON’T have to….
Just think about how one piece of
 data might be related to another
   through a common means of
        addressing them…
http://ouseful.info

 @psychemedia

Data Liberation - Tony Hirst

Editor's notes

  1. Tony Hirst
     Twitter: @psychemedia
     Blog: http://blog.ouseful.info
     Presentation prepared for: Online Info 12/11/2012

     DATA LIBERATION: OPENING UP DATA BY HOOK OR BY CROOK - DATA SCRAPING, LINKAGE AND THE VALUE OF A GOOD IDENTIFIER

     The 1/9/90 rule is often used to characterise the way in which a small number of creators generate content that a larger number (but still small percentage in the greater scheme of things) comment on or amplify, whilst the majority just passively consume. In this presentation, I will explore the extent to which a similar view applies to the world of "data liberation". After reviewing the idea of data scraping, and some of the techniques surrounding it, I will describe how online tools such as Scraperwiki provide a platform for concentrating data scraping activity and expertise, as well as supporting the publication of data /as data/ in a variety of formats, in addition to 'end user' views in the form of graphical charts and interactive visualisations.

     One of the major motivations for data scraping is the aggregation of data from a variety of data sources into a larger, integrated whole. For example, the aggregation of research council funding data from separate research councils allows us to view a large proportion of the publicly funded research grants received by a single institution; or the collection of local council spending data across all UK councils allows us to see how councils spend money with each other across a range of transaction areas. But how do we actually create such aggregations when the data is sourced from different areas? In order to do this, we need to know when different datasets are actually talking about the same thing, which is where common identifiers come in. For it is surely the case that when we have common identifiers, we can have linkage, and as a result start to realise some of the benefits of Linked Data (as well as developing a wider appreciation of what those benefits might actually be...)

     (As an aside, I'll describe how we might go about deriving such identifiers when they are missing from a data set that might otherwise, or more conveniently, be expected to publish them.)

     Throughout the presentation, I will draw on practical examples of how aggregated "liberated" data has been used as the basis of wider interest, and even status quo disrupting, services, as well as reflecting on what other sources of data we might see the data liberators turning their attention to next...

     Key learning points:
     1 - What is "data scraping", how can I do it and is my website at risk of it?
     2 - Why the secret to understanding "Linked Data" is the very idea of it, not just (or not even) the technology.
     3 - How has data scraping been used to "open up" data in actual practice?
  2. The focus of this presentation is not the release of “information”, but the release of data in raw form so that it can be interpreted and presented in informative ways by other parties.
  3. The London Datastore is an early example of a council-centric open data website. Early signs suggest it is natural to locate data websites at addresses of the form data.COUNCILNAME.gov.uk or www.COUNCILNAME.gov.uk/data
  4. Google Spreadsheets provides another example of how CSV can be used to help data flow. The =importData formula allows a user to specify a source data URL and pull the CSV data found at that location into the spreadsheet. Unlike Many Eyes Wikified, if the source data at the URL is updated, the update will (eventually) be pulled into the spreadsheet automatically.
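To make the idea concrete outside the spreadsheet, here is a minimal Python sketch of what a formula like =importData does in spirit: fetch the CSV found at a URL and turn it into rows. The function names `parse_csv_text` and `import_data` and the sample data are made up for illustration; this is not how Google implements the formula.

```python
import csv
import io
import urllib.request


def parse_csv_text(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text)))


def import_data(csv_url):
    """Fetch the CSV published at csv_url and return it as rows.

    Calling this again later re-fetches the URL, so changes to the source
    are picked up, loosely mirroring the spreadsheet's automatic refresh.
    """
    with urllib.request.urlopen(csv_url) as response:
        text = response.read().decode("utf-8")
    return parse_csv_text(text)


# Offline demonstration with made-up sample data (not from a real source)
sample = "council,spend\nAshire,100\nBshire,250\n"
rows = parse_csv_text(sample)
print(rows[0])   # header row
print(rows[1:])  # data rows
```

With a real (public) CSV address, `import_data("CSV_URL")` would return the same kind of row list; the decoding as UTF-8 is an assumption about the source file.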
  5. One of the really good reasons for getting data into a data processing environment such as a spreadsheet is that you can start to work with it. In the case of Google Spreadsheets, the spreadsheet environment can also be used as a database environment. That is, we can treat one or more data-containing sheets in a spreadsheet as a database, and generate new views over the data, as well as running queries over that data.
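The "sheet as database table" idea can be sketched with Python's built-in sqlite3 module standing in for the spreadsheet. The sample data, the table name `sheet` and the helper `load_sheet` are all invented for this illustration.

```python
import csv
import io
import sqlite3

# Made-up sample sheet, used purely for illustration
SHEET_CSV = """council,category,amount
Ashire,Transport,120
Ashire,Housing,300
Bshire,Transport,80
"""


def load_sheet(conn, csv_text, table="sheet"):
    """Load one CSV 'sheet' into a table, using the first row as column names."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join('"{}"'.format(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), data
    )


conn = sqlite3.connect(":memory:")
load_sheet(conn, SHEET_CSV)

# A new "view" over the data: total spend per council
totals = conn.execute(
    "SELECT council, SUM(CAST(amount AS INTEGER)) FROM sheet "
    "GROUP BY council ORDER BY council"
).fetchall()
print(totals)
```

The values are loaded as text, which is why the query casts `amount` to an integer before summing.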
  6. Another way of using a Google Spreadsheet as a database is via the Google Spreadsheets API. The Google Visualisation API (?) provides a way of passing queries written using the Google ???viz query language from an arbitrary web page or web application, and receiving the resulting data in a standard JSON based format, which also happens to play nicely with the Google Visualisation API??? The Guardian Datastore explorer is a crude demonstration from 2009(??) showing how data from the Guardian datastore, data that is stored across a range of Google spreadsheets, can be explored, queried and visualised via these APIs. Users can select a dataset from a drop down menu, fed from a delicious account to which various datastore spreadsheets have been bookmarked using a particular set of tags, or by pasting in the URL of an arbitrary (public) Google spreadsheet. The first row/headings of the data can then be previewed (a simple spreadsheet is assumed, in which column headings appear in the first row of the spreadsheet).
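As a rough sketch of passing a query to a public spreadsheet from outside, the Python below builds a query URL. The `/gviz/tq` endpoint pattern and the `tq`/`tqx` parameter names are assumptions based on Google's query interface as it is commonly documented today; the address layout has changed over the years and should be checked before use. `HYPOTHETICAL_KEY` is a placeholder, not a real spreadsheet.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_query_url(spreadsheet_key, query, out="csv"):
    """Build a query URL for a public Google spreadsheet.

    ASSUMPTION: the /gviz/tq endpoint and the tq/tqx parameter names match
    Google's current query interface; verify before relying on them.
    """
    base = "https://docs.google.com/spreadsheets/d/{}/gviz/tq".format(
        spreadsheet_key
    )
    params = {"tq": query, "tqx": "out:{}".format(out)}
    return base + "?" + urlencode(params)


query = "select A, B where C > 100 order by C desc"
url = build_query_url("HYPOTHETICAL_KEY", query)
print(url)
```

The query text refers to columns by letter (A, B, C), which is why an explorer tool needs to show users which heading sits in which column.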
  7. A series of list boxes are then populated with the column labels and their names, and provide a certain amount of help for the creation of a query over the spreadsheet data. A range of output formats can also be selected, from simple HTML data tables, to a range of charts. URLs are also generated for HTML and CSV representations of the data returned from the query.
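The way list boxes can help assemble a query might be sketched as follows. This is an illustrative Python reconstruction, not the explorer's actual code; the function `build_query` and the column mapping are hypothetical.

```python
def build_query(select_cols, where=None, order_by=None, descending=False):
    """Assemble a query string from the choices a user makes in the list boxes.

    select_cols: column letters to return, e.g. ["A", "C"]
    where:       optional condition string, e.g. "C > 100"
    order_by:    optional column letter to sort on
    """
    if not select_cols:
        raise ValueError("at least one column must be selected")
    parts = ["select " + ", ".join(select_cols)]
    if where:
        parts.append("where " + where)
    if order_by:
        parts.append("order by " + order_by + (" desc" if descending else ""))
    return " ".join(parts)


# Hypothetical mapping from column letters to the headings found in row 1
columns = {"A": "Council", "B": "Category", "C": "Amount"}

# Look up letters by heading, as the list boxes would do for the user
letter_for = {label: letter for letter, label in columns.items()}
query = build_query(
    [letter_for["Council"], letter_for["Amount"]],
    where=letter_for["Amount"] + " > 100",
    order_by=letter_for["Amount"],
    descending=True,
)
print(query)
```

Keeping the heading-to-letter lookup separate from the query assembly means the user only ever sees the human-readable headings.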
  8. One of the nice things about the data table widget (a standard Google Visualisation API component in this case, though similar examples exist for YUI, the Yahoo User Interface Libraries, or frameworks such as JQuery), is that it supports things like row sorting by column (for free, with no programming required!), allowing even further manipulation of the data, albeit at a simplistic level. (It’s probably worth pointing out here that it may be worth providing a preview of the column headings and first few rows (or a sample of random rows) of data when datasets are published, just so that users can see what sort of data is on offer without having to download the whole data set?)
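Both ideas in this note, a preview of a dataset and sorting rows by column, are simple to sketch in Python. The function names and the sample data below are invented for illustration.

```python
import csv
import io
import random
from itertools import islice

# Made-up sample data, used purely for illustration
SAMPLE = "council,amount\nAshire,120\nBshire,80\nCshire,300\nDshire,45\n"


def preview(csv_text, n=3):
    """Return (headings, first n data rows), reading no further than needed."""
    reader = csv.reader(io.StringIO(csv_text))
    headings = next(reader)
    return headings, list(islice(reader, n))


def sample_rows(csv_text, n=2, seed=0):
    """Return the headings and n randomly chosen data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rng = random.Random(seed)
    return rows[0], rng.sample(rows[1:], min(n, len(rows) - 1))


def sort_by_column(headings, rows, column, numeric=False, reverse=False):
    """Sort rows on the named column, as a sortable table widget would."""
    index = headings.index(column)
    key = (lambda r: float(r[index])) if numeric else (lambda r: r[index])
    return sorted(rows, key=key, reverse=reverse)


headings, first_rows = preview(SAMPLE, n=2)
_, all_rows = preview(SAMPLE, n=10)
_, random_rows = sample_rows(SAMPLE, n=2)
biggest_first = sort_by_column(
    headings, all_rows, "amount", numeric=True, reverse=True
)
print(headings, first_rows)
print(biggest_first)
```

Note the `numeric` flag: CSV values arrive as strings, so sorting amounts as text would put "45" after "300".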
  9. If you’re in the business of selling information as data, you are under threat where that information is published in an openly licensed way.
  10. Linked Data (the TM is something of a joke, and refers to the particular style of publishing data according to a set of principles first outlined by the inventor of the World Wide Web, Sir Tim Berners-Lee) is one of the data formats that the Government’s data task force favour for the publication of data.
  11. There is a problem though: at the moment, there are barriers to entry to the Linked Data world from both the query side (not many people speak SPARQL, or know how to construct a SPARQL query to an endpoint) and the results side (data is returned as RDF).
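To give a flavour of both barriers, here is a small Python sketch of a SPARQL query and of flattening the results into plain rows. It assumes an endpoint that accepts the query as a `query` URL parameter and can return the standard SPARQL JSON results format; the endpoint address and the sample response are made up, and no real service is contacted.

```python
import json
from urllib.parse import urlencode

# An illustrative query: find up to 5 things and their labels
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing ?label WHERE {
  ?thing rdfs:label ?label .
} LIMIT 5
"""

# Made-up response in the shape of the SPARQL JSON results format
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["thing", "label"]},
    "results": {"bindings": [
        {"thing": {"type": "uri", "value": "http://example.org/id/1"},
         "label": {"type": "literal", "value": "First thing"}},
    ]},
})


def build_request_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL for an endpoint."""
    return endpoint + "?" + urlencode({"query": query})


def bindings_to_rows(results_json):
    """Flatten SPARQL JSON results into plain dicts of variable -> value."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


# HYPOTHETICAL endpoint address, for illustration only
url = build_request_url("http://example.org/sparql", QUERY)
rows = bindings_to_rows(SAMPLE_RESPONSE)
print(rows)
```

Once flattened like this, the results look like ordinary table rows, which lowers the barrier on the results side even if the query side still needs some SPARQL.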
  12. So – do you speak SPARQL?