Data linking with kblog Phillip Lord Newcastle University
The Long Tail http://en.wikipedia.org/wiki/File:La_Palmyre_041-crop.jpg
Example Data
ID_REF	VALUE
1007_s_at	2.867330709
1053_at	10.50302152
117_at	2.702517066
121_at	3.052316166
1255_g_at	2.278998026
1294_at	5.360226024
1316_at	5.496447322
1320_at	4.475412175
1405_i_at	2.301359647
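The example data is a two-column, tab-separated table of probe IDs and expression values. A minimal sketch of reading such a table in Python follows; the function name `read_expression` and the three-row sample are illustrative assumptions, not part of the talk.

```python
import csv
import io

# Three rows copied from the slide; the columns are tab-separated.
SAMPLE = """ID_REF\tVALUE
1007_s_at\t2.867330709
1053_at\t10.50302152
117_at\t2.702517066
"""

def read_expression(text):
    """Return a dict mapping probe ID to expression value (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # strip() tolerates the stray padding spaces seen in some rows of the slide.
    return {row["ID_REF"]: float(row["VALUE"].strip()) for row in reader}

values = read_expression(SAMPLE)
highest = max(values, key=values.get)  # probe with the largest value
```

On its own, this table says nothing about what was measured or how, which is the point the talk goes on to make about raw data needing its surrounding context.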
The paper
The problem? http://en.wikipedia.org/wiki/File:Clock_in_Kings_Cross.jpg http://en.wikipedia.org/wiki/File:New_British_Coinage_2008.jpg
Coach Building
  - Elsevier: 250,000 articles per year; 240 million downloads; cost: 1.5 billion euro
  - Wikipedia: 17 million articles; > 20 languages; 365 million readers; total cost: 10 million dollars
  http://commons.wikimedia.org/wiki/File:Hackney-coach,_about_1680.png
The process
Our Solution
WordPress
  - Has one critical feature
  - It has an edit dialog
  - Word, LaTeX, OpenOffice, AsciiDoc, Textile, Markdown
  - By email
Features
  - Reviewing
  - Metadata: COinS, meta tags *
  - Crawlability *
  - Multiple authors
  - Archiving (UKWA)
  - Searchability
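The metadata feature refers to COinS and meta tags embedded in each article page. As a rough sketch of what such markup looks like, the Python below builds both for one article. The function names and the sample title, author and date are hypothetical; the `citation_*` tag names follow the convention used by academic crawlers such as Google Scholar, and the COinS span follows the OpenURL key/value layout. This is not the kblog plugin code.

```python
from html import escape
from urllib.parse import urlencode

def scholar_meta_tags(title, authors, date):
    """Build <meta> tags for academic crawlers (date given as YYYY-MM-DD)."""
    tags = [("citation_title", title)]
    tags += [("citation_author", author) for author in authors]
    # Crawlers conventionally expect slashes in the publication date.
    tags.append(("citation_publication_date", date.replace("-", "/")))
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )

def coins_span(title, authors, date):
    """Build a COinS span: an OpenURL context object in the title attribute."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.atitle", title),
        ("rft.date", date),
    ] + [("rft.au", author) for author in authors]
    # URL-encode the key/value pairs, then escape '&' for use inside HTML.
    return f'<span class="Z3988" title="{escape(urlencode(fields), quote=True)}"></span>'

# Hypothetical article details, for illustration only.
meta = scholar_meta_tags("Data & papers", ["A. Author", "B. Author"], "2011-01-15")
span = coins_span("Data & papers", ["A. Author"], "2011-01-15")
```

Both forms carry the same bibliographic facts; emitting them from one place keeps the page consistent for reference managers and for crawlers.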
Features
  - Bi-directional links
  - Permalinks (PURLs to follow)
  - DOIs (DataCite!)
  - Versioning
  - Extensibility
  - Nice maths * (and MathJax)
  - Syntax highlighting
  - Bibliographic support (with DOIs, and incompletely CiTO) *
  - ePUB and PDF (!?) export
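Typed linking with CiTO means stating why one work cites another, not only that it does. One possible encoding, sketched below in Python, puts a CiTO property URI in the `rel` attribute of an ordinary link to a DOI. This is an assumption about how such a link could be written, not a description of how kblog renders citations; the helper name is hypothetical and the DOI is a placeholder.

```python
from html import escape

CITO = "http://purl.org/spar/cito/"  # namespace of the Citation Typing Ontology

def typed_citation(doi, label, relation="cites"):
    """Return an HTML link whose rel attribute names a CiTO relation."""
    href = f"https://doi.org/{doi}"
    return (
        f'<a rel="{CITO}{relation}" '
        f'href="{escape(href, quote=True)}">{escape(label)}</a>'
    )

# Placeholder DOI and label, for illustration only.
link = typed_citation("10.5555/12345678", "Example dataset", "usesDataFrom")
```

Here `usesDataFrom` marks the target as a data source for the citing article, which is the kind of relation data linking needs.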
Data Linking
  - Bi-directional links require support at both ends
  - Adding this generically
  - Adding this for specific data sets (microarray)
  - Data linking into papers
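Bi-directional links need support at both ends because the linking page has to tell the linked page that the link exists. In blogs this is commonly done with pingbacks, an XML-RPC call named `pingback.ping` that takes the source and target URLs. The sketch below assumes that mechanism and only builds the request body, without sending it; the URLs and the helper name are hypothetical.

```python
import xmlrpc.client

def pingback_request(source_url, target_url):
    """Serialise a pingback.ping call meaning 'source_url links to target_url'."""
    return xmlrpc.client.dumps((source_url, target_url), methodname="pingback.ping")

# Hypothetical paper and data set pages.
SOURCE = "http://example.org/paper"
TARGET = "http://example.org/dataset"
body = pingback_request(SOURCE, TARGET)
```

A data repository that accepted such a request could record the paper as a back-link on the data set page, which is the "other end" the slide refers to.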
Old technology
  - Most of this technology pre-exists
  - So why don't people use it?
  - There is a good reason...
  - TECHNOLOGY IS BORING
Content
  http://ontogenesis.knowledgeblog.org
  - Now has 15k page views (not hits!)
  - 25 articles, multiple authors
  - Seeking PubMed inclusion
  - Advertising: two blog articles about ontogenesis appeared within 1 day of the first article
  http://taverna.knowledgeblog.org
  - 10 articles
  - About scientific workflows
  - Supplement to myExperiment
Well...
  - These stats are not going to scare either Elsevier or Wikipedia
  - But they are not bad either
  - And it allows primary scientific content of many different forms
  - We believe it can form part of the scientific landscape
Acknowledgements
  - Phillip Lord (me!)
  - Dan Swan
  - Simon Cockell
  - Robert Stevens (Manchester)
  - Georgina Moulton (Manchester)
  Thanks also to JISC, David Shotton, BL, DataCite, and WordPress.


Editor's notes

  1. So, today I am going to talk about data linking with knowledge blog. Normally, talks start at the beginning. I thought to buck this trend and instead...
  2. Start at the end... The long tail was mentioned yesterday. Much research data comes from individual research labs and individual researchers, each producing relatively small amounts of data but collectively producing a lot. So, long tail or big science? My field, bioinformatics, does both.
  3. But the data from the long tail and from big science are different. Big science generally produces sequence data, which is generally all of the same type. The long tail doesn’t. For example, we start with microarray expression data. Then we have MIAME-compliant metadata, an RNA degradation plot and finally a paper, in this case a random one that I found on PLoS yesterday. Of these, we have data standards for many parts: the second part, often called “metadata” even though it isn’t, uses MIAME, which is one of the older information content standards in bioinformatics. To me, all of this is data. Without the latter three, the “raw data” is just junk.
  4. The paper is the richest form in terms of expressivity: it carries the most complex ideas and uses the largest vocabulary. It is also the least open to reuse, although in general it gives meaning to all the rest. And it is the form of scientific data storage which has changed the least.
  5. So, what is the problem? Well, first, the process of publishing is very time-consuming. Secondly, it’s very expensive. And finally, it’s a process which, to misquote Douglas Adams, is so amazingly primitive that we still think PDFs are a pretty neat idea. But in general, this form of data capture only happens for the most cherry-picked data: the positive data, the significant data, the data where the experiment worked. What about the negative data, the insignificant data? What about the standard operating procedure, the tutorial information and so on? This is not a small issue: the massive publication bias in biology hampers our understanding of the way that organisms function. In medicine, people die not through lack of knowledge, but because we cannot collate information that exists.
  6. So, why is this the case? Well, scientific publishing is basically still at the stage of coach building. Consider these stats: the second biggest STM publisher in the world looks like this, and costs 1.5 billion euros per annum. This is Elsevier. The biggest looks like this. It only costs 10 million dollars per annum. This is Wikipedia. Is this comparison fair? Are the two equivalent? No, probably not, but they are not two orders of magnitude different either.
  7. Consider, for example, this process from one of the major publishers that I have published with. I wrote my article in LaTeX. I converted it to PDF. The website converted it to another PDF (which I had to check). The publishers then (and this is true) converted it to a Word doc. From there, they turned it into XML, which was finally converted to HTML and, yes, you guessed it, another PDF. Now, not only is this a waste of time, but it’s inaccurate. Errors happen. And as for trying to get structured or data-linked publications through this process, you might as well give up.
  8. My solution: WordPress. Actually, more importantly, commodity software. And by commodity, I mean commodity, and not research. There are some excellent tools from academia that are widely used. Open Journal Systems, for example, powers 6000 journals. WordPress is behind 10% of ALL websites.
  9. Why WordPress? Well, it has an edit dialog. But it’s not very good. But you can blog from Word. I don’t think that is very good either. But it is the way that it is; it’s what people use. So WordPress fits in with people’s workflows. It supports everything. Nothing would ever convince me to add this level of support to a tool.
  10. What other features are suitable for academic publishing? Well, here, we borrowed, stole and occasionally wrote our own. Reviewing comes courtesy of EditFlow. Metadata and crawlability features we added. Multiple authors we borrowed. These allow archiving, which comes from the UK Web Archive, and also searchability (Google Scholar).
  11. Bi-directional links. As well as permalinks, it also supports legacy identifiers in the shape of DOIs, thanks to DataCite. And it’s extensible. So I added nice-looking maths (scalable, thanks to MathJax) and syntax highlighting. Bibliographic support exists. We can do typed linking with CiTO (thanks to David Shotton), although clunkily at the moment. This will be improved. We also want to add client-renderable citations: the user should choose the citation format. And finally, ePUB and even PDF export.
  12. We also want to extend bi-directional linking. Blogs do this out of the box, but support is required at both ends. And finally we want to be able to embed the data directly into the paper.
  13. So, why are people not doing this already? I’ve now spent a fair bit of time learning PHP and JavaScript. And while poking around in the innards of WordPress I have discovered something that I now reveal to you.
  14. Short articles, single-author articles, example-based articles.