BL Labs Presentation at Liverpool John Moores University

  1. 1. 1 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR British Library Labs What is British Library Labs and what have we learned over the last four years? Mahendra Mahey 1315 – 1400, 22 March 2017 Learning the Lessons of working with the British Library’s Digital Content and Data for your research (History UK with Liverpool John Moores University) https://goo.gl/Mj9DWR
  2. 2. 2 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR The British Library Inside the British Library Space for 1200 readers, around 400,000 visitors per year Uses low oxygen and robots Reading room and delivery to London Document Supply and Storage at Boston Spa Stockton-on-Tees Author right to payment each time their books are borrowed from public libraries. St Pancras, London, UK Many books are stored 4 stories below the building Legal Deposit Library – Reference only
  3. 3. 3 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Living Knowledge Vision (2015 – 2023) Custodianship Research Business Culture Learning International To make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment and be the most open, creative and innovative institution of its kind by 2023. Roly Keating (Chief Executive Officer of the British Library) Document: http://goo.gl/h41wW7 Speech: https://goo.gl/Py9uHK
  4. 4. 4 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Collections – not just books! > 180*million items > 0.8* m serial titles > 8* m stamps > 14* m books > 3* m sound recordings > 4* m maps > 1.6* m musical scores > 0.3* m manuscripts > 60* m patents King’s Library *Estimates
  5. 5. 5 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR http://www.bl.uk/projects/british-library-labs Funded by the Andrew W. Mellon Foundation
  6. 6. 6 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR http://www.bl.uk/projects/british-library-labs Funded by the Andrew W. Mellon Foundation
  7. 7. 7 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Wider…not just Researchers Researchers https://goo.gl/WutNyi Artists http://goo.gl/nNKhQ2 Librarians Curators https://goo.gl/9NWZUW Software Developers https://goo.gl/7QQ5Tf Archivists https://goo.gl/x7b4tg Educators https://goo.gl/qh01Mi
  8. 8. 8 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Curators / Researchers Access & Reuse Group © Developers/ Technical Staff Project Board Universities & wider The World Researchers BL Labs British Library Digital Scholarship Digital Content United Kingdom Advisory Board Digital Research Stakeholders involved in Labs
  9. 9. 9 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Digital research methods Visualisations Using Application Programming Interfaces for datasets e.g. Metadata, Images Annotation Location based searching & Geo-tagging Crowdsourcing Human Computation
  10. 10. 10 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR How are we doing this?
  11. 11. 11 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Competition Awards Projects Tell us your ideas of what to do with our digital content Show us what you have already done with our digital content in research, artistic, commercial and learning and teaching categories Talk to us about working on collaborative projects
  12. 12. 12 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Why are we doing this? • Working closely with and listening to those who want to use our digital collections and data for their work and helping to build services, tools and processes to support them • We can learn how we are and should be supporting them. – Is the access to digital collections we provide sufficient? – Do we have the right tools? – Do we provide the right support? – Where are the gaps between what they want and what we can give? – How do we build the bridges to overcome them? – Many more reasons…
  13. 13. 13 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Digital Data all around us! Knowledge Quarter London: 55 knowledge organisations within a 1 mile radius of Kings Cross http://www.knowledgequarter.london https://goo.gl/pGO7QY
  14. 14. 14 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR #bldigital 1-2 %* digitised * estimate Digitisation Partnerships Commercial & Other Organisations Amount increasing rapidly Bias in digitisation http://goo.gl/bR9UJL Sample Generator
  15. 15. 15 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR So little digitised…? • Common misconception that all our physical items are digitised. No! Costs time and resources! • Still a big number though! • Dialogue is either: – you are lucky and we have the digital content relevant to your research – we don’t have exactly what you’re looking for, but this is what we have; is there anything of interest? • We tend to attract researchers with ‘fuzzier’ research boundaries • Artists find this dialogue easier • Access easier for out of copyright content
  16. 16. 16 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR only in Reading Rooms due to © only on site due to © or ethical etc not online / available – various storage devices, personal data online and open British Library online behind paywall Challenges of Digital access at the Library
  17. 17. 17 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR The Story of the Collection! Collection Curator Who paid for the digitisation? Who did the digitisation? Technology used Born digital? Published Unpublished Where is it? Can it still be accessed? Generates income Reputational Risk Legalities Political Ego Surprises Metadata Old format not supported What media was the digitisation done from? Documentation No Metadata Messy Metadata Still there? Sometimes it’s complicated, better to know as much as possible, if you want to open it up!
  18. 18. 18 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Finding Open Digital Collections at the British Library • Curated? Learn the story behind a collection! Is there a human who knows the ‘story’ about the collection, who wants it used, are there any surprises lurking? • Where is it, is it accessible? • Licensing? Internal Access and Reuse and Licensing Group (Risk assessment group – Strategic, Commercial, Copyright, Curatorial, Technical) • Metadata available? What state is it and does it need cleaning? https://goo.gl/Qjeqo1 https://goo.gl/Kfc4qc Access & Reuse Group ©
  19. 19. 19 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Open Licensed Digital Content? 15% Openly Licensed; 85% Available onsite. Working through. Breakdown by collection*: Manuscripts 59%; Books 9%; Maps and Views 7%; Newspapers 3%; Archives and Records 3%; Paintings, Prints and Drawings 2%. *Based on digitisation projects. Largest proportion of funding: Public / Private Partnership
  20. 20. 20 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR READING ROOM ON SITE NOT ONLINE OPEN British Library £ Digital access at the Library Labs Residency Model
  21. 21. 21 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Playbills, Books, Newspapers (includes OCR) Digital collections and Datasets At the British Library British National Bibliography http://bnb.data.bl.uk http://sounds.bl.uk http://dml.city.ac.uk/ Music (Recordings & Sheet) & Sounds http://goo.gl/frSMJt Broadcast News (TV and Radio) http://goo.gl/cwThHw http://goo.gl/pBkisZ http://goo.gl/E8aRyQ Usage data Images, Manuscripts & Maps http://www.qdl.qa/ Qatar Digital Library http://idp.bl.uk/ International Dunhuang Project Maps http://www.bl.uk/maps/ Hebrew Manuscripts http://goo.gl/4sbCp9 Flickr & Wikimedia Commons https://goo.gl/LZRmaZ
  22. 22. 22 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Cultural Heritage Datasets Datasets about our collections Bibliographic datasets relating to our published and archival holdings Datasets for content mining Content suitable for use in text and data mining research Datasets for image analysis Image collections suitable for large-scale image- analysis-based research Datasets from UK Web Archive Data and API services available for accessing UK Web Archive Digital mapping Geospatial data, cartographic applications, digital aerial photography and scanned historic map materials https://data.bl.uk Discussion list: http://www.jiscmail.ac.uk/CULTURAL-HERITAGE-DATASETS
  23. 23. 23 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Taking a peek at our Open Data
  24. 24. 24 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR 002819694
  25. 25. 25 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR
  26. 26. 26 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR
  27. 27. 27 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Optically Character Recognised (OCR) generated Text Scanned Page Image on Flickr Commons https://goo.gl/AC43vs
  28. 28. 28 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR OCR XML Generated by ABBYY FineReader
  29. 29. 29 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Taking a peek at our onsite only accessible data
  30. 30. 30 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL • Need to be security cleared so that you exist as a BL entity – Hence ‘Researcher in Residence Model’ • Permission required from internal IP department and perhaps commercial company involved in the digitisation • 20 % rule in terms of re-use in research • Learning pathways so that this becomes ‘everyday’
  31. 31. 31 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 1 Results of digitisation exist on Windows file shares! Windows 7, external access possible through Citrix Server
  32. 32. 32 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL (JISC 1) 2 12 Volumes, each with terabytes of data
  33. 33. 33 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 3
  34. 34. 34 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 4
  35. 35. 35 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 5
  36. 36. 36 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 6
  37. 37. 37 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 7
  38. 38. 38 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 8
  39. 39. 39 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 9
  40. 40. 40 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 10
  41. 41. 41 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 11
  42. 42. 42 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 12
  43. 43. 43 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 13 Accessing original master image (not cropped or post processed) or Service Copy (post processed) and results of OCR available as ALTO XML
  44. 44. 44 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 14a Accessing original master image (not cropped or post processed)
  45. 45. 45 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL Accessing original master image (not cropped or post processed) 14b
  46. 46. 46 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 15a Accessing Service Copy (post processed) and results of OCR available as ALTO XML
  47. 47. 47 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL Accessing Service Copy (post processed) 15b
  48. 48. 48 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers onsite at the BL 15c Accessing OCR as ALTO XML
  49. 49. 49 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers through Gale Interface (subscription) 1
  50. 50. 50 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Accessing digitised newspapers through Gale Interface (subscription) 2
  51. 51. 51 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Virtual Infrastructure for OCR text OCR text scraped from digitised newspapers and in cloud Jupyter notebook Write code in browser Results in browser http://jupyter.org
  52. 52. 52 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR What did people actually do?
  53. 53. 53 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Machine Learning / Reading • Analogies to how humans read • Machines acquire ‘knowledge’ and use that knowledge to make sense of new situations • BL doing this on a case by case basis. • Need computational and human effort • Human input as to where to look > computational ‘lasso throwing’ > human sift • Legalities of this process being ‘ironed’ out with publishers • Not well understood area…
  54. 54. 54 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR The smell of soup! Thanks to Memo Akten (@memotv on twitter) for the inspiration! https://goo.gl/toq4Bo Nasreddin, 13th Century Turkish Sufi http://web2.uvcs.uvic.ca/elc/studyzone/330/reading/smell1.htm
  55. 55. 55 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Messy Data! Optical Character Recognition
  56. 56. 56 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Finding things in messy data Mrs Folly • Clean up some manually • Get ‘ground truth’ • Write code to find things reliably in it automatically • Try code on messy content • Tweak if necessary • Digital lasso around content • Manually sift through
  57. 57. 57 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Katrina Navickas (2015) Political Meetings Mapper http://politicalmeetingsmapper.co.uk https://goo.gl/Qq78Oa Labs Symposium 2015 https://goo.gl/BSA3be Interview 2015 The Chartist Newspaper http://goo.gl/vOLSnH Chartist Monster Meeting Chartists Re-enactment London
  58. 58. 58 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Black Abolitionist Performances & their Presence in Britain (2016) – Hannah-Rose Murray Frederick Douglass Ellen Craft Josiah Henson Ida B Wells A Performance by Joe Williams & Martelle Edinborough http://frederickdouglassinbritain.com/
  59. 59. 59 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Use of Overproof – OCR Correction Also just re-OCR?
  60. 60. 60 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR See Bob Nicholson – Looking for Jokes See Jennifer Batt – Looking for Poems
  61. 61. 61 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR What can 65,000 books tell us? Image: Artwork by Alicia Martin Just one open digital collection
  62. 62. 62 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Worked better for female faces than men’s Press http://mechanicalcurator.tumblr.com Posts image every 30 minutes http://www.flickr.com/photos/britishlibrary/ 1,020,418 images need tagging! Creative uses of images Face recognition Mechanical Curator http://goo.gl/qPPgxX Flickr Snipping out images from 65,000 Digitised Books* >600,000,000 views >15,500,000 tags https://goo.gl/FgZ4HM Work @ BL by Ben O’Steen, Labs and Digital Research Team *Matt Prior - http://goo.gl/j29Tnx Since Dec 2013
  63. 63. 63 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Opportunities – increasing traffic to Library services You can purchase a ‘High Res’ Copy View in the Library Item Viewer Download .pdf All illustrations in book Other illustrations in books Published in same year View the item in the Library Catalogue Tags auto generated User generated Tag Grouping for image
  64. 64. 64 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Tagging a million images Iterative Crowdsourcing http://goo.gl/j6fxac Cardiff University’s Lost Visions Project http://www.metadatagames.org/ Metadata Games James Heald Mario Klingemann Chico 45 Use computational methods Human Tagger Top British Library Flickr Commons Taggers http://goo.gl/8SkfM1 Machine Learning Search Engine & Google Image search
  65. 65. 65 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Special Jury’s Prize (2015) James Heald – Wikimedia and Map work https://goo.gl/WYZCB2 http://goo.gl/HNQq5e https://goo.gl/VPgffL https://commons.wikimedia.org/ https://goo.gl/djtm1b Labs Symposium (2015) Geotagging maps 54,000 Maps
  66. 66. 66 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Adam Crymble (2015) Crowdsource Arcade What if crowd sourcing looked like this? http://goo.gl/LBfJ4W http://goo.gl/OH9pOZ https://goo.gl/7z0j8p 30 mins talk Labs Symposium (2015) https://goo.gl/SSRsdd 5 min interview (2015) http://goo.gl/0APpE8 Game Jam
  67. 67. 67 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR SherlockNet: Competition Winner 2016 Karen Wang, Luda Zhao and Brian Do Using Convolutional Neural Networks to Automatically Tag and Caption the British Library Flickr Commons 1 million Image Collection Classify into one of 12 categories >20 million tags added (total now 20 million overall) >100,000 experimental captions Data available soon! bit.ly/sherlocknet Pooled surrounding Optical Character Recognised text on page from similar images Used Microsoft COCO (photographs) & British Museum Prints and Drawings collections as training sets. Tags Captions
  68. 68. 68 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Artistic / Creative Works http://goo.gl/dM8ieA Mario Klingemann (2015) http://www.crossroadsofcuriosity.com David Normal 2014 and 2015 https://www.youtube.com/watch?v=-GRgj7Q5OM0 Rob Walker 2014 http://goo.gl/bNxGZZ Kris Hoffman (2016) https://goo.gl/QilqqT Jiayi Chong 2016 Ling Low 2016 https://www.youtube.com/watch?v=bcOP1E5bRE0 https://www.facebook.com/RealmlandStory/ Paul Rand Pierce 2016 A Hat on the Ground Spells trouble Tragic Looking Women 44 Men who Look 44 (Notice the direction faces)
  69. 69. 69 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Mario Klingemann 2016 https://www.youtube.com/watch?v=xgnxnmqnR7Y Google Arts and Culture Lab – Experiments with Machine Learning https://artsexperiments.withgoogle.com/
  70. 70. 70 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Imaginary Cities – BL Labs Project 16- 17 Michael Takeo Magruder https://goo.gl/4ARwTy An artistic exploration seeking to create provocative fictional cityscapes for the Information Age from the British Library’s digital collection of historic urban maps
  71. 71. 71 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Some Lessons Learned and Challenges so far… • Everything starts from a conversation (external and internal)! • Need to have several conversations with several stakeholders and tap into their tacit knowledge that isn’t always written down (esp. internal). • It’s hard work at the beginning! • Expectations change when researchers actually see the data, systems and experience the ‘culture’ of the organisation. • We tend to work with researchers who can be ‘flexible’ with their research questions and are willing to embrace challenges. • Often misunderstandings because of jargon & different meaning of words. • Embrace dirty data, it may never be perfect!
  72. 72. 72 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Some Lessons Learned and Challenges so far…(2) • Many researchers have the domain knowledge but lack the technical skills to use Digital Research methods. Should they be teamed up with those that have problems that need solving (Computing) or get trained? • Identifying / bridging gaps for researchers to use data, help them ‘navigate’ through the Library to get the data they want (sometimes). • Huge appetite to use digital content & data (e.g. Flickr Commons stats). • Start small and simple, but think big! • Create and embrace serendipity, stimulate the imagination, work fast, give it energy. • Letting go of the emotional and psychological connection to “my” collection • If digitised collections are not used, what is the point of digitising them? • Fail faster (don’t be afraid), small experiments, reject perfectionism, good enough
  73. 73. 73 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR The Magic of Openness! • By opening collections up we are creating the possibility to have them used in ways only restricted by human imagination. • Need to work hard to tell people about our Digital Collections and Data especially if not easy to find, creating serendipity and opportunities for use! • Give plenty of examples to inspire use! • Support and celebrate the use!
  74. 74. 74 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Exercise – Explore or Imagine Our Data! • CSV of Metadata https://data.bl.uk/digbks/dig19cbooks-mdata-csv.csv • 19th Century Books - Book Metadata - 01/09/2013. https://data.bl.uk/digbks/db21.html • Digitised Books - Flickr Tag History - Dec 2013 to March 2016. TSV https://data.bl.uk/digbks/db15.html • Digitised Hebrew Manuscripts - Metadata https://data.bl.uk/hebrewmanuscripts/heb1.html • Digitised Hebrew Manuscripts: Or 2210 - Or 2364 https://data.bl.uk/hebrewmanuscripts/heb8.html • Theatrical playbills from Britain and Ireland (OCR text only) https://data.bl.uk/playbills/pb2.html • Portraits of actors, views of theatres and playbills (covering 1750 - 1821 in a single volume) https://data.bl.uk/singlesheet/por1.html • Volumes of Lysons Collectanea (Amusements), comprising broadsides, cuttings, advertisements on amusements.1660- 1840. https://data.bl.uk/singlesheet/ad1.html Work in pairs! https://data.bl.uk •Report back on Data! •Data Quality •Issues Or an idea you have thought of what to do with the data! http://labs.bl.uk/Ideas+for+Labs Smaller datasets
  75. 75. 75 @BL_Labs @BL_DigiSchol #bldigital https://goo.gl/Mj9DWR Contact us Mahendra Mahey Manager of BL Labs mahendra.mahey@bl.uk labs@bl.uk
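
Several slides above note that the OCR results for the digitised newspapers are available as ALTO XML. As a rough illustration of what working with such a file can look like, here is a minimal Python sketch that pulls the recognised words and their confidence values out of an ALTO document. The sample XML is hand-written for this example and is not British Library data; the namespace version and the presence of the WC (word confidence) attribute are assumptions, and real files may differ.

```python
import xml.etree.ElementTree as ET

# Hand-written ALTO-style sample (hypothetical, not taken from a BL file).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Chartist" WC="0.91"/>
            <SP/>
            <String CONTENT="meeting" WC="0.84"/>
          </TextLine>
          <TextLine>
            <String CONTENT="at" WC="0.99"/>
            <SP/>
            <String CONTENT="Kennington" WC="0.42"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code does not depend on the ALTO version."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return one list of (word, confidence) pairs per TextLine element."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = []
        for child in element:
            if local_name(child.tag) == "String":
                wc = child.get("WC")  # word confidence; may be missing
                words.append((child.get("CONTENT", ""), float(wc) if wc else None))
        lines.append(words)
    return lines


def plain_text(lines):
    """Join the words of each line with spaces and the lines with newlines."""
    return "\n".join(" ".join(word for word, _ in line) for line in lines)


lines = alto_lines(SAMPLE_ALTO)
print(plain_text(lines))

# Words the OCR engine was unsure about (threshold 0.5 is an arbitrary choice).
low_confidence = [w for line in lines for w, wc in line if wc is not None and wc < 0.5]
print(low_confidence)
```

Listing low-confidence words like this is one simple way to decide where manual clean-up might be worth the effort before running searches across messy OCR text.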

Editor's Notes

  • 25 Seconds (68 Words)
    My name is Mahendra Mahey and I work on a project called British Library Labs. We are based at the British Library in London, in the Digital Scholarship department and we work closely with the Digital Research team there. It’s been running for three years now and is funded by the Andrew W. Mellon Foundation.
  • 140 seconds
    The British Library is the national library of the UK and one of the largest research libraries in the world. The Library moved to a new purpose-built building in 1997 <click> the largest of its kind built in the UK in the 20th century. Many frequently used items are stored 5 stories below the main building at St Pancras in London, and many might not know that part of the building is meant to look like a ship on a journey to discovery!<click>. <click to switch off>
    The building can sit 1,200 researchers at any one time across 5 reading rooms.
    <click>Medium and long term requested items are held at Boston Spa in Yorkshire in a low oxygen warehouse, using robots to retrieve items. In total, the library has 625 km of shelving, growing by 12 km every year.
    Whilst we acquire items through purchase or gifts, much of the collection has been built up through legal deposit. That is, by law, a copy of every UK and Ireland print publication must be given to the British Library by its publishers. Around 3 million items are added per year. In 2013, legal deposit was extended to cover non-print material, so by law we now take in digitally published items as well. This means regular mass crawls of the entire UK web domain as well as ebooks, ejournals etc.
  • 85 seconds
    The picture you can see is inside the main building in London, it’s the King’s Library – King George the Third’s personal library! Sometimes known as the ‘stack’, I walk past this every day and I sometimes forget that the collections the British Library has are truly staggering! We currently estimate them to exceed <click>150 million items, representing every age of written civilisation and every known language. Our archives now contain the earliest surviving printed book in the world, the Diamond Sutra, written in Chinese and dating from 868 AD….
    So some big numbers…
    Over …<click>14 million books
    <click>60 million patents
    <click>8 million stamps
    <click>4 million maps
    <click>3 million sound recordings
    <click>1.6 million music scores
    <click>over 0.3 million manuscripts
    <click>0.8 million serials titles (which are of course made up of many many volumes/editions), this is where a lot of our content is, just in case you thought the numbers didn’t add up!
  • 33 Seconds (100 Words)
    In a nutshell the project encourages researchers, artists, entrepreneurs, educators and anyone else,
    <Click>
    to ‘experiment’ with our digital collections and data. We are particularly interested in those who have questions which focus on the potential to find and create NEW things through access to the digital content. For example, asking a question across thousands of digitised books or newspapers using computational techniques would not be feasible using manual methods. Let’s look at a clear example.
    <Click>
  • https://goo.gl/WutNyi
    http://goo.gl/nNKhQ2
    https://goo.gl/9NWZUW
    https://goo.gl/7QQ5Tf
    https://goo.gl/x7b4tg
    https://upload.wikimedia.org/wikipedia/commons/a/a2/Interactive_whiteboard_at_CeBIT_2007.jpg
  • Get clearer annotation image and transcription (perhaps TILT)
  • 6 Seconds (20 Words)
    So <Click> ‘how’ do we try and engage those who might be interested in the BL’s digital collections and data? <Click>
  • 17 Seconds (53 Words)
    <Click>The British Library is one of the largest libraries in the world <Click> with an estimated 180 million physical items, with only a small proportion being digitised. <Click>We estimate this is around 1-2%, but no one really knows exactly how much. However, more and more items are being stored as ‘born’ digital, such as the UK Web Archive<Click>
  • <click>The British Library faces many challenges of access to our Digital collections!
    <click> Sometimes digital content is only available onsite due to license restrictions,
    <click>or even only on a specific computer in a reading room! Technically there are very few reasons why digital content can’t be online
    <click> though it might be too big or hasn’t been transferred from other digital storage media.
    <click>Sometimes access is through a paywall. Finally,
    <click>some content is in the happy sunny place, online, open and freely available.
    The real reasons why there are challenges to accessing digital content are of course human. They require different approaches from the Library and may often involve an honest, open dialogue and negotiation with the publishers.
    The Labs project has tried to address this problem by creating a ‘residency model’ for researchers to work intensively with a digital collection on-site, so as not to infringe access conditions. I will say more about this later.
  • https://goo.gl/Kfc4qc
    Finding openly licensed collections is sometimes like detective work and, from lessons learned, Labs uses the following 4 filters for digital content:
    <click>Is the Copyright cleared for research and non commercial use?
    <click>Is it Curated (Is there someone who knows the ‘story’ about the collection?)
    <click>Is there Collection / Item Level Metadata available? And importantly what state is it in, does it need cleansing?
    <click>Finally, where is it?
    <click>These have been effective filters in doing the work of Labs in an agile way.
    <click>Labs has therefore identified several collections at the website above, some are shown in the slide:
    <click>Due to our licensing conditions, we are in the process of text mining the abstracts for a large number of journal titles in electronic form. The visualisation indicates the subject spread of our collections.
    <click>We have been harvesting the UK Web since 1993 and this is available as a resource under specific conditions for research.
    <click>We are also investigating the use of our item request data (around 17 million records) and anonymised reader data, data protection allowing.
    <click>The British National Bibliography has over 3 million catalogue records available as linked open data, licensed under CC0 from the British and Irish National Library catalogues.
    More information is available on the Labs website, and we hope one day to develop data.bl.uk into a place where all our open content and data lives, with a unique identifier for each data set.
  • Have balance of Multimedia
    Broadcast news and radio, sounds, Save our Sounds
    Books and newspapers
    Images
    BNB
    Qatar Digital library
    Hebrew manuscripts
  • 21 Seconds (65 Words)
    Katrina Navickas was particularly interested in the <Click>Chartist Movement, a group campaigning for the vote for working people. <Click>They were the biggest popular movement for democracy in 19th century British history, as this early picture of a huge monster meeting at Kennington Common shows. <Click>She wanted to use a combination of manual and computational methods to explore our Digitised Newspapers to find out when and where they met and plot the meetings on a map, <Click>hopefully unearthing new history.
  • 970 files from a selection of 19th century newspaper titles from the BL corpus for us to correct using the OverProof post-OCR correction software
    The best way to measure the improvement made by the correction process is to compare the OCR'ed text and the automatically corrected text with a perfect correction made by a human (known as the "ground truth").
    Hannah-Rose's 5 small human-corrected samples are shown as green dots. These are not only smaller than the other files, but their raw error rate is much lower at 13.3%. OverProof was measured as reducing this to 5.4%, a removal of almost 60% of errors.
    The red dotted-line indicates the correction "break-even" point: the further under the line, the better the quality of the document after correction.
    In the graph below, the grey line shows distribution of files across error rates before correction and the green line after correction.
  • 75 seconds
    The work of Labs is really about a number of stories, stories about digital collections and about researchers wanting to ask fascinating research questions about them. Let’s now tell you a story about one collection and the intended and unintended consequences of working with it.
    The Library digitised 65,000 17th to 19th century books from our collections a few years ago (around 2.7 % of the physical total in that period). You can view them from our catalogue or read them on your <click>iPad via the Historical Books app developed by BiblioLabs. We also captured 22 million individual page images, along with full text scans of these images, all of which contain an untold quantity of useful data such as names of people, places, historical events and dates.
    So the question became then, what next? What can 65,000 books tell us?
  • Posts small illustrations taken almost at random from the digitised book corpus to a Tumblr blog.
    This experiment with undirected engagement was a by-product of work to uncover the hidden wealth of illustrations within the digitised pages.
  • 50 seconds
    Here is the anatomy of a Flickr record, importantly we have created links to many of the Library’s services <click>some of this lovely traffic is going back to the Library and hopefully generating more interest in our services, from downloading a pdf of the book to purchasing a high res scan of the image.
    <click>Tags are added from the original book record, including the approximate page number the image came from<click>users of Flickr can add their own tags, and I have mentioned they have already started doing it.
  • 18 Seconds (56 Words)
    Indexing the BL 1 million & Mapping the Maps was led by James Heald in collaboration with others. <Click>They produced an index of 1 million 'Mechanical Curator collection' images on <Click>Wikimedia Commons from a collection of largely un-described images. <Click>This led to the discovery of 50,000 maps within the collection, partly through a map-tag-a-thon. <Click>These are now being geo-referenced. <Click>
  • 27 Seconds (82 Words)
    Adam Crymble <Click>wanted to harness the power of playing fun games on arcade machines to help with crowdsourcing the tagging of un-described images. He particularly wanted to engage a younger audience in crowdsourcing. <Click>On the right you can see a replica 1980s arcade machine we built <Click>and on the bottom left some tagging games that were developed through a ‘Games Jam’ for the machine. <Click>Let’s take a closer look at two of the games…<Click>
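
The OverProof note above compares raw OCR text and corrected text against a human-made "ground truth" and quotes error rates of 13.3% and 5.4%. The following Python sketch shows one common way such a comparison can be computed, plus the arithmetic behind the "almost 60%" figure. It assumes a character-level error rate based on edit distance; the note does not say how OverProof itself measures errors, and the sample strings are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def char_error_rate(text, ground_truth):
    """Edits needed to turn text into the ground truth, per ground-truth character."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(text, ground_truth) / len(ground_truth)


def relative_reduction(before, after):
    """Share of the original errors removed by the correction step."""
    return (before - after) / before


# Invented example strings (not taken from the newspaper corpus).
truth = "monster meeting"
raw_ocr = "rnonster meet1ng"
corrected = "monster meet1ng"

print(char_error_rate(raw_ocr, truth))
print(char_error_rate(corrected, truth))

# Figures quoted in the note: 13.3% raw error rate reduced to 5.4%.
print(round(relative_reduction(13.3, 5.4), 3))
```

With the quoted figures, (13.3 - 5.4) / 13.3 comes to roughly 0.594, which matches the "almost 60% of errors" in the note.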
