What is BL Labs?

  1. @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Funded by the Andrew W. Mellon Foundation Mahendra Mahey Experiment with our Digital Collections Mahendra Mahey Manager of BL Labs Running since March 2013 Core Team • Adam Farquhar (PI) • Mahendra Mahey • Ben O’Steen • Eleanor Cooper (0.5) What is British Library Labs? 12:50 to 13:20, Thursday 12th April 2018 BL Labs Roadshow 2018 University of Bristol UK.
  2. 2 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol The British Library Inside the British Library Space for 1200 readers, around 500,000 visitors per year Building 37 uses low oxygen and robots Reading room and delivery to London Many items stored at Document Supply and Storage centre 48 hours away Stockton-on-Tees Author right to payment each time their books are borrowed from public libraries. St Pancras, London, UK Many books are stored 4 stories below the building UK Legal Deposit Library – Reference only Founded in 1973 though origins stem back to British Museum Library 1753 Boston-Spa
  3. 3 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Collections – not just books! > 180*million items > 0.8* m serial titles > 8* m stamps > 14* m books > 6* m sound recordings > 4* m maps > 1.6* m musical scores > 0.3* m manuscripts > 60* m patents King’s Library *Estimates
  4. Living Knowledge Vision (2015 – 2023) Custodianship Research Business Culture Learning International To make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment and be the most open, creative and innovative institution of its kind by 2023 (50th anniversary). Document: Speech: Roly Keating (Chief Executive Officer of the British Library)
  5. 5 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Wider…not just Digital Humanities Researchers Researchers Artists Librarians Curators Software Developers Archivists Educators
  6. 6 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Digital research methods Digital Scholarship Visualisations Application Programming Interfaces (APIs) for datasets e.g. Metadata, Images, etc Transcribing Annotation Location based searching & Geo-tagging Corpus analysis, Text Mining & Natural Language Processing Crowdsourcing Human Computation In 20 years time?
  7. 7 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol What about Digital? Born Digital Digitised
  8. 8 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol / Knowledge Quarter London 80 knowledge organisations (as of 14/04/18) within 1 mile radius of Kings Cross, (Headquartered at the British Library) UK Web Archive and e-legal deposit (2013) Born digital
  9. 9 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol All our physical items are digitised right?
  10. 10 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol #bldigital 1-2 %* digitised * estimate Digitisation Partnerships Commercial & Other Organisations Amount increasing rapidly Bias in digitisation Sample Generator
  11. Digital collections and Datasets • Playbills, Books, Newspapers (includes Optical Character Recognition (OCR)) • British National Bibliography • Music (Recordings & Sheet) & Sounds • Broadcast News (TV and Radio) • Usage data • EThOS • Web Archive • Images, Manuscripts & Maps • Qatar Digital Library • International Dunhuang Project • Maps • Hebrew Manuscripts • Flickr & Wikimedia Commons
  12. 12 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Finding Open Cultural Heritage Datasets Collection Guides (199 as of 12/04/2018) Datasets about our collections Bibliographic datasets relating to our published and archival holdings Datasets for content mining Content suitable for use in text and data mining research Datasets for image analysis Image collections suitable for large-scale image- analysis-based research Datasets from UK Web Archive Data and API services available for accessing UK Web Archive Digital mapping Geospatial data, cartographic applications, digital aerial photography and scanned historic map materials Download collections as zips, no API Each dataset has a Digital Object Identifier (DOI) can be referenced for research Not all discoverable via search engines!
  13. 13 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol How are we doing this?
  14. 14 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Competition Awards Projects Tell us your ideas of what to do with our digital content (2013-16) Show us what you have already done with our digital content in research, artistic, commercial and learning and teaching categories Talk to us about working on collaborative projects Tell us your ideas of what to do with our digital content Engagement • Roadshows • Events • Meetings • Conversations New! Digital Research Support
  15. 15 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Digital Research Support process Online Query (baseline) Response online Requires discussion Online or Face to face (intermediate) @BL_Labs Other… Explore data first >=1 project chosen & supported per month Submit Project Proposal (advanced) Open Onsite Onsite only datasets Labs website (entry) &
  16. 16 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Digital Research Support Application Process • Complete online form - • Entries reviewed and selected at the beginning of the month • Up to 5 days support provided • Technical, curatorial and legal advice • Scope, Costs, Time, Risks • Any other relevant issues?
  17. 17 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol • The Library has to go out to meet researchers, regularly and cyclically to tell them what we have and learn what they want to do • Debunk ‘myths’ about the Library • Show / tell researchers about the reality of our data • Researcher’s ideas always change once they explore the data! Lots of two-way communication! BL Labs runs annual ‘Roadshows’ around the UK
  18. 18 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Have you got X? Looking for Physical Content in the British Library
  19. 19 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Have you got X digitised / in digital form? Looking for Digitised / Digital Content in the BL
  20. 20 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol •Digitisation costs time, resources & access can depend on restrictions imposed by funders, legal, ethical, practical etc. … •Still…702 Digitisation projects (as of 12/04/2018) But not all found through search engines or even online! So little digitised…why? © £ 
  21. 21 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Openly Licensed Digital Content? 15% Openly Licensed Around 80%* available online Working through to make more open… Though some collections will always only be available onsite due to various reasons including legal, ethical etc Breakdown by collection* Manuscripts 59% Books 9% Maps and Views 7% Newspapers 3% Archives and Records 3% Paintings, Prints and Drawings 2% *Based on number of digitisation projects (702 as of 12/04/18) Largest proportion of funding Public / Private Partnership 15 %* Openly Licensed – most online 85 %* Available onsite only at the moment *Estimates
  22. 22 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol The Story of the Digital Collection… Digital Collection Curator Who paid for the digitisation? Who did the digitisation? Technology used Born digital? Published Unpublished Where is it? Can it still be accessed? Generates income Reputational risk in using? Legalities Politics when digitised Personalities involved Surprises (e.g. gaps) Descriptive information Old format not supported What media was the digitisation done from? Is there any background documentation? No Descriptive information Inconsistent descriptive information Still there? Good to know the background ‘Story’ of a Digital Collection’ if you want to use it for research and make conclusions…
  23. 23 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Open Content vs Onsite Only Access • Access easier for openly licensed content • More challenging for on-site, in-copyright, non-print legal deposit, data protected, old content media & contemporary material (post 1877) ©
  24. 24 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol How do we give access to onsite-only Digital Collections (85% of our Digital Collections)?
  25. 25 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol only in Reading Rooms due to © only on site due to © or ethical etc not online / available – various storage devices, personal data online and open British Library online behind paywall Challenges of access to Digital Collections Labs Residency Model
  26. 26 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol OPEN £ • Have to be ‘onsite’ (interpretations vary) • Need to be ‘security cleared’ ‘trusted’ for some collections – Hence ‘Researcher in Residence Model’, trialling onsite ‘Digital Research Suite’ in reading room • Further permission may be required (depending on ‘story’ of collection) • Content could be on various media formats (not always online) • 5 - 20 % re-use of material for non commercial research for some collections, depends on agreements in place • We are learning ‘pathways’ so that this becomes ‘everyday’ to provide onsite access to some digital collections in the future Accessing digital collections onsite
  27. What engagement does the BL have with researchers wanting to use our digital content? • Dialogue typically: – you are ‘lucky’ & we have the digital content / data relevant to your research – we don’t have exactly what you’re looking for, but is there anything of interest? Let’s talk… – engagement can be hard work and it’s constantly required to maintain interest in our digital collections! • We also tend to attract researchers with ‘fuzzier’ research boundaries who are possibly open to more interdisciplinary / collaborative research • Artists find this dialogue easier…
  28. 28 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Interactions with Labs “researcher”
  29. 29 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Phase 1: Exploration Allows a researcher to: – Understand the data in open-ended fashion. – Discover potential tools to work with the data. – Gain awareness of their capabilities and limitations. – Develop a firmer research query. – Gauge the costs, risks and time needed. • Outputs of the exploration are not intended to be shareable, beyond personal experience and key features (data size, formats, tool successes, etc.).
  30. 30 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Phase 2: Query-Focussed • A firmer and more informed query by the researcher where: – Suitable datasets already lined up – There is a good idea of the initial toolset and capabilities (human and computer) required – The project output is outlined, and relevant reuse applications are begun. – Clear agreements on what happens at the end of the project – data deletion, virtual machine deletion/archiving/etc. – Project may iterate on initial ideas,depending on researcher’s cost/risk appetite Submit idea for support
  31. 31 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Phase 3: Wrap-up • Wrap-up – Work (code, notes) exported and given to researcher – All derivative data is licenced or retained based on reuse agreements (Access & Reuse board, etc.) – Provisions made for the project are wound-down, as agreed (derivative data deleted after a grace period, etc.)
  32. 32 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Why are we doing this?
  33. Why are we doing this? (1) We support research; it’s our job! We want to work closely with, and listen to, those who want to use our digital collections and data for their work!
  34. Why are we doing this? (2) We can learn how we are and should be supporting you, and this shapes the problems we work on, such as: • Access to digital collections / data? • Advice, guidance, technical support, training • Services, Tools and Processes? • Many more reasons…
  35. Why are we doing this? (3) Where are the gaps between what you want & what we can give? How do we build the bridges to overcome the gaps?
  36. Why are we doing this? (4) How do we help you ‘navigate’ your way through the (sometimes) ‘maze’ of the Library to what you want to do? Sometimes this requires understanding the culture of the organisation
  37. 37 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol What did people actually do? Examples from Text and Images Over 200 examples (including sound, video) from Competition and Awards:
  38. 38 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Example Pattern of Research 1, 2, 3 1. Find / identify new things in messy stuff 2. Unlock hidden history / data 3. Celebrate new discoveries
  39. 39 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Finding / identifying invisible / well hidden things in ‘messy’ historical data Not the British Library! Example Pattern of Research 1
  40. 40 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Messiness in historical data • 'Begun in Kiryu, Japan, finished in France' • 'Bali? Java? Mexico?' • Variations on USA: – U.S. – U.S.A – U.S.A. – USA – United States of America – USA ? – United States (case) • Inconsistency in uncertainty – U.S.A. or England – U.S.A./England ? – England & U.S.A.
  41. 41 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Open Refine
  42. Things to consider: Data + Tools • Cultural heritage records contain uncertainty and fuzziness (e.g. date ranges, multiple values, uncertain or unavailable information). Curators and staff at institutions often have unique expertise in deciphering these anomalies, so ask them! ([1960] vs. 1960 can have a big impact depending on what you’re doing) • Optical Character Recognition in particular is an imperfect art: you need to consider how bad it is, how this might affect your findings, and what needs doing to mitigate it. • Keeping data clean, organised, open and well described will not only make your life easier, but also enable its widespread re-use beyond your project and increase its future impact. (Datasets you’ve created in the course of your research projects could even be used to enhance national collections!) • Decisions always need to be made while normalising information for visualisation. Documenting them is important for your research but also for future re-use! • Is your aim enquiry or presentation? All of this will have an impact on the tools and data cleaning choices you make.
  44. 44 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol #digitalhumanities dancohen/lists/digitalhumanities @ProfHacker @Dhnow @BL_DigiSchol And more links to resources here:
  45. 45 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Unearthing / unlocking hidden histories & data to stimulate new research It’s an 18th Century Poem! Example Pattern of Research 2
  46. 46 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Celebrating hidden histories / data creatively through events, art & performance Re-enacting, re-discovering history Example Pattern of Research 3
  47. 47 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Experiments with Text
  48. 48 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Finding things in ‘messy’ Optical Character Recognised (OCR) text Mrs Folly • Clean up some manually • Get human ‘ground truth’ • Write computer code (sometimes it’s machine learning) to find things reliably in it ‘automatically’ • Try code on messy content • Tweak if necessary • Digital ‘lasso’ around content • Human sift through Mrs Folly An example pattern of research
  49. Legalities of Machine Learning / Text and Data mining: still up for discussion… Often misunderstood. Is it the same as humans reading and looking for patterns…just a bit quicker?
  50. 50 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Victorian Meme Machine (2014) Bob Nicholson Bob Nicholson interviewed on BBC Radio 4 Making History Programme: And telling jokes to the public: Bob obtained further funding from his university Looking for more collaborations Rob Walker, Victorian Mother-in-law Jokes Victorian Comedy Night, 7 Nov 2016 Learnt about access paths to digital collections
  51. 51 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Katrina Navickas (2015) Political Meetings Mapper Labs Symposium 2015 Interview 2015 The Chartist Newspaper Chartist Monster Meeting Chartists Walking Tour and Re-enactment London Learnt that domain knowledge reduces noise
  52. 52 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Black Abolitionist Performances & their Presence in Britain (2016) – Hannah-Rose Murray Frederick Douglass Ellen Craft Josiah Henson Ida B Wells A Performance by Joe Williams & Martelle Edinborough Started to implement Machine Learning Techniques
  53. 53 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Data-mining verse in 18th Century newspapers BL Labs Project 16-17, Jennifer Batt Slides courtesy Jennifer Batt Started to refine Machine Learning Techniques
  54. 54 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Psychiatrist’s Journey into 19th Century Newspapers (2016) • Dr Surendra P Singh, Consultant Psychiatrist • To identify weekly, monthly, yearly and longitudinal trends in suicide reporting in terms of gender, status, sites, locations and health in OCR text of 19th Century Newspapers • Used ‘R’ Open Source Stats Package to collect ‘Suicide’ corpus • Looking for collaborators to work on this dataset Use off-the-shelf tools and remote access pathways
  55. 55 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Virtual Infrastructure for OCR text OCR text ‘scraped’ from digitised newspapers and put in internal cloud Jupyter notebook Write python code and results in web browser Access available for researchers ‘in residence’
  56. 56 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Experiments with Images
  57. 57 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol 65,000 digitised 19th Century books Image: Artwork by Alicia Martin 2007 / 2008 Paid for by: For a full list: Subjects include: Philosophy Poetry History Literature 1789 - 1876
  58. 58 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol 30 August 2012
  59. 59 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol 002819694 Unique number
  62. OCR XML generated by ABBYY FineReader Optical Character Recognition
  63. 63 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Images from books captured too!
  64. 64 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol We did some of our own experiments…do as we tell others! Experiment with our Digital Collections@BL_Labs
  65. 65 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Ben O’Steen of @BL_Labs after Hack Event, August 2013
  66. 66 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Ben O’Steen of @BL_Labs after Hack Event, August 2013
  67. 67 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Ben O’Steen of @BL_Labs after Hack Event, August 2013
  73. 73 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol 1,020,418 images needed identifying!
  74. 74 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol One major problem! •We know about the books these images come from but we know nothing about the actual images! •How will we identify them? •How will we find them later? •How can we do that with 1 million images? •Try a few experiments!
  75. 75 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Running face recognition on the images Face Recognition Algorithm Trained on Photographs  Late August 2013
  76. 76 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Face Recognition Algorithms worked better for female faces than men’s
  77. 77 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol The Mechanical Curator Snipped image posted almost randomly every hour… on a Tumblr blog One of our early followers was… Ben O’Steen, 30 September 2013 Has a slight ‘mood’… once image published, tries to find 8 similar images e.g. ‘slanty’, ‘circular’ etc. & then gets ‘bored’ follow… @MechCuratorBot
  78. 78 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Flickr Commons (100 + GLAMs as of 12/04/18)
  80. British Library Flickr Commons Why Flickr Commons? • Free! • Each image has its own unique web address, easy to share • Can tag images • Has an Application Programming Interface (API) Late August 2013
  81. 81 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Worked better for female faces than men’s Press Posts image every 30 minutes 1,020,418 images need tagging! Creative uses of images Face recognition Algorithms based on photos Mechanical Curator with an algorithmic brain (Circles, Squares and Slanty etc) Wikimedia Flickr Commons Individual URL & API Snipping out images from 65,000 Digitised Books* >800,000,000* views >17,000,000* tags Work @ BL by Ben O’Steen, Labs and Digital Research Team*Matt Prior - Since Dec 2013 Tumblr *Estimates >More demand to see physical items
  82. 82 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Tagging, Tagging, Tagging…
  83. Tagging a million images Iterative Crowdsourcing Cardiff University’s Lost Visions Project Metadata Games James Heald Mario Klingemann Chico 45 Use computational methods Human Tagger Top British Library Flickr Commons Taggers 18 hard core taggers How to reward this ‘small group’ and keep it motivated? Average for ‘crowd’ is 1 tag per person What kind of ‘task’ can this ‘crowd’ do? Mobile games for ‘Ships’, ‘Covers’ and ‘Portraits’ Interface for tagging
  84. 84 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Adam Crymble (2015) Crowdsource Arcade 30 mins talk Labs Symposium (2015) 5 min interview (2015) Game Jam Using Arcade Games to help Tag images ‘Art Treachery’ and ‘Tag Attack’
  85. 85 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Special Jury’s Prize (2015) James Heald – Wikimedia and Map work Labs Symposium (2015)Geotagging maps 50,000 Maps Found in Flickr 1 million Human & Computational Tagging & Community engagement Geo-referencing work
  86. 86 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol SherlockNet: Competition Winner 2016 Karen Wang, Luda Zhao and Brian Do Using Convolutional Neural Networks to Automatically Tag and Caption the British Library Flickr Commons 1 million Image Collection 12 categories >15.5 million tags added >100,000 captions Pooled surrounding OCR text on page from similar images Used Microsoft COCO (photographs) & British Museum Prints and Drawings collections as training sets. Tags Captions
  87. Artistic / Creative Works Mario Klingemann (2015) Code Artist / Curator Kris Hoffman (2016) Animation for Fashion Week 2016 Jiayi Chong 2016 - Animation tool Paul Rand Pierce 2016 Graphic Novel on Facebook Tragic Looking Women 44 Men who Look 44 (Notice the direction faces) A Hat on the Ground Spells trouble David Normal 2014 and 2015 Collages/Paintings & Lightboxes
  88. 88 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Imaginary Cities – BL Labs Project / Exhibition 16-18 (Michael Takeo Magruder) An artistic exploration seeking to create provocative fictional cityscapes for the Information Age from the British Library’s digital collection of historic urban maps
  89. 89 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Alanna Hilton British Fashion Colleges Council and Teatum Jones
  90. 90 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol It all starts from a conversation! • Start with a conversation, our data isn’t highly visible on search engines (yet!) & not easy to find. Need to create and embrace serendipity & opportunities for use by talking! • Need to have several conversations with several stakeholders & tap into their tacit knowledge that isn’t always written down sometimes to progress ideas. • Often misunderstandings because of jargon & different meaning of words. ? Audience research & Digital interests Digital collections we have This is where Labs works
  91. 91 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Many researchers have the domain knowledge but lack technical / digital skills to use Digital Research methods. Should they be teamed up with those that want to solve problems or get trained? (Will look at in the afternoon) Digital skills training needed for Humanities researchers…
  92. 92 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Labs mindset… 1. Start a conversation, generate positive energy, be nice, have fun and try to support ideas . 2. Start with small experiments, but think big! 3. Fail faster (don’t be afraid) and persevere. 4. Reject perfectionism! Good enough is sometimes…good enough! 5. Celebrate the uses of digital collections, tell the world!
  93. 93 @BL_Labs @BrisUniRIT @JGIBristol @Cudigitalnet @BL_DigiSchol Explore or Imagine Our Data! • CSV of Metadata • 19th Century Books - Book Metadata - 01/09/2013. • Digitised Books - Flickr Tag History - Dec 2013 to March 2016. TSV • Digitised Hebrew Manuscripts - Metadata • Digitised Hebrew Manuscripts: Or 2210 - Or 2364 • Theatrical playbills from Britain and Ireland (OCR text only) • Portraits of actors, views of theatres and playbills (covering 1750 - 1821 in a single volume) • Volumes of Lysons Collectanea (Amusements), comprising broadsides, cuttings, advertisements on amusements.1660- 1840. • Have a look at the data. • Data Quality • Issues Or an idea you have thought of what to do with the data! Smaller datasets
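The "have a look at the data" step on the last slide can start with a quick quality profile. The sketch below is illustrative only. It assumes a metadata CSV with a column called `date` (a hypothetical name; the inline sample merely stands in for a real download) and sorts each value into plain years, bracketed years such as [1960], missing values, and anything a person needs to review. The category labels are invented for this sketch and are not a British Library scheme.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample standing in for a downloaded metadata CSV.
# The column names "title" and "date" are assumptions for illustration.
SAMPLE = """title,date
A Tour in Wales,1810
Poems,[1860]
Travels,1836 (probably)
Untitled,
Views of London,early 18th century
"""

PLAIN_YEAR = re.compile(r"^\d{4}$")
BRACKETED_YEAR = re.compile(r"^\[\d{4}\]$")


def classify_date(value: str) -> str:
    """Put one raw date string into a coarse data-quality category."""
    value = value.strip()
    if not value:
        return "missing"
    if PLAIN_YEAR.match(value):
        return "plain year"
    if BRACKETED_YEAR.match(value):
        return "bracketed year"
    # Ranges, "probably", "early 18th century" and so on: leave for a human.
    return "needs review"


def profile_dates(csv_text: str, column: str = "date") -> Counter:
    """Count how many rows fall into each date-quality category."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(classify_date(row[column] or "") for row in reader)


if __name__ == "__main__":
    for label, count in profile_dates(SAMPLE).most_common():
        print(f"{label}: {count}")
```

A profile like this does not clean anything. It only shows how much of a column is straightforward and how much needs a conversation with a curator before any conclusions are drawn.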

Editor's notes

  1. 140 seconds The British Library is the national library of the UK and one of the largest research libraries in the world. The Library moved to a new purpose-built building in 1997 <click> the largest of its kind that was built in the UK in the 20th century. Many frequently used items are stored 5 stories below the main building at St Pancras in London and many might not know that part of the building is meant to look like a ship on a journey to discovery!<click>. <click to switch off> The building can seat 1,200 researchers at any one time across 5 reading rooms. <click>Medium and long term requested items are held at Boston Spa in Yorkshire in a low oxygen warehouse, using robots to retrieve items. In total, the library has 625 km of shelving, growing by 12 km every year. Whilst we acquire items through purchase or gifts, much of the collection has been built up through legal deposit. That is, by law, a copy of every UK and Ireland print publication must be given to the British Library by its publishers. Around 3 million items are added per year. In 2013, legal deposit was extended to cover non-print material, which means that by law we take in digitally published items as well. This involves regular mass crawls of the entire UK web domain as well as ebooks, ejournals etc.
  2. 85 seconds The picture you can see is inside the main building in London, it’s the King’s Library – King George the Third’s personal library! Sometimes known as the ‘stack’, I walk past this every day and I sometimes forget that the collections the British Library has are truly staggering! We currently estimate them to exceed <click>150 million items, representing every age of written civilisation and every known language. Our archives now contain the earliest surviving printed book in the world, the Diamond Sutra, written in Chinese and dating from 868 AD…. So some big numbers… Over …<click>14 million books <click>60 million patents <click>8 million stamps <click>4 million maps <click>3 million sound recordings <click>1.6 million music scores <click>over 0.3 million manuscripts <click>0.8 million serials titles (which are of course made up of many many volumes/editions), this is where a lot of our content is, just in case you thought the numbers didn’t add up!
  4. Get clearer annotation image and transcription (perhaps TILT)
  5. 6 Seconds (20 Words) So <Click> ‘how’ do we try and engage those who might be interested in the BL’s digital collections and data? <Click>
  6. 17 Seconds (53 Words) <Click>The British Library is one of the largest libraries in the world <Click> with an estimated 180 million physical items, of which only a small proportion has been digitised. <Click>We estimate this is around 1-2%, but no one really knows exactly how much. However, more and more items are being stored as ‘born digital’, such as the UK Web Archive<Click>
  7. Have a balance of: multimedia (broadcast news and radio, sounds, Save our Sounds); books and newspapers; images; BNB; Qatar Digital Library; Hebrew manuscripts
  8. <click>The British Library faces many challenges of access to our Digital collections! <click> Sometimes digital content is only available onsite due to license restrictions, <click>or even only on a specific computer in a reading room! Technically there are very few reasons why digital content can’t be online <click> though it might be too big or hasn’t been transferred from other digital storage media. <click>Sometimes access is through a paywall. Finally, <click>some content is in the happy sunny place, online, open and freely available. The real reasons why there are challenges to accessing digital content are of course human. They require different approaches from the Library and may often involve an honest, open dialogue and negotiation with the publishers. The Labs project has tried to address this problem by creating a ‘residency model’ for researchers to work intensively with a digital collection on-site, so as not to infringe access conditions. I will say more about this later.
  9. Examples from the Cooper Hewitt collection. I spent 3/5 of my time at the Cooper Hewitt just trying to get the data clean enough to vaguely represent the collection. The problem is that computers think U.S., U. S. , U.S.A., U. S. A. , United States, United States of America are six different places. Fields also contain things like internal notes about potential duplicates, unexpected extra information - notes on what type of location, etc. Lots of inconsistencies - uncertainty and date ranges expressed in different ways. More common GLAM issues - What year is 'early 18th century'? What do you do with '1836 (probably)'?
  10. Open Refine is an amazing tool, and I wouldn't have gotten anywhere at Cooper Hewitt without it. It will suggest ways to make the data more consistent. You can then export the data and keep working on it in other tools, or put it into Open Refine. Because Refine runs locally it can be used for sensitive data you mightn't put online. One issue is that GLAMs tend to use question marks to record uncertainty in attribution, but Refine strips out all punctuation, so you have to be careful about preserving it (if that's what you want). Takes in TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML, and Google Data documents. useful advice
  11. 21 Seconds (65 Words) Katrina Navickas was particularly interested in the <Click>Chartist Movement, a group campaigning for the vote for working people. <Click>They were the biggest popular movement for democracy in 19th century British history; this early picture shows a huge monster meeting at Kennington Common. <Click>She wanted to use a combination of manual and computational methods to explore our Digitised Newspapers to find out when and where they met and plot the meetings on a map, <Click>hopefully unearthing new history.
  12. Watch out for the gunner and skunk, as they will make an appearance again!
  13. Posts small illustrations taken almost at random from the digitised book corpus to a Tumblr blog. This experiment with undirected engagement was a by-product of work to uncover the hidden wealth of illustrations within the digitised pages.
  14. 27 Seconds (82 Words) Adam Crymble <Click>wanted to harness the power of playing fun games on arcade machines to help with crowdsourcing the tagging of un-described images. He particularly wanted to engage a younger audience in crowdsourcing. <Click>On the right you can see a replica 1980s arcade machine we built <Click>and on the bottom left some tagging games that were developed through a ‘Games Jam’ for the machine. <Click>Let’s take a closer look at two of the games…<Click>
  15. 18 Seconds (56 Words) Indexing the BL 1 million & Mapping the Maps was led by James Heald in collaboration with others. <Click>They produced an index of 1 million 'Mechanical Curator collection' images on <Click>Wikimedia Commons from a collection of largely un-described images. <Click>This led to finding 50,000 maps within the collection, partly through a map-tag-a-thon. <Click>These are now being geo-referenced. <Click>
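Notes 9 and 10 describe two linked problems: a computer treats U.S., U. S., U.S.A. and United States as different places, and stripping punctuation to reconcile them can silently discard the question mark that records uncertainty. The sketch below shows one way to handle both. It is not Open Refine's own algorithm. It uses a letters-only key plus a small hand-written alias table (both assumptions for illustration) and keeps the uncertainty as a separate flag instead of throwing it away.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hand-written alias table (an assumption for illustration):
# letters-only key -> canonical place name.
ALIASES = {
    "us": "United States",
    "usa": "United States",
    "unitedstates": "United States",
    "unitedstatesofamerica": "United States",
}

# Example values modelled on the variants quoted in the notes and slides.
VALUES = [
    "U.S.", "U. S.", "U.S.A.", "U. S. A.",
    "United States", "United States of America",
    "USA ?", "England",
]


def compact_key(value: str) -> str:
    """Lowercase and keep letters only, so 'U. S. A.' and 'USA' collide."""
    return re.sub(r"[^a-z]", "", value.lower())


def normalise_place(value: str) -> Tuple[str, bool]:
    """Return (canonical name, uncertain); the '?' is recorded, not lost."""
    uncertain = "?" in value
    fallback = value.replace("?", "").strip()
    return ALIASES.get(compact_key(value), fallback), uncertain


def cluster(values: Iterable[str]) -> Dict[Tuple[str, bool], List[str]]:
    """Group raw strings by their normalised (name, uncertain) pair."""
    groups: Dict[Tuple[str, bool], List[str]] = defaultdict(list)
    for raw in values:
        groups[normalise_place(raw)].append(raw)
    return dict(groups)


if __name__ == "__main__":
    for (name, uncertain), members in cluster(VALUES).items():
        flag = " (uncertain)" if uncertain else ""
        print(f"{name}{flag}: {members}")
```

Compound values such as 'U.S.A. or England' deliberately fall through unchanged, since deciding what they mean is a curatorial judgement and not a string operation. Keeping the raw strings alongside each group also documents the normalisation decisions for later re-use.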