

BL Labs Presentation at Open Science Infrastructures for Big Cultural Data

  1. @BL_Labs @Britishlibrary @UCL_Qatar @DARIAHeu Fostering Excellence in Scholarship with Big Cultural Heritage Collections: Insights from British Library Labs. Mahendra Mahey, Manager of BL Labs. Funded by the Andrew W. Mellon Foundation and the British Library; running since March 2013. 16:30-17:15, Thursday 13 December 2018, 'Fostering Excellence in Scholarship with Big CH Collections' (in the 'Humanities data and their research use' session), Open Science Infrastructures for Big Cultural Data, International Advanced Masterclass, Fifth Floor Conference Room, Hotel Trimontium, Plovdiv, Bulgaria.
  2. The British Library. Inside the British Library, St Pancras, London, UK: space for 1,200 readers, around 500,000 visitors per year; many books are stored 4 storeys below the building; reading rooms and delivery to London. Boston Spa: many items stored at the Document Supply and Storage centre, 48 hours away; Building 37 uses low oxygen and robots. Stockton-on-Tees: authors' right to payment each time their books are borrowed from public libraries. UK Legal Deposit Library, reference only. Founded in 1973, though its origins stem back to the British Museum Library (1753).
  3. For research, inspiration and enjoyment!
  4. Living Knowledge vision (2015-2023): custodianship, research, business, culture, learning, international. To make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment, and be the most open, creative and innovative institution of its kind by 2023 (50-year anniversary). Document; speech by Roly Keating (Chief Executive Officer of the British Library).
  5. Collections, not just books! >180* million items; >0.8* m serial titles; >8* m stamps; >14* m books; >6* m sound recordings; >4* m maps; >1.6* m musical scores; >0.3* m manuscripts; >60* m patents. King's Library. *Estimates
  6. Have you got X? Looking for Physical Content in the British Library
  7. What about Digital? Born Digital; Digitised
  8. #bldigital: 3%* digitised (*estimate). 15%* openly licensed, most online; 85%* available onsite only at the moment. Digital partnerships with commercial & other organisations. Bias in digitisation: gaps, and not always representative of physical collections (Sample Generator). Digitisation / curating born digital costs money, time, resources. Heritage Made Digital. Research-led digitisation increasing rapidly. Born digital.
  9. Have you got X digitised / in digital form? Looking for Digitised / Digital Content in the BL
  10. Our audience and our collections: audience research & digital interests meet the digital collections we have. This is where Labs works. It starts with a conversation! Only a small amount of content is digitised! It might not be the treasure expected at the end of a digital journey!
  11. The story of the digital collection… Questions for the digital collection curator: Who paid for the digitisation? Who did the digitisation? What technology was used? Born digital? Published or unpublished? Where is it? Access / API? Can it still be accessed? Does it generate income? Reputational risk in using it? Legalities / ethics / morality. Politics when digitised. Personalities involved. Surprises (e.g. gaps). Descriptive information: none, or inconsistent. Old format not supported. What media was the digitisation done from? Is there any background documentation? Is it still there? It is good to know the background 'story' of a digital collection if you want to use it for projects…
  12. Challenges of access to digital collections at the BL: reading room; on site; not online; open; British Library; £; Labs residency model
  13. Library Labs: a space to experiment and innovate on-site and online • Expert support and advice • Essential equipment (software, hardware, storage, network) • Essential ingredients (data, text, images) • The ability to create, validate, capture, record, reproduce, archive, and share results • Community, tutorials, examples • Integrated into reference and research workflows
  14. Some of the challenges GLAM Labs address • Money spent on digitising / capturing digital: return on investment, how it is being used and what value and impact it is having, especially when opening collections for all. • What digital collections are there that can be used openly / onsite, and how do we tell people and get them to use them? • How do we explore the 'feel' / 'shape' of collections at scale? • How do we find, explore and augment discovery in often 'messy' cultural heritage data without public APIs? • How do we discover and celebrate old culture, and remix it to create new culture?
  15. International GLAM Labs community: transforming relationships between people, technology, and digital collections. Meeting in London to share experiences, 13-14 September 2018: 43 GLAMs from over 20 countries came! Next meeting: Copenhagen, 5-6 March 2019, with a book sprint after. Join us!
  16. Labs Survey: Growing Community
  17. Horses for courses • Variation in: – target users – funding sources – security models • Surprises: – many do not facilitate access to restricted collections – many do not provide dedicated physical space – or simultaneous access to digital and physical
  18. Early 'BL Labs' lessons • Start a conversation, generate positive energy, encourage fun/play/experimentation and try to support as many ideas as is humanly possible; be kind and nice, be willing to share, and genuinely want to help people! • Start with small experiments; digital re-use can be really simple, but it's OK to think big! • Open approaches are preferred over single solutions; keep it simple and don't overcomplicate • Policies and processes for digital re-use are critical • Don't be afraid to fail, but fail faster, have patience and persevere. • Reject perfectionism, it is the enemy of rapid progress! Good enough is sometimes…good enough! (A difficult message for libraries) • Services that allow useful exploration of cultural heritage data are rare! • Exploring large datasets is difficult and often requires specific skills and capabilities that many of our users don't have: training or collaborations? • Celebrate the uses of digital collections, tell the world! • Success is sometimes all about the right people, place & time…so there will be lots of failures, but that is OK!
  19. Wider engagement…not just Digital Humanities / Scholarship researchers. Researchers, artists, librarians, curators, software developers, archivists, educators and entrepreneurs working and communicating: inspirational examples, experiences, challenges, lessons learned
  20. What engagement does the BL have with people wanting to use our digital content? • Dialogue typically: – you are 'lucky' & we have the digital content / data relevant to your project – we don't have exactly what you're looking for, but is there anything of interest? Let's talk… – engagement can be hard work, and it is constantly required to maintain interest in our digital collections! • We also tend to attract projects with 'fuzzier' boundaries, possibly more open to interdisciplinary / collaborative research • Artists / creatives find this dialogue easier…
  21. Labs engagement 2013-current • Over 100 institutions visited • Over 60,000 miles travelled around the UK and the world! • 100s of presentations & over 100 workshops • 1,500 researchers / artists / entrepreneurs / educators / public • Over 1,000 expressions of interest to use collections • 150 researchers, artists, entrepreneurs & educators supported: potential case studies • 200 TB of data via post • 6 TB data on • Over a billion views through Labs projects
  22. Experimentation to integration • Growing user demand – Still start with a conversation – But use the Digital Research Support application • Simplifying support processes – Still come and visit us: hot desks – But soon, book a room for your work • Integrating behind-the-scenes workflows
  23. Why are we doing this? (1) We support research: it's our job! We want to work closely with, and listen to, those who want to use our digital collections and data for their work! Listen to your users!
  24. Why are we doing this? (2) We can learn how we are, and should be, supporting you, and this shapes the problems and projects we work on, such as: • Access to and discovery of digital collections / data? • Advice, guidance, technical support, training • Services, tools and processes? • Many more reasons…
  25. Why are we doing this? (3) Where are the gaps between what people want & what we can give? How do we build the bridges to overcome the gaps?
  26. Why are we doing this? (4) How do we help people 'navigate' their way through the (sometimes) 'maze' of the Library to what they want to do? It requires understanding the culture of the organisation. Researchers often need a translator/advocate for successful projects. Learn to wear the spectacles of the organisation: read their vision/strategy documents!
  27. Interactions with BL Labs: a “researcher” wanting to work with our data
  28. Phase 1: Exploration. Allows a researcher to: – Understand the data in an open-ended fashion. – Discover potential tools to work with the data. – Gain awareness of their capabilities and limitations. – Develop a firmer research query. – Gauge the costs, resources, risks and time needed. • Outputs of the exploration are not intended to be shareable beyond personal experience and key features (data size, formats, tool successes, etc.).
  29. Phase 2: Query-focussed • A firmer and more informed query by the researcher, where: – Suitable datasets are already lined up – There is a good idea of the initial toolset and capabilities (human and computer) required – The project output is outlined, and relevant reuse applications are begun – There are clear agreements on what happens at the end of the project: data deletion, virtual machine deletion/archiving, etc. – The project may iterate on initial ideas, depending on the researcher's cost/risk appetite. Submit idea for support
  30. Phase 3: Wrap-up • Work (code, notes) exported and given to the researcher • All derivative data is licensed or retained based on reuse agreements (Access & Reuse board, etc.) • Provisions made for the project are wound down, as agreed (derivative data deleted after a grace period, etc.)
  31. Digital collections and datasets: playbills, books, newspapers (includes Optical Character Recognition (OCR) text); British National Bibliography; music (recordings & sheet) & sounds; broadcast news (TV and radio); usage data; EThOS; Web Archive; images, manuscripts & maps; Qatar Digital Library; International Dunhuang Project; maps; Hebrew manuscripts; Flickr & Wikimedia Commons
  32. Finding open cultural heritage datasets • Collection guides (225 as of 05/12/2018; each guide may have a 'digital collections' section) • Datasets about our collections: bibliographic datasets relating to our published and archival holdings • Datasets for content mining: content suitable for use in text and data mining research • Datasets for image analysis: image collections suitable for large-scale image-analysis-based research • Datasets from UK Web Archive: data and API services available for accessing UK Web Archive • Digital mapping: geospatial data, cartographic applications, digital aerial photography and scanned historic map materials • Download collections as zips • Each dataset has a Digital Object Identifier (DOI) that can be referenced in research
  33. SketchFab - British Library
  35. Computational messiness in historical data • 'Begun in Kiryu, Japan, finished in France' • 'Bali? Java? Mexico?' • Variations on USA: – U.S. – U.S.A – U.S.A. – USA – United States of America – USA ? – United States (case) • Inconsistency in uncertainty – U.S.A. or England – U.S.A./England ? – England & U.S.A.
  36. Cleaning up data: OpenRefine offers useful advice on cleaning up data
  37. Data / digital curation / data librarian: digitisation, collecting born digital, data management, data curation, data characterisation. Working with students to characterise and curate our data, and becoming dataset creators
  38. Digital skills training needed: humanities researchers / librarians… Many researchers have the domain knowledge but lack the technical / digital skills to use digital research methods. Should they be teamed up with those who want to solve problems, or get trained?
  39. Data Strategy (2017) • Data Management • Data Creation • Data Archiving and Preservation • Data Access, Discovery & Reuse
  40. How?
  41. Digital research methods / digital scholarship: visualisations; Application Programming Interfaces (APIs) for datasets, e.g. metadata, images, etc.; annotation; location-based searching & geo-tagging; crowdsourcing; human computation. In 20 years' time?
  42. How? • Competition (2013-16): tell us your ideas of what to do with our digital content • Awards: show us what you have already done with our digital content in research, artistic, commercial, learning and teaching, and staff categories • Projects: talk to us about working on collaborative projects • Engagement: roadshows, events, meetings, conversations • New! Digital Research Support: tell us your ideas of what to do with our digital content
  43. Hard work, no magic formula!
  45. What did people actually do? Examples from text and images (over 200 examples, including sound and video)
  46. Example pattern of research: 1. Find / identify new things in messy stuff 2. Unlock hidden history / data 3. Celebrate new discoveries
  47. Experiments with Music
  48. Experiments with data mining, machine learning and artificial intelligence. It is possible to do data mining for non-commercial research on 'in copyright' music recordings
  49. Experiments with Text
  50. Finding things in 'messy' Optical Character Recognised (OCR) text: an example pattern of research (Mrs Folly) • Clean up some manually • Get human 'ground truth' • Write computer code (sometimes machine learning) to find things in it reliably and 'automatically' • Try the code on messy content • Tweak if necessary • Draw a digital 'lasso' around content • Humans sift through the results
  51. Code: machine learning / reading • Labs sometimes uses machine learning / reading techniques • Analogies to how humans read / learn • Machines acquire 'knowledge' / data and use that knowledge / data to make sense of it / identify patterns • Labs does this on a case-by-case basis, so methods can vary • Needs computational & human effort • Legalities of text and data mining being 'ironed out' with publishers, ongoing… Often misunderstood… • Perhaps we need a metaphor from history for people to understand… © £
  52. The smell of soup & machine learning. Nasreddin, 13th-century Turkish Sufi. Thanks to Memo Akten (@memotv on Twitter) for the inspiration!
  53. Victorian Meme Machine (2014), Bob Nicholson. Bob Nicholson interviewed on the BBC Radio 4 Making History programme, and telling jokes to the public. Bob obtained further funding from his university. Looking for more collaborations. Rob Walker; Victorian mother-in-law jokes; Victorian Comedy Night, 7 Nov 2016. Learnt about access paths to digital collections
  54. Katrina Navickas (2015), Political Meetings Mapper. Labs Symposium 2015; interview 2015. The Chartist newspaper; Chartist monster meeting; Chartists walking tour and re-enactment, London. Learnt that domain knowledge reduces noise
  55. Black Abolitionist Performances and their Presence in Britain (2016), Hannah-Rose Murray: Frederick Douglass, Ellen Craft, Josiah Henson, Ida B. Wells. A performance by Joe Williams & Martelle Edinborough. Started to implement machine learning techniques
  56. Data-mining verse in 18th-century newspapers: BL Labs project 2016-17, Jennifer Batt (slides courtesy of Jennifer Batt)
  57. Virtual infrastructure for OCR text: compute next to data. OCR text 'scraped' from digitised newspapers and put in an internal cloud. Jupyter notebook: write Python code and see the results in a web browser. Access available for researchers 'in residence'
  58. Experiments with Images
  59. Snipping out images from 65,000* digitised books (work @ BL by Ben O’Steen, Labs & Digital Research Team). 1,020,418 images need tagging! Mechanical Curator with an algorithmic brain (circles, squares, slanty etc.): posts an image every 30 minutes. Creative uses of images (Matt Prior, Tumblr). Face recognition algorithms based on photos: worked better for female faces than men’s. Press. >1,000,000,000* views and >17,500,000* tags since Dec 2013. More demand to see physical items. *Estimates
  60. Identifying things: tagging, tagging, tagging…
  61. Tagging a million images: iterative crowdsourcing. Cardiff University’s Lost Visions project; Metadata Games; James Heald; Mario Klingemann; Chico 45. Use computational methods. Human taggers: top British Library Flickr Commons taggers; 18 hard-core taggers. How do we reward this 'small group' and keep it motivated? The average for the 'crowd' is 1 tag per person. What kind of 'task' can this 'crowd' do? Mobile games for 'Ships', 'Covers' and 'Portraits'; interface for tagging
  62. Adam Crymble: Crowdsource Arcade. What if crowdsourcing looked like this? 30-minute talk, Labs Symposium (2015); 5-minute interview (2015). Game jam: using arcade games to help tag images ('Art Treachery' and 'Tag Attack')
  63. James Heald: Wikimedia and map work. Labs Symposium (2015). Geotagging maps: 50,000 maps found in the Flickr 1 million. Human & computational tagging & community engagement. Geo-referencing work
  64. SherlockNet: Karen Wang, Luda Zhao and Brian Do. Using convolutional neural networks to automatically tag and caption the British Library Flickr Commons 1 million image collection: 12 categories, >15.5 million tags added, >100,000 captions. Pooled surrounding OCR text on the page from similar images. Used Microsoft COCO (photographs) & British Museum Prints and Drawings collections as training sets. Tags; captions
  65. Visibility: what happened to our Flickr images? Understanding the value / impact of making the BL’s data open / in the public domain. Peter Balman developed an analytics dashboard for the Library showing what is happening to our open images. The number one use was? Challenge details:
  66. David Normal, artist: ‘It was beyond my wildest dreams’
  67. Artistic / creative works: Mario Klingemann, code artist / curator; Kris Hoffman, animation for Fashion Week; 'Tragic Looking Women'; '44 Men who Look 44' (notice the direction of the faces); 'A Hat on the Ground Spells Trouble'; David Normal, collages/paintings & lightboxes, 2018 Lumen Prize winner
  68. Hey there Young Sailor! Ling Low, 2016 MS. @SWEETNLOWFILMS on Instagram; @SWEETNLOWLING on Twitter. The Impatient Sisters. Play to fade!
  69. Alanna Hilton British Fashion Colleges Council and Teatum Jones
  70. Fashion Presentation @ London Fashion Week Nabil Nayal SS19: The Library Collection
  71. mahendra.mahey@bl.uk; Another Intelligence Sings
  72. Imaginary Cities exhibition 2019 (Michael Takeo Magruder): an artistic exploration seeking to create provocative fictional cityscapes for the Information Age from the British Library’s digital collection of historic urban maps. Virtual reality with Unity 3D. 4 April to 14 July 2019. Together with Algorave!
  73. Labs mindset… 1. Start a conversation, generate positive energy, be nice and kind, have fun and try to support ideas. 2. Start with small experiments, but think big! 3. Fail faster (don’t be afraid) and persevere. 4. Reject perfectionism! Good enough is sometimes…good enough! 5. Celebrate the uses of digital collections, tell the world!
  74. Thank you 004407871287026
  75. Questions? Prompt Question I didn’t understand…. Can you tell me more about… Why did you… I am not sure about… What if… Why didn’t you… What’s the best thing about… What was the worst thing… If you could have your time again, … How did you… I am not sure I agree about… What was the biggest challenge… What was the most successful thing about… Who did…
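The research pattern on slide 50 (get human 'ground truth', write code, try it on messy content, tweak, then let humans sift the results) can be sketched in Python, the language slide 57 mentions for the Labs Jupyter notebooks. This is a hypothetical illustration, not Labs' own code: it swaps in fuzzy string matching from Python's standard `difflib` module, and the OCR lines, the ground-truth indices, the 0.8 threshold and the function names `fuzzy_find` and `precision_recall` are all assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_find(lines, target, threshold=0.8):
    """Return indices of lines containing a window similar to `target`.

    A window the length of the target slides across each line, so OCR
    errors such as '1' for 'l' can still match. `threshold` is the minimum
    SequenceMatcher ratio (0.0 to 1.0) counted as a hit; it is the knob
    you "tweak if necessary".
    """
    target = target.lower()
    n = len(target)
    hits = []
    for i, line in enumerate(lines):
        text = line.lower()
        best = 0.0
        # At least one window, even when the line is shorter than the target
        for start in range(max(1, len(text) - n + 1)):
            window = text[start:start + n]
            best = max(best, SequenceMatcher(None, window, target).ratio())
        if best >= threshold:
            hits.append(i)
    return hits


def precision_recall(predicted, truth):
    """Compare predicted hits against human ground truth."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical OCR output; a human marked indices 0-2 as ground truth.
ocr_lines = [
    "MRS FOLLY will perform at the Theatre Royal",
    "Mrs Fol1y, by desire of several ladies",
    "Mr5 Polly's new comedy",
    "Shipping news from Liverpool",
]
ground_truth = [0, 1, 2]

hits = fuzzy_find(ocr_lines, "Mrs Folly")
print(hits, precision_recall(hits, ground_truth))
```

With these sample lines the sketch finds the first two lines but misses "Mr5 Polly's", so recall against the ground truth is about 0.67. Lowering the threshold would catch it at the risk of more false positives, which is where the "tweak if necessary" and "human sift" steps come in.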

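Slide 51 describes machine learning as machines acquiring 'knowledge' from data and using it to identify patterns. As a hypothetical illustration only, loosely inspired by the verse-mining project on slide 56 but not the method that project used, a multinomial naive Bayes classifier learns word frequencies from a few hand-labelled snippets and then labels new text. The training snippets, the 'verse'/'prose' labels and the function names `train` and `classify` are assumptions; a real project would need far more ground truth.

```python
import math
from collections import Counter


def train(examples):
    """Learn per-label word counts from (text, label) pairs."""
    word_counts = {}          # label -> Counter of words seen with that label
    label_counts = Counter()  # label -> number of training snippets
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total)  # prior probability
        denominator = sum(counts.values()) + len(vocabulary)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score
            score += math.log((counts[word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labelled snippets (the human 'ground truth').
training = [
    ("o thou fair nymph whose charms my heart", "verse"),
    ("sweet muse inspire my humble lay", "verse"),
    ("to be sold by auction at the coffee house", "prose"),
    ("ran away from his master a servant", "prose"),
]
model = train(training)
print(classify(model, "fair muse my heart"))
print(classify(model, "sold at the coffee house"))
```

The design choice worth noting is add-one smoothing: without it, a single word never seen with a label would make that label's probability zero, however well the rest of the text matched.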
Speaker notes

  1. 6 seconds (20 words) BL Labs focuses on getting people to experiment with its digital collections, things that are already <CLICK> born digital<CLICK> or digitised.
  2. 12 seconds (37 words) Put another way, we are trying to match our audience's research needs and digital interests <CLICK> with the digital collections we have <CLICK>. It is at this intersection that Labs works best, and it usually starts with a conversation.
  3. 24 seconds (72 words) Let’s look a little further at the types of interactions we have with our researchers. We have summarised these as three phases: ‘Exploration’, where people often ‘rethink’ their ideas of what they want to do with the data; ‘Query-focussed’, where they often have to iterate to come up with a realistic proposal of what they want to do; and a ‘Wrap-up’ phase to end their project with us, if relevant.
  4. 26 seconds (78 words) The ‘exploration’ phase allows the researcher to understand the data in an open-ended fashion, discover potential tools to work with the data, gain awareness of their own capabilities and limitations and develop a firmer research query, gauging costs, resources, risks and the realistic time needed to complete the project.<CLICK> The outputs of this exploration are not necessarily intended to be shareable, beyond personal experience and identifying key features of their enquiry (data size, formats, tool successes, etc.).
  5. 43 seconds (129 words) The ‘query-focussed’ phase allows the researcher to develop a firmer and more informed query, where suitable datasets are already lined up and there is a good idea of the initial toolset and capabilities required, that is, the human and technical requirements. The project outputs are outlined, and relevant reuse applications are begun. There are clear agreements on what happens at the end of the project: data deletion, virtual machine deletion/archiving, etc. The project may iterate on initial ideas, depending on cost and on the researcher’s and the BL’s appetite for risk.<CLICK> This phase may typically be supported by the Library through our new Digital Research Support process, where researchers can get up to 5 days of support to further develop their project ideas. More about this later.
  6. 33 seconds (99 words) Finally, when working on projects it’s important that there is a wrap-up phase. Here, the Library may give back the researcher’s work (such as code and notes) through an export from BL-hosted tools (especially for those that are onsite). Also, all derivative data is licensed or retained based on reuse agreements (such as our Access & Reuse board, etc.). Provisions are made for the project to be wound down, as agreed (for example, derivative data is deleted after a grace period, or hosted by the Library if requested by the researcher and appropriate for further re-use by others).
  7. 76 seconds (228 words) So let’s have a very brief overview of our digital collections, datasets and derived data. <CLICK> We have thousands of playbills from theatres, cuttings from magazines, books and millions of digitised newspaper pages, including their Optical Character Recognition (OCR) text.<CLICK> We have been using external platforms such as Flickr and Wikimedia Commons to host our digital collections, because this is often a more effective way to make them visible on the internet. We have of course been helping develop the Qatar Digital Library, making digitised manuscripts from the Middle East available to all. The International Dunhuang Project makes digitised manuscripts from China available. The Polonsky Foundation is helping us make Hebrew manuscripts accessible, and we have thousands of geo-referenced historic maps as well as an online crowdsourcing geo-referencer tool.<CLICK> We are making millions of records from UK and Irish national library catalogues available through our British National Bibliography service.<CLICK> We can provide usage data from our readers. EThOS holds all UK PhDs, either born digital or digitised, and, as previously mentioned, we have the UK Web Archive.<CLICK> We have been recording English-language TV news broadcasts since 2010 and archiving historic and current UK radio programmes.<CLICK> We have derived data from the Digital Music Lab project, which analysed world and traditional music to look for similarities across countries, as well as digitised sheet music and digitised environmental sounds, music and oral history.
  8. This West African carrying case is probably from Nigeria, and was made for a richly illuminated copy of the Qur’an (OR 13706A). It is made of leather, fabric and pulp board, and probably dates from the nineteenth century. In West Africa, manuscripts were not usually bound, and cases such as this one were made to protect the loose leaf text block.
  9. 40 seconds (124 words) Because of limited time and resources in Labs, we didn’t spend much of it cleaning the data before putting it on … We embraced ‘dirty data’ somewhat. Our data therefore comes with a health warning, especially for those of you who would like to carry out computational research on cultural heritage data, as it tends to be pretty messy for computers to make sense of. The problem is that computers think U.S., U. S., U.S.A., U. S. A., United States and United States of America are six different places. Fields also contain things like internal notes about potential duplicates and unexpected extra information (notes on what type of location, etc.). There are lots of inconsistencies: uncertainty and date ranges are expressed in different ways.
  10. 35 seconds (107 words) OpenRefine is an amazing tool which we have been using to clean up data. For example, it will suggest ways to make the data more consistent. You can then export the data and keep working on it with other tools, or bring it back into OpenRefine. Because it runs locally, it can be used for sensitive data you mightn't put online. One issue is that libraries tend to use question marks to record uncertainty in attribution, but Refine's clustering ignores punctuation when comparing values, so you have to be careful about preserving things like that (if that's what you want). It also takes in various data formats.
  11. 25 seconds (77 words) We are also learning that our digital collections will need significant curation to make them more accessible and reusable to researchers. At the moment, the definition of a collection really comes from the effort of digitising it as a digitisation project, which can be meaningless to a researcher. We believe new roles are emerging, for researchers and perhaps for libraries, that enable the characterisation, management and curation of data for meaningful research by scholars.
  12. 22 seconds (67 words) We are learning that only a small group of the researchers Labs is working with possess the digital skills to use digital research methods. Many lack them, including library staff.<CLICK> Should they be teamed up with those who have those skills, such as computer scientists, or should there be a focus on training, such as Software, Library and Data Carpentry courses for librarians and budding digital humanists?
  13. 40 seconds (121 words) Our updated data strategy sees research data as integral to our collections, research and services, as text is today. The strategy is structured around 4 central themes.<CLICK> Data Management involves creating data management plans and processes to meet our obligations under funding council requirements.<CLICK> Data Creation involves creating datasets derived from our collections, and supporting those who want to do the same.<CLICK> Data Archiving and Preservation: datasets collected and created by the Library will be archived and preserved in line with its other collection policies.<CLICK> Data Access, Discovery & Reuse: the Library ensures that there is appropriate discovery, access and reuse of the datasets it holds, as well as those available from third parties.<CLICK> A useful email address and websites are displayed should you want to make further investigations.
  14. 6 Seconds (19 Words) ‘How’ do we try to engage those who might be interested in the BL’s digital collections and data?
  15. 75 seconds (225 words) Here are the kinds of digital research methods our digital scholars are using.<CLICK> For example, searching for items based on time and location can reveal very interesting patterns, e.g. when and where works were published. Geotagging digitised objects, putting them in space, can add new dimensions to the kinds of research questions we might want to ask. <CLICK> Corpus analysis and text mining are methods which can find patterns in language through computational analysis of text.<CLICK> Tasks that require humans to use technology to complete work that computers would find hard fall under the areas of crowdsourcing and human computation.<CLICK> Annotation involves augmenting an item with additional information, usually text.<CLICK> Similarly, transcribing can be the conversion of speech into text, through human or computing power, for further analysis. <CLICK> Providing Application Programming Interfaces, or APIs, to data can be a very powerful way to give computational access to datasets, used by software developers to build software applications, for example. <CLICK> Many researchers want to see the patterns that are emerging in large amounts of data and are now using a number of very powerful tools to visualise them. <CLICK> What is clear is that digital methods are much more than searching for an individual item in a catalogue, and libraries, publishers, service and content providers have to change to support that.
  16. 21 Seconds (65 Words) Katrina Navickas was particularly interested in the <Click> Chartist Movement, a group campaigning for the vote for working people. <Click> They were the biggest popular movement for democracy in 19th-century British history, as this early picture of a huge monster meeting at Kennington Common shows. <Click> She wanted to use a combination of manual and computational methods to explore our digitised newspapers, find out when and where they met and plot the meetings on a map, <Click> hopefully unearthing new history.
  17. 27 Seconds (82 Words) Adam Crymble <Click> wanted to harness the fun of playing games on arcade machines to help crowdsource the tagging of undescribed images. He particularly wanted to engage a younger audience in crowdsourcing. <Click> On the right you can see a replica 1980s arcade machine we built <Click> and on the bottom left some tagging games that were developed for the machine through a ‘Game Jam’. <Click> Let’s take a closer look at two of the games… <Click>
  18. 18 Seconds (56 Words) Indexing the BL 1 million & Mapping the Maps was led by James Heald in collaboration with others. <Click> They produced an index of the 1 million 'Mechanical Curator collection' images on <Click> Wikimedia Commons from a collection of largely undescribed images. <Click> This led to finding 50,000 maps within the collection, partly through a map tag-a-thon. <Click> These are now being geo-referenced. <Click>
  19. Nabil Nayal is a fashion designer, based in Leeds. His Spring-Summer 2019 fashion collection takes its inspiration from digitised Elizabethan-era manuscripts here at the British Library. He has had several fashion shows, events, and commissions - including one at the British Library (which is pictured above). Nabil is working with the British Library and Fashion Colleges Council to organise a fashion competition for final year undergraduates and Masters fashion students.
  20. 15 seconds (47 Words) Start a conversation, generate positive energy, be nice, have fun and try to support ideas.<CLICK> Start with small experiments, but think big! <CLICK> Fail faster (don’t be afraid) and persevere. <CLICK> Reject perfectionism! Good enough is sometimes…good enough! <CLICK> Celebrate the uses of digital collections, tell the world!
  21. 19 seconds (58 Words) shukraan lakum (thank you). I would like to thank the organisers of the conference for getting me here, and particularly my lovely colleague Milena Dobreva for thinking of me to give the keynote. I’m here until Saturday, so please feel free to speak to or email me; I am very approachable. I believe we have 5 minutes for questions.
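Speaker notes 9 and 10 describe place-name variants that computers treat as different places, and warn that OpenRefine's clustering can lose the question marks libraries use to record uncertainty. The sketch below is a simplified Python approximation of OpenRefine's 'fingerprint' key-collision method (trim, lowercase, drop punctuation, fold accents, sort unique tokens), not OpenRefine's own code. To keep the cataloguer's doubt, it hypothetically splits a trailing '?' into a separate flag before computing the key; the function names `fingerprint`, `split_uncertainty` and `cluster` and the sample values are assumptions.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key: trim, lowercase, drop punctuation,
    fold accents to ASCII, then sort and de-duplicate the tokens."""
    value = value.strip().lower()
    value = "".join(ch for ch in value if ch not in string.punctuation)
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    return " ".join(sorted(set(value.split())))


def split_uncertainty(value):
    """Separate a trailing '?' (the cataloguer's doubt) from the value."""
    value = value.strip()
    uncertain = value.endswith("?")
    return value.rstrip("?").strip(), uncertain


def cluster(values):
    """Group values whose fingerprints collide, keeping each uncertainty flag."""
    groups = defaultdict(list)
    for raw in values:
        cleaned, uncertain = split_uncertainty(raw)
        groups[fingerprint(cleaned)].append((raw, uncertain))
    # Only keys shared by more than one value are candidates for merging
    return {key: members for key, members in groups.items() if len(members) > 1}


places = ["U.S.A.", "USA", "U.S.A. ?", "United States", "England"]
for key, members in cluster(places).items():
    print(key, members)
```

Here `fingerprint("U.S.A. ?")` and `fingerprint("U.S.A.")` both give `"usa"`, so merging on the key alone would silently drop the question mark; carrying the flag alongside each value avoids that. "United States" still gets a different key (`"states united"`), so a curated lookup table, or a human, is still needed to reconcile full names with abbreviations.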