Presentation to the London Psychology Group


  1. @BL_Labs @britishlibrary @labs@bl.uk @bl_digischol http://www.bl.uk/projects/british-library-labs Funded by the Andrew W. Mellon Foundation. Running since March 2013. 1400 - 1430, 2 July 2018, London Psychology Librarians Group Meeting, Dickins Room, Conference Centre, British Library
  2. Challenges it addresses
     • Money spent on digitising / capturing digital: return on investment, how it is being used and what value and impact it is having, especially when opening collections for all.
     • What digital collections are there that can be used openly / onsite and how do we tell people?
     • How do we explore the feel / shape of collections at scale?
     • How do we find, explore, augment discovery in often ‘messy’ cultural heritage data without public APIs?
     • How do we discover, celebrate old culture & remix to create new culture?
  3. Collections – not just books! > 180* million items: > 0.8* m serial titles; > 8* m stamps; > 14* m books; > 6* m sound recordings; > 4* m maps; > 1.6* m musical scores; > 0.3* m manuscripts; > 60* m patents. King’s Library. *Estimates
  4. Digital collections and Datasets. Playbills, Books, Newspapers (includes Optical Character Recognition (OCR)) British National Bibliography http://bnb.data.bl.uk http://sounds.bl.uk http://dml.city.ac.uk/ Music (Recordings & Sheet) & Sounds http://goo.gl/frSMJt Broadcast News (TV and Radio) http://goo.gl/cwThHw http://goo.gl/pBkisZ http://goo.gl/E8aRyQ Usage data EThOS Web Archive Images, Manuscripts & Maps http://www.qdl.qa/ Qatar Digital Library http://idp.bl.uk/ International Dunhuang Project Maps http://www.bl.uk/maps/ Hebrew Manuscripts http://goo.gl/4sbCp9 Flickr & Wikimedia Commons https://goo.gl/LZRmaZ
  5. Finding Open Cultural Heritage Datasets http://labs.bl.uk/Digital+Collections Collection Guides (203 as of 02/07/2018) https://www.bl.uk/collection-guides/ Datasets about our collections: bibliographic datasets relating to our published and archival holdings. Datasets for content mining: content suitable for use in text and data mining research. Datasets for image analysis: image collections suitable for large-scale image-analysis-based research. Datasets from UK Web Archive: data and API services available for accessing UK Web Archive. Digital mapping: geospatial data, cartographic applications, digital aerial photography and scanned historic map materials. https://data.bl.uk Download collections as zips, no API. Each dataset has a Digital Object Identifier (DOI) that can be referenced for research. Not all discoverable via search engines!
  6. Digital research methods: Digital Scholarship; Visualisations; Application Programming Interfaces (APIs) for datasets e.g. Metadata, Images, etc; Transcribing; Annotation; Location based searching & Geo-tagging; Corpus analysis, Text Mining & Natural Language Processing; Crowdsourcing; Human Computation. X In 20 years’ time?
  7. Messiness in historical data
     • 'Begun in Kiryu, Japan, finished in France'
     • 'Bali? Java? Mexico?'
     • Variations on USA: U.S.; U.S.A; U.S.A.; USA; United States of America; USA ?; United States (case)
     • Inconsistency in uncertainty: U.S.A. or England; U.S.A./England ?; England & U.S.A.
  8. Open Refine http://openrefine.org/
  9. Characterising / learning the shape of your data http://blogs.bl.uk/digital-scholarship/2013/09/data-exploration-through-visualisation.html
  10. http://dirtdirectory.org
  11. Who do we work with? Researchers https://goo.gl/WutNyi Artists http://goo.gl/nNKhQ2 Librarians Curators https://goo.gl/9NWZUW Software Developers https://goo.gl/7QQ5Tf Archivists https://goo.gl/x7b4tg Educators https://goo.gl/qh01Mi Entrepreneurs https://goo.gl/Fx8RG7 Working and Communicating: Examples, Experiences, Challenges, Lessons Learned
  12. Example Pattern of Projects 1, 2, 3: 1. Find / identify new things in messy stuff 2. Unlock hidden history / data 3. Celebrate new discoveries
  13. What did people actually do? Over 200 examples (including sound, video) from Competition and Awards: http://labs.bl.uk/Ideas+for+Labs http://labs.bl.uk/Other+Uses+of+Collections
  14. An example…
  15. Worked better for female faces than men’s Press http://mechanicalcurator.tumblr.com Posts image every 30 minutes http://www.flickr.com/photos/britishlibrary/ 1,020,418 images need tagging! Creative uses of images Face recognition Algorithms based on photos Mechanical Curator with an algorithmic brain (Circles, Squares and Slanty etc) http://goo.gl/qPPgxX Wikimedia Flickr Commons Individual URL & API Snipping out images from 65,000 Digitised Books* >800,000,000* views >17,000,000* tags https://goo.gl/FgZ4HM Work @ BL by Ben O’Steen, Labs & Digital Research Team *Matt Prior - http://goo.gl/j29Tnx Since Dec 2013 Tumblr *Estimates >More demand to see physical items
  16. Tagging a million images Iterative Crowdsourcing James Heald Mario Klingemann Chico 45 Use computational methods Human Tagger Top British Library Flickr Commons Taggers 18 hard core taggers How to reward and keep motivated this ‘small group’? Average for ‘crowd – 500K’ is 1 tag per person What kind of ‘task’ can this ‘crowd’ do? 50,000 Maps found, >32,901 georeferenced http://goo.gl/0APpE8 Sherlocknet http://goo.gl/HNQq5e Crowdsource Arcade
  17. Artistic / Creative Works http://goo.gl/dM8ieA Mario Klingemann Code Artist / Curator http://goo.gl/bNxGZZ Kris Hoffman Animation for Fashion Week https://goo.gl/QilqqT Jiayi Chong - Animation tool https://www.facebook.com/RealmlandStory/ Paul Rand Pierce Graphic Novel on Facebook Tragic Looking Women 44 Men who Look 44 (Notice the direction faces) A Hat on the Ground Spells trouble https://www.youtube.com/watch?v=Q3SBxO34Zlc David Normal Collages/Paintings & Lightboxes
  18. Imaginary Cities Exhibition 2019 (Michael Takeo Magruder). An artistic exploration seeking to create provocative fictional cityscapes for the Information Age from the British Library’s digital collection of historic urban maps. Virtual Reality with Unity 3D
  19. Experiments with Data Mining & Machine Learning: Frederick Douglass, Ellen Craft, Josiah Henson, Ida B Wells
  20. Psychiatrist’s Journey into 19th Century Newspapers (2016)
     • Dr Surendra P Singh, Consultant Psychiatrist
     • To identify weekly, monthly, yearly and longitudinal trends in suicide reporting in terms of gender, status, sites, locations and health in OCR text of 19th Century Newspapers
     • Used ‘R’ Open Source Stats Package to collect ‘Suicide’ corpus
     • Looking for collaborators to work on this dataset
  21. #bldigital 3 %* digitised * estimate Digital Partnerships Commercial & Other Organisations Bias in digitisation http://goo.gl/bR9UJL Sample Generator 15 %* Openly Licensed – most online 85 %* Available onsite only at the moment Digitisation / Curating Born Digital costs money, time, resources http://www.turing.ac.uk Digital increasing rapidly Born Digital http://www.webarchive.org.uk/ukwa/
  22. Have you got X? https://upload.wikimedia.org/wikipedia/commons/5/50/Real_wuerzburg.jpg Looking for Physical Content in the British Library
  23. Have you got X digitised / in digital form? http://www.yorkmix.com/wp-content/uploads/2014/04/mr-simms-sweet-shoppe-york.jpg Looking for Digitised / Digital Content in the BL
  24. Our Audience and Collections: Audience research & Digital interests; Digital collections we have. This is where Labs works. It starts with a conversation!
  25. What engagement does the BL have with people wanting to use our digital content? https://goo.gl/qpCLlk https://goo.gl/wMTS3Z
     • Dialogue typically:
       – you are ‘lucky’ & we have the digital content / data relevant to your project
       – we don’t have exactly what you’re looking for, but is there anything of interest? Let’s talk…
       – engagement can be hard work and it’s constantly required to maintain interest in our digital collections!
     • We also tend to attract projects with ‘fuzzier’ boundaries and possibly open to more interdisciplinary / collaborative research
     • Artists / Creatives find this dialogue easier…
  26. The Story of the Digital Collection… Digital Collection Curator Who paid for the digitisation? Who did the digitisation? Technology used Born digital? Published Unpublished Where is it? Access / API? Can it still be accessed? Generates income Reputational risk in using? Legalities / Ethics / Morality Politics when digitised Personalities involved Surprises (e.g. gaps) Descriptive information Old format not supported What media was the digitisation done from? Is there any background documentation? No Descriptive information Inconsistent descriptive information Still there? Good to know the background ‘story’ of a Digital Collection if you want to use it for projects …
  27. How? Competition: Tell us your ideas of what to do with our digital content (2013-16). Awards: Show us what you have already done with our digital content in research, artistic, commercial, learning and teaching, staff categories. Projects: Talk to us about working on collaborative projects. Engagement: • Roadshows • Events • Meetings • Conversations. New! Digital Research Support: Tell us your ideas of what to do with our digital content
  28. Digital Research Support Application Process
     • Complete online form - https://goo.gl/Kgaq8d
     • Entries reviewed and selected at the beginning of the month
     • Up to 5 days support provided
     • Technical, curatorial and legal advice
     • Scope, Costs, Time, Risks
     • Any other relevant issues?
     • Look at Living Knowledge Vision
  29. Living Knowledge Vision (2015 – 2023) Custodianship Research Business Culture Learning International. To make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment and be the most open, creative and innovative institution of its kind by 2023 (50 year anniversary). Document: http://goo.gl/h41wW7 Speech: https://goo.gl/Py9uHK Roly Keating (Chief Executive Officer of the British Library)
  30. Phases of interaction at BL Labs. Submit idea for support. Ideas always change once people experience the data and culture of the organisation
  31. Summary…
     1. Labs tries to start a conversation, generate positive energy, encourages fun/play/experimentation and tries to support ideas.
     2. Start with small experiments, use can be really simple, but OK to think big!
     3. Fail faster (don’t be afraid) and persevere.
     4. Reject perfectionism! Good enough is sometimes…good enough!
     5. Services that allow useful exploration of cultural heritage data are rare!
     6. Exploring data is difficult to do with large datasets and often requires specific skills and capabilities that many of our users don’t have – training or collaborations?
     7. Celebrate the uses of digital collections, tell the world!
     8. Success is sometimes about the right people, place & right time… https://goo.gl/noASfl

Editor's notes

  • 85 seconds

    The picture you can see is inside the main building in London. It’s the King’s Library – King George the Third’s personal library! Sometimes known as the ‘stack’, I walk past this every day and I sometimes forget that the collections the British Library has are truly staggering! We currently estimate them to exceed <click>150 million items, representing every age of written civilisation and every known language. Our archives now contain the earliest surviving printed book in the world, the Diamond Sutra, written in Chinese and dating from 868 AD….
    So some big numbers…
    Over …<click>14 million books
    <click>60 million patents
    <click>8 million stamps
    <click>4 million maps
    <click>3 million sound recordings
    <click>1.6 million music scores
    <click>over .3 million manuscripts
    <click>0.8 million serial titles (which are of course made up of many, many volumes/editions). This is where a lot of our content is, just in case you thought the numbers didn’t add up!
  • 76 seconds (228 words)

    So let’s have a very brief overview of our digital collections, datasets and derived data. <CLICK>

    We have thousands of playbills from theatres, cuttings from magazines, books and millions of newspaper pages digitised, including their Optical Character Recognition (OCR) text.<CLICK>

    We have been using external platforms to host our digital collections because this is often a more effective way to make them more visible on the internet, such as Flickr and Wikimedia Commons. We have of course been helping develop the Qatar Digital Library, making digitised manuscripts available from the middle east to all. The International Dunhuang Project makes digitised manuscripts from China available. The Polonsky foundation is helping us make Hebrew Manuscripts accessible and we have thousands of geo-referenced historic maps as well as an online crowdsourcing geo-referencer tool.<CLICK>

    We are making millions of Library data available from UK and Irish National Library catalogues through our British National Bibliography service<CLICK>

    We can provide usage data from our readers. EThOS holds all UK PhDs, either born digital or digitised, and there is also the UK Web Archive.<CLICK>

    We have been recording English language TV news broadcasts since 2010 and archiving historic and current UK radio programmes.<CLICK>

    We have derived data from the Digital Music Lab project which analysed world and traditional music to look for similarities across countries, digitised sheet music and digitised environmental sounds, music and oral history.
  • 56 seconds (169 words)

    Despite our digital collections being a small fraction of our physical holdings and over 85% only being available onsite, here are some ways you can find out about our openly licensed cultural heritage collections. <CLICK>

    First, on the Labs website we have created a guide pointing to over 100 digital collections. Then as of today, curators have created nearly 200 collection guides by subject, each one having a section on what is available digitally onsite and online if relevant.<CLICK>

    As part of the Labs project and overall data strategy for the Library we have created a data service, ‘data.bl.uk’, where users can download over 100 datasets. Importantly, it provides the ability to download entire collections instead of single items. Each collection is treated as a dataset with its own citable Digital Object Identifier (DOI) for replicable research purposes. The site also includes derived data from experiments that have been carried out on our digital collections.<CLICK>

    Please note that not all of these datasets are discoverable on all search engines.
  • 75 seconds (225 words)

    Here are the kinds of digital research methods our digital scholars are using.<CLICK>
    For example, searching for items based on time and location can reveal very interesting patterns, e.g. when and where works were published. Geotagging digitised objects, putting them in space, can add new dimensions to the kinds of research questions we might want to ask.
    Corpus analysis and text mining are methods which can find patterns in text through computational analysis.<CLICK>
    Tasks that require humans to use technology to complete something that computers would find hard fall under the area of Crowdsourcing and Human Computation.<CLICK>
    Annotation involves augmenting an item with additional information, usually text.<CLICK>
    Similarly transcribing can be the conversion of speech into text through human or computing power to then be used for further analysis.
    Providing Application Programming Interfaces or APIs to data can be very powerful ways for computational access to datasets, used by software developers to build software applications for example.
    Many researchers want to see the patterns that are emerging in large amounts of data and are now using a number of very powerful tools to visualise them.
    What is clear is that digital methods are much more than searching for an individual item in a catalogue, and libraries, publishers, service and content providers have to change to support that.
  • Examples from the Cooper Hewitt collection. I spent 3/5 of my time at the Cooper Hewitt just trying to get the data clean enough to vaguely represent the collection. The problem is that computers think U.S., U. S. , U.S.A., U. S. A. , United States, United States of America are six different places.

    Fields also contain things like internal notes about potential duplicates, unexpected extra information - notes on what type of location, etc. Lots of inconsistencies - uncertainty and date ranges expressed in different ways.

    More common GLAM issues - What year is 'early 18th century'? What do you do with '1836 (probably)'?
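    The two date questions above can be made concrete with a small sketch. It is only an illustration, not the method used at Cooper Hewitt: the function name `parse_date`, the `DateGuess` record, and the convention that "early" means the first third of a century (with the 18th century taken as 1700 to 1799) are all assumptions made for this example. The idea is to keep the raw value, turn imprecision into a year range, and record expressed doubt as a separate flag.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DateGuess:
    start: int       # earliest plausible year
    end: int         # latest plausible year
    uncertain: bool  # True when the record itself expresses doubt
    raw: str         # original catalogue value, kept for auditing


# Assumed convention (not from the source): thirds of a century.
_PARTS = {"early": (0, 33), "mid": (34, 66), "late": (67, 99)}


def parse_date(raw: str) -> Optional[DateGuess]:
    """Turn a free-text catalogue date into a year range, or None if unparseable."""
    text = raw.strip().lower()
    # Doubt markers are recorded as a flag rather than thrown away.
    uncertain = bool(re.search(r"\?|probably|possibly|circa", text))

    century = re.search(
        r"(?:(early|mid|late)[\s-]+)?(\d{1,2})(?:st|nd|rd|th)\s+century", text
    )
    if century:
        base = (int(century.group(2)) - 1) * 100
        lo, hi = _PARTS.get(century.group(1), (0, 99))
        return DateGuess(base + lo, base + hi, uncertain, raw)

    year = re.search(r"\b(\d{4})\b", text)
    if year:
        y = int(year.group(1))
        return DateGuess(y, y, uncertain, raw)

    return None  # leave anything else for a human to review


for value in ["early 18th century", "1836 (probably)", "unknown"]:
    print(value, "->", parse_date(value))
```

    Keeping `raw` alongside the parsed range means a curator can always check what the record actually said, and a different convention for "early" can be applied later without losing information.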
  • Open Refine is an amazing tool, and I wouldn't have gotten anywhere at Cooper Hewitt without it. It will suggest ways to make the data more consistent. You can then export the data and keep working on it in other tools, or put it into Open Refine. Because Refine runs locally it can be used for sensitive data you mightn't put online.

    One issue is that GLAMs tend to use question marks to record uncertainty in attribution, but Refine strips out all punctuation, so you have to be careful about preserving it (if that's what you want).
    Takes in TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML, and Google Data documents.

    http://freeyourmetadata.org/cleanup/ useful advice
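    To show why the question marks are at risk, here is a rough Python approximation of the kind of "fingerprint" key that key-collision clustering relies on: trim, lowercase, drop punctuation, then sort the unique tokens. It is a simplified sketch, not OpenRefine's actual code, and the names `fingerprint`, `cluster` and `is_uncertain` are made up for this example. Because the key discards punctuation, the sketch records uncertainty separately before grouping.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified clustering key: trim, lowercase, strip ASCII punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def is_uncertain(value: str) -> bool:
    """Record the '?' marker before the key throws it away."""
    return "?" in value


def cluster(values):
    """Group raw values that share the same fingerprint key."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


values = ["U.S.", "U.S.A", "U.S.A.", "USA", "USA ?",
          "United States of America", "U. S. A."]

for key, members in cluster(values).items():
    flagged = [m for m in members if is_uncertain(m)]
    print(repr(key), members, "uncertain:", flagged)
```

    With these inputs, "U.S.A", "U.S.A.", "USA" and "USA ?" share one key, while "U.S.", "U. S. A." and "United States of America" each get their own, so a human decision (or a lookup table) is still needed to merge them. The "USA ?" value joins the "usa" group only because its question mark is ignored, which is exactly the information a curator may want to keep.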
  • 39 Seconds (117 words)

    We have been learning that characterising our data is a really valuable way for researchers to begin to understand what we have. Though this is pretty resource intensive, we have carried out some simple experiments. <CLICK>

    Here, you can see that an analysis of our catalogue data reveals the use of different versions of the Dewey Decimal System across the years.<CLICK>

    Secondly, in the left column you can see what looks like random data/noise. However, when grouped, we can see the dark blue visualisation indicates there is some similarity in the data, in this case it was subtitles from digitised TV broadcasts.<CLICK>

    We know this is something we should do more of, if we had more resources.
  • 23 seconds (71 words)

    Though the project focusses on working and communicating with Digital Humanities and Digital Scholarship researchers, we have also engaged with amazing Artists, Librarians, Curators, Educators, Entrepreneurs, Archivists, Software Developers and other innovators. Hopefully, I will show you<CLICK>

    some inspirational examples of work they have done which have used our digital collections.<CLICK>

    I will also reflect on our experiences, challenges and lessons we have learned working with some amazing and pioneering people.

  • Posts small illustrations taken almost at random from the digitised book corpus to a Tumblr blog.
    This experiment with undirected engagement was a by-product of work to uncover the hidden wealth of illustrations within the digitised pages.
  • 24 seconds (72 words)

    The BL are world renowned experts in digitising materials from our physical holdings. One common misconception that many people have is that much if not all of our collections are digitised. So, the actual proportion of our collections that are digitised surprises many<CLICK>

    The figure is around 3% of our physical collections.<CLICK>

    Much of our digitisation activity happens through partnerships with commercial, philanthropic, charitable and foundation partners<CLICK>

    What is certain is that the amount we are digitising is increasing rapidly. Our new programme called Heritage Made Digital, for example, prioritises those collections for digitisation where there is a clear researcher demand.<CLICK>

    One important thing we have learned is that researchers need to take heed when doing research based on our digital collections, as they are rarely complete, having gaps and not necessarily being representative of our physical collections.
  • 28 seconds (85 words)

    This is what I imagine it feels like for a researcher looking for our physical collections. <CLICK>

    Everything is on an industrial scale and it can feel overwhelming. It isn’t always straightforward to find our items, as there are many that are not on our digital library catalogue, e.g. still on card catalogues, and some items are in the secret and very secure parts of the Library where you would need very special permission, for example because the items are extremely valuable and fragile.
  • 36 seconds (109 words)

    Our digital offering is perhaps like this.<CLICK>

    Imagine entering a boutique sweet shop. We have some lovely things to tempt you, but it’s much smaller than the hypermarket you just visited. The shopkeeper tells you there are some things behind the back door in a giant warehouse. However, you will need special access to enter that space. She also states that there are rooms in that warehouse where even she isn’t allowed to look. She isn’t even allowed to share the full list of stock because there are items on there she may never be able to see because they were meant to be secret.
  • 12 seconds (37 words).

    Put another way, we are trying to match our audience’s research needs and digital interests <CLICK>

    With the digital collections we have<CLICK>

    It is at this intersection where Labs works best and it usually starts with a conversation.
  • 49 seconds (148 words)

    So what kind of conversations do we have with researchers who may want to use our digital collections and data?<CLICK>

    The dialogue typically can be: ‘Ah, you are ‘lucky’ & we have the exact digital content / data relevant to your research’, informally we call these our ‘lucky dip researchers’.<CLICK>

    Or the conversation might go like this… ‘Ah, we don’t exactly have what you are looking for, but here is what we do have, is there anything of interest that you like? Let’s talk…’<CLICK>

    We have learned that engagement can be hard work. But it’s constantly required to maintain interest in our digital collections because they aren’t all instantly discoverable on search engines.<CLICK>

    We also tend to attract researchers with ‘fuzzier’ and ‘flexible’ research boundaries and those who are possibly open to more interdisciplinary / collaborative research.<CLICK>

    Finally, we have found that artists find this dialogue easier.
  • 41 seconds (125 words)

    Our work in Labs has taught us that it always pays for researchers to know the back ‘story’ of a digital collection especially if they want to use it for research and analysis.<CLICK>

    There are too many things to consider to cover them all right now, but a few highlights are ‘are there gaps in the collection?’ and ‘can it still be accessed?’. Perhaps most important of all is whether the curator, or another person who knows about the collection, is still around to be asked about it. Our experience has told us that much of what they know will probably be in their head rather than written down, and that information could be vital to know before carrying out research or re-use.
  • 42 seconds (128 words)

    The Library focuses most of its work and collaborations through its 8-year Living Knowledge vision. Initiated in 2015 and running to 2023, the 50th anniversary of the creation of the Library, our vision is to make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment and be the most open, creative and innovative institution of its kind by 2023. The Library’s two core purposes are to build, curate and preserve the UK national collection of published, written and digital content and to support and stimulate research of all kinds.<CLICK>

    We also support businesses, helping them to innovate and grow, engage everyone with memorable cultural experiences, inspire young people and learners of all ages, and work with international partners around the world to advance knowledge and mutual understanding.
  • 24 seconds (72 words)

    Let’s look a little further at the types of interactions we have with our researchers. We have summarised these phases as ‘Exploration’ where people often ‘rethink’ their ideas of what they want to do with the data, ‘Query-Focused’ where they often have to iterate to come up with a realistic proposal of what they want to do and a ‘Wrap-up’ phase to end their project with us, if it is relevant.
  • 15 seconds (47 Words)

    Start a conversation, generate positive energy, be nice, have fun and try to support ideas.<CLICK>
    Start with small experiments, but think big! <CLICK>
    Fail faster (don’t be afraid) and persevere. <CLICK>
    Reject perfectionism! Good enough is sometimes…good enough! <CLICK>
    Celebrate the uses of digital collections, tell the world!