This presentation covers the aspects of Trove that make it a Government 2.0 showcase example. Trove is a search engine with several social engagement and crowdsourcing features.
Rose, the site you manage is a nightmare! It’s addictive. Keeps me awake at night. Congratulations! Mary
Questions?
Editor's Notes
Thank you for inviting me to speak here today. I am going to talk about changing roles for librarians and users, drawing largely on my own experiences, with particular reference to Trove and Australian Newspapers.
2009
Personally, I do not see current conditions, Google et al., as threatening libraries. Instead I see huge opportunities for libraries in 2010, which should be embraced, and a chance for libraries to demonstrate their relevance in society, now more than ever before.
At this point, since I’ve now mentioned the ‘G’ word, I want to remind you all why we need libraries and why we are different from Google, Amazon or Wikipedia. It’s because we have made some promises about our content:
Long-term preservation and access
We have no commercial motives
Universal access: “Free for all”
ALWAYS AND FOREVER
And now we are in 2010. We have firmly acknowledged that in order to give users the best service, collaboration and data sharing are key. But it goes further than this. The two direction statements the NLA is working towards are as follows:
“We will explore new models for creating and sharing information and for collecting materials, including supporting the creation of knowledge by our users.”
“The changing expectations of users that they will not be passive receivers of information, but rather contributors and participants in information services.”
Now, in 2010, hot off the press (from last week, actually), this is what I was saying, and Trove is the result so far of the Library’s strategic direction statements. I will play this short news video. (If I cannot:) In the video I talk about the importance of making data from libraries (both digital and non-digital) accessible in a search engine like Google, which is what Trove is. The Library has recognised that it is no longer enough to offer only its own data, or collaborative library data; the public want the widest possible access to the widest range of content from any cultural heritage institution or relevant organisation that is on, about or by Australians. Trove does this. It also enables the public to engage with data in new ways and add value to it, and soon they will also be able to engage with each other within the virtual community that has been created.
I am now going to talk briefly about Trove, the search engine for Australian resources. The Library had a master plan, and the Australian Newspapers service was in fact a test bed for the idea of transforming our service delivery and internal IT infrastructure in the future. Because Australian Newspapers worked so well, its beta model of software development, its underlying IT infrastructure and its approach to user engagement have been applied to all the other discovery services the Library manages, which are now rolled into Trove.
Trove is an aggregation of 90 million items from over 1000 libraries and other organisations. Its key feature is the single search across different types of content. Trove has social and data engagement features. Two of our most heavily used services are included in Trove (AN and PA). Trove aims to help you find and get unique Australian resources, and although it predominantly features library, archive, museum and gallery data, it is not limited to this.
The key features of Trove are:
1. First, and most importantly, it is a single search. In one click you can simultaneously search across several groups of information: books, journals, magazines and articles; images; Australian digitised newspapers; diaries, letters and archives; maps; music, sound and video; archived websites; and people and organisations.
2. Second, you can browse through these groups, or zones, one at a time if you prefer to seek only one type of content, for example newspapers.
3. Third, you can restrict your searches to online content only, and/or to content held in locations near you. This is a very useful feature for the large majority of users.
Results are unbiased: relevancy ranking returns the best and most relevant information possible. This mirrors the values of a good reference librarian (subject to initial choices made by the user, e.g. location or immediacy). Results are returned in the same zones that we saw on the home page, and each zone shows how many results were found. Most searches retrieve vast numbers of results because of the wealth and richness of the repository being searched, so you will probably want to refine or limit your results, which you can do using the facets on the left-hand side of the screen. The facets change depending on the content you are looking at; for example, the books, journals, magazines and articles zone has a facet to refine by braille book or audio book. We recognise that many people just want items that are immediately accessible, i.e. digitised or online, as fast as possible, so links to online content appear immediately at this stage, even though we haven’t yet drilled down to a detailed results screen. The check boxes to restrict the content to online or Australian are always visible, so they can be checked or unchecked at any point in the search. Here is a concrete example. Suppose a scholar is researching the life and works of Ethel Turner, the author of “Seven Little Australians”.
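The zone counts, the format facet and the always-visible ‘online’ and ‘Australian’ check boxes described above can be pictured as simple filters over a result list. The Python sketch below is a hypothetical illustration only: the `Record` fields, the zone names and the `refine` and `zone_counts` functions are assumptions made for the example, not Trove’s actual data model or code.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical fields for illustration; not Trove's real schema.
    title: str
    zone: str          # e.g. "book", "newspaper", "picture"
    online: bool       # digitised or otherwise available online
    australian: bool   # held in an Australian location
    formats: set = field(default_factory=set)  # e.g. {"braille", "audio book"}


def refine(results, online_only=False, australian_only=False, fmt=None):
    """Apply the check boxes and an optional format facet to a result list."""
    out = []
    for r in results:
        if online_only and not r.online:
            continue
        if australian_only and not r.australian:
            continue
        if fmt is not None and fmt not in r.formats:
            continue
        out.append(r)
    return out


def zone_counts(results):
    """Count how many results fall into each zone."""
    return Counter(r.zone for r in results)


# Hypothetical sample results for a search on Ethel Turner.
results = [
    Record("Seven Little Australians", "book", True, True, {"audio book"}),
    Record("Seven Little Australians", "book", False, True, {"braille"}),
    Record("Ethel Turner portrait", "picture", True, True),
    Record("Ethel Turner obituary", "newspaper", True, True),
]

print(zone_counts(results))
print([r.title for r in refine(results, online_only=True, fmt="audio book")])
```

Because the filters are plain arguments rather than a separate search step, they can be switched on or off at any point, which matches the behaviour of the check boxes described above.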
Through Trove that scholar is able to discover: books by and about Ethel Turner, with information on the location of those books in Australian libraries, and with access to the full content where the work is out of copyright; articles, conference papers, theses and other research dealing with Ethel Turner, including content from university open access repositories; pictures of Ethel Turner from libraries, museums and archives; newspaper articles dealing with Ethel Turner and published prior to 1955; archived websites that refer to Ethel Turner; music, sound and video resources, including audio books and information about the ABC television series of Seven Little Australians; information about papers, letters, diaries and other records relating to Ethel Turner that are held in archival collections; and biographies of Ethel Turner from sources such as the Australian Women's Register, the Dictionary of Australian Biography Online, and Wikipedia. Note that last point. Trove includes biographical data: it serves as the online interface to the data contribution program called “People Australia”. I am now going to drill down further into the results in some of the zones to show you some other features of the service, starting with the books, journals, magazines and articles zone. Let’s start by selecting the first book in the list: Seven Little Australians by Ethel Turner.
We have applied the FRBR work and version structure to resources. At the top of the screen, therefore, you can see the details of the work: Seven Little Australians. Beneath this, all the different editions (117 in this case) are grouped together in a box. Grouping them like this makes it much quicker for users to find items than listing every single item as a separate record, as is usual in a library catalogue. The online versions are always listed first in this version box, which helps users who want to ‘find and get’ as quickly and easily as possible. All versions can be expanded or collapsed if you want to see more detail. On the right-hand side are works that may be related. For books, at version level you can check the copyright status and have the citation provided in a variety of formats.
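The work and version grouping, with online versions listed first, can be sketched as follows. This is a minimal Python illustration that assumes each record already carries a hypothetical `work_key` identifying its work. Real FRBR-style grouping depends on matching inconsistent catalogue data, which is why users are invited to merge and split groups; the sketch does not attempt that matching, and it is not Trove’s implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Version:
    # Hypothetical fields; a real work key would come from metadata matching.
    work_key: str   # identifies the work, e.g. normalised author + title
    edition: str
    year: int
    online: bool


def group_into_works(versions):
    """Group versions by work, listing online versions first within each work."""
    works = defaultdict(list)
    for v in versions:
        works[v.work_key].append(v)
    for vs in works.values():
        # `not v.online` is False for online versions, and False sorts before
        # True, so online versions come first; ties are then ordered by year.
        vs.sort(key=lambda v: (not v.online, v.year))
    return dict(works)


# Hypothetical sample versions.
versions = [
    Version("turner|seven little australians", "1st ed.", 1894, False),
    Version("turner|seven little australians", "Digitised ed.", 1894, True),
    Version("turner|seven little australians", "Reprint", 1912, False),
    Version("turner|the family at misrule", "1st ed.", 1895, False),
]

for key, vs in group_into_works(versions).items():
    print(key, [v.edition for v in vs])
```

Keeping each work as one entry with a sorted list of versions is what lets a single result line stand in for, say, 117 editions, while the online copy stays at the top of the box.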
We have enabled direct linking through to bookshops that sell the item you are looking at. If no match for the item is found, suggestions are given of bookshops that may have it. We have pre-populated versions with tags and reviews from Amazon and Wikipedia, and users can also add their own tags and comments to versions. Because inconsistent data has made it difficult to put items correctly into version/edition groups, users can help improve the display by merging or splitting versions or works if they notice they are grouped incorrectly. Guidance on this is given in the help.
If you click those buttons, this is what you will see. We are supplying several different citation formats.
We are now looking at the results for people.
This is a new capability the National Library has been working on for the last couple of years, and it now enables you to find information about people and organisations (that is, biographies). Two of the key sources incorporated into it are the Australian Women's Register and the Australian Dictionary of Biography. Each person has been given a persistent identifier in the Trove service.
We are in the process of integrating Australian Newspapers fully into Trove and expect to redirect users from the blue version to the Trove version (which will have the same functionality) in June.
When viewing results in zones you have the option to expand a zone so it fills the page by using the arrow, or to minimise some or all of the other zones by using the minimise icon.
We are now viewing the original diaries of Ethel Turner, which are held in the State Library of NSW. You can see that I have now minimised the other zones on the right.
We are enhancing the user profiles. To find items in libraries near you, the service needs to know where you are, so after registering you set your library preferences in your profile. Registration is not compulsory; you only need to register if you want to. Your profile also keeps a history of your data enhancements, such as tagging, commenting, corrections, merging and splitting.
The recent interactions users have made are also displayed on the homepage for everyone to see. You can see the number of searches in the last hour, newspaper article corrections so far today, works merged or split this week, items tagged this week, and comments this month.
This is the article view. Users can zoom in or out and choose to view the article in the context of the entire page. They can also navigate to any other page within the newspaper issue. The electronically generated text created through the OCR process is displayed on the left-hand side. This is also where users can use the three enhancement features. They can drag the viewing pane to see more or less of the text. Users can tag the article with keywords, and they can write comments and notes about it. If users log in, they can choose to make their tags and comments public or private, so they can either share their comments with all users or add private research notes that only they can access. One feature that we believe is innovative, and not available in any other online newspaper service, is the ability for the user to correct the electronically generated text. There are a number of reasons why the electronically created text is not always 100% accurate, mainly the quality of the original newspaper that the image was created from. Users can correct the text by clicking on the ‘Help fix this text’ button. We will now use these features on this article. The article we are looking at is the first report in an Australian paper of the sinking of the Titanic. It’s in the Northern Territory Times on 19 April 1912.
I want to tag the article with ‘titanic sinking’. If a user does not log in when they first enter the service, then the first time they want to enhance an article they will be offered the option to log in. At this point they can either log in or enter the CAPTCHA to verify that they are human (and not a robot attempting to do something undesirable). Once logged in or verified with the CAPTCHA, a user can enter their tags.
Now I want to add a comment. Those of you who read this article may have noticed that it reported that all passengers were safely rescued from the Titanic and that the weather was calm. I’ll just add a comment to say this was unfortunately not the case.
Now I have zoomed in on the image, and if the OCR text were inaccurate I would edit it in the box on the left. This is what we call the power edit mode. In this article the text is actually very accurate, so it has either been OCR’d very well or already been corrected by someone else.
Now we can review the article with all the enhancements we have made showing on the left: tags, comments and corrections. We can view the history of all the enhancements (both ours and other people’s).
One of the innovative features in the first release was the ability for members of the public to correct or enhance the OCR text. When digitising old newspapers, the process is to convert a digital image into full text using Optical Character Recognition (OCR) software. This works well on new, clear documents, but on old newspapers, where the font and paper are of poor quality and microfilms may be out of focus, the output often turns into gibberish. After investigating every possible technical way of improving this, we came to the conclusion that the best way was by hand and human eye. We could not possibly afford to pay contractors to do this ‘re-keying’, so the lead programmer, Kent Fitch, suggested we open it up to the public. If the text were made accurate, searching would be instantly improved for everyone, since the search works over the OCR text.
Several people can correct the same article. All corrections are saved and viewable in the history of the article, and all versions of the corrections are searched. The most recent correction is the one visible in the left-hand pane. Articles are corrected by many users when they are very long, very significant or very illegible. For example, this article is in the first Australian newspaper, the Sydney Gazette and NSW Advertiser, from March 1803. Around 20 people have made corrections to this article. It is particularly challenging because of its use of the long s (ſ), which looks like an f.
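The behaviour described above (every correction kept, every version searchable, the latest one displayed) can be sketched as an append-only history. This is a hypothetical Python illustration, not Trove’s code: the `Article` and `Correction` classes are invented for the example, and including the original OCR text among the searchable versions is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Correction:
    user: str
    text: str
    when: datetime


@dataclass
class Article:
    ocr_text: str
    # Append-only history, assumed to be in chronological order.
    history: list = field(default_factory=list)

    def correct(self, user, new_text, when=None):
        """Save a correction without discarding any earlier version."""
        self.history.append(Correction(user, new_text, when or datetime.now()))

    def displayed_text(self):
        """Show the most recent correction, or the raw OCR text if none exists."""
        return self.history[-1].text if self.history else self.ocr_text

    def searchable_texts(self):
        """Assumption: the original OCR and every corrected version are indexed."""
        return [self.ocr_text] + [c.text for c in self.history]

    def matches(self, term):
        """Case-insensitive substring match over all versions."""
        term = term.lower()
        return any(term in t.lower() for t in self.searchable_texts())


# Hypothetical OCR misreading of the long s as f, fixed by two correctors.
article = Article("The fhip arrived fafely")
article.correct("corrector_1", "The ship arrived fafely")
article.correct("corrector_2", "The ship arrived safely")

print(article.displayed_text())
print(article.matches("fhip"), article.matches("safely"))
```

Never overwriting a version is what makes the correction history possible, and searching all versions means an article can still be found even if a later correction introduced a mistake.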
This is the text correction history of this article, showing all the different users and what parts they corrected.
The results have been pretty astounding, both to the National Library of Australia and to the world in general. Over 9000 users have been actively correcting text each month, and so far they have corrected 12 million lines of text. They have also been using the other features, especially tagging, to further improve the quality and depth of the article information.
People also gave these reasons for doing so much, in some cases up to 40 hours a week. After all, if you don’t enjoy it you won’t keep at it, so loving it and finding it interesting and fun were really important.
In response to numerous requests we introduced the ‘hall of fame’. The top 5 correctors are shown on the home page as well as in the hall of fame. Originally the hall of fame showed only the top 10, but users wanted to see more, so it now lists anyone who has corrected more than 5000 lines in a month. Users are still asking for full league tables, however, so that they can see where they stand in the big picture. This is a motivating factor for them. During development it was suggested that we would need to use gaming technologies to encourage people to correct text, but so far this has not proved necessary!
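The three rankings mentioned above (the home page top 5, the hall of fame threshold of more than 5000 lines a month, and the full league table users are asking for) can be sketched in a few lines of Python. Everything here is hypothetical: the function names, user names and counts are invented, and using one monthly count for both the home page and the hall of fame is an assumption, since the period behind the home page list is not stated. The `league_position` function illustrates a requested feature, not an existing one.

```python
def homepage_top(counts, n=5):
    """Return the n correctors with the most lines corrected (ties broken by name)."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [user for user, _ in ranked[:n]]


def hall_of_fame(monthly_counts, threshold=5000):
    """Return everyone with MORE than `threshold` lines this month, highest first."""
    qualifiers = [(u, c) for u, c in monthly_counts.items() if c > threshold]
    qualifiers.sort(key=lambda item: (-item[1], item[0]))
    return [user for user, _ in qualifiers]


def league_position(counts, user):
    """1-based rank of `user` in a full league table; raises ValueError if absent."""
    ranked = homepage_top(counts, n=len(counts))
    return ranked.index(user) + 1


# Hypothetical monthly line counts.
monthly = {
    "user_a": 12000,
    "user_b": 7400,
    "user_c": 5000,   # exactly 5000, so not "more than 5000"
    "user_d": 900,
    "user_e": 5600,
    "user_f": 300,
}

print(homepage_top(monthly))
print(hall_of_fame(monthly))
print(league_position(monthly, "user_d"))
```

A fixed threshold rather than a fixed top 10 lets the hall of fame grow with the community, which is the change described above.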
For example, users can now comment on other items as well as newspaper articles, e.g. images, books and archives, to share valuable information, and they can also rate items.
Tags are a means for users to group items they are interested in, highlight specific items, and provide an additional way of searching.
This is the result list for the tag ‘sinking of the centaur’.
So, after all this activity, the most common questions people kept asking me were “Who are these people?” and “Why do they do it?” Some people even suspected that the text correctors were really library staff, which is not the case. The text correctors are real, ordinary people. We sent some of them a survey to find out how long they spend correcting, why they do it, what motivates them, and what would motivate them to do more or less. The responses were very interesting.
Julie, the top corrector, has featured in the media and become a star. She loves correcting articles about Bendigo murders.
Because the work these people are doing is so valuable, the Director General of the NLA decided to honour the top correctors in the annual NLA Australia Day Awards (which are usually for library staff). The text correctors are considered part of the NLA family for the invaluable work they are doing. It was very interesting for me to finally meet these individuals in person and be able to thank them. They had not necessarily realised what an impact they were making.
We are now quite sure that the community has the enthusiasm, knowledge and time to help us.
Trove is at an early stage of development. Plans for the year ahead centre on expanding content and developing new features. The Library is interested in hearing from potential new contributors, who will be able to have their metadata harvested. Important features the Library is working on include a forum so that users can communicate with each other, alerts to users about new content, APIs that let people re-purpose content (for example, if you use Primo you may be able to use metadata from Trove), and enhanced ‘getting’ options. We don’t want users to find dead ends in Trove. I would encourage you all to use Trove for searching if you haven’t already, so that you can discover for yourself how easy it is to find a wealth of high-quality Australian information. Then please pass the word on. You could use Trove in your Library and Information Week promotions. Whether you are tracing your family history, researching a topic, reading for pleasure, teaching or studying, Trove will help you. Trove is a free service for all Australians.
When you go away today I would like you to think about the following things in the big-picture context, and about how you as a librarian have a role to play in making some of them happen:
The importance of collaboration for digitisation, storage, service delivery and crowdsourcing
“Gravity”
Building social engagement into our digital interactions: tools
What we may want crowdsourcing help with
Why we want the help: to improve quality, build social engagement and add new content
Not just serving a clientele – leading a mass movement…
To summarise what I said earlier, I think 2010 is a year of opportunity for libraries. We really need to be thinking about transforming from holding power and control over information to enabling freedom over its use, sharing and re-purposing. “Freedom is actually a bigger game than power. Power is about what you can control. Freedom is about what you can unleash.” This quote really resonates with me. Roles for librarians and users are changing.
Thank you for listening to me today. I am happy to take questions.