data.ac.uk briefing paper
1. What should be at data.ac.uk? Briefing Paper and Recommendations
2. The Primary Community Concerns
Overall themes from community consultation (see quotes in the notes section below):
Opening up data is something in which universities should now be demonstrating good practice.
Without a collective voice for how universities, and as importantly individuals, should be exposing their data,
Data is different: in its various forms, including spreadsheets, it is (more than any other kind of resource) subject to being lost, as search engines do not easily facilitate data discovery to support learning and research uses.
The future of research is becoming dependent upon interrelating more granular pieces of data.
The future Web is providing us with the ability to expose data so that detailed visualisation and specific answers can be better provided for end users.
Data access and minting (permanence) of URIs needs to happen across sector, institution, department and individual in order to support this interrelation of data.
There is an urgency to enable a URL syntax (XXX.data.ac.uk) so that institutions can start to open up the small bits and pieces of data they have now; otherwise more data is lost every day.
3. So, what should be at <data.ac.uk>?
Diagram: data.ac.uk continuum (The List – The Register – The Safe Storage)
Option 1: The List – Provide a simple single web page list that points to data applicable to the Academic domain.
Option 2: The Register – A multi-page registry and catalogue that can point to datasets around the web and provide a search facility and API to find datasets (but will not store data itself).
Option 3: The Safe Storage – A single central data repository where people can come to store their data for the long term.
NOTE: Regardless of which solution or technology is selected for use at data.ac.uk, further guidance to the community is required.
4. Option 1: The List
Summary: A single web page that will list datasets applicable to Academia, along with where they can be found and who owns them (community curated).
The Problem it Solves: Will make it easier for librarians and developers to find datasets, which will in turn enable them to deliver datasets (via new tools and helpdesks) to researchers and learners for use specific to their subject niche.
The Audience: Developers and librarians will make their data discoverable via auto lookup, and librarians will help encourage end users to register their specific datasets at data.ac.uk.
Value to UK H/FEIs: Will help encourage a common way of exposing data across multiple institutional, departmental and individual websites so that others in the Academic sector can find and reuse the data.
Risks: If not popular or en vogue, could fall into disrepair and cease to be a useful resource. Could confuse end users as to the reason for having a registry that is separate from where the data is held.
Examples: Barcode registry and lookup service: http://www.keyword.com/barcode_upc.htm
Suggested Features:
Guidance on how to make your data discoverable to the list at data.ac.uk, copying the Government’s recommendations on URL sets.
A crawler tool built into the website to enable auto discovery of data from data.xyzUniversity.ac.uk sites and list them along with an owner.
A form-fill submission form for anyone with a “.ac.uk” email address to add a dataset to the list or to suggest an amendment or correction: a Wikipedia-like interface (but only one page, not multiple web pages).
5. Option 2: The Registry
Summary: A multi-page website and catalogue that can point to datasets around the web and provide a search facility and API to find datasets (but will not store data itself).
The Problem it Solves: Will increase the likelihood that small and medium sized datasets, which are otherwise not discoverable via current search and discovery technologies, are found and used by the end user.
The Audience: Researchers, teachers, developers, librarians, administrators and other academic staff who are tasked with the collection of data from across and within the institution and its departments.
Value to UK H/FEIs: Will provide a one-stop shop for acquiring metadata on a diverse range of datasets that will be categorised and listed according to various tags. Will suggest tools and methods for working with the datasets, along with examples and good practice.
Risks: Same risks as Option 1 (the List), plus... A multi-website catalogue could confuse and act as a multi-click barrier, which would dissuade the community from participating in the community curation of the website.
Examples: CKAN http://www.ckan.net/ (the same software that data.gov.uk uses). Semantic Wiki: http://semantic-mediawiki.org/wiki/Semantic_MediaWiki
Suggested Features:
Same features as Option 1 (the List), but in a multi-page wiki-like space.
Auto discovery of metadata and community curation would be the emphasis, however additional support via an editorial support team
Could provide a place for datasets that could not be hosted at an .ac.uk
Could potentially list both datasets and tools for working with the datasets.
Advice on how to use CKAN at local “data.*.ac.uk” institutions would be provided so as to provide a networked aggregated search layer.
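As a rough illustration of the registry idea above (point to datasets and make them searchable, without storing the data itself), here is a minimal in-memory sketch. The names `DatasetRecord` and `Registry`, the field names and the matching rule are all assumptions made for illustration; they are not taken from CKAN or from any proposed data.ac.uk design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata about a dataset hosted elsewhere (hypothetical fields)."""
    title: str
    url: str    # where the data actually lives; the registry holds only the link
    owner: str
    tags: frozenset = field(default_factory=frozenset)


class Registry:
    """A catalogue of records; it never stores the datasets themselves."""

    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def search(self, term):
        """Return records whose title or tags contain the term, ignoring case."""
        needle = term.lower()
        return [
            r for r in self._records
            if needle in r.title.lower()
            or any(needle in tag.lower() for tag in r.tags)
        ]
```

A record would be added with something like `registry.register(DatasetRecord("Course list", "http://data.xyzUniversity.ac.uk/courses.csv", "xyzUniversity", frozenset({"teaching"})))`, where the URL is a made-up example in the style used elsewhere in this paper.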
6. Option 3: The Safe Storage
Summary: A full search and retrieve repository that is able to list and hold data in a single place; this would act as both a registry and as a datastore that would assure the per
The Problem it Solves: This would attempt to bring all data into a central storage facility so that it could be managed and curated by a central team and staff.
The Audience: All UK Academics would be encouraged to place their data here for the long term so that it would be
Value to UK H/FEIs: This would encourage long term preservation of data regardless of institutional change.
Risks: Central repositories have had limited success and are often met with criticism in the community. The time and expense required to deliver a single solution, compared with what an individual institution can provide immediately, could make this system out of date prior to being launched.
Examples: National Learning Object Repository Jorum - http://www.jorum.ac.uk/
Suggested Features:
Features of both Option 1 (the List) and Option 2 (the Registry), plus...
Deposit features and long term citable persistence of URIs
Committee for selecting and minting URI domains for various subject areas (e.g. www.classicalcomposers.data.ac.uk)
Database team for assuring delivery of URIs in the long term
Transcoding team in place to assure delivery of multiple formats including CSV, RSS, ATOM, JSON and RDF
Marketing team for engaging a wider community of contributors and volunteer curators.
Business and service models as a self-sustaining, self-contained organisation.
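To give a sense of what the transcoding feature above involves at its simplest, the sketch below converts a CSV spreadsheet export into JSON using only the Python standard library. The function name `csv_to_json` is an assumption made for illustration, and the sketch assumes the first CSV row holds the column headers; delivering RSS, ATOM or RDF would need considerably more than this.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Transcode CSV text (first row = column headers) into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Every value stays a string; no type inference is attempted in this sketch.
    return json.dumps(rows, indent=2)
```

All values are kept as strings, so any typing of numbers or dates would be a separate decision for whoever publishes the data.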
7. What to choose?
Diagram: data.ac.uk continuum (Option 1: The List – Option 2: The Register – Option 3: The Safe Storage)
Recommendations:
The first step must be immediate advice to the local community, as they are ready to publish data NOW; e.g. use of the data.*.ac.uk URL syntax must be in place tomorrow.
A central hub at data.ac.uk (regardless of solution or technology) should enable the community to come together and share advice and thoughts; this should be an attitude of ‘aggregation from across the Web’ rather than ‘build it and they will come’.
While community should be first and foremost at data.ac.uk, second to this should be the urgency of government legislation for having this data open.
JISC should continue to pursue supporting activities that engage communities of end users around the use of their spreadsheets as they pertain to administration, teaching, learning and research (cite jiscEXPO and jiscMRD).
JISC should continue to help change the perception of senior management from a resource-oriented view to a parallel data-centric view of the sector.
11. It would be the institution’s decision during the bidding process to decide on what technology (list, registry, safe-store) would be applicable to support the community engagement.
Diagram labels: data.ac.uk v0, data.ac.uk v1, data.ac.uk v2; Reflect & Re-Scope (shown twice).
12. data.ac.uk URL syntax recommendations
One thing that almost all community members agreed upon is the need for a guidance document on URL syntax recommendations, i.e. an equivalent of the OPSI’s guidance on URI Sets, but for UK H/FEIs.
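Until such guidance exists, the convention can only be inferred from the examples used in this paper (data.xyzUniversity.ac.uk, data.nDept.xyzUniversity.ac.uk). The sketch below checks whether a hostname follows that inferred shape: `data.`, then one or more DNS labels, then `.ac.uk`. The pattern and the function name `is_institutional_data_host` are assumptions made for illustration; they are not the OPSI guidance or any agreed rule.

```python
import re

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"

# Assumed convention: "data." + one or more labels + "ac.uk".
_DATA_HOST = re.compile(rf"data\.(?:{_LABEL}\.)+ac\.uk", re.IGNORECASE)


def is_institutional_data_host(hostname):
    """True if hostname looks like data.<institution>.ac.uk or data.<dept>.<institution>.ac.uk."""
    return _DATA_HOST.fullmatch(hostname) is not None
```

Under this assumed pattern, data.soton.ac.uk is accepted, while the central data.ac.uk hub and subject-style names such as music.data.ac.uk are not, since they do not place an institution between `data.` and `.ac.uk`.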
13. Suggested Timeline
Sept 2010 = Closed ITT to Southampton, Manchester and Oxford to implement data.ac.uk
Oct 2010 = Selection of Project
Nov 2010 – April 2011 = data.ac.uk project 1
April 2011 = JISC community review of data.ac.uk project 1
May 2011 = Consideration for data.ac.uk project 2
This would be a non-committal funding cycle that would be subject to community review and approval prior to continuation.
14. Value proposition diagram for institutions exposing their data.
Diagram labels: developer community building subject-specific tools; audience-specific website; available open data.
Editor's notes
Primary Audience: Pro Vice Chancellors
Secondary Audience: Institutional Managers (Libraries, IT, Deans of Research, Heads of Grant dept, etc.)
What do we want to tell them?
As we continue our journey into open access we face changes in the scope of what is covered by open access aspirations and also policy. Initially efforts were focused on the primary academic outputs such as research papers and publications. More recently there has been a growing demand for other aspects of our activities, be they the current UK Government focus on transparency or the growing demand for the resources behind the academic outputs, such as study data or experimental data.
Significant change in the way research and education is being undertaken
Academic research has always depended on a thorough understanding of the current progress in the field. Historically this could be achieved by reading the research papers. Today we depend more on complex analysis of data to advance our research activities, and sharing of the data as well as the results has become key... There is a parallel in the work of government where, as far back as 2005, the EU Directive on the re-use of public sector information came into force.
This has most recently been translated into the work surrounding data.gov.uk and the current UK Government focus on transparency by open publishing of government data.
We can learn from our own experiences in Open Access within the academic community and also from the work to establish data.gov.uk
– owning and the digital bring us in research, HEIs, workforce – change in skills required – Records Mgrs – dig preservation, archives hub / standards – archival description digitally, Librarians – repositories, LMS, digital preservation, ukoln, developers, IPR – Medical Images / Cherri – past / Info Net / Jisc dig media – HE/NHS forum – OER; Licence to publish, IPR toolkit with SURF, scholarly comms; Researchers – DCC, Research Data, Infrastructure NG, VREs, NGS/cloud, text-mining, Access Mgt, E Uptake - learning technologist – DiVLE, X4L, Jorum – Reload, OER, Repositories, Digital Libraries in Classroom - students, Exerti – flash based / accessibility. PVC – policy, ERMI electronic licence / CC – via X4L, Project Mgt – repository start-up: skills to manage projects and take this forward in the HEI. Google Generation – strategic lead, info to help you serve your students properly
data.xyzOrganisation.ac.uk
What is to stop someone creating data.soton.ac.uk? Not much. We are stopped from creating music.data.ac.uk.
Quotes from the community:
“we’ve got bits and pieces of data to open up... let’s do what we can, as soon as possible”
“government might well respond by calling up public sector organisations to open up their data. This has already started to happen via FOI financial statements...”
“everybody just publish data that is in data form, like spreadsheets and via RSS/JSON feeds”
Get the data out there on a URI people can easily find, like data.xyzUniversity.ac.uk or data.nDept.xyzUniversity.ac.uk
...what extent could data.ac.uk be a HE/FE view over the wider datastore?
“The line I’m suggesting above is one of convenient discovery as much as anything else, pulling (links to) all the data sets related to an institution into an area of the institution’s own website. Cf. the similar approach taken by data.gov.uk, which is to act primarily as a directory layer, as well as hosting national level datastores for particular datasets.”
“...there is a growing consensus that this is a necessary step that needs to be made to facilitate the growth of usable linked data in Higher Education. Next week there will be a public briefing paper from JISC on the topic. We expect that they will announce, among other things, that the data.ac.uk domain name has been ring-fenced to provide a data.gov.uk-esque repository for HE.”
“The tools to visualise this type of data have been around for a long time but as yet there has been no long term strategy for maintaining the actual data that drives these tools.”
“So why do we need a centralised datastore for HE data? Why can’t we just host it locally on data.southampton.ac.uk ...
our data is a perfect fit for a non politically aligned data.ac.uk implementation.”
“Ideally JISC will also create and run a platform for hosting linked data, something which both provides a SPARQL triple store, but also a way to create clean cool resolvable URIs for things, and nice ways to cope with lists of things. This could be a service or just a set of tools and best practice.”
This mainly discussed the data discovery part and not the community discussion part. Need to balance.