UNLOCKING THE GEOSPATIAL
POTENTIAL OF SURVEY DATA

Tom Ensom & Veerle Van den Eynden
www.data-archive.ac.uk
Archived survey data presents a vast
          wealth of material with potential for
                 secondary use in GIS




UK DATA ARCHIVE

    •    Over 5,000 datasets

    •    Popular survey data series include:

          Quarterly Labour Force Survey

          British Household Panel Survey / Understanding
           Society

          British Crime Survey




We set out to explore the availability and
         usability of geo-identifiers in the UK Data
                      Archive collection


          These identifiers come in the form of
        ‘spatial units’ e.g. Ward and Constituency




• The availability of geo-referenced data is
          ever increasing


        • The usability of geo-referenced data ‘out-
          of-the-box’ is still generally poor

             Reflective of and contributing to a divide
             between:
                • GIS experts – idiosyncratic methodologies
                • Untrained with interest – steep learning
                   curve


Three key features of ‘ready-to-link’
        survey data for GIS


        1. SELECTION

        2. QUALITY

        3. METADATA


1. SELECTION

      Include geographical identifiers which:

      • Can be readily transformed

      • Are of sufficient resolution to allow for
        fine-grained analysis

      • Are appropriate to the data subject
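
A minimal sketch of what "readily transformed" can mean in practice: records coded at a low-level unit are rolled up to a higher-level unit through a lookup table. The lookup, the codes and the function name are hypothetical placeholders, not real area codes or an Archive tool.

```python
from collections import Counter

# Hypothetical lookup from a low-level unit to a higher-level unit.
# The codes below are invented placeholders, not real area codes.
LOW_TO_HIGH = {
    "LOW_A": "HIGH_1",
    "LOW_B": "HIGH_1",
    "LOW_C": "HIGH_2",
}


def aggregate_to_higher_unit(low_codes, lookup):
    """Count records per higher-level unit and collect codes with no match."""
    counts = Counter()
    unmatched = []
    for code in low_codes:
        higher = lookup.get(code)
        if higher is None:
            # Keep unmatched codes visible instead of silently dropping them.
            unmatched.append(code)
        else:
            counts[higher] += 1
    return dict(counts), unmatched


counts, unmatched = aggregate_to_higher_unit(
    ["LOW_A", "LOW_B", "LOW_C", "LOW_A", "LOW_X"], LOW_TO_HIGH
)
print(counts)     # {'HIGH_1': 3, 'HIGH_2': 1}
print(unmatched)  # ['LOW_X']
```

The transformation only works in this direction: a record coded at the higher-level unit cannot be pushed back down, which is why a sufficiently fine-grained identifier is the safer choice at collection time.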


2. QUALITY

      Include geographical identifiers which:

      • Use standard names

      • Are coded with a standard coding scheme
        e.g. ONS’ GSS Coding and Naming
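
A rough sketch of a shape check for codes in a standard scheme, so that free-text labels are flagged before any linking is attempted. It assumes the commonly described GSS layout of nine characters (a country letter, two digits for the entity type, six further characters); the pattern and function names are illustrative, and a match says nothing about whether the code exists in the ONS register.

```python
import re

# Assumed layout: one country letter, two digits for the entity type,
# then six alphanumeric characters identifying the individual area.
GSS_SHAPE = re.compile(r"[A-Z]\d{2}[0-9A-Z]{6}")


def looks_like_gss_code(value: str) -> bool:
    """Return True if the value has the assumed nine-character shape."""
    return GSS_SHAPE.fullmatch(value.strip().upper()) is not None


def split_gss_code(value: str) -> tuple:
    """Split a code into (entity prefix, area identifier)."""
    code = value.strip().upper()
    if GSS_SHAPE.fullmatch(code) is None:
        raise ValueError(f"not in the assumed GSS shape: {value!r}")
    return code[:3], code[3:]


print(looks_like_gss_code("E05000001"))  # True
print(looks_like_gss_code("Ward 12"))    # False
print(split_gss_code(" e12000007 "))     # ('E12', '000007')
```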




3. METADATA

      Include geographical identifiers which are:

      • Time-referenced
         e.g. Government Office Region as defined in
         2001 as opposed to 1998

      • Well documented in their derivation
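
One way to picture time-referenced metadata is as a small record attached to each geographic variable, with linking refused when the reference year is missing. This is a sketch only: the class, field and function names are hypothetical and do not describe the Archive's catalogue schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeoVariable:
    """Hypothetical metadata record for one geographic variable."""
    name: str                      # variable name in the survey file
    spatial_unit: str              # e.g. "Government Office Region"
    reference_year: Optional[int]  # year of the unit definition, None if undocumented
    derivation: str = ""           # free-text note on how the value was derived


def boundary_key(var: GeoVariable) -> tuple:
    """Return the (unit, year) pair needed to choose a matching boundary set."""
    if var.reference_year is None:
        raise ValueError(f"{var.name}: no reference year, cannot choose boundaries")
    return (var.spatial_unit, var.reference_year)


gor = GeoVariable("gor", "Government Office Region", 2001, "assigned from postcode")
print(boundary_key(gor))  # ('Government Office Region', 2001)
```

Raising an error when the year is absent mirrors the point on the slide: without a time reference there is no safe default boundary set to link against.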




Those collecting data need to adjust
   their workflows to enable this



Those curating data need to adjust
  their workflows to enable this


What should data collectors be doing?

   • Considering geographic identifiers BEFORE data
     collection!


   • Considering standards
      • INSPIRE/GEMINI
      • GSS Coding and Naming


   • Documenting the provenance of geographic
     identifiers



What will we be doing at the UK Data Archive?


   • INSPIRE compliance
   (we have published a metadata mapping for DDI-INSPIRE-GEMINI)



   • Improving spatial unit definitions through
     extensive data cleansing

         Standardised
         Time referenced




What will we be doing at the UK Data Archive?


   • Improving resource discovery tools / interface

         User friendly
         Lessen time spent searching through text
         Consider semantics


   • Feeding back to data depositors

         Guidance on best practice



U·Geo Browser

   A new web tool for resource discovery

   • Revised and augmented variable metadata
   • Information clarifying the quality of the geo-identifier
   • Integrated spatial unit definitions
   • Links to boundary files


   Live beta at: geo.data-archive.ac.uk




U·Geo Browser

   • A demo tool using a simple, pragmatic approach

   • This tech will be integrated into a central Archive resource
     discovery tool, and catalogued data will be updated to
     reflect these refinements

                                  -

   • A step in the right direction but we need formal semantics
     built on persistent vocabularies

   • A drive needed to establish this




Thanks to:

        • all those at the UK Data Archive

        • EDINA for their contributions as consultants



                           Tom Ensom
                           tensom@essex.ac.uk

                           www.data-archive.ac.uk
                           @UKDataArchive






Editor's notes

  1. Based on JISC Geospatial funded work at the UK Data Archive
  2. Archived survey data has great potential for secondary analysis in GIS, a potential which is not yet fully realised. The UK Data Archive, as distributor of the UK's largest collection of social survey data, is well positioned to spearhead developments in this area.
  3. We curate the largest collection of digital data in the social sciences in the UK: over 5,000 datasets from government departments, research institutes and other organisations, all of which are made available online to UK academia. Some of you might be familiar with our datasets; some of the better-known series include the QLFS, BHPS and BCS.
  4. The UGeo project looked in depth at this survey data, much of which contains geographic variables of some kind. We wanted to assess the quality and condition of the identifiers and the metadata describing them.
  5. The first part of the project was a systematic information-gathering exercise, working through datasets one by one and pulling out the geography variables for further examination. An observation we were quickly able to make was that the availability of geo-referenced data has been steadily increasing, particularly over the last 10 years. Not only are new studies being geo-referenced, but new varieties of identifier are being added, with uses for different disciplines. Lower-level geographies such as postcode and grid reference have also been increasingly made available, thanks to the advent of new licensing options and secure data services. However, the actual state of the variables and their metadata is still relatively poor. For example, timestamps are often missing, making appropriate linking impossible, and inappropriate units are used, prohibiting meaningful analysis.
  6. The next stage of our project, then, was to work out exactly how to remedy some of the data problems so evident in our investigation. What exactly are we looking for in ready-to-use georeferences? We suggest three criteria:
  7. Selection is the choice of geographic identifier.
     - Ideally of a sufficiently low level that they can be transformed to any other variable, e.g. grid reference, postcode
     - Appropriate for analysis, e.g. statistics-appropriate units such as output area
     - Appropriate to the data subject, e.g. researchers are likely to want parliamentary constituencies for a political survey, or police force areas with the BCS
  8. Quality is how easy it is to unambiguously interpret the variable and codes:
     - Use standard names for units, e.g. the term Scottish Region could refer to administrative or electoral regions, so disambiguate them in the name
     - Use standards such as the GSS Coding and Naming scheme produced by the ONS, which provides a standard set of codes for each division of many popular spatial units
  9. Ensure any spatial unit is well documented:
     - A timestamp for each variable, for example Government Office Region as defined in 2001 as opposed to 1998
     - Sufficient documentation of provenance. For example, if you’re including a grid reference, how was it derived? Postcode centroids?
  10. In order to meet these criteria, new approaches are needed at many stages of the pre-analysis data lifecycle, both from those gathering the data and going on to deposit it, and from those who preserve and disseminate the data, such as the UK Data Archive.
  11. Briefly, what should those collecting data be doing? This has relevance to those working on research projects as well as big government surveys.
     - Instead of being tacked on, geo-identifiers should be considered prior to data collection, asking which units and why
     - Using data standards at the collection stage
     - Documenting how the unit has been derived in precise terms
  12. What are we doing to make the lives of researchers easier? The UK Data Archive will be leading the way in new developments for archives. INSPIRE is an EU standard which helps to ensure a minimum level of information about the geospatial content of a dataset. A number of improvements specific to survey data and spatial units will be required. Much data cleansing work on our catalogue will be taking place over the coming months to bring it up to scratch.
  13. Using the enhanced metadata, we will try to make it easier for users to find the data they need. We will be considering interface design and making the relevant documentation easier to find. All this will consider the semantics of the relationship between unit and dataset. And finally, we will of course be encouraging data depositors to give us better geospatial data.
  14. An immediate development of the project has been a web tool called the UGeo Browser. This is a demonstration tool that brings the geospatial to the forefront of searchable metadata. It meets many of the requirements I have just outlined, for a subset of our survey data collection:
     - Revised and augmented variable-level metadata to ensure accuracy and completeness
     - Extra quality information, e.g. this variable is Ward, but it’s missing value labels
     - Clear and immediately accessible unit definitions
     - Verified links to boundary files, with divergence (if any) between dataset and boundary clarified
  15. Interface preview
  16. Interface preview
  17. In many ways this functions as a proof of concept for our ideas on how ‘studies’ and ‘units’ as entities should interact. The long-term goal is that this tech will be integrated in the Archive’s central catalogue. Data cleansing and application development work has already begun. We’re also now considering the best way of creating formal semantics between units and studies. Perhaps a first step will be persistent identifiers for units…
  18. Thanks and contact details :)