First Things First:
Figuring Out What to Preserve and Why
     A case study of DIY data management



          Grace Currie & Ann Jebson

                                           March 2012
Systems?

    We should take a wide view of systems:
      • Processes

      • Understanding your business

      • Communicating with your business

      • Influencing workflows

      • Cultivating attitudes

02/04/12                                   2
02/04/12   3
02/04/12   4
Don’t be afraid to DIY


02/04/12                            5
02/04/12   6
Some Stats

    Move to digital processing in the 1980s

    Over 250 releases of official statistics every year

    Multiple datasets created for each release




02/04/12                                                  7
http://www.flickr.com/photos/26664862@N04/2499573972/sizes/l/
A good start

           “an enduring national resource”

“ensuring that information is maintained in an
  accessible format for possible future use”




02/04/12                                     9
A good start




02/04/12                  10
02/04/12                                       11
           http://www.flickr.com/photos/beglen/5385092551/
Develop a process

    Retention, Preservation, and Disposal statement
    for statistical data (RPDs)
•   Make a start
•   Develop a template
•   Manage the process
•   Collaborate



02/04/12                                          12
04/02/12   13
04/02/12   14
Documentation

    Publications

    Statistical metadata

    Corporate Information


02/04/12                           15
04/02/12   16
Challenges



    Great interest       Great resistance




02/04/12                                     17
04/02/12   18
Pisces.SD2




02/04/12                19
Household Labour Force Survey




02/04/12                                   20
How did we get ‘buy in’?




02/04/12                              21
04/02/12   22
04/02/12   23
04/02/12   24
04/02/12   25
04/02/12   26
What have we learned?




02/04/12                           27
Do it as you go

    Data Management needs to be part of the
    business process




      • Retrospective metadata gathering is very
      difficult
02/04/12                                          28
What makes it easier …

    Choose the right person for the job

      • Coordinating the work programme

      • Completing the RPD statement
        (providing the information)


02/04/12                                  29
Take your opportunities

    Influence and educate in Data Management

    Culture change at Statistics NZ




02/04/12                                       30
Outcomes
    Data Archive now holds valuable data

    Work underway to refine archiving process

    New corporate metadata system “Colectica”
    being rolled out to organisation


02/04/12                                        31
Questions?




02/04/12                32
Slide 5:
    http://www.flickr.com/photos/johntmeyer/6577544863

    Slide 6:
    http://www.flickr.com/photos/58597766@N05/5845710179

    Slide 8:
    http://farm8.staticflickr.com/7148/6577544863_11ef8358ef.jpg

    Slide 11:
    http://www.flickr.com/photos/beglen/5385092551

    Slide 16:
    http://www.loc.gov/rr/business/company/rankings.html

    Slide 18:
    Http://maxlblue.blogspot.co.nz/2010/11/vocab-1901-assembly-line.html

    Slide 22:
    http://www.flickr.com/photos/earthworm/2916565549/

    Slide 23:
    Http://www.flickr.com/photos/shelley_dave/6675011581/

    Slide 25:
    http://www.flickr.com/photos/epsos/5575089139/

    Slide 26:
    http://www.flickr.com/photos/sharondavis/5467939822/

02/04/12                                                                   33


Editor's notes

  1. Good morning. My name is Grace Currie and this is my colleague Ann Jebson. We are from the Information Management team at Statistics NZ, and we are here to present what we like to think of as “a case study of DIY data management”.
  2. All of us are here because we are interested in how to integrate digital preservation requirements into the design of systems. When we think of system design, it is sometimes hard not to think of a magical IT system that manages digital content from the time of creation. But system design is also about processes, understanding your business, communicating with your business, influencing workflows, and cultivating the right attitude in your organisation. As New Zealand’s National Statistics Office, our core business revolves around the collection, analysis and publication of data. This data has immense ongoing value. Today Ann and I will tell you how the Information Management team at Statistics NZ approached the task of identifying thousands of datasets and their associated metadata and documentation so they could be preserved for future reuse.
  3. This is a journey that has taken us from the “fire fighting”, “ambulance at the bottom of the cliff” position, which I’m sure many of you will be familiar with...
  4. ...to a state where we can now support the data management practices of our business. For us this means being involved across all the phases of the model you see here: our Statistical business process model. This model illustrates the seven stages that most studies follow during the production of official statistics. When we say a “study” we mean an activity where data is collected, for example by a survey or a census, to produce a set of information. Studies you may know of include the Consumers Price Index (CPI), Gross Domestic Product (GDP) and the Census of Population and Dwellings.
  5. The ever-increasing volume and diversity of content created in our organisations means that developing innovative methods to identify what content to preserve is more important than ever. The message we want to get across to you today is that although such a task can be daunting, it can be achieved with tools that we all have readily available. Don’t be afraid to DIY.
  6. A few years ago our data management situation was a bit like a teenager’s messy room. We had stores of digital information rapidly growing in multiple locations. The problem was that our statistical analysts were very competent with the confidentiality, privacy and security aspects of data management, but they weren’t so good at documenting the basics: file names and locations, what the data was used for, and what metadata was associated with what data. This was a bit of a problem considering that in 2005 the Statistics NZ Data Archive was established as a repository to preserve valuable data and ensure its availability to future users, both internal staff and external researchers. We had no shortage of valuable data to preserve in the Data Archive – legacy data was in abundance – but we lacked the basic information required to begin ingesting this data into our archive in large quantities.
  7. To give you an idea of the size of the problem we were tackling, consider this: the move to digital processing took place around thirty years ago; currently, there are over 250 releases of official statistics every year; and multiple datasets are produced for each of these releases over the collection, processing, analysis and dissemination phases of the model I showed you earlier.
  8. In addition to this, data is different from other digital content. Data is not as self-descriptive as other information, such as written documents or images. The numbers you see here mean nothing without context. For data, that context is provided by statistical metadata. Statistical metadata is information that helps us understand data and make information out of numbers. It refers to information about surveys and their publications, questions and questionnaires, variables and methodologies. Therefore, to preserve our valuable data in a way that would make it understandable and usable in the future, we also needed to locate and consolidate a large quantity of statistical metadata for each study.
  9. So, how did we approach this situation? We got off to a good start because we had two things that were instrumental to success. Firstly: That Statistics NZ, more specifically the senior leadership, had a vision for data reuse. Our 2006 statement of intent talked about our data as “an enduring national resource” and placed importance on “ensuring that information is maintained in an accessible format for possible future use”.
  10. Secondly: Since data fell outside the coverage of Archives NZ’s General Disposal Authorities, Statistics NZ and Archives NZ together developed a specialised Appraisal Report and Disposal Schedule for Statistical Data, Documentation and Metadata. The appraisal report recommended the retention of final, definitive versions of official statistical datasets. It also recommended retention of the core documentation and metadata which summarises the design, development, collection, processing, and analysis of official statistical collections and data. This gave us a yardstick with which to evaluate our data.
  11. However, there was much we didn’t know and we didn’t have a process or system in place to gather this knowledge. Preservation assumes that organisations have knowledge that many do not have, namely the fundamentals: what you have, where it is, how much there is and what it looks like. We needed a process and a vehicle to help us to document our data assets. This is where our DIY system comes in. I’ll now hand over to Ann, who will tell you more about this.
  12. So, we developed a process (to tidy the room). We created what we call the Retention, Preservation, and Disposal statements for statistical data, or RPDs. These are documents in which we record what statistical information we have, and what we plan to do with it. We began in 2007 with a basic template in Lotus Notes, but moved on to an Excel template when we realised that we needed more detailed information. The important thing for us was to make a start, and then we refined our process as we learned more about what we needed. This template allows us to standardise the collection of the information that we need. We were also aware that someone needed to manage the process to ensure that the documented information meets consistent standards and that all studies are covered; in our case that is Information Management. And, most importantly, the RPD process is a collaborative process – a team sport. Statistical business units and Information Management work together to produce and evaluate the RPD statements for managerial approval.
  13. Our process is low tech – our template is an Excel workbook with separate pages where we list the different types of information that we need. We need a page for the scope of the RPD, which provides a description of what the study is at a high level, and includes information about who the main users of the study are and what it is used for. This information helps us to know how much effort to put into archiving the data and the amount of metadata we will need to archive with the study so that it can be properly understood in the future. Other pages list the datasets and the documentation about the study, and lastly there is a page to record when the RPD statement will be formally reviewed.
  14. The data page lists the datasets that the study produces. The most important information is the filename and location – including file path and server name – for each dataset. Record classes and disposal decisions for datasets are also recorded here. All datasets produced are listed: those that will be archived and also those that will be destroyed when their operational purpose is complete. Any data that should be listed here but cannot be found must also be recorded, with a note saying that it cannot be found.
  15. The documentation about the study falls into these three types. Typically, the publications page would list the information releases, publications and articles, but could also include conference papers or important presentations relating to the study. Statistical metadata is the information that makes the numbers into data, in that it gives context and meaning to the numbers in a dataset. It is, therefore, vital to the data being useful in the future, when everyone who knows about the study has gone. The information we expect to see here includes sampling methods, questionnaires, classifications used, and processing documents. Lastly, the Corporate Information page will list any contracts, business cases, relevant corporate policies, etc. to do with the study. Any documentation that should be listed in these pages but cannot be found, or was never produced, must also be recorded, with a note to say so. This eliminates ‘time wasting’: looking for things that cannot be found or never existed. Once again, we want to know what it is, why it is, what dataset it refers to, and where it is located.
  16. The RPD statement is reviewed on a regular basis. A formal review time is specified in the RPD. At this time a new version is created which is updated with the latest information about the study. We also meet with the data custodian for an annual informal review. This is a quick and casual meeting which is a valuable way of keeping in touch with who is responsible for the data, what is happening in the business unit, and what is being planned – a great way to gather intelligence and to have a feel for what is going on in the organisation. These reviews are crucial: they ensure that the process of documenting data and metadata is embedded in the organisation. It is not about doing it once.
  17. When we first introduced the RPD process to the organisation, the reaction was predictable. Some people thought it was a great idea and could see the benefits immediately. Others were not so keen.
  18. Everyone is already very busy, particularly those who are responsible for a number of studies and who publish their data monthly, quarterly and annually. Their focus very quickly moves from what they have just published to what is about to be published, and no one is keen to take on what they perceive to be extra work. The frequently asked question is how long it will take, or how much effort is required. All we could say was: it depends. It depends on how complex the study is. Some are complicated, like the Consumers Price Index, Balance of Payments and National Accounts; others are relatively straightforward. It also depends on how well the data is already being managed. If data management is not good, the work to locate, appraise and document the data will take a long time. For example, quite some effort went into discovering the final dataset for the Marine Recreational Fishing Survey that was run in 1987. It was eventually found, named ‘Pisces’.
  19. However, if data management practices are good, then it is just a matter of documenting the fact in the RPD statement. Even better, if standard file names and file structures are used, these will only need to be documented once. The point is that if standard conventions are followed, it will take less time than if free spirits have been allowed to name and structure files creatively.
  20. This is an example of a sensible naming convention. This is part of the data page for the Household Labour Force Survey RPD. This survey is published quarterly and produces New Zealand’s official employment and unemployment statistics. The survey has been published quarterly, for 25 years, producing more than 17 datasets each quarter, but the data page of the RPD statement is relatively simple. There are 18 lines to the data sheet and in general, they will not need to be changed. The only change will be to add an additional line if a special dataset is created for a particular purpose during a quarter.
  21. How did we get ‘buy in’? As Grace mentioned, we have the active support of senior management, which is the first thing you need if you are going to succeed. We spent a lot of time selling the benefits of having ‘up-to-date’ RPDs.
  22. The benefit of being tidy and organised, of documenting what you have and where you put it, is reducing risks. These include the risk of knowledge being lost (for example, when information is stored in people’s heads, it is not available to the rest of the organisation and it leaves the organisation when that person leaves); the risk of datasets not being stored in correct locations (for example, data stored on personal drives needs to be moved to shared drives); and the risk of insufficient documentation about a process. We also provided support, mainly through personal help and encouragement, but also by publishing ‘A Guide to completing an RPD statement’ and an exemplar of a completed statement. The Appraisal guidelines and corporate policies and processes that were already established also supported decision making. Another way to get ‘buy in’ is to appreciate the effort that is put into completing the RPD statement. http://www.flickr.com/photos/earthworm/2916565549/
  23. The main currency of appreciation at Statistics NZ is chocolate. But we also provide positive feedback for those who have produced quality RPDs, at the time of manager sign-off and also at performance review time.
  24. Similar to appreciating the effort is appreciating the content, in our case the data. Our statistical analysts love their data and find it infinitely interesting. Showing a genuine interest in understanding and valuing their data, data processes, and metadata certainly helps to get the job done.
  25. The disposal decisions recorded in RPDs provide an opportunity to free up space on shared drives. Lack of disc space is a constant issue at Stats. At Statistics NZ, data is generally disposed of in one of two ways: it is either preserved or destroyed. We preserve data in the Data Archive, but before we agree to do this, an RPD statement must be completed and signed off by managers. When data has been successfully archived, the duplicate copies can be destroyed, which frees up valuable space on drives. The other option is to destroy data. We are terrible hoarders at Stats NZ. RPDs have introduced the idea that data can be destroyed and, in some cases, must be destroyed. But data may not be destroyed unless it is listed in the RPD statement and the record class assigned to it allows for destruction.
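  The destruction rule just described can be sketched as a simple check. This is only an illustration, not how Statistics NZ implements it: the dataset names, the record class labels and the `may_destroy` helper are all hypothetical, and a real RPD statement holds far more than two fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RpdEntry:
    """One line of an RPD statement's data page (hypothetical, simplified)."""
    dataset: str
    record_class: str


# Hypothetical record class labels; the real classes are not given in the talk.
DESTRUCTIBLE_CLASSES = {"destroy", "destroy_after_archiving"}

# Hypothetical example entries for illustration only.
EXAMPLE_ENTRIES = [
    RpdEntry("final_dataset", "preserve"),
    RpdEntry("working_copy", "destroy_after_archiving"),
]


def may_destroy(dataset, entries, destructible_classes=DESTRUCTIBLE_CLASSES):
    """Return True only if the dataset is listed in the RPD statement
    AND its assigned record class allows for destruction."""
    for entry in entries:
        if entry.dataset == dataset:
            return entry.record_class in destructible_classes
    # Not listed in the RPD statement: destruction is not permitted.
    return False
```

  Under these assumptions, `may_destroy("working_copy", EXAMPLE_ENTRIES)` is allowed, while a dataset that is not listed at all, or one whose record class is `preserve`, is refused. Both conditions must hold, which is the point of the rule.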
  26. Unexpected benefits have also helped with ‘buy in’. This is a case where rhetoric becomes tangible. Statistics New Zealand has a very productive office in Christchurch, where our colleagues produce and release a range of business and population statistics. At the time of the earthquake in February 2011, many of the studies that are compiled in the Christchurch office were being analysed or due to be released. The information stored in the RPD statements about filepaths and server names was used to help identify and prioritise the work to recover data from the Christchurch server backup. The process documents that were recorded also enabled analysts in our other offices to help with analysing data.
  27. So, what have we learned?
  28. Do it as you go. Data management needs to be part of the business process. We need to manage data and metadata from the time the study begins, through the entire data cycle, and completing the RPD statement needs to be in the Business Unit’s work plan as part of the survey documentation. At Stats, if it is in the work plan, it will happen! Retrospective metadata gathering is very difficult, if not impossible. When a study becomes obsolete, people move on very quickly, and this can make preserving the obsolete data as meaningful data very difficult. The metadata that makes the numbers meaningful must be documented as you go.
  29. What makes it easier? Choosing the right person for the job. The person coordinating the programme of work needs to be an influencer: someone who can get other people to do things for them; someone who understands the data cycle of the business unit and its pressure points, and knows when to push, when to leave them alone, and when to help; someone who can work around ‘road blocks’ and get things done. You also need the right person to complete the RPD statement: someone who understands the data through all parts of the data cycle, and who knows and understands the importance of metadata and the other documentation that supports the data. Completing RPDs is not a good way for a new person to learn about a study. That is a recipe for frustration all round!
  30. Take your opportunities. The RPD process has provided an opportunity to influence data management and to educate staff in best practice. During the process we see what actually happens, and when what actually happens is not what should happen, we have an opportunity to educate and to change processes and habits to best practice or, at the very least, to alert the organisation to risks. We have experienced a change in culture regarding data management at Statistics NZ. The attitude towards the RPD process has changed. The importance of documenting information about data and metadata so that it is current and available is now embedded in process and is accepted as part of what we do.
  31. RPDs were instrumental in moving us to our current state, where we can now support the data management practices of our business over all stages. The Data Archive now provides datasets for researchers to use in the Statistics NZ Data Laboratory, and for internal staff to reuse in statistical production. As we did with the RPD process, we are still refining. There is currently work underway to refine our archiving process through automation. And last, but definitely not least, as a result of what we have learnt with the RPDs, we now have our own “magical IT system” that will manage statistical metadata over the whole data cycle. This metadata repository is called Colectica and is about to be rolled out to multiple business units. So, our system and process have evolved into something bigger and better, which we think isn’t too bad for a bit of DIY.