Grace Currie Ann Jebson First Things First

Future Perfect 2012

First Things First: Figuring Out What to Preserve and Why. A case study of DIY data management. Grace Currie, Ann Jebson

1. First Things First: Figuring Out What to Preserve and Why. A case study of DIY data management. Grace Currie & Ann Jebson, March 2012

2. Systems?
We should take a wide view of systems:
• Processes
• Understanding your business
• Communicating with your business
• Influencing workflows
• Cultivating attitudes
02/04/12 2

3. [no slide text]

4. [no slide text]

5. Don’t be afraid to DIY

6. [no slide text]

7. Some Stats
Move to digital processing 1980s
Over 250 releases of official statistics every year
Multiple datasets created for each release

8. http://www.flickr.com/photos/26664862@N04/2499573972/sizes/l/

9. A good start
“an enduring national resource”
“ensuring that information is maintained in an accessible format for possible future use”

10. A good start

11. http://www.flickr.com/photos/beglen/5385092551/

12. Develop a process
Retention, Preservation, and Disposal statement for statistical data (RPDs)
• Make a start
• Develop a template
• Manage the process
• Collaborate

13. [no slide text]

14. [no slide text]

15. Documentation
Publications
Statistical metadata
Corporate Information

16. [no slide text]

17. Challenges
Great interest ↔ Great resistance

18. [no slide text]

19. Pisces.SD2

20. Household Labour Force Survey

21. How did we get ‘buy in’?

22. [no slide text]

23. [no slide text]

24. [no slide text]

25. [no slide text]

26. [no slide text]

27. What have we learned?

28. Do it as you go
Data Management needs to be part of the business process
• Retrospective metadata gathering is very difficult

29. What makes it easier …
Choose the right person for the job
• Coordinating the work programme
• Completing the RPD statement (providing the information)

30. Take your opportunities
Influence and educate in Data Management
Culture change at Statistics NZ

31. Outcomes
Data Archive now holds valuable data
Work underway to refine archiving process
New corporate metadata system “Colectica” being rolled out to organisation

32. Questions?

33. Image credits
Slide 5: http://www.flickr.com/photos/johntmeyer/6577544863
Slide 6: http://www.flickr.com/photos/58597766@N05/5845710179
Slide 8: http://farm8.staticflickr.com/7148/6577544863_11ef8358ef.jpg
Slide 11: http://www.flickr.com/photos/beglen/5385092551
Slide 16: http://www.loc.gov/rr/business/company/rankings.html
Slide 18: Http://maxlblue.blogspot.co.nz/2010/11/vocab-1901-assembly-line.html
Slide 22: http://www.flickr.com/photos/earthworm/2916565549/
Slide 23: Http://www.flickr.com/photos/shelley_dave/6675011581/
Slide 25: http://www.flickr.com/photos/epsos/5575089139/
Slide 26: http://www.flickr.com/photos/sharondavis/5467939822/

Editor’s notes

Good morning. My name is Grace Currie and this is my colleague Ann Jebson. We are from the Information Management team at Statistics NZ, and we are here to present what we like to think of as “A case study of DIY data management”.
All of us are here because we are interested in how to integrate digital preservation requirements into the design of systems. When we think of system design, it’s sometimes hard not to think of a magical IT system that manages digital content from the time of creation. But system design is also about processes, understanding your business, communicating with your business, influencing workflows, and cultivating the right attitude in your organisation. As New Zealand’s national statistics office, our core business revolves around the collection, analysis and publication of data. This data has immense ongoing value. Today Ann and I will tell you how the Information Management team at Statistics NZ approached the task of identifying thousands of datasets and their associated metadata and documentation so they could be preserved for future reuse.
This is a journey that has taken us from the “fire fighting”, “ambulance at the bottom of the cliff” position, which I’m sure many of you will be familiar with ...
... to a state where we can now support the data management practices of our business. For us this means being involved across all the phases of the model you see here: our statistical business process model. This model illustrates the seven stages that most studies follow during the production of official statistics. When we say a “study” we mean an activity where data is collected, for example by a survey or a census, to produce a set of information. Studies you may know of include the Consumers Price Index (CPI), Gross Domestic Product (GDP) and the Census of Population and Dwellings.
The ever-increasing volume and diversity of content created in our organisations means that developing innovative methods to identify what content to preserve is more important than ever. The message we want to get across to you today is that although such a task can be daunting, it can be achieved with tools that we all have readily available. Don’t be afraid to DIY.
A few years ago our data management situation was a bit like a teenager’s messy room. We had stores of digital information rapidly growing in multiple locations. The problem was that our statistical analysts were very competent with the confidentiality, privacy and security aspects of data management, but they weren’t so good at documenting the basics: file names and locations, what the data was used for, and what metadata was associated with what data. This was a bit of a problem considering that in 2005 the Statistics NZ Data Archive was established as a repository to preserve valuable data and ensure its availability to future users, both internal staff and external researchers. We had no shortage of valuable data to preserve in the Data Archive, and legacy data was in abundance, but we lacked the basic information required to begin ingesting this data into our archive in large quantities.
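As a rough illustration of the “basics” mentioned here (file names and locations, what the data was used for, and which metadata belongs to which data), an inventory entry might be sketched as below. This is only a hypothetical sketch and not the template or system Statistics NZ used; the class, the field names and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """One entry in a hypothetical dataset inventory (all field names are assumptions)."""
    study: str                  # the study the dataset belongs to
    file_name: str              # name of the data file
    location: str               # where the file is stored
    purpose: str = ""           # what the data was used for
    metadata_refs: List[str] = field(default_factory=list)  # associated metadata and documentation


def missing_basics(record: DatasetRecord) -> List[str]:
    """Return the names of the basic fields that have not been documented yet."""
    gaps = []
    if not record.file_name.strip():
        gaps.append("file_name")
    if not record.location.strip():
        gaps.append("location")
    if not record.purpose.strip():
        gaps.append("purpose")
    if not record.metadata_refs:
        gaps.append("metadata_refs")
    return gaps


def ready_for_ingest(record: DatasetRecord) -> bool:
    """A record counts as ready only when none of the basics are missing."""
    return not missing_basics(record)
```

Checking each record for gaps at the time the data is created is one simple way to avoid having to reconstruct this information retrospectively.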
To give you an idea of the size of the problem we were tackling, consider this. The move to digital processing took place around thirty years ago. Currently, there are over 250 releases of official statistics every year. Multiple datasets are produced for each of these releases over the collection, processing, analysis and dissemination phases of the model I showed you earlier.
In addition to this, data is different from other digital content. Data is not as self-descriptive as other information, such as written documents or images. The numbers you see here mean nothing without context. For data, that context is provided by statistical metadata. Statistical metadata is information that helps us understand data and make information out of numbers. It refers to information about surveys and their publications, questions and questionnaires, variables and methodologies. Therefore, to preserve our valuable data in a way that would keep it understandable and usable in the future, we also needed to locate and consolidate a large quantity of statistical metadata for each study.
So, how did we approach this situation? We got off to a good start because we had two things that were instrumental to success. Firstly, Statistics NZ, and more specifically its senior leadership, had a vision for data reuse. Our 2006 statement of intent talked about our data as “an enduring national resource” and placed importance on “ensuring that information is maintained in an accessible format for possible future use”.
Secondly, since data fell outside the coverage of Archives NZ’s General Disposal Authorities, Statistics NZ and Archives NZ together developed a specialised Appraisal Report and Disposal Schedule for Statistical Data, Documentation and Metadata. The appraisal report recommended the retention of final, definitive versions of official statistical datasets. It also recommended retention of the core documentation and metadata which summarises the design, development, collection, processing, and analysis of official statistical collections and data. This gave us a yardstick with which to evaluate our data.
However, there was much we didn’t know, and we didn’t have a process or system in place to gather this knowledge. Preservation assumes that organisations have knowledge that many do not have, namely the fundamentals: what you have, where it is, how much there is and what it looks like. We needed a process and a vehicle to help us to document our data assets. This is where our DIY system comes in. I’ll now hand over to Ann, who will tell you more about this.
So, we developed a process (to tidy the room). We created what we call the Retention, Preservation, and Disposal statements for statistical data, or RPDs. These are documents in which we record what statistical information we have, and what we plan to do with it. We began in 2007 with a basic template in Lotus Notes, but moved on to an Excel template when we realised that we needed more detailed information. The important thing for us was to make a start, and then we refined our process as we learned more about what we needed. This template allows us to standardise the collection of the information that we need. We were also aware that someone needed to manage the process to ensure that the documented information meets consistent standards and that all studies are covered; in our case that is Information Management. And, most importantly, the RPD process is a collaborative process, a team sport. Statistical business units and Information Management work together to produce and evaluate the RPD statements for managerial approval.
Our process is low tech: our template is an Excel workbook with separate pages where we list the different types of information that we need. The first page covers the scope of the RPD, which provides a description of what the study is at a high level, and includes information about who the main users of the study are and what it is used for. This information helps us to know how much effort to put into archiving the data and the amount of metadata we will need to archive with the study so that it can be properly understood in the future. Other pages list the datasets and the documentation about the study, and lastly there is a page to record when the RPD statement will be formally reviewed.
The data page lists the datasets that the study produces. The most important information is the filename and location, including file path and server name, for each dataset. Record classes and disposal decisions for datasets are also recorded here. All datasets produced are listed: those that will be archived and also those that will be destroyed when their operational purpose is complete. Any data that should be listed here but cannot be found must also be recorded, with a note saying that it cannot be found.
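As a rough illustration only, one row of the data page could be modelled as below. The field names, the allowed disposal values and the record class labels are assumptions made for this sketch; they are not the actual column headings of the Statistics NZ Excel template.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetEntry:
    """One row of a hypothetical RPD data page (field names are assumed)."""
    description: str
    filename: Optional[str]      # None when the dataset cannot be found
    server: Optional[str]
    file_path: Optional[str]
    record_class: str            # hypothetical label, e.g. "final dataset"
    disposal_decision: str       # assumed values: "archive" or "destroy"
    note: str = ""

    def is_located(self) -> bool:
        """True when filename, server name and file path are all recorded."""
        return all([self.filename, self.server, self.file_path])


def validate(entry: DatasetEntry) -> List[str]:
    """Return the problems found in a row; an empty list means it is complete."""
    problems = []
    if entry.disposal_decision not in ("archive", "destroy"):
        problems.append("disposal decision must be 'archive' or 'destroy'")
    if not entry.is_located() and not entry.note:
        problems.append("dataset not located: add a note saying it cannot be found")
    return problems
```

A row for a dataset that cannot be found passes this check only once the note is filled in, which mirrors the rule that missing data is still listed.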
The documentation about the study falls into these 3 types. Typically, the publications page would list the information releases, publications and articles, but could also include conference papers or important presentations relating to the study. Statistical metadata is the information that makes the numbers into data, in that it gives context and meaning to the numbers in a dataset. It is, therefore, vital to the data being useful in the future, when everyone who knows about the study has gone. The information we expect to see here includes sampling methods, questionnaires, classifications used, and processing documents. Lastly, the Corporate Information page will list any contracts, business cases, relevant corporate policies, etc. to do with the study. Any documentation that should be listed in these pages but cannot be found, or was never produced, must also be recorded, with a note to say so. This eliminates ‘time wasting’: looking for things that cannot be found or never existed. Once again, we want to know what it is, why it is, what dataset it refers to, and where it is located.
The RPD statement is reviewed on a regular basis. A formal review time is specified in the RPD. At this time a new version is created which is updated with the latest information about the study. We also meet with the data custodian for an annual informal review. This is a quick and casual meeting which is a valuable way of keeping in touch with who is responsible for the data, what is happening in the business unit, and what is being planned. It is a great way to gather intelligence and to have a feel for what is going on in the organisation. These reviews are crucial: they ensure that the process of documenting data and metadata is embedded in the organisation. It is not about doing it once.
When we first introduced the RPD process to the organisation, the reaction was predictable. Some people thought it was a great idea and could see the benefits immediately. Others were not so keen.
Everyone is already very busy, particularly those who are responsible for a number of studies and who publish their data monthly, quarterly and annually. Their focus very quickly moves from what they have just published to what is about to be published, and no one is keen to take on what they perceive to be extra work. The frequently asked question is how long it will take, or how much effort is required. And all we could say was: it depends. It depends on how complex the study is. Some are complicated, like the Consumers Price Index, Balance of Payments and National Accounts. Others are relatively straightforward. It also depends on how well the data is already being managed. If data management is not good, the work to locate, appraise and document the data will take a long time. For example, quite some effort went into discovering the final dataset for the Marine Recreational Fishing Survey that was run in 1987. It was eventually found, named ‘Pisces’.
However, if data management practices are good, then it is just a matter of documenting the fact in the RPD statement. And even better, if standard file names and file structures are used, these will only need to be documented once. The point being that, if standard conventions are followed, it will take less time than if free spirits have been allowed to name and structure files creatively.
This is an example of a sensible naming convention. This is part of the data page for the Household Labour Force Survey RPD. This survey produces New Zealand’s official employment and unemployment statistics. It has been published quarterly for 25 years, producing more than 17 datasets each quarter, but the data page of the RPD statement is relatively simple. There are 18 lines to the data sheet and, in general, they will not need to be changed. The only change will be to add an additional line if a special dataset is created for a particular purpose during a quarter.
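To show why a standard convention only needs documenting once, here is a minimal sketch in which a single documented pattern expands to the expected file name for any quarter. The pattern string is invented for illustration; the real Household Labour Force Survey file names are not given in this talk.

```python
from typing import List

# Hypothetical naming pattern, assumed for this sketch only.
PATTERN = "hlfs_{year}q{quarter}_final.csv"


def filename_for(year: int, quarter: int, pattern: str = PATTERN) -> str:
    """Expand a documented naming pattern for one quarterly release."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be between 1 and 4")
    return pattern.format(year=year, quarter=quarter)


def filenames_between(first_year: int, last_year: int,
                      pattern: str = PATTERN) -> List[str]:
    """List the expected file names for every quarter in a range of years."""
    return [
        filename_for(year, quarter, pattern)
        for year in range(first_year, last_year + 1)
        for quarter in (1, 2, 3, 4)
    ]
```

One line in the data page (the pattern) then stands in for every quarterly file that follows it, so the page does not grow with each release.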
How did we get ‘buy in’? As Grace mentioned, we have the active support of senior management, which is the first thing you need if you are going to succeed. We spent a lot of time selling the benefits of having ‘up-to-date’ RPDs.
The benefit of being tidy and organised, of documenting what you have and where you put it, is reduced risk. There is the risk of knowledge being lost: for example, when information is stored in people’s heads, it is not available to the rest of the organisation, and it leaves the organisation when that person leaves. There is the risk of datasets not being stored in correct locations: for example, data stored on personal drives needs to be moved to shared drives. And there is the risk of insufficient documentation about a process. We also provided support, mainly through personal help and encouragement, but also by publishing ‘A Guide to completing an RPD statement’ and an exemplar of a completed statement. The appraisal guidelines and the corporate policies and processes that were already established also supported decision making. Another way to get ‘buy in’ is to appreciate the effort that is put into completing the RPD statement. http://www.flickr.com/photos/earthworm/2916565549/
The main currency of appreciation at Statistics NZ is chocolate. But we also provide positive feedback for those who have produced quality RPDs, at the time of manager sign-off and also at performance review time.
Similar to appreciating the effort is appreciating the content: in our case, the data. Our statistical analysts love their data and find it infinitely interesting. Showing a genuine interest in understanding and valuing their data, data processes and metadata certainly helps to get the job done.
The disposal decisions recorded in RPDs provide an opportunity to free up space on shared drives. Lack of disk space is a constant issue at Stats. At Statistics NZ data is generally disposed of in one of 2 ways: it is either preserved or destroyed. We preserve data in the Data Archive, but before we agree to do this, an RPD statement must be completed and signed off by managers. When data has been successfully archived, the duplicate copies can be destroyed, which frees up valuable space on drives. The other option is to destroy data. We are terrible hoarders at Stats NZ, and RPDs have introduced the idea that data can be destroyed and, in some cases, must be destroyed. But data may not be destroyed unless it is listed in the RPD statement and the record class assigned to it allows for destruction.
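The two rules in this slide can be written down as a small check. This is a sketch under stated assumptions: the record class names and the table of which classes allow destruction are hypothetical, since the real classes are defined in the disposal schedule and are not listed in this talk.

```python
from typing import Dict

# Hypothetical record classes and whether each one allows destruction.
DESTRUCTION_ALLOWED: Dict[str, bool] = {
    "working file": True,
    "final dataset": False,
}


def may_destroy(filename: str, rpd_listing: Dict[str, str]) -> bool:
    """Data may be destroyed only if it is listed in the RPD statement
    and its record class allows destruction.

    rpd_listing maps each listed file name to its record class.
    """
    record_class = rpd_listing.get(filename)
    if record_class is None:
        return False  # not listed in the RPD statement
    return DESTRUCTION_ALLOWED.get(record_class, False)


def may_archive(rpd_signed_off: bool) -> bool:
    """Data is accepted into the archive only once the RPD is signed off."""
    return rpd_signed_off
```

Unknown record classes default to "do not destroy" here, which is a cautious design choice for the sketch rather than a documented Statistics NZ rule.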
Unexpected benefits have also helped with ‘buy in’. This is a case where rhetoric becomes tangible. Statistics New Zealand has a very productive office in Christchurch, where our colleagues produce and release a range of business and population statistics. At the time of the earthquake in February 2011, many of the studies that are compiled in the Christchurch office were being analysed or were due to be released. The information stored in the RPD statements about file paths and server names was used to help identify and prioritise the work to recover data from the Christchurch server backup. The process documents that were recorded also enabled analysts in our other offices to help with analysing data.
So, what have we learned?
Do it as you go. Data management needs to be part of the business process: we need to manage data and metadata from the time the study begins, through the entire data cycle. Completing the RPD statement needs to be in the business unit’s work plan, as part of the survey documentation. At Stats, if it is in the work plan, it will happen! Retrospective metadata gathering is very difficult, if not impossible. When a study becomes obsolete, people move on very quickly, and this can make preserving the obsolete data as meaningful data very difficult. The metadata that makes the numbers meaningful must be documented as you go.
What makes it easier … Choosing the right person for the job. The person coordinating the programme of work needs to be an influencer: someone who can get other people to do things for them; someone who understands the data cycle of the business unit and its pressure points, and knows when to push, when to leave them alone, and when to help; someone who can work around ‘road blocks’ and get things done. You also need the right person to complete the RPD statement: someone who understands the data through all parts of the data cycle, and who knows and understands the importance of metadata and the other documentation that supports the data. Completing RPDs is not a good way for a new person to learn about a study; that is a recipe for frustration all round!
Take your opportunities. The RPD process has provided an opportunity to influence and educate the organisation in best practice data management. During the process we see what actually happens, and when what actually happens is not what should happen, we have an opportunity to educate and to change processes and habits to best practice or, at the very least, to alert the organisation to risks. We have experienced a change in culture regarding data management at Statistics NZ. The attitude towards the RPD process has changed. The importance of documenting information about data and metadata so that it is current and available is now embedded in process and is accepted as part of what we do.
RPDs were instrumental in moving us to our current state, where we can now support the data management practices of our business over all stages. The Data Archive now provides datasets for researchers to use in the Statistics NZ Data Laboratory, and for internal staff to reuse in statistical production. As we did with the RPD process, we are still refining. There is currently work underway to refine our archiving process through automation. And last, but definitely not least, as a result of what we have learnt with the RPDs, we now have our own “magical IT system” that will manage statistical metadata over the whole data cycle. This metadata repository is called Colectica and is about to be rolled out to multiple business units. So, our system and process have evolved into something bigger and better, which we think isn’t too bad for a bit of DIY.
