
RWDG Slides: How to Govern Data Lakes

501 views


Are you spending your summer down by the Data Lake? If so, you want to make certain that the lake is clean and that you pick the best place to swim. The Data Lake is the new analytical paradise that many organizations are banking on as the answer to improved insights. You also need to prevent the lake from turning swampy.

In this month’s RWDG webinar, Bob Seiner and a special guest will focus on how to govern the data in your Data Lake. Bob’s interaction with his guests is always lively and fact-filled, and this month they will help you swim past major barriers to deliver an effective and valuable data resource.

In this webinar, Bob and his guest will discuss:
- The relationship between Data Lakes and Data Governance
- Preventing your Data Lake from becoming a Data Swamp
- Governing the Metadata associated with your Data Lake
- Leveraging governed data to provide trustworthy Analytics
- Measuring the value of a governed Data Lake

Published in: Data & Analytics

RWDG Slides: How to Govern Data Lakes

  1. Real-World Data Governance: How to Govern Data Lakes, with Special Guest Evan Terry
      - Monthly Webinar Series hosted by DATAVERSITY
      - Robert S. Seiner, KIK Consulting / TDAN.com
      - July 18, 2019, 11:00 a.m. PT / 2:00 p.m. ET
      - Copyright © 2019 Robert S. Seiner, KIK Consulting & Educational Services / TDAN.com. Non-Invasive Data Governance™ is a trademark of Robert S. Seiner & KIK Consulting. #RWDG @RSeiner
  2. Unified Data Orchestration
      - Madan Kumar | Solutions Engineer | Alluxio | madan@alluxio.com
  3. 4 big trends driving the need for a new architecture
      - Separation of Compute & Storage
      - Hybrid / Multi-cloud environments
      - Self-service data across the enterprise
      - Rise of the object store
  4. [Diagram with two panels, "Data Ecosystem - Beta" and "Data Ecosystem 1.0", each showing COMPUTE and STORAGE]
  5. Data Orchestration Framework
      - Interfaces: Java File API, HDFS Interface, S3 Interface, REST API, FUSE Interface
      - Drivers: HDFS Driver, Swift Driver, S3 Driver, NFS Driver
  6. Alluxio’s Approach to Big Data Federation
      - Unified Access: Acts as a “virtual data lake.” Files are accessed in Alluxio’s global namespace as if they resided in a single system.
      - Performant: Provides fast local access to important and frequently used data, without maintaining a permanent copy of all data.
      - Modern, flexible architecture: Promotes separation of compute from storage.
      - Storage Cost Optimization: Transparently reads and writes data directly from the source system, and so does not need to create a permanent copy of the data.
  7. Key Innovations of the Data Orchestration Layer
      - Data Elasticity with a unified namespace: Abstract data silos & storage systems to independently scale data with compute
      - Data Accessibility for popular APIs & API translation: Run Spark, Hive, Presto, ML workloads on your data located anywhere
      - Data Locality with Intelligent Multi-tiering: Accelerate big data workloads with transparent tiered local data
  8. Use Cases Data Orchestration Enables
      - Run big data workloads in hybrid cloud environments
      - Accelerate big data frameworks on the public cloud
      - Enable big data on object stores across single or multiple clouds
      - [Diagram labels: Hive, Spark, Presto, Alluxio; On premise; Any Cloud / Multi Cloud; Same instance / container; Same data center / region; Standalone]
  9. Incredible Open Source Momentum with growing community
      - 900+ contributors & growing
      - 3760+ Git Stars
      - Apache 2.0 Licensed
      - Hundreds of thousands of downloads
      - Join the conversation on Slack: alluxio.org/slack
  10. Introduction
      - Real-World Data Governance, Monthly Webinar Series
        - August 15, 2019: Data Governance versus Information Governance
        - Third Thursday each Month @ 2pm EST
        - Register at TDAN.com, KIKconsulting.com, DATAVERSITY.net
      - Non-Invasive Data Governance Book
        - ISBN 9781935504856 / Technics Publishing / Amazon.com
      - Speaking @ Dataversity Events
        - Data Architecture Summit, Chicago, October 14-17
        - Data Governance Vision, Washington, DC, December 9-12
      - Non-Invasive Data Governance Online Learning Plan / Non-Invasive Metadata Governance Online Learning Plan
        - DATAVERSITY Training Center: https://training.dataversity.net
      - The Data Administration Newsletter (TDAN.com)
        - Twice Monthly: Data Articles, Columns, Blogs and Features
        - Produced by DATAVERSITY; subscribe for emails
        - New Non-Invasive Data Governance Framework now being published
      - KIK Consulting & Educational Services (KIKConsulting.com)
        - Home of Non-Invasive Data Governance™
        - Home of Non-Invasive Metadata Governance
  11. Special Guest Evan Terry
      - Chief Analytics Officer, Velocity Mortgage Capital
      - Evan brings over 20 years of consulting experience in IT environments, including leading software development projects, designing and implementing IT and data strategies, and working on long-term, cross-departmental projects in such diverse industries as automotive, retail, state government, and e-commerce payments.
      - Evan’s areas of expertise include designing practical analytics solutions, aligning business and IT strategies, and implementing data management and governance programs.
      - He co-authored the data modeling book Beginning Relational Data Modeling and has spoken about data and process quality and systems design.
      - Evan has a BA in Economics from McGill University and an MBA from Columbia Business School.
  12. Abstract
      - In this webinar, Bob and Evan will discuss:
        - The relationship between Data Lakes and Data Governance
        - Preventing your Data Lake from becoming a Data Swamp
        - Governing the Metadata associated with your Data Lake
        - Leveraging governed data to provide trustworthy Analytics
        - Measuring the value of a governed Data Lake
  13. The Relationship between Data Lakes and Data Governance
      - What is Data Governance?
        - The execution and enforcement of authority over the definition, production and usage of data and data-related assets. (Robert S. Seiner)
        - The management and organization of data. (Evan Terry)
        - The orchestration of people and process and data.
        - The harmonization of people and process and data.
        - The formalization of accountability for data.
        - The implementation of decision-rights for data.
  14. The Relationship between Data Lakes and Data Governance
      - What is a Data Lake?
        - A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files.
        - A data lake is usually a single store of all enterprise data including raw copies of source system data and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. (SAS Article, 2016)
      - When does a Data Lake become a Data Swamp?
        - A data swamp is a deteriorated and unmanaged data lake that is either inaccessible to its intended users or is providing little value. (Olavsrud, Thor. CIO 2017)
        - When the data in the lake is ungoverned.
  15. The Relationship between Data Lakes and Data Governance
      - There is a connection between governance (how to manage and organize) and data lakes: governance is what makes data management in the lake accurate and useful
      - Catalogs are critical to help you govern data, especially in data lakes
        - Find things
        - Define things
        - Curate content
      - Need to include policy-driven processes that classify and identify the information in the lake, why it’s in there, what it means, who owns it, and who is using it
      - A data lake without data governance will ultimately end up being a collection of disconnected data pools or information silos, just all in one place.
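The catalog questions on this slide (why the data is in the lake, what it means, who owns it, who is using it) can be pictured as fields on a catalog record. The following is a minimal sketch, not something shown in the webinar: the `CatalogEntry` class, its field names, and the `ungoverned` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; field names are illustrative only."""
    dataset: str
    purpose: str = ""      # why the data is in the lake
    definition: str = ""   # what the data means
    owner: str = ""        # who owns it
    consumers: List[str] = field(default_factory=list)  # who is using it

    def missing_fields(self) -> List[str]:
        """Return the names of the required fields that are still empty."""
        required = {
            "purpose": self.purpose,
            "definition": self.definition,
            "owner": self.owner,
        }
        return [name for name, value in required.items() if not value.strip()]


def ungoverned(entries: List[CatalogEntry]) -> Dict[str, List[str]]:
    """Map each incompletely cataloged data set to its missing fields."""
    return {e.dataset: e.missing_fields() for e in entries if e.missing_fields()}
```

A report built from `ungoverned` would give stewards a concrete list of data sets to curate first; which fields count as required is a policy decision, assumed here to be purpose, definition and owner.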
  16. Preventing your Data Lake from becoming a Data Swamp
      - What can be done to prevent the swamping of your data lake?
        - Implement data governance for the lake.
        - Implement metadata management for the lake.
        - Implement sound principles of Data Definition, Data Production and Data Usage.
      - What is the appropriate level of data governance for your data lake?
  17. Preventing your Data Lake from becoming a Data Swamp
      - A “data lake” becomes a data swamp without organization
        - No organization, no curation of content, little metadata
      - Data warehouse principles are relevant:
        - Stewardship/Curation
        - Design, documentation, maintenance of the lake
        - Metadata capture
        - Governance
      - Technique: create zones in your data lake
        - Transition data sets from “raw data” to “clean data”
        - Apply different curation/governance principles to each zone
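The zoning technique can be sketched as a promotion gate: each zone carries its own list of governance checks, and a data set moves from raw to clean only when it passes the stricter list. This is an illustrative sketch under assumptions; the zone names follow the slide, but the rule names (`has_source`, `has_owner`, `has_schema`, `quality_checked`) and the functions are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Zone(Enum):
    RAW = "raw"
    CLEAN = "clean"


# Hypothetical per-zone rules: metadata flags a data set must carry
# before it is allowed to live in that zone.
ZONE_RULES: Dict[Zone, List[str]] = {
    Zone.RAW: ["has_source"],
    Zone.CLEAN: ["has_source", "has_owner", "has_schema", "quality_checked"],
}


def failed_rules(metadata: dict, target: Zone) -> List[str]:
    """List the target zone's rules that this data set does not satisfy."""
    return [rule for rule in ZONE_RULES[target] if not metadata.get(rule, False)]


def promote(metadata: dict, target: Zone) -> dict:
    """Return a copy of the metadata tagged with the new zone, or raise."""
    failures = failed_rules(metadata, target)
    if failures:
        raise ValueError(f"cannot promote to {target.value}: {failures}")
    return {**metadata, "zone": target.value}
```

Keeping the raw zone's rule list short reflects the slide's point that different curation principles apply to each zone: landing data stays cheap, while the clean zone earns trust through stricter checks.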
  18. Governing the Metadata Associated with your Data Lake
      - Governing metadata associated with:
        - Data Definition
        - Data Production
        - Data Usage
      - (Where) Is there metadata associated with your data lake?
      - Who is responsible for the metadata associated with your data lake?
      - “The metadata will not govern itself!”
  19. Governing the Metadata Associated with your Data Lake
      - Cataloging is key, but is tricky:
        - Don’t under- or over-catalog
        - Don’t be too loose or too rigid in your governance rules
      - “Goldilocks” mentality: everything in moderation
      - Tune governance to priorities and context
        - One person’s data lake is another’s data swamp
        - Don’t turn the data lake into a data warehouse (the clearest data lake)
        - Cannot be all things to all people: playground, incubator, or operational data store?
  20. Leveraging Governed Data to Provide Trustworthy Analytics
      - Sample DG purpose statement: Use strategic data with confidence.
      - Make certain the water is clean or it may be unhealthy.
      - “Boil water alert”: Is data governance the boiling of the water?
      - “Freshwater” versus “Saltwater” determines species that will live in your lake.
  21. Leveraging Governed Data to Provide Trustworthy Analytics
      - Data catalogs solve the problems of finding, interpreting and using data
      - The data lake is a tool and the context is key: different uses require different levels of data quality
      - “Trustworthy” depends on context and accuracy needs: data lakes are defined as “less” controlled and structured
  22. Leveraging Governed Data to Provide Trustworthy Analytics
      - Governance provides much the same value for a data lake as for a data warehouse. Analytics requires:
        - Knowing who owns the data and can answer questions about it
        - Finding the right data elements that meet your needs
        - Cleaning the data to an appropriate level of quality
        - Having the right security on the data being used
        - Monitoring the data for adherence to standards
      - Lightweight governance on adding, naming, organizing protects the shared resource from the “tragedy of the commons”
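Lightweight governance on adding and naming can be as small as one check that runs when someone registers a new data set. The sketch below assumes a hypothetical path convention of `zone/domain/dataset_name` in lowercase with underscores; the convention, the pattern and the function name are illustrative and do not come from the slides.

```python
import re
from typing import List

# Hypothetical naming convention: zone/domain/dataset_name,
# lowercase letters, digits and underscores only.
PATH_PATTERN = re.compile(r"^(raw|clean)/[a-z][a-z0-9_]*/[a-z][a-z0-9_]*$")


def check_new_dataset(path: str, owner: str) -> List[str]:
    """Return the governance problems found for a proposed data set."""
    problems = []
    if not PATH_PATTERN.match(path):
        problems.append("path does not follow zone/domain/dataset naming")
    if not owner.strip():
        problems.append("no owner recorded")
    return problems
```

An empty result means the data set can be added as proposed. Two rules are deliberately few: the aim is to protect the shared resource without making contribution burdensome.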
  23. Measuring the Value of a Governed Data Lake
      - Metrics are one of the 6 core components of Data Governance: data, people, process, communications, metrics and tools.
      - Measuring people’s ____________ the data in the lake:
        - confidence in
        - understanding of
        - usage of
        - decisions made using
        - knowledge of what data resides in
      - … all will depend on the effective management of metadata associated with your data lake.
  24. Measuring the Value of a Governed Data Lake
      - Considerations for providing metrics
        - Benchmark current status
        - Select metrics that mean something to someone
        - Select metrics associated with the data lake rather than data governance
        - Consider that it is not easy to measure Return on Investment on DG
        - Go jump in the lake!
  25. Measuring the Value of a Governed Data Lake
      - Unlocking the value depends on the data lake being broadly usable
      - What is the value of R&D? What is the value of avoiding a disaster?
      - The context of the data lake is key
        - What is the purpose of the data lake?
        - What problem will the data lake help you solve?
        - How much value does governance (lightweight or not) provide?
      - Value is measured in combination with the final use
        - AI/Machine Learning
        - Agility/Time to Market
        - Variety of end users served/capabilities enabled
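The advice to benchmark current status and to pick metrics tied to the lake itself can be illustrated with simple coverage ratios computed from catalog metadata. This is a sketch under assumptions: the metadata keys (`definition`, `owner`, `queries_last_30d`) and the three metric names are hypothetical, and real programs would choose measures that matter to their own stakeholders.

```python
from typing import Dict, List


def coverage(datasets: List[dict], key: str) -> float:
    """Fraction of data sets whose metadata has a non-empty value for key."""
    if not datasets:
        return 0.0
    return sum(1 for d in datasets if d.get(key)) / len(datasets)


def lake_metrics(datasets: List[dict]) -> Dict[str, float]:
    """Compute lake-level metrics from hypothetical metadata keys."""
    return {
        "documented": coverage(datasets, "definition"),
        "owned": coverage(datasets, "owner"),
        "used": coverage(datasets, "queries_last_30d"),
    }


def change_since(baseline: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Difference between the current metrics and the benchmarked baseline."""
    return {name: round(current[name] - baseline[name], 3) for name in baseline}
```

Taking one snapshot before governance work starts and another some months later turns "benchmark current status" into a number per metric. These ratios describe the lake, not return on investment, which the slides note is hard to measure.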
  26. Abstract
      - In this webinar, Bob and Evan discussed:
        - The relationship between Data Lakes and Data Governance
        - Preventing your Data Lake from becoming a Data Swamp
        - Governing the Metadata associated with your Data Lake
        - Leveraging governed data to provide trustworthy Analytics
        - Measuring the value of a governed Data Lake
  27. Real-World Data Governance: Questions and Answers
      - Join us in the Dataversity Community to continue the conversation: https://community.dataversity.net/
  28. Real-World Data Governance: Contact Information
      - Robert S. Seiner
      - KIK Consulting & Educational Services: KIKconsulting.com
      - The Data Administration Newsletter: TDAN.com
      - Post Office Box 112571, Upper St. Clair, Pennsylvania 15241
      - 412.220.9643, 412.220.9644 (Fax)
      - rseiner@kikconsulting.com, rseiner@tdan.com
      - @RSeiner @TDAN_com #RWDG
