Data Lake: A simple introduction


An introduction to IBM Data Lake by Mandy Chessell CBE FREng CEng FBCS, Distinguished Engineer & Master Inventor.
Learn more about IBM Data Lake: https://ibm.biz/Bdswi9

Published in: Data & Analytics


  1. IBM’s Data Lake – A Basic Definition
     1st June 2016
     Mandy Chessell CBE FREng CEng FBCS, Distinguished Engineer, Master Inventor, Analytics Group CTO Office
     © 2016 IBM Corporation. Learn more about Data Lakes on ibm.com: https://ibm.biz/Bdswi9
  2. Data blues & skills issues
     • A disproportionate share of the time spent on analytics projects goes to data preparation: acquiring, preparing, formatting and normalizing the data.
     • In addition to raw data, augmented data and analytical assets can significantly speed up the analytics process and partially bridge the talent gap.
  3. A growing demand …
     Business Teams want:
     • Open access to more information
     • More powerful analysis and visualization tools
     IT Teams are:
     • Concerned about cost
     • Concerned about governance and regulatory requirements
  4. Big Data Lakes or Swamps?
     • As we collect data:
       • Can we preserve clarity?
       • Do we know what we are collecting?
       • Can we find the data we need?
     • Are we creating a data swamp?
     • How do we build trust in big data?
       • Do we know what data is being used for?
  5. "The need for increased agility and accessibility for data analysis is the primary driver for data lakes," said Andrew White, vice president and distinguished analyst at Gartner. "Nevertheless, while it is certainly true that data lakes can provide value to various parts of the organization, the proposition of enterprise wide data management has yet to be realized."
     Source: http://www.gartner.com/newsroom/id/2809117
  6. IBM’s Data Lake – designed for data access – with safeguards
     IBM’s Data Lake = Efficient Management, Governance, Protection and Access.
     Diagram labels: Data Lake (System of Insight); Information Management and Governance Fabric; Data Lake Services; Data Lake Repositories
  7. Users supported by IBM’s Data Lake
     Diagram labels: Data Lake (System of Insight); Information Management and Governance Fabric; Data Lake Services; Line of Business Teams; Data Lake Operations; Data Lake Repositories; Enterprise IT; Other Data Lakes; Systems of Engagement; Systems of Automation; Systems of Record; New Sources; Analytics Teams; Governance, Risk and Compliance Team; Information Curator
  8. The subsystems inside IBM’s Data Lake
     Diagram labels: Data Lake (System of Insight); Information Management and Governance Fabric; Catalogue; Self-Service Access; Enterprise IT Data Exchange; Self-Service Access; Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Lake Operations; Enterprise IT; Other Data Lakes; Systems of Engagement; Data Lake Repositories; Systems of Automation; Systems of Record; New Sources; Analytics Engines
  9. View from the user community - fraud
     Diagram labels: Conform to regulations; Investigate Fraud Case; Develop new fraud models; Detect and prevent fraud (appears three times)
  10. The role of the catalogue
     Diagram labels: Information Governance Catalogue; Data Stores; Sand Box
     • Curation of metadata about stores, models and definitions.
     • Search for, locate and download data and related artifacts. Provision Sand Boxes.
     • Add additional insight into data sources through automated analysis.
     • Develop data management models and implementations.
     • Define governance policies, rules and classifications. Monitor compliance.
     • View lineage (business and technical) and perform impact analysis.
  11. Governance ensures proper management and use of information
     Diagram labels: Information Governance; Compliance; Policy Administration; Policy Enforcement; Policy Monitoring; Policy Implementation; Standards; Protection; Lifecycle; Quality; Information Values; Quality; Information Dependencies; Information Requirements; Information Supply Chain Integrity; Information Identification; Information Retention; Information Usage; Information Privacy; Information Architecture; Information Disposal
     Questions posed on the slide:
     • Are people and systems operating properly?
     • Is data quality sufficient for use?
     • Is data kept for an appropriate length of time?
     • Is data properly protected from loss or inappropriate use?
     • Are systems built to appropriate standards?
  12. Data lake security
     • The data lake’s repositories are only accessed by authorized processes.
     • People access the data from the data lake through the services:
       • Identified through a common authentication mechanism (e.g. LDAP)
       • Data classified in the catalog
       • Access granted by business owners
       • Access controlled by data lake services
       • All activity monitored by probes that store log information in the audit data zone
  13. IBM’s Data Lake – example deployment options
     Diagram labels: Data Lake (System of Insight); Information Management and Governance Fabric; Catalogue; Self-Service Access; Enterprise IT Data Exchange; Self-Service Access; Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Lake Operations; Enterprise IT; Other Data Lakes; Systems of Engagement; Systems of Automation; Systems of Record; New Sources; Analytics Engines
     Products shown: InfoSphere Streams; InfoSphere Information Server (appears three times); Cognos; Watson Explorer; Cloudant; Pure Data / BLU; InfoSphere BigInsights; InfoSphere Master Data Management; Watson Analytics; InfoSphere Information Server, Optim and Guardium; SPSS
  14. IBM’s Data Lake
     • As organizations experiment with analytics they discover:
       • Creating new analytics requires access to historical data from many systems.
       • This data includes valuable and sensitive data that is core to the organization’s operation.
       • Hadoop is a flexible platform for storing many types of data but is not necessarily fast enough for the production deployment of some analytics. Data needs to be reformatted and copied onto a specialist analytics platform such as Netezza.
     • A data lake provides:
       • Single extraction of data from operational systems and distribution to multiple analytics platforms.
       • Cataloguing and governance of the data in the analytics platforms.
       • Simple interfaces for the line of business to access the information they need.
  15. Governing and managing Big Data for Analytics and Decision Makers
     • An introduction to IBM’s Data Lake solution: http://www.redbooks.ibm.com/redpieces/abstracts/redp5120.html?Open
  16. Designing and Operating a Data Reservoir
     • Description of the behaviour and processes that make up a data lake from IBM (aka data reservoir)
     • Blog: 5 things to know about a data reservoir, https://www.ibm.com/developerworks/community/blogs/5things/entry/5_things_to_know_about_data_reservoir?lang=en
     • Redbook: http://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg248274.html?Open
  17. Ethics for Big Data and Analytics
     • Context: For what purpose was the data originally surrendered? For what purpose is the data now being used? How far removed from the original context is its new use?
     • Consent & Choice: What are the choices given to an affected party? Do they know they are making a choice? Do they really understand what they are agreeing to? Do they really have an opportunity to decline? What alternatives are offered?
     • Reasonable: Is the depth and breadth of the data used and the relationships derived reasonable for the application it is used for?
     • Substantiated: Are the sources of data used appropriate, authoritative, complete and timely for the application?
     • Owned: Who owns the resulting insight? What are their responsibilities towards it in terms of its protection and the obligation to act?
     • Fair: How equitable are the results of the application to all parties? Is everyone properly compensated?
     • Considered: What are the consequences of the data collection and analysis?
     • Access: What access to data is given to the data subject?
     • Accountable: How are mistakes and unintended consequences detected and repaired? Can the interested parties check the results that affect them?
     Source: http://www.ibmbigdatahub.com/whitepaper/ethics-big-data-and-analytics
  18. Common Information Models for an Open, Analytical and Agile World
     • To drive maximum value from complex IT projects, IT professionals need a deep understanding of the information their projects will use. Too often, however, IT treats information as an afterthought: the “poor stepchild” behind applications and infrastructure. That needs to change. This book will help you change it.
     • Using a complete case study, the authors explain what CIMs are, how to build them, and how to maintain them. You learn how to clarify the structure, meaning, and intent of any information you may exchange, and then use your CIM to improve integration, collaboration, and agility.
     • In today’s mobile, cloud, and analytics environments, your information is more valuable than ever. To build systems that make the most of it, start right here.
  19. Data Lake: Taming the Data Dragon (White Paper)
     Taming the data dragon leads to significant benefits across the enterprise, from improved productivity to increased effectiveness in sales and marketing. A data lake accepts data flows from any source and brings them into a common platform for use. Data is stored in its raw, unrefined state and located, processed, refined and extracted as required. However, governance needs to be applied to the data lake to ensure it becomes a trusted data source, rather than a formless landing area in which data is stored without consideration of its validity, value or shelf life.
     Download Now: https://ibm.biz/Bdswiu
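
The security flow on slide 12 (authenticate the person, look up the data's classification in the catalog, check the business owner's grant, log every attempt to the audit data zone) can be sketched roughly as follows. This is a minimal illustration only, not IBM's implementation: the class `DataLakeServices`, its fields and the sample user and data set names are all hypothetical, and the `known_users` set merely stands in for a real directory lookup such as LDAP.

```python
from dataclasses import dataclass, field


@dataclass
class DataLakeServices:
    """Hypothetical service layer; the only path from people to the repositories."""

    known_users: set[str]                  # stand-in for a directory (e.g. LDAP) lookup
    classifications: dict[str, str]        # data set -> classification held in the catalog
    grants: set[tuple[str, str]] = field(default_factory=set)  # (user, classification)
    audit_zone: list[tuple[str, str, str]] = field(default_factory=list)

    def grant(self, user: str, classification: str) -> None:
        """Record an access decision made by the business owner of the data."""
        self.grants.add((user, classification))

    def read(self, user: str, dataset: str) -> bool:
        """Return True if the request is allowed; every attempt is logged."""
        classification = self.classifications.get(dataset)
        if user not in self.known_users:
            outcome = "denied: unknown user"
        elif classification is None:
            outcome = "denied: data set not classified"
        elif (user, classification) not in self.grants:
            outcome = "denied: no grant"
        else:
            outcome = "allowed"
        self.audit_zone.append((user, dataset, outcome))  # audit trail for all activity
        return outcome == "allowed"


# Assumed sample data, purely for illustration.
services = DataLakeServices(
    known_users={"analyst", "intern"},
    classifications={"claims_2015": "sensitive"},
)
services.grant("analyst", "sensitive")
print(services.read("analyst", "claims_2015"))  # analyst holds a grant
print(services.read("intern", "claims_2015"))   # intern is known but has no grant
print(len(services.audit_zone))                 # both attempts were recorded
```

Keeping the check and the audit write in one method reflects the slide's point that people never touch the repositories directly, so no request can bypass logging.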
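
Slide 14's "single extraction of data from operational systems and distribution to multiple analytics platforms" can be pictured with a small fan-out routine. The sketch below is an assumption-laden toy: the function `extract_once_and_distribute`, the in-memory lists standing in for analytics platforms, and the dictionary standing in for the catalogue are hypothetical names, not part of any IBM product.

```python
from typing import Callable, Iterable

Record = dict[str, object]


def extract_once_and_distribute(
    extract: Callable[[], Iterable[Record]],
    platforms: dict[str, list[Record]],
    catalogue: dict[str, list[str]],
    dataset: str,
) -> int:
    """Read the operational source a single time, then copy to every platform."""
    records = list(extract())  # the only call made against the source system
    for name, store in platforms.items():
        store.extend(dict(r) for r in records)          # each platform gets its own copy
        catalogue.setdefault(dataset, []).append(name)  # record where the data now lives
    return len(records)


query_log: list[str] = []  # counts how often the source system is queried


def read_orders() -> list[Record]:
    """Hypothetical operational system returning two sample rows."""
    query_log.append("orders")
    return [{"order_id": 1, "amount": 25.0}, {"order_id": 2, "amount": 40.0}]


platforms: dict[str, list[Record]] = {"hadoop": [], "warehouse": []}
catalogue: dict[str, list[str]] = {}
copied = extract_once_and_distribute(read_orders, platforms, catalogue, "orders")
print(copied, len(query_log), catalogue)
```

The source is queried once regardless of how many platforms receive the data, and the catalogue entry is written in the same step so the copies stay discoverable.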
