
Leveraging the cloud for analytics and machine learning 1.29.19


Learn how organizations are deriving unique customer insights, improving product and service efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on Azure. In this webinar, you will see how fast and easy it is to deploy a modern data management platform in your cloud, on your terms.

Published in: Technology

  1. Migrating Analytics and ML to the Cloud. Sushant Rao, Cloud Product Marketing @ Cloudera; Ron Abellera, Azure Global Black Belt @ Microsoft Azure. © Cloudera, Inc. All rights reserved.
  2. Poll Question 1: Where are you in your journey to the Cloud?
      ● Just started researching options in Cloud
      ● Starting to test different products / services in Cloud
      ● Have some deployments and looking to expand in Cloud
      ● Critical mass in the Cloud
  3. Why Cloud?
      CLOUD BENEFITS
      • Agility
        ○ Speed of making changes to meet business / technical needs
      • Scalable & Elastic
        ○ Scale up and down quickly
      • Reliable
        ○ Multiple options to ensure infrastructure / services are available
        ○ Tenant isolation ensures different workloads don’t conflict with each other
      • Other
        ○ Pay-as-you-go charges only for consumption (but is not necessarily cheaper)
        ○ Self-service enables users to do their work without contacting the IT / data platform team
  4. But ...
      CLOUD CHALLENGES
      • Multiple copies of data & disjointed services
        ○ Different services have their own copies and may not work together
      • On-premises integration
        ○ Data gravity is on-prem, so cloud needs to complement the current data platform
      • Cloud lock-in
        ○ Open source prevented lock-in for on-prem. What about cloud?
      • Shadow IT
        ○ Individual business units may set up their own cloud deployments, without the architecture, security, and/or governance of the on-prem deployment
      • Cheaper?
        ○ On-prem can be more than 2x cheaper than cloud
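
Slides 3 and 4 note that pay-as-you-go pricing bills only for consumption, yet is not necessarily cheaper than on-prem. A minimal sketch of that trade-off, assuming a flat hourly cloud rate and a fixed, amortized monthly on-prem cost per node. The function names and all dollar figures are hypothetical and are not Cloudera or Azure prices.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def monthly_cloud_cost(hourly_rate: float, utilization: float) -> float:
    """Pay-as-you-go cost for one node that runs `utilization` (0..1) of the month."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return hourly_rate * HOURS_PER_MONTH * utilization


def break_even_utilization(hourly_rate: float, on_prem_monthly: float) -> float:
    """Utilization above which the fixed on-prem cost becomes cheaper (capped at 1.0)."""
    return min(1.0, on_prem_monthly / (hourly_rate * HOURS_PER_MONTH))


# Hypothetical figures, for illustration only.
CLOUD_RATE = 1.00        # dollars per node-hour
ON_PREM_MONTHLY = 365.0  # dollars per node-month, amortized

threshold = break_even_utilization(CLOUD_RATE, ON_PREM_MONTHLY)
always_on = monthly_cloud_cost(CLOUD_RATE, 1.0)
bursty = monthly_cloud_cost(CLOUD_RATE, 0.25)
```

Under these assumed numbers, a node that runs all month costs twice as much in the cloud as on-prem, while a node that runs a quarter of the time costs half as much. That is why intermittent workloads are the natural fit for usage-based pricing.
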
  5. Common Use Cases for Cloud
      CORPORATE DIRECTIVE
      • C-level has decided to utilize the cloud more
      • Running out of data center space, looking for more agility / flexibility
  6. Common Use Cases for Cloud
      CORPORATE DIRECTIVE
      • C-level has decided to utilize the cloud more
      • Running out of data center space, looking for more agility / flexibility
      DISASTER RECOVERY
      • Back up all data to the cloud, without a second “physical” location
      • Save the time and expense of setting up a physical DR site
  7. Common Use Cases for Cloud
      CORPORATE DIRECTIVE
      • C-level has decided to utilize the cloud more
      • Running out of data center space, looking for more agility / flexibility
      ELASTIC WORKLOADS
      • Separate environment for new production workloads or for intermittent, ad-hoc workloads
      • Takes too long to acquire and set up on-prem infrastructure
      DISASTER RECOVERY
      • Back up all data to the cloud, without a second “physical” location
      • Save the time and expense of setting up a physical DR site
  8. Common Use Cases for Cloud
      CORPORATE DIRECTIVE
      • C-level has decided to utilize the cloud more
      • Running out of data center space, looking for more agility / flexibility
      SANDBOX
      • Environment to test queries and algorithms
      • Doesn’t impact the production cluster as data analysts and engineers test
      ELASTIC WORKLOADS
      • Separate environment for new production workloads or for intermittent, ad-hoc workloads
      • Takes too long to acquire and set up on-prem infrastructure
      DISASTER RECOVERY
      • Back up all data to the cloud, without a second “physical” location
      • Save the time and expense of setting up a physical DR site
  9. Cloudera’s Solution for Data Analytics / Engineering in Cloud
      • The modern platform for machine learning and analytics
        ○ Numerous functions for all types of jobs and queries
      • with multiple deployment options
        ○ On-premises, public cloud (including multi-cloud), and hybrid
      • and one shared data experience
        ○ Framework for consistent security, governance, and metadata management across applications and deployments
  10. The Modern Platform for Machine Learning & Analytics
      DATA ENGINEERING: DATA PROCESSING
      • Cost efficient
      • Reliable
      • Scalable
      • Based on Spark, MapReduce, Hive & Pig
      • Supported by Workload Analytics
      DATA WAREHOUSE: FAST BI & SQL
      • Flexibility
      • Elastic scale
      • Go beyond SQL
      • Based on Impala & Hive
      • SQL dev environment
      • Supported by Workload Analytics
      DATA SCIENCE: MACHINE LEARNING
      • Fast dev to production
      • Secure self-serve
      • Based on Python, R, and Spark
      • ML dev environment (CDSW)
      OPERATIONAL DATABASE: ONLINE & REAL-TIME
      • High throughput, low latency
      • Strongly consistent
      • Based on HBase, Kudu & Spark Streaming
  11. Cloudera’s Vision for AI and Machine Learning
      Modern enterprise platform, tools, and expert guidance to help you unlock business value with ML / AI
      • Agile platform to build, train, and deploy scalable ML applications
      • Enterprise data science tools to accelerate team productivity
      • Expert guidance, services & training to fast track value & scale
  12. With Multiple Deployment Options
      [Diagram labels: Via Cloudera Altus (IaaS); INFRASTRUCTURE SERVICES; OPERATIONAL DATABASE, DATA ENGINEERING, DATA WAREHOUSE, DATA SCIENCE; Via Cloudera Altus Services (PaaS); DATA ENGINEERING, DATA WAREHOUSE; Traditional Infrastructure (combined storage and compute); Cloud Infrastructure (decoupled storage and compute)]
  13. Cloudera Enterprise Data Platform
      Benefits for IT infra & ops
      • Central control and security
      • Focus on curating, not firefighting
      Benefits for users
      • Value from a single source of truth
      • Bring the best tools for each job
      [Diagram labels: WORKLOADS (DATA SCIENCE, DATA WAREHOUSE, OPERATIONAL DATABASE, DATA ENGINEERING, 3RD PARTY SERVICES); COMMON SERVICES (SECURITY, GOVERNANCE, LIFECYCLE MANAGEMENT, CONTROL PLANE, DATA CATALOG); STORAGE (HDFS, Public Cloud Object Storage (S3, ADLS, etc), KUDU, Private Cloud Object Storage)]
  14. Journey to the Cloud from On-Prem
      [Diagram labels: ON PREMISES; CLOUDERA CLUSTER (PERSISTENT); COMPUTE (Data Engineering, Analytics, Data Science); DATA CONTEXT (Security, Metadata, Governance); STORAGE (HDFS)]
      Current State
      ● Multiple workloads and services run in a single cluster
      ● Data Context (security, metadata, governance) in a single cluster
      Goals in Journey to the Cloud
      ● Get to Cloud with minimal impact and change
      ● Replicate security groups and permissions in the Cloud
      ● May require multiple stages to get there
      ● First step may vary depending on goals
      ● Need to determine how data will be replicated to the Cloud
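
Slide 14 lists replicating security groups and permissions in the cloud as a migration goal. One way to check that goal after a migration is to diff the two permission sets. The sketch below assumes permissions have already been exported as plain group-to-permission mappings. The group names, the `table:PRIVILEGE` strings, and the `permission_gaps` helper are all hypothetical and do not reflect any Sentry or Azure export format.

```python
from typing import Dict, Set


def permission_gaps(
    on_prem: Dict[str, Set[str]], cloud: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Return, per group, the permissions present on-prem but missing in the cloud."""
    gaps: Dict[str, Set[str]] = {}
    for group, perms in on_prem.items():
        # A group absent from the cloud side is treated as having no permissions.
        missing = perms - cloud.get(group, set())
        if missing:
            gaps[group] = missing
    return gaps


# Hypothetical exports, for illustration only.
ON_PREM = {
    "analysts": {"sales_db:SELECT", "web_logs:SELECT"},
    "engineers": {"sales_db:ALL"},
}
CLOUD = {
    "analysts": {"sales_db:SELECT"},
}

gaps = permission_gaps(ON_PREM, CLOUD)
```

An empty result would indicate that every on-prem grant has a counterpart in the cloud. The check is one-directional, so extra grants that exist only in the cloud would need the same call with the arguments swapped.
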
  15. Start by Replicating Data to Public Cloud via BDR
      [Diagram labels: ON PREMISES: CLOUDERA CLUSTER (PERSISTENT) with COMPUTE (Hive, Impala, Spark), DATA CONTEXT (Sentry, HMS, Navigator), STORAGE (HDFS); BDR; CUSTOMER CLOUD (AWS, Azure, GCP, etc): PUBLIC CLOUD STORAGE (HDFS)]
  16. Journey to the Cloud - Step 1
      [Diagram labels: 1 - LIFT AND SHIFT; CUSTOMER CLOUD; CLOUDERA CLUSTER (PERSISTENT); COMPUTE (Data Engineering, Analytics, Data Science); DATA CONTEXT (Security, Metadata, Governance); STORAGE (HDFS, CLOUD OBJECT STORE)]
  17. Journey to the Cloud - Step 2
      [Diagram labels: 1 - LIFT AND SHIFT and 2 - OBJECT STORAGE, each showing a CUSTOMER CLOUD with a CLOUDERA CLUSTER (PERSISTENT); COMPUTE (Data Engineering, Analytics, Data Science); DATA CONTEXT (Security, Metadata, Governance); STORAGE (HDFS, CLOUD OBJECT STORE)]
  18. Journey to the Cloud - Step 3
      [Diagram labels: 1 - LIFT AND SHIFT and 2 - OBJECT STORAGE, each showing a CUSTOMER CLOUD with a CLOUDERA CLUSTER (PERSISTENT); COMPUTE (Data Engineering, Analytics, Data Science); DATA CONTEXT (Security, Metadata, Governance); STORAGE (HDFS, CLOUD OBJECT STORE). 3 - CLOUD NATIVE ARCHITECTURES: CUSTOMER CLOUD with CLOUDERA CLUSTERS (TRANSIENT, ALTUS) for COMPUTE (Data Engineering, Analytics), CLOUDERA CLUSTER (PERSISTENT, DIRECTOR) with COMPUTE and DATA CONTEXT, STORAGE (CLOUD OBJECT STORE), DATA CONTEXT; CLOUDERA CLOUD with CLOUDERA ALTUS CONTROL PLANE]
  19. Customer Examples
      Many Cloudera customers (Global 5K) have used public cloud
      • Online retailer
        ○ Over 2,000 nodes with ~2PB of data in cloud, running in an active-active configuration
        ○ Transforming data with Spark and then analyzing with Apache Hive
      • German chain of coffee retailers and cafés
        ○ 30+ nodes with 50TB of data in cloud
        ○ Modern Cloudera platform with an Impala data warehouse
      • Global information company
        ○ 70+ nodes in cloud across Microsoft Azure and AWS
        ○ Replaced Netezza with Hadoop, leveraging both Impala and Spark for analytics
  20. Cloudera is using cloud as well
      Security use case: Altus-based solution saved more than 50% in cost compared to the initial implementation
  21. Cloudera Altus Key Differentiators
      • Multi-function: Unified platform for data engineering, data warehouse, and data science
      • Multi-cloud: Option for on-premises, public cloud (including multi-cloud), and hybrid
      • SDX: Integrated shared data experience across multi-function clusters
  22. Pick the Right Altus Component for Your Needs
      Depending on workload and service level
      Altus Data Engineering (PaaS)
      • Service offering for batch-oriented Data Engineering jobs on data in object stores (ADLS, others)
      • Usage-based pricing
      • Runs Apache Spark, Apache Hive, and MapReduce jobs
      • Provides Workload Analytics to troubleshoot and optimize job performance
      Altus Data Warehouse (PaaS)
      • Service offering for cloud-native data warehouse use cases
      • Usage-based pricing
      • Runs Apache Impala on data stored in object stores (ADLS, others)
      • Exposes an endpoint to connect BI tools for visualization
      • Offers a built-in SQL editor for ad-hoc data exploration
      Altus (IaaS)
      • EDH for public cloud, which gives customers full cluster control
      • Self-managed cloud infrastructure
      • Usage- or node-based pricing
      • Full breadth of CDH services available (Apache Kafka, Apache Spark Streaming, CDSW, etc)
      • Supports deployments on 5 public cloud platforms
  23. Azure Update
  24. Microsoft’s Big Data Service
      Cosmos: Microsoft’s internal data lake
      • A data lake for all teams @Microsoft
      • Tools approachable by any developer
      • Batch, Interactive, Streaming, ML
      • Used across Office, Xbox, Azure, Windows, Ads, Bing, Skype, …
      By the numbers
      • Exabytes of data
      • 100Ks of Physical Servers
      • Millions of Interactive Queries
      • Huge Streaming Pipelines
      • 100Ks of Batch Jobs
      • 10K+ Developers
      Azure Data Lake: a data lake for everyone
      • The next version of Cosmos
      • Fully aligned with Hadoop ecosystem and standards, with full support for Hadoop tools and engines as well as unique Microsoft capabilities
      • Migration from Cosmos to ADL is already underway
      • External customers on the same service as internal customers
  25. Data lake flow (diagram source label: Devices)
      • Ingest all data, regardless of requirements
      • Store all data in native format, without schema definition
      • Do analysis using analytic engines like Hadoop: interactive queries, batch queries, machine learning, data warehouse, real-time analytics
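
Slide 25 describes storing data in its native format without a schema definition and deferring structure to analysis time, an approach often called schema-on-read. A minimal sketch of the idea in plain Python, assuming raw JSON lines as the stored format. The sample events, the `READ_SCHEMA` mapping, and the `read_with_schema` helper are hypothetical and do not represent an Azure Data Lake or Hadoop API.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

# Raw events kept exactly as they arrived; nothing is validated at write time.
RAW_EVENTS = [
    '{"device": "sensor-1", "temp_c": 21.5}',
    '{"device": "sensor-2", "temp_c": "19.0", "firmware": "1.2"}',
    '{"device": "sensor-3"}',
]

# Hypothetical read-time schema: field name -> (converter, default if absent).
Schema = Dict[str, Tuple[Callable[[Any], Any], Any]]
READ_SCHEMA: Schema = {
    "device": (str, ""),
    "temp_c": (float, None),
}


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Any]]:
    """Apply a schema while reading: pick known fields, convert types, fill defaults."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (convert, default) in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else default
        yield row


rows = list(read_with_schema(RAW_EVENTS, READ_SCHEMA))
```

The stored lines are never rewritten, so a different analysis could read the same data with a different schema, for example one that also extracts `firmware`.
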
  26. Azure Data Lake Overview
      [Diagram labels: Spark, MapReduce, Impala, Cloudera, U-SQL, ADL Analytics, Scope, YARN; HDFS++ API, Cosmos API; Ingestion Service, ADLS Gateway Service, ADLS Micro Services; Azure Data Lake Store in-cluster services; Azure Key Vault, Azure Active Dir; ADL local tier, Azure VMs; Azure remote storage tier, Windows Azure Blob Storage]
  27. ADLS Gen 2
      • Preview announced June 2018
      • Allows all storage regions to have HDFS API
      • Soon available for Cloudera implementations
  28. Azure Data Lake Storage Gen2 Key Features
  29. Demo
  30. Poll Question 2: How do you want to use the Cloud?
      • Migrating existing workloads from your on-prem cluster to Azure
      • Deploying new data analytics / engineering jobs in Cloud (PaaS / SaaS)
      • Interested in both of the above
      • Not sure
  31. Cloud Data Analytics / Engineering with Cloudera
      ADVANTAGES
      • Same solution as on-premises and multi-cloud
      • Eliminate data copies
      • Single security framework with universally shared metadata
      • Easy to track data lineage
      • Unified services
      BUSINESS VALUE
      • Lower risk of data breach
      • Analysts more productive on jobs
      • Self-service (no shadow IT) and more productive
      • IT more strategic, less admin time
      • Deployment choices and no lock-in
  32. Ready to try the Cloud? $10K of free Azure credits!
      • Cloudera and Microsoft will offer $10,000 in free Azure credits for qualifying opportunities
      • To be applied to an Azure subscription
      • Must be consumed in 60 days
      • Must be a Cloudera product running on Microsoft Azure
      • Must be tied to a single customer entity for a PoC or pilot deployment
      • Limited time offer
      • Contact azureoffer@cloudera.com
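
The offer on slide 32 combines a fixed credit ($10,000) with a fixed window (60 days), which bounds the size of a pilot cluster. A rough sizing sketch follows. The credit amount and window come from the slide, while the hourly node rate and the helper names are hypothetical and are not an Azure or Cloudera price.

```python
CREDIT_USD = 10_000  # from the slide
WINDOW_DAYS = 60     # from the slide


def daily_budget(credit: float, days: int) -> float:
    """Average spend per day that uses up the credit exactly within the window."""
    return credit / days


def max_always_on_nodes(credit: float, days: int, hourly_rate: float) -> int:
    """Largest cluster that can run 24 hours a day for the whole window on the credit."""
    cost_per_node = hourly_rate * 24 * days
    return int(credit // cost_per_node)


# Hypothetical all-in rate per node-hour, for illustration only.
ASSUMED_NODE_RATE = 0.50

budget = round(daily_budget(CREDIT_USD, WINDOW_DAYS), 2)
nodes = max_always_on_nodes(CREDIT_USD, WINDOW_DAYS, ASSUMED_NODE_RATE)
```

At the assumed rate, each always-on node costs $720 over the window. A pilot that runs clusters only while jobs execute could afford proportionally more nodes.
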
  33. THANK YOU
  34. Appendix
  35. Cloudera Pricing / Acquisition
      Acquisition Options
      ● Pay-as-you-go usage-based pricing
      ● Node-based license subscription
      ● Free 30-day trial
      ● Pre-pay of cloud credits
      ● Free version that can be deployed in the cloud
      Pricing: https://www.cloudera.com/products/pricing.html
