SlideShare ist ein Scribd-Unternehmen logo
1 von 24
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rohit Pujari, Solutions Architect
Migrating Big Data Workloads to
AWS
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Architectural Principles
1. Build decoupled systems
2. Use the right tool for the job
3. Use managed and serverless services
4. Use log-centric design patterns
5. Be cost-conscious
6. AI/ML enable your applications
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• A cluster of 1/2U machines. Typically
12 cores, 32-128 GB RAM, and 12-
24 TB of HDD
• Networking switches and racks
• Long Term Hadoop Cluster with
fixed-licensing term
• HDFS uses local disk and has 50-
200% storage overhead
On-premises Hadoop clusters
Server rack 1
(20 nodes)
Server rack 2
(20 nodes)
Server rack N
(20 nodes)
Core
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• Management of the cluster (failures, hardware
replacement, restarting services, expanding cluster)
• Configuration management
• Tuning of specific jobs or hardware
• Managing development and test environments
• Backing up data and disaster recovery
On premises: Role of a big data administrator
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
On premises: System management Challenges
• Managing distributed applications and availability
• Durable storage and disaster recovery
• Adding new frameworks and doing upgrades
• Multiple environments
• Need team to manage cluster and procure hardware
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
On premises: Workload types running on the same cluster
• Large-scale ETL
• Interactive queries
• Machine learning and data science
• NoSQL
• Stream processing
• Search
• Data warehouses
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
On premises: Swim lane of jobs
Over-utilized Under-utilized
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
On premises: Over-utilization and idle capacity
• Tightly coupled compute and storage requires
buying excess capacity
• Can be over-utilized during peak hours and
under-utilized at other times
• Results in high costs and low efficiency
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Key migration considerations
• Do not lift and shift
• Deconstruct workloads and use the right tool for the job
• Decouple storage and compute with Amazon Simple Storage Storage
Service (Amazon S3)
• Design for cost and scalability
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Foundational requirements
Secure
Encryption in flight & at rest
Lower TCO
Pay as per usage
Flexible
Customize per workload
Full control
Infrastructure as code
Scalable
Compute & storage
Managed
Reduced administration
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Benefits of a Data Lake - All Data is in One Place
Analyze all of your data,
from all of your sources, in one stored
location
“Why is the data distributed in
many locations? Where is the
single source of truth?”
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Designed for 11 9s
of durability
Designed for
99.99% availability
Durable Available High performance
 Multiple upload
 Range GET
 Scalable throughput
 Store as much as you need
 Scale storage and compute
independently
 No minimum usage commitments
Scalable
 Cloudera EDH
 Cloudera Altus
 Cloudera Impala
Integrated Partner Tools
 Simple REST API
 AWS SDKs
 Simple management tools
 Event notification
 Lifecycle policies
Easy to use
Why Amazon S3 for a Data Lake?
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Encryption ComplianceSecurity
 Identity and access
Management (IAM) policies
 Bucket policies
 Access Control Lists (ACLs)
 Private VPC endpoints to
Amazon S3
 Amazon S3 object tagging to
manage access policies
 SSL endpoints
 Server-side encryption
(SSE-S3)
 S3 server-side
encryption with
provided keys (SSE-C,
SSE-KMS)
 Client-side encryption
 Buckets access logs
 Lifecycle management
policies
 Access Control Lists
(ACLs)
 Versioning and MFA
deletes
 Certifications—HIPAA,
PCI, SOC 1/2/3, etc.
Strong Security Controls
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Encrypt data in
transit and at rest
with keys managed by
our AWS Key Management
System (KMS) or managing
your own encryption keys
with Cloud HSM using
FIPS 140-2 Level 3
validated HSMs
Meet data
residency requirements
Choose an AWS Region
and AWS will not replicate it
elsewhere unless you choose
to do so
Access services and tools that
enable you to
buildGDPR-compliant
infrastructure
on top of AWS
Comply with local
data privacy laws
by controlling who
can access content, its
lifecycle and disposal
Highest standards for privacy
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
“CIOs and CISOs need to stop obsessing over
unsubstantiated cloud security worries, and instead apply
their imagination and energy to developing new
approaches to cloud control, allowing them to securely,
compliantly, and reliably leverage the benefits of this
increasingly ubiquitous computing model.”
Source: Clouds Are Secure: Are You Using Them Securely?
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What About HDFS & Data Tiering?
• Use HDFS for hottest datasets (e.g.
iterative read on the same datasets)
• Use Amazon S3 Standard for frequently
accessed data
• Use Amazon S3 Standard – IA for less
frequently accessed data
• Use Amazon Glacier for archiving cold
data
• Use S3 Analytics to optimize tiering
strategy S3 Analytics
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Benefits of a Data Lake - Quick Ingest
Quickly ingest data
without needing to force it into a
predefined schema
“How can I collect data quickly
from various sources and store
it efficiently?”
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Direct
Connect
AWS Snowball ISV Connectors
Kafka/Flume
Amazon Kinesis
Firehose
Amazon S3 Transfer
Acceleration
AWS Storage
Gateway
Data Ingestion into Amazon S3
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Benefits of a Data Lake - Storage vs. Compute
Separating your storage and compute
allows you to scale each component as
required
“How can I scale up with the
volume of data being generated?”
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Benefits of a Data Lake - Schema on Read
“Is there a way I can apply multiple
analytics and processing frameworks
to the same data?”
A data lake enables ad-hoc
analysis by applying schemas
on read, not write
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Benefits of an AWS Amazon S3 Data Lake
Fixed cluster data lake AWS Amazon S3 data lake
• Limited to only the single tool contained
on the
• Long turnaround cycles to add nodes to
add storage capacity
• Expensive to replicate data against
node loss
• Complexity in scaling local storage
capacity
• Long refresh cycles to add additional
storage equipment
• Decouple storage and compute by
making S3 object based storage, not a
fixed tool to manage the data lake
• Flexibility to use any and all tools in the
ecosystem. The right tool for the job.
• Catalog, transform, and query in place
• Future-proof your architecture. As new
use cases and tools emerge you can
plug and play current best of breed.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Foundation Services
Compute Storage Database Networking
AWS Global
Infrastructure
Regions
Availability
Zones Edge
Locations
Client-side Data
Encryption
Server-side Data
Encryption
Network Traffic
Protection
Platform, Applications, Identity & Access Management
Operating System, Network, & Firewall Configuration
Customer applications & content
Customers
Security & compliance is a shared responsibility
Customers have
their choice of
security
configurations IN
the Cloud
AWS is
responsible for
the security OF
the Cloud
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Inherit global security and compliance controls
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Summary
• Use Amazon S3 as the storage repository for your data lake
• Decouple Compute and Storage - Gain flexibility to use all the analytics tools in
the ecosystem
• Use managed PaaS like Cloudera Altus or Serverless services where possible
to reduce operational overhead
• Use granular encryption, roles, and access controls to build a secure, multi-
tenant centralized data platform

Weitere ähnliche Inhalte

Was ist angesagt?

PaaS or Fail: Rule the Cloud with Altus
PaaS or Fail: Rule the Cloud with AltusPaaS or Fail: Rule the Cloud with Altus
PaaS or Fail: Rule the Cloud with AltusCloudera, Inc.
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester WebinarCloudera, Inc.
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 
Customer Best Practices: Optimizing Cloudera on AWS
Customer Best Practices: Optimizing Cloudera on AWSCustomer Best Practices: Optimizing Cloudera on AWS
Customer Best Practices: Optimizing Cloudera on AWSCloudera, Inc.
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionCloudera, Inc.
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Cloudera, Inc.
 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldCloudera, Inc.
 
Driving Better Products with Customer Intelligence

Driving Better Products with Customer Intelligence
Driving Better Products with Customer Intelligence

Driving Better Products with Customer Intelligence
Cloudera, Inc.
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndCloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and BeyondStanding Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and BeyondCloudera, Inc.
 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...Cloudera, Inc.
 
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the CloudData Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the CloudCloudera, Inc.
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Cloudera, Inc.
 
How komatsu is driving operational efficiencies using io t and machine learni...
How komatsu is driving operational efficiencies using io t and machine learni...How komatsu is driving operational efficiencies using io t and machine learni...
How komatsu is driving operational efficiencies using io t and machine learni...Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Cloudera, Inc.
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudCloudera, Inc.
 
A Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber ThreatsA Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber ThreatsCloudera, Inc.
 

Was ist angesagt? (20)

PaaS or Fail: Rule the Cloud with Altus
PaaS or Fail: Rule the Cloud with AltusPaaS or Fail: Rule the Cloud with Altus
PaaS or Fail: Rule the Cloud with Altus
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Customer Best Practices: Optimizing Cloudera on AWS
Customer Best Practices: Optimizing Cloudera on AWSCustomer Best Practices: Optimizing Cloudera on AWS
Customer Best Practices: Optimizing Cloudera on AWS
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solution
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
 
Driving Better Products with Customer Intelligence

Driving Better Products with Customer Intelligence
Driving Better Products with Customer Intelligence

Driving Better Products with Customer Intelligence

 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and BeyondStanding Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
 
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the CloudData Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

 
How komatsu is driving operational efficiencies using io t and machine learni...
How komatsu is driving operational efficiencies using io t and machine learni...How komatsu is driving operational efficiencies using io t and machine learni...
How komatsu is driving operational efficiencies using io t and machine learni...
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
 
A Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber ThreatsA Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber Threats
 

Ähnlich wie Big data journey to the cloud rohit pujari 5.30.18

Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Amazon Web Services
 
Module 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSModule 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSLam Le
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
Backup & Recovery - Optimize Your Backup and Restore Architectures in the Cloud
Backup & Recovery - Optimize Your Backup and Restore Architectures in the CloudBackup & Recovery - Optimize Your Backup and Restore Architectures in the Cloud
Backup & Recovery - Optimize Your Backup and Restore Architectures in the CloudAmazon Web Services
 
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018Amazon Web Services
 
Data freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWSData freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWSAmazon Web Services
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Amazon Web Services
 
How a Biotech Firm Streamlined Data Protection on AWS
 How a Biotech Firm Streamlined Data Protection on AWS How a Biotech Firm Streamlined Data Protection on AWS
How a Biotech Firm Streamlined Data Protection on AWSAmazon Web Services
 
Using AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your ApplicationsUsing AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your ApplicationsAmazon Web Services
 
Building Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scaleBuilding Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scaleAmazon Web Services
 
AWS Data Transfer Services Deep Dive
AWS Data Transfer Services Deep Dive AWS Data Transfer Services Deep Dive
AWS Data Transfer Services Deep Dive Amazon Web Services
 
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksHow to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksAmazon Web Services
 
Using Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczUsing Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczAmazon Web Services
 
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...Amazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftData Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftAmazon Web Services
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 

Ähnlich wie Big data journey to the cloud rohit pujari 5.30.18 (20)

Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
 
Module 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSModule 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWS
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWS
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
Backup & Recovery - Optimize Your Backup and Restore Architectures in the Cloud
Backup & Recovery - Optimize Your Backup and Restore Architectures in the CloudBackup & Recovery - Optimize Your Backup and Restore Architectures in the Cloud
Backup & Recovery - Optimize Your Backup and Restore Architectures in the Cloud
 
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
 
Data freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWSData freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWS
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
 
How a Biotech Firm Streamlined Data Protection on AWS
 How a Biotech Firm Streamlined Data Protection on AWS How a Biotech Firm Streamlined Data Protection on AWS
How a Biotech Firm Streamlined Data Protection on AWS
 
Using AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your ApplicationsUsing AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your Applications
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 
Building Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scaleBuilding Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scale
 
AWS Data Transfer Services Deep Dive
AWS Data Transfer Services Deep Dive AWS Data Transfer Services Deep Dive
AWS Data Transfer Services Deep Dive
 
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksHow to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
 
Using Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczUsing Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter Dachnowicz
 
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftData Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 

Mehr von Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 
Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
 
Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18
 

Kürzlich hochgeladen

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Kürzlich hochgeladen (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Big data journey to the cloud rohit pujari 5.30.18

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rohit Pujari, Solutions Architect Migrating Big Data Workloads to AWS
  • 2. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Architectural Principles 1. Build decoupled systems 2. Use the right tool for the job 3. Use managed and serverless services 4. Use log-centric design patterns 5. Be cost-conscious 6. AI/ML enable your applications
  • 3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • A cluster of 1/2U machines. Typically 12 cores, 32-128 GB RAM, and 12- 24 TB of HDD • Networking switches and racks • Long Term Hadoop Cluster with fixed-licensing term • HDFS uses local disk and has 50- 200% storage overhead On-premises Hadoop clusters Server rack 1 (20 nodes) Server rack 2 (20 nodes) Server rack N (20 nodes) Core
  • 4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • Management of the cluster (failures, hardware replacement, restarting services, expanding cluster) • Configuration management • Tuning of specific jobs or hardware • Managing development and test environments • Backing up data and disaster recovery On premises: Role of a big data administrator
  • 5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. On premises: System management Challenges • Managing distributed applications and availability • Durable storage and disaster recovery • Adding new frameworks and doing upgrades • Multiple environments • Need team to manage cluster and procure hardware
  • 6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. On premises: Workload types running on the same cluster • Large-scale ETL • Interactive queries • Machine learning and data science • NoSQL • Stream processing • Search • Data warehouses
  • 7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. On premises: Swim lane of jobs Over-utilized Under-utilized
  • 8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. On premises: Over-utilization and idle capacity • Tightly coupled compute and storage requires buying excess capacity • Can be over-utilized during peak hours and under-utilized at other times • Results in high costs and low efficiency
  • 9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Key migration considerations • Do not lift and shift • Deconstruct workloads and use the right tool for the job • Decouple storage and compute with Amazon Simple Storage Storage Service (Amazon S3) • Design for cost and scalability
  • 10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Foundational requirements Secure Encryption in flight & at rest Lower TCO Pay as per usage Flexible Customize per workload Full control Infrastructure as code Scalable Compute & storage Managed Reduced administration
  • 11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Benefits of a Data Lake - All Data is in One Place Analyze all of your data, from all of your sources, in one stored location “Why is the data distributed in many locations? Where is the single source of truth?”
  • 12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Designed for 11 9s of durability Designed for 99.99% availability Durable Available High performance  Multiple upload  Range GET  Scalable throughput  Store as much as you need  Scale storage and compute independently  No minimum usage commitments Scalable  Cloudera EDH  Cloudera Altus  Cloudera Impala Integrated Partner Tools  Simple REST API  AWS SDKs  Simple management tools  Event notification  Lifecycle policies Easy to use Why Amazon S3 for a Data Lake?
  • 13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Encryption ComplianceSecurity  Identity and access Management (IAM) policies  Bucket policies  Access Control Lists (ACLs)  Private VPC endpoints to Amazon S3  Amazon S3 object tagging to manage access policies  SSL endpoints  Server-side encryption (SSE-S3)  S3 server-side encryption with provided keys (SSE-C, SSE-KMS)  Client-side encryption  Buckets access logs  Lifecycle management policies  Access Control Lists (ACLs)  Versioning and MFA deletes  Certifications—HIPAA, PCI, SOC 1/2/3, etc. Strong Security Controls
  • 14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Encrypt data in transit and at rest with keys managed by our AWS Key Management System (KMS) or managing your own encryption keys with Cloud HSM using FIPS 140-2 Level 3 validated HSMs Meet data residency requirements Choose an AWS Region and AWS will not replicate it elsewhere unless you choose to do so Access services and tools that enable you to buildGDPR-compliant infrastructure on top of AWS Comply with local data privacy laws by controlling who can access content, its lifecycle and disposal Highest standards for privacy
  • 15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. “CIOs and CISOs need to stop obsessing over unsubstantiated cloud security worries, and instead apply their imagination and energy to developing new approaches to cloud control, allowing them to securely, compliantly, and reliably leverage the benefits of this increasingly ubiquitous computing model.” Source: Clouds Are Secure: Are You Using Them Securely?
  • 16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What About HDFS & Data Tiering? • Use HDFS for hottest datasets (e.g. iterative read on the same datasets) • Use Amazon S3 Standard for frequently accessed data • Use Amazon S3 Standard – IA for less frequently accessed data • Use Amazon Glacier for archiving cold data • Use S3 Analytics to optimize tiering strategy S3 Analytics
  • 17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Benefits of a Data Lake - Quick Ingest Quickly ingest data without needing to force it into a predefined schema “How can I collect data quickly from various sources and store it efficiently?”
  • 18. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Direct Connect AWS Snowball ISV Connectors Kafka/Flume Amazon Kinesis Firehose Amazon S3 Transfer Acceleration AWS Storage Gateway Data Ingestion into Amazon S3
  • 19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Benefits of a Data Lake - Storage vs. Compute Separating your storage and compute allows you to scale each component as required “How can I scale up with the volume of data being generated?”
  • 20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Benefits of a Data Lake - Schema on Read “Is there a way I can apply multiple analytics and processing frameworks to the same data?” A data lake enables ad-hoc analysis by applying schemas on read, not write
  • 21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Benefits of an AWS Amazon S3 Data Lake Fixed cluster data lake AWS Amazon S3 data lake • Limited to only the single tool contained on the • Long turnaround cycles to add nodes to add storage capacity • Expensive to replicate data against node loss • Complexity in scaling local storage capacity • Long refresh cycles to add additional storage equipment • Decouple storage and compute by making S3 object based storage, not a fixed tool to manage the data lake • Flexibility to use any and all tools in the ecosystem. The right tool for the job. • Catalog, transform, and query in place • Future-proof your architecture. As new use cases and tools emerge you can plug and play current best of breed.
  • 22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Foundation Services Compute Storage Database Networking AWS Global Infrastructure Regions Availability Zones Edge Locations Client-side Data Encryption Server-side Data Encryption Network Traffic Protection Platform, Applications, Identity & Access Management Operating System, Network, & Firewall Configuration Customer applications & content Customers Security & compliance is a shared responsibility Customers have their choice of security configurations IN the Cloud AWS is responsible for the security OF the Cloud
  • 23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Inherit global security and compliance controls
  • 24. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Summary • Use Amazon S3 as the storage repository for your data lake • Decouple Compute and Storage - Gain flexibility to use all the analytics tools in the ecosystem • Use managed PaaS like Cloudera Altus or Serverless services where possible to reduce operational overhead • Use granular encryption, roles, and access controls to build a secure, multi- tenant centralized data platform

Hinweis der Redaktion

  1. Parquet, ORC, Avro, JSON, CSV, others Allows multiple analytic tools to operate on same data Proprietary data = countdown to next migration