© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building a Data Lake in Amazon S3 &
Amazon Glacier
Joyjeet Banerjee
Enterprise Solutions Architect
AWS
STG401-R
Agenda
• What is big data?
• What is a data lake?
• Achievable business outcomes
• Securing a data lake
• Art of the possible
• Lab Materials & code: http://bit.ly/2I9qHO7
What is big data?
Volume
Velocity
Variety
What is big data? (cont.)
Volume—unconstrained growth!
GB
TB
PB
EB
ZB
Source: IDC, The Internet of Things: Getting Ready to Embrace Its Impact on the Digital Economy, March 2016.
RDB
DWH
Data lake
Sources of data:
Relational
NoSQL
Web servers
Mobile
Third-party feeds
IoT
Clickstream
What is big data? (cont.)
Velocity
• Real-time streaming
• Near real-time
• Periodic
• Batch jobs
What is big data? (cont.)
Variety—structured, unstructured, and multi-structured
Transactions, ERP, Connected devices, Social media, Web logs/cookies
Records, Streams, Files
The AWS Cloud was built for big data
• Agility: Try more, fail fast, go big or start small, and process data at any scale
• Scalability: Run jobs anytime, without guessing capacity or limiting functionality
• Get to insights faster: Focus on data science, not the undifferentiated heavy lifting of managing raw data
• Broadest and deepest capabilities: Access 70+ managed big data services to address any workload
• Low cost: Pay only for the IT you use, when you use it
• Data migrations made easy: Move exabyte-scale data to the cloud quickly and cost-effectively
What is a data lake?
A data lake is a new and increasingly popular architecture to store and analyze massive volumes of heterogeneous data in a centralized repository.
Characteristics of a data lake
• Collect anything
• Dive in anywhere
• Flexible access
• Future-proof
Benefits of a data lake
• Quickly ingest and store any type of data, at any scale, and at low cost
• Have a single source of truth and quickly search and find the relevant data
• Easily query the data through a unified set of tools
Defining the AWS data lake
A data lake is an architecture with a virtually limitless centralized storage platform capable of categorization, processing, analysis, and consumption of heterogeneous data sets.
Key data lake attributes
• Decoupled storage and compute
• Rapid ingest and transformation
• Secure multi-tenancy
• Query in place
• Schema on read
• Low cost of storage
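The "schema on read" attribute can be illustrated with a minimal sketch. The raw CSV text, the column names, and the `SCHEMA` mapping below are hypothetical; the point is only that data is stored untyped and the reader decides how to interpret it at query time.

```python
import csv
import io

# Raw data is stored exactly as it arrived; no schema is enforced at write time.
RAW_OBJECT = "order_id,amount,country\n1001,19.99,DE\n1002,5.00,US\n"

# Hypothetical schema, chosen by the reader of the data rather than the writer.
SCHEMA = {"order_id": int, "amount": float, "country": str}


def read_with_schema(raw_text, schema):
    """Parse raw CSV text and cast each column using the reader's schema."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        rows.append({name: cast(record[name]) for name, cast in schema.items()})
    return rows


rows = read_with_schema(RAW_OBJECT, SCHEMA)
total = sum(row["amount"] for row in rows)
```

A different consumer could read the same stored object with a different schema (for example, only `country` as a string) without rewriting the data.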
Why Amazon Simple Storage Service (Amazon S3) for the data lake?
High performance
▪ Multipart upload
▪ Range GET
▪ Amazon S3 Select
Secure
▪ Granular object-level controls
▪ Full compliance and audit capability
▪ Encryption at rest and in transit
Durable
▪ Designed for 11 nines (99.999999999%) of durability
Available
▪ Designed for 99.99% availability
Easy to use
▪ Simple REST API
▪ AWS SDKs
▪ Read-after-create consistency
▪ Event notification
▪ Lifecycle policies
Scalable & affordable
▪ Store as much as you need
▪ Scale storage and compute independently
▪ No minimum usage commitments
▪ Pay for what you use
Integrated
▪ Amazon Redshift/Amazon Redshift Spectrum
▪ Amazon EMR
▪ Amazon Athena
▪ Amazon DynamoDB
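Multipart uploads and ranged GETs both rely on splitting an object into byte ranges that can be transferred in parallel. The sketch below only plans those ranges and formats the HTTP `Range` header values; it makes no network calls. The helper names and the 8 MiB default part size are assumptions for illustration, while the 5 MiB minimum part size and the 10,000-part limit are the documented S3 multipart constraints.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 multipart parts (except the last) must be at least 5 MiB
MAX_PARTS = 10_000       # S3 allows at most 10,000 parts per multipart upload


def plan_parts(object_size, part_size=8 * MIB):
    """Return (part_number, first_byte, last_byte) tuples covering the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the multipart minimum")
    parts = []
    start = 0
    while start < object_size:
        end = min(start + part_size, object_size) - 1
        parts.append((len(parts) + 1, start, end))
        start = end + 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


def range_header(first_byte, last_byte):
    """Format an HTTP Range header value for a ranged GET."""
    return f"bytes={first_byte}-{last_byte}"


parts = plan_parts(20 * MIB)
headers = [range_header(first, last) for _, first, last in parts]
```

Each planned range could then be handed to a separate worker, which is what lets large objects move faster than a single sequential transfer.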
Central storage: Secure, cost-effective storage in Amazon S3
▪ Amazon S3
Catalog & search: Access & search metadata
▪ DynamoDB
▪ Amazon ES
Access & user interface: Give your users easy & secure access
▪ Amazon API Gateway
▪ AWS IAM
▪ Amazon Cognito
Protect & secure: Use entitlements to ensure data is secure and users' identities are verified
▪ AWS STS
▪ Amazon CloudWatch
▪ AWS CloudTrail
▪ AWS KMS
Processing & analytics: Use predictive and prescriptive analytics to gain better understanding
▪ Amazon Athena
▪ Amazon QuickSight
▪ Amazon EMR
▪ Amazon Redshift
Data ingestion: Get your data into Amazon S3 quickly and securely
▪ Amazon Kinesis Data Firehose
▪ AWS Direct Connect
▪ AWS Snowball
▪ AWS DMS
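The catalog and search layer can be pictured as a metadata index kept alongside the objects in central storage. The sketch below is an in-memory stand-in, not the DynamoDB or Amazon ES API; the `CatalogEntry` and `Catalog` classes, the object keys, and the tag names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one object in the data lake."""
    key: str
    size_bytes: int
    sha256: str
    tags: dict = field(default_factory=dict)


class Catalog:
    """In-memory stand-in for a metadata store such as a DynamoDB table."""

    def __init__(self):
        self._entries = {}

    def register(self, key, body, **tags):
        """Record size, checksum, and tags for an object body (bytes)."""
        entry = CatalogEntry(key, len(body), hashlib.sha256(body).hexdigest(), tags)
        self._entries[key] = entry
        return entry

    def search(self, **tags):
        """Return keys whose tags match every requested tag value."""
        return sorted(
            key for key, entry in self._entries.items()
            if all(entry.tags.get(name) == value for name, value in tags.items())
        )


catalog = Catalog()
catalog.register("raw/weblogs/part-0.json", b'{"path": "/"}', source="weblogs", stage="raw")
catalog.register("staged/orders/part-0.csv", b"1001,19.99\n", source="erp", stage="staged")
matches = catalog.search(stage="raw")
```

Keeping the index separate from the objects means users can find data by attributes without listing or scanning the storage layer itself.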
Driving business outcomes
Modern data architectures on AWS
Business outcomes on a modern data architecture
Outcome 1: Modernize and consolidate
• Insights to enhance business applications and create new digital services
Outcome 2: Innovate for new revenues
• Personalization, demand forecasting, risk analysis
Outcome 3: Real-time engagement
• Interactive customer experience, event-driven automation, fraud detection
Outcome 4: Automate for expansive reach
• Automation of business processes and physical infrastructure
A platform to build business outcomes from data
Data: Purchases, Movement, Influence
Stages: Ingest/Collect, Store, Process/Analyze, Consume/Visualize
Outcomes: Revenue lift, Market acquisition, Customer delight, Brand advocacy, Inventory optimization, Supply chain efficiency
Modernize and consolidate
Insights to enhance business applications, new digital services
Start with the business case and the personas
Data sources: Transactions, Web logs/cookies, ERP
Layers: Ingest, Speed (real-time), Scale (batch), Serving
Consumers: Data analysts, Data scientists, Business users, Engagement platforms, Automation/events
Modernize and consolidate
Insights to enhance business applications, new digital services
Start with the business case and the personas
Data sources: Transactions, Web logs/cookies, ERP
Layers: Ingest, Speed (real-time), Scale (batch), Serving
Staged data (data lake): Amazon S3
Serving:
▪ Data warehouse: Amazon Redshift
▪ Legacy apps: Amazon RDS
▪ Schemaless: Amazon ES
▪ Direct query: Athena
▪ Near-zero latency: DynamoDB
▪ Semi/Unstructured: Amazon EMR
Consumers: Data analysts, Data scientists, Business users, Engagement platforms, Automation/events
Modernize and consolidate
Insights to enhance business applications, new digital services
Process data for ETL, cleansing, tagging, and place into staged data (data lake)
Data sources: Transactions, Web logs/cookies, ERP
Layers: Ingest, Speed (real-time), Scale (batch), Serving
Ingest: AWS DMS, DX, Internet interfaces
Raw data: Amazon S3
ETL: AWS Glue
Staged data (data lake): Amazon S3
Serving:
▪ Data warehouse: Amazon Redshift
▪ Legacy apps: Amazon RDS
▪ Schemaless: Amazon ES
▪ Direct query: Athena
▪ Near-zero latency: DynamoDB
▪ Semi/Unstructured: Amazon EMR
Consumption: Amazon QuickSight, Amazon API Gateway
Consumers: Data analysts, Data scientists, Business users, Engagement platforms, Automation/events
Labs: Lab 2, Lab 4
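The raw-to-staged step (ETL, cleansing, tagging) can be sketched in plain Python. This is not AWS Glue code; it is a minimal in-memory illustration under assumed inputs. The record fields, the `is_error` tag, and the function names are hypothetical.

```python
import json

# Hypothetical raw web-log records, as they might land under a raw-data prefix.
RAW_RECORDS = [
    '{"user": " Alice ", "path": "/cart", "status": "200"}',
    '{"user": "bob", "path": "/checkout", "status": "500"}',
    "not-json",
]


def cleanse(line):
    """Parse one raw line; return None when it is not a valid JSON object."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    return {
        "user": str(record.get("user", "")).strip().lower(),
        "path": record.get("path", ""),
        "status": int(record.get("status", 0)),
    }


def tag(record):
    """Attach a simple derived tag used by downstream consumers."""
    record["is_error"] = record["status"] >= 500
    return record


def run_etl(lines):
    """Return (staged_records, rejected_count)."""
    staged, rejected = [], 0
    for line in lines:
        cleaned = cleanse(line)
        if cleaned is None:
            rejected += 1
        else:
            staged.append(tag(cleaned))
    return staged, rejected


staged, rejected = run_etl(RAW_RECORDS)
```

The raw input is left untouched, so the same records can be reprocessed later with different cleansing rules.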
Innovate for new revenues
Insights to enhance business applications, new digital services
Data sources: Transactions, Web logs/cookies, ERP
Layers: Ingest, Speed (real-time), Scale (batch), Serving
Ingest: AWS DMS, DX, Internet interfaces
Raw data: Amazon S3
ETL: AWS Glue
Advanced analytics
Staged data (data lake): Amazon S3
Serving:
▪ Data warehouse: Amazon Redshift
▪ Legacy apps: Amazon RDS
▪ Schemaless: Amazon ES
▪ Direct query: Athena
▪ Near-zero latency: DynamoDB
▪ Semi/Unstructured: Amazon EMR
Consumers: Data analysts, Data scientists, Business users, Engagement platforms, Automation/events
Labs: Lab 3
Real-time engagement
Events are captured in the speed layer
Data sources: Transactions, Web logs/cookies, ERP, Connected devices, Social media
Layers: Ingest, Speed (real-time), Scale (batch), Serving
Ingest: AWS DMS, DX, Internet interfaces, Amazon Kinesis
Event capture: Kinesis
Stream analysis: Kinesis Data Analytics
Raw data: Amazon S3
ETL: AWS Glue
Advanced analytics
Staged data (data lake): Amazon S3
Serving:
▪ Data warehouse: Amazon Redshift
▪ Legacy apps: Amazon RDS
▪ Schemaless: Amazon ES
▪ Direct query: Athena
▪ Near-zero latency: DynamoDB
▪ Semi/Unstructured: Amazon EMR
Consumers: Data analysts, Data scientists, Business users, Engagement platforms, Automation/events
Labs: Lab 1
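Stream analysis in the speed layer often means aggregating events over short time windows. The sketch below shows a tumbling-window count in plain Python; it is a conceptual illustration, not Kinesis Data Analytics code, and the event tuples and 60-second window are hypothetical.

```python
from collections import Counter

# Hypothetical events: (epoch_seconds, event_type).
EVENTS = [
    (0, "click"), (12, "click"), (59, "purchase"),
    (60, "click"), (61, "click"), (125, "purchase"),
]


def tumbling_counts(events, window_seconds=60):
    """Count events per (window_start, event_type) in fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, event_type in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, event_type)] += 1
    return dict(counts)


counts = tumbling_counts(EVENTS)
```

Each event falls into exactly one window, which keeps per-window results independent and easy to emit as soon as a window closes.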
Securing a data lake
Implement the right cloud security controls
Security
▪ AWS Identity and Access Management (IAM) policies
▪ Bucket policies
▪ Access control lists (ACLs)
▪ Private VPC endpoints to Amazon S3
▪ Pre-signed Amazon S3 URLs
Encryption
▪ SSL endpoints
▪ Server-side encryption with Amazon S3-managed keys (SSE-S3)
▪ Server-side encryption with customer-provided keys (SSE-C) or AWS KMS keys (SSE-KMS)
▪ Client-side encryption
Audit & compliance
▪ Bucket access logs
▪ Lifecycle management policies
▪ Versioning & MFA delete
▪ Certifications: HIPAA, PCI, SOC 1/2/3, and more
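Two of these controls, SSL endpoints and server-side encryption, can be enforced through a bucket policy. The sketch below only builds the policy document as JSON and does not apply it to any bucket. The bucket name and statement IDs are hypothetical; `aws:SecureTransport` and `s3:x-amz-server-side-encryption` are standard policy condition keys.

```python
import json

BUCKET = "example-data-lake-bucket"  # hypothetical bucket name


def build_bucket_policy(bucket):
    """Return a bucket policy dict with two deny statements."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that does not use TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that omit the server-side encryption header.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }


policy = build_bucket_policy(BUCKET)
policy_json = json.dumps(policy, indent=2)
```

Explicit deny statements take precedence over any allow, so they act as a guardrail regardless of what individual IAM policies grant.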
Art of the possible
Broad range of possibilities
• Analytics services to support streaming & batch datasets
• Direct integration with artificial intelligence services
• Import data into machine learning/deep learning modeling services
Analytics services on AWS
Broadest and deepest portfolio, purpose-built for builders
Stages: Collect, Orchestrate, Store, Analyze
Services shown: Amazon EMR, Amazon EC2, Amazon Glacier, Amazon S3, Kinesis, Amazon Redshift, DynamoDB, AWS Lambda, AWS IoT Core, AWS Data Pipeline, Kinesis Data Analytics, Amazon SNS, AWS Snowball, Amazon SWF, Athena, AWS Glue, Amazon Aurora, Amazon QuickSight, Amazon SageMaker, AWS Direct Connect
AWS AI/ML stack
Application services
▪ Vision: Amazon Rekognition Image, Amazon Rekognition Video
▪ Speech: Amazon Polly, Amazon Transcribe
▪ Language: Amazon Lex, Amazon Translate, Amazon Comprehend
Platform services
▪ Amazon Machine Learning
▪ Mechanical Turk
▪ Spark & Amazon EMR
▪ Amazon SageMaker
▪ AWS DeepLens
Frameworks & infrastructure
▪ AWS Deep Learning AMI
▪ Frameworks: Apache MXNet, PyTorch, Cognitive Toolkit, Keras, Caffe2 & Caffe, TensorFlow, Gluon
▪ Infrastructure: GPU (P3 Instances), Mobile, CPU, IoT (AWS Greengrass)
Example use case: Social media analysis & visualization
Components: Twitter stream, Kinesis Data Firehose, S3 bucket, Athena, Amazon ML, Amazon QuickSight
Recap—What we covered
• Big data
• Data lake
• Achievable business outcomes
• Securing a data lake
• Art of the possible
Thank you!
Amazon Comprehend
Natural language processing
Components: Unstructured text, Data lake (Amazon S3), Amazon Comprehend, Amazon EMR, Athena, Amazon QuickSight
Amazon Transcribe
Audio to text analysis
Components: Audio input, Lambda, Amazon Transcribe, Amazon Comprehend, Data lake, Athena, Amazon QuickSight
Social media analysis & visualization
Components: Twitter, Kinesis Data Firehose, Streams, Amazon Kinesis Data Analytics, Lambda, Machine translation, NLP, Data lake, AWS Glue, Athena, Amazon QuickSight
Outcome 1: Modernize and consolidate
Common initiatives
• Insights: 360° view of the business
• Digitization: web service that gives on-demand insights
• Data monetization: enrich, aggregate, and sell business data
Outcome 2: Innovate for new revenues
Common initiatives
• Personalization: refine market approaches on optimal segments
• Predict demand: guide business owners to select best scenarios
• Risk measurement: create freedom to act by quantifying exposures
Outcome 3: Real-time engagement
Common initiatives
• Interactive CX: natural customer journeys with adaptive interfaces
• Event-driven automation: triggered execution of business process
• Fraud detection: protect customer and business interests

Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Building a Data Lake on AWS S3 and Glacier

  • 1.
  • 2. Building a Data Lake in Amazon S3 & Amazon Glacier. Joyjeet Banerjee, Enterprise Solutions Architect, AWS. STG401-R. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 3. Agenda • What is big data? • What is a data lake? • Achievable business outcomes • Securing a data lake • Art of the possible • Lab materials & code: http://bit.ly/2I9qHO7
  • 4. What is big data?
  • 5. What is big data? Volume, velocity, variety
  • 6. What is big data? (cont.) Volume: unconstrained growth! GB, TB, PB, EB, ZB. RDB, DWH, data lake. Sources of data: relational, NoSQL, web servers, mobile, third-party feeds, IoT, clickstream. Source: IDC, The Internet of Things: Getting Ready to Embrace Its Impact on the Digital Economy, March 2016.
  • 7. What is big data? (cont.) Velocity • Real-time streaming • Near real-time • Periodic • Batch jobs
  • 8. What is big data? (cont.) Variety: structured, unstructured, and multi-structured. Transactions, ERP, connected devices, social media, web logs/cookies. Records, streams, files.
  • 9. The AWS Cloud was built for big data • Agility: Try more, fail fast, go big or start small, and process data at any scale • Scalability: Run jobs anytime, without guessing capacity or limiting functionality • Get to insights faster: Focus on data science, not the heavy undifferentiated lift of managing raw data • Broadest and deepest capabilities: Access 70+ managed big data services to address any workload • Low cost: Pay only for the IT you use, when you use it • Data migrations made easy: Move exabyte-scale data to the cloud quickly and cost-effectively
  • 10. What is a data lake?
  • 11. What is a data lake? A data lake is a new and increasingly popular architecture to store and analyze massive volumes of heterogeneous data in a centralized repository.
  • 12. Characteristics of a data lake • Future-proof • Flexible access • Dive in anywhere • Collect anything
  • 13. Benefits of a data lake • Quickly ingest and store any type of data, at any scale, and at low cost • Have a single source of truth and quickly search and find the relevant data • Easily query the data through a unified set of tools
  • 14. Defining the AWS data lake: A data lake is an architecture with a virtually limitless centralized storage platform capable of categorization, processing, analysis, and consumption of heterogeneous data sets. Key data lake attributes • Decoupled storage and compute • Rapid ingest and transformation • Secure multi-tenancy • Query in place • Schema on read • Low cost of storage
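The "schema on read" attribute on slide 14 can be illustrated with a minimal sketch in plain Python. It is not taken from the deck or its lab materials: the sample records, the `PURCHASE_SCHEMA` mapping, and the `read_with_schema` helper are hypothetical names chosen for illustration. The idea shown is that raw records are stored unchanged and each consumer applies its own schema only when it reads them.

```python
import json

# Raw events land in the lake untouched; the records do not share one shape.
RAW_LINES = [
    '{"user": "a1", "amount": "19.99", "ts": "2018-11-27T10:00:00"}',
    '{"user": "b2", "amount": 5, "channel": "mobile"}',
    '{"user": "c3"}',
]

# Hypothetical schema picked by one consumer at query time:
# column name -> (type caster, default used when the field is absent)
PURCHASE_SCHEMA = {
    "user": (str, None),
    "amount": (float, 0.0),
    "channel": (str, "unknown"),
}


def read_with_schema(lines, schema):
    """Project each raw JSON record onto the schema at read time."""
    for line in lines:
        record = json.loads(line)
        yield {
            col: cast(record[col]) if col in record else default
            for col, (cast, default) in schema.items()
        }


rows = list(read_with_schema(RAW_LINES, PURCHASE_SCHEMA))
for row in rows:
    print(row)
```

A second consumer could read the same raw lines with a different schema (for example, one that keeps `ts`) without rewriting the stored data, which is the property the slide contrasts with schema-on-write stores.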
  • 15. Why Amazon Simple Storage Service (Amazon S3) for the data lake? High performance ▪ Multipart upload ▪ Range GET ▪ Amazon S3 Select. Secure ▪ Granular object-level controls ▪ Full compliance and audit capability ▪ Encryption at rest and in transit. Durable: designed for 11 nines of durability. Available: designed for 99.99% availability. Easy to use ▪ Simple REST API ▪ AWS SDKs ▪ Read-after-create consistency ▪ Event notification ▪ Lifecycle policies. Scalable & affordable ▪ Store as much as you need ▪ Scale storage and compute independently ▪ No minimum usage commitments ▪ Pay for what you use. Integrated ▪ Amazon Redshift/Amazon Redshift Spectrum ▪ Amazon EMR ▪ Amazon Athena ▪ Amazon DynamoDB
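Both Range GET and multipart upload, listed under "High performance" on slide 15, rest on splitting one large object into byte ranges that can be transferred in parallel. The sketch below shows only that splitting step in plain Python; the `byte_ranges` helper is a hypothetical name, and no AWS call is made. The strings it returns follow the standard HTTP `Range` header form (`bytes=start-end`, both ends inclusive), which is what a ranged S3 GET request carries.

```python
def byte_ranges(object_size, part_size):
    """Return HTTP Range header values that cover object_size bytes in order."""
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    ranges = []
    start = 0
    while start < object_size:
        # Range ends are inclusive, so the last byte index is one less than the bound.
        end = min(start + part_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


# A 25-byte object fetched in 10-byte pieces needs three requests.
print(byte_ranges(25, 10))
```

Each range can be handed to a separate worker and the results concatenated in order. For real multipart uploads, S3 imposes its own limits on part size and part count, so the part size should be chosen with the current S3 documentation in hand.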
  • 16. Central storage: Secure, cost-effective storage in Amazon S3 (Amazon S3). Catalog & search: Access & search metadata (DynamoDB, Amazon ES). Access & user interface: Give your users easy & secure access (Amazon API Gateway, AWS IAM, Amazon Cognito). Protect & secure: Use entitlements to ensure data is secure and users' identities are verified (AWS STS, Amazon CloudWatch, AWS CloudTrail, AWS KMS). Processing & analytics: Use predictive and prescriptive analytics to gain better understanding (Amazon Athena, Amazon QuickSight, Amazon EMR, Amazon Redshift). Data ingestion: Get your data into Amazon S3 quickly and securely (Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Snowball, AWS DMS).
  • 17. Driving business outcomes: Modern data architectures on AWS
  • 18. Business outcomes on a modern data architecture. Outcome 1: modernize and consolidate • Insights to enhance business applications and create new digital services. Outcome 2: innovate for new revenues • Personalization, demand forecasting, risk analysis. Outcome 3: real-time engagement • Interactive customer experience, event-driven automation, fraud detection. Outcome 4: automate for expansive reach • Automation of business processes and physical infrastructure
  • 19. A platform to build business outcomes from data. Purchases, movement, influence. Ingest/Collect, Store, Process/Analyze, Consume/Visualize. Revenue lift, market acquisition, customer delight, brand advocacy, inventory optimization, supply chain efficiency.
  • 20. Modernize and consolidate: Insights to enhance business applications, new digital services. Start with the business case and the personas. Diagram: Data sources (transactions, web logs/cookies, ERP); Ingest; Speed (real-time); Scale (batch); Serving; data analysts, data scientists, business users, engagement platforms, automation/events.
  • 21. Modernize and consolidate: Insights to enhance business applications, new digital services. Start with the business case and the personas. Diagram: Data sources (transactions, web logs/cookies, ERP); Ingest; Speed (real-time); Scale (batch); Serving (data warehouse: Amazon Redshift; legacy apps: Amazon RDS; schemaless: Amazon ES; direct query: Athena; near-zero latency: DynamoDB; semi/unstructured: Amazon EMR); Amazon S3 staged data (data lake); data analysts, data scientists, business users, engagement platforms, automation/events.
  • 22. Modernize and consolidate: Insights to enhance business applications, new digital services. Process data for ETL, cleansing, tagging, and place into staged data (data lake). Diagram: Data sources (transactions, web logs/cookies, ERP); Ingest (AWS DMS, DX, Internet interfaces); Amazon S3 raw data; AWS Glue ETL; Amazon S3 staged data (data lake); Speed (real-time); Scale (batch); Serving (data warehouse: Amazon Redshift; legacy apps: Amazon RDS; schemaless: Amazon ES; direct query: Athena; near-zero latency: DynamoDB; semi/unstructured: Amazon EMR); Amazon QuickSight; Amazon API Gateway; data analysts, data scientists, business users, engagement platforms, automation/events. Lab 2, Lab 4.
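The raw-to-staged step on slide 22 (ETL, cleansing, then placing data into the staged area) can be sketched in plain Python. This is only an illustration of the idea, not AWS Glue code and not the lab material: the field names (`ts`, `page`, `ref`), the `staged/weblogs/` prefix, and both helper functions are assumptions. The `year=/month=/day=` layout is the Hive-style partition naming that query engines over S3 commonly recognize.

```python
from datetime import datetime


def clean_record(raw):
    """Trim string values, drop empty fields, and reject records without a timestamp."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
    cleaned = {k: v for k, v in cleaned.items() if v not in ("", None)}
    return cleaned if "ts" in cleaned else None


def staged_key(record, dataset="weblogs"):
    """Build a partitioned key prefix for the staged area (hypothetical layout)."""
    ts = datetime.strptime(record["ts"], "%Y-%m-%dT%H:%M:%S")
    return f"staged/{dataset}/year={ts.year}/month={ts.month:02d}/day={ts.day:02d}/"


raw_event = {"ts": " 2018-11-27T10:00:00 ", "page": "/home ", "ref": ""}
record = clean_record(raw_event)
print(record)
print(staged_key(record))
```

Partitioning staged data by date keeps later queries from scanning the whole data set when they only need a time window; the same cleansing logic could run inside a managed ETL job instead of a local script.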
  • 23. Innovate for new revenues: Insights to enhance business applications, new digital services. Diagram: Data sources (transactions, web logs/cookies, ERP); Ingest (AWS DMS, DX, Internet interfaces); Amazon S3 raw data; AWS Glue ETL; Amazon S3 staged data (data lake); advanced analytics; Speed (real-time); Scale (batch); Serving (data warehouse: Amazon Redshift; legacy apps: Amazon RDS; schemaless: Amazon ES; direct query: Athena; near-zero latency: DynamoDB; semi/unstructured: Amazon EMR); data analysts, data scientists, business users, engagement platforms, automation/events. Lab 3.
  • 24. Real-time engagement: Events are captured in the speed layer. Diagram: Data sources (transactions, web logs/cookies, ERP, connected devices, social media); Ingest (AWS DMS, DX, Internet interfaces, Amazon Kinesis); event capture: Kinesis; stream analysis: Kinesis Data Analytics; Amazon S3 raw data; AWS Glue ETL; Amazon S3 staged data (data lake); advanced analytics; Speed (real-time); Scale (batch); Serving (data warehouse: Amazon Redshift; legacy apps: Amazon RDS; schemaless: Amazon ES; direct query: Athena; near-zero latency: DynamoDB; semi/unstructured: Amazon EMR); data analysts, data scientists, business users, engagement platforms, automation/events. Lab 1.
  • 25. Securing a data lake
  • 26. Implement the right cloud security controls. Security ▪ AWS Identity and Access Management (IAM) policies ▪ Bucket policies ▪ Access control lists (ACLs) ▪ Private VPC endpoints to Amazon S3 ▪ Pre-signed Amazon S3 URLs. Encryption ▪ SSL endpoints ▪ Server-side encryption with Amazon S3-managed keys (SSE-S3) ▪ Server-side encryption with customer-provided keys (SSE-C) or AWS KMS keys (SSE-KMS) ▪ Client-side encryption. Audit & compliance ▪ Bucket access logs ▪ Lifecycle management policies ▪ Versioning & MFA deletes ▪ Certifications: HIPAA, PCI, SOC 1/2/3, more
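Two of the controls on slide 26, bucket policies and encryption in transit and at rest, are often combined in a single bucket policy. The sketch below builds such a policy document as a Python dictionary and prints it as JSON; it does not call AWS. The bucket name `example-data-lake` and the `build_bucket_policy` helper are hypothetical. The condition keys `aws:SecureTransport` and `s3:x-amz-server-side-encryption` are standard policy keys, but the policy as a whole is an illustrative assumption, not something shown in the deck.

```python
import json


def build_bucket_policy(bucket_name):
    """Return a bucket policy that denies non-TLS access and unencrypted uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that does not arrive over SSL/TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that omit the server-side encryption header.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }


policy = build_bucket_policy("example-data-lake")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

Explicit deny statements take precedence over any allow in IAM or ACLs, which is why this pattern is usually written as denies rather than as narrower allows. Whether the second statement is needed depends on the bucket's default encryption settings.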
  • 27. Art of the possible
  • 28. Broad range of possibilities • Analytics services to support streaming & batch datasets • Direct integration with artificial intelligence services • Import data into machine learning/deep learning modeling services
  • 29. Analytics services on AWS: Broadest and deepest portfolio, purpose-built for builders. Collect, Orchestrate, Store, Analyze. Amazon EMR, Amazon EC2, Amazon Glacier, Amazon S3, Kinesis, Amazon Redshift, DynamoDB, AWS Lambda, AWS IoT Core, AWS Data Pipeline, Kinesis Data Analytics, Amazon SNS, AWS Snowball, Amazon SWF, Athena, AWS Glue, Amazon Aurora, Amazon QuickSight, Amazon SageMaker, AWS Direct Connect
  • 30. AWS AI/ML stack. Application services: Vision (Amazon Rekognition Image, Amazon Rekognition Video); Speech (Amazon Polly, Amazon Transcribe); Language (Amazon Lex, Amazon Translate, Amazon Comprehend). Platform services: Amazon Machine Learning, Mechanical Turk, Spark & Amazon EMR, Amazon SageMaker, AWS DeepLens. Frameworks & infrastructure: AWS Deep Learning AMI; GPU (P3 instances), mobile, CPU, IoT (AWS Greengrass); Apache MXNet, PyTorch, Cognitive Toolkit, Keras, Caffe2 & Caffe, TensorFlow, Gluon
  • 31. Example use case: Social media analysis & visualization. Twitter stream, Kinesis Data Firehose, S3 bucket, Athena, Amazon ML, Amazon QuickSight
  • 32. Recap: What we covered • Big data • Data lake • Achievable business outcomes • Securing a data lake • Art of the possible
  • 33. Thank you! Joyjeet Banerjee, Enterprise Solutions Architect, AWS
  • 34.
  • 35. Amazon Comprehend: Natural language processing. Unstructured text, data lake (Amazon S3), Amazon Comprehend, Amazon EMR, Athena, Amazon QuickSight
  • 36. Amazon Transcribe: Audio to text analysis. Audio input, Lambda, Amazon Transcribe, Amazon Comprehend, data lake, Athena, Amazon QuickSight
  • 37. Social media analysis & visualization. Twitter, Amazon Kinesis Data Firehose, streams, Lambda, machine translation, NLP, data lake, AWS Glue, Amazon Kinesis Data Analytics, Athena, Amazon QuickSight
  • 38. Outcome 1: Modernize and consolidate. Common initiatives • Insights: 360° view of the business • Digitization: web service that gives on-demand insights • Data monetization: enrich, aggregate, and sell business data
  • 39. Outcome 2: Innovate for new revenues. Common initiatives • Personalization: refine market approaches on optimal segments • Predict demand: guide business owners to select best scenarios • Risk measurement: create freedom to act by quantifying exposures
  • 40. Outcome 3: Real-time engagement. Common initiatives • Interactive CX: natural customer journeys with adaptive interfaces • Event-driven automation: triggered execution of business process • Fraud detection: protect customer and business interests