SlideShare ist ein Scribd-Unternehmen logo
1 von 66
Downloaden Sie, um offline zu lesen
Sungmin Kim
Solutions Architect, AWS
Analytics Immersion Day
Build BI System from Scratch
Itinerary to Analytics on
What is Architecting?
Art of Trade-off
Data
The world’s most
valuable resource is
no longer oil, but data.*
*Copyright: The Economist, 2017, David Parkins
“
”
Answers &
Insights
Collect
Consume
Store
Process/
Analyze
Data
14
09
5
*Copyright: Fractionating column, 2009 science-resources.co.uk
3+1 Vs of Big Data
Structured, Unstructured, and Semi-Structured
Data Temperature SpectrumStructure
Hot data Warm data Cold data
Low
High
High Request rate
LowHigh Cost / GB
Low HighLatency
Low HighData Volume
Low
In-Memory SQL
NoSQL Search
Object Storage
Archive
Storage
Graph
Simplify Big Data Processing
Collect ConsumeStore Process/
Analyze
Data
1 4
0 9
5
Answers &
Insights
Time to answer(Latency)
Throughput
Cost
Let’s build Business Intelligence System
Business Intelligence System
CRM
IoT
WEB
Messages
CDC*
Event Streams
* CDC: Change Data Capture
Business Intelligence System
QuickSightAmazon RDS
Relational
Databases
Flat Files
And Many Others!
Retail Data
Ops Data
Marketing Data
Amazon QuickSight
Fast BI Service with Pay-per-Session Pricing and ML Insights for everyone
Is it Well-Architected?
QuickSightAmazon RDS
Is it Well-Architected?
QuickSightAmazon RDS
AWS Well-Architecture Framework
Performance
efficiency
Cost
optimization
SecurityReliability
Operational
excellence
Set of questions you can use to evaluate how well an architecture is
aligned to AWS best practices
Is it Well-Architected?
QuickSightAmazon RDS
Operation excellence
ü Scale-up: CPU, Memory, Disk
ü Connection Pooling
ü Schema changes: alter table
ü Schema validation
Reliability
ü High availability: Master-Slave fail-over
ü Data loss
Performance efficiency
ü Slow SQL queries caused by poor
database indexing
ü Number of writes >>> Number of reads
Cost optimization
ü Management of RDMS
ü Compute(Instance), Storage cost
Security
ü Encryption data at rest and in transition
Key considerations for storing data
• Storage Volume
• Data Model
• Data Scheme
• Access Patterns
ü SQL Supported
ü Put/Get(key, value)
ü Range Query
ü Join Query
• Serverless or Managed Service
• Current skill set
Business Intelligence System
QuickSight
Redshift S3
Amazon RDS
DocumentDB
(with MongoDB
compatability)
DynamoDB
Object StorageData WarehouseNoSQL
Elasticsearch
Service
Business Intelligence System
QuickSight
Redshift S3Elasticsearch
Service
DocumentDB
(with MongoDB
compatability)
DynamoDB
Object StorageData WarehouseNoSQL
DynamoDB
Business Intelligence System
QuickSightDocumentDB (with
MongoDB compatibility)
Redshift S3Elasticsearch
Service
DocumentDB
(with MongoDB
compatability)
DynamoDB
Object StorageData WarehouseNoSQL
Business Intelligence System
QuickSightElasticsearch
Service
Redshift S3Elasticsearch
Service
DocumentDB
(with MongoDB
compatability)
DynamoDB
Object StorageData WarehouseNoSQL
Business Intelligence System
QuickSightRedshift
Redshift S3Elasticsearch
Service
DocumentDB
(with MongoDB
compatability)
DynamoDB
Object StorageData WarehouseNoSQL
Business Intelligence System
QuickSightS3
Redshift S3Elasticsearch
Service
DocumentDB
(with MongoDB
compatability)
DynamoDB
Object StorageData WarehouseNoSQL
Storage Engine Comparison
S3 Redshift Elasticsearch DocumentDB DynamoDB
Storage volume Unlimited
Limited
(~ tens of TB)
Limited
(~ tens of TB)
Limited
(~ tens of TB)
Unlimited
Data model Object Relational Document Document Key-value
Data scheme Schema-free Fixed schema Schema-free (JSON) Schema-free (JSON) (Key, value)
SQL supported Query engine dependent Yes Yes No No
Put/Get (key, value) Yes Yes Yes Yes Yes
Range query Query engine dependent Yes Yes Yes Difficult
Join query Query engine dependent Yes No Difficult Very Difficult
Serverless Yes No No Yes Yes
v https://db-engines.com/en/system/Amazon+DocumentDB%3BAmazon+DynamoDB%3BAmazon+Redshift%3BElasticsearch
Use Amazon S3 as Your Persistent Data Store
• Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
• Decouple storage and compute
• No need to run compute clusters for storage (unlike HDFS)
• Can run transient Amazon EMR clusters with Amazon EC2 Spot
Instances
• Multiple & heterogeneous analysis clusters and services can use the
same data
• Designed for 99.999999999% durability
• No need to pay for data replication within a region
• Secure: SSL, client/server-side encryption at rest
• Low cost
Business Intelligence System
S3 QuickSight
Ingestion Query engine
Business Intelligence System
Kinesis
Data Firehose
S3 QuickSight
Amazon Kinesis Data Firehose
Prepare and load real-time data streams into data stores and analytics
tools
Business Intelligence System
Kinesis
Data Firehose
S3 QuickSight
Glue Athena EMR
Batch Interactive
Comparison of SQL Processing engines
Data Structure Semi Semi Semi Full
Languages API/SQL SQL SQL SQL
Data Store
S3 (Glue),
S3/HDFS (Spark)
S3/HDFS S3 Local
Use case Transformation
SQL Queries
for S3/HDFS
Serverless SQL
Queries for S3
Fully Featured
SQL Database
Performance
AWS Glue Amazon Athena Amazon Redshift
Business Intelligence System
Kinesis
Data Firehose
S3 Athena QuickSight
Data Processing: ETL vs ELT
Source1
Source2 Target
Source1
Source2
Staging
tables
Final
tables
Target
(MPP database)
Extract Transform Load
E → T → L
Extract & Load Transform
E → L → T
Source3Source3
How about Real-time Analytics?
Kinesis
Data Firehose
S3 Athena QuickSight
Real-time Analysis Layer
ü Streaming data processing
ü Latency: milliseconds ~ seconds
Messages
CDC
Event
Streams
Real-time Analytics
Kinesis
Data Firehose
S3 Athena QuickSight
Real-time Analysis Layer
SQS
Kinesis
Data Streams
MSK
ü Streaming data processing
ü Latency: milliseconds ~ seconds
• Decouple producers & consumers
• Persistent buffer
• Collect multiple streams
• Preserve client ordering
• Parallel consumption
• Streaming MapReduce
4 4 3 3 2 2 1 1
4 3 2 1
4 3 2 1
4 3 2 1
4 3 2 1
4 4 3 3 2 2 1 1
Producer 1
shard 1 / partition 1
shard 2 / partition 2
Consumer 1
Count of
red = 4
Count of
violet = 4
Consumer 2
Count of
blue = 4
Count of
green = 4
Producer 2
Producer 3
Producer n
Key = Red
Key = Green
Key = Blue
Key = Violet
Amazon Kinesis
Data Streams
Amazon Managed
Streaming for Kafka
…
Why Stream Storage?
• Decouple producers & consumers
• Persistent buffer
• Collect multiple streams
• No client ordering (standard)
• FIFO queue preserves client
ordering
• No streaming MapReduce
• No parallel consumption
• Amazon SNS can publish to
multiple SNS subscribers
(queues or Lambda functions)
Consumers
4 3 2 1
12344 3 2 1
1234
2134
13342
Standard
FIFO
Producers
Amazon SQS Queue
Publisher
Amazon SNS
Topic
AWS Lambda
function
Amazon SQS
queue
Queue
Subscriber
What about Amazon SQS?
Real-time Analytics
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
Real-time Analysis Layer
ü Streaming data processing
ü Latency: milliseconds ~ seconds
Real-time Analytics
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
Kinesis
Data Analytics
on EMR Elasticsearch
Service
Real-time Analysis Layer
on EMR
RDD Operator RDD RDD …Operator
Operator
Data
Sink
OperatorOperator
Data
Source
Amazon EMR
Applications
Framework
Process Layer
Data Layer
Infrastructure
S3
EMRFS
Amazon
S3
Instances Spot Instances
Real-time Analytics with Spark or Flink on EMR, RDS
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
QuickSightEMR Amazon RDS
Real-time Analytics with Spark or Flink on EMR, DynamoDB
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
EMR DynamoDB real-time
dashboard
Real-time Analytics with Spark or Flink on EMR, ElastiCache
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
EMR real-time
dashboard
ElastiCache
Amazon Kinesis Data Analytics
Easily process and analyze streaming data with standard SQL
Real-time Analytics with Kinesis Data Analytics, RDS
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
Kinesis
Data Analytics
Lambda
function
QuickSightAmazon RDSKinesis
Data Streams
Real-time Analytics with Kinesis Data Analytics, DynamoDB
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
Kinesis
Data Analytics
Lambda
function
Kinesis
Data Streams
real-time
dashboard
DynamoDB
Real-time Analytics with Kinesis Data Analytics, ElastiCache
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
Kinesis
Data Analytics
Lambda
function
Kinesis
Data Streams
real-time
dashboard
ElastiCache
Amazon Elasticsearch Service
Fully managed, scalable, and secure Elasticsearch service
Real-time Analytics with Elasticsearch Service
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
Lambda function Elasticsearch Service Kibana
3 ways to build Real-time Analytics
Kinesis
Data Streams
Lambda
function
Elasticsearch Service Kibana
EMR
real-time
dashboard
ElastiCache
Kinesis
Data Analytics
Lambda
function
QuickSight
Amazon RDS
Kinesis
Data Streams
DynamoDB
1
2
3
Business Intelligence System
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
Lambda function Elasticsearch Service Kibana
<demo>
https://github.com/ksmin23/aws-analytics-immersion-day
</demo>
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
Lambda function Elasticsearch Service Kibana
Batch Layer Serving Layer
Speed Layer
Business Intelligence System
Lambda Architecture
Lambda vs Kappa Architecture
Lambda
Kappa
Beyond Data Analytics: Expand Business Intelligence
• Network(Graph) Analysis
• Social networking
• Recommendation engines
• Fraud detection
• Knowledge Graphs
• Recommendation
• Personalized recommendations
• Personalized search
• Personalized notifications
• Machine Learning
• Forecast(Predictive Analytics)
• Classification
• Sentiment Analysis
• and more …
Amazon Neptune
Fully managed graph database
Social Networking Recommendation Engines
Fraud Detection Knowledge Graphs
Beyond Data Analytics – Network Analysis(with Neptune)
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
Lambda function Elasticsearch Service Kibana
NeptuneLambda function API GatewayLambda function
Deliver high-quality
recommendations
Deliver
personalization in
days, not months
Real-time Works with any
product or content
Amazon Personalize
Real-time personalization and recommendation, based on the same
technology used at Amazon.com
Beyond Data Analytics: Recommendation(with Personalize)
Kinesis
Data Firehose
S3 Athena QuickSight
Kinesis
Data Streams
Lambda function Elasticsearch Service Kibana
S3
ElastiCache API Gateway
Event
(time-based)
Lambda
function
Personalize
Lambda function
EMR
Fully managed
hosting with
auto scaling
One-click
deployment
Notebook
instances
Built-in, high-
performance
algorithms
Automatic
model tuning
One-click
training
Build Train Deploy
Amazon SageMaker
Build, Train, and Deploy Machine Learning Models Quickly & Easily, at
Scale
Beyond Data Analytics: Machine Learning(with SageMaker)
Kinesis Data Firehose S3 Athena QuickSight
Kinesis
Data Streams
Lambda function Elasticsearch Service Kibana
TrainNotebook
Model
Models
bucket
SageMaker
Endpoint
SageMaker
Business
Intelligence
System
Journey to Analytics on
Collect ConsumeStore Process/
Analyze
Data
1 4
0 9
5 Answers &
Insights
Amazon Kinesis
Data Firehose
Amazon Kinesis
Data Streams
Amazon Managed
Streams for Kafka
Amazon S3
Amazon Kinesis
Data Analytics
AWS Glue
Amazon EMR
Amazon Athena Amazon QuickSight
Amazon Redshift
Amazon Elasticsearch
Service
Amazon Machine
Learning
AWS Lambda
Lessons Learned: Architectural Principles
• Build decoupled systems
- Data → Store → Process → Store → Analyze → Answers
• Use the right tool for the job
- Data structure, latency, throughput, access patterns
• Leverage managed and serverless services
- Scalable/elastic, available, reliable, secure, no/low admin
• Use log-centric design patterns
- Immutable logs (data lake), materialized views
• Be cost-conscious
- Big data ≠ Big cost
• Working backwards
- Design from consume to collect
Learn more
- Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AWS re:Invent 2018
https://www.slideshare.net/AmazonWebServices/big-data-analytics-architectural-patterns-and-best-
practices-ant201r1-aws-reinvent-2018
- Everything You Need to Know About Big Data: From Architectural Principles to Best Practices
https://www.slideshare.net/AmazonWebServices/everything-you-need-to-know-about-big-data-
from-architectural-principles-to-best-practices
- Big Data Analytics Options on AWS
https://d1.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf
- AWS Big Data Blog
https://aws.amazon.com/ko/blogs/big-data
- AWS Well-Architected Labs
https://wellarchitectedlabs.com

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Introduction to AWS VPC, Guidelines, and Best Practices
Introduction to AWS VPC, Guidelines, and Best PracticesIntroduction to AWS VPC, Guidelines, and Best Practices
Introduction to AWS VPC, Guidelines, and Best Practices
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
Introduction to Amazon S3
Introduction to Amazon S3Introduction to Amazon S3
Introduction to Amazon S3
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
AWS Storage - S3 Fundamentals
AWS Storage - S3 FundamentalsAWS Storage - S3 Fundamentals
AWS Storage - S3 Fundamentals
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Deep dive ECS & Fargate Deep Dive
Deep dive ECS & Fargate Deep DiveDeep dive ECS & Fargate Deep Dive
Deep dive ECS & Fargate Deep Dive
 
Deep Dive on AWS Lambda
Deep Dive on AWS LambdaDeep Dive on AWS Lambda
Deep Dive on AWS Lambda
 
Migrating Your AD to the Cloud with AWS Directory Services for Microsoft Acti...
Migrating Your AD to the Cloud with AWS Directory Services for Microsoft Acti...Migrating Your AD to the Cloud with AWS Directory Services for Microsoft Acti...
Migrating Your AD to the Cloud with AWS Directory Services for Microsoft Acti...
 
DAT302_Deep Dive on Amazon Relational Database Service (RDS)
DAT302_Deep Dive on Amazon Relational Database Service (RDS)DAT302_Deep Dive on Amazon Relational Database Service (RDS)
DAT302_Deep Dive on Amazon Relational Database Service (RDS)
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
Deep Dive: Amazon RDS
Deep Dive: Amazon RDSDeep Dive: Amazon RDS
Deep Dive: Amazon RDS
 
Fraud Detection with Amazon Machine Learning on AWS
Fraud Detection with Amazon Machine Learning on AWSFraud Detection with Amazon Machine Learning on AWS
Fraud Detection with Amazon Machine Learning on AWS
 
Using AWS Key Management Service for Secure Workloads
Using AWS Key Management Service for Secure WorkloadsUsing AWS Key Management Service for Secure Workloads
Using AWS Key Management Service for Secure Workloads
 
Disaster Recovery Options with AWS
Disaster Recovery Options with AWSDisaster Recovery Options with AWS
Disaster Recovery Options with AWS
 
복잡한 권한신청문제 ConsoleMe로 해결하기 - 손건 (AB180) :: AWS Community Day Online 2021
복잡한 권한신청문제 ConsoleMe로 해결하기 - 손건 (AB180) :: AWS Community Day Online 2021복잡한 권한신청문제 ConsoleMe로 해결하기 - 손건 (AB180) :: AWS Community Day Online 2021
복잡한 권한신청문제 ConsoleMe로 해결하기 - 손건 (AB180) :: AWS Community Day Online 2021
 
Aws concepts-power-point-slides
Aws concepts-power-point-slidesAws concepts-power-point-slides
Aws concepts-power-point-slides
 
Introduction to EC2
Introduction to EC2Introduction to EC2
Introduction to EC2
 

Ähnlich wie AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full Version)

Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Amazon Web Services
 

Ähnlich wie AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full Version) (20)

Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleModern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018
 
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
 
Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWS
 
Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015
 
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
20141021 AWS Cloud Taekwon - Big Data on AWS
20141021 AWS Cloud Taekwon - Big Data on AWS20141021 AWS Cloud Taekwon - Big Data on AWS
20141021 AWS Cloud Taekwon - Big Data on AWS
 
AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
 
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
 

Mehr von Sungmin Kim

Mehr von Sungmin Kim (14)

Build Computer Vision Applications with Amazon Rekognition and SageMaker
Build Computer Vision Applications with Amazon Rekognition and SageMakerBuild Computer Vision Applications with Amazon Rekognition and SageMaker
Build Computer Vision Applications with Amazon Rekognition and SageMaker
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
End-to-End Machine Learning with Amazon SageMaker
End-to-End Machine Learning with Amazon SageMakerEnd-to-End Machine Learning with Amazon SageMaker
End-to-End Machine Learning with Amazon SageMaker
 
1시간만에 머신러닝 개념 따라 잡기
1시간만에 머신러닝 개념 따라 잡기1시간만에 머신러닝 개념 따라 잡기
1시간만에 머신러닝 개념 따라 잡기
 
AWS re:Invent 2020 Awesome AI/ML Services
AWS re:Invent 2020 Awesome AI/ML ServicesAWS re:Invent 2020 Awesome AI/ML Services
AWS re:Invent 2020 Awesome AI/ML Services
 
AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축
AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축
AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축
 
Starup을 위한 AWS AI/ML 서비스 활용 방법
Starup을 위한 AWS AI/ML 서비스 활용 방법Starup을 위한 AWS AI/ML 서비스 활용 방법
Starup을 위한 AWS AI/ML 서비스 활용 방법
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
 
Octember on AWS (Revised Edition)
Octember on AWS (Revised Edition)Octember on AWS (Revised Edition)
Octember on AWS (Revised Edition)
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWS
 
Amazon Athena 사용 팁
Amazon Athena 사용 팁Amazon Athena 사용 팁
Amazon Athena 사용 팁
 
Databases & Analytics AWS re:invent 2019 Recap
Databases & Analytics AWS re:invent 2019 RecapDatabases & Analytics AWS re:invent 2019 Recap
Databases & Analytics AWS re:invent 2019 Recap
 
Octember on AWS
Octember on AWSOctember on AWS
Octember on AWS
 
AI/ML re:invent 2019 recap at Delivery Hero Korea
AI/ML re:invent 2019 recap at Delivery Hero KoreaAI/ML re:invent 2019 recap at Delivery Hero Korea
AI/ML re:invent 2019 recap at Delivery Hero Korea
 

Kürzlich hochgeladen

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 

Kürzlich hochgeladen (20)

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 

AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full Version)

  • 1. Sungmin Kim Solutions Architect, AWS Analytics Immersion Day Build BI System from Scratch
  • 4. Data The world’s most valuable resource is no longer oil, but data.* *Copyright: The Economist, 2017, David Parkins “ ”
  • 6. 3+1 Vs of Big Data
  • 8. Data Temperature SpectrumStructure Hot data Warm data Cold data Low High High Request rate LowHigh Cost / GB Low HighLatency Low HighData Volume Low In-Memory SQL NoSQL Search Object Storage Archive Storage Graph
  • 9. Simplify Big Data Processing Collect ConsumeStore Process/ Analyze Data 1 4 0 9 5 Answers & Insights Time to answer(Latency) Throughput Cost
  • 10. Let’s build Business Intelligence System
  • 13. Relational Databases Flat Files And Many Others! Retail Data Ops Data Marketing Data Amazon QuickSight Fast BI Service with Pay-per-Session Pricing and ML Insights for everyone
  • 16. AWS Well-Architecture Framework Performance efficiency Cost optimization SecurityReliability Operational excellence Set of questions you can use to evaluate how well an architecture is aligned to AWS best practices
  • 17. Is it Well-Architected? QuickSightAmazon RDS Operation excellence ü Scale-up: CPU, Memory, Disk ü Connection Pooling ü Schema changes: alter table ü Schema validation Reliability ü High availability: Master-Slave fail-over ü Data loss Performance efficiency ü Slow SQL queries caused by poor database indexing ü Number of writes >>> Number of reads Cost optimization ü Management of RDMS ü Compute(Instance), Storage cost Security ü Encryption data at rest and in transition
  • 18. Key considerations for storing data • Storage Volume • Data Model • Data Scheme • Access Patterns ü SQL Supported ü Put/Get(key, value) ü Range Query ü Join Query • Serverless or Managed Service • Current skill set
  • 19. Business Intelligence System QuickSight Redshift S3 Amazon RDS DocumentDB (with MongoDB compatability) DynamoDB Object StorageData WarehouseNoSQL Elasticsearch Service
  • 20. Business Intelligence System QuickSight Redshift S3Elasticsearch Service DocumentDB (with MongoDB compatability) DynamoDB Object StorageData WarehouseNoSQL DynamoDB
  • 21. Business Intelligence System QuickSightDocumentDB (with MongoDB compatibility) Redshift S3Elasticsearch Service DocumentDB (with MongoDB compatability) DynamoDB Object StorageData WarehouseNoSQL
  • 22. Business Intelligence System QuickSightElasticsearch Service Redshift S3Elasticsearch Service DocumentDB (with MongoDB compatability) DynamoDB Object StorageData WarehouseNoSQL
  • 23. Business Intelligence System QuickSightRedshift Redshift S3Elasticsearch Service DocumentDB (with MongoDB compatability) DynamoDB Object StorageData WarehouseNoSQL
  • 24. Business Intelligence System QuickSightS3 Redshift S3Elasticsearch Service DocumentDB (with MongoDB compatability) DynamoDB Object StorageData WarehouseNoSQL
  • 25. Storage Engine Comparison S3 Redshift Elasticsearch DocumentDB DynamoDB Storage volume Unlimited Limited (~ tens of TB) Limited (~ tens of TB) Limited (~ tens of TB) Unlimited Data model Object Relational Document Document Key-value Data scheme Schema-free Fixed schema Schema-free (JSON) Schema-free (JSON) (Key, value) SQL supported Query engine dependent Yes Yes No No Put/Get (key, value) Yes Yes Yes Yes Yes Range query Query engine dependent Yes Yes Yes Difficult Join query Query engine dependent Yes No Difficult Very Difficult Serverless Yes No No Yes Yes v https://db-engines.com/en/system/Amazon+DocumentDB%3BAmazon+DynamoDB%3BAmazon+Redshift%3BElasticsearch
  • 26. Use Amazon S3 as Your Persistent Data Store • Natively supported by big data frameworks (Spark, Hive, Presto, etc.) • Decouple storage and compute • No need to run compute clusters for storage (unlike HDFS) • Can run transient Amazon EMR clusters with Amazon EC2 Spot Instances • Multiple & heterogeneous analysis clusters and services can use the same data • Designed for 99.999999999% durability • No need to pay for data replication within a region • Secure: SSL, client/server-side encryption at rest • Low cost
  • 27. Business Intelligence System S3 QuickSight Ingestion Query engine
  • 29. Amazon Kinesis Data Firehose Prepare and load real-time data streams into data stores and analytics tools
  • 30. Business Intelligence System Kinesis Data Firehose S3 QuickSight Glue Athena EMR Batch Interactive
  • 31. Comparison of SQL Processing engines Data Structure Semi Semi Semi Full Languages API/SQL SQL SQL SQL Data Store S3 (Glue), S3/HDFS (Spark) S3/HDFS S3 Local Use case Transformation SQL Queries for S3/HDFS Serverless SQL Queries for S3 Fully Featured SQL Database Performance AWS Glue Amazon Athena Amazon Redshift
  • 32. Business Intelligence System Kinesis Data Firehose S3 Athena QuickSight
  • 33. Data Processing: ETL vs ELT Source1 Source2 Target Source1 Source2 Staging tables Final tables Target (MPP database) Extract Transform Load E → T → L Extract & Load Transform E → L → T Source3Source3
  • 34. How about Real-time Analytics? Kinesis Data Firehose S3 Athena QuickSight Real-time Analysis Layer ü Streaming data processing ü Latency: milliseconds ~ seconds Messages CDC Event Streams
  • 35. Real-time Analytics Kinesis Data Firehose S3 Athena QuickSight Real-time Analysis Layer SQS Kinesis Data Streams MSK ü Streaming data processing ü Latency: milliseconds ~ seconds
  • 36. • Decouple producers & consumers • Persistent buffer • Collect multiple streams • Preserve client ordering • Parallel consumption • Streaming MapReduce 4 4 3 3 2 2 1 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 4 3 3 2 2 1 1 Producer 1 shard 1 / partition 1 shard 2 / partition 2 Consumer 1 Count of red = 4 Count of violet = 4 Consumer 2 Count of blue = 4 Count of green = 4 Producer 2 Producer 3 Producer n Key = Red Key = Green Key = Blue Key = Violet Amazon Kinesis Data Streams Amazon Managed Streaming for Kafka … Why Stream Storage?
  • 37. • Decouple producers & consumers • Persistent buffer • Collect multiple streams • No client ordering (standard) • FIFO queue preserves client ordering • No streaming MapReduce • No parallel consumption • Amazon SNS can publish to multiple SNS subscribers (queues or Lambda functions) Consumers 4 3 2 1 12344 3 2 1 1234 2134 13342 Standard FIFO Producers Amazon SQS Queue Publisher Amazon SNS Topic AWS Lambda function Amazon SQS queue Queue Subscriber What about Amazon SQS?
  • 38. Real-time Analytics Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams Real-time Analysis Layer ü Streaming data processing ü Latency: milliseconds ~ seconds
  • 39. Real-time Analytics Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams Kinesis Data Analytics on EMR Elasticsearch Service Real-time Analysis Layer on EMR
  • 40. RDD Operator RDD RDD …Operator Operator Data Sink OperatorOperator Data Source Amazon EMR Applications Framework Process Layer Data Layer Infrastructure S3 EMRFS Amazon S3 Instances Spot Instances
  • 41. Real-time Analytics with Spark or Flink on EMR, RDS Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams QuickSightEMR Amazon RDS
  • 42. Real-time Analytics with Spark or Flink on EMR, DynamoDB Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams EMR DynamoDB real-time dashboard
  • 43. Real-time Analytics with Spark or Flink on EMR, ElastiCache Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams EMR real-time dashboard ElastiCache
  • 44. Amazon Kinesis Data Analytics Easily process and analyze streaming data with standard SQL
  • 45. Real-time Analytics with Kinesis Data Analytics, RDS Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams Kinesis Data Analytics Lambda function QuickSightAmazon RDSKinesis Data Streams
  • 46. Real-time Analytics with Kinesis Data Analytics, DynamoDB Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams Kinesis Data Analytics Lambda function Kinesis Data Streams real-time dashboard DynamoDB
  • 47. Real-time Analytics with Kinesis Data Analytics, ElastiCache Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams Kinesis Data Analytics Lambda function Kinesis Data Streams real-time dashboard ElastiCache
  • 48. Amazon Elasticsearch Service Fully managed, scalable, and secure Elasticsearch service
  • 49. Real-time Analytics with Elasticsearch Service Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams Lambda function Elasticsearch Service Kibana
  • 50. 3 ways to build Real-time Analytics Kinesis Data Streams Lambda function Elasticsearch Service Kibana EMR real-time dashboard ElastiCache Kinesis Data Analytics Lambda function QuickSight Amazon RDS Kinesis Data Streams DynamoDB 1 2 3
  • 51. Business Intelligence System Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams Lambda function Elasticsearch Service Kibana
  • 53. Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams Lambda function Elasticsearch Service Kibana Batch Layer Serving Layer Speed Layer Business Intelligence System Lambda Architecture
  • 54. Lambda vs Kappa Architecture Lambda Kappa
  • 55. Beyond Data Analytics: Expand Business Intelligence • Network(Graph) Analysis • Social networking • Recommendation engines • Fraud detection • Knowledge Graphs • Recommendation • Personalized recommendations • Personalized search • Personalized notifications • Machine Learning • Forecast(Predictive Analytics) • Classification • Sentiment Analysis • and more …
  • 56. Amazon Neptune Fully managed graph database Social Networking Recommendation Engines Fraud Detection Knowledge Graphs
  • 57. Beyond Data Analytics – Network Analysis(with Neptune) Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams Lambda function Elasticsearch Service Kibana NeptuneLambda function API GatewayLambda function
  • 58. Deliver high-quality recommendations Deliver personalization in days, not months Real-time Works with any product or content Amazon Personalize Real-time personalization and recommendation, based on the same technology used at Amazon.com
  • 59. Beyond Data Analytics: Recommendation(with Personalize) Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams Lambda function Elasticsearch Service Kibana S3 ElastiCache API Gateway Event (time-based) Lambda function Personalize Lambda function EMR
  • 60. Fully managed hosting with auto scaling One-click deployment Notebook instances Built-in, high- performance algorithms Automatic model tuning One-click training Build Train Deploy Amazon SageMaker Build, Train, and Deploy Machine Learning Models Quickly & Easily, at Scale
  • 61. Beyond Data Analytics: Machine Learning(with SageMaker) Kinesis Data Firehose S3 Athena QuickSight Kinesis Data Streams Lambda function Elasticsearch Service Kibana TrainNotebook Model Models bucket SageMaker Endpoint SageMaker
  • 64. Collect ConsumeStore Process/ Analyze Data 1 4 0 9 5 Answers & Insights Amazon Kinesis Data Firehose Amazon Kinesis Data Streams Amazon Managed Streams for Kafka Amazon S3 Amazon Kinesis Data Analytics AWS Glue Amazon EMR Amazon Athena Amazon QuickSight Amazon Redshift Amazon Elasticsearch Service Amazon Machine Learning AWS Lambda
  • 65. Lessons Learned: Architectural Principles • Build decoupled systems - Data → Store → Process → Store → Analyze → Answers • Use the right tool for the job - Data structure, latency, throughput, access patterns • Leverage managed and serverless services - Scalable/elastic, available, reliable, secure, no/low admin • Use log-centric design patterns - Immutable logs (data lake), materialized views • Be cost-conscious - Big data ≠ Big cost • Working backwards - Design from consume to collect
  • 66. Learn more - Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AWS re:Invent 2018 https://www.slideshare.net/AmazonWebServices/big-data-analytics-architectural-patterns-and-best- practices-ant201r1-aws-reinvent-2018 - Everything You Need to Know About Big Data: From Architectural Principles to Best Practices https://www.slideshare.net/AmazonWebServices/everything-you-need-to-know-about-big-data- from-architectural-principles-to-best-practices - Big Data Analytics Options on AWS https://d1.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf - AWS Big Data Blog https://aws.amazon.com/ko/blogs/big-data - AWS Well-Architected Labs https://wellarchitectedlabs.com