Building Serverless Analytics on AWS
- 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building Serverless Analytics on AWS
Ivan Cheng
Solutions Architect
AWS
Steven Hsieh
Engineer
TrendMicro
- 3.
Data Processing
Data → Collect → Store → Process/Analyze → Consume → Answers
Time to answer (latency)
Throughput
Cost
Start here, with a business case
- 4.
To answer new questions quickly, we look to a
modern data architecture design
Massive upfront costs
Overprovisioned capacity
Long implementation times
Pay as you go, for what you use
Decoupled pipelines and engines
Experimentation platform
Ingest/collect → Store → Process/analyze → Consume/visualize
- 5.
Data Is Changing, Analytics Are Adopting
Capture and store new data at PB-EB scale
Do new types of analytics in a cost-effective way
New types of analytics
• Machine learning
• Big data processing
• Real-time analytics
• Full-text search
- 6.
More data lakes and analytics than anywhere else
More than 10,000 data lakes on AWS
- 7.
AWS Analytics Portfolio
Broadest and deepest portfolio, purpose-built for builders
Analytics: Redshift | EMR (Spark & Hadoop) | Athena | Elasticsearch Service | Kinesis Data Analytics | Glue (Spark & Python)
Data Lake Infrastructure & Management: S3/Glacier | Glue | Lake Formation
Visualization, Engagement, & Machine Learning: QuickSight | SageMaker | Comprehend | Lex | Polly | Rekognition | Translate | Transcribe | Deep Learning AMIs
Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka
+ 10 more
- 8.
Data Lake Robust Infrastructure
Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Kinesis Video Streams | S3 | Redshift | EMR | Athena | Kinesis | Elasticsearch Service | AI Services | QuickSight
• Durable and available; exabyte scale
• Secure, compliant, auditable
• Rapid ingest and transformation
• Schema on read
• Decoupling of compute and storage
• On-demand resources, tiering, cost choices
- 9.
Data Analytics Pipeline
Data sources: Web logs / cookies | ERP | Connected devices
Ingest: Amazon Kinesis | Database Migration Service | AWS Snowball | Amazon MSK
Store: Amazon S3
Catalog: AWS Glue
Process & Analyze: Amazon Athena | Amazon EMR | Amazon Redshift | Amazon Elasticsearch
Consume: BI Tools | Jupyter Notebooks | Amazon API Gateway | Amazon QuickSight
- 10.
Cloud Services Evolution
Virtual machines → Managed services → Serverless
- 11.
Serverless analytics
Deliver on-demand analytics on the data lake
Sources: Devices | Web | Sensors | Social
Services: Kinesis Data Firehose | S3 (data lake) | Glue (ETL & Data Catalog) | Athena | QuickSight | AI/ML
• Serverless. Zero infrastructure. Zero administration
• Never pay for idle resources
• Availability and fault tolerance built in
• Automatically scales resources with usage
- 12.
Amazon Athena: Interactive Analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Supports multiple data formats; define schema on demand
Start Querying Instantly
Athena is serverless. Just point to your data in Amazon S3, define the schema, and start querying using the built-in query editor.
Fast. Really Fast.
Interactive performance even for large datasets. Athena automatically executes queries in parallel, so most results come back within seconds.
Open. Powerful. Standard
Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet.
Pay Per Query
With Amazon Athena, you pay only for the queries that you run. You are charged $5 per terabyte scanned by your queries.
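The pay-per-query model can be turned into a rough cost estimate. The sketch below is illustrative only: the $5 per terabyte rate comes from the slide, while the binary terabyte size and the 10 MB per-query billing minimum are assumptions not stated in the deck, and the function name is hypothetical.

```python
PRICE_PER_TB_USD = 5.0                 # from the slide: $5 per terabyte scanned
BYTES_PER_TB = 1024 ** 4               # assumption: binary terabyte
MIN_BILLED_BYTES = 10 * 1024 ** 2      # assumption: 10 MB billing minimum per query


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    billable = max(bytes_scanned, MIN_BILLED_BYTES)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    full_scan = estimate_query_cost(1024 ** 4)        # scan a full terabyte
    half_scan = estimate_query_cost(1024 ** 4 // 2)   # scan half of it
    print(f"full scan: ${full_scan:.2f}, half scan: ${half_scan:.2f}")
```

Because the charge follows bytes scanned rather than rows returned, anything that reduces scanned data (for example a columnar format such as Parquet, or partitioning) should reduce cost proportionally under this model.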
- 13.
Amazon S3 | Data catalog | Amazon Athena
Roles: Data Engineer | Data Consumer (User, Analyst, Data Scientist)
Access: AWS Tools and SDKs | AWS Management Console | Amazon QuickSight | Amazon SageMaker
- 14.
Data consumption – Automated Reporting
athena.startQueryExecution("SELECT * FROM business_view")
1. Schedule query
2. Track Query_ID for status
3. Query results to Amazon S3
4. New file trigger
5. Job complete notification (email notification)
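Steps 1 to 3 of the flow above can be sketched in Python. This is a minimal illustration, not the presenters' implementation: it assumes a boto3-style Athena client is passed in (for example `boto3.client("athena")`), and the function name, database name, and S3 location are hypothetical. Steps 4 and 5 (the new-file trigger and the notification) are outside this sketch.

```python
import time

# Athena query states after which polling can stop.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_report(athena, sql, database, output_location,
               poll_seconds=2.0, sleep=time.sleep):
    """Start a query, poll its status by query ID, and return
    (query_id, final_state, result_location)."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        execution = athena.get_query_execution(
            QueryExecutionId=query_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            location = execution["ResultConfiguration"]["OutputLocation"]
            return query_id, state, location
        sleep(poll_seconds)  # still QUEUED or RUNNING
```

A caller would typically check that the returned state is `SUCCEEDED` before reading the result file from the returned S3 location.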
- 15.
Athena Workgroups
Athena Workgroups are used to isolate queries
between different teams, workloads, or applications,
and to set limits on the amount of data each query or the
entire workgroup can process
Workload Isolation | Query Metrics | Cost Controls
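As a rough illustration of the per-query limit, the sketch below builds the arguments for Athena's CreateWorkGroup call. The workgroup name, S3 location, and helper name are hypothetical; the 10 MB lower bound on the cutoff is an assumption drawn from the Athena API rather than from the deck. Workgroup-wide limits are not covered here.

```python
MIN_CUTOFF_BYTES = 10_000_000  # assumption: smallest per-query cutoff Athena accepts


def workgroup_request(name, output_location, per_query_limit_bytes):
    """Build keyword arguments for a CreateWorkGroup call with a
    per-query data scan limit and CloudWatch query metrics enabled."""
    if per_query_limit_bytes < MIN_CUTOFF_BYTES:
        raise ValueError("per-query cutoff must be at least 10 MB")
    return {
        "Name": name,
        "Configuration": {
            # Each team gets its own result location (workload isolation).
            "ResultConfiguration": {"OutputLocation": output_location},
            # Prevent clients from overriding the workgroup settings.
            "EnforceWorkGroupConfiguration": True,
            # Publish per-workgroup query metrics.
            "PublishCloudWatchMetricsEnabled": True,
            # Cancel any query that scans more than this many bytes.
            "BytesScannedCutoffPerQuery": per_query_limit_bytes,
        },
    }
```

With boto3 this would be passed as `boto3.client("athena").create_work_group(**workgroup_request(...))`.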
- 16.
Visualize your data with your favorite tools
Featured Athena Partners
Amazon QuickSight
- 17.
Why QuickSight
Scalable
From 10 users to 10,000, QuickSight seamlessly grows
with you with no need for additional servers or
infrastructure.
No Servers to Manage
QuickSight is a fully managed cloud service. There is
no infrastructure to maintain or upgrade and no
upfront costs.
Fully integrated
QuickSight integrates with your other AWS services
and data sources giving you everything you need to
build an end-to-end cloud analytics solution.
Pay For What You Use
Instead of buying costly licenses for all of your users,
QuickSight allows you to share dashboards and reports
and only pay when users access them.
- 18.
Connect to your data, wherever it is
QuickSight allows you to connect to AWS data sources, private VPC subnets, on-premises and
hosted databases, and third-party business applications.
On-premises
Securely connect to on-premises databases and flat files like Excel and CSV
• Excel
• CSV
• Teradata
• MySQL
• SQL Server
• PostgreSQL
In the cloud
Connect to hosted databases, big data formats, and secure VPCs
• Redshift
• RDS
• S3
• Athena
• Aurora
• Teradata
• MySQL
• Presto
• Spark
• SQL Server
• PostgreSQL
• MariaDB
• Snowflake
• IoT Analytics
Applications
Connect directly to third-party business applications
• Salesforce
• Square
• Adobe Analytics
• Jira
• ServiceNow
• Twitter
• GitHub
- 19.
- 20.
Demo Scenario
Amazon S3 (Processed Data) | Glue Data Catalog | Amazon Athena | Amazon QuickSight
- 21.
Building AWS Multi Account Cost
Analytics Solution at Scale
Steven Hsieh
Engineer
TrendMicro
- 22.
- 23.
About Me
Steven Hsieh
- 24.
Background
- 25.
Pillars of
- 26.
Design Principles for Cost Optimization
• Adopt a consumption model
• Measure overall efficiency
• Stop spending money on data center operations
• Analyze and attribute expenditure
• Use managed services to reduce cost of ownership
Pay as you go / need
- 27.
Challenges
Large Scale Accounts
• Almost 400 accounts
• Hard to manage via the
AWS console
Multiple Data Sources
• Billing data
• Utilization data of AWS
services (e.g., EC2, S3)
- 28.
Challenges
Permission Management
• Multiple teams
• Authorization for
different teams
Insight for Better Design
• Finding insights for
design improvement
• Providing utilization
visibility for design
changes
- 29.
Other solutions we have tried…
AWS Billing Console
• Hard to use at large
scale
• Single data source
Amazon Redshift
• Cost model
• ETL
3rd Party BI Tool
• Expensive license
fee
• Additional
operation cost
- 30.
Ideas
Amazon S3 + Amazon Athena + Amazon QuickSight
• Data persistence in Amazon S3
• Data querying via Amazon Athena
• Dashboard / Reporting via Amazon QuickSight
- 31.
Challenges
- 32.
Global Accelerator
- 33.
• Using SQS to trigger parallel tasks
• Lambda limitation:
• Timeout: 15 minutes
• /tmp: 512 MB
• Spot instance interruptions
• Fargate limitation:
• Container storage: 10 GB
• Run-task: 10
• Using assume role to collect data
across accounts
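The SQS fan-out and cross-account collection described above might look roughly like the following. This is a sketch under stated assumptions, not the presenters' code: one message per account, a boto3-style SQS and STS client passed in by the caller, and a hypothetical message shape, role name, and session name. The 10-entry batch size is the SendMessageBatch limit.

```python
import json

SQS_MAX_BATCH = 10  # SendMessageBatch accepts at most 10 entries per call


def build_batches(account_ids):
    """Create one message per account and group them into batches of 10."""
    entries = [
        {"Id": str(index), "MessageBody": json.dumps({"account_id": account_id})}
        for index, account_id in enumerate(account_ids)
    ]
    return [entries[i:i + SQS_MAX_BATCH]
            for i in range(0, len(entries), SQS_MAX_BATCH)]


def enqueue_accounts(sqs, queue_url, account_ids):
    """Send every batch to the queue; each message can trigger one worker."""
    batches = build_batches(account_ids)
    for batch in batches:
        sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)
    return len(batches)


def role_arn(account_id, role_name):
    """Build the ARN of the role to assume in a member account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def member_credentials(sts, account_id, role_name):
    """Assume the role in the member account and return temporary credentials."""
    response = sts.assume_role(
        RoleArn=role_arn(account_id, role_name),
        RoleSessionName=f"cost-collector-{account_id}",  # hypothetical name
    )
    return response["Credentials"]
```

For the roughly 400 accounts mentioned earlier, this grouping yields 40 batch calls, and each worker only needs the account ID from its message to assume the matching role.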
- 34.
• Using SNS to track data
upload results
• Preprocessing data before
uploading to S3
• Only creator can modify
datasets in QuickSight
• Create view in Athena
- 35.
Global Accelerator
• Web application hosted in
Fargate
• Lambda Integration with
QuickSight for embedded
URL.
• Using ALB to handle all
HTTPS interaction.
• Permission & Metadata in
DynamoDB
• ADFS Federation using
Cognito
• Performance Improvement
via AWS Global Accelerator
• Web Security Enhancement
via AWS WAF
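The Lambda integration with QuickSight for the embedded URL could be sketched as below. It assumes a boto3-style QuickSight client and its `get_dashboard_embed_url` operation; the function name, identifiers, and session lifetime are hypothetical, and the deck does not show how the presenters actually wired this up.

```python
def dashboard_embed_url(quicksight, account_id, dashboard_id, user_arn,
                        session_minutes=60):
    """Request a short-lived embed URL for one dashboard on behalf of a
    registered QuickSight user."""
    response = quicksight.get_dashboard_embed_url(
        AwsAccountId=account_id,
        DashboardId=dashboard_id,
        IdentityType="QUICKSIGHT",   # the URL is tied to a QuickSight user
        UserArn=user_arn,
        SessionLifetimeInMinutes=session_minutes,
    )
    return response["EmbedUrl"]
```

A Lambda handler would call this with `boto3.client("quicksight")` after checking the caller's permissions, which the slide places in DynamoDB, and return the URL to the web application.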
- 36.
Quick Development & Evaluation
- 37.
Low Utilization & Right Sizing
• Trusted Advisor Checks
• Low utilization EC2 instances: CPU was 10% or less and
network I/O was 5 MB or less on 4 or more days during last
14 days
• Right Sizing
• Analyze metric data to recommend the proper instance type and
size
• Awareness of NIC driver and Linux virtualization type issues
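The low-utilization rule quoted above can be expressed as a small check over daily metrics. This is a sketch of the rule as stated on the slide, not Trusted Advisor's implementation; the `DailyUsage` type and the interpretation of the values as per-day figures are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DailyUsage:
    cpu_percent: float      # assumption: CPU utilization figure for the day
    network_io_mb: float    # assumption: network I/O for the day, in MB


def is_low_utilization(days, min_low_days=4, window=14):
    """Return True if, within the last `window` days, at least
    `min_low_days` days had CPU <= 10% and network I/O <= 5 MB."""
    recent = days[-window:]
    low_days = sum(
        1 for day in recent
        if day.cpu_percent <= 10.0 and day.network_io_mb <= 5.0
    )
    return low_days >= min_low_days
```

Feeding this with 14 days of per-instance CloudWatch data would flag candidates for right sizing or shutdown.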
- 38.
Saving Polar Bear
• Analyzing the CPU utilization pattern
• Turning off non-production instances can save almost
70% of the cost
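One way to see where a figure near 70% could come from is simple schedule arithmetic. The schedule below (10 hours a day, 5 days a week) is a hypothetical example, not taken from the deck, and the calculation assumes cost is proportional to running hours, ignoring storage and other charges that continue while an instance is stopped.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def savings_fraction(hours_on_per_day, days_on_per_week):
    """Fraction of always-on compute cost saved by running only on a schedule."""
    hours_on = hours_on_per_day * days_on_per_week
    return 1 - hours_on / HOURS_PER_WEEK


if __name__ == "__main__":
    # Hypothetical schedule: 10 hours a day, Monday to Friday.
    print(f"{savings_fraction(10, 5):.1%}")
```

With that schedule the instance runs 50 of 168 hours, so the saved fraction is 1 - 50/168, or about 70%.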
- 39.
Recap
• Using a cost-effective way to build an end-to-end BI
solution
• 2 power users $36 + ALB $18 = $54
• Using a flexible reporting architecture to integrate with
multiple data sources
• Quick wins & timely data-driven decisions
• Validating innovative ideas (e.g., the potential savings of the polar bear
project)
- 40.
Summary
• More organizations are building data lakes on the cloud to stay competitive
• AWS provides the broadest and deepest portfolio of databases and
analytics services, including machine learning.
• Serverless analytics helps you build modern data pipelines with
increased agility and lower cost.
• Learn more at: https://aws.amazon.com/big-data/datalakes-and-analytics/
- 41. Thank you!
Ivan Cheng
Solutions Architect
AWS
Steven Hsieh
Engineer
TrendMicro