© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pavan Pothukuchi, Principal Product Manager, AWS
September 20, 2016
Deep Dive: Amazon Redshift for Big Data Analytics
Agenda
• Service Overview
• Best Practices
• Schema / Table Design
• Data Ingestion
• Database Tuning
• Migration
• Examples
Service Overview
Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
Amazon Redshift
a lot faster
a lot simpler
a lot cheaper
Selected Amazon Redshift customers
Amazon Redshift system architecture
Leader node
• SQL endpoint
• Stores metadata
• Coordinates query execution
Compute nodes
• Local, columnar storage
• Execute queries in parallel
• Load, backup, restore via Amazon S3; load from Amazon DynamoDB, Amazon EMR, or SSH
Two hardware platforms
• Optimized for data processing
• DS2: HDD; scale from 2TB to 2PB
• DC1: SSD; scale from 160GB to 326TB
[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion; Backup; Restore]
A deeper look at compute node architecture
Each node contains multiple slices
• DS2 – 2 slices on XL, 16 on 8XL
• DC1 – 2 slices on L, 32 on 8XL
A slice can be thought of as a “virtual compute node”
• Unit of data partitioning
• Parallel query processing
Facts about slices:
• Each compute node has either 2, 16, or 32 slices
• Table rows are distributed to slices
• A slice processes only its own data
Amazon Redshift dramatically reduces I/O
Column storage
ID   Age  State  Amount
123  20   CA     500
345  25   WA     250
678  40   FL     125
957  37   WA     375
• Calculating SUM(Amount) with row storage:
– Need to read everything
– Unnecessary I/O
• Calculating SUM(Amount) with column storage:
– Only scan the necessary blocks
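The contrast between the two layouts can be sketched in a few lines of Python. This is a toy model, not Redshift's storage engine: it only counts how many values have to be touched to compute SUM(Amount) over the four sample rows. The function and variable names are hypothetical.

```python
# Toy model (not Redshift internals): count values touched for SUM(Amount).
COLUMNS = ("id", "age", "state", "amount")
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]

# Column layout: one list per column instead of one tuple per row.
COLUMN_STORE = {
    name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)
}


def sum_amount_row_store(rows):
    """Row storage: every field of every row is read to reach Amount."""
    amount_index = COLUMNS.index("amount")
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)
        total += row[amount_index]
    return total, values_read


def sum_amount_column_store(column_store):
    """Column storage: only the Amount column is read."""
    amounts = column_store["amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_store(ROWS))             # total and values read
print(sum_amount_column_store(COLUMN_STORE))  # same total, fewer values read
```

Both functions return the same total; only the number of values read differs, which is the I/O saving the slide describes.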
Data compression
• Columnar compression
– Effective because values within a column tend to be alike
– Reduces storage requirements
– Reduces I/O
analyze compression orders;
Table | Column | Encoding
--------+-------------+----------
orders | id | mostly32
orders | age | mostly32
orders | state | lzo
orders | amount | mostly32
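Why like data compresses well can be illustrated with run-length encoding, one of the encodings Redshift offers. The sketch below is an assumption-laden illustration: the column values are made up, the helper name is hypothetical, and it says nothing about how ANALYZE COMPRESSION actually picks an encoding.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical State column: neighbouring values in a column are often equal.
state_column = ["CA", "CA", "CA", "WA", "WA", "FL", "FL", "FL", "FL"]

encoded = run_length_encode(state_column)
print(encoded)
print(len(state_column), "values stored as", len(encoded), "runs")
```

A row layout would interleave IDs, ages, and amounts between the State values, so the runs would never form.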
Zone maps
• In-memory block metadata
• Contains per-block MIN and MAX values
• Effectively prunes blocks which don’t contain data for a given query
• Minimizes unnecessary I/O
Best Practices: Schema Design
Data Distribution
• Distribution style is a table property which dictates how that table’s data is distributed throughout the cluster:
• KEY: Value is hashed, same value goes to same location (slice)
• ALL: Full table data goes to first slice of every node
• EVEN: Round robin
• Goals:
• Distribute data evenly for parallel processing
• Minimize data movement during query processing
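The three styles can be sketched as placement functions. This is a minimal model under stated assumptions: a hypothetical cluster of two nodes with two slices each, `zlib.crc32` standing in for Redshift's internal hash function, and ALL modelled as one full copy per node.

```python
import zlib

N_NODES = 2   # hypothetical cluster size
N_SLICES = 4  # two slices per node

ROWS = [
    {"id": 101, "gender": "M", "name": "John Smith"},
    {"id": 292, "gender": "F", "name": "Jane Jones"},
    {"id": 139, "gender": "M", "name": "Peter Black"},
    {"id": 446, "gender": "M", "name": "Pat Partridge"},
    {"id": 658, "gender": "F", "name": "Sarah Cyan"},
    {"id": 164, "gender": "M", "name": "Brian Snail"},
    {"id": 209, "gender": "M", "name": "James White"},
    {"id": 306, "gender": "F", "name": "Lisa Green"},
]


def distribute_even(rows, n_slices=N_SLICES):
    """EVEN: deal rows out to slices round robin."""
    slices = [[] for _ in range(n_slices)]
    for position, row in enumerate(rows):
        slices[position % n_slices].append(row)
    return slices


def distribute_key(rows, key, n_slices=N_SLICES):
    """KEY: hash the key value; equal values always share a slice."""
    slices = [[] for _ in range(n_slices)]
    for row in rows:
        # crc32 is only a stand-in for the real hash function.
        hashed = zlib.crc32(str(row[key]).encode("utf-8"))
        slices[hashed % n_slices].append(row)
    return slices


def distribute_all(rows, n_nodes=N_NODES):
    """ALL: every node keeps a full copy of the table."""
    return [list(rows) for _ in range(n_nodes)]


print([len(s) for s in distribute_even(ROWS)])
print([len(s) for s in distribute_key(ROWS, "id")])
print([len(s) for s in distribute_key(ROWS, "gender")])
print([len(n) for n in distribute_all(ROWS)])
```

Hashing on a two-valued column such as gender can occupy at most two slices, which is the skew problem the goals above warn against.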
[Diagram: a two-node cluster (Node 1 with Slice 1 and Slice 2, Node 2 with Slice 3 and Slice 4), shown once for each distribution style: KEY, ALL, and EVEN]
DISTSTYLE EVEN
ID   Gender  Name
101  M       John Smith
292  F       Jane Jones
139  M       Peter Black
446  M       Pat Partridge
658  F       Sarah Cyan
164  M       Brian Snail
209  M       James White
306  F       Lisa Green
Round robin places the rows on the slices:
Slice 1: 101 M John Smith; 306 F Lisa Green
Slice 2: 292 F Jane Jones; 209 M James White
Slice 3: 139 M Peter Black; 164 M Brian Snail
Slice 4: 446 M Pat Partridge; 658 F Sarah Cyan
DISTSTYLE KEY
The same eight rows pass through a hash function on the distribution key:
Slice 1: 101 M John Smith; 306 F Lisa Green
Slice 2: 292 F Jane Jones; 209 M James White
Slice 3: 139 M Peter Black; 164 M Brian Snail
Slice 4: 446 M Pat Partridge; 658 F Sarah Cyan
DISTSTYLE KEY
When the hash function groups the same eight rows by Gender, they land on only two slices:
Slice holding M: 101 John Smith; 139 Peter Black; 446 Pat Partridge; 164 Brian Snail; 209 James White
Slice holding F: 292 Jane Jones; 658 Sarah Cyan; 306 Lisa Green
DISTSTYLE ALL
[Diagram: the same eight rows are copied in full to each location in the cluster]
[Diagram: each slice holds matching CUSTOMERS and ORDERS rows and produces its own RESULTS]
Slice 1
CUSTOMERS
CUST_ID  GENDER  NAME
101      M       John Smith
306      F       James White
ORDERS
ORDER_ID  CUST_ID  Amount
A1600     101      120
B8765     306      340
RESULTS
CUST_ID  GENDER  Amount
101      M       120
306      F       340
Slice 2
CUSTOMERS
CUST_ID  GENDER  NAME
292      F       Jane Jones
209      M       Lyall Green
ORDERS
ORDER_ID  CUST_ID  Amount
C0967     292      750
D8753     209      601
RESULTS
CUST_ID  GENDER  Amount
292      F       750
209      M       601
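The join shown in the diagram can be sketched as follows, assuming both tables are distributed on CUST_ID with the same hash function. The two-slice cluster, the `zlib.crc32` stand-in hash, and all function names are hypothetical; the point is only that each slice can join its own rows without moving data.

```python
import zlib

N_SLICES = 2  # hypothetical cluster size

CUSTOMERS = [
    (101, "M", "John Smith"),
    (306, "F", "James White"),
    (292, "F", "Jane Jones"),
    (209, "M", "Lyall Green"),
]
ORDERS = [
    ("A1600", 101, 120),
    ("B8765", 306, 340),
    ("C0967", 292, 750),
    ("D8753", 209, 601),
]


def slice_for(cust_id):
    """Stand-in hash: the same CUST_ID always maps to the same slice."""
    return zlib.crc32(str(cust_id).encode("utf-8")) % N_SLICES


def place(rows, key_index):
    """Distribute rows to slices by hashing the column at key_index."""
    slices = [[] for _ in range(N_SLICES)]
    for row in rows:
        slices[slice_for(row[key_index])].append(row)
    return slices


def local_join(customers, orders):
    """Join one slice's customers to the same slice's orders."""
    gender_by_id = {cust_id: gender for cust_id, gender, _name in customers}
    return [
        (cust_id, gender_by_id[cust_id], amount)
        for _order_id, cust_id, amount in orders
        if cust_id in gender_by_id
    ]


customer_slices = place(CUSTOMERS, 0)  # CUST_ID is column 0
order_slices = place(ORDERS, 1)        # CUST_ID is column 1

results = [
    row
    for customers, orders in zip(customer_slices, order_slices)
    for row in local_join(customers, orders)
]
print(sorted(results))
```

Because matching keys are always co-located, the union of the per-slice results equals the full join, whatever the hash function happens to be.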
Choosing a Distribution Style
KEY
• Large FACT tables
• Large or rapidly changing tables used in joins
• Localize columns used within aggregations
ALL
• Tables with slowly changing data
• Reasonable size (i.e., a few million rows, not hundreds of millions)
• No common distribution key for frequent joins
• Typical use case: joined dimension table without a common distribution key
EVEN
• Tables not frequently joined or aggregated
• Large tables without acceptable candidate keys
Data Sorting
Goals
Physically order rows of table data based on certain column(s)
Optimize effectiveness of zone maps
Enable MERGE JOIN operations
Impact
Enables range-restricted scans (rrscans) to prune blocks by leveraging zone maps
Overall reduction in block I/O
Achieved with the table property SORTKEY defined over one or more columns
Optimal SORTKEY is dependent on:
Query patterns
Data profile
Business requirements
Zone Maps
SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013'
Unsorted Table
MIN: 01-JUNE-2013, MAX: 20-JUNE-2013  READ
MIN: 08-JUNE-2013, MAX: 30-JUNE-2013  READ
MIN: 12-JUNE-2013, MAX: 20-JUNE-2013
MIN: 02-JUNE-2013, MAX: 25-JUNE-2013  READ
MIN: 06-JUNE-2013, MAX: 12-JUNE-2013  READ
Sorted By Date
MIN: 01-JUNE-2013, MAX: 06-JUNE-2013
MIN: 07-JUNE-2013, MAX: 12-JUNE-2013  READ
MIN: 13-JUNE-2013, MAX: 18-JUNE-2013
MIN: 19-JUNE-2013, MAX: 24-JUNE-2013
MIN: 25-JUNE-2013, MAX: 30-JUNE-2013
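The pruning in this example can be reproduced with a small sketch. It models a zone map as a list of (MIN, MAX) pairs, one per block, using the date ranges from the slide; the function name is hypothetical and real zone maps hold more than this.

```python
from datetime import date


def june(day):
    """Shorthand for a date in June 2013, as used on the slide."""
    return date(2013, 6, day)


# One (MIN, MAX) pair per block, copied from the slide.
UNSORTED = [
    (june(1), june(20)),
    (june(8), june(30)),
    (june(12), june(20)),
    (june(2), june(25)),
    (june(6), june(12)),
]
SORTED_BY_DATE = [
    (june(1), june(6)),
    (june(7), june(12)),
    (june(13), june(18)),
    (june(19), june(24)),
    (june(25), june(30)),
]


def blocks_to_read(zone_map, target):
    """Return indexes of blocks whose MIN..MAX range could hold target."""
    return [
        index
        for index, (low, high) in enumerate(zone_map)
        if low <= target <= high
    ]


print(blocks_to_read(UNSORTED, june(9)))        # blocks scanned when unsorted
print(blocks_to_read(SORTED_BY_DATE, june(9)))  # blocks scanned when sorted
```

With the table sorted by date the ranges no longer overlap, so the predicate matches a single block instead of four.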
Single Column
• Table is sorted by 1 column
Date Region Country
2-JUN-2015 Oceania New Zealand
2-JUN-2015 Asia Singapore
2-JUN-2015 Africa Zaire
2-JUN-2015 Asia Hong Kong
3-JUN-2015 Europe Germany
3-JUN-2015 Asia Korea
[ SORTKEY ( date ) ]
Best for:
• Queries that use 1st column (i.e. date) as primary filter
• Can speed up joins and group bys
Compound
Date Region Country
2-JUN-2015 Africa Zaire
2-JUN-2015 Asia Korea
2-JUN-2015 Asia Singapore
2-JUN-2015 Europe Germany
3-JUN-2015 Asia Hong Kong
3-JUN-2015 Asia Korea
[ SORTKEY COMPOUND ( date, region, country) ]
Best for:
• Queries that use 1st column as primary filter, then other cols
• Can speed up joins and group bys
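A compound sort key orders rows by the first column, then breaks ties with the second, and so on. As a rough illustration (not Redshift's implementation), this is the same ordering Python gives when sorting on a tuple; the input order below is arbitrary and the variable names are hypothetical.

```python
from datetime import date

# The six rows from the slide, in arbitrary arrival order.
rows = [
    (date(2015, 6, 3), "Asia", "Korea"),
    (date(2015, 6, 2), "Europe", "Germany"),
    (date(2015, 6, 2), "Asia", "Singapore"),
    (date(2015, 6, 3), "Asia", "Hong Kong"),
    (date(2015, 6, 2), "Africa", "Zaire"),
    (date(2015, 6, 2), "Asia", "Korea"),
]

# SORTKEY COMPOUND (date, region, country): compare date first,
# then region, then country.
compound_sorted = sorted(rows, key=lambda row: (row[0], row[1], row[2]))

for row_date, region, country in compound_sorted:
    print(row_date.isoformat(), region, country)
```

Because date leads the key, a filter on date alone finds its rows in one contiguous run, while a filter on country alone does not.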
Interleaved
• Equal weight is given to each column.
Date Region Country
2-JUN-2015 Africa Zaire
3-JUN-2015 Asia Singapore
2-JUN-2015 Asia Korea
2-JUN-2015 Europe Germany
3-JUN-2015 Asia Hong Kong
2-JUN-2015 Asia Korea
[ SORTKEY INTERLEAVED ( date, region, country) ]
Best for:
• Queries that use different columns in filter
• Queries get faster the more columns used in the filter
Choosing a SORTKEY
COMPOUND
• Most common
• Well-defined filter criteria
• Time-series data
INTERLEAVED
• Edge cases
• Large tables (> 1 billion rows)
• No common filter criteria
• Non-time-series data
Sort key columns:
• Primarily a column used as a query predicate (date, identifier, …)
• Optionally choose a column frequently used for aggregates
• Optionally choose the same column as the distribution key for most efficient joins (merge join)
Compressing Data
• COPY automatically analyzes and compresses data when loading into empty tables
• ANALYZE COMPRESSION checks existing tables and proposes optimal compression algorithms for each column
• Changing column encoding requires a table rebuild
Compressing Data
If you have a regular ETL process and you use temp tables or staging tables, turn off automatic compression
• Use ANALYZE COMPRESSION to determine the right encodings
• Bake those encodings into your DDL
• Use CREATE TABLE … LIKE
Compressing Data
• From the zone maps we know:
• Which block(s) contain the range
• Which row offsets to scan
• Highly compressed sort keys:
• Many rows per block
• Large row offset
Skip compression on just the leading column of the compound sortkey
Best Practices: Ingestion
Amazon Redshift Loading Data Overview
[Diagram labels. Corporate data center: logs / files, source DBs. Transfer paths: VPN connection, AWS Direct Connect, S3 multipart upload, AWS Import/Export. AWS Cloud: Amazon S3, Amazon DynamoDB, Amazon Elastic MapReduce, Amazon RDS, Amazon Glacier, EC2 or on-prem (using SSH), Amazon Redshift. Axis label: data volume]
Parallelism is a function of load files
Each slice’s query processors are able to load one file at a time
• Streaming Decompression
• Parse
• Distribute
• Write
A single input file means only one slice is ingesting data
With 16 slices, only 6.25% of the cluster (1 of 16 slices) is active
[Diagram: a 16-slice cluster with a single slice loading]
Maximize Throughput with Multiple Files
Use at least as many input files as there are slices in the cluster
With 16 input files, all slices are working so you maximize throughput
COPY continues to scale linearly as you add additional nodes
[Diagram: a 16-slice cluster with every slice loading]
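The arithmetic behind these two slides is short enough to write down. The sketch assumes, as stated above, that each slice loads one file at a time; the function name is hypothetical.

```python
def active_slice_fraction(n_files, n_slices):
    """Fraction of slices kept busy when each slice loads one file at a time."""
    return min(n_files, n_slices) / n_slices


# A single file on a 16-slice cluster keeps 1 of 16 slices busy.
print(f"{active_slice_fraction(1, 16):.2%}")

# Sixteen files keep every slice busy.
print(f"{active_slice_fraction(16, 16):.2%}")
```

Splitting the input into a multiple of the slice count keeps every slice busy for the whole load rather than only the first round.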
New feature: ALTER TABLE APPEND
ELT workloads typically “massage” or aggregate data in a staging table and then append to production table
ALTER TABLE APPEND moves data from staging to production table by manipulating metadata
Much faster than INSERT INTO as data is not duplicated
Best Practices: Performance Tuning
Optimizing a database for querying
• Periodically check your table status
• Vacuum and Analyze regularly
• SVV_TABLE_INFO
• Missing statistics
• Table skew
• Uncompressed Columns
• Unsorted Data
• Check your cluster status
• WLM queuing
• Commit queuing
• Database Locks
Missing Statistics
• Amazon Redshift’s query optimizer relies on up-to-date statistics
• Statistics are only necessary for data which you are accessing
• Updated stats are important on:
• SORTKEY
• DISTKEY
• Columns in query predicates
Table Skew
• Unbalanced workload
• Query completes only as fast as the slowest slice completes
• Can cause skew in flight:
• Temp data fills a single node, resulting in query failure
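The cost of skew can be put in numbers with a back-of-the-envelope model. Everything here is an assumption for illustration: the per-slice scan rate is invented, the row counts are made up, and the function names are hypothetical.

```python
def query_seconds(rows_per_slice, rows_per_second=1_000_000):
    """The query finishes when the most heavily loaded slice finishes."""
    return max(rows_per_slice) / rows_per_second


def skew_ratio(rows_per_slice):
    """Largest slice relative to the average slice (1.0 means balanced)."""
    average = sum(rows_per_slice) / len(rows_per_slice)
    return max(rows_per_slice) / average


balanced = [2_000_000, 2_000_000, 2_000_000, 2_000_000]
skewed = [5_000_000, 3_000_000, 0, 0]  # same 8M rows, two idle slices

print(query_seconds(balanced), skew_ratio(balanced))
print(query_seconds(skewed), skew_ratio(skewed))
```

Both tables hold the same eight million rows, but the skewed layout takes as long as its five-million-row slice.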
Table Maintenance and Status
Unsorted Table
• Sortkey is just a guide, but data needs to actually be sorted
• VACUUM or DEEP COPY to sort
• Scans against unsorted tables continue to benefit from zone maps:
• Load sequential blocks
WLM Queue
Identify short/long-running queries and prioritize them
Define multiple queues to route queries appropriately.
Default concurrency of 5
Leverage wlm_apex_hourly to tune WLM based on peak concurrency requirements
Cluster Status: Commits and WLM
Commit Queue
How long is your commit queue?
• Identify needless transactions
• Group dependent statements within a single transaction
• Offload operational workloads
• STL_COMMIT_STATS
Cluster Status: Database Locks
• Database Locks
• Read locks, Write locks, Exclusive locks
• Reads block exclusive
• Writes block writes and exclusive
• Exclusives block everything
• Ungranted locks block subsequent lock requests
• Exposed through SVV_TRANSACTIONS
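The blocking rules listed above can be written as a small lookup table. This is a sketch of the three rules exactly as the slide states them, with hypothetical names; it does not model the queueing effect of ungranted locks or the lock mode names reported by SVV_TRANSACTIONS.

```python
# For each held lock type, the requested lock types it blocks (per the slide).
BLOCKS = {
    "read": {"exclusive"},
    "write": {"write", "exclusive"},
    "exclusive": {"read", "write", "exclusive"},
}


def is_blocked(held, requested):
    """True if a lock of type `held` blocks a request of type `requested`."""
    return requested in BLOCKS[held]


for held in ("read", "write", "exclusive"):
    for requested in ("read", "write", "exclusive"):
        verdict = "blocks" if is_blocked(held, requested) else "allows"
        print(f"{held:9} {verdict} {requested}")
```

Printing the full matrix makes it easy to see that only an exclusive lock stops readers.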
Migration Considerations
Typical ETL/ELT on legacy data warehouse
• One file per table, maybe a few if too big
• Many updates (“massage” the data)
• Every job clears the data, then loads
• Count on primary key to block double loads
• High concurrency of load jobs
• Small table(s) to control the job stream
Two questions to ask
Why do you do what you do?
• Many times, users don’t know
What is the customer need?
• Many times, needs do not match current practice
• You might benefit from adding other AWS services
On Amazon Redshift
Updates are delete + insert of the row
• Deletes just mark rows for deletion
Blocks are immutable
• Minimum space used is one block per column, per slice
Commits are expensive
• 4 GB write on 8XL per node
• Mirrors WHOLE dictionary
• Cluster-wide serialized
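The "one block per column, per slice" rule has a practical consequence for small tables, which a short calculation makes concrete. The sketch assumes Redshift's 1 MB block size (background knowledge, not stated in the slides), ignores any system-managed columns, and uses a hypothetical function name and cluster size.

```python
BLOCK_SIZE_MB = 1  # assumed Redshift block size; not stated in the slides


def minimum_table_size_mb(n_columns, n_slices, block_size_mb=BLOCK_SIZE_MB):
    """Lower bound on footprint: one block per column on every slice."""
    return n_columns * n_slices * block_size_mb


# Hypothetical example: a 10-column table on a cluster with 32 slices.
print(minimum_table_size_mb(n_columns=10, n_slices=32), "MB")
```

Under these assumptions even a table with a handful of rows per slice occupies hundreds of megabytes, which is one reason many small staging tables carry over poorly from a legacy warehouse.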
On Amazon Redshift
• Not all aggregations created equal
• Pre-aggregation can help
• Order on group by matters
• Concurrency should be low for better throughput
• Caching layer for dashboards is recommended
• WLM parcels RAM to queries. Use multiple queues for better control.
Workload Management (WLM)
Concurrency and memory can now be changed dynamically
You can have distinct values for load time and query time
Use wlm_apex_hourly.sql to monitor “queue pressure”
New Feature – WLM Queue Hopping
Query throughput vs. Concurrency
• Query throughput (QPM or QPH) is more representative of end user experience than concurrency
• Several improvements over the last 6 months
• Commit improvements
• Dynamic resource management
• Query throughput doubled over the last 6 months
Resources
https://github.com/awslabs/amazon-redshift-utils
https://github.com/awslabs/amazon-redshift-monitoring
https://github.com/awslabs/amazon-redshift-udfs
https://s3.amazonaws.com/chriz-webinar/webinar.zip
Admin scripts
Collection of utilities for running diagnostics on your cluster
Admin views
Collection of utilities for managing your cluster, generating schema DDL, etc.
ColumnEncodingUtility
Gives you the ability to apply optimal column encoding to an established schema with data already loaded
Q&A
If you want to learn more, register for our upcoming DevDay Austin:
Monday, October 24, 2016
JW Marriott Austin
https://aws.amazon.com/events/devday-austin
Free, one-day developer event featuring tracks, labs, and workshops around Serverless, Containers, IoT, and Mobile
Appendix: Performance optimization examples
Use SORTKEYs to effectively prune blocks
Don’t compress initial SORTKEY column
Use compression encoding to reduce I/O
Choose a DISTKEY which avoids data skew
Ingest: Disable predictable compression analysis
Ingest: Load multiple files to match cluster slices
VACUUM to physically remove deleted rows
VACUUM to keep your tables sorted
Gather statistics to assist the query planner

More Related Content

What's hot

Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Photon Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedPhoton Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedDatabricks
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
Azure data factory
Azure data factoryAzure data factory
Azure data factoryBizTalk360
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data FlowMark Kromer
 
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch ServiceAmazon Web Services Japan
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in SparkDatabricks
 
大量のデータ処理や分析に使えるOSS Apache Sparkのご紹介(Open Source Conference 2020 Online/Kyoto ...
大量のデータ処理や分析に使えるOSS Apache Sparkのご紹介(Open Source Conference 2020 Online/Kyoto ...大量のデータ処理や分析に使えるOSS Apache Sparkのご紹介(Open Source Conference 2020 Online/Kyoto ...
大量のデータ処理や分析に使えるOSS Apache Sparkのご紹介(Open Source Conference 2020 Online/Kyoto ...NTT DATA Technology & Innovation
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentialsqureshihamid
 
Splunk Search Optimization
Splunk Search OptimizationSplunk Search Optimization
Splunk Search OptimizationSplunk
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsdatamantra
 
EC2のストレージどう使う? -Instance Storageを理解して高速IOを上手に活用!-
EC2のストレージどう使う? -Instance Storageを理解して高速IOを上手に活用!-EC2のストレージどう使う? -Instance Storageを理解して高速IOを上手に活用!-
EC2のストレージどう使う? -Instance Storageを理解して高速IOを上手に活用!-Yuta Imai
 
Apache Flink and Apache Hudi.pdf
Apache Flink and Apache Hudi.pdfApache Flink and Apache Hudi.pdf
Apache Flink and Apache Hudi.pdfdogma28
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
iceberg introduction.pptx
iceberg introduction.pptxiceberg introduction.pptx
iceberg introduction.pptxDori Waldman
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 

What's hot (20)

Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Photon Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedPhoton Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think Vectorized
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
PostgreSQL and RAM usage
PostgreSQL and RAM usagePostgreSQL and RAM usage
PostgreSQL and RAM usage
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data Flow
 
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 
大量のデータ処理や分析に使えるOSS Apache Sparkのご紹介(Open Source Conference 2020 Online/Kyoto ...
大量のデータ処理や分析に使えるOSS Apache Sparkのご紹介(Open Source Conference 2020 Online/Kyoto ...大量のデータ処理や分析に使えるOSS Apache Sparkのご紹介(Open Source Conference 2020 Online/Kyoto ...
大量のデータ処理や分析に使えるOSS Apache Sparkのご紹介(Open Source Conference 2020 Online/Kyoto ...
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentials
 
Splunk Search Optimization
Splunk Search OptimizationSplunk Search Optimization
Splunk Search Optimization
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
EC2のストレージどう使う? -Instance Storageを理解して高速IOを上手に活用!-
EC2のストレージどう使う? -Instance Storageを理解して高速IOを上手に活用!-EC2のストレージどう使う? -Instance Storageを理解して高速IOを上手に活用!-
EC2のストレージどう使う? -Instance Storageを理解して高速IOを上手に活用!-
 
Apache Flink and Apache Hudi.pdf
Apache Flink and Apache Hudi.pdfApache Flink and Apache Hudi.pdf
Apache Flink and Apache Hudi.pdf
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
iceberg introduction.pptx
iceberg introduction.pptxiceberg introduction.pptx
iceberg introduction.pptx
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 

Viewers also liked

Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...Amazon Web Services
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAmazon Web Services
 
RedShift-Performance turning in few clicks
RedShift-Performance turning in few clicksRedShift-Performance turning in few clicks
RedShift-Performance turning in few clicksSadagopan Iyengar
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL ServicesAmazon Web Services
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호Amazon Web Services Korea
 
Redshift performance tuning
Redshift performance tuningRedshift performance tuning
Redshift performance tuningCarlos del Cacho
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722Amazon Web Services
 
Security Innovations in the Cloud
Security Innovations in the CloudSecurity Innovations in the Cloud
Security Innovations in the CloudAmazon Web Services
 
Deep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECSDeep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECSAmazon Web Services
 
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the CloudAWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the CloudAmazon Web Services
 
Getting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheGetting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheAmazon Web Services
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...Amazon Web Services
 
Rackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSRackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSAmazon Web Services
 
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAmazon Web Services
 
DevOps at Amazon: A Look at Our Tools and Processes
 DevOps at Amazon: A Look at Our Tools and Processes DevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and ProcessesAmazon Web Services
 
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at ScaleAWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at ScaleAmazon Web Services
 
AWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing ZoneAWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing ZoneAmazon Web Services
 
Fast Data at Scale with Amazon ElastiCache for Redis
Fast Data at Scale with Amazon ElastiCache for RedisFast Data at Scale with Amazon ElastiCache for Redis
Fast Data at Scale with Amazon ElastiCache for RedisAmazon Web Services
 

Viewers also liked (20)

Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
 
RedShift-Performance turning in few clicks
RedShift-Performance turning in few clicksRedShift-Performance turning in few clicks
RedShift-Performance turning in few clicks
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
 
Redshift performance tuning
Redshift performance tuningRedshift performance tuning
Redshift performance tuning
 
Benchmark slideshow
Benchmark slideshowBenchmark slideshow
Benchmark slideshow
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722
 
Security Innovations in the Cloud
Security Innovations in the CloudSecurity Innovations in the Cloud
Security Innovations in the Cloud
 
Getting Started on AWS
Getting Started on AWS Getting Started on AWS
Getting Started on AWS
 
Deep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECSDeep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECS
 
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the CloudAWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
 
Getting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheGetting started with Amazon ElastiCache
Getting started with Amazon ElastiCache
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
 
Rackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSRackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWS
 
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
 
DevOps at Amazon: A Look at Our Tools and Processes
 DevOps at Amazon: A Look at Our Tools and Processes DevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and Processes
 
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at ScaleAWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
 
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pavan Pothukuchi, Principal Product Manager, AWS. September 20, 2016. Deep Dive: Amazon Redshift for Big Data Analytics
  • 2. Agenda • Service Overview • Best Practices • Schema / Table Design • Data Ingestion • Database Tuning • Migration • Examples
  • 4. Amazon Redshift • Relational data warehouse • Massively parallel; petabyte scale • Fully managed • HDD and SSD platforms • $1,000/TB/year; starts at $0.25/hour • A lot faster, a lot simpler, a lot cheaper
  • 6. Amazon Redshift system architecture. Leader node: • SQL endpoint • Stores metadata • Coordinates query execution. Compute nodes: • Local, columnar storage • Execute queries in parallel • Load, backup, restore via Amazon S3; load from Amazon DynamoDB, Amazon EMR, or SSH. Two hardware platforms: • Optimized for data processing • DS2: HDD; scale from 2TB to 2PB • DC1: SSD; scale from 160GB to 326TB. [Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion; Backup; Restore]
  • 7. A deeper look at compute node architecture. Each node contains multiple slices: • DS2 – 2 slices on XL, 16 on 8XL • DC1 – 2 slices on L, 32 on 8XL. A slice can be thought of as a “virtual compute node”: • Unit of data partitioning • Parallel query processing. Facts about slices: • Each compute node has either 2, 16, or 32 slices • Table rows are distributed to slices • A slice processes only its own data
  • 8. Amazon Redshift dramatically reduces I/O (Data compression; Zone maps). Sample table (ID, Age, State, Amount): (123, 20, CA, 500); (345, 25, WA, 250); (678, 40, FL, 125); (957, 37, WA, 375). • Calculating SUM(Amount) with row storage: – Need to read everything – Unnecessary I/O
  • 9. Amazon Redshift dramatically reduces I/O (Data compression; Zone maps). Sample table (ID, Age, State, Amount): (123, 20, CA, 500); (345, 25, WA, 250); (678, 40, FL, 125); (957, 37, WA, 375). • Calculating SUM(Amount) with column storage: – Only scan the necessary blocks
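The row-versus-column comparison on slides 8 and 9 can be made concrete with a toy model. The sketch below is only an illustration in plain Python, not how Redshift stores data: it counts individual values read rather than disk blocks, and the function names are invented for this example.

```python
# Toy model of the four-row sample table from slides 8 and 9.
COLUMNS = ("id", "age", "state", "amount")
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]


def sum_amount_row_store(rows):
    """Row layout: reaching Amount means reading every field of every row."""
    amount_index = COLUMNS.index("amount")
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched
        total += row[amount_index]
    return total, values_read


def sum_amount_column_store(rows):
    """Column layout: each column is stored separately, so only Amount is read."""
    columns = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}
    amount = columns["amount"]
    return sum(amount), len(amount)


row_total, row_reads = sum_amount_row_store(ROWS)
col_total, col_reads = sum_amount_column_store(ROWS)
print("row store:   ", row_total, "from", row_reads, "values read")
print("column store:", col_total, "from", col_reads, "values read")
```

Both layouts return the same sum; the difference is how much data has to be touched to compute it.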
  • 10. Amazon Redshift dramatically reduces I/O (Column storage; Data compression; Zone maps). • Columnar compression – Effective due to like data – Reduces storage requirements – Reduces I/O
      analyze compression orders;
       Table  | Column | Encoding
      --------+--------+----------
       orders | id     | mostly32
       orders | age    | mostly32
       orders | state  | lzo
       orders | amount | mostly32
  • 11. Amazon Redshift dramatically reduces I/O (Column storage; Data compression; Zone maps). Zone maps: • In-memory block metadata • Contain per-block MIN and MAX values • Effectively prune blocks that don’t contain data for a given query • Minimize unnecessary I/O
  • 13. Data Distribution • Distribution style is a table property which dictates how that table’s data is distributed throughout the cluster: • KEY: Value is hashed, same value goes to same location (slice) • ALL: Full table data goes to first slice of every node • EVEN: Round robin • Goals: • Distribute data evenly for parallel processing • Minimize data movement during query processing. [Diagram: KEY, ALL, and EVEN layouts, each drawn on Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4)]
  • 14. DISTSTYLE EVEN (round robin). Source rows (ID, Gender, Name): (101, M, John Smith); (292, F, Jane Jones); (139, M, Peter Black); (446, M, Pat Partridge); (658, F, Sarah Cyan); (164, M, Brian Snail); (209, M, James White); (306, F, Lisa Green). Rows per slice after distribution: {(101, M, John Smith), (306, F, Lisa Green)}; {(292, F, Jane Jones), (209, M, James White)}; {(139, M, Peter Black), (164, M, Brian Snail)}; {(446, M, Pat Partridge), (658, F, Sarah Cyan)}
  • 15. DISTSTYLE KEY (hash function). Source rows (ID, Gender, Name): (101, M, John Smith); (292, F, Jane Jones); (139, M, Peter Black); (446, M, Pat Partridge); (658, F, Sarah Cyan); (164, M, Brian Snail); (209, M, James White); (306, F, Lisa Green). Rows per slice after hashing: {(101, M, John Smith), (306, F, Lisa Green)}; {(292, F, Jane Jones), (209, M, James White)}; {(139, M, Peter Black), (164, M, Brian Snail)}; {(446, M, Pat Partridge), (658, F, Sarah Cyan)}
  • 16. DISTSTYLE KEY (hash function). Source rows (ID, Gender, Name): (101, M, John Smith); (292, F, Jane Jones); (139, M, Peter Black); (446, M, Pat Partridge); (658, F, Sarah Cyan); (164, M, Brian Snail); (209, M, James White); (306, F, Lisa Green). Rows per slice after hashing: {(101, M, John Smith), (139, M, Peter Black), (446, M, Pat Partridge), (164, M, Brian Snail), (209, M, James White)}; {(292, F, Jane Jones), (658, F, Sarah Cyan), (306, F, Lisa Green)}
  • 17. DISTSTYLE ALL. Source rows (ID, Gender, Name): (101, M, John Smith); (292, F, Jane Jones); (139, M, Peter Black); (446, M, Pat Partridge); (658, F, Sarah Cyan); (164, M, Brian Snail); (209, M, James White); (306, F, Lisa Green). The slide shows four complete copies of the table: every copy holds all eight rows.
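The three distribution styles on slides 13 to 17 can be sketched as small placement functions. This is a simplified model under stated assumptions: a cluster of two nodes with two slices each, and `zlib.crc32` standing in for the hash, since the slides do not say which hash function Redshift uses. All function names are hypothetical.

```python
import zlib
from collections import defaultdict

# The eight sample rows from the slides: (ID, Gender, Name).
ROWS = [
    (101, "M", "John Smith"),
    (292, "F", "Jane Jones"),
    (139, "M", "Peter Black"),
    (446, "M", "Pat Partridge"),
    (658, "F", "Sarah Cyan"),
    (164, "M", "Brian Snail"),
    (209, "M", "James White"),
    (306, "F", "Lisa Green"),
]

NODES = 2             # assumed cluster shape, as drawn on slide 13
SLICES_PER_NODE = 2
N_SLICES = NODES * SLICES_PER_NODE


def distribute_even(rows):
    """EVEN: round robin, row i goes to slice i mod N_SLICES."""
    slices = defaultdict(list)
    for i, row in enumerate(rows):
        slices[i % N_SLICES].append(row)
    return slices


def distribute_key(rows, key_index):
    """KEY: hash the key value, so equal values always share a slice.

    crc32 is only a deterministic stand-in for the real hash function.
    """
    slices = defaultdict(list)
    for row in rows:
        digest = zlib.crc32(str(row[key_index]).encode("utf-8"))
        slices[digest % N_SLICES].append(row)
    return slices


def distribute_all(rows):
    """ALL: the full table goes to the first slice of every node."""
    slices = defaultdict(list)
    for node in range(NODES):
        slices[node * SLICES_PER_NODE].extend(rows)
    return slices


even = distribute_even(ROWS)
by_gender = distribute_key(ROWS, key_index=1)  # low-cardinality key, as on slide 16
everywhere = distribute_all(ROWS)

for name, layout in (("EVEN", even), ("KEY(gender)", by_gender), ("ALL", everywhere)):
    print(name, {s: len(layout.get(s, [])) for s in range(N_SLICES)})
```

Hashing on Gender can fill at most two slices because there are only two distinct values, which is the skew shown on slide 16; a high-cardinality key such as ID has a much better chance of spreading rows across all slices.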
  • 18. Join example with co-located rows. (1) CUSTOMERS (CUST_ID, GENDER, NAME): (101, M, John Smith); (306, F, James White). ORDERS (ORDER_ID, CUST_ID, Amount): (A1600, 101, 120); (B8765, 306, 340). RESULTS (CUST_ID, GENDER, Amount): (101, M, 120); (306, F, 340). (2) CUSTOMERS (CUST_ID, GENDER, NAME): (292, F, Jane Jones); (209, M, Lyall Green). ORDERS (ORDER_ID, CUST_ID, Amount): (C0967, 292, 750); (D8753, 209, 601). RESULTS (CUST_ID, GENDER, Amount): (292, F, 750); (209, M, 601)
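The join slide shows each group of CUSTOMERS rows sitting next to the ORDERS rows with the same CUST_ID, so each join result is produced without moving data. A minimal sketch of that idea follows, assuming both tables are distributed with KEY on CUST_ID across two slices; `zlib.crc32` and the helper names are illustrative assumptions, not Redshift internals.

```python
import zlib

N_SLICES = 2  # assumed, matching the two groups on the slide


def slice_for(cust_id):
    """Same key, same hash, same slice for both tables (crc32 is a stand-in)."""
    return zlib.crc32(str(cust_id).encode("utf-8")) % N_SLICES


customers = [(101, "M"), (306, "F"), (292, "F"), (209, "M")]  # (cust_id, gender)
orders = [  # (order_id, cust_id, amount)
    ("A1600", 101, 120),
    ("B8765", 306, 340),
    ("C0967", 292, 750),
    ("D8753", 209, 601),
]

# Place every row on the slice chosen by its CUST_ID.
customer_slices = [[] for _ in range(N_SLICES)]
order_slices = [[] for _ in range(N_SLICES)]
for customer in customers:
    customer_slices[slice_for(customer[0])].append(customer)
for order in orders:
    order_slices[slice_for(order[1])].append(order)


def local_join(slice_customers, slice_orders):
    """Join using only the rows that live on one slice."""
    gender_by_id = dict(slice_customers)
    return [
        (cust_id, gender_by_id[cust_id], amount)
        for _order_id, cust_id, amount in slice_orders
        if cust_id in gender_by_id
    ]


results = []
for s in range(N_SLICES):
    results.extend(local_join(customer_slices[s], order_slices[s]))

print(sorted(results))
```

Because matching rows always land on the same slice, the per-slice joins together produce the complete result; no order has to be shipped to another slice to find its customer.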
  • 20. Choosing a Distribution Style. KEY: • Large FACT tables • Large or rapidly changing tables used in joins • Localize columns used within aggregations. ALL: • Tables with slowly changing data • Reasonable size (i.e., a few million rows, not hundreds of millions) • No common distribution key for frequent joins • Typical use case: a joined dimension table without a common distribution key. EVEN: • Tables not frequently joined or aggregated • Large tables without acceptable candidate keys
  • 21. Data Sorting. Goals: • Physically order rows of table data based on certain column(s) • Optimize effectiveness of zone maps • Enable MERGE JOIN operations. Impact: • Enables range-restricted scans (rrscans) to prune blocks by leveraging zone maps • Overall reduction in block I/O. Achieved with the table property SORTKEY defined over one or more columns. Optimal SORTKEY is dependent on: • Query patterns • Data profile • Business requirements
  • 22. Zone Maps. SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013'. Unsorted table, per-block MIN/MAX: 01-JUNE-2013 to 20-JUNE-2013; 08-JUNE-2013 to 30-JUNE-2013; 12-JUNE-2013 to 20-JUNE-2013; 02-JUNE-2013 to 25-JUNE-2013; 06-JUNE-2013 to 12-JUNE-2013. Sorted by date, per-block MIN/MAX: 01-JUNE-2013 to 06-JUNE-2013; 07-JUNE-2013 to 12-JUNE-2013; 13-JUNE-2013 to 18-JUNE-2013; 19-JUNE-2013 to 24-JUNE-2013; 25-JUNE-2013 to 30-JUNE-2013. Blocks whose range contains 09-JUNE-2013 must be read: four of the five unsorted blocks, but only one of the five sorted blocks.
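The pruning on the zone-map slide can be reproduced with the exact MIN/MAX ranges it lists. The sketch below is a plain-Python illustration of the check, not Redshift code; `blocks_to_read` is a name invented for this example.

```python
from datetime import date


def june(day):
    """Helper for the June 2013 dates used on the slide."""
    return date(2013, 6, day)


# Per-block (MIN, MAX) of the date column, copied from the slide.
UNSORTED_ZONE_MAP = [
    (june(1), june(20)),
    (june(8), june(30)),
    (june(12), june(20)),
    (june(2), june(25)),
    (june(6), june(12)),
]
SORTED_ZONE_MAP = [
    (june(1), june(6)),
    (june(7), june(12)),
    (june(13), june(18)),
    (june(19), june(24)),
    (june(25), june(30)),
]


def blocks_to_read(zone_map, value):
    """Return indexes of blocks whose [MIN, MAX] range could contain value."""
    return [i for i, (low, high) in enumerate(zone_map) if low <= value <= high]


target = june(9)  # WHERE DATE = '09-JUNE-2013'
unsorted_reads = blocks_to_read(UNSORTED_ZONE_MAP, target)
sorted_reads = blocks_to_read(SORTED_ZONE_MAP, target)

print("unsorted table reads blocks:", unsorted_reads)
print("sorted table reads blocks:  ", sorted_reads)
```

In the unsorted layout the ranges overlap, so the zone map can rule out only one block; once the table is sorted by date the ranges are disjoint and a single block remains.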
  • 23. Single Column • Table is sorted by 1 column: [ SORTKEY ( date ) ]. Sample rows (Date, Region, Country): (2-JUN-2015, Oceania, New Zealand); (2-JUN-2015, Asia, Singapore); (2-JUN-2015, Africa, Zaire); (2-JUN-2015, Asia, Hong Kong); (3-JUN-2015, Europe, Germany); (3-JUN-2015, Asia, Korea). Best for: • Queries that use 1st column (i.e. date) as primary filter • Can speed up joins and group bys
  • 24. Compound: [ SORTKEY COMPOUND ( date, region, country) ]. Sample rows (Date, Region, Country): (2-JUN-2015, Africa, Zaire); (2-JUN-2015, Asia, Korea); (2-JUN-2015, Asia, Singapore); (2-JUN-2015, Europe, Germany); (3-JUN-2015, Asia, Hong Kong); (3-JUN-2015, Asia, Korea). Best for: • Queries that use 1st column as primary filter, then other cols • Can speed up joins and group bys
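Why a compound sort key favors filters on its leading column can be shown with a small block model. The sketch below is an assumption-laden toy: 16 synthetic rows, four rows per block, and zone maps computed per block in plain Python. None of the names or sizes come from the slides.

```python
from itertools import product

DATES = ["2015-06-01", "2015-06-02", "2015-06-03", "2015-06-04"]
REGIONS = ["Africa", "Asia", "Europe", "Oceania"]
BLOCK_SIZE = 4  # rows per block; chosen only to keep the example small

# Compound order (date, region): sort by date first, then region within a date.
rows = sorted(product(DATES, REGIONS))
blocks = [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]


def zone_map(blocks, column):
    """Per-block (MIN, MAX) for one column."""
    return [
        (min(row[column] for row in block), max(row[column] for row in block))
        for block in blocks
    ]


def blocks_scanned(blocks, column, value):
    """Count blocks whose zone-map range could contain value."""
    return sum(1 for low, high in zone_map(blocks, column) if low <= value <= high)


by_date = blocks_scanned(blocks, column=0, value="2015-06-02")
by_region = blocks_scanned(blocks, column=1, value="Asia")

print("filter on date (leading column):", by_date, "of", len(blocks), "blocks")
print("filter on region only:          ", by_region, "of", len(blocks), "blocks")
```

Each block ends up holding a single date but every region, so the date zone maps are tight while the region zone maps span the whole value range and prune nothing.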
  • 25. Interleaved
      • Equal weight is given to each column
      [ INTERLEAVED SORTKEY ( date, region, country ) ]
      Date        Region  Country
      2-JUN-2015  Africa  Zaire
      3-JUN-2015  Asia    Singapore
      2-JUN-2015  Asia    Korea
      2-JUN-2015  Europe  Germany
      3-JUN-2015  Asia    Hong Kong
      2-JUN-2015  Asia    Korea
      Best for:
      • Queries that use different columns in the filter
      • Queries get faster the more columns are used in the filter
  • 26. Choosing a SORTKEY
      COMPOUND
      • Most common
      • Well-defined filter criteria
      • Time-series data
      INTERLEAVED
      • Edge cases
      • Large tables (> 1 billion rows)
      • No common filter criteria
      • Non-time-series data
      Choosing the column(s)
      • Primarily a column used as a query predicate (date, identifier, …)
      • Optionally choose a column frequently used for aggregates
      • Optionally choose the same column as the distribution key for the most efficient joins (merge join)
  • 27. Compressing Data • COPY automatically analyzes and compresses data when loading into empty tables • ANALYZE COMPRESSION checks existing tables and proposes optimal compression algorithms for each column • Changing column encoding requires a table rebuild
  • 28. Compressing Data
      If you have a regular ETL process and you use temp tables or staging tables, turn off automatic compression (for example, COPY with COMPUPDATE OFF)
      • Use ANALYZE COMPRESSION to determine the right encodings
      • Bake those encodings into your DDL
      • Use CREATE TABLE … LIKE
  • 29. Compressing Data
      • From the zone maps we know:
          • Which block(s) contain the range
          • Which row offsets to scan
      • Highly compressed sort keys:
          • Many rows per block
          • Large row offset
      Skip compression on just the leading column of the compound sortkey
  • 31. Amazon Redshift Loading Data Overview
      [Diagram. Corporate data center: logs / files, source DBs, data volume. Transfer paths: VPN connection, AWS Direct Connect, S3 multipart upload, AWS Import/Export. AWS Cloud: Amazon S3, Amazon DynamoDB, Amazon Elastic MapReduce, Amazon RDS, Amazon Glacier, EC2 or on-prem (using SSH), Amazon Redshift.]
  • 32. Parallelism is a function of load files
      Each slice’s query processors are able to load one file at a time:
      • Streaming decompression
      • Parse
      • Distribute
      • Write
      A single input file means only one slice is ingesting data. Only partial cluster usage is realized: 6.25% of slices (1 of 16) are active.
      [Figure: numbered cluster slices]
  • 33. Maximize Throughput with Multiple Files
      • Use at least as many input files as there are slices in the cluster
      • With 16 input files, all slices are working, so you maximize throughput
      • COPY continues to scale linearly as you add nodes
      [Figure: numbered cluster slices]
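The arithmetic behind the 6.25% figure can be written down as a short Python sketch. It assumes the simplified model stated on the slides (each slice loads one file at a time) plus one extra assumption that is not from the deck: files are of equal size, so a load proceeds in whole "rounds". The function names are hypothetical.

```python
def active_slice_fraction(num_files, num_slices):
    """Fraction of slices kept busy when each slice loads one file at a time."""
    if num_files < 0 or num_slices <= 0:
        raise ValueError("num_files must be >= 0 and num_slices must be > 0")
    return min(num_files, num_slices) / num_slices


def load_rounds(num_files, num_slices):
    """Sequential rounds needed if every file takes equally long (assumption)."""
    return -(-num_files // num_slices)  # ceiling division


print(f"{active_slice_fraction(1, 16):.2%}")   # 6.25%
print(f"{active_slice_fraction(16, 16):.2%}")  # 100.00%
print(load_rounds(17, 16))                     # 2
```

Under this model, 17 files on 16 slices need a second round in which only one slice works, which suggests why a file count that is a multiple of the slice count keeps the cluster evenly busy.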
  • 34. New feature: ALTER TABLE APPEND
      • ELT workloads typically “massage” or aggregate data in a staging table and then append to the production table
      • ALTER TABLE APPEND moves data from the staging table to the production table by manipulating metadata
      • Much faster than INSERT INTO because data is not duplicated
  • 36. Optimizing a database for querying
      • Periodically check your table status
          • Vacuum and Analyze regularly
          • SVV_TABLE_INFO
              • Missing statistics
              • Table skew
              • Uncompressed columns
              • Unsorted data
      • Check your cluster status
          • WLM queuing
          • Commit queuing
          • Database locks
  • 37. Missing Statistics • Amazon Redshift’s query optimizer relies on up-to-date statistics • Statistics are only necessary for data which you are accessing • Updated stats important on: • SORTKEY • DISTKEY • Columns in query predicates
  • 38. Table Maintenance and Status
      Table Skew
      • Unbalanced workload
      • Query completes only as fast as the slowest slice completes
      • Can cause skew in flight:
          • Temp data fills a single node, resulting in query failure
      Unsorted Table
      • Sortkey is just a guide, but data needs to actually be sorted
      • VACUUM or DEEP COPY to sort
      • Scans against unsorted tables continue to benefit from zone maps:
          • Load sequential blocks
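To make table skew concrete, here is a small Python sketch that distributes rows to slices by hashing a distribution key and reports how unbalanced the result is. It is an illustration only: `zlib.crc32` stands in for Redshift's internal hash (an assumption), and `distribute_by_key` and `skew_ratio` are hypothetical helper names.

```python
import zlib


def distribute_by_key(keys, num_slices):
    """Count rows per slice when each row goes to hash(key) % num_slices."""
    rows_per_slice = [0] * num_slices
    for key in keys:
        # crc32 is a stand-in hash, not the function Redshift uses.
        rows_per_slice[zlib.crc32(str(key).encode()) % num_slices] += 1
    return rows_per_slice


def skew_ratio(rows_per_slice):
    """Largest slice divided by the mean slice; 1.0 means perfectly balanced."""
    mean = sum(rows_per_slice) / len(rows_per_slice)
    return max(rows_per_slice) / mean if mean else 1.0


# High-cardinality key: 10,000 distinct customer ids.
balanced = distribute_by_key(range(10_000), num_slices=4)
# Low-cardinality key: 90% of rows share one value.
skewed = distribute_by_key(["CA"] * 9_000 + ["WA"] * 1_000, num_slices=4)

print(balanced, round(skew_ratio(balanced), 2))
print(skewed, round(skew_ratio(skewed), 2))
```

Because a query finishes only when its slowest slice finishes, the slice holding the 9,000 "CA" rows sets the pace for the whole scan while the other slices sit mostly idle.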
  • 39. Cluster Status: Commits and WLM
      WLM Queue
      • Identify short/long-running queries and prioritize them
      • Define multiple queues to route queries appropriately
      • Default concurrency of 5
      • Leverage wlm_apex_hourly to tune WLM based on peak concurrency requirements
      Commit Queue
      • How long is your commit queue?
      • Identify needless transactions
      • Group dependent statements within a single transaction
      • Offload operational workloads
      • STL_COMMIT_STATS
  • 40. Cluster Status: Database Locks • Database Locks • Read locks, Write locks, Exclusive locks • Reads block exclusive • Writes block writes and exclusive • Exclusives block everything • Ungranted locks block subsequent lock requests • Exposed through SVV_TRANSACTIONS
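The blocking rules on the Database Locks slide can be encoded as a small compatibility table. The Python sketch below is a toy model, not Redshift's lock manager. It reads "ungranted locks block subsequent lock requests" as: a new request waits if it conflicts with a held lock or with an earlier request that is still waiting. That reading, and the names `BLOCKS`, `conflicts`, and `process_requests`, are assumptions.

```python
# For each existing lock type, the request types it blocks (from the slide).
BLOCKS = {
    "read": {"exclusive"},
    "write": {"write", "exclusive"},
    "exclusive": {"read", "write", "exclusive"},
}


def conflicts(existing, requested):
    """True if a lock of type `existing` blocks a request of type `requested`."""
    return requested in BLOCKS[existing]


def process_requests(held, requests):
    """Process requests in arrival order; return (granted, waiting) lists."""
    granted = list(held)
    waiting = []
    for req in requests:
        # Assumption: earlier ungranted requests also block conflicting newcomers.
        if any(conflicts(lock, req) for lock in granted + waiting):
            waiting.append(req)
        else:
            granted.append(req)
    return granted, waiting


granted, waiting = process_requests(held=["read"], requests=["exclusive", "read"])
print(granted)  # ['read']
print(waiting)  # ['exclusive', 'read']
```

In the example, the second read would be compatible with the read lock already held, but it queues behind the waiting exclusive request. This is the pattern to look for in SVV_TRANSACTIONS when readers appear stuck.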
  • 42. Typical ETL/ELT on legacy data warehouse • One file per table, maybe a few if too big • Many updates (“massage” the data) • Every job clears the data, then loads • Count on primary key to block double loads • High concurrency of load jobs • Small table(s) to control the job stream
  • 43. Two questions to ask
      Why do you do what you do?
      • Many times, users don’t know
      What is the customer need?
      • Many times, needs do not match current practice
      • You might benefit from adding other AWS services
  • 44. On Amazon Redshift
      Updates are delete + insert of the row
      • Deletes just mark rows for deletion
      Blocks are immutable
      • Minimum space used is one block per column, per slice
      Commits are expensive
      • 4 GB write on 8XL per node
      • Mirrors WHOLE dictionary
      • Cluster-wide serialized
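The "one block per column, per slice" rule implies a floor on table size, which a short Python sketch can estimate. The sketch assumes a 1 MB block size (a figure not stated in this section), assumes the table has at least one row on every slice, and ignores any hidden system columns. `min_table_size_mb` is a hypothetical helper.

```python
BLOCK_SIZE_MB = 1  # assumption: 1 MB blocks; not stated in this section


def min_table_size_mb(num_columns, num_slices, block_size_mb=BLOCK_SIZE_MB):
    """Lower bound on table size: one block per column on every populated slice."""
    if num_columns <= 0 or num_slices <= 0:
        raise ValueError("num_columns and num_slices must be positive")
    return num_columns * num_slices * block_size_mb


# A 10-column table on a two-node 8XL cluster with 32 slices per node.
print(min_table_size_mb(num_columns=10, num_slices=64))  # 640
```

Under these assumptions even a very small 10-column table occupies hundreds of megabytes on a 64-slice cluster, which is one reason a job stream built on many tiny control tables translates poorly.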
  • 45. On Amazon Redshift
      • Not all aggregations are created equal
      • Pre-aggregation can help
      • Column order in GROUP BY matters
      • Concurrency should be low for better throughput
      • A caching layer for dashboards is recommended
      • WLM parcels RAM to queries; use multiple queues for better control
  • 46. Workload Management (WLM) Concurrency and memory can now be changed dynamically You can have distinct values for load time and query time Use wlm_apex_hourly.sql to monitor “queue pressure”
  • 47. New Feature – WLM Queue Hopping
  • 48. Query throughput vs. Concurrency • Query throughput (QPM or QPH) is more representative of end user experience than concurrency • Several improvements over the last 6 months • Commit improvements • Dynamic resource management • Query throughput doubled over the last 6 months
  • 49. Resources
      • https://github.com/awslabs/amazon-redshift-utils
      • https://github.com/awslabs/amazon-redshift-monitoring
      • https://github.com/awslabs/amazon-redshift-udfs
      • https://s3.amazonaws.com/chriz-webinar/webinar.zip
      Admin scripts: collection of utilities for running diagnostics on your cluster
      Admin views: collection of utilities for managing your cluster, generating schema DDL, etc.
      ColumnEncodingUtility: gives you the ability to apply optimal column encoding to an established schema with data already loaded
  • 50. Q&A
      If you want to learn more, register for our upcoming DevDay Austin:
      • Monday, October 24, 2016, JW Marriott Austin
      • https://aws.amazon.com/events/devday-austin
      • Free, one-day developer event featuring tracks, labs, and workshops around Serverless, Containers, IoT, and Mobile
  • 52. Use SORTKEYs to effectively prune blocks
  • 53. Use SORTKEYs to effectively prune blocks
  • 54. Use SORTKEYs to effectively prune blocks
  • 55. Don’t compress initial SORTKEY column
  • 56. Use compression encoding to reduce I/O
  • 57. Choose a DISTKEY which avoids data skew
  • 58. Ingest: Disable predictable compression analysis
  • 59. Ingest: Load multiple files to match cluster slices
  • 60. VACUUM to physically remove deleted rows
  • 61. VACUUM to keep your tables sorted
  • 62. Gather statistics to assist the query planner