Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that costs less than $1,000 per terabyte per year, less than a tenth the price of most traditional data warehousing solutions. In this session, you get an overview of Amazon Redshift, including how it uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more. Finally, we announce new features that we've been working on over the past few months.
3. Amazon Redshift dramatically reduces I/O
 ID  | Age | State | Amount
-----+-----+-------+--------
 123 | 20  | CA    | 500
 345 | 25  | WA    | 250
 678 | 40  | FL    | 125
 957 | 37  | WA    | 375
• Data compression
• Zone maps
• Direct-attached storage
• With row storage you do unnecessary I/O
• To get total amount, you have to read everything
4. Amazon Redshift dramatically reduces I/O
 ID  | Age | State | Amount
-----+-----+-------+--------
 123 | 20  | CA    | 500
 345 | 25  | WA    | 250
 678 | 40  | FL    | 125
 957 | 37  | WA    | 375
• Data compression
• Zone maps
• Direct-attached storage
• With column storage, you only read the data you need
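The contrast between slides 3 and 4 can be sketched in a few lines of Python. This is a toy model, not Redshift's storage engine: it counts how many field values must be read to total the Amount column of the four-row example table, first with rows stored together and then with each column stored separately. The function and variable names are illustrative assumptions.

```python
# Toy model of the four-row example table (ID, Age, State, Amount).
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]
COLUMNS = ("id", "age", "state", "amount")


def sum_amount_row_store(rows):
    """Row layout: every field of every row is read to reach Amount."""
    amount_index = COLUMNS.index("amount")
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row is read
        total += row[amount_index]
    return total, values_read


def sum_amount_column_store(rows):
    """Column layout: only the Amount column is read."""
    # Pivot the rows into one list per column (done once, at load time).
    columns = {name: [r[i] for r in rows] for i, name in enumerate(COLUMNS)}
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(rows)
print("row store:   ", row_total, "from", row_reads, "values read")
print("column store:", col_total, "from", col_reads, "values read")
```

Both layouts return the same total, but the column layout touches one quarter of the values here, and the gap widens as a table gains columns that the query does not reference.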
5. Amazon Redshift dramatically reduces I/O
• Data compression
  – COPY compresses automatically on load
  – You can analyze and override
  – More performance, less cost
• Zone maps
• Direct-attached storage

analyze compression listing;

 Table   | Column         | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
Slides not intended for redistribution.
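To give a feel for why an encoding such as delta can shrink a column, here is a minimal Python sketch of the general idea behind delta encoding: store the first value, then only the difference between consecutive values. It is a simplified illustration, not Redshift's on-disk format, and the sample listid values are hypothetical.

```python
def delta_encode(values):
    """Keep the first value, then the difference between each consecutive pair."""
    if not values:
        return []
    encoded = [values[0]]
    for prev, cur in zip(values, values[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values by accumulating the differences."""
    values = []
    running = 0
    for i, delta in enumerate(encoded):
        running = delta if i == 0 else running + delta
        values.append(running)
    return values


# Hypothetical, nearly sequential listid values.
list_ids = [1000, 1001, 1002, 1005, 1006]
encoded = delta_encode(list_ids)
print(encoded)
print(delta_decode(encoded) == list_ids)
```

When neighbouring values are close together, the differences are small numbers that fit in fewer bytes than the original values, which is where the space saving comes from.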
6. Amazon Redshift dramatically reduces I/O
• Data compression
• Zone maps
  – Track the minimum and maximum value for each block
  – Skip over blocks that don’t contain relevant data
• Direct-attached storage

 Min | Max | Block values
-----+-----+--------------------------------------
 10  | 324 | 10, 13, 14, 26, …, 100, 245, 324
 375 | 623 | 375, 393, 417, …, 512, 549, 623
 637 | 959 | 637, 712, 809, …, 834, 921, 959
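The block-skipping idea on this slide can be sketched as follows. The sketch uses only the values visible on the slide (the elided values behind each "…" are left out), and the dictionary layout and the `scan_range` helper are illustrative assumptions rather than Redshift internals.

```python
# Each block records the minimum and maximum of the values it holds.
blocks = [
    {"min": 10, "max": 324, "values": [10, 13, 14, 26, 100, 245, 324]},
    {"min": 375, "max": 623, "values": [375, 393, 417, 512, 549, 623]},
    {"min": 637, "max": 959, "values": [637, 712, 809, 834, 921, 959]},
]


def scan_range(blocks, low, high):
    """Return values in [low, high], reading only blocks whose range overlaps."""
    hits = []
    blocks_read = 0
    for block in blocks:
        if block["max"] < low or block["min"] > high:
            continue  # the min/max pair rules this block out, so skip it
        blocks_read += 1
        hits.extend(v for v in block["values"] if low <= v <= high)
    return hits, blocks_read


hits, blocks_read = scan_range(blocks, 400, 600)
print(hits, "found after reading", blocks_read, "of", len(blocks), "blocks")
```

A query for values between 400 and 600 reads only the middle block, because the other two are excluded by their minimum and maximum alone.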
7. Amazon Redshift dramatically reduces I/O
• Data compression
• Zone maps
• Direct-attached storage
  – > 2 GB/s scan rate
  – Optimized for data processing
  – High disk density
[Figure labels: DW.HS1.XL, DW.HS1.8XL]
8. Amazon Redshift architecture
• Leader Node
  – SQL endpoint
  – Stores metadata
  – Coordinates query execution
• Compute Nodes
  – Local, columnar storage
  – Execute queries in parallel
  – Load, backup, restore via Amazon S3
  – Parallel load from Amazon DynamoDB
• Single node version available
[Diagram labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]
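The division of labour on this slide, a leader that coordinates while compute nodes execute in parallel, resembles a scatter-gather pattern. The sketch below illustrates that pattern only: threads stand in for compute nodes, the partitions are hypothetical, and none of the names correspond to a Redshift API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical slices of an "amount" column, one per compute node.
node_partitions = [
    [500, 250],
    [125],
    [375],
]


def compute_node_partial_sum(partition):
    """Work done locally on one compute node."""
    return sum(partition)


def leader_total(partitions):
    """Leader: fan the work out, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # map() returns results in the same order as the input partitions.
        partials = list(pool.map(compute_node_partial_sum, partitions))
    return sum(partials), partials


total, partials = leader_total(node_partitions)
print("partial sums:", partials, "total:", total)
```

Each node returns a single number, so only a small amount of data travels back to the coordinator regardless of how many rows each node scanned.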
10. Amazon Redshift parallelizes and distributes everything
• Load
  – Load in parallel from Amazon S3 or Amazon DynamoDB
  – Data automatically distributed and sorted according to DDL
  – Scales linearly with number of nodes
• Backup/Restore
• Resize
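One way to picture "distributed and sorted according to DDL" is a table definition that names one column for distribution and one for sorting. The sketch below assumes exactly that; it uses a modulo on an integer key as a stand-in for a real hash function, and the `distribute_and_sort` helper and its parameters are hypothetical.

```python
def distribute_and_sort(rows, dist_key, sort_key, num_nodes):
    """Assign each row to a node by its distribution key, then sort per node."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        # Stand-in for a hash function: modulo on an integer key.
        nodes[row[dist_key] % num_nodes].append(row)
    return [sorted(part, key=lambda r: r[sort_key]) for part in nodes]


rows = [
    {"id": 123, "amount": 500},
    {"id": 345, "amount": 250},
    {"id": 678, "amount": 125},
    {"id": 957, "amount": 375},
]

layout = distribute_and_sort(rows, dist_key="id", sort_key="amount", num_nodes=2)
for node_number, part in enumerate(layout):
    print("node", node_number, [(r["id"], r["amount"]) for r in part])
```

Because every row's placement depends only on its own key, each node can load and sort its share independently of the others.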
11. Amazon Redshift parallelizes and distributes everything
• Load
• Backup/Restore
  – Backups to Amazon S3 are automatic, continuous and incremental
  – Configurable system snapshot retention period
  – Take user snapshots on-demand
  – Streaming restores enable you to resume querying faster
• Resize
12. Amazon Redshift parallelizes and distributes everything
• Load
• Backup/Restore
• Resize
  – Resize while remaining online
  – Provision a new cluster in the background
  – Copy data in parallel from node to node
  – Only charged for source cluster
13. Amazon Redshift parallelizes and distributes everything
• Load
• Backup/Restore
• Resize
  – Automatic SQL endpoint switchover via DNS
  – Decommission the source cluster
  – Simple operation via Console or API
14. Amazon Redshift lets you start small and grow big
Extra Large Node (DW.HS1.XL)
3 spindles, 2 TB, 16 GB RAM, 2 cores
• Single Node (2 TB)
• Cluster 2-32 Nodes (4 TB – 64 TB)
Eight Extra Large Node (DW.HS1.8XL)
24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
• Cluster 2-100 Nodes (32 TB – 1.6 PB)
Note: Nodes not to scale
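The cluster capacity ranges on this slide follow from the per-node storage multiplied by the node count. A small Python check of that arithmetic, using only the figures quoted on the slide (the helper name is an assumption):

```python
# Storage per node in TB, as listed on the slide.
NODE_STORAGE_TB = {"DW.HS1.XL": 2, "DW.HS1.8XL": 16}

# Minimum and maximum node counts for a multi-node cluster, from the slide.
CLUSTER_NODE_RANGE = {"DW.HS1.XL": (2, 32), "DW.HS1.8XL": (2, 100)}


def cluster_capacity_tb(node_type, node_count):
    """Total cluster storage in TB for a given node type and node count."""
    return NODE_STORAGE_TB[node_type] * node_count


for node_type, (min_nodes, max_nodes) in CLUSTER_NODE_RANGE.items():
    smallest = cluster_capacity_tb(node_type, min_nodes)
    largest = cluster_capacity_tb(node_type, max_nodes)
    print(f"{node_type}: {smallest} TB to {largest} TB")
```

For DW.HS1.8XL the upper bound comes out as 1,600 TB, which the slide states as 1.6 PB.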
15. Amazon Redshift is priced to let you analyze all your data
 Plan               | Price per Hour for HS1.XL Single Node | Effective Hourly Price per TB | Effective Annual Price per TB
--------------------+---------------------------------------+-------------------------------+-------------------------------
 On-Demand          | $0.850                                | $0.425                        | $3,723
 1 Year Reservation | $0.500                                | $0.250                        | $2,190
 3 Year Reservation | $0.228                                | $0.114                        | $999
Simple Pricing
• Number of Nodes x Cost per Hour
• No charge for Leader Node
• No upfront costs
• Pay as you go
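The "effective" columns in the pricing table can be reproduced from the hourly node price, assuming they are derived by dividing by the 2 TB of an HS1.XL node and multiplying by the 8,760 hours in a non-leap year. The sketch below checks that assumption against the slide's figures; the function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year
NODE_STORAGE_TB = 2        # DW.HS1.XL capacity from the slides

# Hourly price for one HS1.XL node, as listed on the pricing slide.
price_per_node_hour = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_prices(node_hour_price):
    """Return (hourly price per TB, annual price per TB) for one node price."""
    per_tb_hour = node_hour_price / NODE_STORAGE_TB
    per_tb_year = per_tb_hour * HOURS_PER_YEAR
    return per_tb_hour, per_tb_year


for plan, price in price_per_node_hour.items():
    per_tb_hour, per_tb_year = effective_prices(price)
    print(f"{plan}: ${per_tb_hour:.3f} per TB-hour, ${per_tb_year:,.0f} per TB-year")
```

Under that assumption the three-year figure works out to $998.64 per terabyte per year, which rounds to the $999 shown on the slide and matches the "less than $1,000 per terabyte per year" claim in the session abstract.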
16. Amazon Redshift has security built in
• SSL to secure data in transit
• Encryption to secure data at rest
  – AES-256; hardware accelerated
  – All blocks on disk and in Amazon S3 encrypted
• No direct access to compute nodes
• Amazon VPC support
[Diagram labels: Customer VPC, JDBC/ODBC, Internal VPC, 10 GigE (HPC), Ingestion, Backup, Restore]
17. Amazon Redshift automatically manages data replication and hardware failures
• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times
• Backups to Amazon S3 are continuous, automatic, and incremental
  – Designed for eleven nines of durability
• Continuous monitoring and automated recovery from failures of drives and nodes
• Able to restore snapshots to any Availability Zone within a region
34. Redshift Customers at re:Invent
BDT 101: Big Data ‘State of the Union’
Earlier today
DAT 305: Getting Maximum Performance from Amazon Redshift
Wednesday 11/13: 3pm in Murano 3303
35. Redshift Customers at re:Invent
DAT 306: How Amazon.com is Leveraging Amazon Redshift
Thursday 11/14: 3pm in Murano 3303
DAT 205: Amazon Redshift in Action: Enterprise, Big Data, SaaS
Friday 11/15: 9am in Lido 3006
36. Please give us your feedback on this presentation
DAT 103
As a thank you, we will select prize winners daily for completed surveys!