1. Loading and Analyzing Behavioral Data in Amazon Redshift
Presented by Segment, AWS & XO Group Inc.
March 3, 2015
6. Amazon Redshift is Easy to Use
• Provision in minutes
• Monitor query performance
• Point-and-click resize
• Built-in security
• Automatic backups
7. Amazon Redshift Architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB, Amazon EMR, Amazon S3, HDFS/SSH
• Two hardware platforms
– Optimized for data processing
– DW1: HDD; scale from 2 TB to 1.6 PB
– DW2: SSD; scale from 160 GB to 256 TB
[Diagram: SQL clients/BI tools connect via JDBC/ODBC to a leader node; the leader node and three compute nodes (each labeled 128 GB RAM, 16 TB disk, 16 cores) are linked by 10 GigE (HPC); ingestion, backup, and restore go through Amazon S3.]
8. Amazon Redshift Dramatically Reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• With row storage you do unnecessary I/O
• To get the total amount, you have to read everything
 ID  | Age | State | Amount
-----+-----+-------+--------
 123 | 20  | CA    | 500
 345 | 25  | WA    | 250
 678 | 40  | FL    | 125
 957 | 37  | WA    | 375
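The row-versus-column point on this slide can be made concrete with a small sketch. It uses the four rows from the slide's table; the function names and the "values read" counter are illustrative assumptions, and the in-memory lists only stand in for on-disk layouts, not for how Amazon Redshift actually stores data.

```python
# Rows from the slide's example table: (ID, Age, State, Amount).
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]


def total_amount_row_store(rows):
    """Sum Amount from a row layout; every field of every row gets read."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched together
        total += row[3]
    return total, values_read


def to_column_store(rows, names=("id", "age", "state", "amount")):
    """Pivot rows into one list per column (a toy columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def total_amount_column_store(columns):
    """Sum Amount from a columnar layout; only the Amount column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = total_amount_row_store(ROWS)
col_total, col_reads = total_amount_column_store(to_column_store(ROWS))
print(row_total, row_reads)  # same total, 16 values touched
print(col_total, col_reads)  # same total, 4 values touched
```

Both layouts return the same total, but the columnar version touches a quarter of the values here, which is the I/O saving the slide describes.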
10. Amazon Redshift Dramatically Reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• COPY compresses automatically
• You can analyze and override
• More performance, less cost
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
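To illustrate why an encoding such as delta suits a column like listid, here is a minimal sketch of the general idea behind delta encoding: store the first value, then only the differences. The sample IDs and function names are hypothetical, and this is not Amazon Redshift's actual on-disk format.

```python
def delta_encode(values):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    encoded = [values[0]]
    for prev, cur in zip(values, values[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values by accumulating the stored differences."""
    values = []
    running = 0
    for i, item in enumerate(encoded):
        running = item if i == 0 else running + item
        values.append(running)
    return values


# Hypothetical, mostly sequential IDs (not taken from the slides).
list_ids = [100001, 100002, 100003, 100005, 100008]
encoded = delta_encode(list_ids)
print(encoded)  # small differences need far fewer bits than the full IDs
```

The differences are small numbers even when the IDs themselves are large, so they fit in fewer bytes per value.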
11. Amazon Redshift Dramatically Reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Track the minimum and maximum value for each block
• Skip over blocks that don't contain relevant data
Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324)
Block 2: 375 | 393 | 417 | … | 512 | 549 | 623 (min 375, max 623)
Block 3: 637 | 712 | 809 | … | 834 | 921 | 959 (min 637, max 959)
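The block-skipping idea can be sketched in a few lines. The blocks below contain only the values visible on the slide (the elided "…" values are left out), and the function names and the range query are illustrative assumptions rather than anything from Amazon Redshift itself.

```python
# Only the values shown on the slide; the elided ones are omitted.
BLOCKS = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]


def build_zone_map(blocks):
    """Record the (min, max) of each block."""
    return [(min(block), max(block)) for block in blocks]


def scan_range(blocks, zone_map, low, high):
    """Return values in [low, high] and how many blocks had to be read."""
    matches = []
    blocks_read = 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        if block_max < low or block_min > high:
            continue  # the zone map proves this block has nothing relevant
        blocks_read += 1
        matches.extend(v for v in block if low <= v <= high)
    return matches, blocks_read


zone_map = build_zone_map(BLOCKS)
matches, blocks_read = scan_range(BLOCKS, zone_map, 400, 600)
print(zone_map)
print(matches, blocks_read)
```

For a query between 400 and 600, only the second block overlaps the range, so the other two are skipped without being read.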
12. Amazon Redshift Dramatically Reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• > 2 GB/s scan rate
• Optimized for data processing
• High disk density
DW.HS1.8XL: 128 GB RAM, 16 cores, 16 TB disk
DW.HS1.XL: 16 GB RAM, 2 cores, 2 TB disk
13. Amazon Redshift Parallelizes and Distributes Everything
• Query
• Load
• Backup/Restore
• Resize
14. Amazon Redshift Parallelizes and Distributes Everything
• Query
• Load
• Backup/Restore
• Resize
• Parallel load from Amazon DynamoDB, Amazon EMR, Amazon S3, HDFS/SSH
• Kinesis integration
• Data automatically distributed and sorted according to DDL
• Scales linearly with number of nodes
[Diagram: Amazon S3/DynamoDB loading into three compute nodes, each labeled 128 GB RAM, 16 TB disk, 16 cores.]
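As a rough illustration of "distributed and sorted according to DDL", the sketch below hashes a distribution key to pick a node and then orders each node's rows by a sort key. The slides do not say which hash function Amazon Redshift uses, so `zlib.crc32` is a stand-in; the function names, the three-node count, and the choice of ID and Amount as keys are all assumptions.

```python
import zlib
from collections import defaultdict

# Sample rows reused from the earlier slide: (ID, Age, State, Amount).
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]


def node_for_key(key, num_nodes):
    """Map a distribution-key value to a node (crc32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, dist_index, sort_index, num_nodes):
    """Assign rows to nodes by distribution key, then sort each node's rows."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for_key(row[dist_index], num_nodes)].append(row)
    return {
        node: sorted(node_rows, key=lambda r: r[sort_index])
        for node, node_rows in nodes.items()
    }


# Distribute on ID (column 0) and sort on Amount (column 3) across 3 nodes.
placement = distribute(ROWS, dist_index=0, sort_index=3, num_nodes=3)
for node, node_rows in sorted(placement.items()):
    print(node, node_rows)
```

Because the placement depends only on the key, each node can load and scan its own share independently, which is what lets load and query work scale with the number of nodes.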
15. Amazon Redshift Parallelizes and Distributes Everything
• Query
• Load
• Backup/Restore
• Resize
• Backups to Amazon S3 are automatic, continuous, and incremental
• Back up your cluster to a second region
• Configurable system snapshot retention period; take user snapshots on demand
• Streaming restores enable you to resume querying faster
[Diagram: three compute nodes, each labeled 128 GB RAM, 16 TB disk, 16 cores, backing up to Amazon S3.]
16. Amazon Redshift Parallelizes and Distributes Everything
• Query
• Load
• Backup/Restore
• Resize
• Add/remove nodes or change node type while remaining online
• Provision a new cluster and copy data in parallel from node to node
• Only charged for source cluster until SQL endpoint has automatically been switched over via DNS
[Diagram: SQL clients/BI tools and two clusters, one with a leader node and three compute nodes, the other with a leader node and four compute nodes; each node is labeled 128 GB RAM, 48 TB disk, 16 cores.]
17. Amazon Redshift Has Security Built In
• SSL to secure data in transit
• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disks and in Amazon S3 encrypted
– HSM/CloudHSM
• No direct access to compute nodes
• Amazon VPC support
[Diagram: a leader node and three compute nodes (each labeled 128 GB RAM, 16 TB disk, 16 cores) in an internal security group inside the customer VPC, linked by 10 GigE (HPC); SQL clients/BI tools connect via JDBC/ODBC; ingestion, backup, and restore go through Amazon S3 / Amazon DynamoDB.]
30. XO Group Inc. + Segment
Individual product teams wanted isolated access to their own analytics.
Segment + Mixpanel + Customer.io + Optimizely + UserVoice
31. XO Group Inc. + Segment
Still needed a solution to connect Segment data from multiple products and platforms into a single view.
Segment SQL + Mode Analytics
39. More considerate “Share” options
SMS address from desktop or mobile web browser.
One-click email yourself the details of a venue.