SlideShare ist ein Scribd-Unternehmen logo
1 von 50
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Angelo Carvalho, Solutions Architect, AWS
22 de novembro de 2016
Introdução ao data warehouse
Amazon Redshift
Agenda
• Introdução
• Benefícios
• Começando
• Criando um cluster
• Modelo de dados
• Carregando dados
• Consultando
• Mais Recursos
AnalyzeStore
Amazon
Glacier
Amazon S3
Amazon
DynamoDB
Amazon RDS,
Amazon Aurora
AWS big data portfolio
AWS Data Pipeline
Amazon
CloudSearch
Amazon EMR Amazon EC2
Amazon
Redshift
Amazon
Machine
Learning
Amazon
Elasticsearch
Service
AWS Database
Migration Service
Amazon
QuickSight
Amazon
Kinesis
Firehose
AWS Import/Export
AWS Direct
Connect
Collect
Amazon Kinesis
Streams
Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
Amazon
Redshift
a lot faster
a lot simpler
a lot cheaper
The Amazon Redshift view of data warehousing
10x cheaper
Easy to provision
Higher DBA productivity
10x faster
No programming
Easily leverage BI tools,
Hadoop, machine learning,
streaming
Analysis inline with process
flows
Pay as you go, grow as you
need
Managed availability and
disaster recovery
Enterprise Big Data SaaS
The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical
representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any
vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
Forrester Wave™ Enterprise Data Warehouse
Q4‘15
Selected Amazon Redshift customers
Amazon Redshift architecture
Leader Node
Simple SQL endpoint
Stores metadata
Optimizes query plan
Coordinates query execution
Compute Nodes
Local columnar storage
Parallel/distributed execution of all queries, loads,
backups, restores, resizes
Start at just $0.25/hour, grow to 2 PB (compressed)
DC1: SSD; scale from 160 GB to 326 TB
DS2: HDD; scale from 2 TB to 2 PB
Ingestion/Backup
Backup
Restore
JDBC/ODBC
10 GigE
(HPC)
Benefit #1: Amazon Redshift is fast
Dramatically less I/O
Column storage
Data compression
Zone maps
Direct-attached storage
Large data block sizes
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
10 | 13 | 14 | 26 |…
… | 100 | 245 | 324
375 | 393 | 417…
… 512 | 549 | 623
637 | 712 | 809 …
… | 834 | 921 | 959
10
324
375
623
637
959
Benefit #1: Amazon Redshift is fast
Parallel and distributed
Query
Load
Export
Backup
Restore
Resize
Benefit #1: Amazon Redshift is fast
Hardware optimized for I/O intensive workloads, 4 GB/sec/node
Enhanced networking, over 1 million packets/sec/node
Choice of storage type, instance size
Regular cadence of autopatched improvements
Benefit #1: Amazon Redshift is fast
New Dense Storage (HDD) instance type
Improved memory 2x, compute 2x, disk throughput 1.5x
Cost: Same as our prior generation!
Performance improvement: 50%
Enhanced I/O and commit improvements (Jan ‘16)
Reduce amount of time to commit data
Performance improvement: 35%
Benefit #2: Amazon Redshift is inexpensive
DS2 (HDD)
Price Per Hour for
DW1.XL Single Node
Effective Annual
Price per TB Compressed
On-Demand $ 0.850 $ 3,725
1 Year Reservation $ 0.500 $ 2,190
3 Year Reservation $ 0.228 $ 999
DC1 (SSD)
Price Per Hour for
DW2.L Single Node
Effective Annual
Price per TB Compressed
On-Demand $ 0.250 $ 13,690
1 Year Reservation $ 0.161 $ 8,795
3 Year Reservation $ 0.100 $ 5,500
Pricing is simple
Number of nodes x price/hour
No charge for leader node
No upfront costs
Pay as you go
Benefit #3: Amazon Redshift is fully managed
Continuous/incremental backups
Multiple copies within cluster
Continuous and incremental backups
to S3
Continuous and incremental backups
across regions
Streaming restore
Amazon S3
Amazon S3
Region 1
Region 2
Benefit #3: Amazon Redshift is fully managed
Amazon S3
Amazon S3
Region 1
Region 2
Fault tolerance
Disk failures
Node failures
Network failures
Availability Zone/region level disasters
Benefit #4: Security is built-in
• Load encrypted from S3
• SSL to secure data in transit
• ECDHE perfect forward security
• Amazon VPC for network isolation
• Encryption to secure data at rest
• All blocks on disks and in S3 encrypted
• Block key, cluster key, master key (AES-256)
• On-premises HSM & AWS CloudHSM support
• Audit logging and AWS CloudTrail integration
• SOC 1/2/3, PCI-DSS, FedRAMP, BAA
10 GigE
(HPC)
Ingestion
Backup
Restore
Customer VPC
Internal
VPC
JDBC/ODBC
Benefit #5: We innovate quickly
Well over 100 new features added since launch
Release every two weeks
Automatic patching
Service Launch (2/14)
PDX (4/2)
Temp Credentials (4/11)
DUB (4/25)
SOC1/2/3 (5/8)
Unload Encrypted Files
NRT (6/5)
JDBC Fetch Size (6/27)
Unload logs (7/5)
SHA1 Builtin (7/15)
4 byte UTF-8 (7/18)
Sharing snapshots (7/18)
Statement Timeout (7/22)
Timezone, Epoch, Autoformat (7/25)
WLM Timeout/Wildcards (8/1)
CRC32 Builtin, CSV, Restore Progress
(8/9)
Resource Level IAM (8/9)
PCI (8/22)
UTF-8 Substitution (8/29)
JSON, Regex, Cursors (9/10)
Split_part, Audit tables (10/3)
SIN/SYD (10/8)
HSM Support (11/11)
Kinesis EMR/HDFS/SSH copy,
Distributed Tables, Audit
Logging/CloudTrail, Concurrency, Resize
Perf., Approximate Count Distinct, SNS
Alerts, Cross Region Backup (11/13)
Distributed Tables, Single Node Cursor
Support, Maximum Connections to 500
(12/13)
EIP Support for VPC Clusters (12/28)
New query monitoring system tables and
diststyle all (1/13)
Redshift on DW2 (SSD) Nodes (1/23)
Compression for COPY from SSH, Fetch
size support for single node clusters, new
system tables with commit stats,
row_number(), strotol() and query
termination (2/13)
Resize progress indicator & Cluster
Version (3/21)
Regex_Substr, COPY from JSON (3/25)
50 slots, COPY from EMR, ECDHE
ciphers (4/22)
3 new regex features, Unload to single
file, FedRAMP(5/6)
Rename Cluster (6/2)
Copy from multiple regions,
percentile_cont, percentile_disc (6/30)
Free Trial (7/1)
pg_last_unload_count (9/15)
AES-128 S3 encryption (9/29)
UTF-16 support (9/29)
Benefit #6: Redshift is powerful
• Approximate functions
• User defined functions
• Machine Learning
• Data Science
Benefit #7: Amazon Redshift has a large ecosystem
Data Integration Systems IntegratorsBusiness Intelligence
Benefit #8: Service oriented architecture
DynamoDB
EMR
S3
EC2/SSH
RDS/Aurora
Amazon
Redshift
Amazon Kinesis
Amazon ML
Data Pipeline
CloudSearch
Amazon
Mobile
Analytics
Começando…
Criando um cluster
Modelo de dados
SELECT COUNT(*) FROM LOGS WHERE DATE = ‘09-JUNE-2013’
MIN: 01-JUNE-2013
MAX: 20-JUNE-2013
MIN: 08-JUNE-2013
MAX: 30-JUNE-2013
MIN: 12-JUNE-2013
MAX: 20-JUNE-2013
MIN: 02-JUNE-2013
MAX: 25-JUNE-2013
Unsorted Table
MIN: 01-JUNE-2013
MAX: 06-JUNE-2013
MIN: 07-JUNE-2013
MAX: 12-JUNE-2013
MIN: 13-JUNE-2013
MAX: 18-JUNE-2013
MIN: 19-JUNE-2013
MAX: 24-JUNE-2013
Sorted By Date
Zone Maps
• Single Column
• Compound
• Interleaved
Single Column
• Table is sorted by 1 column
Date Region Country
2-JUN-2015 Oceania New Zealand
2-JUN-2015 Asia Singapore
2-JUN-2015 Africa Zaire
2-JUN-2015 Asia Hong Kong
3-JUN-2015 Europe Germany
3-JUN-2015 Asia Korea
[ SORTKEY ( date ) ]
• Best for:
• Queries that use 1st column (i.e. date) as primary filter
• Can speed up joins and group bys
• Quickest to VACUUM
Compound
• Table is sorted by 1st column , then 2nd column etc.
Date Region Country
2-JUN-2015 Africa Zaire
2-JUN-2015 Asia Korea
2-JUN-2015 Asia Singapore
2-JUN-2015 Europe Germany
3-JUN-2015 Asia Hong Kong
3-JUN-2015 Asia Korea
[ SORTKEY COMPOUND ( date, region, country) ]
• Best for:
• Queries that use 1st column as primary filter, then other cols
• Can speed up joins and group bys
• Slower to VACUUM
Interleaved
• Equal weight is given to each column.
Date Region Country
2-JUN-2015 Africa Zaire
3-JUN-2015 Asia Singapore
2-JUN-2015 Asia Korea
2-JUN-2015 Europe Germany
3-JUN-2015 Asia Hong Kong
2-JUN-2015 Asia Korea
[ SORTKEY INTERLEAVED ( date, region, country) ]
• Best for:
• Queries that use different columns in filter
• Queries get faster the more columns used in the filter
• Slowest to VACUUM
• KEY
• ALL
• EVEN
ID Gender Name
101 M John Smith
292 F Jane Jones
139 M Peter Black
446 M Pat Partridge
658 F Sarah Cyan
164 M Brian Snail
209 M James White
306 F Lisa Green
2
3
4
ID Gender Name
101 M John Smith
306 F Lisa Green
ID Gender Name
292 F Jane Jones
209 M James White
ID Gender Name
139 M Peter Black
164 M Brian Snail
ID Gender Name
446 M Pat Partridge
658 F Sarah Cyan
Even:
Round
Robin
DISTSTYLE EVEN
ID Gender Name
101 M John Smith
292 F Jane Jones
139 M Peter Black
446 M Pat Partridge
658 F Sarah Cyan
164 M Brian Snail
209 M James White
306 F Lisa Green
Key: Hash
Function
ID Gender Name
101 M John Smith
306 F Lisa Green
ID Gender Name
292 F Jane Jones
209 M James White
ID Gender Name
139 M Peter Black
164 M Brian Snail
ID Gender Name
446 M Pat Partridge
658 F Sarah Cyan
DISTSTYLE KEY
ID Gender Name
101 M John Smith
292 F Jane Jones
139 M Peter Black
446 M Pat Partridge
658 F Sarah Cyan
164 M Brian Snail
209 M James White
306 F Lisa Green
Key: Hash
Function
ID Gender Name
101 M John Smith
139 M Peter Black
446 M Pat Partridge
164 M Brian Snail
209 M James White
ID Gender Name
292 F Jane Jones
658 F Sarah Cyan
306 F Lisa Green
DISTSTYLE KEY
ID Gender Name
101 M John Smith
292 F Jane Jones
139 M Peter Black
446 M Pat Partridge
658 F Sarah Cyan
164 M Brian Snail
209 M James White
306 F Lisa Green
101 M John Smith
292 F Jane Jones
139 M Peter Black
446 M Pat Partridge
658 F Sarah Cyan
164 M Brian Snail
209 M Lisa Green
306 F James White
101 M John Smith
292 F Jane Jones
139 M Peter Black
446 M Pat Partridge
658 F Sarah Cyan
164 M Brian Snail
209 M Lisa Green
306 F James White
101 M John Smith
292 F Jane Jones
139 M Peter Black
446 M Pat Partridge
658 F Sarah Cyan
164 M Brian Snail
209 M Lisa Green
306 F James White
101 M John Smith
292 F Jane Jones
139 M Peter Black
446 M Pat Partridge
658 F Sarah Cyan
164 M Brian Snail
209 M Lisa Green
306 F James White
ALL
DISTSTYLE ALL
• KEY
• Large Fact tables
• Large dimension tables
• ALL
• Medium dimension tables (1K – 2M)
• EVEN
• Tables with no joins or group by
• Small dimension tables (<1000)
Carregando dados
AWS CloudCorporate Data center
Amazon S3
Amazon
Redshift
Flat files
Data loading options
AWS CloudCorporate Data center
ETL
Source DBs
Amazon
Redshift
Amazon
Redshift
Data loading options
AWS Cloud
Amazon
Redshift
Amazon
Kinesis
Data loading options
Use the COPY command
Each slice can load one file at a
time
A single input file means only one
slice is ingesting data
Instead of 100MB/s, you’re only
getting 6.25MB/s
Use multiple input files to maximize
throughput
Use the COPY command
You need at least as many input
files as you have slices
With 16 input files, all slices are
working so you maximize
throughput
Get 100MB/s per node; scale
linearly as you add nodes
Use multiple input files to maximize
throughput
Consultando
JDBC/ODBC
Amazon Redshift
Amazon Redshift works with your existing
analysis tools
ODBC/JDBC
BI Clients
Redshift
ODBC/JDBC
BI Server Redshift
Clients
Monitor query performance
View explain plans
Resources
Angelo Carvalho | carvaa@amazon.com |
Detail Pages
• http://aws.amazon.com/redshift
• https://aws.amazon.com/marketplace/redshift/
Best Practices
• http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
• http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
• http://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html
Open Source Tools
• https://github.com/awslabs/amazon-redshift-utils
Obrigado !

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseStreaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
 
Pg conf 2017 HIPAA Compliant and HA DB architecture on AWS
Pg conf 2017  HIPAA Compliant and HA DB architecture on AWSPg conf 2017  HIPAA Compliant and HA DB architecture on AWS
Pg conf 2017 HIPAA Compliant and HA DB architecture on AWS
 
Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute Services
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017
RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017
RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017
 
AWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQL
AWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQLAWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQL
AWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQL
 
Getting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheGetting started with Amazon ElastiCache
Getting started with Amazon ElastiCache
 
Amazon Aurora: Amazon’s New Relational Database Engine
Amazon Aurora: Amazon’s New Relational Database EngineAmazon Aurora: Amazon’s New Relational Database Engine
Amazon Aurora: Amazon’s New Relational Database Engine
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
ElastiCache & Redis
ElastiCache & RedisElastiCache & Redis
ElastiCache & Redis
 
DAT316_Report from the field on Aurora PostgreSQL Performance
DAT316_Report from the field on Aurora PostgreSQL PerformanceDAT316_Report from the field on Aurora PostgreSQL Performance
DAT316_Report from the field on Aurora PostgreSQL Performance
 
Amazon (AWS) Aurora
Amazon (AWS) AuroraAmazon (AWS) Aurora
Amazon (AWS) Aurora
 
What’s New in Amazon Aurora for MySQL and PostgreSQL
What’s New in Amazon Aurora for MySQL and PostgreSQLWhat’s New in Amazon Aurora for MySQL and PostgreSQL
What’s New in Amazon Aurora for MySQL and PostgreSQL
 
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
 
Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfs
 
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...
Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...
Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...
 
(DAT407) Amazon ElastiCache: Deep Dive
(DAT407) Amazon ElastiCache: Deep Dive(DAT407) Amazon ElastiCache: Deep Dive
(DAT407) Amazon ElastiCache: Deep Dive
 
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum Efficiency
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum EfficiencyDeploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum Efficiency
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum Efficiency
 

Andere mochten auch

Andere mochten auch (20)

Começando com aplicações serverless na AWS
 Começando com aplicações serverless na AWS Começando com aplicações serverless na AWS
Começando com aplicações serverless na AWS
 
Comenzando con AWS Mobile Services
Comenzando con AWS Mobile ServicesComenzando con AWS Mobile Services
Comenzando con AWS Mobile Services
 
Bases de datos en la nube con AWS
Bases de datos en la nube con AWSBases de datos en la nube con AWS
Bases de datos en la nube con AWS
 
Almacenamiento en la nube con AWS
Almacenamiento en la nube con AWSAlmacenamiento en la nube con AWS
Almacenamiento en la nube con AWS
 
Contruyendo tu primera aplicación con AWS
Contruyendo tu primera aplicación con AWSContruyendo tu primera aplicación con AWS
Contruyendo tu primera aplicación con AWS
 
Contruyendo tu primera aplicación con AWS
Contruyendo tu primera aplicación con AWSContruyendo tu primera aplicación con AWS
Contruyendo tu primera aplicación con AWS
 
Escalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuariosEscalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuarios
 
Cloud Computing con Amazon Web Services
 Cloud Computing con Amazon Web Services Cloud Computing con Amazon Web Services
Cloud Computing con Amazon Web Services
 
Benefícios e melhores práticas no uso do Amazon Redshift
Benefícios e melhores práticas no uso do Amazon RedshiftBenefícios e melhores práticas no uso do Amazon Redshift
Benefícios e melhores práticas no uso do Amazon Redshift
 
Compute Services con AWS
Compute Services con AWSCompute Services con AWS
Compute Services con AWS
 
Comenzando con los servicios móviles en AWS
Comenzando con los servicios móviles en AWSComenzando con los servicios móviles en AWS
Comenzando con los servicios móviles en AWS
 
EC2 Computo en la Nube
EC2 Computo en la NubeEC2 Computo en la Nube
EC2 Computo en la Nube
 
Bases de datos en la nube con AWS
Bases de datos en la nube con AWSBases de datos en la nube con AWS
Bases de datos en la nube con AWS
 
AWS Services Overview
AWS Services OverviewAWS Services Overview
AWS Services Overview
 
Escalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuariosEscalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuarios
 
Ask the Trainer - Treinamentos e Certificações da AWS
Ask the Trainer - Treinamentos e Certificações da AWSAsk the Trainer - Treinamentos e Certificações da AWS
Ask the Trainer - Treinamentos e Certificações da AWS
 
Almacenamiento en la nube con AWS
Almacenamiento en la nube con AWSAlmacenamiento en la nube con AWS
Almacenamiento en la nube con AWS
 
Criando e conectando seu datacenter virtual
Criando e conectando seu datacenter virtualCriando e conectando seu datacenter virtual
Criando e conectando seu datacenter virtual
 
Construindo APIs com Amazon API Gateway e AWS Lambda
Construindo APIs com Amazon API Gateway e AWS LambdaConstruindo APIs com Amazon API Gateway e AWS Lambda
Construindo APIs com Amazon API Gateway e AWS Lambda
 
Escalando sua aplicação Web com Beanstalk
Escalando sua aplicação Web com BeanstalkEscalando sua aplicação Web com Beanstalk
Escalando sua aplicação Web com Beanstalk
 

Ähnlich wie Introdução ao Data Warehouse Amazon Redshift

Ähnlich wie Introdução ao Data Warehouse Amazon Redshift (20)

Data Warehousing in the Era of Big Data: Intro to Amazon Redshift
Data Warehousing in the Era of Big Data: Intro to Amazon RedshiftData Warehousing in the Era of Big Data: Intro to Amazon Redshift
Data Warehousing in the Era of Big Data: Intro to Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting started with amazon redshift - Toronto
Getting started with amazon redshift - TorontoGetting started with amazon redshift - Toronto
Getting started with amazon redshift - Toronto
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Amazon Redshift
Amazon Redshift Amazon Redshift
Amazon Redshift
 
Introdução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon RedshiftIntrodução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon Redshift
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Building an Amazon Datawarehouse and Using Business Intelligence Analytics ToolsBuilding an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features
 
Getting Started with Amazon Redshift
 Getting Started with Amazon Redshift Getting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 

Mehr von Amazon Web Services LATAM

Mehr von Amazon Web Services LATAM (20)

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWS
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWS
 
Cómo empezar con Amazon EKS
Cómo empezar con Amazon EKSCómo empezar con Amazon EKS
Cómo empezar con Amazon EKS
 
Como começar com Amazon EKS
Como começar com Amazon EKSComo começar com Amazon EKS
Como começar com Amazon EKS
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWS
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWS
 
Ransomware: Estratégias de Mitigação
Ransomware: Estratégias de MitigaçãoRansomware: Estratégias de Mitigação
Ransomware: Estratégias de Mitigação
 
Ransomware: Estratégias de Mitigación
Ransomware: Estratégias de MitigaciónRansomware: Estratégias de Mitigación
Ransomware: Estratégias de Mitigación
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWS
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administrados
 
Simplifique su BI con AWS
Simplifique su BI con AWSSimplifique su BI con AWS
Simplifique su BI con AWS
 
Simplifique o seu BI com a AWS
Simplifique o seu BI com a AWSSimplifique o seu BI com a AWS
Simplifique o seu BI com a AWS
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWS
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Introdução ao Data Warehouse Amazon Redshift

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Angelo Carvalho, Solutions Architect, AWS 22 de novembro de 2016 Introdução ao data warehouse Amazon Redshift
  • 2. Agenda • Introdução • Benefícios • Começando • Criando um cluster • Modelo de dados • Carregando dados • Consultando • Mais Recursos
  • 3. AnalyzeStore Amazon Glacier Amazon S3 Amazon DynamoDB Amazon RDS, Amazon Aurora AWS big data portfolio AWS Data Pipeline Amazon CloudSearch Amazon EMR Amazon EC2 Amazon Redshift Amazon Machine Learning Amazon Elasticsearch Service AWS Database Migration Service Amazon QuickSight Amazon Kinesis Firehose AWS Import/Export AWS Direct Connect Collect Amazon Kinesis Streams
  • 4. Relational data warehouse Massively parallel; petabyte scale Fully managed HDD and SSD platforms $1,000/TB/year; starts at $0.25/hour Amazon Redshift a lot faster a lot simpler a lot cheaper
  • 5.
  • 6. The Amazon Redshift view of data warehousing 10x cheaper Easy to provision Higher DBA productivity 10x faster No programming Easily leverage BI tools, Hadoop, machine learning, streaming Analysis inline with process flows Pay as you go, grow as you need Managed availability and disaster recovery Enterprise Big Data SaaS
  • 7. The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change. Forrester Wave™ Enterprise Data Warehouse Q4‘15
  • 9. Amazon Redshift architecture Leader Node Simple SQL endpoint Stores metadata Optimizes query plan Coordinates query execution Compute Nodes Local columnar storage Parallel/distributed execution of all queries, loads, backups, restores, resizes Start at just $0.25/hour, grow to 2 PB (compressed) DC1: SSD; scale from 160 GB to 326 TB DS2: HDD; scale from 2 TB to 2 PB Ingestion/Backup Backup Restore JDBC/ODBC 10 GigE (HPC)
  • 10. Benefit #1: Amazon Redshift is fast Dramatically less I/O Column storage Data compression Zone maps Direct-attached storage Large data block sizes analyze compression listing; Table | Column | Encoding ---------+----------------+---------- listing | listid | delta listing | sellerid | delta32k listing | eventid | delta32k listing | dateid | bytedict listing | numtickets | bytedict listing | priceperticket | delta32k listing | totalprice | mostly32 listing | listtime | raw 10 | 13 | 14 | 26 |… … | 100 | 245 | 324 375 | 393 | 417… … 512 | 549 | 623 637 | 712 | 809 … … | 834 | 921 | 959 10 324 375 623 637 959
  • 11. Benefit #1: Amazon Redshift is fast Parallel and distributed Query Load Export Backup Restore Resize
  • 12. Benefit #1: Amazon Redshift is fast Hardware optimized for I/O intensive workloads, 4 GB/sec/node Enhanced networking, over 1 million packets/sec/node Choice of storage type, instance size Regular cadence of autopatched improvements
  • 13. Benefit #1: Amazon Redshift is fast New Dense Storage (HDD) instance type Improved memory 2x, compute 2x, disk throughput 1.5x Cost: Same as our prior generation! Performance improvement: 50% Enhanced I/O and commit improvements (Jan ‘16) Reduce amount of time to commit data Performance improvement: 35%
  • 14. Benefit #2: Amazon Redshift is inexpensive DS2 (HDD) Price Per Hour for DW1.XL Single Node Effective Annual Price per TB Compressed On-Demand $ 0.850 $ 3,725 1 Year Reservation $ 0.500 $ 2,190 3 Year Reservation $ 0.228 $ 999 DC1 (SSD) Price Per Hour for DW2.L Single Node Effective Annual Price per TB Compressed On-Demand $ 0.250 $ 13,690 1 Year Reservation $ 0.161 $ 8,795 3 Year Reservation $ 0.100 $ 5,500 Pricing is simple Number of nodes x price/hour No charge for leader node No upfront costs Pay as you go
  • 15. Benefit #3: Amazon Redshift is fully managed Continuous/incremental backups Multiple copies within cluster Continuous and incremental backups to S3 Continuous and incremental backups across regions Streaming restore Amazon S3 Amazon S3 Region 1 Region 2
  • 16. Benefit #3: Amazon Redshift is fully managed Amazon S3 Amazon S3 Region 1 Region 2 Fault tolerance Disk failures Node failures Network failures Availability Zone/region level disasters
  • 17. Benefit #4: Security is built-in • Load encrypted from S3 • SSL to secure data in transit • ECDHE perfect forward security • Amazon VPC for network isolation • Encryption to secure data at rest • All blocks on disks and in S3 encrypted • Block key, cluster key, master key (AES-256) • On-premises HSM & AWS CloudHSM support • Audit logging and AWS CloudTrail integration • SOC 1/2/3, PCI-DSS, FedRAMP, BAA 10 GigE (HPC) Ingestion Backup Restore Customer VPC Internal VPC JDBC/ODBC
  • 18. Benefit #5: We innovate quickly Well over 100 new features added since launch Release every two weeks Automatic patching Service Launch (2/14) PDX (4/2) Temp Credentials (4/11) DUB (4/25) SOC1/2/3 (5/8) Unload Encrypted Files NRT (6/5) JDBC Fetch Size (6/27) Unload logs (7/5) SHA1 Builtin (7/15) 4 byte UTF-8 (7/18) Sharing snapshots (7/18) Statement Timeout (7/22) Timezone, Epoch, Autoformat (7/25) WLM Timeout/Wildcards (8/1) CRC32 Builtin, CSV, Restore Progress (8/9) Resource Level IAM (8/9) PCI (8/22) UTF-8 Substitution (8/29) JSON, Regex, Cursors (9/10) Split_part, Audit tables (10/3) SIN/SYD (10/8) HSM Support (11/11) Kinesis EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts, Cross Region Backup (11/13) Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13) EIP Support for VPC Clusters (12/28) New query monitoring system tables and diststyle all (1/13) Redshift on DW2 (SSD) Nodes (1/23) Compression for COPY from SSH, Fetch size support for single node clusters, new system tables with commit stats, row_number(), strotol() and query termination (2/13) Resize progress indicator & Cluster Version (3/21) Regex_Substr, COPY from JSON (3/25) 50 slots, COPY from EMR, ECDHE ciphers (4/22) 3 new regex features, Unload to single file, FedRAMP(5/6) Rename Cluster (6/2) Copy from multiple regions, percentile_cont, percentile_disc (6/30) Free Trial (7/1) pg_last_unload_count (9/15) AES-128 S3 encryption (9/29) UTF-16 support (9/29)
  • 19. Benefit #6: Redshift is powerful • Approximate functions • User defined functions • Machine Learning • Data Science
  • 20. Benefit #7: Amazon Redshift has a large ecosystem Data Integration Systems IntegratorsBusiness Intelligence
  • 21. Benefit #8: Service oriented architecture DynamoDB EMR S3 EC2/SSH RDS/Aurora Amazon Redshift Amazon Kinesis Amazon ML Data Pipeline CloudSearch Amazon Mobile Analytics
  • 25. SELECT COUNT(*) FROM LOGS WHERE DATE = ‘09-JUNE-2013’ MIN: 01-JUNE-2013 MAX: 20-JUNE-2013 MIN: 08-JUNE-2013 MAX: 30-JUNE-2013 MIN: 12-JUNE-2013 MAX: 20-JUNE-2013 MIN: 02-JUNE-2013 MAX: 25-JUNE-2013 Unsorted Table MIN: 01-JUNE-2013 MAX: 06-JUNE-2013 MIN: 07-JUNE-2013 MAX: 12-JUNE-2013 MIN: 13-JUNE-2013 MAX: 18-JUNE-2013 MIN: 19-JUNE-2013 MAX: 24-JUNE-2013 Sorted By Date Zone Maps
  • 26. • Single Column • Compound • Interleaved
  • 27. Single Column • Table is sorted by 1 column Date Region Country 2-JUN-2015 Oceania New Zealand 2-JUN-2015 Asia Singapore 2-JUN-2015 Africa Zaire 2-JUN-2015 Asia Hong Kong 3-JUN-2015 Europe Germany 3-JUN-2015 Asia Korea [ SORTKEY ( date ) ] • Best for: • Queries that use 1st column (i.e. date) as primary filter • Can speed up joins and group bys • Quickest to VACUUM
  • 28. Compound • Table is sorted by 1st column , then 2nd column etc. Date Region Country 2-JUN-2015 Africa Zaire 2-JUN-2015 Asia Korea 2-JUN-2015 Asia Singapore 2-JUN-2015 Europe Germany 3-JUN-2015 Asia Hong Kong 3-JUN-2015 Asia Korea [ SORTKEY COMPOUND ( date, region, country) ] • Best for: • Queries that use 1st column as primary filter, then other cols • Can speed up joins and group bys • Slower to VACUUM
  • 29. Interleaved • Equal weight is given to each column. Date Region Country 2-JUN-2015 Africa Zaire 3-JUN-2015 Asia Singapore 2-JUN-2015 Asia Korea 2-JUN-2015 Europe Germany 3-JUN-2015 Asia Hong Kong 2-JUN-2015 Asia Korea [ SORTKEY INTERLEAVED ( date, region, country) ] • Best for: • Queries that use different columns in filter • Queries get faster the more columns used in the filter • Slowest to VACUUM
  • 31. ID Gender Name 101 M John Smith 292 F Jane Jones 139 M Peter Black 446 M Pat Partridge 658 F Sarah Cyan 164 M Brian Snail 209 M James White 306 F Lisa Green 2 3 4 ID Gender Name 101 M John Smith 306 F Lisa Green ID Gender Name 292 F Jane Jones 209 M James White ID Gender Name 139 M Peter Black 164 M Brian Snail ID Gender Name 446 M Pat Partridge 658 F Sarah Cyan Even: Round Robin DISTSTYLE EVEN
  • 32. ID Gender Name 101 M John Smith 292 F Jane Jones 139 M Peter Black 446 M Pat Partridge 658 F Sarah Cyan 164 M Brian Snail 209 M James White 306 F Lisa Green Key: Hash Function ID Gender Name 101 M John Smith 306 F Lisa Green ID Gender Name 292 F Jane Jones 209 M James White ID Gender Name 139 M Peter Black 164 M Brian Snail ID Gender Name 446 M Pat Partridge 658 F Sarah Cyan DISTSTYLE KEY
  • 33. ID Gender Name 101 M John Smith 292 F Jane Jones 139 M Peter Black 446 M Pat Partridge 658 F Sarah Cyan 164 M Brian Snail 209 M James White 306 F Lisa Green Key: Hash Function ID Gender Name 101 M John Smith 139 M Peter Black 446 M Pat Partridge 164 M Brian Snail 209 M James White ID Gender Name 292 F Jane Jones 658 F Sarah Cyan 306 F Lisa Green DISTSTYLE KEY
  • 34. ID Gender Name 101 M John Smith 292 F Jane Jones 139 M Peter Black 446 M Pat Partridge 658 F Sarah Cyan 164 M Brian Snail 209 M James White 306 F Lisa Green 101 M John Smith 292 F Jane Jones 139 M Peter Black 446 M Pat Partridge 658 F Sarah Cyan 164 M Brian Snail 209 M Lisa Green 306 F James White 101 M John Smith 292 F Jane Jones 139 M Peter Black 446 M Pat Partridge 658 F Sarah Cyan 164 M Brian Snail 209 M Lisa Green 306 F James White 101 M John Smith 292 F Jane Jones 139 M Peter Black 446 M Pat Partridge 658 F Sarah Cyan 164 M Brian Snail 209 M Lisa Green 306 F James White 101 M John Smith 292 F Jane Jones 139 M Peter Black 446 M Pat Partridge 658 F Sarah Cyan 164 M Brian Snail 209 M Lisa Green 306 F James White ALL DISTSTYLE ALL
  • 35. • KEY • Large Fact tables • Large dimension tables • ALL • Medium dimension tables (1K – 2M) • EVEN • Tables with no joins or group by • Small dimension tables (<1000)
  • 37. AWS CloudCorporate Data center Amazon S3 Amazon Redshift Flat files Data loading options
  • 38. AWS CloudCorporate Data center ETL Source DBs Amazon Redshift Amazon Redshift Data loading options
  • 40. Use the COPY command Each slice can load one file at a time A single input file means only one slice is ingesting data Instead of 100MB/s, you’re only getting 6.25MB/s Use multiple input files to maximize throughput
  • 41. Use the COPY command You need at least as many input files as you have slices With 16 input files, all slices are working so you maximize throughput Get 100MB/s per node; scale linearly as you add nodes Use multiple input files to maximize throughput
  • 43. JDBC/ODBC Amazon Redshift Amazon Redshift works with your existing analysis tools
  • 48.
  • 49. Resources Angelo Carvalho | carvaa@amazon.com | Detail Pages • http://aws.amazon.com/redshift • https://aws.amazon.com/marketplace/redshift/ Best Practices • http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html • http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html • http://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html Open Source Tools • https://github.com/awslabs/amazon-redshift-utils

Hinweis der Redaktion

  1. This is the AWS Big Data portfolio. We have tools like Direct Connect and Import Export that can bring in a lot of data. We can persist that data into a number of storage services from S3 to DynamoDB to EMR and RedShift for further analysis. Amazon Redshift provides a fast, fully managed, petabyte-scale data warehouse for less than $1000 per terabyte per year. Amazon Elastic MapReduce provides a managed, easy to use analytics platform built around the powerful Hadoop framework. Recently we announced Amazon Kinesis, a managed service for real-time processing of streaming big data. Amazon Glacier allows you to backup and archive an unlimited amount of data at just 1 cent per GB per month. Automate and schedule big data processing workloads with Data Pipeline. The tools to support big data collection, computation along with collaboration and sharing are all available in a couple of clicks, with AWS.
  2. Redshift value proposition: fast, simple, cost-effective data warehousing. Massively parallel and columnar technology. Easily scalable to petabytes or more. Makes administration simple, no DBA needed. You can choose from hard disk drive or solid state drive platform depending on your needs. All this for $1,000/TB/year, which is one-tenth of the cost of traditional data warehousing solutions.
  3. Our view of data warehousing and why we built Amazon Redshift. We want to make data warehousing fast, inexpensive and easy to use for companies of all sizes For Enterprises, benefits are: 1/10th cost, easy to provision and manage, productive DBAs For Big data companies: 10x faster, use standard SQL-no need to write custom code, use existing BI tools For SaaS: analysis in-line with process flows, pay as you go, grow as you need, managed availability & DR
  4. Amazon Redshift has been ranked as a leader in the recent data warehousing Forrester Wave
  5. Nasdaq security – Ingests 5B rows/trading day, analyzes orders, trades and quotes Yelp – 10s of millions of ads/day, 250M mobile events/day, analyzes ad campaign performance and new feature usage Desk.com – Ingests 200K case related events/hour and runs a user facing portal on Redshift
  6. High level overview of Amazon Redshift Architecture – MPP based with leader node that acts as a SQL end point and coordinates query execution with compute nodes. The computes store data locally in columnar format.
  7. Column storage fetches only those columns that are required for queries Customer typically get 2-4x compression. Compression also implies less I/O Zone maps: Each column is divided into 1MB blocks, we store the min/max values of each block in memory. We are able to quickly identify the blocks that are required to be fetched for a query Direct-attached storage/large block sizes also enable fast I/O
  8. All the operations that you care about from a database management and performance perspective are all parallelized, and hence scale horizontally with the number of nodes
  9. We work with EC2 and infrastructure teams closely to optimize the h/w for data warehousing needs.
  10. We work with EC2 and infrastructure teams closely to optimize the h/w for data warehousing needs.
  11. Prices start at $1,000/TB/yr. for a 3 year DS2.8XL RI. RIs provide up to 70% discounts.
  12. Backups are managed, continuous and incremental Multiple copies are made within and outside the cluster on S3 Streaming restore =>While restore happens in the background, cluster is made available for reads/write within minutes of initiating the restoring operation (whether the backup is for a 1TB or a 100TB cluster)
  13. Redshift monitors the health of a variety of components and failure conditions within an AZ and recovers from them automatically For AZ/regional level disasters, streaming restore will help get back to business within minutes
  14. From the time data is generated to the time it is queried, data can be encrypted end-to-end on Redshift
  15. Pace of innovation – Continuous deployment, different model, Oracle/Teradata etc. still took a 6 month deployment plan vs. us
  16. We respect the investments you made in your analytics platforms and work with a variety of ETL, BI and SI vendors. Standard JDBC/ODBC drivers make the integration process seamless but we work with these vendors closely to certify the joint platforms
  17. We believe in SOA, pick and choose the components that work for your use case instead of bundling it all
  18. Sorted columns enable fetching the minimum number of blocks required for query execution. In this example, an unsorted table al most leads to a full table scan O(N) and a sorted table leads to one block scanned O(1).
  19. Redshift works with customer’s BI tool of choice through Postgres drivers and a JDBC, ODBC connection. A number of partners shown here have certified integration with Redshift, meaning they have done testing to validate/build Redshift integration and make using Redshift easy from a UI perspective. If there are tools customer’s use not shown we can work with Redshift on getting them integrated.