SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Sao Paulo
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Building Your Data Warehouse with
Amazon Redshift
Eric Ferreira, AWS (ericfe@)
Selected Amazon Redshift Customers
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
Amazon Redshift
Amazon Redshift
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC• Fast
• Simple
• Cheap
Amazon Redshift is easy to use
• Provision in minutes
• Monitor query performance
• Point and click resize
• Built in security
• Automatic backups
Amazon Redshift has security built-in
• Load encrypted from S3
• SSL to secure data in transit; ECDHE perfect
forward security
• Encryption to secure data at rest
– All blocks on disks & in Amazon S3 encrypted
– Block key, Cluster key, Master key (AES-256)
– On-premises HSM & CloudHSM support
• Audit logging & AWS CloudTrail integration
• Amazon VPC support
• SOC 1/2/3, PCI-DSS Level 1, FedRAMP
10 GigE
(HPC)
Ingestion
Backup
Restore
Customer VPC
Internal
VPC
JDBC/ODBC
• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of
data at all times
• Backups to Amazon S3 are continuous, automatic, and incremental
– Designed for eleven nines of durability
• Continuous monitoring and automated recovery from failures of drives and nodes
• Able to restore snapshots to any Availability Zone within a region
• Easily enable backups to a second region for disaster recovery
Amazon Redshift continuously backs up your data
Amazon Redshift dramatically reduces I/O
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
• With row storage you do
unnecessary I/O
• To get total amount, you have to
read everything
Amazon Redshift dramatically reduces I/O
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
• With column storage, you only
read the data you need
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Columnar compression saves
space & reduces I/O
• Amazon Redshift analyzes and
compresses your data
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Direct-attached storage
• Large data block sizes
• Track of the minimum and
maximum value for each block
• Skip over blocks that don’t
contain the data needed for a
given query
• Minimize unnecessary I/O
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Use direct-attached storage to
maximize throughput
• Hardware optimized for high
performance data processing
• Large block sizes to make the
most of each read
• Amazon Redshift manages
durability for you
Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup/Restore
• Resize
Use Sort and Distribution keys
Analyze and Vacuum
Load in parallel from Amazon S3 or Amazon
DynamoDB or any SSH connection
Data automatically distributed and sorted
according to DDL
Scales linearly with number of nodes
Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup/Restore
• Resize
Automatic, continuous, incremental backups
Configurable backup retention period
Cross region backups for disaster recovery
Stream data to resume querying faster
Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup/Restore
• Resize
Simple operation via Console or API
Provision a new cluster in the background
Copy data in parallel from node to node
Only charged for source cluster
Automatic SQL endpoint switchover via DNS
Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup/Restore
• Resize
Clickstream Analysis for Amazon.com
• Redshift runs web log analysis for
Amazon.com
– 100 node Redshift Cluster (DW1.8XL)
– Over one petabyte workload
– Largest table: 400TB
– 2TB of data per day
• Understand customer behavior
– Who is browsing but not buying
– Which products / features are winners
– What sequence led to higher customer conversion
Redshift Performance Realized
• Scan 15 months of data: 14 minutes
– 2.25 trillion rows
• Load one day worth of data: 10 minutes
– 5 billion rows
• Backfill one month of data: 9.75 hours
– 150 billion rows
• Pig  Amazon Redshift: 2 days to 1 hr
– 10B row join with 700M rows
• Oracle  Amazon Redshift: 90 hours to 8 hrs
– Reduced number of SQLs by a factor of 3
Redshift Performance Realized
• Managed Service
– 20% time of one DBA
• Backup
• Restore
• Resizing
• 2PB cluster
– 100 node dw1.8xl (3yr RI)
– $180/hr
Expanding Amazon Redshift’s Functionality
Custom ODBC and JDBC Drivers
• Up to 35% higher performance than open
source drivers
• Supported by Informatica, Microstrategy,
Pentaho, Qlik, SAS, Tableau
• Will continue to support PostgreSQL
open source drivers
• Download drivers from console
Explain Plan Visualization
User Defined Functions
• We’re enabling User Defined Functions (UDFs) so
you can add your own
– Scalar and Aggregate Functions supported
• You’ll be able to write UDFs using Python 2.7
– Syntax is largely identical to PostgreSQL UDF Syntax
– System and network calls within UDFs are prohibited
• Comes with Pandas, NumPy, and SciPy pre-installed
– You’ll also be able import your own libraries for even more flexibility
Scalar UDF example – URL parsing
CREATE FUNCTION f_hostname (VARCHAR url)
RETURNS varchar
IMMUTABLE AS $$
import urlparse
return urlparse.urlparse(url).hostname
$$ LANGUAGE plpythonu;
Sorting by Multiple Columns
• Currently support Compound Sort Keys
– Optimized for applications that filter data by one leading
column
• Adding support for Interleaved Sort Keys
– Optimized for filtering data by up to eight columns
– No storage overhead unlike an index
– Lower maintenance penalty compared to indexes
Compound Sort Keys Illustrated
• Records in Redshift
are stored in blocks.
• For this illustration,
let’s assume that
four records fill a
block
• Records with a given
cust_id are all in one
block
• However, records
with a given prod_id
are spread across
four blocks
1
1
1
1
2
3
4
1
4
4
4
2
3
4
4
1
3
3
3
2
3
4
3
1
2
2
2
2
3
4
2
1
1 [1,1] [1,2] [1,3] [1,4]
2 [2,1] [2,2] [2,3] [2,4]
3 [3,1] [3,2] [3,3] [3,4]
4 [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
cust_id prod_id other columns blocks
1 [1,1] [1,2] [1,3] [1,4]
2 [2,1] [2,2] [2,3] [2,4]
3 [3,1] [3,2] [3,3] [3,4]
4 [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
Interleaved Sort Keys Illustrated
• Records with a given
cust_id are spread
across two blocks
• Records with a given
prod_id are also
spread across two
blocks
• Data is sorted in equal
measures for both
keys
1
1
2
2
2
1
2
3
3
4
4
4
3
4
3
1
3
4
4
2
1
2
3
3
1
2
2
4
3
4
1
1
cust_id prod_id other columns blocks
How to use the feature
• New keyword ‘INTERLEAVED’ when defining sort keys
– Existing syntax will still work and behavior is unchanged
– You can choose up to 8 columns to include and can query with any or all of
them
• No change needed to queries
• Benefits are significant
[[ COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) ]
New Dense Storage Instance
DS2, based on EC2’s D2, has twice the memory and CPU as DS1 (formerly DW1)
Instance vCPU ECU
Memory
(GiB)
Network Storage I/O
Price/TB
On Demand
Price/TB
3 YR RI
ds1.xlarge 2 4.4 15 Moderate 2TB HDD 0.30GB/s $3,330 $999
ds1.8xlarge 16 35 120 10 Gbps 16TB HDD 2.40GB/s $3,330 $999
ds2.xlarge 4 31 Enhanced 2TB HDD 0.50GB/s TBD TBD
ds2.8xlarge 36 244 Enhanced - 10Gbps 16TB HDD 4.00GB/s TBD TBD
dc1.large 2 7 15 Enhanced 0.16TB SSD 0.20GB/s $18,327 $5,498
dc1.8xlarge 32 104 244 Enhanced - 10Gbps 2.56TB SSD 3.70GB/s $18,327 $5,498
DS1 - Dense Storage (formerly DW1)
DS2 - Dense Storage Gen 2
DC1 - Dense Compute (formerly DW2)
Migrate from DS1 to DS2 by restoring from snapshot. We will help you migrate your RIs.
Open Source Tools
• https://github.com/awslabs/amazon-redshift-utils
• Admin Scripts
– Collection of utilities for running diagnostics on your Cluster
• Admin Views
– Collection of utilities for managing your Cluster, generating Schema
DDL, etc
• Column Encoding Utility
– Gives you the ability to apply optimal Column Encoding to an
established Schema with data already loaded
Q&A
Redshift overview

Weitere ähnliche Inhalte

Was ist angesagt?

Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
DataWorks Summit
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
 

Was ist angesagt? (20)

(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Modern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon RedshiftModern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon Redshift
 
Masterclass - Redshift
Masterclass - RedshiftMasterclass - Redshift
Masterclass - Redshift
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftIntroduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF Loft
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
 
Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2
 
Enabling a Data Mesh Architecture and Data Sharing Culture with Denodo
Enabling a Data Mesh Architecture and Data Sharing Culture with DenodoEnabling a Data Mesh Architecture and Data Sharing Culture with Denodo
Enabling a Data Mesh Architecture and Data Sharing Culture with Denodo
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
 

Andere mochten auch

Andere mochten auch (9)

AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features
 
How Amazon.com is Leveraging Amazon Redshift (DAT306) | AWS re:Invent 2013
How Amazon.com is Leveraging Amazon Redshift (DAT306) | AWS re:Invent 2013How Amazon.com is Leveraging Amazon Redshift (DAT306) | AWS re:Invent 2013
How Amazon.com is Leveraging Amazon Redshift (DAT306) | AWS re:Invent 2013
 
SQL Server to Redshift Data Load Using SSIS
SQL Server to Redshift Data Load Using SSISSQL Server to Redshift Data Load Using SSIS
SQL Server to Redshift Data Load Using SSIS
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
 
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – Overview
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech Talks
 

Ähnlich wie Redshift overview

Ähnlich wie Redshift overview (20)

Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data Warehouse
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Processing and Analytics
Processing and AnalyticsProcessing and Analytics
Processing and Analytics
 
Redshift deep dive
Redshift deep diveRedshift deep dive
Redshift deep dive
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
 
Redshift Chartio Event Presentation
Redshift Chartio Event PresentationRedshift Chartio Event Presentation
Redshift Chartio Event Presentation
 
Introdução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon RedshiftIntrodução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017
 

Mehr von Amazon Web Services LATAM

Mehr von Amazon Web Services LATAM (20)

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWS
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWS
 
Cómo empezar con Amazon EKS
Cómo empezar con Amazon EKSCómo empezar con Amazon EKS
Cómo empezar con Amazon EKS
 
Como começar com Amazon EKS
Como começar com Amazon EKSComo começar com Amazon EKS
Como começar com Amazon EKS
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWS
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWS
 
Ransomware: Estratégias de Mitigação
Ransomware: Estratégias de MitigaçãoRansomware: Estratégias de Mitigação
Ransomware: Estratégias de Mitigação
 
Ransomware: Estratégias de Mitigación
Ransomware: Estratégias de MitigaciónRansomware: Estratégias de Mitigación
Ransomware: Estratégias de Mitigación
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWS
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administrados
 
Simplifique su BI con AWS
Simplifique su BI con AWSSimplifique su BI con AWS
Simplifique su BI con AWS
 
Simplifique o seu BI com a AWS
Simplifique o seu BI com a AWSSimplifique o seu BI com a AWS
Simplifique o seu BI com a AWS
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWS
 

Kürzlich hochgeladen

%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 

Kürzlich hochgeladen (20)

WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 

Redshift overview

  • 2. ©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Building Your Data Warehouse with Amazon Redshift Eric Ferreira, AWS (ericfe@)
  • 4. Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year Amazon Redshift
  • 6. Amazon Redshift is easy to use • Provision in minutes • Monitor query performance • Point and click resize • Built in security • Automatic backups
  • 7. Amazon Redshift has security built-in • Load encrypted from S3 • SSL to secure data in transit; ECDHE perfect forward security • Encryption to secure data at rest – All blocks on disks & in Amazon S3 encrypted – Block key, Cluster key, Master key (AES-256) – On-premises HSM & CloudHSM support • Audit logging & AWS CloudTrail integration • Amazon VPC support • SOC 1/2/3, PCI-DSS Level 1, FedRAMP 10 GigE (HPC) Ingestion Backup Restore Customer VPC Internal VPC JDBC/ODBC
  • 8. • Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times • Backups to Amazon S3 are continuous, automatic, and incremental – Designed for eleven nines of durability • Continuous monitoring and automated recovery from failures of drives and nodes • Able to restore snapshots to any Availability Zone within a region • Easily enable backups to a second region for disaster recovery Amazon Redshift continuously backs up your data
  • 9. Amazon Redshift dramatically reduces I/O • Data compression • Zone maps • Direct-attached storage • Large data block sizes ID Age State Amount 123 20 CA 500 345 25 WA 250 678 40 FL 125 957 37 WA 375 • With row storage you do unnecessary I/O • To get total amount, you have to read everything
  • 10. Amazon Redshift dramatically reduces I/O • Data compression • Zone maps • Direct-attached storage • Large data block sizes ID Age State Amount 123 20 CA 500 345 25 WA 250 678 40 FL 125 957 37 WA 375 • With column storage, you only read the data you need
  • 11. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps • Direct-attached storage • Large data block sizes • Columnar compression saves space & reduces I/O • Amazon Redshift analyzes and compresses your data analyze compression listing; Table | Column | Encoding ---------+----------------+---------- listing | listid | delta listing | sellerid | delta32k listing | eventid | delta32k listing | dateid | bytedict listing | numtickets | bytedict listing | priceperticket | delta32k listing | totalprice | mostly32 listing | listtime | raw
  • 12. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Direct-attached storage • Large data block sizes • Track of the minimum and maximum value for each block • Skip over blocks that don’t contain the data needed for a given query • Minimize unnecessary I/O
  • 13. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps • Direct-attached storage • Large data block sizes • Use direct-attached storage to maximize throughput • Hardware optimized for high performance data processing • Large block sizes to make the most of each read • Amazon Redshift manages durability for you
  • 14. Amazon Redshift parallelizes and distributes everything • Query • Load • Backup/Restore • Resize Use Sort and Distribution keys Analyze and Vacuum
  • 15. Load in parallel from Amazon S3 or Amazon DynamoDB or any SSH connection Data automatically distributed and sorted according to DDL Scales linearly with number of nodes Amazon Redshift parallelizes and distributes everything • Query • Load • Backup/Restore • Resize
  • 16. Automatic, continuous, incremental backups Configurable backup retention period Cross region backups for disaster recovery Stream data to resume querying faster Amazon Redshift parallelizes and distributes everything • Query • Load • Backup/Restore • Resize
  • 17. Simple operation via Console or API Provision a new cluster in the background Copy data in parallel from node to node Only charged for source cluster Automatic SQL endpoint switchover via DNS Amazon Redshift parallelizes and distributes everything • Query • Load • Backup/Restore • Resize
  • 18. Clickstream Analysis for Amazon.com • Redshift runs web log analysis for Amazon.com – 100 node Redshift Cluster (DW1.8XL) – Over one petabyte workload – Largest table: 400TB – 2TB of data per day • Understand customer behavior – Who is browsing but not buying – Which products / features are winners – What sequence led to higher customer conversion
  • 19. Redshift Performance Realized • Scan 15 months of data: 14 minutes – 2.25 trillion rows • Load one day worth of data: 10 minutes – 5 billion rows • Backfill one month of data: 9.75 hours – 150 billion rows • Pig  Amazon Redshift: 2 days to 1 hr – 10B row join with 700M rows • Oracle  Amazon Redshift: 90 hours to 8 hrs – Reduced number of SQLs by a factor of 3
  • 20. Redshift Performance Realized • Managed Service – 20% time of one DBA • Backup • Restore • Resizing • 2PB cluster – 100 node dw1.8xl (3yr RI) – $180/hr
  • 22. Custom ODBC and JDBC Drivers • Up to 35% higher performance than open source drivers • Supported by Informatica, Microstrategy, Pentaho, Qlik, SAS, Tableau • Will continue to support PostgreSQL open source drivers • Download drivers from console
  • 24. User Defined Functions • We’re enabling User Defined Functions (UDFs) so you can add your own – Scalar and Aggregate Functions supported • You’ll be able to write UDFs using Python 2.7 – Syntax is largely identical to PostgreSQL UDF Syntax – System and network calls within UDFs are prohibited • Comes with Pandas, NumPy, and SciPy pre-installed – You’ll also be able import your own libraries for even more flexibility
  • 25. Scalar UDF example – URL parsing CREATE FUNCTION f_hostname (VARCHAR url) RETURNS varchar IMMUTABLE AS $$ import urlparse return urlparse.urlparse(url).hostname $$ LANGUAGE plpythonu;
  • 26. Sorting by Multiple Columns • Currently support Compound Sort Keys – Optimized for applications that filter data by one leading column • Adding support for Interleaved Sort Keys – Optimized for filtering data by up to eight columns – No storage overhead unlike an index – Lower maintenance penalty compared to indexes
  • 27. Compound Sort Keys Illustrated • Records in Redshift are stored in blocks. • For this illustration, let’s assume that four records fill a block • Records with a given cust_id are all in one block • However, records with a given prod_id are spread across four blocks 1 1 1 1 2 3 4 1 4 4 4 2 3 4 4 1 3 3 3 2 3 4 3 1 2 2 2 2 3 4 2 1 1 [1,1] [1,2] [1,3] [1,4] 2 [2,1] [2,2] [2,3] [2,4] 3 [3,1] [3,2] [3,3] [3,4] 4 [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id cust_id prod_id other columns blocks
  • 28. 1 [1,1] [1,2] [1,3] [1,4] 2 [2,1] [2,2] [2,3] [2,4] 3 [3,1] [3,2] [3,3] [3,4] 4 [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id Interleaved Sort Keys Illustrated • Records with a given cust_id are spread across two blocks • Records with a given prod_id are also spread across two blocks • Data is sorted in equal measures for both keys 1 1 2 2 2 1 2 3 3 4 4 4 3 4 3 1 3 4 4 2 1 2 3 3 1 2 2 4 3 4 1 1 cust_id prod_id other columns blocks
  • 29. How to use the feature • New keyword ‘INTERLEAVED’ when defining sort keys – Existing syntax will still work and behavior is unchanged – You can choose up to 8 columns to include and can query with any or all of them • No change needed to queries • Benefits are significant [[ COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) ]
  • 30. New Dense Storage Instance DS2, based on EC2’s D2, has twice the memory and CPU as DS1 (formerly DW1) Instance vCPU ECU Memory (GiB) Network Storage I/O Price/TB On Demand Price/TB 3 YR RI ds1.xlarge 2 4.4 15 Moderate 2TB HDD 0.30GB/s $3,330 $999 ds1.8xlarge 16 35 120 10 Gbps 16TB HDD 2.40GB/s $3,330 $999 ds2.xlarge 4 31 Enhanced 2TB HDD 0.50GB/s TBD TBD ds2.8xlarge 36 244 Enhanced - 10Gbps 16TB HDD 4.00GB/s TBD TBD dc1.large 2 7 15 Enhanced 0.16TB SSD 0.20GB/s $18,327 $5,498 dc1.8xlarge 32 104 244 Enhanced - 10Gbps 2.56TB SSD 3.70GB/s $18,327 $5,498 DS1 - Dense Storage (formerly DW1) DS2 - Dense Storage Gen 2 DC1 - Dense Compute (formerly DW2) Migrate from DS1 to DS2 by restoring from snapshot. We will help you migrate your RIs.
  • 31. Open Source Tools • https://github.com/awslabs/amazon-redshift-utils • Admin Scripts – Collection of utilities for running diagnostics on your Cluster • Admin Views – Collection of utilities for managing your Cluster, generating Schema DDL, etc • Column Encoding Utility – Gives you the ability to apply optimal Column Encoding to an established Schema with data already loaded
  • 32. Q&A

Hinweis der Redaktion

  1. In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance and, you’ll hear from a specific customer and their use case to take advantage of fast performance on enormous datasets leveraging economies of scale on the AWS platform.
  2. Nasdaq security, HasOffers loads 60M rows per day in 2 min intervals, Desk: high concurrency user facing portal (read/write cluster), Amazon.com/NTT PB scale. Pinterest saw 50-100x speed ups when moved 300TB from Hadoop to Redshift. Nokia saw 50% reduction in costs.
  3. If there is one thing you take away from today’s presentation, I want it to be this. Redshift is a “Fast, Simple, Scalable and Inexpensive Data Warehouse solution” and it is successfully deployed for a variety of use cases.
  4. Redshift MPP architecture is key to delivering a solution that is Fast, Simple, Scalable, and Cost Effective.
  5. Read only the data you need
  6. Read only the data you need
  7. Read only the data you need
  8. Read only the data you need
  9. Read only the data you need
  10. Today we will over the role of Amazon Redshift in addressing the Web Log Analysis problem for one of the largest online retailer, Amazon.com <go over the slide with restated language>