SlideShare ist ein Scribd-Unternehmen logo
1 von 44
Downloaden Sie, um offline zu lesen
©2015,  Amazon  Web  Services,  Inc.  or  its  affiliates.  All  rights  reserved©2015,  Amazon  Web  Services,  Inc.  or  its  affiliates.  All  rights  reserved
Building Your Data Warehouse with
Amazon Redshift
Ian Meyers, AWS (meyersi@amazon.com)
Guest Speaker: Toby Moore, Co-Founder & CTO, Space Ape Games
(toby@spaceapegames.com)
Data Warehouse - Challenges
Cost
Complexity
Performance
Rigidity
1990	
   2000	
   2010	
   2020	
  
Enterprise	
  Data	
   Data	
  in	
  Warehouse	
  
Petabyte scale; massively parallel
Relational data warehouse
Fully managed; zero admin
SSD & HDD platforms
As low as $1,000/TB/Year
Amazon
Redshift
Redshift powers Clickstream Analytics for Amazon.com
Web log analysis for Amazon.com
Over one petabyte workload
Largest table: 400TB
2TB of data per day
Understand customer behavior
Who is browsing but not buying
Which products / features are winners
What sequence led to higher customer
conversion
Solution
Best scale out solution – Query across 1 week
Hadoop – query across 1 month
Redshift Performance Realized
Performance
Scan 15 months of data: 14 minutes
2.25 trillion rows
Load one day worth of data: 10 minutes
5 billion rows
Backfill one month of data: 9.75 hours
150 billion rows
Pig à Amazon Redshift: 2 days to 1 hr
10B row join with 700M rows
Oracle à Amazon Redshift: 90 hours to 8 hrs
Reduced number of SQLs by a factor of 3
Cost
2PB cluster
100 node dw1.8xl (3yr RI)
$180/hr
Complexity
20% time of one DBA
Backup
Restore
Resizing
0	
   4	
   8	
   12	
   16	
   20	
   24	
   28	
   32	
  
128	
  	
  
Nodes	
  
16	
  	
  
Nodes	
  
	
  2	
  	
  
Nodes	
  
Dura%on	
  (Minutes)	
  
Time	
  to	
  Deploy	
  and	
  Manage	
  a	
  Cluster	
  
Time	
  spent	
  	
  
on	
  clicks	
  
Deploy	
  	
   Connect	
   Backup	
   Restore	
   Resize	
  	
  
(2	
  to	
  16	
  nodes)	
  
Simplicity
Who uses Amazon Redshift?
Common Customer Use Cases
•  Reduce costs by
extending DW rather than
adding HW
•  Migrate completely from
existing DW systems
•  Respond faster to
business
•  Improve performance by
an order of magnitude
•  Make more data
available for analysis
•  Access business data via
standard reporting tools
•  Add analytic functionality
to applications
•  Scale DW capacity as
demand grows
•  Reduce HW & SW costs
by an order of magnitude
Traditional Enterprise DW Companies with Big Data SaaS Companies
Selected Amazon Redshift Customers
Amazon Redshift Partners
Amazon Redshift Architecture
•  Leader Node
–  SQL endpoint
–  Stores metadata
–  Coordinates query execution
•  Compute Nodes
–  Local, columnar storage
–  Execute queries in parallel
–  Load, backup, restore via
Amazon S3; load from
Amazon DynamoDB or SSH
•  Two hardware platforms
–  Optimized for data processing
–  DW1: HDD; scale from 2TB to 1.6PB
–  DW2: SSD; scale from 160GB to 256TB
10 GigE
(HPC)
Ingestion
Backup
Restore
SQL Clients/BI Tools
128GB RAM
16TB disk
16 cores
Amazon S3 / DynamoDB / SSH
JDBC/ODBC
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
Leader
Node
Amazon Redshift dramatically reduces I/O
•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes
ID	
   Age	
   State	
   Amount	
  
123	
   20	
   CA	
   500	
  
345	
   25	
   WA	
   250	
  
678	
   40	
   FL	
   125	
  
957	
   37	
   WA	
   375	
  
Amazon Redshift dramatically reduces I/O
•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes
ID	
   Age	
   State	
   Amount	
  
123	
   20	
   CA	
   500	
  
345	
   25	
   WA	
   250	
  
678	
   40	
   FL	
   125	
  
957	
   37	
   WA	
   375	
  
Amazon Redshift dramatically reduces I/O
•  Column storage
•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
Amazon Redshift dramatically reduces I/O
•  Column storage
•  Data compression
•  Direct-attached storage
•  Large data block sizes
•  Track of the minimum and
maximum value for each block
•  Skip over blocks that don’t
contain the data needed for a
given query
•  Minimize unnecessary I/O
Amazon Redshift dramatically reduces I/O
•  Column storage
•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes
•  Use direct-attached storage
to maximize throughput
•  Hardware optimized for high
performance data
processing
•  Large block sizes to make the
most of each read
•  Amazon Redshift manages
durability for you
Amazon Redshift has security built-in
•  SSL to secure data in transit
•  Encryption to secure data at rest
–  AES-256; hardware accelerated
–  All blocks on disks and in Amazon S3 encrypted
–  HSM Support
•  No direct access to compute nodes
•  Audit logging & AWS CloudTrail integration
•  Amazon VPC support
•  SOC 1/2/3, PCI-DSS Level 1, FedRAMP, others
10 GigE
(HPC)
Ingestion
Backup
Restore
SQL Clients/BI Tools
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
Amazon S3 / Amazon DynamoDB
Customer VPC
Internal
VPC
JDBC/ODBC
Leader
Node
Compute
Node
Compute
Node
Compute
Node
Amazon Redshift is 1/10th the Price of a Traditional Data
Warehouse
DW1 (HDD)
Price Per Hour for
DW1.XL Single Node
Effective Annual
Price per TB
On-Demand $ 0.850 $ 3,723
1 Year Reserved Instance $ 0.215 $ 2,192
3 Year Reserved Instance $ 0.114 $ 999
DW2 (SSD)
Price Per Hour for
DW2.L Single Node
Effective Annual
Price per TB
On-Demand $ 0.250 $ 13,688
1 Year Reserved Instance $ 0.075 $ 8,794
3 Year Reserved Instance $ 0.050 $ 5,498
Expanding Amazon Redshift’s
Functionality
New Dense Storage Instance
DS2, based on EC2’s D2, has twice the memory and CPU as DW1
Migrate from DS1 to DS2 by restoring from snapshot. We will help you migrate
your RIs
•  Twice the memory and compute power of DW1
•  Enhanced networking and 1.5X gain in disk throughput
•  40% to 60% performance gain over DW1
•  Available in the two node types: XL (2TB) and 8XL (16TB)
Custom ODBC and JDBC Drivers
•  Up to 35% higher performance than open source drivers
•  Supported by Informatica, Microstrategy, Pentaho, Qlik, SAS, Tableau
•  Will continue to support PostgreSQL open source drivers
•  Download drivers from console
Explain Plan Visualization
User Defined Functions
•  We’re enabling User Defined Functions (UDFs) so
you can add your own
–  Scalar and Aggregate Functions supported
•  You’ll be able to write UDFs using Python 2.7
–  Syntax is largely identical to PostgreSQL UDF Syntax
–  System and network calls within UDFs are prohibited
•  Comes with Pandas, NumPy, and SciPy pre-
installed
–  You’ll also be able import your own libraries for even more
flexibility
Scalar UDF example – URL parsing
CREATE FUNCTION f_hostname (VARCHAR url)
  RETURNS varchar
IMMUTABLE AS $$
  import urlparse
  return urlparse.urlparse(url).hostname
$$ LANGUAGE plpythonu;
Interleaved Multi Column Sort
•  Currently support Compound Sort Keys
–  Optimized for applications that filter data by one leading
column
•  Adding support for Interleaved Sort Keys
–  Optimized for filtering data by up to eight columns
–  No storage overhead unlike an index
–  Lower maintenance penalty compared to indexes
Compound Sort Keys Illustrated
Records in Redshift are
stored in blocks.
For this illustration, let’s
assume that four records
fill a block
Records with a given
cust_id are all in one
block
However, records with a
given prod_id are
spread across four
blocks
1
1
1
1
2
3
4
1
4
4
4
2
3
4
4
1
3
3
3
2
3
4
3
1
2
2
2
2
3
4
2
1
1  [1,1] [1,2] [1,3] [1,4]
2  [2,1] [2,2] [2,3] [2,4]
3  [3,1] [3,2] [3,3] [3,4]
4  [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
cust_id prod_id other columns blocks
1  [1,1] [1,2] [1,3] [1,4]
2  [2,1] [2,2] [2,3] [2,4]
3  [3,1] [3,2] [3,3] [3,4]
4  [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
Interleaved Sort Keys Illustrated
Records with a given
cust_id are spread across
two blocks
Records with a given
prod_id are also spread
across two blocks
Data is sorted in equal
measures for both keys
1
1
2
2
2
1
2
3
3
4
4
4
3
4
3
1
3
4
4
2
1
2
3
3
1
2
2
4
3
4
1
1
cust_id prod_id other columns blocks
How to use the feature
•  New keyword ‘INTERLEAVED’ when defining sort keys
–  Existing syntax will still work and behavior is unchanged
–  You can choose up to 8 columns to include and can query with any or
all of them
•  No change needed to queries
•  Benefits are significant
[ SORTKEY [ COMPOUND | INTERLEAVED ] ( column_name [, ...] ) ]
Amazon Redshift
Spend time with your data, not your database….
Gaming analytics with Redshift and AWS
Former CTO
Mind Candy / Moshi Monsters (2006-2012)
Introductions
Co-founder / CTO
Space Ape Games (2012-Present)
Toby Moore
Space Ape Games
12+ Million downloads
300k DAU
Coming soon!
Our games
Early needs + Approach
•  Highly empowered, analytical team
•  We hit a wall with 3rd party analytics tools
•  Big data is table stakes in games industry
•  We needed absolute flexibility on future tooling
•  No large capex spend
Pre-
Data
Not
Enough
Too
much
Tactical Predictive
Basic data
ANALYSIS
REPORTING
Data
Capture
A/B
Tests
Insights & Learning
Amazon S3
Amazon Redshift
Amazon EMR
Tactical data
ANALYSIS
REPORTING
CRM
Data
Capture
A/B
Tests
Amazon S3
Amazon Redshift
Amazon EMR
DATA MINING
ANALYSIS
REPORTING
CRM
Data
Capture
A/B
Tests
Insights & Learning
Amazon S3
Amazon Redshift
Amazon EMR
Predictive data
MODELLING
•  146Billion Rows
•  2 clusters: 1x 8 node 1 x 16 node dw1.xlarge
•  13TBOf Compressed Data
•  250m rows x 125 columns Rows Per Day
Today
Per	
  user,	
  Daily	
  Summary	
  (Over	
  200	
  Metrics)	
  
Spend Tier
In Game
Behaviour
Monetisation
Device
Tenure
Balances
Single	
  Player	
  View	
  
Platform
Spend BehaviourDevice
Retention
CountryLanguage
Acquisition Channel
Game Balances
Operating System
Castle Level
Retention
Modeling	
  and	
  predic%on	
  
Best offer bundle
Spend tier
Churn risk
Life time value
Spend propensity
Price optimisation
Never had to worry about:
•  Scalability
•  Backing up
•  Availability
•  Upgrades
•  Flexibility (ODBC etc.)
•  Performance
Summary
•  Move towards more real-time processing
•  Investigate machine learning
•  AWS Mobile analytics auto-export to Redshift
Next Steps
Thanks!
LONDON

Weitere ähnliche Inhalte

Was ist angesagt?

Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationVolodymyr Rovetskiy
 
Getting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheGetting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheAmazon Web Services
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceAmazon Web Services
 
Amazon Redshift 아키텍처 및 모범사례::김민성::AWS Summit Seoul 2018
Amazon Redshift 아키텍처 및 모범사례::김민성::AWS Summit Seoul 2018Amazon Redshift 아키텍처 및 모범사례::김민성::AWS Summit Seoul 2018
Amazon Redshift 아키텍처 및 모범사례::김민성::AWS Summit Seoul 2018Amazon Web Services Korea
 
Real-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisReal-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisAmazon Web Services
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher Tamir Dresher
 
Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Amazon Web Services
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed InstanceJames Serra
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Web Services
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Web Services
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 

Was ist angesagt? (20)

Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentation
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Getting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheGetting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCache
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database Service
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
 
Amazon Redshift 아키텍처 및 모범사례::김민성::AWS Summit Seoul 2018
Amazon Redshift 아키텍처 및 모범사례::김민성::AWS Summit Seoul 2018Amazon Redshift 아키텍처 및 모범사례::김민성::AWS Summit Seoul 2018
Amazon Redshift 아키텍처 및 모범사례::김민성::AWS Summit Seoul 2018
 
Real-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisReal-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon Kinesis
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher
 
Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Amazon S3 Masterclass
Amazon S3 MasterclassAmazon S3 Masterclass
Amazon S3 Masterclass
 
AWS RDS
AWS RDSAWS RDS
AWS RDS
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 

Ähnlich wie Building Your Data Warehouse with Amazon Redshift

Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseAmazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSAmazon Web Services
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features Amazon Web Services
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Amazon Web Services
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksAmazon Web Services
 
Optimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesOptimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesAlexandra Sasha Blumenfeld
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Amazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftAmazon Web Services
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftSnapLogic
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 

Ähnlich wie Building Your Data Warehouse with Amazon Redshift (20)

Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data Warehouse
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
Processing and Analytics
Processing and AnalyticsProcessing and Analytics
Processing and Analytics
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
Optimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesOptimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 Minutes
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 

Mehr von Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Kürzlich hochgeladen

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 

Kürzlich hochgeladen (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Building Your Data Warehouse with Amazon Redshift

  • 1. ©2015,  Amazon  Web  Services,  Inc.  or  its  affiliates.  All  rights  reserved©2015,  Amazon  Web  Services,  Inc.  or  its  affiliates.  All  rights  reserved Building Your Data Warehouse with Amazon Redshift Ian Meyers, AWS (meyersi@amazon.com) Guest Speaker: Toby Moore, Co-Founder & CTO, Space Ape Games (toby@spaceapegames.com)
  • 2. Data Warehouse - Challenges Cost Complexity Performance Rigidity 1990   2000   2010   2020   Enterprise  Data   Data  in  Warehouse  
  • 3. Petabyte scale; massively parallel Relational data warehouse Fully managed; zero admin SSD & HDD platforms As low as $1,000/TB/Year Amazon Redshift
  • 4. Redshift powers Clickstream Analytics for Amazon.com Web log analysis for Amazon.com Over one petabyte workload Largest table: 400TB 2TB of data per day Understand customer behavior Who is browsing but not buying Which products / features are winners What sequence led to higher customer conversion Solution Best scale out solution – Query across 1 week Hadoop – query across 1 month
  • 5. Redshift Performance Realized Performance Scan 15 months of data: 14 minutes 2.25 trillion rows Load one day worth of data: 10 minutes 5 billion rows Backfill one month of data: 9.75 hours 150 billion rows Pig à Amazon Redshift: 2 days to 1 hr 10B row join with 700M rows Oracle à Amazon Redshift: 90 hours to 8 hrs Reduced number of SQLs by a factor of 3 Cost 2PB cluster 100 node dw1.8xl (3yr RI) $180/hr Complexity 20% time of one DBA Backup Restore Resizing
  • 6. 0   4   8   12   16   20   24   28   32   128     Nodes   16     Nodes    2     Nodes   Dura%on  (Minutes)   Time  to  Deploy  and  Manage  a  Cluster   Time  spent     on  clicks   Deploy     Connect   Backup   Restore   Resize     (2  to  16  nodes)   Simplicity
  • 7. Who uses Amazon Redshift?
  • 8. Common Customer Use Cases •  Reduce costs by extending DW rather than adding HW •  Migrate completely from existing DW systems •  Respond faster to business •  Improve performance by an order of magnitude •  Make more data available for analysis •  Access business data via standard reporting tools •  Add analytic functionality to applications •  Scale DW capacity as demand grows •  Reduce HW & SW costs by an order of magnitude Traditional Enterprise DW Companies with Big Data SaaS Companies
  • 11. Amazon Redshift Architecture •  Leader Node –  SQL endpoint –  Stores metadata –  Coordinates query execution •  Compute Nodes –  Local, columnar storage –  Execute queries in parallel –  Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH •  Two hardware platforms –  Optimized for data processing –  DW1: HDD; scale from 2TB to 1.6PB –  DW2: SSD; scale from 160GB to 256TB 10 GigE (HPC) Ingestion Backup Restore SQL Clients/BI Tools 128GB RAM 16TB disk 16 cores Amazon S3 / DynamoDB / SSH JDBC/ODBC 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node Leader Node
  • 12. Amazon Redshift dramatically reduces I/O •  Data compression •  Zone maps •  Direct-attached storage •  Large data block sizes ID   Age   State   Amount   123   20   CA   500   345   25   WA   250   678   40   FL   125   957   37   WA   375  
  • 13. Amazon Redshift dramatically reduces I/O •  Data compression •  Zone maps •  Direct-attached storage •  Large data block sizes ID   Age   State   Amount   123   20   CA   500   345   25   WA   250   678   40   FL   125   957   37   WA   375  
  • 14. Amazon Redshift dramatically reduces I/O •  Column storage •  Data compression •  Zone maps •  Direct-attached storage •  Large data block sizes analyze compression listing; Table | Column | Encoding ---------+----------------+---------- listing | listid | delta listing | sellerid | delta32k listing | eventid | delta32k listing | dateid | bytedict listing | numtickets | bytedict listing | priceperticket | delta32k listing | totalprice | mostly32 listing | listtime | raw
  • 15. Amazon Redshift dramatically reduces I/O •  Column storage •  Data compression •  Direct-attached storage •  Large data block sizes •  Track of the minimum and maximum value for each block •  Skip over blocks that don’t contain the data needed for a given query •  Minimize unnecessary I/O
  • 16. Amazon Redshift dramatically reduces I/O •  Column storage •  Data compression •  Zone maps •  Direct-attached storage •  Large data block sizes •  Use direct-attached storage to maximize throughput •  Hardware optimized for high performance data processing •  Large block sizes to make the most of each read •  Amazon Redshift manages durability for you
  • 17. Amazon Redshift has security built-in •  SSL to secure data in transit •  Encryption to secure data at rest –  AES-256; hardware accelerated –  All blocks on disks and in Amazon S3 encrypted –  HSM Support •  No direct access to compute nodes •  Audit logging & AWS CloudTrail integration •  Amazon VPC support •  SOC 1/2/3, PCI-DSS Level 1, FedRAMP, others 10 GigE (HPC) Ingestion Backup Restore SQL Clients/BI Tools 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Amazon S3 / Amazon DynamoDB Customer VPC Internal VPC JDBC/ODBC Leader Node Compute Node Compute Node Compute Node
  • 18. Amazon Redshift is 1/10th the Price of a Traditional Data Warehouse DW1 (HDD) Price Per Hour for DW1.XL Single Node Effective Annual Price per TB On-Demand $ 0.850 $ 3,723 1 Year Reserved Instance $ 0.215 $ 2,192 3 Year Reserved Instance $ 0.114 $ 999 DW2 (SSD) Price Per Hour for DW2.L Single Node Effective Annual Price per TB On-Demand $ 0.250 $ 13,688 1 Year Reserved Instance $ 0.075 $ 8,794 3 Year Reserved Instance $ 0.050 $ 5,498
  • 20. New Dense Storage Instance DS2, based on EC2’s D2, has twice the memory and CPU as DW1 Migrate from DS1 to DS2 by restoring from snapshot. We will help you migrate your RIs •  Twice the memory and compute power of DW1 •  Enhanced networking and 1.5X gain in disk throughput •  40% to 60% performance gain over DW1 •  Available in the two node types: XL (2TB) and 8XL (16TB)
  • 21. Custom ODBC and JDBC Drivers •  Up to 35% higher performance than open source drivers •  Supported by Informatica, Microstrategy, Pentaho, Qlik, SAS, Tableau •  Will continue to support PostgreSQL open source drivers •  Download drivers from console
  • 23. User Defined Functions •  We’re enabling User Defined Functions (UDFs) so you can add your own –  Scalar and Aggregate Functions supported •  You’ll be able to write UDFs using Python 2.7 –  Syntax is largely identical to PostgreSQL UDF Syntax –  System and network calls within UDFs are prohibited •  Comes with Pandas, NumPy, and SciPy pre- installed –  You’ll also be able import your own libraries for even more flexibility
  • 24. Scalar UDF example – URL parsing CREATE FUNCTION f_hostname (VARCHAR url)   RETURNS varchar IMMUTABLE AS $$   import urlparse   return urlparse.urlparse(url).hostname $$ LANGUAGE plpythonu;
  • 25. Interleaved Multi Column Sort •  Currently support Compound Sort Keys –  Optimized for applications that filter data by one leading column •  Adding support for Interleaved Sort Keys –  Optimized for filtering data by up to eight columns –  No storage overhead unlike an index –  Lower maintenance penalty compared to indexes
  • 26. Compound Sort Keys Illustrated Records in Redshift are stored in blocks. For this illustration, let’s assume that four records fill a block Records with a given cust_id are all in one block However, records with a given prod_id are spread across four blocks 1 1 1 1 2 3 4 1 4 4 4 2 3 4 4 1 3 3 3 2 3 4 3 1 2 2 2 2 3 4 2 1 1  [1,1] [1,2] [1,3] [1,4] 2  [2,1] [2,2] [2,3] [2,4] 3  [3,1] [3,2] [3,3] [3,4] 4  [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id cust_id prod_id other columns blocks
  • 27. 1  [1,1] [1,2] [1,3] [1,4] 2  [2,1] [2,2] [2,3] [2,4] 3  [3,1] [3,2] [3,3] [3,4] 4  [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id Interleaved Sort Keys Illustrated Records with a given cust_id are spread across two blocks Records with a given prod_id are also spread across two blocks Data is sorted in equal measures for both keys 1 1 2 2 2 1 2 3 3 4 4 4 3 4 3 1 3 4 4 2 1 2 3 3 1 2 2 4 3 4 1 1 cust_id prod_id other columns blocks
  • 28. How to use the feature •  New keyword ‘INTERLEAVED’ when defining sort keys –  Existing syntax will still work and behavior is unchanged –  You can choose up to 8 columns to include and can query with any or all of them •  No change needed to queries •  Benefits are significant [ SORTKEY [ COMPOUND | INTERLEAVED ] ( column_name [, ...] ) ]
  • 29. Amazon Redshift Spend time with your data, not your database….
  • 30. Gaming analytics with Redshift and AWS
  • 31. Former CTO Mind Candy / Moshi Monsters (2006-2012) Introductions Co-founder / CTO Space Ape Games (2012-Present) Toby Moore
  • 33. 12+ Million downloads 300k DAU Coming soon! Our games
  • 34. Early needs + Approach •  Highly empowered, analytical team •  We hit a wall with 3rd party analytics tools •  Big data is table stakes in games industry •  We needed absolute flexibility on future tooling •  No large capex spend Pre- Data Not Enough Too much Tactical Predictive
  • 35. Basic data ANALYSIS REPORTING Data Capture A/B Tests Insights & Learning Amazon S3 Amazon Redshift Amazon EMR
  • 37. DATA MINING ANALYSIS REPORTING CRM Data Capture A/B Tests Insights & Learning Amazon S3 Amazon Redshift Amazon EMR Predictive data MODELLING
  • 38. •  146Billion Rows •  2 clusters: 1x 8 node 1 x 16 node dw1.xlarge •  13TBOf Compressed Data •  250m rows x 125 columns Rows Per Day Today
  • 39. Per  user,  Daily  Summary  (Over  200  Metrics)   Spend Tier In Game Behaviour Monetisation Device Tenure Balances
  • 40. Single  Player  View   Platform Spend BehaviourDevice Retention CountryLanguage Acquisition Channel Game Balances Operating System Castle Level Retention
  • 41. Modeling  and  predic%on   Best offer bundle Spend tier Churn risk Life time value Spend propensity Price optimisation
  • 42. Never had to worry about: •  Scalability •  Backing up •  Availability •  Upgrades •  Flexibility (ODBC etc.) •  Performance Summary •  Move towards more real-time processing •  Investigate machine learning •  AWS Mobile analytics auto-export to Redshift Next Steps