SlideShare a Scribd company logo
1 of 29
Amazon Redshift
1
Author: Douglas Bernardini
What is Redshift?
• Cloud-Hosted data warehouse services: AWS
• Massive parallel processing (MPP)
• Analytics workloads on large scale datasets
• Stored by a column-oriented DBMS principle.
• Large scale datasets. Up petabytes
2
Features and Benefits
• Columnar storage
• Parallelizing queries
• Multiple nodes
• Custom JDBC and ODBC drivers
• Ready integraded:
• Amazon S3;
• Amazon DynamoDB;
• Amazon Elastic MapReduce;
• Amazon Kinesis
• Any SSH-enabled host.
• Fault Tolerant
• Automated Backups
• Fast Restores
• Secure:
• Encryption
• Network Isolation
• Audit and Compliance
• SQL friendly
3
MarketPlace
BI Tools
• Actian
• Actuate Corporation
• Birst
• Chartio
• ClearStory Data
• Dundas Data Visualization
• Infor
• Jaspersoft
• Jreport
• Logi Analytics
• Looker (Software)
• MicroStrategy
• Pentaho
• Periscope.io
Data Integrations Tools
• Attunity
• FlyData
• Informatica
• SnapLogic
• Talend
• Xplenty
4
• Qlik
• Redrock BI
• SAS (software)
• SiSense
• Spotfire
• Tableau Software
Data Load
5
DynamoDB Integration
6
DynamoDB Integration
7
Business Case
8
Data growing fast!
9
• Enterprise Data is growing at an exponential
rate
• Structured and Unstructured data
• Data requirements change rapidly
• Cost to maintain data is prohibitive
• Hardware not scalable
• Expensive to support
• Business agility suffers
• Reporting unable to change with the pace
of business
• Data silos create bottlenecks
Solution Proposal
10
• Leverage the flexibility of
Amazon Web Services
• Scalable
• Flexible
• Cost-Effective
• AWS Redshift
• Data Warehouse
• AWS S3
• Persistent Storage
• AWS Data Pipeline
• Data Orchestration and ETL
• AWS EC2 / MySQL
• Transaction Processing
• Qlik Sense Desktop
• Business Intelligence Reporting
AWS Redshift
11
Petabyte-Scale Data Warehouse
• Optimized for DW
• Columnar Storage
• Data Compression
• Zone Maps to reduce I/O
• Scalable
• Easily change # of Nodes
• 1-32 node configurations
• Cost-Efficient
• On-Demand pricing starts @ $.25/hr.
• Run as low as $1,000 per TB/yr.
AWS Redshift
12
Petabyte-Scale Data Warehouse
• Get Started in Minutes
• Web Console
• CLI
• Full Managed
• Fault Tolerant
• Automated Backups / Fast Restores
• Encryption
• Data at Rest – AES-256
• Can manage own keys
• Compatible
• SQL
• Data Integrations
AWS Simple Storage Service (S3)
13
Online File/Object Storage
• Durable
• Data redundantly stored across
multiple facilities/devices
• Available
• 99.99% availability
• Choose from different AWS regions
• Secure
• SSL – Data Transfer
• At Rest – Auto-Encrypted
• Scalable
• Flexible capacity based on data
demands
• Low Cost
• Pay for what you use
AWS Simple Storage Service (S3)
14
Reliable Simple
Scalable Low Cost
• Distributed Infrastructure
ensures activity completion
• Integrated with SNS for event
notifications
Data Processing and Transfer Platform
• Drag-and drop console
• Pre-built templates for other
AWS services
• Visual Pipeline editor
• Dispatch work to one machine
or many
• Serial and/or Parallel
processing
• Charged per Pipeline
• Frequency
• Volume
AWS Elastic Compute Cloud (EC2) + MySQL
15
Cloud Infrastructure for Applications & Development
• Flexible
• Linux and Windows virtual machines
• Supports multiple instance types, software packages, resource configs
• Elastic
• Increase/Decrease capacity within minutes
• Commission any number of server instances simultaneously
• Secure
• Security Groups / Network ACLs
• VPC / VPN
• Low Cost
• On-Demand / Reserved / Spot Instance options
Qlik Sense Desktop
16
Data Visualization / BI Tool
• Drag-and-drop Visualizations
• Smart Search
• Explore Multiple data sources in
single dashboard/report
• Access analytics on multiple device
types
• Collaborate and share insights within
reports
• Enables self-service simplicity
Architecture
17
Demo
18
Tech Demo
19
• During this demonstration, we will discuss the setup and execution of using Amazon Redshift as an on-
demand, cloud-based, data warehouse solution.
• Our sample data comes from the “Million Song Dataset” available from Columbia University -
http://labrosa.ee.columbia.edu/millionsong/
• The BI Tool that is used to create a business-focused dashboard is Qlik Sense Desktop, a Windows-
based desktop application - http://www.qlik.com/us/explore/products/sense
• In addition, the following services in the Amazon Web Services stack are used: Amazon Redshift,
Amazon S3, Pipeline, and EC2 (Linux AMI running MySQL serves as a transactional database for the
demo).
Demo Steps
1. Create new Linux AMI that will host
MySQL for transaction data processing.
• Start new Linux instance and update security groups
for MySQL accessibility
• Install MySQL
• Create new MySQL users, database, and populate
with demonstration dataset (using MySQL
Workbench)
2. Create new S3 bucket for Pipeline ETL
processes
3. Create Redshift Cluster (data warehouse)
• Instantiate cluster
• Connect using SQL Workbench (via JDBC)
• Create initial data table
4. Create AWS Pipeline(s) for data processing
• MySQL -> S3
• Activate Pipeline for initial ETL from MySQL to S3
• S3 -> Redshift
• Activate Pipeline for initial ETL from S3 to Redshift
5. Install Qlik Sense Desktop
• Install Redshift ODBC Drivers locally on desktop
• Create Qlik Sense “Report” (Included in FP
submission for simplicity). Verify initial data in
report.
6. Solution Demonstration
(Using Amazon CLI – Command Line Interface)
• Simulate transactional data load in MySQL
• Verify new data (record count) in MySQL using
MySQL Workbench
• Delete initial data in S3 bucket (from Round 1)
• Trigger AWS Pipeline that loads data to S3 from
MySQL
• Verify data load (CSV file) in S3 bucket
• Trigger AWS Pipeline that loads data to Redshift
from S3.
• Verify data load in Redshift (using SQL Workbench)
• Refresh Qlik report to view analytics of initial data
load.
20
Linux AMI hosts MySQL
21
Redshift Cluster
22
Pipes
23
QlikSense Desktop
24
Add New data into MySQL
25
Insert songs_data
Count (*)
Checking Redshift
26
Select count (*) from song_data
Qlik Update
27
Results
28
• Amazon Web Services provides a powerful
platform to extend on-premise Infrastructure to
the cloud
• Enables massive data consolidation
• Efficient ETL orchestration & workflow
• Simplifies resource management and drives
down computing costs across multiple
services
• Changing needs of Business Executives can be
made quickly and efficiently
• AWS supports industry standard data
source connections
• Existing Reporting/Dashboards can
consume AWS Redshift data with no code
changes
douglas.bernardini@d2-data.com
Questions?
29

More Related Content

What's hot

(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014Amazon Web Services
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysisAmazon Web Services
 
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Amazon Web Services
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsAmazon Web Services
 
Scaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique VisitorsScaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique VisitorsYelp Engineering
 
Getting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big DataGetting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big DataQubole
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Amazon Web Services
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAmazon Web Services
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Amazon Web Services
 
2017 AWS DB Day | Amazon Redshift 자세히 살펴보기
2017 AWS DB Day | Amazon Redshift 자세히 살펴보기2017 AWS DB Day | Amazon Redshift 자세히 살펴보기
2017 AWS DB Day | Amazon Redshift 자세히 살펴보기Amazon Web Services Korea
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsAmazon Web Services
 
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)Amazon Web Services
 
ENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the CloudENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the CloudAmazon Web Services
 
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...Modernize Legacy and Enterprise Application Through Implementation of Cloud N...
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...Amazon Web Services
 

What's hot (20)

Real-Time Event Processing
Real-Time Event ProcessingReal-Time Event Processing
Real-Time Event Processing
 
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
 
Scaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique VisitorsScaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique Visitors
 
Getting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big DataGetting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big Data
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
 
Aws meetup 20190427
Aws meetup 20190427Aws meetup 20190427
Aws meetup 20190427
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
 
AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
2017 AWS DB Day | Amazon Redshift 자세히 살펴보기
2017 AWS DB Day | Amazon Redshift 자세히 살펴보기2017 AWS DB Day | Amazon Redshift 자세히 살펴보기
2017 AWS DB Day | Amazon Redshift 자세히 살펴보기
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis Analytics
 
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
 
Big Data on AWS
Big Data on AWSBig Data on AWS
Big Data on AWS
 
ENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the CloudENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the Cloud
 
Introduction to AWS Glue
Introduction to AWS Glue Introduction to AWS Glue
Introduction to AWS Glue
 
AWS Real-Time Event Processing
AWS Real-Time Event ProcessingAWS Real-Time Event Processing
AWS Real-Time Event Processing
 
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...Modernize Legacy and Enterprise Application Through Implementation of Cloud N...
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...
 

Similar to AWS Redshift Data Warehouse

AWS 201 - A Walk through the AWS Cloud: What's New with AWS
AWS 201 - A Walk through the AWS Cloud: What's New with AWSAWS 201 - A Walk through the AWS Cloud: What's New with AWS
AWS 201 - A Walk through the AWS Cloud: What's New with AWSAmazon Web Services
 
Amazon AWS vs Azure Cloud vs Kubernetes
Amazon AWS vs Azure Cloud vs KubernetesAmazon AWS vs Azure Cloud vs Kubernetes
Amazon AWS vs Azure Cloud vs KubernetesStridely Solutions
 
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...SnapLogic
 
Cloud Computing - Challenges & Opportunities
Cloud Computing - Challenges & OpportunitiesCloud Computing - Challenges & Opportunities
Cloud Computing - Challenges & OpportunitiesOwen Cutajar
 
AWS Webcast - Attunity Couchsurfing
AWS Webcast - Attunity CouchsurfingAWS Webcast - Attunity Couchsurfing
AWS Webcast - Attunity CouchsurfingAmazon Web Services
 
Microservices Manchester: Serverless Architectures By Rafal Gancarz
Microservices Manchester: Serverless Architectures By Rafal GancarzMicroservices Manchester: Serverless Architectures By Rafal Gancarz
Microservices Manchester: Serverless Architectures By Rafal GancarzOpenCredo
 
SAP on Amazon web services
SAP on Amazon web servicesSAP on Amazon web services
SAP on Amazon web servicescloudnonstop
 
Grails in the Cloud (2013)
Grails in the Cloud (2013)Grails in the Cloud (2013)
Grails in the Cloud (2013)Meni Lubetkin
 
AWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAmazon Web Services
 
Data Analysis on AWS
Data Analysis on AWSData Analysis on AWS
Data Analysis on AWSPaolo latella
 
Architecting for AWS Cloud - let's do it right!
Architecting for AWS Cloud - let's do it right!Architecting for AWS Cloud - let's do it right!
Architecting for AWS Cloud - let's do it right!Misha Hanin
 
AWS Webcast - Library Systems on the AWS Cloud
AWS Webcast - Library Systems on the AWS CloudAWS Webcast - Library Systems on the AWS Cloud
AWS Webcast - Library Systems on the AWS CloudAmazon Web Services
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...Amazon Web Services
 
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...Amazon Web Services
 

Similar to AWS Redshift Data Warehouse (20)

AWS 201 - A Walk through the AWS Cloud: What's New with AWS
AWS 201 - A Walk through the AWS Cloud: What's New with AWSAWS 201 - A Walk through the AWS Cloud: What's New with AWS
AWS 201 - A Walk through the AWS Cloud: What's New with AWS
 
Amazon AWS vs Azure Cloud vs Kubernetes
Amazon AWS vs Azure Cloud vs KubernetesAmazon AWS vs Azure Cloud vs Kubernetes
Amazon AWS vs Azure Cloud vs Kubernetes
 
Big data on aws
Big data on awsBig data on aws
Big data on aws
 
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
 
Best of re:Invent
Best of re:InventBest of re:Invent
Best of re:Invent
 
Cloud Computing - Challenges & Opportunities
Cloud Computing - Challenges & OpportunitiesCloud Computing - Challenges & Opportunities
Cloud Computing - Challenges & Opportunities
 
The Best of re:invent 2016
The Best of re:invent 2016The Best of re:invent 2016
The Best of re:invent 2016
 
AWS Webcast - Attunity Couchsurfing
AWS Webcast - Attunity CouchsurfingAWS Webcast - Attunity Couchsurfing
AWS Webcast - Attunity Couchsurfing
 
Microservices Manchester: Serverless Architectures By Rafal Gancarz
Microservices Manchester: Serverless Architectures By Rafal GancarzMicroservices Manchester: Serverless Architectures By Rafal Gancarz
Microservices Manchester: Serverless Architectures By Rafal Gancarz
 
SAP on Amazon web services
SAP on Amazon web servicesSAP on Amazon web services
SAP on Amazon web services
 
Grails in the Cloud (2013)
Grails in the Cloud (2013)Grails in the Cloud (2013)
Grails in the Cloud (2013)
 
AWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the Cloud
 
Data Analysis on AWS
Data Analysis on AWSData Analysis on AWS
Data Analysis on AWS
 
Débuter sur le cloud AWS
Débuter sur le cloud AWSDébuter sur le cloud AWS
Débuter sur le cloud AWS
 
CMS on AWS Deep Dive
CMS on AWS Deep DiveCMS on AWS Deep Dive
CMS on AWS Deep Dive
 
[Jun AWS 201] Technical Workshop
[Jun AWS 201] Technical Workshop[Jun AWS 201] Technical Workshop
[Jun AWS 201] Technical Workshop
 
Architecting for AWS Cloud - let's do it right!
Architecting for AWS Cloud - let's do it right!Architecting for AWS Cloud - let's do it right!
Architecting for AWS Cloud - let's do it right!
 
AWS Webcast - Library Systems on the AWS Cloud
AWS Webcast - Library Systems on the AWS CloudAWS Webcast - Library Systems on the AWS Cloud
AWS Webcast - Library Systems on the AWS Cloud
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
 
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
 

More from Douglas Bernardini

Top reasons to choose SAP hana
Top reasons to choose SAP hanaTop reasons to choose SAP hana
Top reasons to choose SAP hanaDouglas Bernardini
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedDouglas Bernardini
 
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRHadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRDouglas Bernardini
 
Finance month closing with HANA
Finance month closing with HANAFinance month closing with HANA
Finance month closing with HANADouglas Bernardini
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideDouglas Bernardini
 
SAP Business Objects - Lopes Supermarket
SAP   Business Objects - Lopes SupermarketSAP   Business Objects - Lopes Supermarket
SAP Business Objects - Lopes SupermarketDouglas Bernardini
 
SAP - Business Objects - Ri happy
SAP - Business Objects - Ri happySAP - Business Objects - Ri happy
SAP - Business Objects - Ri happyDouglas Bernardini
 
Retail: Big data e Omni-Channel
Retail: Big data e Omni-ChannelRetail: Big data e Omni-Channel
Retail: Big data e Omni-ChannelDouglas Bernardini
 
Granular Access Control Using Cell Level Security In Accumulo
Granular Access Control  Using Cell Level Security  In Accumulo             Granular Access Control  Using Cell Level Security  In Accumulo
Granular Access Control Using Cell Level Security In Accumulo Douglas Bernardini
 
Proposta aderencia drogaria onofre
Proposta aderencia   drogaria onofreProposta aderencia   drogaria onofre
Proposta aderencia drogaria onofreDouglas Bernardini
 

More from Douglas Bernardini (20)

Top reasons to choose SAP hana
Top reasons to choose SAP hanaTop reasons to choose SAP hana
Top reasons to choose SAP hana
 
The REAL face of Big Data
The REAL face of Big DataThe REAL face of Big Data
The REAL face of Big Data
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integrated
 
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRHadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
 
SAP HORTONWORKS
SAP HORTONWORKSSAP HORTONWORKS
SAP HORTONWORKS
 
R-language
R-languageR-language
R-language
 
Splunk
SplunkSplunk
Splunk
 
Finance month closing with HANA
Finance month closing with HANAFinance month closing with HANA
Finance month closing with HANA
 
RDBMS x NoSQL
RDBMS x NoSQLRDBMS x NoSQL
RDBMS x NoSQL
 
SAP - SOLUTION MANAGER
SAP - SOLUTION MANAGER SAP - SOLUTION MANAGER
SAP - SOLUTION MANAGER
 
MS-SQL SERVER ARCHITECTURE
MS-SQL SERVER ARCHITECTUREMS-SQL SERVER ARCHITECTURE
MS-SQL SERVER ARCHITECTURE
 
DBA oracle
DBA oracleDBA oracle
DBA oracle
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
SAP Business Objects - Lopes Supermarket
SAP   Business Objects - Lopes SupermarketSAP   Business Objects - Lopes Supermarket
SAP Business Objects - Lopes Supermarket
 
SAP - Business Objects - Ri happy
SAP - Business Objects - Ri happySAP - Business Objects - Ri happy
SAP - Business Objects - Ri happy
 
Hadoop on retail
Hadoop on retailHadoop on retail
Hadoop on retail
 
Retail: Big data e Omni-Channel
Retail: Big data e Omni-ChannelRetail: Big data e Omni-Channel
Retail: Big data e Omni-Channel
 
Granular Access Control Using Cell Level Security In Accumulo
Granular Access Control  Using Cell Level Security  In Accumulo             Granular Access Control  Using Cell Level Security  In Accumulo
Granular Access Control Using Cell Level Security In Accumulo
 
Proposta aderencia drogaria onofre
Proposta aderencia   drogaria onofreProposta aderencia   drogaria onofre
Proposta aderencia drogaria onofre
 
SAP-Solution-Manager
SAP-Solution-ManagerSAP-Solution-Manager
SAP-Solution-Manager
 

Recently uploaded

Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...ttt fff
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 

Recently uploaded (20)

Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 

AWS Redshift Data Warehouse

  • 2. What is Redshift? • Cloud-Hosted data warehouse services: AWS • Massive parallel processing (MPP) • Analytics workloads on large scale datasets • Stored by a column-oriented DBMS principle. • Large scale datasets. Up petabytes 2
  • 3. Features and Benefits • Columnar storage • Parallelizing queries • Multiple nodes • Custom JDBC and ODBC drivers • Ready integraded: • Amazon S3; • Amazon DynamoDB; • Amazon Elastic MapReduce; • Amazon Kinesis • Any SSH-enabled host. • Fault Tolerant • Automated Backups • Fast Restores • Secure: • Encryption • Network Isolation • Audit and Compliance • SQL friendly 3
  • 4. MarketPlace BI Tools • Actian • Actuate Corporation • Birst • Chartio • ClearStory Data • Dundas Data Visualization • Infor • Jaspersoft • Jreport • Logi Analytics • Looker (Software) • MicroStrategy • Pentaho • Periscope.io Data Integrations Tools • Attunity • FlyData • Informatica • SnapLogic • Talend • Xplenty 4 • Qlik • Redrock BI • SAS (software) • SiSense • Spotfire • Tableau Software
  • 9. Data growing fast! 9 • Enterprise Data is growing at an exponential rate • Structured and Unstructured data • Data requirements change rapidly • Cost to maintain data is prohibitive • Hardware not scalable • Expensive to support • Business agility suffers • Reporting unable to change with the pace of business • Data silos create bottlenecks
  • 10. Solution Proposal 10 • Leverage the flexibility of Amazon Web Services • Scalable • Flexible • Cost-Effective • AWS Redshift • Data Warehouse • AWS S3 • Persistent Storage • AWS Data Pipeline • Data Orchestration and ETL • AWS EC2 / MySQL • Transaction Processing • Qlik Sense Desktop • Business Intelligence Reporting
  • 11. AWS Redshift 11 Petabyte-Scale Data Warehouse • Optimized for DW • Columnar Storage • Data Compression • Zone Maps to reduce I/O • Scalable • Easily change # of Nodes • 1-32 node configurations • Cost-Efficient • On-Demand pricing starts @ $.25/hr. • Run as low as $1,000 per TB/yr.
  • 12. AWS Redshift 12 Petabyte-Scale Data Warehouse • Get Started in Minutes • Web Console • CLI • Full Managed • Fault Tolerant • Automated Backups / Fast Restores • Encryption • Data at Rest – AES-256 • Can manage own keys • Compatible • SQL • Data Integrations
  • 13. AWS Simple Storage Service (S3) 13 Online File/Object Storage • Durable • Data redundantly stored across multiple facilities/devices • Available • 99.99% availability • Choose from different AWS regions • Secure • SSL – Data Transfer • At Rest – Auto-Encrypted • Scalable • Flexible capacity based on data demands • Low Cost • Pay for what you use
  • 14. AWS Simple Storage Service (S3) 14 Reliable Simple Scalable Low Cost • Distributed Infrastructure ensures activity completion • Integrated with SNS for event notifications Data Processing and Transfer Platform • Drag-and drop console • Pre-built templates for other AWS services • Visual Pipeline editor • Dispatch work to one machine or many • Serial and/or Parallel processing • Charged per Pipeline • Frequency • Volume
  • 15. AWS Elastic Compute Cloud (EC2) + MySQL 15 Cloud Infrastructure for Applications & Development • Flexible • Linux and Windows virtual machines • Supports multiple instance types, software packages, resource configs • Elastic • Increase/Decrease capacity within minutes • Commission any number of server instances simultaneously • Secure • Security Groups / Network ACLs • VPC / VPN • Low Cost • On-Demand / Reserved / Spot Instance options
  • 16. Qlik Sense Desktop 16 Data Visualization / BI Tool • Drag-and-drop Visualizations • Smart Search • Explore Multiple data sources in single dashboard/report • Access analytics on multiple device types • Collaborate and share insights within reports • Enables self-service simplicity
  • 19. Tech Demo 19 • During this demonstration, we will discuss the setup and execution of using Amazon Redshift as an on- demand, cloud-based, data warehouse solution. • Our sample data comes from the “Million Song Dataset” available from Columbia University - http://labrosa.ee.columbia.edu/millionsong/ • The BI Tool that is used to create a business-focused dashboard is Qlik Sense Desktop, a Windows- based desktop application - http://www.qlik.com/us/explore/products/sense • In addition, the following services in the Amazon Web Services stack are used: Amazon Redshift, Amazon S3, Pipeline, and EC2 (Linux AMI running MySQL serves as a transactional database for the demo).
  • 20. Demo Steps 1. Create new Linux AMI that will host MySQL for transaction data processing. • Start new Linux instance and update security groups for MySQL accessibility • Install MySQL • Create new MySQL users, database, and populate with demonstration dataset (using MySQL Workbench) 2. Create new S3 bucket for Pipeline ETL processes 3. Create Redshift Cluster (data warehouse) • Instantiate cluster • Connect using SQL Workbench (via JDBC) • Create initial data table 4. Create AWS Pipeline(s) for data processing • MySQL -> S3 • Activate Pipeline for initial ETL from MySQL to S3 • S3 -> Redshift • Activate Pipeline for initial ETL from S3 to Redshift 5. Install Qlik Sense Desktop • Install Redshift ODBC Drivers locally on desktop • Create Qlik Sense “Report” (Included in FP submission for simplicity). Verify initial data in report. 6. Solution Demonstration (Using Amazon CLI – Command Line Interface) • Simulate transactional data load in MySQL • Verify new data (record count) in MySQL using MySQL Workbench • Delete initial data in S3 bucket (from Round 1) • Trigger AWS Pipeline that loads data to S3 from MySQL • Verify data load (CSV file) in S3 bucket • Trigger AWS Pipeline that loads data to Redshift from S3. • Verify data load in Redshift (using SQL Workbench) • Refresh Qlik report to view analytics of initial data load. 20
  • 21. Linux AMI hosts MySQL 21
  • 25. Add New data into MySQL 25 Insert songs_data Count (*)
  • 26. Checking Redshift 26 Select count (*) from song_data
  • 28. Results 28 • Amazon Web Services provides a powerful platform to extend on-premise Infrastructure to the cloud • Enables massive data consolidation • Efficient ETL orchestration & workflow • Simplifies resource management and drives down computing costs across multiple services • Changing needs of Business Executives can be made quickly and efficiently • AWS supports industry standard data source connections • Existing Reporting/Dashboards can consume AWS Redshift data with no code changes