Migrating on-premises data operations to the cloud can have tremendous benefits. However, traditional data integration tools are not always able to handle extremely large data sets, the need for new data pipelines, and complex data transformations. Matillion ETL for Redshift, a fast, modern ETL/ELT tool, helps businesses migrate to the cloud quickly and easily by orchestrating data transfers into Amazon Redshift. Join us to learn how GE’s Power and Water Division used Matillion ETL for Redshift to extract, load, and transform billions of rows of data and implement their big data solution in record time.
Join us to learn:
• What distinguishes Matillion’s approach to data transformation from other ETL/ELT tools
• How to load and transform data quickly on Amazon Redshift
• Best practices for deploying a big data solution, gleaned from GE’s experience with AWS and Matillion
Who should attend:
VPs of Engineering, VPs of Development, Business Development Directors, Senior Development Managers, Senior Architects, and Data Architecture, Data Integration, and Data Science professionals
2. Today’s Presenters
Brandon Chavis, Solutions Architect, Amazon Web Services
Matthew Scullion, Matillion
Ryan Oattes, Enterprise Architect, GE Water
Free Trial Available in AWS Marketplace
3. Today’s Agenda
1. An overview of AWS and AWS Marketplace, with an emphasis on Amazon Redshift
2. The GE Water success story with AWS and Matillion
3. Matillion ETL for Redshift – product demo
4. Q&A/Discussion
4. Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
Amazon Redshift
a lot faster
a lot simpler
a lot cheaper
5. The Amazon Redshift view of data warehousing
Enterprise: 10x cheaper; easy to provision; higher DBA productivity
Big data: 10x faster; no programming; easily leverage BI tools, Hadoop, machine learning, streaming
SaaS: analysis inline with process flows; pay as you go, grow as you need; managed availability and disaster recovery
10. Benefit #1: Amazon Redshift is fast
Hardware optimized for I/O intensive workloads, 4 GB/sec/node
Enhanced networking, over 1 million packets/sec/node
Choice of storage type, instance size
Regular cadence of autopatched improvements
11. Benefit #2: Amazon Redshift is inexpensive
Ds2 (HDD), DW1.XL single node
                      Price per hour    Effective annual price per TB (compressed)
On demand             $0.850            $3,725
1-year reservation    $0.500            $2,190
3-year reservation    $0.228            $999
Dc1 (SSD), DW2.L single node
                      Price per hour    Effective annual price per TB (compressed)
On demand             $0.250            $13,690
1-year reservation    $0.161            $8,795
3-year reservation    $0.100            $5,500
Pricing is simple
Number of nodes x price/hour
No charge for leader node
No upfront costs
Pay as you go
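The pricing rule on this slide (number of nodes x price/hour, with no charge for the leader node) can be sketched as simple arithmetic. This is a minimal illustration, not an official calculator: the function names, the 10-node cluster size, and the 2 TB of storage per DW1.XL node are assumptions that do not appear on the slide, and the slide's own per-TB figures are rounded.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def cluster_hourly_cost(node_count, price_per_node_hour):
    """Hourly cluster cost: number of compute nodes x price per hour."""
    return node_count * price_per_node_hour


def annual_price_per_tb(price_per_node_hour, tb_per_node):
    """Effective annual price per TB for one node running all year."""
    return price_per_node_hour * HOURS_PER_YEAR / tb_per_node


# Hypothetical 10-node cluster at the on-demand DW2.L rate from the slide.
hourly = cluster_hourly_cost(10, 0.25)

# 1-year reserved DW1.XL rate from the slide; 2 TB per node is an assumption.
per_tb = annual_price_per_tb(0.50, 2.0)

print(f"Cluster cost per hour: ${hourly:.2f}")
print(f"Effective annual price per TB: ${per_tb:,.0f}")
```

Under the assumed 2 TB per node, the second figure lines up with the 1-year reservation row for DW1.XL; other rows on the slide may differ slightly because of rounding.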
12. Benefit #3: Amazon Redshift is fully managed
Continuous/incremental backups
Multiple copies within cluster
Continuous and incremental backups to Amazon S3
Continuous and incremental backups across regions
Streaming restore
[Diagram labels: Amazon S3 (Region 1), Amazon S3 (Region 2)]
13. Benefit #3: Amazon Redshift is fully managed
Fault tolerance
Disk failures
Node failures
Network failures
Availability Zone/region level disasters
14. Benefit #4: Security is built in
• Load encrypted from Amazon S3
• SSL to secure data in transit
• ECDHE for perfect forward secrecy
• Amazon VPC for network isolation
• Encryption to secure data at rest
– All blocks on disks and in Amazon S3 encrypted
– Block key, cluster key, master key (AES-256)
– On-premises HSM and AWS CloudHSM support
• Audit logging and AWS CloudTrail integration
• SOC 1/2/3, PCI-DSS, FedRAMP, BAA
[Diagram labels: Customer VPC, JDBC/ODBC, Internal VPC, 10 GigE (HPC), Ingestion, Backup, Restore]
15. Benefit #5: We innovate quickly
Well over 125 new features added since launch
Release every two weeks
Automatic patching
Service Launch (2/14)
PDX (4/2)
Temp Credentials (4/11)
DUB (4/25)
SOC1/2/3 (5/8)
Unload Encrypted Files
NRT (6/5)
JDBC Fetch Size (6/27)
Unload logs (7/5)
SHA1 Builtin (7/15)
4 byte UTF-8 (7/18)
Sharing snapshots (7/18)
Statement Timeout (7/22)
Timezone, Epoch, Autoformat (7/25)
WLM Timeout/Wildcards (8/1)
CRC32 Builtin, CSV, Restore Progress (8/9)
Resource Level IAM (8/9)
PCI (8/22)
UTF-8 Substitution (8/29)
JSON, Regex, Cursors (9/10)
Split_part, Audit tables (10/3)
SIN/SYD (10/8)
HSM Support (11/11)
Kinesis EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts, Cross Region Backup (11/13)
Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13)
EIP Support for VPC Clusters (12/28)
New query monitoring system tables and diststyle all (1/13)
Redshift on DW2 (SSD) Nodes (1/23)
Compression for COPY from SSH, Fetch size support for single node clusters, new system tables with commit stats, row_number(), strtol() and query termination (2/13)
Resize progress indicator & Cluster Version (3/21)
Regex_Substr, COPY from JSON (3/25)
50 slots, COPY from EMR, ECDHE ciphers (4/22)
3 new regex features, Unload to single file, FedRAMP (5/6)
Rename Cluster (6/2)
Copy from multiple regions, percentile_cont, percentile_disc (6/30)
Free Trial (7/1)
pg_last_unload_count (9/15)
AES-128 S3 encryption (9/29)
UTF-16 support (9/29)
16. Matillion ETL for Redshift and AWS
With Matillion ETL for Redshift on AWS Marketplace, customers enjoy a range of benefits when building data warehouses on Amazon Redshift quickly and simply:
• ELT architecture: uses the power of the Redshift cluster for transformation. Fast and always in touch with the data.
• Intuitive and productive: cloud-first and designed for AWS and Amazon Redshift. Makes leveraging Amazon Redshift simple and productive.
• Launched via AWS Marketplace: simple to install and scale, secure, integrated utility billing.
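The ELT architecture point above (load first, then transform using the database's own compute) can be illustrated with a small sketch. This is not Matillion's implementation or a Redshift client: Python's built-in sqlite3 stands in for the Redshift cluster so the example runs anywhere, and the table names, columns, and sample rows are hypothetical.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows untransformed, then transform them with SQL in the database."""
    # sqlite3 is only a stand-in for the warehouse (assumption for runnability).
    conn = sqlite3.connect(":memory:")

    # Staging table holds the data exactly as extracted (hypothetical schema).
    conn.execute("CREATE TABLE staging_readings (site TEXT, reading REAL)")
    conn.execute(
        "CREATE TABLE dwh_site_summary "
        "(site TEXT, reading_count INTEGER, avg_reading REAL)"
    )

    # Load: bulk insert with no client-side transformation.
    conn.executemany("INSERT INTO staging_readings VALUES (?, ?)", raw_rows)

    # Transform: a set-based INSERT ... SELECT executed by the database engine,
    # so the rows never leave the database to be reshaped.
    conn.execute(
        "INSERT INTO dwh_site_summary "
        "SELECT site, COUNT(*), AVG(reading) "
        "FROM staging_readings GROUP BY site"
    )

    result = conn.execute(
        "SELECT site, reading_count, avg_reading "
        "FROM dwh_site_summary ORDER BY site"
    ).fetchall()
    conn.close()
    return result


rows = [("plant_a", 10.0), ("plant_a", 20.0), ("plant_b", 5.0)]
summary = run_elt(rows)
print(summary)
```

The design choice being illustrated is that the transformation is expressed as SQL over a staging table, so it scales with the warehouse rather than with a separate ETL server.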
17. • General Electric has been in business for over 140 years, investing $5.4B annually in R&D (6% of revenue).
• Augmenting our Operational Technology depth with Digital.
• Focus on the Industrial Internet of Things, and on creating insights based on business and machine data.
• Knew we needed best-in-class partners to let us focus on what we do best; GE is migrating 9,000 workloads into AWS.
18. Our Challenge: Raise the bar on Data Warehouse scalability, integration, stability and development velocity.
• Needed scalability for machine and business data, as GE increasingly digitizes.
• Self-serve BI strategy meant we had to maintain our current compute capabilities.
• Increasingly critical dependencies require a rock-solid platform.
• Desired a more intuitive and accessible analytics solution.
19. Our solution: AWS and Matillion ETL for Redshift
[Architecture diagram labels: SAP; CDC Data Replication (HVR); Amazon Redshift Cluster, 10 x DC1 Nodes; Staging; DWH; Matillion ETL, M3.Large; ELT; Tableau]
21. Outcomes for GE:
• Preserved performance and added stability.
• Simplified operations with a managed Redshift solution.
• OPEX savings. No CAPEX required.
• Fast development from concept to reality.
• Highly resilient, seamless scaling.
23. Next steps and further information
Launch a free 14-day trial of Matillion ETL for Amazon Redshift on the AWS Marketplace
– https://aws.amazon.com/marketplace
Get tutorials and videos at the Matillion YouTube channel:
– https://www.youtube.com/user/MatillionVideos
Try Amazon Redshift for free. Get 750 free DC1.Large hours per month for 2 months.
– https://aws.amazon.com/redshift
Support and documentation
– https://redshiftsupport.matillion.com