Data warehousing costs have continually risen with the explosion of Big Data. To explore the most cost-effective data warehousing techniques, learn from cloud experts at Amazon and Informatica.
Learn more: http://www.informaticacloud.com/amazon-redshift
Amazon Redshift is a petabyte-scale cloud-based data warehouse that lets you provision multiple database nodes on demand and offload raw data from on-premises databases for more cost-effective data warehousing. Getting this data into Redshift is easy with Informatica Cloud. In this interactive webinar, you’ll learn:
-How Amazon Redshift is changing the economics of data warehousing
-Why Big Data integration and management is a strategic imperative within enterprises
-How cloud integration makes cloud data warehousing even more cost effective
At Informatica, our goal is to unlock your information potential. Join us with featured guest speakers from Amazon for this interactive webinar.
3. Informatica: The Information Management Leader
B2B Data Exchange
Informatica supports the requirements of cross-organizational data exchange, so users can apply familiar and trusted data integration tools and techniques to the growing practice of B2B data integration.
Cloud Data Integration | Enterprise Data Integration
Complex Event Processing
Informatica received high praise from customers for its services. For deployments involving systems-monitoring use cases, Informatica offers a five-day stand-up of RulePoint.
Ultra Messaging
In spite of the new entrants, Informatica remains the market leader in this highly demanding part of the messaging market.
Data Quality | Master Data Management | Application ILM
4. Informatica Cloud: our fastest growing product line
Today’s Focus: Cloud Data Integration
5. Informatica Cloud and Amazon Redshift: Enabling cost-effective data warehousing
• Redshift Connector pre-release announced in February
• General availability this month (August)
InformaticaCloud.com/Amazon-Redshift
7. AWS Database Services
• Amazon RDS: Fully managed SQL database service for OLTP workloads
• Amazon DynamoDB: Fully managed NoSQL service for massively scalable, high-throughput, low-latency workloads
• Amazon Redshift: Fully managed, fast and powerful, petabyte-scale data warehouse service
• Amazon ElastiCache: Fully managed Memcached-compliant in-memory caching service
8. We set out to build…
A fast and powerful, petabyte-scale data warehouse that is:
A Lot Faster
A Lot Cheaper
A Lot Simpler
Amazon Redshift
9. Data warehousing done the AWS way
• Pay as you go, no up front costs
• Fast, cheap, easy to use
• SQL
• Easy to provision
10. Common Customer Use Cases
Traditional Enterprise DW:
• Reduce costs by extending DW rather than adding HW
• Migrate completely from existing DW systems
• Respond faster to business; provision in minutes
Companies with Big Data:
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools
SaaS Companies:
• Add analytic functionality to applications
• Scale DW capacity as demand grows
• Reduce HW & SW costs by an order of magnitude
11. Progress Since Launch on Feb 14, 2013
• Fastest growing service in AWS history
• Well over 1,000 customers; adding over 100 per week
• Obtained SOC1 & SOC2 certification with more in progress
• Deployed in US East (N. Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo)
• Additional global regions coming soon
12. Amazon Redshift Customers
• Airbnb: 5x – 20x reduction in query times; 4x cost reduction over HIVE
• Accordant Media: 20x – 40x reduction in query times
• Meteor Entertainment: queries across millions of rows running in under 10 seconds
• Nokia: 50% reduction in costs, 2x improvement in query times
13. Amazon Redshift Customer: bit.ly
“When we want to answer a question with Redshift, we just write a SQL query and get an answer within a few minutes – if not seconds.”
- Sean O’Connor, Engineer at bit.ly
bit.ly provides social link-sharing analytics, managing over 300 million shortens and 5 billion clicks each month.
14. Amazon Redshift Customer: HasOffers
“Amazon Redshift introduces a major opportunity to improve the performance of our real-time reporting, allowing us to run queries up to 50 times faster than our current OLAP solution.”
- Niek Sanders, VP of Engineering, HasOffers
HasOffers records and reports billions of desktop and mobile interactions for performance marketers.
15. Amazon Redshift Customer: Infor
“This is the formula for fast and broad adoption, where customers can get consistent, accurate, and useful data fast - in weeks, not months or years.”
- Ali Shadman, SVP, Business Cloud & Upgrades, Infor
Infor is the world’s third largest ERP vendor, serving over 70,000 customers in 194 countries.
16. Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
• With row storage you do unnecessary I/O
• To get the total amount, you have to read everything
17. Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• With column storage, you only read the data you need
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
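The row-versus-column comparison above can be sketched in a few lines of Python. This is an illustration of the principle using the slide's four-row table, not Redshift's actual storage engine: to answer SELECT SUM(Amount), a row store must scan every field of every row, while a column store touches only the one column it needs.

```python
# Illustrative sketch (not Redshift internals): compare bytes scanned
# to answer SELECT SUM(Amount) under row vs. column layout.
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]

# Row storage: every field of every row is read to reach "Amount".
row_bytes = sum(len(str(field)) for row in rows for field in row)

# Column storage: only the "Amount" column is read.
amounts = [row[3] for row in rows]
col_bytes = sum(len(str(a)) for a in amounts)

total = sum(amounts)
print(total, row_bytes, col_bytes)  # 1250 40 12
```

Even on four tiny rows the column scan touches less than a third of the bytes; on wide tables with billions of rows the gap is far larger.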
18. Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Columnar compression saves space and reduces I/O
• Amazon Redshift analyzes and compresses your data
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
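The `analyze compression` output above suggests delta encoding for `listid`. A minimal sketch of the idea behind delta encoding (an illustration only, not Redshift's implementation): store the first value plus successive differences, which stay small for monotonically increasing IDs and therefore compress well.

```python
# Illustrative sketch of delta encoding, one of the encodings
# Redshift's ANALYZE COMPRESSION can recommend for a column.
def delta_encode(values):
    """Store the first value, then each value's difference from the last."""
    if not values:
        return []
    out = [values[0]]
    out.extend(b - a for a, b in zip(values, values[1:]))
    return out

def delta_decode(encoded):
    """Rebuild the original column by accumulating the deltas."""
    vals, running = [], 0
    for d in encoded:
        running += d
        vals.append(running)
    return vals

listids = [1000, 1001, 1002, 1005, 1006]   # hypothetical listid values
encoded = delta_encode(listids)
print(encoded)  # [1000, 1, 1, 3, 1]
```

The deltas fit in far fewer bits than the raw values, which is why sorted or sequential columns are good candidates for this encoding.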
19. Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Keep track of the minimum and maximum value for each block
• Skip over blocks that don’t contain the data needed for a given query
• Minimize unnecessary I/O
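The zone-map idea above can be sketched as follows. This is an illustration only: the block size and column values are made up, and real Redshift blocks are far larger, but the mechanism is the same: consult per-block min/max before reading a block, and skip blocks that cannot match.

```python
# Illustrative sketch of zone maps: per-block min/max values let a
# scan skip blocks that cannot contain the target value.
BLOCK_ROWS = 4   # hypothetical tiny block size for the example

def build_zone_map(column):
    """Split a column into blocks and record (min, max, rows) per block."""
    blocks = [column[i:i + BLOCK_ROWS] for i in range(0, len(column), BLOCK_ROWS)]
    return [(min(b), max(b), b) for b in blocks]

def scan_equals(zone_map, target):
    """Return matching values, counting how many blocks were actually read."""
    blocks_read, hits = 0, []
    for lo, hi, block in zone_map:
        if lo <= target <= hi:          # otherwise the block is skipped entirely
            blocks_read += 1
            hits.extend(v for v in block if v == target)
    return hits, blocks_read

dateids = [110, 112, 115, 118, 203, 204, 208, 209, 301, 303, 307, 309]
zm = build_zone_map(dateids)
hits, read = scan_equals(zm, 204)
print(hits, read)  # [204] 1  -> two of three blocks skipped
```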
20. Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Use direct-attached storage to maximize throughput
• Hardware optimized for high-performance data processing
• Large block sizes to make the most of each read
• Amazon Redshift manages durability for you
21. Amazon Redshift architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB
• Single node version available
[Diagram: JDBC/ODBC clients connect to the leader node; compute nodes communicate over a 10 GigE (HPC) network; ingestion, backup, and restore go through Amazon S3]
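Because the leader node is a standard SQL endpoint, ordinary PostgreSQL-compatible JDBC/ODBC/libpq drivers can connect to it. A minimal sketch of building a libpq-style connection string (the cluster endpoint and credentials below are hypothetical placeholders; 5439 is Redshift's default port):

```python
# Hedged sketch: the leader node speaks the PostgreSQL wire protocol,
# so any libpq-compatible driver can consume this connection string.
def redshift_dsn(host, dbname, user, password, port=5439):
    """Build a keyword/value connection string for a Redshift cluster."""
    return (f"host={host} port={port} dbname={dbname} "
            f"user={user} password={password} sslmode=require")

dsn = redshift_dsn(
    "examplecluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical
    dbname="dev", user="masteruser", password="example-password",
)
print(dsn)
```

The resulting string can be passed to, for example, `psycopg2.connect(dsn)`; `sslmode=require` matches the SSL-in-transit point on the security slide below.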
22. Amazon Redshift runs on optimized hardware
HS1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB compressed user storage, 2 GB/sec scan rate
HS1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB compressed user storage
• Optimized for I/O intensive workloads
• High disk density
• Runs in HPC - fast network
• HS1.8XL available on Amazon EC2
23. Amazon Redshift lets you start small and grow big
Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
• Single Node (2 TB)
• Cluster of 2–32 nodes (4 TB – 64 TB)
Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
• Cluster of 2–100 nodes (32 TB – 1.6 PB)
Note: nodes not to scale
24. Amazon Redshift is priced to let you analyze all your data
Simple Pricing: Number of Nodes x Cost per Hour
• No charge for the Leader Node
• No upfront costs
• Pay as you go

                     Price per Hour         Effective Hourly   Effective Annual
                     (HS1.XL Single Node)   Price per TB       Price per TB
On-Demand            $0.850                 $0.425             $3,723
1 Year Reservation   $0.500                 $0.250             $2,190
3 Year Reservation   $0.228                 $0.114             $999
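The table's two derived columns follow from simple arithmetic: an HS1.XL node holds 2 TB, so dividing the hourly node price by 2 gives the effective hourly price per TB, and multiplying that by the 8,760 hours in a year gives the annual figure. A quick check:

```python
# Sanity check of the pricing table's arithmetic.
TB_PER_XL_NODE = 2          # an HS1.XL node holds 2 TB of compressed storage
HOURS_PER_YEAR = 24 * 365   # 8,760

def effective_prices(node_price_per_hour):
    """Return (hourly $/TB, annual $/TB) for a single HS1.XL node."""
    per_tb_hour = node_price_per_hour / TB_PER_XL_NODE
    per_tb_year = per_tb_hour * HOURS_PER_YEAR
    return round(per_tb_hour, 3), round(per_tb_year)

print(effective_prices(0.850))  # (0.425, 3723)
print(effective_prices(0.500))  # (0.25, 2190)
print(effective_prices(0.228))  # (0.114, 999)
```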
25. Amazon Redshift is easy to use
• Provision in minutes
• Monitor query performance
• Point and click resize
• Built in security
• Automatic backups
Slides not intended for redistribution.
26. Amazon Redshift has security built-in
• SSL to secure data in transit
• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disks and in Amazon S3 encrypted
• No direct access to compute nodes
• Amazon VPC support
[Diagram: the cluster runs inside an internal security group within the customer VPC; clients connect via JDBC/ODBC; 10 GigE (HPC) network; ingestion, backup, and restore paths]
27. Amazon Redshift continuously backs up your data and recovers from failures
• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times
• Backups to Amazon S3 are continuous, automatic, and incremental
– Designed for eleven nines of durability
• Continuous monitoring and automated recovery from failures of drives and nodes
• Able to restore snapshots to any Availability Zone within a region
28. Amazon Redshift works with your existing analysis tools
[Diagram: analysis and BI tool logos connecting to Amazon Redshift via JDBC/ODBC]
More coming soon…
29. Amazon Redshift integrates with multiple data sources
• Amazon Elastic MapReduce
• Amazon DynamoDB
• Amazon Elastic Compute Cloud (EC2)
• AWS Storage Gateway
• Amazon Simple Storage Service (S3)
• Corporate Data Center
• Amazon Relational Database Service (RDS)
34. Best practices to remember…
• The Amazon S3 bucket that holds the data files must be created in the same region as your cluster
• Files are deleted from the Amazon S3 bucket when the upload is complete
• Choose a batch size where the number of batches matches the number of slices in your cluster
– Each XL node has 2 slices; each 8XL node has 16
– If you have a 2-node XL cluster and 40,000 rows of data, choose a batch size of 10,000
• The Informatica Cloud Redshift connector can maximize Amazon’s parallel processing capabilities this way
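The batch-size rule above is simple division: total rows over total slices. A small sketch (per-node slice counts taken from the slide):

```python
# Sketch of the sizing rule: pick a batch size so the number of
# batches equals the number of slices in the cluster.
import math

SLICES_PER_NODE = {"XL": 2, "8XL": 16}   # from the slide

def batch_size(total_rows, node_type, node_count):
    """Rows per batch so that one batch lands on each slice."""
    slices = SLICES_PER_NODE[node_type] * node_count
    return math.ceil(total_rows / slices)

# The slide's example: a 2-node XL cluster loading 40,000 rows.
print(batch_size(40_000, "XL", 2))   # 10000
```

With one batch per slice, every slice does an equal share of the load in parallel instead of some slices sitting idle.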
35. Informatica Cloud Amazon Redshift demonstration
[Diagram: the Informatica Cloud Secure Agent runs behind the firewall; Informatica Cloud stores the metadata mappings]
1. Authenticate and retrieve the Data Synchronization Task
2. Retrieve Account Data
3. Perform lookup on SLA level
4. Put Account Data and SLA Level into a flat file
5. Transfer the compressed flat file to Amazon S3
6. Initiate load from Amazon S3
7. Load data into Amazon Redshift
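Steps 6–7 above correspond to issuing a Redshift COPY command against S3, which loads the compressed file in parallel across slices. A hedged sketch of constructing such a command (the table name, bucket path, and credentials are hypothetical placeholders, and in practice the Informatica Cloud connector generates this for you):

```python
# Hedged sketch: build the Redshift COPY statement that pulls a
# gzipped, comma-delimited flat file from Amazon S3.
def build_copy_command(table, s3_path, access_key, secret_key):
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"CREDENTIALS 'aws_access_key_id={access_key};"
        f"aws_secret_access_key={secret_key}' "
        "GZIP DELIMITER ',';"
    )

# All values below are hypothetical placeholders.
sql = build_copy_command(
    "account_sla", "s3://example-bucket/account_sla.csv.gz",
    "AKIAEXAMPLE", "secretexample",
)
print(sql)
```

The statement would then be executed through the same JDBC/ODBC endpoint as any other SQL; COPY is the parallel-load path, which is why staging through S3 beats row-by-row inserts.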
36. PowerCenter Mappings and Informatica Cloud
• If you want to reuse your existing PowerCenter mappings with Informatica Cloud and Redshift, you have two options:
1. Use the PowerCenter Repository Manager to export your existing workflows and import them into Informatica Cloud using the PowerCenter Tasks feature.
Or…
2. Keep your existing mappings in PowerCenter and stage the data. Create a DSS task in Informatica Cloud to move the data from the staging area to Redshift; this task can be managed from PowerCenter.
38. Next Steps
• Get started with Amazon Redshift
• Get started with Informatica Cloud
• InformaticaCloud.com
• Learn more about our Redshift Connector
• InformaticaCloud.com/Amazon-Redshift
Speaker notes:
• Announced Redshift: provision multiple database nodes on demand; start large petabyte-scale data warehousing projects sooner; offload raw data from on-premises databases for cost-effective processing.
• Use cases: use Amazon Redshift for easy scalability; migrate completely from existing DW systems to Amazon Redshift; analyze data that was previously too expensive to put into a DW; deploy Redshift because provisioning existing DW systems takes months; replace HIVE with Amazon Redshift to save money.
• Encryption enhancements.
• Customers: Airbnb, 5x – 20x reduction in query times and 4x reduction in cost over HIVE; Accordant Media, 20x – 40x reduction in query times; Meteor Entertainment, queries across millions of rows running in under 10 seconds; Nokia, 50% reduction in costs and 2x improvement in query times.
• bit.ly: queries across billions of rows running in under 1 minute.
• Infor: using Amazon Redshift to power its upcoming SkyVault product; fully managed by Infor to enable customers to run business analytics; chose Redshift for performance, cost, ease of use, and scalability.
• Slides 16–20: read only the data you need.
Informatica Cloud is powered by Vibe, the same technology that powers the virtual data machine behind the Secure Agent. You use Informatica Cloud to store the various metadata mappings, and at run time the data moves directly from source to target through the Vibe-powered Secure Agent.
Vibe is the industry’s first and only embeddable virtual data machine for accessing, aggregating, and managing data, regardless of data type, source, volume, compute platform, or user. It lets you map once and deploy anywhere: you can take logic that was defined on-premises, move it to the cloud, then move it to Hadoop or embed it in an application, all without recoding. This makes your architecture faster, more flexible, and future-proof.
Business benefits: five times faster turnaround from business idea to solution; adapt the technology to your business, not vice versa; utilize all your data, regardless of location, type, or volume.
IT benefits: five times faster project delivery; eliminate skills gaps when adopting new technologies and approaches; reduce the cost of maintaining a complex assortment of technologies.