AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Welcome

Maya Cabassi
Partner Marketing Manager
Amazon Web Services

Webinar Overview
• Submit your questions using the Q&A tool.
• A copy of today’s presentation will be made available on:
  – AWS SlideShare Channel: http://www.slideshare.net/AmazonWebServices/
  – AWS Webinar Channel on YouTube: http://www.youtube.com/channel/UCTnPlVzJI-ccQXlxjSvJmw

Introducing
Keenan Rice
VP, Marketing & Alliances
Looker

Justin Rosenthal
Chief Technology Officer
MessageMe

Tina Adams
Senior Product Manager
Amazon Web Services

What We’ll Cover
• Overview of Amazon Redshift data warehouse
• How Looker integrates with Amazon Redshift to enable big data analytics in the cloud
• How MessageMe turns application metrics stored in Amazon Redshift into actionable insights with Looker BI
• Q&A

Amazon Redshift
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year

Tina Adams | tinaadam@amazon.com
Senior Product Manager

We set out to build…
A fast and powerful, petabyte-scale data warehouse that is:

• A Lot Faster
• A Lot Cheaper
• A Lot Simpler

Amazon Redshift

Data warehousing done the AWS way

Deploy

• Easy to provision
• Pay as you go, no up front costs
• Fast, cheap, easy to use
• SQL

Common Customer Use Cases

Traditional Enterprise DW
• Reduce costs by extending DW rather than adding HW
• Migrate completely from existing DW systems
• Respond faster to business; provision in minutes

Companies with Big Data
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools

SaaS Companies
• Add analytic functionality to applications
• Scale DW capacity as demand grows
• Reduce HW & SW costs by an order of magnitude

Amazon Redshift Customers

Channel

Feature Delivery
• Service Launch (2/14)
• PDX (4/2)
• Temp Credentials (4/11)
• DUB (4/25)
• SOC1/2/3 (5/8)
• NRT (6/5)
• JDBC Fetch Size (6/27)
• Unload logs (7/5)
• SHA1 Builtin (7/15)
• 4 byte UTF-8 (7/18)
• Sharing snapshots (7/18)
• Statement Timeout (7/22)
• Timezone, Epoch, Autoformat (7/25)
• WLM Timeout/Wildcards (8/1)
• CRC32 Builtin, CSV, Restore Progress (8/9)
• Resource Level IAM (8/9)
• PCI (8/22)
• UTF-8 Substitution (8/29)
• JSON, Regex, Cursors (9/10)
• Split_part, Audit tables (10/3)
• SIN/SYD (10/8)
• HSM Support (11/11)
• Kinesis EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts (11/13)
• Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13)
• EIP Support for VPC Clusters (12/28)
• Unload Encrypted Files

Amazon Redshift architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon S3, DynamoDB, EMR/HDFS/SSH
– Kinesis integration
• Hardware optimized for data processing
• Scale while remaining online from a single node to a 100 node 1.6 PB cluster

Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore

Amazon Redshift is priced to let you analyze all your data

                     Effective Hourly Price   Effective Hourly   Effective Annual
                     (single node)            Price per TB       Price per TB
On-Demand            $0.850                   $0.425             $3,723
1 Year Reservation   $0.500                   $0.250             $2,190
3 Year Reservation   $0.228                   $0.114             $999

Simple Pricing
• Number of Nodes x Cost per Hour
• No charge for Leader Node
• No upfront costs
• Pay as you go
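
The annual per-TB prices can be reproduced from the hourly node prices. A minimal sketch in Python, assuming a 2 TB node (inferred from the ratio of the per-node and per-TB hourly prices, not stated on the slide) and a 365-day year. The function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

# Assumption: 2 TB of storage per node, inferred from $0.850 / $0.425.
TB_PER_NODE = 2

# Effective hourly price for a single node, in USD, as quoted on the slide.
hourly_price_per_node = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def annual_price_per_tb(hourly_node_price: float) -> float:
    """Yearly cost of one TB of capacity at the given hourly node price."""
    return hourly_node_price / TB_PER_NODE * HOURS_PER_YEAR


def cluster_cost_per_hour(nodes: int, hourly_node_price: float) -> float:
    """Simple pricing: number of nodes x cost per hour (no leader node charge)."""
    return nodes * hourly_node_price


for plan, price in hourly_price_per_node.items():
    print(f"{plan}: ${annual_price_per_tb(price):,.0f} per TB per year")
```

The 3 Year Reservation row works out to roughly $998.64, which rounds to the $999 shown on the slide and is the basis of the "less than $1,000/TB/Year" headline.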

Amazon Redshift has security built-in
• SSL to secure data in transit
• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disks and in Amazon S3 encrypted
– HSM/CloudHSM
• No direct access to compute nodes
• Amazon VPC support
• SOC1/2/3, PCI level 1, and others coming soon

Diagram labels: Customer VPC; Internal Security Group; JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore

Amazon Redshift integrates with multiple data sources

Diagram labels: Corporate Datacenter; Amazon RDS; Amazon S3; JDBC/ODBC; Amazon Kinesis; Amazon Redshift; Amazon DynamoDB; Amazon EMR

Analytics For Today’s
Data-Driven Organizations
Keenan Rice, Vice President, Marketing & Alliances
1.28.14

The New Data Landscape
The Missed Innovation Cycle
The Next Generation

Innovative Customers
MessageMe Intro

New Data Landscape
Ridiculous Quantities of Event & Business Data
Diagram labels: Business Data; ETL; Data Warehouse; New MPP Databases
Data Analysts: New Breed of Data Experts; Data Modeling
Business Users: New Curious Generation; Limited data discovery; Expect Immediate Results

Missed Innovation Cycle
BI is a relic of the old (expensive) data landscape
Diagram labels: Event & Business Application Data; New MPP databases; Traditional BI; BI Software; No direct data access; No reusability; Cubes / Simple models; One-time-use queries; Heavy desktop apps; Back to hand-coding SQL
Data Analysts: New Breed of Data Experts
Business Users: New Curious Generation; Expect Immediate Results

Looker - The Next Generation
Modern analytics, built for the new data landscape
Diagram labels: Load; Query; Transform
Data Analysts: Flexible Delivery; Agile Modeling
BI Software: Web Based App
Business Users: High-Resolution Discovery; Sharing & Collaboration

Looker Inside
• Near real-time access to your Redshift data
• Exploit the computing power of the AWS cloud and Redshift
• No need to re-architect or cube data
Diagram labels: Load; Query; Transform

Looker Intelligence
• Extend the power of your data analysts
• Fold data as complex as necessary without any database effort
• Use Git for agile team development
Diagram labels: Copy; Query; Transform

Looker Everywhere
• Powerful data discovery for anyone
• Share, save, and collaborate
• Access all the data, in an interactive web application
Diagram labels: Copy; Query; Transform

A New Perspective
Changing the way organizations make decisions
2012 Founded in Santa Cruz, California
$18M Redpoint, First Round Capital & Pivot North

1200 Hours per month spent in Looker per customer
50+ Customers changing how they run their businesses
Lloyd Tabb: Created first app server (Netscape), founder Mozilla.org, LiveOps, etc.
Frank Bien: Proven software exec (Greenplum, EMC)
Marc Randolph: Founder and first CEO, Netflix

© 2014 Looker Inc. All Rights Reserved.

Who’s Lookering?
Data-driven organizations realizing the power of Looker

Powering Analytics @ MessageMe
1. Redshift + Looker
2. Example Looker Report & Model
3. MessageMe Data Storage
4. Analytics Strategies
5. DynamoDB → Redshift

Redshift + Looker
Empower your team to answer their own questions.

• What types of Stickers are sent the most?
• How do event- and holiday-themed packs perform?

• Which SMS provider is most cost-effective?

Internal dashboards and Looker link-sharing are commonplace.
Looker makes the data accessible and Redshift makes it fast.

Data Storage: Why Redshift?
At Launch:
• DynamoDB for all application data
• MySQL for all statistics data

RDS Config (March 2013)
Master: db.m1.xlarge (15GB)
Slave: db.m1.xlarge (15GB)

RDS Config (April 2013)
Master: db.m1.xlarge (15GB)
Slave: db.m2.4xlarge (68GB)

90% of writes were via LOAD DATA INFILE, so write IOPS were not a problem.
However, index sizes were growing quickly…

MySQL Status (April 2013)

              event            message
Engine        InnoDB           InnoDB
Index Width   48 Bytes / Row   32 Bytes / Row
Row Count     ~3 Billion       ~2 Billion
Index Size    144 GB           64 GB

Slave: db.m2.4xlarge (68GB)

We could put data in, but we couldn’t get it back out!
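
The index sizes follow directly from index width times row count. A rough sketch in Python of that arithmetic, comparing the result with the 68 GB replica. Reading the problem as "indexes no longer fit in memory" is an interpretation of the slide rather than something it states outright.

```python
GB = 10**9  # decimal gigabytes, which is what the slide's figures correspond to

# name: (index width in bytes per row, approximate row count), from the slide
tables = {
    "event": (48, 3_000_000_000),
    "message": (32, 2_000_000_000),
}
REPLICA_RAM_GB = 68  # db.m2.4xlarge

# Index size = bytes of index per row x number of rows.
index_size_gb = {
    name: width * rows / GB for name, (width, rows) in tables.items()
}
total_gb = sum(index_size_gb.values())

for name, size in index_size_gb.items():
    print(f"{name}: {size:.0f} GB of index")
print(f"total: {total_gb:.0f} GB of index vs. {REPLICA_RAM_GB} GB of RAM")
print(f"indexes are about {total_gb / REPLICA_RAM_GB:.1f}x the replica's memory")
```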
Possible Solutions
1. Summarize
• PRO: Data compression
• CON: Data loss
2. Shard
• PRO: No data loss
• CON: Difficult to query
3. Redshift?

Data Storage: Current System

Redshift (90%)
• Append-only tables
• Delayed, bulk inserts OK
Examples:
• `event`
• `message`
• `user_demographic`

MySQL (10%)
• Inline inserts
• Non-negotiable uniqueness requirements (ON DUPLICATE KEY UPDATE)
Examples:
• `purchase`
• `user_demographic`

Analytics Strategies w/ Billions of Rows
Deep-dive queries w/ row-level specifics
You get this out-of-the-box with Redshift

vs.

Super fast top-line metrics, aggregates
How do we get these, really fast?
1. Summarization
2. Cached Derived Tables

Analytics Strategies: Summarization
message
Columns: `id`, `sender_id`, `recipient_user_id`, `recipient_room_id`, `message_type`, `country`, `os_family`, `os_version`, `app_version`, `timestamp`
Rows / Day: 10-100,000,000

sm_message
Columns: `send_hour`, `recipient_type`, `message_type`, `country`, `os_family`, `send_count`
Rows / Day: 10-100,000

1,000:1 Compression

How many doodles were sent each day in the US since we launched?
100 seconds vs. 3 seconds
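
A minimal sketch of the summarization idea, using Python's built-in sqlite3 as a stand-in for Redshift so it runs anywhere. The column names come from the slide, but the sample rows, the subset of `message` columns, and the rule for deriving `recipient_type` are assumptions for illustration, not MessageMe's actual logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Raw table: a subset of the `message` columns listed on the slide.
conn.execute("""
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER,
        recipient_user_id INTEGER,
        recipient_room_id INTEGER,
        message_type TEXT,
        country TEXT,
        os_family TEXT,
        timestamp TEXT
    )
""")
rows = [  # made-up sample data
    (1, 10, 20, None, "doodle", "US", "iOS", "2013-04-01 09:15:00"),
    (2, 11, 21, None, "doodle", "US", "iOS", "2013-04-01 09:40:00"),
    (3, 12, None, 7, "text", "US", "Android", "2013-04-01 09:50:00"),
    (4, 10, 20, None, "doodle", "US", "iOS", "2013-04-02 18:05:00"),
]
conn.executemany("INSERT INTO message VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Summary table: one row per hour and dimension combination, with a count.
conn.execute("""
    CREATE TABLE sm_message AS
    SELECT
        strftime('%Y-%m-%d %H:00:00', timestamp) AS send_hour,
        -- Assumption: a message goes either to a user or to a room.
        CASE WHEN recipient_room_id IS NULL THEN 'user' ELSE 'room' END
            AS recipient_type,
        message_type,
        country,
        os_family,
        COUNT(*) AS send_count
    FROM message
    GROUP BY send_hour, recipient_type, message_type, country, os_family
""")

# "How many doodles were sent each day in the US?" answered from the summary.
doodles_per_day = conn.execute("""
    SELECT date(send_hour) AS day, SUM(send_count)
    FROM sm_message
    WHERE message_type = 'doodle' AND country = 'US'
    GROUP BY day
    ORDER BY day
""").fetchall()
print(doodles_per_day)
```

The top-line query only touches the small summary table, which is where the speedup quoted on the slide would come from. Row-level questions still need the raw `message` table.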

Analytics Strategies: Cached Derived Tables
Some important queries will be complex and demand row-specific data.
Summarizing is not an option. What to do?

…build Cached Derived Tables
• Turn long-running, complex queries into flat tables

Analytics Strategies: Cached Derived Tables
Example: Retention by Cohort

SELECT
  …
INTO TABLE `sm_retention_day`
FROM (
  SELECT
    ….
  FROM `user`
  JOIN `message`
  JOIN `user_source`
), (
  SELECT
    ….
  FROM `user`
  JOIN `user_source`
)

sm_retention_day
Columns: `join_day`, `nday`, `country`, `os_family`, `os_version`, `traffic_source`, `active_users`, `signups`
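
The query on the slide is elided, so the following is only a hypothetical sketch of a cached derived table for retention by cohort, again using sqlite3 in place of Redshift. The `user` and `message` schemas, the sample rows, and the join conditions are all assumptions. `user_source` and the OS columns are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schemas and sample data.
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, join_day TEXT, country TEXT);
    CREATE TABLE message (id INTEGER PRIMARY KEY, sender_id INTEGER, timestamp TEXT);

    INSERT INTO "user" VALUES
        (1, '2013-04-01', 'US'),
        (2, '2013-04-01', 'US'),
        (3, '2013-04-02', 'US');
    INSERT INTO message VALUES
        (1, 1, '2013-04-01 10:00:00'),
        (2, 1, '2013-04-02 11:00:00'),
        (3, 2, '2013-04-01 12:00:00'),
        (4, 3, '2013-04-02 13:00:00'),
        (5, 1, '2013-04-02 14:00:00');

    -- Run the expensive join once and keep the result as a flat table.
    CREATE TABLE sm_retention_day AS
    SELECT a.join_day, a.nday, a.country, a.active_users, s.signups
    FROM (
        -- Users active N days after joining, per cohort.
        SELECT
            u.join_day,
            u.country,
            CAST(julianday(date(m.timestamp)) - julianday(u.join_day) AS INTEGER)
                AS nday,
            COUNT(DISTINCT u.id) AS active_users
        FROM "user" AS u
        JOIN message AS m ON m.sender_id = u.id
        GROUP BY u.join_day, u.country, nday
    ) AS a
    JOIN (
        -- Cohort size: signups per join day.
        SELECT join_day, country, COUNT(*) AS signups
        FROM "user"
        GROUP BY join_day, country
    ) AS s ON s.join_day = a.join_day AND s.country = a.country;
""")

# Dashboards read the small flat table instead of re-running the join.
retention = conn.execute("""
    SELECT join_day, nday, active_users, signups
    FROM sm_retention_day
    ORDER BY join_day, nday
""").fetchall()
for row in retention:
    print(row)
```

In practice the derived table would be rebuilt on a schedule, so its contents lag the raw data by at most one refresh interval.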

DynamoDB → Redshift
• Stats tables are homogeneous and compact
• Application data can be heterogeneous and heavy
– Mixture of numbers, strings, binary, etc.

How many users signed up this week with a .edu email address?
COPY dynamodb://user
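
The slide abbreviates the load step. A fuller form might look like the sketch below, which only builds SQL strings in Python and does not connect to anything. The target table `app_user`, the `email` and `created_at` columns, the role ARN, and the READRATIO value are all hypothetical. The statement shape follows Redshift's COPY from DynamoDB syntax as I understand it and should be checked against the current documentation.

```python
def build_dynamodb_copy(redshift_table: str, dynamodb_table: str,
                        iam_role_arn: str, read_ratio: int = 50) -> str:
    """Build a Redshift COPY statement that loads from a DynamoDB table.

    READRATIO is the percentage of the DynamoDB table's provisioned read
    throughput the load may consume.
    """
    if not 1 <= read_ratio <= 200:
        raise ValueError("read_ratio must be between 1 and 200")
    return (
        f"COPY {redshift_table} FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"READRATIO {read_ratio};"
    )


# Hypothetical follow-up query once the user data is in Redshift:
# "How many users signed up this week with a .edu email address?"
EDU_SIGNUPS_SQL = """
SELECT COUNT(*)
FROM app_user
WHERE email LIKE '%.edu'
  AND created_at >= DATE_TRUNC('week', CURRENT_DATE);
"""

copy_sql = build_dynamodb_copy(
    "app_user", "user", "arn:aws:iam::123456789012:role/ExampleRole"
)
print(copy_sql)
print(EDU_SIGNUPS_SQL)
```

Once the application data sits next to the stats tables, questions like the .edu one become a plain SQL filter rather than a scan of DynamoDB.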

Questions
Contacts:
Looker: http://www.looker.com/request-demo
MessageMe: https://messageme.com
AWS: aws.amazon.com/contact-us, tinaadam@amazon.com

We’d like your feedback.
Please respond to a short survey.

https://aws.asia.qualtrics.com/SE/?SID=SV_1yUN9wjaZX960kd
