Data Warehouse Evolution Roadshow
- 2. Agenda
Welcome - MapR
Big Data and Your Data Warehouse - MapR
The New Data Warehouse - Informatica
Making the Most of Big Data - MicroStrategy
Enterprise-Grade Hadoop: Use Cases - MapR
Infrastructure Platform for Big Data - Cisco
Getting Started/Q&A - All
Close - MapR
©MapR Technologies - Confidential
- 3. Big Data and Your Data Warehouse
- 4.
“Data is a precious thing and will last
longer than the systems themselves.”
– Tim Berners-Lee, inventor of the World Wide Web.
- 5.
“Without big data analytics, companies are
blind and deaf, wandering out onto the web
like deer on a freeway.”
– Geoffrey Moore, author and consultant.
- 6.
“If we have data, let’s look at data. If all we
have are opinions, let’s go with mine.”
– Jim Barksdale, former Netscape CEO
- 7. Big Data today in the Enterprise
“Too many different types, sources & formats of critical data”
• Multiple data sources
• Multiple technologies
• Multiple copies of data
- 8. An Enterprise Data Hub
Data sources feeding the Enterprise Data Hub: Sensor Data, Click Streams, Production Data, Web Logs, Location, Public, Social Media, Sales, SCM, CRM, Billing
✓ Combine different data sources
✓ Minimize data movement
✓ One platform for analytics
- 9. Big Data in our World
• YouTube users upload 48 hours of new video every minute of the day.
• Twitter sees roughly 175 million tweets every day, and has more than 465 million accounts.
• Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data.
• More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide.
• 2.7 Zettabytes of data exist in the digital universe today.
• Data production will be 44 times greater in 2020 than it was in 2009.
- 10. Arrival of Big Data Impacts Data Warehouses
• Volume: Prohibitively expensive storage costs
• Variety: Inability to process unstructured formats
• Velocity: Faster arrival and processing needs
- 11. The Hadoop Advantage
• Fueling an industry revolution by providing infinite capability to store and process big data
• Expanding analytics across data types
• Compelling economics – 20 to 100X more cost effective than alternatives
Pioneered at
- 12. Important Drivers for Hadoop
• Data on compute drives efficiencies and better analytics
• With Hadoop you don’t need to know what questions to ask beforehand
• Simple algorithms on Big Data outperform complex models
• Powerful ability to analyze unstructured data
- 13. What is the Best Way to Deploy Hadoop?
Transitory Data Store
• No long-term scale advantages
• Unprotected data
vs.
Permanent Data Store
• Highly available and fully protected data
• Works with existing tools
• ETL Tool focus
• Real-time ingestion and extraction
• Archive data from data warehouse
Enterprise Data Hub
- 14.
“Hadoop ingests and stores data very cost effectively, and
handles workloads such as the simple transformations in ETL.
On the other hand, Hadoop does not address the mission-critical complex business analytic workloads…”
– Mike Koehler, CEO, Teradata
- 15. Data Warehouse Optimized: Cost Savings
Diagram labels: Sensor Data, Web Logs, Hadoop (ETL + Long Term Storage), RDBMS, DW (Query + Present)
Benefits:
✓ Both structured and unstructured data
✓ Expanded analytics with MapReduce, NoSQL, etc.
Solution                        Cost / Terabyte    Hadoop Advantage
Hadoop                          $333
Teradata Warehouse Appliance    $16,500            50x savings
Oracle Exadata                  $14,000            42x savings
IBM Netezza                     $10,000            30x savings
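The savings multiples in the cost table follow from dividing each appliance's cost per terabyte by the Hadoop figure. A minimal Python sketch, using only the dollar amounts quoted on the slide (the variable and function names are illustrative, not from the deck):

```python
# Cost per terabyte in US dollars, as quoted on the slide.
HADOOP_COST_PER_TB = 333
APPLIANCE_COST_PER_TB = {
    "Teradata Warehouse Appliance": 16_500,
    "Oracle Exadata": 14_000,
    "IBM Netezza": 10_000,
}


def savings_multiple(appliance_cost: float, hadoop_cost: float = HADOOP_COST_PER_TB) -> float:
    """Return how many times cheaper Hadoop is per terabyte."""
    return appliance_cost / hadoop_cost


# Round to whole multiples, the way the slide presents them.
savings = {
    name: round(savings_multiple(cost))
    for name, cost in APPLIANCE_COST_PER_TB.items()
}

for name, multiple in savings.items():
    print(f"{name}: {multiple}x savings")
```

The rounded results reproduce the 50x, 42x and 30x figures in the table.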
- 16. The Enterprise Data Hub for Hadoop
Data sources: Existing Data, Social Data, Weblog Data, Telemetry
Diagram labels: Compute, Enterprise Data Warehouse, Freed Up Space
Applications: Fraud Detection Application, Recommendation Engine
- 17. Multi-Tenant Capabilities to Share a Cluster Successfully
• Isolation
  – Data placement control
  – Label based job scheduling
• Quotas
  – Storage, CPU, Memory
• Security and delegation
  – ACLs
  – AD, LDAP, Linux PAM
• Reporting
  – About 70 resource usage metrics
  – REST API integration
- 18. One Platform for Big Data
Workloads: Batch, Interactive, Real-Time
Use cases: Log file Analysis, Data Warehouse Offload, Fraud Detection, Clickstream Analytics, Forensic Analysis, Analytic Modeling, BI User Focus, Sensor Analysis, “Twitterscraping”, Telematics, Process Optimization
Capabilities: MapReduce, File-Based Applications, SQL, Database, Search, Stream Processing, …
Platform: 99.999% HA, Data Protection, Disaster Recovery, Scalability & Performance, Enterprise Integration, Multi-tenancy
- 20. The New Data Warehouse
Big Data + Hadoop
- 21. Agenda
• Big Data and Data Warehouse Optimization
• What Are Customers Doing to Optimize their Data Warehouse?
• Informatica on Hadoop Complements Your Data Warehouse
- 22. Big Data and Data Warehouse Optimization
- 24. Informatica + Hadoop
PowerCenter Developers are Now Hadoop Developers
Sources: Transactions, OLTP, OLAP; Documents and Emails; Social Media, Web Logs; Machine Device, Scientific
Processing: Archive, Profile, Parse, ETL, Cleanse, Match
Outputs: Analytics & Op Dashboards; Mobile Apps; Real-Time Alerts
- 25. Data Warehouse Optimization
1. Identify inactive & infrequently used data
2. Offload data & processing to Hadoop
3. Ingest raw data, replicate changes & schemas
4. Store & prepare (e.g. ETL) data on Hadoop
5. Move high value results data into DW
Diagram labels: Data Warehouse; Reports; Transactions, OLTP, OLAP; Documents and Emails; Social Media, Web Logs; Machine Device, Scientific
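Step 1 of the optimization flow (identifying inactive and infrequently used data) can be sketched as a simple filter over per-table usage statistics. The Python below is only an illustration under stated assumptions: the `TableStats` record, the thresholds and the sample table names are hypothetical, and a real warehouse would supply these statistics from its own query logs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TableStats:
    """Hypothetical usage record for one warehouse table."""
    name: str
    last_accessed: date
    queries_last_90_days: int


def offload_candidates(tables, today, max_idle_days=180, min_queries=10):
    """Return names of tables that look inactive or infrequently used."""
    candidates = []
    for table in tables:
        idle_days = (today - table.last_accessed).days
        if idle_days > max_idle_days or table.queries_last_90_days < min_queries:
            candidates.append(table.name)
    return candidates


# Made-up sample data for illustration only.
tables = [
    TableStats("orders_2009", date(2012, 1, 15), 0),
    TableStats("orders_current", date(2013, 10, 1), 4200),
    TableStats("web_logs_raw", date(2013, 9, 20), 3),
]

candidates = offload_candidates(tables, today=date(2013, 10, 3))
print(candidates)
```

Tables flagged this way would be the ones considered for offloading to Hadoop in step 2; the thresholds are a policy choice, not something the slide specifies.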
- 26. PowerCenter Big Data Edition
• Minimize Risk: Quickly staff projects with trained experts
• Map Once. Deploy Anywhere™
• Deploy On-Premise or in the Cloud
Diagram label: Traditional Grid
- 27. What Are Customers Doing to Optimize their Data Warehouse?
- 28. Minimize risk and grow digital business
Large Global Financial Services and Communications Company
The Challenge. Grow digital business to 30% ($1.8B) and reduce fraud
The Solution
Sources: Relational - SQL Server, Oracle, DB2, AS400, Mainframe; Surveys & Net Promoter Scores (NPS); Social Media, Web Logs, JSON, XML; Machine, Forensic, Splunk
PowerCenter Big Data Edition: Profile, Parse, ETL
Targets: Netezza, SQL Server, Oracle, SAS; BI / Analytics Visualization & Reporting
The Result
• Comprehensive data integration platform to integrate large volumes of data from over 18+ systems
• Ability to use existing skill sets & make them more productive
• Lowest risk as industry leader
- 29. Reduce Costs & Increase Revenue
Consolidate Data on Hadoop & Provide 360 View of Customer
Large Global Media & Entertainment Company
The Challenge. Data increasing 20x every year with costs rising from $17K per day to $50K per day within 6 months. Time to deliver information taking too long.
The Solution
Sources: Transactions from 70 Data Centers; In-Store POS Data; B2B Data Exchange; Data from Gaming Consoles, TV, Tablets, Readers, & Clickstreams from 5000 Web Sites
PowerCenter Big Data Edition; 172 TB & Data Validation; Traditional Grid
Targets: Data Warehouse; Business Reports
Expected Result
• Gain 360 view of customer behavior, increase cross-sell & up-sell revenue
• Reduce data storage costs from $50K per day to $500 per day
• Reduce time to deliver information to business from 48 hours to 15 minutes
- 30. Flexible architecture to support rapid changes
Large Government Agency
The Challenge. Data volumes growing 3-5 times over the next 2-3 years
The Solution
Diagram labels: Mainframe, RDBMS, Unstructured Data, Traditional Grid, Data Virtualization, DW, EDW, Business Reports
The Result
• Manage data integration and load of 10+ billion records from multiple disparate data sources
• Flexible data integration architecture to support changing business requirements in a heterogeneous data management environment
- 31. Lower costs of Big Data projects
Large Global Financial Institution
The Challenge. Data warehouse exploding with over 200TB of data. User activity generating up to 5 million queries a day impacting query performance
The Solution
Diagram labels: ERP, CRM, Custom, Interaction Data, EDW, Business Reports, Archived Data, Phase 1
The Result
• Saved $20M + $2-3M on-going by archiving & optimization
• Reduced project timeline from 6 months to 2 weeks
• Improved performance by 25%
• Return on investment in less than 6 months
- 32. Lower costs and minimize risk
Large Global Financial Institution
The Challenge. Increasing demand for faster data driven decision making and analytics as data volumes and processing loads rapidly increase
The Solution
Diagram labels: RDBMS, RDBMS, Web Logs, Traditional Grid, Near Real-Time Datamarts, Data Warehouse
The Result
• Cost-effectively scale performance
• Increased agility by standardizing on one data integration platform
• Lower hardware costs
• Leverage new data sources for faster innovation
- 34. Maximize Your Return On Big Data
Hadoop complements your existing infrastructure
Data Assets: Transactions, OLTP, OLAP; Documents, Email & other NoSQL; Social Media, Web Logs; Machine Device, Scientific
Operational Systems: OLTP, OLTP
Analytical Systems: Data Warehouse, MDM, Data Mart, ODS
Data Products
Stages: Access & Ingest; Parse & Prepare; Discover & Profile; Transform & Cleanse; Extract & Deliver
Manage (i.e. Security, Performance, Governance, Collaboration)
- 35. Data Integration & Quality on Hadoop
1. Entire Informatica mapping translated to Hive Query Language
2. Optimized HQL converted to MapReduce & submitted to Hadoop cluster (job tracker).
3. Advanced mapping transformations executed on Hadoop through User Defined Functions using Vibe
Hive-QL:
FROM (
  SELECT T1.ORDERKEY1 AS ORDERKEY2, T1.li_count, orders.O_CUSTKEY AS CUSTKEY,
         customer.C_NAME, customer.C_NATIONKEY, nation.N_NAME, nation.N_REGIONKEY
  FROM (
    SELECT TRANSFORM (L_Orderkey.id) USING CustomInfaTx
    FROM lineitem
    GROUP BY L_ORDERKEY
  ) T1
  JOIN orders ON (customer.C_ORDERKEY = orders.O_ORDERKEY)
  JOIN customer ON (orders.O_CUSTKEY = customer.C_CUSTKEY)
  JOIN nation ON (customer.C_NATIONKEY = nation.N_NATIONKEY)
  WHERE nation.N_NAME = 'UNITED STATES'
) T2
INSERT OVERWRITE TABLE TARGET1 SELECT *
INSERT OVERWRITE TABLE TARGET2 SELECT CUSTKEY, count(ORDERKEY2) GROUP BY CUSTKEY;
Diagram labels: Hive-QL, UDF, MapReduce
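The `TRANSFORM ... USING` clause in the query streams rows to an external program: by default Hive sends each row as a line of tab-separated strings and reads tab-separated lines back, with NULL encoded as `\N`. The Python sketch below shows the general shape of such a row transformer. It is a hypothetical stand-in, not Informatica's `CustomInfaTx`; the function name and the sample rows are assumptions made for illustration.

```python
from typing import Iterable, Iterator

HIVE_NULL = r"\N"  # Hive's default text encoding of NULL in streamed rows


def transform(rows: Iterable[str]) -> Iterator[str]:
    """Yield '<order_key>\\t1' for each input row that has a non-NULL order key."""
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        order_key = fields[0].strip()
        if order_key and order_key != HIVE_NULL:
            yield f"{order_key}\t1"


# Sample rows as Hive would stream them: tab-separated, one row per line.
sample_rows = ["1001\tfoo\n", "\\N\tbar\n", " 1002 \tbaz\n"]
result = list(transform(sample_rows))
print(result)
```

When run under Hive, a script of this kind would iterate over `sys.stdin` and print each yielded line; the in-memory list is used here only so the sketch can be exercised on its own.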
- 36. Accelerate Development
Reuse and Import PowerCenter Metadata
Import and validate existing PowerCenter mappings before running on Hadoop
- 37. Hadoop Data Profiling Results
Hadoop Data Profiling results – exposed to anyone in enterprise via browser
1. Profiling Stats: Min/Max Values, NULLs, Inferred Data Types, etc.
   Stats to identify outliers and anomalies in data
2. Value & Pattern Analysis of Hadoop Data
   Value and Pattern Frequency to isolate inconsistent/dirty data or unexpected patterns
3. Drilldown Analysis (into Hadoop Data)
   Drill down into actual data values to inspect results across entire data set, including potential duplicates
Screenshot labels: CUSTOMER_ID example, COUNTRY CODE example
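The profiling statistics and the value/pattern frequency analysis listed on this slide can be illustrated with a small Python sketch. This is not Informatica's profiler; the pattern alphabet (digits shown as `9`, letters as `A`) and the sample country codes are assumptions chosen for the example.

```python
import re
from collections import Counter


def value_pattern(value: str) -> str:
    """Map digits to '9' and letters to 'A', keeping punctuation, e.g. 'US-01' -> 'AA-99'."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def profile_column(values):
    """Compute basic profiling stats and pattern frequencies for one column."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "patterns": Counter(value_pattern(v) for v in non_null),
    }


# Made-up COUNTRY CODE column with a few dirty values.
country_codes = ["US", "DE", "us", None, "U.S.", "FR", "840"]
stats = profile_column(country_codes)
print(stats["nulls"], stats["patterns"].most_common())
```

Rare patterns such as `A.A.` or `999` in a column dominated by `AA` are the kind of inconsistency the pattern-frequency view is meant to surface.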
- 38. Hadoop Data Domain Discovery
Finding functional meaning of Data in Hadoop
• Leverage INFA rules/mapplets to identify functional meaning of Hadoop data
• Sensitive data (e.g. SSN, Credit Card number, etc.)
  – PHI: Protected Health Information
  – PII: Personally Identifiable Information
• Scalable to look for/discover ANY Domain type
• View/share report of data domains/sensitive data contained in Hadoop. Ability to drill down to see suspect data values.
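Domain discovery for sensitive data usually comes down to testing column values against rules. The Python sketch below is a simplified illustration, not the INFA rules or mapplets themselves: it assumes a US-style `NNN-NN-NNNN` format for SSNs and uses the Luhn checksum for card numbers, and the 80% threshold is an arbitrary choice.

```python
import re
from collections import Counter

SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")
CARD_CHARS_RE = re.compile(r"[\d -]+")


def luhn_valid(number: str) -> bool:
    """Check a card number with the Luhn checksum, ignoring spaces and dashes."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify(value: str) -> str:
    """Assign a single value to a data domain."""
    if SSN_RE.match(value):
        return "SSN"
    if CARD_CHARS_RE.fullmatch(value) and luhn_valid(value):
        return "CREDIT_CARD"
    return "UNKNOWN"


def column_domain(values, threshold=0.8):
    """Return the dominant sensitive domain of a column, or None if below threshold."""
    if not values:
        return None
    counts = Counter(classify(v) for v in values)
    domain, hits = counts.most_common(1)[0]
    if domain != "UNKNOWN" and hits / len(values) >= threshold:
        return domain
    return None


print(classify("123-45-6789"), classify("4111 1111 1111 1111"), classify("hello"))
```

Values classified as `UNKNOWN` inside a column that is otherwise flagged would be the "suspect data values" a reviewer drills down into.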
- 39. Unified Administration
Single Place to Manage & Monitor
• Full traceability from workflow to MapReduce jobs
• View generated Hive scripts
- 40. Maximize Your Return on Big Data
Lower Big Data Costs Up To 2X (helps self-fund big data projects)
• 5x productivity increase using existing developer skills
Minimize Risk of New Technologies (single platform, quickly staff projects)
• Design in PowerCenter, run on Hadoop or any other data platform
Accelerate Innovation (onboard, discover, operationalize)
• Enterprise scalability, security, & support
- 41. Making the Most of Big Data
Leveraging business intelligence to turn business users into data scientists
MicroStrategy Confidential. Distribution Prohibited without Prior Authorization.
- 42. Agenda
1. Self Service
2. Information Driven Apps
3. Mobility
4. Advanced Analytics
- 44. Self-Service Analytics Revolutionizes Traditional BI
Boost user satisfaction while massively increasing productivity
• More Productive: 5-10x more content per creator (More Content)
• More Producers: 5-10x more users can create content (More Content Creators)
• More Collaborative: 5-10x peer-to-peer sharing (More Sharing)
>100x more content creation and consumption
- 45. Business User Access to 1000s of Data Sources
Faster access to your data
• Enterprise Applications: SAP, Oracle e-Business, Siebel, Peoplesoft, etc.
• Relational Databases: Oracle, SQL Server, MySQL, Teradata, Netezza, etc.
• Cloud-Based Data: Salesforce.com, NetSuite, Facebook, Eloqua, Google Docs, etc.
• Personal or Departmental: Spreadsheets, Access databases, CSV, public data downloads, etc.
• Big Data & Hadoop: MapR
MicroStrategy Modeled Data: Enterprise-certified single-version of the truth
Quick Data Import: No SQL or Scripting
- 46. Enrich Every Analysis with Added Insight
• Professional Sports: Enrich with Weather Data. Impact of weather on game outcome and attendance
• Product Sales: Enrich with Demographic Data. Product popularity by demographic segment
• Marketing Promotions: Enrich with Social Data. Cross-brand affinity to determine promotions or bundling offers
- 47. World-Class Production Dashboard Applications
Information-Driven Apps are the future of dashboards
• 100% customized look and feel
• Comprehensive data
• Easy to use
• Guided workflow for consistent user experience
• Personalized for each user
• Online or distributed via email
• Multimedia content-enabled
• Transaction-enabled
• Live data
- 48. Beyond Mobile Dashboards
Build great mobile Smart Apps without the pain of native development
• Analytics: Analytics and data visualization
• Transactions: Update systems like ERP/CRM
• Multimedia: Add videos and other content
Apps for Every Customer-Facing Process
Apps for Every Internal Business Process
App types: Logistics Apps, Operations Apps, B2E Apps, Data Collection Apps, Product Apps, Context-Aware Apps, Executive Apps
- 49. Easy Integration with Third Party Analytic Models
All of an Organization’s Analytics Can Now be Distributed Through a Single Platform
• Deploy Any of 5000+ Open Source R Analytics: MicroStrategy R Integration Pack
• Import Predictive Models from Popular Packages: PMML Model
• Create Your Own Custom Functions: ƒApply(X), MicroStrategy Custom Function Plug-in
As a MicroStrategy metric, use models and functions in any report or dashboard
- 50. MicroStrategy Analytics Platform
Comprehensive analytics suite for business
MicroStrategy Analytics Platform
Self-Service
Analytics
Enterprise-Grade
Business Intelligence
Big Data
Analytics
Rapid-fire data
discovery
Produce and publish trusted
analytics to elevate performance
The power to transform your
Big Data into insight
• Intuitive data exploration
• Self-service with no IT needed
• Access and combine data from all
sources
• Trusted system-of-record reliability
• Advanced and predictive analytics
• Easy, cost-effective administration
• Fast dashboard development
• Comprehensive delivery options
with massive user scale
• Blazing speed and performance
Web or Mobile
On-Premises or on MicroStrategy Cloud
- 51. Two Ways to Experience MicroStrategy Today
Best of all, they’re free!
MicroStrategy Analytics Desktop
Fastest, easiest self-service analytics tool for business users.
100% free!
See it in action
MicroStrategy Analytics Express
Cloud-based self-service visual analytics for any organization.
Free for one year!
See it in action
- 53. Use Cases
- 54. Data Warehouse Offload: Cost Savings + Analytics
Diagram labels: Sensor Data, Web Logs, Hadoop (ETL + Long Term Storage), RDBMS, DW (Query + Present)
Benefits:
✓ Both structured and unstructured data
✓ Expanded analytics with MapReduce, NoSQL, etc.
Solution                        Cost / Terabyte    Hadoop Advantage
Hadoop                          $333
Teradata Warehouse Appliance    $16,500            50x savings
Oracle Exadata                  $14,000            42x savings
IBM Netezza                     $10,000            30x savings
- 55. Expand Data For Existing Applications
• Network security: Network IDS with a 3-day window instead of a 10-minute window
• Trade Surveillance: Rogue trader detection on intra-day instead of end-of-day market data
• Insurance: Calculate risk triangles for individual properties instead of neighborhoods
Advantages:
✓ 1T files and tables
✓ Real-time data ingestion with streaming writes
✓ 24x7 operations with automated failure recovery
✓ Better hardware utilization with 2x performance
- 56. Combine Different Data Sources
Diagram labels: POS/Online Data, Retail purchase Info, Streaming writes to Hadoop, Hadoop, Real-time offers
Advantages:
✓ Exponential decrease in time to market
✓ Real-time data ingestion with streaming writes
✓ 1T files and tables
✓ 24x7 operations with automated failure recovery
- 57. New Analytics
• Enhanced search
• Real-time event processing
• MapReduce-enabled machine learning algorithms
Advantages
✓ Increased ROI with 2x performance
✓ Highly available, fully data-protected environment
✓ Multiple users running different jobs on one cluster
- 58. Customer Example
Cloud-based predictive analytics platform
Sociocast conducted a POC with the three solutions
Apache HBase ✗
• Compactions
• Manual administration
• Poor reliability
Cassandra ✗
• Compactions
• Manual administration
• Eventual consistency
✓
• No compactions
• Zero administration
• Strong consistency
• 2x Cassandra performance
• 3x HBase performance
- 59. MapR Advantages for Enterprise Data Hub
• Enterprise Grade Platform
• 99.999% HA
• Full data protection
• Disaster recovery
• Easiest Integration
• Industry-standard interfaces:
NFS, ODBC, LDAP, REST
• Streaming writes
• Best ROI
• Faster time to market
• Eliminate risk
• Reuse existing apps and tools
- 60. In the era of the “Internet Of Everything”
Unified Computing Systems
The Infrastructure Platform For Big Data
- 62. “The internet of everything will provide a 21% increase in corporate profits in the next 10 years”
- 63. How many IP addresses does your home have?
IPv6
- 65. How will the internet of things change Basketball?
- 66. Facebook And Cisco Let Brick-&-Mortars Demand Customers Check-In To Get Wi-Fi
10.03.13 at Interop
Facebook and Cisco roll out a way to help any brick-and-mortar recoup its costs by asking users to check-in to get Internet access. Those who oblige get dropped on the business’ Facebook Page, and their anonymous, aggregate demographic info is passed to the merchant.
http://techcrunch.com/2013/10/02/facebook-wifi/
- 67. In-Store Manager View & Capabilities
• Product Catalog: Product Characteristics, Marketing Description, Quality Data, Multi-media Information, Product Suggestions
• Promotion Portfolio: Campaign Management, Customer Segmentation, Location triggered Rules
• Consumer Profile: CRM profile, Loyalty status, Consumer Preferences
• Application Analytics & Forecasting: Based on Historical Footfall, Heatmap, Preferences
- 68. Better Retailing?
Diagram labels: Retailers Dashboard, Mobility Services Engine, Existing ERP Systems, Existing Retailing Platform, Cisco Wireless WLAN Controller, Cisco Wireless Access Point, Consumer Personal Shopping Assistant
- 69. Big Data and Key Infrastructure Attributes
(What big data isn’t)
• Usually not blade servers (not enough local storage)
• Usually not virtualized (hypervisor only adds overhead)
• Usually not highly oversubscribed (significant east-west traffic)
• Usually not SAN/NAS
Low-cost, DAS-based, scale-out clustered filesystem
Move the compute to the storage
- 70. Cost, Performance, and Capacity
Structured Data: Relational Database, $20K/TB (HW:SW $ split 30:70, Expensive, ETL Load 1TB/hr)
Enterprise Data: Massive Scale-Out Column Store, $10K/TB
Unstructured Data: Hadoop / No SQL, $500-$1K/TB (HW:SW $ split 70:30)
Unstructured data examples: Machine Logs, Web Click Stream, Call Data Records, Satellite Feeds, GPS Data, Sensor Readings, Sales Data, Blogs, Emails, Video
- 71. Typical big data deployments
Deployment models: General Purpose IT Data Center (IT Infrastructure, standard IT servers running SAP, VMware, WEB and Big Data) and Dedicated “Pod” for Big Data (x86 servers running Big Data)
• Experimental use of Big Data
• App team mandated infrastructure
• Deployed into IT Ops mandated infrastructures
• Purpose built for Big Data
• Big Data has established business value
• Performance matters
• Large or small clusters
• “Skunk works”
• Small to medium clusters
- 72. Cisco UCS Common Platform Architecture (CPA)
Building Blocks for Big Data
• UCS Manager
• UCS 6200 Series Fabric Interconnects
• Nexus 2232 Fabric Extenders
• UCS 240 M3 Servers
• LAN, SAN, Management
Big
Data
Common
PlaMorm
Architecture
Single-‐SKU
Big
Data
SmartPlay
Bundles
The
Big
Data
Accelera0on
Kit
Cisco
Components
• 16
node
UCS
CPA
Solu6on
Cisco
SKUs:
UCS-‐EZ-‐BD-‐HP
and
UCS-‐EZ-‐BD-‐HC
MapR
Components
Single
Rack
UCS
Solu0ons
Single
Rack
Half-‐Rack
UCS
Solu0ons
• 16-‐node
M7
license
UCS
Solu0ons
Bundle
for
Hadoop
Bundle
for
Hadoop
Bundle
for
MPP
• (2)
Free
Administrator
Training
Credits
Performance
Capacity
Configura0on
• Installa6on
and
configura6on
UCS-‐EZ-‐BD-‐HP
UCS-‐EZ-‐BD-‐HC
UCS-‐EZ-‐BD-‐STRT
Data
strategy
and
explora6on
•
• MapR
SKU:
M7-‐16-‐CISCO-‐12
2
x
UCS
6248
2
x
Nexus
2232
PP
8
x
C240
M3
(SFF)
2x
E5-‐2690
256GB
24x
600GB
10K
SAS
hep://www.cisco.com/en/US/docs/
unified_compu6ng/ucs/UCS_CVDs/
Cisco_UCS_CPA_for_Big_Data_with_MapR.h
tml
©MapR
Technologies
-‐
Confiden6al
2
x
UCS
6296
2
x
Nexus
2232
PP
16
x
C240
M3
(LFF)
E5-‐2640
(12
cores)
128GB
12x
3TB
7.2K
SATA
73
2
x
UCS
6296
2
x
Nexus
2232
PP
16
x
C240
M3
(SFF)
2x
E5-‐2665
(16
cores)
256GB
24
x
1TB
7.2K
SAS
73
- 74. Hadoop Hardware Evolving in the Enterprise
Typical 2009 Hadoop node
• 1RU server
• 4 x 1TB 3.5” spindles
• 2 x 4-core CPU
• 1 x GE
• 24 GB RAM
• Single PSU
• Running Apache
• $
Typical 2012 Hadoop node
• 2RU server
• 12 x 3TB 3.5” or 24 x 1TB 2.5” spindles
• 2 x 8-core CPU
• 1-2 x 10GE
• 128 GB RAM
• Dual PSU
• Running MapR
• $$$
Economics favor “fat” nodes
• 6x-9x more data/node
• 3x-6x more IOPS/node
• Saturated gigabit, 10GE on the rise
• Fewer total nodes lowers licensing/support costs
• Increased significance of node and switch failure
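The "6x-9x more data/node" and "3x-6x more IOPS/node" figures can be checked against the drive counts listed for the 2009 and 2012 nodes. The Python sketch below does that arithmetic; treating IOPS as proportional to spindle count is a simplifying assumption of this example, not a claim made on the slide.

```python
def raw_capacity_tb(drive_count: int, tb_per_drive: float) -> float:
    """Raw (pre-replication) disk capacity of one node in terabytes."""
    return drive_count * tb_per_drive


# Drive configurations as listed on the slide.
NODE_2009 = {"drives": 4, "tb_per_drive": 1}
NODE_2012_LFF = {"drives": 12, "tb_per_drive": 3}   # 12 x 3TB 3.5"
NODE_2012_SFF = {"drives": 24, "tb_per_drive": 1}   # 24 x 1TB 2.5"


def ratios(new: dict, old: dict) -> tuple:
    """Return (capacity ratio, spindle ratio) of a new node versus an old one."""
    capacity = (raw_capacity_tb(new["drives"], new["tb_per_drive"])
                / raw_capacity_tb(old["drives"], old["tb_per_drive"]))
    spindles = new["drives"] / old["drives"]
    return capacity, spindles


lff = ratios(NODE_2012_LFF, NODE_2009)
sff = ratios(NODE_2012_SFF, NODE_2009)
print("12 x 3TB node:", lff)
print("24 x 1TB node:", sff)
```

The two 2012 configurations give 9x and 6x the raw capacity and 3x and 6x the spindle count of the 2009 node, which matches the ranges quoted in the bullets.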
- 75. Seamless Integration with Enterprise Applications
Diagram labels: ETH 1, ETH 2, SAN A, SAN B, MGMT, MGMT, Uplink Ports, OOB Mgmt, Fabric Switch (6200 Fabric A, 6200 Fabric B), Server Ports, Fabric Extenders (FEX A, FEX B), Virtualized Adapters (CNA), Compute Blades Half / Full width (B200), Rack Mount, Cluster, Chassis 1
- 76. Extending UCS Enterprise Application Ecosystem to Big Data
Diagram labels: Big Data Common Platform Architecture, Enterprise Applications, UCS Rack-Mount Servers, UCS Blade Servers, SAN/NAS Arrays
- 77. UCSM policy-based management, provisioning, and monitoring for Big Data Infrastructure
UCS Management (160 Nodes per UCS Managed Cluster Domain)
Management
Inventory & Asset Mgmt
• Cluster Layout and Inventory
• Per-Server Inventory
• ID Pools (MAC, IP, UUID)
Fault Detection & SW Updates
• Fault detection & Logs
• Event Aggregation
• System software updates
QoS Policies & Power Capping
• QoS Policy definition
• Policy driven framework
• Policy Based Power Capping
- 78. CPA: High-performance unified fabric and compute increases cluster efficiency
• Single wire for data and management
• 8 x 10GE uplinks per FEX = 2:1 oversub (16 servers/rack), no portchannel (static pinning)
• 2 x 10GE links per server for all traffic, data and management
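The 2:1 oversubscription figure follows from comparing server-facing bandwidth with uplink bandwidth on each fabric extender. The Python sketch below works through it, assuming each of the 16 servers attaches one of its two 10GE links to each of the rack's two FEXes; that wiring is an assumption of this example, inferred from the two fabric extenders per rack listed elsewhere in the deck.

```python
def oversubscription(servers: int, links_per_server_per_fex: int,
                     uplinks: int, link_gbps: float = 10.0) -> float:
    """Ratio of server-facing bandwidth to uplink bandwidth on one FEX."""
    downlink_gbps = servers * links_per_server_per_fex * link_gbps
    uplink_gbps = uplinks * link_gbps
    return downlink_gbps / uplink_gbps


# 16 servers per rack, one 10GE link from each server to this FEX (assumed),
# 8 x 10GE uplinks per FEX as stated on the slide.
ratio = oversubscription(servers=16, links_per_server_per_fex=1, uplinks=8)
print(f"{ratio:.0f}:1 oversubscription")
```

With 160 Gb/s facing the servers and 80 Gb/s of uplink, the result is the 2:1 ratio quoted above.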
- 79. Cisco Unified IO
Grant Bandwidth
Diagram: bandwidth per traffic class at times t1, t2, t3, comparing Individual Ethernets with Prioritised QoS. Traffic classes: LAN Traffic (HDFS Import), Cluster Traffic (Shuffle), Application Traffic (HBase). Values shown range from 2G/s to 5G/s.
• Near Wire Speed without CPU load
• Dynamic bandwidth management according to SLA’s
• See network section for more
- 80. Scaling the CPA
• Single Rack: 16 servers
• Single Domain: Up to 10 racks, 160 servers (UCS Manager)
• Multiple Domains (UCS Central)
L2/L3 Switching
- 81. Big Data Infrastructure
UCS Multi-Domain (UCS Central Manages up to 10,000 nodes)
• Inventory, Fault, Log, Event Aggregation
• Global ID Pools, Firmware Updates, Backups and Global Admin Policies
• Global Service Profiles, Templates & Policies
• Statistics Aggregation
• HA for UCS Central Virtual Machine with shared storage
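The scaling limits quoted on the two slides above (16 servers per rack, up to 10 racks per UCS domain, up to 10,000 nodes under UCS Central) can be turned into a rough sizing calculation. The Python sketch below is illustrative only; the function name and the example cluster sizes are assumptions.

```python
import math

# Limits as quoted on the slides.
SERVERS_PER_RACK = 16
RACKS_PER_DOMAIN = 10
UCS_CENTRAL_MAX_NODES = 10_000


def plan_cluster(servers: int) -> tuple:
    """Return (racks, domains) needed for a given number of servers."""
    if servers < 1:
        raise ValueError("servers must be positive")
    if servers > UCS_CENTRAL_MAX_NODES:
        raise ValueError("exceeds the node count UCS Central is quoted as managing")
    racks = math.ceil(servers / SERVERS_PER_RACK)
    domains = math.ceil(racks / RACKS_PER_DOMAIN)
    return racks, domains


print(plan_cluster(160))   # one full domain
print(plan_cluster(500))   # spills into several domains
```

A single domain tops out at 160 servers; anything larger needs multiple domains, which is where UCS Central comes in.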
- 85. Big Data Acceleration - Key Benefits
• Rapid Big Data platform deployment/Accelerate Big Data ROI
• Ease of infrastructure management and cluster administration
• Support for mission critical workloads
• Enterprise-ready workload automation
• Powerful platform for high performance and high capacity
• Production ready with full data protection and disaster recovery
• Support for wide variety of Big Data applications, including but not limited to:
  – data warehouse offload,
  – predictive analytics,
  – 360° view of the customer,
  – recommendation engine, and
  – long-term data store
- 86. Big Data Acceleration Kit
Helping You Get Started
Consulting Services
✓ Data strategy & exploration
✓ Integration planning
✓ Installation & configuration
16-node M7 UCS Cluster
✓ Highly scalable Cisco UCS CPA solution
✓ HA and full data protection
✓ Advanced admin console
Formal Training & Support
✓ Free admin training for (2)
✓ 24/7 support
Hadoop Self Training
✓ Series of jumpstart videos
✓ User forum access