Carpe Datum: Building Big Data Analytical Applications with HP Haven
- 1. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Carpe Datum
Building Big Data Analytical Applications with HP Haven
Steve Sarsfield (HP Big Data Software)
- 2.
Well-known Innovations at HP
• Infrastructure
• Enterprise Services
• Reference Architectures
- 3.
ProLiant SL4540 Server
Ideal for Hadoop and scale-out parallel processing applications
Maximize Storage Capacity and Compute Performance
• Latest Xeon processors
• Each chassis: 270 TB
• Each rack: 2.43 PB
• Software for monitoring health and utilization of nodes and other maintenance.
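As a quick check on the capacity figures above, the sketch below works out how many chassis a full rack implies. It assumes decimal units (1 PB = 1000 TB) and that the rack figure is a simple multiple of the chassis figure; neither assumption is stated on the slide. Under those assumptions the result is nine chassis per rack.

```python
# Capacity figures quoted on the slide.
CHASSIS_TB = 270
RACK_PB = 2.43

# Assumption (not from the slide): decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000

# Rounded to absorb floating-point noise in the division.
chassis_per_rack = round(RACK_PB * TB_PER_PB / CHASSIS_TB, 6)
print(f"Implied chassis per rack: {chassis_per_rack:g}")
```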
- 4.
HP Haven
Outcomes
• Harness 100% of your data
• Extreme scale of your data
• Extreme speed in performing analytics
• Seamlessly deploy next-gen applications anywhere
• Supports cloud with Haven On-Demand
• Delivers advanced analytics leveraging power of the cluster
• Turbocharges predictive analytics with cluster-friendly R language
Composite Analytical Applications
HP Haven Platform
Infrastructure
• Hadoop
• On-premise
• Private cloud
• Public cloud
Data
• Human data
• Machine data
• Business data
Engines
HP IDOL
• Understand human information
• Voice, video, facial recognition and more
HP Vertica
• Big Data analytics
• Full ANSI SQL engine on MPP architecture
Distributed R
• Predictive analytics
• Leverage your cluster for R-based predictive analytics
- 5.
Vertica Core Capabilities – Built for Speed
We boost performance
What 1000x means:
Used to take           Now takes
1 hour                 3.6 seconds
8 hours (overnight)    Under 30 seconds
"When we did the first queries, they were done so fast, we thought they were broken."
- Michael Relich, Guess?
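The before and after runtimes on this slide can be turned into speedup factors with simple division. The sketch below does that; the function name is illustrative, and because the second runtime is given only as "under 30 seconds", the second factor is a lower bound. The two rows come out at roughly 1000x and at least 960x.

```python
def speedup(before_seconds, after_seconds):
    """Return how many times faster the new runtime is."""
    return before_seconds / after_seconds

HOUR = 3600  # seconds

hourly_query = speedup(1 * HOUR, 3.6)
# "Under 30 seconds" is an upper bound on runtime, so this is a lower bound.
overnight_query = speedup(8 * HOUR, 30)

print(f"1 hour -> 3.6 s: about {hourly_query:.0f}x")
print(f"8 hours -> under 30 s: at least {overnight_query:.0f}x")
```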
- 6.
Vertica Optimizations
• Columnar Storage: Speeds query time by reading only necessary data
• Compression: Lowers costly I/O to boost overall performance
• MPP Scale-Out: Provides high scalability on clusters with no name node or other single point of failure
• Distributed Query: Any node can initiate the queries and use other nodes for work. No single point of failure
• Projections: Combine high availability with special optimizations for query performance
[Diagram: three nodes, each with its own CPU, memory, and disk; column segments labeled A, B, D, C, E, A]
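To make the first two optimizations concrete, here is a minimal Python sketch of a column layout and run-length encoding. It illustrates the general ideas only and is not Vertica's storage format; the sample rows and the helper names `rle_encode` and `rle_decode` are invented for the example.

```python
from itertools import groupby

# Hypothetical sales rows: (region, product, amount).
rows = [
    ("east", "shoes", 40.0),
    ("east", "jeans", 60.0),
    ("east", "shoes", 45.0),
    ("west", "jeans", 55.0),
    ("west", "jeans", 65.0),
]

# Column layout: one list per column instead of one tuple per row.
columns = {
    "region": [r[0] for r in rows],
    "product": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def rle_encode(values):
    """Run-length encode a list as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]

# A query such as SUM(amount) needs to read only the "amount" column.
total_amount = sum(columns["amount"])

# A sorted, low-cardinality column collapses into a few runs.
encoded_region = rle_encode(columns["region"])

print(total_amount)
print(encoded_region)
```

Sorting a column before encoding is what makes the runs long, which is one reason a column store can pair sort order with compression.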
- 7.
HP Vertica for SQL on Hadoop
• HP Vertica for SQL on Hadoop offers the only full-featured query engine on Hadoop
• Same Core Engine
• Hadoop Distribution Agnostic
• Enterprise-ready Solution
• World-class Enterprise Support and Services
• Open platform
• Ready for Haven
• Competitive price point
[Diagram: Vertica ANSI SQL and Data Exploration on top of Hadoop Storage]
- 8.
One Query Engine to Serve It All
Query data without moving the data
• Query data in place in Hadoop formats
• Leverage existing Hadoop infrastructure
• Avoid unnecessary data movement
• Single query engine across diverse formats and infrastructure
Layer          Vertica                                 Hadoop
Query Engine   HP Vertica ANSI SQL (spans both)
Format         Vertica Optimized (ROS, Flex Tables)    Hadoop (ORC, Parquet, et al.)
File System    Vertica (EXT4)                          Hadoop (HDP, CDH, MapR NFS)
- 9.
Hortonworks/Vertica Beats the Competition
Based on TPC-DS Benchmarks
• ACID Protection, Data Integrity
• Full-function ANSI SQL
• Completeness of Query
• Concurrency of Workload
[Diagram: HP Vertica ANSI SQL reading from Hadoop (HDP) through the ORC file reader]
HP Vertica SQL on Hadoop
• Faster than Parquet: about 10% faster
• Faster than our old connector strategy
• More complete than Impala: 98% vs. 30% query completion
- 10.
More HW Partner Integration
Ambari Integration
• Integration with our own management console
• Ability to monitor Vertica health and resources, and those Hadoop services that are enabled on the Vertica nodes.
• Ability to monitor metrics: hostname, IP, memory usage, CPU usage, network usage, disk usage.
Vertica Management Console (MC) can now monitor Vertica clusters and databases that are running on Hadoop nodes
- 11.
More HW Partner Integration
Kerberos Integration
• Provides security and encryption
• Single authentication to both Vertica and Hadoop services via Kerberos
• Eliminates need for dual authentication steps for Hadoop services
- 13.
This is your brain on…
R
Distributed R
- 14.
Distributed R
A scalable and high-performance platform for the R language
• Use familiar GUIs and packages like RStudio
• Analyze data too large for vanilla R
• Leverage multiple nodes for distributed processing
• Vastly improved performance
- 15.
HP Vertica Distributed R
Operationalize big data predictive analytics
1. Ingest and prepare data by leveraging HP Vertica Analytics Platform
2. Build and evaluate predictive models on large datasets using Distributed R
3. Deploy models to Vertica and use in-database scoring to produce prediction results for BI and applications.
[Diagram: Build Models, Evaluate Models, Deploy Models (in-database scoring), BI Integration]
- 16.
Distributed R
Out-of-the-box Distributed Algorithms
Algorithm                    Use cases
Linear Regression (GLM)      Risk analysis, trend analysis, etc.
Logistic Regression (GLM)    Customer response modeling, healthcare analytics (disease analysis)
Random Forest                Customer churn, market campaign analysis
K-Means Clustering           Customer segmentation, fraud detection, anomaly detection
PageRank                     Identify influencers
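One way to see why an algorithm such as linear regression distributes well: each partition of the data can be reduced to a handful of sums, and only those sums need to be combined. The plain-Python sketch below illustrates that idea for a one-variable least-squares fit. It is not Distributed R's API or implementation; the function names and the toy partitions are assumptions made for the example, and each partition stands in for the data held on one node.

```python
def partial_stats(partition):
    """Sufficient statistics for simple least squares on one partition."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxx = sum(x * x for x, _ in partition)
    sxy = sum(x * y for x, y in partition)
    return n, sx, sy, sxx, sxy

def combine(stats):
    """Add the per-partition statistics element by element."""
    return tuple(sum(column) for column in zip(*stats))

def fit(partitions):
    """Fit y = intercept + slope * x from the combined statistics."""
    n, sx, sy, sxx, sxy = combine(partial_stats(p) for p in partitions)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Toy data following y = 1 + 2x, split across three hypothetical nodes.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]

intercept, slope = fit(partitions)
print(intercept, slope)
```

Only five numbers per partition cross the network in this scheme, regardless of how many rows each partition holds.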
- 17.
Distributed R: Predictive Models on billions of observations
Algorithm: Logistic regression
Setup: DL 380 servers, 16*2 HT cores/server, 120GB RAM
#Records (billion)    #Servers    Time (sec)
0.3                   1           145
1.0                   3           165
1.7                   5           173
2.9                   8           270
Distributed R in a single node is ~30x faster than R's glm
Dataset: 100Mx7 (~6GB), 1xDL 380
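The scaling table can be read as per-server throughput by dividing records by servers times seconds. The sketch below does that arithmetic on the four rows from the slide; the function name is illustrative. The first three runs land at roughly 2 million records per server per second, and the eight-server run at about 1.3 million.

```python
# (records in billions, servers, seconds) as listed on the slide.
runs = [(0.3, 1, 145), (1.0, 3, 165), (1.7, 5, 173), (2.9, 8, 270)]

def records_per_server_second(billions, servers, seconds):
    """Average records processed by one server in one second."""
    return billions * 1e9 / (servers * seconds)

rates = [records_per_server_second(*run) for run in runs]

for (billions, servers, seconds), rate in zip(runs, rates):
    print(f"{billions} B records on {servers} server(s): "
          f"{rate / 1e6:.2f} M records/server/s")
```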
- 19.
IDOL
HUMAN INFORMATION
• Audio, video and picture analysis
• Find related concepts and search
• Optical Character Recognition
Much more at IdolOnDemand.COM
- 20.
More HP Plugs
- 21. © Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
HP Services – a path to Hadoop value through consulting and SI
Challenges – ROI and business value inhibitors
• Business-led drivers
• Use cases with business value
• Maturing technology stack
• Hadoop skills in the market
Success Factors
• Expert guidance
• Relevant business cases
• Discovering value-driven uses
• Execution plan to extract value
• Leveraging the right deployment option
Consulting and SI
• Demonstrated Hadoop expertise
• Ability to deploy around existing infrastructure
• Roadmap to accelerate time to value
• Attain top management relevance and impact
HP Analytics & Data Management Services
• Discovery
• Integration
• Workload optimization
• Platform
– As a service
– On premises
– Managed
• Analytics
Merge traditional Business Intelligence with new Big Data technologies and transition to a more data-driven and agile enterprise.
- 22. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
HP Big Data Conference 2015
August 10-13
Westin Waterfront Hotel, Boston
REGISTER TODAY!
For general inquiries: Info.BDC@hp.com
For inquiries on becoming a sponsor: Sponsorinfo.BDC@hp.com
#HPBigData2015
• 40+ technical sessions
• Network with hundreds of peers
• Hands-on hackathon
Just Announced!
- 23. © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Learn More About (and Try!) HP Vertica
Community Edition
• Free download: 1 TB, 3 nodes
my.vertica.com/evaluate
- 24.
Thank You