2. Agenda
• A data-driven world
• HPE contribution to Spark
• HPE innovations for Hadoop
• Enterprise-grade SQL analytics for Hadoop
• Data-centric security for Hadoop
• HPE Data Discovery service, to help you pull together these innovations
6. HPE and Hortonworks joint announcement
Hortonworks announcement event on March 1st, with HPE CTO Martin Fink on stage
7. HPE Contribution to Apache Spark
Martin Fink announcement: Hortonworks and Hewlett Packard Labs join forces to boost Spark
Hewlett Packard Labs is working with Hortonworks to enhance the efficiency and scale of memory for the enterprise and to dramatically improve memory utilization:
– Enhanced shuffle engine technologies: faster sorting and in-memory computations, which have the potential to dramatically improve Spark performance
– Better memory utilization: improved performance and usage for broader scalability, which will help enable new large-scale use cases
"We're hoping to enable the Spark community to derive insight more rapidly, from much larger data sets, without having to change a single line of code"
– Martin Fink, CTO & Director, Hewlett Packard Labs
Tested with customers from the financial services industry; provides 3x to 15x performance increases
9. HPE Servers and Architectures for Hadoop
Traditional
• Tried-and-true platform
• Corporate standard: "I buy DL380s"
• Small to large deployments (very often ~20 nodes)
• Linear growth of balanced workloads
Optimized
• Purpose-built for big data
• Mid-size to large deployments
• Single, resource-intensive workload
• Workload optimized
• Multi-temperature storage
• "Optimized traditional"
• Higher density, lower TCO
Converged
• MPP DBMS approach + open source
• Mid-size to large deployments
• Non-linear storage and compute/memory growth
• Multiple workloads, latency demands
• Isolate workload hot spots
• Scale compute and storage separately, elastically
• Innovative, TCO-driven approach
[Server illustrations: ProLiant DL380 Gen9, Apollo 4500 Gen9, Apollo 4200 Gen9, Apollo 2000 System, and Moonshot 1500. Symmetric architectures (conventional wisdom: DL380 Gen9, Apollo 4xxx) versus the asymmetric architecture (forward-thinking: Moonshot & Apollo)]
10. HPE Reference Architecture(s) for Hadoop
• Scaling from 4 to thousands of HPE servers
• Sized to the customer's workload and storage needs
• Impressive processor and storage density
A set of pre-tested hardware components
• Processors, drives, network, 1 TB/8 TB disk sizes, etc.
Breakthrough economics, density, simplicity
Flexible, pre-approved, and optimized configurations
HPE Apollo 4000 example:
• 24 x HPE Apollo 4530 worker nodes
• HPE 5900 10GbE and 2 x HPE 5930 10GbE network switches
• 3 x DL360 Gen9 head nodes
Apollo 4510: 3.5 PB raw storage, 900 TB Hadoop usable, 960 Xeon E5 cores for a full rack
[Server illustrations: Apollo 4530, ProLiant DL380e Gen8, Apollo 4200, ProLiant SL4540 Gen8]
DL380: 2.46 PB raw storage, 630 TB Hadoop usable, 756 Xeon E5 cores for a full rack
Apollo 4200: 4.6 PB raw storage, 1 PB Hadoop usable, 756 Xeon E5 cores for a full rack
SL4540: 5.3 PB raw storage, 1.3 PB Hadoop usable, 320 Xeon E3 cores for a full rack
11. HPE Apollo 4200 – Bringing big data storage server density to the enterprise
Used as a standard Hadoop worker node and as a BDRA asymmetric storage node
Storage density
• 28 LFF data drives
• Highest storage density in a traditional 2U rack server: 224 TB per server, up to 4.6 PB per rack
• Perfect core/spindle ratio of 1, with 28 cores (2 x 14) and 28 drive spindles
Data center plug and play
• Enterprise bridge: fits traditional enterprise/SME rack-server data centers
• Configuration flexibility: balanced capacity, performance, and throughput with flexible options for disks, CPUs, I/O, and interconnects
Performance and efficiency
• Halves the number of servers
• Halves the number of network ports
• Halves the required floor space
• Lowers the number of needed licenses/subscriptions
• Lowers electric power needs
12. Hadoop on HPE Moonshot
What would be a good server cartridge for Hadoop?
Processing
– Number of Xeon cores: 8
– Very efficient I/O
Memory
– Memory: 128 GB
Storage
– Data storage: 2 TB M.2 (SSD)
Network
– Fast network (2 x 10GbE)
– Low-latency chassis interconnect
45 servers per enclosure: 45 x 128 GB = 5.6 TB of RAM and 45 x 2 TB = 90 TB of fast data storage in 4U
Example workload: Impala, SQL on Hadoop
14. HPE Big Data Reference Architecture
HPE brings enterprise data center architecture to Hadoop
Traditional Hadoop cluster architecture (symmetric)
– Compute and storage are always co-located
– All servers are identical, holding both applications and data files
– Data is partitioned across servers on direct-attached storage
HPE Big Data Reference Architecture (asymmetric)
– Separate, optimized compute and storage tiers connected by high-speed networking: compute servers hold applications and intermediate data, storage servers hold the data files
– Standard Hadoop installed with storage components on the storage servers and applications on the compute servers
– Enabled and optimized by purpose-selected HPE Moonshot and Apollo servers and HPE/Hortonworks workload management software (contributed to the community)
15. [Rack illustrations: multiple HPE Apollo 2000 systems]
Benefits of the HPE Big Data Reference Architecture for Hadoop
Delivering value to the business
• Data consolidation
• Hosting multiple workloads
• Maximum elasticity and workload isolation
• Balance and scale compute and storage independently
• Breakthrough density and TCO
[Diagram: HPE Moonshot or HPE Apollo compute tier connected over a high-speed network to an HPE Apollo 4xx0 storage tier]
16. Advantages* of the HPE Big Data Reference Architecture
Room to grow: the same performance in half the space
* Normalized on performance, based on Terasort testing
HPE Big Data Reference Architecture versus a traditional big data architecture:
• Hadoop performance: equivalent
• Density: more than 2x denser
• Network bandwidth: 40 Gbit versus 10 Gbit
• HDFS storage performance: 2x greater
• Power (watts): half the power
17. Independent scaling of compute and storage
Grow to match your workload and data sources
Relative to a traditional architecture, the HPE Big Data Reference Architecture delivers:
• Hot (compute) configuration: 2.8x the compute, 97% of the storage capacity, 4x the memory
• Standard configuration: 1.6x the compute, 1.5x the storage capacity, 2.5x the memory
• Cold (storage) configuration: 90% of the compute, 2.1x the storage capacity, 1.5x the memory
18. HPE Big Data Reference Architecture
Hadoop and its ecosystem take advantage of the BDRA
[Diagram: compute tiers (including Impala) and east-west networking switches connected over a high-speed network to SSD-based, hard-disk-based, and archive storage tiers]
19. Enterprise Grade SQL Analytics for Hadoop
• Develop your own analytical applications with full-functionality ANSI SQL
• Vertica inside: a powerful and proven SQL query engine
• Installs in the Hadoop cluster; supports Ambari and is YARN-ready
• Enterprise-ready and stable, with full ANSI SQL capabilities and predictive analytics
HPE Vertica SQL on Hadoop runs as a YARN application over HDFS, ORC, and Parquet data, across compute-optimized and storage-optimized servers
20. HPE Vertica Advanced Analytics family – with enterprise-grade reliability and scalability
Core Vertica SQL engine (the core is key)
• First commercially available columnar database
• Native advanced analytics to deliver insight at the speed of business
• Open ANSI SQL standards ++; R, Python, Java, Scala
• Native Hadoop integration; SaaS and AMI cloud options
• Support for new open-source architectures, including Kafka and Spark
• The same core Vertica engine delivers advanced analytics wherever your enterprise needs demand, today and tomorrow
HP Vertica for SQL on Hadoop
• Native support for ORC and Parquet
• Supports all distributions
• No helper node or single point of failure
HP Vertica Enterprise Edition
• Columnar storage and advanced compression
• Industry-leading performance and scalability
Vertica Community Edition
• Free up to 1 TB
• Build a data-centric foundation
HP Vertica OnDemand
• Get up and running in under an hour
• Pay by the TB or by the query
HP Vertica AMI
• Hundreds of TB deployed
• Bring your own license to Amazon Web Services
21. HPE Big Data Architecture: long-term view
Evolve to support multiple compute and storage blocks
• Workload-optimized compute nodes to accelerate various big data software: low-cost nodes, GPU nodes, FPGA nodes, big-memory nodes
• Multi-temperature storage using HDFS tiering and object stores: SSD nodes, disk nodes, archive nodes
23. HPE SecureData provides the missing data protection
Traditional IT infrastructure security (disk encryption, database encryption, SSL/TLS/firewalls, authentication management) leaves security gaps between layers, exposing data to threats: malware and insiders at the storage and file-system layers, SQL injection and malware at the database layer, traffic interceptors on the middleware/network layer, and credential compromise at the data and application layer.
HPE SecureData data-centric security closes these gaps with end-to-end protection: data security coverage spans data and applications, file systems, databases, middleware/network, and storage.
24. HPE SecureData
Protecting sensitive and regulated data in Hadoop
– Stateless key management
  – No key database to store or manage
  – High performance, unlimited scalability
– Both encryption and tokenization technologies
  – Customize the solution to meet exact requirements
– Broad platform support
  – On-premises / cloud / big data
  – Structured / unstructured
  – Hadoop, HPE Vertica, Linux, Windows, AWS, HPE NonStop, Teradata, IBM z/OS, etc.
– Quick time-to-value
  – Complete end-to-end protection within a common platform
  – Format preservation dramatically reduces implementation effort
Components: HPE SecureData Management Console, Web Services API, native APIs (C, Java, C#/.NET), command lines, key servers, and File Processor
27. How to discover the value of your data
• Align business goals and challenges with the relevant data
• Evaluate your data and quickly test, learn, and iterate on ideas to discover value
• Create a strategic roadmap based on learnings
Key HPE solutions: Data Discovery; Data-Driven Transformation Planning
Business benefits: agile execution of impactful projects; maximum alignment to value
28. Business value with the HPE Data Discovery Solution Framework
• To help you with your journey, the HPE Data Discovery Solution provides an end-to-end approach to realizing the value of your data
• It includes experienced consultants, proven processes, modern big data analytics platforms and infrastructure, and convenient delivery options
• It empowers you to realize:
  • A clear path to business insights and value
  • Rapid exploration and real-time access
  • Lower risk
  • Lower costs
Business value metrics
• Improve business processes
• Enable better operations performance
• Understand customers better
• Increase market share, margin, and/or revenue
Framework: Discovery Workshop → Discovery Experience → Discovery Production Implementation
Discovery Lab: HPE Vertica, HPE IDOL, Hadoop, and SAP HANA on HPE servers and storage, on premises or in the cloud
29. HPE Data Discovery Service
A rapid, low-risk, securely designed path to big data value, delivered as a service in the HPE Cloud or on client premises
• Expertise: HPE data scientists, technology experts, and industry SMEs
• Big data platforms: HPE Haven, Hadoop, SAP HANA, etc.
• Big data infrastructure: HPE Moonshot, HPE Apollo, HPE 3PAR, HPE ProLiant
• Platform flexibility: on-premises or cloud-based delivery models
• Data discovery lab: rapid deployment of data discovery labs
• Guided process: proven processes to accelerate time to value
• Use case library: industry and business function examples
The three phases:
• Discovery Workshop: a one- to two-day workshop to align business and IT, discuss opportunities, and determine priorities
• Discovery Experience: a private, secure, and low-risk big data "test-drive" functional and technical environment
• Discovery Production Implementation: operationalize and monetize the new insights by implementing them into your business processes
31. HPE Solution for Hadoop
High-performing analytics engines: Big Data Analytics RA, HPE Vertica SQL for Hadoop, SAP HANA, HPE IDOL, HPE Information Governance
Flexible, purpose-built infrastructure: Hadoop Reference Architectures for MapR, Hortonworks & Cloudera; Hadoop on HPE Apollo + Moonshot + ProLiant
Consulting & implementation services: HPE Analytics Consulting Services for Hadoop; HPE Integration Services
On-premises and hybrid cloud deployment options
32. Build a data-centric foundation: Hadoop for the enterprise
• High-performance computing: 2x Hadoop performance or 50% less space, with the HPE Big Data Reference Architecture infrastructure
• Analyze at scale and speed: 100% of your data, 10x to 1,000x faster, with the HPE big data platform powered by Vertica & IDOL
• Secure and govern: protect and manage your data and reputation with HPE security and governance solutions for Hadoop
• Data management, data discovery, and governance services
33. Why Hewlett Packard Enterprise?
Enterprise scale with Hadoop
Experience and expertise
• 3,000+ global analytics and data management professionals
• Hundreds of data scientists
Solution leadership
• Proven analytics and compute platforms for all data, environments, and analytics
• Services to deliver value from discovery to achieving business outcomes
Market leadership
• Gartner Magic Quadrant leader for:
  – Enterprise Data Warehouse and Data Management Solutions for Analytics (2015)
  – eDiscovery (2015)
Flexible and open
• Solutions built on open standards, offering choice and flexibility
• Strong strategic alliances complementing HPE solutions
Session duration : 40 min
22 slides
30 to 35 min presentation
5 to 10 min Q&A
Empower Data-Driven Organizations with HPE and Hadoop
Data is the fuel for the idea economy, and being data-driven is essential for businesses to be competitive. HPE works with our partner Hortonworks to deliver a total solution for all your big data initiatives, accelerating the value of Hadoop. Join us in this session and you'll hear about:
– HPE Spark optimizer: a 15x performance improvement for Spark? Yes please. A Hortonworks/Hewlett Packard Labs collaboration on enhancing Spark for workloads with large shared pools of memory
– Data Discovery: quickly discover the value of your data with the help of analytics experts, starting with a data lab on your premises or delivered through the cloud
– Enterprise-grade Hadoop: an innovative asymmetric compute and storage architecture with better performance per square foot and power utilization, for unprecedented elasticity and scalability
– Security for Hadoop: HPE SecureData is a data-centric framework that protects sensitive data at rest, in motion, and in use in Hadoop and other big data systems
– SQL on Hadoop: analytics made easier; bridge your EDW legacy systems through tight integrations with Kafka, R, Python, and Apache Spark
We are living in a digital world where everyone is connected, everywhere. We're living in an Idea Economy, where the ability to turn an idea into a new product or service has never been easier. Anyone with an idea can actually change the world.
Of course, ideas have always been the root of progress and business success. They've launched companies, created markets, and built industries. But there's a difference today. In this hyper-connected, technology-driven world, it takes more than good ideas to be successful.
Today, the tools that enable disruption (things like cloud computing, mobile technology, and big data analytics) are so easily accessible and affordable that they have given rise to a new class of entrepreneurs. And these challengers of the status quo are revolutionizing entire industries at a pace and scale never seen before.
In the Idea Economy, no industry is immune to disruption. Whether in energy, healthcare, manufacturing, or telecommunications, companies (be they start-ups or large enterprises) can only survive if they have both the vision and the technological agility to respond to market opportunities and threats and quickly turn ideas into reality.
Today, an entrepreneur with a good idea has access to all of the infrastructure and resources that a traditional Fortune 1000 company would have, and they can pay for it all with a credit card. They can rent compute on demand, get a SaaS ERP system, use PayPal or Square for transactions, market using Facebook or Google, and have FedEx run their supply chain.
The days of needing millions of dollars to launch a new company or bring a new idea to market are fading fast.
You don't have to look any further than recent companies such as Vimeo, One Kings Lane, or Dock to Dish (all HPE customers and partners), or more familiar names like Salesforce, Airbnb, Netflix, and Pandora, to see how the Idea Economy is exploding.
And how about Uber? Uber's impact has been dramatic since it launched its application to connect riders and drivers in 2009. Without owning a single car, it now serves more than 250 cities in 55 countries and has completely disrupted the taxi industry.
The San Francisco Municipal Transportation Agency says that cab use has dropped 65 percent in San Francisco in two years.
Ideas have always fueled business success, but what matters now is how fast you can turn an idea into reality. Ask yourself: how quickly can I capitalize on a new idea, seize a new business opportunity, or respond to a competitor that threatens my business?
Early results of the collaboration include the following:
– Enhanced shuffle engine technologies: faster sorting and in-memory computations, which have the potential to dramatically improve Spark performance.
– Better memory utilization: improved performance and usage for broader scalability, which will help enable new large-scale use cases.
Apollo 4200 storage calculation per rack, if keeping two drives for the OS:
26 LFF data drives x 8 TB = 208 TB per server
208 TB x 21 servers = 4,368 TB ≈ 4.26 PB raw
4,368 TB / 4 = 1,092 TB ≈ 1.07 PB Hadoop usable
21 servers x 18 cores x 2 CPUs = 756 cores
Apollo 4530:
Servers per rack: 10 enclosures = 40U; 10 x 3 = 30 server nodes
30 x 15 drives = 450 drives x 8 TB = 3,600 TB ≈ 3.5 PB raw
3,600 TB / 4 = 900 TB Hadoop usable
30 x 16 cores x 2 CPUs = 960 cores
DL380:
18 cores x 2 CPUs x 21 servers = 756 cores
15 drives x 8 TB x 21 servers = 2,520 TB ≈ 2.46 PB raw
2,520 TB / 4 = 630 TB Hadoop usable
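The rack arithmetic above follows one pattern, so it can be checked with a small helper. This is a sketch of the calculation as written in these notes (the raw/4 "Hadoop usable" divisor and the function name are ours, taken from the figures above, not an official sizing tool):

```python
def rack(servers, drives_per_server, drive_tb, cores_per_cpu, cpus=2):
    """Per-rack capacity figures, mirroring the speaker-note arithmetic."""
    raw_tb = servers * drives_per_server * drive_tb
    return {
        "raw_tb": raw_tb,
        "raw_pb": round(raw_tb / 1024, 2),      # slide figures round to ~2 decimals
        "usable_tb": raw_tb / 4,                # raw/4: replication x3 plus overhead
        "cores": servers * cores_per_cpu * cpus,
    }

apollo_4200 = rack(servers=21, drives_per_server=26, drive_tb=8, cores_per_cpu=18)
apollo_4530 = rack(servers=30, drives_per_server=15, drive_tb=8, cores_per_cpu=16)
dl380 = rack(servers=21, drives_per_server=15, drive_tb=8, cores_per_cpu=18)

assert apollo_4200["raw_tb"] == 4368 and apollo_4200["cores"] == 756
assert apollo_4530["usable_tb"] == 900 and apollo_4530["cores"] == 960
assert dl380["raw_tb"] == 2520 and dl380["usable_tb"] == 630
```

The assertions reproduce the three configurations on the reference-architecture slide.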
The last trend about where Hadoop is going, and this is very important to the Minotaur solution, is that Hadoop is becoming more asymmetric. Two things went into the core Hadoop trunk in the 2.6 release (December 2014). One is the concept of tiering within the file system, so that disks can be defined as standard disks, SSDs, or an archival tier.
Very basic functionality; anyone that's been in the storage business probably looks at this and says, "hey, they've got a long way to go". Open source software isn't better, it's just open source. The enemy of great is good enough, and the enemy of good enough is open source.
The interesting aspect of this is that Hadoop will now allow you to configure servers which are full of large disk drives with very little compute power to use for archival purposes. So, no longer is every node in the cluster the same; we start to have nodes which are skewed towards certain types of functions and workloads.
At the same time, there's a feature that went into the YARN container environment called labels. We actually helped to do this: we contributed code and contributed to the spec through our relationship with Hortonworks. Labels let you take the nodes in the cluster and assign them a label. Then, when you run an application under YARN, you can tell it which labels you want to run on. So, maybe I have a pool of nodes with a lot of memory; I give them a label called "lots of memory", and I can now run all of my Spark (in-memory) jobs on those nodes. Hadoop used to be a completely symmetric environment, and was all about taking the work to the data: no shoveling data around, moving it from place to place; I'll move the work to the node where the data resides and run it against the data (sitting in internal storage) on that node. Now, we're starting to see more of this asymmetry, perhaps reaching out to the neighboring node to grab data that a job on a different node needs. This asymmetry is important to our architecture; our architecture embraces the asymmetry and optimizes for it.
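The node-label idea described above can be sketched in a few lines. This is only an illustrative model of the concept for the talk (the dictionary layout and function name are ours, not the actual YARN node-labels API):

```python
# Toy model of YARN node labels: tag nodes, then steer a job to the pool
# carrying the label it requests, without moving or repartitioning data.
nodes = {
    "node1": {"label": "lots-of-memory", "ram_gb": 512},
    "node2": {"label": "lots-of-memory", "ram_gb": 512},
    "node3": {"label": "standard", "ram_gb": 128},
}

def nodes_for(label):
    """Return the pool of nodes carrying the given label."""
    return [name for name, spec in nodes.items() if spec["label"] == label]

# An in-memory (Spark-style) job asks for the big-memory pool.
pool = nodes_for("lots-of-memory")
assert pool == ["node1", "node2"]
```

In real YARN the labels live in the ResourceManager and applications request them per container; the point here is only that a label partitions compute, not data.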
Hadoop management software (YARN node labels)
Key takeaway: the HP Big Data Reference Architecture is another innovation from HP that leverages the strength of HP's portfolio to deliver value for our customers via a differentiated solution that combines HP Moonshot servers and HP Apollo storage servers.
The value proposition and the how:
Traditional scale-up infrastructures separate compute and storage for the flexibility of scaling them independently, but at the cost of management complexity and expense.
Scale-out architectures, and new technologies that use DAS storage within a server, lose this ability to scale independently by combining compute and storage in one box, a tradeoff for achieving hyper-scalability and simple management.
The HP Big Data Reference Architecture deploys a standard Hadoop distribution in an asymmetric fashion, running storage-related components such as the Hadoop Distributed File System (HDFS) and HBase (the open-source non-relational distributed database) on Apollo density-optimized servers, and compute-related components under YARN on Moonshot hyperscale servers.
This essentially provides the best of both worlds: the ability to scale compute and storage independently without losing the benefits of scale-out infrastructure.
To make this more flexible, HP worked with Hortonworks to create a new feature in Hadoop called YARN labels, an innovation that we contributed to open source! YARN labels allow us to create pools of compute nodes where applications run, so it is possible to dynamically provision clusters without repartitioning data (since data can be shared across compute nodes).
We can scale compute and storage independently by simply adding compute nodes or storage nodes to scale performance linearly.
This fundamentally changes the economics of the solution across scale, performance, and cost efficiency to meet specific use case and workload needs!
Extremely elastic
– Nodes can be allocated by time of day, or even for a single job, without redistributing data
– Vertica, Autonomy, and our partners have high-performance access to HDFS
– Hadoop data can be efficiently shared
Use the best platform for each task
– Low-power Moonshot, compute-intense DL380, big-memory Superdome, etc.
– No longer committed to fixed CPU/storage ratios; compute cores can be allocated as needed
Better capacity management
– Compute nodes can be provisioned on the fly
– Storage nodes are a smaller subset of the cluster, and thus less costly to overprovision
Same core engine as HP Vertica, with Hadoop as the data storage layer
Perform analytics regardless of the format of the data or the Hadoop distribution used
Robust, enterprise-ready solution with world-class enterprise support and services
Open APIs and developer tools, with a vibrant ecosystem of partners to support your big data project
Eases management of big data: the solution is part of a greater HP enterprise software platform, Haven
Unique types of hardware are coming into play: we think that in the near future FPGAs, GPUs, and other types of silicon acceleration will become very common in Hadoop.
We can bring those kinds of new hardware into this architecture, let them run alongside the existing hardware, and then steer the workloads that can take advantage of those platforms to those systems.
Ability to consolidate clusters: workload isolation
Applications will run unchanged.
Being very community-driven is important for us.
CI: Converged Infrastructure
So how do you discover the value of your data?
Aligning business goals and challenges with the right impact levers. If your business goal is increasing customer loyalty, what are the impact levers that influence the outcome (customer sentiment, product performance, customer service productivity), and how can you use data and insights to positively affect those levers?
Evaluating your data to quickly test, learn, and iterate on ideas to discover value. Speed and agility are key, and you have neither the time nor the resources to make large bets without proving value first. So how can you test ideas with the least upfront investment?
Creating a prioritized roadmap of projects. Project execution has to be agile, but you have to keep the goal in mind and align to it. This is not about hard-setting a 10-year plan; you have to have flexibility built in to pivot when necessary.
Through a consultative approach, HPE helps you align people, processes, and technologies through our Data Discovery workshops, and quickly evaluate your data through on-demand offerings of our analytics platforms (Vertica + IDOL).
The outcome is agile execution of projects and maximum alignment to value.
Emmi
Business need: Emmi is a top Swiss dairy processor. Measuring the effectiveness and efficiency of marketing campaigns in the age of digital cross-media was a challenge, and Emmi wanted to increase market presence, identify customer needs to increase potential revenue, and build brand awareness. They knew that customer interaction data was available on the web, but didn't know how to leverage it.
HPE solution: HPE provided Big Data Discovery Experience services to collect data on Emmi's customers from a wide range of sources and analyze it via a secure HPE cloud analytics environment based on IDOL.
Business outcome: Emmi now has a 360-degree view of its customers, consumers, and influencers and can address them in a more targeted manner. Marketing activities are now targeted and used in real time, enabling Emmi to effectively reach customers and ultimately reduce costs. Emmi also realized that the right communication, marketing activity, and customer experience would help improve their business.
Global mining service provider
Business need: this company wanted to gain insight into sensor data to help improve equipment maintenance, operational losses, and safety practices. They also wanted to see how they could leverage these insights to innovate their business model.
HPE solution: HPE provided the analytics capability through its Big Data Discovery Experience services and environment.
Business outcome: the company can detect equipment failures early as part of predictive analytics and identify operational losses.
BlaBlaCar
Business need: this is an Idea Economy company that is revolutionizing transportation and car sharing. To support its growth, BlaBlaCar needed a way to measure the effectiveness of its website design and understand customer usage patterns in order to make it easier for customers to complete a transaction.
HPE solution: HPE Vertica. This is a "data discovery" customer because they first utilized the Community Edition, then, once they were ready to scale, switched to the Enterprise Edition. The Community Edition gave them a platform to discover the value of their data.
Business outcome: HPE helped to build BlaBlaCar's data-centric foundation, allowing them to improve their customers' online experience by analyzing massive amounts of both structured and unstructured data up to 1,000 times faster than traditional data warehouse solutions. By rapidly collecting and processing clickstream data, BlaBlaCar was able to measure and improve the effect of changes made to their websites and find new ways to engage their customers.
Core across these offerings is breaking down the existing silos between your various EDWs, ECMs, and BI tools, accumulated through game-of-thrones fiefdoms and acquisitions.
Many companies are trying to extract the data from these systems and load it into an open-standard repository that is capable of running analytics and can scale into petabyte ranges: Hadoop. But as good as Hadoop is, you really need a partner that can make it great, scale it, secure it, and enable it to hit top marks in every use case.
HP Haven can allow you to unleash the power of Hadoop and realize its full potential.
It starts with high-performance computing. We have helped customers see 2x improvements in Hadoop performance while using 50% less space, by using HP big data reference architectures running on the entire range of our state-of-the-art servers, from the DL380 all the way up to Apollo and Moonshot.
Analyze at speed and scale. It is not unusual to see performance improvements of a thousandfold by using HP Haven alongside Hadoop. With Vertica we can approach the SQL query speeds of SAP HANA, but over petabytes of data instead of sub-100 TB ranges. Don't get me wrong: there are clearly times when you need the extra speed of in-memory, and here we have both the services and the hardware to support it, likewise for Microsoft PDW.
Equally important, you must govern and protect the data. Here we have HP Control Point, Records Manager, and Data Integrator for governance, with products like Data Protector and HP Voltage for at-rest encryption of all the data you put into your Hadoop smart data lake.
In essence, we've innovated on Hadoop with Haven so that you can better innovate.
Claim: 2x Hadoop performance, or 50% less space
Source: http://www8.hp.com/de/de/hp-news/press-release.html?id=1964038#.VWT5lc9Viko
Claim: 100% of your data, 10x-1000x faster
Sources: 100% of your data: we can analyze machine data, human data, and business data; 3 out of 3 = 100%. Get answers up to thousands of times faster, e.g. the Game Show Network delivers "A/B test" results 2,700x faster than on MySQL (12 seconds vs. 9 hours); watch the video here: http://h30614.www3.hp.com/Discover/OnDemand/LasVegas2013/SessionDetail/55cdb60b-ef50-42eb-9008-80f358da2a11
Why HPE for empowering the data-driven organization?
HPE can help you become a data-driven organization and maximize your outcomes, no matter where you are in your transformation. HPE will help you discover the value of your data, build your data-centric foundation, achieve superior business outcomes, and establish a sustainable, integrated approach that empowers a data-driven organization.
Proven experience & expertise
• 10,000 customer engagements
• Hundreds of data scientists
• 3,000+ dedicated global analytics and data management professionals
Technology leadership
• Best-in-class analytics and compute platforms for all use cases
Market leadership
• HPE is recognized as a leader in Gartner's Magic Quadrant for Enterprise Data Warehouse and Data Management Solutions for Analytics (2015)
• HPE is recognized as a leader in information governance in Gartner's Magic Quadrant report for eDiscovery (2015)
• Forrester Wave report: forthcoming
Flexible & open
• HPE solutions built on open standards, offering choice and flexibility
• Integrated rich partner ecosystem