2. Agenda
• A data-driven world
• HPE contribution to Spark
• HPE innovations for Hadoop
• Enterprise-grade SQL analytics for Hadoop
• Data-centric security for Hadoop
• HPE Data Discovery service, to help you pull together these innovations
6. HPE and Hortonworks joint announcement
Hortonworks announcement event on March 1st, with HPE CTO Martin Fink on stage
7. HPE Contribution to Apache Spark
Martin Fink announcement: Hortonworks and Hewlett Packard Labs join forces to boost Spark
Hewlett Packard Labs is working with Hortonworks to enhance the efficiency and scale of memory for the enterprise and to dramatically improve memory utilization:
– Enhanced shuffle engine technologies: faster sorting and in-memory computations, which have the potential to dramatically improve Spark performance
– Better memory utilization: improved performance and usage for broader scalability, which will help enable new large-scale use cases
"We're hoping to enable the Spark community to derive insight more rapidly, from much larger data sets, without having to change a single line of code"
– Martin Fink, CTO & Director, Hewlett Packard Labs
Tested with customers from the financial services industry; provides 3x to 15x performance increases
9. HPE Servers and Architectures for Hadoop
Traditional
• Tried-and-true platform
• Corporate standard: "I buy DL380s"
• Small to large deployments (very often ~20 nodes)
• Linear growth of balanced workloads
Optimized
• Purpose-built for big data
• Mid-size to large deployments
• Single, resource-intensive workload
• Workload optimized
• Multi-temperature storage
• "Optimized traditional"
• Higher density, lower TCO
Converged
• MPP DBMS approach + open source
• Mid-size to large deployments
• Non-linear storage and compute/memory growth
• Multiple workloads, latency demands
• Isolate workload hot spots
• Scale compute and storage separately, elastically
• Innovative, TCO-driven approach
[Server illustrations: ProLiant DL380 Gen9, Apollo 4500 Gen9, Apollo 4200 Gen9, Apollo 2000 System, and Moonshot 1500. Symmetric architectures (conventional wisdom: DL380 Gen9, Apollo 4xxx) versus the asymmetric architecture (forward-thinking: Moonshot & Apollo)]
10. HPE Reference Architecture(s) for Hadoop
• Scaling from 4 to thousands of HPE servers
• Sized to the customer's workload and storage needs
• Impressive processor and storage density
A set of pre-tested hardware components
• Processors, drives, network, 1 TB/8 TB disk sizes, etc.
Breakthrough economics, density, simplicity
Flexible, pre-approved, and optimized configurations
HPE Apollo 4000 example:
• 24 x HPE Apollo 4530 worker nodes
• HPE 5900 10GbE and 2 x HPE 5930 10GbE network switches
• 3 x DL360 Gen9 head nodes
Apollo 4510: 3.5 PB raw storage, 900 TB Hadoop usable, 960 Xeon E5 cores for a full rack
[Server illustrations: Apollo 4530, ProLiant DL380e Gen8, Apollo 4200, ProLiant SL4540 Gen8]
DL380: 2.46 PB raw storage, 630 TB Hadoop usable, 756 Xeon E5 cores for a full rack
Apollo 4200: 4.6 PB raw storage, 1 PB Hadoop usable, 756 Xeon E5 cores for a full rack
SL4540: 5.3 PB raw storage, 1.3 PB Hadoop usable, 320 Xeon E3 cores for a full rack
11. HPE Apollo 4200 – Bringing big data storage server density to the enterprise
Used as a standard Hadoop worker node and as a BDRA asymmetric storage node
Storage density
• 28 LFF data drives
• Highest storage density in a traditional 2U rack server: 224 TB per server, up to 4.6 PB per rack
• Perfect core/spindle ratio of 1, with 28 cores (2 x 14) and 28 drive spindles
Data center plug and play
• Enterprise bridge: fits traditional enterprise/SME rack-server data centers
• Configuration flexibility: balanced capacity, performance, and throughput with flexible options for disks, CPUs, I/O, and interconnects
Performance and efficiency
• Halves the number of servers
• Halves the number of network ports
• Halves the required floor space
• Lowers the number of needed licenses/subscriptions
• Lowers electric power needs
12. Hadoop on HPE Moonshot
What would be a good server cartridge for Hadoop?
Processing
– Number of Xeon cores: 8
– Very efficient I/O
Memory
– Memory: 128 GB
Storage
– Data storage: 2 TB M.2 (SSD)
Network
– Fast network (2 x 10GbE)
– Low-latency chassis interconnect
45 servers per enclosure: 45 x 128 GB = 5.6 TB of RAM and 45 x 2 TB = 90 TB of fast data storage in 4U
Example workload: Impala, SQL on Hadoop
14. HPE Big Data Reference Architecture
HPE brings enterprise data center architecture to Hadoop
Traditional Hadoop cluster architecture (symmetric)
– Compute and storage are always co-located
– All servers are identical, holding both applications and data files
– Data is partitioned across servers on direct-attached storage
HPE Big Data Reference Architecture (asymmetric)
– Separate, optimized compute and storage tiers connected by high-speed networking: compute servers hold applications and intermediate data, storage servers hold the data files
– Standard Hadoop installed with storage components on the storage servers and applications on the compute servers
– Enabled and optimized by purpose-selected HPE Moonshot and Apollo servers and HPE/Hortonworks workload management software (contributed to the community)
15. [Rack illustrations: multiple HPE Apollo 2000 systems]
Benefits of the HPE Big Data Reference Architecture for Hadoop
Delivering value to the business
• Data consolidation
• Hosting multiple workloads
• Maximum elasticity and workload isolation
• Balance and scale compute and storage independently
• Breakthrough density and TCO
[Diagram: HPE Moonshot or HPE Apollo compute tier connected over a high-speed network to an HPE Apollo 4xx0 storage tier]
16. Advantages* of the HPE Big Data Reference Architecture
Room to grow: the same performance in half the space
* Normalized on performance, based on Terasort testing
HPE Big Data Reference Architecture versus a traditional big data architecture:
• Hadoop performance: equivalent
• Density: more than 2x denser
• Network bandwidth: 40 Gbit versus 10 Gbit
• HDFS storage performance: 2x greater
• Power (watts): half the power
17. Independent scaling of compute and storage
Grow to match your workload and data sources
Relative to a traditional architecture, the HPE Big Data Reference Architecture delivers:
• Hot (compute) configuration: 2.8x the compute, 97% of the storage capacity, 4x the memory
• Standard configuration: 1.6x the compute, 1.5x the storage capacity, 2.5x the memory
• Cold (storage) configuration: 90% of the compute, 2.1x the storage capacity, 1.5x the memory
18. HPE Big Data Reference Architecture
Hadoop and its ecosystem take advantage of the BDRA
[Diagram: compute tiers (including Impala) and east-west networking switches connected over a high-speed network to SSD-based, hard-disk-based, and archive storage tiers]
19. Enterprise Grade SQL Analytics for Hadoop
• Develop your own analytical applications with full-functionality ANSI SQL
• Vertica inside: a powerful and proven SQL query engine
• Installs in the Hadoop cluster; supports Ambari and is YARN-ready
• Enterprise-ready and stable, with full ANSI SQL capabilities and predictive analytics
HPE Vertica SQL on Hadoop runs as a YARN application over HDFS, ORC, and Parquet data, across compute-optimized and storage-optimized servers
20. HPE Vertica Advanced Analytics family – with enterprise-grade reliability and scalability
Core Vertica SQL engine (the core is key)
• First commercially available columnar database
• Native advanced analytics to deliver insight at the speed of business
• Open ANSI SQL standards ++; R, Python, Java, Scala
• Native Hadoop integration; SaaS and AMI cloud options
• Support for new open-source architectures, including Kafka and Spark
• The same core Vertica engine delivers advanced analytics wherever your enterprise needs demand, today and tomorrow
HP Vertica for SQL on Hadoop
• Native support for ORC and Parquet
• Supports all distributions
• No helper node or single point of failure
HP Vertica Enterprise Edition
• Columnar storage and advanced compression
• Industry-leading performance and scalability
Vertica Community Edition
• Free up to 1 TB
• Build a data-centric foundation
HP Vertica OnDemand
• Get up and running in under an hour
• Pay by the TB or by the query
HP Vertica AMI
• Hundreds of TB deployed
• Bring your own license to Amazon Web Services
21. HPE Big Data Architecture: long-term view
Evolve to support multiple compute and storage blocks
• Workload-optimized compute nodes to accelerate various big data software: low-cost nodes, GPU nodes, FPGA nodes, big-memory nodes
• Multi-temperature storage using HDFS tiering and object stores: SSD nodes, disk nodes, archive nodes
23. HPE SecureData provides the missing data protection
Traditional IT infrastructure security (disk encryption, database encryption, SSL/TLS/firewalls, authentication management) leaves security gaps between layers, exposing data to threats: malware and insiders at the storage and file-system layers, SQL injection and malware at the database layer, traffic interceptors on the middleware/network layer, and credential compromise at the data and application layer.
HPE SecureData data-centric security closes these gaps with end-to-end protection: data security coverage spans data and applications, file systems, databases, middleware/network, and storage.
24. HPE SecureData
Protecting sensitive and regulated data in Hadoop
– Stateless key management
  – No key database to store or manage
  – High performance, unlimited scalability
– Both encryption and tokenization technologies
  – Customize the solution to meet exact requirements
– Broad platform support
  – On-premises / cloud / big data
  – Structured / unstructured
  – Hadoop, HPE Vertica, Linux, Windows, AWS, HPE NonStop, Teradata, IBM z/OS, etc.
– Quick time-to-value
  – Complete end-to-end protection within a common platform
  – Format preservation dramatically reduces implementation effort
Components: HPE SecureData Management Console, Web Services API, native APIs (C, Java, C#/.NET), command lines, key servers, and File Processor
27. How to discover the value of your data
• Align business goals and challenges with the relevant data
• Evaluate your data and quickly test, learn, and iterate on ideas to discover value
• Create a strategic roadmap based on learnings
Key HPE solutions: Data Discovery; Data-Driven Transformation Planning
Business benefits: agile execution of impactful projects; maximum alignment to value
28. Business value with the HPE Data Discovery Solution Framework
• To help you with your journey, the HPE Data Discovery Solution provides an end-to-end approach to realizing the value of your data
• It includes experienced consultants, proven processes, modern big data analytics platforms and infrastructure, and convenient delivery options
• It empowers you to realize:
  • A clear path to business insights and value
  • Rapid exploration and real-time access
  • Lower risk
  • Lower costs
Business value metrics
• Improve business processes
• Enable better operations performance
• Understand customers better
• Increase market share, margin, and/or revenue
Framework: Discovery Workshop → Discovery Experience → Discovery Production Implementation
Discovery Lab: HPE Vertica, HPE IDOL, Hadoop, and SAP HANA on HPE servers and storage, on premises or in the cloud
29. HPE Data Discovery Service
A rapid, low-risk, securely designed path to big data value, delivered as a service in the HPE Cloud or on client premises
• Expertise: HPE data scientists, technology experts, and industry SMEs
• Big data platforms: HPE Haven, Hadoop, SAP HANA, etc.
• Big data infrastructure: HPE Moonshot, HPE Apollo, HPE 3PAR, HPE ProLiant
• Platform flexibility: on-premises or cloud-based delivery models
• Data discovery lab: rapid deployment of data discovery labs
• Guided process: proven processes to accelerate time to value
• Use case library: industry and business function examples
The three phases:
• Discovery Workshop: a one- to two-day workshop to align business and IT, discuss opportunities, and determine priorities
• Discovery Experience: a private, secure, and low-risk big data "test-drive" functional and technical environment
• Discovery Production Implementation: operationalize and monetize the new insights by implementing them into your business processes
31. HPE Solution for Hadoop
High-performing analytics engines: Big Data Analytics RA, HPE Vertica SQL for Hadoop, SAP HANA, HPE IDOL, HPE Information Governance
Flexible, purpose-built infrastructure: Hadoop Reference Architectures for MapR, Hortonworks & Cloudera; Hadoop on HPE Apollo + Moonshot + ProLiant
Consulting & implementation services: HPE Analytics Consulting Services for Hadoop; HPE Integration Services
On-premises and hybrid cloud deployment options
32. Build a data-centric foundation: Hadoop for the enterprise
• High-performance computing: 2x Hadoop performance or 50% less space, with the HPE Big Data Reference Architecture infrastructure
• Analyze at scale and speed: 100% of your data, 10x to 1,000x faster, with the HPE big data platform powered by Vertica & IDOL
• Secure and govern: protect and manage your data and reputation with HPE security and governance solutions for Hadoop
• Data management, data discovery, and governance services
33. Why Hewlett Packard Enterprise?
Enterprise scale with Hadoop
Experience and expertise
• 3,000+ global analytics and data management professionals
• Hundreds of data scientists
Solution leadership
• Proven analytics and compute platforms for all data, environments, and analytics
• Services to deliver value from discovery to achieving business outcomes
Market leadership
• Gartner Magic Quadrant leader for:
  – Enterprise Data Warehouse and Data Management Solutions for Analytics (2015)
  – eDiscovery (2015)
Flexible and open
• Solutions built on open standards, offering choice and flexibility
• Strong strategic alliances complementing HPE solutions
Session duration : 40 min
22 slides
30 to 35 min presentation
5 to 10 min Q&A
Empower Data-Driven Organizations with HPE and Hadoop
Data is the fuel for the idea economy, and being data-driven is essential for businesses to be competitive. HPE works with our partner Hortonworks to deliver a total solution for all your big data initiatives, accelerating the value of Hadoop. Join us in this session and you'll hear about:
– HPE Spark optimizer: a 15x performance improvement for Spark? Yes please. A Hortonworks/Hewlett Packard Labs collaboration on enhancing Spark for workloads with large shared pools of memory
– Data Discovery: quickly discover the value of your data with the help of analytics experts, starting with a data lab on your premises or delivered through the cloud
– Enterprise-grade Hadoop: an innovative asymmetric compute and storage architecture with better performance per square foot and power utilization, for unprecedented elasticity and scalability
– Security for Hadoop: HPE SecureData is a data-centric framework that protects sensitive data at rest, in motion, and in use in Hadoop and other big data systems
– SQL on Hadoop: analytics made easier; bridge your EDW legacy systems through tight integrations with Kafka, R, Python, and Apache Spark
We are living in a digital world where everyone is connected, everywhere. We're living in an Idea Economy, where the ability to turn an idea into a new product or service has never been easier. Anyone with an idea can actually change the world.
Of course, ideas have always been the root of progress and business success. They've launched companies, created markets, and built industries. But there's a difference today. In this hyper-connected, technology-driven world, it takes more than good ideas to be successful.
Today, the tools that enable disruption (things like cloud computing, mobile technology, and big data analytics) are so easily accessible and affordable that they have given rise to a new class of entrepreneurs. And these challengers of the status quo are revolutionizing entire industries at a pace and scale never seen before.
In the Idea Economy, no industry is immune to disruption. Whether in energy, healthcare, manufacturing, or telecommunications, companies (be they start-ups or large enterprises) can only survive if they have both the vision and the technological agility to respond to market opportunities and threats and quickly turn ideas into reality.
Today, an entrepreneur with a good idea has access to all of the infrastructure and resources that a traditional Fortune 1000 company would have, and they can pay for it all with a credit card. They can rent compute on demand, get a SaaS ERP system, use PayPal or Square for transactions, market using Facebook or Google, and have FedEx run their supply chain.
The days of needing millions of dollars to launch a new company or bring a new idea to market are fading fast.
You don't have to look any further than recent companies such as Vimeo, One Kings Lane, or Dock to Dish (all HPE customers and partners), or more familiar names like Salesforce, Airbnb, Netflix, and Pandora, to see how the Idea Economy is exploding.
And how about Uber? Uber's impact has been dramatic since it launched its application to connect riders and drivers in 2009. Without owning a single car, it now serves more than 250 cities in 55 countries and has completely disrupted the taxi industry.
The San Francisco Municipal Transportation Agency says that cab use has dropped 65 percent in San Francisco in two years.
Ideas have always fueled business success, but what matters now is how fast you can turn an idea into reality. Ask yourself: how quickly can I capitalize on a new idea, seize a new business opportunity, or respond to a competitor that threatens my business?
Early results of the collaboration include the following:
– Enhanced shuffle engine technologies: faster sorting and in-memory computations, which have the potential to dramatically improve Spark performance.
– Better memory utilization: improved performance and usage for broader scalability, which will help enable new large-scale use cases.
Apollo 4200 storage calculation per rack, if keeping two drives for the OS:
26 LFF data drives x 8 TB = 208 TB per server
208 TB x 21 servers = 4,368 TB ≈ 4.26 PB raw
4,368 TB / 4 = 1,092 TB ≈ 1.07 PB Hadoop usable
21 servers x 18 cores x 2 CPUs = 756 cores
Apollo 4530:
Servers per rack: 10 enclosures = 40U; 10 x 3 = 30 server nodes
30 x 15 drives = 450 drives x 8 TB = 3,600 TB ≈ 3.5 PB raw
3,600 TB / 4 = 900 TB Hadoop usable
30 x 16 cores x 2 CPUs = 960 cores
DL380:
18 cores x 2 CPUs x 21 servers = 756 cores
15 drives x 8 TB x 21 servers = 2,520 TB ≈ 2.46 PB raw
2,520 TB / 4 = 630 TB Hadoop usable
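The rack arithmetic above follows one pattern, so it can be checked with a small helper. This is a sketch of the calculation as written in these notes (the raw/4 "Hadoop usable" divisor and the function name are ours, taken from the figures above, not an official sizing tool):

```python
def rack(servers, drives_per_server, drive_tb, cores_per_cpu, cpus=2):
    """Per-rack capacity figures, mirroring the speaker-note arithmetic."""
    raw_tb = servers * drives_per_server * drive_tb
    return {
        "raw_tb": raw_tb,
        "raw_pb": round(raw_tb / 1024, 2),      # slide figures round to ~2 decimals
        "usable_tb": raw_tb / 4,                # raw/4: replication x3 plus overhead
        "cores": servers * cores_per_cpu * cpus,
    }

apollo_4200 = rack(servers=21, drives_per_server=26, drive_tb=8, cores_per_cpu=18)
apollo_4530 = rack(servers=30, drives_per_server=15, drive_tb=8, cores_per_cpu=16)
dl380 = rack(servers=21, drives_per_server=15, drive_tb=8, cores_per_cpu=18)

assert apollo_4200["raw_tb"] == 4368 and apollo_4200["cores"] == 756
assert apollo_4530["usable_tb"] == 900 and apollo_4530["cores"] == 960
assert dl380["raw_tb"] == 2520 and dl380["usable_tb"] == 630
```

The assertions reproduce the three configurations on the reference-architecture slide.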
The last trend about where Hadoop is going, and this is very important to the Minotaur solution, is that Hadoop is becoming more asymmetric. Two things went into the core Hadoop trunk in the 2.6 release (December 2014). One is the concept of tiering within the file system, so that disks can be defined as standard disks, SSDs, or an archival tier.
Very basic functionality; anyone that's been in the storage business probably looks at this and says, "hey, they've got a long way to go". Open source software isn't better, it's just open source. The enemy of great is good enough, and the enemy of good enough is open source.
The interesting aspect of this is that Hadoop will now allow you to configure servers which are full of large disk drives with very little compute power to use for archival purposes. So, no longer is every node in the cluster the same; we start to have nodes which are skewed towards certain types of functions and workloads.
At the same time, there's a feature that went into the YARN container environment called labels. We actually helped to do this: we contributed code and contributed to the spec through our relationship with Hortonworks. Labels let you take the nodes in the cluster and assign them a label. Then, when you run an application under YARN, you can tell it which labels you want to run on. So, maybe I have a pool of nodes with a lot of memory; I give them a label called "lots of memory", and I can now run all of my Spark (in-memory) jobs on those nodes. Hadoop used to be a completely symmetric environment, and was all about taking the work to the data: no shoveling data around, moving it from place to place; I'll move the work to the node where the data resides and run it against the data (sitting in internal storage) on that node. Now, we're starting to see more of this asymmetry, perhaps reaching out to the neighboring node to grab data that a job on a different node needs. This asymmetry is important to our architecture; our architecture embraces the asymmetry and optimizes for it.
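The node-label idea described above can be sketched in a few lines. This is only an illustrative model of the concept for the talk (the dictionary layout and function name are ours, not the actual YARN node-labels API):

```python
# Toy model of YARN node labels: tag nodes, then steer a job to the pool
# carrying the label it requests, without moving or repartitioning data.
nodes = {
    "node1": {"label": "lots-of-memory", "ram_gb": 512},
    "node2": {"label": "lots-of-memory", "ram_gb": 512},
    "node3": {"label": "standard", "ram_gb": 128},
}

def nodes_for(label):
    """Return the pool of nodes carrying the given label."""
    return [name for name, spec in nodes.items() if spec["label"] == label]

# An in-memory (Spark-style) job asks for the big-memory pool.
pool = nodes_for("lots-of-memory")
assert pool == ["node1", "node2"]
```

In real YARN the labels live in the ResourceManager and applications request them per container; the point here is only that a label partitions compute, not data.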
Hadoop management software (YARN node labels)
Key takeaway: the HP Big Data Reference Architecture is another innovation from HP that leverages the strength of HP's portfolio to deliver value for our customers via a differentiated solution that combines HP Moonshot servers and HP Apollo storage servers.
The value proposition and the how:
Traditional scale-up infrastructures separate compute and storage for the flexibility of scaling them independently, but at the cost of management complexity and expense.
Scale-out architectures, and new technologies that use DAS storage within a server, lose this ability to scale independently by combining compute and storage in one box, a tradeoff for achieving hyper-scalability and simple management.
The HP Big Data Reference Architecture deploys a standard Hadoop distribution in an asymmetric fashion, running storage-related components such as the Hadoop Distributed File System (HDFS) and HBase (the open-source non-relational distributed database) on Apollo density-optimized servers, and compute-related components under YARN on Moonshot hyperscale servers.
This essentially provides the best of both worlds: the ability to scale compute and storage independently without losing the benefits of scale-out infrastructure.
To make this more flexible, HP worked with Hortonworks to create a new feature in Hadoop called YARN labels, an innovation that we contributed to open source! YARN labels allow us to create pools of compute nodes where applications run, so it is possible to dynamically provision clusters without repartitioning data (since data can be shared across compute nodes).
We can scale compute and storage independently by simply adding compute nodes or storage nodes to scale performance linearly.
This fundamentally changes the economics of the solution across scale, performance, and cost efficiency to meet specific use case and workload needs!
Extremely elastic
– Nodes can be allocated by time of day, or even for a single job, without redistributing data
– Vertica, Autonomy, and our partners have high-performance access to HDFS
– Hadoop data can be efficiently shared
Use the best platform for each task
– Low-power Moonshot, compute-intense DL380, big-memory Superdome, etc.
– No longer committed to fixed CPU/storage ratios; compute cores can be allocated as needed
Better capacity management
– Compute nodes can be provisioned on the fly
– Storage nodes are a smaller subset of the cluster, and thus less costly to overprovision
Same core engine as HP Vertica, with Hadoop as the data storage layer
Perform analytics regardless of the format of the data or the Hadoop distribution used
Robust, enterprise-ready solution with world-class enterprise support and services
Open APIs and developer tools, with a vibrant ecosystem of partners to support your big data project
Eases management of big data: the solution is part of a greater HP enterprise software platform, Haven
Unique types of hardware are coming into play: we think that in the near future FPGAs, GPUs, and other types of silicon acceleration will become very common in Hadoop.
We can bring those kinds of new hardware into this architecture, let them run alongside the existing hardware, and then steer the workloads that can take advantage of those platforms to those systems.
Ability to consolidate clusters: workload isolation
Applications will run unchanged.
Being very community-driven is important for us.
CI: Converged Infrastructure
So how do you discover the value of your data?
Aligning business goals and challenges with the right impact levers. If your business goal is increasing customer loyalty, what are the impact levers that influence the outcome (customer sentiment, product performance, customer service productivity), and how can you use data and insights to positively affect those levers?
Evaluating your data to quickly test, learn, and iterate on ideas to discover value. Speed and agility are key, and you have neither the time nor the resources to make large bets without proving value first. So how can you test ideas with the least upfront investment?
Creating a prioritized roadmap of projects. Project execution has to be agile, but you have to keep the goal in mind and align to it. This is not about hard-setting a 10-year plan; you have to have flexibility built in to pivot when necessary.
Through a consultative approach, HPE helps you align people, processes, and technologies through our Data Discovery workshops, and quickly evaluate your data through on-demand offerings of our analytics platforms (Vertica + IDOL).
The outcome is agile execution of projects and maximum alignment to value.
Emmi
Business need: Emmi is a top Swiss dairy processor. Measuring the effectiveness and efficiency of marketing campaigns in the age of digital cross-media was a challenge, and Emmi wanted to increase market presence, identify customer needs to increase potential revenue, and build brand awareness. They knew that customer interaction data was available on the web, but didn't know how to leverage it.
HPE solution: HPE provided Big Data Discovery Experience services to collect data on Emmi's customers from a wide range of sources and analyze it via a secure HPE cloud analytics environment based on IDOL.
Business outcome: Emmi now has a 360-degree view of its customers, consumers, and influencers and can address them in a more targeted manner. Marketing activities are now targeted and used in real time, enabling Emmi to effectively reach customers and ultimately reduce costs. Emmi also realized that the right communication, marketing activity, and customer experience would help improve their business.
Global mining service provider
Business need: this company wanted to gain insight into sensor data to help improve equipment maintenance, operational losses, and safety practices. They also wanted to see how they could leverage these insights to innovate their business model.
HPE solution: HPE provided the analytics capability through its Big Data Discovery Experience services and environment.
Business outcome: the company can detect equipment failures early as part of predictive analytics and identify operational losses.
BlaBlaCar
Business need: this is an Idea Economy company that is revolutionizing transportation and car sharing. To support its growth, BlaBlaCar needed a way to measure the effectiveness of its website design and understand customer usage patterns in order to make it easier for customers to complete a transaction.
HPE solution: HPE Vertica. This is a "data discovery" customer because they first utilized the Community Edition, then, once they were ready to scale, switched to the Enterprise Edition. The Community Edition gave them a platform to discover the value of their data.
Business outcome: HPE helped to build BlaBlaCar's data-centric foundation, allowing them to improve their customers' online experience by analyzing massive amounts of both structured and unstructured data up to 1,000 times faster than traditional data warehouse solutions. By rapidly collecting and processing clickstream data, BlaBlaCar was able to measure and improve the effect of changes made to their websites and find new ways to engage their customers.
Core across these offerings is breaking down the existing silos between your various EDWs, ECMs, and BI tools, accumulated through game-of-thrones fiefdoms and acquisitions.
Many companies are trying to extract the data from these systems and load it into an open-standard repository that is capable of running analytics and can scale into petabyte ranges: Hadoop. But as good as Hadoop is, you really need a partner that can make it great, scale it, secure it, and enable it to hit top marks in every use case.
HP Haven can allow you to unleash the power of Hadoop and realize its full potential.
It starts with high-performance computing. We have helped customers see 2x improvements in Hadoop performance while using 50% less space, by using HP big data reference architectures running on the entire range of our state-of-the-art servers, from the DL380 all the way up to Apollo and Moonshot.
Analyze at speed and scale. It is not unusual to see performance improvements of a thousandfold by using HP Haven alongside Hadoop. With Vertica we can approach the SQL query speeds of SAP HANA, but over petabytes of data instead of sub-100 TB ranges. Don't get me wrong: there are clearly times when you need the extra speed of in-memory, and here we have both the services and the hardware to support it, likewise for Microsoft PDW.
Equally important, you must govern and protect the data. Here we have HP Control Point, Records Manager, and Data Integrator for governance, with products like Data Protector and HP Voltage for at-rest encryption of all the data you put into your Hadoop smart data lake.
In essence, we've innovated on Hadoop with Haven so that you can better innovate.
Claim: 2x Hadoop performance, or 50% less space
Source: http://www8.hp.com/de/de/hp-news/press-release.html?id=1964038#.VWT5lc9Viko
Claim: 100% of your data, 10x-1000x faster
Sources: 100% of your data: we can analyze machine data, human data, and business data; 3 out of 3 = 100%. Get answers up to thousands of times faster, e.g. the Game Show Network delivers "A/B test" results 2,700x faster than on MySQL (12 seconds vs. 9 hours); watch the video here: http://h30614.www3.hp.com/Discover/OnDemand/LasVegas2013/SessionDetail/55cdb60b-ef50-42eb-9008-80f358da2a11
Why HPE for empowering the data-driven organization?
HPE can help you become a data-driven organization and maximize your outcomes, no matter where you are in your transformation. HPE will help you discover the value of your data, build your data-centric foundation, achieve superior business outcomes, and establish a sustainable, integrated approach that empowers a data-driven organization.
Proven experience & expertise
• 10,000 customer engagements
• Hundreds of data scientists
• 3,000+ dedicated global analytics and data management professionals
Technology leadership
• Best-in-class analytics and compute platforms for all use cases
Market leadership
• HPE is recognized as a leader in Gartner's Magic Quadrant for Enterprise Data Warehouse and Data Management Solutions for Analytics (2015)
• HPE is recognized as a leader in information governance in Gartner's Magic Quadrant report for eDiscovery (2015)
• Forrester Wave report: forthcoming
Flexible & open
• HPE solutions built on open standards, offering choice and flexibility
• Integrated rich partner ecosystem