Delivering Apache Hadoop for the Modern Data Architecture
Page 1 © Hortonworks Inc. 2014 
HP & Hortonworks. We do Hadoop Together
Your speakers… 
Raghu Thiagarajan, Director, Partner Product Management, Hortonworks
Chris Daly, Chief Outbound Engineer, CSS and Big Data Systems, HP
Why Hadoop: Traditional Data Architecture Pressured 
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
Data source: IDC
SOURCES
OLTP, ERP, CRM
Documents, Emails
Web Logs, Click Streams
Social Networks
Machine Generated
Sensor Data
Geolocation Data
What: Business Applications of Hadoop 
Columns: Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured
Financial Services
New Account Risk Screens ✔ ✔
Trading Risk ✔
Insurance Underwriting ✔ ✔ ✔
Telecom
Call Detail Records (CDR) ✔ ✔
Infrastructure Investment ✔ ✔
Real-time Bandwidth Allocation ✔ ✔ ✔
Retail
360° View of the Customer ✔ ✔
Localized, Personalized Promotions ✔
Website Optimization ✔
What: Business Applications of Hadoop 
Columns: Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured
Manufacturing
Supply Chain and Logistics ✔
Preventive Maintenance ✔
Crowd-sourced Quality Assurance ✔
Healthcare
Use Genomic Data in Medical Trials ✔ ✔
Monitor Patient Vitals in Real-Time
Pharmaceuticals
Recruit & Retain Patients for Drug Trials ✔ ✔
Improve Prescription Adherence ✔ ✔ ✔
Oil & Gas
Unify Exploration & Production Data ✔ ✔ ✔
Monitor Rig Safety in Real-Time ✔ ✔
Government
ETL Offload in Response to Budgetary Pressures ✔
Sentiment Analysis for Gov’t Programs ✔
How: Modern Data Architecture with Hadoop 
APPLICATIONS: Statistical Analysis; BI / Reporting, Ad Hoc Analysis; Interactive Web & Mobile Apps; Enterprise Applications
DEV & DATA TOOLS: Build & Test
OPERATIONS TOOLS: Provision, Manage & Monitor
DATA SYSTEMS: Repositories (RDBMS, EDW, MPP)
ENTERPRISE HADOOP: Governance & Integration, Security, Operations, Data Access, Data Management
SOURCES: OLTP, ERP, CRM; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
YARN Transforms Hadoop’s Architecture 
Enables deep insight across a large, broad, diverse set of data at efficient scale
Multi-Use Data Platform
Store all data in one place, process in many ways
Batch, Interactive, Iterative, Streaming
Store any/all raw data sources and processed data over extended periods of time.
YARN: Data Operating System
Designing Hadoop Cluster 
§ Cluster Storage Capacity 
§ Server Specification 
§ Cluster Size 
§ Factoring Performance 
Key Considerations
§ Any piece of hardware can and will fail
§ More nodes means less impact on failure
§ Resiliency and fault tolerance improve with scale
§ Build resiliency through scale
§ Still use modern hardware
§ Software handles hardware failures
Storage Capacity 
§ Key Input 
§ Initial Data Size 
§ 3 year YOY growth 
§ Compression ratio 
§ Intermediate and materialized views 
§ Replication Factor 
§ Note
§ Hard to accurately predict the size of intermediate & materialized views at the start of a project
§ Be conservative with compression ratio. Mileage varies by data type
§ Hadoop needs temp space to store intermediate files
[Diagram: Hadoop Cluster holding Raw Data, Master Data, Work In Process Data, Materialized Views]
Storage Capacity 
Total Storage Required = ((Initial Size + YOY Growth + Intermediate Data Size) × Replication Count × 1.2) / Compression Ratio
Good Rule of Thumb
Replication Count = 3
Compression Ratio = 4-5
Intermediate Data Size = 50%-100% of Raw Data Size
Note
1.2 factor is included in the sizing estimator to account for the temp space requirement of Hadoop
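The sizing rule on this slide can be turned into a small calculator. The following is a minimal Python sketch and is not part of the original deck: the function name, the parameter names, and the sample inputs (100 TB initial data, 50 TB growth, 75 TB intermediate data) are illustrative assumptions. The defaults of 3 for replication, 1.2 for temp space, and a compression ratio of 4 follow the rule of thumb on the slide.

```python
def total_storage_required_tb(initial_tb, yoy_growth_tb, intermediate_tb,
                              replication_count=3, compression_ratio=4.0,
                              temp_space_factor=1.2):
    """Estimate cluster storage in TB from the slide's rule of thumb.

    All names here are hypothetical; only the formula comes from the slide.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    # Data to be held before replication and compression are applied.
    logical_tb = initial_tb + yoy_growth_tb + intermediate_tb
    # Replicate, add temp-space headroom, then divide by the compression ratio.
    return logical_tb * replication_count * temp_space_factor / compression_ratio


# Hypothetical inputs, chosen only for illustration.
estimate_tb = total_storage_required_tb(initial_tb=100, yoy_growth_tb=50,
                                        intermediate_tb=75)
print(f"Estimated storage required: {estimate_tb:.1f} TB")
```

Because compression varies by data type, it may be worth rerunning the estimate with a lower `compression_ratio` to see how sensitive the result is to that single input.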
Server Specification 
§ Master Nodes – NameNode, Resource Manager, HBase Master 
§ Dual Intel Xeon E5-26xx series processors 
§ 128GB or 256GB RAM per chassis 
§ 4+ – 1TB NL-SAS/SATA Drives RAID10+ Spares 
§ Worker Nodes – DataNode, Node Manager and Region Server 
§ Dual Intel Xeon E5-26xx series processors 
§ 128GB RAM or 256GB RAM 
§ 12 – 1-4 TB NLSAS/SATA Drives 
§ Gateway Nodes / Edge Nodes 
§ Mirror of Master Nodes configuration 
Cluster Size 
Number of Data Nodes = Total Storage Required / Storage Per Server
Number of Master Nodes
§ Name Node, Zookeeper
§ Resource Manager, Zookeeper
§ Failover Name Node, HBase Master, Hive Server, Zookeeper
§ In a half-rack cluster, this would be combined with Resource Manager
§ Management Node (Ambari, Ganglia, Nagios)
§ In a half-rack cluster, this would be combined with the Name Node
Note
§ Large clusters may need more than 4 master nodes
§ Start at 2/4 and grow based on usage
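The data node count on this slide is the total storage required divided by the storage per server. A minimal Python sketch of that division follows. It is an illustration rather than anything from the deck: the function and parameter names are hypothetical, the 202.5 TB requirement is a made-up input, and treating storage per server as drive count times drive size is an assumption that ignores filesystem and operating system overhead.

```python
import math


def data_nodes_required(total_storage_tb, drives_per_server, drive_size_tb):
    """Return the number of data nodes needed to hold total_storage_tb.

    Hypothetical helper: storage per server is approximated as
    drives_per_server * drive_size_tb.
    """
    storage_per_server_tb = drives_per_server * drive_size_tb
    if storage_per_server_tb <= 0:
        raise ValueError("storage per server must be positive")
    # Round up, since a fractional server cannot be deployed.
    return math.ceil(total_storage_tb / storage_per_server_tb)


# Illustrative inputs: 202.5 TB required, worker nodes with twelve 2 TB drives.
nodes = data_nodes_required(total_storage_tb=202.5, drives_per_server=12,
                            drive_size_tb=2)
print(f"Data nodes required: {nodes}")
```

Rounding up keeps the estimate on the safe side; master and management nodes are counted separately, as the slide describes.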
Factoring Performance 
§ Data Nodes 
§ 1 TB drives for performance clusters 
§ 4 TB drives for archive clusters 
§ Meeting SLA Requirements 
§ Hadoop workloads are varied 
§ Difficult to assess cluster size based on SLAs without actual testing 
§ Good News: Hadoop performs linearly with scale 
§ Enables one to design experiments using a fraction of data 
§ Best Practice Guidance 
§ Create a test configuration with a rack of servers 
§ Load a slice of data 
§ Run tests with real-life queries to measure performance & fine tune the system 
§ Scale cluster size based on observed performance 
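The best-practice steps above (test on a slice of data, then scale from observed performance) can be sketched as a simple extrapolation. The Python below is a rough illustration, not a method given in the deck: it assumes perfectly linear scaling, which real workloads only approximate, and every name and number in it is hypothetical.

```python
import math


def nodes_for_sla(test_nodes, data_fraction, measured_seconds, sla_seconds):
    """Estimate the nodes needed to process the full data set within an SLA.

    Assumes runtime is proportional to data volume and inversely
    proportional to node count (idealized linear scaling).
    """
    if test_nodes <= 0 or measured_seconds <= 0 or sla_seconds <= 0:
        raise ValueError("node count and durations must be positive")
    if not 0 < data_fraction <= 1:
        raise ValueError("data_fraction must be in (0, 1]")
    # Node-seconds needed for the full data set under the linear assumption.
    full_node_seconds = test_nodes * measured_seconds / data_fraction
    return math.ceil(full_node_seconds / sla_seconds)


# Hypothetical test: 10 nodes process 25% of the data in 900 s; target is 1800 s.
estimate = nodes_for_sla(test_nodes=10, data_fraction=0.25,
                         measured_seconds=900, sla_seconds=1800)
print(f"Estimated nodes to meet the SLA: {estimate}")
```

An estimate like this is only a starting point for the next round of testing, since skewed data, shuffle-heavy jobs, or network limits can break the linear assumption.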
HDP and HP are deeply integrated in the data center 
DEV & DATA TOOLS
OPERATIONAL TOOLS
INFRASTRUCTURE
SOURCES: EXISTING Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
DATA SYSTEM: RDBMS, EDW, MPP, HANA
APPLICATIONS: BusinessObjects BI
HDP 2.1: Governance & Integration, Security, Operations, Data Access, Data Management
YARN
Deep Partnerships
Hortonworks and HP are engaged in deep engineered relationships with the leaders in the data center, such as Microsoft, Teradata, Red Hat, & SAP
Broad Partnerships
Over 600 partners work with Hortonworks to certify their applications to work with Hadoop so they can extend big data to their users
Delivering Apache Hadoop for the Modern Data Architecture
HP + Hortonworks Validated Design 
Christopher Daly 
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
The HP Approach to Apache Hadoop 
Why a Reference Architecture? 
• Provides a starting point or baseline
• Maximum flexibility
• Customizable to fit YOUR needs
• Adopt the parts you want
• Replace the parts you don’t
Solution components 
Pre-deployment considerations / system selection
• Operating system
• Computation
• Memory
• Storage
• Network
High-availability considerations 
• Hadoop NameNode HA 
• ResourceManager HA 
• OS availability and reliability
• Network reliability
• Power supply
Server selection 
Management nodes – The HP ProLiant DL360p Gen8
The Management node and head nodes, as tested in the Reference Architecture, contain the following base configuration:
2 x Eight-Core Intel E5-2650 v2 Processors
Smart Array P420i Controller with 512MB FBWC
3.6 TB – 4 x 900GB SFF SAS 10K RPM disks
128 GB DDR3 Memory – 8 x 16GB 2Rx4 PC3-14900R-13
10GbE 2P NIC 561FLR-T card
Server selection 
Worker nodes – ProLiant DL380p Gen8
The ProLiant DL380p Gen8 (2U) as configured for the Reference Architecture as a worker node has the following configuration:
Dual 10-Core Intel Xeon E5-2670 v2 Processors with Hyper-Threading
Twelve 2TB 3.5” 7.2K LFF SATA MDL (22 TB for Data)
128 GB DDR3 Memory (8 x HP 16GB), 4 channels per socket
1 x 10GbE 2 Port NIC FlexibleLOM (Bonded)
1 x Smart Array P420i Controller with 512MB FBWC
Switch selection 
Top of Rack (ToR) switches
The 5900AF-48XGT-4QSFP+ 10GbE is an ideal ToR switch with forty-eight 10GbE ports and four 40GbE uplinks providing resiliency, high availability and scalability support. In addition, this model comes with support for CAT6 cables (copper wires) and software-defined networking (SDN).
Aggregation switches
The FlexFabric 5930-32QSFP+ 40GbE switch is an ideal aggregation switch as it is well suited to handle very large volumes of inter-rack traffic such as can occur during shuffle and sort operations, or large-scale block replication to recreate a failed node.
HP Insight CMU – pushbutton scale-out management
Provision, monitor, and control
Thousands of nodes instantly
Push-button roll out
Provisioning via cloning for seamless scaling
Rest easy
Battle-tested at top 500 sites for over a decade
HP Insight CMU – GUI Monitoring at a Cluster level 
Historical analysis and job recording 
• Designed for Big Data customers
• Multi-petal aggregated, 3D RT, and time series views of cluster metrics
• “Click & zoom” analysis at both solution and component levels
• Proactively identify and isolate performance issues
Single Rack Reference Architecture 
Multi-Rack Reference Architecture 
Capacity and sizing 
Here is a general guideline on data inventory:
• Sources of data
• Frequency of data
• Raw storage
• Processed HDFS storage
• Replication factor
• Default compression turned on
• Space for intermediate files
System configuration guidance 
Machine Type | Workload Pattern/Cluster Type | Storage | Processor (# of Cores) | Memory (GB)
Slaves | Balanced workload | Four to six 1-2 TB disks | Dual 6/8/10 cores | 48-96
Slaves | Compute intensive workload | Four to six 1-2 TB disks | Dual 8/10/12 cores | 48-128
Slaves | IO intensive workload | Twelve 1-2 TB disks | Dual 8/10/12 cores | 48-96
Slaves | HBase clusters | Twelve 1-2 TB disks | Dual 8/10/12 cores | 48-128
Masters | All workload patterns/HBase clusters | Four to six 1-2 TB disks | Dual 6/8/10 cores | Depends on number of file system objects to be created by NameNode.
Network: Dual 10 GbE links for all nodes in a 20 node rack and min 2 x 10 / 2 x 40 GbE interconnect links per rack going to a pair of central switches
For More Information 
Get the Reference Architecture at 
http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA5-4975ENW 
Hortonworks www.hortonworks.com 
HP Solutions for Apache Hadoop hp.com/go/Hadoop 
HP ProLiant servers hp.com/go/proliant 
HP Insight Cluster Management Utility (CMU) hp.com/go/cmu 
HP Networking hp.com/go/networking 
Or Contact Me: Christopher.Daly@hp.com 
Next Steps... 
More about HP & Hortonworks 
http://hortonworks.com/partner/HP 
Download the Hortonworks Sandbox 
Learn Hadoop 
Build Your Analytic App 
Try Hadoop 2 
Contact us: events@hortonworks.com 
THANK YOU 

Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Kürzlich hochgeladen

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
anilsa9823
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Kürzlich hochgeladen (20)

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 

HP Converged Systems and Hortonworks - Webinar Slides

  • 1. Delivering Apache Hadoop for the Modern Data Architecture Page 1 © Hortonworks Inc. 2014 HP & Hortonworks. We do Hadoop Together
  • 2. Your speakers: Raghu Thiagarajan, Director, Partner Product Management, Hortonworks; Chris Daly, Chief Outbound Engineer, CSS and Big Data Systems, HP
  • 3. Why Hadoop: Traditional Data Architecture Pressured. 2.8 ZB in 2012; 85% from New Data Types; 15x Machine Data by 2020; 40 ZB by 2020 (data source: IDC). Sources: OLTP, ERP, CRM; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
  • 4. What: Business Applications of Hadoop. Data types (columns): Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured. Financial Services: New Account Risk Screens ✔ ✔; Trading Risk ✔; Insurance Underwriting ✔ ✔ ✔. Telecom: Call Detail Records (CDR) ✔ ✔; Infrastructure Investment ✔ ✔; Real-time Bandwidth Allocation ✔ ✔ ✔. Retail: 360° View of the Customer ✔ ✔; Localized, Personalized Promotions ✔; Website Optimization ✔
  • 5. What: Business Applications of Hadoop. Data types (columns): Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured. Manufacturing: Supply Chain and Logistics ✔; Preventive Maintenance ✔; Crowd-sourced Quality Assurance ✔. Healthcare: Use Genomic Data in Medical Trials ✔ ✔; Monitor Patient Vitals in Real-Time. Pharmaceuticals: Recruit & Retain Patients for Drug Trials ✔ ✔; Improve Prescription Adherence ✔ ✔ ✔. Oil & Gas: Unify Exploration & Production Data ✔ ✔ ✔; Monitor Rig Safety in Real-Time ✔ ✔. Government: ETL Offload in Response to Budgetary Pressures ✔; Sentiment Analysis for Gov’t Programs ✔
  • 6. How: Modern Data Architecture with Hadoop. Diagram labels: DEV & DATA TOOLS (Build & Test); OPERATIONS TOOLS (Provision, Manage & Monitor); APPLICATIONS (Statistical Analysis; BI / Reporting, Ad Hoc Analysis; Interactive Web & Mobile Apps; Enterprise Applications); DATA SYSTEMS (Repositories, ROOMS, RDBMS, EDW, MPP); ENTERPRISE HADOOP (Governance & Integration, Security, Operations, Data Access, Data Management); SOURCES (OLTP, ERP, CRM; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data)
  • 7. YARN Transforms Hadoop’s Architecture. Enables deep insight across a large, broad, diverse set of data at efficient scale. Multi-Use Data Platform: store all data in one place, process in many ways (Batch, Interactive, Iterative, Streaming). Store any/all raw data sources and processed data over extended periods of time. YARN: Data Operating System
  • 8. Designing Hadoop Cluster: Cluster Storage Capacity; Server Specification; Cluster Size; Factoring Performance. Key Considerations: Any piece of hardware can and will fail; More nodes means less impact on failure; Resiliency and fault tolerance improve with scale; Build resiliency through scale; Still use modern hardware; Software handles hardware failures
  • 9. Storage Capacity. Key Input: Initial Data Size; 3 year YOY growth; Compression ratio; Intermediate and materialized views; Replication Factor. Note: Hard to accurately predict the size of intermediate & materialized views at the start of a project; Be conservative with compression ratio, since mileage varies by data type; Hadoop needs temp space to store intermediate files. Diagram labels: Raw Data, Hadoop Cluster, Master Data, Work In Process Data, Materialized Views
  • 10. Storage Capacity. Total Storage Required = ((Initial Size + YOY Growth + Intermediate Data Size) x Replication Count x 1.2) / Compression Ratio. Good Rule of Thumb: Replication Count = 3; Compression Ratio = 4-5; Intermediate Data Size = 50%-100% of Raw Data Size. Note: the 1.2 factor is included in the sizing estimator to account for the temp space requirement of Hadoop
  • 11. Server Specification. Master Nodes (NameNode, Resource Manager, HBase Master): Dual Intel Xeon E5-26xx series processors; 128GB or 256GB RAM per chassis; 4+ x 1TB NL-SAS/SATA drives, RAID10 + spares. Worker Nodes (DataNode, Node Manager and Region Server): Dual Intel Xeon E5-26xx series processors; 128GB or 256GB RAM; 12 x 1-4 TB NL-SAS/SATA drives. Gateway Nodes / Edge Nodes: mirror of Master Nodes configuration
  • 12. Cluster Size. Number of Data Nodes = Total Storage Required / Storage Per Server. Number of Master Nodes: Name Node, Zookeeper; Resource Manager, Zookeeper; Failover Name Node, HBase Master, Hive Server, Zookeeper (in a half-rack cluster, this would be combined with Resource Manager); Management Node (Ambari, Ganglia, Nagios) (in a half-rack cluster, this would be combined with the Name Node). Note: Large clusters may need more than 4 master nodes; Start at 2/4 and grow based on usage
  • 13. Factoring Performance. Data Nodes: 1 TB drives for performance clusters; 4 TB drives for archive clusters. Meeting SLA Requirements: Hadoop workloads are varied; Difficult to assess cluster size based on SLAs without actual testing; Good News: Hadoop performs linearly with scale, which enables one to design experiments using a fraction of data. Best Practice Guidance: Create a test configuration with a rack of servers; Load a slice of data; Run tests with real-life queries to measure performance & fine tune the system; Scale cluster size based on observed performance
  • 14. HDP and HP are deeply integrated in the data center. Diagram labels: DEV & DATA TOOLS; OPERATIONAL TOOLS; INFRASTRUCTURE; SOURCES (EXISTING Systems; Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured); DATA SYSTEM (RDBMS, EDW, MPP, HANA); APPLICATIONS (BusinessObjects BI); HDP 2.1 (YARN; Governance & Integration; Security; Operations; Data Access; Data Management). Deep Partnerships: Hortonworks and HP engaged in deep engineered relationships with the leaders in the data center, such as Microsoft, Teradata, Red Hat, & SAP. Broad Partnerships: Over 600 partners work with Hortonworks to certify their applications to work with Hadoop so they can extend big data to their users
  • 15. Delivering Apache Hadoop for the Modern Data Architecture HP + Hortonworks Validated Design Christopher Daly © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
  • 16. The HP Approach to Apache Hadoop. Why a Reference Architecture? Provides a starting point or baseline; Maximum flexibility; Customizable to fit YOUR needs; Adopt the parts you want; Replace the parts you don’t
  • 17. Solution components
  • 18. Pre-deployment considerations / system selection: Operating system; Computation; Memory; Storage; Network
  • 19. High-availability considerations: Hadoop NameNode HA; ResourceManager HA; OS availability and reliability; Network reliability; Power supply
  • 20. Server selection. Management nodes: the HP ProLiant DL360p Gen8. The Management node and head nodes, as tested in the Reference Architecture, contain the following base configuration: 2 x Eight-Core Intel E5-2650 v2 Processors; Smart Array P420i Controller with 512MB FBWC; 3.6 TB (4 x 900GB SFF SAS 10K RPM disks); 128 GB DDR3 Memory (8 x 16GB 2Rx4 PC3-14900R-13); 10GbE 2P NIC 561FLR-T card
  • 21. Server selection. Worker nodes: ProLiant DL380p Gen8. The ProLiant DL380p Gen8 (2U), as configured for the Reference Architecture as a worker node, has the following configuration: Dual 10-Core Intel Xeon E5-2670 v2 Processors with Hyper-Threading; Twelve 2TB 3.5” 7.2K LFF SATA MDL (22 TB for Data); 128 GB DDR3 Memory (8 x HP 16GB), 4 channels per socket; 1 x 10GbE 2 Port NIC FlexibleLOM (Bonded); 1 x Smart Array P420i Controller with 512MB FBWC
  • 22. Switch selection. Top of Rack (ToR) switches: The 5900AF-48XGT-4QSFP+10GbE is an ideal ToR switch, with forty-eight 10GbE ports and four 40GbE uplinks providing resiliency, high availability and scalability support. In addition, this model comes with support for CAT6 cables (copper wires) and software-defined networking (SDN). Aggregation switches: The FlexFabric 5930-32QSFP+40GbE switch is an ideal aggregation switch, as it is well suited to handle very large volumes of inter-rack traffic such as can occur during shuffle and sort operations, or large-scale block replication to recreate a failed node.
  • 23. HP Insight CMU: pushbutton scale-out management. Provision, monitor, and control: thousands of nodes instantly. Push-button roll out: provisioning via cloning for seamless scaling. Rest easy: battle-tested at top 500 sites for over a decade
  • 24. HP Insight CMU: GUI. Monitoring at a cluster level; historical analysis and job recording. Designed for Big Data customers; Multi-petal aggregated, 3D RT, and time series views of cluster metrics; “Click & zoom” analysis at both solution and component levels; Proactively identify and isolate performance issues
  • 25. Single Rack Reference Architecture
  • 26. Multi-Rack Reference Architecture
  • 27. Capacity and sizing. Here is a general guideline on data inventory: Sources of data; Frequency of data; Raw storage; Processed HDFS storage; Replication factor; Default compression turned on; Space for intermediate files
  • 28. System configuration guidance
      Machine Type | Workload Pattern/Cluster Type | Storage | Processor (# of Cores) | Memory (GB)
      Slaves | Balanced workload | Four to six 1-2 TB disks | Dual 6/8/10 cores | 48-96
      Slaves | Compute intensive workload | Four to six 1-2 TB disks | Dual 8/10/12 cores | 48-128
      Slaves | IO intensive workload | Twelve 1-2 TB disks | Dual 8/10/12 cores | 48-96
      Slaves | HBase clusters | Twelve 1-2 TB disks | Dual 8/10/12 cores | 48-128
      Masters | All workload patterns/HBase clusters | Four to six 1-2 TB disks | Dual 6/8/10 cores | Depends on number of file system objects to be created by NameNode
      Network: Dual 10 GB links for all nodes in a 20 node rack and min 2 x 10 / 2 x 40 GB interconnect links per rack going to a pair of central switches
  • 29. For More Information. Get the Reference Architecture at http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA5-4975ENW. Hortonworks: www.hortonworks.com. HP Solutions for Apache Hadoop: hp.com/go/Hadoop. HP ProLiant servers: hp.com/go/proliant. HP Insight Cluster Management Utility (CMU): hp.com/go/cmu. HP Networking: hp.com/go/networking. Or Contact Me: Christopher.Daly@hp.com
  • 30. Next Steps... More about HP & Hortonworks: http://hortonworks.com/partner/HP. Download the Hortonworks Sandbox: Learn Hadoop; Build Your Analytic App; Try Hadoop 2. Contact us: events@hortonworks.com
  • 31. THANK YOU
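
The sizing rules on slides 10 and 12 can be sketched in a few lines of Python. This is a minimal illustration rather than the Hortonworks sizing estimator itself: the function names `total_storage_tb` and `data_nodes_needed` are hypothetical, the defaults follow the rule-of-thumb values on slide 10 (replication count 3, compression ratio 4, temp-space factor 1.2), and the sample inputs (100 TB initial, 50 TB growth, 100 TB intermediate) are invented for the example. The 22 TB per worker comes from the worker node configuration on slide 21.

```python
import math


def total_storage_tb(initial_tb, yoy_growth_tb, intermediate_tb,
                     replication=3, compression_ratio=4.0, temp_factor=1.2):
    """Estimate total cluster storage in TB, following slide 10.

    (initial + YOY growth + intermediate) x replication x 1.2 / compression.
    Defaults are the rule-of-thumb values from the slide, not measurements.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    logical_tb = initial_tb + yoy_growth_tb + intermediate_tb
    return logical_tb * replication * temp_factor / compression_ratio


def data_nodes_needed(total_tb, storage_per_server_tb):
    """Number of data nodes from slide 12, rounded up to whole servers."""
    if storage_per_server_tb <= 0:
        raise ValueError("storage_per_server_tb must be positive")
    return math.ceil(total_tb / storage_per_server_tb)


if __name__ == "__main__":
    # Illustrative inputs only: 100 TB initial data, 50 TB growth over the
    # planning period, and intermediate data at 100% of the initial size.
    required = total_storage_tb(100, 50, 100)
    # 22 TB of data capacity per worker, as on slide 21.
    nodes = data_nodes_needed(required, 22)
    print(f"storage required: {required:.1f} TB, data nodes: {nodes}")
```

With these sample inputs the estimate comes to about 225 TB, which rounds up to 11 workers at 22 TB each. As slide 13 notes, a figure like this is only a starting point for capacity; performance sizing still calls for testing on a slice of real data.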