How to Set Up a Hadoop Cluster with Oracle Solaris
[HOL10182]
Orgad Kimchi
Principal Software Engineer
Disclaimer

The following is intended to outline our general product
direction. It is intended for information purposes only, and
may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing
decisions. The development, release, and timing of any
features or functionality described for Oracle’s products
remains at the sole discretion of
Oracle Corporation.

Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
Agenda
• Lab Overview
• Hadoop Overview
• The Benefits of Using Oracle Solaris Technologies for a Hadoop Cluster

Lab Overview
• In this hands-on lab, we will present and demonstrate, through exercises, how to set up a Hadoop cluster using Oracle Solaris 11 technologies such as Zones, ZFS, DTrace, and network virtualization.
• Key topics include the Hadoop Distributed File System (HDFS) and MapReduce.
• We will also cover the Hadoop installation process and the cluster building blocks: the NameNode, a secondary NameNode, and DataNodes.

Lab Overview – Cont’d

• During the lab, users will learn how to load data into the Hadoop cluster and run a MapReduce job.
• This hands-on training lab is for system administrators and others responsible for managing Apache Hadoop clusters in production or development environments.

Lab Main Topics
This hands-on lab consists of 13 exercises covering various Oracle Solaris and Apache Hadoop technologies:
1. Install Hadoop.
2. Edit the Hadoop configuration files.
3. Configure the Network Time Protocol.
4. Create the virtual network interfaces (VNICs).
5. Create the NameNode and the secondary NameNode zones.
6. Set up the DataNode zones.
7. Configure the NameNode.
8. Set up SSH.
9. Format HDFS from the NameNode.
10. Start the Hadoop cluster.
11. Run a MapReduce job.
12. Secure data at rest using ZFS encryption.
13. Use Oracle Solaris DTrace for performance monitoring.

What Is Big Data?
• Big Data is both large, variable datasets and a new set of technologies.
• The datasets:
  – Extremely large files of unstructured or semi-structured data
  – Large and highly distributed datasets that are otherwise difficult to manage as a single unit of information
• The technologies:
  – Tools that can economically acquire, organize, store, analyze, and extract value from Big Data datasets, thus facilitating better, more informed business decisions

Data is Everywhere!
Facts & Figures


• 234M Web sites
• 7M new sites in 2010
• Facebook
  – 500M users
  – 40M photos per day
  – 30 billion new pieces of content per month
• New York Stock Exchange
  – 1 TB of data per day
• Web 2.0
  – 147M blogs and growing
  – Twitter: 12 TB of data per day

Introduction To Hadoop

What Is Hadoop?
• Based on ideas that originated at Google (2003)
  – Generation of search indexes and web scores
• Top-level Apache project consisting of two key services:
  1. Hadoop Distributed File System (HDFS): highly scalable, fault-tolerant, and distributed
  2. MapReduce API (Java), which can also be scripted in other languages
• Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure.
Components of Hadoop

HDFS
• HDFS is the file system responsible for storing data on the cluster
• Written in Java (based on Google’s GFS)
• Sits on top of a native file system (ext3, ext4, xfs, etc.)
• POSIX-like file permissions model
• Provides redundant storage for massive amounts of data
• Optimized for large, streaming reads of files

The Five Hadoop Daemons
Hadoop comprises five separate daemons:
• NameNode: Holds the metadata for HDFS.
• Secondary NameNode: Performs housekeeping functions for the NameNode.
• DataNode: Stores the actual HDFS data blocks.
• JobTracker: Manages MapReduce jobs, distributes individual tasks to machines running the TaskTracker, and coordinates the MapReduce stages.
• TaskTracker: Responsible for instantiating and monitoring individual Map and Reduce tasks.

Hadoop Architecture

MapReduce
Very big data → MAP → Partitioning Function → REDUCE → Result
• Map:
  – Accepts an input key/value pair
  – Emits an intermediate key/value pair
• Reduce:
  – Accepts an intermediate key/value* pair
  – Emits an output key/value pair
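The partitioning function in the diagram decides which reducer receives each intermediate key. A minimal Python sketch of the idea follows; it is an illustration only, not Hadoop's code. The function name `partition` and the use of CRC32 are assumptions made for this example (Hadoop's default partitioner hashes the key and takes it modulo the number of reduce tasks).

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Return the index of the reducer that should receive `key`.

    Hypothetical helper for illustration. CRC32 is used only because it is
    stable across runs; Python's built-in hash() is randomized for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


if __name__ == "__main__":
    # The same key always lands on the same reducer, so all of its
    # intermediate values end up in one place.
    for word in ["chuck", "wood", "chuck"]:
        print(word, "->", partition(word, 2))
```

Because the mapping depends only on the key, every map node sends a given key to the same reducer without any coordination.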

MapReduce Example
Counting word occurrences in a document:
how many chucks could a woodchuck chuck if a woodchuck could chuck wood

4-Node Map
  Node 1: how,1 many,1 chucks,1 could,1
  Node 2: a,1 woodchuck,1 chuck,1
  Node 3: if,1 a,1 woodchuck,1
  Node 4: could,1 chuck,1 wood,1

Group by Key, 2-Node Reduce
  a,1:1  chuck,1:1  chucks,1  could,1:1  how,1  if,1  many,1  wood,1  woodchuck,1:1

Output
a,2 chuck,2 chucks,1 could,2 how,1 if,1 many,1 wood,1 woodchuck,2
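The word count above can be reproduced in a few lines of plain Python. This is a single-process sketch of the map, group-by-key, and reduce steps, not Hadoop code; the names `CHUNKS`, `map_phase`, `group_by_key`, `reduce_phase`, and `word_count` are made up for the illustration, and the four chunks mirror the four map nodes on the slide.

```python
from collections import defaultdict

# One input chunk per map node, as on the slide (assumed split points).
CHUNKS = [
    "how many chucks could",
    "a woodchuck chuck",
    "if a woodchuck",
    "could chuck wood",
]


def map_phase(chunk):
    """Map: emit an intermediate (word, 1) pair for every word in the chunk."""
    return [(word, 1) for word in chunk.split()]


def group_by_key(pairs):
    """Shuffle: collect all intermediate values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the list of counts for one key."""
    return key, sum(values)


def word_count(chunks):
    """Run map, group-by-key, and reduce in sequence and return the counts."""
    intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = group_by_key(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    for word, count in word_count(CHUNKS).items():
        print(f"{word},{count}")
```

In a real cluster each `map_phase` call would run on a different node and the grouped keys would be spread over the reducers; the logic per step is the same.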

MapReduce Functions
• MapReduce partitions data into 64 MB chunks (default)
• Distributes data and jobs across thousands of nodes
• Tasks are scheduled based on the location of the data
• The master writes periodic checkpoints
• If a map worker fails, the master restarts its tasks on a new node
• Barrier: no reduce can begin until all maps are complete
• HDFS manages data replication for redundancy
• The MapReduce library does the hard work for us!
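To get a feel for the 64 MB default, the following Python sketch estimates how many chunks a file is divided into. It assumes simple ceiling division and one chunk per 64 MB; Hadoop's actual split logic has more rules, and the names `BLOCK_SIZE_MB` and `num_chunks` are hypothetical.

```python
BLOCK_SIZE_MB = 64  # default chunk size quoted on the slide


def num_chunks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Estimate the number of chunks for a file of the given size.

    Simplifying assumption: every started chunk counts as one, so the
    result is the ceiling of file_size_mb / block_size_mb.
    """
    if file_size_mb <= 0:
        return 0
    return -(-file_size_mb // block_size_mb)  # ceiling division on integers


if __name__ == "__main__":
    for size in (64, 200, 1024):
        print(f"{size} MB -> {num_chunks(size)} chunk(s)")
```

Under this assumption a 1 GB file yields 16 chunks, which is also a rough upper bound on how many map tasks can work on it in parallel.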

RDBMS Compared to MapReduce

             Traditional RDBMS            MapReduce
Data size    Gigabytes                    Petabytes
Access       Interactive and batch        Batch
Updates      Read and write many times    Write once, read many times
Structure    Static schema                Dynamic schema
Integrity    High                         Low
Scaling      Nonlinear                    Linear

The Benefits of Using Oracle Solaris Technologies for a Hadoop Cluster

Architecture Layout

The Benefits of Using Oracle Solaris Zones for a Hadoop Cluster
Oracle Solaris Zones Benefits
• Fast provisioning of new cluster members using the Solaris Zones cloning feature
• Very high network throughput between the zones for DataNode replication

The Benefits of Using Oracle Solaris ZFS for a Hadoop Cluster
Oracle Solaris ZFS Benefits
• Immense data capacity: a 128-bit file system, well suited to big datasets
• Optimized disk I/O utilization for better I/O performance with ZFS built-in compression
• Secure data at rest using ZFS encryption

The Benefits of Using Oracle Solaris Technologies for a Hadoop Cluster
• Multithread awareness: Oracle Solaris understands the correlation between cores and threads, and it provides a fast and efficient thread implementation.
• DTrace: a comprehensive, advanced tracing tool for troubleshooting systemic problems in real time.
• SMF (Service Management Facility): allows you to build dependencies between Hadoop services (e.g., starting the MapReduce daemons after the HDFS daemons).
For more information

• How to Set Up a Hadoop Cluster Using Oracle Solaris Zones
• How to Build Native Hadoop Libraries for Oracle Solaris 11
• How to Set Up a Hadoop Cluster Using Oracle Solaris (Hands-on Lab)
• My Blog


Weitere ähnliche Inhalte

Andere mochten auch

Adopt-a-JSR for JSON Processing 1.1, JSR 374
Adopt-a-JSR for JSON Processing 1.1, JSR 374Adopt-a-JSR for JSON Processing 1.1, JSR 374
Adopt-a-JSR for JSON Processing 1.1, JSR 374Heather VanCura
 
Guia de Semana at GlassFish Community Event, JavaOne 2011
Guia de Semana at GlassFish Community Event, JavaOne 2011Guia de Semana at GlassFish Community Event, JavaOne 2011
Guia de Semana at GlassFish Community Event, JavaOne 2011Arun Gupta
 
GlassFish Story by David Heffelfinger/Ensode Technology
GlassFish Story by David Heffelfinger/Ensode TechnologyGlassFish Story by David Heffelfinger/Ensode Technology
GlassFish Story by David Heffelfinger/Ensode Technologyglassfish
 
Parleys.com at GlassFish Community Event, JavaOne 2011
Parleys.com at GlassFish Community Event, JavaOne 2011Parleys.com at GlassFish Community Event, JavaOne 2011
Parleys.com at GlassFish Community Event, JavaOne 2011Arun Gupta
 
LodgON at GlassFish Community Event, JavaOne 2011
LodgON at GlassFish Community Event, JavaOne 2011LodgON at GlassFish Community Event, JavaOne 2011
LodgON at GlassFish Community Event, JavaOne 2011Arun Gupta
 
Adam Bien at GlassFish Community Event, JavaOne 2011
Adam Bien at GlassFish Community Event, JavaOne 2011Adam Bien at GlassFish Community Event, JavaOne 2011
Adam Bien at GlassFish Community Event, JavaOne 2011Arun Gupta
 
Java EE 6 Adoption in One of the World’s Largest Online Financial Systems
Java EE 6 Adoption in One of the World’s Largest Online Financial SystemsJava EE 6 Adoption in One of the World’s Largest Online Financial Systems
Java EE 6 Adoption in One of the World’s Largest Online Financial SystemsArshal Ameen
 
GlassFish Story by Kerry Wilson/Vanderbilt University Medical Center
GlassFish Story by Kerry Wilson/Vanderbilt University Medical CenterGlassFish Story by Kerry Wilson/Vanderbilt University Medical Center
GlassFish Story by Kerry Wilson/Vanderbilt University Medical Centerglassfish
 
Getting Hired: How to Get a Job as a Product Manager
Getting Hired: How to Get a Job as a Product ManagerGetting Hired: How to Get a Job as a Product Manager
Getting Hired: How to Get a Job as a Product ManagerJason Shah
 
Jenzabar at GlassFish Community Event, JavaOne 2011
Jenzabar at GlassFish Community Event, JavaOne 2011Jenzabar at GlassFish Community Event, JavaOne 2011
Jenzabar at GlassFish Community Event, JavaOne 2011Arun Gupta
 
Data Mining Scoring Engine development process
Data Mining Scoring Engine development processData Mining Scoring Engine development process
Data Mining Scoring Engine development processDylan Wan
 
Building WebLogic Domains With WLST
Building WebLogic Domains With WLSTBuilding WebLogic Domains With WLST
Building WebLogic Domains With WLSTC2B2 Consulting
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANDataWorks Summit/Hadoop Summit
 
JavaOne 2011: Migrating Spring Applications to Java EE 6
JavaOne 2011: Migrating Spring Applications to Java EE 6JavaOne 2011: Migrating Spring Applications to Java EE 6
JavaOne 2011: Migrating Spring Applications to Java EE 6Bert Ertman
 

Andere mochten auch (14)

Adopt-a-JSR for JSON Processing 1.1, JSR 374
Adopt-a-JSR for JSON Processing 1.1, JSR 374Adopt-a-JSR for JSON Processing 1.1, JSR 374
Adopt-a-JSR for JSON Processing 1.1, JSR 374
 
Guia de Semana at GlassFish Community Event, JavaOne 2011
Guia de Semana at GlassFish Community Event, JavaOne 2011Guia de Semana at GlassFish Community Event, JavaOne 2011
Guia de Semana at GlassFish Community Event, JavaOne 2011
 
GlassFish Story by David Heffelfinger/Ensode Technology
GlassFish Story by David Heffelfinger/Ensode TechnologyGlassFish Story by David Heffelfinger/Ensode Technology
GlassFish Story by David Heffelfinger/Ensode Technology
 
Parleys.com at GlassFish Community Event, JavaOne 2011
Parleys.com at GlassFish Community Event, JavaOne 2011Parleys.com at GlassFish Community Event, JavaOne 2011
Parleys.com at GlassFish Community Event, JavaOne 2011
 
LodgON at GlassFish Community Event, JavaOne 2011
LodgON at GlassFish Community Event, JavaOne 2011LodgON at GlassFish Community Event, JavaOne 2011
LodgON at GlassFish Community Event, JavaOne 2011
 
Adam Bien at GlassFish Community Event, JavaOne 2011
Adam Bien at GlassFish Community Event, JavaOne 2011Adam Bien at GlassFish Community Event, JavaOne 2011
Adam Bien at GlassFish Community Event, JavaOne 2011
 
Java EE 6 Adoption in One of the World’s Largest Online Financial Systems
Java EE 6 Adoption in One of the World’s Largest Online Financial SystemsJava EE 6 Adoption in One of the World’s Largest Online Financial Systems
Java EE 6 Adoption in One of the World’s Largest Online Financial Systems
 
GlassFish Story by Kerry Wilson/Vanderbilt University Medical Center
GlassFish Story by Kerry Wilson/Vanderbilt University Medical CenterGlassFish Story by Kerry Wilson/Vanderbilt University Medical Center
GlassFish Story by Kerry Wilson/Vanderbilt University Medical Center
 
Getting Hired: How to Get a Job as a Product Manager
Getting Hired: How to Get a Job as a Product ManagerGetting Hired: How to Get a Job as a Product Manager
Getting Hired: How to Get a Job as a Product Manager
 
Jenzabar at GlassFish Community Event, JavaOne 2011
Jenzabar at GlassFish Community Event, JavaOne 2011Jenzabar at GlassFish Community Event, JavaOne 2011
Jenzabar at GlassFish Community Event, JavaOne 2011
 
Data Mining Scoring Engine development process
Data Mining Scoring Engine development processData Mining Scoring Engine development process
Data Mining Scoring Engine development process
 
Building WebLogic Domains With WLST
Building WebLogic Domains With WLSTBuilding WebLogic Domains With WLST
Building WebLogic Domains With WLST
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
 
JavaOne 2011: Migrating Spring Applications to Java EE 6
JavaOne 2011: Migrating Spring Applications to Java EE 6JavaOne 2011: Migrating Spring Applications to Java EE 6
JavaOne 2011: Migrating Spring Applications to Java EE 6
 

Mehr von Orgad Kimchi

Deploying and Managing Artificial Intelligence Services using the Open Data H...
Deploying and Managing Artificial Intelligence Services using the Open Data H...Deploying and Managing Artificial Intelligence Services using the Open Data H...
Deploying and Managing Artificial Intelligence Services using the Open Data H...Orgad Kimchi
 
Red hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategyRed hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategyOrgad Kimchi
 
Red Hat Enteprise Linux Open Stack Platfrom Director
Red Hat Enteprise Linux Open Stack Platfrom DirectorRed Hat Enteprise Linux Open Stack Platfrom Director
Red Hat Enteprise Linux Open Stack Platfrom DirectorOrgad Kimchi
 
OpenStack for devops environment
OpenStack for devops environment OpenStack for devops environment
OpenStack for devops environment Orgad Kimchi
 
Solaris 11.2 What's New
Solaris 11.2 What's NewSolaris 11.2 What's New
Solaris 11.2 What's NewOrgad Kimchi
 
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...Orgad Kimchi
 
Oracle Solaris 11.1 New Features
Oracle Solaris 11.1 New FeaturesOracle Solaris 11.1 New Features
Oracle Solaris 11.1 New FeaturesOrgad Kimchi
 
New Generation of SPARC Processors Boosting Oracle S/W Angelo Rajadurai
New Generation of SPARC Processors Boosting Oracle S/W Angelo RajaduraiNew Generation of SPARC Processors Boosting Oracle S/W Angelo Rajadurai
New Generation of SPARC Processors Boosting Oracle S/W Angelo RajaduraiOrgad Kimchi
 
Oracle Solaris 11 platform for ECI Telecom private cloud infrastructure
Oracle Solaris 11 platform for ECI Telecom private cloud infrastructure Oracle Solaris 11 platform for ECI Telecom private cloud infrastructure
Oracle Solaris 11 platform for ECI Telecom private cloud infrastructure Orgad Kimchi
 

Mehr von Orgad Kimchi (9)

Deploying and Managing Artificial Intelligence Services using the Open Data H...
Deploying and Managing Artificial Intelligence Services using the Open Data H...Deploying and Managing Artificial Intelligence Services using the Open Data H...
Deploying and Managing Artificial Intelligence Services using the Open Data H...
 
Red hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategyRed hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategy
 
Red Hat Enteprise Linux Open Stack Platfrom Director
Red Hat Enteprise Linux Open Stack Platfrom DirectorRed Hat Enteprise Linux Open Stack Platfrom Director
Red Hat Enteprise Linux Open Stack Platfrom Director
 
OpenStack for devops environment
OpenStack for devops environment OpenStack for devops environment
OpenStack for devops environment
 
Solaris 11.2 What's New
Solaris 11.2 What's NewSolaris 11.2 What's New
Solaris 11.2 What's New
 
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...
 
Oracle Solaris 11.1 New Features
Oracle Solaris 11.1 New FeaturesOracle Solaris 11.1 New Features
Oracle Solaris 11.1 New Features
 
New Generation of SPARC Processors Boosting Oracle S/W Angelo Rajadurai
New Generation of SPARC Processors Boosting Oracle S/W Angelo RajaduraiNew Generation of SPARC Processors Boosting Oracle S/W Angelo Rajadurai
New Generation of SPARC Processors Boosting Oracle S/W Angelo Rajadurai
 
Oracle Solaris 11 platform for ECI Telecom private cloud infrastructure
Oracle Solaris 11 platform for ECI Telecom private cloud infrastructure Oracle Solaris 11 platform for ECI Telecom private cloud infrastructure
Oracle Solaris 11 platform for ECI Telecom private cloud infrastructure
 

Kürzlich hochgeladen

Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 

Kürzlich hochgeladen (20)

Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 

How to Set Up a Hadoop Cluster Using Oracle Solaris (Hands-On Lab)

  • 1. How to Set Up a Hadoop Cluster with Oracle Solaris [HOL10182] Orgad Kimchi Principal Software Engineer
  • 2. Disclaimer The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle Corporation. 2Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 12 of the corporate presentation template
  • 3. Agenda  Lab Overview  Hadoop Overview  The Benefits of Using Oracle Solaris Technologies for a Hadoop Cluster 3Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 12 of the corporate presentation template
  • 4. Lab Overview: In this hands-on lab we present and demonstrate, through exercises, how to set up a Hadoop cluster using Oracle Solaris 11 technologies such as Zones, ZFS, DTrace, and network virtualization. Key topics include the Hadoop Distributed File System (HDFS) and MapReduce. We also cover the Hadoop installation process and the cluster building blocks: the NameNode, a secondary NameNode, and DataNodes.
  • 5. Lab Overview (cont’d): During the lab, users learn how to load data into the Hadoop cluster and run a MapReduce job. This hands-on training lab is for system administrators and others responsible for managing Apache Hadoop clusters in production or development environments.
  • 6. Lab Main Topics: This hands-on lab consists of 13 exercises covering various Oracle Solaris and Apache Hadoop technologies: 1. Install Hadoop. 2. Edit the Hadoop configuration files. 3. Configure the Network Time Protocol. 4. Create the virtual network interfaces (VNICs). 5. Create the NameNode and the secondary NameNode zones. 6. Set up the DataNode zones. 7. Configure the NameNode. 8. Set up SSH. 9. Format HDFS from the NameNode. 10. Start the Hadoop cluster. 11. Run a MapReduce job. 12. Secure data at rest using ZFS encryption. 13. Use Oracle Solaris DTrace for performance monitoring.
  • 7. What is Big Data? Big Data is both large and variable datasets and a new set of technologies. The datasets: extremely large files of unstructured or semi-structured data; large and highly distributed datasets that are otherwise difficult to manage as a single unit of information. The technologies: tools that can economically acquire, organize, store, analyze, and extract value from Big Data datasets, thus facilitating better, more informed business decisions.
  • 8. Data is Everywhere! Facts & Figures: 234M web sites; 7M new sites in 2010. Facebook: 500M users, 40M photos per day, 30 billion new pieces of content per month. New York Stock Exchange: 1 TB of data per day. Web 2.0: 147M blogs and growing; Twitter, 12 TB of data per day.
  • 9. Introduction To Hadoop
  • 10. What is Hadoop? Builds on ideas that originated at Google in 2003 for generating search indexes and web scores. A top-level Apache project that consists of two key services: 1. Hadoop Distributed File System (HDFS): highly scalable, fault-tolerant, distributed. 2. MapReduce API (Java), which can also be scripted in other languages. Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure.
  • 11. Components of Hadoop
  • 12. HDFS: HDFS is the file system responsible for storing data on the cluster. Written in Java (based on Google’s GFS); sits on top of a native file system (ext3, ext4, xfs, etc.); POSIX-like file permissions model; provides redundant storage for massive amounts of data; optimized for large, streaming reads of files.
  • 13. The Five Hadoop Daemons: Hadoop consists of five separate daemons. NameNode: holds the metadata for HDFS. Secondary NameNode: performs housekeeping functions for the NameNode. DataNode: stores the actual HDFS data blocks. JobTracker: manages MapReduce jobs, distributes individual tasks to machines running the TaskTracker, and coordinates the MapReduce stages. TaskTracker: responsible for instantiating and monitoring individual Map and Reduce tasks.
  • 14. Hadoop Architecture
  • 15. MapReduce (diagram: very big data → Map → partitioning function → Reduce → result). Map: accepts an input key/value pair; emits intermediate key/value pairs. Reduce: accepts an intermediate key/value* pair (one key together with all of its values); emits an output key/value pair.
  • 16. MapReduce Example: counting word occurrences in a document. Input: "how many chucks could a woodchuck chuck if a woodchuck could chuck wood". Map (4 nodes): how,1 many,1 chucks,1 could,1 a,1 woodchuck,1 chuck,1 if,1 a,1 woodchuck,1 could,1 chuck,1 wood,1. Group by key, then Reduce (2 nodes): a,1:1 chuck,1:1 chucks,1 could,1:1 how,1 if,1 many,1 wood,1 woodchuck,1:1. Output: a,2 chuck,2 chucks,1 could,2 how,1 if,1 many,1 wood,1 woodchuck,2.
  • 17. MapReduce Functions: MapReduce partitions data into 64 MB chunks (default). It distributes data and jobs across thousands of nodes. Tasks are scheduled based on the location of the data. The master writes periodic checkpoints. If a map worker fails, the master restarts the failed tasks on a new node. Barrier: no reduce can begin until all maps are complete. HDFS manages data replication for redundancy. The MapReduce library does the hard work for us!
  • 18. RDBMS compared to MapReduce (traditional RDBMS vs. MapReduce). Data size: gigabytes vs. petabytes. Access: interactive and batch vs. batch. Updates: read and write many times vs. write once, read many times. Structure: static schema vs. dynamic schema. Integrity: high vs. low. Scaling: nonlinear vs. linear.
  • 19. The benefits of using Oracle Solaris technologies for a Hadoop cluster
  • 20. Architecture Layout
  • 21. The benefits of using Oracle Solaris Zones for a Hadoop cluster: fast provisioning of new cluster members using the Oracle Solaris Zones cloning feature; very high network throughput between the zones for DataNode replication.
  • 22. The benefits of using Oracle Solaris ZFS for a Hadoop cluster: immense data capacity (a 128-bit file system), well suited to big datasets; optimized disk I/O utilization, for better I/O performance, with ZFS built-in compression; secure data at rest using ZFS encryption.
  • 23. The benefits of using Oracle Solaris technologies for a Hadoop cluster: Multithread awareness: Oracle Solaris understands the correlation between cores and threads, and it provides a fast and efficient thread implementation. DTrace: a comprehensive, advanced tracing tool for troubleshooting systemic problems in real time. SMF: allows building dependencies between Hadoop services (e.g., starting the MapReduce daemons after the HDFS daemons).
  • 24. For more information: How to Set Up a Hadoop Cluster Using Oracle Solaris Zones; How to Build Native Hadoop Libraries for Oracle Solaris 11; How to Set Up a Hadoop Cluster Using Oracle Solaris (Hands-on Lab); My Blog.
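
The map, partition, group-by-key, and reduce stages described on slides 15 and 16 can be sketched in plain Python. This is a single-process illustration under stated assumptions, not the Hadoop MapReduce API: the function names (`map_words`, `partition`, `reduce_counts`, `run_word_count`) and the CRC32-based partitioner are hypothetical choices made for the example, and the two reducers simply mirror the two reduce nodes on slide 16.

```python
import zlib
from collections import defaultdict

TEXT = "how many chucks could a woodchuck chuck if a woodchuck could chuck wood"
NUM_REDUCERS = 2  # mirrors the two reduce nodes on slide 16


def map_words(line):
    """Map: emit an intermediate (word, 1) pair for every word in the input."""
    return [(word, 1) for word in line.split()]


def partition(key, num_reducers):
    """Partitioning function (hypothetical): choose a reducer from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_counts(key, values):
    """Reduce: collapse one key and all of its values into a single output pair."""
    return key, sum(values)


def run_word_count(text, num_reducers=NUM_REDUCERS):
    # Shuffle: route each intermediate pair to a reducer and group its values by key.
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_words(text):
        reducers[partition(key, num_reducers)][key].append(value)
    # Each reducer only ever sees the keys that were routed to it.
    output = {}
    for groups in reducers:
        for key, values in groups.items():
            word, total = reduce_counts(key, values)
            output[word] = total
    return dict(sorted(output.items()))


if __name__ == "__main__":
    for word, count in run_word_count(TEXT).items():
        print(f"{word},{count}")
```

Because every occurrence of a key hashes to the same reducer, each word is counted in exactly one place, which is why the reduce stage can run on independent nodes.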
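
The 64 MB default chunk size on slide 17 makes it easy to estimate how a file is divided. The sketch below is only arithmetic under stated assumptions: it treats each chunk as one unit of map work, and the replication factor of 3 is an assumed value for illustration (the slides say only that HDFS manages replication). The helper names `num_chunks` and `raw_storage_bytes` are hypothetical.

```python
import math

MB = 1024 * 1024
CHUNK_SIZE = 64 * MB  # default chunk size stated on slide 17
REPLICATION = 3       # assumption for illustration; not stated in the slides


def num_chunks(file_size_bytes, chunk_size=CHUNK_SIZE):
    """Number of chunks a file is divided into; the last chunk may be partial."""
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / chunk_size)


def raw_storage_bytes(file_size_bytes, replication=REPLICATION):
    """Bytes consumed across the cluster once every chunk is replicated."""
    return file_size_bytes * replication


if __name__ == "__main__":
    one_gib = 1024 * MB
    print(num_chunks(one_gib))               # chunks for a 1 GiB file
    print(raw_storage_bytes(one_gib) // MB)  # raw MB used with the assumed replication
```

A file of exactly 1 GiB divides into 16 full chunks, while one extra byte adds a 17th, mostly empty chunk.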
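
Slide 23 mentions using SMF to start the MapReduce daemons after the HDFS daemons. The sketch below does not use SMF or its manifests; it only illustrates the underlying idea of deriving a start order from declared dependencies, using Python's standard `graphlib` module (Python 3.9+). The service names follow the daemons on slide 13, and the specific dependency edges are assumptions chosen for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists the services it must start after.
DEPENDENCIES = {
    "namenode": set(),
    "secondary-namenode": {"namenode"},
    "datanode": {"namenode"},
    "jobtracker": {"namenode", "datanode"},  # MapReduce daemons wait for HDFS
    "tasktracker": {"jobtracker"},
}


def start_order(dependencies):
    """Return one valid start order in which every service follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, service in enumerate(start_order(DEPENDENCIES), start=1):
        print(position, service)
```

Several orders can satisfy the same dependencies, so only the relative positions are meaningful: the HDFS daemons always come before the JobTracker and TaskTracker.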
