SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Hadoop Admin Best Practices with HDP 2.3
Part-2
 We have INSTRUCTOR LED - both Online LIVE & Classroom Session
 Present for classroom sessions in Bangalore & Delhi (NCR)
 We are the ONLY Education delivery partners for Mulesoft, Elastic, Pivotal & Lightbend in India
 We have delivered more than 5000 trainings and have over 400 courses and a vast pool of over 200 experts to make YOU the
EXPERT!
FOLLOW US ON SOCIAL MEDIA TO STAY UPDATED ON THE UPCOMING WEBINARS
Online and Classroom Training on Technology Courses
at SpringPeople
Certified Partners
Non-Certified Courses
…and many more
…NOW
The Hadoop Ecosystem
Hadoop
The HDP 2.3 Platform
Versions
Covered Till Now
1. Use Ambari – Cluster Management Tool
2. More of WebHDFS Access
3. WebHDFS
4. Use More of HDFS Access Control Lists
5. Use HDFS Quotas
6. Understanding of YARN Components
7. Adding, Deleting, or Replacing Worker Nodes
8. Rack Awareness
9. NameNode High Availability
10. ResourceManager High Availability
11. Ambari Metrics System
12. What to Backup?
13 – Setting appropriate Directory Space Quota
• Best practice is to also set space limits on home directory To set a
12TB limit:
$ hdfs dfsadmin –setSpaceQuota 12t /user/username
• Includes space for replication
• This is the actual use of space
• Example:
• If storing 1TB and replication factor is 3
• 3TB is needed
• Quota can be set on any directory
14 - Configuring Trash
• Enable by setting time delay for trash's checkpoint removal:
In core-site.xml
• fs.trash.interval
• Delay is set in minutes (24 hours would be 1440 minutes)
• Recommendation is to set to 360 minutes (6 hours)
• Setting the value to 0 disables Trash
• Files deleted programmatically are deleted immediately
• Files can be immediately deleted from the command line using -skipTrash
15 - Compression Needs and Tradeoffs
 Compressing data can speed up data-intensive I/O operations
• MapReduce jobs are almost always I/O bound
 Compressed data can save storage space and speed up data transfers across the network
• Capital allocation for hardware can go further
 Reduced I/O and network load can result in significant performance improvements
• MapReduce jobs can finish faster overall
 But, CPU utilization and processing time increase during compression and decompression
• Understanding the tradeoffs is important for MapReduce pipeline’s overall performance
16 - Sqoop Security
• Database Authentication:
• Sqoop needs to authenticate to the RDBMS
• How?
• Usually this involves a username/password
(Oracle Wallet is the exception)
• Can hard code password in scripts (not recommended/used)
• Password usually stored in plaintext in a file protected by the filesystem
• Hadoop Credential Management Framework added in HDP 2.2
• Not a keystore, but a way to interface with keystore backends
• Passwords can be stored in a keystore and not in plain text
• Can help with “no passwords in plaintext” requirements
17 - distcp Configurations
• If Distcp runs out of memory before copying:
• Possible Cause: Number of files/directories being copied from source
path(s) is extremely large (e.g. 100,000 paths)
• Change: HEAP Size
- Export HADOOP_CLIENT_OPTS="-Xms64m -Xmx1024m”
• Map Sizing
• If -m is not specified: Default to 20 maps max
• Tuning the number of maps to:
- Size of the source and destination cluster
- The size of the copy
- Available bandwidth
18 - Falcon
 Centrally manages data lifecycle
• Centralized definition & management of pipelines for data ingest,
process and export
 Supports Business continuity and Disaster
Recovery
• Out of the box policies for data replication
and retention
• End-to-end monitoring of data pipelines
 Addresses basic audit & compliance requirements
• Visualize data pipeline lineage
• Track data pipeline audit logs
• Tag data with business metadata
19 - Running Balancer
• Can be run periodically as a batch job
• Examples: every 24 hours or weekly
• Run after new nodes have been added to the cluster
• To run balancer:
hdfs balancer [-threshold <threshold>] [-policy <policy>]]
• Runs until there are no blocks to move
or
Until it has lost contact with the NameNode
• Can be stopped with a Ctrl+C
20 - HDFS Snapshots
Create HDFS directory snapshots
Fast operation - only metadata affected
Results in .snapshot/ directory in the HDFS directory
Snapshots are named or default to timestamp
Directories must be made snapshottable
Snapshot Steps:
– Allow snapshot on directory
hdfs dfsadmin -allowSnapshot foo/bar/
– Create snapshot for directory and optionally provide snapshot name
hdfs dfs -createSnapshot foo/bar/ mysnapshot_today
– Verify snapshot
hdfs dfs -ls foo/bar/.snapshot
21 - HDFS Data – Automate & Restore
• Use Falcon/Oozie to automate backups
• Falcon utilizes Oozie as a workflow scheduler
• distcp is an Oozie action
- use -update and -prbugp
• Restoring is the reverse process of backups
1. On your backup cluster choose which snapshot to restore
2. Remove/move target directory on production system
3. Run distcp without -update options
22 - Apache Ranger
www.springpeople.comtraining@springpeople.com
Upcoming Hortonworks Classes at
SpringPeople
Classroom (Bengaluru)
05 - 08 Sept
26 - 28 Sept
10 - 13 Oct
07 - 10 Nov
05 - 08 Dec
19 - 21 Dec
Online LIVE
22 - 31 Aug
05 - 17 Sept
19 Sept - 01 Oct

Weitere ähnliche Inhalte

Was ist angesagt?

Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupMike Percy
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)Todd Lipcon
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownDataWorks Summit
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoopmarkgrover
 
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in HadoopKudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoopjdcryans
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataRyan Bosshart
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSDataWorks Summit
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSDataWorks Summit
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hiverxu
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideDouglas Bernardini
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopGwen (Chen) Shapira
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon
 

Was ist angesagt? (20)

Empower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and HadoopEmpower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and Hadoop
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in HadoopKudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast Data
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
The Heterogeneous Data lake
The Heterogeneous Data lakeThe Heterogeneous Data lake
The Heterogeneous Data lake
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for Hadoop
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and Spark
 

Ähnlich wie Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) 2.3 _Part 2

Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Global Business Events
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadooplarsgeorge
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Chris Nauroth
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst TrainingCloudera, Inc.
 
Strata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applicationsStrata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applicationshadooparchbook
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMichael Hiskey
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio, Inc.
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Chris Nauroth
 
From limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyFrom limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyAlluxio, Inc.
 
Data Orchestration Platform for the Cloud
Data Orchestration Platform for the CloudData Orchestration Platform for the Cloud
Data Orchestration Platform for the CloudAlluxio, Inc.
 

Ähnlich wie Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) 2.3 _Part 2 (20)

Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Instant hadoop of your own
Instant hadoop of your ownInstant hadoop of your own
Instant hadoop of your own
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst Training
 
Strata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applicationsStrata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applications
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Oracle Big Data Cloud service
Oracle Big Data Cloud serviceOracle Big Data Cloud service
Oracle Big Data Cloud service
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
From limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyFrom limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiency
 
Data Orchestration Platform for the Cloud
Data Orchestration Platform for the CloudData Orchestration Platform for the Cloud
Data Orchestration Platform for the Cloud
 

Mehr von SpringPeople

Growth hacking tips and tricks that you can try
Growth hacking tips and tricks that you can tryGrowth hacking tips and tricks that you can try
Growth hacking tips and tricks that you can trySpringPeople
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesSpringPeople
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataSpringPeople
 
Introduction to Microsoft Azure IaaS
Introduction to Microsoft Azure IaaSIntroduction to Microsoft Azure IaaS
Introduction to Microsoft Azure IaaSSpringPeople
 
Introduction to Selenium WebDriver
Introduction to Selenium WebDriverIntroduction to Selenium WebDriver
Introduction to Selenium WebDriverSpringPeople
 
Introduction to Open stack - An Overview
Introduction to Open stack - An Overview Introduction to Open stack - An Overview
Introduction to Open stack - An Overview SpringPeople
 
Why 2 million Developers depend on MuleSoft
Why 2 million Developers depend on MuleSoftWhy 2 million Developers depend on MuleSoft
Why 2 million Developers depend on MuleSoftSpringPeople
 
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsMongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsSpringPeople
 
Mastering Test Automation: How To Use Selenium Successfully
Mastering Test Automation: How To Use Selenium SuccessfullyMastering Test Automation: How To Use Selenium Successfully
Mastering Test Automation: How To Use Selenium SuccessfullySpringPeople
 
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...SpringPeople
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople
 
SpringPeople - Devops skills - Do you have what it takes?
SpringPeople - Devops skills - Do you have what it takes?SpringPeople - Devops skills - Do you have what it takes?
SpringPeople - Devops skills - Do you have what it takes?SpringPeople
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaSpringPeople
 
Hadoop data access layer v4.0
Hadoop data access layer v4.0Hadoop data access layer v4.0
Hadoop data access layer v4.0SpringPeople
 
Introduction To Core Java - SpringPeople
Introduction To Core Java - SpringPeopleIntroduction To Core Java - SpringPeople
Introduction To Core Java - SpringPeopleSpringPeople
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleSpringPeople
 
Introduction To Cloud Foundry - SpringPeople
Introduction To Cloud Foundry - SpringPeopleIntroduction To Cloud Foundry - SpringPeople
Introduction To Cloud Foundry - SpringPeopleSpringPeople
 
Introduction To Spring Enterprise Integration - SpringPeople
Introduction To Spring Enterprise Integration - SpringPeopleIntroduction To Spring Enterprise Integration - SpringPeople
Introduction To Spring Enterprise Integration - SpringPeopleSpringPeople
 
Introduction To Groovy And Grails - SpringPeople
Introduction To Groovy And Grails - SpringPeopleIntroduction To Groovy And Grails - SpringPeople
Introduction To Groovy And Grails - SpringPeopleSpringPeople
 
Introduction To Jenkins - SpringPeople
Introduction To Jenkins - SpringPeopleIntroduction To Jenkins - SpringPeople
Introduction To Jenkins - SpringPeopleSpringPeople
 

Mehr von SpringPeople (20)

Growth hacking tips and tricks that you can try
Growth hacking tips and tricks that you can tryGrowth hacking tips and tricks that you can try
Growth hacking tips and tricks that you can try
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction to Microsoft Azure IaaS
Introduction to Microsoft Azure IaaSIntroduction to Microsoft Azure IaaS
Introduction to Microsoft Azure IaaS
 
Introduction to Selenium WebDriver
Introduction to Selenium WebDriverIntroduction to Selenium WebDriver
Introduction to Selenium WebDriver
 
Introduction to Open stack - An Overview
Introduction to Open stack - An Overview Introduction to Open stack - An Overview
Introduction to Open stack - An Overview
 
Why 2 million Developers depend on MuleSoft
Why 2 million Developers depend on MuleSoftWhy 2 million Developers depend on MuleSoft
Why 2 million Developers depend on MuleSoft
 
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsMongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
 
Mastering Test Automation: How To Use Selenium Successfully
Mastering Test Automation: How To Use Selenium SuccessfullyMastering Test Automation: How To Use Selenium Successfully
Mastering Test Automation: How To Use Selenium Successfully
 
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...
An Introduction of Big data; Big data for beginners; Overview of Big Data; Bi...
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing
 
SpringPeople - Devops skills - Do you have what it takes?
SpringPeople - Devops skills - Do you have what it takes?SpringPeople - Devops skills - Do you have what it takes?
SpringPeople - Devops skills - Do you have what it takes?
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
 
Hadoop data access layer v4.0
Hadoop data access layer v4.0Hadoop data access layer v4.0
Hadoop data access layer v4.0
 
Introduction To Core Java - SpringPeople
Introduction To Core Java - SpringPeopleIntroduction To Core Java - SpringPeople
Introduction To Core Java - SpringPeople
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeople
 
Introduction To Cloud Foundry - SpringPeople
Introduction To Cloud Foundry - SpringPeopleIntroduction To Cloud Foundry - SpringPeople
Introduction To Cloud Foundry - SpringPeople
 
Introduction To Spring Enterprise Integration - SpringPeople
Introduction To Spring Enterprise Integration - SpringPeopleIntroduction To Spring Enterprise Integration - SpringPeople
Introduction To Spring Enterprise Integration - SpringPeople
 
Introduction To Groovy And Grails - SpringPeople
Introduction To Groovy And Grails - SpringPeopleIntroduction To Groovy And Grails - SpringPeople
Introduction To Groovy And Grails - SpringPeople
 
Introduction To Jenkins - SpringPeople
Introduction To Jenkins - SpringPeopleIntroduction To Jenkins - SpringPeople
Introduction To Jenkins - SpringPeople
 

Kürzlich hochgeladen

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 

Kürzlich hochgeladen (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) 2.3 _Part 2

  • 1. Hadoop Admin Best Practices with HDP 2.3 Part-2
  • 2.  We have INSTRUCTOR LED - both Online LIVE & Classroom Session  Present for classroom sessions in Bangalore & Delhi (NCR)  We are the ONLY Education delivery partners for Mulesoft, Elastic, Pivotal & Lightbend in India  We have delivered more than 5000 trainings and have over 400 courses and a vast pool of over 200 experts to make YOU the EXPERT! FOLLOW US ON SOCIAL MEDIA TO STAY UPDATED ON THE UPCOMING WEBINARS
  • 3. Online and Classroom Training on Technology Courses at SpringPeople Certified Partners Non-Certified Courses …and many more …NOW
  • 5. The HDP 2.3 Platform Versions
  • 6. Covered Till Now 1. Use Ambari – Cluster Management Tool 2. More of WebHDFS Access 3. WebHDFS 4. Use More of HDFS Access Control Lists 5. Use HDFS Quotas 6. Understanding of YARN Components 7. Adding, Deleting, or Replacing Worker Nodes 8. Rack Awareness 9. NameNode High Availability 10. ResourceManager High Availability 11. Ambari Metrics System 12. What to Backup?
  • 7. 13 – Setting appropriate Directory Space Quota • Best practice is to also set space limits on home directory To set a 12TB limit: $ hdfs dfsadmin –setSpaceQuota 12t /user/username • Includes space for replication • This is the actual use of space • Example: • If storing 1TB and replication factor is 3 • 3TB is needed • Quota can be set on any directory
  • 8. 14 - Configuring Trash • Enable by setting time delay for trash's checkpoint removal: In core-site.xml • fs.trash.interval • Delay is set in minutes (24 hours would be 1440 minutes) • Recommendation is to set to 360 minutes (6 hours) • Setting the value to 0 disables Trash • Files deleted programmatically are deleted immediately • Files can be immediately deleted from the command line using -skipTrash
  • 9. 15 - Compression Needs and Tradeoffs  Compressing data can speed up data-intensive I/O operations • MapReduce jobs are almost always I/O bound  Compressed data can save storage space and speed up data transfers across the network • Capital allocation for hardware can go further  Reduced I/O and network load can result in significant performance improvements • MapReduce jobs can finish faster overall  But, CPU utilization and processing time increase during compression and decompression • Understanding the tradeoffs is important for MapReduce pipeline’s overall performance
  • 10. 16 - Sqoop Security • Database Authentication: • Sqoop needs to authenticate to the RDBMS • How? • Usually this involves a username/password (Oracle Wallet is the exception) • Can hard code password in scripts (not recommended/used) • Password usually stored in plaintext in a file protected by the filesystem • Hadoop Credential Management Framework added in HDP 2.2 • Not a keystore, but a way to interface with keystore backends • Passwords can be stored in a keystore and not in plain text • Can help with “no passwords in plaintext” requirements
  • 11. 17 - distcp Configurations • If Distcp runs out of memory before copying: • Possible Cause: Number of files/directories being copied from source path(s) is extremely large (e.g. 100,000 paths) • Change: HEAP Size - Export HADOOP_CLIENT_OPTS="-Xms64m -Xmx1024m” • Map Sizing • If -m is not specified: Default to 20 maps max • Tuning the number of maps to: - Size of the source and destination cluster - The size of the copy - Available bandwidth
  • 12. 18 - Falcon  Centrally manages data lifecycle • Centralized definition & management of pipelines for data ingest, process and export  Supports Business continuity and Disaster Recovery • Out of the box policies for data replication and retention • End-to-end monitoring of data pipelines  Addresses basic audit & compliance requirements • Visualize data pipeline lineage • Track data pipeline audit logs • Tag data with business metadata
  • 13. 19 - Running Balancer • Can be run periodically as a batch job • Examples: every 24 hours or weekly • Run after new nodes have been added to the cluster • To run balancer: hdfs balancer [-threshold <threshold>] [-policy <policy>]] • Runs until there are no blocks to move or Until it has lost contact with the NameNode • Can be stopped with a Ctrl+C
  • 14. 20 - HDFS Snapshots Create HDFS directory snapshots Fast operation - only metadata affected Results in .snapshot/ directory in the HDFS directory Snapshots are named or default to timestamp Directories must be made snapshottable Snapshot Steps: – Allow snapshot on directory hdfs dfsadmin -allowSnapshot foo/bar/ – Create snapshot for directory and optionally provide snapshot name hdfs dfs -createSnapshot foo/bar/ mysnapshot_today – Verify snapshot hdfs dfs -ls foo/bar/.snapshot
  • 15. 21 - HDFS Data – Automate & Restore • Use Falcon/Oozie to automate backups • Falcon utilizes Oozie as a workflow scheduler • distcp is an Oozie action - use -update and -prbugp • Restoring is the reverse process of backups 1. On your backup cluster choose which snapshot to restore 2. Remove/move target directory on production system 3. Run distcp without -update options
  • 16. 22 - Apache Ranger
  • 17.
  • 18. www.springpeople.comtraining@springpeople.com Upcoming Hortonworks Classes at SpringPeople Classroom (Bengaluru) 05 - 08 Sept 26 - 28 Sept 10 - 13 Oct 07 - 10 Nov 05 - 08 Dec 19 - 21 Dec Online LIVE 22 - 31 Aug 05 - 17 Sept 19 Sept - 01 Oct

Hinweis der Redaktion

  1. 8/12/2016 2:39 PM
  2. Feel free to spend a lot of time on this slide. Many of these frameworks are not discussed later in the course, so now is likely your only chance to explain them. Let the students ask questions and make the discussion interactive.
  3. So what is Hortonworks Data Platform(HDP)? It is an open enterprise version of Hadoop distributed by Hortonworks. It includes a single installation utility that installs many of the Apache Hadoop software frameworks. Even the installer is pure Hadoop. The primary benefit is that Hortonworks has put HDP through a rigorous set of system, functional, and regression tests to ensure that versions of any frameworks included in the distribution work seamlessly together in a secure and reliable manner. Because HDP is an open enterprise version of Hadoop, it is imperative that it uses the best combination of the most stable, reliable, secure, and current frameworks.
  4. There is one more property which is having relation with the above property called fs.trash.checkpoint.interval. It is the number of minutes between trash checkpoints. This should be smaller or equal to fs.trash.interval. Everytime the checkpointer runs, it creates a new checkpoint out of current and removes checkpoints created more than fs.trash.interval minutes ago.The default value of this property is zero.