HDFS: Optimization, Stabilization and
Supportability
June 28, 2016
Chris Nauroth
email: cnauroth@hortonworks.com
twitter: @cnauroth
Arpit Agarwal
email: aagarwal@hortonworks.com
twitter: @aagarw
© Hortonworks Inc. 2011
About Us
Chris Nauroth
• Member of Technical Staff, Hortonworks
– Apache Hadoop committer, PMC member, and Apache Software Foundation member
– Major contributor to HDFS ACLs, Windows compatibility, and operability improvements
• Hadoop user since 2010
– Prior employment experience deploying, maintaining and using Hadoop clusters
Arpit Agarwal
• Member of Technical Staff, Hortonworks
– Apache Hadoop Committer, PMC Member
– Major contributor to HDFS Heterogeneous Storage Support, Windows Compatibility
Page 2
Architecting the Future of Big Data
Motivation
• HDFS engineers are on the front line for operational support of Hadoop.
– HDFS is the foundational storage layer for typical Hadoop deployments.
– Therefore, challenges in HDFS have the potential to impact the entire Hadoop ecosystem.
– Conversely, application problems can become visible at the layer of HDFS operations.
• Analysis of Hadoop Support Cases
– Support case trends reveal common patterns for HDFS operational challenges.
– Those challenges inform what needs to improve in the software.
• Software Improvements
– Optimization: Identify and mitigate bottlenecks.
– Stabilization: Prevent unusual circumstances from harming cluster uptime.
– Supportability: When something goes wrong, provide visibility and tools to fix it.
Thank you to the entire community of Apache contributors.
Performance
• Garbage Collection
– NameNode heap must scale up in relation to the number of file system objects (files, directories, blocks, etc.).
– Recent hardware trends have driven larger DataNode heaps too: nodes have more disks, and those disks are larger, so the memory footprint for tracking block state has grown.
– Much has been written about garbage collection tuning for large-heap JVM processes.
– In addition to recommending configuration best practices, we can optimize the codebase to reduce garbage collection pressure.
Performance
• Block Reporting
– The process by which DataNodes report information about their stored blocks to the NameNode.
– Full Block Report: a complete catalog of all of the node’s blocks, sent infrequently.
– Incremental Block Report: partial information about recently added or deleted blocks, sent more frequently.
– All block reporting occurs asynchronously with respect to user-facing operations, so it does not directly impact end-user latency.
– However, inefficiencies in block reporting can overwhelm a cluster to the point that it can no longer adequately serve end-user operations.
HDFS-7435: PB encoding of block reports is very
inefficient
• Block report RPC message encoding can cause memory allocation inefficiency and garbage
collection churn.
– HDFS RPC messages are encoded using Protocol Buffers.
– Block reports encode each block as a sequence of three 64-bit long fields.
– Behind the scenes, this becomes an ArrayList<Long> with a default capacity of 10.
– DataNodes almost always send a larger block report than this, so array reallocation churn is almost guaranteed.
– Boxing and unboxing cause additional allocations.
• Solution: a more GC-friendly encoding of block reports.
– Take over serialization directly.
– Manually encode number of longs, followed by list of primitive longs.
– Eliminates ArrayList reallocation costs.
– Eliminates boxing and unboxing costs by deserializing straight to primitive long.
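A minimal sketch of the idea (illustrative only, not the actual Hadoop wire format or classes): write the number of longs up front, then the longs themselves as primitives, so decoding can allocate one exact-sized long[] with no boxing and no ArrayList growth.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BlockReportEncoding {

  // Encode: length prefix followed by primitive longs (3 per block).
  static ByteBuffer encode(long[] blockFields) {
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + blockFields.length * Long.BYTES);
    buf.putInt(blockFields.length);
    for (long v : blockFields) {
      buf.putLong(v);            // primitive write, no Long boxing
    }
    buf.flip();
    return buf;
  }

  // Decode: read the count, then fill an exact-sized primitive array.
  static long[] decode(ByteBuffer buf) {
    int n = buf.getInt();
    long[] out = new long[n];    // single allocation, no reallocation churn
    for (int i = 0; i < n; i++) {
      out[i] = buf.getLong();
    }
    return out;
  }

  public static void main(String[] args) {
    // Two blocks, each (blockId, length, generationStamp).
    long[] report = {1001L, 134217728L, 7L, 1002L, 67108864L, 8L};
    long[] roundTrip = decode(encode(report));
    System.out.println(Arrays.equals(report, roundTrip));  // true
  }
}
```

The key design point is that both sides work entirely in primitive longs, so the garbage collector never sees per-block objects.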
HDFS-9710: Change DN to send block receipt IBRs in
batches
• Incremental block reports trigger multiple RPC calls.
– When a DataNode receives a block, it immediately sends an incremental block report RPC to the NameNode.
– Multiple block receipts in quick succession therefore translate into multiple individual incremental block report RPCs.
– Across all DataNodes in a large cluster, this adds up to a huge number of RPC messages for the NameNode to process.
• Solution: batch multiple block receipt events into a single RPC message.
– Reduces RPC overhead of sending multiple messages.
– Scales better with respect to number of nodes and number of blocks in a cluster.
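A hypothetical sketch of the batching idea (the class and method names are illustrative, not the DataNode's real ones): buffer block-received events and report them in one RPC per batch instead of one RPC per block.

```java
import java.util.ArrayList;
import java.util.List;

public class IbrBatcher {
  private final List<Long> pendingReceipts = new ArrayList<>();
  private final int batchSize;
  private int rpcsSent = 0;  // stand-in counter for incremental report RPCs

  IbrBatcher(int batchSize) {
    this.batchSize = batchSize;
  }

  void blockReceived(long blockId) {
    pendingReceipts.add(blockId);
    if (pendingReceipts.size() >= batchSize) {
      flush();
    }
  }

  // In the real protocol this would be a single RPC carrying the whole batch.
  void flush() {
    if (pendingReceipts.isEmpty()) {
      return;
    }
    rpcsSent++;
    pendingReceipts.clear();
  }

  int rpcsSent() {
    return rpcsSent;
  }

  public static void main(String[] args) {
    IbrBatcher batcher = new IbrBatcher(100);
    for (long id = 0; id < 1000; id++) {
      batcher.blockReceived(id);
    }
    batcher.flush();
    System.out.println(batcher.rpcsSent());  // 10 RPCs instead of 1000
  }
}
```

The real implementation also flushes on a timer so receipts are never delayed indefinitely; the size trigger alone is shown here for brevity.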
Liveness
• "...make progress despite the fact that its concurrently executing components ("processes") may
have to "take turns" in critical sections..." -Wikipedia
• DataNode Heartbeats
– Responsible for reporting health of a DataNode to the NameNode.
– Operational problems of managing load and performance can block timely heartbeat processing.
– Heartbeat processing at the NameNode can be surprisingly costly due to contention on a global lock and
asynchronous dispatch of commands (e.g. delete block).
• Blocked heartbeat processing can cause cascading failure and downtime.
– Blocked heartbeat processing looks to the NameNode the same as a DataNode that is not running at all.
– DataNodes not running: flagged by the NameNode as stale, then dead.
– Multiple stale DataNodes: reduced cluster capacity.
– Multiple dead DataNodes: storm of wasteful re-replication activity.
HDFS-9239: DataNode Lifeline Protocol: an alternative
protocol for reporting DataNode health
• The lifeline keeps the DataNode alive, despite conditions of unusually high load.
– Optionally run a separate RPC server within the NameNode dedicated to processing of lifeline messages sent by
DataNodes.
– Lifeline messages are a simplified form of heartbeat messages, but do not have the same costly requirements for
asynchronous command dispatch, and therefore do not need to contend on a shared lock.
– Even if the main NameNode RPC queue is overwhelmed, the lifeline still keeps the DataNode alive.
– Prevents erroneous and costly re-replication activity.
HDFS-9311: Support optional offload of NameNode HA
service health checks to a separate RPC server.
• RPC offload of HA health check and failover messages.
– Similar to problem of timely heartbeat message delivery.
– NameNode HA requires messages sent from the ZKFC (ZooKeeper Failover Controller) process to the
NameNode.
– Messages are related to handling periodic health checks and initiating shutdown and failover if necessary.
– A NameNode overwhelmed by unusually high load cannot process these messages promptly.
– Delayed processing of these messages slows down NameNode failover, and thus creates a visibly prolonged
outage period.
– The lifeline RPC server can be used to offload HA messages, and similarly keep processing them even in the
case of unusually high load.
Optimizing Applications
• HDFS Utilization Patterns
– Sometimes it’s helpful to look a layer higher and assess what applications are doing with HDFS.
– Unfortunately, the FileSystem API can make it too easy to implement inefficient call patterns.
HIVE-10223: Consolidate several redundant FileSystem
API calls.
• The Hadoop FileSystem API can cause applications to make redundant RPC calls.
• Before:
if (fs.isFile(file)) { // RPC #1
...
} else if (fs.isDirectory(file)) { // RPC #2
...
}
• After:
FileStatus fileStatus = fs.getFileStatus(file); // Just 1 RPC
if (fileStatus.isFile()) { // Local, no RPC
...
} else if (fileStatus.isDirectory()) { // Local, no RPC
...
}
• Good for Hive, because it reduces latency associated with NameNode RPCs.
• Good for the whole ecosystem, because it reduces load on the NameNode, a shared service.
PIG-4442: Eliminate redundant RPC call to get file
information in HPath.
• A similar story of redundant RPC within Pig code.
• Before:
long blockSize = fs.getHFS().getFileStatus(path).getBlockSize(); // RPC #1
short replication = fs.getHFS().getFileStatus(path).getReplication(); // RPC #2
• After:
FileStatus fileStatus = fs.getHFS().getFileStatus(path); // Just 1 RPC
long blockSize = fileStatus.getBlockSize(); // Local, no RPC
short replication = fileStatus.getReplication(); // Local, no RPC
• Revealed by inspection of the HDFS audit log.
– HDFS audit log shows a record of each file system operation executed against the NameNode.
– This continues to be one of the most significant sources of HDFS troubleshooting information.
– In this case, manual inspection revealed a suspicious pattern of multiple getfileinfo calls for the same path from a
Pig job submission.
Managing NameNode Load
• The NameNode is no longer a single point of failure.
– However, NameNode performance can still be a bottleneck.
• HDFS assumes that applications will be well-behaved.
• A single inefficient job can easily overwhelm the NameNode with RPC load.
Hadoop RPC Architecture
• Hadoop RPC admits incoming calls into a shared queue.
• Worker threads consume incoming calls from that shared queue and process them.
• In an overloaded situation, calls spend longer waiting in the queue for a worker thread to become available.
• If the RPC queue overflows, requests are queued in the OS socket buffers.
– More buffering leads to higher RPC latencies and potentially client-side timeouts.
– Timeouts often result in job failures and restarts.
– Restarted jobs cause more work, creating a positive feedback loop.
• This affects all callers, not just the caller that triggered the unusually high load.
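The shared-queue pattern above can be sketched in a few lines (illustrative only, not Hadoop's actual ipc.Server): callers offer work into one bounded queue and a pool of handler threads drains it. When the queue is full, offer() fails; in the real server those calls would instead back up into OS socket buffers.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RpcQueueSketch {
  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<Runnable> callQueue = new ArrayBlockingQueue<>(100);
    AtomicInteger handled = new AtomicInteger();

    // Handler threads consume calls from the shared queue.
    Thread[] handlers = new Thread[4];
    for (int i = 0; i < handlers.length; i++) {
      handlers[i] = new Thread(() -> {
        try {
          Runnable call;
          // Exit after the queue has stayed empty for 200 ms.
          while ((call = callQueue.poll(200, TimeUnit.MILLISECONDS)) != null) {
            call.run();
          }
        } catch (InterruptedException ignored) {
        }
      });
      handlers[i].start();
    }

    // Simulate a burst of 1000 incoming calls; count the overflow.
    int rejected = 0;
    for (int i = 0; i < 1000; i++) {
      if (!callQueue.offer(handled::incrementAndGet)) {
        rejected++;
      }
    }
    for (Thread t : handlers) {
      t.join();
    }
    // Every call was either handled or rejected at the full queue.
    System.out.println(handled.get() + rejected == 1000);
  }
}
```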
HADOOP-10597: RPC Server signals backoff to clients
when all request queues are full
• If an RPC server’s queue is full, respond to new requests with a backoff signal.
• Clients react by performing exponential backoff before retrying the call.
–Reduce job failures by avoiding client timeouts
• Improves QoS for clients when server is under heavy load.
• RPC calls that would have timed out will instead succeed, but with longer latency.
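A sketch of the client-side reaction (illustrative, not Hadoop's actual retry policy classes): on each backoff signal, wait exponentially longer before retrying, up to a cap, instead of failing the call.

```java
public class ExponentialBackoff {
  // Delay for the given retry attempt: base * 2^attempt, bounded by cap.
  static long backoffMillis(int attempt, long baseMillis, long capMillis) {
    long delay = baseMillis << Math.min(attempt, 20);  // clamp shift to avoid overflow
    return Math.min(delay, capMillis);
  }

  public static void main(String[] args) {
    // Successive retry delays: 100, 200, 400, 800, 1600, 3200, 6400, 10000.
    for (int attempt = 0; attempt < 8; attempt++) {
      System.out.print(backoffMillis(attempt, 100, 10_000) + " ");
    }
    System.out.println();
  }
}
```

Production retry policies usually also add random jitter so that backed-off clients do not all retry at the same instant; that is omitted here to keep the delays readable.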
HADOOP-10282: FairCallQueue
• Replace the single RPC queue with multiple prioritized queues.
• The server maintains a sliding window of RPC request counts, by user.
• New RPC calls are placed into queues with priority based on the calling user’s history.
• Calls are de-queued and processed with higher probability from higher-priority queues.
• De-prioritizes heavy users under high load and prevents starvation of other jobs.
• Complements RPC congestion control.
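A simplified sketch of the multi-level weighted dequeueing idea (the weights and structure are illustrative, not FairCallQueue's actual implementation): the dequeue side picks a queue with probability proportional to its weight and falls through to the next non-empty queue, so heavy users demoted to a low-priority queue slow down without starving anyone.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Random;

public class WeightedMultiQueue {
  private final List<Queue<String>> queues = new ArrayList<>();
  private final double[] weights;  // e.g. {0.5, 0.3, 0.2}, highest priority first
  private final Random rng = new Random(42);

  WeightedMultiQueue(double... weights) {
    this.weights = weights;
    for (int i = 0; i < weights.length; i++) {
      queues.add(new ArrayDeque<>());
    }
  }

  // Priority 0 is highest; in FairCallQueue the priority would be chosen
  // from the calling user's recent request counts.
  void offer(int priority, String call) {
    queues.get(priority).add(call);
  }

  String poll() {
    // Pick a starting queue with probability proportional to its weight.
    double r = rng.nextDouble();
    double acc = 0;
    int chosen = weights.length - 1;
    for (int i = 0; i < weights.length; i++) {
      acc += weights[i];
      if (r < acc) {
        chosen = i;
        break;
      }
    }
    // Fall through so an empty high-priority queue never blocks progress.
    for (int i = 0; i < queues.size(); i++) {
      String call = queues.get((chosen + i) % queues.size()).poll();
      if (call != null) {
        return call;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    WeightedMultiQueue q = new WeightedMultiQueue(0.5, 0.3, 0.2);
    q.offer(0, "light-user-call");
    q.offer(2, "heavy-user-call-1");
    q.offer(2, "heavy-user-call-2");
    int served = 0;
    while (q.poll() != null) {
      served++;
    }
    System.out.println(served);  // 3: every call is eventually served
  }
}
```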
HADOOP-12916: Allow RPC scheduler/CallQueue backoff
using response times
• Flexible back-off policies.
– Triggering backoff when the queue is full is often too late.
– Clients may be already experiencing timeouts before the RPC queue overflows.
• Instead, track call response time and trigger backoff when response time exceeds
bounds.
• Further reduces the probability of client timeouts and hence reduces job failures.
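One way to picture the policy (a hypothetical sketch, not the actual RpcScheduler API): keep an exponential moving average of call response times and signal backoff once the average exceeds a configured bound, before the queue ever fills.

```java
public class ResponseTimeBackoff {
  private final double thresholdMillis;
  private final double alpha;   // EMA smoothing factor in (0, 1]
  private double avgMillis = 0;

  ResponseTimeBackoff(double thresholdMillis, double alpha) {
    this.thresholdMillis = thresholdMillis;
    this.alpha = alpha;
  }

  // Record one completed call's response time into the moving average.
  void recordResponse(double millis) {
    avgMillis = alpha * millis + (1 - alpha) * avgMillis;
  }

  // Backoff triggers on sustained slow responses, not on queue overflow.
  boolean shouldBackoff() {
    return avgMillis > thresholdMillis;
  }

  public static void main(String[] args) {
    ResponseTimeBackoff b = new ResponseTimeBackoff(500, 0.2);
    for (int i = 0; i < 10; i++) b.recordResponse(100);   // healthy cluster
    System.out.println(b.shouldBackoff());                // false
    for (int i = 0; i < 20; i++) b.recordResponse(2000);  // overloaded
    System.out.println(b.shouldBackoff());                // true
  }
}
```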
HADOOP-13128: Manage Hadoop RPC resource usage
via resource coupon (proposed feature)
• Multi-tenancy is a key challenge in large enterprise deployments.
• Allows HDFS and the YARN ResourceManager to coordinate allocation of RPC
resources to multiple applications running concurrently in a multi-tenant deployment.
• FairCallQueue can lead to priority inversion
– NameNode is not aware of relative priorities of YARN jobs
– Requests from a high priority application can be demoted to a lower-priority RPC call queue.
– Incoming RPC requests present a resource coupon.
• Allows the ResourceManager to request a slice of NameNode capacity via a coupon.
Logging
• Logging requires a careful balance.
• Too much logging causes:
– Information overload.
– Increased system load: rendering strings is expensive and creates garbage.
• Too little logging hides valuable operational information.
Too much logging
• Benign errors can confuse administrators
– INFO ipc.Server (Server.java:run(2165)) - IPC Server handler 32 on 8021, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from
192.168.22.1:60216 Call#9371 Retry#0: org.apache.hadoop.ipc.StandbyException:
Operation category READ is not supported in state standby
– ERROR datanode.DataNode (DataXceiver.java:run(278)) –
myhost.hortonworks.com:50010:DataXceiver error processing unknown operation
src: /127.0.0.1:60681 dst: /127.0.0.1:50010 java.io.EOFException
Logging Pitfalls
• Forgotten guard logic.
– if (LOG.isDebugEnabled()) {
    LOG.debug("Processing block: " + block); // expensive toString() implementation!
  }
• Switching the logging API to SLF4J can eliminate the need for log-level guards in most cases.
– LOG.debug("Processing block: {}", block); // calls toString() only if debug enabled
• Logging in a tight loop.
• Logging while holding a shared resource, such as a mutually exclusive lock.
HDFS-9434: Recommission a datanode with 500k blocks
may pause NN for 30 seconds
• Logging is too verbose
– Summary of patch: don’t log too much!
– Move detailed logging to debug or trace level.
• Before:
LOG.info("BLOCK* processOverReplicatedBlock: " +
"Postponing processing of over-replicated " +
block + " since storage + " + storage
+ "datanode " + cur + " does not yet have up-to-date " +
"block information.");
• After:
LOG.trace("BLOCK* processOverReplicatedBlock: Postponing {}"
+ " since storage {} does not yet have up-to-date information.",
block, storage);
Troubleshooting
• Metrics are vital for diagnosis of most operational problems.
– Metrics must be capable of showing that there is a problem. (e.g. RPC call volume spike)
– Metrics also must be capable of identifying the source of that problem. (e.g. user issuing RPC
calls)
HDFS-6982: nntop
• Find activity trends of HDFS operations.
– HDFS audit log contains a record of each file system operation to the NameNode.
2015-11-16 21:00:00,109 INFO FSNamesystem.audit: allowed=true ugi=bob (auth:SIMPLE)
ip=/192.168.1.5 cmd=listStatus src=/app-logs/pcd_batch/application_1431545431771/
dst=null perm=null
– However, identifying sources of load from the audit log requires ad-hoc scripting.
• nntop: HDFS operation counts aggregated per operation and per user within time
windows.
– TopUserOpCounts - default time windows of 1 minute, 5 minutes, 25 minutes.
– curl 'http://127.0.0.1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState'
nntop Sample Output
"windowLenMs": 60000,
"ops": [
{
"opType": "create",
"topUsers": [
{
"user": "alice@EXAMPLE.COM",
"count": 4632
},
{
"user": "bob@EXAMPLE.COM",
"count": 1387
}
],
"totalCount": 6019
}
...
Troubleshooting Kerberos
• Kerberos is hard.
– Many moving parts: KDC, DNS, principals, keytabs and Hadoop configuration.
– Management tools like Apache Ambari automate initial provisioning of principals, keytabs and
configuration.
– When it doesn’t work, finding root cause is challenging.
HADOOP-12426: kdiag
• Kerberos misconfiguration diagnosis.
– DNS
– Hadoop configuration files
– KDC configuration
• kdiag: a command-line tool for diagnosing Kerberos problems.
– Prints various environment variables, Java system properties and Hadoop configuration options related to security.
– Attempts a login.
– If a keytab is used, prints principal information from the keytab.
– Prints krb5.conf.
– Validates the kinit executable (used for ticket renewals).
kdiag Sample Output - misconfigured DNS
[hdfs@c6401 ~]$ hadoop org.apache.hadoop.security.KDiag
== Kerberos Diagnostics scan at Mon Jun 27 23:13:40 UTC 2016 ==
16/06/27 23:13:40 ERROR security.KDiag: java.net.UnknownHostException:
java.net.UnknownHostException: c6401.ambari.apache.org: c6401.ambari.apache.org:
unknown error
at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
at org.apache.hadoop.security.KDiag.execute(KDiag.java:266)
at org.apache.hadoop.security.KDiag.run(KDiag.java:221)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.security.KDiag.exec(KDiag.java:926)
at org.apache.hadoop.security.KDiag.main(KDiag.java:936)
...
Summary
• A variety of recent enhancements have improved the ability of HDFS to serve as the foundational
storage layer of the Hadoop ecosystem.
• Optimization
– Performance
– Optimizing Applications
• Stabilization
– Liveness
– Managing Load
• Supportability
– Logging
– Troubleshooting
Thank you! Q&A
• A few recommended best practices while we address questions…
– Enable HDFS audit logs and periodically monitor audit logs/nnTop for unexpected patterns.
– Configure service heap settings correctly.
– https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html
– Use dedicated disks for NN metadata directories/journal node directories.
– http://hortonworks.com/blog/hdfs-metadata-directories-explained/
– Run balancer (and soon disk-balancer) periodically.
– http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
– Monitor for LDAP group lookup performance issues.
– https://community.hortonworks.com/content/kbentry/38591/hadoop-and-ldap-usage-load-patterns-and-tuning.html
– Use SmartSense for proactive analysis of potential issues and recommended fixes.
– http://hortonworks.com/products/subscriptions/smartsense/
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)VMware Tanzu
 
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Cloudera, Inc.
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Chris Nauroth
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clustersenissoz
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Community
 

Ähnlich wie Hdfs 2016-hadoop-summit-san-jose-v4 (20)

Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
 
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014 WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
 
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for BeginnersThe Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
 
getFamiliarWithHadoop
getFamiliarWithHadoopgetFamiliarWithHadoop
getFamiliarWithHadoop
 
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clusters
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development
 

Kürzlich hochgeladen

SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxRTS corp
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 

Kürzlich hochgeladen (20)

SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 

Hdfs 2016-hadoop-summit-san-jose-v4

– Much has been written about garbage collection tuning for large heap JVM processes.
– In addition to recommending configuration best practices, we can optimize the codebase to reduce garbage collection pressure.
Page 4
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Performance
• Block Reporting
– The process by which DataNodes report information about their stored blocks to the NameNode.
– Full Block Report: a complete catalog of all of the node’s blocks, sent infrequently.
– Incremental Block Report: partial information about recently added or deleted blocks, sent more frequently.
– All block reporting occurs asynchronously to user-facing operations, so it does not directly impact end-user latency.
– However, inefficiencies in block reporting can overwhelm a cluster to the point that it can no longer serve end user operations sufficiently.
Page 5
Architecting the Future of Big Data
© Hortonworks Inc. 2011
HDFS-7435: PB encoding of block reports is very inefficient
• Block report RPC message encoding can cause memory allocation inefficiency and garbage collection churn.
– HDFS RPC messages are encoded using Protocol Buffers.
– Block reports encode each block as a sequence of 3 64-bit long fields.
– Behind the scenes, this becomes an ArrayList<Long> with a default capacity of 10.
– DataNodes almost always send a larger block report than this, so array reallocation churn is almost guaranteed.
– Boxing and unboxing cause additional allocation requirements.
• Solution: a more GC-friendly encoding of block reports.
– Take over serialization directly.
– Manually encode the number of longs, followed by the list of primitive longs.
– Eliminates ArrayList reallocation costs.
– Eliminates boxing and unboxing costs by deserializing straight to primitive long.
Page 6
Architecting the Future of Big Data
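The allocation difference can be sketched in a few lines of Java. This is a minimal illustrative sketch, not the actual Hadoop serialization code; the class and method names are invented for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only -- not the actual Hadoop serialization code.
public class BlockReportEncoding {

  // Boxed encoding: every long field becomes a Long object, and the
  // ArrayList's backing array is reallocated as it grows past its
  // default capacity of 10 -- the allocation churn described above.
  static List<Long> encodeBoxed(long[] blockFields) {
    List<Long> longs = new ArrayList<>(); // default capacity 10
    for (long f : blockFields) {
      longs.add(f); // autoboxing allocates one Long per field
    }
    return longs;
  }

  // GC-friendly encoding: write the count, then the raw primitive longs.
  // No boxing, no list reallocation.
  static byte[] encodePrimitive(long[] blockFields) {
    ByteBuffer buf = ByteBuffer.allocate(4 + 8 * blockFields.length);
    buf.putInt(blockFields.length);
    for (long f : blockFields) {
      buf.putLong(f);
    }
    return buf.array();
  }

  public static void main(String[] args) {
    long[] report = new long[30]; // 10 blocks x 3 long fields each
    // 4 bytes for the count + 8 bytes per long field = 244 bytes
    System.out.println(encodePrimitive(report).length);
  }
}
```

The primitive encoding allocates exactly one buffer of known size up front, which is the essence of the fix.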
© Hortonworks Inc. 2011
HDFS-9710: Change DN to send block receipt IBRs in batches
• Incremental block reports trigger multiple RPC calls.
– When a DataNode receives a block, it sends an incremental block report RPC to the NameNode immediately.
– Multiple block receipts translate to multiple individual incremental block report RPCs.
– Across all DataNodes in a large cluster, this can become a huge number of RPC messages for the NameNode to process.
• Solution: batch multiple block receipt events into a single RPC message.
– Reduces the RPC overhead of sending multiple messages.
– Scales better with respect to the number of nodes and number of blocks in a cluster.
Page 7
Architecting the Future of Big Data
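The batching idea can be sketched as follows. The class and method names are illustrative, not the actual DataNode code; a real implementation would typically also flush on a timer or heartbeat interval rather than only on a fixed count.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of batching incremental block reports (IBRs):
// instead of one RPC per received block, buffer receipts and report
// them to the NameNode as a single batched message.
public class IncrementalReportBatcher {
  private final List<Long> pendingBlockIds = new ArrayList<>();
  private final int batchSize;
  private int rpcCount = 0; // how many batched "RPCs" have been sent

  IncrementalReportBatcher(int batchSize) {
    this.batchSize = batchSize;
  }

  // Called when a block is received; flushes once a full batch accumulates.
  void blockReceived(long blockId) {
    pendingBlockIds.add(blockId);
    if (pendingBlockIds.size() >= batchSize) {
      flush();
    }
  }

  // One RPC carries the whole batch.
  void flush() {
    if (!pendingBlockIds.isEmpty()) {
      rpcCount++; // stand-in for the actual NameNode RPC call
      pendingBlockIds.clear();
    }
  }

  int rpcCount() {
    return rpcCount;
  }

  public static void main(String[] args) {
    IncrementalReportBatcher batcher = new IncrementalReportBatcher(100);
    for (long id = 0; id < 1000; id++) {
      batcher.blockReceived(id);
    }
    batcher.flush();
    // 1000 block receipts, but only 10 RPCs instead of 1000
    System.out.println(batcher.rpcCount());
  }
}
```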
© Hortonworks Inc. 2011
Liveness
• "...make progress despite the fact that its concurrently executing components ("processes") may have to "take turns" in critical sections..." –Wikipedia
• DataNode Heartbeats
– Responsible for reporting the health of a DataNode to the NameNode.
– Operational problems of managing load and performance can block timely heartbeat processing.
– Heartbeat processing at the NameNode can be surprisingly costly due to contention on a global lock and asynchronous dispatch of commands (e.g. delete block).
• Blocked heartbeat processing can cause cascading failure and downtime.
– Blocked heartbeat processing looks the same as a DataNode not running at all.
– DataNodes not running: flagged by the NameNode as stale, then dead.
– Multiple stale DataNodes: reduced cluster capacity.
– Multiple dead DataNodes: storm of wasteful re-replication activity.
Page 8
Architecting the Future of Big Data
© Hortonworks Inc. 2011
HDFS-9239: DataNode Lifeline Protocol: an alternative protocol for reporting DataNode health
• The lifeline keeps the DataNode alive, despite conditions of unusually high load.
– Optionally run a separate RPC server within the NameNode dedicated to processing of lifeline messages sent by DataNodes.
– Lifeline messages are a simplified form of heartbeat messages, but do not have the same costly requirements for asynchronous command dispatch, and therefore do not need to contend on a shared lock.
– Even if the main NameNode RPC queue is overwhelmed, the lifeline still keeps the DataNode alive.
– Prevents erroneous and costly re-replication activity.
Page 9
Architecting the Future of Big Data
© Hortonworks Inc. 2011
HDFS-9311: Support optional offload of NameNode HA service health checks to a separate RPC server
• RPC offload of HA health check and failover messages.
– Similar to the problem of timely heartbeat message delivery.
– NameNode HA requires messages sent from the ZKFC (ZooKeeper Failover Controller) process to the NameNode.
– These messages handle periodic health checks and initiate shutdown and failover if necessary.
– A NameNode overwhelmed with unusually high load cannot process these messages.
– Delayed processing of these messages slows down NameNode failover, and thus creates a visibly prolonged outage period.
– The lifeline RPC server can be used to offload HA messages, and similarly keeps processing them even under unusually high load.
Page 10
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Optimizing Applications
• HDFS Utilization Patterns
– Sometimes it’s helpful to look a layer higher and assess what applications are doing with HDFS.
– The FileSystem API unfortunately can make it too easy to implement inefficient call patterns.
Page 11
Architecting the Future of Big Data
© Hortonworks Inc. 2011
HIVE-10223: Consolidate several redundant FileSystem API calls
• The Hadoop FileSystem API can cause applications to make redundant RPC calls.
• Before:
  if (fs.isFile(file)) {                 // RPC #1
    ...
  } else if (fs.isDirectory(file)) {     // RPC #2
    ...
  }
• After:
  FileStatus fileStatus = fs.getFileStatus(file); // Just 1 RPC
  if (fileStatus.isFile()) {             // Local, no RPC
    ...
  } else if (fileStatus.isDirectory()) { // Local, no RPC
    ...
  }
• Good for Hive, because it reduces latency associated with NameNode RPCs.
• Good for the whole ecosystem, because it reduces load on the NameNode, a shared service.
Page 12
Architecting the Future of Big Data
© Hortonworks Inc. 2011
PIG-4442: Eliminate redundant RPC call to get file information in HPath
• A similar story of redundant RPC within Pig code.
• Before:
  long blockSize = fs.getHFS().getFileStatus(path).getBlockSize();      // RPC #1
  short replication = fs.getHFS().getFileStatus(path).getReplication(); // RPC #2
• After:
  FileStatus fileStatus = fs.getHFS().getFileStatus(path); // Just 1 RPC
  long blockSize = fileStatus.getBlockSize();              // Local, no RPC
  short replication = fileStatus.getReplication();         // Local, no RPC
• Revealed from inspection of the HDFS audit log.
– The HDFS audit log shows a record of each file system operation executed against the NameNode.
– This continues to be one of the most significant sources of HDFS troubleshooting information.
– In this case, manual inspection revealed a suspicious pattern of multiple getfileinfo calls for the same path from a Pig job submission.
Page 13
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Managing NameNode Load
• NameNode no longer a single point of failure
– However, NameNode performance can still be a bottleneck.
• Assumption that applications will be well-behaved.
• A single inefficient job can easily overwhelm the NameNode with too much RPC load.
Page 14
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Hadoop RPC Architecture
• Hadoop RPC admits incoming calls into a shared queue.
• Worker threads consume incoming calls from that shared queue and process them.
• In an overloaded situation, calls spend longer waiting in the queue for a worker thread to become available.
• If the RPC queue overflows, requests are queued in the OS socket buffers.
– More buffering leads to higher RPC latencies and potentially client-side timeouts.
– Timeouts often result in job failures and restarts.
– Restarted jobs cause more work: a positive feedback loop.
• Affects all callers, not just the caller that triggered the unusually high load.
Page 15
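The shared-queue-plus-handler-threads model can be sketched with standard java.util.concurrent primitives. This is an illustrative sketch, not the actual ipc.Server code; here a full queue blocks the producer directly, whereas in the real server the calls back up into the OS socket buffers instead.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;

// Illustrative sketch of the Hadoop RPC call-queue model: incoming calls
// land in one bounded shared queue, and a fixed pool of handler threads
// drains it.
public class CallQueueSketch {

  static int processCalls(int numCalls, int queueCapacity, int numHandlers)
      throws InterruptedException {
    BlockingQueue<Runnable> callQueue = new ArrayBlockingQueue<>(queueCapacity);
    CountDownLatch done = new CountDownLatch(numCalls);

    // Handler threads: consume calls from the shared queue and process them.
    for (int i = 0; i < numHandlers; i++) {
      Thread handler = new Thread(() -> {
        try {
          while (true) {
            callQueue.take().run();
          }
        } catch (InterruptedException e) {
          // handler shutting down
        }
      });
      handler.setDaemon(true);
      handler.start();
    }

    // Reader side: admit calls into the shared queue.
    for (int i = 0; i < numCalls; i++) {
      callQueue.put(done::countDown); // blocks while the queue is full
    }
    done.await(); // wait for all admitted calls to be processed
    return numCalls - (int) done.getCount();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(processCalls(100, 8, 4));
  }
}
```

With a small queue and few handlers, the producer spends most of its time blocked on `put()`, which mirrors how overload surfaces as queueing delay rather than outright rejection.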
© Hortonworks Inc. 2011
HADOOP-10597: RPC Server signals backoff to clients when all request queues are full
• If an RPC server’s queue is full, respond to new requests with a backoff signal.
• Clients react by performing exponential backoff before retrying the call.
– Reduces job failures by avoiding client timeouts.
• Improves QoS for clients when the server is under heavy load.
• RPC calls that would have timed out will instead succeed, but with longer latency.
Page 16
Architecting the Future of Big Data
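The client-side reaction can be sketched in a few lines. This is an illustrative sketch of exponential backoff, not the actual Hadoop RetryPolicy classes; the delay schedule (100 ms base, 10 s cap) is an assumed example.

```java
// Illustrative sketch: on a backoff signal, the client waits
// baseMillis * 2^attempt, capped at a maximum, before retrying the call.
public class BackoffRetry {

  static long backoffDelayMillis(int attempt, long baseMillis, long maxMillis) {
    // Cap the shift to avoid overflow for very large attempt counts.
    long delay = baseMillis * (1L << Math.min(attempt, 20));
    return Math.min(delay, maxMillis);
  }

  public static void main(String[] args) {
    // Delays for the first six retries with a 100 ms base and a 10 s cap:
    for (int attempt = 0; attempt < 6; attempt++) {
      System.out.println(backoffDelayMillis(attempt, 100, 10_000));
    }
    // 100, 200, 400, 800, 1600, 3200
  }
}
```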
© Hortonworks Inc. 2011
HADOOP-10282: FairCallQueue
• Replace the single RPC queue with multiple prioritized queues.
• The server maintains a sliding window of RPC request counts, by user.
• New RPC calls are placed into queues with priority based on the calling user’s history.
• Calls are de-queued and processed with higher probability from higher-priority queues.
• De-prioritizes heavy users under high load; prevents starvation of other jobs.
• Complements RPC Congestion Control.
Page 17
Architecting the Future of Big Data
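The prioritized de-queue idea can be sketched with a deterministic weighted round-robin over two queues. This is illustrative only, not the actual FairCallQueue/multiplexer code (which uses probabilistic selection and per-user scheduling); the class name and 2:1 weights are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Illustrative sketch: drain higher-priority queues more often per cycle,
// so calls from heavy users -- demoted to lower-priority queues -- are
// served less frequently but never starved.
public class WeightedDequeue {
  private final List<Queue<String>> queues; // index 0 = highest priority
  private final int[] weights;              // draws per cycle at each level
  private int level = 0;
  private int drawsLeft;

  WeightedDequeue(List<Queue<String>> queues, int[] weights) {
    this.queues = queues;
    this.weights = weights;
    this.drawsLeft = weights[0];
  }

  // Take the next call, visiting levels in weighted round-robin order.
  String take() {
    for (int tries = 0; tries < queues.size(); tries++) {
      if (drawsLeft <= 0) { // cycle to the next priority level
        level = (level + 1) % queues.size();
        drawsLeft = weights[level];
      }
      if (!queues.get(level).isEmpty()) {
        drawsLeft--;
        return queues.get(level).poll();
      }
      drawsLeft = 0; // current queue empty: force advance on next try
    }
    return null; // all queues empty
  }

  public static void main(String[] args) {
    Queue<String> high = new ArrayDeque<>(Arrays.asList("h1", "h2"));
    Queue<String> low = new ArrayDeque<>(Arrays.asList("l1", "l2", "l3"));
    // 2 draws from the high-priority queue for every 1 from the low.
    WeightedDequeue wdq =
        new WeightedDequeue(Arrays.asList(high, low), new int[]{2, 1});
    for (String s; (s = wdq.take()) != null; ) {
      System.out.print(s + " "); // h1 h2 l1 l2 l3
    }
  }
}
```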
© Hortonworks Inc. 2011
HADOOP-12916: Allow RPC scheduler/CallQueue backoff using response times
• Flexible back-off policies.
– Triggering backoff when the queue is full is often too late.
– Clients may already be experiencing timeouts before the RPC queue overflows.
• Instead, track call response time and trigger backoff when response time exceeds bounds.
• Further reduces the probability of client timeouts and hence reduces job failures.
Page 18
Architecting the Future of Big Data
© Hortonworks Inc. 2011
HADOOP-13128: Manage Hadoop RPC resource usage via resource coupon (proposed feature)
• Multi-tenancy is a key challenge in large enterprise deployments.
• Allows HDFS and the YARN ResourceManager to coordinate allocation of RPC resources to multiple applications running concurrently in a multi-tenant deployment.
• FairCallQueue can lead to priority inversion.
– The NameNode is not aware of the relative priorities of YARN jobs.
– Requests from a high-priority application can be demoted to a lower-priority RPC call queue.
– A resource coupon is presented by incoming RPC requests.
• Allows the ResourceManager to request a slice of NameNode capacity via a coupon.
Page 19
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Logging
• Logging requires a careful balance.
• Too much logging causes:
– Information overload.
– Increased system load: rendering strings is expensive and creates garbage.
• Too little logging hides valuable operational information.
Page 20
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Too much logging
• Benign errors can confuse administrators:
– INFO ipc.Server (Server.java:run(2165)) - IPC Server handler 32 on 8021, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from 192.168.22.1:60216 Call#9371 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
– ERROR datanode.DataNode (DataXceiver.java:run(278)) – myhost.hortonworks.com:50010:DataXceiver error processing unknown operation src: /127.0.0.1:60681 dst: /127.0.0.1:50010 java.io.EOFException
Page 21
© Hortonworks Inc. 2011
Logging Pitfalls
• Forgotten guard logic:
  if (LOG.isDebugEnabled()) {
    LOG.debug("Processing block: " + block); // expensive toString() implementation!
  }
• Switching the logging API to SLF4J can eliminate the need for log-level guards in most cases:
  LOG.debug("Processing block: {}", block); // calls toString() only if debug enabled
• Logging in a tight loop.
• Logging while holding a shared resource, such as a mutually exclusive lock.
Page 22
Architecting the Future of Big Data
• 23. © Hortonworks Inc. 2011
HDFS-9434: Recommission a datanode with 500k blocks may pause NN for 30 seconds
• Logging is too verbose.
– Summary of patch: don't log too much!
– Move detailed logging to debug or trace level.
• Before:
LOG.info("BLOCK* processOverReplicatedBlock: " +
    "Postponing processing of over-replicated " + block +
    " since storage + " + storage +
    "datanode " + cur + " does not yet have up-to-date " +
    "block information.");
• After:
LOG.trace("BLOCK* processOverReplicatedBlock: Postponing {}" +
    " since storage {} does not yet have up-to-date information.",
    block, storage);
Page 23 Architecting the Future of Big Data
• 24. © Hortonworks Inc. 2011
Troubleshooting
• Metrics are vital for diagnosing most operational problems.
– Metrics must be capable of showing that there is a problem (e.g. an RPC call volume spike).
– Metrics must also be capable of identifying the source of that problem (e.g. the user issuing the RPC calls).
Page 24 Architecting the Future of Big Data
• 25. © Hortonworks Inc. 2011
HDFS-6982: nntop
• Find activity trends of HDFS operations.
– The HDFS audit log contains a record of each file system operation sent to the NameNode.
2015-11-16 21:00:00,109 INFO FSNamesystem.audit: allowed=true ugi=bob (auth:SIMPLE) ip=/192.168.1.5 cmd=listStatus src=/app-logs/pcd_batch/application_1431545431771/ dst=null perm=null
– However, identifying sources of load from the audit log requires ad-hoc scripting.
• nntop: HDFS operation counts aggregated per operation and per user within time windows.
– TopUserOpCounts: default time windows of 1 minute, 5 minutes, 25 minutes.
– curl 'http://127.0.0.1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState'
Page 25 Architecting the Future of Big Data
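The kind of ad-hoc scripting the slide mentions can be sketched as follows. This is a hypothetical example (the AuditTop class name and regex are assumptions, not part of nntop): it extracts the ugi= and cmd= fields from audit-log lines and counts operations per user and command, which is essentially what nntop automates over sliding windows.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical ad-hoc script: count audit-log operations per (user, command).
public class AuditTop {
    private static final Pattern AUDIT = Pattern.compile("ugi=(\\S+).*?cmd=(\\S+)");

    public static Map<String, Integer> countOps(List<String> auditLines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : auditLines) {
            Matcher m = AUDIT.matcher(line);
            if (m.find()) {
                String key = m.group(1) + " " + m.group(2); // e.g. "bob listStatus"
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "2015-11-16 21:00:00,109 INFO FSNamesystem.audit: allowed=true ugi=bob (auth:SIMPLE) ip=/192.168.1.5 cmd=listStatus src=/app-logs dst=null perm=null",
            "2015-11-16 21:00:01,110 INFO FSNamesystem.audit: allowed=true ugi=bob (auth:SIMPLE) ip=/192.168.1.5 cmd=listStatus src=/app-logs dst=null perm=null",
            "2015-11-16 21:00:02,111 INFO FSNamesystem.audit: allowed=true ugi=alice (auth:SIMPLE) ip=/192.168.1.6 cmd=create src=/tmp/f dst=null perm=alice:hdfs:rw-r--r--");
        countOps(sample).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

In a real cluster this would run against gigabytes of rotated logs; nntop makes the same aggregation available live via JMX instead.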
• 26. © Hortonworks Inc. 2011
nntop Sample Output
"windowLenMs": 60000,
"ops": [
  {
    "opType": "create",
    "topUsers": [
      { "user": "alice@EXAMPLE.COM", "count": 4632 },
      { "user": "bob@EXAMPLE.COM", "count": 1387 }
    ],
    "totalCount": 6019
  }
  ...
Page 26
• 27. © Hortonworks Inc. 2011
Troubleshooting Kerberos
• Kerberos is hard.
– Many moving parts: KDC, DNS, principals, keytabs and Hadoop configuration.
– Management tools like Apache Ambari automate initial provisioning of principals, keytabs and configuration.
– When it doesn't work, finding the root cause is challenging.
Page 27
• 28. © Hortonworks Inc. 2011
HADOOP-12426: kdiag
• Kerberos misconfiguration diagnosis covering:
– DNS
– Hadoop configuration files
– KDC configuration
• kdiag: a command-line tool for diagnosing Kerberos problems.
– Prints environment variables, Java system properties and Hadoop configuration options related to security.
– Attempts a login.
– If a keytab is used, prints principal information from the keytab.
– Prints krb5.conf.
– Validates the kinit executable (used for ticket renewals).
Page 28 Architecting the Future of Big Data
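A rough sketch of the flavor of checks such a tool performs (KerberosEnvCheck is a hypothetical class, not part of KDiag): print the JVM's Kerberos-related system properties and verify that the local hostname resolves, since an unresolvable hostname is a common root cause of Kerberos login failures.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical mini-diagnostic in the spirit of kdiag.
public class KerberosEnvCheck {
    public static String report() {
        StringBuilder sb = new StringBuilder();
        // Standard JVM properties consulted by the Kerberos login modules.
        String[] props = {
            "java.security.krb5.conf",
            "java.security.krb5.realm",
            "java.security.krb5.kdc",
            "sun.security.krb5.debug"
        };
        for (String p : props) {
            sb.append(p).append(" = ").append(System.getProperty(p, "(unset)")).append('\n');
        }
        try {
            InetAddress local = InetAddress.getLocalHost();
            sb.append("local hostname ").append(local.getHostName())
              .append(" resolves to ").append(local.getHostAddress()).append('\n');
        } catch (UnknownHostException e) {
            // This is the failure mode behind the sample kdiag output on the next slide.
            sb.append("DNS misconfiguration: cannot resolve local hostname: ").append(e).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

The real tool goes much further (keytab inspection, login attempt, kinit validation), but even this minimal check surfaces the two most common culprits: a missing krb5.conf and broken hostname resolution.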
• 29. © Hortonworks Inc. 2011
kdiag Sample Output - misconfigured DNS
[hdfs@c6401 ~]$ hadoop org.apache.hadoop.security.KDiag
== Kerberos Diagnostics scan at Mon Jun 27 23:13:40 UTC 2016 ==
16/06/27 23:13:40 ERROR security.KDiag: java.net.UnknownHostException: java.net.UnknownHostException: c6401.ambari.apache.org: c6401.ambari.apache.org: unknown error
at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
at org.apache.hadoop.security.KDiag.execute(KDiag.java:266)
at org.apache.hadoop.security.KDiag.run(KDiag.java:221)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.security.KDiag.exec(KDiag.java:926)
at org.apache.hadoop.security.KDiag.main(KDiag.java:936)
...
Page 29
• 30. © Hortonworks Inc. 2011
Summary
• A variety of recent enhancements have improved the ability of HDFS to serve as the foundational storage layer of the Hadoop ecosystem.
• Optimization
– Performance
– Optimizing Applications
• Stabilization
– Liveness
– Managing Load
• Supportability
– Logging
– Troubleshooting
Page 30 Architecting the Future of Big Data
• 31. © Hortonworks Inc. 2011
Thank you! Q&A
• A few recommended best practices while we address questions…
– Enable HDFS audit logs and periodically monitor audit logs/nntop for unexpected patterns.
– Configure service heap settings correctly.
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html
– Use dedicated disks for NameNode metadata directories/JournalNode directories.
http://hortonworks.com/blog/hdfs-metadata-directories-explained/
– Run the balancer (and soon the disk balancer) periodically.
http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
– Monitor for LDAP group lookup performance issues.
https://community.hortonworks.com/content/kbentry/38591/hadoop-and-ldap-usage-load-patterns-and-tuning.html
– Use SmartSense for proactive analysis of potential issues and recommended fixes.
http://hortonworks.com/products/subscriptions/smartsense/
Page 31 Architecting the Future of Big Data