NameNode Analytics
Who Am I?
Bachelor of Science in Computer Science from UC San Diego (Eleanor Roosevelt College).
I have been fortunate to work alongside Konstantin Shvachko, one of the original architects of the HDFS NameNode from Yahoo!, for several years.
I have spent 6 years working on HDFS internals and related projects at eBay, WANdisco, and now PayPal.
Hadoop open source contributor:
‱ HDFS-3107: Introduce truncate to HDFS.
‱ HDFS-4456: Add concat to HttpFS and WebHDFS.
‱ HADOOP-10641: Introduce coordination / consensus interface to HDFS.
‱ MAPREDUCE-2669: Add StandardDev, Mean, and Mode examples to MapReduce.
‱ Various bug fixes.
I work on NameNode internals and distributed file system design:
   ‱ Giraffa File System: https://github.com/GiraffaFS/giraffa
   ‱ GeoDistributed File System (WANdisco Patent): https://patents.justia.com/patent/20150278244
©2015 PayPal Inc. Confidential and proprietary. 3
Plamen Jeliazkov.
Background
HDFS was created as a means of storing petabytes of data reliably (through replication).
By virtue of being a distributed file system, HDFS is seen as a safe haven for any type of data.
However, HDFS does have its own scaling limitations:
‱ “Limits are around 10,000 clients working on around 200 million files and directories, totaling around 500 million file system objects (inodes and blocks). Typically capping out around 20 PBs, though larger clusters do exist.”
   ‱ Source: Konstantin Shvachko, ;login:, April 2010 (Volume 35, Number 2): https://www.usenix.org/publications/login/april-2010-volume-35-number-2/hdfs-scalability-limits-growth
Therefore, HDFS is best suited to storing large individual files.
‱ The best case is large files with large block sizes, so that the NameNode stores less metadata per byte of raw storage.
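A rough back-of-the-envelope sketch of why large files and large blocks help: it counts namespace objects (one inode per file plus one object per block, mirroring the "inodes and blocks" count in the quote above) for the same volume of data stored as files of different sizes. It assumes uniform file sizes, ignores directories, does not count replicas as separate objects, and assumes a 128 MiB block size (the usual HDFS default). The class and method names are illustrative only.

```java
// Back-of-the-envelope estimate of NameNode namespace objects.
// Assumptions (ours): uniform file size, directories ignored, and replicas
// not counted as separate objects (only inodes and blocks are counted).
final class NamespaceObjectEstimate {
    static final long MiB = 1L << 20;
    static final long GiB = 1L << 30;
    static final long TiB = 1L << 40;

    /** ceil(fileSize / blockSize); an empty file has no blocks. */
    static long blocksPerFile(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    /** One inode per file plus one object per block. */
    static long namespaceObjects(long totalBytes, long fileSize, long blockSize) {
        long files = totalBytes / fileSize;
        return files * (1 + blocksPerFile(fileSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128 * MiB; // assumed block size
        System.out.println("1 TiB as 1 GiB files: " + namespaceObjects(TiB, GiB, blockSize));
        System.out.println("1 TiB as 1 MiB files: " + namespaceObjects(TiB, MiB, blockSize));
    }
}
```

Under these assumptions, 1 TiB stored as 1 GiB files comes to 9,216 objects, while the same 1 TiB stored as 1 MiB files comes to 2,097,152 objects: over 200 times more metadata for the NameNode to hold.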
Because it favors large sequential files, HDFS is also best suited to batch analytics and to applications that benefit from sequential reads and writes.
The Hadoop Distributed File System.
HDFS @ PayPal
Customers tend to see HDFS as a giant black box. Dump and forget.
Customers just want to store their data in the easiest way possible, with no thought for storage optimization or security.
‱ They do not like to build any sort of “clean-up” or TTL mechanism into their applications.
When space issues arise, Hadoop Management lacks context:
‱ What took up that space? (RCA required)
‱ Who took up that space? (RCA required)
‱ What targets can we look at for deleting quickly? (Small files, old files, empty files, specific user, etc.)
Even when we do catch wind of a data issue:
‱ It is difficult to determine which team or person is responsible.
‱ It is difficult to determine which datasets were affected.
‱ The damage is already done (cluster performance degraded, quota hit, application deployed, etc.). It’s already too late.
‱ It is difficult to be proactive, so we end up being reactive instead, and oftentimes we react very late.
My observations of HDFS data management pain points.
Previous Architecture(s)
The Old World
[Diagram: the Active NN and Standby NN produce a Legacy FsImage, which the Offline Image Viewer turns into a Processed Image for Kibana / Elastic Search; stage timings shown: 3 mins, 90 mins, 30 mins]
* This assumes a large enterprise Hadoop environment where the FsImage is larger than 20 GB. For smaller images, this is trivial.
* This architecture usually leads to the generation of daily reports. This diagram is representative of the fastest possible report generation.
HDFS Usage Analytics Today
The Standby NameNode is forced to create a legacy FSImage.
‱ This requires additional work by the Standby NameNode.
‱ This legacy image is created in addition to the regular Protobuf-based FSImage created for the Active NN.
‱ The redundant storage exists solely for the purpose of performing analytics later.
(We end up creating 2 FSImages per checkpoint: double storage cost, double IO cost, no immediate benefit.)
‱ Legacy image retains less metadata than the Protobuf image. (No XAttrs, tokens, storage policies).
The legacy-format FSImage is parsed and uploaded to Kibana or ElasticSearch.
‱ This process typically happens once a day.
‱ It takes approximately 15 to 20 minutes to fully parse a 25 GB FSImage, which is about the current size of a large cluster's FSImage.
We have seen FSImages of over 30 GB when things are bad.
‱ It requires pulling the FSImage off the Standby NameNode. The network cost is not very high, however.
Making this process more frequent will increase the network cost on the Standby; we have seen RPC issues when bandwidth is saturated.
‱ Image dump -> parsing -> processing can take anywhere between 2 and 3 hours, allowing only about 4 to 6 reports per day at best.
Other third-party solutions tend to follow the architecture described on this slide.
My observations on the current “standard”.
Engineering A New Solution
To query in near real time, you need something like a constantly updating NameNode.
‱ Attempting to do so in any distributed manner involves solving distributed atomic rename or coordination (think HBase region transitions).
‱ We cannot rely on parsing the FSImage and EditLogs, as that adds too much processing time.
   ‱ 15-30 minutes to parse a legacy FSImage and 1-2 minutes per large EditLog.
   ‱ Protobuf parsing means loading the entire INode set into memory.
Filtering or querying effectively requires parallel processing.
‱ Assuming we can’t utilize a distributed system effectively, can we work with a single node? Yes, and we can also utilize multiple CPU cores.
‱ The Java 8 Stream API allows simple filters, maps, reductions, and collections over large in-memory data structures, in parallel.
‱ A single NameNode already stores the entire metadata set in memory in such a structure.
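A minimal sketch of the kind of single-node, multi-core scan the Java 8 Stream API makes easy. `FileInfo` is a hypothetical flattened record standing in for an INode; the real NameNode keeps its metadata in its own INode tree and GSet structures, and this is not NNA's code. Records need Java 16 or later, but the Stream calls themselves are Java 8 APIs.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ParallelScanSketch {
    // Hypothetical, flattened stand-in for an HDFS file INode.
    record FileInfo(String path, String user, long size, short replication) {}

    /** Counts zero-length files; parallelStream() spreads the scan over all cores. */
    static long countEmptyFiles(List<FileInfo> files) {
        return files.parallelStream().filter(f -> f.size() == 0).count();
    }

    /** Sums disk space (size x replication) for one user's files, in parallel. */
    static long diskspaceOf(List<FileInfo> files, String user) {
        return files.parallelStream()
                .filter(f -> f.user().equals(user))
                .mapToLong(f -> f.size() * f.replication())
                .sum();
    }

    public static void main(String[] args) {
        // Synthetic namespace of one million files; every 10th file is empty.
        List<FileInfo> files = IntStream.range(0, 1_000_000)
                .mapToObj(i -> new FileInfo("/data/f" + i, "user" + (i % 4),
                        i % 10 == 0 ? 0 : 1024, (short) 3))
                .collect(Collectors.toList());
        System.out.println("empty files: " + countEmptyFiles(files));
        System.out.println("user0 diskspace: " + diskspaceOf(files, "user0"));
    }
}
```

`parallelStream()` splits the work across the common fork-join pool, so the same filter and reduction code uses all available cores without any explicit threading.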
Do we need to build a whole new system? No.
‱ We need to write some custom query engine logic, but we can re-use most HDFS data structures and logic.
‱ We can keep our “NameNode” up to date using the live cluster's Journal Nodes.
‱ We can simplify further by removing the RPC Server. There is no need for DataNodes or clients to connect to our “NameNode”.
Combining old knowledge and new ideas.
Inspiration from Dr. Elephant
Dr. Elephant is a tool from LinkedIn that provides “self-help” suggestions on how to tune various YARN applications so that they perform better and free up queue capacity. NNA was also conceived as a “self-help” tool.
Ideas inspiring other ideas.
NameNode Analytics
It can best be described as:
“A modified, isolated, read-only, Standby NameNode, with no RPC Server, but with a Web Server and custom query engine embedded inside it.”
Architecture
Basic high-level view.
[Diagram: Client; NameNode Analytics (off the cluster; isolated and read-only NN); JournalNodes (on the cluster); NameNode (on the cluster)]
‱ (0*) One-time bootstrap call: fetch the remote FsImage.
‱ (*) The NameNode writes its EditLog to the JournalNodes.
‱ (*) NNA tails the EditLog from the JournalNodes.
‱ (1) The client sends a query to NNA, (2) NNA processes it, and (3) NNA returns the response.
* = conditional or “in the background”
Architecture
Deep dive view into NNA.
‱ REST API (Spark Java Web Server): receives the query and returns the response.
‱ Java 8 Stream API: query processing.
‱ NameNode FSNamesystem: image loading, EditLog tailing / updating, and the in-memory set.
‱ NameNode in-memory metadata set: the INode tree and the GSet, kept current by EditLog-Tailer updates.
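The pattern on this slide, a single tailer thread applying edits in order while queries read the same in-memory set concurrently, can be sketched as follows. Everything here is hypothetical: `EditOp`, the `BlockingQueue` standing in for the JournalNodes, and the path-to-size map standing in for the INode tree and GSet. It only illustrates the concurrency shape, not how FSNamesystem is implemented.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

final class TailingSketch {
    enum Kind { ADD, DELETE, STOP }

    // Hypothetical edit-log entry; real HDFS EditLog ops are far richer.
    record EditOp(Kind kind, String path, long size) {}

    // In-memory "namespace": path -> file size. A concurrent map lets queries
    // read while the tailer writes, without one global lock.
    final ConcurrentMap<String, Long> namespace = new ConcurrentHashMap<>();
    // Stand-in for the JournalNodes: ops arrive here in transaction order.
    final BlockingQueue<EditOp> journal = new LinkedBlockingQueue<>();

    /** Single tailer thread that applies ops in order, like a Standby replaying edits. */
    Thread startTailer() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    EditOp op = journal.take();
                    switch (op.kind()) {
                        case ADD -> namespace.put(op.path(), op.size());
                        case DELETE -> namespace.remove(op.path());
                        case STOP -> { return; }
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "edit-tailer");
        t.start();
        return t;
    }

    /** Query: total bytes of files under a prefix, over a weakly consistent view. */
    long bytesUnder(String prefix) {
        return namespace.entrySet().parallelStream()
                .filter(e -> e.getKey().startsWith(prefix))
                .mapToLong(e -> e.getValue())
                .sum();
    }

    public static void main(String[] args) throws InterruptedException {
        TailingSketch nna = new TailingSketch();
        Thread tailer = nna.startTailer();
        nna.journal.add(new EditOp(Kind.ADD, "/logs/a", 100));
        nna.journal.add(new EditOp(Kind.ADD, "/logs/b", 50));
        nna.journal.add(new EditOp(Kind.ADD, "/tmp/c", 7));
        nna.journal.add(new EditOp(Kind.DELETE, "/logs/b", 0));
        nna.journal.add(new EditOp(Kind.STOP, null, 0));
        tailer.join(); // wait until every op has been applied
        System.out.println("bytes under /logs/: " + nna.bytesUnder("/logs/"));
    }
}
```

The trade-off of reading without a global lock is that a query running during an update may or may not see that update; for analytics that tolerate slightly stale answers, that is usually acceptable.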
NNA @ PayPal
NNA provides the information, and an internal TICK stack keeps the historical data, visualizes it, and takes action.
(The TICK stack is Telegraf, InfluxDB, Chronograf, and Kapacitor.)
How do we utilize this?
Who is creating the most empty files?
Who is creating the most empty directories?
Who are the biggest users of the file system in terms of file count or space usage?
What are the largest directories in terms of file count or space usage?
Who is creating small files? (Greater than 0 bytes but much less than 1 block size).
Who has the most “open permission” files? (chmod 777 abusers).
What is the average file size under a particular directory?
What files are open / being written to right now?
Tracking of quota usage.
Tracking of old files.
Tracking of small files / areas for archival or compression and compaction.
Tracking of each user's last delegation token issue date.
Tracking of file types (extensions).
Per user usage reports and suggestions.
Query against any dimension available in the HDFS INode(s).
(In progress) AUTOMATED HDFS DATA MANAGEMENT.
First detect, then fix.
NNA is your detection tool.
Understanding the NNA API
NNA first asks you to define a set to work with: either the set of all files or the set of all directories.
Depending on which set you pick, different options are available to you.
From there, you build a set of filters to apply to that set, and finally choose the result you want to reduce to: the sum.
‱ Take this example: /filter?set=files&filters=fileSize:eq:0&sum=count
‱ "Starting with the set of all files, get all those that have a file size equal to zero, and count how many there are."
‱ Or this example: /filter?set=files&filters=modTime:olderThanYears:1&sum=diskspaceConsumed
‱ "Starting with the set of all files, get all those with a modification time older than 1 year, and sum up their diskspace usage."
From there we allow even more complex groupings via a /histogram endpoint:
‱ For example: /histogram?set=files&filters=fileSize:eq:0&type=user&sum=count
‱ "Starting with the set of all files, get all those that have a file size equal to zero, group them by user, and count how many there are."
What do queries look like?
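To make the set, filters, and sum model concrete, here is a tiny sketch of how such query strings could be parsed and evaluated with parallel streams. It is not NNA's implementation: the field names `fileSize` and `diskspaceConsumed` and the operators `eq` and `lte` come from the example queries in this deck, while the `replication` field, the `lt`/`gt`/`gte` operators, AND-ing comma-separated filters, and the `FileInfo` record are assumptions for illustration. Time-based filters such as `modTime:olderThanYears` are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;

final class FilterQuerySketch {
    // Hypothetical flattened file record; not NNA's or HDFS's INode class.
    record FileInfo(String user, long fileSize, short replication) {}

    /** Numeric dimensions a filter or sum may reference (illustrative subset). */
    static ToLongFunction<FileInfo> field(String name) {
        switch (name) {
            case "fileSize": return FileInfo::fileSize;
            case "replication": return FileInfo::replication;
            case "diskspaceConsumed": return f -> f.fileSize() * f.replication();
            default: throw new IllegalArgumentException("unknown field: " + name);
        }
    }

    /** Parses one "field:op:value" filter such as "fileSize:eq:0". */
    static Predicate<FileInfo> parseFilter(String spec) {
        String[] parts = spec.split(":");
        if (parts.length != 3) throw new IllegalArgumentException("bad filter: " + spec);
        ToLongFunction<FileInfo> f = field(parts[0]);
        long v = Long.parseLong(parts[2]);
        switch (parts[1]) {
            case "eq":  return x -> f.applyAsLong(x) == v;
            case "lt":  return x -> f.applyAsLong(x) < v;
            case "lte": return x -> f.applyAsLong(x) <= v;
            case "gt":  return x -> f.applyAsLong(x) > v;
            case "gte": return x -> f.applyAsLong(x) >= v;
            default: throw new IllegalArgumentException("unknown op: " + parts[1]);
        }
    }

    /** Comma-separated filters are ANDed (an assumption of this sketch). */
    static Predicate<FileInfo> parseFilters(String filters) {
        return Arrays.stream(filters.split(","))
                .map(FilterQuerySketch::parseFilter)
                .reduce(x -> true, Predicate::and);
    }

    /** Like /filter?set=files&filters=...&sum=...: reduce the filtered set to one number. */
    static long filter(List<FileInfo> files, String filters, String sum) {
        Predicate<FileInfo> p = parseFilters(filters);
        if (sum.equals("count")) {
            return files.parallelStream().filter(p).count();
        }
        return files.parallelStream().filter(p).mapToLong(field(sum)).sum();
    }

    /** Like /histogram?...&type=user&sum=...: the same reduction, grouped by owner. */
    static Map<String, Long> histogramByUser(List<FileInfo> files, String filters, String sum) {
        Predicate<FileInfo> p = parseFilters(filters);
        ToLongFunction<FileInfo> value = sum.equals("count") ? f -> 1L : field(sum);
        return files.parallelStream()
                .filter(p)
                .collect(Collectors.groupingBy(FileInfo::user, TreeMap::new,
                        Collectors.summingLong(value)));
    }

    public static void main(String[] args) {
        List<FileInfo> files = List.of(
                new FileInfo("alice", 0, (short) 3),
                new FileInfo("alice", 0, (short) 3),
                new FileInfo("bob", 0, (short) 3),
                new FileInfo("bob", 4096, (short) 3));
        // Analogous to /filter?set=files&filters=fileSize:eq:0&sum=count
        System.out.println(filter(files, "fileSize:eq:0", "count"));
        // Analogous to /histogram?set=files&filters=fileSize:eq:0&type=user&sum=count
        System.out.println(histogramByUser(files, "fileSize:eq:0", "count"));
    }
}
```

In this design, adding a new dimension or operator only means adding a `case`; the histogram path reuses the same predicate and swaps the final reduction for a `groupingBy`.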
Some Pictures
For example
Graphing: Users by # of empty files they own
/histogram?set=files&filters=fileSize:eq:0&type=user&sum=count
Graphing: Users by # of empty directories they own
/histogram?set=dirs&filters=dirNumChildren:eq:0&type=user&sum=count
Graphing: Users by # of small files
/histogram?set=files&filters=fileSize:lte:1024&type=user&sum=count
Dumping: Files currently being written to
/filter?set=files&filters=isUnderConstruction:eq:true&limit=1000
Histogram Binning: Size of files vs. disk space consumed by files
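One possible way to compute a binned view like this: group files into size bins and sum the disk space (size × replication) each bin consumes. The bin boundaries, labels, and the `FileInfo` record are hypothetical choices for illustration, not NNA's.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SizeBinningSketch {
    // Hypothetical flattened record: file length in bytes and replication factor.
    record FileInfo(long fileSize, short replication) {}

    // Hypothetical inclusive upper bound of each bin; larger files fall into the last bin.
    static final long[] UPPER = {0L, 1L << 20, 128L << 20, 1L << 30};
    static final String[] LABELS = {"0 B", "<= 1 MiB", "<= 128 MiB", "<= 1 GiB", "> 1 GiB"};

    /** Index of the first bin whose upper bound is >= size. */
    static int binOf(long size) {
        for (int i = 0; i < UPPER.length; i++) {
            if (size <= UPPER[i]) {
                return i;
            }
        }
        return UPPER.length;
    }

    /** Disk space consumed (size x replication), summed per size bin, in bin order. */
    static Map<String, Long> diskspaceBySizeBin(List<FileInfo> files) {
        Map<Integer, Long> byBin = files.parallelStream()
                .collect(Collectors.groupingBy(f -> binOf(f.fileSize()),
                        Collectors.summingLong(f -> f.fileSize() * f.replication())));
        Map<String, Long> result = new LinkedHashMap<>();
        for (int i = 0; i < LABELS.length; i++) {
            result.put(LABELS[i], byBin.getOrDefault(i, 0L));
        }
        return result;
    }

    public static void main(String[] args) {
        List<FileInfo> files = List.of(
                new FileInfo(0, (short) 3),
                new FileInfo(4096, (short) 3),
                new FileInfo(64L << 20, (short) 3),
                new FileInfo(2L << 30, (short) 2));
        diskspaceBySizeBin(files).forEach((bin, bytes) -> System.out.println(bin + ": " + bytes));
    }
}
```

Grouping by the replication factor instead of by size bin would give a disk-space-by-replication-factor view with the same shape of code.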
Histogram Binning: Disk space consumed by different replication factors
Histogram Binning: File type extensions
Story Time!
HDFS-11419
Slow addBlock operations on the NameNode due to users writing into WARM StoragePolicy directories.
Difficult to find all the WARM directories; impossible from legacy FsImage alone; very simple on NNA.
Dump all WARM directory paths from the API: /filter?set=dirs&filters=storageType:eq:WARM
NameNode Pushing Scalability Limits
We were pushing the limits of the NameNode and were close to going into full GC: 400+ million files and 800+ million total file system objects.
It was difficult to find datasets to delete, and there was little time.
Find old datasets to delete: /histogram?set=files&filters=accessTime:olderThanYears:2&type=parentDir&sum=count
Small File Prevention
Midway through an initiative to find and clean up small files on HDFS, we found that users were creating small files at the same rate we were compressing and cleaning them.
It was difficult to find which users were creating small files.
Find users by small files: /histogram?set=files&filters=fileSize:lte:1048576,accessTime:hoursAgo:24&type=user&sum=count
When has NNA saved us?
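The queries in these stories are plain HTTP GETs, so they are easy to script. Below is a minimal Java 11+ client sketch: the endpoint names and parameters are taken from the queries in this deck, but the base URL is hypothetical, and the sketch sends no credentials even though NNA supports LDAP authentication, so a real deployment may require logging in first. Values are percent-encoded, which a standard web server decodes transparently. (The 1048576 in the small-file query is 1 MiB.)

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class NnaQueryClient {
    /** Ordered key/value pairs, e.g. params("set", "files", "sum", "count"). */
    static Map<String, String> params(String... kv) {
        if (kv.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    /** Builds baseUrl/endpoint?k=v&..., percent-encoding each value. */
    static URI buildQuery(String baseUrl, String endpoint, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(baseUrl + "/" + endpoint + "?" + query);
    }

    /** Sends a plain GET (no authentication) and returns the response body. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default address; pass a real NNA base URL as the first argument.
        String base = args.length > 0 ? args[0] : "http://localhost:8080";
        URI uri = buildQuery(base, "histogram", params(
                "set", "files",
                "filters", "fileSize:lte:1048576,accessTime:hoursAgo:24",
                "type", "user",
                "sum", "count"));
        System.out.println(uri);
        if (args.length > 0) {
            System.out.println(fetch(uri)); // only contacts a server when one was given
        }
    }
}
```

Run with no arguments, it only prints the URL it would request; pass a base URL to actually send the request.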
Successes
Near real-time analysis.
’Nough said.
For anyone wondering: the magic is in skipping the FSNamesystem lock and introducing multi-core processing.
Easy to install and maintain.
NNA’s Gradle build can construct RPM packages.
The difficulty is about equal to that of bringing up a new, additional Standby NameNode.
Scalable?
While NNA is not a distributed system, it is a replicated read-only copy.
If you require more analytical throughput you could spin up multiple NNA instances.
The Journal Nodes can handle many readers.
Where has NNA won?
Flaws
It is still a NameNode.
NNA is subject to all the faults and flaws of a regular HDFS NameNode.
If you have too many files and blocks, your NNA instance will operate slower as a result.
Interactive queries that don’t reduce the working set are not great for NNA.
It is not a distributed system.
While NNA can serve cached reports very frequently, it cannot handle many interactive queries at the same time.
Queries are best used by admins while reports are best used by end users.
It is “one of those” single-person projects.
While I had assistance with coding, NNA was mostly a one-person show.
I have been fixing bugs and adding features for nearly a year and a half now.
There is plenty of work still to do and things to improve.
NNA is not Perfect.
Future Work
Where can NNA go from here?
HDFS-6382: TTL in HDFS
The discussion there covers having TTL live outside the NameNode, out of a desire not to introduce TTL management on the active NameNode because of its additional thread resource requirements. NNA could be extended to provide a routine TTL service on top of it.
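A minimal sketch of what a read-only TTL pass over an in-memory view of the namespace could look like: given hypothetical per-directory TTLs, it lists files whose modification time is older than their directory's TTL. Since NNA is read-only, the sketch only reports candidates; acting on them would have to happen elsewhere, for example through a regular HDFS client. The `FileInfo` record, the TTL map, and the one-shot scheduling are assumptions for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

final class TtlSketch {
    // Hypothetical flattened record: path plus modification time.
    record FileInfo(String path, Instant modTime) {}

    /** Paths under a TTL'd prefix whose modification time is older than that prefix's TTL. */
    static List<String> expired(List<FileInfo> files, Map<String, Duration> ttlByPrefix, Instant now) {
        return files.parallelStream()
                .filter(f -> ttlByPrefix.entrySet().stream().anyMatch(e ->
                        f.path().startsWith(e.getKey())
                                && f.modTime().isBefore(now.minus(e.getValue()))))
                .map(FileInfo::path)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws InterruptedException {
        Instant now = Instant.now();
        List<FileInfo> files = List.of(
                new FileInfo("/tmp/old.log", now.minus(Duration.ofDays(10))),
                new FileInfo("/tmp/new.log", now.minus(Duration.ofHours(1))),
                new FileInfo("/data/keep.csv", now.minus(Duration.ofDays(400))));
        Map<String, Duration> ttl = Map.of("/tmp/", Duration.ofDays(7));

        // A routine service would use scheduleAtFixedRate; here we run one pass and exit.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.schedule(() -> System.out.println("expired: " + expired(files, ttl, Instant.now())),
                0, TimeUnit.SECONDS);
        scheduler.shutdown();
        scheduler.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Keeping the scan separate from any deletion keeps the TTL logic off the active NameNode, which is the concern raised in the HDFS-6382 discussion.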
HDFS-13150: Faster Tailing of Edits from Journal Nodes
Part of the work to let Standby NameNodes serve reads is reducing the latency between when an EditLog transaction is applied on the Active and when it is applied on the Standby. Reducing this latency also brings NNA queries closer to real time.
HDFS Cluster Management Integration
NNA is simple enough to install that it should be easy to create an Ambari package, Cloudera Parcel, or other integration package for your management console of choice.
Web & Security
NNA currently supports only LDAP, and it uses JSON Web Tokens to maintain sessions. Would any security experts like to lend a hand?
Support for Kerberos authentication would be great!
Demo
Example Local Cluster from Code
END
(Q & A?)
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5T.D. Shashikala
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edgePaco Orozco
 
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdfONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdfKamal Acharya
 

KĂŒrzlich hochgeladen (20)

"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.
 
Electrical shop management system project report.pdf
Electrical shop management system project report.pdfElectrical shop management system project report.pdf
Electrical shop management system project report.pdf
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker project
 
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
 
Low rpm Generator for efficient energy harnessing from a two stage wind turbine
Low rpm Generator for efficient energy harnessing from a two stage wind turbineLow rpm Generator for efficient energy harnessing from a two stage wind turbine
Low rpm Generator for efficient energy harnessing from a two stage wind turbine
 
Teachers record management system project report..pdf
Teachers record management system project report..pdfTeachers record management system project report..pdf
Teachers record management system project report..pdf
 
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationKIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
 
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
 
Quiz application system project report..pdf
Quiz application system project report..pdfQuiz application system project report..pdf
Quiz application system project report..pdf
 
Supermarket billing system project report..pdf
Supermarket billing system project report..pdfSupermarket billing system project report..pdf
Supermarket billing system project report..pdf
 
E-Commerce Shopping using MERN Stack where different modules are present
E-Commerce Shopping using MERN Stack where different modules are presentE-Commerce Shopping using MERN Stack where different modules are present
E-Commerce Shopping using MERN Stack where different modules are present
 
Paint shop management system project report.pdf
Paint shop management system project report.pdfPaint shop management system project report.pdf
Paint shop management system project report.pdf
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
 
Object Oriented Programming OOP Lab Manual.docx
Object Oriented Programming OOP Lab Manual.docxObject Oriented Programming OOP Lab Manual.docx
Object Oriented Programming OOP Lab Manual.docx
 
How to Design and spec harmonic filter.pdf
How to Design and spec harmonic filter.pdfHow to Design and spec harmonic filter.pdf
How to Design and spec harmonic filter.pdf
 
ChatGPT Prompt Engineering for project managers.pdf
ChatGPT Prompt Engineering for project managers.pdfChatGPT Prompt Engineering for project managers.pdf
ChatGPT Prompt Engineering for project managers.pdf
 
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdfDR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
 
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge
 
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdfONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
 

NameNode Analytics - Querying HDFS Namespace in Real Time

  • 3. Who Am I? (Plamen Jeliazkov)
    Bachelor of Science in Computer Science from UC San Diego (Eleanor Roosevelt College).
    I have been fortunate to work alongside Konstantin Shvachko, one of the original architects of the HDFS NameNode from Yahoo!, for several years.
    I have spent 6 years working on HDFS internals and related projects at eBay, WANdisco, and now PayPal.
    Hadoop open source contributor:
      • HDFS-3107: Introduce truncate to HDFS.
      • HDFS-4456: Add concat to HttpFS and WebHDFS.
      • HADOOP-10641: Introduce coordination / consensus interface to HDFS.
      • MAPREDUCE-2669: Add StandardDev, Mean, and Mode examples to MapReduce.
      • Various bug fixes.
    Work on NameNode internals and distributed file system design:
      - Giraffa File System: https://github.com/GiraffaFS/giraffa
      - GeoDistributed File System (WANdisco Patent): https://patents.justia.com/patent/20150278244
  • 4. Background: The Hadoop Distributed File System.
    HDFS was created as a means of storing petabytes of data securely (through replication).
    By virtue of being a distributed file system, HDFS is seen as a safe haven for any type of data.
    However, HDFS does have its own scaling limitations:
      • “Limits are around 10,000 clients working on around 200 million files and directories, totaling around 500 million file system objects (inodes and blocks). Typically capping out around 20 PBs, though larger clusters do exist.” (Konstantin Shvachko, https://www.usenix.org/publications/login/april-2010-volume-35-number-2/hdfs-scalability-limits-growth)
    Therefore, HDFS is best used as a system for storing large single files of data.
      • The best case is large files with large block sizes, so that the NameNode stores less metadata per unit of raw storage.
    Because it favors large sequential files, HDFS is also best used for batch analytics or by applications that benefit from sequential reads and writes.
  • 5. HDFS @ PayPal: My observations of HDFS data management pain points.
    Customers tend to see HDFS as a giant black box: dump and forget.
    Customers just want to store their data in the easiest manner, with no storage optimization or security.
      • They do not like to build any sort of “clean-up” or TTL mechanism into their applications.
    When space issues arise, Hadoop Management lacks context:
      • What took up that space? (RCA required)
      • Who took up that space? (RCA required)
      • What targets can we look at for deleting quickly? (Small files, old files, empty files, a specific user, etc.)
    Even when we catch wind of a data issue:
      • It is difficult to determine which team or person is responsible.
      • It is difficult to determine which datasets were affected.
      • The damage is already done (cluster performance degraded, quota hit, application deployed, etc.). It's already too late.
      • It is difficult to be proactive, so we end up being reactive instead, often very late.
  • 6. Previous Architecture(s): The Old World.
    [Diagram: Active NN, Standby NN, FsImage, Legacy FsImage, Offline Image Viewer, Processed Image, Kibana / Elasticsearch; stage durations labeled 3 mins, 90 mins, and 30 mins.]
    * This assumes a large enterprise Hadoop environment where the FsImage is larger than 20 GB. For smaller images, this is trivial.
    * This architecture usually leads to the generation of daily reports. The diagram represents the fastest possible report generation.
  • 7. HDFS Usage Analytics Today: My observations on the current “standard”.
    The Standby NameNode is forced to create a legacy FSImage.
      • This requires additional work by the Standby NameNode.
      • This legacy image is created in addition to the regular Protobuf FSImage created for the Active NN.
      • The storage redundancy exists solely for the purpose of performing analytics later. (We end up creating 2 FSImages per checkpoint: double storage cost, double IO cost, no immediate benefit.)
      • The legacy image retains less metadata than the Protobuf image (no XAttrs, tokens, or storage policies).
    The legacy-format FSImage is parsed and uploaded to Kibana or Elasticsearch.
      • This process typically happens once a day.
      • It takes approximately 15 to 20 minutes to fully parse a 25 GB FSImage, which is about the current size of a large cluster's FSImage. We have seen FSImages of over 30 GB when things are bad.
      • The FSImage must be pulled off the Standby NameNode. The network cost is not very high; however, making this process more frequent increases network load on the Standby, and we have seen RPC issues when bandwidth is saturated.
      • Image dump -> Parsing -> Processing can take anywhere between 2 and 3 hours, allowing only about 4-6 reports per day at best.
    Other third-party solutions tend to follow the architecture described on this slide.
  • 8. Engineering A New Solution: Combining old knowledge and new ideas.
    In order to query in near real time, you need something like a constantly updating NameNode.
      • Attempting to do so in any distributed manner involves solving the problem of distributed atomic renames or coordination. (Think HBase region transitions.)
      • We cannot rely on parsing the FSImage and EditLogs, as that adds too much processing time.
        - 15-30 minutes to parse a legacy FSImage and 1-2 minutes per large EditLog.
        - Protobuf parsing means loading the entire INode set into memory. Filtering or querying it effectively requires parallel processing.
      • Assuming we can't utilize a distributed system effectively, can we work with a single node? Yes. We can also utilize multiple CPU cores.
      • The Java 8 Stream API allows simple filters, maps, reduces, and collections on large, parallelized, in-memory data structures.
      • A single NameNode already stores the entire metadata set in memory in such a structure.
    Do we need to build a whole new system? No.
      • We need to write some custom query engine logic, but can re-use most HDFS data structures and logic.
      • We can keep our “NameNode” up to date using the live cluster's JournalNodes.
      • We can simplify further by removing the RPC Server. There is no need for DataNodes or clients to connect to our “NameNode”.
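To make the Stream API point concrete, here is a minimal sketch (Java 16+, for records) of filters and reductions running in parallel over an in-memory collection. `ParallelScan` and its `FileRecord` type are hypothetical stand-ins for the NameNode's INode structures, and "diskspace" is approximated as length times replication, ignoring erasure coding.

```java
import java.util.List;

public class ParallelScan {
    // Hypothetical stand-in for the per-file metadata held in an INode.
    record FileRecord(String path, String user, long length, short replication) {}

    // Filter, then reduce to a count; parallelStream() spreads the work across cores.
    static long countEmptyFiles(List<FileRecord> files) {
        return files.parallelStream()
                .filter(f -> f.length() == 0)
                .count();
    }

    // Approximate raw bytes used by one user as length * replication per file.
    static long diskspaceConsumedBy(List<FileRecord> files, String user) {
        return files.parallelStream()
                .filter(f -> f.user().equals(user))
                .mapToLong(f -> f.length() * f.replication())
                .sum();
    }

    public static void main(String[] args) {
        List<FileRecord> files = List.of(
                new FileRecord("/a/empty", "alice", 0, (short) 3),
                new FileRecord("/a/data", "alice", 1024, (short) 3),
                new FileRecord("/b/log", "bob", 10, (short) 2));
        System.out.println(countEmptyFiles(files));              // 1
        System.out.println(diskspaceConsumedBy(files, "alice")); // 3072
    }
}
```

Both queries are stateless filter-and-reduce pipelines over an unmodified list, so in this sketch `parallelStream()` can split the work across cores without any locking.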
  • 9. Inspiration from Dr. Elephant: Ideas inspiring other ideas.
    Dr. Elephant is a tool from LinkedIn that provides “self-help” suggestions on how to tune various YARN applications so that they perform better and free up capacity queue space.
    NNA was also conceived as a “self-help” tool.
  • 10. Inspiration from Dr. Elephant: Ideas inspiring other ideas.
  • 11. NameNode Analytics. It can best be described as:
    “A modified, isolated, read-only, Standby NameNode, with no RPC Server, but with a Web Server and custom query engine embedded inside it.”
  • 12. Architecture: Basic high-level view.
    [Diagram: Client; NameNode Analytics (off the cluster; isolated and read-only NN); JournalNodes (on the cluster); NameNode (on the cluster).
    (0*) One-time bootstrap call (fetch remote FsImage); (1) Query from the Client to NNA; (2) Processing; (3) Response to the Client; (*) EditLog tailing from the JournalNodes; (*) the NameNode writes its EditLog to the JournalNodes.
    * = conditional or “in the background”.]
  • 13. Architecture: Deep dive view into NNA.
    [Diagram: NameNode Analytics as layers: REST API (Spark Java web server) -> Java 8 Stream API (query processing) -> NameNode FSNamesystem (image loading; EditLog tailing / updating; in-memory set) -> NameNode in-memory metadata set (INode tree, GSet). Queries enter and responses leave through the REST API; the EditLog tailer updates the in-memory metadata set.]
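The data flow in this diagram (an EditLog tailer updating the in-memory namespace while queries read it) can be sketched in miniature, assuming Java 16+. Everything here is hypothetical: `MiniNamespace`, `EditOp`, and its two op types stand in for HDFS's far richer `FSEditLogOp` classes, and a `ConcurrentHashMap` stands in for the INode tree / GSet. The sketch applies transactions strictly in transaction-ID order, skips ones it has already applied, and rejects gaps.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MiniNamespace {
    // Hypothetical edit operation types; real HDFS defines many more.
    enum OpType { ADD_FILE, DELETE }

    record EditOp(long txId, OpType type, String path, long length) {}

    // Path -> file length. ConcurrentHashMap lets queries iterate while edits are applied.
    private final Map<String, Long> files = new ConcurrentHashMap<>();
    private long lastAppliedTxId = 0;

    // Called by the (single) tailer with each batch of edits read from the journal.
    synchronized void applyEdits(List<EditOp> batch) {
        for (EditOp op : batch) {
            if (op.txId() <= lastAppliedTxId) {
                continue; // already applied, e.g. an overlapping re-read
            }
            if (op.txId() != lastAppliedTxId + 1) {
                throw new IllegalStateException("missing transactions before txId " + op.txId());
            }
            switch (op.type()) {
                case ADD_FILE -> files.put(op.path(), op.length());
                case DELETE -> files.remove(op.path());
            }
            lastAppliedTxId = op.txId();
        }
    }

    // Query path: takes no lock, so it may observe a slightly stale view of the map.
    long countEmptyFiles() {
        return files.values().parallelStream().filter(len -> len == 0).count();
    }

    synchronized long lastAppliedTxId() {
        return lastAppliedTxId;
    }

    public static void main(String[] args) {
        MiniNamespace ns = new MiniNamespace();
        ns.applyEdits(List.of(
                new EditOp(1, OpType.ADD_FILE, "/a/empty", 0),
                new EditOp(2, OpType.ADD_FILE, "/a/data", 42)));
        // Overlapping batch: txId 2 is skipped, txId 3 is applied.
        ns.applyEdits(List.of(
                new EditOp(2, OpType.ADD_FILE, "/a/data", 42),
                new EditOp(3, OpType.DELETE, "/a/empty", 0)));
        System.out.println(ns.countEmptyFiles() + " " + ns.lastAppliedTxId()); // 0 3
    }
}
```

Queries here iterate the map without taking the tailer's lock, which loosely echoes the deck's point about skipping the FSNamesystem lock; NNA's actual implementation is not shown here.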
  • 14. NNA @ PayPal: How do we utilize this?
    NNA provides the information, and an internal TICK stack keeps the historical data, visualizes it, and takes action. (The TICK stack is Telegraf, InfluxDB, Chronograf, and Kapacitor.)
  • 15. NNA @ PayPal: How do we utilize this?
  • 16. NNA @ PayPal: How do we utilize this?
  • 17. NNA @ PayPal: How do we utilize this?
  • 18. NNA @ PayPal: How do we utilize this?
      • Who is creating the most empty files?
      • Who is creating the most empty directories?
      • Who are the biggest users of the file system in terms of file count or space usage?
      • What are the largest directories in terms of file count or space usage?
      • Who is creating small files? (Greater than 0 bytes but much less than 1 block size.)
      • Who has the most “open permission” files? (chmod 777 abusers.)
      • What is the average file size under a particular directory?
      • Which files are open / being written to right now?
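Several of these questions reduce to one stream pipeline each. A minimal sketch (Java 16+), assuming a hypothetical `FileRecord` with a path, an owner, a length, and POSIX-style permission bits; it is an illustration, not NNA's actual query code.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.OptionalDouble;
import java.util.stream.Collectors;

public class NamespaceQuestions {
    // Hypothetical file record; 'permission' holds POSIX-style mode bits such as 0644.
    record FileRecord(String path, String user, long length, int permission) {}

    // "What is the average file size under a particular directory?"
    static OptionalDouble averageSizeUnder(List<FileRecord> files, String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        return files.parallelStream()
                .filter(f -> f.path().startsWith(prefix))
                .mapToLong(FileRecord::length)
                .average();
    }

    // "Open permission" files: user, group, and other all have rwx (chmod 777).
    static long countOpenPermissionFiles(List<FileRecord> files) {
        return files.parallelStream()
                .filter(f -> (f.permission() & 0777) == 0777)
                .count();
    }

    // "Who is creating the most empty files?" (ties are broken arbitrarily).
    static Optional<String> topEmptyFileOwner(List<FileRecord> files) {
        Map<String, Long> counts = files.parallelStream()
                .filter(f -> f.length() == 0)
                .collect(Collectors.groupingBy(FileRecord::user, Collectors.counting()));
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        List<FileRecord> files = List.of(
                new FileRecord("/data/a", "alice", 90, 0644),
                new FileRecord("/data/b", "bob", 0, 0777),
                new FileRecord("/data/c", "bob", 0, 0755),
                new FileRecord("/tmp/d", "carol", 0, 0777));
        System.out.println(averageSizeUnder(files, "/data").getAsDouble()); // 30.0
        System.out.println(countOpenPermissionFiles(files));                // 2
        System.out.println(topEmptyFileOwner(files).orElse("(none)"));      // bob
    }
}
```

The directory prefix is normalized to end in "/" so that, for example, "/data" does not also match "/database".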
  • 19. NNA @ PayPal: How do we utilize this?
      • Tracking of quota usage.
      • Tracking of old files.
      • Tracking of small files / areas for archival or compression and compaction.
      • Tracking of the date each user was last issued a delegation token.
      • Tracking of file types (extensions).
      • Per-user usage reports and suggestions.
      • Querying against any dimension available in the HDFS INode(s).
      • (In progress) AUTOMATED HDFS DATA MANAGEMENT.
  • 20. First detect, then fix. NNA is your detection tool.
  • 21. Understanding NNA API: What do queries look like?
    NNA first asks you to define a set to work with: either the set of all files or the set of all directories. Depending on which set you pick, different options are available. From there, you build a set of filters to apply to that set and, finally, choose the result you want to reduce to (the “sum”).
      • Take this example: /filter?set=files&filters=fileSize:eq:0&sum=count
        "Starting with the set of all files, get all those that have a file size equal to zero, and count how many there are."
      • Or this example: /filter?set=files&filters=modTime:olderThanYears:1&sum=diskspaceConsumed
        "Starting with the set of all files, get all those with a modification time older than 1 year, and sum up their diskspace usage."
    From there, we allow even more complex groupings via a /histogram endpoint:
      • For example: /histogram?set=files&filters=fileSize:eq:0&type=user&sum=count
        "Starting with the set of all files, get all those that have a file size equal to zero, group them by user, and count how many there are."
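The query grammar on this slide (comma-separated `field:operation:value` filters, then a `sum` reduction) can be sketched as a tiny parser that builds `Predicate`s. This is a hypothetical reimplementation, not NNA's code: the supported fields and operations are a small subset, commas are assumed to mean AND, and a year is approximated as 365 days. Requires Java 16+ (records, switch expressions).

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.ToLongFunction;

public class FilterQuery {
    // Hypothetical file record; times are epoch milliseconds.
    record FileRecord(long fileSize, short replication, long modTime, long accessTime) {}

    // Turn one "field:operation:value" term into a predicate (a small, assumed subset).
    static Predicate<FileRecord> parseTerm(String term, long nowMillis) {
        String[] parts = term.split(":", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("bad filter term: " + term);
        }
        ToLongFunction<FileRecord> field = switch (parts[0]) {
            case "fileSize" -> FileRecord::fileSize;
            case "modTime" -> FileRecord::modTime;
            case "accessTime" -> FileRecord::accessTime;
            default -> throw new IllegalArgumentException("unknown field: " + parts[0]);
        };
        long value = Long.parseLong(parts[2]);
        return switch (parts[1]) {
            case "eq" -> f -> field.applyAsLong(f) == value;
            case "lt" -> f -> field.applyAsLong(f) < value;
            case "lte" -> f -> field.applyAsLong(f) <= value;
            case "gt" -> f -> field.applyAsLong(f) > value;
            case "gte" -> f -> field.applyAsLong(f) >= value;
            // A year is approximated as 365 days.
            case "olderThanYears" -> f -> field.applyAsLong(f) < nowMillis - TimeUnit.DAYS.toMillis(365 * value);
            default -> throw new IllegalArgumentException("unknown operation: " + parts[1]);
        };
    }

    // Comma-separated terms are assumed to be ANDed together.
    static Predicate<FileRecord> parseFilters(String filters, long nowMillis) {
        return Arrays.stream(filters.split(","))
                .map(term -> parseTerm(term, nowMillis))
                .reduce(f -> true, Predicate::and);
    }

    // Evaluate the filters, then reduce with "count" or "diskspaceConsumed".
    static long run(List<FileRecord> files, String filters, String sum, long nowMillis) {
        Predicate<FileRecord> predicate = parseFilters(filters, nowMillis);
        ToLongFunction<FileRecord> measure = switch (sum) {
            case "count" -> f -> 1L;
            case "diskspaceConsumed" -> f -> f.fileSize() * f.replication();
            default -> throw new IllegalArgumentException("unknown sum: " + sum);
        };
        return files.parallelStream().filter(predicate).mapToLong(measure).sum();
    }

    public static void main(String[] args) {
        long now = TimeUnit.DAYS.toMillis(20_000); // fixed clock for reproducible output
        long twoYearsAgo = now - TimeUnit.DAYS.toMillis(730);
        List<FileRecord> files = List.of(
                new FileRecord(0, (short) 3, now, now),
                new FileRecord(0, (short) 3, twoYearsAgo, twoYearsAgo),
                new FileRecord(100, (short) 3, twoYearsAgo, now));
        // filters=fileSize:eq:0&sum=count
        System.out.println(run(files, "fileSize:eq:0", "count", now)); // 2
        // filters=modTime:olderThanYears:1&sum=diskspaceConsumed
        System.out.println(run(files, "modTime:olderThanYears:1", "diskspaceConsumed", now)); // 300
    }
}
```

Parsing happens once, before the scan, so a malformed filter fails fast and the per-file work is just predicate evaluation.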
  • 22. Some Pictures (for example)
    Graphing: Users by # of empty files they own
    /histogram?set=files&filters=fileSize:eq:0&type=user&sum=count
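A `/histogram` query like this one is a filter followed by a group-by. A minimal sketch (Java 16+) over a hypothetical `FileRecord`; swapping the predicate for `f -> f.fileSize() <= 1024` gives the small-files-by-user histogram shape as well.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class UserHistogram {
    // Hypothetical file record carrying only what this query needs.
    record FileRecord(String path, String user, long fileSize) {}

    // Filter the set, group by a key (the "type"), and count each group (sum=count).
    static Map<String, Long> histogram(List<FileRecord> files,
                                       Predicate<FileRecord> filter,
                                       Function<FileRecord, String> groupBy) {
        return files.parallelStream()
                .filter(filter)
                .collect(Collectors.groupingBy(groupBy, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<FileRecord> files = List.of(
                new FileRecord("/u/alice/a", "alice", 0),
                new FileRecord("/u/alice/b", "alice", 0),
                new FileRecord("/u/bob/c", "bob", 0),
                new FileRecord("/u/bob/d", "bob", 4096));
        // Roughly: /histogram?set=files&filters=fileSize:eq:0&type=user&sum=count
        System.out.println(histogram(files, f -> f.fileSize() == 0, FileRecord::user)); // {alice=2, bob=1}
    }
}
```

A `TreeMap` keeps the buckets sorted by key, which makes the output stable for graphing.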
  • 23. Some Pictures (for example)
    Graphing: Users by # of empty directories they own
    /histogram?set=dirs&filters=dirNumChildren:eq:0&type=user&sum=count
  • 24. Some Pictures (for example)
    Graphing: Users by # of small files
    /histogram?set=files&filters=fileSize:lte:1024&type=user&sum=count
  • 25. Some Pictures (for example)
    Dumping: Files currently being written to
    /filter?set=files&filters=isUnderConstruction:eq:true&limit=1000
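A `/filter` dump with a `limit` maps naturally onto `filter`, `map`, and `limit` in a stream. A hypothetical sketch (Java 16+) that assumes each record carries an `underConstruction` flag:

```java
import java.util.List;
import java.util.stream.Collectors;

public class UnderConstructionDump {
    // Hypothetical record; 'underConstruction' marks a file still open for write.
    record FileRecord(String path, boolean underConstruction) {}

    // Roughly: /filter?set=files&filters=isUnderConstruction:eq:true&limit=1000
    static List<String> openFiles(List<FileRecord> files, int limit) {
        return files.stream()
                .filter(FileRecord::underConstruction)
                .map(FileRecord::path)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FileRecord> files = List.of(
                new FileRecord("/logs/app.log", true),
                new FileRecord("/data/done.parquet", false),
                new FileRecord("/tmp/upload.tmp", true));
        System.out.println(openFiles(files, 1000)); // [/logs/app.log, /tmp/upload.tmp]
    }
}
```

The sketch deliberately uses a sequential stream: the `Stream.limit` documentation notes that `limit` can be quite expensive on ordered parallel pipelines.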
  • 26. Some Pictures (for example)
    Histogram Binning: Size of files vs. disk space consumed by files
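Histogram binning like this can be sketched by mapping each file size to a bin and summing consumed bytes per bin. The bin edges, the labels, and the length-times-replication approximation of consumed space are assumptions for illustration (Java 16+).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SizeBinning {
    // Hypothetical record: logical length in bytes and replication factor.
    record FileRecord(long fileSize, short replication) {}

    // Assumed bin edges (inclusive upper bounds); anything larger lands in the last bin.
    static final long[] UPPER_BOUNDS = {0L, 1L << 10, 1L << 20, 128L << 20, 1L << 30};
    static final String[] LABELS = {"0 B", "<= 1 KB", "<= 1 MB", "<= 128 MB", "<= 1 GB", "> 1 GB"};

    static int binOf(long size) {
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (size <= UPPER_BOUNDS[i]) {
                return i;
            }
        }
        return UPPER_BOUNDS.length; // overflow bin
    }

    // Bin index -> bytes consumed on disk, approximated as length * replication.
    static Map<Integer, Long> diskspaceBySizeBin(List<FileRecord> files) {
        return files.parallelStream()
                .collect(Collectors.groupingBy(
                        (FileRecord f) -> binOf(f.fileSize()),
                        TreeMap::new,
                        Collectors.summingLong((FileRecord f) -> f.fileSize() * f.replication())));
    }

    public static void main(String[] args) {
        List<FileRecord> files = List.of(
                new FileRecord(0, (short) 3),
                new FileRecord(512, (short) 3),
                new FileRecord(200L << 20, (short) 3),
                new FileRecord(2L << 30, (short) 2));
        diskspaceBySizeBin(files).forEach((bin, bytes) ->
                System.out.println(LABELS[bin] + ": " + bytes + " bytes"));
    }
}
```

Grouping by replication factor instead of size bin (as in the next picture) only changes the classifier function.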
  • 27. Some Pictures (for example)
    Histogram Binning: Disk space consumed by different replication factors
  • 28. Some Pictures (for example)
    Histogram Binning: File type extensions
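A file-type histogram needs a rule for what counts as an extension. The rule below (text after the last dot of the final path component, lower-cased, with dot-less and dot-prefixed names grouped as "(none)") is an assumption for illustration, not necessarily what NNA does (Java 11+).

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ExtensionHistogram {
    // Assumed rule: text after the last '.' of the final path component, lower-cased.
    // Names without a dot, with only a leading dot, or ending in a dot count as "(none)".
    static String extensionOf(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        return (dot > 0 && dot < name.length() - 1)
                ? name.substring(dot + 1).toLowerCase(Locale.ROOT)
                : "(none)";
    }

    static Map<String, Long> countByExtension(List<String> paths) {
        return paths.parallelStream()
                .collect(Collectors.groupingBy(ExtensionHistogram::extensionOf, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
                "/logs/a.log", "/logs/b.LOG", "/data/part-00000.gz", "/data/_SUCCESS", "/home/u/.bashrc");
        System.out.println(countByExtension(paths)); // {(none)=2, gz=1, log=2}
    }
}
```

Note that under this rule "/a/b.tar.gz" counts as "gz"; compound extensions would need a different rule.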
  • 29. Story Time! When has NNA saved us?
    HDFS-11419: Slow addBlock operation on the NameNode due to users writing into WARM StoragePolicy directories.
      • It was difficult to find all the WARM directories: impossible from the legacy FsImage alone, very simple on NNA.
      • Dump all WARM directory paths from the API: /filter?set=dirs&filters=storageType:eq:WARM
    NameNode Pushing Scalability Limits
      • We were pushing the limits of the NameNode and were close to going into full GC: 400+ million files and 800+ million total file system objects.
      • It was difficult to find datasets to delete, and there was little time.
      • Find old datasets to delete: /histogram?set=files&filters=accessTime:olderThanYears:2&type=parentDir&sum=count
    Small File Prevention
      • Midway through an initiative to find and clean up small files in HDFS, we found users were creating small files as fast as we were compressing and cleaning them.
      • It was difficult to find which users were creating small files.
      • Find users by small files: /histogram?set=files&filters=fileSize:lte:1048576,accessTime:hoursAgo:24&type=user&sum=count
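The two cleanup queries from this slide can be sketched as stream pipelines (Java 16+). Assumptions: a hypothetical `FileRecord`, a year approximated as 365 days, and `accessTime:hoursAgo:24` read as "accessed within the last 24 hours"; the slide does not spell out that operator's exact semantics.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

public class CleanupTargets {
    // Hypothetical record; accessTime is epoch milliseconds.
    record FileRecord(String path, String user, long fileSize, long accessTime) {}

    static String parentDir(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    // Roughly: filters=accessTime:olderThanYears:2&type=parentDir&sum=count
    static Map<String, Long> staleFilesByParentDir(List<FileRecord> files, long nowMillis) {
        long cutoff = nowMillis - TimeUnit.DAYS.toMillis(2 * 365); // 365-day years assumed
        return files.parallelStream()
                .filter(f -> f.accessTime() < cutoff)
                .collect(Collectors.groupingBy((FileRecord f) -> parentDir(f.path()),
                        TreeMap::new, Collectors.counting()));
    }

    // Roughly: filters=fileSize:lte:1048576,accessTime:hoursAgo:24&type=user&sum=count,
    // reading "hoursAgo:24" as "accessed within the last 24 hours" (an assumption).
    static Map<String, Long> recentSmallFilesByUser(List<FileRecord> files, long nowMillis) {
        long since = nowMillis - TimeUnit.HOURS.toMillis(24);
        return files.parallelStream()
                .filter(f -> f.fileSize() <= 1_048_576 && f.accessTime() >= since)
                .collect(Collectors.groupingBy(FileRecord::user, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        long now = TimeUnit.DAYS.toMillis(20_000); // fixed clock for reproducible output
        long threeYearsAgo = now - TimeUnit.DAYS.toMillis(3 * 365);
        List<FileRecord> files = List.of(
                new FileRecord("/old/set1/part-0", "alice", 1L << 30, threeYearsAgo),
                new FileRecord("/old/set1/part-1", "alice", 1L << 30, threeYearsAgo),
                new FileRecord("/stream/out/f1", "bob", 4096, now - 1000),
                new FileRecord("/stream/out/f2", "bob", 8192, now - 2000));
        System.out.println(staleFilesByParentDir(files, now));  // {/old/set1=2}
        System.out.println(recentSmallFilesByUser(files, now)); // {bob=2}
    }
}
```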
  • 30. Successes: Where has NNA won?
    Near real-time analysis. ’Nough said.
      • For anyone wondering: the magic is in skipping the FSNamesystem lock and introducing multi-core processing.
    Easy to install and maintain.
      • NNA's Gradle build can construct RPM packages.
      • Difficulty is about equal to that of bringing up a new, additional Standby NameNode.
    Scalable?
      • While NNA is not a distributed system, it is a replicated, read-only copy. If you require more analytical throughput, you can spin up multiple NNA instances; the JournalNodes can handle many readers.
  • 31. Flaws: NNA is not Perfect.
    It is still a NameNode.
      • NNA is subject to all the faults and flaws of a regular HDFS NameNode. If you have too many files and blocks, your NNA instance will operate more slowly as a result.
      • Interactive queries that don't reduce the working set are not a good fit for NNA.
    It is not a distributed system.
      • While NNA can serve cached reports very frequently, it cannot handle many interactive queries at the same time.
      • Queries are best used by admins, while reports are best used by end users.
    It is “one of those” single-person projects.
      • While I had assistance in coding, NNA was mostly a one-person show, fixing bugs and adding features over a period of nearly a year and a half now.
      • There is plenty of work still to do and many things to improve.
  • 32. Future Work: Where can NNA go from here?
    HDFS-6382: TTL in HDFS
      • Discussion about TTL living outside the NameNode, given the desire not to introduce TTL management on the Active NameNode because of its additional thread resource requirements.
      • NNA could be extended to provide a routine TTL service on top of it.
    HDFS-13150: Faster Tailing of Edits from JournalNodes
      • Part of the work to make Standby NameNode(s) service reads is reducing the latency between when an EditLog transaction is applied on the Active and when it is applied on the Standby.
      • Reducing this latency means NNA queries become even closer to real time as well.
    HDFS Cluster Management Integration
      • NNA is simple enough to install that it should be easy to create an Ambari package, Cloudera Parcel, or other integration package for your flavor of management console.
    Web & Security
      • NNA supports only LDAP at the moment and uses JSON Web Tokens to maintain sessions.
      • Would any security experts like to lend a hand? Support for Kerberos authentication would be great!
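As a rough illustration of the TTL idea, a scan on top of a read-only namespace could compute which files have outlived a per-directory TTL. The policy map, the longest-prefix rule, and the use of modification time as age are all assumptions (Java 16+); since NNA is read-only, acting on the results (deleting or archiving) would be left to a regular HDFS client.

```java
import java.time.Duration;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class TtlScan {
    // Hypothetical record; modTime is epoch milliseconds.
    record FileRecord(String path, long modTime) {}

    // TTL of the most specific (longest) policy directory containing the path, if any.
    static Optional<Duration> ttlFor(String path, Map<String, Duration> policies) {
        return policies.entrySet().stream()
                .filter(e -> path.startsWith(e.getKey().endsWith("/") ? e.getKey() : e.getKey() + "/"))
                .max(Comparator.comparingInt(e -> e.getKey().length()))
                .map(Map.Entry::getValue);
    }

    // Paths whose age (now - modTime) exceeds the TTL that applies to them, sorted.
    static List<String> expiredPaths(List<FileRecord> files, Map<String, Duration> policies, long nowMillis) {
        return files.parallelStream()
                .filter(f -> ttlFor(f.path(), policies)
                        .map(ttl -> nowMillis - f.modTime() > ttl.toMillis())
                        .orElse(false))
                .map(FileRecord::path)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long now = Duration.ofDays(1_000).toMillis(); // fixed clock for reproducible output
        Map<String, Duration> policies = Map.of(
                "/tmp", Duration.ofDays(7),
                "/tmp/keep", Duration.ofDays(365));
        List<FileRecord> files = List.of(
                new FileRecord("/tmp/scratch/a", now - Duration.ofDays(30).toMillis()),
                new FileRecord("/tmp/keep/b", now - Duration.ofDays(30).toMillis()),
                new FileRecord("/data/c", 0));
        System.out.println(expiredPaths(files, policies, now)); // [/tmp/scratch/a]
    }
}
```

Choosing the longest matching prefix lets a nested directory override its parent's TTL, as "/tmp/keep" does here.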
  • 33. Demo: Example Local Cluster from Code
  • 34. END (Q & A?)