REMOVING THE
NAMENODE'S MEMORY
LIMITATION
Lin Xiao
Intern@Hortonworks
PhD student @ Carnegie Mellon University
8/22/2013
About Me: Lin Xiao
• PhD student at CMU
• Advisor: Garth Gibson
• Thesis area – scalable distributed file systems
• Intern at Hortonworks
• Intern project: removing the Namenode memory limitation
• Email: lxiao+@cs.cmu.edu
Big Data
• We create 2.5 x 10^18 bytes of data per day [IBM]
• Sloan Digital Sky Survey: 200 GB/night
• Facebook: 240 billion photos as of January 2013
• 250 million photos uploaded daily
• Cloud storage
• Amazon: 2 trillion objects, peak 1.1 million ops/sec
• Need scalable storage systems
• Scalable metadata <- focus of this presentation
• Scalable storage
• Scalable IO
Scalable Storage Systems
• Separate data and metadata servers
• More data nodes for higher throughput & capacity
• Bulk of the work (the IO path) is done by data servers
• Not much work added to metadata servers?
Federated HDFS
• Namenodes (MDS) see their own namespace (NS)
• Each datanode can serve all namenodes
[Figure: federated HDFS. Several namenodes, each with its own namespace, share a common set of datanodes.]
Single Namenode
• Stores all metadata in memory
• Design is simple
• Provide low latency and high throughput metadata operations
• Support up to 3K data servers
• Hadoop clusters make it affordable to store old data
• Cold data is stored in the cluster for a long time
• Takes up memory space but is rarely used
• Growth of data size can exceed throughput
• Goal: remove the space limit while maintaining similar performance
Metadata in Namenode
• Namespace
• Stored as a linked tree of inodes
• Every operation walks the tree from the top
• Blocks Map: block_id to location mapping
• Handled separately because of the huge number of blocks
• Datanode status
• IP address, capacity, load, heartbeat status, block report status
• Leases
• The Namespace and the Blocks Map use the majority of memory
• This talk will focus on the Namespace
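As a rough illustration of the namespace structure described on this slide, the sketch below models inodes as a linked tree and resolves a path by walking from the root. The class name, field names, and sample paths are assumptions made for the example; they are not taken from the HDFS code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Inode:
    """One node of the namespace tree (illustrative fields only)."""
    inode_id: int
    name: str
    is_dir: bool
    children: Dict[str, "Inode"] = field(default_factory=dict)
    block_ids: List[int] = field(default_factory=list)


def resolve(root: Inode, path: str) -> Optional[Inode]:
    """Walk from the root, one path component at a time."""
    node = root
    for component in [c for c in path.split("/") if c]:
        if not node.is_dir or component not in node.children:
            return None
        node = node.children[component]
    return node


# Hypothetical namespace: /user/data.txt with two blocks.
root = Inode(1, "", True)
user = Inode(2, "user", True)
data = Inode(3, "data.txt", False, block_ids=[1001, 1002])
root.children["user"] = user
user.children["data.txt"] = data

# Blocks map, kept separately from the tree: block_id -> datanode locations.
blocks_map = {1001: ["dn1", "dn2", "dn3"], 1002: ["dn2", "dn4", "dn5"]}
```

Because every lookup starts at the root, the upper levels of the tree are touched by nearly every operation, while the blocks map is only consulted once a file inode has been found.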
Problem and Proposed Solution
• Problem:
• Remove the namespace limit while maintaining similar performance when
the working set fits in memory
• Solution
• Retain the same namespace tree structure
• Store the namespace in persistent store using LSM (LevelDB)
• No separate edit logs or checkpoints
• All inodes and their updates are persisted via LevelDB
• Fast startup, at the cost of slow initial operations
• Could prefetch inodes in
• Do not expect customers to drastically reduce the actual heap size
• A larger heap eases the transition between working sets as
applications and workloads change
• A customer may occasionally run queries against cold data
New Namenode Architecture
• Namespace
• Same as before, but only part of the tree is in memory
• On a cache miss, read from LevelDB
• Edit logs and checkpoints are replaced by LevelDB
• Every inode change is written to LevelDB
• Key: <parent_inode_number + name>
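One way to read the key layout above is sketched below. It assumes the parent inode number is encoded as a fixed-width big-endian integer followed by the child name, so that all children of a directory sort next to each other and a directory can be loaded with a single prefix scan. The encoding and the `SortedStore` class are assumptions for illustration; `SortedStore` is an in-memory stand-in for an ordered key-value store and does not mirror LevelDB's API.

```python
import bisect
import struct
from typing import List, Optional, Tuple


def make_key(parent_id: int, name: str) -> bytes:
    """Fixed-width big-endian parent id, then the UTF-8 child name."""
    return struct.pack(">Q", parent_id) + name.encode("utf-8")


class SortedStore:
    """In-memory stand-in for an ordered key-value store (not LevelDB's API)."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._vals: List[bytes] = []

    def put(self, key: bytes, value: bytes) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def get(self, key: bytes) -> Optional[bytes]:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None

    def scan_prefix(self, prefix: bytes) -> List[Tuple[bytes, bytes]]:
        """Return all entries whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._vals[i]))
            i += 1
        return out


def list_dir(store: SortedStore, dir_id: int) -> List[str]:
    """List a directory's children with one range scan over its key prefix."""
    prefix = struct.pack(">Q", dir_id)
    return [k[len(prefix):].decode("utf-8") for k, _ in store.scan_prefix(prefix)]


# Hypothetical inodes: 1 is the root, 2 is /user.
store = SortedStore()
store.put(make_key(1, "user"), b"inode 2")
store.put(make_key(2, "b.txt"), b"inode 4")
store.put(make_key(2, "a.txt"), b"inode 3")
store.put(make_key(1, "tmp"), b"inode 5")
```

With this layout a single-inode lookup on a cache miss is one `get`, and bringing a whole directory into memory is one contiguous scan.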
[Figure: original Namenode with in-memory inodes and edit logs, versus new Namenode with inodes backed by a LevelDB buffer, WAL, and LevelDB]
Comparison w/Traditional FileSystem
• Traditional File Systems
• VFS layer keeps inode and directory entry cache
• Goal is to support the workload of a single machine
• Relatively large number of files
• Supports applications from a single machine or, in the case of NFS, from a
larger number of client machines
• Much smaller workload and size than Hadoop use cases
• LevelDB based Namenode
• Supports the very heavy traffic of a Hadoop cluster
• Keeps a much larger number of inodes in memory
• Cache replacement policies to suit the Hadoop workload
• Data is in Datanodes
LevelDB
• A fast key-value storage library written at Google
• Basic operations: get, put, delete
• Concurrency: single process w/multiple threads
• By default, writes are asynchronous
• As long as the machine doesn’t crash, it’s safe.
• Support synchronous writes
• No separate sync() operation
• Can be implemented by sync write/delete
• Support batch updates
• Data is automatically compressed using Snappy
Cache Replacement Policy
• Only whole directories are moved in or out of the cache
• Hot dirs are entirely in cache; others require a LevelDB scan
• Future – don’t cache very large dirs?
• No need to read from disks to check file existence
• LRU replacement policy
• Use CLOCK to approximate LRU at lower cost
• Separate thread for cache replacement
• Start replacement when threshold is exceeded
• Move eviction out of sections that hold the lock
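The policy on this slide can be sketched roughly as follows: a CLOCK cache whose unit of replacement is a whole directory, with eviction triggered once a fill threshold is exceeded. The class, its fields, and the way size is counted (number of cached children) are assumptions for illustration. In the talk, eviction runs in a separate thread; here it is an ordinary method call so the example stays single-threaded.

```python
from typing import Dict, List, Optional


class ClockDirCache:
    """CLOCK cache whose unit of replacement is a whole directory."""

    def __init__(self, capacity: int, threshold: float = 0.9) -> None:
        self.capacity = capacity               # max cached children in total
        self.threshold = threshold             # evict above this fill ratio
        self.dirs: Dict[int, List[str]] = {}   # dir inode id -> child names
        self.ref: Dict[int, bool] = {}         # CLOCK reference bits
        self.ring: List[int] = []              # circular order of cached dirs
        self.hand = 0
        self.size = 0

    def get(self, dir_id: int) -> Optional[List[str]]:
        children = self.dirs.get(dir_id)
        if children is not None:
            self.ref[dir_id] = True            # a hit only sets a bit
        return children

    def put(self, dir_id: int, children: List[str]) -> None:
        if dir_id in self.dirs:
            self.size -= len(self.dirs[dir_id])
        else:
            self.ring.append(dir_id)
        self.dirs[dir_id] = children
        self.ref[dir_id] = True
        self.size += len(children)

    def evict_if_needed(self) -> List[int]:
        """Sweep the clock hand until the cache is back under the threshold."""
        evicted = []
        while self.ring and self.size > self.threshold * self.capacity:
            self.hand %= len(self.ring)
            victim = self.ring[self.hand]
            if self.ref[victim]:
                self.ref[victim] = False       # second chance
                self.hand += 1
            else:
                self.size -= len(self.dirs.pop(victim))
                del self.ref[victim]
                self.ring.pop(self.hand)
                evicted.append(victim)
        return evicted


cache = ClockDirCache(capacity=10)
cache.put(1, ["a", "b", "c", "d"])
cache.put(2, ["e", "f", "g", "h"])
cache.put(3, ["i", "j", "k"])        # 11 entries, above 90% of 10
first = cache.evict_if_needed()
cache.get(2)                         # directory 2 becomes hot again
cache.put(4, ["l", "m", "n"])
second = cache.evict_if_needed()
```

Unlike exact LRU, a cache hit here only sets a reference bit and never reorders a list, which is the cost saving CLOCK is meant to provide.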
Benchmark description
• NNThroughputBenchmark
• No RPC cost, call FileSystem method directly
• All operations are generated based on BFS order
• Each thread gets one portion of the work
• NN Load generator using YCSB++ framework (in progress)
• Normal HDFS client calls
• Each thread either works in its own namespace or chooses one randomly
• Load generator based on real cluster traces (in progress)
• Can you help me get traces from your cluster?
• Traditional Hadoop benchmark (in progress)
• E.g. Gridmix; expect little degradation when most of the work is data
transfer
Categories of tests
• Everything fits in memory
• Goal: should be almost the same as the current NN
• Working set does not fit in memory or changes over time
• Study various cache replacement policies
• Need to get good traces from real cluster to see patterns of hot,
warm and cold data
Experiment Setup
• Hardware description (Susitna)
• CPU: AMD Opteron 6272, 64 bit, 16 MB L2, 16-core 2.1 GHz
• SSD: Crucial M4-CT064M4SSD2 SSD, 64 GB, SATA 6.0Gb/s
• (In progress) Use disks in future experiments
• Heap size is set to 1GB
• NNThroughputBenchmark
• No RPC cost, call FileSystem method directly
• All operations are generated based on BFS order
• Multiple threads, but each thread gets one portion of the work
• Each directory contains 100 subdirs and 100 files
• Named sequentially: ThroughputBenchDir1, ThroughputBench1
• LevelDB NN
• Cache monitor thread starts replacement when 90% full
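The workload shape described above (fixed fan-out per directory, sequential names, operations in BFS order, one portion per thread) might be generated along the lines of the sketch below. It is not the NNThroughputBenchmark source: the `/bench` root, the numbering scheme, and the contiguous partitioning are assumptions made for illustration.

```python
from collections import deque
from typing import Iterator, List


def bfs_file_paths(total_files: int, files_per_dir: int = 100,
                   dirs_per_dir: int = 100,
                   root: str = "/bench") -> Iterator[str]:
    """Yield file paths level by level until total_files have been produced."""
    queue = deque([root])
    produced = 0
    dir_seq = 0
    while produced < total_files and queue:
        current = queue.popleft()
        for _ in range(files_per_dir):
            if produced == total_files:
                return
            produced += 1
            yield f"{current}/ThroughputBench{produced}"
        for _ in range(dirs_per_dir):
            dir_seq += 1
            queue.append(f"{current}/ThroughputBenchDir{dir_seq}")


def partition(items: List[str], num_threads: int) -> List[List[str]]:
    """Split the work into contiguous, near-equal portions, one per thread."""
    base, extra = divmod(len(items), num_threads)
    parts, start = [], 0
    for t in range(num_threads):
        end = start + base + (1 if t < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


# Small fan-out so the result is easy to inspect by hand.
paths = list(bfs_file_paths(total_files=10, files_per_dir=2, dirs_per_dir=2))
portions = partition(paths, num_threads=4)
```

Under this scheme, with 100 files per directory, 2.4M files occupy 24,000 directories, so consecutive operations within one thread's portion tend to share a parent directory.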
Create & close 2.4M files – all fit in cache
[Figure: throughput (ops/sec) vs. number of threads (2, 4, 8, 16); series: original, w/LevelDB]
• Note: files are not accessed, but parent dirs clearly are
• Note: Old NN and LevelDB NN peak at different # threads
• Degradation for peak throughput is 13.5%
Create 9.6M files: 1% fits in cache
• Old NN with 8 threads and LevelDB NN with 16 threads.
• Performance remains about the same using LevelDB
• Namenode’s throughput drops to zero when memory is exhausted
[Figure: throughput (ops/sec) vs. time in seconds; series: Original, LevelDB NN]
GetFileInfo
• ListStatus of first 600K of 2.4M files
• Each thread works on a different part of the tree
• Original NN: all fit in memory (of course)
• LevelDB NN: 2 cases: (1) all fit, (2) half fit
• Half fit: 10%-20% degradation - cache is constantly replaced
[Figure: throughput (ops/sec) vs. number of threads (2, 4, 8, 16, 32); series: Original, FitCache, HalfInCache]
Benchmarks that remain
• NNThroughputBenchmark
• No RPC cost, call FileSystem method directly
• All operations are generated based on BFS order
• Each thread gets one portion of the work
• NN Load generator using YCSB++ framework (in progress)
• Normal HDFS client calls
• Each thread either works in its own namespace or chooses one randomly
• Load generator based on real cluster traces (in progress)
• Can you help me get traces from your cluster?
• Traditional Hadoop benchmark (in progress)
• E.g. Gridmix; expect little degradation when most of the work is data
transfer
Summary
• Now that NN is HA, removing the namespace memory
limitation is one of the most important problems to solve
• LSM (LevelDB) has worked out quite well
• Initial experiments have shown good results
• Need further benchmarks especially on how effective caching is for
different workloads and patterns
• Other LSM implementations? (e.g. HBase’s Java LSM)
• Work is done on branch 0.23
• Graduate student quality prototype (very good graduate student)
• But worked closely with the HDFS experts at Hortonworks
• Goal of internship was to see how well the idea worked
• Hortonworks plans to take this to the next stage once more experiments
are completed.
Q&A
• Contact: lxiao+@cs.cmu.edu
• We’d love to get trace stats from your cluster
• Simple Java program to run against your audit logs
• Can also run as MapReduce jobs
• Extract metadata operation stats without exposing sensitive info
• Please contact me if you could help!
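To give a feel for the kind of extraction meant here, the sketch below counts metadata operations by type from audit log lines and throws away everything else (users, IPs, paths). It is not the Java program mentioned on the slide. It assumes audit lines carry a `cmd=<operation>` field among tab-separated key=value pairs, and the sample lines are synthetic, made up purely for the example.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the cmd=<operation> field of an audit line; other fields are ignored.
CMD_FIELD = re.compile(r"\bcmd=(\S+)")


def count_operations(lines: Iterable[str]) -> Counter:
    """Count metadata operations by type, discarding users, IPs, and paths."""
    counts: Counter = Counter()
    for line in lines:
        match = CMD_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Synthetic sample lines (assumed format, not real cluster data).
prefix = "2013-08-22 10:00:00,000 INFO FSNamesystem.audit: "
sample = [
    prefix + "allowed=true\tugi=alice\tip=/10.0.0.1\tcmd=create\tsrc=/a/f1\tdst=null\tperm=null",
    prefix + "allowed=true\tugi=bob\tip=/10.0.0.2\tcmd=open\tsrc=/a/f1\tdst=null\tperm=null",
    prefix + "allowed=true\tugi=alice\tip=/10.0.0.1\tcmd=create\tsrc=/a/f2\tdst=null\tperm=null",
    "2013-08-22 10:00:04,000 INFO some unrelated log line",
]
stats = count_operations(sample)
```

Only the per-operation counts leave the function, so the result can be shared without revealing who accessed which path.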
August 2013 HUG: Removing the NameNode's memory limitation

How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataacelyc1112009
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014marvin herrera
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchJoe Alex
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Alluxio - Scalable Filesystem Metadata Services
Alluxio - Scalable Filesystem Metadata ServicesAlluxio - Scalable Filesystem Metadata Services
Alluxio - Scalable Filesystem Metadata ServicesAlluxio, Inc.
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
 
Dissecting Scalable Database Architectures
Dissecting Scalable Database ArchitecturesDissecting Scalable Database Architectures
Dissecting Scalable Database Architectureshypertable
 
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Alluxio, Inc.
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache KuduAndriy Zabavskyy
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in JavaRuben Badaró
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheDavid Grier
 
Hardware Provisioning
Hardware Provisioning Hardware Provisioning
Hardware Provisioning MongoDB
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inRahulBhole12
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHungWei Chiu
 
Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...Alluxio, Inc.
 
MongoDB Internals
MongoDB InternalsMongoDB Internals
MongoDB InternalsSiraj Memon
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecturesaipriyacoool
 
Ceph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph Community
 
Project Presentation Final
Project Presentation FinalProject Presentation Final
Project Presentation FinalDhritiman Halder
 

Ähnlich wie August 2013 HUG: Removing the NameNode's memory limitation (20)

How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Alluxio - Scalable Filesystem Metadata Services
Alluxio - Scalable Filesystem Metadata ServicesAlluxio - Scalable Filesystem Metadata Services
Alluxio - Scalable Filesystem Metadata Services
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
Dissecting Scalable Database Architectures
Dissecting Scalable Database ArchitecturesDissecting Scalable Database Architectures
Dissecting Scalable Database Architectures
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
Hardware Provisioning
Hardware Provisioning Hardware Provisioning
Hardware Provisioning
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
 
Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...
 
MongoDB Internals
MongoDB InternalsMongoDB Internals
MongoDB Internals
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecture
 
Ceph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der Ster
 
Project Presentation Final
Project Presentation FinalProject Presentation Final
Project Presentation Final
 

Mehr von Yahoo Developer Network

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaYahoo Developer Network
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Yahoo Developer Network
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanYahoo Developer Network
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Yahoo Developer Network
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolYahoo Developer Network
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 

August 2013 HUG: Removing the NameNode's memory limitation

  • 1. REMOVING THE NAMENODE'S MEMORY LIMITATION • Lin Xiao • Intern @ Hortonworks • PhD student @ Carnegie Mellon University • 8/22/2013
  • 2. About Me: Lin Xiao • PhD student at CMU • Advisor: Garth Gibson • Thesis area: scalable distributed file systems • Intern at Hortonworks • Intern project: removing the Namenode memory limitation • Email: lxiao+@cs.cmu.edu
  • 3. Big Data • We create 2.5 × 10^18 bytes of data per day [IBM] • Sloan Digital Sky Survey: 200 GB/night • Facebook: 240 billion photos as of January 2013 • 250 million photos uploaded daily • Cloud storage • Amazon: 2 trillion objects, peak of 1.1 million ops/sec • Need scalable storage systems • Scalable metadata (the focus of this presentation) • Scalable storage • Scalable IO
  • 4. Scalable Storage Systems • Separate data and metadata servers • More data nodes for higher throughput and capacity • Bulk of the work (the IO path) is done by data servers • Not much work added to metadata servers?
  • 5. Federated HDFS • Namenodes (MDS) each see their own namespace (NS) • Each datanode can serve all namenodes • [Figure: Federated HDFS architecture diagram]
  • 6. Single Namenode • Stores all metadata in memory • Design is simple • Provides low-latency, high-throughput metadata operations • Supports up to 3K data servers • Hadoop clusters make it affordable to store old data • Cold data is stored in the cluster for a long time • Takes up memory but is rarely used • Growth in data size can outpace growth in throughput • Goal: remove space limits while maintaining similar performance
  • 7. Metadata in Namenode • Namespace • Stored as a linked tree structure of inodes • Every operation traverses the tree from the top • Blocks map: block_id to location mapping • Handled separately because of the huge number of blocks • Datanode status • IP address, capacity, load, heartbeat status, block report status • Leases • The namespace and blocks map use the majority of memory • This talk will focus on the namespace
  • 8. Problem and Proposed Solution • Problem • Remove the namespace limit while maintaining similar performance when the working set fits in memory • Solution • Retain the same namespace tree structure • Store the namespace in a persistent store using an LSM tree (LevelDB) • No separate edit logs or checkpoints • All inodes and their updates are persisted via LevelDB • Fast startup, at the cost of slow initial operations • Could prefetch inodes • Do not expect customers to drastically reduce the actual heap size • A larger heap eases the transition between working sets as applications and workloads change • A customer may occasionally run queries against cold data
  • 9. New Namenode Architecture • Namespace • Same as before, but only part of the tree is in memory • On a cache miss, read from LevelDB • Edit logs and checkpoints are replaced by LevelDB • Update LevelDB on every inode change • Key: <parent_inode_number + name> • [Figure: original Namenode (inodes, edit logs) versus new Namenode (inodes, LevelDB buffer, WAL, LevelDB)]
  • 10. Comparison w/Traditional File System • Traditional file systems • VFS layer keeps an inode and directory entry cache • Goal is to support the workload of a single machine • Relatively large number of files • Support applications from a single machine or, in the case of NFS, from a larger number of client machines • Much smaller workload and size compared to Hadoop use cases • LevelDB-based Namenode • Supports the very large traffic of a Hadoop cluster • Keeps a much larger number of inodes in memory • Cache replacement policies to suit the Hadoop workload • Data is in the Datanodes
  • 11. LevelDB • A fast key-value storage library written at Google • Basic operations: get, put, delete • Concurrency: single process w/multiple threads • By default, writes are asynchronous • As long as the machine doesn’t crash, it’s safe • Supports synchronous writes • No separate sync() operation • Can be implemented by a sync write/delete • Supports batch updates • Data is automatically compressed using Snappy
  • 12. Cache Replacement Policy • Only whole directories are moved into or out of the cache • Hot dirs are entirely in cache; others require a LevelDB scan • Future: don’t cache very large dirs? • No need to read from disk to check file existence • LRU replacement policy • Use CLOCK to approximate LRU at lower cost • Separate thread for cache replacement • Start replacement when a threshold is exceeded • Keep eviction out of sessions that hold the lock
  • 13. Benchmark description • NNThroughputBenchmark • No RPC cost; calls FileSystem methods directly • All operations are generated in BFS order • Each thread gets one portion of the work • NN load generator using the YCSB++ framework (in progress) • Normal HDFS client calls • Each thread either works in its own namespace or chooses one randomly • Load generator based on real cluster traces (in progress) • Can you help me get traces from your cluster? • Traditional Hadoop benchmark (in progress) • E.g. Gridmix • Expect little degradation when most work is data transfer
  • 14. Categories of tests • Everything fits in memory • Goal: should be almost the same as the current NN • Working set does not fit in memory or changes over time • Study various cache replacement policies • Need good traces from a real cluster to see patterns of hot, warm and cold data
  • 15. Experiment Setup • Hardware description (Susitna) • CPU: AMD Opteron 6272, 64-bit, 16 MB L2, 16-core 2.1 GHz • SSD: Crucial M4-CT064M4SSD2, 64 GB, SATA 6.0 Gb/s • (In progress) Use disks in future experiments • Heap size is set to 1 GB • NNThroughputBenchmark • No RPC cost; calls FileSystem methods directly • All operations are generated in BFS order • Multiple threads, but each thread gets one portion of the work • Each directory contains 100 subdirs and 100 files • Named sequentially: ThroughputBenchDir1, ThroughputBench1 • LevelDB NN • Cache monitor thread starts replacement when the cache is 90% full
  • 16. Create & close 2.4M files – all fit in cache • [Figure: throughput (ops/sec) versus number of threads (2, 4, 8, 16); series: original, w/LevelDB] • Note: files are not accessed, but clearly parent dirs are • Note: old NN and LevelDB NN peak at different numbers of threads • Degradation in peak throughput is 13.5%
  • 17. Create 9.6M files: 1% fits in cache • Old NN with 8 threads and LevelDB NN with 16 threads • Performance remains about the same using LevelDB • Namenode’s throughput drops to zero when memory is exhausted • [Figure: throughput (ops/sec) versus time in seconds; series: Original, LevelDB NN]
  • 18. GetFileInfo • ListStatus of the first 600K of 2.4M files • Each thread works on a different part of the tree • Original NN: all fit in memory (of course) • LevelDB NN: 2 cases: (1) all fit, (2) half fit • Half fit: 10%-20% degradation, since the cache is constantly replaced • [Figure: throughput (ops/sec) versus number of threads (2, 4, 8, 16, 32); series: Original, FitCache, HalfInCache]
  • 19. Benchmarks that remain • NNThroughputBenchmark • No RPC cost; calls FileSystem methods directly • All operations are generated in BFS order • Each thread gets one portion of the work • NN load generator using the YCSB++ framework (in progress) • Normal HDFS client calls • Each thread either works in its own namespace or chooses one randomly • Load generator based on real cluster traces (in progress) • Can you help me get traces from your cluster? • Traditional Hadoop benchmark (in progress) • E.g. Gridmix • Expect little degradation when most work is data transfer
  • 20. Summary • Now that the NN is HA, removing the namespace memory limitation is one of the most important problems to solve • LSM (LevelDB) has worked out quite well • Initial experiments have shown good results • Need further benchmarks, especially on how effective caching is for different workloads and patterns • Other LSM implementations? (e.g. HBase’s Java LSM) • Work is done on branch 0.23 • Graduate-student-quality prototype (very good graduate student) • But worked closely with the HDFS experts at Hortonworks • Goal of the internship was to see how well the idea worked • Hortonworks plans to take this to the next stage once more experiments are completed
  • 21. Q&A • Contact: lxiao+@cs.cmu.edu • We’d love to get trace stats from your cluster • Simple Java program to run against your audit logs • Can also run as MapReduce jobs • Extracts metadata operation stats without exposing sensitive info • Please contact me if you could help!
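
Slide 9 gives the LevelDB key as <parent_inode_number + name>, and slide 12 notes that a directory not in cache requires a LevelDB scan. The following is a minimal Python sketch of why that key layout makes a directory listing a single range scan. The byte encoding (8-byte big-endian parent inode number followed by the UTF-8 name), the `SortedStore` class, and the inode numbers are all assumptions made for illustration; `SortedStore` is an in-memory stand-in for an ordered key-value store, not LevelDB or the prototype's code.

```python
import bisect
import struct


def make_key(parent_inode: int, name: str) -> bytes:
    """Assumed encoding: 8-byte big-endian parent inode number, then UTF-8 name."""
    return struct.pack(">Q", parent_inode) + name.encode("utf-8")


class SortedStore:
    """In-memory stand-in for an ordered key-value store (hypothetical, not LevelDB)."""

    def __init__(self):
        self._keys = []      # keys kept in sorted byte order
        self._values = {}

    def put(self, key: bytes, value) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        return self._values.get(key)

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._values[self._keys[i]]
            i += 1


def list_directory(store: SortedStore, dir_inode: int):
    """Children of one directory share a key prefix, so they are contiguous."""
    prefix = struct.pack(">Q", dir_inode)
    return [key[8:].decode("utf-8") for key, _ in store.scan_prefix(prefix)]


# Hypothetical namespace: root is inode 1; values are the child's inode number.
store = SortedStore()
store.put(make_key(1, "user"), 2)
store.put(make_key(1, "tmp"), 3)
store.put(make_key(2, "alice"), 4)
store.put(make_key(2, "bob"), 5)
```

With this layout, `list_directory(store, 2)` touches only the keys under inode 2, and a single-path lookup such as `store.get(make_key(2, "bob"))` is one point read per path component.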
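
Slides 12 and 15 describe the cache policy: whole directories are the unit of replacement, CLOCK approximates LRU, and replacement starts when the cache is 90% full. The sketch below illustrates that combination under simplifying assumptions that are not from the talk: capacity is counted in directories rather than bytes or inodes, eviction runs when called rather than in a separate thread, and the class and method names are hypothetical.

```python
class ClockDirCache:
    """CLOCK (second-chance) cache whose entries are whole directories."""

    def __init__(self, capacity: int, threshold_percent: int = 90):
        self.capacity = capacity                  # assumed unit: number of directories
        self.threshold_percent = threshold_percent
        self.entries = {}                         # dir_id -> [children, referenced_bit]
        self.ring = []                            # dir ids the clock hand walks over
        self.hand = 0

    def get(self, dir_id):
        """Return cached children, or None on a miss (caller would read the store)."""
        entry = self.entries.get(dir_id)
        if entry is None:
            return None
        entry[1] = True                           # mark as recently used
        return entry[0]

    def put(self, dir_id, children) -> None:
        """Insert or refresh a whole directory; new entries start referenced."""
        if dir_id not in self.entries:
            self.ring.append(dir_id)
        self.entries[dir_id] = [children, True]

    def over_threshold(self) -> bool:
        # Integer arithmetic avoids floating-point rounding at the boundary.
        return len(self.ring) * 100 > self.capacity * self.threshold_percent

    def evict_until_below_threshold(self):
        """Sweep the hand: clear set bits, evict the first entry found unreferenced."""
        evicted = []
        while self.over_threshold():
            if self.hand >= len(self.ring):
                self.hand = 0
            dir_id = self.ring[self.hand]
            entry = self.entries[dir_id]
            if entry[1]:
                entry[1] = False                  # second chance
                self.hand += 1
            else:
                del self.entries[dir_id]
                self.ring.pop(self.hand)          # hand now points at the next entry
                evicted.append(dir_id)
        return evicted


cache = ClockDirCache(capacity=10)
for i in range(10):
    cache.put(f"d{i}", [f"file{i}"])
first = cache.evict_until_below_threshold()      # full sweep clears bits, then evicts d0
cache.get("d1")                                   # d1 is touched again
cache.put("d10", ["file10"])
second = cache.evict_until_below_threshold()      # d1 gets a second chance, d2 is evicted
```

CLOCK needs only one bit per entry and no list reordering on each access, which is the cost reduction relative to exact LRU that slide 12 alludes to.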