HopsFS – Breaking 1 million ops/sec barrier in Hadoop
Dr Jim Dowling
Associate Prof @ KTH
Senior Researcher @ SICS
CEO at Logical Clocks AB
www.hops.io
@hopshadoop
Evolution of Hadoop
2017-04-05 HopsFS - Breaking 1 million ops/s Barrier, J Dowling, Nov 2016 2/51
2009 2017
?
Tiny Brain
(NameNode, ResourceMgr)
Huge Body (DataNodes)
HDFS Scalability Bottleneck – the NameNode
•Limited namespace/metadata
- JVM Heap (~200 GB)
•Limited concurrency
- Single global namespace lock
(single-writer, multiple readers)
[Diagram: HDFS client, HDFS DataNode, NameNode]
HopsFS
1. Scale-out Metadata
- Metadata in an in-memory distributed database
- Multiple stateless NameNodes
2. Remove the Global Namespace Lock
- Supports multiple concurrent read and write operations
HopsFS Architecture
MySQL Cluster: Network Database Engine (NDB)
•Open-Source, Distributed, In-Memory Database
- Scales to 48 database nodes
• 200 Million NoSQL Read Ops/Sec*
•NewSQL (Relational) DB
- Read Committed Transactions
- Row-level Locking
- User-defined partitioning
- Efficient cross-partition transactions
*https://www.mysql.com/why-mysql/benchmarks/mysql-cluster/
NameNode (Apache v2)
DAL API (Apache v2)
NDB-DAL-Impl (GPL v2)
Other DB (Other License)
hops-2.7.3.jar
ndb-2.7.3-7.5.6.jar
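The layering above can be read as a plug-in boundary: the NameNode codes against a storage-neutral data access layer (DAL), and a separately licensed module implements that layer for NDB. A minimal Python sketch of the idea follows. The interface name InodeDataAccess, its methods, and the in-memory backend are hypothetical illustrations, not the actual Hops DAL API (which is Java).

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class InodeDataAccess(ABC):
    """Hypothetical storage-neutral interface the NameNode would code against."""

    @abstractmethod
    def put(self, parent_id: int, name: str, inode_id: int) -> None: ...

    @abstractmethod
    def get(self, parent_id: int, name: str) -> Optional[int]: ...


class InMemoryInodeDataAccess(InodeDataAccess):
    """Stand-in backend; a real one would talk to NDB or another database."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[int, str], int] = {}

    def put(self, parent_id: int, name: str, inode_id: int) -> None:
        self._rows[(parent_id, name)] = inode_id

    def get(self, parent_id: int, name: str) -> Optional[int]:
        return self._rows.get((parent_id, name))


def resolve(dal: InodeDataAccess, path: str, root_id: int = 1) -> Optional[int]:
    """Walk a path one component at a time using only the DAL interface."""
    current = root_id
    for component in [c for c in path.split("/") if c]:
        nxt = dal.get(current, component)
        if nxt is None:
            return None
        current = nxt
    return current


dal = InMemoryInodeDataAccess()
dal.put(1, "user", 2)
dal.put(2, "F1", 3)
```

Because resolve only touches the abstract interface, swapping the backend would not change NameNode-side code, which is presumably what lets the NDB implementation ship under a different license.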
HopsFS Metadata and Metadata Operations
[Figure: example namespace tree with root /, directory user, and files F1, F2, F3]
HopsFS Metadata & Metadata Partitioning
Example namespace: / contains the directory user, which contains the files F1, F2, F3.
INode Table (Inode_ID, Name, Parent_ID, ...)
➢Inode ID
➢Parent INode ID
➢Name
➢Size
➢Access Attributes
➢...
Block Table (Block_ID, Inode_ID, ...)
➢File INode to Blocks Mapping
➢Block Size
➢...
Replica Table (Inode_ID, Block_ID, DataNode_ID, ...)
➢Location of blocks on DataNodes
➢...
HopsFS Metadata & Metadata Partitioning (contd.)
$> ls /user/*
INode Table
Inode_ID  Name  Parent_ID  ...
1         /     0
2         user  1
3         F1    2
4         F2    2
5         F3    2
Block Table
Inode_ID  Block_ID  ...
3         1
3         2
3         3
Replica Table
Inode_ID  Block_ID  DataNode_ID  ...
3         1         1
3         1         2
3         1         3
3         2         4
3         2         5
3         ...       ...
MySQL Cluster
Partition 1: /
Partition 2: user
Partition 3: F1, F2, F3
Partition 4: [{3,1},{3,2},{3,3}], [{3,1,1},{3,1,2},{3,1,3},{3,2,4} … {3,3,9}]
HopsFS Metadata & Metadata Partitioning (contd.)
$> cat /user/F1
(Same tables and partition layout as for ls /user/* above.)
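The two examples above suggest the partitioning rule: inode rows are placed by Parent_ID, and block rows by Inode_ID, so a directory listing and a file read each scan a single partition. The sketch below replays the slide's rows under that rule. The modulo hash and the helper names are assumptions for illustration, so the partition numbers it produces need not match the ones drawn on the slide.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption: matches the four partitions drawn on the slide


def partition_of(key: int) -> int:
    # Assumption: a simple modulo stands in for the database's hash function.
    return key % NUM_PARTITIONS


# Rows from the slide: (inode_id, name, parent_id)
inodes = [(1, "/", 0), (2, "user", 1), (3, "F1", 2), (4, "F2", 2), (5, "F3", 2)]
# Rows from the slide: (inode_id, block_id)
blocks = [(3, 1), (3, 2), (3, 3)]

# Inode rows are placed by parent_id, block rows by inode_id.
inode_parts = defaultdict(list)
for inode_id, name, parent_id in inodes:
    inode_parts[partition_of(parent_id)].append((inode_id, name, parent_id))

block_parts = defaultdict(list)
for inode_id, block_id in blocks:
    block_parts[partition_of(inode_id)].append((inode_id, block_id))


def list_dir(dir_inode_id: int):
    """ls: all children share one parent_id, so a single partition is scanned."""
    part = inode_parts[partition_of(dir_inode_id)]
    return sorted(name for _, name, parent in part if parent == dir_inode_id)


def blocks_of(file_inode_id: int):
    """cat: all blocks of a file share its inode_id, so a single partition is scanned."""
    part = block_parts[partition_of(file_inode_id)]
    return sorted(b for i, b in part if i == file_inode_id)
```

Calling list_dir(2) corresponds to ls /user/* and blocks_of(3) to the block lookup behind cat /user/F1.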
Leader Election using NDB*
•Leader NN coordinates replication/lease mgmt
- NDB as shared memory for Election of Leader NN.
• Zookeeper not needed!
*Niazi, Berthou, Ismail, Dowling, “Leader Election in a NewSQL Database”, DAIS 2015
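One way to picture the database acting as shared memory for the election: every NameNode periodically bumps its own counter in a shared table, and each node treats the lowest-numbered NameNode whose counter is still advancing as the leader. The sketch below simulates that with an in-process dictionary. The table layout, class names, and liveness rule are assumptions for illustration; the cited paper describes the actual protocol.

```python
class SharedTable:
    """Stand-in for a table in the shared database (hypothetical schema)."""

    def __init__(self):
        self.counters = {}  # namenode_id -> heartbeat counter


class NameNode:
    def __init__(self, nn_id, table):
        self.nn_id = nn_id
        self.table = table
        self.last_seen = {}  # counters observed during the previous election round
        table.counters[nn_id] = 0

    def heartbeat(self):
        # In a real deployment this would be a transaction against the database.
        self.table.counters[self.nn_id] += 1

    def elect(self):
        """Return the lowest id among nodes whose counter advanced since the last call."""
        current = dict(self.table.counters)
        alive = [n for n, c in current.items()
                 if n == self.nn_id or c > self.last_seen.get(n, -1)]
        self.last_seen = current
        return min(alive)


table = SharedTable()
nn1, nn2, nn3 = (NameNode(i, table) for i in (1, 2, 3))

for nn in (nn1, nn2, nn3):
    nn.heartbeat()
first_leader = nn3.elect()

for nn in (nn2, nn3):  # nn1 has stopped sending heartbeats
    nn.heartbeat()
second_leader = nn3.elect()
```

In this run nn3 first sees NameNode 1 as leader and, once NameNode 1 stops updating its counter, moves to NameNode 2, with no coordination service involved.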
Metadata Locking
●Exclusive Lock
●Shared Lock
Subtree Lock
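The slides only name the lock types. As a rough illustration of why per-row shared and exclusive locks allow more concurrency than one global namespace lock, the sketch below tracks locks per inode row. The class and method names are hypothetical, acquisition is non-blocking for simplicity, and the sketch does not model subtree locks or the actual HopsFS locking protocol.

```python
class RowLocks:
    """Per-row shared/exclusive locks (non-blocking, for illustration only)."""

    def __init__(self):
        self.shared = {}     # row -> set of holders with a shared lock
        self.exclusive = {}  # row -> single holder with the exclusive lock

    def try_shared(self, row, holder):
        # A shared lock is refused only if another holder has the row exclusively.
        if row in self.exclusive and self.exclusive[row] != holder:
            return False
        self.shared.setdefault(row, set()).add(holder)
        return True

    def try_exclusive(self, row, holder):
        # An exclusive lock is refused if anyone else holds the row in any mode.
        others = self.shared.get(row, set()) - {holder}
        if others or self.exclusive.get(row, holder) != holder:
            return False
        self.exclusive[row] = holder
        return True


locks = RowLocks()
a = locks.try_exclusive("/user/F1", "op-A")  # writer on F1
b = locks.try_exclusive("/user/F2", "op-B")  # writer on F2 is not blocked by op-A
c = locks.try_shared("/user/F1", "op-C")     # reader on F1 is refused while op-A writes
d = locks.try_shared("/user/F3", "op-C")     # reader on F3
e = locks.try_shared("/user/F3", "op-D")     # a second reader shares the row
f = locks.try_exclusive("/user/F3", "op-E")  # writer is refused while readers hold F3
```

Under a single global lock, op-B would have had to wait for op-A even though the two operations touch different files.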
Performance Evaluation for HopsFS
• On Premise
- Up to 72 servers
- Dual Intel® Xeon® E5-2620 v3 @ 2.40GHz
- 256 GB RAM, 4 TB Disks
• 10 GbE
- 0.1 ms ping latency
Evaluation: Spotify Workload
HopsFS Higher Throughput with Same Hardware
HopsFS outperforms HDFS on equivalent hardware. The HA-HDFS baseline uses five servers:
● 1 Active NameNode
● 1 Standby NameNode
● 3 Servers
○ Journal Nodes
○ ZooKeeper Nodes
Evaluation: Spotify Workload (contd.)
16X the performance of HDFS.
Further scaling possible with more hardware.
Write Intensive workloads
Workloads                               HopsFS ops/sec   HDFS ops/sec   Scaling Factor
Synthetic Workload (5.0% File Writes)   1.19 M           53.6 K         22
Synthetic Workload (10% File Writes)    1.04 M           35.2 K         30
Synthetic Workload (20% File Writes)    0.748 M          19.9 K         37
Scalability of HopsFS and HDFS for write intensive workloads
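The Scaling Factor column appears to be the ratio of HopsFS throughput to HDFS throughput. The short check below recomputes it from the rounded figures in the table; the variable names are illustrative. Because the inputs are already rounded, the recomputed ratios only need to land close to the listed values, not match them exactly.

```python
# Throughput figures as listed in the table (ops/sec), plus the listed scaling factor.
workloads = {
    "5.0% file writes": (1.19e6, 53.6e3, 22),
    "10% file writes": (1.04e6, 35.2e3, 30),
    "20% file writes": (0.748e6, 19.9e3, 37),
}


def scaling_factor(hopsfs_ops: float, hdfs_ops: float) -> float:
    """Ratio of HopsFS throughput to HDFS throughput."""
    return hopsfs_ops / hdfs_ops


ratios = {name: scaling_factor(h, d) for name, (h, d, _) in workloads.items()}

for name, value in ratios.items():
    print(f"{name}: {value:.1f}x (listed: {workloads[name][2]})")
```

The ratio grows with the write share because HDFS throughput falls faster than HopsFS throughput as writes increase.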
Metadata Scalability
37 times more files than HDFS
Operational Latency
File System Clients
No of Clients   HopsFS Latency   HDFS Latency
50              3.0              3.1
1500            3.7              15.5
6500            6.8              67.4
Erasure Coding with Data Locality
Reed-Solomon (140%)
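The 140% figure is consistent with the storage footprint of a Reed-Solomon code that adds 4 parity blocks to every 10 data blocks. The slide does not state the code parameters, so (10, 4) is an assumption in this small worked calculation, as is the comparison against triple replication.

```python
def storage_footprint(data_blocks: int, parity_blocks: int) -> float:
    """Bytes stored per byte of user data for a (data, parity) erasure code."""
    return (data_blocks + parity_blocks) / data_blocks


rs = storage_footprint(10, 4)   # assumed Reed-Solomon (10, 4)
replication = 3.0               # triple replication stores three full copies
saving = 1 - rs / replication   # fraction of raw storage saved versus replication

print(f"Reed-Solomon footprint: {rs:.0%}, saving vs. 3x replication: {saving:.0%}")
```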
ZFS with HopsFS
RAID-0
10 Gb/s
~350 MB/s Reads
~250 MB/s Writes
RAID-5 + HopsFS Erasure Coding
~500 MB/s Reads
~350 MB/s Writes
Archive files
Triple-replicated files
Strong Eventually Consistent Metadata
[Diagram: Database, Epipe, Elasticsearch, Kafka, Hive Metastore]
Changelog for HDFS Namespace
Free-Text Search for Files/Dirs in the HopsFS Namespace
Extending Metadata in HopsFS
Metadata API (HopsFS->Elasticsearch)
public void attachMetadata(Json obj, String pathToFileorDir)
public void removeMetadata(String name, String pathToFileorDir)
•Design your own tables
- Use foreign keys for metadata integrity
- Transactions ensure metadata consistency
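The "design your own tables" points can be tried out on any relational database. The sketch below uses SQLite from the Python standard library as a stand-in for MySQL Cluster; the table and column names (inodes, file_tags) are hypothetical. It shows a foreign key removing extended metadata when the inode row is deleted, and rejecting metadata that points at a non-existent inode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE inodes (inode_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE file_tags (  -- hypothetical user-defined metadata table
           inode_id INTEGER NOT NULL REFERENCES inodes(inode_id) ON DELETE CASCADE,
           tag TEXT NOT NULL
       )"""
)

with conn:  # one transaction: the file and its metadata become visible together
    conn.execute("INSERT INTO inodes VALUES (3, 'F1')")
    conn.execute("INSERT INTO file_tags VALUES (3, 'genomics')")

with conn:  # deleting the inode removes its metadata through the foreign key
    conn.execute("DELETE FROM inodes WHERE inode_id = 3")

remaining = conn.execute("SELECT COUNT(*) FROM file_tags").fetchone()[0]

try:
    with conn:  # metadata for a non-existent inode violates the foreign key
        conn.execute("INSERT INTO file_tags VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```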
HopsYARN
Hops scalability now limited by YARN
•YARN scheduler (triggered on node heartbeats)*
- Scheduling decisions cost O(N), where N is the number of active Applications
- We reduced the cost to O(M), where M is the number of applications currently requesting resources. Typically M << N.
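The difference between O(N) and O(M) comes down to which collection the scheduler walks on each node heartbeat. The toy model below keeps a separate index of applications with outstanding requests and counts how many applications each strategy visits. It is a simplified illustration with hypothetical names, not the YARN scheduler or the Hops change itself.

```python
class Scheduler:
    def __init__(self, num_apps):
        self.pending = {app: 0 for app in range(num_apps)}  # all active apps (N)
        self.requesting = {}  # insertion-ordered index of apps with pending requests (M)

    def request(self, app, containers):
        # Assumes containers > 0.
        self.pending[app] += containers
        self.requesting[app] = True

    def _allocate(self, candidates, free):
        visited, granted = 0, {}
        for app in candidates:
            visited += 1
            take = min(self.pending[app], free)
            if take:
                granted[app] = take
                self.pending[app] -= take
                free -= take
                if self.pending[app] == 0:
                    self.requesting.pop(app, None)
        return granted, visited

    def heartbeat_scan_all(self, free):
        """Visit every active application: cost grows with N."""
        return self._allocate(list(self.pending), free)

    def heartbeat_scan_requesting(self, free):
        """Visit only applications with pending requests: cost grows with M."""
        return self._allocate(list(self.requesting), free)


a, b = Scheduler(1000), Scheduler(1000)
for s in (a, b):
    s.request(7, 2)
    s.request(42, 1)

granted_all, visited_all = a.heartbeat_scan_all(free=4)
granted_req, visited_req = b.heartbeat_scan_requesting(free=4)
```

Both strategies grant the same containers; only the number of applications inspected per heartbeat differs.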
[Figure: Cluster Utilisation (0 to 1) against Number of Node Managers (1000 to 19000) for Hadoop (fix), Hadoop (OFF), and Hadoop (INFO)]
*Experiments based on workload from YARN paper at SOCC’13 using our own distributed benchmarking tool.
Hops Distribution (2.7.3)
Platform: On-Premise, GCE, AWS
Storage: HopsFS
Resource Manager: HopsYARN
Processing: Logstash, Tensorflow, Spark, Flink, Kafka
Hopsworks: Elasticsearch, Kibana, Zeppelin
Hadoop Distributions Simplify Things
Install / Upgrade: Cloudera Mgr, Karamel/Chef, Ambari
MR, Tensorflow, Spark, Flink, Kafka
YARN
HDFS
On-Premise
Future of HopsFS
Hive Metastore is Moving in with HopsFS
[Diagram: HopsFS and the Hive MetaStore]
Result: Strongly Consistent Hive Metadata
Removing the HDFS backing directory removes the table from the Hive Metastore.
Small Files in Hadoop
•In both Spotify and Yahoo 20% of the files are <= 4 KB
*Niazi et al, Size Matters: Improving the Performance of Small Files in HDFS, Poster at Eurosys 2017
Small Files in HopsFS*
inode_id varbinary (on-disk column)
32123432 [File contents go here]
•In HopsFS, we can store small files co-located with the metadata in MySQL Cluster as on-disk data.
30 namenodes/datanodes and 6 NDB nodes were used. Small file size was 4 KB. HopsFS files were stored on Intel 750 Series SSDs.
HopsFS Small Files Performance (Early Results)
Multi-Data-Center HopsFS
• Multi-Master Replication of Metadata with Conflict Detection/Resolution.
[Diagram: two data centers, Hops-eu-west1 and Hops-eu-west2, each with NDB, NameNodes (NN), and DataNodes (DN); a Client; a Network Partition Identification Service]
Synchronous Replication of Blocks
Asynchronous Replication of Metadata (~2000 ms delay)
Summary
•Hops is the only European distribution of Hadoop
- More scalable, tinker-friendly, and open-source.
•HopsFS has made a quantum leap in HDFS performance
•HopsFS opens up new possibilities for building data processing frameworks with support for small files, free-text search of the namespace, and extensible strongly consistent metadata.
The Hops Team
Active:
Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Roberto Bampi, Fabio Buso, Fanti Machmount Al Samisti, Braulio Grana, Zahin Azher Rashid, Robin Andersson, ArunaKumari Yedurupaka, Tobias Johansson, August Bonds, Filotas Siskos.
Alumni:
Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
Hops Heads
Scalable Benchmarker for YARN
[Diagram labels: Resource manager, Lead simulator, simulator, Start, start, Heartbeats (nodes and apps), Container allocations, stop, results]

 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 

Kürzlich hochgeladen (20)

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 

Breaking the 1 Million Ops/Sec Barrier in Hops Hadoop

  • 1. HopsFS – Breaking 1 million ops/sec barrier in Hadoop. Dr Jim Dowling: Associate Prof @ KTH, Senior Researcher @ SICS, CEO at Logical Clocks AB. www.hops.io, @hopshadoop
  • 2. Evolution of Hadoop: 2009 to 2017. (Slide footer: 2017-04-05 HopsFS - Breaking 1 million ops/s Barrier, J Dowling, Nov 2016)
  • 3. Evolution of Hadoop: 2009 to 2017 (?). Tiny Brain (NameNode, ResourceMgr), Huge Body (DataNodes).
  • 4.
  • 5. HDFS Scalability Bottleneck: the NameNode. Limited namespace/metadata: JVM heap (~200 GB). Limited concurrency: single global namespace lock (single writer, multiple readers). Diagram labels: HDFS Client, HDFS DataNode, NameNode.
  • 6. HopsFS. (1) Scale-out metadata: metadata in an in-memory distributed database; multiple stateless NameNodes. (2) Remove the global namespace lock: supports multiple concurrent read and write operations.
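The contrast between a single global namespace lock and finer-grained locking can be sketched in Java. This is a minimal illustration under stated assumptions: the class names `GlobalLockNamespace` and `PerPathLockNamespace` are hypothetical, in-memory maps stand in for the namespace, and neither class reflects actual HDFS or HopsFS code. According to the slides, HopsFS keeps its metadata in a database with row-level locking rather than in JVM data structures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a single-writer, multiple-reader global lock.
class GlobalLockNamespace {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Long> sizes = new HashMap<>();

    void setSize(String path, long size) {
        lock.writeLock().lock();   // excludes every other reader and writer
        try {
            sizes.put(path, size);
        } finally {
            lock.writeLock().unlock();
        }
    }

    Long getSize(String path) {
        lock.readLock().lock();    // readers only share the lock with other readers
        try {
            return sizes.get(path);
        } finally {
            lock.readLock().unlock();
        }
    }
}

// Hypothetical model of fine-grained locking: one lock per path.
class PerPathLockNamespace {
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();
    private final Map<String, Long> sizes = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String path) {
        return locks.computeIfAbsent(path, p -> new ReentrantReadWriteLock());
    }

    void setSize(String path, long size) {
        ReentrantReadWriteLock l = lockFor(path);
        l.writeLock().lock();      // only operations on the same path wait
        try {
            sizes.put(path, size);
        } finally {
            l.writeLock().unlock();
        }
    }

    Long getSize(String path) {
        ReentrantReadWriteLock l = lockFor(path);
        l.readLock().lock();
        try {
            return sizes.get(path);
        } finally {
            l.readLock().unlock();
        }
    }
}
```

With the global lock, a write to /user/F1 stalls a concurrent write to /user/F2. With per-path locks the two writes proceed independently, which is the property the slide attributes to removing the global namespace lock.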
  • 8. MySQL Cluster: Network Database Engine (NDB). Open-source, distributed, in-memory database: scales to 48 database nodes; 200 million NoSQL read ops/sec*. NewSQL (relational) DB: read committed transactions; row-level locking; user-defined partitioning; efficient cross-partition transactions. Licensing diagram: NameNode (Apache v2), DAL API (Apache v2), NDB-DAL-Impl (GPL v2), Other DB (Other License); jars: hops-2.7.3.jar, ndb-2.7.3-7.5.6.jar. *https://www.mysql.com/why-mysql/benchmarks/mysql-cluster/
  • 9. HopsFS Metadata and Metadata Operations. Example namespace: / contains user; user contains F1, F2, F3.
  • 10. HopsFS Metadata & Metadata Partitioning. Tables: INode Table (Inode_ID, Name, Parent_ID, ...), Block Table (Block_ID, Inode_ID, ...), Replica Table (Inode_ID, Block_ID, DataNode_ID, ...). Example namespace: /, user, F1, F2, F3. INode Table callout: Inode ID, Parent INode ID, Name, Size, Access Attributes, ...
  • 11. HopsFS Metadata & Metadata Partitioning (same tables and namespace). Block Table callout: File INode to Blocks Mapping, Block Size, ...
  • 12. HopsFS Metadata & Metadata Partitioning (same tables and namespace). Replica Table callout: Location of blocks on Datanodes, ...
  • 13. HopsFS Metadata & Metadata Partitioning. Example operation: $> ls /user/*
      INode Table (Inode_ID, Name, Parent_ID): (1, /, 0), (2, user, 1), (3, F1, 2), (4, F2, 2), (5, F3, 2), ...
      Block Table (Inode_ID, Block_ID): (3, 1), (3, 2), (3, 3), ...
      Replica Table (Inode_ID, Block_ID, DataNode_ID): (3, 1, 1), (3, 1, 2), (3, 1, 3), (3, 2, 4), (3, 2, 5), ...
      MySQL Cluster, Partitions 1 to 4: / | user | F1, F2, F3 | [{3,1},{3,2},{3,3}],[{3,1,1},{3,1,2},{3,1,3},{3,2,4} …{3,3,9}]
  • 14. HopsFS Metadata & Metadata Partitioning. Example operation: $> cat /user/F1 (same tables and partition layout as slide 13).
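The partition layout on slides 13 and 14 suggests that sibling inodes land in one partition and that a file's blocks and replicas land together in another. A minimal sketch of that idea, assuming inodes are partitioned by Parent_ID and blocks and replicas by Inode_ID; the partition count, the hash function, and all names here are illustrative assumptions, not NDB's actual partitioning function:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of key-based partitioning for the example tables.
class PartitioningSketch {
    static final int PARTITIONS = 4; // assumed: matches the four partitions drawn on the slide

    static int partitionOf(int key) {
        return Math.floorMod(Integer.hashCode(key), PARTITIONS);
    }

    // Assumption: INode rows are partitioned by Parent_ID.
    static int inodePartition(int parentId) {
        return partitionOf(parentId);
    }

    // Assumption: Block and Replica rows are partitioned by Inode_ID.
    static int blockPartition(int inodeId) {
        return partitionOf(inodeId);
    }

    static int replicaPartition(int inodeId) {
        return partitionOf(inodeId);
    }

    // Partitions that a directory listing of `parentId` has to read.
    // Each row of `inodes` is {Inode_ID, Parent_ID}.
    static Set<Integer> partitionsTouchedByListing(int[][] inodes, int parentId) {
        Set<Integer> touched = new TreeSet<>();
        for (int[] inode : inodes) {
            if (inode[1] == parentId) {
                touched.add(inodePartition(inode[1]));
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        // Rows from the slide: /, user, F1, F2, F3.
        int[][] inodes = {{1, 0}, {2, 1}, {3, 2}, {4, 2}, {5, 2}};
        System.out.println("ls /user/* touches partitions: "
                + partitionsTouchedByListing(inodes, 2));
        System.out.println("cat /user/F1 reads blocks from partition "
                + blockPartition(3) + " and replicas from partition " + replicaPartition(3));
    }
}
```

Under these assumptions, `ls /user/*` reads a single partition because F1, F2, and F3 share Parent_ID 2, and `cat /user/F1` finds the block and replica rows for inode 3 in one partition.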
  • 15. Leader Election using NDB*. The leader NN coordinates replication/lease management. NDB serves as shared memory for the election of the leader NN. ZooKeeper not needed! *Niazi, Berthou, Ismail, Dowling, "Leader Election in a NewSQL Database", DAIS 2015
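One way to picture "NDB as shared memory" for leader election is a shared table of heartbeat counters. The sketch below is a simplified, hypothetical model and not the protocol from the cited DAIS 2015 paper: an in-memory map stands in for the database table, `synchronized` stands in for a transaction, and the rule "smallest ID among nodes whose counter advanced" is an assumption chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: leader election over a shared table of heartbeat counters.
class LeaderElectionSketch {
    // nodeId -> heartbeat counter, as it might be stored in a shared table
    private final Map<Integer, Long> counters = new HashMap<>();
    // counters observed during the previous election round
    private final Map<Integer, Long> lastSeen = new HashMap<>();

    // A NameNode announces that it is alive by bumping its counter.
    synchronized void heartbeat(int nodeId) {
        counters.merge(nodeId, 1L, Long::sum);
    }

    // One election round: a node counts as alive if it is new or its counter
    // advanced since the previous round; the smallest alive ID becomes leader.
    // Returns -1 if no node is alive.
    synchronized int electLeader() {
        int leader = -1;
        for (Map.Entry<Integer, Long> e : counters.entrySet()) {
            Long previous = lastSeen.get(e.getKey());
            boolean alive = previous == null || e.getValue() > previous;
            if (alive && (leader == -1 || e.getKey() < leader)) {
                leader = e.getKey();
            }
        }
        lastSeen.clear();
        lastSeen.putAll(counters);
        return leader;
    }
}
```

Because every NameNode reads and writes the same table, no separate coordination service is involved in this model, which is the point the slide makes about ZooKeeper.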
  • 18. Metadata Locking (contd.). Legend: Exclusive Lock, Shared Lock, Subtree Lock.
  • 19. Performance Evaluation for HopsFS. On premise: up to 72 servers; dual Intel® Xeon® E5-2620 v3 @ 2.40 GHz; 256 GB RAM, 4 TB disks. Network: 10 GbE, 0.1 ms ping latency.
  • 21. HopsFS: Higher Throughput with Same Hardware. HopsFS outperforms HDFS on equivalent hardware. HA-HDFS setup with five servers: 1 active NameNode, 1 standby NameNode, and 3 servers running Journal Nodes and ZooKeeper nodes.
  • 25. Evaluation: Spotify Workload (contd.). 16X the performance of HDFS. Further scaling is possible with more hardware.
  • 26. Write Intensive workloads. Scalability of HopsFS and HDFS for write-intensive workloads:
      Workload | HopsFS ops/sec | HDFS ops/sec | Scaling Factor
      Synthetic Workload (5.0% File Writes) | 1.19 M | 53.6 K | 22
      Synthetic Workload (10% File Writes) | 1.04 M | 35.2 K | 30
      Synthetic Workload (20% File Writes) | 0.748 M | 19.9 K | 37
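The Scaling Factor column appears to be the ratio of the two throughput columns. A small sketch of that arithmetic, using only the numbers on the slide; the class and method names are illustrative, and reading the column as HopsFS ops/sec divided by HDFS ops/sec is an assumption:

```java
// Hypothetical helper: recompute the scaling factors from the slide's throughput numbers.
class ScalingFactorSketch {
    static double scalingFactor(double hopsFsOpsPerSec, double hdfsOpsPerSec) {
        return hopsFsOpsPerSec / hdfsOpsPerSec;
    }

    public static void main(String[] args) {
        // {HopsFS ops/sec, HDFS ops/sec} for 5%, 10%, and 20% file writes.
        double[][] rows = {{1.19e6, 53.6e3}, {1.04e6, 35.2e3}, {0.748e6, 19.9e3}};
        for (double[] row : rows) {
            System.out.printf("%.1f%n", scalingFactor(row[0], row[1]));
        }
    }
}
```

The computed ratios come out at roughly 22.2, 29.5, and 37.6, close to the 22, 30, and 37 shown on the slide. They also show the trend the table is making: the gap widens as the share of writes grows.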
  • 28. Metadata Scalability. 37 times more files than HDFS.
  • 29. Operational Latency (File System Clients):
      No of Clients | HopsFS Latency | HDFS Latency
      50 | 3.0 | 3.1
  • 30. Operational Latency (File System Clients):
      No of Clients | HopsFS Latency | HDFS Latency
      50 | 3.0 | 3.1
      1500 | 3.7 | 15.5
  • 31. Operational Latency (File System Clients):
      No of Clients | HopsFS Latency | HDFS Latency
      50 | 3.0 | 3.1
      1500 | 3.7 | 15.5
      6500 | 6.8 | 67.4
  • 33. Erasure Coding with Data Locality. Reed-Solomon (140%).
  • 34. ZFS with HopsFS. RAID-0; 10 Gb/s; ~350 MB/s reads, ~250 MB/s writes. RAID-5 + HopsFS Erasure Coding; ~500 MB/s reads, ~350 MB/s writes. Labels: Archive files; Triple-replicated files.
  • 35. Elasticsearch: Strong Eventually Consistent Metadata. Diagram labels: Database, Kafka, Epipe, Hive Metastore. Changelog for HDFS Namespace. Free-text search for files/dirs in the HopsFS namespace.
  • 36. Extending Metadata in HopsFS. Metadata API (HopsFS->Elasticsearch): public void attachMetadata(Json obj, String pathToFileorDir); public void removeMetadata(String name, String pathToFileorDir). Design your own tables: use foreign keys for metadata integrity; transactions ensure metadata consistency.
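To make the two signatures on slide 36 concrete, here is a hypothetical in-memory model of how a caller might attach and remove metadata. Only the method names and parameter lists come from the slide. The `Json` class below is a stand-in (the slide does not define that type), and `InMemoryMetadataStore` and `hasMetadata` are invented for illustration; the real API writes to the database and is propagated to Elasticsearch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the slide's `Json` type: a named JSON document.
class Json {
    final String name;
    final String body;

    Json(String name, String body) {
        this.name = name;
        this.body = body;
    }
}

// Hypothetical in-memory model of the two signatures shown on the slide.
class InMemoryMetadataStore {
    // path -> (metadata name -> metadata document)
    private final Map<String, Map<String, Json>> byPath = new HashMap<>();

    public void attachMetadata(Json obj, String pathToFileorDir) {
        byPath.computeIfAbsent(pathToFileorDir, p -> new HashMap<>()).put(obj.name, obj);
    }

    public void removeMetadata(String name, String pathToFileorDir) {
        Map<String, Json> entries = byPath.get(pathToFileorDir);
        if (entries != null) {
            entries.remove(name);
            if (entries.isEmpty()) {
                byPath.remove(pathToFileorDir);
            }
        }
    }

    // Illustration-only helper, not part of the slide's API.
    public boolean hasMetadata(String name, String pathToFileorDir) {
        Map<String, Json> entries = byPath.get(pathToFileorDir);
        return entries != null && entries.containsKey(name);
    }
}
```

A caller would attach, for example, `new Json("owner-team", "{\"team\":\"genomics\"}")` to `/user/F1` and later remove it by name. In the design the slide describes, foreign keys and transactions would keep such rows consistent with the inode they describe.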
  • 37. HopsYARN
  • 38. Hops scalability now limited by YARN. YARN scheduler (triggered on node heartbeats)*: scheduling decisions cost O(N), where N is the number of active applications. We reduced the cost to O(M), where M is the number of applications currently requesting resources. Typically M << N. [Figure: Cluster Utilisation (0 to 1) vs. Number of Node Managers (1000 to 19000); series: Hadoop (fix), Hadoop (OFF), Hadoop (INFO).] *Experiments based on the workload from the YARN paper at SoCC'13, using our own distributed benchmarking tool.
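The O(N) to O(M) reduction can be illustrated by keeping a separate set of applications that currently have outstanding requests, so that a heartbeat only walks that set. This is a sketch under assumptions: the class and method names are hypothetical, and the code is not taken from YARN or Hops.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical result of handling one node heartbeat.
class HeartbeatResult {
    final int appsExamined;
    final List<String> allocatedTo;

    HeartbeatResult(int appsExamined, List<String> allocatedTo) {
        this.appsExamined = appsExamined;
        this.allocatedTo = allocatedTo;
    }
}

// Hypothetical scheduler model comparing an O(N) scan with an O(M) scan.
class PendingAppScheduler {
    private final Map<String, Integer> pending = new LinkedHashMap<>(); // all N active apps
    private final Set<String> requesting = new LinkedHashSet<>();       // the M apps with pending > 0

    void register(String appId) {
        pending.putIfAbsent(appId, 0);
    }

    void request(String appId, int containers) {
        int total = pending.merge(appId, containers, Integer::sum);
        if (total > 0) {
            requesting.add(appId);
        }
    }

    // O(N): visit every active application, whether or not it wants resources.
    HeartbeatResult scanAll(int freeContainers) {
        int examined = 0;
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            examined++;
            if (out.size() < freeContainers && e.getValue() > 0) {
                out.add(e.getKey());
            }
        }
        return new HeartbeatResult(examined, out);
    }

    // O(M): visit only the applications that currently request resources.
    HeartbeatResult scanRequesting(int freeContainers) {
        int examined = 0;
        List<String> out = new ArrayList<>();
        for (String appId : requesting) {
            examined++;
            if (out.size() < freeContainers) {
                out.add(appId);
            }
        }
        return new HeartbeatResult(examined, out);
    }
}
```

With 1000 registered applications of which 3 have pending requests, `scanAll` examines 1000 entries per heartbeat while `scanRequesting` examines 3, and both pick the same applications.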
  • 39. Hops Distribution (2.7.3). Platform: On-Premise, GCE, AWS. Storage: HopsFS. Resource Manager: HopsYARN. Processing: Logstash, Tensorflow, Spark, Flink, Kafka. Also shown: Hopsworks, Elasticsearch, Kibana, Zeppelin.
  • 40. Hadoop Distributions Simplify Things. Install / Upgrade: Cloudera Mgr, Karamel/Chef, Ambari. Stack shown: On-Premise, HDFS, YARN, MR, Tensorflow, Spark, Flink, Kafka.
  • 41. Future of HopsFS
  • 42. Hive Metastore is Moving in with HopsFS. Diagram labels: HopsFS, Hive MetaStore.
  • 43. Hive Metastore is Moving in with HopsFS. Diagram labels: HopsFS, Hive MetaStore, Hive MetaStore.
  • 44. Result: Strongly Consistent Hive Metadata. Removing the HDFS backing directory removes the table from the Hive Metastore (diagram steps 1 to 3).
  • 45. Small Files in Hadoop. In both Spotify and Yahoo, 20% of the files are <= 4 KB.
  • 46. Small Files in HopsFS*. In HopsFS, we can store small files co-located with the metadata in MySQL Cluster as on-disk data. Example row: inode_id = 32123432; varbinary (on-disk column) = [File contents go here]. *Niazi et al, Size Matters: Improving the Performance of Small Files in HDFS, Poster at Eurosys 2017
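The idea of co-locating small files with metadata can be sketched as a size check on the write path. This is a hypothetical model: the 4 KB threshold is an assumption borrowed from the small-file size quoted elsewhere in the deck, the maps stand in for the (inode_id, varbinary) table and for block storage on DataNodes, and none of the names come from HopsFS.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: route small files to an inline table, large files to block storage.
class SmallFileStoreSketch {
    static final int SMALL_FILE_MAX_BYTES = 4 * 1024; // assumed threshold

    // Stands in for the (inode_id, varbinary) on-disk table in the database.
    private final Map<Long, byte[]> inlineData = new HashMap<>();
    // Stands in for files whose contents go to DataNodes as blocks; only the size is kept here.
    private final Map<Long, Integer> blockStoredSizes = new HashMap<>();

    // Returns true if the file contents were stored inline with the metadata.
    boolean write(long inodeId, byte[] contents) {
        if (contents.length <= SMALL_FILE_MAX_BYTES) {
            inlineData.put(inodeId, contents.clone());
            return true;
        }
        blockStoredSizes.put(inodeId, contents.length);
        return false;
    }

    // Returns a copy of the inline contents, or null if the file is not stored inline.
    byte[] readInline(long inodeId) {
        byte[] data = inlineData.get(inodeId);
        return data == null ? null : data.clone();
    }
}
```

In this model a read of a small file is a single lookup by inode_id, with no separate round trip to a DataNode.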
  • 47. HopsFS Small Files Performance (Early Results). 30 namenodes/datanodes and 6 NDB nodes were used. Small file size was 4 KB. HopsFS files were stored on Intel 750 Series SSDs.
  • 48. Multi-Data-Center HopsFS. Multi-master replication of metadata with conflict detection/resolution. Diagram: two clusters, Hops-eu-west1 and Hops-eu-west2, each with NDB, NameNodes (NN), and DataNodes (DN); Client; synchronous replication of blocks; asynchronous replication of metadata (~2000 ms delay); Network Partition Identification Service.
  • 49. Summary. Hops is the only European distribution of Hadoop: more scalable, tinker-friendly, and open-source. HopsFS has made a quantum leap in performance for HDFS. HopsFS opens up new possibilities for building data processing frameworks with support for small files, free-text search of the namespace, and extensible, strongly consistent metadata.
  • 50. The Hops Team (Hops Heads). Active: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Roberto Bampi, Fabio Buso, Fanti Machmount Al Samisti, Braulio Grana, Zahin Azher Rashid, Robin Andersson, ArunaKumari Yedurupaka, Tobias Johansson, August Bonds, Filotas Siskos. Alumni: Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
  • 51. Scalable Benchmarker for YARN. Diagram components: Resource manager, Lead simulator, simulator. Messages: start, heartbeats (nodes and apps), container allocations, stop, results.