In this talk, we present Hops, a new distribution of Hadoop that scales the Hadoop Filesystem (HDFS) by 16X, from 70K ops/s to 1.2 million ops/s, on Spotify's industrial Hadoop workload. Hops is an open-source distribution of Apache Hadoop that supports distributed metadata for HDFS (HopsFS) and the ResourceManager in Apache YARN. HopsFS is the first production-grade distributed hierarchical filesystem to store its metadata normalized in an in-memory, shared-nothing database. For YARN, we will discuss optimizations that enable 2X throughput increases for the Capacity scheduler, enabling scalability to clusters with >20K nodes. We will describe the journey to this milestone, including some of the challenges involved in efficiently and safely mapping hierarchical filesystem metadata state and operations onto a shared-nothing, in-memory database. We will also discuss the key database features needed for extreme scaling, such as multi-partition transactions, partition-pruned index scans, distribution-aware transactions, and the streaming changelog API. Hops (www.hops.io) is Apache-licensed open source and supports a pluggable database backend for distributed metadata, although it currently only supports MySQL Cluster as a backend. Hops opens up new directions for Hadoop when metadata is available for tinkering in a mature relational database.
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
1. HopsFS – Breaking 1 million ops/sec barrier in Hadoop
Dr Jim Dowling
Associate Prof @ KTH
Senior Researcher @ SICS
CEO at Logical Clocks AB
www.hops.io
@hopshadoop
15. Leader Election using NDB*
•Leader NN coordinates replication/lease mgmt
- NDB as shared memory for Election of Leader NN.
• Zookeeper not needed!
*Niazi, Berthou, Ismail, Dowling, "Leader Election in a NewSQL Database", DAIS 2015
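As a rough illustration of using a shared database as the coordination medium, the sketch below simulates leader election over one shared table. It is a simplified heartbeat-counter scheme, not the exact protocol from the DAIS 2015 paper; a Python dict stands in for NDB, and all names (`SharedTable`, `heartbeat`, `leader`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedTable:
    """Stand-in for a transactional table in NDB: nn_id -> heartbeat counter."""
    counters: dict = field(default_factory=dict)

    def heartbeat(self, nn_id: int) -> None:
        # In a real database this would be one read-modify-write transaction.
        self.counters[nn_id] = self.counters.get(nn_id, 0) + 1

    def snapshot(self) -> dict:
        return dict(self.counters)


def live_nodes(previous: dict, current: dict) -> set:
    """A namenode counts as alive if its counter advanced since the last snapshot."""
    return {nn for nn, c in current.items() if c > previous.get(nn, 0)}


def leader(previous: dict, current: dict):
    """Pick the live namenode with the smallest id, or None if none is alive."""
    alive = live_nodes(previous, current)
    return min(alive) if alive else None


table = SharedTable()
for nn in (1, 2, 3):
    table.heartbeat(nn)
round1 = table.snapshot()

# Namenode 1 stops heartbeating (assumed crash); 2 and 3 continue.
for nn in (2, 3):
    table.heartbeat(nn)
round2 = table.snapshot()

first_leader = leader({}, round1)
second_leader = leader(round1, round2)
print(first_leader, second_leader)
```

Because every namenode reads the same table, they all reach the same answer without a separate coordination service, which is the point the slide makes about ZooKeeper.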
35. Strong Eventually Consistent Metadata
[Diagram: Database, Epipe, Kafka, Elasticsearch, Hive Metastore]
Changelog for HDFS Namespace
Free-Text Search for Files/Dirs in the HopsFS Namespace
36. Extending Metadata in HopsFS
Metadata API (HopsFS->Elasticsearch)
public void attachMetadata(Json obj, String pathToFileorDir)
public void removeMetadata(String name, String pathToFileorDir)
•Design your own tables
- Use foreign keys for metadata integrity
- Transactions ensure metadata consistency
2017-04-05
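To make the "design your own tables" idea concrete, here is a minimal sketch of an extended-metadata table tied to inodes by a foreign key. It uses Python's built-in sqlite3 as a stand-in for MySQL Cluster; the schema, table names, and the Python helpers `attach_metadata` and `remove_metadata` are hypothetical and only loosely mirror the Java signatures on this slide.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE inodes (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
    CREATE TABLE extended_metadata (
        inode_id INTEGER NOT NULL REFERENCES inodes(id) ON DELETE CASCADE,
        name     TEXT NOT NULL,
        value    TEXT NOT NULL,
        PRIMARY KEY (inode_id, name)
    );
""")


def attach_metadata(obj: dict, path: str) -> None:
    """Attach each key of obj to the inode at path, all in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute("SELECT id FROM inodes WHERE path = ?", (path,)).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        for name, value in obj.items():
            conn.execute(
                "INSERT OR REPLACE INTO extended_metadata VALUES (?, ?, ?)",
                (row[0], name, json.dumps(value)),
            )


def remove_metadata(name: str, path: str) -> None:
    """Remove one named metadata entry from the inode at path."""
    with conn:
        conn.execute(
            "DELETE FROM extended_metadata WHERE name = ? AND inode_id = "
            "(SELECT id FROM inodes WHERE path = ?)",
            (name, path),
        )


def count_metadata() -> int:
    return conn.execute("SELECT COUNT(*) FROM extended_metadata").fetchone()[0]


with conn:
    conn.execute("INSERT INTO inodes (id, path) VALUES (1, '/data/songs.parquet')")
attach_metadata({"owner": "analytics", "retention_days": 30}, "/data/songs.parquet")
after_attach = count_metadata()
remove_metadata("owner", "/data/songs.parquet")
after_remove = count_metadata()
with conn:
    conn.execute("DELETE FROM inodes WHERE id = 1")  # cascades to extended_metadata
after_delete = count_metadata()
```

The `ON DELETE CASCADE` clause is what keeps the extra metadata from outliving the file it describes; deleting the inode row removes its metadata in the same transaction.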
38. Hops scalability now limited by YARN
•YARN scheduler (triggered on node heartbeats)*
- Scheduling decisions cost O(N), where N is the number of active Applications
- We reduced the cost to O(M), where M is the number of applications currently
requesting resources. Typically M << N.
[Figure: cluster utilisation (0 to 1) versus number of Node Managers (1000 to 19000) for Hadoop (fix), Hadoop (OFF), and Hadoop (INFO).]
*Experiments based on workload from YARN paper at SOCC’13 using our own distributed benchmarking tool.
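The reduction from O(N) to O(M) described on this slide can be illustrated with a toy scheduler that tracks which applications have outstanding requests, so that a node heartbeat visits only those. This is a sketch of the general idea, not the actual Capacity scheduler change; the class and method names are hypothetical.

```python
class ToyScheduler:
    def __init__(self):
        self.active_apps = set()  # all N running applications
        self.pending = {}         # app_id -> containers still requested (M entries)
        self.visited = 0          # instrumentation: apps examined during heartbeats

    def add_app(self, app_id):
        self.active_apps.add(app_id)

    def request(self, app_id, containers):
        self.pending[app_id] = self.pending.get(app_id, 0) + containers

    def on_node_heartbeat(self, free_slots):
        """Hand out free slots, visiting only apps that have pending requests."""
        allocations = []
        for app_id in list(self.pending):
            if free_slots == 0:
                break
            self.visited += 1
            granted = min(free_slots, self.pending[app_id])
            free_slots -= granted
            allocations.append((app_id, granted))
            if self.pending[app_id] == granted:
                del self.pending[app_id]  # fully satisfied: drop from the index
            else:
                self.pending[app_id] -= granted
        return allocations


sched = ToyScheduler()
for i in range(1000):
    sched.add_app(f"app-{i}")
sched.request("app-7", 2)
sched.request("app-42", 1)
allocs = sched.on_node_heartbeat(free_slots=4)
print(allocs, sched.visited)
```

With 1000 active applications and only two of them requesting containers, the heartbeat examines two entries rather than a thousand.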
42. Hive Metastore is Moving in with HopsFS
HopsFS - Breaking 1 million ops/s Barrier, J Dowling, Nov 2016
[Diagram labels: HopsFS, Hive MetaStore]
43. Hive Metastore is Moving in with HopsFS
[Diagram labels: HopsFS, Hive MetaStore, Hive MetaStore]
44. Result: Strongly Consistent Hive Metadata
Removing the HDFS backing directory removes the table from the Hive Metastore.
45. Small Files in Hadoop
•At both Spotify and Yahoo, 20% of the files are <= 4 KB
46. Small Files in HopsFS*
•In HopsFS, we can store small files co-located with the
metadata in MySQL Cluster as on-disk data.
inode_id    varbinary (on-disk column)
32123432    [File contents go here]
*Niazi et al., "Size Matters: Improving the Performance of Small Files in HDFS", poster at EuroSys 2017
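A sketch of the small-file idea: files at or below a size threshold are written into a binary column keyed by inode id, next to the metadata, while larger files go to block storage. Here sqlite3 stands in for MySQL Cluster and a dict stands in for the datanodes. The 4 KB threshold is borrowed from the file-size figure quoted in this talk, not from any HopsFS configuration, and all names are hypothetical.

```python
import sqlite3

SMALL_FILE_MAX_BYTES = 4 * 1024  # assumed threshold, for illustration only

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE small_files (inode_id INTEGER PRIMARY KEY, data BLOB NOT NULL)")
block_store = {}  # stand-in for datanode block storage


def write_file(inode_id: int, data: bytes) -> str:
    """Store small files in the database, larger ones in block storage."""
    if len(data) <= SMALL_FILE_MAX_BYTES:
        with db:  # one transaction per write
            db.execute(
                "INSERT OR REPLACE INTO small_files VALUES (?, ?)", (inode_id, data)
            )
        return "database"
    block_store[inode_id] = data
    return "blocks"


def read_file(inode_id: int) -> bytes:
    """Look in the database first, then fall back to block storage."""
    row = db.execute(
        "SELECT data FROM small_files WHERE inode_id = ?", (inode_id,)
    ).fetchone()
    if row is not None:
        return bytes(row[0])
    return block_store[inode_id]


where_small = write_file(32123432, b"x" * 1024)
where_large = write_file(32123433, b"y" * (64 * 1024))
print(where_small, where_large)
```

In this arrangement a small file is read with one database lookup and never touches block storage.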
47. HopsFS Small Files Performance (Early Results)
30 namenodes/datanodes and 6 NDB nodes were used. Small file size was 4 KB. HopsFS files were stored on Intel 750 Series SSDs.
48. Multi-Data-Center HopsFS
• Multi-Master Replication of Metadata with Conflict Detection/Resolution.
[Diagram: two data centers, Hops-eu-west1 and Hops-eu-west2, each showing NDB, NNs, and DNs, plus a Client and a Network Partition Identification Service. Labels: Synchronous Replication of Blocks; Asynchronous Replication of Metadata (~2000 ms delay).]
49. Summary
•Hops is the only European distribution of Hadoop
- More scalable, tinker-friendly, and open-source.
•HopsFS has made a quantum leap in HDFS performance:
16X, from 70K ops/s to 1.2 million ops/s on Spotify's workload
•HopsFS opens up new possibilities for building data
processing frameworks with support for small files,
free-text search of the namespace, and extensible
strongly consistent metadata.
50. The Hops Team
Active:
Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman
Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias
Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Roberto Bampi,
Fabio Buso, Fanti Machmount Al Samisti, Braulio Grana, Zahin Azher
Rashid, Robin Andersson, ArunaKumari Yedurupaka, Tobias
Johansson, August Bonds, Filotas Siskos.
Alumni:
Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram
Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto
Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro,
Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos
Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid
Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
Hops Heads
51. Scalable Benchmarker for YARN
[Diagram labels: Resource manager; Lead simulator; simulator; start; Start; Heartbeats (nodes and apps); Container allocations; stop; results]