Scaling HDFS at Xiaomi
Chen Zhang
Outline
• Introduction to Xiaomi
• Scenarios and challenges
• Improvements on HDFS federation
• Experience scaling up a single NameNode
• Efficient management of hundreds of clusters
About Xiaomi
World’s 4th largest smartphone maker
Sold 118 million phones in 2018
About Xiaomi
World’s largest consumer IoT platform
Over 150 million smart devices connected
Software and Internet Services
• MIUI
• App Market
• MiCloud
• MiPush
• News Feeds
• MiPay/Finance
• Ads
• Game
• Smart Home
• …
Scenarios
[Diagram: storage and compute stack with the components HDFS, HBase, EMQ, Yarn, Talos, FDS (S3), Spark, Hive and Impala]
Scenarios
[Diagram labels: MiCloud, MiPush, Feeds, User Profile, Talos, Ads]
Online Services
• 100+ independent clusters
• Low latency
• High availability
Offline Services (Hadoop)
• Several huge clusters
• High throughput
• High scalability, high availability
Data Growth
[Chart: Data Growth of the Largest Cluster, 2015 to 2018. Two series: file counts (in units of 10 million) and data size (PB). Data labels as extracted: 2, 23, 41, 71 and 3, 30, 60, 150. Y axis from 0 to 160.]
Challenges
• Challenges in late 2016
– Data growth is too fast
– Dependencies are too complex
– Code changes are almost impossible
What We Need
We need a huge single HDFS cluster
Improvements on HDFS Federation
• Problems of HDFS Federation in late 2016
– NameNodes are independent; metadata is not shared
– The MountTable is configured on the client side, which is hard to maintain
– The MountTable doesn’t support nested mount points
– ViewFileSystem is not compatible with DistributedFileSystem
– RBF (Router-Based Federation) was not yet stable or fully functional in late 2016
Improvements on HDFS Federation
[Diagram: Original HDFS Federation. viewfs sits in front of NameNodes NN-1 … NN-k … NN-n, which serve namespaces NS 1 … NS k … foreign NS n. Block pools Pool 1 … Pool k … Pool n share common storage on Datanode 1, Datanode 2 … Datanode m. Example namespace tree: / contains user, yarn and hive; user contains service1 and service2. Remaining labels: small dir1, small dir2, small service1, small service2.]
Improvements on HDFS Federation
[Diagram: the same federation layout (viewfs, NN-1 … NN-k … NN-n, NS 1 … NS k … foreign NS n, block pools on Datanode 1 … Datanode m), with an additional NS 0 served by NN-0. Namespace tree: / contains user, yarn and hive; user contains service1 and service2.]
• Support nested mount points
• hdfs:// -> FederatedDFSFileSystem, which extends DistributedFileSystem
• Add a default NameSpace
• Support rename across NameSpaces
• Compatible with hdfs://, so no code changes are needed
• Update the MountTable config from ZooKeeper
Nested Mount table and Default NameSpace
1. Xiaomi is not only a hardware company but also a fast-growing Internet company
2. There are more than 100 Internet services, and new businesses and services emerge quickly, built on our smart devices and more than 300 million users
3. It’s hard for us to use a fixed, pre-divided mount table
[Diagram: NameNodes NN-0, NN-1 … NN-k … NN-n; namespace tree: / contains user, yarn and hive; user contains service1 and service2]
Nested Mount table and Default NameSpace
1. At first, we divide the initial mount points by data volume and QPS. Only a dozen mount points need to be configured, for the largest services; all others fall into the default NameSpace
2. When new infrastructure services and Internet services emerge, the mount table doesn’t need any updates
3. HADOOP-13055 supports linkFallback, but our solution is more flexible
[Diagram: example new paths /some_new_nosql_service, /user/live_show_services and /user/short_video_services; namespaces NS 0, NS 1 … NS k … NS n]
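The nested mount table with a default NameSpace can be illustrated with a longest-prefix lookup. This is only a minimal sketch, not Xiaomi's FederatedDFSFileSystem code: the `MountTable` class, the namespace names and the nested `/user/service1/archive` entry are all hypothetical.

```python
from typing import Dict


class MountTable:
    """Longest-prefix mount table with a default namespace (illustrative only)."""

    def __init__(self, default_ns: str, mounts: Dict[str, str]) -> None:
        self.default_ns = default_ns
        # Normalize keys so "/hive/" and "/hive" are the same mount point.
        self.mounts = {self._norm(p): ns for p, ns in mounts.items()}

    @staticmethod
    def _norm(path: str) -> str:
        parts = [p for p in path.split("/") if p]
        return "/" + "/".join(parts)

    def resolve(self, path: str) -> str:
        """Return the namespace that owns `path`."""
        current = self._norm(path)
        # Walk from the full path up towards the root. The first hit is the
        # deepest mount point, which is what makes nested mount points work.
        while True:
            if current in self.mounts:
                return self.mounts[current]
            if current == "/":
                # No mount point matched: fall back to the default namespace.
                return self.default_ns
            current = current.rsplit("/", 1)[0] or "/"


# Hypothetical configuration: a few large services get their own namespace.
table = MountTable(
    default_ns="NS0",
    mounts={
        "/user/service1": "NS1",
        "/user/service1/archive": "NS2",  # nested under /user/service1
        "/hive": "NS3",
    },
)
```

With this table, a path that appears later, such as `/user/live_show_services`, resolves to `NS0` without any change to the mount table.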
Client Transparency
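• Set fs.hdfs.impl=FederatedDFSFileSystem; clients keep accessing paths such as hdfs://clustername/user/service1
• Clients fetch the mount table from ZooKeeper and watch it for changes
• An admin tool updates the mount table in ZooKeeper
[Diagram: ViewFileSystem and FederatedDFSFileSystem, with example mount points /user/service1 and /user/service2 and the labels "access" and "config"]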
Client Transparency
RPC integration
• listStatus
• getContentSummary
• setQuota/getQuota
Admin Tools
• refreshNodes
• setBalancerBandwidth
• DataNode decommission
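[Diagram: NameNodes NN-0, NN-1 … NN-k … NN-n serving NS 0, NS 1 … NS k … NS n; namespace tree: / contains user, yarn and hive; user contains service1 and service2; trash directory shown at /user/service1/.Trash/]
Trash optimization
• moveToTrash is a rename operation
• moveToTrash across NameNodes is very expensive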
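One way to read the `/user/service1/.Trash/` label is that the trash directory lives inside the mount point that owns the deleted path, so moveToTrash stays a rename within one NameNode. The sketch below illustrates that reading only; the helper names and the `.Trash/<user>/Current` layout are assumptions, not details from the talk.

```python
from typing import Iterable


def owning_mount_point(path: str, mount_points: Iterable[str]) -> str:
    """Return the deepest mount point that contains `path`, or "/" if none does."""
    best = "/"
    for mp in mount_points:
        mp = mp.rstrip("/") or "/"
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        if inside and len(mp) > len(best):
            best = mp
    return best


def trash_location(path: str, mount_points: Iterable[str], user: str) -> str:
    """Pick a trash path under the same mount point as `path` (assumed layout)."""
    mp = owning_mount_point(path, mount_points)
    relative = path[len(mp):].lstrip("/")
    root = mp.rstrip("/")
    return f"{root}/.Trash/{user}/Current/{relative}"
```

Because source and trash path resolve to the same mount point, the move never needs a cross-namespace rename.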
Rename Across NameSpaces
[Diagram: Client, namenode1 and namenode2, and datanode1 to datanode3 holding blockpool1 and blockpool2; labels: locked, hardlink, Link block]
Rename Across NameSpaces in Detail
Source Phase 1
1. Sanity Check.
• Existence
• Permission
• Can’t be a reserved directory
• Can’t be a symlink
• Not in an encryption zone
2. Serialize the inode tree and block information with Protobuf
• Name
• Permissions
• mtime/atime
• Replication factor
• Block locations
• Acl / Xattr / Quota …
Rename Across NameSpaces in Detail
Source Phase 1
3. Lock the directory
• Add a FederationRenameFeature that records the renameId and the source and destination paths
• With a FederationRenameFeature set, all subdirectories and files in this directory, and all inodes on the parent path, are not writable
4. Add a federation-rename record
5. Return the serialized data to client
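The locking rule in step 3 can be sketched as a path check. This is a simplified model under stated assumptions: `LockTable` and `_within` are hypothetical names, and the real feature is attached to inodes inside the NameNode rather than kept in a separate table.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FederationRenameFeature:
    rename_id: str
    src: str
    dst: str


def _within(path: str, ancestor: str) -> bool:
    """True if `path` equals `ancestor` or lies underneath it."""
    if ancestor == "/":
        return True
    return path == ancestor or path.startswith(ancestor + "/")


class LockTable:
    """Tracks directories locked by an in-flight federation rename."""

    def __init__(self) -> None:
        self._locked: Dict[str, FederationRenameFeature] = {}

    def lock(self, directory: str, feature: FederationRenameFeature) -> None:
        self._locked[directory] = feature

    def unlock(self, directory: str) -> None:
        self._locked.pop(directory, None)

    def is_writable(self, path: str) -> bool:
        for locked_dir in self._locked:
            if _within(path, locked_dir):  # the directory itself or anything below it
                return False
            if _within(locked_dir, path):  # an inode on the parent path
                return False
        return True
```

Siblings of the locked directory stay writable in this model; only the locked subtree and the inodes on its parent path are frozen.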
Rename Across NameSpaces in Detail
Dest Phase 1
1. Sanity Check
• Permission, quota, not in an encryption zone
2. Deserialize the inode tree and graft it onto the destination path
• Allocate an inode ID for each inode
• Allocate a block ID and a new generation stamp (GS) for each block
• Update ACLs and other features
Rename Across NameSpaces in Detail
Dest Phase 1
3. Lock the directory
• Also use FederationRenameFeature
4. Update quota count
5. Add a federation-rename record
6. Return a list of block information, including:
• srcBlockId, destBlockId, blockSize, srcGenStamp and destGenStamp for each block
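The block list returned in step 6 can be pictured as one record per block. A minimal sketch, assuming hypothetical ID and generation-stamp allocators modelled as plain iterators; the field names follow the slide, everything else is illustrative.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class SourceBlock:
    block_id: int
    size: int
    gen_stamp: int


@dataclass(frozen=True)
class BlockLink:
    src_block_id: int
    dest_block_id: int
    block_size: int
    src_gen_stamp: int
    dest_gen_stamp: int


def plan_block_links(blocks: Iterable[SourceBlock],
                     block_ids: Iterator[int],
                     gen_stamps: Iterator[int]) -> List[BlockLink]:
    """Allocate a destination block ID and GS for every source block."""
    return [
        BlockLink(b.block_id, next(block_ids), b.size, b.gen_stamp, next(gen_stamps))
        for b in blocks
    ]
```

The client would then hand these records to the DataNodes in the Link Block step.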
Rename Across NameSpaces in Detail
Link Block
1. For each DN, send the link requests in a batch
• Create each new block file by hardlink, one by one
• With a total operation timeout
2. Use a ThreadPoolExecutor
3. Count a block as complete if at least 2/3 of its replicas succeed
• A slow DN will not hold up overall progress
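The Link Block step can be sketched on a local filesystem, with one directory standing in for each DataNode. This is an illustration under assumptions, not the actual DataNode RPC: `link_batch`, `link_blocks` and the directory layout are hypothetical, and Python's ThreadPoolExecutor stands in for the Java one named on the slide.

```python
import os
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source block file name, destination block file name)


def link_batch(dn_dir: str, pairs: List[Pair]) -> List[str]:
    """Hard-link each pair on one DataNode, one by one; return the destinations that succeeded."""
    linked = []
    for src, dst in pairs:
        try:
            os.link(os.path.join(dn_dir, src), os.path.join(dn_dir, dst))
            linked.append(dst)
        except OSError:
            pass  # replica missing or link failed on this DataNode
    return linked


def link_blocks(batches: Dict[str, List[Pair]], timeout_s: float = 30.0,
                workers: int = 8) -> Dict[str, bool]:
    """Return {destination block: True if at least 2/3 of its replicas were linked}."""
    replicas: Dict[str, int] = {}
    for pairs in batches.values():
        for _, dst in pairs:
            replicas[dst] = replicas.get(dst, 0) + 1
    linked = {dst: 0 for dst in replicas}

    pool = ThreadPoolExecutor(max_workers=workers)
    try:
        futures = [pool.submit(link_batch, dn, pairs) for dn, pairs in batches.items()]
        done, _pending = wait(futures, timeout=timeout_s)  # total operation timeout
        for future in done:
            for dst in future.result():
                linked[dst] += 1
    finally:
        pool.shutdown(wait=False)  # do not block on DataNodes that are still running

    # (2n + 2) // 3 is ceil(2n / 3) in integer arithmetic.
    return {dst: linked[dst] >= (2 * n + 2) // 3 for dst, n in replicas.items()}
```

With three replicas per block, two successful hardlinks are enough, so one slow or failed DataNode does not hold up the rename.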
Rename Across NameSpaces in Detail
Source Phase 2
1. Delete the source directory/file
2. Delete all the inodes and blocks asynchronously
3. Remove the federation-rename record
Dest Phase 2
1. Remove the FederationRenameFeature to make the target directory visible
2. Remove the federation-rename record
Error Handling
Failed at        | How to handle                                               | Result
Source Phase 1   | Fail                                                        | Fail
Dest Phase 1     | Cancel Source Phase 1                                       | Fail
Link Block       | Request fails; NameNode Fixer will redo the remaining steps | Will eventually succeed
Source Phase 2   | Request fails; NameNode Fixer will redo the remaining steps | Will eventually succeed
Dest Phase 2     | Request fails; NameNode Fixer will redo the remaining steps | Will eventually succeed
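The table amounts to a simple rule: failures before Link Block abort the rename, and failures from Link Block onwards roll forward. The sketch below encodes that rule; the `Step` enum, the `Recovery` tuple and `on_failure` are illustrative names, not part of the described implementation.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Step(IntEnum):
    SOURCE_PHASE_1 = 1
    DEST_PHASE_1 = 2
    LINK_BLOCK = 3
    SOURCE_PHASE_2 = 4
    DEST_PHASE_2 = 5


class Recovery(NamedTuple):
    result: str                   # "fail" or "succeeds eventually"
    cancel_source_phase_1: bool   # undo the source-side lock and record
    redo: List[Step]              # steps the NameNode Fixer still has to run


def on_failure(failed_at: Step) -> Recovery:
    """Map the step that failed to the recovery described in the table."""
    if failed_at == Step.SOURCE_PHASE_1:
        return Recovery("fail", False, [])
    if failed_at == Step.DEST_PHASE_1:
        return Recovery("fail", True, [])
    # From Link Block onwards the rename rolls forward: redo the failed
    # step and every step after it.
    return Recovery("succeeds eventually", False, [s for s in Step if s >= failed_at])
```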
Error Handling
NameNode Failover and Restart
1. All operations are recorded in the edit log
2. FederationRenameFeature is serialized to the FsImage
3. Federation-rename records are not serialized to the FsImage; they are rebuilt during edit log replay or FsImage loading (if an inode has a FederationRenameFeature, a federation-rename record is added)
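Rebuilding the in-memory records from loaded inodes, as described in point 3, might look like the following. It is a sketch only: the `Inode` shape and the tuple layout of the feature are assumptions made for illustration.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Inode(NamedTuple):
    path: str
    # (rename_id, src, dst) when a FederationRenameFeature is present, else None
    rename_feature: Optional[Tuple[str, str, str]]


def rebuild_rename_records(inodes: Iterable[Inode]) -> Dict[str, Tuple[str, str]]:
    """Recreate federation-rename records from the inodes that carry the feature."""
    records: Dict[str, Tuple[str, str]] = {}
    for inode in inodes:
        if inode.rename_feature is not None:
            rename_id, src, dst = inode.rename_feature
            records[rename_id] = (src, dst)
    return records
```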
Scaling up NameNodes
Our Largest NameNode
1. 150 GB heap
2. Uses CMS GC
3. More than 500 million objects (240 million files and 260 million blocks)
4. More than 20,000 QPS
Scaling up NameNodes
Experience
• Throttle
– BlockReport / Incremental-BlockReport throttle
– Concurrent GetContentSummary throttle
• Lock optimization
• Config optimization
• Add more tracing information
Block Report Throttle
• Problem: Full GC at NameNode startup
– Thousands of DataNodes send block reports at almost the same time
– The NameNode can process only one block report at a time
• Solution: throttle the maximum number of concurrent block reports; extra reports are rejected and the DN retries later
[Diagram: many DNs sending block reports to one NameNode; label "60%"]
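The throttle can be sketched with a non-blocking semaphore: a report either gets a slot or is rejected straight away so the sender can retry. The class and function names and the "RETRY_LATER" reply are hypothetical, not the NameNode's real protocol.

```python
import threading
from typing import Callable


class BlockReportThrottle:
    """Admits at most `max_concurrent` block reports at once."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_begin(self) -> bool:
        # Non-blocking: a full NameNode rejects instead of queueing the report.
        return self._slots.acquire(blocking=False)

    def end(self) -> None:
        self._slots.release()


def handle_block_report(throttle: BlockReportThrottle,
                        process: Callable[[object], None],
                        report: object) -> str:
    """Process the report if a slot is free, otherwise ask the DN to retry."""
    if not throttle.try_begin():
        return "RETRY_LATER"
    try:
        process(report)
        return "OK"
    finally:
        throttle.end()
```

Rejecting early keeps pending reports off the NameNode heap, which is the point of the throttle during startup.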
Other optimization
• Lock optimization for expensive operations
– When processing a block report, release and re-acquire the lock for every storage
– When processing getContentSummary, release the lock every N files
• Config optimization
– More handlers
– Longer heartbeat interval
– Longer full block report interval
– Disable the retry cache and access-time updates
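The "release the lock every N files" idea for getContentSummary can be sketched as below. It is a simplified model: file sizes stand in for inodes, `yield_every` plays the role of N, and a plain mutex stands in for the FSNamesystem lock.

```python
import threading
from typing import Iterable, Tuple


def content_summary(file_sizes: Iterable[int], lock: threading.Lock,
                    yield_every: int = 1000) -> Tuple[int, int]:
    """Count files and bytes, giving up the lock every `yield_every` files."""
    total_files = 0
    total_bytes = 0
    lock.acquire()
    try:
        for size in file_sizes:
            total_files += 1
            total_bytes += size
            if total_files % yield_every == 0:
                lock.release()  # let queued operations run
                lock.acquire()  # then continue the traversal
    finally:
        lock.release()
    return total_files, total_bytes
```

The trade-off is that the namespace can change while the lock is released, so the result is no longer a consistent snapshot of a very large directory.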
More tracing information
• Record operations that hold the FSNamesystem lock for too long
• Monitor QPS on both the server side and the client side, and push the data to our internal monitoring system
• Record the failure reasons and statistics for block allocation failures
• Add logging for slow block report processing
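Recording operations that hold a lock for too long can be done with a small timing wrapper. A minimal sketch, assuming a hypothetical `traced_lock` helper and threshold; the real FSNamesystem lock is a Java read-write lock, which this does not model.

```python
import logging
import threading
import time
from contextlib import contextmanager
from typing import Iterator, List, Optional, Tuple

log = logging.getLogger("fsn-lock")


@contextmanager
def traced_lock(lock: threading.Lock, op_name: str, warn_after_s: float = 0.1,
                slow_ops: Optional[List[Tuple[str, float]]] = None) -> Iterator[None]:
    """Hold `lock` for the body and record the operation if it held the lock too long."""
    lock.acquire()
    start = time.monotonic()
    try:
        yield
    finally:
        held = time.monotonic() - start
        lock.release()
        if held > warn_after_s:
            log.warning("%s held the lock for %.3fs", op_name, held)
            if slow_ops is not None:
                slow_ops.append((op_name, held))
```

A caller would wrap each operation, for example `with traced_lock(lock, "getContentSummary"):`, and ship the collected `slow_ops` entries to a monitoring system.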
How We Efficiently Manage 100+ Clusters
• We use HBase heavily at Xiaomi
• Each datacenter runs 20~30 HBase clusters for sensitive services and businesses
• With the rapid growth of our global business, there are now more than 5 datacenters distributed around the world
• The total number of clusters is also growing very quickly, which makes them hard to maintain
How We Efficiently Manage 100+ Clusters
• Initially…
[Diagram: cluster-1, cluster-2, cluster-3 … cluster-n, each running its own Canary]
Efficiently manage 100+ clusters
[Diagram: the ClusterOne Monitor System runs Canary Tasks and Balancer Tasks against cluster-1, cluster-2, cluster-3 … cluster-n; other labels: ZooKeeper, NameService, metrics, generated configuration]
Q&A

Weitere ähnliche Inhalte

Was ist angesagt?

Running Analytics at the Speed of Your Business
Running Analytics at the Speed of Your BusinessRunning Analytics at the Speed of Your Business
Running Analytics at the Speed of Your BusinessRedis Labs
 
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...Michael Stack
 
HBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanHBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanMichael Stack
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Cloudera, Inc.
 
HBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiHBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiMichael Stack
 
Cisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with CassandraCisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with CassandraDataStax Academy
 
RedisConf17 - Redis Enterprise on IBM Power Systems
RedisConf17 - Redis Enterprise on IBM Power SystemsRedisConf17 - Redis Enterprise on IBM Power Systems
RedisConf17 - Redis Enterprise on IBM Power SystemsRedis Labs
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...DataStax
 
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Redis Labs
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureDataWorks Summit
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...Michael Stack
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeMichael Stack
 
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...Aerospike, Inc.
 
Building Scalable, Real Time Applications for Financial Services with DataStax
Building Scalable, Real Time Applications for Financial Services with DataStaxBuilding Scalable, Real Time Applications for Financial Services with DataStax
Building Scalable, Real Time Applications for Financial Services with DataStaxDataStax
 
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...Michael Stack
 
HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestCloudera, Inc.
 
Red Hat Ceph Storage: Past, Present and Future
Red Hat Ceph Storage: Past, Present and FutureRed Hat Ceph Storage: Past, Present and Future
Red Hat Ceph Storage: Past, Present and FutureRed_Hat_Storage
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?DataWorks Summit
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)DataWorks Summit
 

Was ist angesagt? (20)

Running Analytics at the Speed of Your Business
Running Analytics at the Speed of Your BusinessRunning Analytics at the Speed of Your Business
Running Analytics at the Speed of Your Business
 
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
 
HBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanHBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at Meituan
 
Enterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on HadoopEnterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on Hadoop
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
 
HBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiHBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at Xiaomi
 
Cisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with CassandraCisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with Cassandra
 
RedisConf17 - Redis Enterprise on IBM Power Systems
RedisConf17 - Redis Enterprise on IBM Power SystemsRedisConf17 - Redis Enterprise on IBM Power Systems
RedisConf17 - Redis Enterprise on IBM Power Systems
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
 
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and Future
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
 
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
 
Building Scalable, Real Time Applications for Financial Services with DataStax
Building Scalable, Real Time Applications for Financial Services with DataStaxBuilding Scalable, Real Time Applications for Financial Services with DataStax
Building Scalable, Real Time Applications for Financial Services with DataStax
 
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
 
HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at Pinterest
 
Red Hat Ceph Storage: Past, Present and Future
Red Hat Ceph Storage: Past, Present and FutureRed Hat Ceph Storage: Past, Present and Future
Red Hat Ceph Storage: Past, Present and Future
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
 

Ähnlich wie Scaling HDFS at Xiaomi

HUG_Ireland_BryanQuinnPresentation_20160111
HUG_Ireland_BryanQuinnPresentation_20160111HUG_Ireland_BryanQuinnPresentation_20160111
HUG_Ireland_BryanQuinnPresentation_20160111John Mulhall
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Cloud stack overview
Cloud stack overviewCloud stack overview
Cloud stack overviewhowie YU
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stackNitin Mehta
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevAltinity Ltd
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInDataWorks Summit
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Alluxio, Inc.
 
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE confluent
 
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEkawamuray
 
Adam Dagnall: Advanced S3 compatible storage integration in CloudStack
Adam Dagnall: Advanced S3 compatible storage integration in CloudStackAdam Dagnall: Advanced S3 compatible storage integration in CloudStack
Adam Dagnall: Advanced S3 compatible storage integration in CloudStackShapeBlue
 
Data center disaster recovery.ppt
Data center disaster recovery.ppt Data center disaster recovery.ppt
Data center disaster recovery.ppt omalreda
 
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeHadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeErik Krogen
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLucidworks
 
Leveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkLeveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkQAware GmbH
 
2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverviewDimas Prasetyo
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Optimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesOptimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesAlexandra Sasha Blumenfeld
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Nati Shalom
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScyllaDB
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014Amazon Web Services
 

Ähnlich wie Scaling HDFS at Xiaomi (20)

HUG_Ireland_BryanQuinnPresentation_20160111
HUG_Ireland_BryanQuinnPresentation_20160111HUG_Ireland_BryanQuinnPresentation_20160111
HUG_Ireland_BryanQuinnPresentation_20160111
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Cloud stack overview
Cloud stack overviewCloud stack overview
Cloud stack overview
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stack
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy—160 Billion Daily Messages on One Shared Cluster at LINE
 
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
 
Adam Dagnall: Advanced S3 compatible storage integration in CloudStack
Adam Dagnall: Advanced S3 compatible storage integration in CloudStackAdam Dagnall: Advanced S3 compatible storage integration in CloudStack
Adam Dagnall: Advanced S3 compatible storage integration in CloudStack
 
Data center disaster recovery.ppt
Data center disaster recovery.ppt Data center disaster recovery.ppt
Data center disaster recovery.ppt
 
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeHadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
 
Leveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkLeveraging the Power of Solr with Spark
Leveraging the Power of Solr with Spark
 
2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Optimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesOptimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 Minutes
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDS
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 

Mehr von DataWorks Summit

Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Scaling HDFS at Xiaomi

  • 1. Scaling HDFS at Xiaomi Chen Zhang
  • 2. Outline • Introduction of Xiaomi • Scenarios and challenges • Improvements on HDFS federation • Experience on scaling up single NameNode • Efficient management of hundreds of clusters
  • 3. About Xiaomi World’s 4th largest smartphone maker Sold 118 Million phones in 2018
  • 4. About Xiaomi World’s Largest consumer IoT platform Over 150 Million smart devices connected
  • 5. Software and Internet Services MIUI MiPay/Finance App Market Ads MiCloud Game MiPush Smart Home News Feeds …
  • 7. Scenarios Micloud MiPush Feeds User Profile Talos Ads Online Services • 100+ Independent Clusters • Low Latency • High availability Offline Services Hadoop • Several Huge Clusters • High throughput • High Scalability, High availability
  • 8. Data Growth [Chart: Data Growth of The Largest Cluster, 2015 to 2018] File counts (10 million): 2, 23, 41, 71 • Data Size (PB): 3, 30, 60, 150
  • 9. Challenges • Challenges at late 2016 data growth is too fast dependency is too complex code change is almost impossible
  • 10. What We Need We need A Huge Single HDFS Cluster
  • 11. Improvements on HDFS Federation • Problems of HDFS Federation in late 2016 – NameNodes are independent, metadata is not shared – Client-side MountTable config is hard to maintain – MountTable doesn’t support nested mount points – ViewFileSystem is not compatible with DistributedFileSystem – RBF was not stable and not fully functional in late 2016
  • 12. Improvements on HDFS Federation [Diagram: Original HDFS Federation. viewfs on top of the Namespace layer (NS 1 … NS k … Foreign NS n, served by NN-1 … NN-k … NN-n); Block Storage layer with Block Pools (Pool 1 … Pool k … Pool n) on Common Storage (Datanode 1, Datanode 2 … Datanode m). Directory tree: / with user, yarn, hive, service1, service2, small dir1, small dir2, small service1, small service2 …]
  • 13. Improvements on HDFS Federation [Diagram: the federation layout (viewfs, NS 1 … NS k … Foreign NS n, NN-1 … NN-k … NN-n, Block Pools on Datanode 1 … Datanode m) extended with NS 0 / NN-0; directory tree / with user, yarn, hive, service1, service2] • hdfs:// -> FederatedDFSFileSystem extends DistributedFileSystem • Support Nested MountPoints • Add Default NameSpace • Support rename across NameSpaces • Compatible with hdfs://, don’t need to change any code • Update MountTable Config from ZK
  • 14. Nested Mount table and Default NameSpace 1. Xiaomi is not only a hardware company but also an Internet company, and it develops very fast 2. There are more than 100 internet services; new businesses and services emerge quickly, based on our smart devices and more than 300 million users 3. It’s hard for us to use a fixed, pre-divided mount table
  • 15. Nested Mount table and Default NameSpace [Diagram: NN-0, NN-1 … NN-k … NN-n serving NS 0, NS 1 … NS k … NS n; directory tree / with user, yarn, hive, service1, service2; new paths /some_new_nosql_service, /user/live_show_services, /user/short_video_services] 1. At first, we divided the initial mount points by data volume and QPS. Only a dozen mount points need to be configured for the largest services; all others fall into the default NameSpaces 2. When new infrastructure services and internet services emerge, the mount table doesn’t need any updates 3. HADOOP-13055 supports linkFallback, but our solution is more flexible
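The lookup that slides 13 to 15 describe can be read as a longest-prefix match with a fallback. The following is a minimal Python sketch of that idea, not Xiaomi's implementation: the mount points, the namespace names (`ns0` … `ns3`) and the `resolve` function are hypothetical, and it assumes a single default namespace for simplicity.

```python
from typing import Dict

# Hypothetical mount table. Nested mount points are allowed, so
# /user/service1 can live in a different namespace than /user.
MOUNT_TABLE: Dict[str, str] = {
    "/user": "ns1",
    "/user/service1": "ns2",
    "/hive": "ns3",
}
DEFAULT_NAMESPACE = "ns0"  # assumption: exactly one default namespace


def resolve(path: str, table: Dict[str, str] = MOUNT_TABLE,
            default: str = DEFAULT_NAMESPACE) -> str:
    """Return the namespace that owns `path`, using longest-prefix match."""
    parts = [p for p in path.split("/") if p]
    # Try the deepest candidate prefix first, then walk up toward "/".
    # Matching is per path component, so /user/service10 does not
    # accidentally match the /user/service1 mount point.
    for depth in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:depth])
        if prefix in table:
            return table[prefix]
    # No mount point covers the path: fall back to the default namespace.
    return default
```

With this shape, a brand-new path such as `/some_new_nosql_service` resolves to the default namespace without any mount table update, which is the property slide 15 emphasizes.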
  • 17. Client Transparency • RPC integration – listStatus – getContentSummary – setQuota/getQuota • Admin Tools – refreshNodes – setBalancerBandwidth – DataNode decommission • Trash optimization (/user/service1/.Trash/) – moveToTrash is a rename operation – moveToTrash across NameNodes is very expensive [Diagram: NN-0, NN-1 … NN-k … NN-n serving NS 0, NS 1 … NS k … NS n; directory tree / with user, yarn, hive, service1, service2]
  • 18. Rename Across NameSpaces [Diagram: Client, namenode1 (locked) and namenode2; datanode1, datanode2, datanode3, each holding blockpool1 and blockpool2; blocks are connected by hardlink (Link block)]
  • 19. Rename Across NameSpaces in Detail Source Phase 1 1. Sanity Check. • Existence • Permission • Can’t be reserved directory • Can’t be symlink • Not in encryption zones 2. Serialize the inode-tree and blocks information with ProtoBuf • Name • Permissions • mtime/atime • Replication factor • Block locations • Acl / Xattr / Quota …
  • 20. Rename Across NameSpaces in Detail Source Phase 1 3. Lock the directory • Add a FederationRenameFeature that records the renameId and the source and destination paths • With FederationRenameFeature, all sub-directories and files in this directory, and all inodes in the parent path, are not writable 4. Add a federation-rename record 5. Return the serialized data to the client
  • 21. Rename Across NameSpaces in Detail Dest Phase 1 1. Sanity Check • permission, quota, not in encryption zones 2. Deserialize the inode-tree, graft it to the destination path • Allocate inode id for each inode • Allocate block id and new GS for each block • Update acl and other features
  • 22. Rename Across NameSpaces in Detail Dest Phase 1 3. Lock the directory • Also use FederationRenameFeature 4. Update quota count 5. Add a federation-rename record 6. Return a list of block information, including: • srcBlockId, destBlockId, blockSize, srcGenStamp, destGenStamp for each block
  • 23. Rename Across NameSpaces in Detail Link Block 1. For each DN, send request in batch • Create new block file by hardlink, one by one • With a total operation timeout 2. Using a ThreadPoolExecutor 3. For each block, count as complete if at least 2/3 replicas succeed • Slow DN will not affect the total progress
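The Link Block step on slide 23 combines three ideas: one batched request per DataNode, a thread pool with a total timeout, and a per-block completion rule of at least 2/3 of the replicas. Below is a minimal Python sketch of how these fit together. It is an illustration under assumptions, not the actual HDFS code: `link_blocks`, `fake_hardlink`, the DataNode names and the replica layout are all hypothetical, and the DataNode RPC is replaced by a plain callable.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List

# block id -> DataNodes holding a replica (hypothetical layout)
Replicas = Dict[int, List[str]]
# (datanode, block ids) -> block ids whose hardlink was created
HardlinkFn = Callable[[str, List[int]], List[int]]


def link_blocks(replicas: Replicas, hardlink_batch: HardlinkFn,
                timeout_s: float = 30.0, workers: int = 8) -> Dict[int, bool]:
    """Send one batch per DataNode and report which blocks are complete."""
    # Group blocks by DataNode so each node receives a single request.
    per_dn: Dict[str, List[int]] = {}
    for block_id, dns in replicas.items():
        for dn in dns:
            per_dn.setdefault(dn, []).append(block_id)

    succeeded = {block_id: 0 for block_id in replicas}
    pool = ThreadPoolExecutor(max_workers=workers)
    futures = [pool.submit(hardlink_batch, dn, blocks)
               for dn, blocks in per_dn.items()]
    # One total timeout for the whole step: slow DataNodes are not awaited.
    done, _late = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)
    for future in done:
        if future.exception() is None:
            for block_id in future.result():
                succeeded[block_id] += 1
    # A block is complete when at least 2/3 of its replicas were linked.
    return {block_id: len(dns) > 0 and 3 * succeeded[block_id] >= 2 * len(dns)
            for block_id, dns in replicas.items()}


def fake_hardlink(dn: str, blocks: List[int]) -> List[int]:
    """Stand-in for the DataNode request; dn3 and dn4 are treated as down."""
    if dn in ("dn3", "dn4"):
        raise IOError(f"{dn} is unreachable")
    return blocks


REPLICAS = {1: ["dn1", "dn2", "dn3"], 2: ["dn2", "dn3", "dn4"]}
RESULT = link_blocks(REPLICAS, fake_hardlink, timeout_s=5.0)
```

In this example block 1 has 2 of 3 replicas linked and counts as complete, while block 2 has only 1 of 3 and does not. With 3-way replication the 2/3 rule therefore tolerates one slow or failed DataNode per block.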
  • 24. Rename Across NameSpaces in Detail Source Phase 2 1. Delete the source directory/file 2. Delete all the inodes and blocks asynchronously 3. Remove federation-rename record Dest Phase 2 1. Remove FederationRenameFeature, make the target directory visible 2. Remove federation-rename record
  • 25. Error Handling (Failed at: How to Handle -> Result) • Source Phase 1: Fail -> Fail • Dest Phase 1: Cancel source-phase1 -> Fail • Link Block: Request fails, NameNode Fixer will redo the remaining steps -> Will succeed finally • Source Phase 2: Request fails, NameNode Fixer will redo the remaining steps -> Will succeed finally • Dest Phase 2: Request fails, NameNode Fixer will redo the remaining steps -> Will succeed finally
  • 26. Error Handling NameNode Failover and Restart 1. All operations are recorded in the editlog 2. FederationRenameFeature is serialized to FsImage 3. Federation-rename records are not serialized to FsImage; they are rebuilt from log replay or FsImage loading (if an inode has a FederationRenameFeature, a federation-rename record is added)
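The error-handling table on slide 25 amounts to a rule of "roll back before Link Block, roll forward from Link Block onward". A minimal Python sketch of that decision follows. The `Step`, `Plan` and `recovery_plan` names are hypothetical and exist only for illustration; the sketch encodes just what the slide states and does not model the NameNode Fixer itself.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Step(IntEnum):
    SOURCE_PHASE_1 = 1
    DEST_PHASE_1 = 2
    LINK_BLOCK = 3
    SOURCE_PHASE_2 = 4
    DEST_PHASE_2 = 5


@dataclass
class Plan:
    outcome: str                                     # "fail" or "succeed"
    undo: List[Step] = field(default_factory=list)   # steps to cancel
    redo: List[Step] = field(default_factory=list)   # steps the fixer re-runs


def recovery_plan(failed_at: Step) -> Plan:
    """Map the step that failed to the handling listed on slide 25."""
    if failed_at == Step.SOURCE_PHASE_1:
        # Nothing has been changed yet, so the rename simply fails.
        return Plan(outcome="fail")
    if failed_at == Step.DEST_PHASE_1:
        # The source is already locked: cancel source phase 1, then fail.
        return Plan(outcome="fail", undo=[Step.SOURCE_PHASE_1])
    # From Link Block onward the rename is rolled forward: the remaining
    # steps are redone until the rename eventually succeeds.
    return Plan(outcome="succeed", redo=[s for s in Step if s >= failed_at])
```

For example, `recovery_plan(Step.SOURCE_PHASE_2)` yields a plan that re-runs source phase 2 and dest phase 2, matching the "will succeed finally" rows of the table.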
  • 27. Scaling up NameNodes Our Largest NameNode 1. 150GB heap 2. Use CMS GC 3. More than 500 million objects (240 million files and 260 million blocks) 4. More than 20000 QPS
  • 28. Scaling up NameNodes Experience • Throttle – BlockReport / Incremental-BlockReport throttle – Concurrent GetContentSummary throttle • Lock optimization • Config optimization • Add more tracing information
  • 29. Block Report Throttle • Problem: Full GC when the NameNode starts up [Diagram: NameNode at 60% heap usage; thousands of DNs send block reports at almost the same time] • The NameNode can only process one block report at a time • Throttle the max concurrent block reports; extra reports are rejected, and the DN will retry later
  • 30. Other optimizations • Lock optimization for expensive operations – When processing a block report, release and re-acquire the lock for every storage – When processing getContentSummary, release the lock every N files • Config optimization – More handlers – Longer heartbeat interval – Longer full block report interval – Disable retry-cache and access-time
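The "release the lock every N files" idea from slide 30 can be sketched as follows. This is a simplified Python illustration, not NameNode code: the namespace is modeled as a dictionary from directory path to children, `count_files` and `yield_every` are hypothetical names, and a plain `threading.Lock` stands in for the FSNamesystem lock. A real implementation would also have to revalidate its position after re-acquiring the lock, because the tree may have changed in between; the sketch omits that.

```python
import threading
from typing import Dict, List, Tuple


def count_files(tree: Dict[str, List[str]], root: str,
                lock: threading.Lock,
                yield_every: int = 1000) -> Tuple[int, int]:
    """Count files under `root`, yielding the lock every `yield_every` files.

    Returns (file count, number of times the lock was yielded).
    Paths that are not keys of `tree` are treated as files.
    """
    files = 0
    yields = 0
    stack = [root]
    lock.acquire()
    try:
        while stack:
            node = stack.pop()
            children = tree.get(node)
            if children is None:
                files += 1
                if files % yield_every == 0:
                    # Briefly give up the lock so queued operations can run
                    # instead of waiting for the whole traversal to finish.
                    lock.release()
                    yields += 1
                    lock.acquire()
            else:
                stack.extend(children)
    finally:
        lock.release()
    return files, yields


TREE = {
    "/": ["/a", "/b"],
    "/a": ["/a/f1", "/a/f2", "/a/f3"],
    "/b": ["/b/f4", "/b/f5"],
}
NS_LOCK = threading.Lock()
SUMMARY = count_files(TREE, "/", NS_LOCK, yield_every=2)
```

The trade-off is that the result is no longer a point-in-time snapshot of the directory, in exchange for bounding how long any single operation holds the lock.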
  • 31. More tracing information • Record Operations that hold the FSNamesystem lock too long • Record QPS monitor on both server-side and client-side, push these data to our internal monitor system • Record failure reason and statistics of block allocation failure • Add log for slow block report processing
  • 32. How We Efficiently Manage 100+ Clusters • We use HBase heavily at Xiaomi • 20~30 HBase clusters for sensitive services and businesses in each datacenter • With the rapid growth of the global business, there are now more than 5 datacenters distributed around the world • The total number of clusters also grows very quickly, which makes them hard to maintain
  • 33. How We Efficiently Manage 100+ Clusters • Initially… cluster-1 Canary cluster-2 Canary cluster-3 Canary cluster-n Canary
  • 34. Efficiently manage 100+ clusters [Diagram: ClustrerOne runs a Canary Task and a Balancer Task for each of cluster-1, cluster-2, cluster-3 … cluster-n; it uses ZooKeeper and NameService, sends metrics to the Monitor System, and produces generated configuration]
  • 35. Q&A

Editor's Notes

  1. Introduce myself. Today I’ll share some of the work we did on scaling HDFS. (Spoken English.)
  2. An overview of Xiaomi phone sales: the main markets are India and China, and we also have good market share in Southeast Asia and Europe, but not in America.
  3. IoT: we sell a variety of smart devices, and they sell very well in China.
  4. Based on these phones and devices, we build lots of internet services and businesses; these are the most important of them.
  5. On this page, most of the services are well known, so I’ll only introduce some of the services we developed ourselves. Talos is a data integration and distribution system. FDS is an object storage system, quite similar to AWS S3. EMQ is a cloud message queue, similar to AWS SQS.
  6. Our clusters can be divided into 2 parts, online and offline. These 2 scenarios are quite different and bring us different challenges. For online services, most HDFS clusters are deployed for HBase, which we use heavily: there are more than 100 online HDFS clusters and more than 3000 nodes. The biggest challenge for online clusters is latency, especially the impact of slow nodes and slow disks. That part does not belong to this session, so I won’t cover it in detail. For offline analysis, on the other hand, we built several huge clusters. For these clusters, the biggest challenge is scalability, that is, how to serve more data and files.
  7. Let’s take a look at the data growth. This is the chart for our largest cluster, from 4 years ago to the end of last year. Everybody knows what this means for an HDFS cluster: a single NameNode can hardly serve this much data.
  8. With this rapid growth, we hit the scalability limit in 2016. After a bunch of work, we made the NameNode stable, but that would not last long, so we had to enable federation. However, the dependencies were too complex, and it was almost impossible to divide the data into different namespaces. It was also very hard to ask users to change their code to use viewfs.
  9. So the only way that makes sense for us is to build a huge single cluster. More accurately, we need to modify federation to make it work like a single HDFS cluster. How did we do that? Let’s first take a look at the defects of federation.
  10. In this solution, every directory has to be assigned a namespace, so you have to add a mount point for it.
  11. If a path is not in the mount table, it is mapped to one of the default namespaces. In addition, to make the federation work like a single cluster, we support rename across NameNodes. To avoid code changes, we created a new filesystem that wraps viewfs. Lastly, we moved the mount table to ZooKeeper and update it automatically, so users don’t need to worry about the mount table. This is our whole solution for making Federation work as a single cluster. Next, I’ll introduce each part in detail.
  12. First, we created a wrapper FileSystem that extends DistributedFileSystem. Our users don’t need to change any code, just update some configs. When the client initializes, it fetches the mount table from ZK. In addition, we add a watcher, so clients get the latest config whenever it is updated. Finally, we made an admin tool to operate on the mount table config in ZooKeeper.
  13. To make the federation transparent to users, there is still a lot of work to do; here is some of it. Another improvement worth mentioning is the trash optimization. By default, every user has only one trash folder. Since moveToTrash is a rename operation and we support rename across NameNodes, a user’s delete operation in another namespace may cause a rename across NameNodes. This operation is expensive, and we don’t want deletes that move data to trash to trigger it too frequently, so we did some optimization.
  14. I’ll first give an overview and then go into some details. It’s very complex; I’ll try to explain it as clearly as I can. There are 5 steps to complete a federation-rename.
  15. OK, next is some experience from tuning a single NameNode.
  16. Let me show the reason first. Assume that in the normal case heap usage is 60%. When the NN restarts, it starts receiving a lot of block reports. Block reports that are waiting to be processed are held in memory, and reports arrive much faster than they are processed, so the reports in memory keep accumulating until the heap is full.
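The throttle that addresses this can be sketched as a bounded set of processing slots: a report that finds no free slot is rejected immediately instead of being queued in memory, and the DataNode retries later. The Python below is a minimal illustration of that idea, not the NameNode implementation; `BlockReportThrottle`, `try_process` and `max_concurrent` are hypothetical names.

```python
import threading
from typing import Callable


class BlockReportThrottle:
    """Admit at most `max_concurrent` block reports; reject the rest."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_process(self, process: Callable[[], None]) -> bool:
        """Run `process` if a slot is free.

        Returns False without buffering anything when the throttle is
        full, so the caller (the DataNode) is expected to retry later.
        """
        if not self._slots.acquire(blocking=False):
            return False
        try:
            process()
            return True
        finally:
            self._slots.release()
```

Because rejected reports are never held by the server, the memory used by pending reports is bounded by `max_concurrent` rather than by the number of DataNodes reporting at startup.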