HDFS Tiered Storage: Mounting Object Stores in HDFS
Thomas Demoor (Western Digital)
Virajith Jalaparti (Microsoft)
>id
Thomas Demoor
• PO/Architect @ Western Digital
• S3-compatible object storage
• Hadoop:
– S3a optimizations
• Fast uploader (stream from mem)
• Hadoop2/YARN support
• Coming up: object-store committer
– HDFS Tiered Storage
Virajith Jalaparti
• Scientist @ Microsoft CISL
• Hadoop
– HDFS Tiered Storage
Overview
• HDFS Tiered Storage
– Mount and manage remote stores through HDFS
• Earlier talks
– Hadoop Summit '16, San Jose
– DataWorks Summit '17, Munich
• This talk
– Introduce Tiered Storage in HDFS (design, read path, …)
– Focus on progress since earlier talks (mounting in HDFS, write path, …)
– Demo
[Diagram: an application on a Hadoop cluster, with HDFS backed by a remote store]
Use Case I: Ephemeral Hadoop Clusters
• EMR on S3, HDInsight over WASB, …
• Several workarounds used today
– DistCp
– Use only remote storage
– Explicitly manage local and cloud storage
• Goal: Seamlessly use local and remote (cloud) stores as one instance of HDFS
– Retrieve data to local cluster on demand
– Use local storage to cache data
[Diagram: Hadoop clusters reading/writing data in a cloud store (e.g., S3, WASB)]
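For reference, the DistCp workaround is a one-shot bulk copy between the cluster and the store; a minimal sketch (the cluster address and bucket name are illustrative):
hadoop distcp hdfs://namenode:8020/user/hadoop/data s3a://bucket/data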
Use Case II: Backup data to object stores
• Business value of Hadoop + Object Storage:
– Data retention: very high fault tolerance (erasure coding)
– Economics: cheap storage for cold data
– Business continuity planning: backup, migrate, …
• Public Clouds: Microsoft Azure, AWS S3, GCS, …
• Private Clouds: WD ActiveScale Object Storage
– S3-compatible object storage system
– Linear scalability in # racks, objects, throughput
– Entry level (100's TB) – scale out (5PB+/rack)
– http://www.hgst.com/products/systems
Use Case II: Backup data to object stores (cont.)
• Today: Hadoop Compatible FileSystems (s3a://, wasb://)
– Direct IO between Hadoop apps and object store
– Scalable & resilient: outsourcing NameNode functions
• Compatible does not mean identical
– Most are not even FileSystems (notion of directories, append, …)
– No data locality: lower performance for hot/real-time data
– Hadoop admin tools require HDFS: permissions/quota/security/…
– Workaround: explicitly manage local HDFS and remote cloud storage
• Goal: integrate better with HDFS
– Data locality for hot data + object storage for cold data
– Offer familiar HDFS admin abstractions
[Diagram: an application on a Hadoop cluster reading and writing directly against the object store]
Solution: “Mount” remote storage in HDFS
• Use HDFS to manage remote storage
– HDFS coordinates reads/writes to remote store
– Mount remote store as a PROVIDED tier in HDFS
• Details later in the talk
– Set StoragePolicy to move data between the tiers
[Diagram: a remote namespace subtree (d, e, f) is mounted at mount point /c in the HDFS namespace; the application reads/writes through HDFS, which writes through to the remote store and loads data on demand]
Solution: “Mount” remote storage in HDFS (cont.)
• Use HDFS to manage remote storage
– HDFS coordinates reads/writes to remote store
– Mount remote store as a PROVIDED tier in HDFS
• Details later in the talk
– Set StoragePolicy to move data between the tiers
• Benefits
– Transparent to users/applications
– Provides unified namespace
– Can extend HDFS support for quotas, security, etc.
– Enables caching/prefetching
[Diagram: an application on a Hadoop cluster; HDFS backed by the remote store]
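Because the mount is transparent, ordinary HDFS commands work unchanged on mounted paths; an illustrative session against the /c mount point from the diagram above (paths hypothetical):
hdfs dfs -ls /c/d        # lists the remote directory through HDFS
hdfs dfs -cat /c/d/e     # file data is fetched from the remote store on demand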
Challenges
• Synchronize metadata without copying data
– Dynamically page in “blocks” on demand
– Define policies to prefetch and evict local replicas
• Mirror changes in remote namespace
– Handle out-of-band churn in remote storage
– Avoid dropping valid, cached data (e.g., on rename)
• Handle writes consistently
– Writes committed to the backing store must “make sense”
• Dynamic mounting
– Efficient/clean mount-unmount behavior
– One object store mapping to multiple Namenodes
Outline
• Use cases
• Mounting remote stores in HDFS
• Demo
1. Backup from on-prem HDFS cluster to Azure Blob Store
2. Spin up an ephemeral HDFS cluster on Azure
• Types of mounts
• Reads in Tiered HDFS
• Writes in Tiered HDFS
Demo summary
[Diagram: an on-prem HDFS cluster backs up /user/hadoop/workloads/ to Azure blob storage at wasb://container@storageAccount/backup/user/hadoop/workloads/ (-setStoragePolicy PROVIDED); an FSImage generated from the backup is used to start a Hadoop cluster on Azure that exposes /user/hadoop/workloads/]
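A sketch of the backup step as narrated in the demo; the -scheduleBlockMoves flag belongs to the HDFS-10285-based prototype and the exact syntax may differ:
hdfs storagepolicies -setStoragePolicy -path /user/hadoop/workloads -policy PROVIDED -scheduleBlockMoves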
Outline
• Use cases
• Mounting remote stores in HDFS
• Demo
1. Backup from on-prem HDFS cluster to Azure Blob Store
2. Spin up an ephemeral HDFS cluster on Azure
• Types of mounts
• Reads in Tiered HDFS
• Writes in Tiered HDFS
Types of mounts
hdfs dfsadmin -mount <source> <dest> [-ephemeral|-backup]
• Ephemeral mounts
– Access data in remote store using HDFS (Use Case I)
– <source>: remoteFS://remote/path
– <dest>: hdfs://local/path
– Changes are bi-directional
• Backup mounts
– Backup data from HDFS to remote store (Use Case II)
– <source>: hdfs://local/path
– <dest>: remoteFS://remote/path
– Changes are uni-directional
[Diagram: with an ephemeral mount, changes flow both ways between HDFS and the remote store; with a backup mount, changes flow only from HDFS to the remote store]
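Illustrative invocations of the command above (hosts, buckets, and paths are hypothetical):
# Ephemeral mount: expose remote data through HDFS (Use Case I)
hdfs dfsadmin -mount s3a://bucket/datasets hdfs://namenode:8020/datasets -ephemeral
# Backup mount: back up an HDFS subtree to the remote store (Use Case II)
hdfs dfsadmin -mount hdfs://namenode:8020/user/hadoop/workloads wasb://container@account.blob.core.windows.net/backup -backup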
Reads in ephemeral mounts
[Diagram: a client issues read(/d/e) to the HDFS cluster (NN, DN1, DN2); through the mount, HDFS issues read(/c/d/e) against the remote namespace remoteFS:// and streams the file data back to the client]
Enabled using the PROVIDED Storage Type
• Peer to RAM_DISK, SSD, DISK in HDFS (HDFS-2832)
• Data in remote store mapped to HDFS blocks on PROVIDED storage
– Each block associated with a BlockAlias = (REF, nonce)
• Nonce used to detect changes on the external store
• REF = (file URI, offset, length); nonce = GUID
• e.g., REF = (s3a://bucket/file, 0, 1024); nonce = <ETag>
– Mapping stored in an AliasMap
• Can use a KV store, either external to or inside the NN
• PROVIDEDVolume on Datanodes reads/writes data from/to the remote store
[Diagram: in the NN, the FSNamesystem maps /a/foo to blocks b_i…b_j and /remote/bar to blocks b_k…b_l; the BlockManager maps b_i to storages {s1, s2, s3} and b_k to {s_PROVIDED}; the AliasMap maps b_k to Alias_k; Datanodes expose RAM_DISK, SSD, DISK, and PROVIDED storage, the last backed by the remote store]
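To make the mapping concrete, hypothetical AliasMap entries for one remote file split into two blocks (format illustrative; the demo prototype kept these in a plain blocks.csv file, and all values below are invented for exposition):
# blockId → (REF = file URI, offset, length), nonce
b_1001 → (s3a://bucket/file, 0, 1024), nonce = "etag-9f2c"
b_1002 → (s3a://bucket/file, 1024, 1024), nonce = "etag-9f2c"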
Example: Using an immutable cloud store
• Create FSImage and AliasMap
– Block StoragePolicy can be set as required
– e.g.: {rep=2, PROVIDED, DISK}
FSImage:
/d/e → {b_1, b_2, …}
/d/f/z1 → {b_i, b_i+1, …}
…
b_i → {rep = 1, PROVIDED}
…
AliasMap:
b_i → {(remote://c/d/f/z1, 0, L), inodeId1}
b_i+1 → {(remote://c/d/f/z1, L, 2L), inodeId1}
…
[Diagram: the remote namespace remoteFS:// with subtree d (containing e, f, g) under c in the remote store]
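A sketch of generating the FSImage and AliasMap by walking the remote namespace, assuming the image-generation tool from the HDFS-9806 branch (hadoop-fs2img); the class name and flags here are an assumption and may differ:
hadoop org.apache.hadoop.hdfs.server.namenode.FileSystemImage -o file:///tmp/name remoteFS://remote/path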
Example: Using an immutable cloud store (cont.)
• Start NN with the FSImage
• All blocks become reachable once a DN with PROVIDED storage heartbeats in
[Diagram: the NN (BlockManager, FSImage, AliasMap) serves the mounted subtree d/{e, f, g}; DN1 and DN2 report PROVIDED storage; the data itself lives in the remote namespace remoteFS://]
Example: Using an immutable cloud store (cont.)
• DN uses BlockAlias to read from external store
– Data can be cached locally as it is read (read-through cache)
[Diagram: the DFSClient calls getBlockLocation("/d/f/z1", 0, L) on the NN; the NN looks up b_i in the AliasMap and returns LocatedBlocks {{DN2, b_i, PROVIDED}}; DN2 then calls open("remote:///c/d/f/z1", GUID1) against the remote store]
Writes in ephemeral mounts
• Metadata operations
– create(), mkdir(), chown, etc.
– Synchronous on remote store
– For FileSystems: Namenode performs the operation on the remote store first
– For blob stores: metadata operations need not be propagated
• Example: clients accessing S3 directly have no notion of directories
• Data operations
– One of the Datanodes in the write pipeline writes to the remote store
– BlockAlias passed along the write pipeline
[Diagram: the DFSClient writes through a DN1 → DN2 → DN3 pipeline, passing the Alias along; one Datanode in the pipeline writes the data to the remote store]
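For instance, a metadata operation issued against a mounted path (path hypothetical):
hdfs dfs -mkdir /c/newdir   # for a FileSystem-backed mount, the NN creates the remote directory first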
Writes in Backup mounts
• Daemon on Namenode backs up metadata/data in the mount
• Delegates work to Datanodes (similar to SPS [HDFS-10285])
• Backup of data based on remote store capabilities
– For FileSystems: write block by block
– For blob stores: multi-part upload to upload blocks in parallel
[Diagram: a coordinator DN drives DN1 and DN2 to upload blocks from HDFS to the remote store]
Writes in Backup mounts (cont.)
• Daemon on Namenode backs up metadata/data in the mount
• Delegates work to Datanodes (similar to SPS [HDFS-10285])
• Backup of data based on remote store capabilities
– For FileSystems: write block by block
– For blob stores: multi-part upload to upload blocks in parallel
• Use snapshots to maintain a consistent view
– Back up a particular snapshot
– Back up changes from the previous snapshot
[Diagram: the application keeps writing to HDFS while snapshots are backed up]
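The snapshot primitives this relies on already exist in HDFS (standard commands; how the backup daemon drives them internally is not shown in the talk):
hdfs dfsadmin -allowSnapshot /user/hadoop/workloads
hdfs dfs -createSnapshot /user/hadoop/workloads s1
# later, after more writes, back up only what changed since s1:
hdfs dfs -createSnapshot /user/hadoop/workloads s2
hdfs snapshotDiff /user/hadoop/workloads s1 s2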
Assumptions
• Churn is rare and relatively predictable
– Analytic workloads, ETL into external/cloud storage, compute in cluster
• Clusters are either consumers or producers for a subtree/region
– The FileSystem API has too little information to resolve conflicts
[Diagram: example pipeline with Ingest and ETL feeding a Raw Data Bucket, and Analytics writing to an Analytic Results Bucket]
Conflict resolution
• Conflicts occur when the remote store is modified directly
• Detection
– On read operations: e.g., using an open-by-nonce operation
– On write operations: e.g., the file to be created is already present
• Pluggable policy to resolve conflicts
– “HDFS wins”
– “Remote store wins”
– Rename files under conflict
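Since the resolution policy is pluggable, one could imagine selecting it through Hadoop configuration; a purely hypothetical sketch (the property name is invented, not an actual HDFS key):
<!-- hypothetical, for illustration only -->
<property>
  <name>dfs.provided.conflict.policy</name>
  <value>remote-store-wins</value> <!-- or: hdfs-wins, rename -->
</property>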
Status
• Read-only ephemeral mounts
– HDFS-9806 branch on Apache Hadoop
• Backup mounts
– Prototype available on GitHub
• Next:
– Writes in ephemeral mounts
– Conflict resolution
– Create mounts in a running Namenode
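To try the read-only ephemeral mounts, the feature branch can be checked out from the Apache Hadoop repository (branch name per the slide; the GitHub location of the backup-mount prototype is not given here):
git clone https://github.com/apache/hadoop.git
cd hadoop
git checkout HDFS-9806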
Resources + Q&A
• HDFS Tiered Storage: HDFS-9806
– Design documentation
– List of subtasks, lots of linked tickets – take one!
– Discussion of scope, implementation, and feedback
• Joint work Microsoft – Western Digital
– {thomas.demoor, ewan.higgs}@wdc.com
– {cdoug, vijala}@microsoft.com
Backup slides
Benefits of the PROVIDED design
• Use existing HDFS features to enforce quotas, limits on storage tiers
– Simpler implementation, no mismatch between HDFS invariants and framework
• Supports different types of back-end storage
– org.apache.hadoop.FileSystem, blob stores, etc.
• Credentials hidden from clients
– Only NN and DNs require credentials for the external store
– HDFS can be used to enforce access controls for the remote store
• Enables several policies to improve performance
– Set replication in FSImage to pre-fetch
– Read-through cache
– Actively pre-fetch while cluster is running
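As one illustration of the last point: with a {PROVIDED, DISK} storage policy, raising a file's replication factor would pull a local replica while the cluster runs; a sketch with standard tooling (that setrep triggers the prefetch is an assumption about the design, not a demonstrated feature):
hdfs dfs -setrep 2 /d/f/z1   # second replica materialized on local DISK alongside the PROVIDED copy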
Editor's Notes
  1. Welcome. Thanks for coming. We’re discussing a proposal for implementing tiering in HDFS, building on its support for heterogeneous storage.
  2. Thomas… Hi, I am Virajith, and I am currently working as a Scientist at the Microsoft Cloud and Information Services Lab (CISL), where I started the project on HDFS Tiered Storage about a year ago with Chris Douglas.
  3. • Started the effort almost a year ago now. • Chris and Virajith posted a design doc; Ewan and I were trying to solve the same problem, so we joined forces. • In those talks, we discussed the design. Today, we will of course reintroduce that. • But we want to focus on the progress we have made on mounting and the write path. • And there is a demo. No tricks!
  4. • Ephemeral aka short-lived Hadoop clusters • EMR, HDInsight, whatever custom env you have with k8s, … • Persistent data lives in a remote store outside of the Hadoop cluster • Need to load in data at start-up and back up data before the cluster is spun down • Several workarounds: DistCp, sacrificing perf by using remote only, or explicitly managing both remote and local in the app • To address this use case, the goal is for our proposed solution to present a single HDFS instance that abstracts away the underlying topology and retrieves/stores data on demand • Local storage can be seen as a temporary cache
  5. • Every year at Hadoop Summit, interacting with public cloud object stores gains more attention • Why would one want to use object stores with Hadoop? • Data is stored very efficiently at low cost • Enables lots of data movement workflows • Some people have / need private cloud (scale, compliance, …) • Install an object store into your DC next to your on-prem Hadoop cluster and get high performance. We happen to make one of these; there are others as well.
  6. • The proposed solution allows mounting remote stores into HDFS • This can be another HDFS cluster or object stores or … • Mounting is a well-known abstraction for any (storage) admin • We leverage HDFS Heterogeneous Storage by adding a new type, PROVIDED • Data can be moved by setting the StoragePolicy: <PROVIDED> or <SSD, PROVIDED>
  7. • The main benefit is that we transparently abstract away the underlying storage. • The user/app does not know whether data is local or not; HDFS handles this completely, offering a unified namespace, and all the regular HDFS admin tools “just work” • Furthermore, there are interesting caching / load-on-demand opportunities
  8. There are a few challenges. These can broadly be grouped into the read path and the write path. In the read path, we are mostly focused on caching and synchronizing changes to the object storage. In the write path, we are concerned with writing new blocks and dynamically mounting object stores. We consider this phase 2.
  9. Before we go into the technical details of how we make all this work, let's look at a demo. In this demo, I will show you how we can back up data in an on-prem HDFS cluster to Azure blob store, and once the data is backed up, show that we can spin up HDFS clusters in Azure that can consume this data. So, we will illustrate the ability to both write and read data to remote stores. [Show local cluster HDFS page] Here we have an on-prem HDFS cluster, which [show directories in UI] contains two directories under /user/hadoop. [show hdfs-site.xml] For backup to work, we specify the backup path in the HDFS configuration file. In this case, it is this URL in Azure blob store. [start running the setStoragePolicy command] Now suppose you want to back up the workloads directory. For this, in the current prototype, we just set the storage policy of the directory to PROVIDED, and use the -scheduleBlockMoves flag to start the storage policy satisfier. We built this prototype on top of the SPS work from Intel that is happening in HDFS-10285. [run the command] Once the command has run, let's go to the Azure portal to verify that we see it. [show that the directory appears] [switch between HDFS web page and Azure web page] Now we can see that all the files under the workloads directory are backed up to Azure. Now suppose we want to back up another directory. Let's do it for the bin directory under YCSB. [run backup command for YCSB] [go back to Azure and show that we have it backed up] See, we now have the YCSB/bin directory backed up. It is as simple as that: just set the storage policy and data will be backed up to the configured location. Now let's see how we can mount this data in Azure blob store on a cluster in Azure. I have already started a few VMs on Azure that will serve as the hosts for HDFS. [start creating FSImage] First, for the mount to work, we have to create an FSImage that describes what files are stored on the blob store. [show blocks.csv] This creates a block map, which I will describe later – it is essentially a mapping from block IDs to the paths on the remote store. For this demo, we just use a text file, but this can be in a KV store. [run command] Now let's start HDFS on the cloud. [Show the UI of the HDFS cluster] Here is the web page for HDFS running on Azure. [show the URL] This machine is on Azure. [now go to the file browser] Here we can see that the data all appears in the cluster on the cloud. Now, isn't that cool?
  10. So, what have we seen in this demo? We started off with an on-prem cluster. -> We were able to back up data to Azure blob store by setting the storage policy on the data. -> Then we created an FSImage of this backup to describe what the blob store contains. -> And finally we were able to start an HDFS cluster on Azure that reads this FSImage and mounts the data in the blob store. -> So a particular location on the on-prem cluster is -> mapped to a corresponding blob on the blob store -> and is eventually accessible on the cluster in the cloud.
  11. Now let's go into the technical details of how all this works in HDFS.
  12. We define two kinds of mounts in our work, one for each of the two use cases we aim to address. The first are ephemeral mounts, where we use HDFS to access data in remote stores. Here the source of the data is the remote store, and the mount destination is in HDFS. -> The change propagation is bi-directional: any changes in the remote store are propagated to HDFS, and any changes in HDFS are propagated to the remote store. -> The second kind of mounts we define are backup mounts. -> These are used to back up data in HDFS to remote stores, so the source here is HDFS and the destination is the remote store. -> The change propagation is one-directional: only changes in HDFS are transferred to the remote store. We define two kinds of mounts to simplify how we reason about their semantics. Another option is to define merge mounts, where we merge the contents of the source and destination; however, the semantics of such mounts can get complicated.
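As a tiny illustration of this data model (the names are ours, not from the design doc), the two mount kinds differ only in which side is the source and which way changes flow:

    // Illustrative-only sketch of the two mount kinds described above.
    enum MountKind {
      EPHEMERAL,  // source: remote store, destination: HDFS; changes flow both ways
      BACKUP      // source: HDFS, destination: remote store; changes flow one way
    }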
  13. Now let's look at how these mounts work in practice. To start off, I will talk about how reads work in ephemeral mounts. -> Suppose this is the part of the remote namespace we want to -> mount in HDFS. If the mount is successful, we should be able to access data in the cloud through HDFS. That is, -> if a client requests a particular file, say /d/e, from HDFS, then HDFS should be -> able to read the file from the external store, -> get the data back from the external store, and -> stream the data back to the client. This is what we enable in this work.
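The point is that the client's view is unchanged. A minimal sketch (the path /d/e is the example from the slide): reading a file under a mount point is an ordinary HDFS read, and whether the bytes come from local disks or the remote store is decided inside HDFS.

    // Minimal sketch of the client side: a plain HDFS read of a mounted file.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class ReadThroughMount {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // The client neither knows nor cares that /d/e is backed remotely.
        try (FSDataInputStream in = fs.open(new Path("/d/e"))) {
          IOUtils.copyBytes(in, System.out, 4096, false);
        }
      }
    }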
  14. For this, we introduce a new storage type called PROVIDED, which is a peer to the existing storage types. The PROVIDED storage type is used to refer to data in the remote store. -> So Datanodes can now support four kinds of storage types. -> Data in the remote store is mapped to HDFS blocks on PROVIDED storage. In HDFS today, the NN is partitioned into a namespace (FSNamesystem), which maps files to a sequence of block IDs, and the BlockManager, which is responsible for block lifecycle management and for maintaining the locations of the blocks of any file. In this example, file /a/foo is mapped to blocks with IDs b_i to b_j. Each block ID is mapped to a list of replicas resident on a storage attached to a Datanode; for example, here we have block b_i mapped to storages s1, s2 and s3. -> As HDFS understands blocks, we use a similar mapping for files on PROVIDED storage: a file /remote/bar is mapped to blocks b_k to b_l, and each of these blocks is mapped to a PROVIDED storage. -> However, this is not sufficient to locate the data in the remote store; we need a mapping between these blocks and how the data is laid out remotely. For this, every block on PROVIDED storage is mapped to an alias. An alias is simply a tuple: a reference, which is something resolvable in the namespace of the remote store, and a nonce, which verifies that the reference still locates the data matching that block. For example, if the remote store is another FileSystem, the reference may be a (URI, offset, length) and the nonce can be a GUID such as an inode or fileID. If the remote store is a blob store like S3, the reference can be a (blob name, offset, length) and the nonce can be an ETag. -> We also maintain an AliasMap, which contains the mapping between block IDs and their aliases; this can live in the NN or in an external KV store. -> Finally, we have provided volumes in Datanodes, which are used to read and write data from the external store. A provided volume essentially implements a client capable of talking to the external store. In summary: the AliasMap helps us map HDFS metadata to metadata on the remote store, and the PROVIDED storage type helps HDFS understand that the data is actually remote.
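As an illustrative sketch of the alias just described (field and class names are ours; the design doc may differ), the tuple bundles a resolvable reference with a nonce for staleness detection:

    // Illustrative sketch: a reference resolvable in the remote namespace
    // plus a nonce to detect that the remote data has changed underneath us.
    final class BlockAlias {
      final java.net.URI uri;   // e.g., an s3a:// or wasb:// object path
      final long offset;        // where this block's bytes start in the object
      final long length;        // this block's length
      final String nonce;       // e.g., a fileId/inode for HDFS, an ETag for S3
      BlockAlias(java.net.URI uri, long offset, long length, String nonce) {
        this.uri = uri; this.offset = offset;
        this.length = length; this.nonce = nonce;
      }
    }
    // The AliasMap is then conceptually a Map<Long /*blockId*/, BlockAlias>,
    // kept either in the NN or in an external KV store.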
  15. Let's drill down into an example and walk through how an ephemeral mount would work. Assume we want to mount this remote subtree in HDFS. -> For this we generate two things: the FSImage and the AliasMap. -> The FSImage is a mirror of the metadata. Every file in this image is partitioned into a sequence of blocks; the image contains only the block IDs and the storage policy for each block. Along with the FSImage, we also generate the AliasMap, -> which stores, for each block ID, the block's alias on the external store. Each alias points to the file on the remote store that the block references, the offset of the block, the length of the block, and a nonce (inodeId, LMT) sufficient to detect inconsistency.
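A hedged sketch of the alias-generation half of this step: walk the remote FileSystem and carve each file into fixed-size chunks, reusing the BlockAlias sketch above. The class name, the 128MB block size, and using the modification time as the nonce are our assumptions; a real tool would also emit the FSImage itself.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.atomic.AtomicLong;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AliasMapBuilder {
      static final long BLOCK_SIZE = 128L * 1024 * 1024;

      public static Map<Long, BlockAlias> build(FileSystem remote, Path root)
          throws Exception {
        Map<Long, BlockAlias> aliasMap = new HashMap<>();
        walk(remote, root, aliasMap, new AtomicLong(1));
        return aliasMap;
      }

      private static void walk(FileSystem remote, Path dir,
          Map<Long, BlockAlias> map, AtomicLong nextId) throws Exception {
        for (FileStatus st : remote.listStatus(dir)) {
          if (st.isDirectory()) {
            walk(remote, st.getPath(), map, nextId);  // recurse into subtrees
            continue;
          }
          // One alias per fixed-size chunk of the remote file; the last
          // modification time stands in for the nonce in this sketch.
          for (long off = 0; off < st.getLen(); off += BLOCK_SIZE) {
            long len = Math.min(BLOCK_SIZE, st.getLen() - off);
            map.put(nextId.getAndIncrement(), new BlockAlias(
                st.getPath().toUri(), off, len,
                String.valueOf(st.getModificationTime())));
          }
        }
      }
    }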
  16. The FSImage and AliasMap can now be used to start up an HDFS Namenode. If we set the replication factor to be > 1, we can load the data into the cluster before any clients read it. -> When a DN configured with a provided storage volume reports in, the NN assumes that all blocks in the AliasMap are reachable through this Datanode, and it marks all of these blocks as available. There are no individual block reports for provided blocks.
  17. -> So, when a client calls getBlockLocations() for a provided file z1, -> the BlockManager resolves the composite DN to a -> physical DN that is configured with a provided volume. The DN can be chosen based on a pluggable policy; for example, we can resolve the location to the DN closest to the client. -> Now, when the client goes to the DN to read the provided block, the DN knows only that the block is provided; it doesn't have the block locally. So -> it goes to the AliasMap to resolve the block ID to its alias on the remote store. -> The DN uses this alias to open the corresponding file on the remote store, reads it, and passes the data along to the client. -> Because the block is read through the DN, we can also cache the data as a local block.
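A hedged sketch of what a provided volume on the DN does for such a read, under the BlockAlias sketch above: resolve the block to its alias, open the remote file, seek to the block's offset, and relay exactly `length` bytes to the client. The class and method names are ours, not the actual DN internals.

    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    class ProvidedVolumeReader {
      static void readBlock(BlockAlias alias, OutputStream clientOut,
                            Configuration conf) throws Exception {
        FileSystem remote = FileSystem.get(alias.uri, conf);
        try (FSDataInputStream in = remote.open(new Path(alias.uri))) {
          in.seek(alias.offset);          // jump to the block's start
          byte[] buf = new byte[64 * 1024];
          long remaining = alias.length;
          while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) break;             // remote file shrank: stale alias
            clientOut.write(buf, 0, n);
            remaining -= n;
            // A real implementation would also check the nonce first and
            // spill these bytes into a local replica to cache the block.
          }
        }
      }
    }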
  18. I will next briefly talk about how writes work with PROVIDED storage. First, let's look at ephemeral mounts. When the remote store for an ephemeral mount is a FileSystem, metadata operations are first performed on the remote store by the NN and then performed locally. This ensures that if the remote operation fails, the NN can fail the client without having to revert any local state. For remote stores that are blob stores, or that do not support metadata such as permissions or directories, metadata operations need not be propagated to the remote store. For operations that involve writing to files on the remote store, we plug into the existing write pipeline in HDFS: the BlockAlias is passed along to the DN that writes the provided replica, and the DN then uses the information in the alias to figure out where to write in the remote store. Any failures in writing to the remote store can be recovered from in the same way as failures in the existing write pipeline.
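A hedged sketch of the remote-first ordering just described, using mkdir as the example (the wrapper class is ours; the real logic lives inside the NN):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    class RemoteFirstMkdir {
      static boolean mkdir(FileSystem remote, FileSystem local, Path p)
          throws Exception {
        // If the remote side throws or refuses, fail the client without
        // having touched, and thus without having to roll back, local state.
        if (!remote.mkdirs(p)) {
          return false;
        }
        return local.mkdirs(p);  // remote succeeded; now mirror it locally
      }
    }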
  19. As opposed to ephemeral mounts, where write operations are initiated by the client, for backup mounts writes should happen without any continuous user/client interaction. For this, we have a daemon in the NN that backs up the data under the mount. Whenever a subtree is set up for backup, the backup daemon goes over all the files in that directory and backs them up. It delegates the work of backing up individual files to Datanodes, similar to how SPS works, which is what we used for the prototype in our demo. The backup can happen based on the capabilities of the remote store: for FSes, …; for blob stores, …. As we back up, the files in HDFS might change. To maintain a consistent view on the remote store, we use snapshots in HDFS. When a backup is initiated, we take a snapshot of the subtree being backed up and copy the metadata and data of that snapshot. During this time the subtree will have evolved, so once the first snapshot has been copied, we take a second snapshot, compute the deltas between the snapshots, and copy these deltas over. We continue like this, moving from snapshot to snapshot, until the backup is unmounted.
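A hedged sketch of this snapshot-to-snapshot loop. The snapshot and diff calls are real DistributedFileSystem APIs; the copy steps are elided stubs, since that work actually runs on the DNs as described above.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.SnapshotDiffReport;

    class BackupLoop {
      static void run(DistributedFileSystem dfs, Path subtree) throws Exception {
        dfs.allowSnapshot(subtree);             // admin-enable snapshots once
        String prev = "backup-0";
        dfs.createSnapshot(subtree, prev);
        copyFullSnapshot(subtree, prev);        // elided: initial full copy
        for (int i = 1; !unmounted(); i++) {
          String cur = "backup-" + i;
          dfs.createSnapshot(subtree, cur);
          SnapshotDiffReport diff = dfs.getSnapshotDiffReport(subtree, prev, cur);
          copyDeltas(diff);                     // elided: ship only the changes
          dfs.deleteSnapshot(subtree, prev);    // keep only the latest snapshot
          prev = cur;
        }
      }
      // Stubs standing in for work delegated to the DNs.
      static void copyFullSnapshot(Path p, String snap) {}
      static void copyDeltas(SnapshotDiffReport diff) {}
      static boolean unmounted() { return true; }
    }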
  21. In this work, we try to mount stores without expecting any APIs beyond those supported by the FileSystem API. Without additional support from the remote stores, these APIs are generally not sufficient to keep HDFS and the remote store in tight synchronization; even if we mount the remote store as read-only, we can only get eventual consistency. However, in general we provide workable semantics for big-data workloads. In most scenarios we target, churn is relatively rare and generally predictable; for example, most data ingest happens in year/month/day/hour layouts and is mostly additive. Because of this, we can use some simple heuristics that help resolve inconsistencies between the remote store and HDFS. We also assume that clusters are either producers or consumers of data. If clusters both produce and consume data, we might run into conflicts, and in most cases we do not have enough information across multiple storage systems to resolve such conflicts. Fundamentally, there is no magic here, but we try to provide a tractable solution that covers the most common cases and deployments.
  22. Please join us. We have a design document posted to JIRA, an active discussion of the implementation choices, and we'll be starting a branch to host these changes. The existing work on READ_ONLY_SHARED replicas has a superlative design doc, if you want to contribute but need some orientation in the internal details. We have a few minutes for questions, but please find us after the talk. There are far more details than we can possibly cover in a single presentation, and we're still settling the design, so we're very open to collaboration. Thanks, and... let's take a couple of questions.
  23. There are a few points worth calling out here. * First, this is a relatively small change to HDFS. The only client-visible change adds a new storage type. As a user, this is simpler than coordinating copy jobs: in our cloud example, all the cluster's data is immediately available once it's in the namespace, even if the replication policy hasn't prefetched data onto local media. * Second, particularly for read-only mounts, this is a narrow API to implement. For cloud-backup scenarios, where the NN is the only writer to the namespace, we only need the block-to-object-ID map and NN metadata to mount a prefix/snapshot of the cluster. In our example, the cloud credentials are hidden from the client; S3/WAS both authenticate clients to containers using a single key. Because HDFS owns and protects the external store's credentials, the client only accesses data permitted by HDFS. Generally, we can use features of HDFS that aren't directly supported by the backing store, as long as we can define the mapping. Finally, because the client reads through the DN, the DN can cache a copy of the block on read. Notably, the NN can direct the client to any DN that should cache a copy on read, opening up some interesting combinations of placement policies and read-through caching. That DN isn't necessarily the closest to the client; it may follow another objective function or replication policy.