HDFS: What is New in Hadoop 2
Sze Tsz-Wo Nicholas
施子和
December 6, 2013

© Hortonworks Inc. 2013

Page 1
About Me
• 施子和 Sze Tsz-Wo Nicholas, Ph.D.
– Software Engineer at Hortonworks
– PMC Member at Apache Hadoop
– One of the most active contributors/committers of HDFS
• Started in 2007

– Used Hadoop to compute Pi at the two-quadrillionth (2×10^15th) bit
• It is the current world record.

π = 3.141592654…

– Received Ph.D. from the University of Maryland, College Park
• Discovered a novel square root algorithm over finite fields.

Architecting the Future of Big Data

Agenda
• New HDFS features in Hadoop-2
– New appendable write-pipeline
– Multiple Namenode Federation
– Namenode HA
– File System Snapshots

We have been hard at work…
• Progress is being made in many areas
– Scalability
– Performance
– Enterprise features
– Ongoing operability improvements
– Enhancements for other projects in the ecosystem

– Expand Hadoop ecosystem to more platforms and use cases

• 2192 commits in Hadoop in the last year
– Almost a million lines of changes
– ~150 contributors
– Many new contributors: ~80 with fewer than 3 patches

• 350K lines of changes in HDFS and common

Building on Rock-solid Foundation
• Original design choices - simple and robust
– Single Namenode metadata server – all state in memory
– Fault Tolerance: multiple replicas, active monitoring
– Storage: Rely on OS’s file system not raw disk

• Reliability
– More than seven 9s of data reliability, with less than 0.38 failures across 25 clusters

• Operability
– Small teams can manage large clusters
• An operator per 3K node cluster

– Fast Time to repair on node or disk failure
• Minutes to an hour, vs. RAID array repairs that take many hours

• Scalable - proven by large scale deployments not bits
– > 100 PB storage, > 400 million files, > 4500 nodes in a single cluster
– ~ 100 K nodes of HDFS in deployment and use

New Appendable Write-Pipeline

HDFS Write Pipeline
• The write pipeline has been improved dramatically
– Better durability
– Better visibility
– Consistency guarantees
– Appendable

[Figure: the Writer sends data to DN1, which forwards it to DN2 and then DN3; acks flow back along the pipeline to the Writer.]

New Feature in Write Pipeline
• Earlier versions of HDFS
– Files were immutable
– Write-once-read-many model

• New features in Hadoop 2
– Files can be reopened for append
– New primitives: hflush and hsync
– Read consistency
– Replace datanode on failure

HDFS hflush and hsync
• Java flush (or C++ fflush)
– forces any buffered output bytes to be written out.

• HDFS hflush
– Flush data to all the datanodes in the write pipeline
– Guarantees the data is visible for reading
– The data may be in datanodes’ memory

• HDFS hsync
– hflush with a local file system sync
– May also update the file length in the Namenode
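
The difference between buffered writes, hflush, and hsync can be sketched with a toy in-memory model. This is not HDFS client code: `ToyDatanode`, `ToyWriter`, and their fields are hypothetical names, and a real pipeline forwards packets from one datanode to the next rather than having the writer loop over them.

```python
class ToyDatanode:
    """Hypothetical stand-in for a datanode in the write pipeline."""

    def __init__(self):
        self.in_memory = b""  # received and visible to readers, maybe not on disk
        self.on_disk = b""    # has survived a local file system sync


class ToyWriter:
    """Hypothetical client that buffers bytes before sending them."""

    def __init__(self, pipeline):
        self.pipeline = pipeline
        self.client_buffer = b""

    def write(self, data):
        # Like a plain buffered write: nothing has left the client yet.
        self.client_buffer += data

    def hflush(self):
        # Push buffered bytes to every datanode; readers can now see them,
        # but they may still live only in datanode memory.
        for dn in self.pipeline:
            dn.in_memory += self.client_buffer
        self.client_buffer = b""

    def hsync(self):
        # hflush, then force each datanode to persist what it holds.
        self.hflush()
        for dn in self.pipeline:
            dn.on_disk = dn.in_memory
```

In this model, data written but not flushed is invisible to readers, data after `hflush()` is visible on all three datanodes but not yet durable, and data after `hsync()` is both.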

Read Consistency
• A reader may read data during write
– It can read from any datanode in the pipeline
– and then fail over to any other datanode to read the same data

[Figure: the Writer → DN1 → DN2 → DN3 write pipeline with acks flowing back, while a Reader reads from datanodes in the pipeline.]

In the past …
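• When a datanode fails, the pipeline is reconstructed with the remaining datanodes

[Figure: the Writer with DN1, DN2, and DN3; one datanode has failed, and data and acks flow through the remaining two.]

• When another datanode fails, only one datanode remains!

[Figure: the same pipeline with a single working datanode left.]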

Replace Datanode on Failure
• Add new datanodes to the pipeline

[Figure: the Writer's pipeline of DN1, DN2, and DN3, with a new datanode DN4 added to the pipeline; data flows forward and acks flow back.]

• User clients may choose the replacement policy
– Performance vs data reliability
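
In Hadoop 2 the client-side choice is made through the `dfs.client.block.write.replace-datanode-on-failure.policy` setting, which takes NEVER, DEFAULT, or ALWAYS. The sketch below paraphrases the DEFAULT rule as it is described in `hdfs-default.xml`. It is an illustration rather than the client's actual code, the function and parameter names are made up for this example, and the exact rule should be checked against the documentation of the Hadoop version in use.

```python
def should_add_datanode(policy, replication, remaining, hflushed_or_appended):
    """Decide whether to add a replacement datanode to a shrunken pipeline.

    policy: "NEVER", "DEFAULT", or "ALWAYS"
    replication: target replication factor r of the file
    remaining: number n of datanodes still alive in the pipeline
    hflushed_or_appended: True if the block was hflushed or is being appended
    """
    if policy == "NEVER":
        return False  # favor performance: keep writing to what is left
    if policy == "ALWAYS":
        return True   # favor reliability: always restore the pipeline
    if policy == "DEFAULT":
        # Paraphrase of the documented rule: only for r >= 3, and only if
        # half or more of the replicas are gone, or the data has already
        # been promised to readers via hflush/append.
        return replication >= 3 and (
            replication // 2 >= remaining
            or (replication > remaining and hflushed_or_appended)
        )
    raise ValueError(f"unknown policy: {policy}")
```

For example, with replication 3 and two datanodes remaining, DEFAULT adds a replacement only if the block was hflushed or appended; with one datanode remaining it always does.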

Multiple Namenode Federation

HDFS Architecture

[Figure: two layers. Namespace: the Namenode holds the namespace state (hierarchical namespace, file name → block IDs) and the block map (block ID → block locations), backed by persistent namespace metadata and a journal. Block storage: datanodes with JBOD disks store replicated blocks (block ID → data; b1, b2, b3, and b5 each appear on several datanodes) and send heartbeats and block reports to the Namenode.]

Horizontally Scale IO and Storage

Single Namenode Limitations
• Namespace size is limited by the namenode memory size
– 64 GB of memory can support ~100 million files and blocks
– Solution: Federation

• Single point of failure (SPOF)
– The service is down when the namenode is down
– Solution: HA
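
As a back-of-envelope check on the memory figure above, the numbers work out to a few hundred bytes of namenode heap per file or block. The sketch assumes memory use grows roughly linearly with the number of objects, which the slide does not state, and treats 64 GB as 64 GiB.

```python
GIB = 2**30

heap_bytes = 64 * GIB   # namenode memory quoted on the slide
objects = 100_000_000   # ~100 million files and blocks

# Average heap consumed per file or block implied by the slide's figure.
bytes_per_object = heap_bytes / objects


def estimated_capacity(heap_gib, per_object=bytes_per_object):
    """Rough object count for a given heap size, assuming linear scaling."""
    return round(heap_gib * GIB / per_object)
```

Under that assumption, doubling the heap to 128 GiB would allow roughly 200 million files and blocks, which is why a single namenode's memory caps the namespace size.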

Federation Cluster
• Multiple namenodes and namespace volumes in a cluster
– The namenodes/namespaces are independent
– Scalability by adding more namenodes/namespaces
– Isolation – separating applications to their own namespaces
– Client side mount tables/ViewFS for integrated views

• Block Storage as generic storage service
– Datanodes store blocks in block pools for all namespaces
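
The idea of a client-side mount table can be sketched as a prefix lookup that maps a path in the integrated view to the namenode that owns it. This is a toy model, not the ViewFS implementation; the mount points and the `nn1.example.com` / `nn2.example.com` hosts are hypothetical.

```python
# Hypothetical mount table: path prefix in the client's view -> owning namespace.
MOUNT_TABLE = {
    "/user": "hdfs://nn1.example.com:8020/user",
    "/data": "hdfs://nn2.example.com:8020/data",
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Translate a path in the integrated view into a namenode-specific URI."""
    for mount_point, target in mount_table.items():
        # Match the mount point itself or anything below it, but not
        # siblings that merely share a name prefix (e.g. /userdata).
        if path == mount_point or path.startswith(mount_point + "/"):
            return target + path[len(mount_point):]
    raise LookupError(f"no mount point covers {path}")
```

Because the table lives in the client, the namenodes stay independent of each other: adding a namespace means adding a namenode and one more entry in the table.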

Multiple Namenode Federation

[Figure: two layers. Namespace: namenodes NN-1 … NN-k … NN-n, each serving its own namespace (NS1 … NS k … foreign NS n). Block storage: block pools Pool 1 … Pool k … Pool n, one per namespace, all stored on the common datanodes DN 1, DN 2 … DN m.]

Common Storage

Namenode HA

High Availability – No SPOF
• Support standby namenode and failover
– Planned downtime
– Unplanned downtime

• Release 1.1
– Cold standby
• Requires reconstructing in-memory data structures during failover

– Uses NFS as shared storage
– Standard HA frameworks as failover controller
• Linux HA and VMware vSphere

– Suitable for small clusters up to 500 nodes

Hadoop Full Stack HA

[Figure: the slave nodes of the Hadoop cluster run jobs, and apps run outside the cluster. The master daemons (NN and JT) run on servers in an HA cluster; when the NN fails over to another server, the JT goes into safemode.]

High Availability – Release 2.0
• Support for Hot Standby
– The standby namenode maintains in-memory data structures

• Supports manual and automatic failover
• Automatic failover with Failover Controller
– Active NN election and failure detection using ZooKeeper
– Periodic NN health check
– Failover on NN failure

• Removed shared storage dependency
– Quorum Journal Manager
• 3 to 5 Journal Nodes for storing editlog
• Each edit must be written to a quorum of Journal Nodes

• Replay cache for correctness & transparent failovers
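
The quorum requirement for the edit log comes down to simple majority arithmetic. The sketch below is only that arithmetic, with made-up function names; it is not Quorum Journal Manager code.

```python
def quorum_size(journal_nodes):
    """Smallest majority of the Journal Nodes."""
    return journal_nodes // 2 + 1


def edit_is_committed(acks, journal_nodes):
    """An edit counts as written once a majority has acknowledged it."""
    return acks >= quorum_size(journal_nodes)


def tolerated_failures(journal_nodes):
    """How many Journal Nodes can be down while edits still commit."""
    return journal_nodes - quorum_size(journal_nodes)
```

With 3 Journal Nodes the quorum is 2 and one node may fail; with 5 the quorum is 3 and two may fail. An even count adds cost without raising the number of tolerated failures, which is one reason odd sizes such as 3 and 5 are the usual choice.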

Namenode HA in Hadoop 2
[Figure: an active NN and a standby NN, each paired with a FailoverController that monitors the health of the NN, OS, and HW, sends commands to its NN, and heartbeats to a ZooKeeper (ZK) ensemble of three nodes. The two namenodes share NN state through a quorum of three JournalNodes (JN). The datanodes send block reports to both the active and the standby. DN fencing: datanodes only obey commands from the active.]

Namenode HA has no external dependency

File System Snapshots

Before Snapshots…
• Deleted files cannot be restored
– Trash is buggy and not well understood
– Trash works only for CLI based deletion

• No point-in-time recovery

• No periodic snapshots to restore from
– No admin/user managed snapshots

HDFS Snapshot

Point-in-time image of the file system
Read-only
Copy-on-write

Use Cases

Protection against user errors
Backup
Experimental/Test setups

Example: Periodic Snapshots for Backup
• A typical snapshot policy:

Take a snapshot      Keep it for
– every 15 mins      24 hrs
– every 1 hr         2 days
– every 1 day        14 days
– every 1 week       3 months
– every 1 month      1 year
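
A quick calculation shows how many snapshots such a policy keeps alive at once. It pairs each interval with the retention period listed in the same position (15 mins with 24 hrs, 1 hr with 2 days, and so on), and it assumes 30-day months, a 90-day quarter, and a 365-day year; those calendar simplifications are not from the slide.

```python
MIN = 1
HOUR = 60 * MIN
DAY = 24 * HOUR

# (label, snapshot interval, retention period), all in minutes.
POLICY = [
    ("15 mins", 15 * MIN, 24 * HOUR),
    ("1 hr", 1 * HOUR, 2 * DAY),
    ("1 day", 1 * DAY, 14 * DAY),
    ("1 week", 7 * DAY, 90 * DAY),      # assumes 3 months = 90 days
    ("1 month", 30 * DAY, 365 * DAY),   # assumes 30-day months, 365-day year
]


def retained(interval, keep_for):
    """Number of snapshots of one tier that exist at any moment."""
    return keep_for // interval


counts = {label: retained(interval, keep) for label, interval, keep in POLICY}
total = sum(counts.values())
```

Under these assumptions the tiers hold 96, 48, 14, 12, and 12 snapshots, 182 in total for one directory, so the policy depends on snapshot creation and retention being cheap.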

Design Goal: Efficiency
• Storage efficiency
– No block data copying
– No metadata copying for unmodified files

• Processing efficiency
– No additional costs for processing current data

• Cheap snapshot creation
– Must be fast and lightweight
– Must support a very large number of snapshots

Design Goal: Features
• Read-only
– Files and directories in a snapshot are immutable
– Nothing can be added to or removed from directories

• Hierarchical snapshots
– Snapshots of the entire namespace
– Snapshots of subtrees

• User operation
– Users can take snapshots for their data
– Admins manage where users can take snapshots

HDFS-2802: Snapshot Development
• Available in Hadoop 2 GA release (v2.2.0)
• Community-driven
– Special thanks to those who provided valuable discussion and feedback on the feature requirements and the open questions

• 136 subtask JIRAs
– Mainly contributed by Hortonworks

• The merge patch has about 28k lines
• ~8 months of development

Namenode Only Operation
• No complicated distributed mechanism
• Snapshot metadata stored in Namenode
• Datanodes have no knowledge of snapshots

• The block management layer also does not know about snapshots

Fast Snapshot Creation
• Snapshot Creation: O(1)
– It just adds a record to an inode

[Figure: a directory tree rooted at / with directories d1 and d2 and files f1, f2, and f3; snapshot S1 is recorded on a directory inode.]

Low Memory Overhead
• NameNode memory usage: O(M)
– M is the number of modified files/directories
– Additional memory is used only when modifications are made
relative to a snapshot

[Figure: the same directory tree after snapshot S1 was taken; f4 has been added and f3 removed from the current tree, while S1 still refers to f3.]

Modifications:
1. rm f3
2. add f4

File Blocks Sharing
• Blocks in datanodes are not copied
– The snapshot files record the block list and the file size
– No data copying

[Figure: under directory /d, the current file f and its snapshot copies f' (in S1) and f'' (in S2) point to blocks from the same set blk0, blk1, blk2, blk3; no block is copied.]
Persistent Data Structures
• A well-known data structure for “time travel”
– Supports querying previous versions of the data

• Access slowdown
– The additional time required by the data structure

• In traditional persistent data structures
– Access to both current data and snapshot data is slowed down

• In our implementation
– No slowdown when accessing current data
– Slowdown happens only when accessing snapshot data

No Slow Down on Accessing Current Data
• The current data can be accessed directly
– Modifications are recorded in reverse chronological order

Snapshot data = Current data – Modifications

[Figure: the current tree (/, d1, d2, f1, f2, f4) is accessed directly; the snapshot S1 view, which still contains f3 and not f4, is derived from the current tree and the recorded modifications.]

Modifications:
1. rm f3
2. add f4
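
The relation "Snapshot data = Current data – Modifications" can be sketched with a toy directory that stores only its current children plus, per snapshot, the names created and deleted since that snapshot. This is a simplified model with hypothetical class and method names, not the Namenode's data structure; it tracks names only and assumes, for the example, a directory that held f2 and f3 when S1 was taken.

```python
class ToyDirectory:
    """Current children are stored directly; snapshots store only diffs."""

    def __init__(self, children=()):
        self.children = set(children)
        self.diffs = []  # (snapshot name, created, deleted), oldest first

    def create_snapshot(self, name):
        # O(1): just append an empty record, nothing is copied.
        self.diffs.append((name, set(), set()))

    def add(self, child):
        self.children.add(child)
        if self.diffs:
            self.diffs[-1][1].add(child)  # created after latest snapshot

    def remove(self, child):
        self.children.remove(child)
        if self.diffs:
            _, created, deleted = self.diffs[-1]
            if child in created:
                created.discard(child)    # never existed in the snapshot
            else:
                deleted.add(child)        # snapshot must still see it

    def list_current(self):
        # No diff is consulted: current data is read at full speed.
        return set(self.children)

    def list_snapshot(self, name):
        # Undo the diffs newest-first until the requested snapshot.
        view = set(self.children)
        for snap, created, deleted in reversed(self.diffs):
            view = (view - created) | deleted
            if snap == name:
                return view
        raise KeyError(name)
```

Replaying the slide's example (take S1, `rm f3`, `add f4`) leaves f2 and f4 in the current listing while the S1 listing still shows f2 and f3; only the snapshot read pays for walking the diffs.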
Easy Management
• Snapshots can be taken on any directory
– Set the directory to be snapshottable

• Support 65,536 simultaneous snapshots
• No limit on the number of snapshottable directories
– Nested snapshottable directories are currently NOT allowed

Admin Ops
• Allow snapshots on a directory
– hdfs dfsadmin -allowSnapshot <path>

• Reset a snapshottable directory
– hdfs dfsadmin -disallowSnapshot <path>

• Example

User Ops
• Create/delete/rename snapshots
– hdfs dfs -createSnapshot <path> [<snapshotName>]
– hdfs dfs -deleteSnapshot <path> <snapshotName>
– hdfs dfs -renameSnapshot <path> <oldName> <newName>

• Get snapshottable directory listing
– hdfs lsSnapshottableDir

• Get snapshots difference report
– hdfs snapshotDiff <path> <from> <to>

Use snapshot paths in CLI
• All regular commands and APIs can be used against snapshot paths
– /<snapshottableDir>/.snapshot/<snapshotName>/foo/bar

• List all the files in a snapshot
– ls /test/.snapshot/s4

• List all the snapshots under that path
– ls <path>/.snapshot
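
Since a snapshot path is an ordinary path with a `.snapshot` component in it, building one is plain string handling. The helper below is a hypothetical convenience function, not part of any Hadoop API.

```python
import posixpath


def snapshot_path(snapshottable_dir, snapshot_name, relative_path=""):
    """Build /<snapshottableDir>/.snapshot/<snapshotName>/<relative path>."""
    parts = [snapshottable_dir, ".snapshot", snapshot_name]
    relative = relative_path.strip("/")
    if relative:
        parts.append(relative)
    return posixpath.join(*parts)
```

For instance, `snapshot_path("/test", "s4")` gives `/test/.snapshot/s4`, and `snapshot_path("/test", "s4", "foo/bar")` gives `/test/.snapshot/s4/foo/bar`; either result can be passed to the regular commands and APIs like any other path.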

Test Snapshot Functionalities
• ~100 unit tests
• ~1.4 million generated system tests
– Covering most combinations of (snapshot + rename) operations

• Automated long-running tests for months

NFS Support and Other Features

NFS Support
• NFS Gateway provides NFS access to HDFS
– File browsing, Data download/upload, Data streaming
– No client-side library
– Better alternative to Hadoop + Fuse based solution
• Better consistency guarantees

• Supports NFSv3
• Stateless Gateway
– Simpler design, easy to handle failures

• Future work
– High Availability for NFS Gateway
– NFSv4 support?

Other Features
• Protobuf, wire compatibility
– Post 2.0 GA stronger wire compatibility

• Rolling upgrades
– With relaxed version checks

• Improvements for other projects
– Stale node to improve HBase MTTR

• Block placement enhancements
– Better support for other topologies such as VMs and Cloud

• On-the-wire encryption
– Both data and RPC

• Expanding ecosystem, platforms and applicability
– Native support for Windows

Enterprise Readiness
• Storage fault-tolerance – built into HDFS
– 100% data reliability

• High Availability
• Standard Interfaces
– WebHDFS (REST), FUSE, NFS, HttpFS, libwebhdfs and libhdfs

• Wire protocol compatibility
– Protocol buffers

• Rolling upgrades

• Snapshots
• Disaster Recovery
– DistCp for parallel and incremental copies across clusters
– Apache Ambari and HDP for automated management

Work in Progress
• HDFS-2832: Heterogeneous storages
– Datanode abstraction from single storage to collection of storages
– Support different storage types: Disk and SSD

• HDFS-5535: Zero downtime rolling upgrade
– Namenodes and Datanodes can be upgraded independently
– No upgrade downtime

• HDFS-4685: ACLs
– More flexible than user-group-permission

Future Work
• HDFS-5477: Block manager as a service
– Move block management out of the Namenode
– Support different name services, e.g. a key-value store

• HDFS-3154: Immutable files
– Write-once and then read-only

• HDFS-4704: Transient files
– Tmp files will not be recorded in snapshots

Q&A
• Myths and misinformation of HDFS
– Not reliable (was never true)
– Namenode dies, all state is lost (was never true)
– Does not support disaster recovery (distcp in Hadoop 0.15)
– Hard to operate for newcomers
– Performance improvements (always ongoing)
• Major improvements in 1.2 and 2.x
– Namenode is a single point of failure
– Needs shared NFS storage for HA
– Does not have point in time recovery

Thank You!
王涛:基于Cloudera impala的非关系型数据库sql执行引擎
 
王峰:阿里搜索实时流计算技术
王峰:阿里搜索实时流计算技术王峰:阿里搜索实时流计算技术
王峰:阿里搜索实时流计算技术
 
钱卫宁:在线社交媒体分析型查询基准评测初探
钱卫宁:在线社交媒体分析型查询基准评测初探钱卫宁:在线社交媒体分析型查询基准评测初探
钱卫宁:在线社交媒体分析型查询基准评测初探
 
穆黎森:Interactive batch query at scale
穆黎森:Interactive batch query at scale穆黎森:Interactive batch query at scale
穆黎森:Interactive batch query at scale
 
罗李:构建一个跨机房的Hadoop集群
罗李:构建一个跨机房的Hadoop集群罗李:构建一个跨机房的Hadoop集群
罗李:构建一个跨机房的Hadoop集群
 
刘诚忠:Running cloudera impala on postgre sql
刘诚忠:Running cloudera impala on postgre sql刘诚忠:Running cloudera impala on postgre sql
刘诚忠:Running cloudera impala on postgre sql
 
刘昌钰:阿里大数据应用平台
刘昌钰:阿里大数据应用平台刘昌钰:阿里大数据应用平台
刘昌钰:阿里大数据应用平台
 
李战怀:大数据背景下分布式系统的数据一致性策略
李战怀:大数据背景下分布式系统的数据一致性策略李战怀:大数据背景下分布式系统的数据一致性策略
李战怀:大数据背景下分布式系统的数据一致性策略
 
冯宏华:H base在小米的应用与扩展
冯宏华:H base在小米的应用与扩展冯宏华:H base在小米的应用与扩展
冯宏华:H base在小米的应用与扩展
 
堵俊平:Hadoop virtualization extensions
堵俊平:Hadoop virtualization extensions堵俊平:Hadoop virtualization extensions
堵俊平:Hadoop virtualization extensions
 
陈跃国:Sql on-hadoop结构化大数据分析系统性能评测
陈跃国:Sql on-hadoop结构化大数据分析系统性能评测陈跃国:Sql on-hadoop结构化大数据分析系统性能评测
陈跃国:Sql on-hadoop结构化大数据分析系统性能评测
 
查礼 -大数据技术如何用于传统信息系统
查礼 -大数据技术如何用于传统信息系统查礼 -大数据技术如何用于传统信息系统
查礼 -大数据技术如何用于传统信息系统
 
Ted yu:h base and hoya
Ted yu:h base and hoyaTed yu:h base and hoya
Ted yu:h base and hoya
 
Raghu nambiar:industry standard benchmarks
Raghu nambiar:industry standard benchmarksRaghu nambiar:industry standard benchmarks
Raghu nambiar:industry standard benchmarks
 

Kürzlich hochgeladen

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Kürzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Nicholas:hdfs what is new in hadoop 2

  • 1. HDFS: What is New in Hadoop 2 Sze Tsz-Wo Nicholas 施子和 December 6, 2013 © Hortonworks Inc. 2013 Page 1
  • 2. About Me • 施子和 Sze Tsz-Wo Nicholas, Ph.D. – Software Engineer at Hortonworks – PMC Member at Apache Hadoop – One of the most active contributors/committers of HDFS • Started in 2007 – Used Hadoop to compute Pi at the two-quadrillionth (2×10^15th) bit • It is the current World Record. π = 3.141592654… – Received Ph.D. from the University of Maryland, College Park • Discovered a novel square root algorithm over finite fields.
  • 3. Agenda • New HDFS features in Hadoop-2 – New appendable write-pipeline – Multiple Namenode Federation – Namenode HA – File System Snapshots
  • 4. We have been hard at work… • Progress is being made in many areas – Scalability – Performance – Enterprise features – Ongoing operability improvements – Enhancements for other projects in the ecosystem – Expand Hadoop ecosystem to more platforms and use cases • 2192 commits in Hadoop in the last year – Almost a million lines of changes – ~150 contributors – Many new contributors: ~80 with < 3 patches • 350K lines of changes in HDFS and common
  • 5. Building on Rock-solid Foundation • Original design choices: simple and robust – Single Namenode metadata server, all state in memory – Fault tolerance: multiple replicas, active monitoring – Storage: rely on the OS’s file system, not raw disk • Reliability – Over 7 9’s of data reliability, less than 0.38 failures across 25 clusters • Operability – Small teams can manage large clusters • An operator per 3K node cluster – Fast time to repair on node or disk failure • Minutes to an hour vs. RAID array repairs taking many long hours • Scalable: proven by large scale deployments, not bits – > 100 PB storage, > 400 million files, > 4500 nodes in a single cluster – ~ 100 K nodes of HDFS in deployment and use
  • 6. New Appendable Write-Pipeline
  • 7. HDFS Write Pipeline • The write pipeline has been improved dramatically – Better durability – Better visibility – Consistency guarantees – Appendable data [Diagram: the Writer sends data through DN1, DN2 and DN3 in sequence; acks flow back along the same path]
  • 8. New Features in the Write Pipeline • Earlier versions of HDFS – Files were immutable – Write-once-read-many model • New features in Hadoop 2 – Files can be reopened for append – New primitives: hflush and hsync – Read consistency – Replace datanode on failure
  • 9. HDFS hflush and hsync • Java flush (or C++ fflush) – Forces any buffered output bytes to be written out • HDFS hflush – Flushes data to all the datanodes in the write pipeline – Guarantees the data is visible for reading – The data may still be only in the datanodes’ memory • HDFS hsync – hflush plus a local file system sync on the datanodes – May also update the file length in the Namenode
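The difference between "visible" and "durable" on slide 9 can be illustrated with a toy model. This is only a sketch: `ToyPipelineStream` and its methods are hypothetical names invented here, not the HDFS client API, and the model ignores failures, packets and acks.

```python
class ToyPipelineStream:
    """Toy model of write / hflush / hsync; NOT the HDFS client API."""

    def __init__(self, num_datanodes=3):
        self.client_buffer = bytearray()
        # Each toy datanode has an in-memory buffer and a "disk" area.
        self.dn_memory = [bytearray() for _ in range(num_datanodes)]
        self.dn_disk = [bytearray() for _ in range(num_datanodes)]

    def write(self, data: bytes):
        # Like a plain buffered write: data stays on the client side.
        self.client_buffer += data

    def hflush(self):
        # Push buffered bytes to every datanode's memory: now visible to readers.
        for mem in self.dn_memory:
            mem += self.client_buffer
        self.client_buffer.clear()

    def hsync(self):
        # hflush, then have every datanode persist what it holds in memory.
        self.hflush()
        for mem, disk in zip(self.dn_memory, self.dn_disk):
            disk += mem[len(disk):]

    def visible_length(self):
        return min(len(mem) for mem in self.dn_memory)

    def durable_length(self):
        return min(len(disk) for disk in self.dn_disk)
```

In this model, bytes become readable after `hflush` but only count as durable after `hsync`, which mirrors the trade-off the slide describes.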
  • 10. Read Consistency • A reader may read data during a write – It can read from any datanode in the pipeline – and then fail over to any other datanode to read the same data [Diagram: Writer → DN1 → DN2 → DN3 with acks flowing back; a Reader reads from datanodes in the pipeline]
  • 11. In the past … • When a datanode fails, the pipeline is reconstructed with the remaining datanodes [Diagram: Writer → DN1 → DN2 → DN3 with one datanode failed] • When another datanode fails, only one datanode remains! [Diagram: the same pipeline with two datanodes failed]
  • 12. Replace Datanode on Failure • Add new datanodes to the pipeline [Diagram: Writer → DN1 → DN2 → DN3, with DN4 added to the pipeline as a replacement] • User clients may choose the replacement policy – Performance vs data reliability
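The contrast between slides 11 and 12 can be sketched as a small function. The policy names `"never"` and `"always"` and the function `recover_pipeline` are illustrative assumptions made for this sketch; they are not taken from the slides and do not describe how the HDFS client is configured.

```python
def recover_pipeline(pipeline, failed, spares, policy="always"):
    """Return the pipeline after one datanode failure (toy model).

    policy="never"  : keep only the surviving datanodes (the old behaviour).
    policy="always" : also append one spare datanode, if any is available.
    """
    survivors = [dn for dn in pipeline if dn != failed]
    if policy == "always" and spares:
        survivors.append(spares[0])
    return survivors


# Old behaviour: two failures leave a single replica in the pipeline.
old = recover_pipeline(["DN1", "DN2", "DN3"], "DN2", [], policy="never")
old = recover_pipeline(old, "DN3", [], policy="never")

# With replacement: the pipeline stays at three datanodes.
new = recover_pipeline(["DN1", "DN2", "DN3"], "DN2", ["DN4"])
```

Replacing a datanode costs extra work during recovery, while not replacing it leaves fewer replicas being written, which is the performance versus reliability choice mentioned on the slide.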
  • 13. Multiple Namenode Federation
  • 14. HDFS Architecture • Namespace (Namenode) – Persistent namespace metadata & journal – Hierarchical namespace: File Name → Block IDs – Namespace state – Block map: Block ID → Block Locations • Block Storage (Datanodes) – Block ID → Data, stored on JBOD – Heartbeats & block reports sent to the Namenode • Horizontally scale IO and storage
  • 15. Single Namenode Limitations • Namespace size is limited by the namenode memory size – 64GB memory can support ~100M files and blocks – Solution: Federation • Single point of failure (SPOF) – The service is down when the namenode is down – Solution: HA
  • 16. Federation Cluster • Multiple namenodes and namespace volumes in a cluster – The namenodes/namespaces are independent – Scalability by adding more namenodes/namespaces – Isolation: separating applications into their own namespaces – Client side mount tables/ViewFS for integrated views • Block Storage as generic storage service – Datanodes store blocks in block pools for all namespaces
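The idea of a client side mount table on slide 16 can be sketched as a lookup from a path prefix to the namenode that owns it. This is a toy model under stated assumptions: the mount points, the namenode names `nn-1` to `nn-3`, and the function `resolve` are all hypothetical, mount points are assumed not to nest, and nothing here reflects the actual ViewFS configuration format.

```python
# Hypothetical mount table: path prefix -> namenode serving that namespace.
MOUNT_TABLE = {
    "/user": "nn-1",
    "/data": "nn-2",
    "/tmp": "nn-3",
}


def resolve(mount_table, path):
    """Return the namenode whose mount point covers the given absolute path."""
    for mount_point, namenode in mount_table.items():
        # Match the mount point itself or anything below it, but not
        # a sibling that merely shares a string prefix (e.g. /database).
        if path == mount_point or path.startswith(mount_point + "/"):
            return namenode
    raise KeyError(f"no mount point covers {path}")
```

A client using such a table sees one integrated directory tree even though each subtree is served by an independent namenode.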
  • 17. Multiple Namenode Federation [Diagram: Namespace layer with namenodes NN-1 … NN-k … NN-n serving namespaces NS1 … NS k … and a foreign NS n; Block Storage layer with block pools Pool 1 … Pool k … Pool n spread over datanodes DN 1, DN 2, … DN m as common storage]
  • 18. Namenode HA
  • 19. High Availability – No SPOF • Support standby namenode and failover – Planned downtime – Unplanned downtime • Release 1.1 – Cold standby • Requires reconstructing in-memory data structures during failover – Uses NFS as shared storage – Standard HA frameworks as failover controller • Linux HA and VMware vSphere – Suitable for small clusters up to 500 nodes
  • 20. Hadoop Full Stack HA [Diagram labels: Slave Nodes of Hadoop Cluster (jobs); Apps Running Outside; Failover; JT into Safemode; NN Server, JT Server, NN Server; HA Cluster for Master Daemons]
  • 21. High Availability – Release 2.0 • Support for Hot Standby – The standby namenode maintains in-memory data structures • Supports manual and automatic failover • Automatic failover with Failover Controller – Active NN election and failure detection using ZooKeeper – Periodic NN health check – Failover on NN failure • Removed shared storage dependency – Quorum Journal Manager • 3 to 5 Journal Nodes for storing the editlog • Each edit must be written to a quorum of Journal Nodes • Replay cache for correctness & transparent failovers
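The quorum rule on slide 21 can be sketched in a few lines. This is a toy model, assuming a simple majority quorum and representing each Journal Node as a dictionary; `quorum_size` and `write_edit` are hypothetical helper names, not part of the Quorum Journal Manager code.

```python
def quorum_size(num_journal_nodes):
    """Smallest number of nodes that forms a majority."""
    return num_journal_nodes // 2 + 1


def write_edit(journal_nodes, edit):
    """Append an edit to every reachable journal node.

    Returns True only if a majority of nodes stored the edit.
    """
    acks = 0
    for jn in journal_nodes:
        if jn["up"]:
            jn["log"].append(edit)
            acks += 1
    return acks >= quorum_size(len(journal_nodes))


# Three journal nodes tolerate one failure; five tolerate two.
nodes = [{"up": True, "log": []}, {"up": True, "log": []}, {"up": False, "log": []}]
committed = write_edit(nodes, "mkdir /a")
```

With a majority requirement, any two quorums overlap in at least one node, so a newly active namenode reading from a quorum can find every committed edit.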
  • 22. Namenode HA in Hadoop 2 [Diagram: three ZooKeeper (ZK) nodes receive heartbeats from two FailoverControllers (Active and Standby); each FailoverController monitors the health of its NN, OS and HW and sends commands to it; the Active NN and Standby NN share NN state through a quorum of JournalNodes (JN); datanodes (DN) send block reports to both Active & Standby; DN fencing: only obey commands from the active] • Namenode HA has no external dependency
  • 23. File System Snapshots
  • 24. Before Snapshots… • Deleted files cannot be restored – Trash is buggy and not well understood – Trash works only for CLI based deletion • No point-in-time recovery • No periodic snapshots to restore from – No admin/user managed snapshots
  • 25. HDFS Snapshot • Point-in-time image of the file system • Read-only • Copy-on-write
  • 26. Use Cases • Protection against user errors • Backup • Experimental/Test setups
  • 27. Example: Periodic Snapshots for Backup • A typical snapshot policy: take a snapshot – every 15 mins and keep it for 24 hrs – every 1 hr, keep 2 days – every 1 day, keep 14 days – every 1 week, keep 3 months – every 1 month, keep 1 year
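To get a feel for how many snapshots the policy on slide 27 keeps around at steady state, the retention period can be divided by the interval for each tier. This is a rough sketch under assumptions not stated on the slide: 3 months is taken as 90 days, 1 month as 30 days and 1 year as 365 days; `POLICY`, `retained_count` and `total_retained` are names made up for the example.

```python
from datetime import timedelta

# (label, snapshot interval, retention period); calendar lengths are assumptions.
POLICY = [
    ("15 min", timedelta(minutes=15), timedelta(hours=24)),
    ("hourly", timedelta(hours=1), timedelta(days=2)),
    ("daily", timedelta(days=1), timedelta(days=14)),
    ("weekly", timedelta(weeks=1), timedelta(days=90)),
    ("monthly", timedelta(days=30), timedelta(days=365)),
]


def retained_count(interval, keep_for):
    """Number of snapshots alive at once when one is taken every interval."""
    return keep_for // interval


def total_retained(policy):
    return sum(retained_count(interval, keep_for) for _, interval, keep_for in policy)
```

Under these assumptions the five tiers keep 96, 48, 14, 12 and 12 snapshots respectively, 182 in total, which is far below the 65,536 simultaneous snapshots mentioned on slide 37.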
  • 28. Design Goal: Efficiency • Storage efficiency – No block data copying – No metadata copying for unmodified files • Processing efficiency – No additional costs for processing current data • Cheap snapshot creation – Must be fast and lightweight – Must support a very large number of snapshots
  • 29. Design Goal: Features • Read-only – Files and directories in a snapshot are immutable – Nothing can be added to or removed from directories • Hierarchical snapshots – Snapshots of the entire namespace – Snapshots of subtrees • User operation – Users can take snapshots of their data – Admins manage where users can take snapshots
  • 30. HDFS-2802: Snapshot Development • Available in Hadoop 2 GA release (v2.2.0) • Community-driven – Special thanks to those who provided valuable discussion and feedback on the feature requirements and the open questions • 136 subtask JIRAs – Mainly contributed by Hortonworks • The merge patch has about 28k lines • ~8 months of development
  • 31. Namenode Only Operation • No complicated distributed mechanism • Snapshot metadata is stored in the Namenode • Datanodes have no knowledge of snapshots • The block management layer also does not know about snapshots
  • 32. Fast Snapshot Creation • Snapshot Creation: O(1) – It just adds a record to an inode [Diagram: directory tree / with d1 (f1) and d2 (f2, f3); snapshot S1 recorded on d2]
  • 33. Low Memory Overhead • NameNode memory usage: O(M) – M is the number of modified files/directories – Additional memory is used only when modifications are made relative to a snapshot [Diagram: the same tree after the modifications 1. rm f3 and 2. add f4; d2 now contains f2 and f4, while S1 still refers to f3]
  • 34. File Blocks Sharing • Blocks in datanodes are not copied – The snapshot files record the block list and the file size – No data copying [Diagram labels: /, d, S1, S2, f, f', f'', blk0, blk1, blk2, blk3]
  • 35. Persistent Data Structures • A well-known data structure for “time travel” – Supports querying previous versions of the data • Access slowdown – The additional time required by the data structure • In traditional persistent data structures – There is a slowdown on accessing both current data and snapshot data • In our implementation – No slowdown on accessing current data – Slowdown happens only on accessing snapshot data
  • 36. No Slow Down on Accessing Current Data • The current data can be accessed directly – Modifications are recorded in reverse chronological order • Snapshot data = Current data – Modifications [Diagram: current tree with d2 containing f2 and f4; undoing the recorded modifications (1. rm f3, 2. add f4) yields the S1 view of d2 containing f2 and f3]
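The rule "Snapshot data = Current data – Modifications" from slides 32 to 36 can be modelled with a directory that stores only its current children plus one small diff per snapshot. This is a toy sketch tracking names only; `ToyDirectory` and its methods are hypothetical and far simpler than the Namenode's real inode structures.

```python
class ToyDirectory:
    """Current children are stored directly; each snapshot holds only a diff."""

    def __init__(self, children=()):
        self.children = set(children)  # current state, read with no extra cost
        self.snapshots = []            # (name, created_after, deleted_after)

    def create_snapshot(self, name):
        # O(1): append an empty diff record, copy nothing.
        self.snapshots.append((name, set(), set()))

    def add(self, child):
        self.children.add(child)
        if self.snapshots:
            _, created, deleted = self.snapshots[-1]
            if child in deleted:
                deleted.discard(child)  # name existed at snapshot time again
            else:
                created.add(child)

    def remove(self, child):
        self.children.remove(child)
        if self.snapshots:
            _, created, deleted = self.snapshots[-1]
            if child in created:
                created.discard(child)  # never existed at snapshot time
            else:
                deleted.add(child)

    def view(self, name):
        # Undo diffs from the newest snapshot back to the requested one.
        state = set(self.children)
        for snap_name, created, deleted in reversed(self.snapshots):
            state = (state - created) | deleted
            if snap_name == name:
                return state
        raise KeyError(name)


# The example from the slides: d2 holds f2 and f3, then "rm f3" and "add f4".
d2 = ToyDirectory({"f2", "f3"})
d2.create_snapshot("S1")
d2.remove("f3")
d2.add("f4")
```

Reading `d2.children` touches no snapshot data at all, while `d2.view("S1")` pays for walking the diffs, and memory grows only with the number of modifications.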
  • 37. Easy Management • Snapshots can be taken on any directory – Set the directory to be snapshottable • Support 65,536 simultaneous snapshots • No limit on the number of snapshottable directories – Nested snapshottable directories are currently NOT allowed
  • 38. Admin Ops • Allow snapshots on a directory – hdfs dfsadmin -allowSnapshot <path> • Reset a snapshottable directory – hdfs dfsadmin -disallowSnapshot <path>
  • 39. User Ops • Create/delete/rename snapshots – hdfs dfs -createSnapshot <path> [<snapshotName>] – hdfs dfs -deleteSnapshot <path> <snapshotName> – hdfs dfs -renameSnapshot <path> <oldName> <newName> • Get snapshottable directory listing – hdfs lsSnapshottableDir • Get snapshots difference report – hdfs snapshotDiff <path> <from> <to>
  • 40. Use snapshot paths in CLI • All regular commands and APIs can be used against a snapshot path – /<snapshottableDir>/.snapshot/<snapshotName>/foo/bar • List all the files in a snapshot – ls /test/.snapshot/s4 • List all the snapshots under that path – ls <path>/.snapshot
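The path layout on slide 40 is regular enough to build with a small helper. This is a minimal sketch; `snapshot_path` is a hypothetical convenience function written for this example, not something shipped with Hadoop.

```python
import posixpath


def snapshot_path(snapshottable_dir, snapshot_name, relative_path=""):
    """Build /<snapshottableDir>/.snapshot/<snapshotName>/<relative_path>."""
    joined = posixpath.join(
        snapshottable_dir, ".snapshot", snapshot_name, relative_path.lstrip("/")
    )
    # Drop the trailing slash left behind when relative_path is empty.
    return joined.rstrip("/")
```

For instance, `snapshot_path("/test", "s4")` gives the directory used in the slide's `ls` example, and passing `"foo/bar"` as the third argument addresses a file inside that snapshot.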
  • 41. Test Snapshot Functionalities • ~100 unit tests • ~1.4 million generated system tests – Covering most combinations of (snapshot + rename) operations • Automated long-running tests for months
  • 42. NFS Support and Other Features
  • 43. NFS Support • NFS Gateway provides NFS access to HDFS – File browsing, data download/upload, data streaming – No client-side library – Better alternative to Hadoop + Fuse based solution • Better consistency guarantees • Supports NFSv3 • Stateless Gateway – Simpler design, easy to handle failures • Future work – High Availability for NFS Gateway – NFSv4 support?
  • 44. Other Features • Protobuf, wire compatibility – Post 2.0 GA stronger wire compatibility • Rolling upgrades – With relaxed version checks • Improvements for other projects – Stale node to improve HBase MTTR • Block placement enhancements – Better support for other topologies such as VMs and Cloud • On the wire encryption – Both data and RPC • Expanding ecosystem, platforms and applicability – Native support for Windows
  • 45. Enterprise Readiness • Storage fault-tolerance, built into HDFS – 100% data reliability • High Availability • Standard Interfaces – WebHDFS (REST), Fuse, NFS, HttpFs, libwebhdfs and libhdfs • Wire protocol compatibility – Protocol buffers • Rolling upgrades • Snapshots • Disaster Recovery – Distcp for parallel and incremental copies across clusters – Apache Ambari and HDP for automated management
  • 46. Work in Progress • HDFS-2832: Heterogeneous storages – Datanode abstraction from single storage to collection of storages – Support different storage types: Disk and SSD • HDFS-5535: Zero downtime rolling upgrade – Namenodes and Datanodes can be upgraded independently – No upgrade downtime • HDFS-4685: ACLs – More flexible than user-group-permission
  • 47. Future Work • HDFS-5477: Block manager as a service – Move block management out of the Namenode – Support different name services, e.g. key-value store • HDFS-3154: Immutable files – Write-once and then read-only • HDFS-4704: Transient files – Tmp files will not be recorded in snapshots
  • 48. Q&A • Myths and misinformation of HDFS – Not reliable (was never true) – Namenode dies, all state is lost (was never true) – Does not support disaster recovery (distcp in Hadoop 0.15) – Hard to operate for newcomers – Performance improvements (always ongoing) • Major improvements in 1.2 and 2.x – Namenode is a single point of failure – Needs shared NFS storage for HA – Does not have point in time recovery • Thank You!