Non-Stop Hadoop
Applying Paxos to make critical Hadoop
services Continuously Available
Jagane Sundar - CTO, WANdisco
Brett Rudenstein – Senior Product Manager, WANdisco
WANdisco Background
 WANdisco: Wide Area Network Distributed Computing
 Enterprise ready, high availability software solutions that enable globally distributed organizations to meet today’s
data challenges of secure storage, scalability and availability
 Leader in tools for software engineers – Subversion
 Apache Software Foundation sponsor
 Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND)
 US patented active-active replication technology granted, November 2012
 Global locations
- San Ramon (CA)
- Chengdu (China)
- Tokyo (Japan)
- Boston (MA)
- Sheffield (UK)
- Belfast (UK)
Customers
Recap of
Server Software Architecture
Elementary Server Software:
Single thread processing client requests in a loop
[Diagram: a single server process loops - get client request (e.g. an HBase put), make the change to state (db), send the return value to the client. Operations (OP) are handled one at a time.]
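A minimal sketch of the loop in the diagram, in plain Java with an in-memory queue standing in for the network; all names here are illustrative, not from the deck.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class SingleThreadedServer {
    private final Map<String, String> state = new HashMap<>();    // the "db"
    private final Queue<String[]> requests = new ArrayDeque<>();  // stands in for the network

    public void serveForever() {
        while (true) {
            String[] op = requests.poll();   // get client request (e.g. an HBase put)
            if (op == null) continue;
            state.put(op[0], op[1]);         // make change to state (db)
            reply(op[0]);                    // send return value to client
        }
    }

    private void reply(String key) { System.out.println("stored " + key); }
}
```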
Multi-threaded Server Software:
Multiple threads processing client requests in a loop
[Diagram: one server process with multiple worker threads (thread 1, thread 2, thread 3), each looping - get client request (e.g. an HBase put), acquire lock, make the change to state (db), release lock, send the return value to the client. Operations (OP) interleave across threads.]
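The same loop with several worker threads, again as an illustrative Java sketch: the shared lock serializes the "make change to state" step, matching the acquire lock / release lock boxes in the diagram.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MultiThreadedServer {
    private final Map<String, String> state = new HashMap<>();
    private final Object stateLock = new Object();
    private final BlockingQueue<String[]> requests = new LinkedBlockingQueue<>();

    public void start(int nThreads) {
        for (int i = 0; i < nThreads; i++) {
            new Thread(() -> {
                while (true) {
                    try {
                        String[] op = requests.take();  // get client request
                        synchronized (stateLock) {      // acquire lock
                            state.put(op[0], op[1]);    // make change to state (db)
                        }                               // release lock
                        // send return value to client (omitted)
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }, "worker-" + i).start();
        }
    }
}
```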
Continuously Available Servers
Multiple Servers replicated and serving the same content
[Diagram: three replicas - server1, server2, server3 - each running its own server process.]
Problem
 How do we ensure that the three servers contain exactly the same data?
 In other words, how do we achieve strongly consistent replication?
Two parts to the solution:
 (Multiple Replicas of) A Deterministic State Machine
 The exact same sequence of operations, to be applied to each replica of the
DSM
A Deterministic State Machine
 A state machine in which a specific operation always results in a deterministic state
 Non-deterministic factors cannot play a role in the end state of any operation in a DSM. Examples of non-deterministic factors:
- Time
- Random numbers
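A toy Java illustration of the point above (not from the deck): the first operation is deterministic, the second would make replicas diverge because it reads a clock and a random source.

```java
import java.util.Random;

public class Counter {
    private long value;

    // Deterministic: the end state depends only on prior state + the operation.
    public void add(long delta) { value += delta; }

    // NON-deterministic: replicas applying this at different wall-clock times,
    // or with different PRNG draws, would end up with different values.
    public void badAdd() {
        value += System.currentTimeMillis() % 10;  // Time
        value += new Random().nextInt(10);         // Random
    }
}
```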
Creating three replicated servers
Apply all modify operations in the same exact sequence in each replicated server =
Multiple Servers with exactly the same replicated data
[Diagram: three replicas - server1, server2, server3 - each a Server Process (DSM), each consuming the identical ordered stream of operations (OP).]
Problem:
 How to achieve consensus between these servers as to the sequence of operations to
perform?
 Paxos is the answer
- Algorithm for reaching consensus in a network of unreliable processors
Three replicated servers
[Diagram: three replicas - server1, server2, server3 - each a Server Process paired with a Distributed Coordination Engine; the engines run Paxos among themselves to agree on a single ordered stream of operations (OP) submitted by many clients.]
Paxos Primer
Paxos
 Paxos is an algorithm for building replicated servers with strong consistency
1. The Synod algorithm for achieving consensus among a network of unreliable processes
2. The application of consensus to the task of replicating a Deterministic State Machine
 Paxos does not
- Specify a network protocol
- Invent a new language
- Restrict use to a specific language
Replicated State Machine
 Installed on each node that participates in the distributed system
 All nodes function as peers to deliver and ensure that the same transaction order occurs on
every system
- Achieve Consistent Replication
 Consensus
- Roles
• Proposers, Acceptors, Learners
- Phases
• Election of a node to be the proposer
• Broadcast of the proposal to peers
• Acceptance of the proposal by a majority
Paxos Roles
 Proposer
- The client or a proxy for the client
- Proposes a change to the Deterministic State Machine
 Acceptor
- Acceptors are the ‘memory’ of Paxos
- Quorum is established amongst acceptors
 Learner
- The DSM (Replicated, of course)
- Each Learner applies the exact same sequence of operations as proposed by the Proposers,
and accepted by a majority quorum of Acceptors
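The three roles can be pictured as interfaces. The sketch below is illustrative Java; the type and method names (Promise, onPrepare, deliver) are hypothetical placeholders, not a specific library's API.

```java
// Result of phase 1: whether the promise was granted, plus any value the
// acceptor has already accepted (needed by the proposer to stay safe).
record Promise(boolean granted, long highestAcceptedBallot, Object acceptedOp) {}

interface Proposer {                           // the client, or a proxy for it
    void propose(Object operation);            // proposes a change to the DSM
}

interface Acceptor {                           // the 'memory' of Paxos
    Promise onPrepare(long ballot);            // phase 1: promise to ignore lower ballots
    boolean onAccept(long ballot, Object op);  // phase 2: accept unless a higher promise was made
}

interface Learner {                            // the DSM (replicated, of course)
    void deliver(long sequenceNumber, Object op);  // apply agreed ops in sequence order
}
```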
Paxos - Ordering
 Proposers issue a new sequence number higher than the last sequence number known
 A majority agrees this number has not been seen before
 Consensus must be reached on the current proposal
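Continuing the interface sketch above, a hedged in-memory acceptor shows how this ordering rule is enforced: the acceptor rejects any ballot at or below the highest one it has promised, which is what forces proposers to keep issuing higher numbers.

```java
class InMemoryAcceptor implements Acceptor {
    private long promisedBallot = -1;  // highest ballot we promised
    private long acceptedBallot = -1;  // highest ballot we accepted
    private Object acceptedOp;

    @Override
    public synchronized Promise onPrepare(long ballot) {
        if (ballot <= promisedBallot) {                      // already seen: reject
            return new Promise(false, acceptedBallot, acceptedOp);
        }
        promisedBallot = ballot;                             // promise the higher number
        return new Promise(true, acceptedBallot, acceptedOp);
    }

    @Override
    public synchronized boolean onAccept(long ballot, Object op) {
        if (ballot < promisedBallot) return false;           // pre-empted by a higher ballot
        promisedBallot = ballot;
        acceptedBallot = ballot;
        acceptedOp = op;
        return true;                                         // counts toward the majority quorum
    }
}
```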
WANdisco DConE
Beyond Paxos
DConE Innovations
Beyond Paxos
 Quorum Configurations
- Majority, Singleton, Unanimous
- Distinguished Node – Tie Breaker
- Quorum Rotations – follow the sun
- Emergency Reconfigure
 Concurrent agreement handling
- Paxos only allows agreements on one proposal at a time
• Slow performance in a high transaction volume environment
- DConE allows simultaneous proposals from multiple proposers
DConE Innovations
Beyond Paxos
 Dynamic group evolution
- Add and remove nodes
- Add and remove sites
- No interruption of current operations
 Distributed garbage collection
- Safely discard state on disk and in memory when it is no longer required to assist in recovery
- Messages are sent to peers at pre-defined intervals to determine the highest common agreement
- Agreements and agreed proposals no longer needed for recovery are deleted (see the sketch below)
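A hedged sketch of the discard rule just described: each node learns every peer's highest contiguously applied agreement number and may safely delete everything at or below the minimum, since no peer can still need it for recovery. The peer-polling mechanism itself is assumed, not shown.

```java
import java.util.Collection;

class AgreementLogGc {
    /** Highest agreement number that every peer has contiguously applied. */
    static long safeDiscardPoint(Collection<Long> peersHighestContiguous) {
        return peersHighestContiguous.stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(-1L);  // no peers reachable: discard nothing
    }
}
```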
DConE Innovations
Beyond Paxos
 Backoff and collision avoidance
- Avoids repeated pre-emption of proposers by their peers
- Prevents thrashing, which can severely degrade performance
- When a round is pre-empted, a backoff delay is computed (a sketch follows below)
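One plausible backoff computation, as an illustrative Java sketch: capped exponential growth plus jitter so pre-empted proposers retry out of phase. The constants are assumptions, not DConE's actual parameters.

```java
import java.util.concurrent.ThreadLocalRandom;

class Backoff {
    static long delayMillis(int preemptions) {
        // Capped exponential base: 1s, 2s, 4s, ... up to 30s.
        long base = Math.min(1000L << Math.min(preemptions, 6), 30_000L);
        // Randomize within [base/2, base] so competing proposers de-synchronize.
        return ThreadLocalRandom.current().nextLong(base / 2, base + 1);
    }
}
```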
Self Healing
Automatic Back up and Recovery
 All nodes are mirrors/replicas of each other
- Any node can be used as a helper to bring a failed node back
 Read access without quorum
- The cluster is still accessible for reads
- Disallowing writes prevents split brain
 Automatic catch-up
- Servers that have been offline learn of transactions that were agreed while they were unavailable
- The missing transactions are played back, and once caught up, servers become fully participating members of the distributed system again
 Servers can be updated without down time
- Allows for rolling upgrades
‘Co-ordinate intent, not the outcome’
- Yeturu Aahlad
Active-Active, not Active-Standby
Co-ordinating intent
[Diagram, co-ordinating intent: server1 proposes mkdir /a while server2 proposes createFile /a; Paxos orders the two proposals, every replica applies createFile /a and mkdir /a in the same agreed order, and the losing operation fails identically everywhere.]
Co-ordinating outcome
[Diagram, co-ordinating outcome (WAL, HDFS Edits Log, etc.): server1 applies mkdir /a and server2 applies createFile /a before any coordination; when the outcomes are exchanged, server1's state is wrong and its mkdir /a operation needs to be undone.]
HDFS
 Recap
HDFS Architecture
 HDFS metadata is decoupled from data
- Namespace is a hierarchy of files and directories represented by INodes
- INodes record attributes: permissions, quotas, timestamps, replication
 NameNode keeps its entire state in RAM
- Memory state: the namespace tree and the mapping of blocks to DataNodes
- Persistent state: recent checkpoint of the namespace and journal log
 File data is divided into blocks (default 128MB)
- Each block is independently replicated on multiple DataNodes (default 3)
- Block replicas stored on DataNodes as local files on local drives
Reliable distributed file system for storing very large data sets
HDFS Cluster
 Single active NameNode
 Thousands of DataNodes
 Tens of thousands of HDFS clients
Active-Standby Architecture
Standard HDFS operations
 Active NameNode workflow
1. Receive request from a client,
2. Apply the update to its memory state,
3. Record the update as a journal transaction in persistent storage,
4. Return result to the client
 HDFS Client (read or write to a file)
- Send request to the NameNode, receive replica locations
- Read or write data from or to DataNodes
 DataNode
- Data transfer to / from clients and between DataNodes
- Report replica state change to NameNode(s): new, deleted, corrupt
- Report its state to NameNode(s): heartbeats, block reports
Consensus Node
 Coordinated Replication of HDFS Namespace
Replicated Namespace
 Replicated NameNode is called a ConsensusNode or CNode
 ConsensusNodes play an equal, active role on the cluster
- Provide write and read access to the namespace
 The namespace replicas are consistent with each other
- Each CNode maintains a copy of the same namespace
- Namespace updates applied to one CNode propagated to the others
 Coordination Engine establishes the global order of namespace updates
- All CNodes apply the same deterministic updates in the same deterministic order
- Starting from the same initial state and applying the same updates = consistency
Coordination Engine provides consistency of multiple namespace replicas
Coordinated HDFS Cluster
 Independent CNodes – the same namespace
 Load balancing client requests
 Proposal, Agreement
 Coordinated updates
Multiple active Consensus Nodes share namespace via Coordination Engine
Coordinated HDFS operations
 ConsensusNode workflow
1. Receive request from a client
2. Submit a proposal for the update to the Coordination Engine and wait for agreement
3. Apply the agreed update to its memory state
4. Record the update as a journal transaction in persistent storage (optional)
5. Return result to the client
 HDFS Client and DataNode operations remain the same
Updates to the namespace when a file or a directory is created are coordinated
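A hedged Java sketch of this workflow; CoordinationEngine and Agreement are hypothetical stand-ins for the engine's API, used only to make the ordering of the steps concrete. The key contrast with the standard NameNode workflow is that the proposal/agreement step happens before the memory state is touched, which is what allows every CNode to stay active.

```java
interface CoordinationEngine {
    Agreement submitProposal(Object update) throws InterruptedException;  // blocks until agreed
}

record Agreement(long gsn, Object update) {}

class ConsensusNode {
    private final CoordinationEngine engine;

    ConsensusNode(CoordinationEngine engine) { this.engine = engine; }

    Object handleClientRequest(Object update) throws Exception {          // 1. receive request
        Agreement agreed = engine.submitProposal(update);                 // 2. propose, wait for agreement
        Object result = applyToMemoryState(agreed);                       // 3. apply agreed update
        journal(agreed);                                                  // 4. persist (optional)
        return result;                                                    // 5. return result to client
    }

    private Object applyToMemoryState(Agreement a) { /* namespace mutation */ return a.update(); }
    private void journal(Agreement a) { /* write journal transaction */ }
}
```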
Strict Consistency Model
 Coordination Engine transforms namespace modification proposals into the global
sequence of agreements
- Applied to namespace replicas in the order of their Global Sequence Number
 ConsensusNodes may have different states at a given moment of “clock” time
- As the rate of consuming agreements may vary
 CNodes have the same namespace state when they reach the same GSN
 One-copy-equivalence
- Each replica is presented to the client as if there were only one copy
One-Copy-Equivalence as known in replicated databases
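Reusing the hypothetical Agreement record from the previous sketch, a replica-side apply loop might look like the following: agreements are buffered and applied strictly in GSN order, so any two CNodes that have reached the same GSN hold identical namespace state even if they got there at different wall-clock times.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class GsnApplier {
    private long nextGsn = 0;
    private final PriorityQueue<Agreement> pending =
            new PriorityQueue<>(Comparator.comparingLong(Agreement::gsn));

    synchronized void onAgreement(Agreement a) {
        pending.add(a);
        // Apply in GSN order, buffering across any gaps still in flight.
        while (!pending.isEmpty() && pending.peek().gsn() == nextGsn) {
            apply(pending.poll());
            nextGsn++;
        }
    }

    private void apply(Agreement a) { /* deterministic namespace update */ }
}
```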
Consensus Node Proxy
 CNodeProxyProvider – a pluggable substitute of FailoverProxyProvider
- Defined via Configuration
 Main features
- Randomly chooses CNode when client is instantiated
- Sticky until a timeout occurs
- Fails over to another CNode
- Smart enough to avoid SafeMode
 Further improvements
- Take into account network proximity
Reads do not modify the namespace, so they can be directed to any ConsensusNode
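As a hedged illustration of the pluggable substitution, the HDFS client could be pointed at the provider through the standard failover-proxy-provider property. The property-key pattern is regular HDFS HA client configuration; the CNodeProxyProvider class name comes from the deck, but the package shown is an assumption.

```java
import org.apache.hadoop.conf.Configuration;

class ClientSetup {
    static Configuration withConsensusProxy(String nameservice) {
        Configuration conf = new Configuration();
        // Standard HDFS HA key pattern; the provider FQN below is hypothetical.
        conf.set("dfs.client.failover.proxy.provider." + nameservice,
                 "com.wandisco.hdfs.client.CNodeProxyProvider");
        return conf;
    }
}
```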
Alternatives to a Paxos based Replicated State
Machine
Using a TCP Connection to send data to three
replicated servers (Load Balancer)
[Diagram: a client sends operations (OP) through a load balancer, which forwards them over TCP to server1, server2, and server3; server3 has received fewer operations than the others.]
Problems with using a Load Balancer
 Load balancer becomes the single point of failure
- Need to make the LB highly available and distributed
 Since Paxos is not employed to reach consensus between the three replicas, strong
consistency cannot be guaranteed
- Replicas will quickly diverge
HBase WAL or HDFS Edits Log replication
 State Machine (HRegion contents, HDFS NameNode metadata, etc.) is modified first
 Modification Log (HBase WAL or HDFS Edits Log) is sent to a Highly Available shared
storage, QJM, etc.
 Standby Server(s) read edits log and serve as warm standby servers, ready to take
over should the active server fail
HBase WAL or HDFS Edits Log replication
[Diagram: a single active server (server1) processes operations (OP) and ships its WAL/Edits Log to shared storage; server2 reads the log from shared storage and acts as a standby.]
 Only one active server is possible
 Failover takes time
 Failover is error prone, with intricate fencing etc.
 The cost of reaching consensus must already be paid for an HDFS Edits Log entry to be deemed safely stored, so why not pay that cost before modifying the state and thereby have multiple active servers?
HBase WAL or HDFS Edits Log tailing
HBase Continuous Availability
HBase Single Points of Failure
 HBase Region Server
 HBase Master
HBase Region Server
Replication
NonStopRegionServer:
[Diagram: two NonStopRegionServers, each wrapping an HRegionServer and the DConE library behind its client service (e.g. multi); an HBase client calls NonStopRegionServer 1.]
1. Client calls HRegionServer multi
2. NonStopRegionServer intercepts
3. NonStopRegionServer makes a Paxos proposal using the DConE library
4. The proposal comes back as an agreement on all NonStopRegionServers
5. NonStopRegionServer calls super.multi on all nodes; state changes are recorded
6. NonStopRegionServer 1 alone sends the response back to the client
Subclassing the HRegionServer
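A self-contained, hedged sketch of steps 1-6. The real implementation subclasses HBase's HRegionServer; here hypothetical stub types stand in for both HBase and DConE, so treat the signatures as pseudocode that only shows the control flow of the intercept-propose-apply pattern.

```java
class HRegionServerStub {                 // stands in for HBase's HRegionServer
    Object multi(Object batch) { /* apply puts/deletes to regions */ return "ok"; }
}

interface DConE {                         // hypothetical coordination API
    Object proposeAndAwaitAgreement(Object batch) throws InterruptedException;
}

class NonStopRegionServer extends HRegionServerStub {
    private final DConE dcone;
    private final boolean servicedClient; // true only on the node the client called

    NonStopRegionServer(DConE dcone, boolean servicedClient) {
        this.dcone = dcone;
        this.servicedClient = servicedClient;
    }

    @Override
    Object multi(Object batch) {
        try {
            // 2-3. intercept the call and turn it into a Paxos proposal
            Object agreed = dcone.proposeAndAwaitAgreement(batch);
            // 4-5. the agreement arrives on every replica, which applies it
            //      through the stock region-server code path
            Object result = super.multi(agreed);
            // 6. only the node the client actually called sends the response
            return servicedClient ? result : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }
}
```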
HBase RegionServer replication using
WANdisco DConE
 Shared nothing architecture
 HFiles, WALs, etc. are not shared
 Replica count is tunable
 Snapshots of HFiles do not need to be created
 Messy details of WAL tailing are not necessary
HBase RegionServer replication using
WANdisco DConE
 Not an eventual consistency model
 Does not serve up stale data
DEMO
Thank you
Jagane Sundar
jagane.sundar@wandisco.com
@jagane

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataWANdisco Plc
 
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Doug O'Flaherty
 
Geek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An IntroductionGeek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An IntroductionIDERA Software
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File Systemelliando dias
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadooplarsgeorge
 
GFS - Google File System
GFS - Google File SystemGFS - Google File System
GFS - Google File Systemtutchiio
 
Ibm spectrum scale fundamentals workshop for americas part 4 Replication, Str...
Ibm spectrum scale fundamentals workshop for americas part 4 Replication, Str...Ibm spectrum scale fundamentals workshop for americas part 4 Replication, Str...
Ibm spectrum scale fundamentals workshop for americas part 4 Replication, Str...xKinAnx
 
Application layer
Application layerApplication layer
Application layerNeha Kurale
 
Leveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersLeveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersRan Ziv
 
Geographically Distributed PostgreSQL
Geographically Distributed PostgreSQLGeographically Distributed PostgreSQL
Geographically Distributed PostgreSQLmason_s
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
HDFS Federation++
HDFS Federation++HDFS Federation++
HDFS Federation++Hortonworks
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapakapa rohit
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...xKinAnx
 

Was ist angesagt? (20)

Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big Data
 
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
 
Unit 2.pptx
Unit 2.pptxUnit 2.pptx
Unit 2.pptx
 
Geek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An IntroductionGeek Sync | Infrastructure for the Data Professional: An Introduction
Geek Sync | Infrastructure for the Data Professional: An Introduction
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
GFS - Google File System
GFS - Google File SystemGFS - Google File System
GFS - Google File System
 
Ibm spectrum scale fundamentals workshop for americas part 4 Replication, Str...
Ibm spectrum scale fundamentals workshop for americas part 4 Replication, Str...Ibm spectrum scale fundamentals workshop for americas part 4 Replication, Str...
Ibm spectrum scale fundamentals workshop for americas part 4 Replication, Str...
 
Application layer
Application layerApplication layer
Application layer
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Leveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersLeveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive Clusters
 
Geographically Distributed PostgreSQL
Geographically Distributed PostgreSQLGeographically Distributed PostgreSQL
Geographically Distributed PostgreSQL
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
HDFS Federation++
HDFS Federation++HDFS Federation++
HDFS Federation++
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapa
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 

Andere mochten auch

Lock Service with Paxos in Erlang
Lock Service with Paxos in ErlangLock Service with Paxos in Erlang
Lock Service with Paxos in ErlangSave Manos
 
图解分布式一致性协议Paxos 20150311
图解分布式一致性协议Paxos 20150311图解分布式一致性协议Paxos 20150311
图解分布式一致性协议Paxos 20150311Cabin WJ
 
Paxos introduction
Paxos introductionPaxos introduction
Paxos introduction宗志 陈
 
Elasticsearch and Spark
Elasticsearch and SparkElasticsearch and Spark
Elasticsearch and SparkAudible, Inc.
 
Flexible Paxos: Reaching agreement without majorities
Flexible Paxos: Reaching agreement without majorities Flexible Paxos: Reaching agreement without majorities
Flexible Paxos: Reaching agreement without majorities Heidi Howard
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaAndy Petrella
 

Andere mochten auch (7)

Lock Service with Paxos in Erlang
Lock Service with Paxos in ErlangLock Service with Paxos in Erlang
Lock Service with Paxos in Erlang
 
图解分布式一致性协议Paxos 20150311
图解分布式一致性协议Paxos 20150311图解分布式一致性协议Paxos 20150311
图解分布式一致性协议Paxos 20150311
 
Paxos introduction
Paxos introductionPaxos introduction
Paxos introduction
 
Paxos
PaxosPaxos
Paxos
 
Elasticsearch and Spark
Elasticsearch and SparkElasticsearch and Spark
Elasticsearch and Spark
 
Flexible Paxos: Reaching agreement without majorities
Flexible Paxos: Reaching agreement without majorities Flexible Paxos: Reaching agreement without majorities
Flexible Paxos: Reaching agreement without majorities
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
 

Ähnlich wie NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoop Services Continuously Available

Coordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed SystemsCoordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed SystemsKonstantin V. Shvachko
 
Spinnaker VLDB 2011
Spinnaker VLDB 2011Spinnaker VLDB 2011
Spinnaker VLDB 2011sandeep_tata
 
HDFS for Geographically Distributed File System
HDFS for Geographically Distributed File SystemHDFS for Geographically Distributed File System
HDFS for Geographically Distributed File SystemKonstantin V. Shvachko
 
State transfer With Galera
State transfer With GaleraState transfer With Galera
State transfer With GaleraMydbops
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsshrey mehrotra
 
Zookeeper Introduce
Zookeeper IntroduceZookeeper Introduce
Zookeeper Introducejhao niu
 
Talon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategyTalon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategySaptarshi Chatterjee
 
Parallel Processing (Part 2)
Parallel Processing (Part 2)Parallel Processing (Part 2)
Parallel Processing (Part 2)Ajeng Savitri
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Chris Nauroth
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoBig Data Joe™ Rossi
 
Dsm (Distributed computing)
Dsm (Distributed computing)Dsm (Distributed computing)
Dsm (Distributed computing)Sri Prasanna
 
Pnuts yahoo!’s hosted data serving platform
Pnuts  yahoo!’s hosted data serving platformPnuts  yahoo!’s hosted data serving platform
Pnuts yahoo!’s hosted data serving platformlammya aa
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
File service architecture and network file system
File service architecture and network file systemFile service architecture and network file system
File service architecture and network file systemSukhman Kaur
 

Ähnlich wie NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoop Services Continuously Available (20)

Coordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed SystemsCoordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed Systems
 
Spinnaker VLDB 2011
Spinnaker VLDB 2011Spinnaker VLDB 2011
Spinnaker VLDB 2011
 
HDFS for Geographically Distributed File System
HDFS for Geographically Distributed File SystemHDFS for Geographically Distributed File System
HDFS for Geographically Distributed File System
 
State transfer With Galera
State transfer With GaleraState transfer With Galera
State transfer With Galera
 
Unit 1
Unit 1Unit 1
Unit 1
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Hadoop
HadoopHadoop
Hadoop
 
Zookeeper Introduce
Zookeeper IntroduceZookeeper Introduce
Zookeeper Introduce
 
Database System Architectures
Database System ArchitecturesDatabase System Architectures
Database System Architectures
 
Talon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategyTalon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategy
 
Parallel Processing (Part 2)
Parallel Processing (Part 2)Parallel Processing (Part 2)
Parallel Processing (Part 2)
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
 
Cloud storage
Cloud storageCloud storage
Cloud storage
 
No sql
No sqlNo sql
No sql
 
Dsm (Distributed computing)
Dsm (Distributed computing)Dsm (Distributed computing)
Dsm (Distributed computing)
 
Pnuts yahoo!’s hosted data serving platform
Pnuts  yahoo!’s hosted data serving platformPnuts  yahoo!’s hosted data serving platform
Pnuts yahoo!’s hosted data serving platform
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
File service architecture and network file system
File service architecture and network file systemFile service architecture and network file system
File service architecture and network file system
 

Mehr von DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 

Kürzlich hochgeladen (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 

NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoop Services Continuously Available

  • 1. Non-Stop Hadoop Applying Paxos to make critical Hadoop services Continuously Available Jagane Sundar - CTO, WANdisco Brett Rudenstein – Senior Product Manager, WANdisco
  • 2. WANdisco Background  WANdisco: Wide Area Network Distributed Computing  Enterprise ready, high availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability  Leader in tools for software engineers – Subversion  Apache Software Foundation sponsor  Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND)  US patented active-active replication technology granted, November 2012  Global locations - San Ramon (CA) - Chengdu (China) - Tokyo (Japan) - Boston (MA) - Sheffield (UK) - Belfast (UK)
  • 5. Elementary Server Software: Single thread processing client requests in a loop Server Process make change to state (db) OP OP OP OP get client request e.g. hbase put send return value to client
  • 6. Multi-threaded Server Software: Multiple threads processing client requests in a loop Server Process make change to state (db) get client request e.g. hbase put send return value to client OP OP OP OP OP OP OP OPOP OP OP OP thread 1 thread 3 thread 2 thread 1 thread 2 thread 3 acquire lock release lock
  • 7. Continuously Available Servers Multiple Servers replicated and serving the same content server1 Server Process server2 Server Process server3 Server Process
  • 8. Problem  How do we ensure that the three servers contain exactly the same data?  In other words, how do you achieve strong consistency replication?
  • 9. Two parts to the solution:  (Multiple Replicas of) A Deterministic State Machine  The exact same sequence of operations, to be applied to each replica of the DSM
  • 10. A Deterministic State Machine  A state machine where a specific operation will always result in a deterministic state  Non deterministic factors cannot play a role in the end state of any operation in a DSM. Examples of non-deterministic factors - Time - Random
  • 11. Creating three replicated servers Apply all modify operations in the same exact sequence in each replicated server = Multiple Servers with exactly the same replicated data server1 Server Process (DSM) server2 Server Process (DSM) server3 Server Process (DSM) O P O P O P O P O P O P O P O P O P O P O P O P
  • 12. Problem:  How to achieve consensus between these servers as to the sequence of operations to perform?  Paxos is the answer - Algorithm for reaching consensus in a network of unreliable processors
  • 13. Three replicated servers server3 Server Process OP OP OP OP Distributed Coordination Engine server2 Server Process Distributed Coordination Engine OP OP OP OP server 1 Server Process OP OP OP OP Distributed Coordination Engine Paxos Client Client ClientClient Client Paxos OP OPOP OP
  • 15. Paxos  Paxos is an Algorithm for building Replicated Servers with strong consistency 1. Synod algorithm for achieving consensus among a network of unreliable processes 2. The application of consensus to the task of replicating a Deterministic State Machine  Paxos does not - Specify a network protocol - Invent a new language - Restrict use in a specific language
  • 16. Replicated State Machine  Installed on each node that participates in the distributed system  All nodes function as peers to deliver and assure the same transaction order occurs on every system - Achieve Consistent Replication  Consensus - Roles • Proposers, Acceptors, Learners - Phases • Election of a node to be the proposer • Broadcast of the proposal to peers • Acceptance of the proposal for majority
  • 17. Paxos Roles  Proposer - The client or a proxy for the client - Proposes a change to the Deterministic State Machine  Acceptor - Acceptors are the ‘memory’ of paxos - Quorum is established amongst acceptors  Learner - The DSM (Replicated, of course) - Each Learner applies the exact same sequence of operations as proposed by the Proposers, and accepted by a majority quorum of Acceptors
  • 18. Paxos - Ordering  Proposers issue a new sequence number of a higher value from the last sequence known  A majority agrees this number has not been seen  Consensus must be reached on the current proposal
  • 20. DConE Innovations Beyond Paxos  Quorum Configurations - Majority, Singleton, Unanimous - Distinguished Node – Tie Breaker - Quorum Rotations – follow the sun - Emergency Reconfigure  Concurrent agreement handling - Paxos only allows agreements on one proposal at a time • Slow performance in a high transaction volume environment - DConE allows simultaneous proposals from multiple proposers
  • 21. DConE Innovations Beyond Paxos  Dynamic group evolution - Add and remove nodes - Add and remove sites - No interruption of current operations  Distributed garbage collection - Safely discard state on disk and in memory when it is no longer required to assist in recovery - Messages are sent to peers pre-defined intervals to determine the highest common agreement - All agreements and agreed proposals are deleted
  • 22. DConE Innovations Beyond Paxos  Backoff and collision avoidance - Avoids repeated pre-emption of proposers by their peers - Prevents thrashing which can severely degrade performance. - When a round is pre-empted, a backoff delay is computed
  • 23. Self Healing Automatic Back up and Recovery  All nodes are mirrors/replicas of each other - Any node can be used as a helper to bring it back  Read access without Quorum - Cluster is still accessible for reads - No writes prevent split brain  Automatic catch up - Servers that have been offline, learn of transactions that were agreed on while it was unavailable - The missing transactions are played back and one caught up become fully participating members of the distributed system again  Servers can be updated without down time - Allows for rolling upgrades
  • 24. ‘Co-ordinate intent, not the outcome’ - Yeturu Aahlad Active-Active, not Active-Standby
  • 25. Co-ordinating intent Proposal to mkdir /a P a x o s server2 server 1 Proposal to createFile /a createFile /a createFile /a mkdir /a mkdir /a Op fails Op fails mkdir /a Co-ordinate outcome (WAL, HDFS Edits Log, etc.) server2 server 1 createFile /a server1 state is wrong mkdir /a operation needs to be undone Co-ordinating outcome
  • 27. HDFS Architecture  HDFS metadata is decoupled from data - Namespace is a hierarchy of files and directories represented by INodes - INodes record attributes: permissions, quotas, timestamps, replication  NameNode keeps its entire state in RAM - Memory state: the namespace tree and the mapping of blocks to DataNodes - Persistent state: recent checkpoint of the namespace and journal log  File data is divided into blocks (default 128MB) - Each block is independently replicated on multiple DataNodes (default 3) - Block replicas stored on DataNodes as local files on local drives Reliable distributed file system for storing very large data sets 27
  • 28. HDFS Cluster  Single active NameNode  Thousands of DataNodes  Tens of thousands of HDFS clients Active-Standby Architecture 28
  • 29. Standard HDFS operations  Active NameNode workflow 1. Receive request from a client, 2. Apply the update to its memory state, 3. Record the update as a journal transaction in persistent storage, 4. Return result to the client  HDFS Client (read or write to a file) - Send request to the NameNode, receive replica locations - Read or write data from or to DataNodes  DataNode - Data transfer to / from clients and between DataNodes - Report replica state change to NameNode(s): new, deleted, corrupt - Report its state to NameNode(s): heartbeats, block reports 29
  • 30. Consensus Node  Coordinated Replication of HDFS Namespace 30
  • 31. Replicated Namespace  Replicated NameNode is called a ConsensusNode or CNode  ConsensusNodes play equal active role on the cluster - Provide write and read access to the namespace  The namespace replicas are consistent with each other - Each CNode maintains a copy of the same namespace - Namespace updates applied to one CNode propagated to the others  Coordination Engine establishes the global order of namespace updates - All CNodes apply the same deterministic updates in the same deterministic order - Starting from the same initial state and applying the same updates = consistency Coordination Engine provides consistency of multiple namespace replicas 31
  • 32. Coordinated HDFS Cluster  Independent CNodes – the same namespace  Load balancing client requests  Proposal, Agreement  Coordinated updates Multiple active Consensus Nodes share namespace via Coordination Engine 32
  • 33. Coordinated HDFS operations  ConsensusNode workflow 1. Receive request from a client 2. Submit proposal to update to the Coordination Engine Wait for agreement 3. Apply the agreed update to its memory state, 4. Record the update as a journal transaction in persistent storage (optional) 5. Return result to the client  HDFS Client and DataNode operations remain the same Updates to the namespace when a file or a directory is created are coordinated 33
  • 34. Strict Consistency Model  Coordination Engine transforms namespace modification proposals into the global sequence of agreements - Applied to namespace replicas in the order of their Global Sequence Number  ConsensusNodes may have different states at a given moment of “clock” time - As the rate of consuming agreements may vary  CNodes have the same namespace state when they reach the same GSN  One-copy-equivalence - each replica presented to the client as if it has only one copy One-Copy-Equivalence as known in replicated databases 34
  • 35. Consensus Node Proxy  CNodeProxyProvider – a pluggable substitute of FailoverProxyProvider - Defined via Configuration  Main features - Randomly chooses CNode when client is instantiated - Sticky until a timeout occurs - Fails over to another CNode - Smart enough to avoid SafeMode  Further improvements - Take into account network proximity Reads do not modify namespace can be directed to any ConsensusNode 35
  • 36. Alternatives to a Paxos based Replicated State Machine
  • 37. Using a TCP Connection to send data to three replicated servers (Load Balancer) server3 Server Process OP OP server2 Server Process OP OP OP OP server 1 Server Process OP OP OP OP Client OP OP OP OP Load BalancerLoad Balancer
• 38. Problems with using a Load Balancer
 The load balancer becomes the single point of failure
- Need to make the LB highly available and distributed
 Since Paxos is not employed to reach consensus between the three replicas, strong consistency cannot be guaranteed
- Replicas will quickly diverge
• 39. HBase WAL or HDFS Edits Log replication
 The State Machine (HRegion contents, HDFS NameNode metadata, etc.) is modified first
 The Modification Log (HBase WAL or HDFS Edits Log) is then sent to highly available shared storage, a QJM, etc.
 Standby server(s) read the edits log and serve as warm standbys, ready to take over should the active server fail
• 40. HBase WAL or HDFS Edits Log replication
[Diagram: a single active server (server1) applies operations and writes its WAL/Edits Log to Shared Storage; a Standby Server (server2) reads the log from Shared Storage]
• 41. HBase WAL or HDFS Edits Log tailing
 Only one active server is possible
 Failover takes time
 Failover is error prone, with intricate fencing etc.
 The cost of reaching consensus must already be paid for an HDFS Edits Log entry to be deemed safely stored. Why not pay that cost before modifying the state, and thereby have multiple active servers?
• 43. HBase Single Points of Failure
 HBase RegionServer
 HBase Master
• 45. NonStopRegionServer: Subclassing the HRegionServer
[Diagram: two NonStopRegionServers, each embedding an HRegionServer, a Client Service (e.g. multi), and the DConE library]
1. The client calls HRegionServer multi
2. The NonStopRegionServer intercepts the call
3. The NonStopRegionServer makes a Paxos proposal using the DConE library
4. The proposal comes back as an agreement on all NonStopRegionServers
5. Each NonStopRegionServer calls super.multi; state changes are recorded on all nodes
6. NonStopRegionServer 1 alone sends the response back to the client
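The interception pattern in steps 1-6 might look roughly like the following. This is a hypothetical sketch: the multi() signature and the DConE interface are illustrative stand-ins, not the real HBase or DConE APIs.

```java
// Hypothetical sketch of the NonStopRegionServer interception pattern;
// every type here is an illustrative stub.
class MultiRequest {}
class MultiResponse {}
record Agreement(MultiRequest request) {}

class HRegionServer {
    MultiResponse multi(MultiRequest req) {
        // apply the batched mutations to the local HRegions
        return new MultiResponse();
    }
}

interface DConE {
    // Submit a proposal and block until it comes back as an agreement,
    // delivered in the same global order on every NonStopRegionServer.
    Agreement proposeAndAwait(MultiRequest proposal) throws InterruptedException;
}

class NonStopRegionServer extends HRegionServer {
    private final DConE dcone;

    NonStopRegionServer(DConE dcone) { this.dcone = dcone; }

    // Steps 1-4 and 6: the node the client called proposes, waits, applies, replies.
    @Override
    MultiResponse multi(MultiRequest req) {
        try {
            Agreement agreed = dcone.proposeAndAwait(req); // steps 3-4
            return applyAgreement(agreed);                 // step 5 locally; step 6: reply
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("proposal interrupted", e);
        }
    }

    // Step 5 on every replica: DConE delivers each agreement in the same global
    // order, and the underlying HRegionServer logic records the state change.
    MultiResponse applyAgreement(Agreement agreed) {
        return super.multi(agreed.request());
    }
}
```

Only the node the client contacted returns a response; the other replicas apply the same agreement through the delivery path when DConE hands it to them.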
• 46. HBase RegionServer replication using WANdisco DConE
 Shared-nothing architecture
 HFiles, WALs etc. are not shared
 Replica count is tunable
 Snapshots of HFiles do not need to be created
 Messy details of WAL tailing are not necessary
• 47. HBase RegionServer replication using WANdisco DConE
 Not an eventual consistency model
 Does not serve up stale data

Editor's Notes

  1. A sequenced set of operations. Proposers are the nodes that propose; each issues a new proposal number of a higher value, based on the last sequence it is aware of. A majority agrees that a higher number has not been seen and, if so, allows the transaction to complete. Consensus must be reached on the current proposal. (See the sketch after this note.)
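A minimal sketch of that numbering rule, assuming in-process acceptors: an acceptor promises only when it has not yet seen a higher round, and a proposal proceeds only with promises from a majority. This is the prepare phase only, not a complete Paxos implementation.

```java
// Minimal sketch of Paxos phase 1 (prepare/promise); illustrative only.
import java.util.List;

class Acceptor {
    private long highestPromised = -1;

    // Promise iff this round is higher than any seen so far.
    synchronized boolean prepare(long round) {
        if (round > highestPromised) {
            highestPromised = round; // will reject lower-numbered rounds from now on
            return true;
        }
        return false; // a higher (or equal) round number was already seen
    }
}

class Proposer {
    private long lastRound = 0;

    // Phase 1 succeeds only with promises from a majority of acceptors.
    boolean phaseOne(List<Acceptor> acceptors) {
        long round = ++lastRound; // issue a number higher than the last we know of
        long promises = acceptors.stream().filter(a -> a.prepare(round)).count();
        return promises > acceptors.size() / 2;
    }
}
```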
  2. Seven key innovations over Paxos.
  3. Distributed garbage collection. Any system that deals with distributed state should be able to safely discard state information on disk and in memory for efficient resource utilization. The point at which it is safe to do so is the point at which the state information is no longer required to assist in the recovery of a node at any site. Each DConE instance sends messages to its peers at other nodes at pre-defined intervals to determine the highest contiguously populated agreement common to all of them. It then deletes from the agreement log all agreements, and from the proposal log all agreed proposals, that are no longer needed for recovery.
Distinguished and fair round numbers for proposals. DConE's use of distinguished and fair round numbers in the process of achieving consensus avoids the contention that would otherwise arise when multiple proposals are submitted simultaneously by different nodes using the same round number. If this option is used, the round number consists of three components: (1) a monotonically increasing component, which is simply the increment of the last monotonic component; (2) a distinguished component, which is specific to each proposer; and (3) a random component. If two proposers clash on the first component, the random component is evaluated, and the proposer whose number has the larger random component wins. If there is still no winner, the distinguished component is compared, and the winner is the one with the largest distinguished component. Without this approach, competing nodes could end up simply incrementing the last attempted round number and resubmitting their proposals, leading to thrashing that would negatively impact the performance of the distributed system. The approach also ensures fairness, in the sense that it prevents any one node from always winning. (A comparator sketch follows this note.)
Weak reservations. DConE provides an optional weak reservation mechanism to eliminate pre-emption of proposers under high transaction volume scenarios. For example, if there are three proposers (one, two and three), each proposer's number determines which range of agreement numbers that proposer will drive. This avoids any possibility of collisions among the multiple proposals from each proposer that are proceeding in parallel across the distributed system.
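The tie-breaking order described above lends itself to a comparator. A sketch with assumed field names, since the note describes the components but not a concrete layout:

```java
// Illustrative comparator for the three-part round numbers described in the
// note; field names are assumptions.
import java.util.Comparator;

record RoundNumber(long monotonic, long random, long distinguished) {}

class RoundNumberOrder implements Comparator<RoundNumber> {
    @Override
    public int compare(RoundNumber a, RoundNumber b) {
        // 1) the monotonically increasing component decides first...
        int byMonotonic = Long.compare(a.monotonic(), b.monotonic());
        if (byMonotonic != 0) return byMonotonic;
        // 2) ...on a clash, the larger random component wins...
        int byRandom = Long.compare(a.random(), b.random());
        if (byRandom != 0) return byRandom;
        // 3) ...and the per-proposer distinguished component breaks remaining ties.
        return Long.compare(a.distinguished(), b.distinguished());
    }
}
```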
  4. Dynamic group evolution. DConE supports the concept of dynamic group evolution, allowing a distributed system to scale to support new sites and users. New nodes can be added to the distributed system, or existing nodes removed, without interrupting the operation of the remaining nodes.
Backoff and collision avoidance. DConE provides a backoff mechanism for avoiding repeated pre-emption of proposers by their peers. Conventional replicated state machines allow the pre-empted proposer to immediately initiate a new round with an agreement number higher than that of the pre-emptor. This approach can lead an agreement protocol to thrash for an extended period of time and severely degrade performance. With DConE, when a round is pre-empted, the DConE instance that initiated the proposal computes the duration of a backoff delay; the proposer then waits for this duration before initiating the next round. DConE uses an approach similar to the Carrier Sense Multiple Access/Collision Detection (CSMA/CD) protocol for non-switched Ethernet. Multiple proposers may compete for the same slot (both say "I want tx 179"), which is where collision avoidance applies; a Paxos round sends out a read message to the acceptors. (A backoff sketch follows this note.)
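A sketch of the backoff idea, in the CSMA/CD spirit the note describes. The doubling window with random jitter is an assumption; the note does not give DConE's exact policy.

```java
// Illustrative randomized backoff for a pre-empted proposer; the doubling
// policy and the bounds are assumptions, not DConE's actual parameters.
import java.util.concurrent.ThreadLocalRandom;

class ProposerBackoff {
    private long ceilingMillis = 10; // grows with repeated pre-emptions

    // Called when our round is pre-empted: wait a randomized delay before retrying,
    // so competing proposers stop colliding on every attempt.
    void onPreempted() throws InterruptedException {
        long delay = ThreadLocalRandom.current().nextLong(ceilingMillis + 1);
        Thread.sleep(delay);
        ceilingMillis = Math.min(ceilingMillis * 2, 1_000); // cap the backoff window
    }

    void onSuccess() { ceilingMillis = 10; } // reset after a successful round
}
```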
  5. Disadvantages: (1) resources are consumed to support the Standby; (2) the single NameNode is a bottleneck; (3) failover is complex, and still an outage. We can do better than that with consistent replication.
  7. Double determinism is important
  8. NameNodes start from the same state and apply the same deterministic updates in the same deterministic order, so their states are consistent. Independent NameNodes don't know about each other.