SlideShare ist ein Scribd-Unternehmen logo
1 von 40
Downloaden Sie, um offline zu lesen
HBASE STATUS QUO
The State of Affairs in HBase Land
ApacheCon Europe, November 2012

Lars George
Director EMEA Services
About Me

•  Director EMEA Services @ Cloudera
    •  Consulting on Hadoop projects (everywhere)
•  Apache Committer
    •  HBase and Whirr
•  O’Reilly Author
    •  HBase – The Definitive Guide
      •  Now in Japanese!

•  Contact
    •  lars@cloudera.com                      日本語版も出ました!	
  
    •  @larsgeorge
Agenda

•  HDFS and HBase
•  HBase Project Status
HDFS AND HBASE
Past, Presence, Future
Framework for Discussion

•  Time Periods
    •  Past (Hadoop pre-1.0)
    •  Present (Hadoop 1.x, 2.0)
    •  Future (Hadoop 2.x and later)


•  Categories
    •  Reliability/Availability
    •  Performance
    •  Feature Set
HDFS and HBase History – 2006



Author: Douglass Cutting <cutting@apache.org> !
Date: Fri Jan 27 22:19:42 2006 +0000 !
!
   Create hadoop sub-project. !
!
HDFS and HBase History – 2007



Author: Douglass Cutting <cutting@apache.org>!
Date: Tue Apr 3 20:34:28 2007 +0000 !
  !!
   HADOOP-1045. Add contrib/hbase, a 

   BigTable-like online database. !
!
HDFS and HBase History – 2008



Author: Jim Kellerman <jimk@apache.org> !
Date: Tue Feb 5 02:36:26 2008 +0000 !
!
   2008/02/04 HBase is now a subproject of
   Hadoop. The first HBase release as a
   subproject will be release 0.1.0 which will
   be equivalent to the version of HBase
   included in Hadoop 0.16.0... !
HDFS and HBase History – Early 2010

HBase has been around for 3 years. But HDFS still
acts like MapReduce is the only important client!




        People	
  have	
  accused	
  HDFS	
  of	
  being	
  like	
  a	
  molasses	
  train:	
  
                         High	
  throughput	
  but	
  not	
  so	
  fast	
  
HDFS and HBase History – 2010

•  HBase becomes a top-level project
•  Facebook chooses HBase for Messages product
•  Jump from HBase 0.20 to HBase 0.89 and 0.90
•  First CDH3 betas include HBase
•  HDFS community starts to work on features for
 HBase.
  •  Infamous
    hadoop-0.20-append 

    branch
WHAT DID GET DONE?
And where is it going?
Reliability in the Past: Hadoop 1.0

•  Pre-1.0, if the DN crashed, HBase would lose its
 WALs (and your beloved data).
  •  1.0 integrated hadoop-0.20-append branch into a
     main-line release
  •  True durability support for HBase
  •  We have a fighting chance at metadata reliability!
•  Numerous bug fixes for write pipeline recovery
 and other error paths
  •  HBase is not nearly so forgiving as MapReduce!
  •  “Single-writer” fault tolerance vs. “job-level” fault
    tolerance
Reliability in the Past: Hadoop 1.0

•  Pre-1.0: if any disk failed, entire DN would go
 offline
   •  Problematic for HBase: local RS would lose all
      locality!
   •  1.0: per-disk failure detection in DN (HDFS-457)
   •  Allows HBase to lose a disk without losing all locality


•  Tip: Configure
   dfs.datanode.failed.volumes.tolerated = 1 !
Reliability Today: Hadoop 2.0

•  Integrates Highly Available HDFS
•  Active-standby hot failover removes SPOF
•  Transparent to clients: no HBase changes
   necessary
•  Tested extensively under HBase read/write
   workloads
•  Coupled with HBase master failover, no more
   HBase SPOF!
HDFS HA
Reliability in the Future: HA in 2.x

•  Remove dependency on NFS (HDFS-3077)
    •  Quorum-commit protocol for NameNode edit logs
    •  Similar to ZAB/Multi-Paxos


•  Automatic failover for HA NameNodes
 (HDFS-3042)
   •  ZooKeeper-based master election, just like HBase
   •  Merged to trunk
Other Reliability Work for HDFS 2.x

•  2.0: current hflush() API only guarantees data
   is replicated to three machines – not fully on disk.
•  A cluster-wide power outage can lose data.
   •  Upcoming in 2.x: Support for hsync()
      (HDFS-744, HBASE-5954)
   •  Calls fsync() for all replicas of the WAL
   •  Full durability of edits, even with full cluster power
      outages
hflush() and hsync()




hflush()/hsync()                       hflush(): flushes to OS buffer
- send all queued data, note seq num   hsync(): fsync() to disk
- block until corresponding ACK is
received
HDFS Wire Compatibility in Hadoop 2.0

•  In 1.0: HDFS client version must match server
   version closely.
•  How many of you have manually copied HDFS
   client jars?
•  Client-server compatibility in 2.0:
   •  Protobuf-based RPC
   •  Easier HBase installs: no more futzing with jars
   •  Separate HBase upgrades from HDFS upgrades
•  Intra-cluster server compatibility in the works
     •  Allow for rolling upgrade without downtime
Performance: Hadoop 1.0

•  Pre-1.0: even for reads from local machine, client
   connects to DN via TCP
•  1.0: Short-circuit local reads
   •  Obtains direct access to underlying local block file, then
      uses regular FileInputStream access.
   •  2x speedup for random reads
•  Configure
      dfs.client.read.shortcircuit = true!
      dfs.block.local-path-access.user = hbase!
      dfs.datanode.data.dir.perm = 755!
•  Note: Currently does not support security
Performance: Hadoop 2.0

•  Pre-2.0: Up to 50% CPU spent verifying CRC
•  2.0: Native checksums using SSE4.2 crc32 asm
 (HDFS-2080)
   •  2.5x speedup reading from buffer cache
   •  Now only 15% CPU overhead to checksumming
•  Pre-2.0: re-establishes TCP connection to DN for
   each seek
•  2.0: Rewritten BlockReader, keep-alive to DN
   (HDFS-941)
   •  40% improvement on random read for HBase
   •  2-2.5x in micro-benchmarks
•  Total improvement vs. 0.20.2: 3.4x!
Performance: Hadoop 2.x

•  Currently: lots of CPU spent copying data in
   memory
•  “Direct-read” API: read directly into user-provided
   DirectByteBuffers (HDFS-2834)
   •  Another ~2x improvement to sequential throughput
      reading from cache
   •  Opportunity to avoid two more buffer copies reading
      compressed data (HADOOP-8148)
   •  Codec APIs still in progress, needs integration into
      HBase
Performance: Hadoop 2.x

•  True “zero-copy read” support (HDFS-3051)
    •  New API would allow direct access to mmaped block
       files
    •  No syscall or JNI overhead for reads
    •  Initial benchmarks indicate at least ~30% gain.
    •  Some open questions around best safe
       implementation
Current Read Path
Proposed Read Path
Performance: Why Emphasize CPU?

•  Machines with lots of RAM now inexpensive
   (48-96GB common)
•  Want to use that to improve cache hit ratios.
•  Unfortunately, 50GB+ Java heaps still impractical
   (GC pauses too long)
•  Allocate the extra RAM to the buffer cache
  •  OS caches compressed data: another win!
•  CPU overhead reading from buffer cache
 becomes limiting factor for read workloads
What’s Up Next in 2.x?

•  HDFS Hard-links (HDFS-3370)
    •  Will allow for HBase to clone/snapshot tables
       efficiently!
    •  Improves HBase table-scoped backup story
•  HDFS Snapshots (HDFS-2802)
    •  HBase-wide snapshot support for point-in-time
       recovery
    •  Enables consistent backups copied off-site for DR
What’s Up Next in 2.x?

•  Improved block placement policies (HDFS-1094)
    •  Fundamental tradeoff between probability of data
       unavailability and the amount of data that becomes
       unavailable
    •  Current scheme: if any 3 nodes not on the same rack
       die, some very small amount of data is unavailable
    •  Proposed scheme: lessen chances of unavailability,
       but if a certain three nodes die, a larger amount is
       unavailable
    •  For many HBase applications: any single lost block
       halts whole operation. Prefer to minimize probability.
What’s Up Next in 2.x?

•  HBase-specific block placement hints
 (HBASE-4755)
  •  Assign each region a set of three RS (primary and
     two backups)
  •  Place underlying data blocks on these three DNs
  •  Could then fail-over and load-balance without losing
     any locality!
Summary

                          Hadoop	
  1.0	
               Hadoop	
  2.0	
                Hadoop	
  2.x	
  
  Availability	
   •  DN	
  volume	
  failure	
   •  NameNode	
  HA	
         •  HA	
  without	
  NAS	
  
                      isola>on	
                  •  Wire	
  Compa>bility	
   •  Rolling	
  upgrade	
  
Performance	
   •  Short-­‐circuit	
  reads	
   •  Na>ve	
  CRC	
              •  Direct-­‐read	
  API	
  
                                                •  DN	
  keep-­‐alive	
        •  Zero-­‐copy	
  API	
  
                                                                               •  Direct	
  codec	
  API	
  
     Features	
   •  Durable	
                                                 •    hsync()!
                     hflush()!                                                 •    Snapshots	
  
                                                                               •    Hard	
  links	
  
                                                                               •    HBase-­‐aware	
  block	
  
                                                                                    placement	
  
Summary

•  HBase is no longer a second-class citizen.
•  We’ve come a long way since Hadoop 0.20.2 in
   performance, reliability, and availability.
•  New features coming in the 2.x line specifically to
   benefit HBase use cases
•  Hadoop 2.0 features available today via CDH4.
   Many Cloudera customers already using CDH4
   with HBase with great success.
PROJECT STATUS
Current Project Status

•  HBase 0.90.x “Advanced Concepts”
    •  Master Rewrite – More Zookeeper
    •  Intra Row Scanning
    •  Further optimizations on algorithms and data
       structures

         CDH3
Current Project Status

•  HBase 0.92.x “Coprocessors”
    •  Multi-DC Replication
    •  Discretionary Access Control
    •  Coprocessors
      •  Endpoints and Observers
      •  Can hook into many explicit and implicit operation


        CDH4
Current Project Status (cont.)

•  HBase 0.94.x “Performance Release”
    •  Read CRC Improvements
    •  Seek Optimizations
     •  Lazy Seeks
  •  WAL Compression
  •  Prefix Compression (aka Block Encoding)


  •  Atomic Append
  •  Atomic put+delete
  •  Multi Increment and Multi Append
Current Project Status (cont.)

•  HBase 0.94.x “Performance Release”
    •  Per-region (i.e. local) Multi-Row Transactions


   RegionMutation rm = new RegionMutation();!
   Put p = new Put(ROW1);!
   p.add(FAMILY, QUALIFIER, VALUE);!
   rm.add(p);!
   p = new Put(ROW2);!
   p.add(FAMILY, QUALIFIER, VALUE);!
   rm.add(p);!
   t.mutateRegion(rm);!
Current Project Status (cont.)

•  HBase 0.94.x “Performance Release”
    •  Uber HBCK
    •  Embedded Thrift Server
    •  WALPlayer
    •  Enable/disable Replication Streams

        CDH4.x   (soon)
Current Project Status (cont.)

•  HBase 0.96.x “The Singularity”
    •  Protobuf RPC
      •  Rolling Upgrades
      •  Multiversion Access
  •  Metrics V2
  •  Preview Technologies
      •  Snapshots
      •  PrefixTrie Block Encoding



        CDH5 ?
Client/Server Compatibility Matrix


RegionServer,	
  
                         Client	
  0.96.0	
  	
           Client	
  0.96.1	
  	
                Client	
  0.98.0	
      Client	
  1.0.0	
  	
  
Master	
  
	
  	
  0.96.0	
  	
          Works	
  	
                       Works*	
                           Works*	
            No	
  guarantee	
  
	
  	
  0.96.1	
  	
          Works	
  	
                         Works	
  	
                      Works*	
            No	
  guarantee	
  
	
  	
  0.98.0	
              Works	
  	
                         Works	
  	
                       Works	
  	
            Works*	
  	
  
	
  	
  1.0.0	
          No	
  guarantee	
               No	
  guarantee	
                          Works	
                 Works	
  

                                 Notes:	
  *	
  If	
  new	
  features	
  are	
  not	
  used	
  
                                 	
  




             39                                     ©2012 Cloudera, Inc. All Rights Reserved.
QuesGons?	
  

Weitere ähnliche Inhalte

Was ist angesagt?

Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeMapR Technologies
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemInSemble
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0Adam Muise
 
Hive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesHive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesDataWorks Summit
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesYahoo Developer Network
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive TuningAdam Muise
 
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMaintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMapR Technologies
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Uwe Printz
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsDataWorks Summit
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelUwe Printz
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoophadooparchbook
 
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Uwe Printz
 
Architecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopArchitecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopDataWorks Summit
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Welcome to Hadoop2Land!
Welcome to Hadoop2Land!Welcome to Hadoop2Land!
Welcome to Hadoop2Land!Uwe Printz
 

Was ist angesagt? (20)

Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
Hive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesHive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenches
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and Insides
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning
 
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMaintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
 
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
 
Architecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopArchitecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with Hadoop
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Welcome to Hadoop2Land!
Welcome to Hadoop2Land!Welcome to Hadoop2Land!
Welcome to Hadoop2Land!
 
Apache drill
Apache drillApache drill
Apache drill
 

Andere mochten auch

中德文化比較
中德文化比較中德文化比較
中德文化比較Chris Huang
 
Approaching real-time-hadoop
Approaching real-time-hadoopApproaching real-time-hadoop
Approaching real-time-hadoopChris Huang
 
Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...Chris Huang
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorialChris Huang
 
Real time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystemReal time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystemChris Huang
 
Scaling big-data-mining-infra2
Scaling big-data-mining-infra2Scaling big-data-mining-infra2
Scaling big-data-mining-infra2Chris Huang
 

Andere mochten auch (7)

中德文化比較
中德文化比較中德文化比較
中德文化比較
 
Approaching real-time-hadoop
Approaching real-time-hadoopApproaching real-time-hadoop
Approaching real-time-hadoop
 
Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...
 
Wissbi osdc pdf
Wissbi osdc pdfWissbi osdc pdf
Wissbi osdc pdf
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorial
 
Real time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystemReal time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystem
 
Scaling big-data-mining-infra2
Scaling big-data-mining-infra2Scaling big-data-mining-infra2
Scaling big-data-mining-infra2
 

Ähnlich wie Hbase status quo apache-con europe - nov 2012

HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaHBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaCloudera, Inc.
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统yongboy
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Apache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's UpcomingApache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's Upcominghuguk
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014larsgeorge
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars GeorgeJAX London
 
HBase User Group #9: HBase and HDFS
HBase User Group #9: HBase and HDFSHBase User Group #9: HBase and HDFS
HBase User Group #9: HBase and HDFSCloudera, Inc.
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesyarapavan
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)baggioss
 
Architectural Evolution Starting from Hadoop
Architectural Evolution Starting from HadoopArchitectural Evolution Starting from Hadoop
Architectural Evolution Starting from HadoopSpagoWorld
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfsNAVER D2
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbaseRavi Veeramachaneni
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBaseMichael Stack
 
HBase state of the union
HBase   state of the unionHBase   state of the union
HBase state of the unionenissoz
 

Ähnlich wie Hbase status quo apache-con europe - nov 2012 (20)

HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaHBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Apache Hadoop 0.22 and Other Versions
Apache Hadoop 0.22 and Other VersionsApache Hadoop 0.22 and Other Versions
Apache Hadoop 0.22 and Other Versions
 
Apache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's UpcomingApache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's Upcoming
 
Hbase 20141003
Hbase 20141003Hbase 20141003
Hbase 20141003
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
HBase User Group #9: HBase and HDFS
HBase User Group #9: HBase and HDFSHBase User Group #9: HBase and HDFS
HBase User Group #9: HBase and HDFS
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
 
Architectural Evolution Starting from Hadoop
Architectural Evolution Starting from HadoopArchitectural Evolution Starting from Hadoop
Architectural Evolution Starting from Hadoop
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfs
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbase
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBase
 
HBase state of the union
HBase   state of the unionHBase   state of the union
HBase state of the union
 
Apache HBase: State of the Union
Apache HBase: State of the UnionApache HBase: State of the Union
Apache HBase: State of the Union
 

Mehr von Chris Huang

Data compression, data security, and machine learning
Data compression, data security, and machine learningData compression, data security, and machine learning
Data compression, data security, and machine learningChris Huang
 
Kks sre book_ch10
Kks sre book_ch10Kks sre book_ch10
Kks sre book_ch10Chris Huang
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2Chris Huang
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012Chris Huang
 
重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)Chris Huang
 
重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)Chris Huang
 
重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)Chris Huang
 
重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2Chris Huang
 
重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1Chris Huang
 
重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)Chris Huang
 
重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)Chris Huang
 
重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)Chris Huang
 
重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)Chris Huang
 
重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)Chris Huang
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsChris Huang
 
Hw5 my house in yong he
Hw5 my house in yong heHw5 my house in yong he
Hw5 my house in yong heChris Huang
 
Social English Class HW4
Social English Class HW4Social English Class HW4
Social English Class HW4Chris Huang
 
Social English Class HW3
Social English Class HW3Social English Class HW3
Social English Class HW3Chris Huang
 
火柴人的故事
火柴人的故事火柴人的故事
火柴人的故事Chris Huang
 

Mehr von Chris Huang (20)

Data compression, data security, and machine learning
Data compression, data security, and machine learningData compression, data security, and machine learning
Data compression, data security, and machine learning
 
Kks sre book_ch10
Kks sre book_ch10Kks sre book_ch10
Kks sre book_ch10
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
 
重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)
 
重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)
 
重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)
 
重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2
 
重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1
 
重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)
 
重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)
 
重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)
 
重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)
 
重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed Systems
 
Hw5 my house in yong he
Hw5 my house in yong heHw5 my house in yong he
Hw5 my house in yong he
 
Social English Class HW4
Social English Class HW4Social English Class HW4
Social English Class HW4
 
Social English Class HW3
Social English Class HW3Social English Class HW3
Social English Class HW3
 
Sm Case1 Ikea
Sm Case1 IkeaSm Case1 Ikea
Sm Case1 Ikea
 
火柴人的故事
火柴人的故事火柴人的故事
火柴人的故事
 

Kürzlich hochgeladen

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 

Kürzlich hochgeladen (20)

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 

Hbase status quo apache-con europe - nov 2012

  • 1. HBASE STATUS QUO The State of Affairs in HBase Land ApacheCon Europe, November 2012 Lars George Director EMEA Services
  • 2. About Me •  Director EMEA Services @ Cloudera •  Consulting on Hadoop projects (everywhere) •  Apache Committer •  HBase and Whirr •  O’Reilly Author •  HBase – The Definitive Guide •  Now in Japanese! •  Contact •  lars@cloudera.com 日本語版も出ました!   •  @larsgeorge
  • 3. Agenda •  HDFS and HBase •  HBase Project Status
  • 4. HDFS AND HBASE Past, Presence, Future
  • 5. Framework for Discussion •  Time Periods •  Past (Hadoop pre-1.0) •  Present (Hadoop 1.x, 2.0) •  Future (Hadoop 2.x and later) •  Categories •  Reliability/Availability •  Performance •  Feature Set
  • 6. HDFS and HBase History – 2006 Author: Douglass Cutting <cutting@apache.org> ! Date: Fri Jan 27 22:19:42 2006 +0000 ! ! Create hadoop sub-project. ! !
  • 7. HDFS and HBase History – 2007 Author: Douglass Cutting <cutting@apache.org>! Date: Tue Apr 3 20:34:28 2007 +0000 ! !! HADOOP-1045. Add contrib/hbase, a 
 BigTable-like online database. ! !
  • 8. HDFS and HBase History – 2008 Author: Jim Kellerman <jimk@apache.org> ! Date: Tue Feb 5 02:36:26 2008 +0000 ! ! 2008/02/04 HBase is now a subproject of Hadoop. The first HBase release as a subproject will be release 0.1.0 which will be equivalent to the version of HBase included in Hadoop 0.16.0... !
  • 9. HDFS and HBase History – Early 2010 HBase has been around for 3 years. But HDFS still acts like MapReduce is the only important client! People  have  accused  HDFS  of  being  like  a  molasses  train:   High  throughput  but  not  so  fast  
  • 10. HDFS and HBase History – 2010 •  HBase becomes a top-level project •  Facebook chooses HBase for Messages product •  Jump from HBase 0.20 to HBase 0.89 and 0.90 •  First CDH3 betas include HBase •  HDFS community starts to work on features for HBase. •  Infamous hadoop-0.20-append 
 branch
  • 11. WHAT DID GET DONE? And where is it going?
  • 12. Reliability in the Past: Hadoop 1.0 •  Pre-1.0, if the DN crashed, HBase would lose its WALs (and your beloved data). •  1.0 integrated hadoop-0.20-append branch into a main-line release •  True durability support for HBase •  We have a fighting chance at metadata reliability! •  Numerous bug fixes for write pipeline recovery and other error paths •  HBase is not nearly so forgiving as MapReduce! •  “Single-writer” fault tolerance vs. “job-level” fault tolerance
  • 13. Reliability in the Past: Hadoop 1.0 •  Pre-1.0: if any disk failed, entire DN would go offline •  Problematic for HBase: local RS would lose all locality! •  1.0: per-disk failure detection in DN (HDFS-457) •  Allows HBase to lose a disk without losing all locality •  Tip: Configure dfs.datanode.failed.volumes.tolerated = 1 !
  • 14. Reliability Today: Hadoop 2.0 •  Integrates Highly Available HDFS •  Active-standby hot failover removes SPOF •  Transparent to clients: no HBase changes necessary •  Tested extensively under HBase read/write workloads •  Coupled with HBase master failover, no more HBase SPOF!
  • 16. Reliability in the Future: HA in 2.x •  Remove dependency on NFS (HDFS-3077) •  Quorum-commit protocol for NameNode edit logs •  Similar to ZAB/Multi-Paxos •  Automatic failover for HA NameNodes (HDFS-3042) •  ZooKeeper-based master election, just like HBase •  Merged to trunk
  • 17. Other Reliability Work for HDFS 2.x •  2.0: current hflush() API only guarantees data is replicated to three machines – not fully on disk. •  A cluster-wide power outage can lose data. •  Upcoming in 2.x: Support for hsync() (HDFS-744, HBASE-5954) •  Calls fsync() for all replicas of the WAL •  Full durability of edits, even with full cluster power outages
  • 18. hflush() and hsync() hflush()/hsync() hflush(): flushes to OS buffer - send all queued data, note seq num hsync(): fsync() to disk - block until corresponding ACK is received
  • 19. HDFS Wire Compatibility in Hadoop 2.0 •  In 1.0: HDFS client version must match server version closely. •  How many of you have manually copied HDFS client jars? •  Client-server compatibility in 2.0: •  Protobuf-based RPC •  Easier HBase installs: no more futzing with jars •  Separate HBase upgrades from HDFS upgrades •  Intra-cluster server compatibility in the works •  Allow for rolling upgrade without downtime
  • 20. Performance: Hadoop 1.0 •  Pre-1.0: even for reads from local machine, client connects to DN via TCP •  1.0: Short-circuit local reads •  Obtains direct access to underlying local block file, then uses regular FileInputStream access. •  2x speedup for random reads •  Configure dfs.client.read.shortcircuit = true! dfs.block.local-path-access.user = hbase! dfs.datanode.data.dir.perm = 755! •  Note: Currently does not support security
  • 21. Performance: Hadoop 2.0 •  Pre-2.0: Up to 50% CPU spent verifying CRC •  2.0: Native checksums using SSE4.2 crc32 asm (HDFS-2080) •  2.5x speedup reading from buffer cache •  Now only 15% CPU overhead to checksumming •  Pre-2.0: re-establishes TCP connection to DN for each seek •  2.0: Rewritten BlockReader, keep-alive to DN (HDFS-941) •  40% improvement on random read for HBase •  2-2.5x in micro-benchmarks •  Total improvement vs. 0.20.2: 3.4x!
  • 22. Performance: Hadoop 2.x •  Currently: lots of CPU spent copying data in memory •  “Direct-read” API: read directly into user-provided DirectByteBuffers (HDFS-2834) •  Another ~2x improvement to sequential throughput reading from cache •  Opportunity to avoid two more buffer copies reading compressed data (HADOOP-8148) •  Codec APIs still in progress, needs integration into HBase
  • 23. Performance: Hadoop 2.x •  True “zero-copy read” support (HDFS-3051) •  New API would allow direct access to mmaped block files •  No syscall or JNI overhead for reads •  Initial benchmarks indicate at least ~30% gain. •  Some open questions around best safe implementation
  • 26. Performance: Why Emphasize CPU? •  Machines with lots of RAM now inexpensive (48-96GB common) •  Want to use that to improve cache hit ratios. •  Unfortunately, 50GB+ Java heaps still impractical (GC pauses too long) •  Allocate the extra RAM to the buffer cache •  OS caches compressed data: another win! •  CPU overhead reading from buffer cache becomes limiting factor for read workloads
  • 27. What’s Up Next in 2.x? •  HDFS Hard-links (HDFS-3370) •  Will allow for HBase to clone/snapshot tables efficiently! •  Improves HBase table-scoped backup story •  HDFS Snapshots (HDFS-2802) •  HBase-wide snapshot support for point-in-time recovery •  Enables consistent backups copied off-site for DR
  • 28. What’s Up Next in 2.x? •  Improved block placement policies (HDFS-1094) •  Fundamental tradeoff between probability of data unavailability and the amount of data that becomes unavailable •  Current scheme: if any 3 nodes not on the same rack die, some very small amount of data is unavailable •  Proposed scheme: lessen chances of unavailability, but if a certain three nodes die, a larger amount is unavailable •  For many HBase applications: any single lost block halts whole operation. Prefer to minimize probability.
  • 29. What’s Up Next in 2.x? •  HBase-specific block placement hints (HBASE-4755) •  Assign each region a set of three RS (primary and two backups) •  Place underlying data blocks on these three DNs •  Could then fail-over and load-balance without losing any locality!
  • 30. Summary Hadoop  1.0   Hadoop  2.0   Hadoop  2.x   Availability   •  DN  volume  failure   •  NameNode  HA   •  HA  without  NAS   isola>on   •  Wire  Compa>bility   •  Rolling  upgrade   Performance   •  Short-­‐circuit  reads   •  Na>ve  CRC   •  Direct-­‐read  API   •  DN  keep-­‐alive   •  Zero-­‐copy  API   •  Direct  codec  API   Features   •  Durable   •  hsync()! hflush()! •  Snapshots   •  Hard  links   •  HBase-­‐aware  block   placement  
  • 31. Summary •  HBase is no longer a second-class citizen. •  We’ve come a long way since Hadoop 0.20.2 in performance, reliability, and availability. •  New features coming in the 2.x line specifically to benefit HBase use cases •  Hadoop 2.0 features available today via CDH4. Many Cloudera customers already using CDH4 with HBase with great success.
  • 33. Current Project Status •  HBase 0.90.x “Advanced Concepts” •  Master Rewrite – More Zookeeper •  Intra Row Scanning •  Further optimizations on algorithms and data structures CDH3
  • 34. Current Project Status •  HBase 0.92.x “Coprocessors” •  Multi-DC Replication •  Discretionary Access Control •  Coprocessors •  Endpoints and Observers •  Can hook into many explicit and implicit operation CDH4
  • 35. Current Project Status (cont.) •  HBase 0.94.x “Performance Release” •  Read CRC Improvements •  Seek Optimizations •  Lazy Seeks •  WAL Compression •  Prefix Compression (aka Block Encoding) •  Atomic Append •  Atomic put+delete •  Multi Increment and Multi Append
  • 36. Current Project Status (cont.) •  HBase 0.94.x “Performance Release” •  Per-region (i.e. local) Multi-Row Transactions RegionMutation rm = new RegionMutation();! Put p = new Put(ROW1);! p.add(FAMILY, QUALIFIER, VALUE);! rm.add(p);! p = new Put(ROW2);! p.add(FAMILY, QUALIFIER, VALUE);! rm.add(p);! t.mutateRegion(rm);!
  • 37. Current Project Status (cont.) •  HBase 0.94.x “Performance Release” •  Uber HBCK •  Embedded Thrift Server •  WALPlayer •  Enable/disable Replication Streams CDH4.x (soon)
  • 38. Current Project Status (cont.) •  HBase 0.96.x “The Singularity” •  Protobuf RPC •  Rolling Upgrades •  Multiversion Access •  Metrics V2 •  Preview Technologies •  Snapshots •  PrefixTrie Block Encoding CDH5 ?
  • 39. Client/Server Compatibility Matrix RegionServer,   Client  0.96.0     Client  0.96.1     Client  0.98.0   Client  1.0.0     Master      0.96.0     Works     Works*   Works*   No  guarantee      0.96.1     Works     Works     Works*   No  guarantee      0.98.0   Works     Works     Works     Works*        1.0.0   No  guarantee   No  guarantee   Works   Works   Notes:  *  If  new  features  are  not  used     39 ©2012 Cloudera, Inc. All Rights Reserved.