June 13, 2012

HBase Consistency and
Performance Improvements
Esteban Gutierrez, Gregory Chanan
{esteban, gchanan}@cloudera.com

HBase Consistency

    •  ACID guarantees within a single row
    •  “Any row returned by the scan will be a
       consistent view (i.e. that version of the
       complete row existed at some point in
       time)”[1]

    [1] http://hbase.apache.org/acid-semantics.html



                       ©2012 Cloudera, Inc. All Rights Reserved.
HBase Consistency Issues

    •  Write Consistency Issues
    •  Read Consistency Issues




Write Consistency

    HBASE-4552

    •  Importing HFiles for multiple CFs was not an atomic operation

Write Consistency

    HBASE-4552: HRegion.bulkLoadHFile(), HBase < 0.90.5

    Row 1        HFile1:      HFile2:      HFile3:      HFile4:
                 fam1:col1    fam2:col2    fam3:col3    fam4:col4
    T1   Scan    val1
    T2   Scan    val1         val2
    T3   Scan    val1         val2         val3
    T4   Scan    val1         val2         val3         val4

Write Consistency

    HBASE-4552: HRegion.bulkLoadHFiles(), HBase ≥ 0.90.5

    public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) {
      ...
        startRegionOperation();        // ← lock.writeLock().lock()
      } finally {
        closeBulkRegionOperation();    // ← lock.writeLock().unlock()
      }
      ...

    Row 1        HFile1:      HFile2:      HFile3:      HFile4:
                 fam1:col1    fam2:col2    fam3:col3    fam4:col4
    T1   Scan
    T2   Scan
    T3   Scan
    T4   Scan    val1         val2         val3         val4

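The locking idea on this slide can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `BulkLoadLockSketch` is a hypothetical stand-in for a region, not the real `HRegion`, and each "HFile" is reduced to a single family/value pair. The load of all families runs under the write lock and scans run under the read lock, so a scan sees either none or all of the loaded families.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region; not the real HRegion class.
class BulkLoadLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // family name -> value loaded for "Row 1"
    private final Map<String, String> families = new LinkedHashMap<>();

    // Loads every family while holding the write lock, so the load is
    // all-or-nothing from a scanner's point of view.
    void bulkLoadHFiles(Map<String, String> familyValues) {
        lock.writeLock().lock();
        try {
            for (Map.Entry<String, String> e : familyValues.entrySet()) {
                families.put(e.getKey(), e.getValue());
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Scans under the read lock and returns a copy of the row.
    Map<String, String> scan() {
        lock.readLock().lock();
        try {
            return new LinkedHashMap<>(families);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        BulkLoadLockSketch region = new BulkLoadLockSketch();
        Map<String, String> files = new LinkedHashMap<>();
        files.put("fam1:col1", "val1");
        files.put("fam2:col2", "val2");
        files.put("fam3:col3", "val3");
        files.put("fam4:col4", "val4");
        System.out.println("before load: " + region.scan());
        region.bulkLoadHFiles(files);
        System.out.println("after load:  " + region.scan());
    }
}
```

The trade-off of this design is that scans wait while a bulk load holds the write lock.
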
Read Consistency

 HBASE-2856

 •  Seen only twice in the wild
 •  Hard to detect if application monitoring is not implemented


Read Consistency

 HBASE-2856
 •  Table size ≈ 50 M records
 •  Large number of CFs
 •  New records are continuously added to
    the table
 •  Concurrent MR Jobs on the same table
 •  Cluster has to meet strict SLAs


Read Consistency

 HBASE-2856
 Symptoms

     Counter group          Counter               Run 1     Run 2     Run 3
     …                      …                     …         …         …
     …                      SPLIT_RAW_FILES       …         …         …
     Map-Reduce Framework   Map output records    500,000   499,997   500,001

     Rows seen (not all CFs present in every row):
     cf1:col1        cf2:col2        cf3:col3
     cf1:col1
                     cf2:col2        cf3:col3
     cf1:col1

 Scale testing showed that 0.5% to 2% of results were inconsistent between runs

Read Consistency

 HBASE-2856
 Impact
 •  Result is used to update user-facing records
 •  Customer is not happy
     - “Where is my data?”

Read Consistency

 HBASE-2856
 Workarounds
 •  Retry the scan if not all CFs are present
 •  Resubmit the job if any inconsistency is found
 •  Sometimes that is not possible: SLAs!

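The first workaround can be sketched as a bounded retry loop. This is a minimal illustration, not code from the talk: `ScanRetrySketch`, `readComplete` and the `Supplier` that stands in for "read this row again" are all hypothetical names, and the row is reduced to a map keyed by column family.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper; the Supplier stands in for re-reading one row.
class ScanRetrySketch {
    // Re-reads a row until every expected column family is present,
    // giving up after maxAttempts. Returns null if no complete row was seen.
    static Map<String, String> readComplete(Supplier<Map<String, String>> readRow,
                                            Set<String> expectedFamilies,
                                            int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Map<String, String> row = readRow.get();
            if (row != null && row.keySet().containsAll(expectedFamilies)) {
                return row;
            }
        }
        return null;
    }
}
```

A caller would treat a `null` return as the signal to resubmit the job, which is exactly the step that strict SLAs may not leave time for.
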
MVCC

 •  HBase maintains ACID semantics using Multiversion Concurrency Control
 •  Instead of overwriting state, create a new version of the object with a timestamp
      Timestamp   Row    fam1:col1   fam2:col2
      t2          row1   val2        val2
      t1          row1   val1        val1
 •  Reads never have to block
 •  Note: this timestamp is not externally visible! Internally it is called “memStoreTs”

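A minimal sketch of the read-point idea behind MVCC, for a single cell. `MvccCellSketch` is a hypothetical class written for illustration and is not thread-safe; it only shows that writers append versions and that a reader picks the newest version at or below its own read point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-cell model; not HBase code and not thread-safe.
class MvccCellSketch {
    // One version of the cell, tagged with the internal write number
    // that the slide calls memStoreTs.
    private static final class Version {
        final long memStoreTs;
        final String value;

        Version(long memStoreTs, String value) {
            this.memStoreTs = memStoreTs;
            this.value = value;
        }
    }

    private final List<Version> versions = new ArrayList<>();

    // Writers add a new version instead of overwriting the old one.
    void put(long memStoreTs, String value) {
        versions.add(new Version(memStoreTs, value));
    }

    // Readers see the newest version whose memStoreTs is <= their read point.
    String get(long readPoint) {
        Version best = null;
        for (Version v : versions) {
            if (v.memStoreTs <= readPoint
                    && (best == null || v.memStoreTs > best.memStoreTs)) {
                best = v;
            }
        }
        return best == null ? null : best.value;
    }
}
```

With versions written at t1 and t2, a reader whose read point is t1 keeps seeing `val1` even after the t2 write lands.
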
HBase Write Path

 1.  Write to WAL (per RegionServer)
 2.  Write to In-Memory Sorted Map (MemStore)
     (per Region+ColumnFamily)
 3.  Flush MemStore to disk as HFile when
     MemStore hits configurable
     hbase.hregion.memstore.flush.size
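
The three steps above can be sketched as follows. This is a toy model under loud assumptions: `WritePathSketch` is hypothetical, the WAL is a plain list, an "HFile" is just a frozen sorted map, and the size estimate is simply key length plus value length. The `flushSize` field plays the role of hbase.hregion.memstore.flush.size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model of the write path; not HBase code.
class WritePathSketch {
    final List<String> wal = new ArrayList<>();                      // stand-in for the WAL
    TreeMap<String, String> memStore = new TreeMap<>();              // sorted in-memory map
    final List<TreeMap<String, String>> hFiles = new ArrayList<>();  // flushed files
    private final long flushSize;    // role of hbase.hregion.memstore.flush.size
    private long memStoreBytes = 0;  // rough size estimate

    WritePathSketch(long flushSize) {
        this.flushSize = flushSize;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value);        // 1. write to the WAL first
        memStore.put(key, value);          // 2. then to the sorted MemStore
        memStoreBytes += key.length() + value.length();
        if (memStoreBytes >= flushSize) {  // 3. flush once the threshold is hit
            hFiles.add(memStore);
            memStore = new TreeMap<>();
            memStoreBytes = 0;
        }
    }
}
```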




Internals / Bug

     Now that we know the internals, what could go wrong?

Putting it together
 Let’s go back to the beginning…

                        MemStore
     Timestamp   Row    fam1:col1   fam2:col2
     t1          row1   val1        val1

 And start a scan.
 And concurrently put.

                        MemStore
     Timestamp   Row    fam1:col1   fam2:col2
     t2          row1   val2        val2
     t1          row1   val1        val1

 Which causes a flush.

                  HFile
     Row     fam2:col2
     row1    val2
     row1    val1

Putting it together
 Now, scan needs to make sense of this…

                  MemStore
     Ts     Row     fam1:col1
     t2     row1    val2
     t1     row1    val1

                  HFile
     Row     fam2:col2
     row1    val2
     row1    val1

 But HFile has no timestamp!

            Inconsistent Result
     Row     fam1:col1   fam2:col2
     row1    val1        val2

Solution
 Store the timestamp in the HFile

                  MemStore
     Ts     Row     fam1:col1
     t2     row1    val2
     t1     row1    val1

                  HFile
     Ts     Row     fam2:col2
     t2     row1    val2
     t1     row1    val1

               Correct Result
     Row     fam1:col1   fam2:col2
     row1    val1        val1

 Now we have all the information we need

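The bug and the fix can be reproduced with a small model. Everything here is a hypothetical sketch: `FlushVisibilitySketch`, `Store` and `scanRow` are illustration names, versions are assumed to be added oldest first, and a store "without timestamps" simply cannot filter by read point, so it returns its newest entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one row spread over a MemStore and an HFile.
class FlushVisibilitySketch {
    static final class Version {
        final long ts;
        final String value;

        Version(long ts, String value) {
            this.ts = ts;
            this.value = value;
        }
    }

    // One column family's store: a MemStore or a flushed HFile.
    static final class Store {
        private final List<Version> versions = new ArrayList<>();
        private final boolean hasTimestamps;

        Store(boolean hasTimestamps) {
            this.hasTimestamps = hasTimestamps;
        }

        // Assumption: versions are added oldest first.
        Store add(long ts, String value) {
            versions.add(new Version(ts, value));
            return this;
        }

        // Without timestamps the reader cannot filter by read point,
        // so the newest entry wins regardless of when the scan started.
        String read(long readPoint) {
            String best = null;
            for (Version v : versions) {
                if (!hasTimestamps || v.ts <= readPoint) {
                    best = v.value;
                }
            }
            return best;
        }
    }

    static String scanRow(Store fam1, Store fam2, long readPoint) {
        return fam1.read(readPoint) + "," + fam2.read(readPoint);
    }

    public static void main(String[] args) {
        Store memStore = new Store(true).add(1, "val1").add(2, "val2");
        Store oldHFile = new Store(false).add(1, "val1").add(2, "val2");
        Store newHFile = new Store(true).add(1, "val1").add(2, "val2");
        System.out.println(scanRow(memStore, oldHFile, 1)); // prints "val1,val2"
        System.out.println(scanRow(memStore, newHFile, 1)); // prints "val1,val1"
    }
}
```
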
Consistency
 •  These are only some of the consistency issues in 0.90
    –  e.g. HBASE-5121: MajorCompaction may affect scan's correctness
 •  Solution: Upgrade to 0.92 or 0.94

HBase 0.94




        “Performance Release”




Performance Improvements in 0.94
 •  HBASE-5047 Support checksums in HBase block cache
 •  HBASE-5199 Delete out of TTL store files before
    compaction selection
 •  HBASE-4608 HLog Compression
 •  HBASE-4465 Lazy-seek optimization for StoreFile
    scanners




HBASE-5047
 •  HDFS stores the checksum in a separate file
            HFile              Checksum
 •  So each file read actually requires two disk I/Os
 •  HBase is often bottlenecked by random disk IOPS

HBASE-5047 Solution
 •  Solution: Store checksum in HFile block
        HFile
          HFile Block
            Chksum
            Data
 •  On by default (“hbase.regionserver.checksum.verify”)
 •  Bytes per checksum (“hbase.hstore.bytes.per.checksum”): default is 16K

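A rough sketch of per-chunk checksums kept alongside the data, so that verification needs no second file. The class name `InlineChecksumSketch` and the use of `java.util.zip.CRC32` are assumptions made for illustration; the slides do not say which checksum algorithm or block layout HBase uses. The `bytesPerChecksum` parameter plays the role of hbase.hstore.bytes.per.checksum.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical illustration; CRC32 is an assumed checksum choice.
class InlineChecksumSketch {
    // Computes one CRC32 per chunk of bytesPerChecksum bytes.
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Verifies the data against the checksums stored with the block.
    static boolean verify(byte[] data, long[] stored, int bytesPerChecksum) {
        return Arrays.equals(checksums(data, bytesPerChecksum), stored);
    }
}
```
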
HBASE-5199
 •  User can specify TTL per column family
 •  If all values in the HFile are expired, delete the HFile rather than compact it
 •  Off by default, turn on via “hbase.store.delete.expired.storefile”

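The selection rule can be sketched as a simple filter. This is an assumption-laden illustration: `ExpiredStoreFileSketch` and its `StoreFile` type are hypothetical, and a file is treated as fully expired when its newest cell timestamp is older than the TTL cutoff, which is one plausible reading of "all values in the HFile are expired".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; not the HBase compaction code.
class ExpiredStoreFileSketch {
    static final class StoreFile {
        final String name;
        final long maxTimestampMs; // timestamp of the newest cell in the file

        StoreFile(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    // Returns the files whose newest cell is already past the TTL.
    // These can be deleted outright instead of being compacted.
    static List<StoreFile> fullyExpired(List<StoreFile> files, long ttlMs, long nowMs) {
        long cutoff = nowMs - ttlMs;
        List<StoreFile> expired = new ArrayList<>();
        for (StoreFile f : files) {
            if (f.maxTimestampMs < cutoff) {
                expired.add(f);
            }
        }
        return expired;
    }
}
```
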
Conclusion
 •  Most consistency issues fixed in 0.92/CDH4
 •  Performance improvements in 0.94
 •  0.94 is wire compatible with 0.92, so it will be in a CDH4 update

References
 •  HBase ACID Semantics. http://hbase.apache.org/acid-semantics.html
 •  Apache HBase Meetup @ SU, Michael Stack. http://files.meetup.com/1350427/20120327hbase_meetup.pdf
 •  HBase Internals, Lars Hofhansl. http://www.cloudera.com/resource/hbasecon-2012-learning-hbase-internals/

More Related Content

What's hot

From HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark ClustersFrom HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark ClustersDatabricks
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataVictor Coustenoble
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to sparkDuyhai Doan
 
GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)
GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)
GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)オラクルエンジニア通信
 
What's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File SystemWhat's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File SystemCloudera, Inc.
 
Oracle GoldenGateでの資料採取(トラブル時に採取すべき資料)
Oracle GoldenGateでの資料採取(トラブル時に採取すべき資料)Oracle GoldenGateでの資料採取(トラブル時に採取すべき資料)
Oracle GoldenGateでの資料採取(トラブル時に採取すべき資料)オラクルエンジニア通信
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang
 
Oracle RAC Internals - The Cache Fusion Edition
Oracle RAC Internals - The Cache Fusion EditionOracle RAC Internals - The Cache Fusion Edition
Oracle RAC Internals - The Cache Fusion EditionMarkus Michalewicz
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Hands on MapR -- Viadea
Hands on MapR -- ViadeaHands on MapR -- Viadea
Hands on MapR -- Viadeaviadea
 
ビッグデータ処理データベースの全体像と使い分け - 2017年 Version -
ビッグデータ処理データベースの全体像と使い分け - 2017年 Version - ビッグデータ処理データベースの全体像と使い分け - 2017年 Version -
ビッグデータ処理データベースの全体像と使い分け - 2017年 Version - Tetsutaro Watanabe
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)
GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)
GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)オラクルエンジニア通信
 
What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0Databricks
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 

What's hot (20)

Spark architecture
Spark architectureSpark architecture
Spark architecture
 
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark ClustersFrom HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)
GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)
GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)
 
What's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File SystemWhat's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File System
 
Oracle GoldenGateでの資料採取(トラブル時に採取すべき資料)
Oracle GoldenGateでの資料採取(トラブル時に採取すべき資料)Oracle GoldenGateでの資料採取(トラブル時に採取すべき資料)
Oracle GoldenGateでの資料採取(トラブル時に採取すべき資料)
 
Oracle GoldenGate導入Tips
Oracle GoldenGate導入TipsOracle GoldenGate導入Tips
Oracle GoldenGate導入Tips
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Oracle RAC Internals - The Cache Fusion Edition
Oracle RAC Internals - The Cache Fusion EditionOracle RAC Internals - The Cache Fusion Edition
Oracle RAC Internals - The Cache Fusion Edition
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Hands on MapR -- Viadea
Hands on MapR -- ViadeaHands on MapR -- Viadea
Hands on MapR -- Viadea
 
ビッグデータ処理データベースの全体像と使い分け - 2017年 Version -
ビッグデータ処理データベースの全体像と使い分け - 2017年 Version - ビッグデータ処理データベースの全体像と使い分け - 2017年 Version -
ビッグデータ処理データベースの全体像と使い分け - 2017年 Version -
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)
GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)
GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)
 
Oracle GoldenGate入門
Oracle GoldenGate入門Oracle GoldenGate入門
Oracle GoldenGate入門
 
What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0
 
Hadoopエコシステムのデータストア振り返り
Hadoopエコシステムのデータストア振り返りHadoopエコシステムのデータストア振り返り
Hadoopエコシステムのデータストア振り返り
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 

Viewers also liked

Hadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance ImprovementsHadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance ImprovementsCloudera, Inc.
 
Streaming map reduce
Streaming map reduceStreaming map reduce
Streaming map reducedanirayan
 
Chrome extensions
Chrome extensions Chrome extensions
Chrome extensions Ahmad Tahhan
 
阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践wuqiuping
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBaseCarol McDonald
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks
 
唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pubChao Zhu
 
Content Identification using HBase
Content Identification using HBaseContent Identification using HBase
Content Identification using HBaseHBaseCon
 
New Security Features in Apache HBase 0.98: An Operator's Guide
New Security Features in Apache HBase 0.98: An Operator's GuideNew Security Features in Apache HBase 0.98: An Operator's Guide
New Security Features in Apache HBase 0.98: An Operator's GuideHBaseCon
 
Design Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and KijiDesign Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and KijiHBaseCon
 
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)Amazon Web Services
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"Inhacking
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0enissoz
 

Viewers also liked (20)

Hadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance ImprovementsHadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance Improvements
 
Apache HBase 0.98
Apache HBase 0.98Apache HBase 0.98
Apache HBase 0.98
 
Streaming map reduce
Streaming map reduceStreaming map reduce
Streaming map reduce
 
Chrome extensions
Chrome extensions Chrome extensions
Chrome extensions
 
阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践
 
Hbase Nosql
Hbase NosqlHbase Nosql
Hbase Nosql
 
IoT:what about data storage?
IoT:what about data storage?IoT:what about data storage?
IoT:what about data storage?
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical Applications
 
唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub
 
Content Identification using HBase
Content Identification using HBaseContent Identification using HBase
Content Identification using HBase
 
New Security Features in Apache HBase 0.98: An Operator's Guide
New Security Features in Apache HBase 0.98: An Operator's GuideNew Security Features in Apache HBase 0.98: An Operator's Guide
New Security Features in Apache HBase 0.98: An Operator's Guide
 
Design Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and KijiDesign Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and Kiji
 
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0
 

HBase Consistency and Performance Improvements

  • 1. June 13, 2012. HBase Consistency and Performance Improvements. Esteban Gutierrez, Gregory Chanan, {esteban, gchanan}@cloudera.com
  • 2. HBase Consistency • ACID guarantees within a single row • “Any row returned by the scan will be a consistent view (i.e. that version of the complete row existed at some point in time)” [1] [1] http://hbase.apache.org/acid-semantics.html ©2012 Cloudera, Inc. All Rights Reserved.
  • 3. HBase Consistency Issues • Write Consistency Issues • Read Consistency Issues
  • 4. Write Consistency, HBASE-4552 • Importing HFiles for multiple column families (CFs) is not an atomic operation
  • 5. Write Consistency, HBASE-4552 • Importing HFiles for multiple column families (CFs) was not an atomic operation
  • 6. Write Consistency HBASE-4552 HRegion.bulkLoadHFile() HFile1: HFile2: HFile3: HFile4: Row 1 fam1:col1 fam2:col2 fam3:col3 fam4:col4 val1 T1 Scan T2 Scan val1 val2 T3 Scan val1 val2 val3 T4 Scan val1 val2 val3 val4 < HBase 0.90.5 6 ©2012 Cloudera, Inc. All Rights Reserved.
  • 7. Write Consistency HBASE-4552 HRegion.bulkLoadHFiles(), HBase ≥ 0.90.5. Row 1: HFile1 (fam1:col1), HFile2 (fam2:col2), HFile3 (fam3:col3), HFile4 (fam4:col4). T1 Scan, T2 Scan, T3 Scan, T4 Scan. public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) { ... startRegionOperation(); /* ← lock.writeLock().lock() */ } finally { closeBulkRegionOperation(); } ...
  • 8. Write Consistency HBASE-4552 HRegion.bulkLoadHFiles(), HBase ≥ 0.90.5. Row 1: HFile1 (fam1:col1), HFile2 (fam2:col2), HFile3 (fam3:col3), HFile4 (fam4:col4). T1 Scan, T2 Scan, T3 Scan, T4 Scan. public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) { ... startRegionOperation(); } finally { closeBulkRegionOperation(); /* ← lock.writeLock().unlock() */ } ...
  • 9. Write Consistency HBASE-4552 HRegion.bulkLoadHFiles(), HBase ≥ 0.90.5. Row 1: HFile1 (fam1:col1), HFile2 (fam2:col2), HFile3 (fam3:col3), HFile4 (fam4:col4). T1 Scan, T2 Scan, T3 Scan. T4 Scan: val1, val2, val3, val4. public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) { ... startRegionOperation(); } finally { closeBulkRegionOperation(); } ...
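The locking idea on slides 7 to 9 can be sketched in plain Java. This is a toy model, not HBase code: `ToyRegion`, its fields, and the use of file names as strings are all assumptions made for illustration. It only shows why holding a write lock for the whole multi-file load means a scan sees either none of the files or all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy region (hypothetical, not an HBase class) illustrating atomic bulk load. */
class ToyRegion {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> loadedFiles = new ArrayList<>();

    /** Loads the files one by one, but under a single write lock. */
    void bulkLoadHFiles(List<String> files) {
        lock.writeLock().lock();
        try {
            for (String file : files) {
                loadedFiles.add(file); // a scan can never observe this half-done state
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Scans hold the read lock, so they cannot overlap a bulk load. */
    List<String> scan() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(loadedFiles);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The trade-off in this sketch is that scans wait while a load is in progress, which matches the blocked T1 to T3 scans in the slides.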
  • 10. Read Consistency HBASE-2856 •  Seen only twice in the wild •  Hard to detect if application monitoring is not implemented
  • 11. Read Consistency HBASE-2856 •  Table size ≈ 50 M records •  Large number of CFs •  New records are continuously added to the table •  Concurrent MR Jobs on the same table •  Cluster has to meet strict SLAs
  • 12. Read Consistency HBASE-2856 Symptoms … SPLIT_RAW_FILES … Map-Reduce Framework: Map output records. Run 1: 500,000.
  • 13. Read Consistency HBASE-2856 Symptoms … SPLIT_RAW_FILES … Map-Reduce Framework: Map output records. Run 1: 500,000. Run 2: 499,997.
  • 14. Read Consistency HBASE-2856 Symptoms … SPLIT_RAW_FILES … Map-Reduce Framework: Map output records. Run 1: 500,000. Run 2: 499,997. Run 3: 500,001.
  • 15. Read Consistency HBASE-2856 Symptoms … SPLIT_RAW_FILES … Map-Reduce Framework: Map output records. Run 1: 500,000. Run 2: 499,997. Run 3: 500,001. cf1:col1 cf2:col2 cf3:col3 cf1:col1 cf2:col2 cf3:col3 cf1:col1
  • 16. Read Consistency HBASE-2856 Symptoms … SPLIT_RAW_FILES … Map-Reduce Framework: Map output records. Run 1: 500,000. Run 2: 499,997. Run 3: 500,001. cf1:col1 cf2:col2 cf3:col3 cf1:col1 cf2:col2 cf3:col3 cf1:col1 Scale testing shows between 0.5% and 2% inconsistent results between runs
  • 17. Read Consistency HBASE-2856 Impact •  Result is used to update user facing records •  Customer is not happy
  • 18. Read Consistency HBASE-2856 Impact •  Result is used to update user facing records •  Customer is not happy: “Where is my data?”
  • 19. Read Consistency HBASE-2856 Workarounds •  Re-try scan if not all CFs are present •  Re-submit job if any inconsistency is found
  • 20. Read Consistency HBASE-2856 Workarounds •  Re-try scan if not all CFs are present •  Re-submit job if any inconsistency is found •  Sometimes that is not possible
  • 21. Read Consistency HBASE-2856 Workarounds •  Re-try scan if not all CFs are present •  Re-submit job if any inconsistency is found •  Sometimes that is not possible SLAs!
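The first workaround (re-try the read when not all CFs are present) might look roughly like the sketch below. It deliberately avoids the HBase client API: `RetryingRowReader`, the `Supplier` standing in for a row read, and the map from column family to value are hypothetical names chosen for illustration. It also assumes every row is expected to carry all listed families, which is an application-level assumption.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical helper: re-reads a row until all expected column families appear. */
class RetryingRowReader {
    /**
     * Returns the first read that contains every expected family, or null
     * after maxAttempts so the caller can fall back to re-submitting the job.
     */
    static Map<String, String> readComplete(Supplier<Map<String, String>> readRow,
                                            Set<String> expectedFamilies,
                                            int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Map<String, String> row = readRow.get();
            if (row.keySet().containsAll(expectedFamilies)) {
                return row;
            }
        }
        return null; // still incomplete: signal the inconsistency to the caller
    }
}
```

Each retry costs another read, which is why the slides note that this is sometimes not possible under strict SLAs.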
  • 22. MVCC •  HBase maintains ACID semantics using Multiversion Concurrency Control •  Instead of overwriting state, create a new version of the object with a timestamp. Table (Timestamp | Row | fam1:col1 | fam2:col2): t1 | row1 | val1 | val1
  • 23. MVCC •  HBase maintains ACID semantics using Multiversion Concurrency Control •  Instead of overwriting state, create a new version of the object with a timestamp. Table (Timestamp | Row | fam1:col1 | fam2:col2): t2 | row1 | val2 | val2; t1 | row1 | val1 | val1 •  Reads never have to block •  Note this timestamp is not externally visible! Internally called “memStoreTs”
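A minimal sketch of the multiversion idea follows. It is not HBase's implementation: `ToyMvccStore`, the "write number", and the "read point" are simplified, assumed names for a single cell. Writers append versions instead of overwriting, and a reader only sees versions at or below the read point it captured when it started.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy single-cell multiversion store (hypothetical, for illustration only). */
class ToyMvccStore {
    private static final class Version {
        final long writeNumber;
        final String value;

        Version(long writeNumber, String value) {
            this.writeNumber = writeNumber;
            this.value = value;
        }
    }

    private final List<Version> versions = new ArrayList<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0; // highest write number visible to new readers

    /** Appends a new version instead of overwriting; returns its write number. */
    synchronized long put(String value) {
        long writeNumber = nextWriteNumber++;
        versions.add(new Version(writeNumber, value));
        return writeNumber;
    }

    /** Makes every write up to writeNumber visible to readers that start later. */
    synchronized void commit(long writeNumber) {
        readPoint = Math.max(readPoint, writeNumber);
    }

    /** A reader captures this value once, when it starts. */
    synchronized long currentReadPoint() {
        return readPoint;
    }

    /** Returns the newest value whose write number is <= the reader's read point. */
    synchronized String get(long readerReadPoint) {
        String result = null;
        for (Version v : versions) { // versions are stored in write-number order
            if (v.writeNumber <= readerReadPoint) {
                result = v.value;
            }
        }
        return result;
    }
}
```

A reader that captured its read point before `val2` was committed keeps seeing `val1`, without waiting for the writer.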
  • 24. HBase Write Path 1.  Write to WAL (per RegionServer) 2.  Write to In-Memory Sorted Map (MemStore) (per Region+ColumnFamily) 3.  Flush MemStore to disk as HFile when the MemStore reaches the configurable hbase.hregion.memstore.flush.size
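The three steps above can be sketched as a toy model. Everything here is an assumption made for illustration: `ToyWritePath` is not an HBase class, the WAL is an in-memory list, an "HFile" is just a frozen sorted map, and size is estimated crudely from string lengths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy write path (hypothetical): WAL append, MemStore insert, flush at a threshold. */
class ToyWritePath {
    final List<String> wal = new ArrayList<>();                      // step 1: append-only log
    TreeMap<String, String> memStore = new TreeMap<>();              // step 2: sorted in-memory map
    final List<TreeMap<String, String>> hfiles = new ArrayList<>();  // step 3: flushed snapshots
    private final long flushSizeBytes; // stands in for hbase.hregion.memstore.flush.size
    private long memStoreBytes = 0;

    ToyWritePath(long flushSizeBytes) {
        this.flushSizeBytes = flushSizeBytes;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value);                      // 1. log first
        memStore.put(key, value);                        // 2. then the sorted map
        memStoreBytes += key.length() + value.length();  // crude size estimate
        if (memStoreBytes >= flushSizeBytes) {
            flush();                                     // 3. flush at the threshold
        }
    }

    private void flush() {
        hfiles.add(memStore);       // the old map becomes an immutable "HFile"
        memStore = new TreeMap<>(); // new writes go to a fresh MemStore
        memStoreBytes = 0;
    }
}
```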
  • 25. Internals / Bug Now that we know the internals – what could go wrong?
  • 26. Putting it together Let’s go back to the beginning… MemStore (Timestamp | Row | fam1:col1 | fam2:col2): t1 | row1 | val1 | val1
  • 27. Putting it together Let’s go back to the beginning… MemStore (Timestamp | Row | fam1:col1 | fam2:col2): t1 | row1 | val1 | val1. And start a scan.
  • 28. Putting it together Let’s go back to the beginning… MemStore (Timestamp | Row | fam1:col1 | fam2:col2): t2 | row1 | val2 | val2; t1 | row1 | val1 | val1. And start a scan. And concurrently put.
  • 29. Putting it together Let’s go back to the beginning… MemStore (Timestamp | Row | fam1:col1 | fam2:col2): t2 | row1 | val2 | val2; t1 | row1 | val1 | val1. And start a scan. And concurrently put. Which causes a flush. HFile (Row | fam2:col2): row1 | val2; row1 | val1
  • 30. Putting it together Now, scan needs to make sense of this… MemStore (Ts | Row | fam1:col1): t2 | row1 | val2; t1 | row1 | val1. HFile (Row | fam2:col2): row1 | val2; row1 | val1
  • 31. Putting it together Now, scan needs to make sense of this… MemStore (Ts | Row | fam1:col1): t2 | row1 | val2; t1 | row1 | val1. HFile (Row | fam2:col2): row1 | val2; row1 | val1. But HFile has no timestamp!
  • 32. Putting it together Now, scan needs to make sense of this… MemStore (Ts | Row | fam1:col1): t2 | row1 | val2; t1 | row1 | val1. HFile (Row | fam2:col2): row1 | val2; row1 | val1. But HFile has no timestamp! Inconsistent Result (Row | fam1:col1 | fam2:col2): row1 | val1 | val2
  • 33. Solution Store the timestamp in the HFile. MemStore (Ts | Row | fam1:col1): t2 | row1 | val2; t1 | row1 | val1. HFile (Ts | Row | fam2:col2): t2 | row1 | val2; t1 | row1 | val1. Correct Result (Row | fam1:col1 | fam2:col2): row1 | val1 | val1. Now we have all the information we need
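Once both the MemStore and the HFile carry the internal timestamp, a scan can apply the same visibility rule to every source. The sketch below is an illustration under stated assumptions: `ToyConsistentScan`, `Cell`, and the numeric read point are hypothetical names, and real scanners merge sorted streams rather than looping over lists. A scan that started at t1 resolves both columns to their t1 version.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy scan (hypothetical): filters every source by the scan's read point. */
class ToyConsistentScan {
    static final class Cell {
        final long ts;
        final String column;
        final String value;

        Cell(long ts, String column, String value) {
            this.ts = ts;
            this.column = column;
            this.value = value;
        }
    }

    /** For each column, keeps the newest value with ts <= readPoint across all sources. */
    static Map<String, String> scanRow(long readPoint, List<List<Cell>> sources) {
        Map<String, Cell> newest = new TreeMap<>();
        for (List<Cell> source : sources) {
            for (Cell c : source) {
                if (c.ts > readPoint) {
                    continue; // written after this scan started: not visible
                }
                Cell seen = newest.get(c.column);
                if (seen == null || c.ts > seen.ts) {
                    newest.put(c.column, c);
                }
            }
        }
        Map<String, String> row = new TreeMap<>();
        for (Cell c : newest.values()) {
            row.put(c.column, c.value);
        }
        return row;
    }
}
```

Without a timestamp on the HFile cells, the `c.ts > readPoint` check cannot be applied to that source, which is the gap the inconsistent result on slide 32 comes from.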
  • 34. Consistency •  These are only some of the consistency issues in 0.90 –  e.g. HBASE-5121: MajorCompaction may affect scan's correctness •  Solution: Upgrade to 0.92 or 0.94
  • 35. HBase 0.94 “Performance Release”
  • 36. Performance Improvements in 0.94 •  HBASE-5047 Support checksums in HBase block cache •  HBASE-5199 Delete out of TTL store files before compaction selection •  HBASE-4608 HLog Compression •  HBASE-4465 Lazy-seek optimization for StoreFile scanners
  • 37. Performance Improvements in 0.94 •  HBASE-5047 Support checksums in HBase block cache •  HBASE-5199 Delete out of TTL store files before compaction selection •  HBASE-4608 HLog Compression •  HBASE-4465 Lazy-seek optimization for StoreFile scanners
  • 38. HBASE-5047 •  HDFS stores the checksum in a separate file (HFile, Checksum) •  So each file read actually requires two disk iops •  HBase is often bottlenecked by random disk iops
  • 39. HBASE-5047 Solution •  Store checksum in HFile block (HFile → HFile Block → Chksum + Data) •  On by default (“hbase.regionserver.checksum.verify”) •  Bytes per checksum (“hbase.hstore.bytes.per.checksum”): default is 16K
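To make "checksum stored with the block" concrete, here is a small sketch that computes one checksum per fixed-size chunk of a block and verifies the block against them. The class `ToyChecksummedBlock` is hypothetical, and the choice of CRC32 is an assumption for illustration; the slides do not say which checksum algorithm HBase uses or how the checksums are laid out on disk.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

/** Toy per-chunk checksums for a block (hypothetical layout, CRC32 assumed). */
class ToyChecksummedBlock {
    /** Computes one CRC32 per bytesPerChecksum-sized chunk; the last chunk may be shorter. */
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int offset = i * bytesPerChecksum;
            int length = Math.min(bytesPerChecksum, data.length - offset);
            CRC32 crc = new CRC32();
            crc.update(data, offset, length);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Recomputes the checksums and compares them with the ones stored alongside the data. */
    static boolean verify(byte[] data, long[] stored, int bytesPerChecksum) {
        return Arrays.equals(stored, checksums(data, bytesPerChecksum));
    }
}
```

Because the checksums travel with the data, one read returns everything needed for verification, which is the saving in disk iops the slide describes.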
  • 40. Performance Improvements in 0.94 •  HBASE-5047 Support checksums in HBase block cache •  HBASE-5199 Delete out of TTL store files before compaction selection •  HBASE-4608 HLog Compression •  HBASE-4465 Lazy-seek optimization for StoreFile scanners
  • 41. HBASE-5199 •  User can specify TTL per column family •  If all values in the HFile are expired, delete the HFile rather than compact it •  Off by default, turn on via “hbase.store.delete.expired.storefile”
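The selection rule can be sketched as follows. This is an illustration, not the HBase code path: `ToyExpiredFileSelector`, `StoreFile`, and tracking a per-file maximum cell timestamp are assumptions. The idea is that a file is wholly expired when even its newest cell is older than the TTL cutoff, so it can be deleted without being read or rewritten.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy selector (hypothetical): picks store files whose every cell is past the TTL. */
class ToyExpiredFileSelector {
    static final class StoreFile {
        final String name;
        final long maxTimestampMs; // timestamp of the newest cell in the file

        StoreFile(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    /** Returns the files whose newest cell is older than nowMs - ttlMs. */
    static List<StoreFile> selectExpired(List<StoreFile> files, long nowMs, long ttlMs) {
        long cutoff = nowMs - ttlMs;
        List<StoreFile> expired = new ArrayList<>();
        for (StoreFile file : files) {
            if (file.maxTimestampMs < cutoff) {
                expired.add(file); // safe to delete instead of compacting
            }
        }
        return expired;
    }
}
```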
  • 42. Conclusion •  Most consistency issues fixed in 0.92/CDH4 •  Performance improvements in 0.94 •  0.94 is wire compatible with 0.92, so will be in a CDH4 update
  • 43. References •  HBase ACID Semantics, http://hbase.apache.org/acid-semantics.html •  Apache HBase Meetup @ SU, Michael Stack. http://files.meetup.com/1350427/20120327hbase_meetup.pdf •  HBase Internals, Lars Hofhansl. http://www.cloudera.com/resource/hbasecon-2012-learning-hbase-internals/