SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Downloaden Sie, um offline zu lesen
Real-time Analytics with
                               HBase
                        Alex Baranau, Sematext International
                                   (short version)




Tuesday, June 5, 12
About me


                      Software Engineer at Sematext International

                      http://blog.sematext.com/author/abaranau

                      @abaranau

                      http://github.com/sematext (abaranau)




                                                          Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Plan


                      Problem background: what? why?

                      Going real-time with append-only updates
                      approach: how?

                      Open-source implementation: how exactly?

                      Q&A




                                                         Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Background: our services
                      Systems Monitoring Service (Solr, HBase, ...)

                      Search Analytics Service



          data collector                                       Reports
                                     Data
                                                         100

                                                          75

          data collector          Analytics &             50


                                   Storage                25


          data collector                                   0
                                                               2007   2008   2009   2010




                                                                  Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Background: Report Example
                      Search engine (Solr) request latency




                                                         Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Background: requirements

                       High volume of input data

                       Multiple filters/dimensions

                       Interactive (fast) reports

                       Show wide range of data intervals

                       Real-time data changes visibility

                       No sampling, accurate data needed



                                                           Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Background: serve raw data?
                         simply storing all data points doesn’t work
                            to show 1-year worth of data points collected every second
                            31,536,000 points have to be fetched


                         pre-aggregation (at least partial) needed

                        Data Analytics & Storage                           Reports
                                               aggregated data
                        input data



                       data processing
                      (pre-aggregating)

                                                                              Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Background: pre-aggregation
                  OLAP-like Solution

                             aggregation rules
                            * filters/dimensions
                            * time range granularities   aggregated
                            * ...                           value


                                                         aggregated
           input data            processing                 value
              item
                                    logic
                                                         aggregated
                                                            value


                                                         Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Background: RMW updates are slow
                        more dimensions/filters -> greater output data vs input data
                        ratio

                        individual ready-modify-write (Get+Put) operations are slow
                        and not efficient (10-20+ times slower than only Puts)



                                   sensor1                          sensor2
                      ...            <...>
                                               sensor2
                                              value:15.0    ...    value:41.0
                                                                                  input
                                  Get   Put   Get   Put      Get        Put

                            ...   sensor1       sensor2
                                                                  ...           reports
                                    <...>      avg : 28.7          Get/Scan      sensor2
                                               min: 15.0
                      storage                  max: 41.0
                       (HBase)
                                                                                   Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Background: batch updates
                       More efficient data processing: multiple updates
                       processed at once, not individually

                       Decreases aggregation output (per input record)

                       Reliable, no data loss in case of failures



                       Not real-time

                       If done frequently (closer to real-time), still a lot of
                       costly Get+Put update operations

                       Handling of failures of tasks which partially wrote
                       data to HBase is complex

                                                                    Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Going Real-time
                              with
                      Append-based Updates




Tuesday, June 5, 12
Append-only: main goals

                      Increase record update throughput

                      Process updates more efficiently: reduce
                      operations number and resources usage

                      Ideally, apply high volume of incoming data
                      changes in real-time

                      Add ability to roll back changes

                      Handle well high update peaks


                                                           Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Append-only: how?

             1. Replace read-modify-write (Get+Put) operations
                      at write time with simple append-only writes (Put)

             2. Defer processing of updates to periodic jobs
             3. Perform processing of updates on the fly only
                      if user asks for data earlier than updates are
                      processed.




                                                              Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Append-only: writing updates

            1         Replace update (Get+Put) operations at write time
                      with simple append-only writes (Put)

                                sensor1          sensor2              sensor2
                      ...          ...          value:15.0   ...     value:41.0     input
                              Put          Put                                Put
                              ...     sensor1           sensor2
                                                                        ...
                                        <...>          avg : 22.7
                                                       max: 31.0
                                      sensor1
                                        <...>
                                                     ...
                                    ...                 sensor2
                                                       value: 15.0
                            storage                     sensor2
                                                       value: 41.0
                            (HBase)                  ...                            Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Append-only: writing updates

                      2        Defer processing of updates to periodic jobs


                      processing updates with MR job
         ...           sensor1           sensor2
                                                       ...
                         <...>          avg : 22.7           ...   sensor1       sensor2
                                                                                                          ...
                                        max: 31.0                    <...>      avg : 23.4
                       sensor1
                         <...>
                                           ...                                  max: 41.0
                         ...             sensor2
                                        value: 15.0
                                         sensor2
                                        value: 41.0
                                           ...

                                                                             Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Append-only: writing updates
                 3          Perform aggregations on the fly if user asks
                            for data earlier than updates are processed

                      ...     sensor1    sensor2
                                                         ...        reports
                                <...>   avg : 22.7                 sensor1         ...
                                        max: 31.0
                              sensor1
                                <...>
                                           ...
                                ...      sensor2
                                        value: 15.0
                                         sensor2
                      storage           value: 41.0
                                           ...
                                                       sensor2
                                                      avg : 23.4
                                                      max: 41.0

                                                                         Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Append-only: benefits
                      High update throughput

                      Real-time updates visibility

                      Efficient updates processing

                      Handling high peaks of update operations

                      Ability to roll back any range of changes

                      Automatically handling failures of tasks which only
                      partially updated data (e.g. in MR jobs)

                      Update operation becomes idempotent & atomic, easy
                      to scale writers horizontally

                                                                  Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
3/7
                  Append-only: efficient updates
                      To apply N changes:
                        N Get+Put operations replaced with

                        N Puts and 1 Scan (shared) + 1 Put operation

                      Applying N changes at once is much more
                      efficient than performing N individual changes
                        Especially when updated value is complex (like bitmaps),
                        takes time to load in memory

                        Skip compacting if too few records to process

                      Avoid a lot of redundant Get operations when
                      large portion of operations - inserting new data
                                                                   Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
5/7
                           Append-only: rollback
                         Rollbacks are easy when updates were not
                         processed yet (not merged)

                         To preserve rollback ability after they are
                         processed (and result is written back), updates
                         can be compacted into groups
                      written at:     processing updates
                         9:00       ...    sensor2
                                              ...
                                                       ...   ...   sensor2          ...
                                                                      ...
                                           sensor2
                                              ...
                                              ...
                         10:00             sensor2                 sensor2
                                              ...                     ...
                                           sensor2
                                              ...
                                              ...
                         11:00             sensor2                 sensor2
                                              ...                     ...
                                              ...                    ...
                                                                      Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
6/7
                      Append-only: idempotency
                       Using append-only approach helps recover from
                       failed tasks which write data to HBase
                          without rolling back partial updates

                          avoids applying duplicate updates

                          fixes task failure with simple restart of task

                       Note: new task should write records with same row
                       keys as failed one
                          easy, esp. given that input data is likely to be same

                       Very convenient when writing from MapReduce

                       Updates processing periodic jobs are also idempotent

                                                                                  Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Append-only: cons


                      Processing on the fly makes reading slower

                      Looking for data to compact (during periodic
                      compactions) may be inefficient

                      Increased amount of stored data depending
                      on use-case (in 0.92+)




                                                          Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Append-only updates implementation
                           HBaseHUT



Tuesday, June 5, 12
HBaseHUT: Overview

                      Simple

                      Easy to integrate into existing projects
                        Packed as a singe jar to be added to HBase client
                        classpath (also add it to RegionServer classpath to
                        benefit from server-side optimizations)

                        Supports native HBase API: HBaseHUT classes
                        implement native HBase interfaces

                      Apache License, v2.0


                                                                 Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
HBaseHUT: Overview
                      Processing of updates on-the-fly (behind
                      ResultScanner interface)

                        Allows storing back processed Result

                        Can use CPs to process updates on server-side

                      Periodic processing of updates with Scan or
                      MapReduce job
                        Including processing updates in groups based on write ts

                      Rolling back changes with MapReduce job


                                                                   Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
HBaseHUT: API overview
            Writing data:
            Put put = new Put(HutPut.adjustRow(rowKey));
            // ...
            hTable.put(put);


            Reading data:
            Scan scan = new Scan(startKey, stopKey);
            ResultScanner resultScanner =
                new HutResultScanner(hTable.getScanner(scan),
                                     updateProcessor);

            for (Result current : resultScanner) {...}




                                                            Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
HBaseHUT: API overview
                  Example UpdateProcessor:
                  public class MaxFunction extends UpdateProcessor {
                    // ... constructor & utility methods

                      @Override
                      public void process(Iterable<Result> records,
                                          UpdateProcessingResult result) {
                        Double maxVal = null;

                          for (Result record : records) {
                            double val = getValue(record);
                            if (maxVal == null || maxVal < val) {
                              maxVal = val;
                            }
                          }

                          result.add(colfam, qual, Bytes.toBytes(maxVal));
                      }
                  }
                                                                    Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
HBaseHUT: Next Steps
                      Wider CPs (HBase 0.92+) utilization
                         Process updates during memstore flush

                      Make use of Append operation (HBase 0.94+)

                      Integrate with asynchbase lib

                      Reduce storage overhead from adjusting row keys

                      etc.



                      Contributors are welcome!


                                                                Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12
Qs?
                      http://github.com/sematext/HBaseHUT

                      http://blog.sematext.com

                      @abaranau

                      http://github.com/sematext (abaranau)

                      http://sematext.com, we are hiring! ;)
                      there will be a longer version of the presentation on the web



                                                                            Alex Baranau, Sematext International, 2012
Tuesday, June 5, 12

Weitere ähnliche Inhalte

Andere mochten auch

Usage case of HBase for real-time application
Usage case of HBase for real-time applicationUsage case of HBase for real-time application
Usage case of HBase for real-time applicationEdward Yoon
 
HBase: Extreme makeover
HBase: Extreme makeoverHBase: Extreme makeover
HBase: Extreme makeoverbigbase
 
Transactions Over Apache HBase
Transactions Over Apache HBaseTransactions Over Apache HBase
Transactions Over Apache HBasealexbaranau
 
HBase Applications - Atlanta HUG - May 2014
HBase Applications - Atlanta HUG - May 2014HBase Applications - Atlanta HUG - May 2014
HBase Applications - Atlanta HUG - May 2014larsgeorge
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014larsgeorge
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduceHadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Hadoop User Group
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Hadoop User Group
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesData Con LA
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...DataWorks Summit/Hadoop Summit
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...Cloudera, Inc.
 
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini Cloudera, Inc.
 
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, LucidworksVisualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, LucidworksLucidworks
 

Andere mochten auch (20)

Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
geniSIGHTS Offerings on Travel
geniSIGHTS Offerings on TravelgeniSIGHTS Offerings on Travel
geniSIGHTS Offerings on Travel
 
geniSIGHTS Offerings on eTail
geniSIGHTS Offerings on eTailgeniSIGHTS Offerings on eTail
geniSIGHTS Offerings on eTail
 
Usage case of HBase for real-time application
Usage case of HBase for real-time applicationUsage case of HBase for real-time application
Usage case of HBase for real-time application
 
HBase: Extreme makeover
HBase: Extreme makeoverHBase: Extreme makeover
HBase: Extreme makeover
 
Transactions Over Apache HBase
Transactions Over Apache HBaseTransactions Over Apache HBase
Transactions Over Apache HBase
 
HBase Applications - Atlanta HUG - May 2014
HBase Applications - Atlanta HUG - May 2014HBase Applications - Atlanta HUG - May 2014
HBase Applications - Atlanta HUG - May 2014
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
 
January 2011 HUG: Howl Presentation
January 2011 HUG: Howl PresentationJanuary 2011 HUG: Howl Presentation
January 2011 HUG: Howl Presentation
 
January 2011 HUG: Pig Presentation
January 2011 HUG: Pig PresentationJanuary 2011 HUG: Pig Presentation
January 2011 HUG: Pig Presentation
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use Cases
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
 
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
 
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, LucidworksVisualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
 

Ähnlich wie Real-time Analytics with HBase (short version)

Klout changing landscape of social media
Klout changing landscape of social mediaKlout changing landscape of social media
Klout changing landscape of social mediaDataWorks Summit
 
Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...
Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...
Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...SL Corporation
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Chris Baglieri
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream AnalyticsMarco Parenzan
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Alluxio, Inc.
 
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasPython for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasWes McKinney
 
Slides 111017220255-phpapp01
Slides 111017220255-phpapp01Slides 111017220255-phpapp01
Slides 111017220255-phpapp01Ken Mwai
 
From the Big Data keynote at InCSIghts 2012
From the Big Data keynote at InCSIghts 2012From the Big Data keynote at InCSIghts 2012
From the Big Data keynote at InCSIghts 2012Anand Deshpande
 
What is going on? Application Diagnostics on Azure - Copenhagen .NET User Group
What is going on? Application Diagnostics on Azure - Copenhagen .NET User GroupWhat is going on? Application Diagnostics on Azure - Copenhagen .NET User Group
What is going on? Application Diagnostics on Azure - Copenhagen .NET User GroupMaarten Balliauw
 
Big Data launch keynote Singapore Patrick Buddenbaum
Big Data launch keynote Singapore Patrick BuddenbaumBig Data launch keynote Singapore Patrick Buddenbaum
Big Data launch keynote Singapore Patrick BuddenbaumIntelAPAC
 
Data pipelines from zero
Data pipelines from zero Data pipelines from zero
Data pipelines from zero Lars Albertsson
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
Analyzer Uk 1211
Analyzer Uk 1211Analyzer Uk 1211
Analyzer Uk 1211liblib
 
Big Data launch Singapore Patrick Buddenbaum
Big Data launch Singapore Patrick BuddenbaumBig Data launch Singapore Patrick Buddenbaum
Big Data launch Singapore Patrick BuddenbaumIntelAPAC
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 
Complex Er[jl]ang Processing with StreamBase
Complex Er[jl]ang Processing with StreamBaseComplex Er[jl]ang Processing with StreamBase
Complex Er[jl]ang Processing with StreamBasedarach
 
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...Yann Cluchey
 
EDF2012 Peter Boncz - LOD benchmarking SRbench
EDF2012   Peter Boncz - LOD benchmarking SRbenchEDF2012   Peter Boncz - LOD benchmarking SRbench
EDF2012 Peter Boncz - LOD benchmarking SRbenchEuropean Data Forum
 

Ähnlich wie Real-time Analytics with HBase (short version) (20)

Klout changing landscape of social media
Klout changing landscape of social mediaKlout changing landscape of social media
Klout changing landscape of social media
 
Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...
Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...
Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream Analytics
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasPython for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
 
Slides 111017220255-phpapp01
Slides 111017220255-phpapp01Slides 111017220255-phpapp01
Slides 111017220255-phpapp01
 
From the Big Data keynote at InCSIghts 2012
From the Big Data keynote at InCSIghts 2012From the Big Data keynote at InCSIghts 2012
From the Big Data keynote at InCSIghts 2012
 
What is going on? Application Diagnostics on Azure - Copenhagen .NET User Group
What is going on? Application Diagnostics on Azure - Copenhagen .NET User GroupWhat is going on? Application Diagnostics on Azure - Copenhagen .NET User Group
What is going on? Application Diagnostics on Azure - Copenhagen .NET User Group
 
Big Data launch keynote Singapore Patrick Buddenbaum
Big Data launch keynote Singapore Patrick BuddenbaumBig Data launch keynote Singapore Patrick Buddenbaum
Big Data launch keynote Singapore Patrick Buddenbaum
 
Data pipelines from zero
Data pipelines from zero Data pipelines from zero
Data pipelines from zero
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Analyzer Uk 1211
Analyzer Uk 1211Analyzer Uk 1211
Analyzer Uk 1211
 
Big Data launch Singapore Patrick Buddenbaum
Big Data launch Singapore Patrick BuddenbaumBig Data launch Singapore Patrick Buddenbaum
Big Data launch Singapore Patrick Buddenbaum
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
Complex Er[jl]ang Processing with StreamBase
Complex Er[jl]ang Processing with StreamBaseComplex Er[jl]ang Processing with StreamBase
Complex Er[jl]ang Processing with StreamBase
 
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
 
EDF2012 Peter Boncz - LOD benchmarking SRbench
EDF2012   Peter Boncz - LOD benchmarking SRbenchEDF2012   Peter Boncz - LOD benchmarking SRbench
EDF2012 Peter Boncz - LOD benchmarking SRbench
 

Kürzlich hochgeladen

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Kürzlich hochgeladen (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Real-time Analytics with HBase (short version)

  • 1. Real-time Analytics with HBase Alex Baranau, Sematext International (short version) Tuesday, June 5, 12
  • 2. About me Software Engineer at Sematext International http://blog.sematext.com/author/abaranau @abaranau http://github.com/sematext (abaranau) Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 3. Plan Problem background: what? why? Going real-time with append-only updates approach: how? Open-source implementation: how exactly? Q&A Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 4. Background: our services Systems Monitoring Service (Solr, HBase, ...) Search Analytics Service data collector Reports Data 100 75 data collector Analytics & 50 Storage 25 data collector 0 2007 2008 2009 2010 Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 5. Background: Report Example Search engine (Solr) request latency Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 6. Background: requirements High volume of input data Multiple filters/dimensions Interactive (fast) reports Show wide range of data intervals Real-time data changes visibility No sampling, accurate data needed Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 7. Background: serve raw data? simply storing all data points doesn’t work to show 1-year worth of data points collected every second 31,536,000 points have to be fetched pre-aggregation (at least partial) needed Data Analytics & Storage Reports aggregated data input data data processing (pre-aggregating) Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 8. Background: pre-aggregation OLAP-like Solution aggregation rules * filters/dimensions * time range granularities aggregated * ... value aggregated input data processing value item logic aggregated value Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 9. Background: RMW updates are slow more dimensions/filters -> greater output data vs input data ratio individual ready-modify-write (Get+Put) operations are slow and not efficient (10-20+ times slower than only Puts) sensor1 sensor2 ... <...> sensor2 value:15.0 ... value:41.0 input Get Put Get Put Get Put ... sensor1 sensor2 ... reports <...> avg : 28.7 Get/Scan sensor2 min: 15.0 storage max: 41.0 (HBase) Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 10. Background: batch updates More efficient data processing: multiple updates processed at once, not individually Decreases aggregation output (per input record) Reliable, no data loss in case of failures Not real-time If done frequently (closer to real-time), still a lot of costly Get+Put update operations Handling of failures of tasks which partially wrote data to HBase is complex Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 11. Going Real-time with Append-based Updates Tuesday, June 5, 12
  • 12. Append-only: main goals Increase record update throughput Process updates more efficiently: reduce operations number and resources usage Ideally, apply high volume of incoming data changes in real-time Add ability to roll back changes Handle well high update peaks Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 13. Append-only: how? 1. Replace read-modify-write (Get+Put) operations at write time with simple append-only writes (Put) 2. Defer processing of updates to periodic jobs 3. Perform processing of updates on the fly only if user asks for data earlier than updates are processed. Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 14. Append-only: writing updates 1 Replace update (Get+Put) operations at write time with simple append-only writes (Put) sensor1 sensor2 sensor2 ... ... value:15.0 ... value:41.0 input Put Put Put ... sensor1 sensor2 ... <...> avg : 22.7 max: 31.0 sensor1 <...> ... ... sensor2 value: 15.0 storage sensor2 value: 41.0 (HBase) ... Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 15. Append-only: writing updates 2 Defer processing of updates to periodic jobs processing updates with MR job ... sensor1 sensor2 ... <...> avg : 22.7 ... sensor1 sensor2 ... max: 31.0 <...> avg : 23.4 sensor1 <...> ... max: 41.0 ... sensor2 value: 15.0 sensor2 value: 41.0 ... Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 16. Append-only: writing updates 3 Perform aggregations on the fly if user asks for data earlier than updates are processed ... sensor1 sensor2 ... reports <...> avg : 22.7 sensor1 ... max: 31.0 sensor1 <...> ... ... sensor2 value: 15.0 sensor2 storage value: 41.0 ... sensor2 avg : 23.4 max: 41.0 Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 17. Append-only: benefits High update throughput Real-time updates visibility Efficient updates processing Handling high peaks of update operations Ability to roll back any range of changes Automatically handling failures of tasks which only partially updated data (e.g. in MR jobs) Update operation becomes idempotent & atomic, easy to scale writers horizontally Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 18. 3/7 Append-only: efficient updates To apply N changes: N Get+Put operations replaced with N Puts and 1 Scan (shared) + 1 Put operation Applying N changes at once is much more efficient than performing N individual changes Especially when updated value is complex (like bitmaps), takes time to load in memory Skip compacting if too few records to process Avoid a lot of redundant Get operations when large portion of operations - inserting new data Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 19. 5/7 Append-only: rollback Rollbacks are easy when updates were not processed yet (not merged) To preserve rollback ability after they are processed (and result is written back), updates can be compacted into groups written at: processing updates 9:00 ... sensor2 ... ... ... sensor2 ... ... sensor2 ... ... 10:00 sensor2 sensor2 ... ... sensor2 ... ... 11:00 sensor2 sensor2 ... ... ... ... Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 20. 6/7 Append-only: idempotency Using append-only approach helps recover from failed tasks which write data to HBase without rolling back partial updates avoids applying duplicate updates fixes task failure with simple restart of task Note: new task should write records with same row keys as failed one easy, esp. given that input data is likely to be same Very convenient when writing from MapReduce Updates processing periodic jobs are also idempotent Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 21. Append-only: cons Processing on the fly makes reading slower Looking for data to compact (during periodic compactions) may be inefficient Increased amount of stored data depending on use-case (in 0.92+) Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 22. Append-only updates implementation HBaseHUT Tuesday, June 5, 12
  • 23. HBaseHUT: Overview Simple Easy to integrate into existing projects Packed as a singe jar to be added to HBase client classpath (also add it to RegionServer classpath to benefit from server-side optimizations) Supports native HBase API: HBaseHUT classes implement native HBase interfaces Apache License, v2.0 Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 24. HBaseHUT: Overview Processing of updates on-the-fly (behind ResultScanner interface) Allows storing back processed Result Can use CPs to process updates on server-side Periodic processing of updates with Scan or MapReduce job Including processing updates in groups based on write ts Rolling back changes with MapReduce job Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 25. HBaseHUT: API overview Writing data: Put put = new Put(HutPut.adjustRow(rowKey)); // ... hTable.put(put); Reading data: Scan scan = new Scan(startKey, stopKey); ResultScanner resultScanner = new HutResultScanner(hTable.getScanner(scan), updateProcessor); for (Result current : resultScanner) {...} Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 26. HBaseHUT: API overview Example UpdateProcessor: public class MaxFunction extends UpdateProcessor { // ... constructor & utility methods @Override public void process(Iterable<Result> records, UpdateProcessingResult result) { Double maxVal = null; for (Result record : records) { double val = getValue(record); if (maxVal == null || maxVal < val) { maxVal = val; } } result.add(colfam, qual, Bytes.toBytes(maxVal)); } } Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 27. HBaseHUT: Next Steps Wider CPs (HBase 0.92+) utilization Process updates during memstore flush Make use of Append operation (HBase 0.94+) Integrate with asynchbase lib Reduce storage overhead from adjusting row keys etc. Contributors are welcome! Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12
  • 28. Qs? http://github.com/sematext/HBaseHUT http://blog.sematext.com @abaranau http://github.com/sematext (abaranau) http://sematext.com, we are hiring! ;) there will be a longer version of the presentation on the web Alex Baranau, Sematext International, 2012 Tuesday, June 5, 12