Apache HBase Table Snapshots
Matteo Bertozzi | Cloudera | Software Engineer / HBase Committer
Jonathan Hsieh | Cloudera | Software Engineer / HBase Committer
Jesse Yates | Salesforce.com | Software Engineer / HBase Committer
June 13, 2013
HBaseCon 2013
Outline
• Intro and Use Cases
• Usage Instructions
• Internals
• Snapshot Layout
• Snapshot Restoration
• Online Snapshots
• Conclusion
HBase Table Snapshots
A snapshot is a collection of metadata required to reconstitute the data near a particular point in time
HBase Table Snapshots
• An inexpensive way to freeze the state of a table
• A mechanism that helps back up data within the cluster or to a remote cluster
• Recover from user error
• Bootstrap Replication
Old: HBase-Supported Batch Backups
• Export / Dist CP / Import
• 3 batch MR jobs
• Several extra copies of data
• High latency (hours)
• Impacts existing low-latency workloads
• Copy Table
• 1 MR Job
• Single copy of data
• Incremental table copies
• High Latency (hours)
• Impacts existing workloads
[Diagram: Export, DistCp, and Import each run as a separate MR job; CopyTable runs as a single MR job]
Upcoming: HDFS Snapshots (or DistCP backup)
• Take an HDFS snapshot of all the files in the underlying HBase data directory.
• HFiles, HLogs, and other metadata.
• Snapshots all tables in HBase
• Cannot clone tables
• “Restore As”
• Targeted for Hadoop 2.1 / Hadoop 3.0 DistCp
[Diagram: HLog Append, Flush, Compact, Restart, Recover]
New: HBase Snapshot-based Backups
• Snapshot, then Export
• 1 MR Job
• Single copy of data
• Little impact on low-latency workloads
• Export is like distcp directly from HDFS
• No incremental snapshot copy
[Diagram: Export Snapshot]
Export
• Like distcp for a snapshot manifest
• Copy data files without going through HBase’s “front door”
Recover from User Error
• How do we recover from user error?
[Timeline: user error (drop 'table') → service is down (Panic! Black magic!) → recovery time → service is restored, major data loss]
Recovering from User Mistakes: Table Snapshots
• Snapshot the state of a table at a certain moment in time
• Restore it or Clone it later, creating a new read write table
• Export it to another cluster with minimal impact on HBase
[Timeline: periodic snapshot → periodic snapshot → user error (drop 'table') → service is down (Keep calm!) → restore → service restored, minor data loss. Carry on.]
Usage
What an Admin needs to know
Configuration
• Simple hbase-site.xml configuration
<property>
<name>hbase.snapshot.enabled</name>
<value>true</value>
</property>
• Enabled by default in 0.95+
• Requires user to enable in 0.94.6.1+.
Usage: Shell Commands
• snapshot 'table', 'snapshot'
• Table can be offline or online
• list_snapshots [<regex>]
• clone_snapshot 'snapshot', 'dsttable'
• restore_snapshot 'snapshot'
• delete_snapshot 'snapshot'
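The shell commands above are easy to generate from a script, for example for periodic snapshots. A minimal sketch, assuming a hypothetical naming scheme (table name plus UTC timestamp) and hypothetical helper names; it only builds the shell lines and never talks to a cluster:

```python
from datetime import datetime, timezone


def snapshot_name(table: str, when: datetime) -> str:
    """Hypothetical naming scheme: <table>-<UTC timestamp>."""
    return f"{table}-{when.astimezone(timezone.utc):%Y%m%d-%H%M%S}"


def take_snapshot_cmd(table: str, when: datetime) -> str:
    """HBase shell line that snapshots `table` under a timestamped name."""
    return f"snapshot '{table}', '{snapshot_name(table, when)}'"


def clone_snapshot_cmd(snapshot: str, dst_table: str) -> str:
    """HBase shell line that creates a new table from a snapshot."""
    return f"clone_snapshot '{snapshot}', '{dst_table}'"


def restore_snapshot_cmd(snapshot: str) -> str:
    """HBase shell line that rolls a table back to a snapshot."""
    return f"restore_snapshot '{snapshot}'"


when = datetime(2013, 6, 13, 12, 0, 0, tzinfo=timezone.utc)
print(take_snapshot_cmd("table", when))
print(clone_snapshot_cmd(snapshot_name("table", when), "dsttable"))
```

The generated lines could be piped into `hbase shell` by a scheduler; how often to snapshot and how long to keep old snapshots is left to the operator.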
Usage: Web UI
Export: Usage
• Copy “MySnapshot” to a remote HDFS
• $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -mappers 16
• With permission change on the copy
• $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -chuser MyUser -chgroup MyGroup -chmod 700 -mappers 16
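When the export is driven from a script, it helps to build the argument list in one place. A minimal sketch, assuming the `hbase` launcher is on the PATH and takes the class name as its first argument; the function name is hypothetical, and only the flags shown on this slide are used:

```python
from typing import List, Optional

EXPORT_CLASS = "org.apache.hadoop.hbase.snapshot.ExportSnapshot"


def export_snapshot_argv(snapshot: str, copy_to: str, mappers: int,
                         chuser: Optional[str] = None,
                         chgroup: Optional[str] = None,
                         chmod: Optional[str] = None) -> List[str]:
    """Build the ExportSnapshot command line as an argv list."""
    argv = ["hbase", EXPORT_CLASS, "-snapshot", snapshot, "-copy-to", copy_to]
    # Optional ownership and permission changes applied to the copy.
    if chuser is not None:
        argv += ["-chuser", chuser]
    if chgroup is not None:
        argv += ["-chgroup", chgroup]
    if chmod is not None:
        argv += ["-chmod", chmod]
    argv += ["-mappers", str(mappers)]
    return argv


print(" ".join(export_snapshot_argv("MySnapshot", "hdfs://srv2:8082/hbase", 16)))
```

Returning a list rather than a string means the result can be handed to `subprocess.run(argv, check=True)` without shell quoting issues.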
Debugging and Info
• Dump a snapshot manifest
• Writes to standard out
• Usage
• $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot
• $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot -files
Metrics
• Histograms of operation completion
• snapshotTime
• cloneTime
• restoreTime
• Includes ‘extended’ metrics
• Std deviation
• Min/max
Table Snapshot Internals
Internals
• HBase Table HDFS Layout
• Snapshot HDFS layout
• Offline Snapshots
• Restore and Clone Snapshot
• Online Snapshots
Primer: HBase Table Layout in HDFS
• HRegions map directly to a directory structure with table name, encoded region name, column family, and HFiles.
• In HDFS:
/hbase/Table/<enc R1>/cf/<hfile f11>
/hbase/Table/<enc R1>/cf/<hfile f12>
/hbase/Table/<enc R2>/cf/<hfile f21>
/hbase/Table/<enc R2>/cf/<hfile f22>
/hbase/Table/<enc R3>/cf/<hfile f31>
/hbase/Table/<enc R3>/cf/<hfile f32>
[Diagram: Table with regions R1, R2, R3 holding HFiles F11, F21, F31]
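The directory layout above can be expressed as a pair of small helpers. A minimal sketch that follows only the layout shown on this slide; the function names and the example region name are hypothetical:

```python
import posixpath
from typing import Dict


def hfile_path(root: str, table: str, encoded_region: str,
               family: str, hfile: str) -> str:
    """Compose <root>/<table>/<encoded region>/<column family>/<hfile>."""
    return posixpath.join(root, table, encoded_region, family, hfile)


def parse_hfile_path(path: str, root: str = "/hbase") -> Dict[str, str]:
    """Split an HFile path back into its four components."""
    rel = posixpath.relpath(path, root)
    # Raises ValueError if the path does not have exactly four components.
    table, region, family, hfile = rel.split("/")
    return {"table": table, "region": region, "family": family, "hfile": hfile}


path = hfile_path("/hbase", "Table", "enc-R1", "cf", "f11")
print(path)
print(parse_hfile_path(path))
```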
Table Snapshots in the File System
• A snapshot manifest contains references to files in the original table.
• Each snapshot is stored in the hbase/.hbase-snapshots dir.
[Diagram: the TableSnapshot manifest in ./.hbase-snapshots lists regions R1, R2, R3 and references the HFiles F11, F21, F31 of Table]
Offline Snapshots
• Disable table, then create Snapshot Manifest
• Created in a temporary directory to guarantee that snapshot creation is atomic
• Includes
• Snapshot Metadata
• Table Metadata/Schema (.tableinfo)
• References to original HFiles
• Master-only file system operation
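The "build in a temporary directory, then publish" idea can be sketched on a local filesystem. This is an illustration of the technique only, not HBase's on-disk format: the `.tmp` working location, the JSON file names, and the function name are all assumptions, and a local directory rename stands in for the HDFS rename.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_snapshot_manifest(snapshots_dir: str, name: str,
                            schema: Dict, hfile_refs: Dict[str, List[str]]) -> str:
    """Build the manifest in a working dir, then publish it with one rename."""
    work_dir = os.path.join(snapshots_dir, ".tmp", name)  # hypothetical location
    final_dir = os.path.join(snapshots_dir, name)
    os.makedirs(work_dir)
    # Snapshot metadata, table schema, and references to the original HFiles.
    with open(os.path.join(work_dir, "snapshot.json"), "w") as f:
        json.dump({"name": name}, f)
    with open(os.path.join(work_dir, "tableinfo.json"), "w") as f:
        json.dump(schema, f)
    with open(os.path.join(work_dir, "refs.json"), "w") as f:
        json.dump(hfile_refs, f)
    # Readers see either no snapshot or a complete one, never a partial one.
    os.rename(work_dir, final_dir)
    return final_dir


with tempfile.TemporaryDirectory() as root:
    done = write_snapshot_manifest(root, "TableSnapshot",
                                   {"families": ["cf"]}, {"R1": ["f11", "f12"]})
    print(sorted(os.listdir(done)))
```

If the process dies before the rename, only the working directory is left behind, so a failed attempt never shows up as a half-written snapshot.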
HFile Life Cycle
• Splits and Compactions remove hfiles
• What happens to references to these files?
[Diagram: a compaction in R3 replaces F31 with the new file F31+32, but the TableSnapshot manifest still references F31. No more HFile??]
HFile Archiver
• We archive old HFiles from compactions (HBASE-5547)
• Files stored in hbase/.archive
• HFileCleaner ensures HFiles’ data remains available
[Diagram: after the compaction, Table holds F11, F21, and F31+32; the old F31 is kept under ./.archive, so the TableSnapshot manifest can still reach it]
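The rule the cleaner has to enforce can be stated in a few lines: an archived HFile may only be deleted once no snapshot references it. A minimal sketch of that rule, with hypothetical names and plain sets standing in for the archive directory and the snapshot manifests; anything else a real cleaner might take into account is ignored here:

```python
from typing import Dict, Set


def deletable_archived_files(archived: Set[str],
                             snapshots: Dict[str, Set[str]]) -> Set[str]:
    """Return the archived HFiles that no snapshot manifest references."""
    referenced: Set[str] = set().union(*snapshots.values())
    return archived - referenced


archived = {"F31", "F12"}
snapshots = {"TableSnapshot": {"F11", "F21", "F31"}}

# F31 is still referenced by TableSnapshot, so only F12 may go.
print(sorted(deletable_archived_files(archived, snapshots)))

# Once the snapshot is deleted, F31 becomes deletable as well.
print(sorted(deletable_archived_files(archived, {})))
```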
Restore Snapshot Internals
Restore Operations
• Restore table
• Rollback table to specific state
• Clone from snapshot (Restore As)
• Create new read-write table from snapshot
• There can be multiple replicas of a snapshot
• Export snapshot
• Send snapshot and all its data to another cluster
Clone: Creating table from a Snapshot
• Convert snapshot manifest info into a Table.
• HFileLinks (HBASE-6610) to mimic Unix open file descriptor semantics
[Diagram: the Clone table gets regions R1, R2, R3 from the TableSnapshot manifest; its files point to HFiles in Table (F11, F21) or in ./.archive (F31)]
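The open-file-descriptor analogy means a link keeps working after the file it points to has been moved to the archive. A minimal sketch of that lookup, assuming just two candidate locations (the table directory, then the same relative path under `.archive`); the function names and the archive sub-layout are assumptions made for illustration, not the HFileLink implementation:

```python
from typing import Callable, List, Optional


def link_locations(root: str, table: str, region: str,
                   family: str, hfile: str) -> List[str]:
    """Candidate locations for a linked HFile, in lookup order."""
    rel = f"{table}/{region}/{family}/{hfile}"
    return [f"{root}/{rel}", f"{root}/.archive/{rel}"]


def resolve_link(locations: List[str],
                 exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first location that currently exists, or None."""
    for path in locations:
        if exists(path):
            return path
    return None


candidates = link_locations("/hbase", "Table", "R3", "cf", "F31")

# Before the compaction the file is still in the table directory.
print(resolve_link(candidates, {"/hbase/Table/R3/cf/F31"}.__contains__))

# After the compaction it is only in the archive; the link still resolves.
print(resolve_link(candidates, {"/hbase/.archive/Table/R3/cf/F31"}.__contains__))
```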
Restore: Rollback to an old state
• Rollback the existing table to snapshot state
• Restores original schema if altered
• Snapshots current table, just in case
• Minimal overhead
• Smarter delete table & clone snapshot
• Handles creating/deleting regions
• Restore META
Restore illustrated
• Rollback “Table” to the “TableSnapshot” state
• Region “R4” is not present in the snapshot
• “R4” will be removed from “Table”, files moved to “.archive”
• New files not present in the snapshot are moved to the archive
• HFileLinks are created to point to old files.
[Diagram: Table (F11, F21, F31+32, plus region R4 with F41) is rolled back to the TableSnapshot manifest (R1, R2, R3); R4 is removed, F41 and F31+32 move to ./.archive, and R3 links to the archived F31]
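The restore steps amount to a diff between the current table layout and the snapshot manifest. A minimal sketch of that diff, assuming each layout is modelled as a mapping from region name to a set of HFile names; the `RestorePlan` type and `plan_restore` function are hypothetical and only compute what would have to happen:

```python
from typing import Dict, NamedTuple, Set

Layout = Dict[str, Set[str]]  # region name -> HFile names


class RestorePlan(NamedTuple):
    regions_to_remove: Set[str]
    regions_to_add: Set[str]
    files_to_archive: Dict[str, Set[str]]
    files_to_link: Dict[str, Set[str]]


def plan_restore(table: Layout, snapshot: Layout) -> RestorePlan:
    """Work out how to roll `table` back to the state in `snapshot`."""
    regions_to_remove = set(table) - set(snapshot)
    regions_to_add = set(snapshot) - set(table)
    files_to_archive: Dict[str, Set[str]] = {}
    files_to_link: Dict[str, Set[str]] = {}
    for region, current in table.items():
        wanted = snapshot.get(region)
        if wanted is None:
            # Region is not in the snapshot: all of its files are archived.
            files_to_archive[region] = set(current)
            continue
        if current - wanted:
            files_to_archive[region] = current - wanted  # newer than snapshot
        if wanted - current:
            files_to_link[region] = wanted - current     # link to old files
    for region in regions_to_add:
        files_to_link[region] = set(snapshot[region])
    return RestorePlan(regions_to_remove, regions_to_add,
                       files_to_archive, files_to_link)


table = {"R1": {"F11"}, "R2": {"F21"}, "R3": {"F31+32"}, "R4": {"F41"}}
snapshot = {"R1": {"F11"}, "R2": {"F21"}, "R3": {"F31"}}
print(plan_restore(table, snapshot))
```

Because the plan is derived purely from the two layouts, computing it again after a partial failure gives the remaining work, which fits the "restore again" advice for failed restores.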
Restore failures
• The table to restore is disabled
• META and HDFS operations may fail (network issue, server down, …)
• hbck can’t repair an incomplete restore...
• Restore again!
Export Snapshot
• Copy a full snapshot to another cluster
• All required HFiles, and Metadata
• Lots of options
• Fancy dist-cp
• Must resolve HFileLinks
• Faster than CopyTable or table export+import!
• Minimal impact on running cluster
Online Snapshots
Online snapshots
• Take a snapshot without making the table unavailable
• No need to disable the table
• Continue accepting reads and writes from clients
• Challenges
• Coordinating Region Servers
• Data is in memory
• Consistency
Offline vs Online Snapshots
[Diagram: Offline: the master writes the manifest per region, then verifies. Online: the master runs a snapshot-region subprocedure on RS1 to RS4, which write the manifest per region, then the master verifies.]
Online Snapshots
• Each region can have data in the memstore and HLog that is not yet in an HFile
• The snapshot is missing in-memory data!
[Diagram: R1, R2, and R3 each hold in-memory data (mem) that the TableSnapshot manifest does not reference]
Online Snapshots
• Flush so that all in-memory data is written to an HFile
• Then add it to the snapshot manifest
[Diagram: flushing R1, R2, and R3 produces the new HFiles F13, F23, and F33, which the TableSnapshot manifest then references]
Consistency
• Offline Snapshots
• Fully consistent snapshot
• Online Flush Snapshot
• “CopyTable” level consistency with a much smaller window.
• Time bounded by slowest region server and region flush
Online Snapshots and Causal consistency
• Causal consistency would only allow A, AB, or neither A nor B.
• B and not A is possible with Flush Snapshots
[Diagram: the Master sends “Flush SS” to RS1 and RS2. One region flushes first (F13). The Client then does Put A, which lands in memory on the region that has already flushed, and then Put B, which lands in memory on a region that has not flushed yet. The later flush writes B to a new HFile, so Put B is in the snapshot but Put A is not!]
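The anomaly can be reproduced with a toy model in which each region flushes at a different moment. A minimal sketch, assuming a hypothetical `Region` class with an in-memory list as the memstore and sets as HFiles; it models only the ordering of events, not real region servers:

```python
from typing import Dict, List, Set


class Region:
    """Toy region: puts go to the memstore, a flush turns it into an HFile."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.memstore: List[str] = []
        self.hfiles: List[Set[str]] = []

    def put(self, key: str) -> None:
        self.memstore.append(key)

    def flush_for_snapshot(self) -> Set[str]:
        """Flush, then return every key the snapshot sees for this region."""
        if self.memstore:
            self.hfiles.append(set(self.memstore))
            self.memstore = []
        return set().union(*self.hfiles)


r1, r2 = Region("R1"), Region("R2")
snapshot: Dict[str, Set[str]] = {}

snapshot["R1"] = r1.flush_for_snapshot()  # R1 handles the flush request first
r1.put("A")                               # Put A arrives after R1 has flushed
r2.put("B")                               # ... then Put B, before R2 flushes
snapshot["R2"] = r2.flush_for_snapshot()  # R2 flushes late and captures B

captured = set().union(*snapshot.values())
print("A" in captured, "B" in captured)
```

Although the client wrote A before B, the snapshot contains B without A, which is the outcome causal consistency would rule out.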
Online snapshot attempts can fail
• If an involved RS fails, the snapshot attempt will fail.
• Needs a way to prevent other table metadata operations
• Table Metadata Locks (0.95+)
• Avoids many snapshot failures caused by conflicts (e.g., online schema changes, splits)
• A failed attempt will report errors; the user must retry.
• o.a.h.hbase.snapshot.HBaseSnapshotException
• o.a.h.hbase.snapshot.CorruptedSnapshotException
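Since a failed attempt has to be retried by the caller, a client-side wrapper might look like the following. A minimal sketch: `SnapshotAttemptFailed` is a hypothetical stand-in for whatever exception the client surfaces, and `take_snapshot` is any callable that performs one attempt.

```python
import time
from typing import Callable


class SnapshotAttemptFailed(Exception):
    """Hypothetical stand-in for a failed snapshot attempt."""


def snapshot_with_retry(take_snapshot: Callable[[], None],
                        attempts: int = 3, delay_s: float = 0.0) -> int:
    """Run `take_snapshot` until it succeeds; return the attempts used."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            take_snapshot()
            return attempt
        except SnapshotAttemptFailed:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            time.sleep(delay_s)
    raise AssertionError("unreachable")


failures_left = [2]  # fail twice, then succeed


def flaky_snapshot() -> None:
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise SnapshotAttemptFailed("a region server went away")


print(snapshot_with_retry(flaky_snapshot, attempts=5))
```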
Development Notes
How we collaborated, built, and tested
Table Snapshots Development
• Developed in a branch off of trunk; merged into 0.95 and trunk.
• Feature is too big to include as a single patch
• Does not destabilize trunk
• Does not slow time-based release trains
• Later backported to 0.94.6.1
[Diagram: branch from trunk (src), sync with trunk, reintegrate into trunk]
System testing with Jenkins
• Concurrently load data while taking snapshots
• Inject compactions; kill RSs, the META RS, and the Master
• Create snapshot clones of the snapshots
• Inject compactions; kill RSs, the META RS, and the Master
Future Work:
• Alternative semantics and implementations
• Log Roll Snapshot (HBASE-7291)
• Store logs and replay on restore
• Faster for snapshot, slower and more complicated for restore.
• Timestamp Snapshot (HBASE-6866)
• All updates before ts in snapshot, all after not in snapshot
• Longer pause before snapshot taken
• Globally-Consistent Snapshot (HBASE-6867)
• Global write lock on all regions/nodes until the snapshot completes.
• Expensive
• Repair tools
• Manual repairs necessary (hbck does not support this yet)
Conclusions
Feature Summary by Version
Feature                 Apache 0.92.x to <0.94.6.1   Apache 0.94.6.1+   Apache 0.95.0 / 0.96.0
Copy Table              yes                          yes                yes
Import / Export         yes                          yes                yes
Offline snapshots       no                           yes                yes
Flush Online Snapshot   no                           yes                yes
Table Locks             no                           no                 yes
Key Contributors
• Jesse Yates (Salesforce.com)
• HFileArchiver, Offline Snapshot, first draft online
• Matteo Bertozzi (Cloudera)
• HFileLink, Restore, clone, Testing, 0.94 backport
• Jonathan Hsieh (Cloudera)
• Online Snapshots revamp, Testing, Branch Shepherd
• Ted Yu (HortonWorks)
• Reviews
• Enis Soztutar (HortonWorks)
• Table Locks on Snapshots
Thanks! Questions?
Matteo Bertozzi
@th30z
matteo.bertozzi@cloudera.com
Jonathan Hsieh
@jmhsieh
jon@cloudera.com
Jesse Yates
@jesse_yates
jesse.k.yates@gmail.com

 
RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)Gustavo Rene Antunez
 

Ähnlich wie HBaseCon 2013: Apache HBase Table Snapshots (20)

Meet Apache HBase - 2.0
Meet Apache HBase - 2.0Meet Apache HBase - 2.0
Meet Apache HBase - 2.0
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Meet HBase 2.0
Meet HBase 2.0Meet HBase 2.0
Meet HBase 2.0
 
[Altibase] 13 backup and recovery
[Altibase] 13 backup and recovery[Altibase] 13 backup and recovery
[Altibase] 13 backup and recovery
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
 
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
 
Les 12 fl_db
Les 12 fl_dbLes 12 fl_db
Les 12 fl_db
 
Hbase Backups: Backups in the Enterprise
Hbase Backups: Backups in the EnterpriseHbase Backups: Backups in the Enterprise
Hbase Backups: Backups in the Enterprise
 
Les 11 fl2
Les 11 fl2Les 11 fl2
Les 11 fl2
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
 
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxUKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
 
Less17 flashback tb3
Less17 flashback tb3Less17 flashback tb3
Less17 flashback tb3
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
 
Oracle 12c New Features_RMAN_slides
Oracle 12c New Features_RMAN_slidesOracle 12c New Features_RMAN_slides
Oracle 12c New Features_RMAN_slides
 
Take your database source code and data under control
Take your database source code and data under controlTake your database source code and data under control
Take your database source code and data under control
 
RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)
 

Mehr von Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Kürzlich hochgeladen

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Kürzlich hochgeladen (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

HBaseCon 2013: Apache HBase Table Snapshots

  • 1. Apache HBase Table Snapshots
      Matteo Bertozzi | Cloudera | Software Engineer / HBase Committer
      Jonathan Hsieh | Cloudera | Software Engineer / HBase Committer
      Jesse Yates | Salesforce.com | Software Engineer / HBase Committer
      June 13, 2013, HBaseCon 2013
  • 2. Outline
      • Intro and Use Cases
      • Usage Instructions
      • Internals
      • Snapshot Layout
      • Snapshot Restoration
      • Online Snapshots
      • Conclusion
  • 3. HBase Table Snapshots
      A snapshot is a collection of metadata required to reconstitute the data near a particular point in time.
  • 4. HBase Table Snapshots
      • An inexpensive way to freeze the state of a table
      • A mechanism that helps back up data within the cluster or to a remote cluster
      • Recover from user error
      • Bootstrap replication
  • 5. Old: HBase-Supported Batch Backups
      • Export / DistCp / Import
          • 3 batch MR jobs
          • Several extra copies of data
          • High latency (hours)
          • Impacts existing low-latency workloads
      • Copy Table
          • 1 MR job
          • Single copy of data
          • Incremental table copies
          • High latency (hours)
          • Impacts existing workloads
      [Diagram: Export, DistCp, and Import MR jobs; Copy Table MR job]
  • 6. Upcoming: HDFS Snapshots (or DistCP backup)
      • Take an HDFS snapshot of all the files in HBase's underlying data directory.
      • HFiles, HLogs, and other metadata.
      • Snapshots all tables in HBase
      • Cannot clone tables
      • "Restore As"
      • Targeted for Hadoop 2.1 / Hadoop 3.0 DistCP
      [Diagram labels: HLog Append, Flush, Compact, Restart, Recover]
  • 7. New: HBase Snapshot-based Backups
      • Snapshot, then Export
      • 1 MR job
      • Single copy of data
      • Little impact on low-latency workloads
      • Export is like distcp directly from HDFS
      • No incremental snapshot copy
      [Diagram: Export Snapshot]
  • 8. Export
      • Like distcp for a snapshot manifest
      • Copy data files without going through HBase's "front door"
  • 9. Recover from User Error
      • How do we recover from user error?
      [Timeline labels: User error: drop 'table'. Service is down! Panic! Black magic! (recovery time). Service is restored, major data loss.]
  • 10. Recovering from User Mistakes: Table Snapshots
      • Snapshot the state of a table at a certain moment in time
      • Restore it or clone it later, creating a new read-write table
      • Export it to another cluster with minimal impact on HBase
      [Timeline labels: Periodic snapshot. Periodic snapshot. User error: drop 'table'. Service is down! Keep calm! Restore. Service restored, minor data loss. Carry on.]
  • 11. Usage: What an Admin Needs to Know
  • 12. Configuration
      • Simple hbase-site.xml configuration:
          <property>
            <name>hbase.snapshot.enabled</name>
            <value>true</value>
          </property>
      • Enabled by default in 0.95+
      • Must be enabled explicitly by the user in 0.94.6.1+
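Before relying on snapshots on a 0.94.x cluster, it can help to confirm that the property is actually set. The following is a minimal sketch, assuming the property sits inside the usual Hadoop-style `<configuration>` root element; the helper name `snapshots_enabled` and the inline sample XML are illustrative only and not part of HBase.

```python
import xml.etree.ElementTree as ET

# Sample hbase-site.xml content (illustrative; a real file holds many more properties).
SITE_XML = """<configuration>
  <property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
  </property>
</configuration>"""


def snapshots_enabled(site_xml: str, default: bool) -> bool:
    """Return the configured value of hbase.snapshot.enabled, or `default` if unset."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "hbase.snapshot.enabled":
            return (prop.findtext("value") or "").strip().lower() == "true"
    return default


# The default depends on the version: True for 0.95+, False for 0.94.6.1+.
print(snapshots_enabled(SITE_XML, default=False))
```

Passing the version-specific default explicitly keeps the helper honest about what happens when the property is absent.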
  • 13. Usage: Shell Commands
      • snapshot 'table', 'snapshot'
          • Table can be offline or online
      • list_snapshots [<regex>]
      • clone_snapshot 'snapshot', 'dsttable'
      • restore_snapshot 'snapshot'
      • delete_snapshot 'snapshot'
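These commands lend themselves to scripting, for example for the periodic snapshots shown on slide 10. The sketch below only builds HBase shell command strings and never contacts a cluster. The timestamp-suffixed naming scheme and the age-based retention window are assumptions for illustration, not something the talk prescribes.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def snapshot_command(table: str, taken_at: datetime) -> str:
    """Build an HBase shell `snapshot` command with a timestamped snapshot name."""
    name = f"{table}-{taken_at:%Y%m%d-%H%M}"  # hypothetical naming scheme
    return f"snapshot '{table}', '{name}'"


def expiry_commands(taken: Dict[str, datetime], now: datetime,
                    keep: timedelta) -> List[str]:
    """Build `delete_snapshot` commands for every snapshot older than `keep`."""
    return [f"delete_snapshot '{name}'"
            for name, ts in sorted(taken.items())
            if now - ts > keep]


print(snapshot_command("orders", datetime(2013, 6, 13, 9, 30)))
```

The generated lines could be piped into `hbase shell` from a scheduler; how often to snapshot and how long to keep snapshots is a policy decision outside the scope of this sketch.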
  • 14. Usage: Web UI
  • 15. Usage: Web UI
  • 16. Export: Usage
      • Copy "MySnapshot" to a remote HDFS:
          $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -mappers 16
      • With a permission change on the copy:
          $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -chuser MyUser -chgroup MyGroup -chmod 700 -mappers 16
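When exports are driven from a script, assembling the argument list programmatically avoids quoting mistakes. This is a minimal sketch that uses only the flags shown on the slide (`-snapshot`, `-copy-to`, `-mappers`, `-chuser`, `-chgroup`, `-chmod`); the function name `export_command` is hypothetical, and the sketch builds the command without running it.

```python
from typing import List, Optional

EXPORT_CLASS = "org.apache.hadoop.hbase.snapshot.ExportSnapshot"


def export_command(snapshot: str, copy_to: str, mappers: int,
                   chuser: Optional[str] = None,
                   chgroup: Optional[str] = None,
                   chmod: Optional[str] = None) -> List[str]:
    """Build the argv for an ExportSnapshot run; optional flags are added only when set."""
    cmd = ["hbase", EXPORT_CLASS, "-snapshot", snapshot, "-copy-to", copy_to]
    for flag, value in (("-chuser", chuser), ("-chgroup", chgroup), ("-chmod", chmod)):
        if value is not None:
            cmd += [flag, value]
    cmd += ["-mappers", str(mappers)]
    return cmd


print(" ".join(export_command("MySnapshot", "hdfs://srv2:8082/hbase", 16)))
```

The returned list can be handed to `subprocess.run` on a host where the `hbase` launcher is on the path.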
  • 17. Debugging and Info
      • Dump a snapshot manifest
          • Writes to standard out
      • Usage:
          $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot
          $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot -files
  • 18. Metrics
      • Histograms of operation completion
          • snapshotTime
          • cloneTime
          • restoreTime
      • Includes 'extended' metrics
          • Std deviation
          • Min/max
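To make the "extended" statistics concrete, the sketch below computes min, max, mean, and standard deviation over a handful of operation durations. The sample values are made up, and the choice of population standard deviation is an assumption; the slides do not say which variant HBase reports.

```python
import statistics
from typing import Dict, List


def summarize(durations_ms: List[float]) -> Dict[str, float]:
    """Summarize operation durations the way a histogram metric might report them."""
    return {
        "count": len(durations_ms),
        "min": min(durations_ms),
        "max": max(durations_ms),
        "mean": statistics.mean(durations_ms),
        "stddev": statistics.pstdev(durations_ms),  # population std deviation (assumed)
    }


# Hypothetical snapshotTime samples in milliseconds.
print(summarize([100, 200, 300]))
```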
  • 20. Internals
      • HBase Table HDFS Layout
      • Snapshot HDFS Layout
      • Offline Snapshots
      • Restore and Clone Snapshot
      • Online Snapshots
  • 21. Primer: HBase Table Layout in HDFS
      • HRegions map directly to a directory structure made of table name, encoded region name, column family, and HFiles.
      • In HDFS:
          /hbase/Table/<enc R1>/cf/<hfile f11>
          /hbase/Table/<enc R1>/cf/<hfile f12>
          /hbase/Table/<enc R2>/cf/<hfile f21>
          /hbase/Table/<enc R2>/cf/<hfile f22>
          /hbase/Table/<enc R3>/cf/<hfile f31>
          /hbase/Table/<enc R3>/cf/<hfile f32>
      [Diagram: Table with regions R1, R2, R3 holding HFiles F11, F21, F31]
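The directory scheme above can be expressed as a pair of small helpers that build and take apart an HFile path. This is a sketch of the layout as shown on the slide only; the function names and the placeholder region and file names are illustrative.

```python
import posixpath
from typing import Tuple


def hfile_path(root: str, table: str, encoded_region: str,
               family: str, hfile: str) -> str:
    """Compose <root>/<table>/<encoded region>/<column family>/<hfile>."""
    return posixpath.join(root, table, encoded_region, family, hfile)


def split_hfile_path(path: str, root: str) -> Tuple[str, str, str, str]:
    """Recover (table, encoded region, column family, hfile) from a path under `root`."""
    relative = posixpath.relpath(path, root)
    table, region, family, hfile = relative.split("/")
    return table, region, family, hfile


path = hfile_path("/hbase", "Table", "enc-R1", "cf", "f11")
print(path, split_hfile_path(path, "/hbase"))
```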
  • 22. Table Snapshots in the File System
      • A snapshot manifest contains references to files in the original table.
      [Diagram: Table (regions R1-R3, HFiles F11, F21, F31) and the TableSnapshot manifest (R1-R3) under ./.hbase-snapshots]
  • 23. Table Snapshots in the File System
      • A snapshot manifest contains references to files in the original table.
      • Each snapshot is stored in the hbase/.hbase-snapshots dir.
  • 24. Offline Snapshots
      • Disable table, then create the snapshot manifest
      • Created in a temporary dir to guarantee snapshot creation atomicity
      • Includes
          • Snapshot metadata
          • Table metadata/schema (.tableinfo)
          • References to original HFiles
      • Master-only file system operation
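The "temporary dir, then publish" pattern can be sketched on a local file system: the manifest is fully written in a working directory and becomes visible under its final name through a single rename. The `.tmp` directory name, the `manifest.json` file, and the JSON encoding are all assumptions for illustration; they are not HBase's on-disk format, and HDFS rename semantics are only approximated here by `os.rename`.

```python
import json
import os
import tempfile


def write_manifest_atomically(snapshot_root: str, name: str, manifest: dict) -> str:
    """Write a manifest into a working dir, then publish it with one rename."""
    tmp_root = os.path.join(snapshot_root, ".tmp")      # hypothetical working area
    working = os.path.join(tmp_root, name)
    os.makedirs(working)                                 # fails if a run is already in progress
    with open(os.path.join(working, "manifest.json"), "w") as f:
        json.dump(manifest, f)
    final = os.path.join(snapshot_root, name)
    os.rename(working, final)                            # readers see all of it or none of it
    return final


with tempfile.TemporaryDirectory() as root:
    manifest = {"table": "Table", "regions": {"R1": ["F11"], "R2": ["F21"], "R3": ["F31"]}}
    print(os.listdir(write_manifest_atomically(root, "TableSnapshot", manifest)))
```

If the process dies before the rename, only the working directory is left behind, so a half-written snapshot never appears under its final name.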
  • 25. HFile Life Cycle
      • Splits and compactions remove HFiles
      • What happens to references to these files?
      [Diagram: Table (regions R1-R3, HFiles F11, F21, F31) and the TableSnapshot manifest (R1-R3) under ./.hbase-snapshots]
  • 27. HFile Life Cycle
      • Splits and compactions remove HFiles
      • What happens to references to these files?
      [Diagram: after compaction, R3 holds F31+32 instead of F31; the manifest's reference for R3 is marked "No more HFile??"]
  • 28. HFile Archiver
      • We archive old HFiles from compactions (HBASE-5547)
      [Diagram: Table (R1-R3 with F11, F21), the TableSnapshot manifest (R1-R3) under ./.hbase-snapshots, and ./.archive holding the table's file F31]
  • 29. HFile Archiver
      • We archive old HFiles from compactions (HBASE-5547)
      • Files stored in hbase/.archive
      [Diagram: as on slide 28, with R3 now holding the compacted file F31+32]
  • 30. HFile Archiver
      • We archive old HFiles from compactions (HBASE-5547)
      • Files stored in hbase/.archive
      • HFileCleaner ensures HFiles' data remains available
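The interplay between compaction, the archive, and the cleaner can be modeled with plain sets. This is a toy model under stated assumptions, not HBase code: files are just names, `compact` and `clean_archive` are hypothetical helpers, and the only rule enforced is that an archived file survives as long as some snapshot still references it.

```python
from typing import Dict, Set


def compact(table_files: Set[str], archive: Set[str],
            inputs: Set[str], output: str) -> None:
    """Replace the input HFiles with the compacted output; archive inputs instead of deleting."""
    table_files -= inputs      # in-place: the table no longer serves the old files
    table_files.add(output)
    archive |= inputs          # in-place: old files move to the archive


def clean_archive(archive: Set[str], snapshots: Dict[str, Set[str]]) -> Set[str]:
    """Delete archived files no snapshot references; return what was deleted."""
    referenced = set().union(*snapshots.values())
    deletable = archive - referenced
    archive -= deletable
    return deletable


table, archive = {"F11", "F21", "F31"}, set()
snapshots = {"TableSnapshot": {"F11", "F21", "F31"}}
compact(table, archive, {"F31"}, "F31+32")
print(sorted(table), sorted(archive), sorted(clean_archive(archive, snapshots)))
```

Once the snapshot is deleted, the next cleaner pass is free to remove F31 from the archive.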
  • 32. Restore Operations
      • Restore table
          • Rollback table to specific state
      • Clone from snapshot (Restore As)
          • Create new read-write table from snapshot
          • There can be multiple replicas of a snapshot
      • Export snapshot
          • Send snapshot and all its data to another cluster
  • 33. Clone: Creating table from a Snapshot • Convert snapshot manifest info into a Table. [Diagram: a new table “Clone” with regions R1, R2, R3 is built from the TableSnapshot manifest; the original table holds F11 and F21, and F31 is in .archive]
  • 35. Clone: Creating table from a Snapshot • Convert snapshot manifest info into a Table. • HFileLinks (HBASE-6610) to mimic unix open file descriptor semantics [Diagram: the original table now holds the compacted F31+32, while “Clone” still reaches F31 in .archive]
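The "open file descriptor" analogy means a clone keeps working after the source table stops listing a file, because the link is followed to wherever the file now lives. A minimal sketch, assuming a lookup order of "source table directory first, then archive"; the function `resolve_link` and the dictionary layout are hypothetical and only illustrate the idea.

```python
def resolve_link(link_target, table_dirs, archive):
    """Return where the linked HFile currently lives, or raise if it is gone."""
    source_table, file_name = link_target
    if file_name in table_dirs.get(source_table, set()):
        return ("table", source_table, file_name)
    if file_name in archive.get(source_table, set()):
        return ("archive", source_table, file_name)
    raise FileNotFoundError(f"{source_table}/{file_name}")


table_dirs = {"Table": {"F11", "F21", "F31"}}
archive = {"Table": set()}

# The clone holds links to the source table's files, not copies of them.
clone_links = [("Table", name) for name in sorted(table_dirs["Table"])]

before = resolve_link(("Table", "F31"), table_dirs, archive)

# A compaction on the source table replaces F31 and archives the old file.
table_dirs["Table"].remove("F31")
table_dirs["Table"].add("F31+32")
archive["Table"].add("F31")

after = resolve_link(("Table", "F31"), table_dirs, archive)
```

The link target never changes; only the place where it resolves does, which is why the clone is unaffected by compactions on the source table.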
  • 36. Restore: Rollback to an old state • Roll back the existing table to the snapshot state • Restores the original schema if altered • Snapshots the current table first, just in case • Minimal overhead • A smarter “delete table, then clone snapshot” • Handles creating/deleting regions • Restores META
  • 37. Restore illustrated • Rollback “Table” to the “TableSnapshot” state [Diagram: “Table” currently holds F11, F21, F31+32, plus F41 in a new region R4; the TableSnapshot manifest references R1, R2, R3; F31 is in .archive]
  • 38. Restore illustrated • Region “R4” is not present in the snapshot • “R4” will be removed from “Table”, files moved to “.archive”
  • 39. Restore illustrated • New files not present in the snapshot are moved to the archive
  • 40. Restore illustrated • New files not present in the snapshot are moved to the archive • HFileLinks are created to point to old files. [Diagram: F41 and F31+32 now sit in .archive]
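The restore walkthrough amounts to a diff between the table's current layout and the snapshot manifest. The sketch below computes such a plan for the example in the slides; `plan_restore` and the dictionary-of-sets representation are illustrative assumptions, and the real restore also has to update META, which is not modelled here.

```python
def plan_restore(current, snapshot):
    """current/snapshot map region name -> set of HFile names. Returns a plan."""
    plan = {
        "remove_regions": [],
        "create_regions": [],
        "archive_files": [],
        "link_files": [],
    }
    # Regions that exist now but not in the snapshot are removed entirely.
    for region in sorted(current.keys() - snapshot.keys()):
        plan["remove_regions"].append(region)
        plan["archive_files"] += [(region, f) for f in sorted(current[region])]
    for region in sorted(snapshot):
        have = current.get(region)
        if have is None:
            plan["create_regions"].append(region)
            have = set()
        # Files newer than the snapshot go to the archive.
        plan["archive_files"] += [(region, f) for f in sorted(have - snapshot[region])]
        # Files the snapshot expects but the table lacks are linked back in.
        plan["link_files"] += [(region, f) for f in sorted(snapshot[region] - have)]
    return plan


current = {"R1": {"F11"}, "R2": {"F21"}, "R3": {"F31+32"}, "R4": {"F41"}}
snapshot = {"R1": {"F11"}, "R2": {"F21"}, "R3": {"F31"}}
plan = plan_restore(current, snapshot)
```

Because the plan only moves files to the archive and creates links, no data is rewritten, which matches the "minimal overhead" claim.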
  • 41. Restore failures • The table to restore is disabled • META and HDFS operations may fail (network issue, server down, …) • hbck can’t repair an incomplete restore... • Restore again!
  • 42. Export Snapshot • Copy a full snapshot to another cluster • All required HFiles, and Metadata • Lots of options • Fancy dist-cp • Must resolve HFileLinks • Faster than CopyTable or table export+import! • Minimal impact on running cluster
  • 44. Online snapshots • Take a snapshot without making the table unavailable • No need to disable the table • Continue accepting reads and writes from clients • Challenges • Coordinating Region Servers • Data is in memory • Consistency
  • 45. Offline vs Online Snapshots [Diagram: offline, the master writes the manifest for each region and verifies; online, the master runs a snapshot region subprocedure on RS1, RS2, RS3, RS4, which write the manifest for each region, and the master verifies]
  • 46. Online Snapshots • Each Region can have data in memstore and hlog, not yet in an HFile • Snapshot is missing in-memory data! [Diagram: regions R1, R2, R3 each hold “mem” data that the TableSnapshot manifest does not reference]
  • 47. Online Snapshots • Flush so that all in-memory data is written to an HFile • Then add to snapshot manifest [Diagram: the flush produces F13, F23, F33, which the TableSnapshot manifest then references]
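The flush-then-reference step can be shown with a toy region that keeps a memstore and a list of HFiles. The names `Region` and `flush_snapshot` and the flushed-file naming scheme are assumptions of this sketch, not HBase classes; coordination across region servers is deliberately left out.

```python
class Region:
    """Toy region: immutable HFiles on disk plus a mutable in-memory store."""

    def __init__(self, name, hfiles):
        self.name = name
        self.hfiles = list(hfiles)
        self.memstore = {}
        self._flush_seq = 0

    def put(self, key, value):
        self.memstore[key] = value

    def flush(self):
        # Write in-memory data out as a new HFile; a no-op if nothing is buffered.
        if not self.memstore:
            return None
        self._flush_seq += 1
        hfile = f"{self.name}-flush{self._flush_seq}"
        self.hfiles.append(hfile)
        self.memstore = {}
        return hfile


def flush_snapshot(regions):
    """Flush every region, then record references to its HFiles."""
    manifest = {}
    for region in regions:
        region.flush()
        manifest[region.name] = list(region.hfiles)
    return manifest


r1 = Region("R1", ["F11"])
r1.put("row1", "value")
without_flush = {r1.name: list(r1.hfiles)}  # would miss the buffered put
manifest = flush_snapshot([r1])
```

Comparing `without_flush` with `manifest` shows the problem from the earlier slide: a manifest written before the flush has no file that contains the buffered put.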
  • 49. Consistency • Offline Snapshots • Fully consistent snapshot • Online Flush Snapshot • “CopyTable” level consistency with a much smaller window. • Time bounded by slowest region server and region flush
  • 50. Online Snapshots and Causal consistency • Causal consistency would only allow A, AB, or neither A nor B. • B and Not A is currently possible [Diagram: the master asks RS1 and RS2 to flush for the snapshot; one region has already flushed to F13 while others still hold data in memory]
  • 51. Online Snapshots and Causal consistency • Causal consistency would only allow A, AB, or neither A nor B. • B and Not A is currently possible [Diagram: the client issues Put A, … then Put B, while the snapshot is still in progress]
  • 52. Online Snapshots and Causal consistency • Causal consistency would only allow A, AB, or neither A nor B. • B and Not A is possible with Flush Snapshots [Diagram: the remaining regions flush to F23 and F33; Put B is in but Put A is not!]
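The "B but not A" outcome follows from region servers flushing at different moments. The sketch below replays one such interleaving; the event tuples and the function `flush_snapshot_timeline` are hypothetical, and the assignment of A to RS1 and B to RS2 is an assumption chosen to reproduce the anomaly.

```python
def flush_snapshot_timeline(events):
    """Replay events in order; each server snapshots only what it has seen."""
    seen = {}         # region server -> set of puts received so far
    captured = set()  # puts that end up in the snapshot
    for kind, server, payload in events:
        if kind == "put":
            seen.setdefault(server, set()).add(payload)
        elif kind == "flush_snapshot":
            captured |= seen.get(server, set())
    return captured


events = [
    ("flush_snapshot", "RS1", None),  # RS1 flushes before Put A arrives
    ("put", "RS1", "A"),
    ("put", "RS2", "B"),              # the client issues B only after A succeeds
    ("flush_snapshot", "RS2", None),  # RS2 flushes after Put B arrives
]
captured = flush_snapshot_timeline(events)
```

Here `captured` contains B and not A, even though the client wrote A first. Avoiding this would require all servers to agree on a single cut point, which is the trade-off behind the alternative snapshot semantics listed under future work.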
  • 53. Online snapshot attempts can fail • If an involved RS fails, the snapshot attempt will fail. • Needs a way to prevent other table metadata operations • Table Metadata Locks (0.95+) • Avoids many snapshot failures caused by conflicts (e.g., online schema changes, splits) • A failed attempt will report errors; the user must retry. • o.a.h.hbase.snapshot.HBaseSnapshotException • o.a.h.hbase.snapshot.CorruptedSnapshotException
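Since a failed attempt must be retried by the caller, client code typically wraps the snapshot call in a bounded retry loop. A minimal sketch, assuming a caller-supplied `take_snapshot` callable; `SnapshotFailed` is a hypothetical stand-in for the snapshot exceptions named on the slide, and nothing here is part of the HBase client API.

```python
import time


class SnapshotFailed(Exception):
    """Hypothetical stand-in for a failed snapshot attempt."""


def snapshot_with_retry(take_snapshot, attempts=3, delay_seconds=0.0):
    """Call take_snapshot until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return take_snapshot(), attempt
        except SnapshotFailed as error:
            last_error = error
            time.sleep(delay_seconds)  # give the cluster time to settle
    raise last_error


# Simulated cluster: two failed attempts, then success.
outcomes = iter([
    SnapshotFailed("region server died"),
    SnapshotFailed("split in progress"),
    "TableSnapshot",
])


def flaky_snapshot():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = snapshot_with_retry(flaky_snapshot)
```

Only the failure type that signals "retry is reasonable" is caught; any other exception propagates immediately.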
  • 54. Development Notes How we collaborated, built, and tested
  • 55. Table Snapshots Development • Developed in a branch off of trunk; since merged, and now in 0.95 and trunk. • Feature is too big to include as a single patch • Does not destabilize trunk • Does not slow time-based release trains • Later backported to 0.94.6.1 [Diagram: the src branch syncs with trunk and is then reintegrated into trunk]
  • 56. System testing with Jenkins • Concurrently load data while taking snapshots • Inject compactions, kill RS’s, META RS, Master • Create snapshot clones of the snapshots • Inject compactions, kill RS’s, META RS, Master
  • 57. Future Work: • Alternative semantics and implementations • Log Roll Snapshot (HBASE-7291) • Store logs and replay on restore • Faster for snapshot, slower and more complicated for restore. • Timestamp Snapshot (HBASE-6866) • All updates before ts in snapshot, all after not in snapshot • Longer pause before snapshot taken • Globally-Consistent Snapshot (HBASE-6867) • Global write lock on all regions until the snapshot completes. • Expensive • Repair tools • Manual repairs necessary (hbck does not support snapshots yet)
  • 59. Feature Summary by Version • Apache 0.92.x and Apache <0.94.6.1: Copy Table, Import / Export • Apache 0.94.6.1+: Copy Table, Import / Export, Offline snapshots, Flush Online Snapshot • Apache 0.95.0 and Apache 0.96.0: Copy Table, Import / Export, Offline snapshots, Flush Online Snapshot, Table Locks
  • 60. Key Contributors • Jesse Yates (Salesforce.com) • HFileArchiver, Offline Snapshot, first draft online • Matteo Bertozzi (Cloudera) • HFileLink, Restore, clone, Testing, 0.94 backport • Jonathan Hsieh (Cloudera) • Online Snapshots revamp, Testing, Branch Shepherd • Ted Yu (Hortonworks) • Reviews • Enis Soztutar (Hortonworks) • Table Locks on Snapshots
  • 61. Thanks! Questions? Matteo Bertozzi @th30z matteo.bertozzi@cloudera.com Jonathan Hsieh @jmhsieh jon@cloudera.com Jesse Yates @jesse_yates jesse.k.yates@gmail.com

Editor's Notes

  1. Talk about everything! Don’t gloss over internals
  2. Why does it cause extra latency? What “crushes” the cluster?
  3. Causes more expensive backups. Have a bunch of ‘write optimized files’ (HLogs) and have to convert them to ‘read optimized files’ (HFiles). This isn’t a cheap process.
  4. Don’t oversell! Just say what it is.
  5. Add a quick summary of what just talked about BEFORE handoff!!!