1500 JIRAs in 20
Minutes
The Evolution of HBase, 2012-2013
Ian Varley, Salesforce.com
@thefutureian
It's been a year since the
first HBaseCon.
What's changed?
(besides my beard length)
One lens on the evolution of
HBase is through JIRA
(issue tracking system).
HBase has a lot of activity.
Opened in last year: ~2500
Fixed in last year: 1638
Total JIRAs, all time: ~8700
resolved >= 2012-05-23
AND resolved <= 2013-05-24
AND resolution in (Fixed, Implemented)
So we're going to talk about
them all. One by one.
We need to narrow it down.
First, let's get rid of the non-functional changes:
• Test: 307 ("test", "junit", etc.)
• Build: 55 ("pom", "classpath", "mvn", "build", etc.)
• Doc: 107 ("book", "[site]", "[refGuide]", "javadoc", etc.)
• Ports: 62 ("backport", "forward port", etc.)
Total: 503 (some overlap)
That leaves 1135 functional
changes to go over.
(In 18 minutes.)
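The keyword bucketing behind the non-functional counts can be sketched roughly as follows. This is a minimal illustration, not the tool used for the talk: the class `JiraBuckets`, its methods, and the sample summaries in the usage note are all made up here, and the naive substring match is an assumption. Only the bucket names and keywords come from the slide.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy keyword bucketing of JIRA summaries. Bucket names and keywords mirror
// the slide; the class and method names are hypothetical.
class JiraBuckets {
    static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("Test", List.of("test", "junit"));
        KEYWORDS.put("Build", List.of("pom", "classpath", "mvn", "build"));
        KEYWORDS.put("Doc", List.of("book", "[site]", "[refguide]", "javadoc"));
        KEYWORDS.put("Ports", List.of("backport", "forward port"));
    }

    // Per-bucket hit counts; one summary may land in several buckets.
    static Map<String, Integer> countByBucket(List<String> summaries) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String bucket : KEYWORDS.keySet()) counts.put(bucket, 0);
        for (String s : summaries) {
            String lower = s.toLowerCase(Locale.ROOT);
            for (Map.Entry<String, List<String>> e : KEYWORDS.entrySet()) {
                if (e.getValue().stream().anyMatch(lower::contains)) {
                    counts.merge(e.getKey(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Number of summaries matching at least one bucket (the de-duplicated total).
    static long countNonFunctional(List<String> summaries) {
        return summaries.stream()
            .map(s -> s.toLowerCase(Locale.ROOT))
            .filter(s -> KEYWORDS.values().stream()
                .anyMatch(words -> words.stream().anyMatch(s::contains)))
            .count();
    }
}
```

A summary such as "Backport snapshot tests to 0.94" counts under both Test and Ports, which is why the per-bucket numbers can add up to more than the de-duplicated total ("some overlap").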
Break what's left into 2 parts:
• Big Topics (20+ JIRAs on same issue)
• Indie Hits (Cool for some other reason)
Top 10 "big topics":
• Snapshots: 82
• Replication: 58
• Compaction: 54
• Metrics: 53
• Assignment: 44
• Hadoop 2: 37
• Protobufs: 34
• Security: 28
• Bulk Loading: 23
• Modularization: 21
Total: 416 (some overlap; 305 functional, 111 non-functional)
Let's dive in to the top 3.
Snapshots
The gist: Take advantage of the fact that files in HDFS are already immutable
to get fast "snapshots" of tables that you can roll back to. This is pretty tricky
when you consider HBase is a distributed system and you want a point in time.
Main JIRAs:
• HBASE-6055 - Offline Snapshots: Take a snapshot after first disabling
the table
• HBASE-7290 - Online Snapshots: Take a snapshot of a live, running
table by splitting the memstore.
• HBASE-7360 - Backport Snapshots to 0.94
Top contributors: Matteo B, Jonathan H, Ted Y, Jesse Y, Enis S
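To make the "immutable files" idea concrete, here is a toy model, assuming a table is just a list of immutable file names and a snapshot is a saved copy of that list. `ToySnapshotTable` and its methods are hypothetical and are not HBase APIs; the sketch also ignores the hard parts the slide mentions (coordinating a point in time across servers, and keeping referenced files from being cleaned up).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a "table" is a list of immutable file names, and a snapshot
// is a saved copy of that list. None of these names are HBase APIs.
class ToySnapshotTable {
    private List<String> liveFiles = new ArrayList<>();
    private final Map<String, List<String>> snapshots = new HashMap<>();

    // Flushing adds a new immutable file; existing files are never modified.
    void flush(String fileName) { liveFiles.add(fileName); }

    // Compaction swaps the live file list for a single merged file.
    void compact(String mergedFileName) {
        liveFiles = new ArrayList<>(List.of(mergedFileName));
    }

    // A snapshot records file references only; no data is copied.
    void snapshot(String name) { snapshots.put(name, List.copyOf(liveFiles)); }

    // Rolling back points the table at the files the snapshot referenced.
    void restore(String name) {
        List<String> files = snapshots.get(name);
        if (files == null) throw new IllegalArgumentException("no snapshot: " + name);
        liveFiles = new ArrayList<>(files);
    }

    List<String> files() { return List.copyOf(liveFiles); }
}
```

Because files never change after they are written, taking the snapshot costs only the size of the reference list, no matter how much data the files hold.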
Replication
The gist: use asynchronous WAL shipping to replay all edits on a different
(possibly remote) cluster, for Disaster Recovery or other operational purposes.
Main JIRAs:
• HBASE-1295 - Multi-data-center replication: Top level issue. Real meat
was actually implemented in 0.90 (Jan 2010), so not a new feature.
• HBASE-8207 - Data loss when machine name contains "-". Doh.
• HBASE-2611 - Handle RS failure while processing failure of another:
This was an ugly issue that took a while to fix. Corner cases matter!
Top contributors: J-D Cryans, Himanshu V, Chris T, Devaraj D, Lars H
Plug: stick around next while Chris Trezzo tweets about Replication!!
Theme: corner cases! Corner Case!
Compaction
The gist: In an LSM store, if you don't compact the store files, you end up with
lots of 'em, which makes reads slower. Not a new feature, just improvements.
Main JIRAs:
• HBASE-7516 - Make compaction policy pluggable: allow users to
customize which files are included for compaction.
• HBASE-2231 - Compaction events should be written to HLog: deal with
the case when regions have been reassigned since compaction started.
Look for cool stuff to come in the next year with tiered (aka "leveled")
compaction policies, so you could do stuff like (e.g.) put "recent" data into
smaller files that'll be hit frequently, and the older "long tail" data into bigger
files that'll be hit less frequently.
Top contributors: Sergey S, Elliott C, Jimmy X, stack, Matteo B, Jesse Y
Corner Case!
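A pluggable compaction policy boils down to "given the store files, choose which ones to merge". The sketch below shows that shape under simplifying assumptions: files are represented only by their sizes, and `ToyCompactionPolicy`, `SmallFilesPolicy`, and `ToyStore` are hypothetical names, not the interfaces added by HBASE-7516.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for illustration; not the HBase compaction policy API.
interface ToyCompactionPolicy {
    // Given store file sizes (bytes), return the sizes of the files to compact.
    List<Long> select(List<Long> fileSizes);
}

// Picks every file at or below a size cap, leaving big files alone.
class SmallFilesPolicy implements ToyCompactionPolicy {
    private final long maxSize;

    SmallFilesPolicy(long maxSize) { this.maxSize = maxSize; }

    public List<Long> select(List<Long> fileSizes) {
        List<Long> picked = new ArrayList<>();
        for (Long size : fileSizes) {
            if (size <= maxSize) picked.add(size);
        }
        return picked;
    }
}

class ToyStore {
    // Replaces the selected files with one merged file and returns the new list.
    // The merged size is a plain sum; a real compaction may also drop cells.
    static List<Long> compact(List<Long> fileSizes, ToyCompactionPolicy policy) {
        List<Long> selected = policy.select(fileSizes);
        if (selected.size() < 2) return new ArrayList<>(fileSizes); // nothing to merge
        List<Long> result = new ArrayList<>(fileSizes);
        long merged = 0;
        for (Long size : selected) {
            result.remove(size); // remove(Object): drops the first file of that size
            merged += size;
        }
        result.add(merged);
        return result;
    }
}
```

Swapping in a different `ToyCompactionPolicy` (for example one that groups files by age) changes which files get merged without touching the store code, which is the point of making the policy pluggable.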
Top 10 "big topics":
Snapshots:
Replication:
Compaction:
Metrics: move to metrics2.
Assignment: it's tricky, yo.
Hadoop 2: support it for HA NN.
Protobufs: wire compatibility!
Security: kerberos, in the core.
Bulk Loading: pop in an HFile.
Modularization: break up the code.
Now on to the
"Indie Hits JIRAs".
What's left? About half.
1638 total - (503 Non-Functional + 305 Categorized Functional) = 830 Remaining
• Blocker: 31
• Critical: 88
• Major: 455
• Minor: 206
• Trivial: 52
Let's cut out these: Minor and Trivial. That leaves 573.
We can't cover 573 issues.
Let's just hit a few cool ones.
HBASE-5416
HBASE-4676
HBASE-7403
HBASE-1212
HBASE-7801
HBASE-4072
HBASE-3171
HBASE-6868
HBASE-5416
Interesting because: most commented JIRA (200+ human comments!)
Improve perf of scans with some kinds of filters
What? Avoid loading non-essential CFs until after filters run, big perf gain.
How?
+++ Filter.java:
+ abstract public boolean isFamilyEssential(byte[] name);
+++ HRegion.java:
  KeyValueScanner scanner = store.getScanner(scan, entry.getValue());
- scanners.add(scanner);
+ if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
+     || this.filter.isFamilyEssential(entry.getKey())) {
+   scanners.add(scanner);
+ } else {
+   joinedScanners.add(scanner);
+ }
By: Max Lapan for original idea & patch, Sergey Shelukhin for final impl
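The effect of the patch can be illustrated with a toy scan, assuming two column families per row: a small "essential" one the filter inspects and a large "joined" one that is only read for rows that pass. `ToyLazyScan` and its fields are hypothetical; the real change lives in the HRegion scanner code shown in the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of on-demand column family loading. All names are hypothetical;
// this is not the HBase scanner code.
class ToyLazyScan {
    int expensiveLoads = 0; // how many times the non-essential family was read

    // essential: rowKey -> small value the filter inspects
    // joined:    rowKey -> big value, only needed for rows that pass the filter
    List<String> scan(Map<String, String> essential,
                      Map<String, String> joined,
                      Predicate<String> filter) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> row : essential.entrySet()) {
            if (!filter.test(row.getValue())) continue; // rejected: skip the big read
            expensiveLoads++;
            results.add(row.getKey() + "=" + joined.get(row.getKey()));
        }
        return results;
    }
}
```

The more selective the filter, the fewer reads hit the large family, which is where the performance gain comes from.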
200 comments? Srsly?
From whom?
To save you some time, allow
me to summarize.
Reenactment ...
Feb 2012:
• Max Lapan: Hey guys, here's a cool patch!
• Nicolas S: This should be an app detail, not in core.
• Ted Yu: I fixed your typos while you were asleep!
• Nick: Not enough utest coverage to put this in core.
• Max: Agree, but I can't find any other way to do this.
• Kannan: Why don't you try 2-phase w/ multiget?
• Max: OK, ok, I'll try it.
Reenactment ...
May 2012:
• Max: Ran in prod w/ 160-node 300TB cluster. Runs like
a champ, 20x the 2-phase approach. Boom.
• Ted: Holy guacamole that's a big patch.
July 2012:
• Max: Anybody there? Here's a perf test.
• Ted: Cool!
Oct 2012:
• Anoop: A coprocessor would make this faster.
• Max: We're on 0.90 and can't use CP.
• Stack: -1, FB guys are right about needing more tests.
Reenactment ...
Dec 2012:
• Sergey: I'm on it guys. Rebased on trunk, added the
ability to configure, and integration tests.
• Stack: Still not enough tests. Some new code even
when disabled? Who's reviewing? Go easy lads.
• Ram: I'm on it. Couple improvements, but looks good.
Reenactment ...
Dec 31st, 2012 (while everyone else is partying):
• Lars: Ooh, let's pull this into 0.94! I made a patch.
• Lars: ... hold the phone! This slows down a tight loop
case (even when disabled) by 10-20%.
• Ted: I optimized the disabled path.
• Lars: Sweet.
Reenactment ...
Jan, 2013:
• Ram: +1, let's commit.
• Ted: Committed to trunk
• Lars: Committed to 0.94.
And there was much rejoi....
Reenactment ...
Feb, 2013:
• Dave Latham: Stop the presses! This breaks rolling
upgrade for me b/c I directly implement Filter.
• All: Crapface.
• Stack: We should back this out. SOMA pride!!
Also, Dave is running world's biggest HBase
cluster, FYI.
• Lars: Filter is internal. Extend FilterBase maybe?
• Ted: If we take it OUT now, it's also a regression.
• Dave: Chill dudes, we can fix by changing our client.
• All: Uhh ... change it? Keep it? Change it?
Resolution: Change it (HBASE-7920)
Moral of the story?
• JIRA comments are a great way to learn.
• Do the work to keep new features from
destabilizing core code paths.
• Careful with changing interfaces.
HBASE-4676
Interesting because: most watched (42 watchers), and biggest patch.
Prefix Compression - Trie data block encoding
What? An optimization to compress what we store for key/value prefixes.
How? ~8000 new lines added! (Originally written in a separate git repo.)
At SFDC, James Taylor reported seeing 5-15x improvement in
Phoenix, with no degradation in scan performance. Woot!
By: Matt Corgan
HBASE-7403
Interesting because: It's a cool feature. And went through 33 revisions!
Online Merge
What? The ability to merge regions online and transactionally, just like we
do with splitting regions.
How? The master moves the regions together (onto the same regionserver)
and sends a MERGE RPC to that regionserver. The merge happens in a transaction.
Example:
RegionMergeTransaction mt =
    new RegionMergeTransaction(conf, parent, midKey);
if (!mt.prepare(services)) return;
try {
  mt.execute(server, services);
} catch (IOException ioe) {
  try {
    mt.rollback(server, services);
    return;
  } catch (RuntimeException e) {
    myAbortable.abort("Failed merge, abort");
  }
}
By: Chunhui Shen
HBASE-1212
Interesting because: Oldest issue (Feb, 2009) resolved w/ patch this year.
Merge tool expects regions to have diff seq ids
What? With the aggregated hfile format, the sequence id is written into the file, not
alongside it. In the rare case where two store files have the same sequence id and
we want to merge the regions, it wouldn't work.
How? In conjunction with HBASE-7287, removes the code that did this:
--- HRegion.java
  List<StoreFile> srcFiles = es.getValue();
- if (srcFiles.size() == 2) {
-   long seqA = srcFiles.get(0).getMaxSequenceId();
-   long seqB = srcFiles.get(1).getMaxSequenceId();
-   if (seqA == seqB) {
-     // Can't have same sequenceid since on open store, this is what
-     // distingushes the files (see the map of stores how its keyed by
-     // sequenceid).
-     throw new IOException("Files have same sequenceid: " + seqA);
-   }
- }
By: Jean-Marc Spaggiari
HBASE-7801
Interesting because: has durability implications worth blogging about.
Allow a deferred sync option per Mutation
What? Previously, you could only turn WAL writing off completely, per table
or edit. Now you can choose "none", "async", "sync" or "fsync".
How?
+++ Mutation.java
+ public void setDurability(Durability d) {
+   setAttribute(DURABILITY_ID_ATTR, Bytes.toBytes(d.ordinal()));
+   this.writeToWAL = d != Durability.SKIP_WAL;
+ }
+++ HRegion.java
+ private void syncOrDefer(long txid, Durability durability) {
+   switch(durability) { ...
+     case SKIP_WAL: // nothing to do
+       break;
+     case ASYNC_WAL: // defer the sync, unless we globally can't
+       if (this.deferredLogSyncDisabled) { this.log.sync(txid); }
+       break;
+     case SYNC_WAL:
+     case FSYNC_WAL:
+       // sync the WAL edit (SYNC and FSYNC treated the same for now)
+       this.log.sync(txid);
+       break;
+   }
By: Lars Hofhansl
Wha ... ?
Oh. See HADOOP-6313
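The four durability levels trade write latency against what can be lost on a crash. A stripped-down model of that trade-off, assuming a log that simply counts appended and synced edits: `ToyWal` and its counters are hypothetical, only the enum constant names follow the diff, and the `deferredLogSyncDisabled` override from the real code is left out.

```java
// Toy model of per-mutation durability. The enum constants match the slide's
// diff; ToyWal and its counters are hypothetical.
class ToyWal {
    enum Durability { SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

    int appended = 0; // edits written to the log
    int synced = 0;   // edits synced before the write returned

    void write(Durability d) {
        if (d == Durability.SKIP_WAL) return; // edit never reaches the log
        appended++;
        if (d != Durability.ASYNC_WAL) synced++; // SYNC and FSYNC behave alike here
    }

    // Edits that were logged but whose sync was deferred.
    int deferred() { return appended - synced; }
}
```

In this model a SKIP_WAL edit is never logged, an ASYNC_WAL edit is logged but may not be synced when the call returns, and SYNC_WAL and FSYNC_WAL both wait for the sync, mirroring the "treated the same for now" comment in the diff.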
HBASE-4072
Interesting because: Biggest facepalm.
Disable reading zoo.cfg files
What? Used to be, if two systems both used ZK and one needed to override
values, the zoo.cfg values would always win. Caused a lot of goofy bugs in
HBase utils like import/export, and in integration with other systems like Flume.
How? Put reading it behind a config that defaults to false.
+ if (conf.getBoolean(HBASE_CONFIG_READ_ZOOKEEPER_CONFIG, false)) {
+ LOG.warn(
+ "Parsing zoo.cfg is deprecated. Place all ZK related HBase " +
+ "configuration under the hbase-site.xml");
By: Harsh J
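The precedence change can be sketched with plain `java.util.Properties`, assuming one object stands in for hbase-site.xml and another for zoo.cfg. `ToyZkConfig` and the flag name `toy.read.zoo.cfg` are made up for this example; `hbase.zookeeper.property.clientPort` and its 2181 default are the usual HBase setting.

```java
import java.util.Properties;

// Toy model of "zoo.cfg only wins if you explicitly opt in".
class ToyZkConfig {
    // Hypothetical flag name, standing in for the config key added by the patch.
    static final String READ_ZOO_CFG = "toy.read.zoo.cfg";

    static String clientPort(Properties site, Properties zooCfg) {
        boolean readZooCfg =
            Boolean.parseBoolean(site.getProperty(READ_ZOO_CFG, "false"));
        if (readZooCfg && zooCfg.containsKey("clientPort")) {
            return zooCfg.getProperty("clientPort"); // legacy behavior: zoo.cfg wins
        }
        // Default after the change: only the site configuration is consulted.
        return site.getProperty("hbase.zookeeper.property.clientPort", "2181");
    }
}
```

With the flag unset, a stray zoo.cfg on the classpath can no longer silently override what the application configured.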
HBASE-3171
Interesting because: Only HBase JIRA with a downfall parody.
Drop ROOT, store META location in ZooKeeper
What? The ROOT just tells you where the META table is. That's silly.
How? Pretty big patch (59 files changed, 580 insertions(+), 1749 deletions(-))
By: J-D Cryans
http://www.youtube.com/watch?v=tuM9MYDssvg
HBASE-6868
Interesting because: tiny fix, but marked as a blocker, and sunk 0.94.2 RC1.
Avoid double checksumming blocks
What? Since HBASE-5074 (checksums), sometimes we double checksum.
How? 3 line patch to default to skip checksum if not local fs.
+++ HFileSystem.java
  // Incorrect data is read and HFileBlocks won't be able to read
  // their header magic numbers. See HBASE-5885
  if (useHBaseChecksum && !(fs instanceof LocalFileSystem)) {
+   conf = new Configuration(conf);
+   conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", true);
    this.noChecksumFs = newInstanceFileSystem(conf);...
+++ HRegionServer.java
  // If hbase checksum verification enabled, automatically
  // switch off hdfs checksum verification.
  this.useHBaseChecksum = conf.getBoolean(
-   HConstants.HBASE_CHECKSUM_VERIFICATION, true);
+   HConstants.HBASE_CHECKSUM_VERIFICATION, false);
By: Lars Hofhansl
What's it all mean?
Active codebase. Good!
Complexity increasing. Bad!
credit: https://www.ohloh.net/p/hbase
One more interesting stat: "Good on you"s
(Chart: "Good on you"s from stack vs. everyone else)
Takeaways?
Busy community.
New features!
Fixing corner cases.
BTW: How did I do this?
JIRA API +
Phoenix on HBase +
http://github.com/ivarley/jirachi
Thanks!
@thefutureian

Weitere ähnliche Inhalte

Was ist angesagt?

Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Suman Srinivasan
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBaseCon
 
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataCloudera, Inc.
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera FieldHBaseCon
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)Chris Nauroth
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureDataWorks Summit
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBaseCon
 
HBaseCon 2015 General Session: State of HBase
HBaseCon 2015 General Session: State of HBaseHBaseCon 2015 General Session: State of HBase
HBaseCon 2015 General Session: State of HBaseHBaseCon
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...Cloudera, Inc.
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 

Was ist angesagt? (20)

Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
 
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera Field
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

HBaseCon 2013: 1500 JIRAs in 20 Minutes

  • 1. 1500 JIRAs in 20 Minutes The Evolution of HBase, 2012-2013 Ian Varley, Salesforce.com @thefutureian
  • 3. It's been a year since the first HBaseCon. What's changed? (besides my beard length)
  • 4. One lens on the evolution of HBase is through JIRA (issue tracking system).
  • 10. HBase has a lot of activity. Total JIRAs, all time: ~8700. Opened in last year: ~2500. Fixed in last year: 1638 (query: resolved >= 2012-05-23 AND resolved <= 2013-05-24 AND resolution in (Fixed, Implemented)).
  • 11. So we're going to talk about them all. One by one.
  • 13. We need to narrow it down.
  • 20. First, let's get rid of the non-functional changes: Test: 307 ("test", "junit", etc.); Build: 55 ("pom", "classpath", "mvn", "build", etc.); Doc: 107 ("book", "[site]", "[refGuide]", "javadoc", etc.); Ports: 62 ("backport", "forward port", etc.); Total: 503 (some overlap).
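The slide only lists the keywords, not how they were matched. A minimal sketch of one way to do it, assuming a case-insensitive substring match on the JIRA summary (the class and method names are hypothetical, not part of the talk's tooling):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: buckets a JIRA summary by the keywords quoted on the slide.
class NonFunctionalBuckets {
    static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("Test", Arrays.asList("test", "junit"));
        KEYWORDS.put("Build", Arrays.asList("pom", "classpath", "mvn", "build"));
        KEYWORDS.put("Doc", Arrays.asList("book", "[site]", "[refguide]", "javadoc"));
        KEYWORDS.put("Ports", Arrays.asList("backport", "forward port"));
    }

    // Returns every bucket with at least one keyword in the summary.
    // Assumption: plain substring matching, so "latest" would also hit "test".
    static List<String> bucketsFor(String summary) {
        String s = summary.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : KEYWORDS.entrySet()) {
            for (String kw : e.getValue()) {
                if (s.contains(kw)) {
                    hits.add(e.getKey());
                    break; // one hit per bucket is enough
                }
            }
        }
        return hits;
    }

    static boolean isNonFunctional(String summary) {
        return !bucketsFor(summary).isEmpty();
    }
}
```

A summary can land in more than one bucket, which is one plausible source of the "some overlap" in the total.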
  • 21. That leaves 1135 functional changes to go over. (In 18 minutes.)
  • 22. Break what's left into 2 parts: • Big Topics (20+ JIRAs on same issue) • Indie Hits (Cool for some other reason)
  • 36. Top 10 "big topics": Snapshots: 82; Replication: 58; Compaction: 54; Metrics: 53; Assignment: 44; Hadoop 2: 37; Protobufs: 34; Security: 28; Bulk Loading: 23; Modularization: 21; Total: 416 (some overlap; 305 functional, 111 non-functional). Let's dive in to the top 3.
  • 37. Snapshots The gist: Take advantage of the fact that files in HDFS are already immutable to get fast "snapshots" of tables that you can roll back to. This is pretty tricky when you consider HBase is a distributed system and you want a point in time. Main JIRAs: • HBASE-6055 - Offline Snapshots: Take a snapshot after first disabling the table • HBASE-7290 - Online Snapshots: Take a snapshot of a live, running table by splitting the memstore. • HBASE-7360 - Backport Snapshots to 0.94 Top contributors: Matteo B, Jonathan H, Ted Y, Jesse Y, Enis S
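To see why immutable files make snapshots cheap, here is a toy model (not HBase code; `ToyTable` and its methods are made up for illustration). It assumes a single node and ignores the memstore, which is exactly the part the slide calls tricky:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: a "table" is just a list of immutable file names.
class ToyTable {
    private List<String> files = new ArrayList<>();

    void addFile(String name) {
        files.add(name);
    }

    // A snapshot copies the file *references*; no data is copied.
    List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(files));
    }

    // Compaction swaps the current files for one new file.
    // The old files are never modified in place.
    void compact(String mergedName) {
        files = new ArrayList<>(Collections.singletonList(mergedName));
    }

    // Rolling back means pointing the table at the snapshot's files again.
    void restore(List<String> snapshot) {
        files = new ArrayList<>(snapshot);
    }

    List<String> currentFiles() {
        return Collections.unmodifiableList(files);
    }
}
```

The sketch only works because nothing ever rewrites a file after it is written; a real system also has to make sure files referenced by a snapshot are not deleted.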
  • 40. Replication The gist: use asynchronous WAL shipping to replay all edits on a different (possibly remote) cluster, for Disaster Recovery or other operational purposes. Main JIRAs: • HBASE-1295 - Multi-data-center replication: Top level issue. Real meat was actually implemented in 0.90 (Jan 2010), so not a new feature. • HBASE-8207 - Data loss when machine name contains "-". Doh. • HBASE-2611 - Handle RS failure while processing failure of another: This was an ugly issue that took a while to fix. Corner cases matter! Top contributors: J-D Cryans, Himanshu V, Chris T, Devaraj D, Lars H Theme: corner cases! Plug: stick around next while Chris Trezzo tweets about Replication!!
  • 43. Compaction The gist: In an LSM store, if you don't compact the store files, you end up with lots of 'em, which makes reads slower. Not a new feature, just improvements. Main JIRAs: • HBASE-7516 - Make compaction policy pluggable: allow users to customize which files are included for compaction. • HBASE-2231 - Compaction events should be written to HLog: deal with the case when regions have been reassigned since compaction started. Look for cool stuff to come in the next year with tiered (aka "leveled") compaction policies, so you could do stuff like (e.g.) put "recent" data into smaller files that'll be hit frequently, and the older "long tail" data into bigger files that'll be hit less frequently. Top contributors: Sergey S, Elliott C, Jimmy X, stack, Matteo B, Jesse Y Corner Case!
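A minimal sketch of the read-cost argument, assuming each store file is a sorted key/value map and newer files win on conflicts (deletes, versions and the pluggable policy from HBASE-7516 are left out; `ToyCompaction` is a made-up name):

```java
import java.util.List;
import java.util.TreeMap;

class ToyCompaction {
    // Merges store files into one. The list is ordered oldest to newest,
    // so values from later files overwrite values from earlier ones.
    static TreeMap<String, String> compact(List<TreeMap<String, String>> filesOldestFirst) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : filesOldestFirst) {
            merged.putAll(file);
        }
        return merged;
    }

    // Without compaction, a read may have to check every file,
    // newest first, until it finds the key.
    static String read(List<TreeMap<String, String>> filesOldestFirst, String key) {
        for (int i = filesOldestFirst.size() - 1; i >= 0; i--) {
            String value = filesOldestFirst.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

After `compact`, a read touches one file instead of up to N, which is the whole point of the slide's "lots of 'em" remark.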
  • 51. Top 10 "big topics": Snapshots; Replication; Compaction; Metrics: move to metrics2; Assignment: it's tricky, yo; Hadoop 2: support it for HA NN; Protobufs: wire compatibility!; Security: Kerberos, in the core; Bulk Loading: pop in an HFile; Modularization: break up the code.
  • 52. Now on to the "Indie Hits JIRAs".
  • 54. What's left? About half. 1638 total - (503 Non-Functional + 305 Categorized Functional) = 830 Remaining. By priority: Blocker: 31; Critical: 88; Major: 455; Minor: 206; Trivial: 52. Let's cut out some of these: 573 left.
  • 55. We can't cover 573 issues. Let's just hit a few cool ones.
  • 58. HBASE-5416: Improve perf of scans with some kinds of filters. Interesting because: most commented JIRA (200+ human comments!). What? Avoid loading non-essential column families (CFs) until after filters run, big perf gain. How? +++ Filter.java: + abstract public boolean isFamilyEssential(byte[] name); +++ HRegion.java: KeyValueScanner scanner = store.getScanner(scan, entry.getValue()); - scanners.add(scanner); + if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand() + || this.filter.isFamilyEssential(entry.getKey())) { + scanners.add(scanner); + } else { + joinedScanners.add(scanner); + } By: Max Lapan for original idea & patch, Sergey Shelukhin for final impl
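The idea behind `isFamilyEssential` can be shown without any HBase classes. A rough sketch, assuming one "essential" family that the filter inspects and in-memory maps standing in for column families (`ToyLazyScan` and its fields are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class ToyLazyScan {
    // Counts (row, family) reads so the saving is visible.
    int familyReads = 0;

    // data: family name -> (row key -> value).
    // essentialFamily is the only family the filter needs to look at.
    List<Map<String, String>> scan(Map<String, Map<String, String>> data,
                                   String essentialFamily,
                                   Predicate<String> filter) {
        List<Map<String, String>> results = new ArrayList<>();
        for (Map.Entry<String, String> row : data.get(essentialFamily).entrySet()) {
            familyReads++;
            if (!filter.test(row.getValue())) {
                continue; // row rejected: the other families are never read
            }
            Map<String, String> out = new HashMap<>();
            out.put(essentialFamily, row.getValue());
            for (Map.Entry<String, Map<String, String>> fam : data.entrySet()) {
                if (fam.getKey().equals(essentialFamily)) {
                    continue;
                }
                familyReads++; // non-essential family loaded on demand
                String value = fam.getValue().get(row.getKey());
                if (value != null) {
                    out.put(fam.getKey(), value);
                }
            }
            results.add(out);
        }
        return results;
    }
}
```

With a selective filter, most rows are rejected after reading only the small essential family, so the big families are touched only for the rows that survive.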
  • 61. To save you some time, allow me to summarize.
  • 66. Reenactment ... Feb 2012: • Max Lapan: Hey guys, here's a cool patch! • Nicolas S: This should be an app detail, not in core. • Ted Yu: I fixed your typos while you were asleep! • Nick: Not enough utest coverage to put this in core. • Max: Agree, but I can't find any other way to do this. • Kannan: Why don't you try 2-phase w/ multiget? • Max: OK, ok, I'll try it.
  • 71. Reenactment ... May 2012: • Max: Ran in prod w/ 160-node 300TB cluster. Runs like a champ, 20x the 2-phase approach. Boom. • Ted: Holy guacamole that's a big patch. July 2012: • Max: Anybody there? Here's a perf test. • Ted: Cool! Oct 2012: • Anoop: A coprocessor would make this faster. • Max: We're on 0.90 and can't use CP. • Stack: -1, FB guys are right about needing more tests.
  • 74. Reenactment ... Dec 2012: • Sergey: I'm on it guys. Rebased on trunk, added the ability to configure, and integration tests. • Stack: Still not enough tests. Some new code even when disabled? Who's reviewing? Go easy lads. • Ram: I'm on it. Couple improvements, but looks good.
  • 77. Reenactment ... Dec 31st, 2012 (while everyone else is partying): • Lars: Ooh, let's pull this into 0.94! I made a patch. • Lars: ... hold the phone! This slows down a tight loop case (even when disabled) by 10-20%. • Ted: I optimized the disabled path. • Lars: Sweet.
  • 80. Reenactment ... Jan, 2013: • Ram: +1, let's commit. • Ted: Committed to trunk • Lars: Committed to 0.94. And there was much rejoi....
  • 87. Reenactment ... Feb, 2013: • Dave Latham: Stop the presses! This breaks rolling upgrade for me b/c I directly implement Filter. • All: Crapface. • Stack: We should back this out. SOMA pride!! Also, Dave is running world's biggest HBase cluster, FYI. • Lars: Filter is internal. Extend FilterBase maybe? • Ted: If we take it OUT now, it's also a regression. • Dave: Chill dudes, we can fix by changing our client. • All: Uhh ... change it? Keep it? Change it? Resolution: Change it (HBASE-7920)
  • 88. Moral of the story? • JIRA comments are a great way to learn. • Do the work to keep new features from destabilizing core code paths. • Careful with changing interfaces.
  • 90. HBASE-4676: Prefix Compression - Trie data block encoding. Interesting because: most watched (42 watchers), and biggest patch. What? An optimization to compress what we store for key/value prefixes. How? ~8000 new lines added! (Originally written in a git repo.) At SFDC, James Taylor reported seeing 5-15x improvement in Phoenix, with no degradation in scan performance. Woot! By: Matt Corgan
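The patch itself builds a trie, which is far more involved than fits here. As a stand-in, this sketch shows the simpler delta-against-previous-key form of prefix compression: each sorted key is stored as "number of characters shared with the previous key" plus the remaining suffix. All names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class ToyPrefixEncoding {
    // One encoded key: shared-prefix length with the previous key, plus the rest.
    static final class Entry {
        final int shared;
        final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    // Keys must be sorted so neighbours share long prefixes.
    static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int n = 0;
            int max = Math.min(prev.length(), key.length());
            while (n < max && prev.charAt(n) == key.charAt(n)) {
                n++;
            }
            out.add(new Entry(n, key.substring(n)));
            prev = key;
        }
        return out;
    }

    // Rebuilds each key from the previous decoded key and the stored suffix.
    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String key = prev.substring(0, e.shared) + e.suffix;
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

Row keys in a sorted store tend to share long prefixes, so most entries shrink to a small integer and a short suffix.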
  • 92. HBASE-7403: Online Merge. Interesting because: It's a cool feature. And it went through 33 revisions! What? The ability to merge regions online and transactionally, just like we do with splitting regions. How? The master moves the regions together (onto the same regionserver) and sends a MERGE RPC to that regionserver. The merge happens in a transaction. Example: RegionMergeTransaction mt = new RegionMergeTransaction(conf, parent, midKey); if (!mt.prepare(services)) return; try { mt.execute(server, services); } catch (IOException ioe) { try { mt.rollback(server, services); return; } catch (RuntimeException e) { myAbortable.abort("Failed merge, abort"); } } By: Chunhui Shen
  • 94. HBASE-1212: Merge tool expects regions to have diff seq ids. Interesting because: oldest issue (Feb 2009) resolved w/ patch this year. What? With the aggregated hfile format, the sequence id is written into the file, not alongside it. In the rare case where two store files have the same sequence id and we want to merge the regions, it wouldn't work. How? In conjunction with HBASE-7287, removes the code that did this: --- HRegion.java List<StoreFile> srcFiles = es.getValue(); - if (srcFiles.size() == 2) { - long seqA = srcFiles.get(0).getMaxSequenceId(); - long seqB = srcFiles.get(1).getMaxSequenceId(); - if (seqA == seqB) { - // Can't have same sequenceid since on open store, this is what - // distingushes the files (see the map of stores how its keyed by - // sequenceid). - throw new IOException("Files have same sequenceid: " + seqA); - } - } By: Jean-Marc Spaggiari
  • 98. HBASE-7801: Allow a deferred sync option per Mutation. Interesting because: has durability implications worth blogging about. What? Previously, you could only turn WAL writing off completely, per table or edit. Now you can choose "none", "async", "sync" or "fsync". How? +++ Mutation.java + public void setDurability(Durability d) { + setAttribute(DURABILITY_ID_ATTR, Bytes.toBytes(d.ordinal())); + this.writeToWAL = d != Durability.SKIP_WAL; + } +++ HRegion.java + private void syncOrDefer(long txid, Durability durability) { + switch(durability) { ... + case SKIP_WAL: // nothing to do + break; + case ASYNC_WAL: // defer the sync, unless we globally can't + if (this.deferredLogSyncDisabled) { this.log.sync(txid); } + break; + case SYNC_WAL: + case FSYNC_WAL: + // sync the WAL edit (SYNC and FSYNC treated the same for now) + this.log.sync(txid); + break; + } By: Lars Hofhansl. Wha ... ? Oh. See HADOOP-6313.
  • 100. HBASE-4072: Disable reading zoo.cfg files. Interesting because: Biggest facepalm. What? It used to be that if two systems both use ZK and one needed to override values, the zoo.cfg values would always win. This caused a lot of goofy bugs in HBase utils like import/export and in integration with other systems like Flume. How? Put reading it behind a config that defaults to false. + if (conf.getBoolean(HBASE_CONFIG_READ_ZOOKEEPER_CONFIG, false)) { + LOG.warn( + "Parsing zoo.cfg is deprecated. Place all ZK related HBase " + + "configuration under the hbase-site.xml"); By: Harsh J
  • 103. HBASE-3171: Drop ROOT, store META location in ZooKeeper. Interesting because: Only HBase JIRA with a Downfall parody. What? The ROOT just tells you where the META table is. That's silly. How? Pretty big patch (59 files changed, 580 insertions(+), 1749 deletions(-)). By: J-D Cryans http://www.youtube.com/watch?v=tuM9MYDssvg
  • 105. HBASE-6868: Avoid double checksumming blocks. Interesting because: tiny fix, but marked as a blocker, and sunk 0.94.2 RC1. What? Since HBASE-5074 (checksums), sometimes we double checksum. How? 3 line patch to default to skip checksum if not local fs. +++ HFileSystem.java // Incorrect data is read and HFileBlocks won't be able to read // their header magic numbers. See HBASE-5885 if (useHBaseChecksum && !(fs instanceof LocalFileSystem)) { + conf = new Configuration(conf); + conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", true); this.noChecksumFs = newInstanceFileSystem(conf);... +++ HRegionServer.java // If hbase checksum verification enabled, automatically //switch off hdfs checksum verification. this.useHBaseChecksum = conf.getBoolean( - HConstants.HBASE_CHECKSUM_VERIFICATION, true); + HConstants.HBASE_CHECKSUM_VERIFICATION, false); By: Lars Hofhansl
  • 106. What's it all mean? Active codebase. Good! Complexity increasing. Bad! credit: https://www.ohloh.net/p/hbase
  • 109. One more interesting stat: "Good on you"s (chart: stack vs. everyone else).
  • 111. BTW: How did I do this? JIRA API + Phoenix on HBase + http://github.com/ivarley/jirachi