Google’s original use case for BigTable was the storage and processing of web graph information, represented as sparse matrices. Many organizations, however, treat HBase as merely a “web scale” RDBMS. This session covers several use cases for storing graph data in HBase, including social networks and web link graphs; MapReduce processes such as cached traversal, PageRank, and clustering; and, finally, lower-level modeling details such as row key and column qualifier design, using FullContact’s graph processing systems as a real-world example.
37. Adjacency List Design in HBase
row key                    “edges” column family
e:dan@fullcontact.com      p:+13039316251= ...         t:danklynn= ...
p:+13039316251             e:dan@fullcontact.com= ...  t:danklynn= ...
t:danklynn                 e:dan@fullcontact.com= ...  p:+13039316251= ...
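The prefixed identifiers in the table (`e:` for email, `p:` for phone, `t:` for Twitter) can be produced by a small helper. A minimal sketch in plain Java; the `VertexKeys` class and its method names are illustrative, not FullContact’s actual code:

```java
import java.nio.charset.StandardCharsets;

public class VertexKeys {
    // Hypothetical type prefixes matching the slide's scheme:
    // e-mail, phone number, Twitter handle.
    static String emailKey(String email)    { return "e:" + email; }
    static String phoneKey(String phone)    { return "p:" + phone; }
    static String twitterKey(String handle) { return "t:" + handle; }

    // HBase row keys and column qualifiers are raw byte[]; UTF-8 strings
    // keep them human-readable and give sensible lexicographic sorting.
    static byte[] toBytes(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }
}
```

The same helper serves double duty: a vertex identifier is a row key when the vertex owns the row, and a column qualifier when it is the destination of an edge.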
38. Adjacency List Design in HBase
What to store?
row key                    “edges” column family
e:dan@fullcontact.com      p:+13039316251= ...         t:danklynn= ...
p:+13039316251             e:dan@fullcontact.com= ...  t:danklynn= ...
t:danklynn                 e:dan@fullcontact.com= ...  p:+13039316251= ...
41. Don’t get fancy with byte[]
class EdgeValueWritable implements Writable {
    EdgeValue edgeValue

    byte[] toBytes() {
        // use strings if you can help it
    }

    static EdgeValueWritable fromBytes(byte[] bytes) {
        // use strings if you can help it
    }
}
groovy
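The slide’s advice is to prefer delimited strings over hand-packed binary layouts when serializing edge values. A sketch of what that might look like in plain Java; the `EdgeValue` fields and the `|` delimiter are assumptions, since the slide does not show them:

```java
import java.nio.charset.StandardCharsets;

// Illustrative edge value: the deck's EdgeValue fields are not shown,
// so this assumes a simple (type, weight) pair.
public class EdgeValue {
    final String type;
    final int weight;

    EdgeValue(String type, int weight) {
        this.type = type;
        this.weight = weight;
    }

    // "Use strings if you can help it": a delimited UTF-8 string is
    // easy to inspect in the HBase shell, unlike packed binary.
    byte[] toBytes() {
        return (type + "|" + weight).getBytes(StandardCharsets.UTF_8);
    }

    static EdgeValue fromBytes(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\\|");
        return new EdgeValue(parts[0], Integer.parseInt(parts[1]));
    }
}
```

The trade-off is a few extra bytes per cell in exchange for values you can read and debug without a custom decoder.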
42. Querying by vertex
def get = new Get(vertexKeyBytes)
get.addFamily(edgesFamilyBytes)
Result result = table.get(get)
result.noVersionMap.each { family, data ->
    // construct edge objects as needed
    // data is a Map<byte[], byte[]>
}
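Each qualifier in the `data` map encodes the destination vertex, so “construct edge objects as needed” amounts to splitting the qualifier back into its type prefix and identifier. A self-contained sketch in plain Java; the `Edge` class here is hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class Edge {
    final String type; // "e", "p", or "t" under the slide's key scheme
    final String id;

    Edge(String type, String id) {
        this.type = type;
        this.id = id;
    }

    // A qualifier like "p:+13039316251" splits at the first colon
    // into a vertex type and its identifier.
    static Edge fromQualifier(byte[] qualifier) {
        String s = new String(qualifier, StandardCharsets.UTF_8);
        int i = s.indexOf(':');
        return new Edge(s.substring(0, i), s.substring(i + 1));
    }
}
```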
43. Adding edges to a vertex
def put = new Put(vertexKeyBytes)
put.add(
    edgesFamilyBytes,
    destinationVertexBytes,
    edgeValue.toBytes() // your own implementation here
)

// if writing directly
table.put(put)

// if using a TableReducer
context.write(NullWritable.get(), put)
59. Do implement your own comparator
public static class Comparator
        extends WritableComparator {
    public int compare(
            byte[] b1, int s1, int l1,
            byte[] b2, int s2, int l2) {
        // .....
    }
}
java
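The `compare(byte[], int, int, byte[], int, int)` overload operates on the raw serialized bytes, which is what lets Hadoop sort keys without deserializing them. A self-contained sketch of the usual lexicographic, unsigned comparison over the given (offset, length) windows; this mirrors what `WritableComparator.compareBytes` does, without pulling in the Hadoop dependency:

```java
public class RawBytesComparator {
    // Lexicographic, unsigned byte-by-byte comparison over the
    // (offset, length) windows of each buffer.
    public static int compare(byte[] b1, int s1, int l1,
                              byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff; // treat bytes as unsigned
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2; // shorter key sorts first on a shared prefix
    }
}
```

The unsigned masking matters: without `& 0xff`, bytes above 0x7f would compare as negative and sort before ASCII, breaking the ordering your row-key design relies on.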
60. Do implement your own comparator
static {
    WritableComparator.define(VertexKeyWritable.class,
        new VertexKeyWritable.Comparator());
}
java
68. Elastic MapReduce
[diagram: SequenceFiles → Copy to S3 → Elastic MapReduce → SequenceFiles → HFiles]
HFileOutputFormat.configureIncrementalLoad(job, outputTable)
69. Elastic MapReduce
[diagram: SequenceFiles → Copy to S3 → Elastic MapReduce → SequenceFiles → HFiles → HBase]
HFileOutputFormat.configureIncrementalLoad(job, outputTable)
$ hadoop jar hbase-VERSION.jar completebulkload
70. Additional Resources
Google Pregel: BSP-based graph processing system
Apache Giraph: implementation of Pregel for Hadoop
MultiScanTableInputFormat: (code to appear on GitHub)
Apache Mahout: distributed machine learning on Hadoop