Apache Accumulo Overview
©2014 Cloudera, Inc. All rights reserved.
Agenda
• Quick History
• Storage Model
• Loading and Querying
• Daemons
• Getting Started, a.k.a. the Pitch
Google BigTable
Compressed, high-performance, scalable,
distributed sorted map
Google BigTable
• Began development in 2004
• Built on Google File System
• Non-relational
• Byte-oriented and schemaless
• Stores data in the petabyte range
• Research paper published in 2006
Child(ren) of BigTable
• Apache HBase (begun 2006, top-level 2010)
• Apache Cassandra (begun 2008-ish, top-level 2010)
• Apache Accumulo ...
From Cloudbase to Accumulo
• Started in 2008 as a National Security Agency project
• Submitted to Apache Incubator in 2011 (and renamed)
• Top-level project in 2012
Key / Value Store
Accumulo stores tables of key / value pairs
Key / Value Store
A row is a sorted sequence of key / value pairs
Each pair is a cell
The Key
row
column: family, qualifier, visibility
timestamp
An example key
row: bhavanki
column: family personal, qualifier middle, visibility PII
timestamp: 1401041295
Another example key
row: brees
column: family employment, qualifier salary, visibility FIN
timestamp: 1401041296
It’s all bytes
All key and value data are stored as bytes, except the timestamp, which is a long
There are no built-in data types, but lexicoders help with common types
Key components are usually UTF-8 strings
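Why an encoding helper matters when everything is bytes: keys sort by their raw bytes, so a number written as text sorts wrongly ("10" comes before "9"). The sketch below shows one common fixed-width trick, flipping the sign bit and writing big-endian, so that byte order matches numeric order. The class name SortableLongSketch and its methods are hypothetical; this illustrates the idea behind lexicoders and is not Accumulo's own implementation.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper (not Accumulo's Lexicoder API): order-preserving encoding of a long. */
final class SortableLongSketch {

    /** Flip the sign bit, then write big-endian, so unsigned byte order matches numeric order. */
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    /** Reverse of encode: read big-endian, then flip the sign bit back. */
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    /** Compare byte arrays as unsigned bytes, the way a byte-sorted store orders keys. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Text sorts "10" before "9"; the encoded longs sort numerically.
        System.out.println("10".compareTo("9") < 0);
        System.out.println(compareBytes(encode(9L), encode(10L)) < 0);
        System.out.println(compareBytes(encode(-5L), encode(3L)) < 0);
    }
}
```

A fixed width of eight bytes keeps the sketch short; a real encoder may choose a more compact layout.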
Some rows for you
row      | cf       | cq       | cv      | ts         | value
bhavanki | job      | employer |         | 2013-09-01 | Cloudera
bhavanki | personal | beer     |         | 2013-09-15 | Omission
bhavanki | personal | house    | NOMUGGL | 2014-01-25 | Ravenclaw
brees    | job      | employer |         | 2013-10-01 | White Cliffs
brees    | personal | house    | NOMUGGL | 2014-01-01 | Hufflepuff
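The rows above are listed in key order. As a rough illustration of that ordering, the sketch below sorts simplified keys by row, column family, column qualifier, and visibility in ascending order, then by timestamp with the newest first, which is how Accumulo orders keys. KeySketch is a hypothetical stand-in: real keys hold bytes rather than Strings, and String comparison only matches byte order for plain ASCII text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified, hypothetical stand-in for an Accumulo key; real keys hold byte arrays. */
final class KeySketch {
    final String row;
    final String family;
    final String qualifier;
    final String visibility;
    final long timestamp;

    KeySketch(String row, String family, String qualifier, String visibility, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.timestamp = timestamp;
    }

    /** Ascending by row, family, qualifier, visibility; newest timestamp first. */
    static final Comparator<KeySketch> ORDER =
        Comparator.comparing((KeySketch k) -> k.row)
            .thenComparing(k -> k.family)
            .thenComparing(k -> k.qualifier)
            .thenComparing(k -> k.visibility)
            .thenComparing(Comparator.comparingLong((KeySketch k) -> k.timestamp).reversed());

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<KeySketch> sorted(List<KeySketch> keys) {
        List<KeySketch> copy = new ArrayList<>(keys);
        copy.sort(ORDER);
        return copy;
    }
}
```

Because the newest timestamp sorts first, a reader that wants only the latest version of a cell can stop at the first entry it sees for that row and column.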
Visibility Labels
Each label is a Boolean expression over authorization tokens, for example:
Specialist | (Management & SpecTraining)
Authorizations are provided in each scan
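To make the label check concrete, here is a minimal sketch of evaluating an expression such as Specialist | (Management & SpecTraining) against the authorizations supplied with a scan. VisibilitySketch is a hypothetical class, not Accumulo's own visibility code. As simplifying assumptions, it accepts only letters and digits in tokens and gives & higher precedence than |, whereas Accumulo itself expects parentheses when the two operators are mixed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical evaluator for visibility expressions; not Accumulo's implementation. */
final class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr.replace(" ", "");
        this.auths = auths;
    }

    /** Returns true if the given authorizations satisfy the expression. */
    static boolean isVisible(String expr, String... auths) {
        if (expr.trim().isEmpty()) {
            return true; // an empty label is visible to everyone
        }
        VisibilitySketch parser = new VisibilitySketch(expr, new HashSet<>(Arrays.asList(auths)));
        boolean result = parser.parseOr();
        if (parser.pos != parser.expr.length()) {
            throw new IllegalArgumentException("Unexpected input at " + parser.pos);
        }
        return result;
    }

    // or-expression := and-expression ('|' and-expression)*
    private boolean parseOr() {
        boolean value = parseAnd();
        while (peek() == '|') {
            pos++;
            boolean rhs = parseAnd(); // always parse, so the input is fully consumed
            value = value || rhs;
        }
        return value;
    }

    // and-expression := atom ('&' atom)*
    private boolean parseAnd() {
        boolean value = parseAtom();
        while (peek() == '&') {
            pos++;
            boolean rhs = parseAtom();
            value = value && rhs;
        }
        return value;
    }

    // atom := token | '(' or-expression ')'
    private boolean parseAtom() {
        if (peek() == '(') {
            pos++;
            boolean value = parseOr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Missing ')' at " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a token at " + pos);
        }
        return auths.contains(expr.substring(start, pos));
    }

    private char peek() {
        return pos < expr.length() ? expr.charAt(pos) : '\0';
    }
}
```

With the label from the slide, a scan carrying only Management sees nothing, while a scan carrying Specialist, or both Management and SpecTraining, sees the cell.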
Locality Groups
You can identify sets of one or more column families as locality groups
Data in a locality group is stored together for improved read performance
Tablets
A table is composed of one or more tablets
[Diagram: table employees split into tablets employees;H, employees;S, employees;~]
Tablets
Tablets map to data files in HDFS
[Diagram: employees;H -> rfile 1, employees;S -> rfile 2, employees;~ -> rfile 3]
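One way to picture how a row finds its tablet: each split point is the inclusive end row of a tablet, and rows past the last split point fall into the final tablet. The sketch below models that lookup with a sorted set. TabletLocatorSketch is hypothetical, and the table;endRow names (with ~ for the last tablet) simply follow the diagram on the slide.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical sketch of locating the tablet for a row from a table's split points. */
final class TabletLocatorSketch {
    private final String table;
    private final TreeSet<String> splits;

    TabletLocatorSketch(String table, String... splitPoints) {
        this.table = table;
        this.splits = new TreeSet<>(Arrays.asList(splitPoints));
    }

    /** Finds the tablet whose inclusive end row is the smallest split point >= row. */
    String tabletFor(String row) {
        String endRow = splits.ceiling(row); // null when the row is past the last split
        return table + ";" + (endRow == null ? "~" : endRow);
    }

    public static void main(String[] args) {
        TabletLocatorSketch employees = new TabletLocatorSketch("employees", "H", "S");
        for (String row : new String[] {"Alice", "H", "Mallory", "Zed", "bhavanki"}) {
            System.out.println(row + " -> " + employees.tabletFor(row));
        }
    }
}
```

Note that ordering is case-sensitive: a lowercase row such as bhavanki sorts after the uppercase split points H and S, so it lands in the last tablet.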
Tablets
Data is also kept in write-ahead logs and an in-memory memtable
[Diagram: tablet employees;H backed by rfile 1, walogs, and memtable]
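The three storage pieces fit together roughly as follows: a write is logged first, then applied to a sorted in-memory map, and the map is later flushed to a sorted file. The sketch below is a deliberately small, single-threaded model of that flow. TabletWriteSketch and its fields are hypothetical, plain Strings stand in for keys and values, and the read path is simplified to "memtable first, then newest file first" rather than the merged, timestamp-aware read a real tablet server performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical, single-threaded model of one tablet's write path; not Accumulo code. */
final class TabletWriteSketch {
    final List<String> walog = new ArrayList<>();                    // stand-in for the write-ahead log
    final TreeMap<String, String> memtable = new TreeMap<>();        // sorted in-memory map
    final List<TreeMap<String, String>> rfiles = new ArrayList<>();  // stand-in for sorted files in HDFS

    /** A write is logged first, then applied to the memtable. */
    void write(String key, String value) {
        walog.add(key + "=" + value);
        memtable.put(key, value);
    }

    /** Flush: the memtable is written out as a sorted file, after which its log entries are not needed. */
    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        rfiles.add(new TreeMap<>(memtable));
        memtable.clear();
        walog.clear();
    }

    /** A read checks the memtable first, then files from newest to oldest. */
    String read(String key) {
        if (memtable.containsKey(key)) {
            return memtable.get(key);
        }
        for (int i = rfiles.size() - 1; i >= 0; i--) {
            String value = rfiles.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

The log exists for recovery: if the server fails before a flush, replaying the log rebuilds the memtable contents that were not yet in a file.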
Java Client API
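Read using scanners
// Scanner: org.apache.accumulo.core.client; Range, Key, Value: org.apache.accumulo.core.data;
// Authorizations: org.apache.accumulo.core.security; Text: org.apache.hadoop.io
Scanner s = conn.createScanner("employees", new Authorizations());
s.setRange(new Range("alice", "eve"));      // rows "alice" through "eve", inclusive
s.fetchColumnFamily(new Text("personal"));  // restrict the scan to one column family
for (Entry<Key, Value> e : s)
    employeeIds.add(e.getKey().getRow());   // getRow() returns a Text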
Java Client API
Read access via iterator pattern
• server-side system iterators handle timestamps, authorization checks, and lots more
• iterators almost always wrap other iterators, forming a chain
• you can define your own, client-side or server-side
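The chaining idea can be shown without any Accumulo classes. The sketch below uses java.util.Iterator as a stand-in for Accumulo's own iterator interface: FilteringIteratorSketch (a hypothetical name) wraps a source iterator and passes along only the entries that match a predicate, and because it is itself an iterator, another instance can wrap it in turn.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical filtering iterator that wraps a source iterator; a stand-in, not Accumulo's API. */
final class FilteringIteratorSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> keep;
    private T next;
    private boolean ready;

    FilteringIteratorSketch(Iterator<T> source, Predicate<T> keep) {
        this.source = source;
        this.keep = keep;
    }

    /** Pulls from the wrapped iterator until an entry passes the filter or the source runs out. */
    @Override
    public boolean hasNext() {
        while (!ready && source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                next = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        ready = false;
        return next;
    }
}
```

Wrapping one FilteringIteratorSketch in another gives a two-stage chain: the inner stage might drop entries the caller is not authorized to see, and the outer stage might keep only the columns of interest, each unaware of the other.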
Java Client API
Scanners fetch sorted rows from one range
Batch scanners fetch unsorted rows from multiple ranges in parallel
Isolated scanners ensure that you do not see a row mid-change
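To see why a batch scan returns results in no particular order, consider this small model: each range is looked up on its own thread against a sorted map, and values are collected as the threads produce them. BatchScanSketch is hypothetical and uses an in-memory NavigableMap in place of a table; it does not use the Accumulo client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a batch scan over an in-memory sorted map; not the Accumulo client API. */
final class BatchScanSketch {

    /** Looks up each inclusive [start, end] row range on a worker thread; result order is not defined. */
    static List<String> scan(NavigableMap<String, String> table, List<String[]> ranges) {
        Queue<String> results = new ConcurrentLinkedQueue<>();
        int threads = Math.max(1, Math.min(4, ranges.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String[] range : ranges) {
            // The map is only read here, so sharing it across threads is safe.
            pool.execute(() -> results.addAll(table.subMap(range[0], true, range[1], true).values()));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return new ArrayList<>(results);
    }
}
```

A caller that needs sorted output has to sort the returned list itself, which is the price paid for the parallel lookups.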
MapReduce
AccumuloInputFormat
AccumuloOutputFormat
MapReduce
AccumuloRowInputFormat
AccumuloRowOutputFormat
Shell
Command-line / manual access to Accumulo data
• scan, insert, delete
• iterator management
• table management (creation, deletion, cloning)
• user and authorization management
• table splitting and merging
• ... more
Bulk Import
Got lots of data to import quickly?
• Use an MR job to format data using AccumuloFileOutputFormat
• Import files using shell
Trade off latency / availability for throughput
Tablet Server
Serves tablets (table data)
• writes data to walog, memtable; deals with compaction
• serves data for reads from files, memtable
• handles recovery from walogs in case of server failure
Most client calls go to tablet servers
Master
• assigns tablets to tablet servers
• detects tablet server failures and reassigns tablets
• balances tablet assignments over time
• coordinates table operations
Multiple masters are supported for failover, but only one is active at a time
Everybody Else in Accumulo
Garbage Collector (GC) - identifies and deletes files in HDFS that are no longer needed
Tracer - listens for and stores distributed trace messages using a special table
Everybody Else in Accumulo
• Monitor - collects and serves status information
  • server status
  • log inspection
  • performance data
  • table inspection
Everybody Else outside Accumulo
• HDFS (as part of Apache Hadoop)
  • stores tablet files
  • stores write-ahead logs (1.5+)
• MapReduce (Hadoop)
  • bulk import
  • batch processing
• Apache ZooKeeper
Easy as 1-2-3?
1. Install Hadoop (HDFS and MapReduce)
2. Install ZooKeeper
3. Install Accumulo!
Making Steps 1 and 2 Easier
Use a complete, pre-packaged Hadoop distribution
... like CDH!
a leading commercial distribution centered on Apache Hadoop
• many ecosystem components
• configured / updated to work together
Making Steps 1 and 2 Easier
Cloudera Manager
• deployment
• configuration
• operation
• security
Making Step 3 Easier
Standard Apache Accumulo installation is via tarball
• no longer shipping RPM / DEB / ...
Using CDH/CM you can use:
• a tarball, RPM or DEB with Accumulo packaged for CDH
• a parcel (like RPM / ZIP) for easier upgrades
• 1.4.4 and 1.4.5 available now
• 1.6.0 soon
Where to Go for More
• http://accumulo.apache.org/
• http://www.cloudera.com/content/cloudera/en/products-and-service
• http://www.cloudera.com/content/cloudera/en/products-and-service
• http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/accumulo.html
Quick Thanks
• My slide reviewers
  • Sean Busbey
  • Mike Drob
• Accumulo community
• You all for listening
Thank you!
Bill Havanki
bhavanki@clouderagovt.com