The document summarizes a talk on using Apache Cassandra for end-to-end analytics. It gives an overview of Cassandra's analytics capabilities, including Pig and Hive integration; recommends use cases such as trend analysis, detecting problems in data, and backpopulating historical data; and offers tips on data modeling, output formats, and integrating Cassandra with tools like Oozie, Pig, and Hadoop distributions.
3. Why Cassandra?
• Excellent Hadoop capabilities built-in
• Multi-datacenter support, load isolation
• Operationally, order of magnitude simpler
• DSE Analytics is all-in-one, simpler still
• Review your requirements, do homework
C* #cassandra12
4. Use Cases
• Trends, recommendations, reporting, etc.
• Detect and fix problems in data
• New realtime (or analytic) query pattern?
• Backpopulate new CF with historical data
5. Data Model
• Consider growth patterns and analytic query patterns for your data
• One-off inquiry or regular processing?
• Fast growing, want only small slices?
• Consider active/archive CFs
• Secondary indexes for small inputs
6. Miscellaneous Tips
• Don’t forget about tombstones
• BOP (ByteOrderedPartitioner) to enable range slices (rows with keys ‘A*’ to ‘F*’)
7. Cassandra + Oozie
• Workflows: cohesive, nestable, scheduled
• Web UI, CLI, web service
• Cassandra properties go in the Oozie job.properties and workflow.xml
• Writing out to Cassandra: set mapreduce.fileoutputcommitter.marksuccessfuljobs to false
• DSE Analytics works with Oozie 3.2.1+
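As a sketch of the last two bullets, the Cassandra connection info and the output-committer override can go in the map-reduce action's configuration in workflow.xml. The keyspace, property values, and action names below are illustrative, not from the talk:

```xml
<!-- workflow.xml fragment (illustrative values; adapt to your cluster) -->
<action name="load-into-cassandra">
  <map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- Cassandra connection info used by ColumnFamilyOutputFormat -->
      <property>
        <name>cassandra.output.keyspace</name>
        <value>analytics</value>
      </property>
      <property>
        <name>cassandra.output.thrift.address</name>
        <value>${cassandraHost}</value>
      </property>
      <!-- Cassandra output has no HDFS directory for a _SUCCESS marker -->
      <property>
        <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
        <value>false</value>
      </property>
    </configuration>
  </map-reduce>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

Property names here follow the C* 1.1-era ConfigHelper settings; check your Cassandra version's ConfigHelper for the exact keys.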
8. Cassandra + Pig
• Data with validators
• Pig tuples (address.name, address.value)
• Data with default validator
• Bag of key, value pairs (tuples)
• Unmarshal with Pygmalion
• Select by regex (e.g. ‘1369*’, ‘link*’)
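A minimal load-side sketch of the above, assuming the Cassandra and Pygmalion jars are on the Pig classpath; the keyspace, column family, and column names are illustrative:

```pig
-- Load a column family and unmarshal named columns with Pygmalion
REGISTER 'pygmalion.jar';
DEFINE CassandraStorage org.apache.cassandra.hadoop.pig.CassandraStorage();
DEFINE FromCassandraBag org.pygmalion.udf.FromCassandraBag();

rows = LOAD 'cassandra://analytics/page_views' USING CassandraStorage();

-- Turn the bag of (name, value) column tuples into a flat tuple of fields
views = FOREACH rows GENERATE key,
          FLATTEN(FromCassandraBag('url,hits', columns))
          AS (url: chararray, hits: long);
```

With the default validator you instead work with the raw bag of (name, value) tuples directly; with validators set, the values arrive already typed.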
9. Cassandra + Pig
• Output to Cassandra
• Can output directly with (key, (name, value), (name, value)...) format
• For tabular data, format output with Pygmalion’s ToCassandraBag
• Use BulkOutputFormat (C* 1.1)
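A store-side sketch of the direct (key, (name, value)...) shape, with illustrative keyspace, column family, and file names; for wide tabular rows, Pygmalion's ToCassandraBag does this re-shaping for you:

```pig
-- Re-shape tabular data into (key, {(column_name, value), ...}) and store
DEFINE CassandraStorage org.apache.cassandra.hadoop.pig.CassandraStorage();

data = LOAD 'hits.tsv' AS (url: chararray, hits: long);

out = FOREACH data GENERATE url,
        TOBAG(TOTUPLE('hits', (chararray)hits));
STORE out INTO 'cassandra://analytics/url_counts' USING CassandraStorage();
```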
10. Cassandra + Pig
• Composite column support (C* 1.0.9+)
• Counter support (C* 1.0.9+)
• Secondary index support for relatively small slices (C* 1.1+)
• Wide row support (C* 1.1+)
• Composite key support (C* 1.1.3+)
11. Cassandra + Hadoop X
• Example: Cassandra + CDH3
• Start with Cassandra ring
• Add NameNode (NN), JobTracker (JT), and an Oozie server
• TaskTracker, DataNode on each node
• Jobs/launching point have Cassandra info
• Segregate out analytics with virtual DC
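A sketch of the virtual-DC idea on C* 1.x, using NetworkTopologyStrategy so analytic reads hit only the Analytics replicas. The keyspace and DC names are illustrative; DC names must match your snitch configuration (e.g. PropertyFileSnitch's cassandra-topology.properties):

```
create keyspace metrics
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {RT : 3, Analytics : 1};
```

Analytics jobs then read at LOCAL_QUORUM (or ONE) against the Analytics DC, isolating their load from the realtime DC.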
12. Current/Future Work
• Cassandra core Hive support (outside of Brisk) (CASSANDRA-4131)
13. For More Information
• Follow @CassandraHadoop on Twitter
• http://wiki.apache.org/cassandra/HadoopSupport
Speaker notes:

Wide row, secondary index (2I), and composite key support are in 1.1; composite column support is in 1.0.9+. Wide row support is in both Pig and Hive.

So I have all this data... All the normal use cases for Hadoop apply, plus some interesting use cases that are specific to things like Cassandra.

If you are going to seriously use Cassandra with Hadoop, your analytic query patterns also have to be considered. For active/archive CFs, denormalize by query, same as with realtime query patterns. You can have lots of CFs, so no worries there.

OOZIE-477: hardcoded goodness if you don’t want to use Oozie 3.2.1 (a tiny patch).

Dachis Group: in production with Cassandra + CDH3 going on 18 months. The launching point may be a server from which you test or submit jobs; on that node, just put the Cassandra information in the default Hadoop conf (mapred-site.xml). For tuning, realize that Cassandra and Hadoop will share each node’s resources, but you can scale one with the other. Both require memory, CPU, and IO.
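A sketch of the "Cassandra information in mapred-site.xml" note above, using the C* 1.1-era ConfigHelper input property names; the address value is illustrative, and the exact keys vary by Cassandra version:

```xml
<!-- mapred-site.xml fragment on the job-launching node -->
<property>
  <name>cassandra.input.thrift.address</name>
  <value>10.0.0.10</value>
</property>
<property>
  <name>cassandra.input.thrift.port</name>
  <value>9160</value>
</property>
<property>
  <name>cassandra.input.partitioner.class</name>
  <value>org.apache.cassandra.dht.RandomPartitioner</value>
</property>
```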