ActiveMQ: if the queue backed up beyond what could be kept in memory, performance would severely degrade due to heavy amounts of random I/O.
Flume: Flume is a distributed, reliable, and available service for moving large amounts of log data. It is based on streaming data flows and can store the data temporarily. Very fast.
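As a rough illustration of "streaming data flows with temporary storage", the toy model below moves log lines from a source through a bounded in-memory buffer to a sink. The source/channel/sink vocabulary follows Flume's architecture, but `MemoryChannel` and `run_agent` are hypothetical names for this sketch, not Flume classes, and the capacity and batch size are arbitrary assumptions.

```python
from collections import deque


class MemoryChannel:
    """Toy bounded buffer playing the role of a Flume channel (not Flume's API)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        # Refuse the event when full; the source has to retry later.
        if len(self.events) >= self.capacity:
            return False
        self.events.append(event)
        return True

    def take(self, max_events):
        # Remove up to max_events buffered events, oldest first.
        batch = []
        while self.events and len(batch) < max_events:
            batch.append(self.events.popleft())
        return batch


def run_agent(log_lines, capacity=4, batch_size=2):
    """Stream log lines source -> channel -> sink; return what the sink received."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    channel = MemoryChannel(capacity)
    pending = deque(log_lines)
    delivered = []
    while pending or channel.events:
        # Source side: push lines until the channel is full.
        while pending and channel.put(pending[0]):
            pending.popleft()
        # Sink side: drain one batch (a real sink might write to HDFS instead).
        delivered.extend(channel.take(batch_size))
    return delivered
```

Because the channel refuses events when it is full, a burst from the source is held in the buffer instead of overwhelming the sink, which is the "store the data temporarily" property in miniature.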
Kafka: assumes everything (all the layers) is distributed and can be started at any given time; there is no master node (scalability). It is all synced and coordinated by ZooKeeper. Kafka acts as a buffer between live activity and asynchronous processing. It was built for high throughput by LinkedIn and provides a single pipeline of data for both online and offline consumers. It is well suited for situations where you need to process data in real time while still having the possibility to analyse it in bulk via MapReduce later on.
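A minimal sketch of why one pipeline can serve both online and offline consumers, assuming a single partition modeled as an in-memory list: records are only appended, reads never remove them, and each consumer keeps its own offset. `TopicLog` and `Consumer` are hypothetical toy classes, not the Kafka client API.

```python
class TopicLog:
    """Toy append-only log standing in for one Kafka partition."""

    def __init__(self):
        self.records = []

    def append(self, record):
        # Producers only append; the return value is the new record's offset.
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset, max_records):
        # Reads never delete records, so any number of consumers can share them.
        return self.records[offset:offset + max_records]


class Consumer:
    """Toy consumer that remembers its own position (offset) in the log."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_records):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    log = TopicLog()
    for event in ["click", "view", "click", "purchase"]:
        log.append(event)

    realtime = Consumer(log)   # reads small batches as events arrive
    print(realtime.poll(2))    # ['click', 'view']

    bulk = Consumer(log)       # e.g. a later bulk (MapReduce-style) pass
    print(bulk.poll(100))      # ['click', 'view', 'click', 'purchase']
    print(realtime.poll(2))    # ['click', 'purchase']
```

Because reading does not consume records, a slow bulk reader and a real-time reader progress independently over the same data.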
Camus is LinkedIn's Kafka->HDFS pipeline. It is a MapReduce job that does distributed data loads out of Kafka. The setup stage fetches the available topics and partitions from ZooKeeper and the latest offsets from the Kafka nodes. At startup the job reads its current offset for each partition from a file in HDFS and queries Kafka to discover any new topics and read the current log offset for each partition. It then loads all data from the last load offset to the current Kafka offset and writes it out to Hadoop.
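The offset bookkeeping described above can be sketched as a single incremental load, assuming a local JSON file in place of the HDFS offset file and plain callables in place of the Kafka fetch and the Hadoop writer. `load_offsets`, `save_offsets` and `run_load` are hypothetical names. Partitions with no saved offset start at 0 here, which is an assumption of this sketch rather than documented Camus behavior, and topic discovery via ZooKeeper is abstracted into the `latest_offsets` argument.

```python
import json
import os
import tempfile


def load_offsets(path):
    """Return the last committed offset per partition, or {} on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_offsets(path, offsets):
    """Persist offsets so the next run resumes where this one stopped."""
    with open(path, "w") as f:
        json.dump(offsets, f)


def run_load(offset_file, latest_offsets, fetch, write):
    """One incremental Kafka -> storage load in the spirit of Camus (toy model).

    latest_offsets: {partition_name: current log offset}; partition names are
        strings so they round-trip through JSON unchanged.
    fetch(partition, start, end): returns the records in [start, end).
    write(partition, records): persists the records (Hadoop in Camus).
    """
    committed = load_offsets(offset_file)
    for partition, end in latest_offsets.items():
        # Assumption: newly discovered partitions start from offset 0.
        start = committed.get(partition, 0)
        if start < end:
            write(partition, fetch(partition, start, end))
        committed[partition] = end
    save_offsets(offset_file, committed)
    return committed


if __name__ == "__main__":
    partition_log = {"events-0": ["a", "b", "c", "d"]}
    loaded = []

    def fetch(partition, start, end):
        return partition_log[partition][start:end]

    def write(partition, records):
        loaded.extend(records)

    path = os.path.join(tempfile.mkdtemp(), "offsets.json")
    run_load(path, {"events-0": 2}, fetch, write)  # first run loads a, b
    run_load(path, {"events-0": 4}, fetch, write)  # second run loads only c, d
    print(loaded)  # ['a', 'b', 'c', 'd']
```

Running the load twice shows the incremental behavior: the second run only picks up records that arrived after the offsets saved by the first.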