SMACK is a combination of Spark, Mesos, Akka, Cassandra, and Kafka. It is a pipelined data architecture for real-time data analysis, integrating each technology at the right place to build an efficient data pipeline.
2. Agenda:
● What is SMACK?
● Why SMACK?
● Brief introduction of technologies
● How to integrate all the technologies into a data pipeline
● Demo
3. What is SMACK?
● Spark: Apache Spark is a fast, general-purpose cluster computing system.
● Mesos: a cluster resource management system that provides efficient resource allocation.
● Akka: a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
● Cassandra: the Apache Cassandra database is the right choice when you need scalability and high availability.
● Kafka: a distributed messaging system for handling real-time data.
4. Why SMACK?
● SMACK provides a pipelined data architecture, which is required for real-time data analysis.
● SMACK integrates each technology at the right place to build an efficient data pipeline.
● SMACK lets you linearly scale your whole cluster without any hassle.
6. Why Spark?
● It is a general-purpose big-data processing engine with four main components: Spark Core, Spark Streaming, Spark MLlib, and Spark GraphX.
● We can process our data with any of these components in real time.
● It provides fault tolerance for real-time applications.
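As a concrete sketch of real-time processing with the Spark Streaming component, here is a minimal word count over micro-batches. This assumes spark-streaming on the classpath and a socket source at localhost:9999; both are illustrative choices, not from the slides.

```scala
// Sketch only: source host/port and batch interval are illustrative.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingWordCount")
    val ssc  = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches

    // Read lines from a TCP source and count words in each batch.
    val lines  = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split("\\s+"))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If a Spark Streaming receiver fails, the micro-batch model lets Spark recompute lost work from the lineage, which is where the fault tolerance for real-time applications comes from.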
7. Why Cassandra?
● Cassandra implements "no single point of failure".
● Cassandra's write path is very fast, so it can handle real-time data easily.
● It supports a multi-datacenter architecture, so we can easily use different DCs for different workloads.
[Diagram: a Cassandra cluster spanning an Ingestion DC and an Analysis DC]
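The multi-datacenter layout on the slide maps directly to a keyspace with NetworkTopologyStrategy, which lets each DC keep its own replica count. A minimal sketch, assuming the DataStax Java driver 3.x on the classpath; the keyspace, DC names, and replica counts are illustrative:

```scala
// Sketch only: contact point, keyspace and DC names are assumptions.
import com.datastax.driver.core.Cluster

object MultiDcKeyspace {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    // NetworkTopologyStrategy assigns replicas per datacenter, so the
    // ingestion DC and the analysis DC can be tuned independently.
    session.execute(
      """CREATE KEYSPACE IF NOT EXISTS pipeline
        |WITH replication = {
        |  'class': 'NetworkTopologyStrategy',
        |  'ingestion_dc': 3,
        |  'analysis_dc': 2
        |}""".stripMargin)

    cluster.close()
  }
}
```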
9. Models in SMACK
● In SMACK, the model layer is built with Scala and Akka.
● We can use these models to write highly concurrent and parallel applications.
● Example: we can pick Akka modules according to our use case, such as akka-http, the Akka scheduler, and Akka priority mailboxes.
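A minimal Akka actor illustrates the concurrency model behind this layer. This assumes akka-actor (classic actors) on the classpath; the system, actor, and message names are illustrative:

```scala
// Sketch only: all names here are illustrative.
import akka.actor.{Actor, ActorSystem, Props}

class EventHandler extends Actor {
  def receive: Receive = {
    case msg: String => println(s"handled: $msg")
  }
}

object Main extends App {
  val system  = ActorSystem("pipeline")
  val handler = system.actorOf(Props[EventHandler], "eventHandler")

  // Each actor processes one message at a time, so many actors can run
  // concurrently without shared-state locking.
  handler ! "sensor-reading-1"
  handler ! "sensor-reading-2"
}
```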
11. Why Kafka
● To move streams of data efficiently and in real time.
● To provide fault tolerance.
● To create a bridge between two applications.
[Diagram: Streaming Source → Kafka Broker → Spark Receiver]
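The "bridge" role in the diagram comes down to producing records to a topic that another application consumes. A minimal producer sketch, assuming kafka-clients on the classpath and a broker at localhost:9092; the topic name and record are illustrative:

```scala
// Sketch only: broker address, topic, and record values are assumptions.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer",   "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // The broker stores the record durably, decoupling this producer from
    // whatever consumes the "events" topic (e.g. a Spark receiver).
    producer.send(new ProducerRecord[String, String]("events", "key-1", "hello"))
    producer.close()
  }
}
```

Because the broker persists records, a slow or restarted consumer can resume from its last offset, which is what gives the bridge its fault tolerance.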
12. Architecture of Spark and Cassandra
[Diagram: Spark Workers co-located with a Cassandra cluster — each Spark worker node reads data from its local node, avoiding network latency]
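Reading Cassandra data from the local node is what the spark-cassandra-connector does for you. A minimal sketch, assuming the connector on the classpath; the keyspace and table names are illustrative:

```scala
// Sketch only: connection host, keyspace, and table are assumptions.
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object LocalReads {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("LocalReads")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // The connector maps Cassandra token ranges to Spark partitions and
    // prefers placing tasks on workers that own the data, so most reads
    // stay on the local node.
    val rows = sc.cassandraTable("pipeline", "events")
    println(rows.count())
  }
}
```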
13. Spark, Mesos, Cassandra
Mesos slaves and Cassandra nodes are co-located to enforce better data locality for Spark.
[Diagram: Driver Program → Mesos Master → three Mesos slaves, each co-located with a Cassandra node]
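Tying the pieces together, the driver only needs a Mesos master URL for Spark to launch executors on the slaves, which then read from their co-located Cassandra nodes. A minimal configuration sketch; the master URL and connection host are illustrative and assume a running Mesos master with Spark's Mesos support available:

```scala
// Sketch only: master URL and Cassandra host are assumptions.
import org.apache.spark.{SparkConf, SparkContext}

object SparkOnMesos {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SparkOnMesos")
      .setMaster("mesos://mesos-master:5050") // Mesos allocates the executors
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Executors launched on Mesos slaves that also run Cassandra nodes
    // read mostly from their co-located replica, giving the data locality
    // shown in the diagram.
    sc.stop()
  }
}
```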