SMACK Stack 1.1
Elodina is a big-data-as-a-service platform built on top of open source software.
The Elodina platform addresses today’s data analytics needs by providing the tools and support needed to use open source technologies.
http://www.elodina.net/
What’s the SMACK Stack?
SMACK Stack 1.0 has traditionally been Spark, Mesos, Akka, Cassandra and Kafka. There is plenty written about it (https://dzone.com/articles/smack-stack-guide) and lots more beyond that (https://www.google.com/webhp?q=smack%20stack).
Now we are going to introduce SMACK Stack 1.1 and talk more about dynamic compute, microservices, orchestration and micro-segmentation, all part of what you can now do with Streaming, Mesos, Analytics, Cassandra and Kafka.
The free lunch is over!
http://www.gotw.ca/publications/concurrency-ddj.htm
Many industries still don’t get it
XML is everywhere, but we have alternatives!
We can support an XML interface without taking on the burden of the extra data. You can save A LOT of overhead just by adding a pre-processing step that takes the XML, turns it into Avro, and then processes and stores that.
It works: https://github.com/elodina/xml-avro
You can even process the response in Avro but return the result in XML; more on that later, though!
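To make the size argument concrete, here is a minimal sketch of what such a pre-processing step does to a single record. It is not the xml-avro tool: it hand-rolls Avro's binary encoding (zig-zag varint `long`s, length-prefixed UTF-8 `string`s, fields concatenated in schema order) for a hypothetical three-field schema using only the standard library. `SCHEMA`, the element names and the helper functions are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: ordered (field name, Avro primitive type) pairs.
SCHEMA = [("id", "long"), ("user", "string"), ("action", "string")]

def zigzag_varint(n: int) -> bytes:
    """Encode an Avro `long`: zig-zag map signed -> unsigned, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string(s: str) -> bytes:
    """Encode an Avro `string`: byte length as a long, then UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data

def xml_to_avro_bytes(xml_text: str) -> bytes:
    """Pull each schema field out of the XML and emit Avro binary, in schema order."""
    root = ET.fromstring(xml_text)
    out = bytearray()
    for name, avro_type in SCHEMA:
        text = root.findtext(name)
        if text is None:
            raise ValueError(f"missing element <{name}>")
        out += zigzag_varint(int(text)) if avro_type == "long" else encode_string(text)
    return bytes(out)

doc = "<event><id>42</id><user>alice</user><action>login</action></event>"
payload = xml_to_avro_bytes(doc)
print(len(doc.encode("utf-8")), len(payload))  # 66 13
```

Most of the saving comes from not repeating tag names in every message: the schema travels once, out of band (in an Avro container file header or a schema repository), instead of with each record.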
You need to be running Mesos. There are lots of options here!
What is most important is that you abstract your “Provider” from your “Grid”.
What is “The Grid”?
It is the PaaS layer you deploy to that runs your software (aka your new awesome supercomputer).
The grid is your Mesos cluster. You are likely going to have more than one, so plan accordingly. Think of it as immutable infrastructure, the way the computer does.
Step 1
“Provider” of compute resources
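One way to read “abstract your Provider from your Grid”: the code that builds a grid should depend only on an interface for obtaining machines, so a cloud, bare-metal or local provider can be swapped in without touching it. A minimal sketch; `Machine`, `Provider`, `StaticProvider`, `Grid` and `build_grid` are hypothetical names, not part of any Elodina tool.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)
class Machine:
    host: str
    cpus: float
    mem_mb: int

class Provider(ABC):
    """Anything that can hand out machines: a cloud API, a rack, a laptop."""
    @abstractmethod
    def provision(self, count: int) -> list[Machine]: ...

class StaticProvider(Provider):
    """Hypothetical provider backed by a fixed inventory (e.g. bare metal)."""
    def __init__(self, inventory: list[Machine]):
        self._free = list(inventory)

    def provision(self, count: int) -> list[Machine]:
        if count > len(self._free):
            raise RuntimeError("not enough machines")
        taken, self._free = self._free[:count], self._free[count:]
        return taken

@dataclass(frozen=True)
class Grid:
    """A Mesos cluster viewed as one pool: only aggregate resources matter."""
    name: str
    agents: tuple[Machine, ...]

    @property
    def total_cpus(self) -> float:
        return sum(m.cpus for m in self.agents)

    @property
    def total_mem_mb(self) -> int:
        return sum(m.mem_mb for m in self.agents)

def build_grid(name: str, provider: Provider, size: int) -> Grid:
    # The grid never learns which provider its machines came from.
    return Grid(name, tuple(provider.provision(size)))

inventory = [Machine(f"10.0.0.{i}", 4.0, 16384) for i in range(1, 4)]
grid = build_grid("grid-a", StaticProvider(inventory), 3)
print(grid.total_cpus, grid.total_mem_mb)  # 12.0 49152
```

`Grid` is a frozen dataclass to echo the immutable-infrastructure idea: growing or changing a grid means building a new one rather than mutating it in place.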
The Grid … 2.0 ...
https://github.com/elodina/sawfly/blob/master/cloud-deploy-grid.md
Program against your datacenter like it’s a single pool of resources. Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), so fault-tolerant and elastic distributed systems can be built easily and run effectively.
Mesosphere’s Data Center Operating System (DCOS) is an operating system that spans all of the machines in a datacenter or cloud and treats them as a single computer, providing a highly elastic and highly scalable way of deploying applications, services, and big data infrastructure on shared resources. DCOS is based on Apache Mesos and includes a distributed systems kernel with enterprise-grade security.
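Mesos itself does not decide where tasks run: it offers slices of agent resources to each framework’s scheduler, and the scheduler picks what to launch on them (two-level scheduling). The toy first-fit placement below illustrates that second level using the CPU and memory requests of the Storm Nimbus and Storm UI definitions later in this deck. It is a simulation, not the Mesos scheduler API; `Offer`, `Task` and `place` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int

@dataclass(frozen=True)
class Task:
    name: str
    cpus: float
    mem_mb: int

def place(tasks: list[Task], offers: list[Offer]) -> dict[str, str]:
    """First-fit: launch each task on the first offer with enough spare resources."""
    placement: dict[str, str] = {}
    for task in tasks:
        for offer in offers:
            if offer.cpus >= task.cpus and offer.mem_mb >= task.mem_mb:
                offer.cpus -= task.cpus  # consume part of the offer
                offer.mem_mb -= task.mem_mb
                placement[task.name] = offer.agent
                break
        else:
            raise RuntimeError(f"no offer fits {task.name}")
    return placement

offers = [Offer("agent-1", 1.0, 1024), Offer("agent-2", 2.0, 4096)]
tasks = [Task("storm-nimbus", 1.0, 1024), Task("storm-ui", 0.2, 512)]
print(place(tasks, offers))  # {'storm-nimbus': 'agent-1', 'storm-ui': 'agent-2'}
```

Nimbus fills `agent-1` exactly, so the UI spills over to `agent-2`. Real schedulers layer constraints, ports and fairness on top, but the division of labour is the same.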
Data Center Optimization!
But there is more!
● Provisioning
● Micro Segmentation
● Orchestration
● Configuration Management
● Service Discovery
● Deployment Isolation and Identification
● Telemetry, Tracing, Ops Stuff, Etc
● Oh My!
Ultimately, it all boils down to stacks (https://github.com/elodina/stack-deploy) and how you work with the schedulers in your cluster.
Stack Deploy to the rescue!
In the Grid you need Schedulers!
● Kafka – Producer/Consumer-based message queue management
● Exhibitor – Supervisor for distributed persistence (like ZooKeeper)
● Cassandra/DSE – HA, scalable, distributed NoSQL data storage
● Storm – Topology-based Real-time distributed data streaming
● Monarch – Distributed Remote Procedure Calls, Kafka REST interface and schema repository
● Zipkin – Configure, launch and manage Zipkin distributed trace on Mesos
● HDFS – Configure, launch and manage HDFS on Mesos (coming soon)
● Stockpile – Consumer to “stock pile” data into persistent storage (the Mesos scheduler currently supports only Cassandra/C*)
● MirrorMaker – Consumer to make a mirror copy of data to destination
● StatsD – Producer to pump StatsD on Mesos into Kafka for consumption, preserves layers
● SysLog – Producer to pump Syslog on Mesos into Kafka for consumption, preserves layers
https://github.com/elodina/
Virtual Telemetry “Data Center” In the Grid
ZipkinQATeamBuild92
● 1x Exhibitor-Mesos
● 1x Exhibitor
● 1x DSE-Mesos
● 1x Cassandra node
● 1x Kafka-Mesos
● 1x Kafka 0.8 broker
● 1x Zipkin-Mesos
● 1x Zipkin Collector
● 1x Zipkin Query
● 1x Zipkin Web
(Diagram layers: “data center”, “cluster”, “zone”, and the “Stack” defaultSimpleZipkinFull)
Stack Deploy In Action
./stack-deploy addlayer --file stacks/cassandra_dc.stack --level datacenter
./stack-deploy addlayer --file stacks/cassandra_cluster.stack --level cluster --parent cassandra_dc
./stack-deploy addlayer --file stacks/cassandra_zone1.stack --level zone --parent cassandra_cluster
./stack-deploy addlayer --file stacks/cassandra_zone2.stack --level zone --parent cassandra_cluster
./stack-deploy add --file stacks/cassandra.stack
./stack-deploy run cassandra --zone cassandra_zone1
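The commands above build a three-level tree of layers (datacenter → cluster → zone) and then run the `cassandra` stack in one zone. A plausible mental model, which is an assumption about stack-deploy rather than a description of its internals, is that a zone inherits settings from its ancestors, with the more specific layer winning. The sketch models that lookup; the `Layer` class and the setting names and values (`dc`, `replication`, `rack`, ...) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Layer:
    name: str
    level: str  # "datacenter", "cluster" or "zone"
    parent: Optional["Layer"] = None
    settings: dict = field(default_factory=dict)

    def effective_settings(self) -> dict:
        """Merge settings from the root down, so child values override parents."""
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        merged: dict = {}
        for layer in reversed(chain):  # datacenter first, zone last
            merged.update(layer.settings)
        return merged

# Mirrors the addlayer commands: one datacenter, one cluster, two zones.
dc = Layer("cassandra_dc", "datacenter", settings={"dc": "us-east", "replication": 3})
cluster = Layer("cassandra_cluster", "cluster", dc, {"cluster_name": "main"})
zone1 = Layer("cassandra_zone1", "zone", cluster, {"rack": "rack1"})
zone2 = Layer("cassandra_zone2", "zone", cluster, {"rack": "rack2", "replication": 2})

print(zone1.effective_settings())
# {'dc': 'us-east', 'replication': 3, 'cluster_name': 'main', 'rack': 'rack1'}
```

Under this model, `run cassandra --zone cassandra_zone1` would launch the stack with zone 1’s merged view, while a run in zone 2 would pick up its override of `replication`.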
Full Stack Deployments
Cassandra
Cassandra Multi DC
Cassandra https://github.com/elodina/datastax-enterprise-mesos
Start your nodes!
Apache Kafka
• Apache Kafka
o http://kafka.apache.org
• Apache Kafka Source Code
o https://github.com/apache/kafka
• Documentation
o http://kafka.apache.org/documentation.html
• Wiki
o https://cwiki.apache.org/confluence/display/KAFKA/Index
It often starts with just one data pipeline
Reuse of data pipelines for new producers
Reuse of existing providers for new consumers
Eventually the solution becomes the problem
Kafka decouples data pipelines
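The arithmetic behind “the solution becomes the problem”: with point-to-point pipelines, every producer may end up wired to every consumer, so the number of pipelines grows as N × M, while with a shared log each system only needs its own connection to Kafka, N + M in total. A tiny illustration; the worst-case wiring and the sample counts are assumptions.

```python
def point_to_point(producers: int, consumers: int) -> int:
    # Worst case: every producer feeds every consumer directly.
    return producers * consumers

def via_kafka(producers: int, consumers: int) -> int:
    # Each system talks only to Kafka: producers write, consumers read.
    return producers + consumers

for n, m in [(1, 1), (3, 4), (10, 10)]:
    print(n, m, point_to_point(n, m), via_kafka(n, m))
# 1 1 1 2
# 3 4 12 7
# 10 10 100 20
```

With a single pipeline, a direct connection is actually cheaper, which is why it “often starts with just one data pipeline”; the crossover comes quickly as producers and consumers are reused.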
Topics & Partitions
A high-throughput distributed messaging system
rethought as a distributed commit log.
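A minimal in-memory model of a topic as a commit log: each partition is an append-only list, a message’s offset is its index in that list, messages with the same key land in the same partition (so they stay ordered), and each consumer tracks its own offset instead of the broker deleting what was read. This is a sketch only; `Topic` is hypothetical, and it uses `zlib.crc32` as a stand-in hash, whereas Kafka’s Java producer hashes keys with murmur2.

```python
import zlib

class Topic:
    def __init__(self, name: str, partitions: int):
        self.name = name
        self.logs: list[list[bytes]] = [[] for _ in range(partitions)]

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition, which preserves per-key ordering.
        return zlib.crc32(key) % len(self.logs)

    def append(self, key: bytes, value: bytes) -> tuple[int, int]:
        """Append to the key's partition and return (partition, offset)."""
        p = self.partition_for(key)
        self.logs[p].append(value)
        return p, len(self.logs[p]) - 1

    def read(self, partition: int, offset: int, max_messages: int = 10) -> list[bytes]:
        """Reading never removes data; the consumer decides where to resume."""
        return self.logs[partition][offset:offset + max_messages]

topic = Topic("clicks", partitions=3)
for i in range(5):
    topic.append(b"user-42", f"click-{i}".encode())

p = topic.partition_for(b"user-42")
committed = 0  # this consumer's own position in partition p
batch = topic.read(p, committed, max_messages=2)
committed += len(batch)
print(batch, committed)                   # [b'click-0', b'click-1'] 2
print(topic.read(p, 0, max_messages=1))  # [b'click-0'] (still there for other consumers)
```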
Intra Cluster Replication
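Intra-cluster replication in one sketch: each partition has a leader and followers, the replicas that are caught up form the in-sync replica set (ISR), and the high watermark is the smallest log end offset among the ISR. Only messages below the high watermark count as committed and are handed to consumers, so a leader failure does not take back anything a consumer has already seen, provided the new leader comes from the ISR. The function and broker names are hypothetical.

```python
def high_watermark(log_end_offsets: dict[str, int], isr: set[str]) -> int:
    """Smallest log end offset across the in-sync replicas."""
    return min(log_end_offsets[broker] for broker in isr)

def visible_to_consumers(leader_log: list[str], hw: int) -> list[str]:
    # Messages at offsets >= HW are not yet on every ISR member.
    return leader_log[:hw]

leader_log = ["m0", "m1", "m2", "m3", "m4"]
leo = {"broker-1": 5, "broker-2": 4, "broker-3": 2}  # broker-1 is the leader

hw = high_watermark(leo, isr={"broker-1", "broker-2", "broker-3"})
print(hw, visible_to_consumers(leader_log, hw))  # 2 ['m0', 'm1']

# If broker-3 falls too far behind, it is dropped from the ISR and the HW advances.
hw = high_watermark(leo, isr={"broker-1", "broker-2"})
print(hw, visible_to_consumers(leader_log, hw))  # 4 ['m0', 'm1', 'm2', 'm3']
```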
Mesos Kafka http://github.com/mesos/kafka
Streaming & Analytics
● The landscape of streaming is about to get more fragmented and harder to navigate. This is not all bad news, and it is not much different from where we were with NoSQL six or so years ago.
● Different systems are getting really (really (really)) good at different things.
○ DAG-based systems
○ Event-based systems
○ Query & Execution Engines
○ Streaming Engines
○ Etc!
GearPump
Airflow
Spring Cloud Data Flow
Storm (and Storm Topology based systems)
Storm Nimbus
{
  "id": "storm-nimbus",
  "cmd": "cp storm.yaml storm-mesos-0.9.6/conf && cd storm-mesos-0.9.6 && ./bin/storm-mesos nimbus -c mesos.master.url=zk://zookeeper.service:2181/mesos -c 'storm.zookeeper.servers=[\"zookeeper.service\"]' -c nimbus.thrift.port=$PORT0 -c topology.mesos.worker.cpu=0.5 -c topology.mesos.worker.mem.mb=615 -c worker.childopts=-Xmx512m -c topology.mesos.executor.cpu=0.1 -c topology.mesos.executor.mem.mb=160 -c supervisor.childopts=-Xmx128m -c mesos.executor.uri=http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz -c storm.log.dir=$(pwd)/logs",
  "cpus": 1.0,
  "mem": 1024,
  "ports": [31056],
  "requirePorts": true,
  "instances": 1,
  "uris": [
    "http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz",
    "http://repo.elodina.s3.amazonaws.com/storm.yaml"
  ]
}
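The JSON documents on these slides look like Marathon app definitions (`id`, `cmd`, `cpus`, `mem`, `ports`, `requirePorts`, `uris`, `healthChecks`); the deck does not name the scheduler, so treat that as an assumption. Under that assumption, an app is created by POSTing its definition to Marathon’s `/v2/apps` endpoint. The sketch only builds the request with the standard library; the Marathon address is hypothetical and the definition is trimmed.

```python
import json
import urllib.request

def marathon_create_request(marathon_url: str, app: dict) -> urllib.request.Request:
    """Build, but do not send, a POST /v2/apps request for one app definition."""
    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Trimmed copy of the Storm Nimbus definition; "cmd" and "uris" left out for brevity.
nimbus = {"id": "storm-nimbus", "cpus": 1.0, "mem": 1024,
          "ports": [31056], "requirePorts": True, "instances": 1}

# Hypothetical Marathon address; 8080 is Marathon's default HTTP port.
req = marathon_create_request("http://marathon.service:8080", nimbus)
print(req.get_method(), req.full_url)  # POST http://marathon.service:8080/v2/apps
# Sending it is one more line: urllib.request.urlopen(req)
```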
Storm UI
{
  "id": "storm-ui",
  "cmd": "cp storm.yaml storm-mesos-0.9.6/conf && cd storm-mesos-0.9.6 && ./bin/storm ui -c ui.port=$PORT0 -c nimbus.thrift.port=31056 -c nimbus.host=storm-nimbus.service -c storm.log.dir=$(pwd)/logs",
  "cpus": 0.2,
  "mem": 512,
  "ports": [31067],
  "requirePorts": true,
  "instances": 1,
  "uris": [
    "http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz",
    "http://repo.elodina.s3.amazonaws.com/storm.yaml"
  ],
  "healthChecks": [
    {
      "protocol": "HTTP",
      "portIndex": 0,
      "path": "/",
      "gracePeriodSeconds": 120,
      "intervalSeconds": 20,
      "maxConsecutiveFailures": 3
    }
  ]
}
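The UI finds Nimbus through `nimbus.host=storm-nimbus.service` and a hard-coded `nimbus.thrift.port=31056`, which only works because the Nimbus definition pins that port with `"requirePorts": true` (so its `$PORT0` resolves to 31056). A small pre-deploy check can catch the two definitions drifting apart. It is a hypothetical helper that assumes options are passed as `-c key=value` in `cmd`, as on these slides; the sample dicts keep only the fields it reads.

```python
import shlex

def storm_options(cmd: str) -> dict[str, str]:
    """Collect the `-c key=value` overrides from a Storm launch command."""
    tokens = shlex.split(cmd)
    options = {}
    for flag, arg in zip(tokens, tokens[1:]):
        if flag == "-c" and "=" in arg:
            key, value = arg.split("=", 1)
            options[key] = value
    return options

def check_ui_points_at_nimbus(nimbus_app: dict, ui_app: dict) -> None:
    """Raise if the UI's hard-coded Nimbus port cannot match the Nimbus app."""
    if not nimbus_app.get("requirePorts"):
        raise ValueError("Nimbus port is not pinned, so the UI cannot hard-code it")
    ui_port = int(storm_options(ui_app["cmd"])["nimbus.thrift.port"])
    nimbus_port = nimbus_app["ports"][0]
    if ui_port != nimbus_port:
        raise ValueError(f"UI expects Nimbus on {ui_port}, Nimbus is on {nimbus_port}")

# Trimmed copies of the two definitions above.
nimbus = {"ports": [31056], "requirePorts": True}
ui = {"cmd": "./bin/storm ui -c ui.port=$PORT0 -c nimbus.thrift.port=31056 "
             "-c nimbus.host=storm-nimbus.service -c storm.log.dir=$(pwd)/logs"}

check_ui_points_at_nimbus(nimbus, ui)  # passes silently: both sides say 31056
```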
Storm Kafka - new spouts & bolts for Kafka 0.8, 0.9, ...
Apache Kafka Streams
Go Kafka Client - Fan Out Processing
https://github.com/elodina/go-kafka-client-mesos
● Dynamic Kafka Log workers
● Blue/Green Deploy Support
● Fan Out Processing
● Auditable
● Batches
● Scalable/Auto-Scalable
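The fan-out idea in miniature: one consumer pulls messages off a partition, groups them into batches, and hands the batches to a pool of workers, so processing capacity scales with the number of workers rather than only with the number of partitions. This is a Python sketch of the pattern, not the go-kafka-client API; `batches`, `fan_out` and `handle` are hypothetical, and a real consumer would commit an offset only once every batch up to it has been processed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator

def batches(messages: Iterable[str], size: int) -> Iterator[list[str]]:
    """Group a message stream into fixed-size batches (the last may be short)."""
    it = iter(messages)
    while batch := list(islice(it, size)):
        yield batch

def fan_out(messages: Iterable[str], handle: Callable[[list[str]], int],
            workers: int = 4, batch_size: int = 100) -> list[int]:
    """Process batches concurrently; results come back in batch order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, batches(messages, batch_size)))

def handle(batch: list[str]) -> int:
    # Stand-in for real work, e.g. writing the batch to Cassandra.
    return len(batch)

results = fan_out((f"log-line-{i}" for i in range(250)), handle, workers=3, batch_size=100)
print(results)  # [100, 100, 50]
```

Because `pool.map` returns results in submission order, the caller can tell exactly which batches are done, which is what makes safe offset commits (and auditing) possible on top of concurrent processing.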
Questions?
http://www.elodina.net
More from Joe Stein