This presentation was part of the Cloudify and XLAB Research Webinar about DevOps for Data Intensive Applications.
In this webinar we discussed how to leverage automation for your big data applications, using DICE tools based on the Cloudify Open Source Orchestration.
We want developers to spend their time developing big data applications rather than worrying about deployment and operations, so that time to delivery is as short as possible.
We also covered using the DICE deployment tools for automated deployment of Spark, Storm, Cassandra or Hadoop.
DICE & Cloudify – Quality Big Data Made Easy
1. DICE
Horizon 2020 Research & Innovation Action
Grant Agreement no. 644869
http://www.dice-h2020.eu
Funded by the Horizon 2020 Framework Programme of the European Union
DICE: quality Big Data made easy
Matej Artač @matej_artac
XLAB Research @xlab_research
Michele Guerriero, Damian A. Tamburri
Politecnico di Milano
2. Agenda
o Introduction
o About DICE: Developing Data-Intensive Cloud Applications with Iterative Quality Enhancements
o DICER
o TOSCA
o Cloudify
o DICE delivery tools
4. Building blocks for DIAs today
[Diagram: Lambda architecture on cloud infrastructure. Labels: Data Sources, Coordinator (Kafka), Orchestrator (Hadoop Cluster), Data Store, Batch Layer, Speed Layer, Serving Layers, distributed computation, data streaming, HDFS distributed storage.]
5. What problems do EU SMEs face?
o Traditional market: legacy software systems
o Customers with legacy data now ask for Big Data technologies
o Growth in sight, but …
Learning curves
Initial prototyping
Risk of failure
(+ others…)
o Fast-paced market
6. Traditional approach to deployment
o Spend time studying documentation
o Use trial and error to set up a working cluster
o Use incompatible public cookbooks
o Repeat for each change
o Keep the Big Data cluster fixed for fear of breaking it
days
7. What is in the box?
o DICER
model your application
create a TOSCA blueprint
o Cloudify
Pure-play orchestration & automation
o DICE deployment tool
alternative front-end for Cloudify
o TOSCA library
worries about deploying Big Data services so that you don’t have to
9. The Rapid Growth of Big Data
o Software market rapidly shifting to Big Data
27% compound annual growth rate through 2017 (IDC)
Popular technologies such as Spark, Hadoop, and NoSQL boost Big Data adoption and revenues from new services
Business issue: 65% of Big Data projects still fail (CapGemini’15)
Source: IDC Source: Wikibon
11. DICE Mission and Partners
ICT 9 Call/2014 – Software engineering
9 partners (Academia & SMEs), 7 EU countries
Mission: support SMEs in developing high-quality cloud-based data-intensive applications (DIAs)
Partners: IEAT, IMP, PMI, ZAR, NETF, XLAB, ATC, FLEXI, PRO
12. Ingredients of the DICE approach
o DevOps
o Model-Driven Engineering
[Diagram labels: Dev, Ops, Analysis, Deployment blueprint]
13. DICE incremental modeling and analysis
[Diagram: DICE Methodology. The DICE Platform Independent Model (DPIM) is implemented by the DICE Technology Specific Model (DTSM), which is deployed onto the DICE Deployment Specific Model (DDSM). Labels on the chain: M2M transformation (twice), M2T transformation, TOSCA blueprint, Analysis (twice), Analysis & Optimization.]
14. DICE deployment, monitoring and testing
[Diagram: DICE Methodology for deployment, monitoring and testing. Labels: TOSCA blueprint, Deployment, Testbed, two Running DIA stacks (Comp, MW, VM), Monitoring, Fault Injection, Quality Testing, Trace Checking, Enhancement, Anomaly Detection, Configuration optimization.]
22. What is TOSCA?
o Open standard
o Enabling a unique Cloud eco-system
o Supported by a large and growing number of international industry leaders
Associated Companies
23. TOSCA is an Intent Model which is declarative (integration points for imperative)
TOSCA Domain-Specific Language
Information Models
Typically used to model a constrained domain that can be described by a closed set of entity types, properties, relationships and operations.
• Types, Relationships
• Properties
• Operations
Data Models
Typically describe the structure (format) of the data stored in data management systems, enabling its manipulation (via interfaces) while assuring integrity.
• Structure
• Format
• Interfaces
Intent Model Adds:
• Topology
• Composition
• Requirements - Capabilities
• State (Nodes, Relationships)
• Lifecycle (Management)
• Policy
TOSCA can work with imperative scripts (e.g., Ansible, Chef, Bash, Ant, etc.)
TOSCA can include other data models (e.g., JSON, YANG)
24. Topology – Nodes and Relationships
TOSCA is used first and foremost to describe the topology of the deployment view for cloud applications and services
[Diagram: inside a Tier (Group Type), source_resource (Node_Type_A) has a Requirement that connect_relationship (ConnectsTo) links to the Capability of target_resource (Node_Type_B).]
Nodes are the resources or components that will be materialized or consumed in the deployment topology
Relationships express the dependencies between the nodes (not the traffic flow)
Requirement - Capability: relationships can be customized to match specific source requirements to target capabilities
Groups: create logical, management or policy groups (1 or more nodes)
Node templates to describe components in the topology structure
Relationship templates to describe connections, dependencies, deployment ordering
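The way relationships imply a deployment ordering can be sketched in a few lines of Python. This is only an illustration, not how a TOSCA orchestrator is implemented: the NodeTemplate and Relationship classes and the deployment_order function are hypothetical names, and the node names are taken from the diagram labels on this slide. It assumes Python 3.9 or later for the standard graphlib module.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Relationship:
    """A typed edge from a source node to the target node it depends on."""
    type: str    # e.g. "ConnectsTo" or "HostedOn"
    target: str  # name of the node template that must be deployed first


@dataclass
class NodeTemplate:
    """A component in the topology (hypothetical, simplified structure)."""
    name: str
    type: str
    relationships: list[Relationship] = field(default_factory=list)


def deployment_order(nodes: list[NodeTemplate]) -> list[str]:
    """Order node names so every relationship target precedes its source."""
    known = {n.name for n in nodes}
    sorter = TopologicalSorter()
    for node in nodes:
        targets = [r.target for r in node.relationships]
        missing = [t for t in targets if t not in known]
        if missing:
            raise ValueError(f"{node.name} refers to unknown nodes: {missing}")
        # Targets are registered as predecessors of the source node.
        sorter.add(node.name, *targets)
    return list(sorter.static_order())


topology = [
    NodeTemplate("source_resource", "Node_Type_A",
                 [Relationship("ConnectsTo", "target_resource")]),
    NodeTemplate("target_resource", "Node_Type_B"),
]
print(deployment_order(topology))  # ['target_resource', 'source_resource']
```

A cycle between nodes makes static_order raise graphlib.CycleError, which is a reasonable signal that the topology cannot be deployed in any order.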
25. Composition – different service templates can be “wired” together
[Diagram: two service templates wired together. Application Tier (container): paypal_pizza_store (WebApplication), nodejs (WebServer), app_server (Compute), mongo_db (Database), mongo_dbms (DBMS), mongo_server (Compute), collectd, rsyslog. Logging/Monitoring Tier (ELK): logstash (SoftwareComponent) on logstash_server (Compute), elasticsearch (SoftwareComponent) on elasticsearch_server (Compute), kibana (SoftwareComponent) on kibana_server (Compute). Components expose or require Container, log_endpoint and search_endpoint, and are linked by HostedOn and ConnectsTo relationships.]
Enabling the description of complex, multi-tier (hybrid) Cloud applications
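The idea of wiring one service template to another by matching requirements to capabilities can be sketched as follows. This is a simplified illustration, not the TOSCA matching algorithm: the Node class and the wire function are hypothetical, and which component requires which endpoint is assumed here for the sake of the example rather than read off the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with named capabilities it offers and requirements it needs."""
    name: str
    capabilities: set[str] = field(default_factory=set)
    requirements: set[str] = field(default_factory=set)


def wire(consumers: list[Node], providers: list[Node]) -> dict[tuple[str, str], str]:
    """Map (consumer, requirement) to the first provider offering that capability."""
    links = {}
    for consumer in consumers:
        for req in sorted(consumer.requirements):
            match = next((p.name for p in providers if req in p.capabilities), None)
            if match is None:
                raise LookupError(f"no provider for {consumer.name}.{req}")
            links[(consumer.name, req)] = match
    return links


# Assumed for illustration: the log collectors need a log endpoint.
application_tier = [
    Node("collectd", requirements={"log_endpoint"}),
    Node("rsyslog", requirements={"log_endpoint"}),
]
elk_tier = [
    Node("logstash", capabilities={"log_endpoint"}),
    Node("elasticsearch", capabilities={"search_endpoint"}),
    Node("kibana", requirements={"search_endpoint"}),
]

links = wire(application_tier, elk_tier)
for (consumer, requirement), provider in links.items():
    print(f"{consumer} --{requirement}--> {provider}")
```

An unmatched requirement raises LookupError, so a missing tier is caught before any deployment is attempted.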
30. Cloudify Key Aspects
Open Source
Open source is key to drive innovation and create superb quality software.
Open Standard
Open standard-based TOSCA Spec for application blueprints allows vendor neutrality, and enables collaboration.
Future Proof
Try new emerging technologies while keeping stable existing ones in place.
36. Deploy your own Big Data services
DevOps approach:
o Describe your Big Data cluster and application in a blueprint
o Store and maintain the blueprint in your VCS with the application’s code: Infrastructure as Code (IaC)
o Rely on orchestrators and configuration managers for executing deployments
hours
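Once a blueprint lives in version control next to the application code, it can be checked automatically before the orchestrator ever sees it. The sketch below is one possible pre-deployment check, not part of the DICE or Cloudify tooling: blueprint_errors is a hypothetical helper, the blueprint is shown as a Python dictionary that mirrors the node_templates / type / relationships layout of a YAML blueprint, and the example.nodes.* type names are made up for illustration.

```python
def blueprint_errors(blueprint: dict) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    templates = blueprint.get("node_templates", {})
    errors = []
    for name, template in templates.items():
        if "type" not in template:
            errors.append(f"{name}: missing 'type'")
        for rel in template.get("relationships", []):
            target = rel.get("target")
            if target not in templates:
                errors.append(
                    f"{name}: relationship target {target!r} is not defined"
                )
    return errors


blueprint = {
    "node_templates": {
        "master_vm": {"type": "cloudify.nodes.Compute"},
        "spark_master": {
            "type": "example.nodes.SparkMaster",  # hypothetical type name
            "relationships": [
                {"type": "cloudify.relationships.contained_in",
                 "target": "master_vm"},
            ],
        },
        "spark_worker": {
            "type": "example.nodes.SparkWorker",  # hypothetical type name
            "relationships": [
                # Deliberate typo in the target to show what the check catches.
                {"type": "cloudify.relationships.connected_to",
                 "target": "spark_mastr"},
            ],
        },
    }
}

for problem in blueprint_errors(blueprint):
    print(problem)
```

Run as part of a commit hook or CI job, a check like this turns a mistyped node name into a failed build instead of a failed deployment.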
39. Conclusion
o DICE tools remove barriers to Big Data
o DICE technology library simplifies blueprints
o TOSCA blueprints describe infrastructure as code
o Enables Continuous Integration and Continuous Delivery
40. Links
o Cloudify: http://getcloudify.org
o DICE H2020: http://www.dice-h2020.eu/
o DICE deployment service:
https://github.com/dice-project/DICE-Deployment-Service
o Big Data blueprint examples:
https://github.com/dice-project/DICE-Deployment-Examples
o DICER:
https://github.com/dice-project/DICER
41. Follow us
o DICE project: @diceh2020
o Cloudify:
@CloudifySource – ilanadl@getcloudify.org
o User groups:
https://groups.google.com/forum/#!forum/cloudify-users
o Webinars:
http://getcloudify.org/webinars.html
o Matej Artač: @matej_artac – matej.artac@xlab.si
o XLAB Research: @xlab_research
Models are the basis for analyses.
Incremental reasoning on models can lead to the automated creation of deployment blueprints.
Example: Connect a Logging / Monitoring Service composed of ElasticSearch, LogStash and Kibana (ELK)
Container is a logical unit of deployment to enable focused continuous delivery.
Delivery Service handles platform parameters (e.g., credentials for the IaaS, OS image and flavour IDs, location of a monitoring service, etc.)
Orchestration engine and blueprint visualization provided by Cloudify. Blueprint generated by DICER.
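The note above about the delivery service handling platform parameters can be illustrated with a small sketch of how platform-wide values might be combined with the inputs of a single blueprint. This is an assumption about the general idea, not the DICE deployment service's actual behaviour: resolve_inputs is a hypothetical function, all parameter names and values are placeholders, and the rule that blueprint inputs override platform values is assumed. Credentials are left out of the example on purpose.

```python
def resolve_inputs(platform: dict, blueprint_inputs: dict, required: set[str]) -> dict:
    """Combine platform-level parameters with per-blueprint inputs.

    Blueprint inputs win on conflict (an assumed precedence rule).
    """
    merged = {**platform, **blueprint_inputs}
    missing = sorted(required - set(merged))
    if missing:
        raise KeyError(f"missing inputs: {', '.join(missing)}")
    return merged


# Set once per platform by an administrator (placeholder values).
platform_parameters = {
    "os_image_id": "image-0000",
    "flavour_id": "flavour-medium",
    "monitoring_address": "10.0.0.10",
}

# Supplied with each deployment; the developer never repeats platform details.
inputs = resolve_inputs(
    platform_parameters,
    {"worker_count": 3},
    required={"os_image_id", "flavour_id", "worker_count"},
)
print(inputs)
```

Keeping platform details out of the blueprint means the same blueprint can be deployed to a different testbed by changing only the platform parameters.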