Monitoring CloudStack and Components
August, 22nd 2017
Alexander Stock
Cloud Infrastructure Architect
©2017itelligenceclassification:public|version:1.105/17/2017
About Me
Sysadmin @BIT.Group GmbH – member of itelligence group
Experience in VMware, KVM, Nagios and Ansible
Working with CloudStack since 2015
GitHub:
https://github.com/AlexanderStock
Mail:
alexander.stock@bitgroup.de
CloudStack Berlin & Dresden, Germany
https://www.meetup.com/german-CloudStack-user-group
Ansible Dresden, Germany
https://www.meetup.com/Ansible-Dresden
Overview BIT.Group GmbH – member of itelligence group
350+ employees in Dresden, Bautzen, Hanover and Shanghai
SAP Consulting, Development and Support
SAP partner and service provider for SAP SE
IT Consulting
Development
Cloud IT Infrastructure Management
SAP BASIS
SAP Solution Manager
Application Lifecycle Management
International
BIT Service Desk
SAP Service & Support
ITIL SAP HANA
Workshops
IT Service Management
SAP partner
BIT.Group GmbH as part of itelligence / NTT DATA Group
Since June 2016, BIT.Group GmbH has officially been part of itelligence and the NTT DATA Group
Know-how, flexibility and internationality as part of the NTT DATA network
Together an internationally leading full IT service provider with:
3,500+ active SAP customers
Locations in 40+ countries
$1.5 billion in SAP revenue worldwide
Over 9,000 SAP experts worldwide
Agenda
1. What do we use for monitoring?
2. MySQL
3. Tomcat
4. CloudStack API
5. Distributed Monitoring
What do we use for Monitoring?
Why do we monitor CloudStack?
Detecting performance issues
Detecting misconfigurations
Detecting resource bottlenecks
Get a long-term overview of our installations
What do we use for Monitoring?
We use Nagios with a frontend called Check_MK
Check_MK:
Combines passive and active checks
Auto-inventory of client hosts
Manage hosts/services/reports
Livestatus: module for accessing the core data of Nagios
Can monitor Linux/Unix/Windows/switches/storage… out of the box
Image source: https://en.wikipedia.org/wiki/File:Cmk-dashboard.png
What do we use for Monitoring?
[Figure: Check_MK architecture. Multisite web platform (Status GUI, BI, WATO, Mobile, Event Console, NagVis, custom applications) on top of Livestatus and the monitoring core (Nagios / Icinga), with PNP4Nagios, RRDTool, the event daemon (syslog, SNMP traps), CMK Notify, Check_MK and Nagios plugins. Monitored targets: Linux, Solaris, VMS, Windows, HP-UX, AIX, switches, sensors, appliances, routers, PING, DNS server, HTTP server, TCP port. Connections are labeled TCP or SSH, SNMP, TCP/IP and ICMP.]
What do we use for Monitoring?
[Figure: check flow in four steps between Nagios core, Check_MK, the agent on the host (TCP), the passive checks (current state) and the RRD files]
1. Nagios core triggers the active check (Check_MK script)
2. Check_MK script polls data from the client over TCP
3. Check_MK script writes long-term data to RRD files
4. Check_MK script distributes check results to the passive checks
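The polling step (the Check_MK script retrieves data from the client over TCP) can be illustrated with a minimal Python sketch. It assumes the agent's plain-text output format, in which every section starts with a `<<<name>>>` header line. The function name `parse_agent_output` and the sample data are hypothetical and not taken from the talk.

```python
import re

# A section header looks like <<<uptime>>> or <<<mysql_capacity:sep(9)>>>.
SECTION_HEADER = re.compile(r"^<<<([^>:]+)(?::[^>]*)?>>>$")


def parse_agent_output(raw):
    """Group the lines of an agent dump by their <<<section>>> header."""
    sections = {}
    current = None
    for line in raw.splitlines():
        match = SECTION_HEADER.match(line.strip())
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


# Made-up sample output, for illustration only.
SAMPLE = "\n".join([
    "<<<check_mk>>>",
    "Version: 1.2.8p25",
    "<<<uptime>>>",
    "123456.78 654321.00",
    "<<<mysql_capacity:sep(9)>>>",
    "cloud\t1048576\t0",
])

for name, lines in parse_agent_output(SAMPLE).items():
    print(name, len(lines))
```

Each parsed section would then feed one or more passive checks, which is why a single TCP poll is enough for all services of a host.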
MySQL
Check_MK plugin for MySQL
Installation
wget https://<mycheckmkserver>/check_mk/agents/mk_mysql
mv mk_mysql /usr/lib/check_mk_agent/plugins/
Configuration Monitoring-Client
vi /etc/check_mk/mysql.cfg
[client]
user=monitor
password=MyPassWord
Configuration Monitoring-Server
cmk -I <mydbhost>
cmk -r
MySQL
Checks:
MySQL DB Size <database>
MySQL Connections mysql
MySQL DB Slave mysql
MySQL InnoDB IO mysql
MySQL Version mysql
Alternatives for pure Nagios:
check_mysql_health
Active check for MySQL
Advanced features like "cache hit rates" or "slow queries"
Tomcat
Check_MK plugin for Tomcat using Jolokia (JMX bridge):
Installation
wget -O jolokia-war-1.3.5.war "http://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-war/1.3.5/jolokia-war-1.3.5.war"
mv jolokia-war-1.3.5.war /usr/share/cloudstack-management/webapps/jolokia.war
service cloudstack-management restart
wget https://<mycheckmkserver>/check_mk/agents/mk_jolokia
mv mk_jolokia /usr/lib/check_mk_agent/plugins/
Configuration Monitoring-Client
cd /etc/check_mk/
wget https://<mycheckmkserver>/itlinfra/check_mk/agents/cfg_examples/jolokia.cfg
Configuration Monitoring-Server
cmk -I <mytomcathost>
cmk -r
Tomcat
Metrics:
JVM <PORT> <url> Requests
JVM <PORT> <url> Sessions
JVM <PORT> GC PS_MarkSweep
JVM <PORT> GC PS_Scavenge
JVM <PORT> Memory
JVM <PORT> ThreadPool http-8080
JVM <PORT> ThreadPool jk-20400
JVM <PORT> Threads
JVM <PORT> Uptime
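As a rough illustration of how such JVM values can be read through Jolokia, the Python sketch below builds a Jolokia `read` URL and computes heap usage from a response. The base URL and the helper names are assumptions for illustration. The sample response is made up and only follows the general shape of a Jolokia read response (`value` and `status` fields). This is not the code of the `mk_jolokia` plugin.

```python
import json
from urllib.parse import quote


def jolokia_read_url(base_url, mbean, attribute):
    """Build a GET URL of the form <base>/read/<mbean>/<attribute>."""
    return "{}/read/{}/{}".format(
        base_url.rstrip("/"), quote(mbean, safe=":=,"), quote(attribute)
    )


def heap_usage_percent(response_text):
    """Return used heap as a percentage of max heap."""
    payload = json.loads(response_text)
    if payload.get("status") != 200:
        raise ValueError("Jolokia returned status {}".format(payload.get("status")))
    value = payload["value"]
    return round(100.0 * value["used"] / value["max"], 2)


# Hypothetical response body for java.lang:type=Memory / HeapMemoryUsage.
SAMPLE_RESPONSE = json.dumps({
    "request": {"mbean": "java.lang:type=Memory",
                "attribute": "HeapMemoryUsage", "type": "read"},
    "value": {"init": 268435456, "committed": 536870912,
              "max": 1073741824, "used": 268435456},
    "status": 200,
})

# Assumed base URL; the real host, port and context path depend on the setup.
url = jolokia_read_url("http://localhost:8080/jolokia",
                       "java.lang:type=Memory", "HeapMemoryUsage")
print(url)
print(heap_usage_percent(SAMPLE_RESPONSE))
```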
CloudStack API
Check Cloudstack.py:
Developed by BIT.Group to see what's going on inside CloudStack
Python script which can monitor different parts of CloudStack
Built as an active check which can also be used with plain Nagios
Thresholds can be defined in a JSON file (global thresholds and instance thresholds)
Performance data (long-term usage) is produced by the scripts
Two categories:
Availability checks
Resource checks
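What "active check usable with plain Nagios" implies can be sketched in Python: a Nagios plugin prints one status line, optionally followed by performance data after a `|`, and reports the state through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the metric below are illustrative assumptions, not code from the BIT.Group script.

```python
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def evaluate(value, warn, crit):
    """Map a measured value to a Nagios state using upper thresholds."""
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


def format_result(label, value, warn, crit, unit="%"):
    """Return (status line with perfdata, exit code) for one metric."""
    state = evaluate(value, warn, crit)
    # Perfdata convention: label=value[unit];warn;crit
    perfdata = "{}={}{};{};{}".format(label, value, unit, warn, crit)
    line = "{} - {} is {}{} | {}".format(state, label, value, unit, perfdata)
    return line, EXIT_CODES[state]


def main():
    line, code = format_result("cpu", 37.2, 50, 90)
    print(line)
    return code  # a real plugin would pass this to sys.exit()


main()
```

The part after the `|` is what the monitoring core hands to the graphing backend, which is how long-term usage data ends up in RRD files.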
CloudStack API
Availability checks:
Host status:
Status of hosts per cluster
Detects if hosts are reachable and enabled
Writes performance data
Status for Cluster: kvm01
Host  Result  Status   Enabled
hv05  OK      running  yes
hv03  OK      running  yes
hv02  OK      running  yes
hv04  OK      running  yes
hv01  OK      running  yes
System VM:
Global status of all system VMs
Writes performance data
Name       Status  Running
v-1405-VM  OK      yes
s-1406-VM  OK      yes
Virtual router:
Global status of all virtual routers
Detects if VR is up or needs an update
Checks redundant routers
Writes performance data
Name       Status    Running  Upgrade
r-1289-VM  OK        yes      no
r-1385-VM  OK        yes      no
r-1272-VM  Critical  yes      yes
r-1173-VM  OK        yes      no
r-1381-VM  OK        yes      no
Status of redundant VPC Routers
Name Status Status
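The virtual router check can be sketched in Python as follows. The sketch assumes router records with `name`, `state` and `requiresupgrade` fields, as returned by the CloudStack `listRouters` API call. The function name and the sample records are hypothetical, and the rule "not running or upgrade required means Critical" is inferred from the example table, not from the script's source.

```python
def check_routers(routers):
    """Return (overall status, table rows) for a list of router records."""
    rows = []
    overall = "OK"
    for router in routers:
        running = router.get("state") == "Running"
        upgrade = bool(router.get("requiresupgrade", False))
        status = "OK" if running and not upgrade else "Critical"
        if status != "OK":
            overall = "Critical"
        rows.append((router["name"], status,
                     "yes" if running else "no",
                     "yes" if upgrade else "no"))
    return overall, rows


# Hypothetical API records mirroring part of the table above.
SAMPLE_ROUTERS = [
    {"name": "r-1289-VM", "state": "Running", "requiresupgrade": False},
    {"name": "r-1272-VM", "state": "Running", "requiresupgrade": True},
]

overall, rows = check_routers(SAMPLE_ROUTERS)
print(overall)
for row in rows:
    print(" ".join(row))
```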
CloudStack API
Resource checks:
Capacity:
• Status of all global capacity metrics
• Thresholds can be set in JSON file
• Writes performance data for each metric
OK: CAPACITY_TYPE_CPU is in status ok. Value:37.2%
OK: CAPACITY_TYPE_MEMORY is in status ok. Value:71.11%
OK: CAPACITY_TYPE_STORAGE_ALLOCATED No Thresholds given. Value:26.99%
OK: CAPACITY_TYPE_VIRTUAL_NETWORK_PUBLIC_IP No Thresholds given. Value:63.03%
OK: CAPACITY_TYPE_PRIVATE_IP No Thresholds given. Value:3.92%
OK: CAPACITY_TYPE_VLAN No Thresholds given. Value:92.96%
OK: CAPACITY_TYPE_DIRECT_ATTACHED_PUBLIC_IP No Thresholds given. Value:2.01%
OK: CAPACITY_TYPE_SECONDARY_STORAGE No Thresholds given. Value:45.01%
OK: CAPACITY_TYPE_STORAGE No Thresholds given. Value:19.38%
OK: CAPACITY_TYPE_LOCAL_STORAGE No Thresholds given. Value:0%
Domains/Projects:
• Monitors usage metrics for all domains/projects
• Checks if domains/projects have reached their resource thresholds
• Thresholds can be set in JSON file
• Writes performance data for all metrics
Results for Domain ROOT:
Results for Domain DOM1:
Warning: Domain DOM1 has reached threshold for cpu: 80
Results for Domain DOM2:
Results for Domain DOM3:
Results for Domain DOM4:
Warning: Domain DOM4 has reached threshold for memory: 80
Results for Domain DOM5:
Offerings:
• Monitors if offerings can be deployed on clusters
• Thresholds can be defined in JSON file
• Writes performance data for each offering
Statistics for Cluster: kvm01
! Offering ! Count!
!XL ! 21!
!XXL ! 12!
!XXXL ! 5!
!XXXXL ! 0!
!XXXXXL ! 0!
--> Critical: Offering: XXXXL can not be deployed anymore
--> Critical: Offering: XXXXXL can not be deployed anymore
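One plausible way to derive such an offering count is sketched below in Python. This is an assumption, since the talk does not describe the actual algorithm: for each host, work out how many instances of an offering fit into the free CPU and memory, then sum over the cluster. All field names and numbers are hypothetical.

```python
def deployable_count(hosts, offering):
    """Count how many instances of an offering still fit on a cluster."""
    total = 0
    for host in hosts:
        by_cpu = host["free_cpu_mhz"] // offering["cpu_mhz"]
        by_mem = host["free_mem_mb"] // offering["mem_mb"]
        # An instance needs both resources on the same host.
        total += min(by_cpu, by_mem)
    return total


# Hypothetical free resources per host and offering sizes.
HOSTS = [
    {"free_cpu_mhz": 8000, "free_mem_mb": 16384},
    {"free_cpu_mhz": 4000, "free_mem_mb": 65536},
]
XL = {"cpu_mhz": 2000, "mem_mb": 8192}
XXXXL = {"cpu_mhz": 16000, "mem_mb": 131072}

for name, offering in (("XL", XL), ("XXXXL", XXXXL)):
    count = deployable_count(HOSTS, offering)
    state = "OK" if count > 0 else "Critical"
    print("{}: {} ({})".format(name, count, state))
```

Summing per host rather than dividing the cluster total matters: free capacity that is fragmented across hosts cannot be combined for a single large instance.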
CloudStack API
Execution:
./cloudstack-resources.py -m <MODE> -f <configfile> -d <optional DomainID> -p <optional ProjectID>
Configfiles:
For domain and project checks:
{
  "thresholds": {
    "DOM1": {
      "cpu": {
        "warn": "50",
        "critical": "90"
      }
    }
  },
  "global": {
    "cpu": {
      "warn": "60",
      "critical": "95"
    }
  }
}
For offering and capacity checks:
{
  "thresholds": {
    "CAPACITY_TYPE_MEMORY": {
      "warn": "50",
      "critical": "80"
    },
    "CAPACITY_TYPE_CPU": {
      "warn": "30",
      "critical": "70"
    }
  }
}
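How instance thresholds and global thresholds might interact can be sketched in Python as a simple lookup. It assumes that an entry under `thresholds` for a specific domain takes precedence over the `global` entry, and that a metric without any entry is reported as having no thresholds. The precedence rule and the function name are assumptions; only the JSON layout comes from the slide.

```python
import json

# Config in the layout shown for domain and project checks.
CONFIG = json.loads("""
{
  "thresholds": {
    "DOM1": {"cpu": {"warn": "50", "critical": "90"}}
  },
  "global": {
    "cpu": {"warn": "60", "critical": "95"}
  }
}
""")


def thresholds_for(config, instance, metric):
    """Return (warn, critical) for a metric, or None if nothing is configured."""
    specific = config.get("thresholds", {}).get(instance, {}).get(metric)
    chosen = specific or config.get("global", {}).get(metric)
    if chosen is None:
        return None
    # The config stores numbers as strings, so convert before comparing.
    return float(chosen["warn"]), float(chosen["critical"])


print(thresholds_for(CONFIG, "DOM1", "cpu"))
print(thresholds_for(CONFIG, "DOM2", "cpu"))
print(thresholds_for(CONFIG, "DOM2", "memory"))
```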
CloudStack API
Outlook:
Checks to come:
Monitoring of usage of networks
Monitoring optimal VM placement
Resource forecasting
Monitoring old snapshots
Download:
https://exchange.nagios.org/directory/Plugins/Cloud/Check_Cloudstack/details
Distributed Monitoring
One master server which holds all configurations of the slaves
Status of objects is queried on demand via Livestatus
All data is stored on the slaves
Configuration of the slaves is done via API and HTTPS
Slaves provide UI functionality for the customers
Setup can be done over the UI
[Figure: master site and slave sites 1 and 2, each with core, state, RRDs and monitored systems, connected via Livestatus]
Distributed Monitoring
Configuration of hosts and settings over UI or API
Automation with Chef, Ansible…
Central overview of all systems
Rules can be maintained centrally
[Figure: isolated monitoring network alongside the isolated networks of customer A and customer B; labels: UI access for the user, replication of settings and query of Livestatus, check of servers]
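The on-demand status query via Livestatus can be sketched in Python as follows. The sketch only builds a query in the Livestatus Query Language and parses a JSON answer. Sending it over a site's Livestatus socket is left out, and the helper names and sample response are hypothetical.

```python
import json


def build_query(table, columns, filters=()):
    """Build a Livestatus query; a blank line terminates the request."""
    lines = ["GET {}".format(table), "Columns: {}".format(" ".join(columns))]
    lines += ["Filter: {}".format(f) for f in filters]
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"


def parse_rows(response_text, columns):
    """Turn the JSON list-of-lists answer into dictionaries keyed by column."""
    return [dict(zip(columns, row)) for row in json.loads(response_text)]


COLUMNS = ["host_name", "description", "state"]

# Ask a slave site for all services in state 2 (CRITICAL).
query = build_query("services", COLUMNS, ["state = 2"])
print(query)

# Made-up answer for illustration.
SAMPLE_RESPONSE = '[["hv01", "CPU load", 2]]'
print(parse_rows(SAMPLE_RESPONSE, COLUMNS))
```

Because the master only sends such queries when a view is opened, the check results and RRD data can stay on the slave sites.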
Summary
Detecting performance issues:
Solved through MySQL and Tomcat checks
Detecting misconfigurations:
Solved through availability checks through the API
Detecting resource bottlenecks:
Solved through resource checks through the API
Get a long-term overview of our installations:
All checks produce RRD files, which can be used for analysis over a long period
Other Platforms
Zabbix
https://github.com/ke4qqq/zabbix-cloudstack
Zenoss
https://www.zenoss.com/product/zenpacks/cloudstack
Contact
Questions?
Alexander Stock
Cloud Infrastructure Architect
alexander.stock@bitgroup.de
BIT.Group GmbH – member of itelligence group
We make the most of SAP® solutions!
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of itelligence AG. The information contained herein may be changed without prior notice.
Some software products marketed by itelligence AG and its distributors contain proprietary software components of other software vendors. All product and service names mentioned and associated logos displayed are the
trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.
Copyright itelligence AG - All rights reserved
The information in this document is proprietary to itelligence. This document is a preliminary version and not subject to your license agreement or any other agreement with itelligence. This document contains only
intended strategies, developments and product functionalities and is not intended to be binding upon itelligence to any particular course of business, product strategy, and/or development. itelligence assumes no
responsibility for errors or omissions in this document. itelligence does not warrant the accuracy or completeness of the information, text, graphics, links, or other items contained within this material. This document is
provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement.
itelligence shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. This limitation shall not apply in
cases of intent or gross negligence.
The statutory liability for personal injury and defective products is not affected. itelligence has no control over the information that you may access through the use of hot links contained in these materials and does not
endorse your use of third-party Web pages nor provide any warranty whatsoever relating to third-party Web pages.

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Introduction and news
Introduction and newsIntroduction and news
Introduction and news
 
CloudStack Container Service
CloudStack Container ServiceCloudStack Container Service
CloudStack Container Service
 
Introductions & CloudStack news - Giles Sirett
Introductions & CloudStack news - Giles SirettIntroductions & CloudStack news - Giles Sirett
Introductions & CloudStack news - Giles Sirett
 
CloudStack news
CloudStack newsCloudStack news
CloudStack news
 
Using the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStackUsing the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStack
 
CloudStack - Apache's best kept secret
CloudStack - Apache's best kept secretCloudStack - Apache's best kept secret
CloudStack - Apache's best kept secret
 
CloudStack Container Service
CloudStack Container ServiceCloudStack Container Service
CloudStack Container Service
 
Improving CloudStack for operators
Improving CloudStack for operatorsImproving CloudStack for operators
Improving CloudStack for operators
 
Policy driven SDN in CloudStack
Policy driven SDN in CloudStack Policy driven SDN in CloudStack
Policy driven SDN in CloudStack
 
Securing your Cloud Environment v2
Securing your Cloud Environment v2Securing your Cloud Environment v2
Securing your Cloud Environment v2
 
[OpenStack Days Korea 2016] Track1 - Red Hat enterprise Linux OpenStack Platform
[OpenStack Days Korea 2016] Track1 - Red Hat enterprise Linux OpenStack Platform[OpenStack Days Korea 2016] Track1 - Red Hat enterprise Linux OpenStack Platform
[OpenStack Days Korea 2016] Track1 - Red Hat enterprise Linux OpenStack Platform
 
CloudStack EU user group - Trillian
CloudStack EU user group - TrillianCloudStack EU user group - Trillian
CloudStack EU user group - Trillian
 
Giles Sirett - welcome and CloudStack news
Giles Sirett - welcome and CloudStack news Giles Sirett - welcome and CloudStack news
Giles Sirett - welcome and CloudStack news
 
CCNA17 KVM and CloudStack
CCNA17 KVM and CloudStackCCNA17 KVM and CloudStack
CCNA17 KVM and CloudStack
 
CloudStack UI
CloudStack UICloudStack UI
CloudStack UI
 
John Spray - Ceph in Kubernetes
John Spray - Ceph in KubernetesJohn Spray - Ceph in Kubernetes
John Spray - Ceph in Kubernetes
 
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITThings You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
 
Microservices: AutoScaling in Hyper-Microservice Architecture | Nguyễn Trung ...
Microservices: AutoScaling in Hyper-Microservice Architecture | Nguyễn Trung ...Microservices: AutoScaling in Hyper-Microservice Architecture | Nguyễn Trung ...
Microservices: AutoScaling in Hyper-Microservice Architecture | Nguyễn Trung ...
 
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
 
Securing your Cloud Environment
Securing your Cloud EnvironmentSecuring your Cloud Environment
Securing your Cloud Environment
 

Ähnlich wie Monitoring CloudStack and components

Ähnlich wie Monitoring CloudStack and components (20)

VMworld 2013: vCloud Powered HPC is Better and Outperforming Physical
VMworld 2013: vCloud Powered HPC is Better and Outperforming PhysicalVMworld 2013: vCloud Powered HPC is Better and Outperforming Physical
VMworld 2013: vCloud Powered HPC is Better and Outperforming Physical
 
Mastering the move
Mastering the moveMastering the move
Mastering the move
 
2011-11-03 Intelligence Community Cloud Users Group
2011-11-03 Intelligence Community Cloud Users Group2011-11-03 Intelligence Community Cloud Users Group
2011-11-03 Intelligence Community Cloud Users Group
 
Super-NetOps Source of Truth
Super-NetOps Source of TruthSuper-NetOps Source of Truth
Super-NetOps Source of Truth
 
GE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoTGE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoT
 
Continuous Security: From tins to containers - now what!
Continuous Security: From tins to containers - now what!Continuous Security: From tins to containers - now what!
Continuous Security: From tins to containers - now what!
 
Super-NetOps Source of Truth
Super-NetOps Source of TruthSuper-NetOps Source of Truth
Super-NetOps Source of Truth
 
Cloud-native Java EE-volution
Cloud-native Java EE-volutionCloud-native Java EE-volution
Cloud-native Java EE-volution
 
Digital Forensics and Incident Response in The Cloud Part 3
Digital Forensics and Incident Response in The Cloud Part 3Digital Forensics and Incident Response in The Cloud Part 3
Digital Forensics and Incident Response in The Cloud Part 3
 
Plan with confidence: Route to a successful Do178c multicore certification
Plan with confidence: Route to a successful Do178c multicore certificationPlan with confidence: Route to a successful Do178c multicore certification
Plan with confidence: Route to a successful Do178c multicore certification
 
Cloud-native .NET Microservices mit Kubernetes
Cloud-native .NET Microservices mit KubernetesCloud-native .NET Microservices mit Kubernetes
Cloud-native .NET Microservices mit Kubernetes
 
F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017
 
Kubernetes Navigation Stories – DevOpsStage 2019, Kyiv
Kubernetes Navigation Stories – DevOpsStage 2019, KyivKubernetes Navigation Stories – DevOpsStage 2019, Kyiv
Kubernetes Navigation Stories – DevOpsStage 2019, Kyiv
 
Simplify Networking for Containers
Simplify Networking for ContainersSimplify Networking for Containers
Simplify Networking for Containers
 
IBM QRadar SIEM V7.3.2 Deployment C1000-055 Questions
IBM QRadar SIEM V7.3.2 Deployment C1000-055 QuestionsIBM QRadar SIEM V7.3.2 Deployment C1000-055 Questions
IBM QRadar SIEM V7.3.2 Deployment C1000-055 Questions
 
Docker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITDocker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-IT
 
'DOCKER' & CLOUD: ENABLERS For DEVOPS
'DOCKER' & CLOUD:  ENABLERS For DEVOPS'DOCKER' & CLOUD:  ENABLERS For DEVOPS
'DOCKER' & CLOUD: ENABLERS For DEVOPS
 
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud Boundaries
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud BoundariesGDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud Boundaries
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud Boundaries
 
Designing CloudStack Clouds
Designing CloudStack CloudsDesigning CloudStack Clouds
Designing CloudStack Clouds
 
F5 OpenShift Workshop
F5 OpenShift WorkshopF5 OpenShift Workshop
F5 OpenShift Workshop
 

Mehr von ShapeBlue

Mehr von ShapeBlue (20)

CloudStack Authentication Methods – Harikrishna Patnala, ShapeBlue
CloudStack Authentication Methods – Harikrishna Patnala, ShapeBlueCloudStack Authentication Methods – Harikrishna Patnala, ShapeBlue
CloudStack Authentication Methods – Harikrishna Patnala, ShapeBlue
 
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlueCloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
 
Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...
Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...
Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...
 
VM Migration from VMware to CloudStack and KVM – Suresh Anaparti, ShapeBlue
VM Migration from VMware to CloudStack and KVM – Suresh Anaparti, ShapeBlueVM Migration from VMware to CloudStack and KVM – Suresh Anaparti, ShapeBlue
VM Migration from VMware to CloudStack and KVM – Suresh Anaparti, ShapeBlue
 
How We Grew Up with CloudStack and its Journey – Dilip Singh, DataHub
How We Grew Up with CloudStack and its Journey – Dilip Singh, DataHubHow We Grew Up with CloudStack and its Journey – Dilip Singh, DataHub
How We Grew Up with CloudStack and its Journey – Dilip Singh, DataHub
 
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
 
CloudStack 101: The Best Way to Build Your Private Cloud – Rohit Yadav, VP Ap...
CloudStack 101: The Best Way to Build Your Private Cloud – Rohit Yadav, VP Ap...CloudStack 101: The Best Way to Build Your Private Cloud – Rohit Yadav, VP Ap...
CloudStack 101: The Best Way to Build Your Private Cloud – Rohit Yadav, VP Ap...
 
How We Use CloudStack to Provide Managed Hosting - Swen Brüseke - proIO
How We Use CloudStack to Provide Managed Hosting - Swen Brüseke - proIOHow We Use CloudStack to Provide Managed Hosting - Swen Brüseke - proIO
How We Use CloudStack to Provide Managed Hosting - Swen Brüseke - proIO
 
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...
 
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
 
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.OnlineKVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online
 
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
 
Use Existing Assets to Build a Powerful In-house Cloud Solution - Magali Perv...
Use Existing Assets to Build a Powerful In-house Cloud Solution - Magali Perv...Use Existing Assets to Build a Powerful In-house Cloud Solution - Magali Perv...
Use Existing Assets to Build a Powerful In-house Cloud Solution - Magali Perv...
 
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
 
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
 
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
 
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueElevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
 
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
 
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
 
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
 

Kürzlich hochgeladen

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Kürzlich hochgeladen (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Monitoring CloudStack and components

  • 1. Monitoring CloudStack and Components August, 22nd 2017 Alexander Stock Cloud Infrastructure Architect ©2017itelligenceclassification:public|version:1.105/17/2017
  • 2. About Me 2 Sysadmin @BIT.Group GmbH – member of itelligence group Experience in Vmware, KVM, Nagios and Ansible Working with CloudStack since 2015 GitHub: https://github.com/AlexanderStock Mail: alexander.stock@bitgroup.de ©2017itelligenceclassification:public5/17/2017 CloudStack Berlin & Dresden, Germany https://www.meetup.com/german-CloudStack-user-group Ansible Dresden, Germany https://www.meetup.com/Ansible-Dresden
  • 3. Overview BIT.Group GmbH (member of itelligence group): 350+ employees in Dresden, Bautzen, Hanover and Shanghai. SAP Consulting, Development and Support. SAP partner and service provider for SAP SE. [Word cloud: IT Consulting Development Cloud IT Infrastructure Management SAP BASIS SAP Solution Manager Application Lifecycle Management International BIT Service Desk SAP Service & Support ITIL SAP HANA Workshops IT Service Management SAP partner]
  • 4. BIT.Group GmbH as part of itelligence / NTT DATA Group: Since June 2016, BIT.Group GmbH has officially been part of itelligence and the NTT DATA Group. Know-how, flexibility and internationality as part of the NTT DATA network. Together an internationally leading full IT service provider with: 3,500+ active SAP customers; locations in 40+ countries; $1.5 billion in SAP revenue worldwide; over 9,000 SAP experts worldwide.
  • 5. Agenda: 1. What do we use for monitoring? 2. MySQL 3. Tomcat 4. CloudStack API 5. Distributed Monitoring
  • 6. What do we use for Monitoring? Why do we monitor CloudStack? Detecting performance issues. Detecting misconfigurations. Detecting resource bottlenecks. Getting a long-term overview of our installations.
  • 7. What do we use for Monitoring? We use Nagios with a frontend called Check_MK. Check_MK: combines passive and active checks; auto-inventory of client hosts; manages hosts, services and reports; Livestatus: a module for accessing the core data of Nagios; can monitor Linux/Unix/Windows/switches/storage… out of the box. Source: https://en.wikipedia.org/wiki/File:Cmk-dashboard.png
  • 8. What do we use for Monitoring? [Diagram: Check_MK architecture. Multisite web platform: Status GUI, BI, WATO, Mobile, Event Console, NagVis, custom applications. Below it: Livestatus, monitoring core (Nagios / Icinga), Event Daemon, PNP4Nagios, RRDTool, CMK Notify. Data sources: Check_MK and Nagios plugins reaching Linux, Solaris, VMS, Windows, HP-UX, AIX, switches, sensors, appliances and routers, plus PING, DNS server, HTTP server and TCP port checks and daemons, via TCP or SSH, TCP/IP, SNMP and ICMP; syslog and SNMP traps.]
  • 9. What do we use for Monitoring? How a Check_MK check cycle works: the Nagios core triggers an active check (the Check_MK script); the Check_MK script polls data from the client over TCP; the Check_MK script writes long-term data to RRD files; the Check_MK script distributes the check results to passive checks. [Diagram: Check_MK server and host agent connected over TCP, with numbered steps 1 to 4 and the labels active check, retrieve data, RRD, passive checks, current state.]
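To make the polling step on slide 9 more concrete, here is a minimal Python sketch of reading an agent's plain-text output over TCP and grouping it by section. It assumes the agent listens on its default TCP port 6556 and marks each section with a `<<<name>>>` header line; the function names and the `demo` sample section are hypothetical, and this is not how Check_MK itself is implemented.

```python
import socket


def fetch_agent_output(host, port=6556, timeout=10.0):
    """Read the complete text dump that the agent sends after connecting."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # the agent closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def split_sections(agent_output):
    """Group agent output lines by their <<<section>>> header."""
    sections = {}
    current = None
    for line in agent_output.splitlines():
        if line.startswith("<<<") and line.endswith(">>>"):
            # Options such as ":sep(0)" follow the section name after a colon.
            current = line[3:-3].split(":", 1)[0]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


# Made-up sample output, used here instead of a live agent.
SAMPLE = (
    "<<<uptime>>>\n"
    "12345.67 54321.00\n"
    "<<<demo:sep(0)>>>\n"
    "first line\n"
    "second line\n"
)

if __name__ == "__main__":
    for name, lines in split_sections(SAMPLE).items():
        print(name, lines)
```

Each section would then feed one or more passive checks, which matches the last step listed on the slide.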
  • 10. MySQL: Check_MK plugin for MySQL
    Installation (monitoring client):
      wget https://<mycheckmkserver>/check_mk/agents/mk_mysql
      mv mk_mysql /usr/lib/check_mk_agent/plugins/
    Configuration (monitoring client):
      vi /etc/check_mk/mysql.cfg
        [client]
        user=monitor
        password=MyPassWord
    Configuration (monitoring server):
      cmk -I <mydbhost>
      cmk -r
  • 11. MySQL: Checks (service and item):
      MySQL DB Size       <database>
      MySQL Connections   mysql
      MySQL DB Slave      mysql
      MySQL InnoDB IO     mysql
      MySQL Version       mysql
    Alternative for pure Nagios: check_mysql_health, an active check for MySQL with advanced features like "cache hit rates" or "slow queries".
  • 12. Tomcat: Check_MK plugin for Tomcat using Jolokia (JMX bridge)
    Installation (monitoring client):
      wget -O jolokia-war-1.3.5.war "http://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-war/1.3.5/jolokia-war-1.3.5.war"
      mv jolokia-war-1.3.5.war /usr/share/cloudstack-management/webapps/jolokia.war
      service cloudstack-management restart
      wget https://<mycheckmkserver>/check_mk/agents/mk_jolokia
      mv mk_jolokia /usr/lib/check_mk_agent/plugins/
    Configuration (monitoring client):
      cd /etc/check_mk/
      wget https://<mycheckmkserver>/itlinfra/check_mk/agents/cfg_examples/jolokia.cfg
    Configuration (monitoring server):
      cmk -I <mytomcathost>
      cmk -r
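Once the Jolokia WAR is deployed, the JVM's JMX data is reachable as JSON over HTTP. The following Python sketch shows one way to verify that by hand: it builds a Jolokia `read` URL for the standard `java.lang:type=Memory` MBean and computes heap usage from the response. The base URL (for example `http://<mytomcathost>:8080/jolokia`) and the function names are assumptions for illustration; the Check_MK plugin does its own querying.

```python
import json
import urllib.request


def jolokia_read_url(base_url, mbean, attribute):
    """Build a Jolokia GET 'read' URL for one MBean attribute."""
    return "{}/read/{}/{}".format(base_url.rstrip("/"), mbean, attribute)


def heap_usage_percent(response_text):
    """Return used heap as a percentage of the maximum heap size."""
    payload = json.loads(response_text)
    if payload.get("status") != 200:
        raise RuntimeError("Jolokia error: {}".format(payload.get("error")))
    value = payload["value"]
    if value["max"] <= 0:  # the JVM reports -1 when no maximum is defined
        raise ValueError("maximum heap size is undefined")
    return round(100.0 * value["used"] / value["max"], 2)


def query_heap(base_url, timeout=10.0):
    """Fetch HeapMemoryUsage from a running Jolokia agent."""
    url = jolokia_read_url(base_url, "java.lang:type=Memory", "HeapMemoryUsage")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return heap_usage_percent(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Made-up response body, so the sketch runs without a Tomcat instance.
    sample = '{"status": 200, "value": {"used": 256, "max": 1024}}'
    print(heap_usage_percent(sample))
```

If `query_heap` fails against a freshly restarted management server, the WAR is most likely not deployed under the expected context path.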
  • 13. Tomcat: Metrics:
      JVM <PORT> <url> Requests
      JVM <PORT> <url> Sessions
      JVM <PORT> GC PS_MarkSweep
      JVM <PORT> GC PS_Scavenge
      JVM <PORT> Memory
      JVM <PORT> ThreadPool http-8080
      JVM <PORT> ThreadPool jk-20400
      JVM <PORT> Threads
      JVM <PORT> Uptime
  • 14. CloudStack API: Check Cloudstack.py: developed by BIT.Group to see what's going on inside CloudStack. A Python script that can monitor different parts of CloudStack. Built as an active check, so it can also be used with plain Nagios. Thresholds can be defined in a JSON file (global thresholds and instance thresholds). Performance data (long-term usage) is produced by the script. Two categories: availability checks and resource checks.
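For readers unfamiliar with what "built as an active check" implies, the sketch below shows the general Nagios plugin convention: one status line, optional performance data after a `|` in the form `label=value[unit];warn;crit`, and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function names and the "greater than or equal" comparison are assumptions for illustration, not the code of the BIT.Group script.

```python
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn=None, crit=None):
    """Map a measured value to a Nagios state (assumes higher is worse)."""
    if crit is not None and value >= crit:
        return CRITICAL
    if warn is not None and value >= warn:
        return WARNING
    return OK


def format_result(label, value, warn=None, crit=None, unit="%"):
    """Return (exit_code, output_line) including performance data."""
    state = evaluate(value, warn, crit)
    perfdata = "{}={}{};{};{}".format(
        label,
        value,
        unit,
        "" if warn is None else warn,
        "" if crit is None else crit,
    )
    line = "{}: {} is {}{} | {}".format(STATE_NAMES[state], label, value, unit, perfdata)
    return state, line


if __name__ == "__main__":
    code, text = format_result("cpu", 37.2, warn=60, crit=95)
    print(text)
    sys.exit(code)
```

The performance data part is what the monitoring core turns into long-term graphs, which is how a plain active check can still feed RRD files.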
  • 15. CloudStack API: Availability checks:
    Host status: status of hosts per cluster; detects whether hosts are reachable and enabled; writes performance data.
      Status for Cluster: kvm01
      Host  Result  Status   Enabled
      hv05  OK      running  yes
      hv03  OK      running  yes
      hv02  OK      running  yes
      hv04  OK      running  yes
      hv01  OK      running  yes
    System VM: global status of all system VMs; writes performance data.
      Name       Status  Running
      v-1405-VM  OK      yes
      s-1406-VM  OK      yes
    Virtual router: global status of all virtual routers; detects whether a VR is up or needs an update; checks redundant routers; writes performance data.
      Name       Status    Running  Upgrade
      r-1289-VM  OK        yes      no
      r-1385-VM  OK        yes      no
      r-1272-VM  Critical  yes      yes
      r-1173-VM  OK        yes      no
      r-1381-VM  OK        yes      no
      Status of redundant VPC Routers
      Name  Status  Status
  • 16. CloudStack API: Resource checks:
    Capacity: status of all global capacity metrics; thresholds can be set in a JSON file; writes performance data for each metric.
      OK: CAPACITY_TYPE_CPU is in status ok. Value:37.2%
      OK: CAPACITY_TYPE_MEMORY is in status ok. Value:71.11%
      OK: CAPACITY_TYPE_STORAGE_ALLOCATED No Thresholds given.Value:26.99%
      OK: CAPACITY_TYPE_VIRTUAL_NETWORK_PUBLIC_IP No Thresholds given. Value:63.03%
      OK: CAPACITY_TYPE_PRIVATE_IP No Thresholds given. Value:3.92%
      OK: CAPACITY_TYPE_VLAN No Thresholds given. Value:92.96%
      OK: CAPACITY_TYPE_DIRECT_ATTACHED_PUBLIC_IP No Thresholds given. Value:2.01%
      OK: CAPACITY_TYPE_SECONDARY_STORAGE No Thresholds given. Value:45.01%
      OK: CAPACITY_TYPE_STORAGE No Thresholds given. Value:19.38%
      OK: CAPACITY_TYPE_LOCAL_STORAGE No Thresholds given. Value:0%
    Domains/Projects: monitors usage metrics for all domains/projects; checks whether domains/projects have reached their resource thresholds; thresholds can be set in a JSON file; writes performance data for all metrics.
      Results for Domain ROOT:
      Results for Domain DOM1:
      Warning: Domain DOM1 has reached threshold for cpu: 80
      Results for Domain DOM2:
      Results for Domain DOM3:
      Results for Domain DOM4:
      Warning: Domain DOM4 has reached threshold for memory: 80
      Results for Domain DOM5:
    Offerings: monitors whether offerings can still be deployed on clusters; thresholds can be defined in a JSON file; writes performance data for each offering.
      Statistics for Cluster: kvm01
      ! Offering ! Count!
      !XL        !    21!
      !XXL       !    12!
      !XXXL      !     5!
      !XXXXL     !     0!
      !XXXXXL    !     0!
      --> Critical: Offering: XXXXL can not be deployed anymore
      --> Critical: Offering: XXXXXL can not be deployed anymore
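The slide does not say how the per-offering count is calculated. One plausible approach, sketched here purely as an assumption, is to ask for every host how many instances of an offering still fit into its free CPU and memory, and to sum that over the cluster. All field names (`free_cpu_mhz`, `free_memory_mb`, `cpu_mhz`, `memory_mb`) and the sample numbers are hypothetical.

```python
def deployable_count(hosts, offering):
    """Count how many instances of an offering still fit on a cluster.

    An instance must fit entirely on one host, so the count is computed
    per host (limited by the scarcer resource) and then summed.
    """
    total = 0
    for host in hosts:
        by_cpu = host["free_cpu_mhz"] // offering["cpu_mhz"]
        by_memory = host["free_memory_mb"] // offering["memory_mb"]
        total += min(by_cpu, by_memory)
    return total


if __name__ == "__main__":
    # Made-up cluster: the second host has plenty of memory but too little CPU.
    hosts = [
        {"free_cpu_mhz": 8000, "free_memory_mb": 16384},
        {"free_cpu_mhz": 2000, "free_memory_mb": 65536},
    ]
    offering = {"cpu_mhz": 4000, "memory_mb": 8192}
    count = deployable_count(hosts, offering)
    print("Critical" if count == 0 else "OK", count)
```

Summing per host rather than dividing cluster-wide totals matters: free capacity scattered across hosts can look sufficient in aggregate while no single host can take another large instance.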
  • 17. CloudStack API: Execution:
      ./cloudstack-resources.py -m <MODE> -f <configfile> -d <optional DomainID> -p <optional ProjectID>
    Config files:
    For domain and project checks:
      {
        "thresholds": {
          "DOM1": {
            "cpu": { "warn": "50", "critical": "90" }
          }
        },
        "global": {
          "cpu": { "warn": "60", "critical": "95" }
        }
      }
    For offering and capacity checks:
      {
        "thresholds": {
          "CAPACITY_TYPE_MEMORY": { "warn": "50", "critical": "80" },
          "CAPACITY_TYPE_CPU": { "warn": "30", "critical": "70" }
        }
      }
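The domain and project config on slide 17 combines instance thresholds with a `global` block. A minimal Python sketch of how such a file could be resolved is shown below; it assumes that an entry under `thresholds` for a specific domain takes precedence over `global`, and that a metric without any entry has no thresholds. The function name is hypothetical and the actual script may resolve thresholds differently.

```python
import json

# Same structure as the domain/project example on the slide.
CONFIG_TEXT = """
{
  "thresholds": {
    "DOM1": {"cpu": {"warn": "50", "critical": "90"}}
  },
  "global": {
    "cpu": {"warn": "60", "critical": "95"}
  }
}
"""


def resolve_thresholds(config, domain, metric):
    """Return (warn, critical) as floats, or None if nothing is configured.

    Assumption: a per-domain entry overrides the global entry.
    """
    entry = config.get("thresholds", {}).get(domain, {}).get(metric)
    if entry is None:
        entry = config.get("global", {}).get(metric)
    if entry is None:
        return None
    # The slide stores the limits as strings, so convert them explicitly.
    return float(entry["warn"]), float(entry["critical"])


if __name__ == "__main__":
    config = json.loads(CONFIG_TEXT)
    for domain in ("DOM1", "DOM2"):
        print(domain, resolve_thresholds(config, domain, "cpu"))
```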
  • 18. CloudStack API: Outlook: checks to come: monitoring of network usage; monitoring optimal VM placement; resource forecasting; monitoring old snapshots. Download: https://exchange.nagios.org/directory/Plugins/Cloud/Check_Cloudstack/details
  • 19. Distributed Monitoring: One master server holds all configurations of the slaves. The status of objects is queried on demand via Livestatus. All data is stored on the slaves. The slaves are configured via API and HTTPS. Slaves provide UI functionality for the customers. Setup can be done over the UI. [Diagram: a master site and slave sites 1 and 2, each with core, state, systems and RRDs, connected via Livestatus.]
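Querying status "on demand via Livestatus" means sending a small text query to the monitoring core of a slave site. The Python sketch below builds such a query and, optionally, sends it over TCP. It assumes Livestatus has been exposed on a TCP port of the slave (6557 is a common choice, but that is site configuration, not a given); the function names are hypothetical.

```python
import json
import socket


def build_query(table, columns, filters=()):
    """Build a Livestatus query; a blank line terminates the request."""
    lines = ["GET {}".format(table), "Columns: {}".format(" ".join(columns))]
    lines.extend("Filter: {}".format(f) for f in filters)
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"


def run_query(host, port, query, timeout=10.0):
    """Send one query to a Livestatus TCP endpoint and decode the JSON rows."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


if __name__ == "__main__":
    # All services currently in state CRITICAL (2).
    print(build_query("services", ["host_name", "description", "state"], ["state = 2"]))
```

Because the master only asks for what it needs at display time, the check results and RRD data can stay on the slaves, as the slide describes.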
  • 20. Distributed Monitoring: Hosts and settings are configured over the UI or the API. Automation with Chef, Ansible… Central overview of all systems. Rules can be maintained centrally. [Diagram: an isolated monitoring network next to the isolated networks of customer A and customer B; labels: UI access, user, replication of settings and query of Livestatus, check of servers.]
  • 21. Summary: Detecting performance issues: solved through MySQL and Tomcat checks. Detecting misconfigurations: solved through availability checks via the API. Detecting resource bottlenecks: solved through resource checks via the API. Getting a long-term overview of our installations: all checks produce RRD files, which can be used for analysis over a long period.
  • 23. Questions? Contact: Alexander Stock, Cloud Infrastructure Architect, alexander.stock@bitgroup.de. BIT.Group GmbH (member of itelligence group). We make the most of SAP® solutions!
  • 24. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of itelligence AG. The information contained herein may be changed without prior notice. Some software products marketed by itelligence AG and its distributors contain proprietary software components of other software vendors. All product and service names mentioned and associated logos displayed are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary. Copyright itelligence AG - All rights reserved. The information in this document is proprietary to itelligence. This document is a preliminary version and not subject to your license agreement or any other agreement with itelligence. This document contains only intended strategies, developments and product functionalities and is not intended to be binding upon itelligence to any particular course of business, product strategy, and/or development. itelligence assumes no responsibility for errors or omissions in this document. itelligence does not warrant the accuracy or completeness of the information, text, graphics, links, or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. itelligence shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. This limitation shall not apply in cases of intent or gross negligence. The statutory liability for personal injury and defective products is not affected. itelligence has no control over the information that you may access through the use of hot links contained in these materials and does not endorse your use of third-party Web pages nor provide any warranty whatsoever relating to third-party Web pages.