Monitoring cloud applications presents unique challenges due to the scale and diversity of infrastructure resources and applications. A monitoring-as-a-service solution should scale dynamically, have minimal impact on monitored resources, and allow for customization of monitored metrics. Key benefits include end-to-end monitoring, ease of use, reliability, and cost effectiveness.
3. Context
Are agreed service levels met?
Overall, how many applications are healthy vs. unhealthy?
Is the health getting worse over time?
Are the business functions being performed as expected?
Is there spare capacity within the applications?
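The questions above can be answered from a small amount of per-application data. Below is a minimal sketch. It assumes each application reports a boolean health flag and a measured availability alongside its agreed SLA target. The `AppStatus` record and both helper functions are hypothetical and do not belong to any particular product.

```python
from dataclasses import dataclass


@dataclass
class AppStatus:
    """Hypothetical per-application health record."""
    name: str
    healthy: bool        # result of the latest health check
    availability: float  # measured over the reporting period, e.g. 0.9995
    sla_target: float    # agreed availability, e.g. 0.999


def health_summary(apps):
    """Return (healthy_count, unhealthy_count, names of apps missing their SLA)."""
    healthy = sum(1 for a in apps if a.healthy)
    breaches = [a.name for a in apps if a.availability < a.sla_target]
    return healthy, len(apps) - healthy, breaches


def is_worsening(healthy_fractions):
    """True if the latest healthy fraction is below the average of earlier periods.

    `healthy_fractions` is ordered oldest first, one value per period.
    """
    if len(healthy_fractions) < 2:
        return False
    earlier = healthy_fractions[:-1]
    return healthy_fractions[-1] < sum(earlier) / len(earlier)
```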
4. Context
Cloud Complexity
Scale and diversity of the infrastructure
- Servers, network devices, storage, etc.
- Hundreds, even thousands of machines
Massive number of user applications
- Catastrophic consequences of failure, security breaches, or performance degradation
5. Context
Resource utilization is tightly coupled with the cost incurred by customers
Monitoring is indispensable
Availability, failure detection
Performance, provisioning
Security, anomaly detection
Application-level monitoring
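Availability monitoring and failure detection are often built on heartbeats. Below is a minimal sketch of a timeout-based heartbeat failure detector. It assumes each monitored node reports in periodically. The `HeartbeatMonitor` class and its timeout are illustrative assumptions. The injectable clock is there only to make the sketch testable.

```python
import time


class HeartbeatMonitor:
    """Timeout-based failure detector: a node is suspected failed when no
    heartbeat has arrived for more than `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock      # injectable so tests can control time
        self.last_seen = {}     # node name -> time of last heartbeat

    def heartbeat(self, node):
        """Record that `node` reported in just now."""
        self.last_seen[node] = self.clock()

    def failed_nodes(self):
        """Return nodes whose last heartbeat is older than the timeout, sorted."""
        now = self.clock()
        return sorted(n for n, seen in self.last_seen.items()
                      if now - seen > self.timeout)
```

A fixed timeout trades detection speed against false alarms. Nodes on congested networks may need a longer timeout than the value chosen here.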
6. Challenges - Overview
Inherits the performance monitoring challenges of the virtualized world
End-user response time – a primary metric
Mechanisms to collect data from various sources
Managing agents
Monitor, identify & heal bottlenecks
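Averages hide slow outliers, so end-user response time is usually reported as percentiles. Below is a minimal sketch using the nearest-rank method. The function name and sample values are illustrative.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of response-time samples, for 0 < p <= 100."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

Consider the samples `[120, 95, 110, 400, 105, 98, 130, 101, 99, 2000]` (ms). The median is 105 ms, but the 95th percentile is 2000 ms. That exposes the slow tail that a single average would blur.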
7. Challenges - Overview
Detect performance degradation:
A single malfunctioning application on a guest can degrade the performance of the host and other resources
Resource contention among applications executing on VMs may hamper performance
Virtual machines may not be configured with sufficient resources to handle the workload
9. Challenges – A Closer Look
[Figure: Cloud Monitoring, with System Challenges, User Challenges, and Network Challenges]
10. Challenges – System Level
Efficient Scalability:
Monitoring tasks – tens of thousands
Cost-effective – minimize resource usage
Facilitating service
11. Challenges – System Level
Efficient Scalability:
Massive Scale
Monitor inherently large-scale tasks
Large number of users
- Infrastructure monitoring
- Application monitoring
Monitor tasks with high cost, e.g., resources with high consumption
12. Challenges – System Level
Monitoring QoS Assurance:
SLA management
Application security
Federated identity of cloud applications
Secure integration of cloud apps with on-premises apps
Multi-tenant environment
Authorization & access control
Monitor contention between monitoring tasks
13. Challenges – User Level
Continuous violation detection
Need a different detection model – dynamically add/remove servers based on performance
Achieve efficiency at the same time
Short-term burst vs. persistent violation
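A sliding window is one way to separate short-term bursts from persistent violations: act only when the threshold is exceeded in at least k of the last n samples. Below is a sketch based on that assumption. The class name, threshold, and window sizes are hypothetical. A "persistent" result is the kind of signal that might trigger adding a server, while a "burst" might only be logged.

```python
from collections import deque


class ViolationDetector:
    """Classifies each sample as 'ok', 'burst' or 'persistent' using the
    last `window` samples: `min_violations` or more breaches of `threshold`
    within the window counts as a persistent violation."""

    def __init__(self, threshold, window, min_violations):
        self.threshold = threshold
        self.min_violations = min_violations
        self.recent = deque(maxlen=window)  # True where a sample breached

    def observe(self, value):
        """Record one sample and classify the current state."""
        self.recent.append(value > self.threshold)
        if sum(self.recent) >= self.min_violations:
            return "persistent"   # sustained: candidate for scaling out
        if self.recent[-1]:
            return "burst"        # isolated spike: note it, but don't act yet
        return "ok"
```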
14. Challenges – Network Level
Resource-aware monitoring fabric
Monitoring the functioning of both systems and applications running on large-scale distributed systems
Continuously collecting detailed attribute values across:
- A large number of nodes
- A large number of attributes
Overhead increases quickly as the system, applications, and monitoring tasks scale up
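Adaptive sampling is one commonly used way to stop monitoring overhead from growing with scale. The idea is to sample slowly while a metric is stable and quickly when it changes. Below is a minimal sketch. It assumes a relative-change tolerance and a doubling back-off. All names and parameters are illustrative.

```python
class AdaptiveSampler:
    """Lengthens the sampling interval while a metric is stable and resets it
    when the metric changes, trading detail for lower monitoring overhead."""

    def __init__(self, min_interval, max_interval, tolerance):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.tolerance = tolerance  # relative change still treated as "stable"
        self.interval = min_interval
        self.last = None

    def next_interval(self, value):
        """Feed the latest sample; return seconds to wait before the next one."""
        if self.last is not None:
            base = abs(self.last) or 1.0  # avoid dividing by zero
            change = abs(value - self.last) / base
            if change <= self.tolerance:
                # Stable: back off, up to the maximum interval.
                self.interval = min(self.interval * 2, self.max_interval)
            else:
                # Changing: return to the fastest sampling rate.
                self.interval = self.min_interval
        self.last = value
        return self.interval
```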
15. Performance Monitoring
Understand the performance of virtual infrastructure – outside-in approach
Troubleshoot bottlenecks
Plan future needs
17. CPU
CPU saturated?
High Ready time
Problematic if it is sustained for long periods
Possible contention for CPU resources among VMs
Workload Variability?
Resource limits on VMs?
Actual overcommitment?
High SwapWait time
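High ready time is easier to judge as a percentage of the sampling interval. The sketch below assumes the ready counter is reported as milliseconds summed over each sampling interval, as vSphere's CPU ready summation counter is. Real-time charts there use 20-second intervals. The threshold and the run length that counts as "sustained" are hypothetical parameters.

```python
def ready_percent(ready_ms, interval_s, vcpus=1):
    """Convert ready time (ms summed over one sampling interval) into a
    percentage of that interval, averaged per vCPU."""
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


def sustained_high(readings, threshold_pct, min_consecutive):
    """True if ready % exceeds the threshold for `min_consecutive`
    samples in a row (isolated spikes are ignored)."""
    run = 0
    for pct in readings:
        run = run + 1 if pct > threshold_pct else 0
        if run >= min_consecutive:
            return True
    return False
```

For example, 2000 ms of ready time in a 20-second interval is 10% on a single-vCPU VM, or 5% per vCPU on a two-vCPU VM. The alert threshold is a site-specific choice, not a fixed rule.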
22. Monitoring-as-a-Service
Similar to other cloud services
Database service (e.g. SimpleDB, Datastore)
Storage service (e.g. S3)
Application service (e.g. AppEngine)
23. High Level Solution
Applications, events & alerts
Server – CPU, memory, disk I/O
Packet rate, bandwidth, NICs
Customization
Gather data from various resources
Trend analysis
24. Monitoring-as-a-Service
External monitoring – web server, file server, mail server, VoIP
Server monitoring – CPU, memory, processes, storage
Network monitoring – HTTP, SSH, SNMP, discovery
Transaction monitoring – multi-step apps, workflows
Cloud monitoring – track running instances, auto-deploy, usage
Web traffic monitoring – visitors, page views
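External monitoring of a web server can start with a simple synthetic check that records availability and response time. Below is a minimal sketch that uses only Python's standard library. The function name and default timeout are assumptions. A hosted service would run checks like this on a schedule.

```python
import time
import urllib.error
import urllib.request


def check_endpoint(url, timeout=5.0):
    """Fetch `url` once; return (is_up, http_status_or_None, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include body transfer in the measured time
            status = resp.status
        return 200 <= status < 400, status, time.perf_counter() - start
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return False, err.code, time.perf_counter() - start
    except OSError:
        # DNS failure, refused connection, timeout, etc.: no status available.
        return False, None, time.perf_counter() - start
```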
25. Key Highlights
Scale dynamically
Have minimal (or no) impact on the monitored infrastructure
Should be portable and lightweight
Easy feature customization: not every user needs every metric monitored in the cloud
Heavyweight network-based monitoring tools may not be a good fit
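One way to meet the portability and customization goals is to keep each collector cheap and let every user enable only the metrics they need. Below is a minimal sketch with a hypothetical registry of standard-library collectors. The metric names are illustrative.

```python
import os
import shutil
import time


def _disk_used_pct(path="/"):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


# Hypothetical registry: each entry is a cheap, stdlib-only collector.
COLLECTORS = {
    "cpu_count": os.cpu_count,
    "disk_used_pct": _disk_used_pct,
    "timestamp": time.time,
}


def collect(enabled):
    """Run only the collectors named in `enabled` and return {name: value}."""
    unknown = [name for name in enabled if name not in COLLECTORS]
    if unknown:
        raise KeyError(f"unknown metrics: {sorted(unknown)}")
    return {name: COLLECTORS[name]() for name in enabled}
```

Collectors that are not enabled are never called, so an unused metric costs nothing on the monitored host.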
26. Key Highlights
Comprehensive monitoring of resource performance and availability
Applications, databases, middleware and web servers
Provide new ways to fetch data as business needs grow
Dashboard, views, reports
Correlate information from different sources
Trend analysis
Predict bottlenecks
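Trend analysis and bottleneck prediction can be sketched by fitting a least-squares line to a metric and extrapolating it to a capacity limit. This assumes roughly linear growth, which is a simplification. Seasonal or bursty workloads need better models. The function names are illustrative.

```python
def linear_trend(values):
    """Least-squares (slope, intercept) for values sampled at t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_v - slope * mean_t


def periods_until(values, limit):
    """Estimate how many more periods until the fitted trend reaches `limit`.

    Returns None if the trend is flat or falling (no bottleneck predicted).
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    last_t = len(values) - 1
    return max(0.0, (limit - intercept) / slope - last_t)
```

Take daily disk usage of 60, 62, 64, 66 and 68%. The fitted slope is 2 points per day, so a 90% limit would be reached in about 11 more days.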
28. Summary
The cloud is complex; monitoring is indispensable
End-user response time is the primary focus
Cloud services must be treated differently from on-premises software when it comes to systems monitoring
Do not rely on vendors completely; if SLAs matter, maintain your own logs
Existing tools are good, but use programmatic APIs for specific needs