With the arrival of the cloud and the business focus on service-based reporting, capturing data for Capacity Management has never been more important.
These slides discuss the challenges of capturing the data required to answer the demands of the business.
1. www.metron-athene.com
Data, data, everywhere, and not a bit to use.
3. www.metron-athene.com
Session Agenda
• Why talk about data?
• Business demands
• My basic principles
• Data Capture Techniques
• Data Sources
• The Obligatory Cloud Part
• APM
• CMIS
• Reality
4. www.metron-athene.com
Why talk about data?
• Dashboard, Dashboard, Dashboard
• Alerts
• Automation
• CMIS
• What does that all sit on?
• Raw data
• Ever-increasing number of requests to take data
from an ever-increasing number of sources
6. www.metron-athene.com
A Problem Shared
• Data you want
• Cool data you got hold of
• Solutions you found
• Write them down, scrunch the paper up, and throw it
up to the front here.
– Put your name on it
– No prizes for hitting the presenter
– Or just ask later
8. www.metron-athene.com
What businesses are asking for
• Have data for everything
– Internal to a system
– Across all infrastructure (build a service picture)
– Business volumes & transaction response times
• Don’t deploy more agents
• Ensure reliable data
• Minimal Storage
• No staff
9. www.metron-athene.com
Where a lot of people are
• A handful of tools for specific platforms
– Designed for sys admin roles
– No single person can access them all
• No business data
– Projections based on resource utilisations
• Huge volumes of “out of reach” data
• Some Agents, some SNMP capture, some stuff
nobody understands anymore
• Limited staff
11. www.metron-athene.com
Data Capture (My Basic Principles)
• More is (just this once) more
• At data capture time get everything you will need
– Time travel is still fiction
– Quality is important
– Put it under YOUR control
• Full service picture
– Resource
– Application
– Network
– SAN
– Business data
13. www.metron-athene.com
Capture Techniques
• Agentless (SNMP, WMI, etc.)
– Subject to more security issues, and dependent on network quality
– Broken communication = lost data
– Easier/Faster implementation
– (Often) less data, of lower quality
• Agent Based
– Autonomous
• Data collected by a local process. If the server is up, data
capture is running.
• Broken communication = catch up later
– Possibility to use existing Agents
– Overhead (system and human)
• Vote now!
• Blended Delivery Model
– Where most people are.
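The "broken communication = catch up later" behaviour of an agent can be sketched as a local buffer that is only emptied once a send succeeds. This is a minimal illustration, not any vendor's agent: the `BufferingAgent` class, the `send` callback and the sample layout are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Sample = Dict[str, float]


class BufferingAgent:
    """Hypothetical agent: stores samples locally, forwards when the link is up."""

    def __init__(self, send: Callable[[List[Sample]], bool], max_buffer: int = 10_000):
        # `send` is an assumed transport callback that returns True on success.
        self._send = send
        # A bounded buffer: once full, the oldest samples are dropped first.
        self._buffer: Deque[Sample] = deque(maxlen=max_buffer)

    def collect(self, sample: Sample) -> None:
        # Always store locally first, so a broken link does not lose the sample.
        self._buffer.append(sample)

    def flush(self) -> int:
        """Try to forward everything buffered; return how many samples were sent."""
        if not self._buffer:
            return 0
        batch = list(self._buffer)
        if self._send(batch):
            self._buffer.clear()
            return len(batch)
        # Link down: keep the data and catch up on a later flush.
        return 0

    @property
    def pending(self) -> int:
        return len(self._buffer)
```

An agentless poller has no equivalent of `pending`: if the poll fails, the interval is simply missing.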
15. www.metron-athene.com
Application/Service Data
• Databases
– Well thought out APIs or Windows Counters
– Well thought out Agents do this
• SAP
– Various transactions return performance data (e.g. ST03)
• What if there is no designed interface?
– Logs, databases, write your own instrumentation
– APM Tools
16. www.metron-athene.com
Business/Application Transaction data (APM)
• A user action = A transaction
– Log on, Search, Add to Basket, Checkout, Payment = 5
transactions
• Benefits
– Common language
– Service based
– Defined SLAs
– Real workload volumes (Planning benefits)
• Usual Difficulties
– No tool capturing this data (Ask me for a recommendation)
– No access to the data held (Typically controlled by Operations)
– No import facility to capacity tool
• Avoid
– Exporting data from both tools into Excel and manually cutting
and pasting to get combined reports
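As a rough sketch of what "a user action = a transaction" gives you once the data is accessible, the snippet below turns raw transaction records into counts and response times per transaction type. The record layout (a type name plus a response time in seconds) and the function name are assumptions for illustration; a real APM tool will have its own export format.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def summarise_transactions(
    records: Iterable[Tuple[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Group (transaction_type, response_seconds) records by transaction type."""
    by_type: Dict[str, List[float]] = defaultdict(list)
    for tx_type, seconds in records:
        by_type[tx_type].append(seconds)
    # One summary row per transaction type: volume plus response-time figures.
    return {
        tx_type: {
            "count": len(times),
            "mean_seconds": mean(times),
            "max_seconds": max(times),
        }
        for tx_type, times in by_type.items()
    }
```

The counts are the real workload volumes for planning; the response times are what the SLA is written against.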
17. www.metron-athene.com
SANs
• Challenge
– IOPS remains the biggest bottleneck
– A surprising number of capacity managers are unaware of
the storage capacity available
• Where to get data
– SMI-S (Storage Management Initiative – Standard)
– PowerShell Plugins
• Learn PowerShell or learn to serve fries (some dude 2008)
– Storage Vendor central control server
• Operations Manager, StorageWorks, ControlCenter
• Using the data?
– Bring it into your capacity tool
18. www.metron-athene.com
In the last 6 months
• Business / Customer transaction reports (multiple
types)
• OpenVMS T4 data
• Historical CPU & Memory data from home-grown
scripts
• NetApp, HP EVA
• IP Pool allocation
• Datacenter temperature & power
19. www.metron-athene.com
More detailed example
• NetApp
– Operations Manager DFM CLI Export
– Occupancy and performance data for all LUNs, Volumes,
Aggregates & Systems connected to Operations Manager.
– dfm data export run -d comma -t "5 mins" -f avg -h "1 day"
– Database tables in .csv
– Script to produce something “nicer” to import
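The "nicer to import" step could look something like the sketch below, which pivots a long-format CSV into one row per timestamp and object. The column names (`timestamp`, `object_name`, `counter`, `value`) are purely hypothetical and are not the actual layout of the dfm export tables; check the exported files and adjust before using anything like this.

```python
import csv
import io
from typing import Dict, List, Tuple


def pivot_counters(csv_text: str) -> List[Dict[str, str]]:
    """Turn one-row-per-counter CSV into one row per (timestamp, object).

    Assumes hypothetical columns: timestamp, object_name, counter, value.
    """
    rows: Dict[Tuple[str, str], Dict[str, str]] = {}
    for rec in csv.DictReader(io.StringIO(csv_text)):
        key = (rec["timestamp"], rec["object_name"])
        # Create the output row on first sight of this timestamp/object pair.
        row = rows.setdefault(
            key, {"timestamp": key[0], "object_name": key[1]}
        )
        # Each counter becomes its own column in the output row.
        row[rec["counter"]] = rec["value"]
    # Sort by timestamp, then object name, for a stable import order.
    return [rows[key] for key in sorted(rows)]
```

Values are left as strings so the capacity tool's own importer decides how to parse them.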
22. www.metron-athene.com
Basic Cloud Types & Challenges (IaaS)
• Public Cloud (Worst Case)
– No control
– You put your faith in the provider
– Monitor response times only?
• Private Cloud (Best Case)
– Full control
– You are responsible, but have all the data
• Community Cloud (Never seen)
– Potential control
– You are involved and may have access to the data
• Hybrid Cloud (Where you’re likely to be)
– Some control
– Full control of the Private Cloud portion only
23. www.metron-athene.com
Want to Benchmark the Public cloud?
• “How hard can it be?” (Jeremy Clarkson)
• Get a VM up and running and see what workload it can
handle
– AWS results all over the place
• Somebody else must have looked into this:
• http://www.spec.org/osgcloud/
– Still working on it…
– Join in? (I’m short of the $10,000 required…)
• http://datasys.cs.iit.edu/events/MTAGS12/i02.pdf
– IaaS Cloud Benchmarking: Approaches, Challenges, and Experience
– Alexandru Iosup, Radu Prodan, and Dick Epema
24. www.metron-athene.com
Problems with benchmarking the cloud
• Cloud evolution
– Changes made under your feet
– We are no longer in the loop
“commercial clouds such as Amazon EC2 add frequently new functionality to
their systems. Thus, the benchmarking results obtained at any given time may
be unrepresentative for the future behaviour of the system.” Alexandru Iosup, Radu Prodan,
and Dick Epema
• So why don’t we continually benchmark the cloud?
– Because it’s complex and expensive (Challenge 1 = how to do it
cheaply)
“A straightforward approach to benchmark both short-term dynamics and
long-term evolution is to measure the system under test periodically, with
judiciously chosen frequencies [26]. However, this approach increases the
pressure of the so-far unresolved Challenge 1.” Alexandru Iosup, Radu Prodan, and Dick Epema
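The periodic measurement idea in the quote can be sketched in a few lines: time the same probe at a fixed interval and compare the latest median against a stored baseline. This is only an illustration of the approach under simple assumptions. The `probe` callable, the interval and the 20% tolerance are all made up for the example, and it does nothing to solve the cost problem.

```python
import statistics
import time
from typing import Callable, List


def sample_latency(
    probe: Callable[[], None],
    runs: int,
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Time `probe` `runs` times, waiting `interval_seconds` between runs."""
    timings: List[float] = []
    for i in range(runs):
        start = time.perf_counter()
        probe()  # hypothetical workload, e.g. a request against the cloud service
        timings.append(time.perf_counter() - start)
        if i < runs - 1:
            sleep(interval_seconds)
    return timings


def drifted(baseline: List[float], latest: List[float], tolerance: float = 0.2) -> bool:
    """True if the latest median is more than `tolerance` worse than the baseline."""
    return statistics.median(latest) > statistics.median(baseline) * (1 + tolerance)
```

Keeping every run, not just the summary, matters here: the baseline itself will need re-examining as the provider changes things under your feet.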
25. www.metron-athene.com
Benchmarking (more problems)
• Even with lots of data, you’ll have a hard time
making it fit reality because you cannot replicate
all the software involved.
“We have surveyed in our previous work [26], [27] over ten
performance studies that use common benchmarks to assess the
virtualization overhead on computation (5–15%), I/O (10–30%), and
HPC kernels (results vary). We have shown in a recent study of four
commercial IaaS clouds [27] that virtualized resources obtained from
public clouds can have a much lower performance than the
theoretical peak, possibly because of the performance of the
middleware layer.” Alexandru Iosup, Radu Prodan, and Dick Epema
26. www.metron-athene.com
Long term observation
“We have observed the long-term evolution in performance of clouds
since 2007. Then, the acquisition of one EC2 cloud resource took an
average time of 50 seconds, and constantly increased to 64 seconds
in 2008 and 78 seconds in 2009. The EU S3 service shows
pronounced daily patterns with lower transfer rates during night hours
(7PM to 2AM), while the US S3 service exhibits a yearly pattern with
lowest mean performance during the months January, September,
and October. Other services have occasional decreases in
performance, such as SDB in March 2009, which later steadily
recovered until December [26].” Alexandru Iosup, Radu Prodan, and Dick Epema
27. www.metron-athene.com
Final nail in the coffin
“Depending on the provider and its middleware abstraction, several
cloud overheads and performance metrics can have different
interpretation and meaning.” Alexandru Iosup, Radu Prodan, and Dick Epema
• So you can’t trust the data from clouds to be what you expect.
• And you can’t trust your existing benchmarks to represent the
future.
• So…what can you do?
28. www.metron-athene.com
Private Cloud
• You are in charge and you monitor the hardware
utilisations
• The Cloud still has physical limits, and soft
“limits”
– Resource Pools, Reservations, etc.
• Opportunity
– Resource Utilisation and Service Information combined
• Users, Processes, Transactions, Business Volumes
• Challenge
– Business decision based on easy capacity monitoring?
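One way to picture "Resource Utilisation and Service Information combined" is a simple join of the two data sets on the capture interval, giving a cost per unit of business work. The sketch below assumes both feeds are already keyed by the same interval label; the function name and the data shapes are illustrative only.

```python
from typing import Dict


def cpu_per_transaction(
    cpu_busy_seconds: Dict[str, float],
    transactions: Dict[str, int],
) -> Dict[str, float]:
    """CPU seconds consumed per business transaction, per capture interval."""
    result: Dict[str, float] = {}
    # Only intervals present in BOTH feeds can be combined.
    for interval in sorted(cpu_busy_seconds.keys() & transactions.keys()):
        count = transactions[interval]
        if count > 0:  # skip idle intervals rather than divide by zero
            result[interval] = cpu_busy_seconds[interval] / count
    return result
```

With a figure like this, a forecast can start from expected business volumes instead of from a utilisation trend line.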
30. www.metron-athene.com
APM
• Transaction times
• Transaction counts
• Transaction type
• End to end
• Per server
• All that information you could never get from the
business, in one handy location
34. www.metron-athene.com
CMIS
• Centralise
– A single interface to all data
• Organise it
– Mirror the organisation
• Automate
– Computers are great at performing
repetitive tasks. Use them.
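A toy version of "centralise, organise, automate" is a single table that every data source loads into, tagged by service so that reports can mirror the organisation. The schema, table name and function names below are assumptions for illustration, using Python's built-in `sqlite3`; a real CMIS will be considerably richer.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str, str, float]  # ts, service, source, name, value


def open_cmis(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical central metric store."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS metric (
               ts      TEXT NOT NULL,
               service TEXT NOT NULL,
               source  TEXT NOT NULL,
               name    TEXT NOT NULL,
               value   REAL NOT NULL
           )"""
    )
    return db


def load(db: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Single entry point for every feed: agents, SNMP, APM, storage exports."""
    db.executemany("INSERT INTO metric VALUES (?, ?, ?, ?, ?)", rows)
    db.commit()


def service_view(db: sqlite3.Connection, service: str) -> List[Tuple[str, str, str, float]]:
    """Everything known about one service, whichever tool captured it."""
    cur = db.execute(
        "SELECT ts, source, name, value FROM metric "
        "WHERE service = ? ORDER BY ts, source, name",
        (service,),
    )
    return cur.fetchall()
```

Run `load` from a scheduler for each feed and the "automate" part takes care of itself.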
36. www.metron-athene.com
In reality how do people get data?
• From other internal teams
– Process and reprimands
• From the outsourcer
– Contract
– Enforce it
• The 1st rule of data club:
– Data supplier uses their own
tools
• Requires:
– Sponsor with teeth
37. www.metron-athene.com
So what conclusions do I draw?
• Be flexible
– You will have to take the data from whatever already exists
• Stand Your Ground
– Don’t make work for yourself (You don’t have the staff)
– They deliver the data, you keep it.
• Introduce APM/BTM tools
– The typical missing element
• Centralise the data at capture time
• Know your cloud strategy and get in early with
requirements