© 2013 SL Corporation. All Rights Reserved.1
Get the Big Picture!
End-to-End Monitoring of Heterogeneous Middleware and Apps
Tom Lubinski, CTO, SL Corporation
8/9 October, 2013
RTView – Get the Big Picture!
• What is end-to-end monitoring?
• Why should you care about it?
• Traditional system management tools are not enough
• Agent-based transaction monitoring tools are not enough
• What RTView does differently to address this need
• Customer Use Case
What is End-to-End Monitoring?
• Most people think this is end-to-end monitoring:
A → B → C → D → E
A linear data flow, where A, B, C are JSPs, Servlets, Topics, Queues, etc.
What is End-to-End Monitoring?
• Sometimes it can be more complex, with loops and error paths, but it’s still “one-dimensional”:
[Diagram: the flow A, B, C, D, E extended with nodes F and G and an Error path]
What is End-to-End Monitoring?
• In practice, it’s even more complex than this
• Two more dimensions must be added to the picture …
What is End-to-End Monitoring?
• A second dimension captures the nested levels in which systems are implemented: component layering
Host Layer: Physical Servers, Network, Disk, OS
App Server: Servlet, JSP, EJB
Messaging: Topic, Queue, Route
Caching: Cache, Service
What is End-to-End Monitoring?
• A third dimension that most people don’t even think of: process distribution
[Diagram: load-balanced WebLogic Servers backed by a distributed Coherence Cache]
Why Should You Care?
• These three dimensions create complexity, and all of them are in play at the same time
• Taken together, they generate huge volumes of monitoring data and complex relationships between the application and the underlying components
• Add in a fourth dimension, time, and the challenge becomes even greater
Why Should You Care?
• System management tools give you lots of data, but typically one component at a time: they cover only the 2nd dimension
• Transaction monitoring tools cover only the 1st dimension
• Both are essentially “after the fact”, like alerts: you can tell when a transaction or component has failed, but the failure still needs to be correlated with the state of the entire system
How RTView Addresses This Problem
• RTView collects data from all these dimensions and is aware of the relationships between them
• If an app depends on a specific message queue, RTView can map this to the JMS server that contains the queue, as well as to the system that hosts the server, and correlate the metrics
• RTView is aware of server load balancing and the distributed nature of Coherence caches
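The app-to-queue-to-JMS-server-to-host mapping described on this slide can be pictured as a walk along a dependency chain. The following is only a minimal sketch of that idea, not RTView's implementation: the component names (`order-app`, `orders.queue`, `jms-server-1`, `host-17`), the `DEPENDS_ON` and `METRICS` tables, and all metric values are hypothetical.

```python
# Hypothetical dependency table: each component points at what it uses or runs on.
DEPENDS_ON = {
    "order-app": "orders.queue",     # the app depends on a message queue
    "orders.queue": "jms-server-1",  # the queue is contained in a JMS server
    "jms-server-1": "host-17",       # the JMS server is hosted on a system
}

# Hypothetical latest metric samples, keyed by component name.
METRICS = {
    "order-app": {"requests_per_sec": 120},
    "orders.queue": {"pending_messages": 4500},
    "jms-server-1": {"heap_used_pct": 91},
    "host-17": {"cpu_pct": 97},
}


def dependency_chain(component):
    """Return the component followed by everything it transitively depends on."""
    chain = [component]
    while chain[-1] in DEPENDS_ON:
        nxt = DEPENDS_ON[chain[-1]]
        if nxt in chain:  # guard against accidental cycles in the table
            break
        chain.append(nxt)
    return chain


def correlated_view(component):
    """Collect the metrics of every component along the chain, in order."""
    return [(name, METRICS.get(name, {})) for name in dependency_chain(component)]


if __name__ == "__main__":
    for name, metrics in correlated_view("order-app"):
        print(name, metrics)
```

Reading the four rows side by side is the point of the exercise: a backlog on the queue can be viewed next to the state of the JMS server and the host that sit underneath it.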
How RTView Addresses This Problem
www.sl.com
RTView Use Case
Application: On-Line Store
• Large on-line store providing product search and ordering services for consumer software products
• WebLogic for Application Server
• Coherence for Database Caching
• Multiple other technologies, including TIBCO EMS for communications services
• VMware Virtual Infrastructure
RTView Use Case
Application Team
• Small group (< 10) responsible for 100+ WebLogic Servers plus 100 Coherence nodes, replicated in DEV, TEST, and PROD
• Some peripheral monitoring, e.g. Splunk for log files, Omniture for web tracking, Oracle Enterprise Manager (OEM) for WebLogic Server (WLS) and Oracle Coherence (OC)
RTView Use Case
Support Challenges
• Difficult to have confidence that the store is “OK”
• The team only knows when something goes wrong
• When it does, it is difficult to determine the cause
• Each subsystem is implemented as a WebLogic cluster
• WebLogic: OEM only allows them to see one server at a time, but the store works off of “clusters”
• Coherence: a complete black box
RTView Use Case
RTView Solution
• A high-level overview diagram showing “health state” provides confidence that all subsystems are OK
• Presenting WebLogic information in “clusters” makes it possible to see aggregate metrics and load balancing for each app
• Correlating Coherence metrics with WebLogic provides confidence that the Coherence black box is OK
On-Line Store Overview Diagram
RTView used to create system overview diagram
WebLogic Cluster/Server Summary
All Servers Organized by Cluster, with Health State
WebLogic Cluster App Summary
Each Cluster shown as a unit, with server metrics aggregated
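Showing a cluster as a unit amounts to rolling per-server samples up by cluster name. The sketch below illustrates one way this could be done; the cluster names, server names, metric choices, and values in `SAMPLES` are hypothetical and are not taken from the presentation or from RTView.

```python
from collections import defaultdict

# Hypothetical per-server samples: (cluster, server, open sessions, requests/sec).
SAMPLES = [
    ("search", "wls-01", 40, 110.0),
    ("search", "wls-02", 35, 90.0),
    ("checkout", "wls-03", 12, 20.0),
    ("checkout", "wls-04", 18, 40.0),
]


def aggregate_by_cluster(samples):
    """Sum sessions and request rate per cluster and count the member servers."""
    totals = defaultdict(
        lambda: {"servers": 0, "sessions": 0, "requests_per_sec": 0.0}
    )
    for cluster, _server, sessions, rate in samples:
        entry = totals[cluster]
        entry["servers"] += 1
        entry["sessions"] += sessions
        entry["requests_per_sec"] += rate
    return dict(totals)


if __name__ == "__main__":
    for cluster, summary in aggregate_by_cluster(SAMPLES).items():
        print(cluster, summary)
```

Sums suit counters such as sessions and request rate; for metrics like heap usage, an average or maximum per cluster would probably be the more meaningful aggregate.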
Load Balance Analysis
Load Balance Comparison of multiple metrics across WebLogic and Coherence
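One simple way to quantify load balance is to compare each member's share of the work with a perfectly even share. This is an illustrative sketch under that assumption, not the calculation RTView performs; the function names and the idea of using request counts as the metric are hypothetical, and the same computation could be applied to WebLogic servers or to Coherence nodes.

```python
def load_shares(requests_by_server):
    """Return each server's fraction of the total request count."""
    total = sum(requests_by_server.values())
    if total == 0:
        return {name: 0.0 for name in requests_by_server}
    return {name: count / total for name, count in requests_by_server.items()}


def imbalance(requests_by_server):
    """Ratio of the busiest server's share to an even share (1.0 means balanced)."""
    shares = load_shares(requests_by_server)
    if not shares:
        return 0.0
    even_share = 1.0 / len(shares)
    return max(shares.values()) / even_share


if __name__ == "__main__":
    # Hypothetical request counts for three servers in one cluster.
    counts = {"wls-01": 100, "wls-02": 100, "wls-03": 200}
    print(load_shares(counts))
    print(imbalance(counts))
```

A value well above 1.0 suggests that one member is carrying a disproportionate part of the load.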
Aggregating Other Middleware Information
Health State of each service aggregated from multiple components
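A common way to derive one health state for a service from several components is a worst-of rule: the service is only as healthy as its least healthy component. The sketch below assumes that rule and a three-level severity scale; both are assumptions made for illustration, and the `Health` enum and component names are hypothetical rather than RTView's own.

```python
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical severity scale; a higher value means a worse state."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def service_health(component_states):
    """Roll component states up to one service state using a worst-of rule."""
    return max(component_states.values(), default=Health.OK)


if __name__ == "__main__":
    states = {
        "wls-cluster": Health.OK,
        "coherence": Health.WARNING,
        "ems": Health.OK,
    }
    print(service_health(states).name)
```

Other rollup rules are possible, for example tolerating a single degraded member of a redundant cluster; worst-of is simply the most conservative choice.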
Aggregating Other Middleware Information
Including Aggregate Service Alert History over Time
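An alert history over time can be built by counting alerts per service in fixed time buckets. The following sketch assumes hourly buckets and a simple `(timestamp, service)` event shape; the function name, the event format, and the sample alerts are hypothetical and only illustrate the aggregation.

```python
from collections import Counter
from datetime import datetime


def alerts_per_hour(alerts):
    """Count alerts per (service, hour) bucket from (timestamp, service) pairs."""
    counts = Counter()
    for timestamp, service in alerts:
        # Truncate the timestamp to the start of its hour to form the bucket key.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(service, hour)] += 1
    return counts


# Hypothetical alert events.
ALERTS = [
    (datetime(2013, 10, 8, 9, 5), "checkout"),
    (datetime(2013, 10, 8, 9, 40), "checkout"),
    (datetime(2013, 10, 8, 10, 15), "checkout"),
    (datetime(2013, 10, 8, 9, 30), "search"),
]

if __name__ == "__main__":
    for (service, hour), count in sorted(alerts_per_hour(ALERTS).items()):
        print(service, hour.isoformat(), count)
```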
Aggregating Other Middleware Information
Including Detailed History of Coherence Cache Service
The Emerging Standard Oracle Stack
VMware
Oracle Coherence/Databases
Messaging Middleware (Fusion / TIBCO / MQ)
Oracle WebLogic
Conclusion
• 3-dimensional end-to-end monitoring is required for the emerging standard Oracle application stack
• High-level overviews showing “health state” give confidence that all subsystems are OK
• The ability to present WebLogic information in “clusters” makes it possible to see aggregate metrics and load balancing for each app
• The ability to correlate Coherence metrics with WebLogic helps to ensure that the Coherence black box is OK
• Include virtual infrastructure state and components from other vendors
Thank you!
For more information, please visit
www.sl.com