Glimpse into the workings of an Edgewater Ranzal Infrastructure Engineer who specializes in Enterprise Performance Management (EPM). Presented at OAUG Collaborate 2015.
EPM Infrastructure: An Investigation
1. REMINDER
Check in on the
COLLABORATE mobile app
EPM Infrastructure: An Investigation
Prepared by:
Alan Ramirez
aramirez@ranzal.com
Infrastructure Engineer
Edgewater Ranzal
You don’t know what you don’t know.
When EPM is slow, where do you go?
Infrastructure Insight and Workflow.
Session ID#: 10125
@alanr723
2. 1,700+ Oracle EPM & BI projects successfully
delivered since our founding in 1996
100% Focus on Oracle EPM/OBIEE
Product Experts across the full EPM/BI Suite -
Planning, HFM, HSF, HPCM, Essbase, OBIEE,
DRM, FDMEE
Oracle ACEs Across EPM/BI Platform
(Planning/BI, HFM & FDMEE)
Exalytics – Installation, Configuration &
Benchmarking Services
Infrastructure Services – Design,
Configuration, Performance Tuning,
Upgrades & Patching
Support Services – Remote Help Desk, Lev 1
Support, Patch release Support
Business Analytics Solutions Provider Using Oracle EPM and BI Technologies
Edgewater Ranzal
3. Presenter Information
Alan Ramirez, Infrastructure Engineer
■ Employed with Ranzal for 3 years
■ Over 11 years of Oracle EPM/Hyperion experience
▪ Started on Essbase 6.5, Planning 3.5.1, HFM 4.0, Reports 7.x
▪ Experience: Software development, DBA, Infrastructure, QA/CM
■ Adept on all platforms, particular fondness for Linux
▪ Red Hat Certified Engineer (RHEL 5)
▪ Exalytics Certified Specialist; experienced with X2-4, X3-4, X4-4
■ Core tenet: Systems approach as a science, not a black box
▪ No recurring reboots - strive for stability from understanding
▪ Uptime is revered, restarts are for evaluation, not resolution
▪ Deliver quality through stability
▪ Customer service and documentation
4. Agenda
■ Overview
■ Getting Started
■ Troubleshooting Workflow
■ System Startup
■ Patching
■ Stability
■ Comparing Environments
■ Virtualization
■ Real Life Examples
■ Questions
5. Overview
■ Glimpse into the workings of an Infrastructure Engineer that
specializes in EPM
■ Goals:
▪ Exposure to an approach
▪ Awareness of various faculties of the product
▪ Demonstrate a high level troubleshooting workflow
▪ Examples of a simple Infrastructure review
6. Where to start?
How to get your bearings.
Put me on any one of your EPM servers, and I’ll
figure out the rest.
7. Deployment Report
■ All servers are connected to a common set of database
tables collectively referred to as the EPM Registry
■ Survey the entire environment from any EPM server
▪ All hostnames and configured products – architecture diagram
▪ RDBMS flavor, hostname, and connection strings
▪ WebLogic configuration
▪ History of interaction with EPM Registry
— Clean and simple vs repetition and manual registry changes
— Were web apps (JVMs) redeployed recently?
— Were any other changes made recently to the config?
13. Logs
■ Diagnostics
■ Start with Web Tier
▪ ORACLE_EPM_INSTANCE/diagnostics/logs/services
▪ ORACLE_EPM_INSTANCE/diagnostics/logs/starter
▪ MW_HOME/user_projects/domains/EPMSystem/servers/server/logs
■ Services Tier - R&A Services, EPMA (Dimension) Server
▪ ORACLE_EPM_INSTANCE/diagnostics/logs/product
■ Application Server logs
▪ Essbase.log, HsvEventLog.log, Interop.log
■ Event Viewer
14. Logs Step 1 - services directory:
EPM_INSTANCE_HOME\diagnostics\logs\services
■ Directly relates to NT services
■ Typically the start of my workflow
■ Each svc has sysout and a syserr
■ Sysout most useful
■ Syserr rarely has timestamps
15. Understanding WebLogic State
When WebLogic completes its startup process, it writes out:
<Notice> <WebLogicServer> <BEA-000360> <Server started in RUNNING mode>
Believe it or not, you do get used to reading these. You become familiar with
what good logs look like and can quickly tell whether things are healthy.
Most often I tail the last 50-100 lines of each log, but it is not uncommon
to browse entire logs quickly, looking for patterns.
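The habit described above (tail the last 50-100 lines, look for the RUNNING notice) can be sketched in a few lines of Python. This is a minimal sketch, not a tool from the presentation: the function names are hypothetical, and it only checks for the BEA-000360 message quoted on this slide.

```python
from collections import deque

RUNNING_MARKER = "<BEA-000360>"  # message ID from the notice quoted on the slide


def tail(path, lines=100):
    """Return the last `lines` lines of a text log file."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the newest lines while streaming the file
        return list(deque(handle, maxlen=lines))


def reached_running(log_lines):
    """True if any line carries the 'Server started in RUNNING mode' notice."""
    return any(RUNNING_MARKER in line and "RUNNING" in line for line in log_lines)
```

A log that shows the notice may still belong to a server that stopped later, so this is a first filter before reading the tail by eye, not a health check.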
16. Logs Step 2 - domain logs:
MW_HOME\user_projects\domains\EPMSystem\servers\FoundationServices0\logs
■ Under the WebLogic domain is a directory
for each Managed Server
■ Each Server directory contains a logs dir
■ More detailed than services logs
■ Logs for various sub threads
17. Logs Step 3 – main logs directory:
EPM_INSTANCE_HOME\diagnostics\logs
■ The services tier logs here
▪ Reporting & Analysis Agent
▪ EPMA
▪ HSF
20. Start EPM System
o Many wrote their own scripts in 11.1.2.1 and earlier (net start, sc, psexec)
o 1h 45m for a triple-redundancy customer with 62 services prompted me to
study and refine. Reduced down to 25m with what became a standard for our team
o Much improved starting with 11.1.2.2
o Additional tweaks get 11.1.2.3 up in <2mins
■ 11.1.2.1
▪ Sequential due to dependencies
▪ Startup type: Manual
▪ Single-threaded startup
▪ Typical 20-30mins
■ 11.1.2.2
▪ Parallel startup
▪ No dependencies
▪ Startup type: Automatic or Manual is fine
▪ Typical 8-15mins
■ 11.1.2.3 & 11.1.2.4
▪ Same as 11.1.2.2, but faster
▪ Typical 2-7mins
21. Starter logs:
EPM_INSTANCE_HOME\diagnostics\logs\starter
■ Only created when using built-in scripts
■ Quick confirmation that all services started successfully
■ Analyze Pass column to be sure all are good
■ Review of history can evidence health or even frustration
23. Patching – What version are you on?
“We’re on the ‘502’ version of EPM.”
o Each product has its own code line and version number
o 500 patch was a giant patch covering IE 10 support
- HUB 500 was all products except Essbase suite
- Separate patches for Essbase 500, EAS 500, APS 500, etc.
o Back to individual version numbers per product
■ Nov 2013:
▪ HSS 11.1.2.3.001
▪ HFM 11.1.2.3.100
▪ Essbase 11.1.2.3.003
■ Mar 2014:
▪ HUB 11.1.2.3.500
▪ Essbase 11.1.2.3.500
■ Dec 2014:
▪ HSS 11.1.2.3.502
▪ HFM 11.1.2.3.502
▪ Essbase 11.1.2.3.505
26. Patching - OPatch
■ A Java-based utility from Oracle that assists with the exercise
of applying and rolling back patches to Oracle software
■ Multiple Oracle homes, which OPatch directory?
▪ EPMSystem11R1 – Oracle EPM System products
▪ oracle_common – ADF/JDeveloper components
▪ odi – Oracle Data Integrator (FDMEE) component
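Because each Oracle home carries its own patch inventory, a review has to ask each one separately. The sketch below only builds the `opatch lsinventory -oh <home>` command for the three homes named above. It assumes each home sits directly under MW_HOME and contains its own OPatch directory, which may not match every install, and the function name is hypothetical.

```python
from pathlib import PurePosixPath

# Oracle home directory names listed on the slide; their location directly
# under MW_HOME is an assumption about a default install.
ORACLE_HOMES = ("EPMSystem11R1", "oracle_common", "odi")


def lsinventory_commands(mw_home):
    """Build one `opatch lsinventory` command line per Oracle home."""
    commands = []
    for name in ORACLE_HOMES:
        home = PurePosixPath(mw_home) / name
        opatch = home / "OPatch" / "opatch"  # assumed per-home OPatch directory
        # -oh tells OPatch which Oracle home's inventory to report on
        commands.append([str(opatch), "lsinventory", "-oh", str(home)])
    return commands
```

Each returned list can be handed to `subprocess.run` on the server. On Windows the executable would be `opatch.bat` and the paths would use backslashes.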
27. Patching – PSEs vs PSUs
■ PSE: Patch Set Exception is a singular, one-off patch that typically
addresses a specific issue
■ PSU: Patch Set Update is a collection, or grouping, of PSEs that
have been regression tested together
■ Do not apply all available PSEs, but instead maintain latest PSUs
■ PSUs are released on an approximately quarterly release schedule
28. Available Patch Sets and Patch Set Updates
for EPM Products (Doc ID 1400559.1)
OBIEE 11g: Bundle Patches (Doc ID 1488475.1)
30. Stability
■ How often do you restart services?
■ How about rebooting servers?
■ History
▪ Consistency of process, logs over time, routines….
▪ Evaluate Starter logs
■ Some services are susceptible to abuse
▪ Financial Reporting
▪ Planning – web forms, SmartView
▪ EAS
■ Essbase – often don’t realize there are issues
▪ xcp files
▪ Graceful shutdowns – check both Essbase and app logs
31. Stability - Planning
■ Heap dumps enabled on OutOfMemory condition can show
exactly what was going on when the JVM ran out of memory
▪ Large/bad webforms
▪ SmartView retrieves
— Large hit to JVM if suppression options are disabled
— Query below would have tried to produce > 28 million cells
■ Essbase Governor
▪ QRYGOVEXECTIME
▪ QRYGOVEXECBLK
■ Planning Governor
▪ ERROR_THRESHOLD_NUM_OF_CELLS=175,000
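To see why an unsuppressed retrieve hurts, it helps to do the cell arithmetic. The sketch below multiplies the member counts on each axis and compares the result with the Planning threshold quoted above. The 7,000 x 4,100 grid in the usage note is a made-up example that lands near 28 million cells; it is not the actual query from the session.

```python
from math import prod

PLANNING_CELL_THRESHOLD = 175_000  # ERROR_THRESHOLD_NUM_OF_CELLS value from the slide


def cell_count(*axis_member_counts):
    """Cells in an unsuppressed grid: the product of the member counts per axis."""
    return prod(axis_member_counts)


def exceeds_threshold(cells, threshold=PLANNING_CELL_THRESHOLD):
    """True if the grid is larger than the governor would allow."""
    return cells > threshold
```

For example, `cell_count(7000, 4100)` is 28,700,000 cells, about 164 times the 175,000-cell threshold.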
32. Stability - WebLogic
■ STUCK threads?
■ Long running task – any task where execution runs longer
than a predefined (default 10min) threshold
▪ Not intelligent
▪ Tunable, increase to 20 mins?
▪ Need an in-depth understanding of the application
■ Causes
▪ SmartView retrieves
▪ Planning form resultset too large
▪ Bad user sessions (Click the ‘x’ instead of proper logouts)
▪ User Behavior: IE “Not Responding”, Close browser and retry
▪ WebLogic Connection Pool too small
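A quick way to answer "STUCK threads?" is to count how often each execute thread is reported as stuck in a managed server log. This is a minimal sketch that assumes the usual `[STUCK] ExecuteThread: 'N'` wording in WebLogic server logs. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches the thread number in WebLogic's "[STUCK] ExecuteThread: '12' ..." reports
STUCK_PATTERN = re.compile(r"\[STUCK\] ExecuteThread: '(\d+)'")


def stuck_thread_counts(log_lines):
    """Count [STUCK] reports per execute-thread number."""
    counts = Counter()
    for line in log_lines:
        match = STUCK_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A thread that shows up repeatedly is worth tracing back to the request it was working on (a SmartView retrieve, a large form) before touching the threshold.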
34. Grading Environments – Many Criteria
■ Architecture
▪ Server Specifications
▪ VMware Infrastructure
▪ Storage Infrastructure
▪ EPM product distribution
■ Opatches
■ Web tier
▪ JVM heap settings
▪ Connection pools
■ App tier
▪ Tuning values
▪ Log sizes and rotations
■ RDBMS
▪ Statistics/Indexes
■ Performance
▪ Resource dedication (virt. only)
▪ Power Plan
▪ CPU
▪ Storage
▪ AV On Access Exclusions
▪ Windows TCP/IP tuning
■ Networking
▪ hosts file, name resolution,
TCP/IP settings
▪ Topology, hops, subnets
▪ FQDN
35. Sample Infrastructure Review
Review Summary of 26 major criteria across all Production EPM servers
CUSTOMER: American multinational food and beverage company
Considering correctness, stability, performance,
what kind of shape is my EPM environment in?
37. Virtualization of
Oracle EPM
■ Primary advantage of a typical virtualization strategy is to reduce
capital and operating costs via server consolidation
▪ Obtain greater densities with small/medium servers (2-4 vCPUs, 4GB)
▪ Common to see 20-25 active machines on a single host
▪ Medium sized host: 16 cores, 64GB memory
■ Heavy footprint of EPM does not permit anywhere near the same
degree of server consolidation
■ Reserve 100% of resources to achieve a 1:1 ratio physical to virtual
■ Highly sensitive to even low latency
■ Does NOT respond well in environments that are oversubscribed
▪ Overcommitment
▪ Ballooning
▪ Compression
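The consolidation numbers above translate directly into overcommit ratios. The sketch below uses only the figures from this slide (20-25 guests, 2-4 vCPUs and 4GB each, on a 16-core, 64GB host). The function name is hypothetical.

```python
def overcommit_ratio(guest_count, per_guest, host_capacity):
    """Ratio of resources promised to guests versus what the host physically has."""
    return guest_count * per_guest / host_capacity


# Typical consolidation host from the slide: 16 cores, 64GB memory
low_cpu = overcommit_ratio(20, 2, 16)    # 20 guests x 2 vCPUs on 16 cores
high_cpu = overcommit_ratio(25, 4, 16)   # 25 guests x 4 vCPUs on 16 cores
memory = overcommit_ratio(20, 4, 64)     # 20 guests x 4GB on 64GB
```

With these inputs CPU is promised 2.5 to 6.25 times over and memory 1.25 times over. The EPM recommendation of reserving 100% of resources corresponds to a ratio of 1.0.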
38. Real Life Examples
Each environment is unique and presents a new
set of challenges.
■ Proactive DBA Killing Pools
■ Profile Limits Essbase
■ Factory BIOS Config
■ Teaming NICs
39. Story #1 – Proactive DBA
Customer: Medical Center for Private Research University
Issue:
■ 12 hours to load and consolidate May data
■ Repeatedly restarting EPM because they did not know what else to do
■ No idea how to approach. Network! Storage? Hard drives! Oh my!
40. Story #1 – Proactive DBA
■ Analyzed 6 days of logs
across 11 WebLogic
JVMs in 2 environments
■ All WLS connection pools drop simultaneously every 5 hours
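The pattern only became visible by lining up events across JVMs. A minimal sketch of that correlation: given the timestamps at which each JVM logged a pool drop (extracting them from the logs is left out, since the log format is not shown here), find the moments every JVM shares and the gaps between them. The names and the one-minute tolerance are assumptions.

```python
from datetime import timedelta


def simultaneous_events(events_by_jvm, tolerance=timedelta(minutes=1)):
    """Timestamps from the first JVM that every other JVM matches within `tolerance`."""
    names = list(events_by_jvm)
    if not names:
        return []
    shared = []
    for stamp in sorted(events_by_jvm[names[0]]):
        # keep the stamp only if each other JVM logged an event close to it
        if all(
            any(abs(stamp - other) <= tolerance for other in events_by_jvm[name])
            for name in names[1:]
        ):
            shared.append(stamp)
    return shared


def intervals(stamps):
    """Gaps between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

If `intervals(simultaneous_events(...))` comes back as a run of equal gaps, such as the five hours seen here, something on a schedule outside the application is the likely cause.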
41. Story #1 – Proactive DBA in the way
■ Root Cause:
▪ 6-8 months prior, connections did not appear to be properly
closed when EPM System was stopped
▪ Frequent restarts as connections continue to grow
▪ As a result, the DBA implemented a connection cleanup
routine to kill idle sessions
▪ This routine was prematurely terminating valid database
pool connections held by the application servers
42. Story #2 – Can’t connect to Essbase
Customer: Global Satellite Services Provider
ISSUE:
■ EssbaseCluster-1 could not be expanded in EAS
■ Not all Essbase applications could be started, only some
▪ Error 1013000 loading application: Serious Error(1013000)
▪ Unable to Create Request Server Thread
■ They had tried restarting services, EAS, Essbase, etc.
■ Cannot start additional Essbase applications
But then,
■ I stopped two apps, and was able to start one of the apps that
didn’t previously start – suggestive of resource limits
43. Story #2 – Can’t connect to Essbase
■ User profile settings too restrictive (Linux security: limits.conf)
■ Essbase server cannot create additional processes
▪ Not possible to start additional applications
▪ Cannot open additional connections from EAS to Essbase
BEFORE
AFTER
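Limits set in limits.conf show up as resource limits on the processes of that user, so they can be read back from a running session. The sketch below uses Python's standard `resource` module (Unix only) to report the process-count and open-file limits. The slides do not say which limits were too low; nproc and nofile are assumed here as the usual candidates, and the function names are hypothetical.

```python
import resource  # standard library, Unix-only


def describe(value):
    """Render a limit value, translating the 'no limit' sentinel."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def process_limits():
    """Return (soft, hard) limits for process count and open files."""
    return {
        "nproc": tuple(describe(v) for v in resource.getrlimit(resource.RLIMIT_NPROC)),
        "nofile": tuple(describe(v) for v in resource.getrlimit(resource.RLIMIT_NOFILE)),
    }
```

The result only reflects the account that runs it, so it needs to be run as the user that owns the Essbase processes.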
44. Story #3 – Intel SpeedStep
Customer: American multinational financial services corporation
■ Two Exalytics servers: PROD is much slower
■ Studied network, storage throughput tests, evaluated I/O
■ Could not find anything until I decided to check the core count
■ cat /proc/cpuinfo
▪ Noticed one degraded CPU frequency
▪ Rechecked and it was fine, rechecked again to find lower speeds
▪ Enter SpeedStep: power saving via stepping down clock speed
■ Resolution: Disable SpeedStep in BIOS
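Because the clock speed came and went between checks, sampling /proc/cpuinfo repeatedly is more reliable than a single look. A minimal sketch of the parsing step, assuming the `cpu MHz` field that x86 Linux reports; the 90% tolerance and the function names are arbitrary choices for illustration.

```python
def core_frequencies(cpuinfo_text):
    """Extract the 'cpu MHz' value for each logical CPU, in file order."""
    freqs = []
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "cpu MHz":
            freqs.append(float(value))
    return freqs


def throttled_cores(freqs, tolerance=0.9):
    """Positions of CPUs running below `tolerance` times the fastest one seen."""
    if not freqs:
        return []
    top = max(freqs)
    return [index for index, mhz in enumerate(freqs) if mhz < top * tolerance]
```

Typical use is `throttled_cores(core_frequencies(open("/proc/cpuinfo").read()))` in a loop with a short sleep. Comparing against the fastest core observed is a crude baseline and will miss the case where every core has stepped down at once.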
45. Story #4 – WebLogic Won’t Start
Customer: Travel Technology company
■ No managed WebLogic servers would start
■ Admin Server would not start
■ WebLogic logs showed trying to listen on a certain IP address, but that IP
address no longer exists
■ The IP address was that of the backup network
■ Disabling that NIC allowed WLS to start
■ Further research determined that HP (hosting provider) had teamed the NICs
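When a server is configured to listen on a specific address, a quick test is whether the operating system will still let a process bind to it. The sketch below tries an IPv4 TCP bind on an ephemeral port and reports success or failure. It is a generic check, not the procedure used in this story, and the function name is hypothetical.

```python
import socket


def can_bind(ip_address, port=0):
    """True if this host allows a TCP socket to bind to `ip_address`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # port 0 asks the OS for any free port, so only the address is tested
            sock.bind((ip_address, port))
        except OSError:
            return False
        return True
```

Calling `can_bind` with the listen address from the WebLogic configuration on the affected host shows whether that address still exists on an active interface.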
47. Contact Information
Edgewater Ranzal
108 Corporate Park Drive, Suite 105
White Plains, NY 10604
Tel (914) 253-6600
Email: info@ranzal.com
Company Contact
Robin Ranzal Knowles, President
Alan Ramirez
Infrastructure
Edgewater Ranzal
ranzal.com
aramirez@ranzal.com
@alanr723
Thank you for attending!
48. Please complete the session
evaluation
We appreciate your feedback and insight
You may complete the session evaluation either
on paper or online via the mobile app