2. StrikrSystemsLLP
Motivation
Monitoring is an integral part of DBOps.
Any tool or product that is developed should assist the DBA monitoring team
to perform their tasks efficiently and with minimum friction.
Current Scenario
skill gap, with plenty of L1 staff.
repeated tasks which need strong due diligence
a missed alert may be problematic
large number of alerts
work is slowed down by tool friction.
Current workflow
alerts are generated which are then pushed over email. DBA monitoring team reads each email,
reviews the alert details and acts accordingly.
what is the content of each email ?
alert information, including host and alert details.
once the alert is deemed important, the host name is copied from the email and pasted
into the browser window that is running the oracle inventory system.
the search returns the host details, which include the IP_ADDRESS.
the person then copies the IP_ADDRESS.
since there is bi-directional passwordless SSH between the jump server and the production
databases, a PuTTY session is started by selecting the jump server IP address.
on entering the password, a prompt is presented.
next, 'su - ora_l1' is performed to access the account.
the user types 'ssh' and then pastes the IP_ADDRESS.
on pressing ENTER, the 'ssh DB_IP_ADDRESS' command is issued, which starts the
multi-level menu based program.
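The two manual hops above (jump server, then production database) can be collapsed into one command line. The following is only a sketch under stated assumptions: the function name `jump_ssh_argv` is hypothetical, the login is assumed to go directly to the `ora_l1` account on the jump server (skipping the separate 'su' step), and the OpenSSH client's `-t` option is used so the remote menu program gets a terminal. The function only builds the argument list; it does not open a connection.

```python
import ipaddress


def jump_ssh_argv(jump_host, db_ip, jump_user="ora_l1"):
    """Build the argv for 'ssh to the jump server, then ssh to the DB host'.

    jump_host and jump_user are assumptions for illustration; db_ip is the
    IP_ADDRESS looked up in the inventory system.
    """
    # Validate the address so that arbitrary text pasted from an email
    # can never end up inside the remote command.
    ip = ipaddress.ip_address(db_ip)  # raises ValueError on bad input
    # -t forces a terminal, which the menu based program needs.
    return ["ssh", "-t", f"{jump_user}@{jump_host}", f"ssh {ip}"]
```

The returned list could be handed to `subprocess.run` by a connector component, so the operator never types either 'ssh' command by hand.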
Analysis of the Current Workflow
What is the source of Alert ?
Oracle Enterprise Manager (OEM)
Why is the alert sent over email ?
multiple recipients through a reflector address
Analysis of the Current Workflow
What is the content of each email ?
every notification is composed of 14 environment variables.
TARGET_NAME
TARGET_TYPE
HOST
METRIC
METRIC_VALUE
POLICY_RULE
KEY_VALUE
KEY_VALUE_NAME
VIOLATION_CONTEXT
TIMESTAMP
SEVERITY (availability alerts: UP, DOWN, UNREACHABLE CLEAR, UNREACHABLE START, BLACKOUT END, BLACKOUT START; other metrics can have any of the following severities: WARNING, CRITICAL, CLEAR, METRIC ERROR CLEAR, METRIC ERROR START)
MESSAGE
RULE_NAME
RULE_OWNER
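As an illustration of how a script invoked by the notification could capture these 14 variables, here is a minimal sketch. It is written in Python for readability rather than as the shell script named elsewhere in this deck; the function name `capture_alert`, the JSON file format and the file naming are all assumptions, not part of the original design.

```python
import json
from pathlib import Path

# The 14 variables listed above, in the same order.
ALERT_FIELDS = [
    "TARGET_NAME", "TARGET_TYPE", "HOST", "METRIC", "METRIC_VALUE",
    "POLICY_RULE", "KEY_VALUE", "KEY_VALUE_NAME", "VIOLATION_CONTEXT",
    "TIMESTAMP", "SEVERITY", "MESSAGE", "RULE_NAME", "RULE_OWNER",
]


def capture_alert(env, out_dir, name):
    """Write one alert as a JSON file and return its path.

    env is any mapping of variable names to values (for example os.environ);
    out_dir and name are hypothetical and chosen by the caller.
    """
    # Keep only the known fields; variables that are not set become "".
    record = {field: env.get(field, "") for field in ALERT_FIELDS}
    out_path = Path(out_dir) / f"{name}.json"
    out_path.write_text(json.dumps(record, indent=2))
    return out_path
```

In a real handler the call would be along the lines of `capture_alert(os.environ, alert_dir, unique_name)`, with one file written per notification.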
Analysis of the Current Workflow
What is the starting point for the monitoring team ?
in order for the person to work, (s)he requires
DB_NAME
HOST_NAME
IP_ADDRESS
ALERT
but (s)he is provided only with an email, so two more sources of information have to be consulted.
1. oracle inventory system
get the IP_ADDRESS for a given HOST
2. locally maintained Excel spreadsheet which contains multiple entries like
List of Alerts to ignore (aka known conditions)
MASSSMSDB
SVHJ0439
172.30.3.181
Critical:SVHJ0439 - CPU Utilization is 96.595%, crossed warning (80) or critical (95) threshold
List of Alerts to consider mandatorily
CONUSG6
SVHJ1196
172.30.6.197
Warning:+ASM_SVHJ1196_svc - Disk Group ARCH is 76.673% used.
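The spreadsheet comparison can be automated. The sketch below assumes each list entry is identified by its host name plus the alert text, and that numeric readings (such as 96.595%) should not prevent a match, so digits are masked before comparing. The names `normalize`, `alert_key` and `classify`, the masking rule, and the choice that 'process' wins over 'ignore' are all assumptions for illustration.

```python
import re


def normalize(message):
    """Mask numbers so the same alert with a different reading compares equal."""
    return re.sub(r"\d+(?:\.\d+)?", "#", message).strip().lower()


def alert_key(host, message):
    """Key used for both the spreadsheet entries and the incoming alerts."""
    return (host.upper(), normalize(message))


def classify(host, message, ignore, process):
    """Return 'process', 'ignore' or 'unknown' for one incoming alert.

    ignore and process are sets of keys built with alert_key() from the
    'alerts to ignore' and 'alerts to consider mandatorily' lists.
    """
    key = alert_key(host, message)
    if key in process:  # assumption: mandatory alerts take precedence
        return "process"
    if key in ignore:
        return "ignore"
    return "unknown"
```

With the two sample entries above loaded into `ignore` and `process`, a later CPU alert for SVHJ0439 reporting a different percentage would still be classified as 'ignore'.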
Analysis of the Current Workflow
What are the sources of inefficiency and friction in the current approach ?
INE01. manually scan each email
INE02. manually lookup each host_name
INE03. manually compare alert contents with the alerts maintained in spreadsheet
INE04. manually SSH to the jump server.
INE05. manually SSH to the IP_ADDRESS of production DB from jump server
INE06. manually navigate each level and sub-level of the menu program
INE07. information returned via menu options is difficult to filter and/or drill down
INE08. in time-sensitive scenarios, the user has to maintain a large number of PuTTY sessions
as the menu program becomes a bottleneck.
Summary
for a newbie, the menu program is good for learning.
However, in day-to-day operations, when each of the steps is repeated
a large number of times, it is not only boring but also frustrating to use two sets of tools
(menu_program and command_line) to accomplish the same task.
What is the solution ?
side-step email completely
OEM pushes alerts to an 'os-script'
use the oradb inventory to map host to ip_address
process alerts to generate a summary page
clicking an alert generates an action web page
the menu is embedded in the web page itself.
Schematic
What is the solution ?
configure oracle enterprise manager (OEM) to pass the 'alert and policy violation information'
to an 'OS script' (autoport_dbops.sh), which writes the 'alert' to a directory on the jump server.
download and place a copy of the oracle database inventory in CSV format (ora_inv.csv)
on the jump server.
place files containing the lists of alerts which are 'known_to_ignore' (ora_ign.csv)
and 'known_to_process' (ora_proc.csv) in CSV format on the jump server.
a program processes each of the alert files placed on the jump server
and keeps appending the processed information to an ora_r2a.csv file (r2a - ready to act).
the web application loads ora_r2a.csv
and generates a 'live' web page for the current monitoring situation.
the web page groups all the alert(s) processed so far into three categories.
when the user clicks a particular link, the target database IP address is automatically selected
for further reference.
since a multi-level menu is structurally a star configuration, the user is automatically navigated
to another page that offers 'one-click' access to any of the operations that are currently performed.
in order to support any operator activity, custom filter(s) can be collected and executed
in the background.
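The inventory lookup and the append to ora_r2a.csv could be sketched as follows. This is a minimal illustration, not the actual program: the inventory column names (`host_name`, `ip_address`, `db_name`), the ready-to-act columns and the function names are assumptions, since the deck does not specify the CSV layouts.

```python
import csv
from pathlib import Path

# Assumed layout of ora_r2a.csv: the four items the operator needs.
R2A_FIELDS = ["db_name", "host_name", "ip_address", "alert"]


def load_inventory(path):
    """Read ora_inv.csv into a dict keyed by upper-cased host name."""
    with open(path, newline="") as fh:
        return {row["host_name"].upper(): row for row in csv.DictReader(fh)}


def append_r2a(alert, inventory, r2a_path):
    """Enrich one alert with inventory data and append it to the r2a file.

    alert is a mapping with at least HOST and MESSAGE. Hosts missing from
    the inventory are still recorded, with empty db_name and ip_address.
    """
    inv = inventory.get(alert["HOST"].upper())
    row = {
        "db_name": inv["db_name"] if inv else "",
        "host_name": alert["HOST"],
        "ip_address": inv["ip_address"] if inv else "",
        "alert": alert["MESSAGE"],
    }
    path = Path(r2a_path)
    # Write the header only when the file is new or empty.
    new_file = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=R2A_FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Because the file is append-only, the web application can reload it on each refresh to rebuild the 'live' page without any shared state with the processor.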
detailed
What are the benefits ? (version 1)
Task focussed interface (TFI) for the monitoring team.
no need to access any XLSX file or inventory system.
no need to manually copy and paste any data.
no need to manually SSH as it is completely masked from the user
entire multi-level menu available in a 'one-click' star configuration
What are the benefits ? (version 2)
encourage user(s) to crowd-source updates to the lists of alerts,
i.e. 'known_to_ignore' and 'known_to_process'.
transparently integrate the action items with Ansible
(if required in future)
integrate the UI with Oracle APEX.
new options and feature(s) can be added with minimum friction
Current vs. proposed workflow (diagram)
lifecycle: Alert → Orchestrate → Followup → Close
current (steps marked manual): email → Read → Categorize → Refer → Login to Ora Inv Repo → Search, Collect IP, tech details → SSH jumpsrvr → Navigate Menu → SSH prod DB → Set ENV → Issue SQL
proposed: Alert, notification → os-script → Alert Processor → Inventory Processor → Alert Matcher → SSH Connector → Dashboard, Menu generator → User visits Dashboard → SSH prod DB → Set ENV → Issue SQL
proposed components also include a Controller
Thanks for your time
Thanks for viewing Strikr case study
on one-click friction free database
operations for Oracle.
Engineering
Ragini Jain
Saifi Khan
94 80 87 33 52
hello@strikr.in