Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | 1
Troubleshooting and Diagnosing 19c RAC
Sandesh Rao
VP AIOps - Autonomous Database
@sandeshr
https://www.linkedin.com/in/raosandesh/
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Program Agenda
1. Architecture and Basics
2. Troubleshooting Scenarios
3. Proactive and Reactive tools
4. 19c and beyond
5. Q&A
Grid Infrastructure: Overview
• Grid Infrastructure is the combination of:
– Oracle Cluster Ready Services (CRS)
– Oracle Automatic Storage Management (ASM)
• The Grid Home contains the software for both products
– Must be installed in a different location from the RDBMS home
– Installer locks the Grid Home path by setting root permissions
• CRS can also be standalone for ASM and/or Oracle Restart
• CRS can run by itself or in combination with other vendor clusterware
Diagram: one cluster spanning Host 1, Host 2, and Host 3, with database instances and ASM instances on the hosts, backed by ASM Disk Groups A and B.
Grid Infrastructure: CRS Requirements
• Shared Oracle Cluster Registry (OCR) and voting files
– Must be in ASM or CFS
– OCR is backed up automatically every 4 hours to GIHOME/cdata
– Backups are kept at 4, 8, and 12 hours, 1 day, and 1 week
– OCR is restored with ocrconfig
– Voting file is backed up into the OCR at each change
– Voting file is restored with crsctl
Grid Infrastructure: CRS Network
• Requirements
– One or more redundant private networks for inter-node communications
– High speed with low latency
– Separate physical network or managed converged network
– VLANs are supported
• Usage
– Interconnect is a memory backplane for the cluster
– Clusterware messaging
– RDBMS messaging and block transfer
– ASM messaging
– HANFS for block traffic
Grid Infrastructure: How it works
• The CRS stack is spawned from the Oracle HA Services Daemon (ohasd)
• On Unix, ohasd runs out of inittab with respawn
• A node can be evicted when deemed unhealthy
– May require reboot
– IPMI integration, or diskmon in case of Exadata
• CRS provides Cluster Time Synchronization services
– Always runs, but in observer mode if ntpd is configured
Grid Infrastructure Processes: Core Resources
Diagram: startup hierarchy from Level 0 to Level 4 across the HA stack, the CRS stack, and the CRS services.
• Level 0: INIT
• Level 1: ohasd
• OHASD agents: cssdagent, cssdmonitor, OHASD oraagent, OHASD orarootagent
• Daemons: CSSD, Diskmon, CTSSD, CRSD, GPNPD, EVMD, GIPCS, mDNSD, ASM
• CRSD agents: CRSD orarootagent, CRSD oraagent
• Resources: network sources, SCAN VIP, Node VIP, ACFS Registry, GNS VIP, ASM instance, diskgroup, DB resources, SCAN listener, listener, services, eONS, ONS, GNS, GSD
Oracle RAC 12c and onwards
• Flex Cluster
• Flex ASM
• Full Oracle Multitenant & In-Memory Support
• Fleet Provisioning and Patching (FPP)
Figure: the SALES table in the new In-Memory column format.
See http://www.slideshare.net/MarkusMichalewicz/oracle-database-inmemory-meets-oracle-rac
12.2 Automatic Storage Management (ASM)
ASM Filter Driver – Full Integration
• Configure during installation
• Reject non-Oracle I/O
• Stops OS utilities from overwriting ASM disks
• Protects database files
• Reduce OS resource usage
• Fewer open file descriptors
• Faster node recovery
• Further configuration and monitoring is conducted by using the AFDTOOL utility:
• Provision a disk:
$ afdtool -add /dev/dsk1 disk1
• Remove a disk:
$ afdtool -delete disk1
• List the managed disks:
$ afdtool -getdevlist
Oracle RAC 12.2 Enhancements Worth Noticing
• Node Weighting: if everything is equal, let the majority of work survive
• Pluggable Database & Service Isolation: improved singleton workload performance and failure behavior
• Service-oriented Buffer Cache Access: improved data access performance & planned maintenance operation
• Fully Integrated Extended RAC Support: site-awareness and installer support for extended RAC
Node Eviction Basics
Behavior pre-12.1.0.2
Diagram: NodeA and NodeB, each running Oracle GI (HUB) and Oracle RAC, with cons_1 and cons_2.
• Node eviction follows a
rather predictable pattern
– Example in a 2-node cluster: The node
with the lowest node number survives.
• Customers must not base their
application logic on which node
survives the split brain.
– As this may(!) change in future releases
Node Weighting
Idea: Everything equal, let the majority of work survive
Diagram: NodeA and NodeB, each running Oracle GI (HUB) and Oracle RAC, with cons_1 and cons_2.
• Node Weighting is a new feature that considers the workload run on a node during fencing
• The idea is to let the majority of work survive, if everything else is equal
– “Majority work” is, for example, represented by the number of services.
• Example: In a 2-node cluster, the node hosting the majority of services (at fencing time) is meant to survive
• DBAs can overrule this and rate a service as “critical” based on business needs
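The survival rule above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Node` record, the `pick_survivor` function, and the precedence order (critical services first, then service count, then lowest node number as the final tie-break) are hypothetical and are not Oracle's fencing algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    number: int        # cluster node number
    services: int      # services hosted at fencing time
    critical: int = 0  # services a DBA has rated "critical" (assumed to outrank the count)

def pick_survivor(a: Node, b: Node) -> Node:
    """Toy two-node tie-break, not the real Clusterware logic."""
    def weight(n: Node):
        # Higher critical count wins, then higher service count;
        # negating the node number makes the lowest number win a full tie.
        return (n.critical, n.services, -n.number)
    return max(a, b, key=weight)

node_a = Node(number=1, services=1)
node_b = Node(number=2, services=3)
print(pick_survivor(node_a, node_b).number)
```

With equal weights the sketch falls back to the lowest node number, which mirrors the pre-12.1.0.2 pattern described earlier.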
Pluggable Database & Service Isolation
Prevents “noisy neighbors” from affecting others with unnecessary chatter
Diagram: NodeA and NodeB, each running Oracle GI (HUB) and Oracle RAC, with cons_1 and cons_2, exchanging messages (MSG).
• Using Oracle Multitenant, PDBs can be opened as singletons (in one database instance only), in a subset of instances, or in all instances at once.
• If certain PDBs are only opened on some instances, Pluggable Database Isolation
– Improves performance by
• Reducing DLM operations for PDBs not open in all instances.
• Optimizing block operations based on in-memory block separation.
Pluggable Database & Service Isolation
Prevents failures of instances hosting only singleton PDBs from affecting others
• In addition to the performance benefits, Pluggable Database Isolation ensures that failures of instances hosting only singleton PDBs will not impact other instances of the same RAC-based CDB.
Service-oriented Buffer Cache Access
Improve performance by managing data with the service to which it belongs
Diagram: NodeA and NodeB, each running Oracle GI and Oracle RAC, with cons_1 and cons_2.
• Service-oriented Buffer Cache Access over time
determines the data (on database object level)
accessed by the service. This information
– Is persisted in the database.
– Is used to improve data access performance
(e.g. do not manage data of a service in an
instance that does not host the service).
– Can be used to pre-warm an instance cache prior
to a service startup (fresh start or relocation).
Cluster Domain
For cost reduction through centralization, standardization and optimization
Why Use Oracle RAC for Your Private Database Cloud?
• Centralization: centralize common management tasks on the Domain Services Cluster.
• Standardization: use the same building blocks (commodity hardware clusters) to scale databases, compute & storage.
• Optimization example: version independence. Run any Oracle RAC 12.2+ Member Cluster using any platform at any time.
Diagram: a Domain Services Cluster with a Database Member Cluster, an Application Member Cluster, and a Single Node Cluster; clusters run on Linux, AIX, and Solaris.
Centralization – Cluster Domain & Domain Services
• A Cluster Domain is a logical management entity to group various clusters in your data center.
• To provide centralized services in the Cluster Domain, you need to deploy a Domain Services Cluster. It will host the central services.
• The Mgmt Repository and the TFA service are mandatory in the Cluster Domain. They represent centralized versions of their local counterparts.
• Additional services can be added as needed.
Diagram: a Cluster Domain whose Domain Services Cluster hosts the Mgmt Repository Service, the Trace File Analyzer (TFA) Service, and the Rapid Home Provision Service.
Standardization – Member Clusters
• A (Database) Member Cluster is a cluster that registers with the Mgmt Repository Service and uses the centralized TFA service. It can use additional services as needed.
• An Application Member Cluster (available since 12.1.0.2) is a cluster designed to host applications. It uses a lightweight GI stack.
Diagram: a Cluster Domain containing a Database Member Cluster (uses local ASM), an Application Member Cluster (GI only), and the Domain Services Cluster (Mgmt Repository Service, Trace File Analyzer (TFA) Service, Rapid Home Provision Service).
Standardization – Storage Consolidation
• To further standardize and centralize, various Storage Services are offered in the domain.
• Storage flexibility: Member Clusters do not need direct connectivity to shared disks. Using the shared ASM Service, they can use network connectivity to the IO Service to access a centrally managed pool of storage.
Diagram: the Domain Services Cluster offers the Mgmt Repository Service, the Trace File Analyzer (TFA) Service, the Rapid Home Provision Service, and Storage Services (ASM Service, IO Service, ACFS Services) on shared ASM. One Database Member Cluster uses the ASM Services; another uses the IO & ASM Services.
Fleet Patching & Provisioning Support
• Database & Grid Infrastructure versions: 11.2.0.3, 11.2.0.4, 12.1, 12.2, 18, 19
• Deployment types: Single Instance, Oracle Restart, Oracle RAC One, Oracle RAC
• Non-CDB and CDB/PDB
• BM and VM
• Generic Software
• Data Guard Aware
• Customizable
• Multi-OS
Standardize on Transparent Application Continuity
• Hides errors, timeouts, and maintenance
• No application knowledge or changes to use
• Rebuilds session state & in-flight transactions
• Adapts as applications change: protected for the future
Diagram: a request passes through TAC with errors/timeouts hidden. Applications see no errors during outages.
Oracle RAC Performance Features
• Automatic Undo Management
• Cache Fusion
• Oracle Real Application Clusters
• Session Affinity
• PDB & Services Isolation
• Service-Oriented Buffer Cache
• Leaf Block Split Optimizations
• Self Tuning LMS
• Multithreaded Cache Fusion
• ExaFusion Direct-to-Wire Protocol
• Smart Fusion Block Transfer
• Universal Connection Pool (UCP) Support for Oracle RAC
• Support for Distributed Transactions (XA) in Oracle RAC
• Parallel Execution Optimizations for Oracle RAC
• Affinity Locking and Read-Mostly Objects
• Reader Bypass
• Flash Cache
• Connection Load Balancing
• Load Balancing Advisory
• Cluster Managed Services
• Automatic Storage Management
9i 10g
11g
12c
18c
• Zero Downtime Patching
Clusterware
• Fleet Provisioning and Patching
• Automated Transaction
Draining
• Support TLS Ciphers for
Clusterware
• Automated PDB Relocation
Over two decades of innovation
19c
• Scalable Sequences
• Undo RDMA-Read
• Commit Cache
• Database Reliability Framework
RAC Enhancements
• Remastering Slaves (1 slave per LMS)
– Starting with Oracle RAC 12.1, the LMS offloads heavy remastering work to the slave
– This improves LMS’s responsiveness for Cache Fusion requests during remastering
• Support for 100 LMS’s – change in default value
– Oracle RAC 12.2 supports up to 100 LMS’s (names: LMS0-LM99) as opposed to 35
– On larger systems (lots of CPU, large SGA), more LMS’s will start by default
– More LMS’s means better reconfiguration time without any impact during runtime
• More Dynamic Remastering (DRM)
– Starting with Oracle RAC 19c, DRM is planned to more adaptively consider the overall system state
Troubleshooting Scenarios: Cluster Startup
Flowchart: each check branches on Running / Not Running.
• Check that the core CRS resources are running
ps -ef | grep init.ohasd
ps -ef | grep ohasd.bin
• Review the status of the CRS services & stack
crsctl check crs
crsctl check cluster
• Review & fix the CRS startup config & log
crsctl config crs
ohasd.log
• Compare OLR permissions to a reference system & fix differences
• Review & fix issues in logs: ohasd.log, agent logs, process logs
• Collect diagnostics with TFA for Oracle Support
tfactl diagcollect
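One plausible reading of the startup flowchart can be written as a small decision function. This is a sketch, not an official procedure: the function name `next_steps` and the mapping of each Running / Not Running branch to a follow-up action are assumptions, because the arrows of the original diagram are not recoverable from the text. Only the commands and log names come from the slide.

```python
from typing import List

def next_steps(ohasd_running: bool, stack_running: bool) -> List[str]:
    """Suggest follow-up checks for a cluster that fails to start (assumed branch mapping)."""
    if not ohasd_running:
        # ohasd never came up: look at the startup configuration and its own log.
        return ["crsctl config crs", "review ohasd.log"]
    if not stack_running:
        # ohasd is up but the CRS stack is not: permissions and logs come next.
        return [
            "compare OLR permissions to a reference system",
            "review ohasd.log, agent logs, process logs",
            "tfactl diagcollect",
        ]
    return []  # stack is up, nothing to triage

print(next_steps(ohasd_running=True, stack_running=False))
```

The two booleans would come from the `ps -ef` and `crsctl check` commands listed on the slide; the sketch deliberately does not run them.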
Troubleshooting Scenarios: Node Eviction Problem Triage
• Check for & fix resource starvation
– System log
– Troubleshooting guides: 1531223.1 (OSWatcher), 1328466.1 (CHM)
• Check for & fix network heartbeat problems
– ocssd.log
– Troubleshooting guides: 1050693.1, 1534949.1, 1546004.1
• Check for & fix voting disk problems
– Troubleshooting guides: 1549428.1, 1466639.1
• Collect diagnostics with TFA for Oracle Support
tfactl diagcollect
Reconfiguration Performance Improvements
Chart: reconfiguration performance compared across 11.2.0.4, 12.2, and 18.1, with improvements labeled 4x and 1.5x.
Reconfiguration Performance as of 18c
• Timings with different cache sizes:
– Total reconfiguration time for an instance leave & re-join
– 8 LMS’s
– 2 node RAC
Buffer Cache Size   Reconfiguration Time
25GB                3.0 sec
50GB                4.9 sec
100GB               8.3 sec
• Timings with different #LMS:
– Total reconfiguration time for an instance leave & re-join
– 100GB cache
– 2 node RAC
# LMS               Reconfiguration Time
8 LMS’s             8.3 sec
16 LMS’s            5.0 sec
32 LMS’s            3.6 sec
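The two timing tables can be turned into ratios to see how reconfiguration time scales. The sketch below uses only the figures from the slide; the variable names are illustrative.

```python
# Figures copied from the 18c tables above.
by_lms = {8: 8.3, 16: 5.0, 32: 3.6}         # LMS count -> seconds (100GB cache, 2 nodes)
by_cache_gb = {25: 3.0, 50: 4.9, 100: 8.3}  # cache size in GB -> seconds (8 LMS's, 2 nodes)

baseline = by_lms[8]
# Speedup relative to the 8-LMS run.
speedup = {n: round(baseline / secs, 2) for n, secs in by_lms.items()}
# Reconfiguration cost per GB of buffer cache.
sec_per_gb = {gb: round(secs / gb, 3) for gb, secs in by_cache_gb.items()}

print(speedup)
print(sec_per_gb)
```

On these numbers, going from 8 to 16 LMS's gives about a 1.66x speedup and 32 LMS's about 2.31x, so the gain is sublinear, while the per-GB cost drops as the cache grows.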
Reconfiguration Diagnosability
**************** BEGIN DLM RCFG HA STATS ****************
Total dlm rcfg time (inc 6): 3.586 secs (394926177, 394929763)
Begin step .........: 0.005 secs (394926177, 394926182)
Freeze step ........: 0.019 secs (394926182, 394926201)
Sync 1 step ........: 0.002 secs (394926264, 394926266)
Sync 2 step ........: 0.024 secs (394926266, 394926290)
Enqueue cleanup step: 0.002 secs (394926290, 394926292)
Sync pcm1 step .....: 0.004 secs (394926293, 394926297)
……
….
Enqueue dubious step: 0.004 secs (394926432, 394926436)
Sync 5 step ........: 0.000 secs (394926436, 394926436)
Enqueue grant step .: 0.001 secs (394926436, 394926437)
Sync 6 step ........: 0.012 secs (394926437, 394926449)
Fixwrt replay step .: 0.885 secs (394928837, 394929722)
Sync 8 step ........: 0.040 secs (394929722, 394929762)
End step ...........: 0.001 secs (394929762, 394929763)
Number of replayed enqueues sent / received .......: 2246 / 893
Number of replayed fusion locks sent / received ...: 124027 / 0
Number of enqueues mastered before / after rcfg ...: 2058 / 1384
**************** END DLM RCFG HA STATS *****************
Detailed timing breakdown available in LMON trace file
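The timed lines in the LMON trace excerpt above follow a regular shape, so they can be parsed to find the slowest step. This is a minimal sketch: the regular expression and the `parse_steps` helper are assumptions based only on the lines shown here, and the reading of the two numbers in parentheses as millisecond ticks is inferred from the sample (their difference divided by 1000 equals the reported seconds).

```python
import re

# name, optional dot leader, seconds, then (start_tick, end_tick)
STEP = re.compile(
    r"^(?P<name>.+?)\s*\.*:\s*(?P<secs>\d+\.\d+) secs \((?P<start>\d+), (?P<end>\d+)\)$"
)

def parse_steps(trace: str):
    """Yield (name, seconds, start_tick, end_tick) for each timed line."""
    for line in trace.splitlines():
        m = STEP.match(line.strip())
        if m:
            yield m["name"], float(m["secs"]), int(m["start"]), int(m["end"])

sample = """\
Total dlm rcfg time (inc 6): 3.586 secs (394926177, 394929763)
Begin step .........: 0.005 secs (394926177, 394926182)
Freeze step ........: 0.019 secs (394926182, 394926201)
Fixwrt replay step .: 0.885 secs (394928837, 394929722)
"""

steps = list(parse_steps(sample))
# Check the millisecond-tick assumption against every parsed line.
consistent = all(abs((end - start) / 1000 - secs) < 1e-9 for _, secs, start, end in steps)
slowest = max(steps[1:], key=lambda s: s[1])  # skip the "Total" line
print(consistent, slowest[0])
```

Lines that do not match, such as the enqueue counters and the banner rows, are simply skipped.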
DRM Diagnosability
Detailed timing breakdown available in AWR Report
Dynamic Remastering Statistics DB/Inst: SALES/sales1 Snaps: 393-452
-> Affinity objects - Affinity objects mastered at the begin/end snapshot
-> Read-mostly objects - Read-mostly objects mastered at the begin/end snapshot
                                                        per    Begin      End
Name                                    Total   Remaster Op     Snap     Snap
-------------------------------- ------------ ------------- -------- --------
remaster ops                               24          1.00
remastered objects                         24          1.00
remaster time (s)                         7.4          0.31
freeze time (s)                           1.5          0.06
cleanup time (s)                          2.4          0.10
replay time (s)                           0.3          0.01
fixwrite time (s)                         2.4          0.10
sync time (s)                             0.1          0.01
affinity objects                                        N/A        3       27
read-mostly objects                                     N/A        0        0
read-mostly objects (persistent)                        N/A        0        0
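The "Total" column of the AWR excerpt is enough to see where remastering time goes. The sketch below recomputes the time per remaster operation and each phase's share of the total remaster time; the dictionary keys are shortened labels chosen for this example, and the inputs are the rounded values printed in the report.

```python
remaster_ops = 24
total_secs = {  # "Total" column from the AWR excerpt, in seconds
    "remaster": 7.4, "freeze": 1.5, "cleanup": 2.4,
    "replay": 0.3, "fixwrite": 2.4, "sync": 0.1,
}

# Seconds per remaster operation.
per_op = total_secs["remaster"] / remaster_ops
# Each phase as a percentage of the total remaster time.
share = {
    phase: round(100 * secs / total_secs["remaster"], 1)
    for phase, secs in total_secs.items()
    if phase != "remaster"
}
print(round(per_op, 2), share)
```

Because the report values are already rounded, the phase shares do not add up to exactly 100%.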
Oracle’s Database and Clusterware Tools
• What if issues were detected before they
had an impact?
• What if you were notified with a specific
diagnosis and corrective actions?
• What if resource bottlenecks threatening
SLAs were identified early?
• What if bottlenecks could be
automatically relieved just in time?
• What if database hangs and node
reboots could be eliminated?
Tools:
• Cluster Verification Utility
• ORAchk / EXAchk
• Cluster Health Monitor
• Cluster Health Advisor
• Trace File Analyzer
• Hang Manager
• Memory Guard
• Quality of Service Management
Why Oracle ORAchk & EXAchk
• Automatic proactive warning of problems before they impact you
• Get scheduled health reports sent to you in email
• Health checks for the most impactful recurring problems
• Runs in your environment with no need to send anything to Oracle
• Findings can be integrated into other tools of choice
• Common framework: EXAchk for Engineered Systems, ORAchk for non-Engineered Systems
Oracle Stack Coverage
• Engineered Systems
– Oracle Exadata Database Machine
– Oracle SuperCluster
– Oracle Private Cloud Appliance
– Oracle Database Appliance
– Oracle Big Data Appliance
– Oracle Exalogic Elastic Cloud
– Oracle Exalytics In-Memory Machine
– Oracle Zero Data Loss Recovery Appliance
– Oracle ZFS Storage Appliance
• Systems
– Oracle Solaris
– Cross stack checks
– Solaris Cluster
– OVN
– ASR
• Oracle Database
– Standalone Database
– Grid Infrastructure & RAC
– Maximum Availability Architecture (MAA) Scorecard
– Upgrade Readiness Validation
– Golden Gate
• Enterprise Manager Cloud Control
– Repository
– Agent
– OMS
• Middleware
– Application Continuity
– Oracle Identity and Access Management Suite (Oracle IAM)
• E-Business Suite
– Oracle Payables
– Oracle Workflow
– Oracle Purchasing
– Oracle Order Management
– Oracle Process Manufacturing
– Oracle Receivables
– Oracle Fixed Assets
– Oracle HCM
– Oracle CRM
– Oracle Project Billing
• Siebel
– Database best practices
• PeopleSoft
– Database best practices
• SAP
– Exadata best practices
EXAchk Profiles
• Profiles provide a logical grouping of checks on similar topics
• Run only the checks in a specific profile:
./exachk -profile <profile>
• Run everything except the checks in a specific profile:
./exachk -excludeprofile <profile>
Profile Description
asm ASM Checks
avdf Audit Vault Configuration checks
clusterware Oracle clusterware checks
control_VM Checks only for Control VM (ec1-vm, ovmm, db, pc1, pc2). No cross node checks
corroborate Exadata checks that need further review by the user to determine pass or fail
dba DBA Checks
ebs Oracle E-Business Suite checks
eci_healthchecks Enterprise Cloud Infrastructure Healthchecks
ecs_healthchecks Enterprise Cloud System Healthchecks
goldengate Oracle GoldenGate checks
hardware Hardware specific checks for Oracle Engineered systems
maa Maximum Availability Architecture Checks
ovn Oracle Virtual Networking
platinum Platinum certification checks
preinstall Pre-installation checks
prepatch Checks to execute before patching
security Security checks
solaris_cluster Solaris Cluster Checks
storage Oracle Storage Server Checks
switch Infiniband switch checks
sysadmin Sysadmin checks
user_defined_checks Run user defined checks from user_defined_checks.xml
ORAchk Profiles
• Profiles provide a logical grouping of checks on similar topics
• Run only the checks in a specific profile:
./orachk -profile <profile>
• Run everything except the checks in a specific profile:
./orachk -excludeprofile <profile>
Profile Description
asm ASM Checks
bi_middleware Oracle Business Intelligence checks
clusterware Oracle clusterware checks
dba DBA Checks
ebs Oracle E-Business Suite checks
emagent Cloud control agent checks
emoms Cloud Control management server
em Cloud control checks
goldengate Oracle GoldenGate checks
hardware Hardware specific checks for Oracle Engineered systems
oam Oracle Access Manager checks
oim Oracle Identity Manager checks
oud Oracle Unified Directory server checks
ovn Oracle Virtual Networking
peoplesoft PeopleSoft best practices
preinstall Pre-installation checks
prepatch Checks to execute before patching
security Security checks
siebel Siebel Checks
solaris_cluster Solaris Cluster Checks
storage Oracle Storage Server Checks
switch Infiniband switch checks
sysadmin Sysadmin checks
user_defined_checks Run user defined checks from user_defined_checks.xml
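A wrapper script that schedules profile-scoped runs can validate the profile name before building the command line. The following is a sketch: `profile_args` and `KNOWN_PROFILES` are hypothetical names, and the set lists only a handful of profiles copied from the tables above. The `-profile` and `-excludeprofile` flags are the ones shown on the slides.

```python
from typing import List

# A few profile names taken from the tables above (not the full list).
KNOWN_PROFILES = {
    "asm", "clusterware", "dba", "goldengate",
    "preinstall", "prepatch", "security", "sysadmin",
}

def profile_args(tool: str, profile: str, exclude: bool = False) -> List[str]:
    """Build the argument list for a profile-scoped orachk or exachk run."""
    if tool not in ("orachk", "exachk"):
        raise ValueError(f"unknown tool: {tool}")
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    flag = "-excludeprofile" if exclude else "-profile"
    return [f"./{tool}", flag, profile]

print(" ".join(profile_args("orachk", "clusterware")))
```

Returning a list rather than a string keeps the result safe to pass to a process launcher without shell quoting.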
Enterprise Manager Integration
• Check results integrated into EM compliance framework via plugin
• View results in native EM compliance dashboards
• Related checks grouped into compliance standards
• View targets checked, violations & average score
• Drill down into compliance standard to see individual check results
• View break down by target
JSON Output to Integrate with Kibana, Elasticsearch, etc.
Oracle Health Check Collection Manager Dashboard
Differences between each run: Diff Output
Upgrade to Database 12.2 and beyond with confidence
• New checks to help when upgrading the database to 12.2+
• Both pre and post upgrade verification to prevent problems related to:
– OS configuration
– Grid Infrastructure & Database patch prerequisites
– Database configuration
– Cluster configuration
• Pre upgrade:
orachk -u -o pre
• Post upgrade:
orachk -u -o post
Real-time Status Summary
tfactl summary
• High-level summary of all database components
• Choose an option to drill down
Real-time Status Summary – Drill Down
• Drill downs show real-time analytics & details of any problems found
Perform Analysis Using the Included Tools
Not all tools are included in the Grid or Database install. Download from 1513912.1 to get the full collection of tools.
Verify which tools you have installed: tfactl toolstatus
Tool           Description
orachk/exachk  Provides health checks for the Oracle stack. Oracle Trace File Analyzer will install either Oracle EXAchk for Engineered Systems (see document 1070954.1) or Oracle ORAchk for all non-Engineered Systems (see document 1268927.2)
oswatcher      Collects and archives OS metrics. These are useful for instance or node evictions & performance issues. See document 301137.1 for more details
procwatcher    Automates & captures database performance diagnostics and session level hang information. See document 459694.1 for more details
oratop         Provides near real-time database monitoring. See document 1500864.1 for more details
alertsummary   Provides a summary of events for one or more database or ASM alert files from all nodes
ls             Lists all files TFA knows about for a given file name pattern across all nodes
pstack         Generates the process stack for specified processes across all nodes
grep           Searches alert or trace files with a given database and file name pattern for a search string
summary        Provides a high level summary of the configuration
vi             Opens alert or trace files for a given database and file name pattern in the vi editor
tail           Runs a tail on alert or trace files for a given database and file name pattern
param          Shows all database and OS parameters that match a specified pattern
dbglevel       Sets and unsets multiple CRS trace levels with one command
history        Shows the shell history for the tfactl shell
changes        Reports changes in the system setup over a given time period. This includes database parameters, OS parameters and patches applied
calog          Reports major events from the Cluster Event log
events         Reports warnings and errors seen in the logs
managelogs     Shows disk space usage and purges ADR log and trace files
ps             Finds processes
triage         Summarizes oswatcher/exawatcher data
Cluster Health Monitor (CHM)
Generates Diagnostic Metrics View of Cluster and Databases
• Always on - Enabled by default
• Provides Detailed OS Resource Metrics
• Assists Node eviction analysis
• Locally logs all process data
• User can define pinned processes
• Listens to CSS and GIPC events
• Categorizes processes by type
• Supports plug-in collectors (ex. traceroute, netstat, ping, etc.)
• New CSV output for ease of analysis
Diagram: osysmond on each node sends OS data to ologgerd (master), which stores it in the 12c Grid Infrastructure Management Repository (GIMR).
Cluster Health Monitor (CHM)
Oclumon CLI or Full Integration with EM Cloud Control
Cluster Health Advisor (CHA)*
Discovers Potential Cluster & DB Problems - Notifies with Corrective Actions
• Always on - Enabled by default
• Detects node and database performance problems
• Provides early-warning alerts and corrective action
• Supports on-site calibration to improve sensitivity
• Integrated into EMCC Incident Manager and notifications
• Standalone Interactive GUI Tool
Diagram: OS data (from CHM) and DB data feed ochad, which contains the Node Health Prognostics Engine and the Database Health Prognostics Engine and uses the GIMR.
* Requires and Included with RAC or R1N License
Calibrating CHA to your RAC deployment
Choosing a Data Set for Calibration – Defining “normal”
$ chactl query calibration -cluster -timeranges 'start=2016-10-28 07:00:00,end=2016-10-28 13:00:00'
Cluster name : mycluster
Start time : 2016-10-28 07:00:00
End time : 2016-10-28 13:00:00
Total Samples : 11524
Percentage of filtered data : 100%
1) Disk read (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
0.11 0.00 2.62 0.00 114.66
<25 <50 <75 <100 >=100
99.87% 0.08% 0.00% 0.02% 0.03%
2) Disk write (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
0.01 0.00 0.15 0.00 6.77
<50 <100 <150 <200 >=200
100.00% 0.00% 0.00% 0.00% 0.00%
3) Disk throughput (ASM) (IO/sec)
MEAN MEDIAN STDDEV MIN MAX
2.20 0.00 31.17 0.00 1100.00
<5000 <10000 <15000 <20000 >=20000
100.00% 0.00% 0.00% 0.00% 0.00%
4) CPU utilization (total) (%)
MEAN MEDIAN STDDEV MIN MAX
9.62 9.30 7.95 1.80 77.90
<20 <40 <60 <80 >=80
92.67% 6.17% 1.11% 0.05% 0.00%
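When judging whether a time range represents "normal", it helps to read the histograms programmatically. The sketch below parses metric blocks shaped like the output above (a numbered title, a statistics header and row, then a bucket header and percentage row). `parse_metrics` is a hypothetical helper, and the five-line block layout is an assumption drawn only from this sample.

```python
import re

def parse_metrics(text: str) -> dict:
    """Map each metric title to its summary statistics and histogram buckets."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    metrics = {}
    for i, line in enumerate(lines):
        m = re.match(r"^\d+\)\s+(.*)$", line)
        if not m or i + 4 >= len(lines):
            continue  # not a metric title, or the block is truncated
        stats = dict(zip(lines[i + 1].split(), map(float, lines[i + 2].split())))
        buckets = dict(zip(lines[i + 3].split(),
                           (float(p.rstrip("%")) for p in lines[i + 4].split())))
        metrics[m.group(1)] = {"stats": stats, "buckets": buckets}
    return metrics

sample = """\
4) CPU utilization (total) (%)
MEAN MEDIAN STDDEV MIN MAX
9.62 9.30 7.95 1.80 77.90
<20 <40 <60 <80 >=80
92.67% 6.17% 1.11% 0.05% 0.00%
"""

cpu = parse_metrics(sample)["CPU utilization (total) (%)"]
print(cpu["stats"]["MAX"], cpu["buckets"]["<20"])
```

In this sample more than 92% of the CPU samples fall below 20%, so a model calibrated on this window would describe a lightly loaded cluster.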
Calibrating CHA to your RAC deployment
Creating a new CHA Model with CHACTL
• Create and store the new model
$ chactl calibrate cluster -model daytime -timeranges 'start=2018-10-28 07:00:00,end=2018-10-28 13:00:00'
• Begin using the new model
$ chactl monitor cluster -model daytime
• Confirm the new model is being used
$ chactl status -verbose
monitoring nodes svr01, svr02 using model daytime
monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
Cluster Health Advisor – Command Line Operations
Monitoring Your Databases and Nodes with CHACTL
Enable CHA monitoring on a RAC database with an optional model
$ chactl monitor database -db oltpacdb [-model model_name]
Check the CHA monitoring status with optional verbose output
$ chactl status -verbose
monitoring nodes svr01, svr02 using model DEFAULT_CLUSTER
monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
CHA Command Line Operations
Checking for Health Issues and Corrective Actions with CHACTL QUERY DIAGNOSIS
$ chactl query diagnosis -db oltpacdb -start "2016-10-28 01:52:50" -end "2016-10-28 03:19:15"
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2016-10-28 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2016-10-28 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
Problem: DB Control File IO Performance
Description: CHA has detected that reads or writes to the control files are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were
slow because of an increase in disk IO.
The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR) performance.
Action: Separate the control files from other database files and move them to faster disks or Solid
State Devices.
Problem: DB Log File Switch
Description: CHA detected that database sessions are waiting longer than expected
for log switch completions.
Cause: The Cluster Health Advisor (CHA) detected high contention during log switches
because the redo log files were small and the redo logs switched frequently.
Action: Increase the size of the redo logs.
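The event lines printed by `chactl query diagnosis` follow a regular shape (timestamp, target type, target name, problem, instance, state), so they can be grouped by problem with a small parser. The sketch below assumes the layout shown on this slide. The regular expression and the helper names `parse_events` and `instances_by_problem` are illustrative and not part of CHACTL.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout, taken from the sample output on the slide:
# <timestamp> <target type> <target name> <problem> (<instance>) [<state>]
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<kind>\S+)\s+(?P<target>\S+)\s+"
    r"(?P<problem>.+?)\s+\((?P<instance>[^)]+)\)\s+\[(?P<state>[^\]]+)\]$"
)


def parse_events(text):
    """Return one dict per event line; lines of any other shape are skipped."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue
        record = match.groupdict()
        record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S.%f")
        events.append(record)
    return events


def instances_by_problem(events):
    """Map each problem name to the instances on which it was reported."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["problem"]].append(event["instance"])
    return dict(grouped)


sample = """\
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2016-10-28 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2016-10-28 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
Problem: DB Control File IO Performance
"""
events = parse_events(sample)
print(instances_by_problem(events))
```

Grouping by problem first shows at a glance whether an issue is confined to one instance or is cluster-wide, as both problems are in this sample.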
Cluster Health Advisor – Command Line Operations
HTML Diagnostic Health Output Available (-html <file_name>)
Oracle 12c Hang Manager
• Always on - Enabled by default
• Reliably detects database hangs and deadlocks
• Autonomously resolves them
• Supports QoS Performance Classes, Ranks and Policies to maintain SLAs
• Logs all detections and resolutions
• New SQL interface to configure sensitivity (Normal/High) and trace file sizes
Autonomously Preserves Database Availability and Performance
[Diagram labels: Session, DIA0, Detect, Analyze, Evaluate, Hung?, Verify, Victim, QoS Policy]
Full Resolution Dump Trace File and DB Alert Log Audit Reports
Oracle 12c Hang Manager
Dump file …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
Oracle Database 12c Enterprise Edition Release 18/19c.0.0.0 - 64bit Beta
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
Build label: RDBMS_MAIN_LINUX.X64_151013
ORACLE_HOME: …/3775268204/oracle
System name: Linux
Node name: slc05kyr
Release: 2.6.39-400.211.1.el6uek.x86_64
Version: #1 SMP Fri Nov 15 13:39:16 PST 2013
Machine: x86_64
VM name: Xen Version: 3.4 (PVM)
Instance name: hm62
Redo thread mounted by this instance: 2
Oracle process number: 19
Unix process pid: 12656, image: oracle@slc05kyr (DIA0)
*** 2015-10-13T16:47:59.541509+17:00
*** SESSION ID:(96.41299) 2015-10-13T16:47:59.541519+17:00
*** CLIENT ID:() 2015-10-13T16:47:59.541529+17:00
*** SERVICE NAME:(SYS$BACKGROUND) 2015-10-13T16:47:59.541538+17:00
*** MODULE NAME:() 2015-10-13T16:47:59.541547+17:00
*** ACTION NAME:() 2015-10-13T16:47:59.541556+17:00
*** CLIENT DRIVER:() 2015-10-13T16:47:59.541565+17:00
2015-10-13T16:47:59.435039+17:00
Errors in file /oracle/log/diag/rdbms/hm6/hm6/trace/hm6_dia0_12433.trc (incident=7353):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm6/incident/incdir_7353/hm6_dia0_12433_i7353.trc
2015-10-13T16:47:59.506775+17:00
DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2
due to a GLOBAL, HIGH confidence hang with ID=1.
Hang Resolution Reason: Automatic hang resolution was performed to free a
significant number of affected sessions.
DIA0: Examine the alert log on instance 2 for session termination status of hang with ID=1.
In the alert log on the instance local to the session (instance 2 in this case),
we see the following:
2015-10-13T16:47:59.538673+17:00
Errors in file …/diag/rdbms/hm6/hm62/trace/hm62_dia0_12656.trc (incident=5753):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
2015-10-13T16:48:04.222661+17:00
DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1
requested by master DIA0 process on instance 1
Hang Resolution Reason: Automatic hang resolution was performed to free a
significant number of affected sessions.
by terminating session sid:40 with serial # 43179 (ospid:13031)
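When many alert logs have to be scanned, the two DIA0 messages shown above (the termination request on the master instance and the termination itself on the local instance) can be extracted with regular expressions. This is a minimal sketch that assumes the exact message wording from this slide. The pattern names and the function `find_hang_actions` are hypothetical, and the wording may differ between releases.

```python
import re

# Patterns derived from the two alert log messages shown on the slide.
REQUEST_RE = re.compile(
    r"DIA0 requesting termination of session sid:(?P<sid>\d+) "
    r"with serial # (?P<serial>\d+) \(ospid:(?P<ospid>\d+)\) "
    r"on instance (?P<instance>\d+)"
)
TERMINATE_RE = re.compile(
    r"DIA0 terminating blocker \(ospid: (?P<ospid>\d+) sid: (?P<sid>\d+) "
    r"ser#: (?P<serial>\d+)\) of hang with ID = (?P<hang_id>\d+)"
)


def find_hang_actions(log_text):
    """Return Hang Manager actions found in alert log text, in file order."""
    actions = []
    for line in log_text.splitlines():
        for action, pattern in (("requested", REQUEST_RE),
                                ("terminated", TERMINATE_RE)):
            match = pattern.search(line)
            if match:
                record = {key: int(val) for key, val in match.groupdict().items()}
                record["action"] = action
                actions.append(record)
    return actions


sample = """\
DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2
due to a GLOBAL, HIGH confidence hang with ID=1.
DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1
"""
actions = find_hang_actions(sample)
for entry in actions:
    print(entry)
```

Matching the `sid`, serial number, and `ospid` across the two records confirms that the session requested for termination is the one that was actually terminated.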
Program Agenda
1. Architecture and Basics
2. Troubleshooting Scenarios
3. Proactive and Reactive tools
4. 19c and beyond
5. Q&A
Oracle RAC 18c
• Manages hung database processes
– Detects & resolves
– Cross-layer hangs
• E.g., hangs caused by a blocked ASM resource
• Resolves deadlocks
• User-defined control via PL/SQL
• Early warning exposed via a V$ view
Hang Manager
[Diagram labels: Database Member Cluster, Uses ASM IO Service, IO Service, ASM Service, Session, Detect, Analyze, Evaluate, Hung?, Hang Resolution]
Diagnostic Service
• All data aggregated in one place
• Real-time overview of infrastructure & services
• Fine-grained drill down for diagnosis
[Figure panels: Initial Anomalous Events; Ranked Anomalous Events]
Timestamp Correlation & Ranking
Full initial list of anomalous events
1. Sort the anomalous events in chronological order
2. Keep track of unique events and their first occurrence
3. Compare sequence of events to previous timeframes in the same collection
4. Prioritize unique events not seen previously in the collection
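The four steps above can be sketched in a few lines. This is an illustrative reading of the slide, not the product's algorithm: the function `rank_anomalous_events`, the `window_start` parameter that separates "previous timeframes" from the timeframe under investigation, and the event names are all assumptions.

```python
from datetime import datetime


def rank_anomalous_events(events, window_start):
    """Rank the event names seen at or after `window_start`.

    `events` is a list of (timestamp, event_name) pairs from one collection.
    Events never seen before the window are ranked first; within each group
    the order is chronological by first occurrence inside the window.
    """
    ordered = sorted(events, key=lambda e: e[0])       # step 1
    first_seen = {}        # earliest occurrence anywhere in the collection
    first_in_window = {}   # earliest occurrence inside the window
    for ts, name in ordered:                           # step 2
        first_seen.setdefault(name, ts)
        if ts >= window_start:
            first_in_window.setdefault(name, ts)

    def sort_key(name):
        # Step 3: was this event already present in a previous timeframe?
        seen_previously = first_seen[name] < window_start
        # Step 4: False sorts before True, so new events come first.
        return (seen_previously, first_in_window[name])

    return sorted(first_in_window, key=sort_key)


# Hypothetical events; the names are placeholders, not real log messages.
events = [
    (datetime(2019, 6, 1, 10, 1, 0), "node eviction"),
    (datetime(2019, 6, 1, 9, 0, 0), "interconnect packet loss"),
    (datetime(2019, 6, 1, 10, 0, 30), "interconnect packet loss"),
    (datetime(2019, 6, 1, 10, 2, 0), "disk latency spike"),
]
ranked = rank_anomalous_events(events, window_start=datetime(2019, 6, 1, 10, 0, 0))
print(ranked)
```

In this example the packet loss is the earliest event inside the window, but it is ranked last because it was already present an hour earlier and is therefore less likely to explain the new problem.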
Oracle 19c
• Applied Machine Learning for Database Diagnostics
– Efficient diagnosis using Machine Learning
– Automatically performs corrective actions to prevent possible issues
– Provides simple alerts & recommendations for issues that require manual intervention
[Diagram: Oracle Domain Services Cluster (IO Service, ACFS Services, ASM Service, TFA Service, Management Service, RHP Service, Shared ASM) supplying Log, ASH, and Metrics data to an ML pipeline: Knowledge Extraction, Model Generation, Subject Matter Expert / Human Supervision, Application Optimized Models, Feedback]
• Monitors for problems before service disruption
– E.g., HB for critical processes
• Detects the cause of the problem
• Uses collected data across all nodes to identify the root cause
– E.g., waits on GRD
• Resolves the problem with minimal disruption
– E.g., resizing internal structures
Introducing Database Reliability Framework
Monitor → Detect → Review → Resolve
• Increase in number of resources in the Global Resource Directory (GRD)
• Resulting in higher wait times for GRD
• Several solutions possible
– Is the wait time due to high CPU load?
– Would increasing the number of LMS processes help?
– Would increasing CR slaves help?
– Would increasing internal thresholds help?
Database Reliability Framework in Action
• Busy FG process(es) using CPU
• Potential upcoming memory starvation
• LGWR constrained by CPU
• Too many RT processes
• Insufficient CR slaves
• DLM resource cache incorrectly sized
• Control file IO (CFIO) stall
• v$ views
– v$gcr_metrics: details on all defined metrics
– v$gcr_actions: details on all defined actions
– v$gcr_log: metric/action history summary log
– v$gcr_status: details on latest metric/action status
Examples and DRF Views
• Increase the maximum number of LMSs
– Based on system utilization (DRF)
• Each LMS will spawn a dedicated CR slave
– Threshold of Rollback Changes
– Threaded CR slave in 18c
• Optimized for multi-core/multi-thread architecture
• Remastering Slaves (RMV0..)
– Offloads heavy remastering work to slaves
Cache Fusion Optimizations
Commit Cache
• Reduces Cache Fusion traffic for remote undo header lookups
• These lookups often become a bottleneck with DML-heavy OLTP/mixed workloads
• Remote undo header lookups are needed for:
– Checking whether a transaction has committed
– Delayed block cleanout
[Chart: number of block transfers (thousands, axis 0 to 2,000) by block type (Data Blocks, Undo Headers, Undo Blocks, Others), split into CR (Immediate), CR (Busy), Current (Immediate), and Current (Busy)]
• Undo Block RDMA-read
– In some workloads, more than half of remote reads are for Undo Blocks to satisfy read consistency
– Undo Block RDMA-read uses RDMA to directly and rapidly access UNDO blocks in remote instances
• Commit Cache
– The Commit Cache maintains an in-memory table on each instance which records the commit time of transactions
– Remote LMS directly reads the commit cache and sends back commit times for requested transactions
• Replaces having to send the entire 8K transaction table block
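The Commit Cache idea can be pictured with a toy model: each instance keeps a table of commit times, and a remote request is answered with just the commit times asked for rather than a whole block. This is a conceptual sketch only. The class `CommitCache`, the use of integers as commit times, and the visibility rule in `is_visible` are simplifying assumptions, not Oracle internals.

```python
class CommitCache:
    """Toy per-instance table mapping a transaction id to its commit time."""

    def __init__(self):
        self._commit_time = {}

    def record_commit(self, xid, commit_time):
        """Remember when a local transaction committed."""
        self._commit_time[xid] = commit_time

    def lookup(self, xids):
        """Answer a remote request with only the requested commit times.

        None means the transaction is not recorded as committed here.
        """
        return {xid: self._commit_time.get(xid) for xid in xids}


def is_visible(commit_time, snapshot_time):
    """Simplified rule (assumption): a change is visible to a reader if it
    committed at or before the reader's snapshot time."""
    return commit_time is not None and commit_time <= snapshot_time


# Hypothetical transaction ids and times.
cache = CommitCache()
cache.record_commit("tx1", 100)
cache.record_commit("tx3", 200)

reply = cache.lookup(["tx1", "tx2", "tx3"])
print(reply)
print({xid: is_visible(t, snapshot_time=150) for xid, t in reply.items()})
```

The reply carries one small value per requested transaction, which is the point of the optimization described on the slide.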
RAC Optimizations for Exadata
[Diagram labels: UNDO, RDMA]
Fusion Block Transfer
[Diagram: steps 1 to 4 of a block transfer involving the shadow process and LGWR]
Related wait events: gc current block busy, gc buffer busy acquire, gc buffer busy release
• On Exadata, Oracle does not wait for the log write notification
– Exadata ensures the log write completes before changes to the block on another instance commit, guaranteeing durability
– Wait for log I/O during transfer of hot blocks is eliminated
– Up to 40% throughput and 33% response time improvement in some heavily contended OLTP workloads
• Storage software will ensure correct ordering of writes
Smart Fusion Block Transfer
1. Issue log write
2. Wait for log write completion
3. Transfer block
Exadata avoids the I/O wait confirmation
Continuous Feature Improvements
• Lock Domain per PDB
• Utilize Bloom Filter to further reduce Reconfiguration times
• Utilize Database Reliability Framework
Scalable Sequences
Continuous Application Availability
Oracle RAC Sharding
Cluster Domains
Cluster Health Advisor (CHA)
RAC Reader Nodes
Application Continuity (AC)
Oracle Flex ASM & Flex Clusters
Rapid Home Provisioning (RHP)
Cluster Health Monitor (CHM)
Oracle Quality of Service Management (QoS)
Policy-Based Cluster Management
Oracle RAC One Node & RACcheck
Oracle ASM Cluster File System (ACFS)
Oracle Grid Infrastructure (GI)
UCP and OCI Load Balancing Support for RAC
Cluster Verification Utility (CVU)
Cluster-Managed Services
Oracle Clusterware
Oracle Automatic Storage Management (ASM)
Oracle Real Application Clusters (RAC)
Oracle RAC’s Journey into the Autonomous Database
20 years of continuous innovation*
Timeline: Oracle 9i, Oracle 10g, Oracle 11g, Oracle 12c, Oracle 18c/19c
* Documented features list is selective; 20 years include development time
Flex Cluster – Changes Down the Road
• Flex Cluster Leaf nodes: deprecated
• Massive Parallel Query Oracle RAC: deprecated
• Oracle RAC Reader Nodes: to be implemented on Hub nodes
Better Management
• gridSetup and zip-based install for Oracle Grid Infrastructure ($ORACLE_HOME/gridSetup.sh)
• NEW: RPM-based installs for the Oracle Database and Oracle Client
• ASM Management for NFS-based Clusterware files (Configure ASM on NFS), for easier management and thereby better availability
• Separate Diskgroup for Grid Infrastructure Management Repository (GIMR), which allows for more flexibility during Grid Infrastructure installation
More Changes..
• Desupport of Direct File System Placement for Oracle Clusterware Files
– Introduced with Oracle Clusterware 12c Rel. 2 (12.2.0.1)
– Effective with Oracle Clusterware 18c
– Desupport revoked effective with Oracle Clusterware 19c
• Oracle Grid Infrastructure Management Repository (GIMR)
– Around since Oracle Grid Infrastructure 11g Release 2
– Automatic Installation of the GIMR introduced with Grid Infrastructure 12.1.0.2
– Separate diskgroup installation introduced with Grid Infrastructure 12c Release 2
– Automatic install revised for Oracle Grid Infrastructure 19c
• Plans foresee a GIMR installation outside of the Oracle Grid Infrastructure home for Standard Clusters
• Centralized GIMR hosting on a Domain Services Cluster (for Member Clusters) remains unchanged
Patching Improvements
• OJVM is Oracle RAC rolling patch enabled with Oracle RAC 18c (18.4)
– Non-Java services are available at all times
– Java services are available at all times, except for a brownout of about 10 seconds
• No errors are reported during the brownout
• Zero-Downtime Oracle Grid Infrastructure Patching (*18.3)
– Patch Oracle Grid Infrastructure without interrupting database operations
– Patches are applied out-of-place and in a rolling fashion, with one node being patched at a time while the database instance(s) on that node remain up and running
– Supported for Oracle RAC and RAC One Node clusters with two or more nodes
The Road Ahead Leads into the Autonomous Database Cloud
• Future scalability & performance improvements
– Tailor to scaling well within Exadata dimensions (“scale linear across 64 nodes, not 200”)
– Are designed to meet ADB performance requirements and will grow as ADB enhances
– Will leverage RDMA technology for server-less communication
– Plan to use RoCE as the next-generation network for the cloud
• Details in MOS note “Oracle RAC Interconnect Protocols – Support and Roadmap (ID 2434852.1)”
– Will substitute storage access with network-based access to data on remote nodes
– Are likely to utilize NVM for storage on independent servers
• Future availability improvements
– Will focus on reducing re-configuration times (brownouts) further to come closer to “zero”
– Will provide even more ways to perform maintenance & admin tasks with no downtime
  • 2. Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  • 3. Program Agenda: 1. Architecture and Basics; 2. Troubleshooting Scenarios; 3. Proactive and Reactive tools; 4. 19c and beyond; 5. Q&A
  • 4. Program Agenda: 1. Architecture and Basics; 2. Troubleshooting Scenarios; 3. Proactive and Reactive tools; 4. 18/19c and beyond; 5. Q&A
  • 5. Grid Infrastructure: Overview
      • Grid Infrastructure is the combination of:
          – Oracle Cluster Ready Services (CRS)
          – Oracle Automatic Storage Management (ASM)
      • The Grid Home contains the software for both products
          – Must be installed in a different location from the RDBMS home
          – The installer locks the Grid Home path by setting root permissions
      • CRS can also run standalone for ASM and/or Oracle Restart
      • CRS can run by itself or in combination with other vendor clusterware
      [Figure: a cluster of Host 1, Host 2 and Host 3 running database instances and ASM instances on top of ASM Disk Groups A and B]
  • 6. Grid Infrastructure: CRS Requirements
      • Shared Oracle Cluster Registry (OCR) and voting files
          – Must be in ASM or CFS
          – OCR is backed up automatically every 4 hours to GIHOME/cdata
          – Backups are kept for 4, 8 and 12 hours, 1 day and 1 week
          – OCR is restored with ocrconfig
          – The voting file is backed up into the OCR at each change
          – The voting file is restored with crsctl
  • 7. Grid Infrastructure: CRS Network
      • Requirements
          – One or more redundant private networks for inter-node communications
          – High speed with low latency
          – Separate physical network or managed converged network
          – VLANs are supported
      • Usage
          – Interconnect is a memory backplane for the cluster
          – Clusterware messaging
          – RDBMS messaging and block transfer
          – ASM messaging
          – HANFS for block traffic
  • 8. Grid Infrastructure: How it works
      • The CRS stack is spawned from the Oracle HA Services Daemon (ohasd)
      • On Unix, ohasd runs out of inittab with respawn
      • A node can be evicted when deemed unhealthy
          – May require reboot
          – IPMI integration or diskmon in case of Exadata
      • CRS provides Cluster Time Synchronization services
          – Always runs, but in observer mode if ntpd is configured
  • 9. Grid Infrastructure Processes
      [Figure: process startup hierarchy from Level 0 to Level 4, spanning the HA stack, the CRS stack and the CRS services]
      • Processes shown: INIT, ohasd, cssdmonitor, OHASD oraagent, OHASD oraclerootagent, cssdagent, CSSD, Diskmon, CTSSD, CRSD, GPNPD, EVMD, GIPCS, mDNSD, ASM, CRSD oraagent, CRSD orarootagent
      • Core resources shown: Network sources, SCAN VIP, Node VIP, ACFS Registry, GNS VIP, ASM Instance, Diskgroup, DB Resources, SCAN Listener, Listener, Services, eONS, ONS, GNS, GSD
  • 10. Oracle RAC 12c and onwards
      • Flex Cluster
      • Flex ASM
      • Full Oracle Multitenant & In-Memory Support
      • Fleet Provisioning and Patching (FPP)
      • See also: http://www.slideshare.net/MarkusMichalewicz/oracle-database-inmemory-meets-oracle-rac
      [Figure: New In-Memory Format, SALES table in column format]
  • 11. 12.2 Automatic Storage Management (ASM): ASM Filter Driver – Full Integration
      • Configure during installation
      • Reject non-Oracle I/O
      • Stops OS utilities from overwriting ASM disks
      • Protects database files
      • Reduce OS resource usage
      • Fewer open file descriptors
      • Faster node recovery
      • Further configuration and monitoring is conducted by using the AFDTOOL utility:
          – Provision a disk: $ afdtool -add /dev/dsk1 disk1
          – Remove a disk: $ afdtool -delete disk1
          – List the managed disks: $ afdtool -getdevlist
  • 12. Oracle RAC 12.2 Enhancements Worth Noticing
      • Node Weighting. Idea: if everything is equal, let the majority of work survive
      • Pluggable Database & Service Isolation: improved singleton workload performance and failure behavior
      • Service-oriented Buffer Cache Access: improved data access performance & planned maintenance operation
      • Fully Integrated Extended RAC Support: site-awareness and installer support for extended RAC
  • 13. Node Eviction Basics: Behavior pre-12.1.0.2
      • Node eviction follows a rather predictable pattern
          – Example in a 2-node cluster: the node with the lowest node number survives.
      • Customers must not base their application logic on which node survives the split brain.
          – This may(!) change in future releases
      [Figure: NodeA and NodeB, each running Oracle GI (HUB) and Oracle RAC, with services cons_1 and cons_2]
  • 14. Node Weighting. Idea: everything equal, let the majority of work survive
      • Node Weighting is a new feature that considers the workload run on a node during fencing
      • The idea is to let the majority of work survive, if everything else is equal
          – “Majority work” is, for example, represented by the number of services.
      • Example: in a 2-node cluster, the node hosting the majority of services (at fencing time) is meant to survive
      • DBAs can overrule this and rate a service as “critical” based on business needs
      [Figure: NodeA and NodeB, each running Oracle GI (HUB) and Oracle RAC, with services cons_1 and cons_2]
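The weighting idea on this slide can be illustrated with a toy model. This is a minimal sketch, not Oracle's fencing algorithm: the `Node` class, the `pick_survivor` function and the order of the criteria (critical services first, then service count, then lowest node number as the tie-break, borrowed from the pre-12.1.0.2 behavior on the previous slide) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of one cluster node at fencing time."""
    number: int
    services: list = field(default_factory=list)  # services running on the node
    critical: frozenset = frozenset()             # services a DBA rated "critical"


def pick_survivor(nodes):
    """Return the node this toy model lets survive a split brain."""
    def weight(node):
        critical_count = sum(1 for s in node.services if s in node.critical)
        # Larger tuples win; -number lets the lowest node number win a full tie.
        return (critical_count, len(node.services), -node.number)

    return max(nodes, key=weight)


if __name__ == "__main__":
    node_a = Node(1, ["cons_1"])
    node_b = Node(2, ["cons_2", "cons_3"])
    print("survivor: node", pick_survivor([node_a, node_b]).number)
```

In this sketch a single service rated critical outweighs any number of ordinary services; whether the real implementation ranks them that way is not stated on the slide.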
  • 15. Oracle RAC 12.2 Enhancements Worth Noticing: Node Weighting; Pluggable Database & Service Isolation; Service-oriented Buffer Cache Access; Fully Integrated Extended RAC Support
  • 16. Pluggable Database & Service Isolation: Prevents “noisy neighbors” from affecting others with unnecessary chatter
      • Using Oracle Multitenant, PDBs can be opened as singletons (in one database instance only), in a subset of instances or in all instances at once.
      • If certain PDBs are only opened on some instances, Pluggable Database Isolation
          – Improves performance by
              • Reducing DLM operations for PDBs not open in all instances.
              • Optimizing block operations based on in-memory block separation.
      [Figure: NodeA and NodeB, each running Oracle GI (HUB) and Oracle RAC, with services cons_1 and cons_2 and messages (MSG) between the instances]
  • 17. Pluggable Database & Service Isolation: Prevents failures of instances hosting only singleton PDBs from affecting others
      • Using Oracle Multitenant, PDBs can be opened as singletons (in one database instance only), in a subset of instances or in all instances at once.
      • If certain PDBs are only opened on some instances, Pluggable Database Isolation
          – Improves performance by
              • Reducing DLM operations for PDBs not open in all instances.
              • Optimizing block operations based on in-memory block separation.
          – Ensures that instance failures of instances only hosting singleton PDBs will not impact other instances of the same RAC-based CDB.
      [Figure: NodeA and NodeB, each running Oracle GI (HUB) and Oracle RAC, with services cons_1 and cons_2]
  • 18. Oracle RAC 12.2 Enhancements Worth Noticing: Node Weighting; Pluggable Database & Service Isolation; Service-oriented Buffer Cache Access; Fully Integrated Extended RAC Support
  • 19. Service-oriented Buffer Cache Access: Improve performance by managing data with the service to which it belongs
      • Service-oriented Buffer Cache Access determines over time the data (on database object level) accessed by the service. This information
          – Is persisted in the database.
          – Is used to improve data access performance (e.g. do not manage data of a service in an instance that does not host the service).
          – Can be used to pre-warm an instance cache prior to a service startup (fresh start or relocation).
      [Figure: NodeA and NodeB, each running Oracle GI and Oracle RAC, with services cons_1 and cons_2]
  • 20. Oracle RAC 12.2 Enhancements Worth Noticing: Node Weighting; Pluggable Database & Service Isolation; Service-oriented Buffer Cache Access; Fully Integrated Extended RAC Support
  • 21. Cluster Domain: Why Use Oracle RAC for Your Private Database Cloud? For cost reduction through centralization, standardization and optimization
      • Centralization: centralize common management tasks on the Domain Services Cluster.
      • Standardization: use the same building blocks (commodity hardware clusters) to scale databases, compute & storage.
      • Optimization example: version independence. Run any Oracle RAC 12.2+ Member Cluster using any platform at any time.
      [Figure labels: Cluster, Single Node Cluster, Domain Services Cluster, Database Member Cluster, Application Member Cluster, Linux Cluster, AIX Cluster, Solaris Cluster]
  • 22. Centralization: Cluster Domain & Domain Services
      • A Cluster Domain is a logical management entity to group various clusters in your DC.
      • The Mgmt Repository and the TFA service are mandatory in the Cluster Domain. They represent centralized versions of their local counterparts.
      • To provide centralized services in the Cluster Domain, you need to deploy a Domain Services Cluster. It will host the central services.
      • Additional services can be added as needed.
      [Figure: Cluster Domain containing a Domain Services Cluster with Mgmt Repository Service, Trace File Analyzer (TFA) Service and Rapid Home Provision Service]
  • 23. Standardization: Member Clusters
      • A (Database) Member Cluster is a cluster that registers with the Mgmt Repository Service and uses the centralized TFA service. It can use additional services as needed.
      • An Application Member Cluster (available since 12.1.0.2) is a cluster designed to host applications. It uses a lightweight GI stack.
      [Figure: Cluster Domain with a Database Member Cluster (uses local ASM), an Application Member Cluster (GI only) and a Domain Services Cluster (Mgmt Repository Service, Trace File Analyzer (TFA) Service, Rapid Home Provision Service)]
  • 24. Standardization: Storage Consolidation
      • Storage flexibility: Member Clusters do not need direct connectivity to shared disks. Using the shared ASM Service, they can use network connectivity to the IO Service to access a centrally managed pool of storage.
      • To further standardize and centralize, various Storage Services are offered in the domain.
      [Figure: Cluster Domain with a Domain Services Cluster (Mgmt Repository Service, TFA Service, Rapid Home Provision Service; Storage Services: ASM Service, IO Service, ACFS Services; Shared ASM), one Database Member Cluster using the ASM Services and one using the IO & ASM Services]
  • 25. Fleet Patching & Provisioning Support
      • Database & Grid Infrastructure versions: 11.2.0.3, 11.2.0.4, 12.1, 12.2, 18, 19
      • Deployment types: Single Instance, Oracle Restart, Oracle RAC One, Oracle RAC
      • BM and VM; Non-CDB and CDB/PDB
      • Generic Software
      • Data Guard Aware
      • Customizable
      • Multi-OS
  • 26. Standardize on Transparent Application Continuity (TAC)
      • Hides errors, timeouts, and maintenance
      • No application knowledge or changes to use
      • Rebuilds session state & in-flight transactions
      • Adapts as applications change: protected for the future
      • Applications see no errors during outages
      [Figure: a request passing through TAC with errors/timeouts hidden]
  • 27. Oracle RAC Performance Features: Over two decades of innovation
      [Timeline labels: 9i, 10g, 11g, 12c, 18c, 19c]
      • Automatic Undo Management • Cache Fusion • Oracle Real Application Clusters • Session Affinity • PDB & Services Isolation • Service-Oriented Buffer Cache • Leaf Block Split Optimizations • Self Tuning LMS • Multithreaded Cache Fusion • ExaFusion Direct-to-Wire Protocol • Smart Fusion Block Transfer • Universal Connection Pool (UCP) Support for Oracle RAC • Support for Distributed Transactions (XA) in Oracle RAC • Parallel Execution Optimizations for Oracle RAC • Affinity Locking and Read-Mostly Objects • Reader Bypass • Flash Cache • Connection Load Balancing • Load Balancing Advisory • Cluster Managed Services • Automatic Storage Management
      • Zero Downtime Patching Clusterware • Fleet Provisioning and Patching • Automated Transaction Draining • Support TLS Ciphers for Clusterware • Automated PDB Relocation
      • Scalable Sequences • Undo RDMA-Read • Commit Cache • Database Reliability Framework
  • 28. RAC Enhancements
      • Remastering Slaves (1 slave per LMS)
          – Starting with Oracle RAC 12.1, the LMS offloads heavy remastering work to the slave
          – This improves the LMS’s responsiveness for Cache Fusion requests during remastering
      • Support for 100 LMS’s: change in default value
          – Oracle RAC 12.2 supports up to 100 LMS’s (names: LMS0-LM99) as opposed to 35
          – On larger systems (lots of CPU, large SGA), more LMS’s will start by default
          – More LMS’s means better reconfiguration time without any impact during runtime
      • More Dynamic Remastering (DRM)
          – Starting with Oracle RAC 19c, DRM is planned to more adaptively consider the overall system state
  • 29. Program Agenda: 1. Architecture and Basics; 2. Troubleshooting Scenarios; 3. Proactive and Reactive tools; 4. 18/19c and beyond; 5. Q&A
  • 30. Troubleshooting Scenarios: Cluster Startup
      [Flowchart; boxes listed as extracted, branches are labelled Running / Not Running]
      • Check core CRS resources running: ps -ef|grep init.ohasd, ps -ef|grep ohasd.bin
      • Review status of CRS services & stack: crsctl check crs, crsctl check cluster
      • Compare OLR permissions to reference system & fix differences
      • Review & fix issues in logs: ohasd.log, agent logs, process logs
      • Review & fix CRS startup config & log: crsctl config crs, ohasd.log
      • TFA: tfactl diagcollect
      • Oracle Support
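The first box of the flowchart (checking for `init.ohasd` and `ohasd.bin` with `ps -ef | grep`) can be sketched as a small helper that inspects captured `ps -ef` output. This is only an illustration: the function name `core_crs_processes` and the sample process lines (PIDs, paths, users) are hypothetical and not taken from the slides.

```python
def core_crs_processes(ps_output):
    """Report whether init.ohasd and ohasd.bin appear in `ps -ef` output."""
    found = {"init.ohasd": False, "ohasd.bin": False}
    for line in ps_output.splitlines():
        if "grep" in line:
            continue  # ignore the grep command that performed the search
        for name in found:
            if name in line:
                found[name] = True
    return found


# Hypothetical sample; real PIDs and Grid Home paths will differ.
SAMPLE_PS = """\
root      1234     1  0 10:00 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      2345     1  1 10:00 ?        00:01:12 /u01/app/grid/bin/ohasd.bin reboot
oracle    3456  3400  0 10:05 pts/0    00:00:00 grep ohasd.bin
"""

if __name__ == "__main__":
    status = core_crs_processes(SAMPLE_PS)
    for name, running in status.items():
        print(name, "running" if running else "not running")
```

On a live node the text could come from `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`; the result then decides which branch of the flowchart to follow.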
  • 31. Troubleshooting Scenarios: Node Eviction Problem Triage
      • Check for & fix resource starvation: system log. Troubleshooting guides: 1531223.1 (OSWatcher), 1328466.1 (CHM)
      • Check for & fix network heartbeat problems: ocssd.log. Troubleshooting guides: 1050693.1, 1534949.1, 1546004.1
      • Check for & fix voting disk problems. Troubleshooting guides: 1549428.1, 1466639.1
      • TFA: tfactl diagcollect
      • Oracle Support
  • 32. Reconfiguration Performance Improvements [Chart: reconfiguration time compared across releases 11.2.0.4, 12.2 and 18.1, with improvement factors labelled 4x and 1.5x]
  • 33. Reconfiguration Performance as of 18c
      • Timings with different cache sizes:
          – Total reconfiguration time for an instance leave & re-join
          – 8 LMS’s
          – 2 node RAC
          Buffer Cache Size    Reconfiguration Time
          25GB                 3.0 sec
          50GB                 4.9 sec
          100GB                8.3 sec
      • Timings with different #LMS:
          – Total reconfiguration time for an instance leave & re-join
          – 100GB cache
          – 2 node RAC
          # LMS                Reconfiguration Time
          8 LMS’s              8.3 sec
          16 LMS’s             5.0 sec
          32 LMS’s             3.6 sec
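To make the two tables easier to compare, the short sketch below derives the speedup relative to 8 LMS's and the reconfiguration time per GB of buffer cache. Only the timing figures come from the slide; the helper names `speedup_vs_baseline` and `seconds_per_gb` are made up for this example.

```python
# Timings copied from the slide (2-node RAC, instance leave & re-join).
TIME_BY_LMS = {8: 8.3, 16: 5.0, 32: 3.6}          # seconds, 100GB cache
TIME_BY_CACHE_GB = {25: 3.0, 50: 4.9, 100: 8.3}   # seconds, 8 LMS's


def speedup_vs_baseline(times, baseline_key):
    """Speedup of each configuration relative to the chosen baseline."""
    base = times[baseline_key]
    return {key: round(base / secs, 2) for key, secs in times.items()}


def seconds_per_gb(times):
    """Reconfiguration seconds per GB of buffer cache."""
    return {gb: round(secs / gb, 3) for gb, secs in times.items()}


if __name__ == "__main__":
    print(speedup_vs_baseline(TIME_BY_LMS, 8))
    print(seconds_per_gb(TIME_BY_CACHE_GB))
```

Read this way, the numbers show that going from 8 to 32 LMS's shortens reconfiguration by a factor of roughly 2.3 rather than 4, and that the cost per GB falls as the cache grows.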
  • 34. Reconfiguration Diagnosability: Detailed timing breakdown available in LMON trace file
      **************** BEGIN DLM RCFG HA STATS ****************
      Total dlm rcfg time (inc 6): 3.586 secs (394926177, 394929763)
      Begin step .........: 0.005 secs (394926177, 394926182)
      Freeze step ........: 0.019 secs (394926182, 394926201)
      Sync 1 step ........: 0.002 secs (394926264, 394926266)
      Sync 2 step ........: 0.024 secs (394926266, 394926290)
      Enqueue cleanup step: 0.002 secs (394926290, 394926292)
      Sync pcm1 step .....: 0.004 secs (394926293, 394926297)
      …… ….
      Enqueue dubious step: 0.004 secs (394926432, 394926436)
      Sync 5 step ........: 0.000 secs (394926436, 394926436)
      Enqueue grant step .: 0.001 secs (394926436, 394926437)
      Sync 6 step ........: 0.012 secs (394926437, 394926449)
      Fixwrt replay step .: 0.885 secs (394928837, 394929722)
      Sync 8 step ........: 0.040 secs (394929722, 394929762)
      End step ...........: 0.001 secs (394929762, 394929763)
      Number of replayed enqueues sent / received .......: 2246 / 893
      Number of replayed fusion locks sent / received ...: 124027 / 0
      Number of enqueues mastered before / after rcfg ...: 2058 / 1384
      **************** END DLM RCFG HA STATS *****************
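A trace excerpt like the one on this slide is easy to post-process. The sketch below parses the step lines and picks out the slowest step. It assumes the line layout shown on the slide; the regular expression and function names are illustrative, and the reading of the bracketed pair as start and end ticks in milliseconds is inferred from this sample only (each duration equals the tick difference divided by 1000).

```python
import re

# "<name> [dots]: <secs> secs (<start tick>, <end tick>)"
STEP_RE = re.compile(r"\s*(.+?)[ .]*:\s*(\d+\.\d+) secs \((\d+), (\d+)\)")

# A few lines copied from the slide.
SAMPLE = """\
Total dlm rcfg time (inc 6): 3.586 secs (394926177, 394929763)
Begin step .........: 0.005 secs (394926177, 394926182)
Freeze step ........: 0.019 secs (394926182, 394926201)
Fixwrt replay step .: 0.885 secs (394928837, 394929722)
Sync 8 step ........: 0.040 secs (394929722, 394929762)
End step ...........: 0.001 secs (394929762, 394929763)
"""


def parse_rcfg_steps(text):
    """Return (name, seconds, start_tick, end_tick) for each timing line."""
    steps = []
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if match:
            name, secs, start, end = match.groups()
            steps.append((name, float(secs), int(start), int(end)))
    return steps


def slowest_step(steps):
    """Slowest individual step, ignoring the overall total line."""
    candidates = [s for s in steps if not s[0].startswith("Total")]
    return max(candidates, key=lambda s: s[1])


def ticks_match(step, tol=1e-9):
    """Check that the duration equals the tick difference in milliseconds."""
    _, secs, start, end = step
    return abs((end - start) / 1000.0 - secs) < tol


if __name__ == "__main__":
    steps = parse_rcfg_steps(SAMPLE)
    name, secs, _, _ = slowest_step(steps)
    print(f"slowest step: {name} ({secs} secs)")
```

In this sample the fixwrite replay step accounts for 0.885 of the 3.586 seconds, which is the kind of breakdown the LMON trace is meant to expose.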
  • 35. DRM Diagnosability: Detailed timing breakdown available in AWR Report
      Dynamic Remastering Statistics    DB/Inst: SALES/sales1    Snaps: 393-452
      -> Affinity objects - Affinity objects mastered at the begin/end snapshot
      -> Read-mostly objects - Read-mostly objects mastered at the begin/end snapshot
      Name                                    Total   per Remaster Op   Begin Snap   End Snap
      --------------------------------  ------------  ---------------  -----------  ---------
      remaster ops                               24              1.00
      remastered objects                         24              1.00
      remaster time (s)                         7.4              0.31
      freeze time (s)                           1.5              0.06
      cleanup time (s)                          2.4              0.10
      replay time (s)                           0.3              0.01
      fixwrite time (s)                         2.4              0.10
      sync time (s)                             0.1              0.01
      affinity objects                          N/A                              3         27
      read-mostly objects                       N/A                              0          0
      read-mostly objects (persistent)          N/A                              0          0
  • 36. Program Agenda: 1. Architecture and Basics; 2. Troubleshooting Scenarios; 3. Proactive and Reactive tools; 4. 19c and beyond; 5. Q&A
  • 37. Oracle’s Database and Clusterware Tools
      • What if issues were detected before they had an impact?
      • What if you were notified with a specific diagnosis and corrective actions?
      • What if resource bottlenecks threatening SLAs were identified early?
      • What if bottlenecks could be automatically relieved just in time?
      • What if database hangs and node reboots could be eliminated?
      • Tools: Cluster Verification Utility, ORAchk / EXAchk, Cluster Health Monitor, Cluster Health Advisor, Trace File Analyzer, Hang Manager, Memory Guard, Quality of Service Management
  • 38. Why Oracle ORAchk & EXAchk
      • Automatic proactive warning of problems before they impact you
      • Get scheduled health reports sent to you in email
      • Health checks for the most impactful recurring problems
      • Runs in your environment with no need to send anything to Oracle
      • Findings can be integrated into other tools of choice
      [Figure: EXAchk for Engineered Systems and ORAchk for Non-Engineered Systems, both built on a Common Framework]
  • 39. Oracle Stack Coverage
      • Engineered Systems: Oracle Exadata Database Machine, Oracle SuperCluster, Oracle Private Cloud Appliance, Oracle Database Appliance, Oracle Big Data Appliance, Oracle Exalogic Elastic Cloud, Oracle Exalytics In-Memory Machine, Oracle Zero Data Loss Recovery Appliance, Oracle ZFS Storage Appliance
      • Systems: Oracle Solaris, cross stack checks, Solaris Cluster, OVN
      • ASR
      • Oracle Database: Standalone Database, Grid Infrastructure & RAC, Maximum Availability Architecture (MAA) Scorecard, Upgrade Readiness Validation, Golden Gate
      • Enterprise Manager Cloud Control: Repository, Agent, OMS
      • Middleware: Application Continuity, Oracle Identity and Access Management Suite (Oracle IAM)
      • E-Business Suite: Oracle Payables, Oracle Workflow, Oracle Purchasing, Oracle Order Management, Oracle Process Manufacturing, Oracle Receivables, Oracle Fixed Assets, Oracle HCM, Oracle CRM, Oracle Project Billing
      • Siebel: Database best practices
      • PeopleSoft: Database best practices
      • SAP: Exadata best practices
  • 40. Profiles (EXAchk)
      • Profiles provide logical grouping of checks which are about similar topics
      • Run only checks in a specific profile: ./exachk -profile <profile>
      • Run everything except checks in a specific profile: ./exachk -excludeprofile <profile>
      Profile                Description
      asm                    ASM Checks
      avdf                   Audit Vault Configuration checks
      clusterware            Oracle clusterware checks
      control_VM             Checks only for Control VM (ec1-vm, ovmm, db, pc1, pc2). No cross node checks
      corroborate            Exadata checks that need further review by the user to determine pass or fail
      dba                    DBA Checks
      ebs                    Oracle E-Business Suite checks
      eci_healthchecks       Enterprise Cloud Infrastructure Healthchecks
      ecs_healthchecks       Enterprise Cloud System Healthchecks
      goldengate             Oracle GoldenGate checks
      hardware               Hardware specific checks for Oracle Engineered systems
      maa                    Maximum Availability Architecture Checks
      ovn                    Oracle Virtual Networking
      platinum               Platinum certification checks
      preinstall             Pre-installation checks
      prepatch               Checks to execute before patching
      security               Security checks
      solaris_cluster        Solaris Cluster Checks
      storage                Oracle Storage Server Checks
      switch                 Infiniband switch checks
      sysadmin               Sysadmin checks
      user_defined_checks    Run user defined checks from user_defined_checks.xml
  • 41. Profiles (ORAchk)
      • Profiles provide logical grouping of checks which are about similar topics
      • Run only checks in a specific profile: ./orachk -profile <profile>
      • Run everything except checks in a specific profile: ./orachk -excludeprofile <profile>
      Profile                Description
      asm                    ASM Checks
      bi_middleware          Oracle Business Intelligence checks
      clusterware            Oracle clusterware checks
      dba                    DBA Checks
      ebs                    Oracle E-Business Suite checks
      emagent                Cloud control agent checks
      emoms                  Cloud Control management server
      em                     Cloud control checks
      goldengate             Oracle GoldenGate checks
      hardware               Hardware specific checks for Oracle Engineered systems
      oam                    Oracle Access Manager checks
      oim                    Oracle Identity Manager checks
      oud                    Oracle Unified Directory server checks
      ovn                    Oracle Virtual Networking
      peoplesoft             Peoplesoft best practices
      preinstall             Pre-installation checks
      prepatch               Checks to execute before patching
      security               Security checks
      siebel                 Siebel Checks
      solaris_cluster        Solaris Cluster Checks
      storage                Oracle Storage Server Checks
      switch                 Infiniband switch checks
      sysadmin               Sysadmin checks
      user_defined_checks    Run user defined checks from user_defined_checks.xml
  • 42. Enterprise Manager Integration
      • Check results integrated into EM compliance framework via plugin
      • View results in native EM compliance dashboards
      • Related checks grouped into compliance standards
      • View targets checked, violations & average score
      • Drill down into compliance standard to see individual check results
      • View breakdown by target
  • 43. JSON Output to Integrate with Kibana, Elasticsearch, etc.
  • 44. Oracle Health Check Collection Manager Dashboard
  • 45. Diff Output: differences between each run
  • 46. Upgrade to Database 12.2 and beyond with confidence
      • New checks to help when upgrading the database to 12.2+
      • Both pre and post upgrade verification to prevent problems related to:
          – OS configuration
          – Grid Infrastructure & Database patch prerequisites
          – Database configuration
          – Cluster configuration
      • Pre upgrade: orachk -u -o pre
      • Post upgrade: orachk -u -o post
  • 47. Real-time Status Summary: tfactl summary
      • High-level summary of all Database components
      • Choose an option to drill down
  • 48. Real-time Status Summary: Drill Down
      • Drill downs show real-time analytics & details of any problems found
  • 49. Perform Analysis Using the Included Tools
      Not all tools are included in the Grid or Database install. Download from 1513912.1 to get the full collection of tools.
      Verify which tools you have installed: tfactl toolstatus
      Tool                Description
      orachk or exachk    Provides health checks for the Oracle stack. Oracle Trace File Analyzer will install either Oracle EXAchk for Engineered Systems (see document 1070954.1 for more details) or Oracle ORAchk for all non-Engineered Systems (see document 1268927.2 for more details)
      oswatcher           Collects and archives OS metrics. These are useful for instance or node evictions & performance issues. See document 301137.1 for more details
      procwatcher         Automates & captures database performance diagnostics and session level hang information. See document 459694.1 for more details
      oratop              Provides near real-time database monitoring. See document 1500864.1 for more details
      alertsummary        Provides summary of events for one or more database or ASM alert files from all nodes
      ls                  Lists all files TFA knows about for a given file name pattern across all nodes
      pstack              Generates process stack for specified processes across all nodes
      grep                Searches alert or trace files with a given database and file name pattern, for a search string
      summary             Provides high level summary of the configuration
      vi                  Opens alert or trace files for viewing a given database and file name pattern in the vi editor
      tail                Runs a tail on alert or trace files for a given database and file name pattern
      param               Shows all database and OS parameters that match a specified pattern
      dbglevel            Sets and unsets multiple CRS trace levels with one command
      history             Shows the shell history for the tfactl shell
      changes             Reports changes in the system setup over a given time period. This includes database parameters, OS parameters and patches applied
      calog               Reports major events from the Cluster Event log
      events              Reports warnings and errors seen in the logs
      managelogs          Shows disk space usage and purges ADR log and trace files
      ps                  Finds processes
      triage              Summarizes oswatcher/exawatcher data
  • 50. Cluster Health Monitor (CHM): Generates Diagnostic Metrics View of Cluster and Databases
      • Always on: enabled by default
      • Provides detailed OS resource metrics
      • Assists node eviction analysis
      • Locally logs all process data
      • User can define pinned processes
      • Listens to CSS and GIPC events
      • Categorizes processes by type
      • Supports plug-in collectors (e.g. traceroute, netstat, ping)
      • New CSV output for ease of analysis
      [Figure: osysmond processes with their OS data, ologgerd (master) and the 12c Grid Infrastructure Management Repository (GIMR)]
  • 51. Cluster Health Monitor (CHM): Oclumon CLI or Full Integration with EM Cloud Control
  • 52. Cluster Health Advisor (CHA)*: Discovers Potential Cluster & DB Problems, Notifies with Corrective Actions
      • Always on: enabled by default
      • Detects node and database performance problems
      • Provides early-warning alerts and corrective action
      • Supports on-site calibration to improve sensitivity
      • Integrated into EMCC Incident Manager and notifications
      • Standalone interactive GUI tool
      * Requires and is included with a RAC or R1N license
      [Figure: ochad with OS data (from CHM) and DB data, the Node Health Prognostics Engine, the Database Health Prognostics Engine and the GIMR]
  • 53. Calibrating CHA to your RAC deployment: Choosing a Data Set for Calibration, Defining “normal”
      $ chactl query calibration -cluster -timeranges 'start=2016-10-28 07:00:00,end=2016-10-28 13:00:00'
      Cluster name : mycluster
      Start time : 2016-10-28 07:00:00
      End time : 2016-10-28 13:00:00
      Total Samples : 11524
      Percentage of filtered data : 100%

      1) Disk read (ASM) (Mbyte/sec)
      MEAN      MEDIAN    STDDEV    MIN       MAX
      0.11      0.00      2.62      0.00      114.66
      <25       <50       <75       <100      >=100
      99.87%    0.08%     0.00%     0.02%     0.03%

      2) Disk write (ASM) (Mbyte/sec)
      MEAN      MEDIAN    STDDEV    MIN       MAX
      0.01      0.00      0.15      0.00      6.77
      <50       <100      <150      <200      >=200
      100.00%   0.00%     0.00%     0.00%     0.00%

      3) Disk throughput (ASM) (IO/sec)
      MEAN      MEDIAN    STDDEV    MIN       MAX
      2.20      0.00      31.17     0.00      1100.00
      <5000     <10000    <15000    <20000    >=20000
      100.00%   0.00%     0.00%     0.00%     0.00%

      4) CPU utilization (total) (%)
      MEAN      MEDIAN    STDDEV    MIN       MAX
      9.62      9.30      7.95      1.80      77.90
      <20       <40       <60       <80       >=80
      92.67%    6.17%     1.11%     0.05%     0.00%
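The percentage rows in this output are a histogram of the samples over fixed thresholds (for example <20, <40, <60, <80, >=80 for CPU utilization). The sketch below shows how such a distribution can be computed from raw samples. It is an illustration of the arithmetic only, not how chactl is implemented; the function name and the sample values are invented.

```python
def bucket_percentages(samples, edges):
    """Percentage of samples in each bucket: <edges[0], <edges[1], ..., >=edges[-1].

    Each sample is counted once, in the first bucket whose upper edge exceeds it.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    counts = [0] * (len(edges) + 1)
    for value in samples:
        for index, edge in enumerate(edges):
            if value < edge:
                counts[index] += 1
                break
        else:
            counts[-1] += 1  # value is >= the last edge
    return [100.0 * count / len(samples) for count in counts]


if __name__ == "__main__":
    cpu_samples = [5.0, 10.0, 30.0, 85.0]  # hypothetical CPU utilization samples (%)
    print(bucket_percentages(cpu_samples, [20, 40, 60, 80]))
```

A time range whose samples sit almost entirely in the lowest bucket, as in the slide's disk metrics, describes a quiet period; whether that is a good definition of "normal" depends on the workload the model is meant to cover.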
  • 54. Calibrating CHA to your RAC deployment: Creating a new CHA Model with CHACTL
    • Create and store the new model
      $ chactl calibrate cluster -model daytime -timeranges 'start=2018-10-28 07:00:00,end=2018-10-28 13:00:00'
    • Begin using the new model
      $ chactl monitor cluster -model daytime
    • Confirm the new model is being used
      $ chactl status -verbose
      monitoring nodes svr01, svr02 using model daytime
      monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
  • 55. Cluster Health Advisor – Command Line Operations: Monitoring Your Databases and Nodes with CHACTL
    Enable CHA monitoring on a RAC database, with an optional model
      $ chactl monitor database -db oltpacdb [-model model_name]
    Check CHA monitoring status, with optional verbose output
      $ chactl status -verbose
      monitoring nodes svr01, svr02 using model DEFAULT_CLUSTER
      monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
  • 56. CHA Command Line Operations: Checking for Health Issues and Corrective Actions with CHACTL QUERY DIAGNOSIS
    $ chactl query diagnosis -db oltpacdb -start "2016-10-28 01:52:50" -end "2016-10-28 03:19:15"
    2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
    2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
    2016-10-28 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
    2016-10-28 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]

    Problem: DB Control File IO Performance
    Description: CHA has detected that reads or writes to the control files are slower than expected.
    Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were slow because of an increase in disk IO. The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR) performance.
    Action: Separate the control files from other database files and move them to faster disks or Solid State Devices.

    Problem: DB Log File Switch
    Description: CHA detected that database sessions are waiting longer than expected for log switch completions.
    Cause: The Cluster Health Advisor (CHA) detected high contention during log switches because the redo log files were small and the redo logs switched frequently.
    Action: Increase the size of the redo logs.
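When the diagnosis output is long, it can help to group the detection lines by problem before reading the Problem/Cause/Action text. The sketch below is one way to do that in Python, assuming the line layout matches the four detection lines on this slide exactly; the regular expression and the function name `group_by_problem` are illustrative and not part of CHACTL.

```python
import re
from collections import defaultdict

# Matches lines shaped like the detection lines on the slide, e.g.
# 2016-10-28 01:47:10.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"Database\s+(?P<db>\S+)\s+"
    r"(?P<problem>.+?)\s+\((?P<instance>[^)]+)\)\s+\[(?P<state>\w+)\]$"
)


def group_by_problem(lines):
    """Return {problem name: [(timestamp, instance), ...]} for matching lines."""
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # Lines in any other layout are silently skipped.
            grouped[match["problem"]].append((match["ts"], match["instance"]))
    return dict(grouped)


sample_output = [
    "2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]",
    "2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]",
    "2016-10-28 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]",
    "2016-10-28 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]",
]
problems = group_by_problem(sample_output)
for name, hits in problems.items():
    print(name, hits)
```

A problem that appears on every instance at nearly the same time, as both do here, points at a shared cause such as storage rather than a single node.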
  • 57. Cluster Health Advisor – Command Line Operations: HTML Diagnostic Health Output Available (-html <file_name>)
  • 58. Oracle 12c Hang Manager: Autonomously Preserves Database Availability and Performance • Always on - Enabled by default • Reliably detects database hangs and deadlocks • Autonomously resolves them • Supports QoS Performance Classes, Ranks and Policies to maintain SLAs • Logs all detections and resolutions • New SQL interface to configure sensitivity (Normal/High) and trace file sizes [Diagram labels: Session, DIA0, Detect, Analyze, Evaluate, Hung?, Verify, Victim, QoS Policy]
  • 59. Oracle 12c Hang Manager: Full Resolution Dump Trace File and DB Alert Log Audit Reports
    Dump trace file:
      Dump file …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
      Oracle Database 12c Enterprise Edition Release 18/19c.0.0.0 - 64bit Beta
      With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics and Real Application Testing options
      Build label: RDBMS_MAIN_LINUX.X64_151013
      ORACLE_HOME: …/3775268204/oracle
      System name: Linux
      Node name: slc05kyr
      Release: 2.6.39-400.211.1.el6uek.x86_64
      Version: #1 SMP Fri Nov 15 13:39:16 PST 2013
      Machine: x86_64
      VM name: Xen Version: 3.4 (PVM)
      Instance name: hm62
      Redo thread mounted by this instance: 2
      Oracle process number: 19
      Unix process pid: 12656, image: oracle@slc05kyr (DIA0)
      *** 2015-10-13T16:47:59.541509+17:00
      *** SESSION ID:(96.41299) 2015-10-13T16:47:59.541519+17:00
      *** CLIENT ID:() 2015-10-13T16:47:59.541529+17:00
      *** SERVICE NAME:(SYS$BACKGROUND) 2015-10-13T16:47:59.541538+17:00
      *** MODULE NAME:() 2015-10-13T16:47:59.541547+17:00
      *** ACTION NAME:() 2015-10-13T16:47:59.541556+17:00
      *** CLIENT DRIVER:() 2015-10-13T16:47:59.541565+17:00
    DB alert log:
      2015-10-13T16:47:59.435039+17:00
      Errors in file /oracle/log/diag/rdbms/hm6/hm6/trace/hm6_dia0_12433.trc (incident=7353):
      ORA-32701: Possible hangs up to hang ID=1 detected
      Incident details in: …/diag/rdbms/hm6/hm6/incident/incdir_7353/hm6_dia0_12433_i7353.trc
      2015-10-13T16:47:59.506775+17:00
      DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2
      due to a GLOBAL, HIGH confidence hang with ID=1.
      Hang Resolution Reason: Automatic hang resolution was performed to free a significant number of affected sessions.
      DIA0: Examine the alert log on instance 2 for session termination status of hang with ID=1.
    In the alert log on the instance local to the session (instance 2 in this case), we see the following:
      2015-10-13T16:47:59.538673+17:00
      Errors in file …/diag/rdbms/hm6/hm62/trace/hm62_dia0_12656.trc (incident=5753):
      ORA-32701: Possible hangs up to hang ID=1 detected
      Incident details in: …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
      2015-10-13T16:48:04.222661+17:00
      DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1
      requested by master DIA0 process on instance 1
      Hang Resolution Reason: Automatic hang resolution was performed to free a significant number of affected sessions.
      by terminating session sid:40 with serial # 43179 (ospid:13031)
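To audit which sessions Hang Manager terminated, the "DIA0 terminating blocker" lines can be pulled out of an alert log with a short script. This is a minimal Python sketch, assuming the message wording is exactly as shown on this slide (it may differ between releases); the pattern and the function name `find_terminated_blockers` are illustrative.

```python
import re

# Assumption: the message text matches the slide's alert log excerpt verbatim.
BLOCKER_RE = re.compile(
    r"DIA0 terminating blocker \(ospid: (?P<ospid>\d+) sid: (?P<sid>\d+) "
    r"ser#: (?P<serial>\d+)\) of hang with ID = (?P<hang_id>\d+)"
)


def find_terminated_blockers(alert_log_text):
    """Return one dict per terminated blocker found in the alert log text."""
    return [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in BLOCKER_RE.finditer(alert_log_text)
    ]


# Excerpt copied from the instance 2 alert log on the slide.
excerpt = (
    "2015-10-13T16:48:04.222661+17:00\n"
    "DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1\n"
    "requested by master DIA0 process on instance 1\n"
)
blockers = find_terminated_blockers(excerpt)
print(blockers)
```

The sid and serial number recovered this way can then be matched against the incident trace file named in the same alert log entry.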
  • 60. Program Agenda: 1. Architecture and Basics 2. Troubleshooting Scenarios 3. Proactive and Reactive tools 4. 19c and beyond 5. Q&A
  • 61. Oracle RAC 18c: Hang Manager • Manages hung database processes – Detects & resolves – Cross-layer hangs • e.g. hangs caused by a blocked ASM resource • Resolves deadlocks • User-defined control via PL/SQL • Early warning exposed via a V$ view [Diagram labels: Database Member Cluster uses ASM IO Service; IO Service, ASM Service; Session, Detect, Analyze, Evaluate, Hung?, Hang Resolution]
  • 62. Diagnostic Service • All data aggregated in one place • Real-time overview of infrastructure & services • Fine-grained drill down for diagnosis
  • 63. Timestamp Correlation & Ranking: from Initial Anomalous Events (full initial list of anomalous events) to Ranked Anomalous Events 1. Sort the anomalous events in chronological order 2. Keep track of unique events and their first occurrence 3. Compare the sequence of events to previous timeframes in the same collection 4. Prioritize unique events not seen previously in the collection
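The four ranking steps can be sketched in a few lines of Python. This is an illustration of the idea only, not the diagnostic service's implementation: the `Event` type, the use of a plain string signature to decide whether two events are "the same", and the sample data are all assumptions.

```python
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: float  # seconds since the start of the collection (hypothetical unit)
    signature: str    # normalized event text used to decide uniqueness


def rank_events(current, previous_signatures):
    """Rank the unique events of the current timeframe, novel ones first.

    previous_signatures holds the signatures already seen in earlier
    timeframes of the same collection.
    """
    # Step 1: sort the anomalous events chronologically.
    ordered = sorted(current, key=lambda e: e.timestamp)
    # Step 2: keep each unique event with its first occurrence.
    first_seen = {}
    for event in ordered:
        first_seen.setdefault(event.signature, event.timestamp)
    # Steps 3 and 4: compare against earlier timeframes and put events not
    # seen before at the top, ordered by first occurrence within each group.
    return sorted(
        first_seen.items(),
        key=lambda item: (item[0] in previous_signatures, item[1]),
    )


events = [
    Event(30.0, "ORA-00600"),
    Event(10.0, "disk latency"),
    Event(20.0, "ORA-00600"),
    Event(15.0, "node eviction"),
]
ranked = rank_events(events, previous_signatures={"disk latency"})
print(ranked)
```

In this sample, "disk latency" occurred first but was already present in an earlier timeframe, so it drops below the two events that are new to the collection.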
  • 64. Oracle 19c • Applied Machine Learning for Database Diagnostics – Efficient diagnosis using Machine Learning – Automatically performs corrective actions to prevent possible issues – Provides simple alerts & recommendations for issues that require manual intervention [Diagram labels: Oracle Domain Services Cluster with IO Service, ACFS Services, ASM Service, TFA Service, Management Service, RHP Service, Shared ASM; Log, ASH, Metrics; ML Knowledge Extraction; Model Generation; Application Optimized Models; Subject Matter Expert; Human Supervision; Feedback]
  • 65. Introducing Database Reliability Framework • Monitors for problems before service disruption – e.g. heartbeat (HB) for critical processes • Detects the cause of the problem • Uses collected data across all nodes to identify the root cause – e.g. waits on the GRD • Resolves the problem with minimal disruption – e.g. resizes internal structures [Diagram label: Resources]
  • 66. Database Reliability Framework in Action: Monitor, Detect, Review, Resolve • Increase in the number of resources in the Global Resource Directory (GRD) • Resulting in higher wait times for the GRD • Several solutions possible – Is the wait time due to high CPU load? – Would increasing the number of LMS processes help? – Would increasing the number of CR slaves help? – Would increasing internal thresholds help?
  • 67. Examples and DRF Views • Examples: – Busy FG process(es) using CPU – Potential upcoming memory starvation – LGWR constrained by CPU – Too many RT processes – Insufficient CR slaves – DLM resource cache incorrectly sized – Control file IO (CFIO) stall • v$ views: – v$gcr_metrics: details on all defined metrics – v$gcr_actions: details on all defined actions – v$gcr_log: metric/action history summary log – v$gcr_status: details on latest metric/action status
  • 68. Cache Fusion Optimizations • Increase the maximum number of LMSs – Based on system utilization (DRF) • Each LMS will spawn a dedicated CR slave – Threshold of rollback changes – Threaded CR slave in 18c • Optimized for multi-core/multi-thread architecture • Remastering Slaves (RMV0..) – Offload heavy remastering work to slaves
  • 69. Commit Cache • Reduces Cache Fusion traffic for remote undo header lookups • Often becomes a bottleneck with DML-heavy OLTP/mixed workloads • Remote undo header lookups are needed for: – Checking whether a transaction has committed – Delayed block cleanout [Chart: number of block transfers (thousands) for Data Blocks, Undo Headers, Undo Blocks and Others, split into CR (Immediate), CR (Busy), Current (Immediate) and Current (Busy)]
  • 70. RAC Optimizations for Exadata • Undo Block RDMA-read – In some workloads, more than half of remote reads are for undo blocks to satisfy read consistency – Undo Block RDMA-read uses RDMA to directly and rapidly access undo blocks in remote instances • Commit Cache – The Commit Cache maintains an in-memory table on each instance which records the commit time of transactions – Remote LMS directly reads the commit cache and sends back commit times for requested transactions • Replaces having to send the entire 8K transaction table block [Diagram labels: UNDO, RDMA]
  • 71. Fusion Block Transfer [Diagram: steps 1 to 4 between the Shadow Process and LGWR; wait events shown: gc current block busy, gc buffer busy acquire, gc buffer busy release]
  • 72. Smart Fusion Block Transfer: Exadata Avoids I/O Wait • On Exadata, Oracle does not wait for the log write notification – Exadata ensures the log write completes before changes to the block on another instance commit, guaranteeing durability – Wait for log I/O during transfer of hot blocks is eliminated – Up to 40% throughput and 33% response time improvement in some heavily contended OLTP workloads • Storage software will ensure correct ordering of writes [Diagram: 1. Issue log write 2. Wait for log write completion 3. Transfer block; labels: confirmation, Storage]
  • 73. Continuous Feature Improvements • Lock Domain per PDB • Utilize Bloom Filter to further reduce reconfiguration times • Utilize Database Reliability Framework
  • 74. Oracle RAC’s Journey into the Autonomous Database: 20 years of continuous innovation* • Timeline: Oracle 9i, Oracle 10g, Oracle 11g, Oracle 12c, Oracle 18c/19c • Features shown: Scalable Sequences; Continuous Application Availability; Oracle RAC Sharding; Cluster Domains; Cluster Health Advisor (CHA); RAC Reader Nodes; Application Continuity (AC); Oracle Flex ASM & Flex Clusters; Rapid Home Provisioning (RHP); Cluster Health Monitor (CHM); Oracle Quality of Service Management (QoS); Policy-Based Cluster Management; Oracle RAC One Node & RACcheck; Oracle ASM Cluster File System (ACFS); Oracle Grid Infrastructure (GI); UCP and OCI Load Balancing Support for RAC; Cluster Verification Utility (CVU); Cluster-Managed Services; Oracle Clusterware; Oracle Automatic Storage Management (ASM); Oracle Real Application Clusters (RAC) * Documented features list is selective; 20 years include development time
  • 75. Flex Cluster – Changes Down the Road • Flex Cluster Leaf nodes: deprecated • Massive Parallel Query Oracle RAC: deprecated • Oracle RAC Reader Nodes: to be implemented on Hub nodes
  • 76. Better Management • gridSetup and zip-based install for Oracle Grid Infrastructure ($ORACLE_HOME/gridSetup.sh) • NEW: RPM-based installs for the Oracle Database and Oracle Client • ASM management for NFS-based Clusterware files for easier management and thereby better availability (Configure ASM on NFS) • Separate diskgroup for the Grid Infrastructure Management Repository (GIMR) allows for more flexibility during Grid Infrastructure installation
  • 77. More Changes.. • Desupport of Direct File System Placement for Oracle Clusterware Files – Introduced with Oracle Clusterware 12c Rel. 2 (12.2.0.1) – Effective with Oracle Clusterware 18c – Desupport revoked effective with Oracle Clusterware 19c • Oracle Grid Infrastructure Management Repository (GIMR) – Around since Oracle Grid Infrastructure 11g Release 2 – Automatic installation of the GIMR introduced with Grid Infrastructure 12.1.0.2 – Separate diskgroup installation introduced with Grid Infrastructure 12c Release 2 – Automatic install revised for Oracle Grid Infrastructure 19c • Plans foresee a GIMR installation outside of the Oracle Grid Infrastructure home for Standard Clusters • Centralized GIMR hosting on a Domain Services Cluster (for Member Clusters) remains unchanged
  • 78. Patching Improvements • OJVM is Oracle RAC rolling patch enabled with Oracle RAC 18c (18.4) – Non-Java services are available at all times – Java services are available at all times, except for a ~10-second brownout • No errors are reported during the brownout • Zero-Downtime Oracle Grid Infrastructure Patching (*18.3) – Patch Oracle Grid Infrastructure without interrupting database operations – Patches are applied out-of-place and in a rolling fashion, with one node being patched at a time while the database instance(s) on that node remain up and running – Supported for Oracle RAC and RAC One Node clusters with two or more nodes
  • 79. The Road Ahead Leads into the Autonomous Database Cloud • Future scalability & performance improvements – Tailor to scaling well within Exadata dimensions (“scale linear across 64 nodes, not 200”) – Are designed to meet ADB performance requirements and will grow as ADB enhances – Will leverage RDMA technology for server-less communication – Plan to use RoCE as the next-generation network for the cloud • Details in MOS note “Oracle RAC Interconnect Protocols – Support and Roadmap (ID 2434852.1)” – Will substitute storage access with network-based access to data on remote nodes – Are likely to utilize NVM for storage on independent servers • Future availability improvements – Will focus on reducing re-configuration times (brownouts) further to come closer to “zero” – Will provide even more ways to perform maintenance & admin tasks with no downtime