Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 1
Managing & Troubleshooting Cluster
- 360 degrees
Disclaimer
The views/content in these slides are those of the author and
do not necessarily reflect those of Oracle Corporation and/or its
affiliates/subsidiaries. The material in this document is for
informational purposes only and is published with no
guarantee or warranty, express or implied.
This material should not be reproduced or used without the
author's written permission.
About me
Syed Jaffer Hussain
Database Support Manager
Over 20 years IT hands-on experience
14+ years as an Oracle DBA
Technologist of the year, DBA 2011.
Oracle ACE Director
Oracle 10g Certified Master(OCM)
Oracle 10g RAC Certified Expert
OCP v8i,9i,10g & 11g
ITIL v3 Foundation Certified
Co-Authored - Oracle 11g R1/R2 Real Application Clusters Essentials
- Expert Oracle RAC (in-progress)
Twitter: @sjaffarhussain
http://jaffardba.blogspot.com
Foreword
A famous personality in the Oracle community once compared an Oracle DBA with a pilot!
Any deployment (installation/upgrade) is a single shot, whereas administration and troubleshooting are never ending.
What is covered
o What's new in 11gR2 Clusterware – Key new features at a glance
o Oracle 11gR2 Clusterware software stack
o Clusterware start-up sequence
o Cluster logs & directory tree structure
o Analyzing Cluster logs
o Cluster logs rotation/retention policy
o Troubleshooting Cluster start-up failures
o Debugging/Tracing Clusterware components
o Tools & Utilities – how to pick the right one
o References
o Q & A
Key new features at a glance
Oracle Grid Infrastructure
• Clusterware and ASM binaries are installed together in a single home
directory: Grid Home (GI)
• OCR can also be stored in ASM diskgroup
• Up to five (5) copies of OCR files
Oracle Local Registry – OLR
• Independent OLR copy for each node
• Not shared between nodes
• Stores local node configuration details required by OHASD
• Configured upon installation/upgrade
• Facilitates the CRS startup process when OCR/VD stored in ASM
• ocrcheck -local
• Located under $GRID_HOME/cdata/hostname/hostname.olr
• $GRID_HOME/bin/ocrconfig -local -manualbackup / -restore
• Voting Disk (files) can also be stored in ASM diskgroup
• VD copies can't reside in multiple ASM diskgroups
Clusterized (cluster-aware) commands
• crsctl start cluster -all -- starts the cluster on all nodes
• crsctl stop cluster -all -- stops the cluster on all nodes
• crsctl check cluster -all -- verifies cluster health on all nodes
Complete redesign of Cluster daemon in 11gR2, ohasd introduction
Replaces RACG layer with Agents
New Services
Grid Plug and Play (GPnP)
Cluster Time Synchronization Service (CTSS)
Grid Naming Service (GNS)
Cluster can be started in exclusive mode for maintenance purposes
./crsctl stat res -t -init ---- lists all Clusterware daemon resources
Oracle 11gR2 Clusterware Stack
Oracle 11gR2 Clusterware Software Stack
1. Cluster Ready Services stack (CRS), run by the Cluster Ready Services daemon (CRSD): the upper stack
2. Oracle High Availability Services stack, run by the Oracle High Availability Services daemon (OHASD): the lower stack
Oracle High Availability Services daemon (ohasd)
/etc/inittab entry point:
Linux: h1:3:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
HPUX: h1:3:respawn:/sbin/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
Oracle/RedHat Linux 6: /etc/init, /etc/init.d/init.ohasd run
Clusterware Startup Sequence
[Diagram slides. Daemons shown: Oracle High Availability Service Daemon (ohasd), Cluster Synchronization Service Daemon (cssd), Event Manager Daemon (evmd), Cluster Ready Service Daemon (crsd). Files shown: Voting disk, GPnP / OLR, OCR.]
Cluster logs & directory structure
alert<hostname>.log
CRSD == crsd.log
CSSD == ocssd.log
OHASD == ohasd.log
EVMD == evmd.log
Operating System Logs
GRID_HOME/log/host_name/
GRID_HOME/log/host_name/alertrac1.log
• Records all important Clusterware stack alert messages
• Posts Cluster stack start/stop messages
• Node eviction messages
• OLR events
• Voting and OCR disk related messages
• Active nodes list
• Preferably the first log file to review upon any cluster issues
Sample entries from GRID_HOME/log/host_name/alertrac1.log:
[ohasd(10937)]CRS-1301:Oracle High Availability Service started on node rac1.
[cssd(19712)]CRS-1713:CSSD daemon is started in exclusive mode
[ohasd(19506)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'rac1'
[cssd(19951)]CRS-1605:CSSD voting file is online: /dev/rdsk/c0t13d2; details in
/u00/app/11.2.0/grid_1/log/rac1/cssd/ocssd.log.
2013-04-23 16:42:28.906: [ CSSD][6]clssnmvFindInitialConfigs: No voting files found
2013-04-23 16:42:28.906: [ CSSD][6](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not
found. Retrying discovery in 15 seconds
[cssd(7945)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u00/app/11.2.0/grid/log/usdbt01/cssd/ocssd.log
[/u00/app/11.2.0/grid_1/bin/oraagent.bin(19914)]CRS-5815:Agent
'/u00/app/11.2.0/grid_1/bin/oraagent_oracle' could not find any base
type entry points for type 'ora.daemon.type'. Details at (:CRSAGF00108:) {0:9:2} in
/u00/app/11.2.0/grid_1/log/usdbp01/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
[cssd(19951)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1,rac2.
[cssd(3726)]CRS-1625:Node rac2, number 2, was manually shut down
[cssd(3726)]CRS-1612:Network communication with node rac2 (2) missing for 50% of timeout interval.
Removal of this node from cluster in 14.145 seconds
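When the alert log is long, a quick tally of message codes can show which events dominate before you read it line by line. The following is a minimal sketch only; `crs_code_summary` is a hypothetical helper name, not an Oracle utility, and it simply counts `CRS-nnnn` codes in whatever text it receives on standard input.

```shell
# Count how often each CRS-nnnn message code appears in alert-log text on stdin.
# crs_code_summary is a hypothetical helper, not part of Grid Infrastructure.
crs_code_summary() {
    grep -o 'CRS-[0-9][0-9]*' | sort | uniq -c | sort -k1,1nr -k2,2
}
# Possible usage (path layout as described on the slide above):
#   crs_code_summary < "$GRID_HOME/log/rac1/alertrac1.log"
```

The most frequent codes are listed first; any code of interest still needs to be read in context in the log itself.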
• Cluster Ready Services daemon (CRSD) maintains this log
• Cluster resource start, stop and failure events are written here
• Review this log when resources fail or are unable to start
• 2013-04-05 02:05:05.379: [ CRSPE][46] {13:24208:51894} Resource ora.prddb.db has been updated in
the PE data model:6000000004f82610
• 2013-04-05 02:08:15.371: [ CRSPE][46] {0:33:627} State change received from rac1 for ora.prddb.db
• 2013-04-05 02:08:22.698: [ CRSPE][46] {0:33:628} CRS-2672: Attempting to start 'ora.prddb.prddb_srv.svc' on 'rac1'
• 2013-04-23 17:32:46.340: [ OCRRAW][1]proprioini: all disks are not OCR/OLR formatted
• 2013-04-23 17:32:46.340: [ OCRRAW][1]proprinit: Could not open raw device
GRID_HOME/log/host_name/crsd/crsd.log
• Cluster Synchronization daemon (CSSD) maintains this log
• Busiest log file
• Records node inter-communication messages
• Missed heartbeat and node eviction messages
2013-04-23 16:14:14.712: [ CSSD][6]clssnmvDiskVerify: Successful discovery of 0 disks
2013-04-23 16:14:14.712: [ CSSD][6]clssnmCompleteInitVFDiscovery: Completing initial voting file
discovery
2013-04-23 16:14:14.712: [ CSSD][6]clssnmvFindInitialConfigs: No voting files found
2013-04-23 16:14:14.713: [ CSSD][6](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not
found. Retrying discovery in 15 seconds
• 2013-04-17 11:33:23.286: [GIPCHALO][7] gipchaLowerProcessNode: bootstrap node considered dead
because of idle connection time 600014 ms, node 60000000019645b0 { host ‘rac2', haName 'CSS_crs',
srcLuid 8c78ad11-53582d91, dstLuid 2f6e4604-5e451051 numInf 1, contigSeq 6712447, lastAck 6697196,
lastValidAck 6712447, sendSeq [6697202 : 6697202], createTime 2417887052, sentRegister 1,
localMonitor 1, flags 0x808 }
• 2013-04-20 15:13:18.953: [ CSSD][54]clssnmSendingThread: sending status msg to all nodes
• 2013-04-20 15:13:18.954: [ CSSD][54]clssnmSendingThread: sent 4 status msgs to all nodes
• 2013-04-12 17:35:55.351: [ CSSD][49]clssnmvReadDskHeartbeat: Reading DHBs to get the latest info
for node rac1, 17 LATSvalid 0 uniqueness 1348227938
• [cssd(7335)]CRS-1612:Network communication with node rac2 (02) missing for 50% of timeout interval. Removal of this node from cluster in 14.397 seconds
2013-03-15 17:02:44.964
[cssd(7335)]CRS-1611:Network communication with node rac2 (02) missing for 75% of timeout interval. Removal of this node from cluster in 7.317 seconds
2013-03-15 17:02:50.024
[cssd(7335)]CRS-1610:Network communication with node rac2 (02) missing for 90% of timeout interval. Removal of this node from cluster in
GRID_HOME/log/host_name/cssd/ocssd.log
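The CRS-1612/1611/1610 messages form a countdown to eviction, so pulling them out on their own makes the timeline easier to follow. This is a sketch under the assumption that each message sits on a single line, as in the samples above; `eviction_warnings` is a hypothetical helper name.

```shell
# Print "node percent seconds" for each missed-heartbeat warning
# (CRS-1612, CRS-1611, CRS-1610) found in log text read from stdin.
# eviction_warnings is a hypothetical helper; it assumes one message per line.
eviction_warnings() {
    sed -n 's/.*CRS-161[012]:Network communication with node \([^ ]*\) .* missing for \([0-9]*\)% of timeout interval\. *Removal of this node from cluster in \([0-9.]*\).*/\1 \2 \3/p'
}
# Possible usage: eviction_warnings < alertrac1.log
```

Lines that are truncated or wrapped, like the last sample above, will not match and are silently skipped.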
• Oracle High Availability Service (OHASD) maintains this log
• High Availability service messages are written here
• Review the log when you have issues whilst running root.sh/rootupgrade.sh
• Review it also if the service is unable to start or becomes unhealthy due to OLR problems
• Records the loading of the default debugging levels
2013-04-17 11:32:47.096: [ default][1] OHASD Daemon Starting. Command string :reboot
2013-04-17 11:32:47.125: [ default][1] Initializing OLR
2013-04-17 11:32:47.255: [ OCRRAW][1]proprioo: for disk 0
(/u00/app/12.1.0/grid_1/cdata/rac2.olr), id match (1), total id sets,
need recover (0), my votes (0), total votes (0), commit_lsn (3118), lsn (3118)
2013-04-17 11:32:47.368: [ default][1] Loading debug levels...
2013-04-17 11:32:47.803: [ clsdmt][13]Creating PID [6401] file for home /u00/app/12.1.0/grid_1
host usdbp10 bin ohasd to /u00/app/12.1.0/grid_1/ohasd/init/
GRID_HOME/log/host_name/ohasd/ohasd.log
Operating System logs
HPUX - /var/adm/syslog/syslog.log
AIX - /bin/errpt -a
Linux - /var/log/messages
Windows - Refer .TXT log files under Application/System log using Windows Event Viewer
Solaris - /var/adm/messages
Cluster logs rotation/retention policy
Managing Clusterware log files manually is not recommended...
They are governed and managed automatically ……
• Most Clusterware log files follow the 10x10 rule as part of the automatic rotation/retention policy.
• 10 copies of the ocssd.log file, 50M each, are retained and rotated subsequently.
• ohasd, evmd, crsd etc logs also retain 10 copies, 10M each.
• The policy doesn't apply to the alertHOSTNAME.log file.
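The retention figures above translate into a rough upper bound on disk usage. The arithmetic below is an illustration only: it counts just the four logs named on the slide (ocssd, crsd, ohasd, evmd), whereas a real Grid home contains other rotated logs and the unrotated alert log as well.

```shell
# Upper bound, in MB, on disk used by one rotated log: copies x size per copy.
max_log_mb() {
    echo $(( $1 * $2 ))
}
# ocssd.log: 10 copies of 50 MB; crsd, ohasd, evmd: 10 copies of 10 MB each.
total_mb=$(( $(max_log_mb 10 50) + 3 * $(max_log_mb 10 10) ))
echo "Worst-case footprint of these four logs: ${total_mb} MB"
```

Under these assumptions the four logs top out at 800 MB per node.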
Troubleshooting Cluster start-up failures
$GRID_HOME/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
$GRID_HOME/bin/crsctl check crs
$GRID_HOME/bin/crsctl check cluster
CRS-4639: Could not contact Oracle High Availability Services
CRS-4124: Oracle High Availability Services startup failed
CRS-4000: Command Check failed, or completed with errors
OR
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
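For scripted health checks, the output of `crsctl check crs` can be filtered so that only the unhealthy components remain. A minimal sketch, assuming healthy lines always end in "is online" as in the sample above; `crs_unhealthy` is a hypothetical helper name.

```shell
# Read `crsctl check crs` output on stdin and print every CRS-nnnn line that
# does not report a component as online. Returns 0 only when nothing is printed.
# crs_unhealthy is a hypothetical helper name.
crs_unhealthy() {
    if grep 'CRS-[0-9]' | grep -v 'is online'; then
        return 1
    fi
    return 0
}
# Possible usage: $GRID_HOME/bin/crsctl check crs | crs_unhealthy || echo "stack needs attention"
```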
Oracle High Availability Service (ohasd) start-up failures – common causes
CRS-4639: Could not contact Oracle High Availability Services
CRS-4124: Oracle High Availability Services startup failed
CRS-4000: Command Check failed, or completed with errors
1. Verify whether the cluster auto start-up is configured:
• crsctl config has
• /var/opt/oracle/scls_scr/hostname/root or
/etc/oracle/scls_scr/hostname/root
• Verify the OS run level
• Check whether the ohasd daemon process is up: ps -ef | grep ohasd
2. Verify the ohasd auto-start pointer in the /etc/init and /etc/inittab files:
• h1:3:respawn:/sbin/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
3. Verify the OLR availability, corruption and accessibility on the local node
• Review the ohasd.log file for more details
4. Verify whether the ohasd agents are up, in case of an unhealthy cluster
• ps -ef | egrep 'oraagent|orarootagent|cssdagent|cssdmonitor'
• Review the ohasd.log file
5. Verify Grid Infrastructure location permissions
• Compare with the same location on a good node
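The inittab check in step 2 can be scripted. The sketch below assumes an inittab-style file; `has_ohasd_entry` is a hypothetical helper name, and on platforms that use /etc/init instead, the file to inspect differs.

```shell
# Return 0 if the given file contains an uncommented init.ohasd respawn entry.
# has_ohasd_entry is a hypothetical helper; the default path is /etc/inittab.
has_ohasd_entry() {
    grep -q '^[^#]*respawn:.*init\.ohasd run' "${1:-/etc/inittab}"
}
# Possible usage: has_ohasd_entry || echo "ohasd auto-start entry is missing"
```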
Oracle High Availability Service (ohasd) start-up failures – troubleshooting common causes
1. Enable Cluster auto start-up
• crsctl enable has|crs
• crsctl start crs|cluster
2. Put the following line in the respective OS files
• h1:3:respawn:/sbin/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
3. Restore or set permissions on the OLR
• Restore from the most recent OLR backup if the file is corrupted
• Reset appropriate permissions on the file on the local node
4. Verify whether the ohasd agents are up, in case of an unhealthy cluster status
• Set permissions if needed
• If binaries are corrupted, restore them from the latest backup
5. Reset permissions or restore from the recent backups
6. Additionally, remove/rename the files from the /var/tmp/.oracle,
/usr/tmp/.oracle or /tmp/.oracle locations
Cluster Synchronization Service (cssd) start-up failures – common causes
CRS-4530: Communications failure contacting Cluster Synchronization Services
daemon:
1. Verify the following:
• GPnP profile accessibility
• Voting disk files accessibility
• Check the underlying network (private network) for any connectivity
issues
2. Verify the daemon status on the OS
• ps -ef | grep ocssd.bin
Cluster Synchronization Service (cssd) start-up failures – troubleshooting common causes
1. Review the ocssd.log file to diagnose the issue:
• Check in the ocssd.log file whether the daemon is able to access the GPnP profile
• Run crsctl query css votedisk to verify whether the voting disk files are accessible
• If voting disk permissions are lost, reset them
• Resolve underlying network issues behind any heartbeat problems and bring up the interconnect resource:
./crsctl start res ora.cluster_interconnect.haip -init
2. Start the process manually
• Try to start the daemon process manually if it is not up or is unhealthy:
./crsctl start res ora.cssd -init
Cluster Ready Service (crsd) start-up failures – common causes
CRS-4535: Cannot communicate with Cluster Ready Services:
1. Verify the following:
• Oracle Cluster Registry (OCR) accessibility
./ocrcheck
• Look for any Grid Home ownership and permission changes
• Check for the OCR mirror copy issues
• Verify and validate underlying network (private network)
2. Verify the daemon status on the OS
• ps -ef | grep crsd.bin
• crsctl stat res -t -init, look for ora.crsd status
3. Verify crsd agents
• ps -ef | egrep 'oraagent|orarootagent'
Cluster Ready Service (crsd) start-up failures – troubleshooting common causes
1. Take the following action:
• Review the crsd.log file
• Take appropriate steps to resolve ownership and privilege issues on the OCR files.
• Compare the Grid Home with that of a good node, and restore the directory
• ./ocrcheck
• Verify and validate the underlying network (private network)
2. Verify the daemon status on the OS
• Restart the process manually
./crsctl start res ora.crsd -init
3. Verify the following:
• ./crs_stat -t
4. Ensure sufficient free space is available under $GRID_HOME to avoid an unhealthy cluster.
Troubleshooting other clusterware processes
crsctl stat res -t -init
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE rac1 Started,STABLE
ora.cluster_interconnect.haip
1 ONLINE OFFLINE rac1 STABLE
ora.crsd
1 ONLINE OFFLINE rac1 STABLE
ora.cssd
1 ONLINE OFFLINE rac1 STABLE
ora.cssdmonitor
1 ONLINE UNKNOWN rac1 STABLE
ora.ctssd
1 ONLINE ONLINE rac1 ACTIVE:0,STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.drivers.acfs
1 ONLINE ONLINE rac1 STABLE
ora.evmd
1 ONLINE ONLINE rac1 STABLE
ora.gipcd
1 ONLINE ONLINE rac1 STABLE
ora.gpnpd
1 ONLINE ONLINE rac1 STABLE
ora.mdnsd
1 ONLINE ONLINE rac1 STABLE
ora.storage
1 ONLINE ONLINE rac1 STABLE
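In a listing like the one above, the interesting rows are those whose target is ONLINE but whose state is not. The following sketch filters for exactly that; `offline_resources` is a hypothetical helper name, and it assumes the two-line layout shown here (resource name, then an instance line of id, target, state).

```shell
# Read `crsctl stat res -t -init` output on stdin and print "resource state"
# for every resource whose target is ONLINE but whose state is not.
# offline_resources is a hypothetical helper name.
offline_resources() {
    awk '
        /^[ \t]*ora\./ { name = $1; next }                # resource name line
        name != "" && $2 == "ONLINE" && $3 != "ONLINE" {  # instance line: id target state
            print name, $3
        }
    '
}
# Possible usage: crsctl stat res -t -init | offline_resources
```

Resources whose target is OFFLINE, such as ora.diskmon in the listing, are deliberately not reported.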
./crsctl start res ora.cluster_interconnect.haip -init
./crsctl start res ora.cssd -init
The following output will be displayed on your screen:
CRS-2679: Attempting to clean 'ora.cssdmonitor' on 'rac1'
CRS-2681: Clean of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
Debugging/Tracing Cluster components
• Flexibility to modify the default tracing/logging levels for any Clusterware main and sub-processes
• Levels range from 1 to 5; a value of 0 disables tracing
• The ohasd.log file also records the default trace level messages when the Oracle High Availability Services daemon starts up on the local node
• crsctl get log {css|crs|evm} ALL – lists existing trace levels for the modules
• crsctl lsmodules – lists the module details
• crsctl lsmodules {css|crs|evm}
Default trace levels:
./crsctl get log css all
Get CSSD Module: BCCM Log Level: 2
Get CSSD Module: CLSF Log Level: 0
Get CSSD Module: CLSINET Log Level: 0
Get CSSD Module: CSSD Log Level: 2
Get CSSD Module: GIPCBCCM Log Level: 2
Get CSSD Module: GIPCCM Log Level: 2
Get CSSD Module: GIPCGM Log Level: 2
Get CSSD Module: GIPCNM Log Level: 2
Get CSSD Module: GPNP Log Level: 1
Get CSSD Module: OLR Log Level: 0
Get CSSD Module: SKGFD Log Level: 0
Default modules:
./crsctl lsmodules
Usage:
crsctl lsmodules {mdns|gpnp|css|crf|crs|ctss|evm|gipc}
where
mdns multicast Domain Name Server
gpnp Grid Plug-n-Play Service
css Cluster Synchronization Services
crf Cluster Health Monitor
crs Cluster Ready Services
ctss Cluster Time Synchronization Service
evm EventManager
gipc Grid Interprocess Communications
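The `crsctl get log css all` listing on this slide mixes enabled and disabled components. A small filter can show only those with a non-zero level; this is a sketch that assumes the "Get ... Module: NAME Log Level: N" line format shown above, and `active_log_levels` is a hypothetical helper name.

```shell
# Read `crsctl get log <module> all` output on stdin and print "component=level"
# for each component whose log level is above 0.
# active_log_levels is a hypothetical helper name.
active_log_levels() {
    awk '$1 == "Get" && $3 == "Module:" && $NF > 0 { print $4 "=" $NF }'
}
# Possible usage: ./crsctl get log css all | active_log_levels
```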
The following enable various tracing levels:
./crsctl set log crs crsmain=3
./crsctl set log crs crsmain=3,crsevt=4
./crsctl set log crs all=5
./crsctl set log res ora.prddb.db:5
The following examples explain how to set tracing levels on the OS:
export ORA_CRSDEBUG_ALL=1 --sets debugging level 1 to all modules
export ORA_CRSDEBUG_CRS=2 --sets debugging level 2 to CRS module
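Because trace levels are easy to mistype, it can help to validate the level before building the command. The sketch below only prints the command rather than running it; `build_set_log` is a hypothetical helper name, and the 0 to 5 range is the one stated on the earlier slide.

```shell
# Print (do not run) a crsctl command that sets one component's log level.
# Levels outside 0-5 are rejected, following the range given on the slides.
# build_set_log is a hypothetical helper name.
build_set_log() {
    module=$1 component=$2 level=$3
    case $level in
        [0-5]) echo "crsctl set log $module $component=$level" ;;
        *) echo "level must be between 0 and 5" >&2; return 1 ;;
    esac
}
# Possible usage: build_set_log crs crsmain 3
```

Printing first gives a chance to review the command before executing it as root on the node.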
The following disable tracing:
./crsctl set log crs crsmain=0
./crsctl set log res ora.prddb.db:0
./crsctl set log res ora.crs:0 -init
Tools & Utilities
Tools & Utilities - how to pick the right one
Tools & Utilities - Diagnostic Collection Script
Diagcollection.pl:
• Located under the $GRID_HOME/bin location
• Gathers the required Clusterware diagnostic information, as a set of trace files, from various sources: CRS logs, trace & core files, OCR data etc.
• Can collect diagnostic information at different layers and homes: Cluster, Oracle RDBMS, Core, Oracle Base etc
• All the information is then zipped into a few zip files
• The time required to gather the information is directly proportional to the levels used
• Upload these files to My Oracle Support for issue investigation
Examples:
./diagcollection.sh --collect --chmos
./diagcollection.sh --collect --chmos --incidenttime <timeperiod> --incidentduration 05:00 (five hours report)
Alternatively, you can use the following:
./oclumon dumpnodeview -allnodes -v -last "04:59:59" >/tmp/output.txt
./oclumon dumpnodeview -allnodes -v -s "2013-04-24 09:00" -e "2013-04-24 03:15:00"
Output files
crsData_rac1_20121204_1103.tar.gz
ocrData_rac1_20121204_1103.tar.gz
coreData_rac1_20121204_1103.tar.gz
osData_rac1_20121204_1103.tar.gz
Tools & Utilities - Cluster Health Monitor (CHM)
• A tool designed to detect and analyze OS and Cluster resource failures etc.
• Formerly known as Instantaneous Problem Detector for OS (IPD/OS).
• In pre-11gR2 versions, you need to download the tool from OTN.
• With 11gR2, it is an integral part of the software and integrated closely with GI.
• ora.crf CHM resource introduced | crsctl stat res -t -init
• Not available on some platforms.
• Can be used on RAC and non-RAC environments.
• Collects real-time OS statistics (every second; every 5 seconds from 11.2.0.3): memory, swap, I/O, network etc
• Stores real-time monitoring metrics in the CHM repository.
• Historical data can be used to diagnose: node eviction, instance hang, server perf. etc
• Contains two services:
System Monitoring Service (osysmond)
• Runs on every node, monitors and collects OS metrics and sends the data to ologgerd
Cluster Logger Service (ologgerd)
• Stores the information received from the nodes in the repository
• Runs on one node as the master service, with a standby service on other nodes
• CHM vs OSWatcher:
CHM takes less CPU and adds less overhead on the node; OSWatcher doesn't run when the server CPU is heavily used
• Consumes less than 5% CPU/core, minimal overhead on the server
• Takes 1GB space by default across all nodes.
• Approx. 0.5GB data per day.
• Data can be kept for 3 days.
• ./oclumon manage -get repsize
• oclumon – a command-line tool, used to manage CHM repository
• Stores in a management repository database with 12c.
Tools & Utilities - OSWatcher Black Box (oswbb)
• A tool that captures OS performance metrics and stores the statistical data in files
• vmstat, netstat, top, traceroute, ps, iostat etc
• Available on MOS.
• On RAC, needs to be configured and scheduled on individual nodes
• Supports most UNIX/LINUX platforms.
• ./startoswbb.sh (default interval/retention, 30 sec/48 hrs)
• ./startoswbb.sh 60 10 (60 seconds interval, 10 hrs data retention)
• ./stoposwbb.sh
• Review the .dat files in the archive directory
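The two arguments to startoswbb.sh trade sampling detail against history. As a rough illustration, the number of samples each collector accumulates over the retention window is simply the window divided by the interval; `oswbb_snapshots` is a hypothetical helper name, and the figure ignores how OSWatcher splits samples across archive files.

```shell
# Approximate samples kept per collector: retention window / sampling interval.
# Arguments: interval in seconds, retention in hours.
# oswbb_snapshots is a hypothetical helper name.
oswbb_snapshots() {
    echo $(( $2 * 3600 / $1 ))
}
oswbb_snapshots 30 48   # default interval/retention from the slide
oswbb_snapshots 60 10   # the ./startoswbb.sh 60 10 example
```

With the defaults this works out to 5760 samples, against 600 for the 60-second, 10-hour example.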
Tools & Utilities - oratop
• An OS top-like utility on Linux platforms
• Provides near real-time database monitoring capabilities for RAC and non-RAC databases, 11.2.0.3 or higher.
• A very lightweight utility, consumes 0.20% memory and <1% CPU.
• Download the oratop.zip from MOS, set chmod 755.
• Db init parameters: statistics_level = TYPICAL, timed_statistics = TRUE must be set
• Need to input username/password; connects as the system user when no credentials are provided.
• Set the following on the OS
$ export ORACLE_UNQNAME=<dbname>
$ export ORACLE_SID=<instance_name1>
$ export ORACLE_HOME=<db_home>
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ export PATH=$ORACLE_HOME/bin:$PATH
• Needs a TNS entry for the source database
• Needs privileges on v_$SESSION, v_$SYSMETRIC, v_$INSTANCE, v_$PROCESS, V_$SYSTEM_EVENT etc
• Live window: lists top 5 wait events, top Oracle sessions by I/O, memory, db load
• Provides database blocking details.
• Press q/Q or Control+C to abort
./oratop -i 5 / as sysdba - interval every 5 seconds
./oratop -i 5 username/password@tns_alias
Screen sections: Header, Databases, Top 5 DB events, Processes
• %db - (values > 99%)
• %CU - (load > 2 x cpu counts & host cpu > 99)
• HLD - (load > 2 x cpu counts and aas > cpu counts)
• IORL - (value > 20ms)
• %FR - (value < 1%)
• ASW - (value = session counts, USN)
• AAS - (value > cpu counts)
• DBW - (value > 50%)
• EVENT – Active wait event
• PGA – potential unusual memory growth
• BLOCKER - a blocking session with (wait time > 5 minutes)
Tools & Utilities – RACcheck v.2.2.1
• A RAC configuration auditing utility that audits various important configuration settings: Cluster, ASM, Grid Infrastructure etc
• Audits OS kernel parameters/packages, 11.2.0.3 upgrade readiness etc
• Download the raccheck.zip from MOS, chmod to 755.
• Ability to compare two outputs.
• All the recommendations/output are written to an HTML file.
• The output includes an overall health check rating (out of 100), bug fixes, patch recommendations etc.
• Upload the .zip file if MOS asks you to do so.
Examples:
./raccheck - follow the interactive steps
./raccheck -u -o pre|post
./raccheck -h
./raccheck -s
./raccheck -diff report1 report2
Usage : ./raccheck [-abvhpfmsuSo:c:rt:]
-a All (Perform best practice check and recommended patch check)
-b Best Practice check only. No recommended patch check
-h Show usage
-v Show version
-p Patch check only
-m Exclude checks for Maximum Availability Architecture
-u Run raccheck to check pre-upgrade or post-upgrade best practices. -o pre or -o post is mandatory with -u option, like ./raccheck -u -o pre
-f Run Offline. Checks will be performed on data already
-o Argument to an option. If -o is followed by v, V, Verbose or VERBOSE, it will print checks which pass on the screen. If -o option is not specified, it will print only failures on screen, e.g. raccheck -a -o v
-r To include High availability best practices also in regular healthcheck, e.g. ./raccheck -r (not applicable for exachk)
-c Pass specific module or component to check best practice for. By default it will check for components indentified fr
Tools & Utilities – Hang analysis/system state
• HANGANALYZE helps detect the cause of a database hang
• Advised to run HANGANALYZE when a database suffers from a hang, performance degradation, latching issues etc
• Available since 8.1.6, provides cluster-wide analysis from 9i
Examples:
$ sqlplus "/ as sysdba"
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug setinst all
SQL> oradebug -g def hanganalyze 3
-- wait 90 seconds
SQL> oradebug -g def hanganalyze 3
SQL> oradebug tracefile_name
SQL> exit
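The two-dump sequence above can be captured in a script so that the wait between dumps is not forgotten under pressure. This is a sketch, not a supported tool: `hanganalyze_script` is a hypothetical helper name, the pause uses the SQL*Plus `host` command to call the OS `sleep`, and the generated script should be reviewed before it is run.

```shell
# Print a SQL*Plus script that takes two hanganalyze dumps a fixed wait apart.
# Arguments: level (default 3) and wait in seconds (default 90).
# hanganalyze_script is a hypothetical helper; review its output before use.
hanganalyze_script() {
    level=${1:-3} wait_s=${2:-90}
    cat <<EOF
oradebug setmypid
oradebug unlimit
oradebug setinst all
oradebug -g def hanganalyze $level
host sleep $wait_s
oradebug -g def hanganalyze $level
oradebug tracefile_name
exit
EOF
}
# Possible usage (assumes OS authentication as a SYSDBA user):
#   hanganalyze_script 3 90 > /tmp/hang.sql && sqlplus -s "/ as sysdba" @/tmp/hang.sql
```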
HANGANALYZE levels:
10 Dump all processes
5 Level 4 + Dump all processes involved in wait chains (NLEAF state)
4 Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)
3 Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
-- recommended
1-2 Only HANGANALYZE output, no process dump at all
Review the trace file:
HANG ANALYZE (section)
CYCLES
Lists process dependencies for a deadlock/hung state
BLOCKER OF MANY SESSIONS:
This section appears when a single session blocks 10 or more sessions
STATE OF NODES | OPEN CHAINS | OTHER CHAINS
Dumping system state
When a database is completely hung and you can’t connect as / as sysdba, or when memory leaks are suspected, use the following (systemstate level 10 or 266):
$ sqlplus -prelim "/ as sysdba"
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug -g all dump systemstate 266
-- wait 60 seconds
SQL> oradebug -g all dump systemstate 266
Review the trace file or upload it to MOS
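The two-dump system state sequence can be generated the same way, with a guard so that only the two levels named on the slide are accepted. Again this is a sketch under assumptions: `gen_systemstate_script` is a made-up name, the defaults (level 266, 60-second pause) simply mirror the slide, and nothing here connects to a database.

```shell
#!/bin/sh
# Hypothetical helper: print the oradebug commands for two cluster-wide
# system state dumps separated by a pause.
# $1 = systemstate level, 10 or 266 (default 266); $2 = pause in seconds (default 60).
gen_systemstate_script() {
    level="${1:-266}"
    pause="${2:-60}"
    # Reject anything other than the two levels mentioned on the slide.
    case "$level" in
        10|266) ;;
        *) echo "level must be 10 or 266" >&2; return 1 ;;
    esac
    cat <<EOF
oradebug setmypid
oradebug unlimit
oradebug -g all dump systemstate ${level}
host sleep ${pause}
oradebug -g all dump systemstate ${level}
exit
EOF
}

# Print the generated commands for review.
gen_systemstate_script 266 60
```

On a hung instance the output would be fed to a preliminary connection, for example: gen_systemstate_script 266 60 | sqlplus -prelim "/ as sysdba", assuming sqlplus is available on the node.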
References
RACcheck - RAC Configuration Audit Tool [ID 1268927.1]
Oracle Premier Support - Oracle Database Support News - Issue November, 2012 Volume 22 [ID 1513219.1]
Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]
Oracle Clusterware CRSD OCSSD EVMD Log Rotation Policy [ID 557204.1]
CRS Diagnostic Data Gathering: A Summary of Common tools and their Usage [ID 783456.1]
Remote Diagnostic Agent (RDA) 4 - Getting Started [ID 314422.1]
Data Gathering for Troubleshooting Oracle Clusterware (CRS or GI) And Real Application Cluster (RAC) Issues [ID 289690.1]
A big thank you to all for listening ...
You can write to me at sjaffarhussain@gmail.com

  • 9. Key new features at a glance Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 9 Oracle Grid Infrastructure • Clusterware and ASM binaries are installed together in a single home directory: Grid Home (GI)
  • 10. Key new features at a glance Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 10 • OCR can also be stored in ASM diskgroup • Upto five (05) copies of OCR files
  • 11. Key new features at a glance Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 11 Oracle Local Registry – OLR • Independent OLR copy for each node • Not shared between nodes • Stores local node configuration details required by OHASD • Configured upon installation/upgrade • Facilitates the CRS startup process when OCR/VD stored in ASM • ocrcheck -local • Located under $GRID_HOME/cdata/hostname/hostname.olr • $GRID_HOME/bin/ocrconfig –local –manualbakup/restore
  • 12. Key new features at a glance Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 12 • Voting Disk (files) can also be stored in ASM diskgroup • VD copies can’t resides in multiple ASM diskgroups
  • 13. Key new features at a glance Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 13 • crsctl start cluster –all -- starts cluster on all nodes • crsctl stop cluster –all -- stops cluster on all nodes • crsctl check cluster –all -- verify cluster health on all nodes Clusterized cluster-aware commands
  • 14. Key new features at a glance Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 14 Complete redesign of Cluster daemon in 11gR2, ohasd introduction Replaces RACG layer with Agents New Services Grid Plug and Play (GPnP) Cluster Time Synchronization Service (CTSS) Grid Name Service Cluster can be started in exclusive mode for maintenance purpose ./crsctl start res –t –init ---- list all Clusterware daemon resources
  • 15. Oracle 11gR2 Clusterware Stack Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 15 Oracle 11gR2 Clusterware Software Stacks Cluster Ready Services daemon(CRSD) 1 The upper stack
  • 16. Oracle 11gR2 Clusterware Stack Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 16 Oracle 11gR2 Clusterware Software Stack Cluster Ready Services daemon(CRS) Oracle High Availability Services daemon(OHASD) 1 2 The lower stack
  • 17. Oracle 11gR2 Clusterware Stack Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 17 Oracle 11gR2 Clusterware Software Stack Cluster Ready Services daemon(CRS) Oracle High Availability Services daemon(OHASD) 1 2 The upper stack
  • 18. Oracle 11gR2 Clusterware Stack Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 18 Oracle 11gR2 Clusterware Software Stack Cluster Ready Services Stack (CRS) Oracle High Availability Service Stack (OHASD) 1 2 The lower stack
  • 19. Oracle 11gR2 Clusterware Stack Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 19 Oracle High Availability Services daemon (ohasd) h1:3:respawn:/sbin/init.d/init.ohasd run >/dev/null 2>&1 </dev/null /etc/inittab entry point h1:3:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null Linux HPUX Oracle/RedHat Linux 6 /etc/init, /etc/init.d/init.ohasd run
  • 20. Clusterware Startup Sequence Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 20
  • 21. Clusterware Startup Sequence Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 21
  • 22. Clusterware Startup Sequence Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 22
  • 23. Clusterware Startup Sequence Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 23
  • 24. Clusterware Startup Sequence Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 24
  • 25. Clusterware Startup Sequence Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 25 Oracle High Availability Service Daemon (ohasd) Cluster Synchronization Service Daemon (cssd) Event Manager Daemon (evmd) Cluster Ready Service Daemon (crsd) Voting disk GPnP / OLR OCR
  • 26. Cluster logs & directory structure Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 26 alert<hostname>.log CRSD == crsd.log CSSD == ocssd.log OHASD == ohsad.log EVMD == evmd.log Operating System Logs GRID_HOME/log/host_name/
  • 27. Cluster logs & directory structure Presented by : Syed Jaffer Hussain RedGate/AllThingsOracle Slide # 27 • Writes Clusterware stack all important alerts messages • Posts Cluster stack start/stop messages • nodes eviction messages • OLR events • Voting and OCR disk related messages • Active nodes list • Preferably the first log file to review upon any cluster issues GRID_HOME/log/host_name/alterrac1.log
  • 28. Cluster logs & directory structure
    GRID_HOME/log/host_name/alertrac1.log
    [ohasd(10937)]CRS-1301:Oracle High Availability Service started on node rac1.
    [cssd(19712)]CRS-1713:CSSD daemon is started in exclusive mode
    [ohasd(19506)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'rac1'
    [cssd(19951)]CRS-1605:CSSD voting file is online: /dev/rdsk/c0t13d2; details in /u00/app/11.2.0/grid_1/log/rac1/cssd/ocssd.log.
    2013-04-23 16:42:28.906: [ CSSD][6]clssnmvFindInitialConfigs: No voting files found
    2013-04-23 16:42:28.906: [ CSSD][6](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
    [cssd(7945)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u00/app/11.2.0/grid/log/usdbt01/cssd/ocssd.log
    [/u00/app/11.2.0/grid_1/bin/oraagent.bin(19914)]CRS-5815:Agent '/u00/app/11.2.0/grid_1/bin/oraagent_oracle' could not find any base type entry points for type 'ora.daemon.type'. Details at (:CRSAGF00108:) {0:9:2} in /u00/app/11.2.0/grid_1/log/usdbp01/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
    [cssd(19951)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1,rac2.
    [cssd(3726)]CRS-1625:Node rac2, number 2, was manually shut down
    [cssd(3726)]CRS-1612:Network communication with node rac2 (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.145 seconds
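Alert log entries such as those on slide 28 follow a regular shape, `[daemon(pid)]CRS-nnnn:message`, which makes them easy to filter by message code. The following is a minimal Python sketch under that assumption; the function name `parse_alert_line` is hypothetical and not part of any Oracle tool.

```python
import re

# Assumed shape of an alert log entry: [daemon(pid)]CRS-nnnn:message
ALERT_RE = re.compile(
    r"^\[(?P<daemon>[^\]()]+)\((?P<pid>\d+)\)\](?P<code>CRS-\d+):(?P<msg>.*)$"
)

def parse_alert_line(line):
    """Return daemon, pid, code and msg as a dict, or None if the line has another shape."""
    m = ALERT_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["pid"] = int(entry["pid"])
    return entry

# Sample line taken from the slide
sample = "[cssd(19951)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1,rac2."
entry = parse_alert_line(sample)
print(entry["daemon"], entry["pid"], entry["code"])
```

Timestamped trace lines copied into the alert log do not match this pattern and are returned as `None`, so they can be skipped or handled separately.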
  • 29. Cluster logs & directory structure
    GRID_HOME/log/host_name/crsd.log
    • The Cluster Ready Services daemon (CRSD) maintains this log
    • Every start, stop, and failure of a cluster resource is written here
    • Review this log when resources fail or cannot be started
    2013-04-05 02:05:05.379: [ CRSPE][46] {13:24208:51894} Resource ora.prddb.db has been updated in the PE data model:6000000004f82610
    2013-04-05 02:08:15.371: [ CRSPE][46] {0:33:627} State change received from rac1 for ora.prddb.db
    2013-04-05 02:08:22.698: [ CRSPE][46] {0:33:628} CRS-2672: Attempting to start 'ora.prddb.prddb_srv.svc' on 'rac1'
    2013-04-23 17:32:46.340: [ OCRRAW][1]proprioini: all disks are not OCR/OLR formatted
    2013-04-23 17:32:46.340: [ OCRRAW][1]proprinit: Could not open raw device
  • 30. Cluster logs & directory structure
    GRID_HOME/log/host_name/ocssd.log
    • The Cluster Synchronization Services daemon (CSSD) maintains this log
    • The busiest log file
    • Records inter-node communication messages
    • Records missed heartbeat and node eviction messages
    2013-04-23 16:14:14.712: [ CSSD][6]clssnmvDiskVerify: Successful discovery of 0 disks
    2013-04-23 16:14:14.712: [ CSSD][6]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
    2013-04-23 16:14:14.712: [ CSSD][6]clssnmvFindInitialConfigs: No voting files found
    2013-04-23 16:14:14.713: [ CSSD][6](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
    2013-04-17 11:33:23.286: [GIPCHALO][7] gipchaLowerProcessNode: bootstrap node considered dead because of idle connection time 600014 ms, node 60000000019645b0 { host 'rac2', haName 'CSS_crs', srcLuid 8c78ad11-53582d91, dstLuid 2f6e4604-5e451051 numInf 1, contigSeq 6712447, lastAck 6697196, lastValidAck 6712447, sendSeq [6697202 : 6697202], createTime 2417887052, sentRegister 1, localMonitor 1, flags 0x808 }
    2013-04-20 15:13:18.953: [ CSSD][54]clssnmSendingThread: sending status msg to all nodes
    2013-04-20 15:13:18.954: [ CSSD][54]clssnmSendingThread: sent 4 status msgs to all nodes
    2013-04-12 17:35:55.351: [ CSSD][49]clssnmvReadDskHeartbeat: Reading DHBs to get the latest info for node rac1, 17 LATSvalid 0 uniqueness 1348227938
    [cssd(7335)]CRS-1612:Network communication with node rac2 (02) missing for 50% of timeout interval. Removal of this node from cluster in 14.397 seconds
    2013-03-15 17:02:44.964
    [cssd(7335)]CRS-1611:Network communication with node rac2 (02) missing for 75% of timeout interval. Removal of this node from cluster in 7.317 seconds
    2013-03-15 17:02:50.024
    [cssd(7335)]CRS-1610:Network communication with node rac2 (02) missing for 90% of timeout interval. Removal of this node from cluster in
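The daemon logs (crsd.log, ocssd.log, ohasd.log) prefix each entry with a timestamp, a component tag, and a thread number. A small Python sketch, assuming the `YYYY-MM-DD HH:MM:SS.mmm: [ COMPONENT][thread]message` shape seen in the samples, shows how such lines could be split into fields for filtering or for measuring gaps between events. The name `parse_trace_line` is made up for illustration, and the second sample line is shortened from the slide.

```python
import re
from datetime import datetime

# Assumed shape: "YYYY-MM-DD HH:MM:SS.mmm: [ COMPONENT][thread]message"
TRACE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(?P<component>\w+)\]\[(?P<thread>\d+)\]\s*(?P<msg>.*)$"
)

def parse_trace_line(line):
    """Split one daemon log line into timestamp, component, thread and message."""
    m = TRACE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "ts": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"),
        "component": m.group("component"),
        "thread": int(m.group("thread")),
        "msg": m.group("msg"),
    }

lines = [
    "2013-04-23 16:14:14.712: [ CSSD][6]clssnmvFindInitialConfigs: No voting files found",
    # Shortened from the slide for readability
    "2013-04-17 11:33:23.286: [GIPCHALO][7] gipchaLowerProcessNode: bootstrap node considered dead",
]
parsed = [parse_trace_line(line) for line in lines]
for p in parsed:
    print(p["ts"].isoformat(), p["component"], p["thread"], p["msg"])
```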
  • 31. Cluster logs & directory structure
    GRID_HOME/log/host_name/ohasd.log
    • The Oracle High Availability Services daemon (OHASD) maintains this log
    • High Availability Services messages are written here
    • Review this log when you have issues while running root.sh/rootupgrade.sh
    • Review it also when the service cannot start or becomes unhealthy because of OLR problems
    • Shows the default debug levels being loaded at start-up
    2013-04-17 11:32:47.096: [ default][1] OHASD Daemon Starting. Command string :reboot
    2013-04-17 11:32:47.125: [ default][1] Initializing OLR
    2013-04-17 11:32:47.255: [ OCRRAW][1]proprioo: for disk 0 (/u00/app/12.1.0/grid_1/cdata/rac2.olr), id match (1), total id sets, need recover (0), my votes (0), total votes (0), commit_lsn (3118), lsn (3118)
    2013-04-17 11:32:47.368: [ default][1] Loading debug levels...
    2013-04-17 11:32:47.803: [ clsdmt][13]Creating PID [6401] file for home /u00/app/12.1.0/grid_1 host usdbp10 bin ohasd to /u00/app/12.1.0/grid_1/ohasd/init/
  • 32. Cluster logs & directory structure
  • 33. Cluster logs rotation/retention policy
    Operating system logs:
    • HP-UX: /var/adm/syslog/syslog.log
    • AIX: /bin/errpt -a
    • Linux: /var/log/messages
    • Windows: refer to the .TXT log files under the Application/System log using Windows Event Viewer
    • Solaris: /var/adm/messages
  • 34. Cluster logs rotation/retention policy
    Managing Clusterware log files manually is not recommended...
  • 35. Cluster logs rotation/retention policy
    They are governed and managed automatically ...
  • 36. Cluster logs rotation/retention policy
    • Most Clusterware log files follow the 10x10 rule of the automatic rotation/retention policy.
    • 10 copies of the ocssd.log file, 50 MB each, are retained and rotated.
    • The ohasd, evmd, crsd, etc. logs also retain 10 copies, 10 MB each.
    • The policy does not apply to the alertHOSTNAME.log file.
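The retention figures on slide 36 translate directly into an upper bound on disk usage per node. The short Python sketch below works through that arithmetic for the four logs named on the slide; the dictionary layout and the function name `max_footprint_mb` are illustrative assumptions, and other logs covered by the policy are not counted.

```python
# (copies retained, size of each copy in MB), as stated on the slide
POLICY = {
    "ocssd": (10, 50),
    "ohasd": (10, 10),
    "evmd": (10, 10),
    "crsd": (10, 10),
}

def max_footprint_mb(policy):
    """Upper bound on disk space per log: copies multiplied by size of each copy."""
    return {name: copies * size for name, (copies, size) in policy.items()}

footprint = max_footprint_mb(POLICY)
total = sum(footprint.values())
print(footprint)
print("total MB for these four logs:", total)
```

Under these figures ocssd.log can occupy up to 500 MB and each of the other three up to 100 MB, so these four logs alone can take up to 800 MB per node.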
  • 37. Troubleshooting Cluster start-up failures
    $GRID_HOME/bin/crsctl check crs
    CRS-4638: Oracle High Availability Services is online
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
  • 38. Troubleshooting Cluster start-up failures
    $GRID_HOME/bin/crsctl check crs
    $GRID_HOME/bin/crsctl check cluster
    CRS-4639: Could not contact Oracle High Availability Services
    CRS-4124: Oracle High Availability Services startup failed
    CRS-4000: Command Check failed, or completed with errors
    OR
    CRS-4535: Cannot communicate with Cluster Ready Services
    CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
    CRS-4534: Cannot communicate with Event Manager
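The message codes on slides 37 and 38 each point at one daemon, so captured `crsctl check crs` output can be turned into a per-daemon status. The Python sketch below uses only the codes listed on those slides; the function name `daemon_status` is hypothetical, and the degraded sample is assembled from slide lines for illustration rather than captured from a real cluster.

```python
import re

# Code -> (daemon, online?) as listed on the slides; other CRS codes are ignored.
CODE_STATUS = {
    "CRS-4638": ("ohasd", True),
    "CRS-4537": ("crsd", True),
    "CRS-4529": ("cssd", True),
    "CRS-4533": ("evmd", True),
    "CRS-4639": ("ohasd", False),
    "CRS-4124": ("ohasd", False),
    "CRS-4535": ("crsd", False),
    "CRS-4530": ("cssd", False),
    "CRS-4534": ("evmd", False),
}

def daemon_status(output):
    """Map each daemon mentioned in the output to True (online) or False."""
    status = {}
    for code in re.findall(r"CRS-\d+", output):
        if code in CODE_STATUS:
            daemon, online = CODE_STATUS[code]
            # A failure code for a daemon overrides an earlier online code
            status[daemon] = status.get(daemon, True) and online
    return status

# Illustrative sample assembled from lines shown on the slides
degraded = """CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager"""
print(daemon_status(degraded))
```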
  • 39. Troubleshooting Cluster start-up failures
    Oracle High Availability Service (ohasd) start-up failures: common causes
    CRS-4639: Could not contact Oracle High Availability Services
    CRS-4124: Oracle High Availability Services startup failed
    CRS-4000: Command Check failed, or completed with errors
    1. Verify whether cluster auto start-up is configured
       • crsctl config has
       • /var/opt/oracle/scls_scr/hostname/root or /etc/oracle/scls_scr/hostname/root
       • Verify the OS run level
       • Check whether the ohasd daemon process is up: ps -ef | grep ohasd
    2. Verify the ohasd auto-start pointer in the /etc/init and /etc/inittab files
       • h1:3:respawn:/sbin/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
    3. Verify OLR availability, integrity, and accessibility on the local node
       • Review the ohasd.log file for more details
    4. Verify whether the ohasd agents are up (for an unhealthy cluster)
       • ps -ef | egrep 'oraagent|orarootagent|cssdagent|cssdmonitor'
       • Review the ohasd.log file
    5. Verify the Grid Infrastructure location permissions
       • Compare with the same location on a good node
  • 40. Troubleshooting Cluster start-up failures
    Oracle High Availability Service (ohasd) start-up failures: troubleshooting common causes
    1. Enable cluster auto start-up
       • crsctl enable has|crs
       • crsctl start crs|cluster
    2. Put the following line in the respective OS file
       • h1:3:respawn:/sbin/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
    3. Restore or set permissions on the OLR
       • Restore from the most recent OLR backup if the file is corrupted
       • Reset the appropriate permissions on the file on the local node
    4. Verify whether the ohasd agents are up (for an unhealthy cluster status)
       • Set permissions if needed
       • If binaries are corrupted, restore them from the latest backup
    5. Reset permissions or restore from recent backups
    6. Additionally, remove or rename the files in the /var/tmp/.oracle, /usr/tmp/.oracle, or /tmp/.oracle locations
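The inittab check from slides 39 and 40 can be scripted. The Python sketch below scans inittab-style text for an active `respawn` entry that runs `init.ohasd run`; it assumes the classic `id:runlevels:action:process` layout, and the function name `has_ohasd_respawn` is made up for illustration. On a real node the text would be read from /etc/inittab.

```python
def has_ohasd_respawn(inittab_text):
    """Return True if an uncommented respawn entry runs 'init.ohasd run'."""
    for raw in inittab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # inittab fields: id:runlevels:action:process
        fields = line.split(":", 3)
        if len(fields) == 4 and fields[2] == "respawn" and "init.ohasd run" in fields[3]:
            return True
    return False

good = "h1:3:respawn:/sbin/init.d/init.ohasd run >/dev/null 2>&1 </dev/null\n"
commented = "#h1:3:respawn:/sbin/init.d/init.ohasd run >/dev/null 2>&1 </dev/null\n"
print(has_ohasd_respawn(good), has_ohasd_respawn(commented))
```

A commented-out or missing entry returns `False`, which corresponds to the case where the line has to be put back as described in step 2.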
  • 41. Troubleshooting Cluster start-up failures
    Cluster Synchronization Service (cssd) start-up failures: common causes
    CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
    1. Verify the following:
       • GPnP profile accessibility
       • Voting disk file accessibility
       • The underlying private network, for any connectivity issues
    2. Verify the daemon status on the OS
       • ps -ef | grep ocssd.bin
  • 42. Troubleshooting Cluster start-up failures
    Cluster Synchronization Service (cssd) start-up failures: troubleshooting common causes
    1. Review the ocssd.log file to diagnose the issue:
       • Check in ocssd.log whether the daemon can access the GPnP profile
       • Run crsctl query css votedisk to verify whether the voting disk files are accessible
       • If the voting disk permissions are lost, reset them
       • Resolve any underlying network issues affecting the heartbeat, then bring up the interconnect resource: ./crsctl start res ora.cluster_interconnect.haip -init
    2. Start the process manually
       • Try to start the daemon process manually if it is not up or is unhealthy: ./crsctl start res ora.cssd -init
  • 43. Troubleshooting Cluster start-up failures
    Cluster Ready Service (crsd) start-up failures: common causes
    CRS-4535: Cannot communicate with Cluster Ready Services
    1. Verify the following:
       • Oracle Cluster Registry (OCR) accessibility: ./ocrcheck
       • Any Grid Home ownership and permission changes
       • OCR mirror copy issues
       • The underlying private network
    2. Verify the daemon status on the OS
       • ps -ef | grep crsd.bin
       • crsctl stat res -t -init, and look for the ora.crsd status
    3. Verify the crsd agents
       • ps -ef | egrep 'oraagent|orarootagent'
  • 44. Troubleshooting Cluster start-up failures
    Cluster Ready Service (crsd) start-up failures: troubleshooting common causes
    1. Take the following actions:
       • Review the crsd.log file
       • Resolve ownership and privilege issues on the OCR files
       • Compare with a good node, and restore the directory
       • ./ocrcheck
       • Verify and validate the underlying private network
    2. Verify the daemon status on the OS
       • Restart the process manually: ./crsctl start res ora.crsd -init
    3. Verify the following:
       • ./crs_stat -t
    4. Ensure sufficient free space is available under $GRID_HOME to avoid an unhealthy cluster.
  • 45. Troubleshooting Cluster start-up failures
    Troubleshooting other clusterware processes
    crsctl stat res -t -init
    Name           Target  State        Server                   State details
    --------------------------------------------------------------------------------
    Cluster Resources
    --------------------------------------------------------------------------------
    ora.asm
          1        ONLINE  ONLINE       rac1                     Started,STABLE
    ora.cluster_interconnect.haip
          1        ONLINE  OFFLINE      rac1                     STABLE
    ora.crsd
          1        ONLINE  OFFLINE      rac1                     STABLE
    ora.cssd
          1        ONLINE  OFFLINE      rac1                     STABLE
    ora.cssdmonitor
          1        ONLINE  UNKNOWN      rac1                     STABLE
    ora.ctssd
          1        ONLINE  ONLINE       rac1                     ACTIVE:0,STABLE
    ora.diskmon
          1        OFFLINE OFFLINE                               STABLE
    ora.drivers.acfs
          1        ONLINE  ONLINE       rac1                     STABLE
    ora.evmd
          1        ONLINE  ONLINE       rac1                     STABLE
    ora.gipcd
          1        ONLINE  ONLINE       rac1                     STABLE
    ora.gpnpd
          1        ONLINE  ONLINE       rac1                     STABLE
    ora.mdnsd
          1        ONLINE  ONLINE       rac1                     STABLE
    ora.storage
          1        ONLINE  ONLINE       rac1                     STABLE
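In the `crsctl stat res -t -init` listing on slide 45, the resources that need attention are the ones whose target is ONLINE while the state is anything else. A Python sketch of that filter follows, assuming the layout where each resource name sits on its own line and is followed by a line starting with the instance number; the function name `unhealthy_resources` is hypothetical and the sample is a shortened excerpt of the slide output.

```python
def unhealthy_resources(stat_output):
    """Return (name, state) for resources whose target is ONLINE but whose state is not."""
    problems = []
    name = None
    for raw in stat_output.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            name = fields[0]  # resource name line
        elif name and fields[0].isdigit() and len(fields) >= 3:
            target, state = fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                problems.append((name, state))
    return problems

# Shortened excerpt in the assumed layout
sample = """\
ora.asm
      1        ONLINE  ONLINE       rac1                     Started,STABLE
ora.crsd
      1        ONLINE  OFFLINE      rac1                     STABLE
ora.cssdmonitor
      1        ONLINE  UNKNOWN      rac1                     STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
"""
print(unhealthy_resources(sample))
```

ora.diskmon is not reported because its target is OFFLINE, so its OFFLINE state is expected.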
  • 46. Troubleshooting Cluster start-up failures
    Troubleshooting other clusterware processes
    ./crsctl start res ora.cluster_interconnect.haip -init
    ./crsctl start res ora.cssd -init
    Output similar to the following is displayed:
    CRS-2679: Attempting to clean 'ora.cssdmonitor' on 'rac1'
    CRS-2681: Clean of 'ora.cssdmonitor' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
    CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
    CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
    CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
    CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
    CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
  • 47. Debugging/Tracing Cluster components
    • The default tracing/logging levels can be modified for any Clusterware main process and sub-process
    • Levels range from 1 to 5; a value of 0 disables tracing
    • The ohasd.log file also records the default trace levels when the Oracle High Availability Services daemon starts on the local node
    • crsctl get log {css|crs|evm} ALL: lists the existing trace levels for the modules
    • crsctl lsmodules: lists the module details
    • crsctl lsmodules {css|crs|evm}
  • 48. Debugging/Tracing Cluster components
    Default trace levels:
    ./crsctl get log css all
    Get CSSD Module: BCCM Log Level: 2
    Get CSSD Module: CLSF Log Level: 0
    Get CSSD Module: CLSINET Log Level: 0
    Get CSSD Module: CSSD Log Level: 2
    Get CSSD Module: GIPCBCCM Log Level: 2
    Get CSSD Module: GIPCCM Log Level: 2
    Get CSSD Module: GIPCGM Log Level: 2
    Get CSSD Module: GIPCNM Log Level: 2
    Get CSSD Module: GPNP Log Level: 1
    Get CSSD Module: OLR Log Level: 0
    Get CSSD Module: SKGFD Log Level: 0
    Default modules:
    ./crsctl lsmodules
    Usage: crsctl lsmodules {mdns|gpnp|css|crf|crs|ctss|evm|gipc}
    where
      mdns  multicast Domain Name Server
      gpnp  Grid Plug-n-Play Service
      css   Cluster Synchronization Services
      crf   Cluster Health Monitor
      crs   Cluster Ready Services
      ctss  Cluster Time Synchronization Service
      evm   EventManager
      gipc  Grid Interprocess Communications
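Before changing trace levels it helps to record the current ones. The Python sketch below parses output in the `Get CSSD Module: NAME Log Level: N` shape shown on slide 48 into a dictionary, so the levels can be compared or restored later; `parse_log_levels` is a hypothetical helper name and the sample is an excerpt of the slide output.

```python
import re

# Matches "Module: NAME Log Level: N" anywhere in a line
LEVEL_RE = re.compile(r"Module:\s*(\S+)\s+Log Level:\s*(\d+)")

def parse_log_levels(output):
    """Return {module: level} from 'crsctl get log ... all' style output."""
    return {module: int(level) for module, level in LEVEL_RE.findall(output)}

# Excerpt of the output shown on the slide
sample = """Get CSSD Module: BCCM Log Level: 2
Get CSSD Module: CLSF Log Level: 0
Get CSSD Module: GPNP Log Level: 1"""
levels = parse_log_levels(sample)
disabled = sorted(m for m, lvl in levels.items() if lvl == 0)
print(levels)
print("tracing disabled for:", disabled)
```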
  • 49. Debugging/Tracing Cluster components
    The following commands enable various tracing levels:
    ./crsctl set log crs crsmain=3
    ./crsctl set log crs crsmain=3,crsevt=4
    ./crsctl set log crs all=5
    ./crsctl set log res ora.prddb.db:5
    The following examples explain how to set tracing levels at the OS level:
    export ORA_CRSDEBUG_ALL=1   -- sets debugging level 1 for all modules
    export ORA_CRSDEBUG_CRS=2   -- sets debugging level 2 for the CRS module
  • 50. Debugging/Tracing Cluster components
    The following commands disable tracing:
    ./crsctl set log crs crsmain=0
    ./crsctl set log res ora.prddb.db:0
    ./crsctl set log res ora.crs:0 -init
    The following examples explain how to set tracing levels at the OS level:
    export ORA_CRSDEBUG_ALL=1   -- sets debugging level 1 for all modules
    export ORA_CRSDEBUG_CRS=2   -- sets debugging level 2 for the CRS module
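Because trace levels are limited to the 0 to 5 range described on slide 47, a small wrapper can validate the values before a command is run. The Python sketch below only builds the command string in the form used on slides 49 and 50; `build_set_log` is a made-up helper, and actually executing the command is left to the operator.

```python
def build_set_log(component, levels):
    """Build a 'crsctl set log' command string after checking each level is 0..5."""
    for module, level in levels.items():
        if not isinstance(level, int) or not 0 <= level <= 5:
            raise ValueError(
                f"level for {module} must be an integer from 0 to 5, got {level!r}"
            )
    settings = ",".join(f"{module}={level}" for module, level in levels.items())
    return f"crsctl set log {component} {settings}"

enable_cmd = build_set_log("crs", {"crsmain": 3, "crsevt": 4})
disable_cmd = build_set_log("crs", {"crsmain": 0})
print(enable_cmd)
print(disable_cmd)
```

Setting a module back to 0 with the same helper produces the matching disable command.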
  • 51. Tools & Utilities
  • 52. Tools & Utilities - how to pick the right one
  • 53. Tools & Utilities - Diagnostic Collection Script
    diagcollection.pl:
    • Located under $GRID_HOME/bin
    • Gathers the required Clusterware diagnostic information into a set of trace files from various sources: CRS logs, trace and core files, OCR data, etc.
    • Can collect diagnostic information at different layers and homes: Cluster, Oracle RDBMS, Core, Oracle Base, etc.
    • All the information is then zipped into a few archive files
    • The time needed to gather the information is directly proportional to the levels used
    • Upload these files to My Oracle Support for issue investigation
  • 54. Tools & Utilities - Diagnostic Collection Script
    Examples:
    ./diagcollection.sh --collect --chmos
    ./diagcollection.sh --collect --chmos --incidenttime <timeperiod> --incidentduration 05:00 (five-hour report)
    Alternatively, you can use the following:
    ./oclumon dumpnodeview -allnodes -v -last "04:59:59" > /tmp/output.txt
    ./oclumon dumpnodeview -allnodes -v -s "2013-04-24 09:00" -e "2013-04-24 03:15:00"
    Output files:
    crsData_rac1_20121204_1103.tar.gz
    ocrData_rac1_20121204_1103.tar.gz
    coreData_rac1_20121204_1103.tar.gz
    osData_rac1_20121204_1103.tar.gz
  • 55. Tools & Utilities - Cluster Health Monitor (CHM)
    • A tool designed to detect and analyze OS and cluster resource failures
    • Formerly known as Instantaneous Problem Detector for OS (IPD/OS)
    • In pre-11gR2 versions, the tool must be downloaded from OTN
    • With 11gR2, it is an integral part of the software and closely integrated with GI
    • Introduces the ora.crf CHM resource (check with crsctl stat res -t -init)
    • Not available on some platforms
    • Can be used in RAC and non-RAC environments
    • Collects real-time OS statistics (every second; every 5 seconds from 11.2.0.3): memory, swap, I/O, network, etc.
    • Stores real-time monitoring metrics in the CHM repository
    • Historical data can be used to diagnose node evictions, instance hangs, server performance problems, etc.
    • Contains two services:
      • System Monitoring Service (osysmond): runs on every node, monitors and collects OS metrics, and sends the data to the Cluster Logger Service
      • Cluster Logger Service (ologgerd): stores the information received from the nodes in the repository; runs on one node as the master service, with a standby service on the other nodes
    • CHM vs OSWatcher: CHM uses less CPU and adds less overhead on the node; OSWatcher does not run when the server CPU is heavily used
  • 56. Tools & Utilities - Cluster Health Monitor (CHM)
    • Consumes less than 5% CPU per core; minimal overhead on the server
    • Takes 1 GB of space by default across all nodes
    • Approximately 0.5 GB of data per day
    • Data can be kept for 3 days
    • ./oclumon manage -get repsize
    • oclumon: a command-line tool used to manage the CHM repository
    • In 12c, the data is stored in a management repository database
  • 57. Tools & Utilities - OSWatcher Black Box (oswbb)
    • A tool that captures OS performance metrics and stores the statistical data in files
    • Uses vmstat, netstat, top, traceroute, ps, iostat, etc.
    • Available on MOS
    • On RAC, it must be configured and scheduled on each node individually
    • Supports most UNIX/Linux platforms
    • ./startOSWbb.sh (default interval/retention: 30 seconds/48 hours)
    • ./startOSWbb.sh 60 10 (60-second interval, 10 hours of data retention)
    • ./stopOSWbb.sh
    • Review the .dat files in the archive directory
  • 58. Tools & Utilities - oratop
    • An OS top-like utility on Linux platforms
    • Provides near real-time database monitoring for RAC and non-RAC databases, 11.2.0.3 or higher
    • A very lightweight utility: consumes 0.20% memory and <1% CPU
    • Download oratop.zip from MOS and set chmod 755
    • The database init parameters statistics_level = TYPICAL and timed_statistics = TRUE must be set
    • Requires a username/password; connects as the system user when no credentials are provided
    • Set the following on the OS:
      $ ORACLE_UNQNAME=<dbname>
      $ ORACLE_SID=<instance_name1>
      $ ORACLE_HOME=<db_home>
      $ export LD_LIBRARY_PATH=$ORACLE_HOME/lib
      $ export PATH=$ORACLE_HOME/bin:$PATH
    • Needs a TNS alias for the source database
    • Needs privileges on v_$SESSION, v_$SYSMETRIC, v_$INSTANCE, v_$PROCESS, v_$SYSTEM_EVENT, etc.
  • 59. Tools & Utilities - oratop
    • Live window: lists the top 5 wait events and the top Oracle sessions by I/O, memory, and DB load
    • Provides database blocking details
    • Press q/Q or Control+C to abort
    ./oratop -i 5 / as sysdba (refresh interval of 5 seconds)
    ./oratop -i 5 username/password@tns_alias
    Window sections: Databases, Top 5 DB events, Processes, Header
    Metrics and thresholds:
    • % db - (values > 99%)
    • %CU - (load > 2 x cpu counts & host cpu > 99)
    • HLD - (load > 2 * cpu counts and aas > cpu counts)
    • IORL - (value > 20ms)
    • %FR - (value < 1%)
    • ASW - (value = session counts, USN)
    • AAS - (value > cpu counts)
    • DBW - (value > 50%)
    • EVENT - active wait event
    • PGA - potential unusual memory growth
    • BLOCKER - a blocking session with (wait time > 5 minutes)
  • 60. Tools & Utilities - RACcheck v.2.2.1
    • A RAC configuration auditing utility that audits various important configuration settings: Cluster, ASM, Grid Infrastructure, etc.
    • Audits OS kernel parameters/packages, 11.2.0.3 upgrade readiness, etc.
    • Download raccheck.zip from MOS and chmod it to 755
    • Can compare two outputs
    • All recommendations/output are written to an HTML file
    • The output includes an overall health check rating out of 100, bug fixes, patch recommendations, etc.
    • Upload the .zip file if MOS asks for it
  • 61. Tools & Utilities - RACcheck v.2.2.1
    Examples:
    ./raccheck (follow the interactive steps)
    ./raccheck -u -o pre|post
    ./raccheck -h
    ./raccheck -s
    ./raccheck -diff report1 report2
    Usage: ./raccheck [-abvhpfmsuSo:c:rt:]
      -a  All (perform best practice check and recommended patch check)
      -b  Best practice check only. No recommended patch check
      -h  Show usage
      -v  Show version
      -p  Patch check only
      -m  Exclude checks for Maximum Availability Architecture
      -u  Run raccheck to check pre-upgrade or post-upgrade best practices. -o pre or -o post is mandatory with the -u option, e.g. ./raccheck -u -o pre
      -f  Run offline. Checks will be performed on data already
      -o  Argument to an option. If -o is followed by v, V, Verbose, or VERBOSE, it prints the checks that pass on the screen; if -o is not specified, it prints only failures on the screen, e.g. raccheck -a -o v
      -r  Include high availability best practices in the regular health check, e.g. ./raccheck -r (not applicable for exachk)
      -c  Pass a specific module or component to check best practices for. By default it will check for components identified fr
  • 62. Tools & Utilities - Hang analysis/system state
    • HANGANALYZE helps detect the cause of a database hang
    • Run HANGANALYZE when a database suffers from a hang, performance degradation, latching issues, etc.
    • Available since 8.1.6; provides cluster-wide analysis from 9i
    Examples:
    $ sqlplus "/ as sysdba"
    SQL> oradebug setmypid
    SQL> oradebug unlimit
    SQL> oradebug setinst all
    SQL> oradebug -g def hanganalyze 3
    -- wait 90 seconds
    SQL> oradebug -g def hanganalyze 3
    SQL> oradebug tracefile_name
    SQL> exit
  • 63. Tools & Utilities - Hang analysis/system state
    HANGANALYZE levels:
      10   Dump all processes
      5    Level 4 + dump all processes involved in wait chains (NLEAF state)
      4    Level 3 + dump leaf nodes (blockers) in wait chains (LEAF, LEAF_NW, IGN_DMP state)
      3    Level 2 + dump only processes thought to be in a hang (IN_HANG state) -- recommended
      1-2  Only HANGANALYZE output, no process dump at all
  • 64. Tools & Utilities - Hang analysis/system state
    Review the trace file:
    • HANG ANALYZE (section)
    • CYCLES: lists process dependencies for a deadlock/hung state
    • BLOCKER OF MANY SESSIONS: appears when one session blocks 10 or more sessions
    • STATE OF NODES | OPEN CHAINS | OTHER CHAINS
  • 65. Tools & Utilities - Hang analysis/system state
    Dumping system state
    When a database is completely hung and you cannot connect as / as sysdba, or when memory leaks are suspected, use the following (systemstate level 10 or 266):
    $ sqlplus -prelim / as sysdba
    SQL> oradebug setmypid
    SQL> oradebug unlimit
    SQL> oradebug -g all dump systemstate 10|266
    -- wait 60 seconds
    SQL> oradebug -g all dump systemstate 10|266
    Review the trace file or upload it to MOS
  • 66. References
    • RACcheck - RAC Configuration Audit Tool [ID 1268927.1]
    • Oracle Premier Support - Oracle Database Support News - Issue November, 2012 Volume 22 [ID 1513219.1]
    • Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]
    • Oracle Clusterware CRSD OCSSD EVMD Log Rotation Policy [ID 557204.1]
    • CRS Diagnostic Data Gathering: A Summary of Common tools and their Usage [ID 783456.1]
    • Remote Diagnostic Agent (RDA) 4 - Getting Started [ID 314422.1]
    • Data Gathering for Troubleshooting Oracle Clusterware (CRS or GI) And Real Application Cluster (RAC) Issues [ID 289690.1]
  • 67. A big thank you to all for listening ...
    You can write to me at sjaffarhussain@gmail.com