RAC Diagnostics

RAC Diagnostics with OraCHK, CHM and TFA

Published in: Technology

  1. RAC Diagnostics .. with OraCHK, CHM and TFA
     Markus Flechtner, Senior Consultant
     BASLE BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
  2. Our company (RAC Diagnostics, 30.11.15)
     Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on [..] and [..] technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields: [..]
     OPERATION: Trivadis Services takes over the interactive operation of your IT systems.
  3. With over 600 specialists and IT experts in your region.
     • 14 Trivadis branches and more than 600 employees
     • 200 Service Level Agreements
     • Over 4,000 training participants
     • Research and development budget: CHF 5.0 / EUR 4 million
     • Financially self-supporting and sustainably profitable
     • Experience from more than 1,900 projects per year at over 800 customers
  4. About Markus Flechtner
     • Senior Consultant, Trivadis, Duesseldorf/Germany, since April 2008
     • Discipline Infrastructure Database @Trivadis
     • Working with Oracle since the 1990s
       – Development (Forms, Reports, PL/SQL)
       – Support
       – Database Administration
     • Focus
       – Oracle Real Application Clusters
       – Database Migration Projects
     • Teacher
       – O-RAC – Oracle Real Application Clusters
       – O-NF12CDBA – Oracle 12c New Features for the DBA
     Blog: http://markusdba.de/ @markusdba
  5. Our database doctors ..
     • Dr. ORAchk
       – Regular screening examination
     • Dr. CHM
       – Electrocardiogram (ECG)
     • Dr. TFA
       – In case of emergency
  6. Oracle Support Tools Bundle
     • Collection of database and RAC support tools
     • Includes:
       – ORAchk
       – ExaChk (*): like ORAchk, but for Engineered Systems
       – OSWatcher
       – ProcWatcher (*): tool to examine and monitor Oracle database and/or clusterware processes
       – ORATOP (*): near real-time monitoring of databases
       – SQLT (*): helps in tuning SQL statements
       – DARDA (*): Diagnostic Assistant, interface for other diagnostic tools
     • Integrated in TFA collector since release 12.1.2.3.0
       – Tools can be downloaded separately, too
     (*) not covered by this talk
  7. Running the tools from TFA collector
       oracle> /u00/app/oracle/tools/tfa/bin/tfactl toolstatus
       .---------------------------------------.
       |        External Support Tools         |
       +----------+--------------+-------------+
       | Host     | Tool         | Status      |
       +----------+--------------+-------------+
       | dbserver | alertsummary | DEPLOYED    |
       | dbserver | darda        | DEPLOYED    |
       | dbserver | oratop       | DEPLOYED    |
       | dbserver | rgrep        | DEPLOYED    |
       | dbserver | exachk       | DEPLOYED    |
       | dbserver | orachk       | DEPLOYED    |
       | dbserver | oswbb        | RUNNING     |
       | dbserver | sqlt         | DEPLOYED    |
       | dbserver | prw          | NOT RUNNING |
       | dbserver | tail         | DEPLOYED    |
       | dbserver | ncdvi        | DEPLOYED    |
       '----------+--------------+-------------'
       oracle> /u00/app/oracle/tools/tfa/bin/tfactl run alertsummary
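A status table like the one printed by "tfactl toolstatus" is easy to post-process, for example to flag tools that are not running. The following is only a minimal sketch: the function names are hypothetical, the sample text is copied from the slide, and the table layout may differ in other TFA releases.

```python
# Sketch: parse a "tfactl toolstatus" style table (layout taken from the slide).
# parse_toolstatus and tools_not_running are hypothetical helper names.

SAMPLE = """\
.---------------------------------------.
|        External Support Tools         |
+----------+--------------+-------------+
| Host     | Tool         | Status      |
+----------+--------------+-------------+
| dbserver | orachk       | DEPLOYED    |
| dbserver | oswbb        | RUNNING     |
| dbserver | prw          | NOT RUNNING |
'----------+--------------+-------------'
"""


def parse_toolstatus(text):
    """Return {(host, tool): status} for every data row of the table."""
    status = {}
    for line in text.splitlines():
        if not line.startswith("|"):
            continue  # border lines start with '.', '+' or "'"
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "Host":
            continue  # skip the title row and the column header
        host, tool, state = cells
        status[(host, tool)] = state
    return status


def tools_not_running(status):
    """Sorted tool names whose status is exactly 'NOT RUNNING'."""
    return sorted(tool for (_, tool), state in status.items()
                  if state == "NOT RUNNING")


if __name__ == "__main__":
    print(tools_not_running(parse_toolstatus(SAMPLE)))
```

In practice the text would come from the captured output of the tfactl call shown on the slide rather than from an embedded sample.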
  8. Agenda
     1. ORAchk
     2. Cluster Health Monitor (CHM)
     3. Trace File Analyzer (TFA) Collector
     4. Other tools
  9. ORAchk
  10. ORAchk – Purpose & History
      • Available since July 2011
      • Oracle Configuration Audit Tool
      • Formerly known as "RACCheck"
      • Supported on Unix, Linux and Windows (Cygwin/Standalone version)
      • Checks your installation against more than 7,000 Oracle Best Practices
      • Results can be stored in a database
  11. ORAchk – Not a RAC or database tool only
      ORAchk includes checks for
      – Oracle Database (Single Instance + RAC)
      – MAA Validation
      – Upgrade Readiness
      – Golden Gate
      – Enterprise Manager 12c Cloud Control
      – E-Business Suite
      – Oracle Sun Server
  12. ORAchk - Installation
      • Clusterware 11.2.0.4 and 12.1.0.2
        – Installed with the software (into $ORACLE_HOME/suptools/orachk)
        – So far not updated with the PSUs :-(
      • For older versions
        – Install TFA Collector 12.1.2.3.0 or higher
        – Download ORAchk via MOS 1268927.2
  13. ORAchk – Basic Command Line Options
      Option            Meaning
      -a                Run all checks
      -b                Best practice check only
      -p                Patch check only
      -u -o pre|post    Pre or post upgrade checks
      -dbnames          Run for a subset of databases only
      -clusternodes     Run for a subset of nodes only
      -h                Help on all available parameters (long list)
  14. ORAchk – Sample Output (1) – at runtime
      • ORAchk checks O/S, clusterware and databases on all nodes
      • Result: ZIP-File and HTML-Report
  15. ORAchk – Sample Output (2) – final HTML report
  16. ORAchk – Sample Output (3) – final HTML report
  17. ORAchk – Advanced Command Line Options
      Option      Meaning
      -diff       Compare 2 reports
      -d          Manage ORAchk daemon
      -profile    Run for specific components or applications like:
                  • ASM
                  • Clusterware
                  • EBS
                  • MAA
                  • Goldengate
                  • Enterprise Manager 12c
                  .. and more
  18. ORAchk – Collection Manager (1)
      • ORAchk results can be stored in a repository database
      • Collection Manager is a GUI for the repository database
      • APEX application (4.2.0 or higher)
        – Import.sql is delivered with ORAchk software
      • Installation
        – Create database user for ORAchk
        – Create 3 tables (see Appendix F of the ORAchk Users Guide)
        – Install APEX application
  19. ORAchk – Collection Manager (2)
      • Set environment
      • Run ORAchk
        – If the environment is set, then the data will be inserted into the repository database
        export RAT_UPLOAD_CONNECT_STRING="(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbserver)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EMREP)))"
        export RAT_UPLOAD_TABLE=auditcheck_result
        export RAT_PATCH_UPLOAD_TABLE=auditcheck_patch_result
        export RAT_ZIP_UPLOAD_TABLE=RCA13_DOCS
        export RAT_UPLOAD_USER=orachk
        export RAT_UPLOAD_PASSWORD=orachk
        export RAT_UPLOAD_ORACLE_HOME=/u00/app/oracle/product/11.2.0.4
  20. ORAchk – Collection Manager (3) – some screenshots
  21. ORAchk – Collection Manager (4) – some screenshots
  22. ORAchk – Collection Manager (5) – some screenshots
  23. ORAchk – Collection Manager (6) – some screenshots
  24. Cluster Health Monitor
  25. Cluster Health Monitor (CHM)
      • Available since 11.2.0.2
      • Collects OS information of the cluster nodes
        – CPU load
        – Memory
        – Top Processes
        – File Systems
        – System information
      • Components
        – sysmond (on every cluster node)
        – loggerd
      • Cluster Resource crf
  26. Cluster Health Monitor (CHM) – CLI oclumon
        grid@rac1node1:~/ oclumon -h
        For help in interactive mode : <verb> -h
        Currently supported verbs are :
        dumpnodeview, manage, version, debug, analyze, quit, exit, and help
      Option
      Dumpnodeview    Shows collected data (for specific nodes and/or a specific time window)
      Manage          Manages the CHM repository and show
      Version         Shows version information
      Debug           Debugs CHM components
      Analyze         Deprecated, will be ignored
  27. Cluster Health Monitor (CHM) – CLI show data
        grid@rac1node1:~/ [grid12102] oclumon dumpnodeview
        dumpnodeview: Node name not given. Querying for the local host
        ----------------------------------------
        Node: rac1node1 Clock: '15-02-22 18.05.43 ' SerialNo:1440
        ----------------------------------------
        SYSTEM:
        #pcpus: 1 #vcpus: 2 cpuht: N chipname: Intel(R) cpu: 20.59 cpuq: 0 physmemfree: 393676 physmemtotal: 4958228 mcache: 2506540 swapfree: 3956548 swaptotal: 3964924 hugepagetotal: 0 hugepagefree: 0 hugepagesize: 2048 ior: 156 iow: 78 ios: 32 swpin: 0 swpout: 0 pgin: 155 pgout: 59 netr: 102.554 netw: 75.683 procs: 323 procsoncpu: 2 rtprocs: 13 rtprocsoncpu: N/A #fds: 20704 #sysfdlimit: 6815744 #disks: 9 #nics: 4 nicErrors: 0
        TOP CONSUMERS:
        topcpu: 'mdb_vktm_-mgmtd(5402) 4.39' topprivmem: 'java(2046) 171088' topshm: 'ora_mman_raccdb(5479) 300808' topfd: 'oraagent.bin(4891) 251' topthread: 'console-kit-dae(3254) 64'
        [..]
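The SYSTEM line of the node view is a flat run of "key: value" pairs, so it can be turned into a dictionary for simple checks such as the share of free physical memory. This is a rough sketch, not an Oracle-provided parser: the function names are made up for illustration, and the sample is a shortened copy of the line on the slide.

```python
import re

# Shortened copy of the SYSTEM line shown on the slide.
SAMPLE = ("#pcpus: 1 #vcpus: 2 cpuht: N chipname: Intel(R) cpu: 20.59 cpuq: 0 "
          "physmemfree: 393676 physmemtotal: 4958228 "
          "swapfree: 3956548 swaptotal: 3964924")


def _convert(value):
    """Turn a token into int or float where possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_system_line(line):
    """Hypothetical helper: split 'key: value key: value ...' into a dict."""
    pairs = re.findall(r"(#?\w+):\s+(\S+)", line)
    return {key: _convert(value) for key, value in pairs}


def free_memory_percent(metrics):
    """Free physical memory as a percentage (a ratio, so unit independent)."""
    return 100.0 * metrics["physmemfree"] / metrics["physmemtotal"]


if __name__ == "__main__":
    m = parse_system_line(SAMPLE)
    print(m["cpu"], round(free_memory_percent(m), 1))
```

Values that contain spaces, such as the quoted entries under TOP CONSUMERS, would need a different pattern; the sketch only covers the SYSTEM section.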
  28. Cluster Health Monitor (CHM) – -MGMTDB (1)
      • In Oracle 12c CHM data is stored in the Grid Infrastructure Management Repository (GIMR), SID=-MGMTDB
        – Mandatory with 12.1.0.2
        – Single instance database
        – Multitenant database with 12.1.0.2 (PDB-name = clustername)
        – Basic installation needs about 5 GB in the diskgroup with OCR and voting files
        – Additional listener MGMTLSNR
      • Required size depends on number of nodes and retention time
        – About 1.3 GB + 500 MB/node
        – Check and configure with "oclumon"
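The sizing rule of thumb from this slide (about 1.3 GB plus 500 MB per node) can be written down as a small calculation. This is only a sketch of that rule: the function name is hypothetical, 1.3 GB is treated as 1300 MB for simplicity, and the retention time, which the slide says also matters, is not modelled. The figure reported by oclumon remains the one to trust.

```python
# Rule of thumb from the slide: about 1.3 GB + 500 MB per node.
BASE_MB = 1300       # "about 1.3 GB", rounded to 1300 MB (assumption)
PER_NODE_MB = 500    # "500 MB/node"


def estimated_gimr_size_mb(node_count):
    """Hypothetical helper: rough repository size in MB for a cluster."""
    if node_count < 1:
        raise ValueError("a cluster has at least one node")
    return BASE_MB + PER_NODE_MB * node_count


if __name__ == "__main__":
    for nodes in (2, 3, 26):
        print(nodes, "nodes:", estimated_gimr_size_mb(nodes), "MB")
```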
  29. Cluster Health Monitor (CHM) – -MGMTDB (2) - Tools
      • mgmtca (for initial configuration only)
      • srvctl
      • oclumon
        – Oracle recommends a retention time of 72 h ( = 259200 seconds)
        grid@rac1node2:~/ oclumon manage -h
        Manage verb usage
        =================
        manage -repos {checkretentiontime <time> | changerepossize <memsize>} |
               -get {<key1> [<key2> ...] | alllogger [-details] | mylogger [-details]}
        ..
        grid@rac1node2:~/ oclumon manage -repos checkretentiontime 259200
        The Cluster Health Monitor repository is too small for the desired retention.
        Please first resize the repository to 5844 MB
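The retention check takes its argument in seconds and answers with a size in MB, so two small helpers cover the round trip: one converts hours to seconds, the other pulls the suggested size out of the message. Both function names are hypothetical, and the message wording is taken from the slide; other Grid Infrastructure versions may phrase it differently.

```python
import re

# Message text as shown on the slide.
SAMPLE = ("The Cluster Health Monitor repository is too small for the desired "
          "retention. Please first resize the repository to 5844 MB")


def retention_seconds(hours):
    """Convert a retention time in hours to the seconds oclumon expects."""
    return hours * 3600


def suggested_size_mb(message):
    """Return the size in MB named in the message, or None if there is none."""
    match = re.search(r"resize the repository to (\d+) MB", message)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print("oclumon manage -repos checkretentiontime", retention_seconds(72))
    print("suggested size:", suggested_size_mb(SAMPLE), "MB")
```

The printed command line only repeats the call from the slide; the sketch does not run oclumon itself.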
  30. Cluster Health Monitor (CHM) – EM 12c Cloud Control
      • CHM data can be displayed in EM 12c Cloud Control
  31. Cluster Health Monitor (CHM) – Memory Guard
      • Evaluates the memory usage on the cluster nodes based on data collected by Cluster Health Monitor (CHM)
      • Automatically stops database services (transactional) in case of memory pressure on a cluster node
        – .. or even kills database sessions
      • .. and automatically reactivates the services when enough memory is available
      • Starting with Oracle 12.1.0.2 Memory Guard is automatically activated
  32. Trace File Analyzer (TFA) Collector
  33. Real life experience ..
      • 26 node cluster
        – 5 databases
      • Strange ASM issue
      • Oracle Support requested
        – Clusterware logs
        – ASM alert.logs
        – Database alert.logs
      For each of the 26 servers!!
  34. Trace File Analyzer Collector
      • Initial release in January 2013
      • Current version 12.1.2.3.1
      • Collects trace and log files and system information from all nodes in a cluster with a single command initiated on one cluster node
      • Centralized output
      • Real-time scanning for specific error messages possible
        – → Automatic collection of diagnostic information
      • Included in Clusterware 11.2.0.4 and 12.1.0.2
      • For other versions (10.2 or higher):
        – Download from MOS: 1513912.1
        – RAC and DB Support Tools Bundle is included in current TFA package
  35. TFA Collector – Installation
      • For Clusterware 11.2.0.4 and 12.1.0.2: No additional installation required
      • For older versions:
        [root@rac1node1 tmp]# ./installTFALite.sh
        Starting TFA installation
        Enter a location for installing TFA [/tmp]: /u00/app/oracle
        Checking for available space in /u00/app/oracle
        Enter a Java Home that contains Java 1.6 or later : /usr/java/jre1.7.0_13
        Running Auto Setup for TFA as user root...
        Would you like to do a [L]ocal only or [C]lusterwide installation ? [L|l|C|c] [C] : C
        The following installation requires temporary use of SSH.
        If SSH is not configured already then we will remove SSH when complete.
        Do you wish to Continue ? [Y|y|N|n] [N] y
        Installing TFA at /u00/app/oracle in all hosts
        Discovering Nodes and Oracle resources
        Checking whether CRS is up and running
        ..
  36. TFA Collector – Architecture
      • JAVA-based tool
      • TFA-daemon “TFAMain” running on all cluster nodes
      • Data Storage
        – File-Repository for Diagnostic Information
        – Berkeley Database for metadata, file inventory, event history, etc.
      • Command Line Interface
        – tfactl (perl)
        – Communication with daemon using secure sockets
        oracle@rac1node1:~/ [rdbms12102] ps -ef |grep tfa |grep -v grep
        root 2325 1 0 10:14 ? 00:00:03 /bin/sh /etc/init.d/init.tfa run
        root 3631 1 0 10:16 ? 00:05:10 /u00/app/grid/product/12.1.0.2/jdk/jre/bin/java – [..] oracle.rat.tfa.TFAMain /u00/app/grid/product/12.1.0.2/tfa/rac1node1/tfa_home
  37. TFA Collector – Commands (1) – Command Overview
        oracle@rac1node1:/home/grid/ $ORACLE_HOME/tfa/bin/tfactl
        Usage : /u00/app/grid/product/12.1.0.2/bin/tfactl <command> [options]
        <command> =
          print             Print requested details
          analyze           List events summary and search strings in alert logs.
          diagcollect       Collect logs from across nodes in cluster
          collection        Manage TFA collections
          directory         Add or Remove or Modify directory in TFA
          toolstatus        Prints the status of TFA Support Tools
          run <tool>        Run the desired support tool
          start <tool>      Starts the desired support tool
          stop <tool>       Stops the desired support tool
          restart <tool>    Restarts the desired support tool
        For help with a command: /oracle/u00/app/oracle/tools/tfa/bin/tfactl <command> -help
  38. TFA Collector – Commands (2) – commands for root
      • Configuration tasks must be done by root
      • The following additional commands are available:
        <command> =
          start          Starts TFA
          stop           Stops TFA
          enable         Enable TFA Auto restart
          disable        Disable TFA Auto restart
          access         Add or Remove or List TFA Users and Groups
          purge          Delete collections from TFA repository
          directory      Add or Remove or Modify directory in TFA
          host           Add or Remove host in TFA
          set            Turn ON/OFF or Modify various TFA features
          uninstall      Uninstall TFA from this node
          diagnosetfa    Collect TFA Diagnostics
          ..
  39. TFA Collector – Commands (3) – print config
        root@rac1node1:/home/grid/ $ORACLE_HOME/tfa/bin/tfactl print config
        +---------------------------------------------+------------+
        | Configuration Parameter                     | Value      |
        +---------------------------------------------+------------+
        | TFA version                                 | 12.1.2.3.1 |
        | Automatic diagnostic collection             | OFF        |
        | Trimming of files during diagcollection     | ON         |
        | Repository current size (MB)                | 7          |
        | Repository maximum size (MB)                | 10240      |
        | Inventory Trace level                       | 1          |
        | Collection Trace level                      | 1          |
        | Scan Trace level                            | 1          |
        | Other Trace level                           | 1          |
        | Max Size of TFA Log (MB)                    | 50         |
        | Max Number of TFA Logs                      | 10         |
        | Max Size of Core File (MB)                  | 20         |
        | Max Collection Size of Core Files (MB)      | 200        |
        | Automatic Purging                           | ON         |
        | Minimum Age of Collections to Purge (Hours) | 12         |
        '---------------------------------------------+------------'
  40. TFA Collector – Commands (4) - diagcollect
      • Collects trace and log files from the cluster nodes
        grid@rac1node1:~/ [grid12102] $ORACLE_HOME/tfa/bin/tfactl diagcollect
        Collecting data for the last 4 hours for all components...
        Collecting data for all nodes
        Repository Location in rac1node1 : /u00/app/oracle/tfa/repository
        2015/02/21 20:28:24 CET : Running an inventory clusterwide ...
        2015/02/21 20:28:24 CET : Collection Name : tfa_Sat_Feb_21_20_28_06_CET_2015.zip
        2015/02/21 20:28:24 CET : Sending diagcollect request to host : rac1node2
        2015/02/21 20:28:24 CET : Sending diagcollect request to host : rac1node3
        ..
        Logs are being collected to:
        /u00/app/oracle/tfa/repository/collection_Sat_Feb_21_20_28_06_CET_2015_node_all/rac1node1.tfa_Sat_Feb_21_20_28_06_CET_2015.zip
        /u00/app/oracle/tfa/repository/collection_Sat_Feb_21_20_28_06_CET_2015_node_all/rac1node2.tfa_Sat_Feb_21_20_28_06_CET_2015.zip
        /u00/app/oracle/tfa/repository/collection_Sat_Feb_21_20_28_06_CET_2015_node_all/rac1node3.tfa_Sat_Feb_21_20_28_06_CET_2015.zip
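After a clusterwide collection it is worth confirming that every node actually delivered a ZIP file. The sketch below derives the node name from each file name and reports the nodes that are missing. It assumes the "<node>.tfa_<timestamp>.zip" naming seen in the sample output above, which may not hold for every TFA version, and both function names are hypothetical.

```python
import posixpath

# Path prefix and file names copied from the sample output on the slide.
COLLECTION_DIR = ("/u00/app/oracle/tfa/repository/"
                  "collection_Sat_Feb_21_20_28_06_CET_2015_node_all/")
SAMPLE_PATHS = [
    COLLECTION_DIR + "rac1node1.tfa_Sat_Feb_21_20_28_06_CET_2015.zip",
    COLLECTION_DIR + "rac1node2.tfa_Sat_Feb_21_20_28_06_CET_2015.zip",
]


def node_of(zip_path):
    """Return the node name encoded in a collection file name, or None."""
    name = posixpath.basename(zip_path)
    node, separator, _ = name.partition(".tfa_")
    return node if separator and node else None


def missing_nodes(expected_nodes, zip_paths):
    """Sorted list of expected nodes for which no ZIP file was found."""
    found = {node_of(path) for path in zip_paths}
    return sorted(set(expected_nodes) - found)


if __name__ == "__main__":
    print(missing_nodes(["rac1node1", "rac1node2", "rac1node3"], SAMPLE_PATHS))
```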
  41. TFA Collector – Commands (5) - diagcollect
      • Which data is collected by default?
        – alert.log from all databases
        – ASM log files
        – listener.log files
        – Clusterware logs
        – CHM information
        – Patch information
      • Components, node list and time window can be specified
      • Automatic diagnostic collection
        – TFA scans the alert.log files and runs "diagcollect" automatically
        – Disabled by default
        root@rac1node1:~/ tfactl set autodiagcollect=<ON|OFF> [-c]
  42. TFA Collector – Commands (6) - analyze
      • Checks system log files and Oracle log files on all nodes
        root@rac1node1:~/ [grid12102] $ORACLE_HOME/tfa/bin/tfactl analyze
        INFO: analyzing all (Alert and Unix System Logs) logs for the last 60 minutes... Please wait...
        INFO: analyzing host: rac1node1
        Report title: Analysis of Alert,System Logs
        Report date range: last ~1 hour(s)
        Report (default) time zone: CET - Central European Time
        Analysis started at: 21-Feb-2015 09:02:34 PM CET
        [..]
        Message types for last ~1 hour(s)
        Occurrences percent server name          type
        ----------- ------- -------------------- -----
                  2   66.7% rac1node1            WARNING
                  1   33.3% rac1node1            generic
        [..]
  43. Other Tools
  44. Cluster Verification Utility (cluvfy) (1)
      • Not only a tool for checking installation requirements :-)
      • Can check the integrity of all cluster components (OCR, OLR, ohasd, …)
      • Can run a healthcheck for cluster and databases
      • Can collect and compare configuration baselines
      • Output: Text or HTML
  45. Cluster Verification Utility (cluvfy) (2) - Healthcheck
        grid@rac1node1: $ORACLE_HOME/bin/cluvfyrac.sh comp healthcheck -bestpractice
        Verifying OS Best Practice
        [..]
        ******************************************************************************************
        Clusterware recommendations
        ******************************************************************************************
        Verification Check       : CSS misscount parameter
        Verification Description : Checks if the CSS misscount is set correctly on the system
        Verification Result      : PASSED
        Verification Summary     : Check for CSS misscount parameter passed
        Additional Details       : The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before entering into a cluster reconfiguration to evict the node
        References (URLs/Notes)  : https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=294430.1
        Node         Status    Expected Value    Actual Value
        ------------------------------------------------------------------------------------------
        rac1node1    PASSED    30                30
        rac1node3    PASSED    30                30
        rac1node2    PASSED    30                30
        [..]
  46. OSWatcher (1)
      • Collects OS statistics in the background
        – CPU
        – Memory
        – Disk I/O
      • Installed and activated with TFA collector
      • Can generate nice graphics
      • OSWatcher vs. CHM
        – CHM CPU overhead lower
        – OSWatcher runs with user priority (CHM: Realtime)
        – OSWatcher collects more information
  47. OSWatcher (2)
        oracle> /u00/app/oracle/tools/tfa/bin/tfactl run oswbb
        Starting OSW Analyzer V7.3.1
        OSWatcher Analyzer Written by Oracle Center of Expertise
        Copyright (c) 2014 by Oracle Corporation
        Parsing Data. Please Wait...
        Scanning file headers for version and platform info...
        Parsing file dbserver.markusflechtner.vm_iostat_15.02.22.0800.dat ...
        Parsing file dbserver.markusflechtner.vm_iostat_15.02.22.0900.dat ...
        [..]
        Parsing Completed.
        Enter 1 to Display CPU Process Queue Graphs
        Enter 2 to Display CPU Utilization Graphs
        Enter 3 to Display CPU Other Graphs
        Enter 4 to Display Memory Graphs
        Enter 5 to Display Disk IO Graphs
        [..]
        Enter Q to Quit Program
        Please Select an Option:
  48. OSWatcher (3)
  49. Summary
  50. Summary
      • Oracle provides a lot of tools to keep a cluster in a healthy state
      • There are multiple ways to install the same tool
        – The toolset is not completely integrated in the PSU lifecycle so far
      • Overlapping functionality
        – Healthchecks: ORAchk vs. cluvfy
        – System performance data: CHM vs. OSWatcher
  51. Further Information
      Some MOS-Notes:
      • TFA Collector - Tool for Enhanced Diagnostic Gathering (Doc ID 1513912.1)
      • ORAchk - Health Checks for the Oracle Stack (Doc ID 1268927.2)
      • oratop - Utility for Near Real-time Monitoring of Databases (Doc ID 1500864.1)
      • SQLT Diagnostic Tool (Doc ID 215187.1)
      • Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware (Doc ID 459694.1)
      • Cluster Verification Utility (CLUVFY) FAQ (Doc ID 316817.1)
  52. Questions and Answers
      Markus Flechtner, Senior Consultant
      Phone +49 211 5866 6470
      Markus.Flechtner@Trivadis.com
      @markusdba
      http://markusdba.de
      Download the slides from http://www.slideshare.net/markusdba
      Please don't forget the session evaluation. Thank you!
