SlideShare ist ein Scribd-Unternehmen logo
1 von 34
Downloaden Sie, um offline zu lesen
IBM zEnterprise – Freedom Through Design




                           Introducing the Linux Health Checker



                                    Peter Oberparleiter <peter.oberparleiter@de.ibm.com>




                                                                          © 2012 IBM Corporation
Linux Health Checker


 Trademarks
  The following are trademarks of the International Business Machines Corporation in the United States and/or other countries.

  AIX*                                IBM*                                          PowerVM                               System z10                        z/OS*
  BladeCenter*                        IBM eServer                                   PR/SM                                 WebSphere*                        zSeries*
  DataPower*                          IBM (logo)*                                   Smarter Planet                        z9*                               z/VM*
  DB2*                                InfiniBand*                                   System x*                             z10 BC                            z/VSE
  FICON*                              Parallel Sysplex*                             System z*                             z10 EC
  GDPS*                               POWER*                                        System z9*                            zEnterprise
  HiperSockets                        POWER7*
   * Registered trademarks of IBM Corporation

  The following are trademarks or registered trademarks of other companies.

   Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
   Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license there from.
   Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
   Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
   Windows Server and the Windows logo are trademarks of the Microsoft group of countries.
   InfiniBand is a trademark and service mark of the InfiniBand Trade Association.
   Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel
   Corporation or its subsidiaries in the United States and other countries.
   UNIX is a registered trademark of The Open Group in the United States and other countries.
   Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
   ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
   IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.

   * All other products may be trademarks or registered trademarks of their respective companies.

  Notes:
  Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any
  user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the
  workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
  IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
  All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have
  achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
  This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to
  change without notice. Consult your local IBM business contact for information on the product or services available in your area.
  All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
  Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the
  performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
  Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
2 Tux logo by Larry Ewing.                                                                                                                                             © 2012 IBM Corporation
Linux Health Checker


Agenda – Part 1



          1. Introducing health checking
          2. Using the Linux Health Checker
          3. How to write a check




3                                             © 2012 IBM Corporation
Linux Health Checker


Introducing health checking



 What is a health check?
       ►A     process that identifies conditions which may lead to problems


 What is the Linux Health Checker?
       ►A     tool that performs an automated health check of a Linux system
       ► Checks            status and configuration
       ► Presents           report on identified problems


                      Helps keeping Linux systems
                      healthy (operational)




4                                                                              © 2012 IBM Corporation
Linux Health Checker


What does it do?



 Example problem classes
       ► Configuration          errors
       ► Deviations          from best-practice setups
       ► Hardware           running in degraded mode
       ► Unused            accelerator hardware
       ► Single       point-of-failures


 Detailed problem report
       ► Enable        users to understand and solve problems
       ► Make        expert knowledge available to wider audience



5                                                                   © 2012 IBM Corporation
Linux Health Checker


Goals



 Ease of use
       ► Simple        setup: Install and run
       ► Primary           tasks easily accessible through command line interface


 Flexibility through Framework/Plug-in concept
       ► Health       check plug-ins
          Contain all problem area specific knowledge
          ●
       ► Consumer plug-ins
          ●Handle output processing
       ► Extend functionality by adding new plug-ins




6                                                                                   © 2012 IBM Corporation
Linux Health Checker


Basic approach to health checking



                                      Collect      Analyze   Generate output

                     System                                                    Health Check
                   information                                                    output



 Collect system information
       ► File    contents, for example /var/log/messages
       ► Program           output, for example /bin/df
 Analyze information
       ► Find      relevant data points
       ► Compare           with best-practice values
 Produce report



7                                                                                             © 2012 IBM Corporation
Linux Health Checker


System overview


                                                      User


        Framework
                           Health Checker


                                Collect system
                                                                 Generate results
                                 information



             Target
             system


        Plug-ins
                                                 Health Checks                  Consumers

                                                                                    Process     Report
                                                    Analyze
                                                                                    results



                                                                                              Notification



8                                                                                               © 2012 IBM Corporation
Linux Health Checker


Health checks in version 1.0


      Verify that the bootmap file is up-to-date                                     Screen users with superuser privileges
      Check whether the path to the OpenSSL library is configured correctly          Identify network services that are known to be insecure
      Identify unusable I/O devices                                                  Identify multipath setups that consist of a single path only
      Check for CHPIDs that are not available                                        Confirm that automatic problem reporting is activated
      Identify I/O devices that are in use although they are on the exclusion list   Ensure that panic-on-oops is switched on
      Identify I/O devices that are not associated with a device driver              Check whether the CPUs run with reduced capacity
      Check for an excessive number of unused I/O devices                            Spot getty programs on the /dev/console device
      Check Linux on z/VM for the "nopav" DASD parameter                             Identify unused terminals (TTY)
      Check file systems for adequate free space                                                  Checks by component
      Check file systems for an adequate number of free inodes
      Check whether the recommended runlevel is used and set as default                                                             CSS
                                                                                                                                    System
      Check the kernel message log for out-of-memory (OOM) occurrences                                                              Network
                                                                                                                                    Storage
      Identify bonding interfaces that aggregate qeth interfaces with the same                                                      Filesystem
      CHPID                                                                                                                         Security
                                                                                                                                    Boot
      Check for an excessive error ratio for outbound HiperSockets traffic                                                          Crypto
                                                                                                                                    Init
      Check the inbound network traffic for an excessive error or drop ratio                                                        Kernel
                                                                                                                                    Hardware
      Identify qeth interfaces that do not have an optimal number of buffers
      Confirm that the dump-on-panic function is enabled




9                                                                                                                                         © 2012 IBM Corporation
Linux Health Checker


Agenda – Part 2



       1. Introducing health checking
       2. Using the Linux Health Checker
       3. How to write a check




10                                         © 2012 IBM Corporation
Linux Health Checker


Preparations



 Requirements
     ► Linux
       ● Framework should run on any hardware platform
       ● Health checks may be platform specific
     ► Perl 5.8 or later
       ●   Additional Perl modules which are usually part of default installation


 Obtaining the Linux Health Checker
     ► Open       source under Eclipse Public License v1.0
     ► Download         RPM or source package from http://lnxhc.sourceforge.net
     ► Install    using RPM command or make install


11                                                                                  © 2012 IBM Corporation
Linux Health Checker


First health check run
     [user@lnxhost ~]$ lnxhc run
     Collecting system information
     Running checks (12 checks)
     CHECK NAME                               HOST                     RESULT
     =================================================================================
     boot_zipl_update_required .............. lnxhost                  SUCCESS
     css_ccw_availability ................... lnxhost                  SUCCESS
     css_ccw_chpid .......................... lnxhost                  SUCCESS
     css_ccw_no_driver ...................... lnxhost                  SUCCESS
     css_ccw_unused_devices ................. lnxhost                  EXCEPTION-LOW

      >EXCEPTION css_ccw_unused_devices.many_unused_devices(low)
         Of 4664 I/O devices, 4659 (99.89%) are unused

     fs_disk_usage ..........................   lnxhost               SUCCESS
     mm_oom_killer_triggered ................   lnxhost               SUCCESS
     net_hsi_tx_errors ......................   lnxhost               NOT APPLICABLE
     ras_dump_on_panic ......................   lnxhost               EXCEPTION-HIGH

      >EXCEPTION ras_dump_on_panic.no_standalone(high)
         The dump-on-panic function is not enabled

     sec_services_insecure .................. lnxhost                 SUCCESS
     sys_sysctl_call_home ................... lnxhost                 NOT APPLICABLE
     sys_sysinfo_cpu_cap .................... lnxhost                 SUCCESS

     10 checks run, 2 exceptions found (use 'lnxhc run --replay -V' for details)

12                                                                                     © 2012 IBM Corporation
Linux Health Checker


Interpreting list view



 Some health checks found no problems
     boot_zipl_update_required ..............   lnxhost          SUCCESS
     css_ccw_availability ...................   lnxhost          SUCCESS
     css_ccw_chpid ..........................   lnxhost          SUCCESS
     css_ccw_no_driver ......................   lnxhost          SUCCESS

 A health check did not run
     net_hsi_tx_errors ...................... lnxhost            NOT APPLICABLE

      ►A    requirement was not met, for example platform, hypervisor or Linux version
 More reasons why a health check cannot run
      ► FAILED_SYSINFO
        ● Some of the required input data could not be collected
      ► FAILED_CHKPROG
        ●   There was a run-time error in the analysis step


13                                                                                © 2012 IBM Corporation
Linux Health Checker


Interpreting list view (continued)



 A potential problem was found
     css_ccw_unused_devices ................. lnxhost              EXCEPTION-LOW

      >EXCEPTION css_ccw_unused_devices.many_unused_devices(low)
         Of 4664 I/O devices, 4659 (99.89%) are unused


      ► Full   exception ID
        ● css_ccw_unused_devices.many_unused_devices
      ► Exception severity
        ● low
      ► Exception summary
        ●   Of 4664 I/O devices, 4659 (99.89%) are unused




14                                                                                 © 2012 IBM Corporation
Linux Health Checker


Getting more details
     [user@lnxhost ~]$ lnxhc run -V css_ccw_unused_devices
     CHECK NAME                               HOST                            RESULT
     ======================================================================================
     css_ccw_unused_devices ................. lnxhost                         EXCEPTION-LOW

     >EXCEPTION css_ccw_unused_devices.many_unused_devices(low)

      SUMMARY
        Of 4664 I/O devices, 4659(99.89%) are unused

      EXPLANATION
        The number of unused (offline) I/O devices, 4664 (99.89%) of a total of 4659,
        exceeds the specified threshold. During the boot process, Linux senses and analyzes
        All available I/O devices, including unused devices. Therefore, unused devices
        unnecessarily consume memory and CPU time.

      SOLUTION
        Use the "cio_ignore" feature to exclude I/O devices that you do not need from being
        sensed and analyzed. Be sure not to inadvertently exclude required devices. To ex-
        clude devices, you can use the "cio_ignore" kernel parameter or a command like this:

        echo "add <device_bus_id>" > /proc/cio_ignore

        where <device_bus_id> is the bus ID of an I/O device to be excluded.

      REFERENCE
        For more information about the "cio_ignore" feature, see the section about the
        "cio_ignore" kernel parameter in "Device Drivers, Features, and Commands".
15                                                                                   © 2012 IBM Corporation
Linux Health Checker


Additional functions



                        run        ► Run   health check

                        check      ► Manage   and configure health checks

                        consumer   ► Manage   and configure consumers
      $ lnxhc
                        sysinfo    ► Manage   stored system information

                        profile    ► Manage   configuration profiles

                        devel      ► Access   development functions




16                                                                      © 2012 IBM Corporation
Linux Health Checker


Viewing list of available health checks
     [user@lnxhost ~]$ lnxhc check --list

     CHECK NAME                               COMPONENT                      STATE
     ================================================================================
     boot_zipl_update_required                boot                           active
     crypto_openssl_ibmca_config              crypto                         inactive
     css_ccw_availability                     channel subsystem              active
     css_ccw_chpid                            channel subsystem              active
     css_ccw_ignored_online                   channel subsystem              active
     css_ccw_no_driver                        channel subsystem              active
     css_ccw_unused_devices                   channel subsystem              active
     dasd_zvm_nopav                           storage                        active
     fs_disk_usage                            filesystem                     active
     fs_inode_usage                           filesystem                     active
     init_runlevel                            init                           active
     mm_oom_killer_triggered                  kernel                         active
     net_bond_dev_chpid                       network                        active
     net_hsi_tx_errors                        network                        active
     net_inbound_packets                      network                        active
     net_qeth_buffercount                     network                        active
     ras_dump_on_panic                        system                         active
     sec_non_root_uid_zero                    security                       active
     sec_services_insecure                    security                       active
     ...




17                                                                                      © 2012 IBM Corporation
Linux Health Checker


Viewing health check information
     [user@lnxhost ~]$ lnxhc check --info fs_disk_usage

     Check fs_disk_usage (active)
     ============================

     Title:
       Check file systems for adequate free space

     Description:
       Some applications and administrative tasks require an adequate amount of free space on
       each mounted file system. If there is not enough free space, these applications might
       no longer be available or the complete system might be compromised. Regular monitoring
       of disk space usage averts this risk.

     Exceptions:
       critical_limit=high (active)
       warn_limit=low (inactive)

     Parameters:
       critical_limit=95
             File system usage (in percent) at which to raise a high-severity exception.
             Valid values are integers in the range 1 to 100.

              Default value is "95".

     ...


18                                                                                   © 2012 IBM Corporation
Linux Health Checker


Health check properties



 Fixed properties
     ► Name

     ► Meta-data
       ● Component to be checked
       ● Author name
     ► Exceptions
       ●   Exception IDs


 Some properties can be modified
     ► Activation       state
     ► Parameter        values
     ► Exception        activation state and severity

19                                                      © 2012 IBM Corporation
Linux Health Checker


Modifying health check properties



 Activation state
     ► Specifies        if a check should be performed during health check run
     [user@lnxhost ~]$ lnxhc check fs_disk_usage --state inactive
     Setting state of check 'fs_disk_usage' to 'inactive'
     Done.

 Parameter values
     ► Values       defined by health checks
     ► Enable       users to customize certain aspects of the health check
     [user@lnxhost ~]$ lnxhc check --param fs_disk_usage.critical_limit=99
     Setting value of parameter fs_disk_usage.critical_limit to '99'
     Done.


 See man page for full list of properties
     ► man      lnxhc_properties.7

20                                                                               © 2012 IBM Corporation
Linux Health Checker


Advanced health checking modes



 Collect data to file
     lnxhc sysinfo --collect --file lnxhost.sysinfo                  Collect


                                                                                          File
 Analyze from file
     lnxhc run --file lnxhost.sysinfo                                            Analysis

                                                              File
 Analyze from remote host
     ssh user@remote lnxhc sysinfo -c -f - | lnxhc run -f -


 Analyze from multiple hosts                                                  Collect             Collect

     lnxhc sysinfo –clear
     ssh user@remote1 lnxhc sysinfo -c -f - | lnxhc sysinfo -–merge -
     ssh user@remote2 lnxhc sysinfo -c -f - | lnxhc sysinfo -–merge -
     ...
     lnxhc run --current                                                             Internal DB


                                                                                         Analyze



21                                                                                           © 2012 IBM Corporation
Linux Health Checker


Agenda – Part 3



       1. Introducing health checking
       2. Using the Linux Health Checker
       3. How to write a check




22                                         © 2012 IBM Corporation
Linux Health Checker


Preparations for writing a health check



 Required skills
     ► Basic      programming skills
     ► Any     programming language
       ● Shell
       ● Perl
       ● C, C++, etc.

 Required resources
     ► Linux     system
       ● lnxhc package installed
       ● Text editor
     ► Some time




23                                        © 2012 IBM Corporation
Linux Health Checker


Finding a suitable idea



 Sources for health check ideas
     ► Own      experience
     ► Read      documentation
     ► Analyze          previous outages


 What to look for
     ► Single      points of failure
     ► System           values reaching limits
     ► Unhealthy          combinations of configurations




24                                                         © 2012 IBM Corporation
Linux Health Checker


Example idea



 What to check?
     ► Value      of sysctl setting 'panic_on_oops' should be '1'
 Why?
     ► “Kernel          oops” = severe kernel error
     ► Indication         that the kernel can no longer be trusted
     ► Kernel      will continue anyway if panic_on_oops is '0'
 How to check
     [user@lnxhost ~]$ cat /proc/sys/kernel/panic_on_oops
     0


 Solution
     echo 1 > /proc/sys/kernel/panic_on_oops




25                                                                   © 2012 IBM Corporation
Linux Health Checker


Implementation without framework



 Check program 'check.sh'
     #!/bin/bash

     FILENAME="/proc/sys/kernel/panic_on_oops"
     PANIC_ON_OOPS=`cat $FILENAME`

     if [ "$PANIC_ON_OOPS" -eq 0 ] ; then
             echo "The panic-on-oops setting is disabled"
             echo "Enable it using 'echo 1 > /proc/sys/kernel/panic_on_oops'"
     fi

     exit 0


 Sample output
     [user@lnxhost ~]$ ./check.sh
     The panic-on-oops setting is disabled
     Enable it using 'echo 1 > /proc/sys/kernel/panic_on_oops'




26                                                                              © 2012 IBM Corporation
Linux Health Checker


Writing checks for the Linux Health Checker framework



 One directory per check
     ► Directory        name is check name


 Files for
                                      panic_on_oops
     ► Meta      data                   definitions
     ► Text                             descriptions
                                        exceptions
     ► Check       program              check




27                                                      © 2012 IBM Corporation
Linux Health Checker


Definitions file



 Contains data about the health check
     [check]                                  Meta-data
     author = user@host
     component = system


     [sysinfo panic_on_oops]                  System information
     file = /proc/sys/kernel/panic_on_oops
                                              ► Files,   command output, etc.
                                              Exceptions
     [exception no_panic_on_oops]
     severity = high                          ► ID   and severity

                                              Optional parameters




28                                                                         © 2012 IBM Corporation
Linux Health Checker


Descriptions file



 Contains health check and
  parameter descriptions
     [title]
     Ensure that panic-on-oops is enabled
                                                 Check title

     [description]                               Basic check description
     The panic-on-oops setting ensures that a
     Linux instance is stopped if a kernel
     oops occurs.

                                                 Description of parameters




29                                                                          © 2012 IBM Corporation
Linux Health Checker


Exceptions file



 Contains problem report text

     [summary no_panic_on_oops]
     The panic-on-oops setting is disabled
                                                  Problem summary

     [explanation no_panic_on_oops]               Explanation
     Without the panic-on-oops setting, a
     Linux instance might keep running after       ► Why     is this a problem?
     an oops.


     [solution no_panic_on_oops]                  Solution
     Use the following command to enable the
     panic-on-oops setting                         ► Step-by-step     instruction
     echo 1 > /proc/sys/kernel/panic_on_oops

     [reference no_panic_on_oops]                 Reference for further reading
     See kernel documentation on panic-on-oops
     setting.                                      ► If   available

30                                                                                  © 2012 IBM Corporation
Linux Health Checker


Check program



 Implements health check analysis
  logic
     #!/bin/bash


     FILENAME=$LNXHC_SYSINFO_panic_on_oops            Access system information
     PANIC_ON_OOPS=`cat $FILENAME`


     if [ "$PANIC_ON_OOPS" -eq 0 ] ; then             Analyze and report exception
       echo "no_panic_on_oops" >> $LNXHC_EXCEPTION
     fi


     exit 0                                           Indicate result code
                                                       ►0   = Success
                                                       ► 64   = Missing dependency
                                                       ► Other   = Run-time error

31                                                                                  © 2012 IBM Corporation
Linux Health Checker


Putting it all together
     [user@lnxhost ~]$ lnxhc run -V ./panic_on_oops
     Collecting system information
     Running checks (1 checks)
     CHECK NAME                               HOST                               RESULT
     ==========================================================================================
     panic_on_oops .......................... lnxhost                            EXCEPTION-HIGH

     >EXCEPTION panic_on_oops.no_panic_on_oops(high)

      SUMMARY
        The panic-on-oops setting is disabled

      EXPLANATION
        Without the panic-on-oops        setting,   a Linux instance might
        keep running after an oops.

      SOLUTION
        Use the following command to enable the panic-on-oops setting
        echo 1 > /proc/sys/kernel/panic_on_oops

      REFERENCE
        See kernel documentation on panic-on-oops setting.


 If it doesn't work, add more “-V”s
      ► Increase        level of verbosity to help debugging
32                                                                                   © 2012 IBM Corporation
Linux Health Checker


Wrap-up



 To implement a check
     ► Create       a directory
     ► Add     files
       ● Meta-data
       ● Text files
       ● Check program
     ► Run/debug until it works

 Health check creation dialog                 lnxhc devel –-create-check my_check

     ► Creates          template files according to dialog input
 Once done, consider contributing it!
     ► Post     to lnxhc-list@lists.sourceforge.net
     ► See     contribution guidelines at http://lnxhc.sourceforge.net/contributing.html
33                                                                                   © 2012 IBM Corporation
Linux Health Checker


Further reading



 Man pages
     ► Once      installed use 'apropos lnxhc' to list man pages
     ► Also     available on the web: http://lnxhc.sourceforge.net/manpages.html
 User's Guide
     ► http://lnxhc.sourceforge.net/documentation.html

 Main web page
     ► http://lnxhc.sourceforge.net/

 Mailing list
     ► Open       for questions, comments, ideas, contributions, etc.
     ► lnxhc-list@lists.sourceforge.net




34                                                                             © 2012 IBM Corporation

Weitere ähnliche Inhalte

Was ist angesagt?

IBM i Technology Refreshes Overview 2012 06-04
IBM i Technology Refreshes Overview 2012 06-04IBM i Technology Refreshes Overview 2012 06-04
IBM i Technology Refreshes Overview 2012 06-04COMMON Europe
 
PMISV Symposium Presentation
PMISV Symposium PresentationPMISV Symposium Presentation
PMISV Symposium PresentationIvarsLenss
 
The Basics of Intel vPro
The Basics of Intel vProThe Basics of Intel vPro
The Basics of Intel vProManage Beyond
 
Considering Windows 7?
Considering Windows 7?Considering Windows 7?
Considering Windows 7?None
 
Comp tia a+_session_14
Comp tia a+_session_14Comp tia a+_session_14
Comp tia a+_session_14Niit Care
 
Planning and implementing windows 7
Planning and implementing windows 7Planning and implementing windows 7
Planning and implementing windows 7Interop
 
Comp tia a+_session_03
Comp tia a+_session_03Comp tia a+_session_03
Comp tia a+_session_03Niit Care
 
miniAVIT Brochure
miniAVIT BrochureminiAVIT Brochure
miniAVIT Brochurejohnuzz
 
ID114 - Wrestling the Snake: Performance Tuning 101
ID114 - Wrestling the Snake: Performance Tuning 101ID114 - Wrestling the Snake: Performance Tuning 101
ID114 - Wrestling the Snake: Performance Tuning 101Wes Morgan
 
New Diskeeper Corporation Ad
New Diskeeper Corporation  AdNew Diskeeper Corporation  Ad
New Diskeeper Corporation Adpaulhoole
 

Was ist angesagt? (11)

IBM i Technology Refreshes Overview 2012 06-04
IBM i Technology Refreshes Overview 2012 06-04IBM i Technology Refreshes Overview 2012 06-04
IBM i Technology Refreshes Overview 2012 06-04
 
PMISV Symposium Presentation
PMISV Symposium PresentationPMISV Symposium Presentation
PMISV Symposium Presentation
 
3 management
3 management3 management
3 management
 
The Basics of Intel vPro
The Basics of Intel vProThe Basics of Intel vPro
The Basics of Intel vPro
 
Considering Windows 7?
Considering Windows 7?Considering Windows 7?
Considering Windows 7?
 
Comp tia a+_session_14
Comp tia a+_session_14Comp tia a+_session_14
Comp tia a+_session_14
 
Planning and implementing windows 7
Planning and implementing windows 7Planning and implementing windows 7
Planning and implementing windows 7
 
Comp tia a+_session_03
Comp tia a+_session_03Comp tia a+_session_03
Comp tia a+_session_03
 
miniAVIT Brochure
miniAVIT BrochureminiAVIT Brochure
miniAVIT Brochure
 
ID114 - Wrestling the Snake: Performance Tuning 101
ID114 - Wrestling the Snake: Performance Tuning 101ID114 - Wrestling the Snake: Performance Tuning 101
ID114 - Wrestling the Snake: Performance Tuning 101
 
New Diskeeper Corporation Ad
New Diskeeper Corporation  AdNew Diskeeper Corporation  Ad
New Diskeeper Corporation Ad
 

Andere mochten auch

Andere mochten auch (7)

Virtualization Into Cloud
Virtualization Into CloudVirtualization Into Cloud
Virtualization Into Cloud
 
Problem Reporting and Analysis Linux on System z -How to survive a Linux Crit...
Problem Reporting and Analysis Linux on System z -How to survive a Linux Crit...Problem Reporting and Analysis Linux on System z -How to survive a Linux Crit...
Problem Reporting and Analysis Linux on System z -How to survive a Linux Crit...
 
Current & Future Linux on System z Technology
Current & Future Linux on System z TechnologyCurrent & Future Linux on System z Technology
Current & Future Linux on System z Technology
 
IBM Smarter Systems : Solving Your Critical It Challenges: Executive Summit
IBM Smarter Systems  : Solving Your Critical It Challenges: Executive SummitIBM Smarter Systems  : Solving Your Critical It Challenges: Executive Summit
IBM Smarter Systems : Solving Your Critical It Challenges: Executive Summit
 
Infrastructure Matters 2014 IBM systems and servers
Infrastructure Matters 2014 IBM systems and serversInfrastructure Matters 2014 IBM systems and servers
Infrastructure Matters 2014 IBM systems and servers
 
What's New in RHEL 6 for Linux on System z?
What's New in RHEL 6 for Linux on System z?What's New in RHEL 6 for Linux on System z?
What's New in RHEL 6 for Linux on System z?
 
Ibm blade center
Ibm blade centerIbm blade center
Ibm blade center
 

Ähnlich wie Introducing the Linux Health Checker

Monitoring system performance and health of i CEC 2012
Monitoring system performance and health of i CEC 2012Monitoring system performance and health of i CEC 2012
Monitoring system performance and health of i CEC 2012COMMON Europe
 
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...MongoDB
 
Ims13 ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
Ims13   ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...Ims13   ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
Ims13 ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...Robert Hain
 
AV-Comparatives Performance Test
AV-Comparatives Performance TestAV-Comparatives Performance Test
AV-Comparatives Performance TestHerbert Rodriguez
 
Xtw01t9v0901 sys mgmt
Xtw01t9v0901 sys mgmtXtw01t9v0901 sys mgmt
Xtw01t9v0901 sys mgmtpgnguyen44
 
ISD 6.3 and IBM i june 2012
ISD 6.3 and IBM i june 2012ISD 6.3 and IBM i june 2012
ISD 6.3 and IBM i june 2012COMMON Europe
 
Innovate 2014: Get an A+ on Testing Your Enterprise Applications with Rationa...
Innovate 2014: Get an A+ on Testing Your Enterprise Applications with Rationa...Innovate 2014: Get an A+ on Testing Your Enterprise Applications with Rationa...
Innovate 2014: Get an A+ on Testing Your Enterprise Applications with Rationa...Teodoro Cipresso
 
AAI-3281 Smarter Production with WebSphere Application Server ND Intelligent ...
AAI-3281 Smarter Production with WebSphere Application Server ND Intelligent ...AAI-3281 Smarter Production with WebSphere Application Server ND Intelligent ...
AAI-3281 Smarter Production with WebSphere Application Server ND Intelligent ...WASdev Community
 
zEnterpise integration of Linux and traditional workload
zEnterpise integration of Linux and traditional workloadzEnterpise integration of Linux and traditional workload
zEnterpise integration of Linux and traditional workloadIBM India Smarter Computing
 
RDz for DevOps Webcast Series: Implementing Continuous Integration with RDz
RDz for DevOps Webcast Series: Implementing Continuous Integration with RDzRDz for DevOps Webcast Series: Implementing Continuous Integration with RDz
RDz for DevOps Webcast Series: Implementing Continuous Integration with RDzSusan Yoskin
 
Systemz Security Overview (for non-Mainframe folks)
Systemz Security Overview (for non-Mainframe folks)Systemz Security Overview (for non-Mainframe folks)
Systemz Security Overview (for non-Mainframe folks)Mike Smith
 
Build Smarter User Interfaces for Legacy Applications with IBM Rational Host ...
Build Smarter User Interfaces for Legacy Applications with IBM Rational Host ...Build Smarter User Interfaces for Legacy Applications with IBM Rational Host ...
Build Smarter User Interfaces for Legacy Applications with IBM Rational Host ...Strongback Consulting
 
Intel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabIntel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabMichelle Holley
 
Windows 7 – Application Compatibility Toolkit 5.5 Overview
Windows 7 – Application Compatibility Toolkit 5.5 OverviewWindows 7 – Application Compatibility Toolkit 5.5 Overview
Windows 7 – Application Compatibility Toolkit 5.5 OverviewVijay Raj
 
categories of computer software
categories of computer softwarecategories of computer software
categories of computer softwareManidhar Chowdary
 

Ähnlich wie Introducing the Linux Health Checker (20)

Monitoring system performance and health of i CEC 2012
Monitoring system performance and health of i CEC 2012Monitoring system performance and health of i CEC 2012
Monitoring system performance and health of i CEC 2012
 
What’s New ?Linux on System z
What’s New ?Linux on System zWhat’s New ?Linux on System z
What’s New ?Linux on System z
 
IBM zAware
IBM zAwareIBM zAware
IBM zAware
 
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...
 
Ims13 ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
Ims13   ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...Ims13   ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
Ims13 ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
 
Problem Determination with Linux on System z
Problem Determination with Linux on System zProblem Determination with Linux on System z
Problem Determination with Linux on System z
 
AV-Comparatives Performance Test
AV-Comparatives Performance TestAV-Comparatives Performance Test
AV-Comparatives Performance Test
 
Xtw01t9v0901 sys mgmt
Xtw01t9v0901 sys mgmtXtw01t9v0901 sys mgmt
Xtw01t9v0901 sys mgmt
 
ISD 6.3 and IBM i june 2012
ISD 6.3 and IBM i june 2012ISD 6.3 and IBM i june 2012
ISD 6.3 and IBM i june 2012
 
Avc per 201206_en
Avc per 201206_enAvc per 201206_en
Avc per 201206_en
 
Innovate 2014: Get an A+ on Testing Your Enterprise Applications with Rationa...
Innovate 2014: Get an A+ on Testing Your Enterprise Applications with Rationa...Innovate 2014: Get an A+ on Testing Your Enterprise Applications with Rationa...
Innovate 2014: Get an A+ on Testing Your Enterprise Applications with Rationa...
 
AAI-3281 Smarter Production with WebSphere Application Server ND Intelligent ...
AAI-3281 Smarter Production with WebSphere Application Server ND Intelligent ...AAI-3281 Smarter Production with WebSphere Application Server ND Intelligent ...
AAI-3281 Smarter Production with WebSphere Application Server ND Intelligent ...
 
zEnterpise integration of Linux and traditional workload
zEnterpise integration of Linux and traditional workloadzEnterpise integration of Linux and traditional workload
zEnterpise integration of Linux and traditional workload
 
RDz for DevOps Webcast Series: Implementing Continuous Integration with RDz
RDz for DevOps Webcast Series: Implementing Continuous Integration with RDzRDz for DevOps Webcast Series: Implementing Continuous Integration with RDz
RDz for DevOps Webcast Series: Implementing Continuous Integration with RDz
 
Systemz Security Overview (for non-Mainframe folks)
Systemz Security Overview (for non-Mainframe folks)Systemz Security Overview (for non-Mainframe folks)
Systemz Security Overview (for non-Mainframe folks)
 
Build Smarter User Interfaces for Legacy Applications with IBM Rational Host ...
Build Smarter User Interfaces for Legacy Applications with IBM Rational Host ...Build Smarter User Interfaces for Legacy Applications with IBM Rational Host ...
Build Smarter User Interfaces for Legacy Applications with IBM Rational Host ...
 
Intel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabIntel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/Lab
 
Windows 7 – Application Compatibility Toolkit 5.5 Overview
Windows 7 – Application Compatibility Toolkit 5.5 OverviewWindows 7 – Application Compatibility Toolkit 5.5 Overview
Windows 7 – Application Compatibility Toolkit 5.5 Overview
 
categories of computer software
categories of computer softwarecategories of computer software
categories of computer software
 
BLADEHarmony Manager
BLADEHarmony ManagerBLADEHarmony Manager
BLADEHarmony Manager
 

Mehr von IBM India Smarter Computing

Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments IBM India Smarter Computing
 
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...IBM India Smarter Computing
 
A Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization PerformanceA Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization PerformanceIBM India Smarter Computing
 
IBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architectureIBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architectureIBM India Smarter Computing
 

Mehr von IBM India Smarter Computing (20)

Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments
 
All-flash Needs End to End Storage Efficiency
All-flash Needs End to End Storage EfficiencyAll-flash Needs End to End Storage Efficiency
All-flash Needs End to End Storage Efficiency
 
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
 
IBM FlashSystem 840 Product Guide
IBM FlashSystem 840 Product GuideIBM FlashSystem 840 Product Guide
IBM FlashSystem 840 Product Guide
 
IBM System x3250 M5
IBM System x3250 M5IBM System x3250 M5
IBM System x3250 M5
 
IBM NeXtScale nx360 M4
IBM NeXtScale nx360 M4IBM NeXtScale nx360 M4
IBM NeXtScale nx360 M4
 
IBM System x3650 M4 HD
IBM System x3650 M4 HDIBM System x3650 M4 HD
IBM System x3650 M4 HD
 
IBM System x3300 M4
IBM System x3300 M4IBM System x3300 M4
IBM System x3300 M4
 
IBM System x iDataPlex dx360 M4
IBM System x iDataPlex dx360 M4IBM System x iDataPlex dx360 M4
IBM System x iDataPlex dx360 M4
 
IBM System x3500 M4
IBM System x3500 M4IBM System x3500 M4
IBM System x3500 M4
 
IBM System x3550 M4
IBM System x3550 M4IBM System x3550 M4
IBM System x3550 M4
 
IBM System x3650 M4
IBM System x3650 M4IBM System x3650 M4
IBM System x3650 M4
 
IBM System x3500 M3
IBM System x3500 M3IBM System x3500 M3
IBM System x3500 M3
 
IBM System x3400 M3
IBM System x3400 M3IBM System x3400 M3
IBM System x3400 M3
 
IBM System x3250 M3
IBM System x3250 M3IBM System x3250 M3
IBM System x3250 M3
 
IBM System x3200 M3
IBM System x3200 M3IBM System x3200 M3
IBM System x3200 M3
 
IBM PowerVC Introduction and Configuration
IBM PowerVC Introduction and ConfigurationIBM PowerVC Introduction and Configuration
IBM PowerVC Introduction and Configuration
 
A Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization PerformanceA Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization Performance
 
IBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architectureIBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architecture
 
X6: The sixth generation of EXA Technology
X6: The sixth generation of EXA TechnologyX6: The sixth generation of EXA Technology
X6: The sixth generation of EXA Technology
 

Introducing the Linux Health Checker

  • 1. IBM zEnterprise – Freedom Through Design Introducing the Linux Health Checker Peter Oberparleiter <peter.oberparleiter@de.ibm.com> © 2012 IBM Corporation
  • 2. Linux Health Checker Trademarks The following are trademarks of the International Business Machines Corporation in the United States and/or other countries. AIX* IBM* PowerVM System z10 z/OS* BladeCenter* IBM eServer PR/SM WebSphere* zSeries* DataPower* IBM (logo)* Smarter Planet z9* z/VM* DB2* InfiniBand* System x* z10 BC z/VSE FICON* Parallel Sysplex* System z* z10 EC GDPS* POWER* System z9* zEnterprise HiperSockets POWER7* * Registered trademarks of IBM Corporation The following are trademarks or registered trademarks of other companies. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license there from. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Windows Server and the Windows logo are trademarks of the Microsoft group of countries. InfiniBand is a trademark and service mark of the InfiniBand Trade Association. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce. * All other products may be trademarks or registered trademarks of their respective companies. Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply. All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions. This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area. All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography. 2 Tux logo by Larry Ewing. © 2012 IBM Corporation
  • 3. Linux Health Checker Agenda – Part 1 1. Introducing health checking 2. Using the Linux Health Checker 3. How to write a check 3 © 2012 IBM Corporation
  • 4. Linux Health Checker Introducing health checking  What is a health check? ►A process that identifies conditions which may lead to problems  What is the Linux Health Checker? ►A tool that performs an automated health check of a Linux system ► Checks status and configuration ► Presents report on identified problems Helps keeping Linux systems healthy (operational) 4 © 2012 IBM Corporation
  • 5. Linux Health Checker What does it do?  Example problem classes ► Configuration errors ► Deviations from best-practice setups ► Hardware running in degraded mode ► Unused accelerator hardware ► Single point-of-failures  Detailed problem report ► Enable users to understand and solve problems ► Make expert knowledge available to wider audience 5 © 2012 IBM Corporation
  • 6. Linux Health Checker Goals  Ease of use ► Simple setup: Install and run ► Primary tasks easily accessible through command line interface  Flexibility through Framework/Plug-in concept ► Health check plug-ins Contain all problem area specific knowledge ● ► Consumer plug-ins ●Handle output processing ► Extend functionality by adding new plug-ins 6 © 2012 IBM Corporation
  • 7. Linux Health Checker Basic approach to health checking Collect Analyze Generate output System Health Check information output  Collect system information ► File contents, for example /var/log/messages ► Program output, for example /bin/df  Analyze information ► Find relevant data points ► Compare with best-practice values  Produce report 7 © 2012 IBM Corporation
  • 8. Linux Health Checker System overview User Framework Health Checker Collect system Generate results information Target system Plug-ins Health Checks Consumers Process Report Analyze results Notification 8 © 2012 IBM Corporation
  • 9. Linux Health Checker Health checks in version 1.0 Verify that the bootmap file is up-to-date Screen users with superuser privileges Check whether the path to the OpenSSL library is configured correctly Identify network services that are known to be insecure Identify unusable I/O devices Identify multipath setups that consist of a single path only Check for CHPIDs that are not available Confirm that automatic problem reporting is activated Identify I/O devices that are in use although they are on the exclusion list Ensure that panic-on-oops is switched on Identify I/O devices that are not associated with a device driver Check whether the CPUs run with reduced capacity Check for an excessive number of unused I/O devices Spot getty programs on the /dev/console device Check Linux on z/VM for the "nopav" DASD parameter Identify unused terminals (TTY) Check file systems for adequate free space Checks by component Check file systems for an adequate number of free inodes Check whether the recommended runlevel is used and set as default CSS System Check the kernel message log for out-of-memory (OOM) occurrences Network Storage Identify bonding interfaces that aggregate qeth interfaces with the same Filesystem CHPID Security Boot Check for an excessive error ratio for outbound HiperSockets traffic Crypto Init Check the inbound network traffic for an excessive error or drop ratio Kernel Hardware Identify qeth interfaces that do not have an optimal number of buffers Confirm that the dump-on-panic function is enabled 9 © 2012 IBM Corporation
  • 10. Linux Health Checker Agenda – Part 2 1. Introducing health checking 2. Using the Linux Health Checker 3. How to write a check 10 © 2012 IBM Corporation
  • 11. Linux Health Checker Preparations  Requirements ► Linux ● Framework should run on any hardware platform ● Health checks may be platform specific ► Perl 5.8 or later ● Additional Perl modules which are usually part of default installation  Obtaining the Linux Health Checker ► Open source under Eclipse Public License v1.0 ► Download RPM or source package from http://lnxhc.sourceforge.net ► Install using RPM command or make install 11 © 2012 IBM Corporation
  • 12. Linux Health Checker First health check run [user@lnxhost ~]$ lnxhc run Collecting system information Running checks (12 checks) CHECK NAME HOST RESULT ================================================================================= boot_zipl_update_required .............. lnxhost SUCCESS css_ccw_availability ................... lnxhost SUCCESS css_ccw_chpid .......................... lnxhost SUCCESS css_ccw_no_driver ...................... lnxhost SUCCESS css_ccw_unused_devices ................. lnxhost EXCEPTION-LOW >EXCEPTION css_ccw_unused_devices.many_unused_devices(low) Of 4664 I/O devices, 4659 (99.89%) are unused fs_disk_usage .......................... lnxhost SUCCESS mm_oom_killer_triggered ................ lnxhost SUCCESS net_hsi_tx_errors ...................... lnxhost NOT APPLICABLE ras_dump_on_panic ...................... lnxhost EXCEPTION-HIGH >EXCEPTION ras_dump_on_panic.no_standalone(high) The dump-on-panic function is not enabled sec_services_insecure .................. lnxhost SUCCESS sys_sysctl_call_home ................... lnxhost NOT APPLICABLE sys_sysinfo_cpu_cap .................... lnxhost SUCCESS 10 checks run, 2 exceptions found (use 'lnxhc run --replay -V' for details) 12 © 2012 IBM Corporation
  • 13. Linux Health Checker Interpreting list view  Some health checks found no problems boot_zipl_update_required .............. lnxhost SUCCESS css_ccw_availability ................... lnxhost SUCCESS css_ccw_chpid .......................... lnxhost SUCCESS css_ccw_no_driver ...................... lnxhost SUCCESS  A health check did not run net_hsi_tx_errors ...................... lnxhost NOT APPLICABLE ►A requirement was not met, for example platform, hypervisor or Linux version  More reasons why a health check cannot run ► FAILED_SYSINFO ● Some of the required input data could not be collected ► FAILED_CHKPROG ● There was a run-time error in the analysis step 13 © 2012 IBM Corporation
  • 14. Linux Health Checker Interpreting list view (continued)  A potential problem was found css_ccw_unused_devices ................. lnxhost EXCEPTION-LOW >EXCEPTION css_ccw_unused_devices.many_unused_devices(low) Of 4664 I/O devices, 4659 (99.89%) are unused ► Full exception ID ● css_ccw_unused_devices.many_unused_devices ► Exception severity ● low ► Exception summary ● Of 4664 I/O devices, 4659 (99.89%) are unused 14 © 2012 IBM Corporation
  • 15. Linux Health Checker Getting more details [user@lnxhost ~]$ lnxhc run -V css_ccw_unused_devices CHECK NAME HOST RESULT ====================================================================================== css_ccw_unused_devices ................. lnxhost EXCEPTION-LOW >EXCEPTION css_ccw_unused_devices.many_unused_devices(low) SUMMARY Of 4664 I/O devices, 4659(99.89%) are unused EXPLANATION The number of unused (offline) I/O devices, 4664 (99.89%) of a total of 4659, exceeds the specified threshold. During the boot process, Linux senses and analyzes All available I/O devices, including unused devices. Therefore, unused devices unnecessarily consume memory and CPU time. SOLUTION Use the "cio_ignore" feature to exclude I/O devices that you do not need from being sensed and analyzed. Be sure not to inadvertently exclude required devices. To ex- clude devices, you can use the "cio_ignore" kernel parameter or a command like this: echo "add <device_bus_id>" > /proc/cio_ignore where <device_bus_id> is the bus ID of an I/O device to be excluded. REFERENCE For more information about the "cio_ignore" feature, see the section about the "cio_ignore" kernel parameter in "Device Drivers, Features, and Commands". 15 © 2012 IBM Corporation
  • 16. Linux Health Checker Additional functions run ► Run health check check ► Manage and configure health checks consumer ► Manage and configure consumers $ lnxhc sysinfo ► Manage stored system information profile ► Manage configuration profiles devel ► Access development functions 16 © 2012 IBM Corporation
  • 17. Linux Health Checker Viewing list of available health checks [user@lnxhost ~]$ lnxhc check --list CHECK NAME COMPONENT STATE ================================================================================ boot_zipl_update_required boot active crypto_openssl_ibmca_config crypto inactive css_ccw_availability channel subsystem active css_ccw_chpid channel subsystem active css_ccw_ignored_online channel subsystem active css_ccw_no_driver channel subsystem active css_ccw_unused_devices channel subsystem active dasd_zvm_nopav storage active fs_disk_usage filesystem active fs_inode_usage filesystem active init_runlevel init active mm_oom_killer_triggered kernel active net_bond_dev_chpid network active net_hsi_tx_errors network active net_inbound_packets network active net_qeth_buffercount network active ras_dump_on_panic system active sec_non_root_uid_zero security active sec_services_insecure security active ... 17 © 2012 IBM Corporation
  • 18. Linux Health Checker Viewing health check information [user@lnxhost ~]$ lnxhc check --info fs_disk_usage Check fs_disk_usage (active) ============================ Title: Check file systems for adequate free space Description: Some applications and administrative tasks require an adequate amount of free space on each mounted file system. If there is not enough free space, these applications might no longer be available or the complete system might be compromised. Regular monitoring of disk space usage averts this risk. Exceptions: critical_limit=high (active) warn_limit=low (inactive) Parameters: critical_limit=95 File system usage (in percent) at which to raise a high-severity exception. Valid values are integers in the range 1 to 100. Default value is "95". ... 18 © 2012 IBM Corporation
  • 19. Linux Health Checker Health check properties  Fixed properties ► Name ► Meta-data ● Component to be checked ● Author name ► Exceptions ● Exception IDs  Some properties can be modified ► Activation state ► Parameter values ► Exception activation state and severity 19 © 2012 IBM Corporation
  • 20. Linux Health Checker Modifying health check properties  Activation state ► Specifies if a check should be performed during health check run [user@lnxhost ~]$ lnxhc check fs_disk_usage --state inactive Setting state of check 'fs_disk_usage' to 'inactive' Done.  Parameter values ► Values defined by health checks ► Enable users to customize certain aspects of the health check [user@lnxhost ~]$ lnxhc check --param fs_disk_usage.critical_limit=99 Setting value of parameter fs_disk_usage.critical_limit to '99' Done.  See man page for full list of properties ► man lnxhc_properties.7 20 © 2012 IBM Corporation
  • 21. Linux Health Checker Advanced health checking modes  Collect data to file lnxhc sysinfo --collect --file lnxhost.sysinfo Collect File  Analyze from file lnxhc run --file lnxhost.sysinfo Analysis File  Analyze from remote host ssh user@remote lnxhc sysinfo -c -f - | lnxhc run -f -  Analyze from multiple hosts Collect Collect lnxhc sysinfo –clear ssh user@remote1 lnxhc sysinfo -c -f - | lnxhc sysinfo -–merge - ssh user@remote2 lnxhc sysinfo -c -f - | lnxhc sysinfo -–merge - ... lnxhc run --current Internal DB Analyze 21 © 2012 IBM Corporation
  • 22. Linux Health Checker Agenda – Part 3 1. Introducing health checking 2. Using the Linux Health Checker 3. How to write a check 22 © 2012 IBM Corporation
  • 23. Linux Health Checker Preparations for writing a health check  Required skills ► Basic programming skills ► Any programming language ● Shell ● Perl ● C, C++, etc.  Required resources ► Linux system ● lnxhc package installed ● Text editor ► Some time 23 © 2012 IBM Corporation
  • 24. Linux Health Checker Finding a suitable idea  Sources for health check ideas ► Own experience ► Read documentation ► Analyze previous outages  What to look for ► Single points of failure ► System values reaching limits ► Unhealthy combinations of configurations 24 © 2012 IBM Corporation
  • 25. Linux Health Checker Example idea  What to check? ► Value of sysctl setting 'panic_on_oops' should be '1'  Why? ► “Kernel oops” = severe kernel error ► Indication that the kernel can no longer be trusted ► Kernel will continue anyway if panic_on_oops is '0'  How to check [user@lnxhost ~]$ cat /proc/sys/kernel/panic_on_oops 0  Solution echo 1 > /proc/sys/kernel/panic_on_oops 25 © 2012 IBM Corporation
  • 26. Linux Health Checker Implementation without framework  Check program 'check.sh' #!/bin/bash FILENAME="/proc/sys/kernel/panic_on_oops" PANIC_ON_OOPS=`cat $FILENAME` if [ "$PANIC_ON_OOPS" -eq 0 ] ; then echo "The panic-on-oops setting is disabled" echo "Enable it using 'echo 1 > /proc/sys/kernel/panic_on_oops'" fi exit 0  Sample output [user@lnxhost ~]$ ./check.sh The panic-on-oops setting is disabled Enable it using 'echo 1 > /proc/sys/kernel/panic_on_oops' 26 © 2012 IBM Corporation
  • 27. Linux Health Checker Writing checks for the Linux Health Checker framework  One directory per check ► Directory name is check name  Files for panic_on_oops ► Meta data definitions ► Text descriptions exceptions ► Check program check 27 © 2012 IBM Corporation
  • 28. Linux Health Checker Definitions file  Contains data about the health check [check]  Meta-data author = user@host component = system [sysinfo panic_on_oops]  System information file = /proc/sys/kernel/panic_on_oops ► Files, command output, etc.  Exceptions [exception no_panic_on_oops] severity = high ► ID and severity  Optional parameters 28 © 2012 IBM Corporation
  • 29. Linux Health Checker Descriptions file  Contains health check and parameter descriptions [title] Ensure that panic-on-oops is enabled  Check title [description]  Basic check description The panic-on-oops setting ensures that a Linux instance is stopped if a kernel oops occurs.  Description of parameters 29 © 2012 IBM Corporation
  • 30. Linux Health Checker Exceptions file  Contains problem report text [summary no_panic_on_oops] The panic-on-oops setting is disabled  Problem summary [explanation no_panic_on_oops]  Explanation Without the panic-on-oops setting, a Linux instance might keep running after ► Why is this a problem? an oops. [solution no_panic_on_oops]  Solution Use the following command to enable the panic-on-oops setting ► Step-by-step instruction echo 1 > /proc/sys/kernel/panic_on_oops [reference no_panic_on_oops]  Reference for further reading See kernel documentation on panic-on-oops setting. ► If available 30 © 2012 IBM Corporation
  • 31. Linux Health Checker Check program  Implements health check analysis logic #!/bin/bash FILENAME=$LNXHC_SYSINFO_panic_on_oops  Access system information PANIC_ON_OOPS=`cat $FILENAME` if [ "$PANIC_ON_OOPS" -eq 0 ] ; then  Analyze and report exception echo "no_panic_on_oops" >> $LNXHC_EXCEPTION fi exit 0  Indicate result code ►0 = Success ► 64 = Missing dependency ► Other = Run-time error 31 © 2012 IBM Corporation
  • 32. Linux Health Checker Putting it all together [user@lnxhost ~]$ lnxhc run -V ./panic_on_oops Collecting system information Running checks (1 checks) CHECK NAME HOST RESULT ========================================================================================== panic_on_oops .......................... lnxhost EXCEPTION-HIGH >EXCEPTION panic_on_oops.no_panic_on_oops(high) SUMMARY The panic-on-oops setting is disabled EXPLANATION Without the panic-on-oops setting, a Linux instance might keep running after an oops. SOLUTION Use the following command to enable the panic-on-oops setting echo 1 > /proc/sys/kernel/panic_on_oops REFERENCE See kernel documentation on panic-on-oops setting.  If it doesn't work, add more “-V”s ► Increase level of verbosity to help debugging 32 © 2012 IBM Corporation
  • 33. Linux Health Checker Wrap-up  To implement a check ► Create a directory ► Add files ● Meta-data ● Text files ● Check program ► Run/debug until it works  Health check creation dialog lnxhc devel –-create-check my_check ► Creates template files according to dialog input  Once done, consider contributing it! ► Post to lnxhc-list@lists.sourceforge.net ► See contribution guidelines at http://lnxhc.sourceforge.net/contributing.html 33 © 2012 IBM Corporation
  • 34. Linux Health Checker Further reading  Man pages ► Once installed use 'apropos lnxhc' to list man pages ► Also available on the web: http://lnxhc.sourceforge.net/manpages.html  User's Guide ► http://lnxhc.sourceforge.net/documentation.html  Main web page ► http://lnxhc.sourceforge.net/  Mailing list ► Open for questions, comments, ideas, contributions, etc. ► lnxhc-list@lists.sourceforge.net 34 © 2012 IBM Corporation