This document provides information on various topics related to UCS security including system policies, high availability, system events, SNMP, firmware, and TAC information. It discusses how high availability is achieved in UCS through clustering of fabric interconnects, replication of data between nodes, and use of chassis EEPROM. It also describes fault states, severity levels, and how to view system events and collect TAC support information.
4. High Availability
www.silantia.com
Two fabric interconnects and two IOMs per chassis provide two data paths per blade.
Clustering of fabric interconnects (FIs) requires the same Cisco UCS Manager version and the same FI model on both nodes.
Clustering is done through the L1 and L2 ports on each fabric interconnect. These ports are non-configurable.
The L1 and L2 ports are 1000BaseTX and are connected with straight-through Cat6 cables.
The ports are pre-configured to run LACP and CDP.
The links form an 802.3ad bond managed by the underlying OS.
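The clustering prerequisites above (same UCS Manager version, same FI model) can be sketched as a small pre-check. This is only an illustration: the `FabricInterconnect` class, the `clustering_problems` function, and the model and version strings are hypothetical and not part of any Cisco tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricInterconnect:
    """Hypothetical record of the two facts the slide says must match."""
    name: str
    model: str
    ucsm_version: str


def clustering_problems(a: FabricInterconnect, b: FabricInterconnect) -> list:
    """Return a list of reasons why the two nodes could not be clustered."""
    problems = []
    if a.model != b.model:
        problems.append(f"model mismatch: {a.model} vs {b.model}")
    if a.ucsm_version != b.ucsm_version:
        problems.append(
            f"UCS Manager version mismatch: {a.ucsm_version} vs {b.ucsm_version}"
        )
    return problems


# Placeholder values, chosen only for the example.
fi_a = FabricInterconnect("A", "model-x", "version-1")
fi_b = FabricInterconnect("B", "model-x", "version-2")
for problem in clustering_problems(fi_a, fi_b):
    print(problem)
```

An empty list means both prerequisites from the slide are met; it says nothing about cabling of the L1 and L2 ports.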
5. High Availability
Cisco UCS Manager controller:
A distributed application that runs on both the primary and the subordinate Cisco UCS Manager instance
Each instance is represented by a node ID
Runs as a separate process on Cisco NX-OS
Defines the running mode of the Cisco UCS Manager processes
Cisco NX-OS:
Starts all Cisco UCS Manager processes
Monitors and restarts Cisco UCS Manager processes
6. High Availability
Local storage:
NVRAM and flash store static data
Read and written by the local Cisco UCS Manager instance
Replicated when both nodes are up
Chassis EEPROM:
The serial EEPROM stores state data
Up to three chassis have their EEPROM written with state information in two partitions
Read and written by both chassis management controllers
Used to help Cisco UCS Manager determine the state of the cluster
7. Viewing and Changing Management HA
connect local-mgmt
dc101-A# sh cluster extended-state
Cluster Id: 0x898942147f8311e2-0x8af9547feeed8104
Start time: Sun May 26 18:36:30 2013
Last election time: Sun May 26 18:36:33 2013
A: UP, PRIMARY
B: UP, SUBORDINATE
A: memb state UP, lead state PRIMARY, mgmt services state: UP
B: memb state UP, lead state SUBORDINATE, mgmt services state: UP
heartbeat state PRIMARY_OK
INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP
HA READY
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1450H4JK, state: active
dc101-A#
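Commands for changing the management lead: cluster lead, cluster force
Callouts on the output: L1 and L2 ports (internal network interfaces); chassis serial EEPROM (HA storage device)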
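For monitoring scripts, the output shown above can be reduced to a few fields. The following is a minimal sketch that assumes the output has exactly the line shapes shown on this slide; the `parse_cluster_state` function and the dictionary layout are hypothetical, and other UCS Manager releases may print different text.

```python
import re

# Lines copied from the sample output on this slide.
SAMPLE = """\
A: UP, PRIMARY
B: UP, SUBORDINATE
A: memb state UP, lead state PRIMARY, mgmt services state: UP
B: memb state UP, lead state SUBORDINATE, mgmt services state: UP
heartbeat state PRIMARY_OK
INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP
HA READY
"""


def parse_cluster_state(text: str) -> dict:
    """Extract node roles, internal interface states, and the HA READY flag."""
    state = {"nodes": {}, "interfaces": {}, "ha_ready": False}
    for raw in text.splitlines():
        line = raw.strip()
        # Summary lines such as "A: UP, PRIMARY".
        node = re.fullmatch(r"([AB]): (\w+), (\w+)", line)
        if node:
            state["nodes"][node.group(1)] = {
                "membership": node.group(2),
                "role": node.group(3),
            }
            continue
        # Interface lines such as "eth1, UP".
        iface = re.fullmatch(r"(eth\d+), (\w+)", line)
        if iface:
            state["interfaces"][iface.group(1)] = iface.group(2)
            continue
        if line == "HA READY":
            state["ha_ready"] = True
    return state


print(parse_cluster_state(SAMPLE))
```

The longer "memb state" lines are deliberately ignored here because the summary lines already carry the membership and role.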
8. High Availability (split brain issues)
Partition in space:
A partition in space occurs when the private network fails (no path from L1 to L1 and from L2 to L2).
There is a risk that both nodes act as the active management node (active-active).
Both nodes are demoted to subordinate and a quorum race begins.
The node that claims the most resources wins.
Partition in time:
A partition in time occurs when a node boots alone in the cluster.
The node compares its database version against the serial EEPROM and discovers that its version number is lower than the current database version.
There is a risk of applying an old configuration to UCS components.
This node will not become the active management node.
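The two split-brain rules above can be expressed as simple decisions. This is a sketch of the logic as described on the slide, not Cisco's implementation: the function names are hypothetical, treating equal database versions as acceptable is an assumption, and returning no winner on a tie is also an assumption because the slide does not say how ties are broken.

```python
from typing import Optional


def may_become_primary(local_db_version: int, eeprom_db_version: int) -> bool:
    """Partition in time: a node whose database version is lower than the
    version recorded in the chassis serial EEPROM must not become active.
    Equal versions are allowed here (assumption)."""
    return local_db_version >= eeprom_db_version


def quorum_winner(claimed_resources: dict) -> Optional[str]:
    """Partition in space: the node that claims the most resources wins.
    A tie yields None (assumption; the slide does not cover ties)."""
    best = max(claimed_resources.values())
    leaders = [node for node, count in claimed_resources.items() if count == best]
    return leaders[0] if len(leaders) == 1 else None


print(may_become_primary(local_db_version=7, eeprom_db_version=9))
print(quorum_winner({"A": 2, "B": 1}))
```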
10. Fault severity
Severity   Description
Critical   A service-affecting condition that requires immediate corrective action. This severity might indicate that the managed object is out of service and its capability must be restored.
Major      A service-affecting condition that requires urgent corrective action. This severity might indicate a severe degradation in the capability of the managed object and that its full capability must be restored.
Minor      A non-service-affecting fault condition that requires corrective action to prevent a more serious fault from occurring.
Warning    A potential service-affecting fault that currently has no significant effects in the system.
Condition  An informational message about a condition, possibly independently insignificant.
Info       A basic notification or informational message, possibly independently insignificant.
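When faults are collected by a script, the severities in the table can be modeled as an ordered enumeration so they can be compared and sorted. This is only a sketch: the numeric values are an assumption chosen to preserve the table's order, and `Severity`, `is_service_affecting`, and `most_severe` are hypothetical names.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severities from the table, ordered from least to most severe.
    The numbers are arbitrary (assumption); only the order matters."""
    INFO = 0
    CONDITION = 1
    WARNING = 2
    MINOR = 3
    MAJOR = 4
    CRITICAL = 5


def is_service_affecting(severity: Severity) -> bool:
    """Critical and Major are described as service-affecting; Warning is
    only potentially service-affecting, so it is excluded here."""
    return severity >= Severity.MAJOR


def most_severe(severities: list) -> Severity:
    """Return the highest severity in a non-empty list."""
    return max(severities)


faults = [Severity.WARNING, Severity.MAJOR, Severity.INFO]
print(most_severe(faults).name, is_service_affecting(most_severe(faults)))
```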
11. Fault states
State      Description
Active     A fault was raised and is currently active.
Cleared    A fault was raised but did not reoccur during the flap interval. The condition that caused the fault has been resolved, and the fault has been cleared.
Flapping   A fault was raised, cleared, and then raised again within a short time interval, known as the flap interval.
Soaking    A fault was raised and then cleared. Because this may be a flapping condition, the fault severity remains at its original active value, but this state indicates that the condition that raised the fault has cleared.
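One way to read the table is as a classification of a fault's raise and clear history against the flap interval. The sketch below is an interpretation of those descriptions, not the UCS Manager fault engine: the `fault_state` function, the event tuples, and the exact comparisons are assumptions.

```python
def fault_state(events: list, now: float, flap_interval: float) -> str:
    """Classify a fault from its history.

    events is a time-ordered list of (timestamp, kind) tuples, where kind
    is "raise" or "clear". All names and rules here are illustrative.
    """
    if not events:
        raise ValueError("a fault needs at least one event")
    last_time, last_kind = events[-1]
    if last_kind == "clear":
        # Cleared recently: keep watching in case the fault comes back.
        if now - last_time < flap_interval:
            return "soaking"
        # No recurrence during the flap interval.
        return "cleared"
    # Last event is a raise: check whether it closely followed a clear.
    if len(events) >= 2:
        prev_time, prev_kind = events[-2]
        if prev_kind == "clear" and last_time - prev_time < flap_interval:
            return "flapping"
    return "active"


history = [(0, "raise"), (3, "clear"), (6, "raise")]
print(fault_state(history, now=7, flap_interval=10))
```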
14. SNMP
All SNMP versions are supported: v1, v2c, and v3.
The username and password are configurable on the device for SNMP version 3.
All SNMP transactions use the cluster IP address as the source IP address.
Admin tab -> Communication Management -> Communication Services -> SNMP
16. Firmware
UCSM, IOM, and fabric interconnect upgrade
The following steps are done under Equipment -> Firmware Management -> Update/Activate Firmware.
Activate the new Cisco UCS Manager image.
Activate the new I/O module image.
Activate the new image on the subordinate fabric interconnect.
Manually fail over the primary role to the fabric interconnect that has already been upgraded.
This step is done from the command line using the following command:
UCS-A (local-mgmt) # cluster {force primary | lead {a | b}}
Verify that the data path has been restored.
Activate the new image on the original primary fabric interconnect.
Note: During a fabric interconnect upgrade each blade loses one path, but the other path remains available, so fabric failover from UCS and/or VMware NIC teaming should keep traffic flowing.
Activating the IOM image does not reboot the IOM. The IOM reboots and upgrades when its connected fabric interconnect reboots and is upgraded.
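The ordering of the upgrade matters, so it can help to write the plan out for a given cluster before starting. The sketch below only generates a checklist in the order given on this slide; the `upgrade_plan` function and the step wording are hypothetical, and the only command used is the `cluster lead {a | b}` form shown above.

```python
def upgrade_plan(primary: str) -> list:
    """Build the ordered checklist for a cluster whose current primary
    fabric interconnect is "a" or "b"."""
    if primary not in ("a", "b"):
        raise ValueError("primary must be 'a' or 'b'")
    subordinate = "b" if primary == "a" else "a"
    return [
        "Activate the new Cisco UCS Manager image",
        "Activate the new I/O module image",
        f"Activate the new image on fabric interconnect {subordinate} (subordinate)",
        f"Fail over the management lead from local-mgmt: cluster lead {subordinate}",
        "Verify that the data path has been restored",
        f"Activate the new image on fabric interconnect {primary} (original primary)",
    ]


for number, step in enumerate(upgrade_plan("a"), start=1):
    print(f"{number}. {step}")
```

The subordinate is always upgraded first so that the management lead only moves to a node that is already running the new image.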
17. Firmware
Host firmware packages:
A grouping of adapter, BIOS, board controller, and storage controller firmware into a single entity that can then be used in a service profile.
Management firmware packages:
A set of CIMC images for different kinds of blades.
When either package is applied to a service profile that is already associated, it triggers a maintenance task.
When the firmware updates are applied depends on how that task is scheduled.