HP-UX 11i LVM Mirroring Features and Multi-threads by Dusan Baljevic
Dusan Baljevic
dusan.baljevic@ieee.org
a) LVM uses two methods to recover mirror consistency:
MWC (Mirror Write Cache)
MCR (Mirror Consistency Record)
MCR and MWC are methods of keeping mirrors in sync and tracking writes to disk.
MCR is kept on the disks, in the Volume Group Reserved Area (VGRA).
MWC is kept in core memory.
MWC/MCR runs permanently, with the MWC in memory communicating with the MCR on
disk.
This can affect performance, but it is used because it allows quick recovery from a crash.
b) The purpose of the mirror write consistency cache (MWC) is to provide a list of possibly out of
sync mirrored areas. When a volume group is activated, the LVM copies all areas with an entry
in the MWC from one of the good copies to all the other copies. This ensures that the mirrors are
consistent, but makes no claims about the quality of the data.
c) On each write request to a mirrored logical volume that requests MWC, the LVM checks to
see if there is already an entry for the data area in the current MWC. If so, it just sends the write
to the underlying device driver. If there isn't an entry, it gets one and then waits for the
updated MWC to be written to disk.
So, each write to one of these logical volumes can introduce one extra serial disk
access. Whether or not this occurs depends on how random the accesses are.
The more random the accesses, the higher the probability of missing the MWC!
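The effect of the access pattern on MWC misses can be illustrated with a toy model. The sketch below is not LVM's actual algorithm: it assumes a hypothetical cache with a fixed number of entries that are reused first-in-first-out, and the function name mwc_misses is made up for this illustration. It counts how many writes in a sequence of region numbers would need an extra MWC update.

```shell
# Toy model only: count writes that miss a fixed-size cache of region numbers.
# Assumptions (not taken from LVM): FIFO reuse of entries, one region per write.
# usage: mwc_misses CACHE_SIZE REGION...
mwc_misses() {
    size=$1
    shift
    printf '%s\n' "$@" | awk -v size="$size" '
        NF == 0 { next }                    # ignore empty input
        {
            if ($1 in cached) next          # hit: no extra MWC write
            misses++                        # miss: MWC must be updated first
            slot = (misses - 1) % size      # FIFO: reuse the oldest slot
            if (slot in owner) delete cached[owner[slot]]
            owner[slot] = $1
            cached[$1] = 1
        }
        END { print misses + 0 }'
}

mwc_misses 2 1 1 1 2 2 1      # clustered writes: prints 2
mwc_misses 2 1 5 9 1 5 9      # scattered writes: prints 6
```

With the same cache size and the same number of writes, the scattered sequence misses on every write, while the clustered one misses only twice.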
d) Getting an MWC entry can involve waiting for one to become available. If all the MWC entries are
currently being used by I/O in progress, a given request might have to wait in a queue of
requests until an entry becomes available.
Notice that the MWC entry is never freed on disk when a request returns to the LVM; it is merely
marked as available to be used by another outgoing request.
e) Whether or not you use the MWC will depend on which aspect of system performance is
more important to your environment:
run-time or
recovery-time
You can disable MWC to improve run-time performance. The entire data space will then be
resynchronized after a crash. This may be done when a database is doing transaction logging for itself.
f) You can disable both MCR and MWC only if the application can maintain mirror consistency
itself (for example, a database)! Mirrors will not be resynchronized by LVM after a crash.
MWC disabled gives better I/O performance.
If MCR is also disabled, the mirrors will not be synchronized at reboot. It is up to you to decide
whether you want these features in use or not.
With only MCR enabled (MCR is enabled by default), the LVM will not keep run-time records of modified
extents as MWC does, but in the event of a crash (followed by reboot and re-activation), the
LVM will copy all extents from one non-stale copy of the mirror to all other mirrored copies of
that extent. This is similar to the synchronization strategy used by DataPair/UX. The "good" copy
of the data is chosen arbitrarily from the non-stale extents, as there is no record kept as to which
disk has the most recent copy of the data. So if a mirrored write is in progress during a crash, it
is possible that old data could be copied over new data during the mirrored recovery at
activation time. If this behavior is unacceptable, MWC should be chosen. On the other hand, the
MCR-only behavior would be preferred in situations where a database will re-write all incomplete
transactions after a crash, but relies on the file system as the underlying structure: the consistent
mirrors will allow fsck to cleanly fix the file system, after which the database can update any of
its out-of-date data files.
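The three consistency recovery policies discussed so far correspond to combinations of the lvchange(1M) "-M" and "-c" options. As a minimal sketch, the helper below only prints the command for a given policy, so that it can be reviewed before being run by hand. The function name consistency_cmd is hypothetical, and the mapping assumes the usual HP-UX meaning of the policy names reported by lvdisplay (MWC is "-M y -c y", NOMWC is "-M n -c y", NONE is "-M n -c n").

```shell
# Print (do not run) the lvchange command for a consistency recovery policy.
# consistency_cmd is a hypothetical helper name used for illustration only.
# usage: consistency_cmd MWC|NOMWC|NONE /dev/vgXX/lvolN
consistency_cmd() {
    case $1 in
        MWC)   flags="-M y -c y" ;;   # cache on, recovery on
        NOMWC) flags="-M n -c y" ;;   # cache off, full resync after a crash
        NONE)  flags="-M n -c n" ;;   # no recovery done by LVM
        *)     echo "unknown policy: $1" >&2; return 1 ;;
    esac
    echo "lvchange $flags $2"
}

consistency_cmd NOMWC /dev/vgapp/lvol1
```

Printing instead of executing is a deliberate choice here: the logical volume usually has to be closed before the setting can be changed, so the command is better run manually at the right moment.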
g) If both mirrors are enabled, I/O is redirected to another mirror if one is busy, which improves
performance. This should balance the I/O cost of MWC. The cost of disabling MWC and MCR is
a slower recovery after a crash.
h) In HP-UX 11.31, the MWC is larger than in previous releases. This leads to better
logical volume I/O performance by allowing more concurrent writes. MWC has also been
enhanced to support large I/O sizes.
i) Logical volumes belonging to shared volume groups (those activated with "vgchange -a s") of
LVM version 1.0 and 2.0 must have the consistency recovery set to NOMWC or NONE.
Versions 1.0 and 2.0 do not support MWC for logical volumes belonging to shared volume
groups. This might have changed with some patches, but I have not checked this yet.
With the September 2008 release of HP-UX 11i v3, LVM supports MWC for logical volumes
belonging to LVM version 2.1 shared volume groups. This ensures faster recovery following a
system crash.
j) Note that one cannot change MWC on an active logical volume. Here is an example for the
primary paging device (swap):
Problem: While attempting to disable the "Mirror Write Cache" and "Mirror Consistency" for
primary swap (/dev/vg00/lvol2), which was mirrored, the following error message is shown:
The command used to modify logical volumes, /sbin/lvchange, has failed.
The stderr output from the command is shown below. The logical volume has not been
modified.
lvchange: Could not change MirrorWriteCache while Logical Volume is opened or being
synchronized.
Solution: Since primary swap is activated when the system boots, even in single user mode,
the only way to successfully use lvchange on the primary swap logical volume is from LVM
maintenance mode.
To boot into LVM maintenance mode, reboot the machine and interrupt the
boot sequence.
>hpux -lm (PA-RISC)
Or
>boot -lm (IA64)
This will boot the machine into LVM maintenance mode. Use lvchange(1M) with the "-M" and "-c" options to modify the mirror write cache and consistency settings.
# lvchange -M n -c n /dev/vg00/lvol2
k) A quick check of the system's lvol configurations will show if this parameter is misconfigured.
Assuming we are interested in vg00:
# lvdisplay /dev/vg00/lvol* | more
Look (or grep) for the lines which describe each lvol's "Consistency Recovery":
Consistency Recovery MWC
Consistency Recovery NOMWC
Consistency Recovery NONE
If the "Consistency Recovery" is set to NONE for anything other than a swap device (or a raw
database volume as stated above), it will need to be changed. Note that if the lvol is not currently
mirrored, this is not an issue, and can safely be ignored until the customer wants to mirror that
lvol.
It doesn't hurt to change the parameter early, and it could prevent stumbling later if the customer
has forgotten about this problem by the time they start mirroring.
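The check described above can be sketched as a small filter over lvdisplay output. The function name find_none_lvols is hypothetical, and the sketch assumes that each lvol's "LV Name" and "Mirror copies" lines appear before its "Consistency Recovery" line. The sample input below is illustrative, not captured from a real system.

```shell
# Read lvdisplay output on stdin and print the mirrored lvols whose
# "Consistency Recovery" is NONE. find_none_lvols is a hypothetical name.
find_none_lvols() {
    awk '
        $1 == "LV" && $2 == "Name"         { lv = $3 }
        $1 == "Mirror" && $2 == "copies"   { copies = $3 }
        $1 == "Consistency" && $2 == "Recovery" {
            if ($3 == "NONE" && copies > 0) print lv
        }'
}

# On a live system: lvdisplay /dev/vg00/lvol* | find_none_lvols
# Illustrative sample input:
find_none_lvols <<'EOF'
LV Name                     /dev/vg00/lvol1
Mirror copies               1
Consistency Recovery        MWC
LV Name                     /dev/vg00/lvol2
Mirror copies               1
Consistency Recovery        NONE
LV Name                     /dev/vg00/lvol3
Mirror copies               0
Consistency Recovery        NONE
EOF
```

The output is only a list of candidates for review: swap devices and raw database volumes may legitimately be set to NONE.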
l) If we need to change the MWC for a logical volume that is already mirrored, the process is a
little bit more complex.
After determining which mirrored logical volumes need to have their consistency recovery
changed, the steps to take are: reduce the mirror to only one good copy (non-mirrored), change
the consistency recovery parameter, then recreate the mirroring configuration.
The simplest way to reduce a mirroring configuration to one without mirroring is to use "lvreduce
-m 0" to simply eliminate the mirror copies. Then use lvchange(1M) to turn on consistency
recovery, followed by lvextend(1M) to re-add the mirrors. This reduction will minimize downtime, as it can
safely be done while the system is fully operational, but it has two drawbacks:
It allows the user less control over which copy of the mirror will remain, and it may require
more reconstruction to recreate any specialized mirroring configuration such
as striped extents.
Although the logical volume can remain in use during the operation,
it would be best to avoid using the logical volume until integrity
checks can be made on the data.
Another way of getting to a non-mirrored state is to split off the mirrored copies using
lvsplit(1M).
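The reduce, change and re-extend sequence can be written down as a dry-run plan. This is a sketch only: the helper name remirror_plan is hypothetical, it assumes the target policy is MWC ("-M y -c y"), and it prints the commands rather than running them, so that physical volume arguments or special mirroring layouts can be added by hand.

```shell
# Print (do not run) the three steps for changing consistency recovery on
# an already mirrored lvol. remirror_plan is a hypothetical helper name.
# usage: remirror_plan /dev/vgXX/lvolN [COPIES]
remirror_plan() {
    lv=$1
    copies=${2:-1}                        # number of mirror copies to re-add
    echo "lvreduce -m 0 $lv"              # 1. drop to a single copy
    echo "lvchange -M y -c y $lv"         # 2. turn on MWC (assumed target)
    echo "lvextend -m $copies $lv"        # 3. re-add the mirror copies
}

remirror_plan /dev/vg01/lvol5 1
```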
m) If importing a volume group from a previous release of HP-UX, there will be a full
resynchronization because the format of the MWC changed at HP-UX 11i v3. If the volume
group contains mirrored logical volumes using MWC, LVM converts the MWC at import time. It
also performs a complete resynchronization of all mirrored logical volumes, which can take
substantial time.
n) Now, let's list some typical rules for MWC:
- Disable MWC and set MCR to "none" for the database logical volume, because the
  database logging mechanism already provides consistency recovery.
- Disable MWC and MCR on mirrored logical volumes where the data is not needed after
  a crash, such as a paging device (swap space) or other raw scratch data.
- Logical volumes containing database data or file systems with few or infrequently
  written large files (greater than 256K) must not use the MWC when runtime performance
  is more important than crash recovery time.
- Use fast disks for the most intensive applications if they use mirrored logical
  volumes.
- Ensure that all physical volumes for mirrored logical volumes are active, because
  MWC and other I/O will be redirected to another mirror if one is busy, which improves performance.
- Spread the data space across as many physical volumes as possible.
- The number of volume groups is directly related to the MWC. Since there is only
  one MWC per volume group, disk space that is used for many small random write
  requests must be kept in distinct volume groups if possible when the MWC is used.
- If possible, ensure that physical volumes in volume groups that contain mirrored
  logical volumes reside on different controllers. For example, in a system with several
  disk devices on each card and several cards on each bus converter, create volume
  groups so that all disks off of one bus converter are in one group and all the disks on the
  other are in another group (one way is via physical volume groups). This configuration
  ensures that all mirrors are created with devices accessed through different I/O paths.
- Since mirroring is typically used for the root volume group only (these days all
  other data is on SAN), it is strongly recommended not to allow any third-party
  applications or software to run in it. I go to such an extreme that I even force
  customers to use their own areas for temporary files:
1. Set the TMPDIR variable to point to some other non-boot volume.
I always encourage application admins to use their own areas for
temporary files.
Some applications look at the TMPDIR environment variable.
Others look at two other variables: try setting TEMP and TMP as well as
TMPDIR.
2. Mount the /tmp file system with the "tmplog" option in /etc/fstab.
/tmp is DESIGNED for temporary files, so it should not be abused for
other purposes.
In "tmplog" mode, the intent log is almost always delayed.
This improves performance, but recent changes may disappear if the
system crashes.
3. Have /tmp cleaned up at boot time (not really a performance issue,
but useful for maintenance, especially if the number of temporary files keeps growing).
By default I always enable it in /etc/rc.config.d/clean_tmps:
CLEAR_TMP=1
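For item 1 above, a minimal sketch of how an application start-up script might redirect its temporary files follows. The function name use_app_tmp and the path /appdata/tmp are hypothetical. The sketch sets all three variables because different applications consult different ones.

```shell
# Point the common temporary-directory variables at an application-owned
# area. use_app_tmp is a hypothetical helper name.
# usage: use_app_tmp DIRECTORY
use_app_tmp() {
    dir=$1
    # Create the directory if it does not exist yet.
    [ -d "$dir" ] || mkdir -p "$dir" || return 1
    TMPDIR=$dir TEMP=$dir TMP=$dir
    export TMPDIR TEMP TMP
}

# Example (hypothetical path), e.g. near the top of a start-up script:
#   use_app_tmp /appdata/tmp
```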
A final comment is about multi-threaded synchronization of mirrors in LVM on HP-UX.
Option 1
lvsync(1M) recognizes the following option:
-T
Perform mirror synchronization of logical volumes
within a volume group using multiple parallel threads.
Logical volumes belonging to different volume groups
will be synchronized serially. It is possible that
logical volumes start and/or complete their
synchronization in a different order than specified on
the command line.
The maximum number of threads used can be controlled
using the PTHREAD_THREADS_MAX system tunable.
NOTE: This option has no effect if the volume group is
activated in shared mode.
For example, you can extend the logical volumes without synchronizing them and then synchronize them with parallel threads:
# lvextend -m 1 -s /dev/vgapp/lvol1
# lvextend -m 1 -s /dev/vgapp/lvol2
# lvextend -m 1 -s /dev/vgapp/lvol3
# lvsync -T /dev/vgapp/lvol1 /dev/vgapp/lvol2 /dev/vgapp/lvol3
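The same pattern can be generated for any list of logical volumes. The helper below is a sketch with a hypothetical name (parallel_mirror_plan). It assumes one mirror copy per lvol and only prints the commands, leaving it to the administrator to run them.

```shell
# Print (do not run) the commands that add unsynchronized mirror copies and
# then synchronize them in parallel. parallel_mirror_plan is a hypothetical name.
# usage: parallel_mirror_plan /dev/vgXX/lvolA /dev/vgXX/lvolB ...
parallel_mirror_plan() {
    for lv in "$@"; do
        echo "lvextend -m 1 -s $lv"       # -s: add the copy without synchronizing
    done
    echo "lvsync -T $*"                   # one multi-threaded sync for all lvols
}

parallel_mirror_plan /dev/vgapp/lvol1 /dev/vgapp/lvol2
```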
Option 2
Check the fragmentation of the file system that resides on the logical volumes you need to
mirror. For example:
# fsadm -F vxfs -DEde -t 600 /mydata
… and take action if necessary.
Another piece of advice is to do it on weekends, when user activity decreases.
Note the following on HP-UX 11.31:
# getconf PTHREAD_THREADS_MAX
3002
# kctune -v max_thread_proc
Tunable             max_thread_proc
Description         Maximum number of threads in each process
Module              pm_proc
Current Value       3002
Value at Next Boot  3002
Value at Last Boot  3002
Default Value       256
Constraints         max_thread_proc >= 64
                    max_thread_proc <= nkthread
Can Change          Immediately or at Next Boot