http://www.ibm.com/support/docview.wss?uid=isg1IZ17558
On very loaded systems, this may not give the gsclvmd
process enough time to respond to the check, resulting in
the VG being forced offline during times of heavy system
load.
Customers could see, in the errpt, LVM_GS_LLEAVE
followed by LVM_SA_QUORCLOSE on the node where the VG was
forced offline, and see LVM_GS_RLEAVE on other nodes
in the cluster.
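.
As an illustration only, the error sequence described above could be
looked for with a small script. This is a minimal sketch, not an IBM
tool: it assumes the error labels have already been extracted from the
error log and are supplied oldest first, and the function name
find_forced_offline is hypothetical.

```python
def find_forced_offline(labels):
    """Return (leave_index, quorclose_index) pairs where an LVM_GS_LLEAVE
    entry is later followed by an LVM_SA_QUORCLOSE entry.

    Assumption: `labels` is a list of error labels in chronological
    order (oldest first), already extracted from the error log.
    """
    hits = []
    pending = None  # index of the most recent unmatched LVM_GS_LLEAVE
    for i, label in enumerate(labels):
        if label == "LVM_GS_LLEAVE":
            pending = i
        elif label == "LVM_SA_QUORCLOSE" and pending is not None:
            hits.append((pending, i))
            pending = None
    return hits
```

A non-empty result on a node suggests the VG may have been forced
offline there; LVM_GS_RLEAVE entries on the other nodes would be
consistent with that.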
.
.
A related issue is: when an LVM configuration or stale
partition update happens in a concurrent VG, gsclvmd must
get approval from every node before making the change.
In doing so, gsclvmd currently waits indefinitely until
all remote nodes respond in some fashion.
.
Under certain problematic conditions, this behavior is
undesirable and can cause the LVM commands to wait
indefinitely.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:
* Customers may be exposed to this problem if they have the
* bos.clvm.enh fileset at a level below 6.1.0.2.
* They must also be using Concurrent LVM, which is utilized by
* HACMP Resource Groups using Fast Disk Takeover or the ‘Online
* on All Available Nodes’ Startup Policy.
****************************************************************
* PROBLEM DESCRIPTION:
* On extremely busy clusters, or clusters experiencing poor
* network communication, the concurrent LVM daemon (gsclvmd) on
* a node may fail to respond to a responsiveness check issued by
* Group Services. In this case, we will force the Volume Group
* offline on that node to ensure there is no possibility that
* future LVM configuration changes will cause the Volume Group
* definition to become out of sync between the two nodes.
* However, forcing the VG offline could lead to unexpected
* downtime of applications using that volume group, or potential
* problems during HACMP failover.
****************************************************************
* RECOMMENDATION:
* Install APAR IZ17558.
****************************************************************
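The fileset level criterion under USERS AFFECTED can be checked
mechanically. The following is a minimal sketch, assuming the installed
level of bos.clvm.enh has already been obtained as a dotted string (for
example from lslpp -L bos.clvm.enh); the function names parse_level and
is_exposed are hypothetical.

```python
def parse_level(level):
    """Turn a dotted fileset level such as '6.1.0.2' into a tuple of ints."""
    return tuple(int(part) for part in level.split("."))


def is_exposed(installed_level, fixed_level="6.1.0.2"):
    """Return True if the installed level is below the fixed level.

    Comparing integer tuples avoids the string-comparison trap where
    '6.1.0.10' would sort before '6.1.0.2'.
    """
    return parse_level(installed_level) < parse_level(fixed_level)
```

This covers only the fileset level. A node is exposed only if it also
uses Concurrent LVM as described above.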
Problem conclusion
Both of the behaviors of gsclvmd described above will be
changed.
.
By default, we will no longer expel a node and force its
VG offline if it fails a responsiveness check.
A flag will be added to varyonvg that will allow you to
enable this behavior (expelling non-responsive nodes) if
desired.
.
Also, if a node takes longer than 5 minutes to reply to
a vote (taken before making an LVM configuration change
or stale partition update on a concurrent VG), then we
will expel that node and the VG on that node will be
forced offline. You will see LVM_GS_CFGTIME followed
by LVM_GS_LLEAVE or LVM_GS_RLEAVE in the errpt if this
happens.
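.
The 5 minute vote rule above can be pictured with a short sketch. This
is an illustrative model only, not the gsclvmd implementation: the
function name nodes_to_expel, the reply_times mapping, and the use of
None for a node that never replied are all assumptions made for the
example.

```python
VOTE_TIMEOUT_SECONDS = 5 * 60  # the 5 minute limit stated above


def nodes_to_expel(reply_times, timeout=VOTE_TIMEOUT_SECONDS):
    """Return the sorted names of nodes that exceeded the vote timeout.

    Assumption: `reply_times` maps a node name to the number of seconds
    it took to reply to the vote, or to None if it never replied.
    A node is expelled only if it took longer than the timeout, so a
    reply at exactly 300 seconds is still accepted.
    """
    return sorted(
        node
        for node, seconds in reply_times.items()
        if seconds is None or seconds > timeout
    )
```

For example, with replies of 2 seconds from nodeA, 301 seconds from
nodeB, and no reply from nodeC, the sketch selects nodeB and nodeC.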
.
.
*Note: due to the changes in the default behavior of
gsclvmd, this APAR needs to be applied to all nodes in
the cluster. If not, there may be problems if a node is
ever unresponsive to either a responsiveness check or a
vote request.
Temporary fix
*********
* HIPER *
*********
Comments
APAR information
APAR number IZ17558
Reported component name AIX 610
Reported component ID 5765G6200
Reported release 610
Status CLOSED PER
PE NoPE
HIPER YesHIPER
Submitted date 2008-03-11
Closed date 2008-03-11
Last modified date 2008-04-07
APAR is sysrouted FROM one or more of the following:
IZ13557
APAR is sysrouted TO one or more of the following:
U817458
Fix information
Fixed component name AIX 610
Fixed component ID 5765G6200
Applicable component levels
R610 PSY U817458 UP08/04/07 I 1000