Archive for April, 2008

PTF U814685 (for HMC V3 R3.3.7) readme updated.

Monday, April 28th, 2008

http://www14.software.ibm.com/webapp/set2/subscriptions/pqvcmjd?mode=18&ID=4219

Readme file updated for PTF U814685 (for HMC Version 3 Release 3.3.7)
Description:

The readme file for PTF U814685 instructed users to install PTF U809968. However, because PTF U810401 supersedes PTF U809968, the readme file has been updated.

After installing PTF U814685, PTFs U808917 and U810401 must be installed.

In addition, the PTF U810401 readme file has been updated to state that it supersedes PTF U809968.

Note: PTF U809968 has been removed from the web.
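The install rule above amounts to resolving superseded PTFs before building the install list. The sketch below is a hypothetical Python illustration, not an IBM tool. It assumes a hand-maintained SUPERSEDED_BY map taken from this readme update. The readme does not specify an order between U808917 and U810401, so the sketch keeps the order in which they are listed.

```python
# Hypothetical sketch: resolve superseded PTFs before building an install list.
# The supersession data comes from the readme update; the helper names are
# illustrative and not part of any IBM tool.

# PTF -> PTF that supersedes it
SUPERSEDED_BY = {"U809968": "U810401"}


def current_ptf(ptf: str) -> str:
    """Follow the supersession chain until reaching a PTF that is not superseded."""
    seen = set()
    while ptf in SUPERSEDED_BY:
        if ptf in seen:
            raise ValueError(f"supersession cycle at {ptf}")
        seen.add(ptf)
        ptf = SUPERSEDED_BY[ptf]
    return ptf


def install_order(base: str, followups: list[str]) -> list[str]:
    """Base PTF first, then the resolved follow-up PTFs, without duplicates."""
    order = [base]
    for ptf in followups:
        resolved = current_ptf(ptf)
        if resolved not in order:
            order.append(resolved)
    return order


if __name__ == "__main__":
    # The old readme listed U809968; it resolves to U810401 and is not repeated.
    print(install_order("U814685", ["U808917", "U809968", "U810401"]))
```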

View the U814685 Readme file for fix and enhancement information.

View the U810401 Readme file for fix and enhancement information.

Visit Hardware Management Console for all the latest updates.

PTF U814685 (for HMC V3 R3.3.7) readme updated

Monday, April 14th, 2008

http://www14.software.ibm.com/webapp/set2/subscriptions/pqvcmjd?mode=18&ID=4197

Readme file updated for PTF U814685 (for HMC Version 3 Release 3.3.7)
Description:

A problem has been identified where the readme file for PTF U814685 incorrectly listed PTFs U810401, U809968 and U808917 as being included when, in fact, they were not.

PTFs U810401, U809968 and U808917 must be installed AFTER installing PTF U814685.

View the U814685 Readme file for fix and enhancement information.

Visit Hardware Management Console for all the latest updates.

AIX 6.1 6100-00 Service Pack 4 Released

Monday, April 14th, 2008

Service Packs contain important fixes delivered between Technology Levels. 6100-00-04 is Service Pack 4 for the 6100-00 Technology Level.
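The identifier combines three fields: the release, the Technology Level, and the Service Pack. The following minimal Python sketch splits such a string. It assumes the three-field "RRRR-TL-SP" form used in this post and ignores any extra trailing fields. The ServiceLevel and parse_service_level names are illustrative.

```python
# Sketch: split an AIX service pack identifier such as "6100-00-04" into
# release, Technology Level, and Service Pack. Assumes the "RRRR-TL-SP"
# form used in this post; extra trailing fields, if present, are ignored.
from typing import NamedTuple


class ServiceLevel(NamedTuple):
    release: str            # e.g. "6100" for AIX 6.1
    technology_level: int
    service_pack: int

    @property
    def tl_name(self) -> str:
        """Technology Level name, e.g. "6100-00"."""
        return f"{self.release}-{self.technology_level:02d}"


def parse_service_level(text: str) -> ServiceLevel:
    parts = text.strip().split("-")
    if len(parts) < 3:
        raise ValueError(f"expected RRRR-TL-SP, got {text!r}")
    release, tl, sp = parts[:3]
    return ServiceLevel(release, int(tl), int(sp))


sl = parse_service_level("6100-00-04")
print(f"Service Pack {sl.service_pack} for the {sl.tl_name} Technology Level")
```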

Sockets may not get freed if an application is using the pollset APIs to poll on the sockets. This may cause a memory leak (i.e. memory usage slowly increasing) until the system becomes very sluggish or even hangs.

Tuesday, April 8th, 2008

http://www.ibm.com/support/docview.wss?uid=isg1IZ17881

High Impact/Highly Pervasive APAR

APAR Number: IZ17881


APAR status
Closed as program error.

Error description
Sockets may not get freed if an application is using the
pollset APIs to poll on the sockets. This may cause a
memory leak (i.e. memory usage slowly increasing) until
the system becomes very sluggish or even hangs. The problem
has been seen when running a DB2 v9.1 client.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:
* Users of AIX 6.1 with the bos.mp64 fileset below the level of
* 6.1.0.5.
****************************************************************
* PROBLEM DESCRIPTION:
* Sockets may not get freed if an application is using the
* pollset APIs to poll on the sockets. This may cause a
* memory leak (i.e. memory usage slowly increasing) until
* the system becomes very sluggish or even hangs. The problem
* has been seen when running a DB2 v9.1 client.
****************************************************************
* RECOMMENDATION:
* Install APAR IZ17881.
****************************************************************
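Fileset levels are dotted version.release.modification.fix numbers, so exposure checks must compare them numerically, field by field. A plain string comparison would rank 6.1.0.10 below 6.1.0.5. The following minimal Python sketch assumes you already have the installed level, for example from lslpp -L bos.mp64 output, which the sketch does not parse. The helper names are hypothetical.

```python
# Sketch: decide whether an installed fileset level falls below a fix level.
# Levels are compared numerically, field by field, never as strings.


def parse_level(level: str) -> tuple[int, ...]:
    """Turn "6.1.0.5" into (6, 1, 0, 5)."""
    return tuple(int(field) for field in level.strip().split("."))


def is_below(installed: str, fixed: str) -> bool:
    """True if `installed` is strictly lower than `fixed`."""
    return parse_level(installed) < parse_level(fixed)


# IZ17881: bos.mp64 below 6.1.0.5 is affected.
for level in ("6.1.0.4", "6.1.0.5", "6.1.0.10"):
    print(level, "affected" if is_below(level, "6.1.0.5") else "not affected")
```

The same comparison covers the other thresholds in this archive. It handles the bos.clvm.enh "below 6.1.0.2" check directly, and for "devices.fcp.disk.rte greater than 5.3.0.63" you would flip it to parse_level(installed) > parse_level("5.3.0.63").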
Problem conclusion
AIX kernel was fixed to correctly manage the socket
reference count along the pollset API paths such that the
sockets can be freed after they are closed. Freeing the
sockets ensures that the memory is returned to AIX and is
not leaked.
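The conclusion describes a reference-counting fix. The toy Python model below is only an illustration, not AIX kernel code: it shows why a socket whose count never returns to zero is never freed. Socket, poll_once and close_socket are invented for the sketch, and the "unfixed" path is an assumed simplification of the bug.

```python
# Toy model (not AIX kernel code): a socket is freed only when its reference
# count drops to zero. A poll path that takes a reference but never drops it
# keeps a closed socket alive, which is the shape of the leak in IZ17881.


class Socket:
    def __init__(self) -> None:
        self.refcount = 1       # reference held by the owning descriptor
        self.freed = False

    def hold(self) -> None:
        self.refcount += 1

    def release(self) -> None:
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True   # memory would be returned here


def poll_once(sock: Socket, fixed: bool) -> None:
    """One pass through a poll path that references the socket."""
    sock.hold()
    if fixed:
        sock.release()          # fixed path: reference dropped after use
    # unfixed path: the reference is never dropped


def close_socket(sock: Socket) -> None:
    sock.release()              # drop the descriptor's reference


for fixed in (False, True):
    s = Socket()
    poll_once(s, fixed)
    close_socket(s)
    print("fixed" if fixed else "unfixed", "freed" if s.freed else "leaked")
```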
Temporary fix
*********
* HIPER *
*********

Comments
APAR information
APAR number: IZ17881
Reported component name: AIX 610
Reported component ID: 5765G6200
Reported release: 610
Status: CLOSED PER
PE: NoPE
HIPER: YesHIPER
Submitted date: 2008-03-15
Closed date: 2008-03-15
Last modified date: 2008-04-07

APAR is sysrouted FROM one or more of the following:
IZ17873

APAR is sysrouted TO one or more of the following:

Fix information
Fixed component name: AIX 610
Fixed component ID: 5765G6200

Applicable component levels
R610 PSY U816201 UP08/04/07 I 1000

Currently, LVM checks the responsiveness of the concurrent LVM daemon (gsclvmd) on every node every 5 minutes. If a node doesn’t respond within 30 seconds, it is declared unresponsive and the VG is forced offline on that node.

Tuesday, April 8th, 2008

http://www.ibm.com/support/docview.wss?uid=isg1IZ17558

On very loaded systems, this may not give the gsclvmd
process enough time to respond to the check, resulting in
the VG being forced offline during times of heavy system
load.
Customers could see, in the errpt, LVM_GS_LLEAVE
followed by LVM_SA_QUORCLOSE on the node where the VG was
forced offline, and see LVM_GS_RLEAVE on other nodes
in the cluster.
.
.
A related issue: when an LVM configuration change or stale
partition update happens in a concurrent VG, gsclvmd must
get approval from every node before making the change.
Currently, gsclvmd waits indefinitely until all remote
nodes respond in some fashion.
.
Under certain problematic conditions, this behavior is
undesirable and can cause the LVM commands to wait
indefinitely.
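The errpt signature described above can be checked mechanically. The following hypothetical Python helper assumes the error labels have already been extracted into a list in chronological order (oldest first). If your source lists the newest entries first, reverse it before calling. Extracting the labels from errpt output is not shown.

```python
# Hypothetical helper: flag the forced-offline signature described in IZ17558.
# Input: error labels in chronological order (oldest first).


def forced_offline_locally(labels: list[str]) -> bool:
    """True if LVM_GS_LLEAVE is followed, at any later point, by LVM_SA_QUORCLOSE."""
    seen_lleave = False
    for label in labels:
        if label == "LVM_GS_LLEAVE":
            seen_lleave = True
        elif label == "LVM_SA_QUORCLOSE" and seen_lleave:
            return True
    return False


def remote_leave_seen(labels: list[str]) -> bool:
    """True if this node logged LVM_GS_RLEAVE (another node left the VG)."""
    return "LVM_GS_RLEAVE" in labels


print(forced_offline_locally(["LVM_GS_LLEAVE", "LVM_SA_QUORCLOSE"]))  # True
print(forced_offline_locally(["LVM_SA_QUORCLOSE", "LVM_GS_LLEAVE"]))  # False
```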
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:
* Customers may be exposed to this problem if they have the
* bos.clvm.enh fileset at a level below 6.1.0.2.
* They must also be using Concurrent LVM, which is utilized by
* HACMP Resource Groups using Fast Disk Takeover or the ‘Online
* on All Available Nodes’ Startup Policy.
****************************************************************

* PROBLEM DESCRIPTION:
* On extremely busy clusters, or clusters experiencing poor
* network communication, the concurrent LVM daemon (gsclvmd) on
* a node may fail to respond to a responsiveness check issued by
* Group Services. In this case, we will force the Volume Group
* offline on that node to ensure there is no possibility that
* future LVM configuration changes will cause the Volume Group
* definition to become out of sync between the two nodes.
* However, forcing the VG offline could lead to unexpected
* downtime of applications using that volume group, or potential
* problems during HACMP failover.
****************************************************************
* RECOMMENDATION:
* Install APAR IZ17558.
****************************************************************
Problem conclusion
Both of the behaviors of gsclvmd described above will be
changed.
.
By default, we will no longer expel a node and force its
VG offline if it fails a responsiveness check.
A flag will be added to varyonvg that will allow you to
enable this behavior (expelling non-responsive nodes) if
desired.
.
Also, if a node takes longer than 5 minutes to reply to
a vote (taken before making an LVM configuration change
or stale partition update on a concurrent VG), then we
will expel that node and the VG on that node will be
forced offline. You will see LVM_GS_CFGTIME followed
by LVM_GS_LLEAVE or LVM_GS_RLEAVE in the errpt if this
happens.
.
.
*Note: due to the changes in the default behavior of
gsclvmd, this APAR must be applied to all nodes in
the cluster. Otherwise, problems may occur if a node
is ever unresponsive to either a responsiveness
check or a vote request.
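The following toy Python model contrasts the old and new expulsion rules. It is not LVM code. The expel_unresponsive parameter is a hypothetical stand-in for the new varyonvg flag, which the APAR text does not name. A failed responsiveness check is modeled as a simple boolean.

```python
# Toy model of the gsclvmd policy change in IZ17558 (not LVM code).
# `expel_unresponsive` is a hypothetical stand-in for the new varyonvg flag.

VOTE_TIMEOUT_S = 5 * 60  # a vote reply slower than this expels the node


def expelled_before_fix(failed_check: bool) -> bool:
    """Old behavior: failing the responsiveness check forces the VG offline.

    Votes had no time limit, so a slow vote reply never expelled a node;
    the LVM command simply kept waiting.
    """
    return failed_check


def expelled_after_fix(failed_check: bool, vote_reply_s: float = 0.0,
                       expel_unresponsive: bool = False) -> bool:
    """New behavior: a failed check expels only when the flag enables it,
    and a vote reply taking longer than 5 minutes expels the node.
    vote_reply_s = 0.0 models a prompt reply (or no vote in progress)."""
    if failed_check and expel_unresponsive:
        return True
    return vote_reply_s > VOTE_TIMEOUT_S


print(expelled_before_fix(True))                          # True
print(expelled_after_fix(True))                           # False
print(expelled_after_fix(True, expel_unresponsive=True))  # True
print(expelled_after_fix(False, vote_reply_s=301))        # True
```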
Temporary fix
*********
* HIPER *
*********
Comments
APAR information
APAR number: IZ17558
Reported component name: AIX 610
Reported component ID: 5765G6200
Reported release: 610
Status: CLOSED PER
PE: NoPE
HIPER: YesHIPER
Submitted date: 2008-03-11
Closed date: 2008-03-11
Last modified date: 2008-04-07

APAR is sysrouted FROM one or more of the following:
IZ13557

APAR is sysrouted TO one or more of the following:
U817458

Fix information
Fixed component name: AIX 610
Fixed component ID: 5765G6200

Applicable component levels
R610 PSY U817458 UP08/04/07 I 1000

On systems with devices.fcp.disk.rte greater than 5.3.0.63, the system may crash with a large stack traceback.

Tuesday, April 8th, 2008

1) AIX 6.1: High impact/highly pervasive (2008.04.08)

http://www14.software.ibm.com/webapp/set2/subscriptions/pqvcmjd?mode=18&ID=4194#IZ17742

APAR Number: IZ17742
On systems with devices.fcp.disk.rte greater than
5.3.0.63, the system may crash with a large stack
traceback. The stack traceback may look similar to the
following:

# kdb
(2)> f
pvthread+0E5B00 STACK:
[000784D4]xmemdma64_list+000078 (0000000000000000,
F1000100CC565000, 0000000000020000, F1000100472ED608,
0000000000000001, 000000000000000E [??])
[03F31E18]d_map_list_tce+0005F0 (??, ??, ??, ??, ??)
[03FB1EC4]efc_mapdma_iocb+0005BC (??, ??)
[03FB830C]efc_start+0004E4 (??)
[03FB8EC4]efc_output+0004E0 (??, ??)
[03FFD59C]efsc_start+001050 (??, ??)
[04001878]efsc_strategy+00297C (??)
[000EEAE8]std_devstrat+000270 (??)
[000EEE94]devstrat@AF13_6+000058 (??)
[0404F000]scsidisk_start+001AE4 (??)
[04051504]scsidisk_strategy+000578 (??)
[000EEAE8]std_devstrat+000270 (??)



(2)> dr iar
iar : 00000000000784D4
.xmemdma64_list+000078 stdu stkp,FFFFFE60(stkp)
stkp=F0000000300140D0,FFFFFE60(stkp)=F000000030013F30

The system crashes while manipulating the stack pointer.
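The faulting instruction can be checked by hand. In PowerPC, stdu (Store Doubleword with Update) computes EA = stkp + displacement. It stores stkp at EA and then sets stkp to EA, which is the usual way a new stack frame is pushed. stkp is kdb's name for r1, the stack pointer. The following Python sketch assumes the displacement kdb prints as FFFFFE60 is a sign-extended negative value. It reproduces the effective address shown in the dr iar output.

```python
# Worked arithmetic for: stdu stkp,FFFFFE60(stkp)
# EA = stkp + sign-extended displacement (64-bit wraparound).

MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


stkp = 0xF0000000300140D0
disp = sign_extend(0xFFFFFE60, 32)   # -0x1A0, a 416-byte frame
ea = (stkp + disp) & MASK64

print(hex(disp), hex(ea))            # -0x1a0 0xf000000030013f30
```

The result matches kdb's FFFFFE60(stkp)=F000000030013F30. The crash therefore occurs while the instruction stores to the new frame 0x1A0 bytes below the current stack pointer.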