Operator Messages Manual

Chapter 84 SCL (ServerNet Cluster Subsystem) Messages

The messages in this chapter are generated by the HP NonStop™ ServerNet Cluster subsystem monitor process. The subsystem ID displayed by these messages includes SCL as the subsystem name.

NOTE: Negative-numbered messages are common to most subsystems. If you receive a negative-numbered message that is not described in this chapter, see Chapter 15.


1001

The ServerNet Cluster subsystem monitor process, process-name, has started in processor cpunum. Program file: filename Priority: pri Autorestart count: count Processor list: (first-processor-in-list [,next-processor-in-list, ..., last-processor-in-list])

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

cpunum

is the number of the processor in which the primary ServerNet cluster monitor process has started.

filename

is the name of the program file for the ServerNet cluster monitor process.

pri

is the priority at which the ServerNet cluster monitor process is running.

count

is the autorestart count configured for the ServerNet cluster monitor process.

first-processor-in-list ... last-processor-in-list

is the processor list configured for the ServerNet cluster monitor process.

Cause  The ServerNet cluster monitor process has been started by an operator or by the persistence manager process ($ZPM) after a failure of both the primary and backup ServerNet cluster monitor processes. (The ServerNet cluster monitor process has no means of distinguishing between the two cases.)

Effect  The ServerNet cluster monitor process is running.

Recovery  This is an informational message. No corrective action is required.



1002

The ServerNet Cluster subsystem monitor process, process-name, has terminated. Reason: reason.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

reason

is the reason for the termination. Possible values are:

100  Process stopped by operator
101  Environmental problem

Cause  The ServerNet cluster monitor process terminated voluntarily, either by an operator command or because an environmental problem caused it to self-terminate. If this event is due to self-termination, an SCL 1010 message reported the environmental problem.

Effect  The ServerNet cluster monitor process is no longer running.

Recovery  If this event is due to self-termination, follow recovery instructions for the SCL 1010 message. After correcting any environmental problems, restart the ServerNet cluster monitor process with an operator command.
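For example, if the ServerNet cluster monitor process is configured as the generic process $ZZKRN.#ZZSCL (a typical configuration; the process name on your system might differ), it can be restarted with SCF:

-> START PROCESS $ZZKRN.#ZZSCL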



1003

Process process-name: Primary processor cpunum

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

cpunum

is the number of the processor in which the primary ServerNet cluster monitor process is running.

Cause  Either the ServerNet cluster monitor process was initialized for the first time or a backup process has become the primary process.

Effect  The ServerNet cluster monitor process is running in the indicated processor.

Recovery  This is an informational message. No corrective action is required.



1004

Process process-name: backup process created in processor cpunum.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

cpunum

is the number of the processor in which the backup ServerNet cluster monitor process is running.

Cause  The ServerNet cluster monitor process successfully created a backup process.

Effect  The ServerNet cluster monitor process is no longer vulnerable to a single failure.

Recovery  This is an informational message. No corrective action is required.



1005

Process process-name: Unable to create backup in processor cpunum. Process creation error: errnum, error detail: err-detail.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

cpunum

is the number of the processor in which the backup process creation attempt was made.

errnum

is the Guardian process creation error number.

err-detail

is the error detail subcode returned with the Guardian process creation error.

Cause  An attempt to create a ServerNet cluster monitor backup process has failed. For information on process creation errors and error detail subcodes, see the Guardian Procedures Errors and Messages Manual.

Effect  Until a backup process is started, the ServerNet cluster monitor process is vulnerable to a single failure. The ServerNet cluster monitor process attempts to start a backup process immediately if any processor in its processor list, other than that used by the primary process, is running. The ServerNet cluster monitor process makes two restart attempts in each processor eligible to contain the backup process. Each failed attempt results in an SCL 1005 message. If all attempts fail, an SCL 1007 message is generated.

Recovery  This is an informational message. Although no corrective action is required, the data in this message might provide information for recovery in the event of an SCL 1007 message.



1006

Process process-name: Backup process in processor cpunum failed.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

cpunum

is the number of the processor in which the backup ServerNet cluster monitor process was running.

Cause  The backup process of the ServerNet cluster monitor process pair failed.

Effect  Until a new backup process is started, the ServerNet cluster monitor process is vulnerable to a single failure. The ServerNet cluster monitor process attempts to start a new backup process immediately if any processor in its processor list, other than that used by the primary process, is running. The ServerNet cluster monitor process makes two restart attempts in each processor eligible to contain the backup process. Each failed attempt results in an SCL 1005 event. If all of these attempts fail, an SCL 1007 event is generated.

Recovery  This is an informational message. Although no corrective action is required, the data in this message might provide information for recovery in the event of an SCL 1007 event.



1007

The ServerNet Cluster subsystem monitor process, process-name, is running without a backup. Reason: reason.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

reason

indicates why the backup process was terminated. Possible values are:

101  No processor available
102  Excess failed start attempts
103  Backup Creating Failure

Cause  Either no processor is available for running the backup process, or there have been multiple failures of the backup process or of attempts to create a backup process. If this event is due to backup process creation failures, associated SCL 1005 messages will have been generated. Other possible precursors are SCL 1006 and SCL 1008 messages.

Effect  The ServerNet cluster monitor process is running without a backup, and the ServerNet cluster subsystem is vulnerable to a single failure. Whenever a processor in the processor list is reloaded, the ServerNet cluster monitor process attempts to create a backup there.

If this event is caused by repeated backup failures or backup process creation failures, the ServerNet cluster monitor process periodically attempts to create a backup.

Recovery  Either reload the processors on the processor list for the ServerNet cluster monitor process, or use the information in the associated SCL 1005 messages to determine the cause and recovery actions for backup process creation failures.

To list the processors that are configured in the processor list for the ServerNet cluster monitor process, issue an SCF INFO command. For example:

-> INFO PROCESS $ZZKRN.#ZZSCL, DETAIL


1008

The ServerNet Cluster subsystem monitor process, process-name, backup process is terminating. Reason: reason.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

reason

indicates the reason the backup process is terminating. Possible values are:

100  Process stopped by operator
101  Backup processor is down
102  Checkpoint error

Cause  The backup processor failed, or the backup process was terminated by the primary process (for example, if the primary process found a fatal error when checkpointing to the backup).

Effect  Until a new backup process has been started, the ServerNet cluster monitor process is vulnerable to a single failure. The ServerNet cluster monitor process attempts to start a new backup process immediately if any processor in its processor list, other than that used by the primary process, is running. The ServerNet cluster monitor process makes two restart attempts in each processor eligible to contain the backup process. Each failed attempt results in an SCL 1005 message. If all of these attempts fail, an SCL 1007 message is generated.

Recovery  This is an informational message. Although no corrective action is required, the data in this message might provide information for recovery in the event of an SCL 1007 message.



1009

ServerNet Cluster subsystem/Message System Monitor trace entry trace-entry.

trace-entry

contains an image of an internal ServerNet monitor process trace entry as it is recorded in memory.

Cause  An internal trace was initiated on the ServerNet monitor process.

Effect  Trace data is dumped into the EMS log.

Recovery  This is an informational message. No corrective action is required.



1010

ServerNet Cluster subsystem monitor process, process-name, reports info.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

info

describes the environmental problem. Possible values are:

1   Wrong process name
2   Bad CPU-List
3   Wrong processor
5   Internal error
6   Nested signal
8   Power on processing error
9   Unsupported system topology
10  Bad SvNet node number
11  SvNet node number mismatch
12  Missing ServerNet Cluster License

Cause  The ServerNet cluster monitor process found one or more environmental problems. The possible environmental problems are described in more detail under “Recovery.” This message is usually followed by an SCL 1002 termination message.

Effect  This is an informational message, but for certain environmental problems, the ServerNet Cluster monitor process terminates.

Recovery  Corrective action might be required to fix one or more environmental problems:

  • If info is “Wrong process name” or “Bad CPU-List,” alter the ServerNet cluster monitor process startup parameters (generic process configuration under SCF); see the example following this list.

  • If info is “Wrong processor,” correct the ServerNet cluster monitor process startup parameters or configure and start the ServerNet cluster monitor process through SCF. This error occurs only if the ServerNet cluster monitor process was started manually from an HP Tandem Advanced Command Language (TACL) prompt on a processor that is not in its processor list.

  • If info is “Internal error,” “Nested signal,” or “Power on processing error,” the ServerNet cluster monitor process terminates and is restarted automatically. Submit the ZZSA* savefile to your service provider for analysis of this problem.

  • If info is “Unsupported system topology,” the system hardware has been incorrectly configured and the X and Y fabrics have different topologies. For example, the X fabric of a NonStop BladeSystem might be configured for connectivity to a BladeCluster Solution topology while the Y fabric of the same system is not (or vice versa). Correct the configuration so that both fabrics are configured with the same topology, and then restart SNETMON.

  • If info is “Bad SvNet node number,” the configured ServerNet node number is out of range with respect to the allowed node number range for the current configuration. An older version of ME firmware might be running on one or both fabrics. This environmental problem might also be reported if the local node number has changed but SNETMON was unable to stop the subsystem gracefully to update the node number.

  • If info is “SvNet node number mismatch,” the X and Y fabrics have been configured differently through the OSM Low-Level Link. The ServerNet Cluster subsystem is shut down, an event is generated, and SNETMON is terminated. For recovery, configure the system correctly, then restart SNETMON.

  • If info is “Missing ServerNet Cluster License,” the $ZZSCL process finds that it is running on a node that has been configured using the OSM Low-Level Link for connectivity to a BladeCluster topology, but the node does not have a BladeCluster license file on either $SYSTEM.SYSnn or $SYSTEM.SYSTEM. The $ZZSCL process pair is terminated. Contact your service provider for the correct license file.
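For the startup-parameter problems above, and assuming the monitor process is configured as the generic process $ZZKRN.#ZZSCL, an SCF sequence similar to the following sketch can be used to correct the configuration (the attribute to alter depends on the problem; CPU ALL is shown only as an illustration):

-> ABORT PROCESS $ZZKRN.#ZZSCL
-> ALTER PROCESS $ZZKRN.#ZZSCL, CPU ALL
-> START PROCESS $ZZKRN.#ZZSCL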



1011

$ZCNF Access error err, err-detail Operation: op.

err

is the error code returned from the NonStop Kernel configuration services API.

err-detail

is the error detail code returned from the NonStop Kernel configuration services API.

op

is the operation that was being performed when the error occurred. Possible values are:

op  Operation        Description
1   Record Size      The error occurred while checking the size of the configuration record.
2   Record Fetch     The error occurred while trying to fetch the record.
3   Record Insert    The error occurred while trying to insert the record.
4   Database Lock    The error occurred while trying to lock the database.
5   Database Unlock  The error occurred while trying to unlock the database.
6   Record Update    The error occurred while trying to update the record.

Cause  The ServerNet cluster monitor process encountered an error using the HP NonStop Kernel configuration services application program interface (API).

Effect  If this error occurs during process startup, the ServerNet cluster monitor process uses STARTSTATE STOPPED (the default) instead of the data stored in the private configuration record. If the error occurs later, it occurs during processing of an SCF [ALTER | START | STOP] SUBSYS command; in that case, the command fails with an error.

Recovery  Investigate the cause of the error. Restart the process or reissue the failed SCF command.



1012

DSM Trace error err, err-detail Operation: op.

err

is the error code returned by the DSM trace routine.

err-detail

is the error detail returned by the DSM trace routine.

op

is the operation that was being performed when the error was encountered. Possible values are:

op  Operation  Description
1   Init       The error occurred in the DSM_TRACE_INIT_ function call.
2   Start      The error occurred in the DSM_TRACE_NEW_ (start trace) function call.
3   Version    The error occurred in the DSM_TRACE_NEW_ (set version) function call.
5   Record     The error occurred in the DSM_TRACE_NEW_ (add record) function call.
6   Stop       The error occurred in the DSM_TRACE_NEW_ (stop trace) function call.

Cause  The ServerNet cluster monitor process encountered an error using the DSM trace routines.

Effect  Any pending trace is terminated.

Recovery  Investigate the cause of the error. Reissue the SCF TRACE command. If the problem persists, contact your service provider.



1013

ServerNet Cluster subsystem monitor process, process-name, cannot register for SANMAN notifications. Reason: reason.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

reason

is the reason that the registration attempt failed. Currently, the only reason is “Fabric Access Control struct Full” (-2).

Cause  The ServerNet cluster monitor process attempted to register for ServerNet SAN manager (SANMAN) notifications relative to external fabric connection status changes. However, no room remained in the Fabric Access Control (FAC) structure in system global memory, so the process could not be registered.

Probably all registration slots are being consumed by previously registered clients. Currently, the maximum number of clients allowed to register simultaneously is 32.

Effect  The ServerNet Cluster subsystem is placed in the STOPPED state. Direct ServerNet communication with processors on other servers, even when physically connected, is not possible.

Recovery  Stop at least one of the processes currently registered. Then use the SCF START SUBSYS command to restart the ServerNet Cluster subsystem.
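For example, assuming the default subsystem process name:

-> START SUBSYS $ZZSCL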



1014

ServerNet Cluster subsystem, process-name, cannot start. Reason: reason.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

reason

describes the ServerNet cluster monitor process's determination of the reason that the external fabric connections for this ServerNet node are not ready. Possible values are:

14  Node invalid
15  NNA not programmed

Cause  The ServerNet cluster monitor process has determined that the physical connection to the ServerNet Cluster is not ready for ServerNet connections.

Probably the ServerNet cluster monitor process was running before the ServerNet SAN manager process could communicate with the external fabrics or program the Node Numbering Agents (NNAs).

Effect  The ServerNet Cluster subsystem remains in the STOPPED state. Direct ServerNet communication with processors on other servers is not possible.

Recovery  The ServerNet SAN manager process automatically notifies the ServerNet cluster monitor process when the external fabric connections are ready for use. The ServerNet cluster monitor process then retries the Start command.



1015

InterProcessor Communication (IPC) Monitor Process, process-name, Reports cause.

process-name

is the name of the Message system monitor process ($ZIM<nn>).

cause

describes the environmental problem.

Cause  A message monitor process found an environmental problem, such as an incorrect process name (not $ZIMnn). This event is usually followed by a message monitor termination event.

Effect  This is an informational message, but for certain environmental problems, the message monitor process terminates.

Recovery  Alter the message system monitor process’s (MSGMON) startup parameters (generic configuration under SCF).



1016

Process process-name is not compatible with the current version of the Kernel message system.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL) or the message system monitor process ($ZIM<nn>).

Cause  The ServerNet cluster monitor process or the message system monitor process compared its own version with that of the NonStop Kernel message system and determined that the versions were incompatible. This event is followed by a process termination event.

Effect  The ServerNet cluster monitor process or the message system monitor process terminates.

Recovery  Ensure that the versions of the ServerNet cluster monitor process and message system monitor process (T0294) and the NonStop Kernel message system (T9050) are compatible. Check SPR requisites for T9050 and T0294.
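One way to check the installed versions is with the VPROC utility, which displays version information for a program file. The file location shown here is an assumption; your installation might differ:

TACL> VPROC $SYSTEM.SYSTEM.SNETMON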



1100

The ServerNet direct connection from processor local-cpu to processor remote-cpu in ServerNet node remote-node[,] [ system name sysname, Expand node remote-sysnum ] has become unusable.

local-cpu

is the number of the local processor for which connectivity is lost.

remote-cpu

is the number of the remote processor for which connectivity is lost.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  Both the X and Y paths from the local processor to the indicated remote processor are down.

This message is preceded by one or more IPC (NonStop Kernel Message System) 110 messages generated by the local processor. The IPC 110 messages report the causes of the failures.

If the lost connection is due to the failure of the remote processor, an SCL 1102 message is generated.

NOTE: If the ServerNet cluster monitor process receives the remote processor down information in time, the SCL 1100 message is suppressed.

Effect  All intersystem ServerNet traffic between the indicated local and remote processors is routed via the Expand-over-ServerNet line-handler process. Consequently, transmission is slower.

Recovery  This is an informational message. For recovery information, see the accompanying IPC 110 or SCL 1102 messages.



1101

The ServerNet direct connection from processor local-cpu to processor remote-cpu in ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has been restored.

local-cpu

is the number of the local processor for which connectivity is restored.

remote-cpu

is the number of the remote processor for which connectivity is restored.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  One or both of the paths between the indicated processors are restored. An IPC (NonStop Kernel Message System) 111 message with additional information is generated by the local processor.

NOTE: This event is not generated during ServerNet cluster monitor process initialization.

Effect  Direct ServerNet communication between the processors is possible.

Recovery  This is an informational message. No corrective action is required.



1102

Processor remote-cpu in ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has failed.

remote-cpu

is the number of the remote processor that failed.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  A remote processor failed. This event is logged on every other ServerNet cluster member when a processor on a node fails.

Effect  ServerNet paths between all local processors and the remote processor are taken down. The local processors suppress IPC 110 messages in this case. Possibly local processors detected path failures and logged path down events before being informed of the remote processor's failure. The particular sequence of events depends on the speed with which the ServerNet cluster monitor process is informed of the remote processor's failure and the levels of message traffic from the local system to the failed remote processor.

Recovery  Reload the remote processor.
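For example, from a TACL prompt on the system where the processor failed (assuming the failed processor is processor 3):

TACL> RELOAD 3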



1103

Processor remote-cpu in ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has been reloaded. ServerNet direct connectivity to that processor has been restored.

remote-cpu

is the number of the remote processor that is reloaded.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  A remote processor is reloaded, and its connections with this system are restored. This event is logged on every other ServerNet cluster member when a processor on a node is reloaded.

Effect  Direct ServerNet message traffic with the indicated processor can resume.

Recovery  This is an informational message. No corrective action is required.



1104

Local processor local-cpu was reloaded. ServerNet direct connectivity to remote systems has been restored.

local-cpu

is the number of the local processor that is reloaded.

Cause  A local processor is reloaded.

Effect  Local connections of the processor are restored.

Recovery  This is an informational message. No corrective action is required.



1105

Processor remote-cpu in ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has lost fabric ServerNet connectivity.

remote-cpu

is the number of the remote processor that lost its ServerNet connectivity.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

fabric

identifies the ServerNet fabric (X or Y) to which the indicated processor lost connectivity:

1  X-Fabric
2  Y-Fabric
4  Fabric-Unknown

Cause  A remote processor detected that its connection to the indicated ServerNet fabric failed. Possibly the other PMF CRU in the processor enclosure was removed. The indicated processor logs an IPC (NonStop Kernel Message System) 112 message on its local system.

Effect  ServerNet paths on the indicated fabric between all local processors and the remote processor are taken down. Direct ServerNet message traffic is still possible by using the other fabric. When the local processors down the paths to the remote processor on the indicated fabric, IPC 110 message logging is suppressed.

Possibly local processors detected path failures and logged path down events before being informed of the remote processor's fabric failure. The particular sequence of events depends on the speed with which the ServerNet cluster monitor process is informed of the remote processor's fabric failure and the levels of message traffic from the local system to the remote processor.

Recovery  This is an informational message. For recovery information, see the IPC 112 message.



1106

Processor remote-cpu in ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has regained fabric ServerNet connectivity.

remote-cpu

is the number of the remote processor whose fabric connection was restored.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

fabric

identifies the ServerNet fabric (X or Y) to which the indicated processor regained connectivity:

1  X-Fabric
2  Y-Fabric
4  Fabric-Unknown

Cause  A remote processor regained connectivity to the indicated ServerNet fabric.

Effect  Paths between the local system and the indicated remote processor on the indicated fabric are restored through automatic path recovery. The system on which the processor resides logs an IPC (NonStop Kernel Message System) 113 message.

Recovery  This is an informational message. No corrective action is required.



1107

ServerNet connectivity over the fabric to ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has been lost.

fabric

identifies the ServerNet fabric (X or Y) over which connectivity to the remote system is lost:

1  X-Fabric
2  Y-Fabric
4  Fabric-Unknown

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  All individual processor-to-processor paths over the indicated fabric from the local system to the remote system failed. These failures are documented by IPC (NonStop Kernel Message System) 110 messages generated by the individual processors. Possibly a ServerNet router or cable failed. If both fabrics are indicated through the IPC 110 messages, the remote system itself might have failed or lost power.

Effect  All communication with the remote system over the given fabric ceases.

Recovery  If only one fabric is involved, investigate the condition of the intervening ServerNet routers and cables. If both fabrics are involved, investigate the condition of the remote system itself. In either case, the associated IPC 110 messages might provide further recovery information.



1108

ServerNet direct connectivity with ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has been lost due to reason.

remote-node

is the ServerNet node number of the remote system to which ServerNet direct connectivity has been lost.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

reason

indicates why connectivity to the remote system has been lost. Possible values are:

reason                       Description
1   Cause unknown            The disconnection is the result of the failure of each individual processor-to-processor connection between the systems. In this case, the processors generate IPC 110 messages that might provide more information.
2   Cause remote, unknown    The remote system initiated the disconnection; the specific cause is unknown.
4   Remote operator command  The remote system left the ServerNet cluster because of an operator command on the remote system.
6   Power on                 Power fail/recovery on the local system.
16  Duplicate System Number  The remote system left the ServerNet cluster because it has a duplicate system number.

Cause  ServerNet direct connectivity to a remote system is lost for the reason stated in the message. The remote system failed, lost power, or has a duplicate system number. The local system might have lost power. Router or cable failures might have occurred on both ServerNet fabrics.

Effect  All connections with the remote system are shut down.

Recovery  Depending upon the reason indicated, correct the problem.

For example, bring up the failed processors or bring up/replace the failed cable or router.

If a protocol error occurred, there might be an associated HP Tandem Failure Data System (TFDS) (TFDS subsystem ID: DMP) failure capture event for the primary ServerNet cluster monitor process.

If the reason was Duplicate System Number, change the Expand node number of the newly connected node to a unique number (range 0 to 254) in the cluster. For more information about changing the system number, see the SCF Reference Manual for the Kernel Subsystem and contact your service provider.



1109

ServerNet direct connectivity with ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has been initialized [WITH WARNINGS].

remote-node

is the ServerNet node number of the remote system with which a ServerNet direct connection is initialized.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  A ServerNet connection is established with a remote system. This connection might be the initial one caused by starting ServerNet cluster services on either of the systems, or it might be the recovery of a failed connection.

Effect  Direct ServerNet message system traffic between the two systems is possible.

Recovery  This is an informational message. No corrective action is required.



1110

ServerNet Cluster subsystem configuration error on fabric fabric while performing the test. [Fabric fabric is not usable for ServerNet connectivity with external nodes.]

fabric

identifies a ServerNet fabric (X or Y) that has a configuration error.

test

is the test that the service processor (SP) was running when it found the configuration error. Possible values are:

test            Description
SP-Router       The router configuration test (for each fabric) verifies that all appropriate routers are configured to route packets out to other clusters.
SP-Node-Number  The node number test (for each fabric) verifies that all SPs are using the same ServerNet node number when assigning ServerNet IDs.

Cause  The service processor found a configuration error when checking each ServerNet fabric prior to starting ServerNet cluster services.

The tokens in the event contain details of the configuration error, including:

  • The number of the enclosure, the module, and the slot containing the CRU that was found to be not configured for inclusion in a ServerNet cluster

  • The error code returned by the service processor for the CRU that was found to be not configured for inclusion in a ServerNet cluster

Effect  If either the X or Y fabric is correctly configured, ServerNet cluster services move to the STARTING state and connections with remote systems are established on the correctly configured fabric.

If both fabrics are incorrectly configured, ServerNet cluster services remain in the STOPPED state, and no connections with remote systems are established.

Recovery  Determine the details of the configuration error and correct it.



1111

No systems were discovered for ServerNet direct connectivity.

Cause  This system is the first in the ServerNet cluster to be started, or a ServerNet connectivity failure occurred.

Effect  The system attains the STARTED state, but there is no ServerNet connectivity with other systems.

Recovery  If this system is the first in the ServerNet cluster to be started, no corrective action is required. Otherwise, repair the ServerNet connectivity problem.



1112

Registration to listen to remote ServerNet Cluster discovery packets failed.

Cause  The MS Driver in the reporting processor could not register to listen to permissive packets.

Effect  Any discovery packets sent to the reporting processor are not delivered to the MS Driver. The MS Driver in the reporting processor cannot find out about any packets received in the permissive Access Validation and Translation Table Entry (AVTTE). If there are no registered processors in the target system, the target system can neither be discovered nor can it generate any SCL 1114 events.

Recovery  Stopping one or more subsystems that register for permissive packets should allow the MS Driver to register itself. MS Driver registration occurs when a processor is loaded or reloaded, when the message monitor process is initialized, and when the ServerNet cluster monitor process is initialized. The recommended long-term solution is to request a TNet Services (T8460) SPR capable of supporting a larger number of registered listeners.



1113

Discovery of remote ServerNet node node failed due to reason. Details of failed discovery attempt: Protocol Stage: stage [ Selected target processors: disc-targs | Sender protocol version: send-prot | Target protocol version: targ-prot | Sender minimum protocol version: send-min-prot | Target minimum protocol version: targ-min-prot | Target processor number: targ-cpu | Node instantiation error: inst-err ]

node

is the target node for which discovery failed in the sender system.

reason

indicates the reason discovery failed. Possible values are:

2  response packet not received
3  ServerNet send errors
6  failure to instantiate remote node

stage

is the protocol stage at the time of discovery failure.

disc-targs

is a 16-bit binary mask representing the processors selected by the sender ServerNet cluster monitor process as discovery targets. (A decoding sketch follows the recovery table for this message.)

send-prot

is the discovery protocol version of the sender ServerNet cluster monitor process.

targ-prot

is the discovery protocol version of the target ServerNet cluster monitor process.

send-min-prot

is the minimum discovery protocol interpretation version of the sender ServerNet cluster monitor process.

targ-min-prot

is the minimum discovery protocol interpretation version of the target ServerNet cluster monitor process.

targ-cpu

is the processor number returned by the target ServerNet cluster monitor process in the discovery response packet payload.

inst-err

is the type of error encountered by the processor of the sender ServerNet cluster monitor process when instantiating the target node. Possible error types are:

1  Memory allocation
2  AVTTE allocation
3  Device installation

Cause  A discovery failure was detected by the sender system. The ServerNet cluster monitor process could not discover the target system.

Effect  A ServerNet cluster connection to the target system is not established. The systems perform periodic retries to discover each other. Once the cause of failure is corrected, discovery should proceed automatically.

Recovery  The recovery procedure depends on the value of reason:

reason                              Recovery
response packet not received        Find the matching SCL 1114 message on the target system to determine the exact cause of the discovery failure and take appropriate action.
ServerNet send errors               Determine whether the target system is down, not connected, or incorrectly connected to the ServerNet cluster. Then bring the system up, and connect or reconnect it to the ServerNet cluster as needed.
failure to instantiate remote node  This failure has several possible causes, such as a shortage of resources (memory, ServerNet AVTTEs, or ServerNet devices) in the processor of the sender ServerNet cluster monitor process.
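The disc-targs processor mask can be decoded into processor numbers with a few lines of code. The following C sketch is illustrative only; the assumed bit ordering (most significant bit corresponds to processor 0) should be verified against the event data on your system.

    /* Sketch: decode a 16-bit disc-targs processor mask from an SCL 1113
       event.  The bit ordering is an assumption for illustration. */
    #include <stdio.h>

    static void print_discovery_targets(unsigned short disc_targs)
    {
        int cpu;
        for (cpu = 0; cpu < 16; cpu++) {
            /* Assumed: the most significant bit corresponds to processor 0. */
            if (disc_targs & (1u << (15 - cpu)))
                printf("processor %d selected as discovery target\n", cpu);
        }
    }

    int main(void)
    {
        print_discovery_targets(0xC000);  /* example mask: processors 0 and 1 */
        return 0;
    }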



1114

Discovery started by remote ServerNet node node failed due to reason. Details of failed discovery attempt: Protocol Stage: stage | [ Sender protocol version: send-prot | Target protocol version: targ-prot | Sender minimum protocol version: send-min-prot | Target minimum protocol version: targ-min-prot | Sender processor number: send-cpu | Sender processor type: send-cpu | ServerNet reported sender node: reported-node | ServerNet reported sender processor: reported-cpu | Expected sender ServerNet ID: exp-snetID | Reported sender ServerNet ID: ret-snetID ]

node

is the remote ServerNet node that initiated the discovery attempt that failed.

reason

indicates the reason discovery failed. Possible values are:

1   protocol version error
4   ServerNet ID mismatch
5   duplicate node number
7   non-existing SNetMon process
8   SCL subsystem in STOPPED state
9   unsupported node number
10  bad processor number
11  bad discovery packet
12  unsupported processor type

stage

is the protocol stage at the time of discovery failure.

send-prot

is the discovery protocol version of the sender ServerNet cluster monitor process.

targ-prot

is the discovery protocol version of the target ServerNet cluster monitor process.

send-min-prot

is the minimum discovery protocol interpretation version of the sender ServerNet cluster monitor process.

targ-min-prot

is the minimum discovery protocol interpretation version of the target ServerNet cluster monitor process.

send-cpu

is the processor number of the sender ServerNet cluster monitor process, as reported in the discovery packet payload.

reported-node

is the node number reported by ServerNet.

reported-cpu

is the processor number reported by ServerNet.

exp-snetID

is the ServerNet node identification (ServerNet ID) stored by the sender ServerNet cluster monitor process for the target (node, processor) during process initialization. This ServerNet ID is calculated by the service processor in the sender system.

ret-snetID

is the actual ServerNet ID of the target processor (determined by the sender on reception of a discovery response packet).

Cause  A discovery failure was detected by the target system. The ServerNet cluster monitor process could not discover the sender system.

Effect  A ServerNet cluster connection to the target system is not established. The systems perform periodic retries to discover each other. Once the cause of failure is corrected, discovery should proceed automatically.

Recovery  The recovery procedure depends on the value of reason:

reason                                                          Recovery
protocol version error                                          Verify ServerNet cluster monitor and T9050 versions and update as needed.
ServerNet ID mismatch or duplicate node number                  Verify the node number configuration and Service Processor versions, and make the necessary changes.
non-existing SNetMon process or SCL subsystem in STOPPED state  Start the ServerNet Cluster subsystem with the SCF START SUBSYS command.
unsupported node number                                         Update the operating system.



1115

Error PQE received From ServerNet node: remote-node, processor: remote-cpu [ SNError: snerror ] X Path Error: xerror Y Path Error: yerror SIB retry count: retry-count

remote-node

is the ServerNet node number of the remote system to which ServerNet cluster monitor process (SNETMON) connectivity is lost.

remote-cpu

is the number of the processor on the remote system on which SNETMON resides.

snerror

indicates the reason ServerNet cluster connectivity failed. Possible values are:

2001  transfer fail
2006  No paths available

xerror

identifies the reason for the transfer failure on the X fabric. This internal error code is used for troubleshooting.

yerror

identifies the reason for the transfer failure on the Y fabric. This internal error code is used for troubleshooting.

retry-count

is the number of times a packet was sent before the transfer was considered failed.

Cause  The connection between the ServerNet cluster monitor process (SNETMON) on the local system and SNETMON on a remote system is lost due to a low-level communication problem.

Effect  The connection between the SNETMON process on the local node and the SNETMON process on the remote node is lost.

Recovery  This is an informational message. No corrective action is required; recovery is automatic.



1116

ServerNet connectivity over the fabric to ServerNet node remote-node[,] [ system name sysname, Expand node sysnum ] has been established.

fabric

indicates which fabric’s connectivity has been established.

remote-node

is the ServerNet node number of the remote system.

sysname

is the Expand name of the remote system.

sysnum

is the Expand system number of the remote system.

Cause  There is at least one working path on the specified fabric from the local node to the specified remote system now, when previously there were none. The path state changes are documented by IPC 111 events generated by the individual processors.

Effect  There is at least partial connectivity with the remote system over the given fabric.

Recovery  This is an informational message only. No corrective action is required.



1117

A call to Service Processor (SP) I/O library routine routine failed.

routine

identifies the SP I/O library routine. These routines are possible:

  • SP-Session-Create

  • SP-Session-Destroy

  • CRU-Handle-Get

  • Device-Handle-Get

  • Cluster-Module-Info-Get

  • Cluster-Id-Info-Get

  • Cluster-Config-Get

Cause  The ServerNet cluster monitor process (SNETMON) called an SP I/O library routine, but the call failed.

Effect  The ServerNet cluster monitor process (SNETMON) automatically retries the SP I/O library routine, possibly after destroying its previous session with the Service Processor and starting a new session.

Recovery  The ServerNet cluster monitor process (SNETMON) automatically retries the SP I/O library routine. The Service Processor might have to be initialized, or the Service Processor firmware might have to be upgraded. Contact your service provider for assistance.



1118

The SMC Driver API interface call returned error err, error detail err-detail.

[ Node Number nodeNum Fabric fabric.]

nodeNum

Identifies the ServerNet node number associated with the failed SMC Driver API interface call.

fabric

Identifies the external ServerNet fabric (X or Y).

err

is a map token containing data relative to the call and its error returns. The data includes:

Data                   Description
SMC API Error Version  The version of the data structure. The structure version is incremented whenever the structure changes.
SMC API                An enumeration of the specific interface call that returned the error.
SMC API Error          An enumeration of the possible error returns from the SMC Driver.
SMC API Error Detail   An integer code for the lower-level error response received by the SMC Driver that resulted in the driver returning an error to SNETMON. These error codes are usually ServerNet millicode codes.

Cause  SNETMON invoked an SMC Driver API interface function. The function returned a code other than SMC_RTN_OK.

Effect  SNETMON will automatically retry the SMC Driver API call.

Recovery  This event is primarily provided for support personnel. SNETMON will automatically retry the SMC Driver API call. If the error persists (as indicated by additional SCL 1118 events), use the error and error detail codes in the event to determine and correct the cause of the failure.



1119

The ServerNet Cluster subsystem monitor process, process-name, detected an error when registering with the SMC Driver. Error: err.

process-name

is the name of the ServerNet Cluster subsystem monitor process ($ZZSCL).

err

contains the reason for the failure. Possible values are:

2   SMC Bad parameter
5   Mismatch between the executing versions of SNETMON and the SMC Driver
8   SMC No Global Space Error
10  SMC Lock Memory Error

Cause  The primary SNETMON process attempted to register with the SMC Driver, but registration failed.

Effect  The primary SNETMON process will not be able to communicate directly with remote nodes via ServerNet reads and will terminate itself. The backup SNETMON process will take over and attempt to register with the SMC Driver when it becomes the new primary. The SNETMON process pair will terminate if registration with the SMC Driver does not succeed in at least one processor in SNETMON's configured CPU list.

Recovery  SNETMON will automatically attempt to register with the SMC Driver on a different processor. A ZZSA* save abend file is created when the primary SNETMON process terminates. Provide the ZZSA* save abend file to your service provider for analysis, along with the ZZSV* service log event file containing the SCL 1119 event.



1120

The ServerNet Cluster subsystem monitor process, process-name, version is not compatible with the current version of the SMC driver.

process-name

is the name of the ServerNet Cluster subsystem monitor process ($ZZSCL).

Cause  SNETMON compared its own version with that of the SMC driver it is using and determined that the versions are incompatible.

Effect  The ServerNet Cluster subsystem monitor process terminates.

Recovery  Check the SPR requisites for the SMC Driver (T2800) and SNETMON (T0294), and ensure that the versions of SNETMON and the SMC Driver are compatible.



1200

ServerNet statistics for {the local node | remote ServerNet node node } logged due to cause.

node

is the ServerNet node number of the remote system that collected the statistics data.

cause

indicates why the statistics were generated. Possible values are:

cause                              Description
sampling period expiration         The sampling timer expired. Statistics events are automatically generated at one-hour intervals.
operator statistic request         Operator request for statistics.
operator statistics reset request  Operator request to reset the statistics counters.

Cause  Statistics data is sent to the EMS service log.

Effect  Statistics counters with nonzero values are written to the EMS service log. If a request to reset the statistics counter was received, all statistics counters are reset to 0.

Recovery  This is an informational message. No corrective action is required.

Information Reported in the Node Statistics Event (1200)

Each processor keeps a set of counters for each system in the ServerNet cluster. To keep the amount of data sent to the service log to a minimum, statistics counters are present in the node statistics event only if they have nonzero values. The node statistics event contains this information:

Statistics Node Identification Information

This information is always included in the statistics event to identify the node that is the source of the statistics counters:

Heading                   Description
Statistics-Node-Number    Node number of the local (0) or remote (1-96) system that collected the statistics data
Statistics-System-Name    System name
Statistics-System-Number  Expand node number
Statistics-Reset-Time     Timestamp indicating when the statistics data counters were last reset
Statistics-Sample-Time    Timestamp indicating when the statistics data counters were collected
Statistics-Event-Reason   Reason the statistics event was generated

Messages Sent and Received Counters

The node statistics event reports the number of messages of each type sent to and received from each system, including the local one. Counts are kept separately for each system.

This information is returned in the Datalist Messages Sent and Datalist Messages Received structures:

Heading                   Description
Sequenced-Requests        Number of sequenced request messages sent or received
Sequenced-Replies         Number of sequenced reply messages sent or received
Sequenced-Abandons        Number of sequenced abandon (cancel) messages sent or received
Sequenced-PIOs            Number of sequenced processor I/O (PIO) messages sent or received
Sequenced-GLUPs           Number of sequenced global update (GLUP) messages sent or received
Unsequenced-Ack-Nacks     Number of unsequenced ACK/NACK messages sent or received
Unseq-Handshake-Requests  Number of unsequenced handshake requests sent or received
Unseq-Handshake-Replies   Number of unsequenced handshake replies sent or received

Message System Error Counters

The node statistics event reports the number of errors detected on connections with each system (including the local one):

Heading                  Description
Wack-Timeouts            Number of timeouts that occurred while the processor was waiting for acknowledgments
Sequence-Errors          Number of sequence errors
R10K-Speculative-Writes  Number of R10K speculative write errors
Unexpected-Packets       Number of unexpected packets received from a processor that was considered to be down

Path Management Counters (X Fabric and Y Fabric)

Path management counters record path events pertaining to paths that originate within the processor. Counts are kept separately for each X and Y fabric.

This information is returned in the Datalist Path Management X and Datalist Path Management Y structures:

Heading             Description
Path-Up-Events      Number of path up events
Path-Down-Events    Number of path down events
Path-Switch-Events  Number of periodic path switches away from the fabric
Fabric-Up-Events    Number of fabric up events
Fabric-Down-Events  Number of fabric down events

ServerNet Path Error Counters (X Fabric and Y Fabric)

ServerNet path error counters are maintained separately for each system, including the local one. Counts are kept separately for each X and Y fabric.

This information is returned in the Datalist TNet Errors X and Datalist TNet Errors Y structures:

Heading              Description
BTE-Timeouts         Number of block transfer engine (BTE) timeouts
Transfer-Nacks       Number of transfer negative acknowledgments (NACKs)
Barrier-Timeouts     Number of barrier timeouts
Barrier-Nacks        Number of barrier NACKs
Reception-Errors     Number of reception errors other than spurious acknowledgments (ACKs) and bad ServerNet destination ID packets
Spurious-Acks        Number of spurious ACKs received
Bad-Destination-IDs  Number of bad destination ServerNet ID packets received

Cause Register Error Counters

Cause register error counters are not traceable to connections with any particular system and are included only in node statistics events generated for the local node. Node statistics events for remote nodes do not contain cause register error counter statistics.

Heading                     Description
Xfer Sidebuffer Corrupt     Number of incoming transfer sidebuffer corruptions
Exception-Queue-Errors      Number of exception interrupt packets received while the corresponding queue was full
Write-Overflow-Errors       Number of write overflow errors
Read-Overflow-Errors        Number of read overflow errors
Queue-Full-Errors           Number of interrupt packets received while the corresponding queue was full
Link-Exception-on-X-Errors  Number of link exception errors on the X fabric
Link-Exception-on-Y-Errors  Number of link exception errors on the Y fabric

Generic Error Counters (X Fabric, Y Fabric, and Unknown Fabric)

Counters are maintained in each processor for errors that are not traceable to connections with any particular system. These generic error counters are included only in statistics events generated for local nodes. Node statistics events for remote nodes do not contain generic error counter statistics. There is a separate set of generic error counter statistics for each X, Y, and unknown fabric.

This information is returned in the Datalist Generic Errors X, Datalist Generic Errors Y, and Datalist Generic Errors U structures:

Heading                          Description
RCV-Ugly-Errors                  Number of ill-formatted packet errors received
TPB-Errors                       Number of packet errors with a “this packet bad” (TPB) symbol
CRC-Errors                       Number of packets with cyclic redundancy check (CRC) errors
TPB-CRC-Errors                   Number of packets with a CRC error and a TPB symbol
Underrun-Errors                  Number of packets with an underrun error
Underrun-TPB-Errors              Number of packets with an underrun error and a TPB symbol
Underrun-CRC-Errors              Number of packets with an underrun error and a bad CRC
Underrun-TPB-CRC-Errors          Number of packets with an underrun error, a TPB symbol, and a bad CRC
Runt-Errors                      Number of packets with a runt error
Runt-TPB-Errors                  Number of packets with a runt error and a TPB symbol
Runt-CRC-Errors                  Number of packets with a runt error and a bad CRC
Overrun-Errors                   Number of packets with an overrun error
Overrun-TPB-Errors               Number of packets with an overrun error and a TPB symbol
Overrun-CRC-Errors               Number of packets with an overrun error and a bad CRC
Overrun-TPB-CRC-Errors           Number of packets with an overrun error, a TPB symbol, and a bad CRC
Unsupported-Type-Errors          Number of packets with an unsupported transaction type error
Unsupported-Length-Errors        Number of packets with an unsupported length error
Bad-Destination-Errors           Number of packets with a bad ServerNet destination ID error
BadSrcId-Errors                  Number of packets with a bad source node ID error
Bad-Rdreqovflo-Errors            Number of packets with a bad read request overflow error
Spurious-Acknowledgment-Errors   Number of packets with spurious acknowledgment errors
Bad-Mask-Errors                  Number of packets with bad mask errors
Bad-Path-Errors                  Number of packets that failed the path bit address validation and translation (AVT) check
Bad-Source-Errors                Number of packets that failed a source node check in the AVT
Bad-Access-Errors                Number of packets that failed a permission check in the AVT
Bad-Interrupt-Errors             Number of packets that failed the interrupt AVT check
Packet-Abnormal-End              Number of badly formatted packets with RCV abnormal end errors
Nonatomic-Wrt-During-Sleep       Number of nonatomic packets received during sleep mode
Packet-Unknown-Error             Number of badly formatted packets with unknown AVT errors
Babble-Detect-Errors             Number of times an interrupt packet source generated too many interrupt packets, causing the interrupt queue to be full
Interrupt-With-No-Device-Errors  Number of interrupt packets that were posted to a subsystem that was not installed