Operator Messages Manual

Chapter 84 SCL (ServerNet Cluster Subsystem) Messages

The messages in this chapter are generated by the HP NonStop™ ServerNet Cluster subsystem monitor process. The subsystem ID displayed by these messages includes SCL as the subsystem name.

NOTE: Negative-numbered messages are common to most subsystems. If you receive a negative-numbered message that is not described in this chapter, see Chapter 15.


1001

The ServerNet Cluster subsystem monitor process, process-name, has started in processor cpunum. Program file: filename Priority: pri Autorestart count: count Processor list: (first-processor-in-list [,next-processor-in-list, ..., last-processor-in-list])

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

cpunum

is the number of the processor in which the primary ServerNet cluster monitor process has started.

filename

is the name of the program file for the ServerNet cluster monitor process.

pri

is the priority at which the ServerNet cluster monitor process is running.

count

is the autorestart count configured for the ServerNet cluster monitor process.

first-processor-in-list ... last-processor-in-list

is the processor list configured for the ServerNet cluster monitor process.

Cause  The ServerNet cluster monitor process has been started by an operator or by the persistence manager process ($ZPM) after a failure of both the primary and backup ServerNet cluster monitor processes. (The ServerNet cluster monitor process has no means of distinguishing between the two cases.)

Effect  The ServerNet cluster monitor process is running.

Recovery  This is an informational message. No corrective action is required.



1002

The ServerNet Cluster subsystem monitor process, process-name, has terminated. Reason: reason.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

reason

is the reason for the termination. Possible values are:

100  Process stopped by operator
101  Environmental problem

Cause  The ServerNet cluster monitor process terminated voluntarily, either by an operator command or because an environmental problem caused it to self-terminate. If this event is due to self-termination, an SCL 1010 message reported the environmental problem.

Effect  The ServerNet cluster monitor process is no longer running.

Recovery  If this event is due to self-termination, follow recovery instructions for the SCL 1010 message. After correcting any environmental problems, restart the ServerNet cluster monitor process with an operator command.
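For example, if the ServerNet cluster monitor process is configured as the generic process $ZZKRN.#ZZSCL (a typical configuration; the process name on your system might differ), it can be restarted with SCF:

-> START PROCESS $ZZKRN.#ZZSCL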



1003

Process process-name: Primary processor cpunum

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

cpunum

is the number of the processor in which the primary ServerNet cluster monitor process is running.

Cause  Either the ServerNet cluster monitor process was initialized for the first time or a backup process has become the primary process.

Effect  The ServerNet cluster monitor process is running in the indicated processor.

Recovery  This is an informational message. No corrective action is required.



1004

Process process-name: backup process created in processor cpunum.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

cpunum

is the number of the processor in which the backup ServerNet cluster monitor process is running.

Cause  The ServerNet cluster monitor process successfully created a backup process.

Effect  The ServerNet cluster monitor process is no longer vulnerable to a single failure.

Recovery  This is an informational message. No corrective action is required.



1005

Process process-name: Unable to create backup in processor cpunum. Process creation error: errnum, error detail: err-detail.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

cpunum

is the number of the processor in which the backup process creation attempt was made.

errnum

is the Guardian process creation error number.

err-detail

is the error detail subcode returned with the Guardian process creation error.

Cause  An attempt to create a ServerNet cluster monitor backup process has failed. For information on process creation errors and error detail subcodes, see the Guardian Procedures Errors and Messages Manual.

Effect  Until a backup process is started, the ServerNet cluster monitor process is vulnerable to a single failure. The ServerNet cluster monitor process attempts to start a backup process immediately if any processor in its processor list, other than that used by the primary process, is running. The ServerNet cluster monitor process makes two restart attempts in each processor eligible to contain the backup process. Each failed attempt results in an SCL 1005 message. If all attempts fail, an SCL 1007 message is generated.

Recovery  This is an informational message. Although no corrective action is required, the data in this message might provide information for recovery in the event of an SCL 1007 message.



1006

Process process-name: Backup process in processor cpunum failed.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

cpunum

is the number of the processor in which the backup ServerNet cluster monitor process was running.

Cause  The backup process of the ServerNet cluster monitor process pair failed.

Effect  Until a new backup process is started, the ServerNet cluster monitor process is vulnerable to a single failure. The ServerNet cluster monitor process attempts to start a new backup process immediately if any processor in its processor list, other than that used by the primary process, is running. The ServerNet cluster monitor process makes two restart attempts in each processor eligible to contain the backup process. Each failed attempt results in an SCL 1005 event. If all of these attempts fail, an SCL 1007 event is generated.

Recovery  This is an informational message. Although no corrective action is required, the data in this message might provide information for recovery in the event of an SCL 1007 event.



1007

The ServerNet Cluster subsystem monitor process, process-name, is running without a backup. Reason: reason.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

reason

indicates why the backup process was terminated. Possible values are:

101  No processor available
102  Excess failed start attempts
103  Backup Creating Failure

Cause  Either no processor is available for running the backup process, or there have been multiple failures of the backup process or of attempts to create a backup process. If this event is due to backup process creation failures, associated SCL 1005 messages will have been generated. Other possible precursors are SCL 1006 and SCL 1008 messages.

Effect  The ServerNet cluster monitor process is running without a backup, and the ServerNet cluster subsystem is vulnerable to a single failure. Whenever a processor in the processor list is reloaded, the ServerNet cluster monitor process attempts to create a backup there.

If this event is caused by repeated backup failures or backup process creation failures, the ServerNet cluster monitor process periodically attempts to create a backup.

Recovery  Either reload the processors on the processor list for the ServerNet cluster monitor process, or use the information in the associated SCL 1005 messages to determine the cause and recovery actions for backup process creation failures.

To list the processors that are configured in the processor list for the ServerNet cluster monitor process, issue an SCF INFO command. For example:

-> INFO PROCESS $ZZKRN.#ZZSCL, DETAIL


1008

The ServerNet Cluster subsystem monitor process, process-name, backup process is terminating. Reason: reason.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

reason

indicates the reason the backup process is terminating. Possible values are:

100  Process stopped by operator
101  Backup processor is down
102  Checkpoint error

Cause  The backup processor failed, or the backup process was terminated by the primary process (for example, if the primary process found a fatal error when checkpointing to the backup).

Effect  Until a new backup process has been started, the ServerNet cluster monitor process is vulnerable to a single failure. The ServerNet cluster monitor process attempts to start a new backup process immediately if any processor in its processor list, other than that used by the primary process, is running. The ServerNet cluster monitor process makes two restart attempts in each processor eligible to contain the backup process. Each failed attempt results in an SCL 1005 message. If all of these attempts fail, an SCL 1007 message is generated.

Recovery  This is an informational message. Although no corrective action is required, the data in this message might provide information for recovery in the event of an SCL 1007 message.



1009

ServerNet Cluster subsystem/Message System Monitor trace entry trace-entry.

trace-entry

contains an image of an internal ServerNet monitor process trace entry as it is recorded in memory.

Cause  An internal trace was initiated on the ServerNet monitor process.

Effect  Trace data is dumped into the EMS log.

Recovery  This is an informational message. No corrective action is required.



1010

ServerNet Cluster subsystem monitor process, process-name, reports info.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

info

describes the environmental problem. Possible values are:

1   Wrong process name
2   Bad CPU-List
3   Wrong processor
5   Internal error
6   Nested signal
8   Power on processing error
9   Unsupported system topology
10  Bad SvNet node number
11  SvNet node number mismatch
12  Missing ServerNet Cluster License

Cause  The ServerNet cluster monitor process found one or more environmental problems. The possible environmental problems are described in more detail under “Recovery.” This message is usually followed by an SCL 1002 termination message.

Effect  This is an informational message, but for certain environmental problems, the ServerNet Cluster monitor process terminates.

Recovery  Corrective action might be required to fix one or more environmental problems:

  • If info is “Wrong process name” or “Bad CPU-List,” alter the ServerNet cluster monitor process startup parameters (generic process configuration under SCF); see the example following this list.

  • If info is “Wrong processor,” correct the ServerNet cluster monitor process startup parameters or configure and start the ServerNet cluster monitor process through SCF. This error occurs only if the ServerNet cluster monitor process was started manually from an HP Tandem Advanced Command Language (TACL) prompt on a processor that is not in its processor list.

  • If info is “Internal error,” “Nested signal,” or “Power on processing error,” the ServerNet cluster monitor process terminates and is restarted automatically. Submit the ZZSA* savefile to your service provider for analysis of this problem.

  • If info is “Unsupported system topology,” the system hardware has been incorrectly configured and the X and Y fabrics have different topologies. For example, the X fabric of a NonStop BladeSystem might be configured for connectivity to a BladeCluster Solution topology while the Y fabric of the same system is not (or vice versa). Correct the configuration so that both fabrics are configured with the same topology, and then restart SNETMON.

  • If info is “Bad SvNet node number,” the configured ServerNet node number is out of range with respect to the allowed node number range for the current configuration. An older version of ME firmware might be running on one or both fabrics. This environmental problem might also be reported if the local node number has changed but SNETMON was unable to stop the subsystem gracefully to update the node number.

  • If info is “SvNet node number mismatch,” the X and Y fabrics have been configured differently through the OSM Low-Level Link. The ServerNet Cluster subsystem is shut down, an event is generated, and SNETMON is terminated. For recovery, configure the system correctly, then restart SNETMON.

  • If info is “Missing ServerNet Cluster License,” the $ZZSCL process finds that it is running on a node that has been configured using the OSM Low-Level Link for connectivity to a BladeCluster topology, but the node does not have a BladeCluster license file on either $SYSTEM.SYSnn or $SYSTEM.SYSTEM. The $ZZSCL process pair is terminated. Contact your service provider for the correct license file.
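For the startup-parameter problems above, and assuming the monitor process is configured as the generic process $ZZKRN.#ZZSCL, an SCF sequence similar to the following sketch can be used to correct the configuration (the attribute to alter depends on the problem; CPU ALL is shown only as an illustration):

-> ABORT PROCESS $ZZKRN.#ZZSCL
-> ALTER PROCESS $ZZKRN.#ZZSCL, CPU ALL
-> START PROCESS $ZZKRN.#ZZSCL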



1011

$ZCNF Access error err, err-detail Operation: op.

err

is the error code returned from the NonStop Kernel configuration services API.

err-detail

is the error detail code returned from the NonStop Kernel configuration services API.

op

is the operation that was being performed when the error occurred. Possible values are:

op  Operation        Description
1   Record Size      The error occurred while checking the size of the configuration record.
2   Record Fetch     The error occurred while trying to fetch the record.
3   Record Insert    The error occurred while trying to insert the record.
4   Database Lock    The error occurred while trying to lock the database.
5   Database Unlock  The error occurred while trying to unlock the database.
6   Record Update    The error occurred while trying to update the record.

Cause  The ServerNet cluster monitor process encountered an error using the HP NonStop Kernel configuration services application program interface (API).

Effect  If this error occurs during process startup, the ServerNet cluster monitor process uses STARTSTATE STOPPED (the default) instead of the data stored in the private configuration record. If the error occurs later, it occurs during processing of an SCF [ALTER | START | STOP] SUBSYS command; in that case, the command fails with an error.

Recovery  Investigate the cause of the error. Restart the process or reissue the failed SCF command.



1012

DSM Trace error err, err-detail Operation: op.

err

is the error code returned by the DSM trace routine.

err-detail

is the error detail returned by the DSM trace routine.

op

is the operation that was being performed when the error was encountered. Possible values are:

op  Operation  Description
1   Init       The error occurred in the DSM_TRACE_INIT_ function call.
2   Start      The error occurred in the DSM_TRACE_NEW_ (start trace) function call.
3   Version    The error occurred in the DSM_TRACE_NEW_ (set version) function call.
5   Record     The error occurred in the DSM_TRACE_NEW_ (add record) function call.
6   Stop       The error occurred in the DSM_TRACE_NEW_ (stop trace) function call.

Cause  The ServerNet cluster monitor process encountered an error using the DSM trace routines.

Effect  Any pending trace is terminated.

Recovery  Investigate the cause of the error. Reissue the SCF TRACE command. If the problem persists, contact your service provider.



1013

ServerNet Cluster subsystem monitor process, process-name, cannot register for SANMAN notifications. Reason: reason.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

reason

is the reason that the registration attempt failed. Currently, the only reason is “Fabric Access Control struct Full” (-2).

Cause  The ServerNet cluster monitor process attempted to register for ServerNet SAN manager (SANMAN) notifications relative to external fabric connection status changes. However, no room remained in the Fabric Access Control (FAC) structure in system global memory, so the process could not be registered.

Probably all registration slots are being consumed by previously registered clients. Currently, the maximum number of clients allowed to register simultaneously is 32.

Effect  The ServerNet Cluster subsystem is placed in the STOPPED state. Direct ServerNet communication with processors on other servers, even when physically connected, is not possible.

Recovery  Stop at least one of the processes currently registered. Then use the SCF START SUBSYS command to restart the ServerNet Cluster subsystem.
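For example, assuming the default subsystem process name:

-> START SUBSYS $ZZSCL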



1014

ServerNet Cluster subsystem, process-name, cannot start. Reason: reason.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL).

reason

describes the ServerNet cluster monitor process's determination of the reason that the external fabric connections for this ServerNet node are not ready. Possible values are:

14  Node invalid
15  NNA not programmed

Cause  The ServerNet cluster monitor process has determined that the physical connection to the ServerNet Cluster is not ready for ServerNet connections.

Probably the ServerNet cluster monitor process was running before the ServerNet SAN manager process could communicate with the external fabrics or program the Node Numbering Agents (NNAs).

Effect  The ServerNet Cluster subsystem remains in the STOPPED state. Direct ServerNet communication with processors on other servers is not possible.

Recovery  The ServerNet SAN manager process automatically notifies the ServerNet cluster monitor process when the external fabric connections are ready for use. The ServerNet cluster monitor process then retries the Start command.



1015

InterProcessor Communication (IPC) Monitor Process, process-name, Reports cause.

process-name

is the name of the Message system monitor process ($ZIM<nn>).

cause

describes the environmental problem.

Cause  A message monitor process found an environmental problem, such as an incorrect process name (not $ZIMnn). This event is usually followed by a message monitor termination event.

Effect  This is an informational message, but for certain environmental problems, the message monitor process terminates.

Recovery  Alter the message system monitor process’s (MSGMON) startup parameters (generic configuration under SCF).



1016

Process process-name is not compatible with the current version of the Kernel message system.

process-name

is the name of the ServerNet cluster monitor process ($ZZSCL) or the message system monitor process ($ZIM<nn>).

Cause  The ServerNet cluster monitor process or the message system monitor process compared its own version with that of the NonStop Kernel message system and determined that the versions were incompatible. This event is followed by a process termination event.

Effect  The ServerNet cluster monitor process or the message system monitor process terminates.

Recovery  Ensure that the versions of the ServerNet cluster monitor process and message system monitor process (T0294) and the NonStop Kernel message system (T9050) are compatible. Check SPR requisites for T9050 and T0294.
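One way to check the installed versions is with the VPROC utility, which displays version information for a program file. The file location shown here is an assumption; your installation might differ:

TACL> VPROC $SYSTEM.SYSTEM.SNETMON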



1100

The ServerNet direct connection from processor local-cpu to processor remote-cpu in ServerNet node remote-node[,] [ system name sysname, Expand node remote-sysnum ] has become unusable.

local-cpu

is the number of the local processor for which connectivity is lost.

remote-cpu

is the number of the remote processor for which connectivity is lost.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  Both the X and Y paths from the local processor to the indicated remote processor are down.

This message is preceded by one or more IPC (NonStop Kernel Message System) 110 messages generated by the local processor. The IPC 110 messages report the causes of the failures.

If the lost connection is due to the failure of the remote processor, an SCL 1102 message is generated.

NOTE: If the ServerNet cluster monitor process receives the remote processor down information in time, the SCL 1100 message is suppressed.

Effect  All intersystem ServerNet traffic between the indicated local and remote processors is routed via the Expand-over-ServerNet line-handler process. Consequently, transmission is slower.

Recovery  This is an informational message. For recovery information, see the accompanying IPC 110 or SCL 1102 messages.



1101

The ServerNet direct connection from processor local-cpu to processor remote-cpu in ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has been restored.

local-cpu

is the number of the local processor for which connectivity is restored.

remote-cpu

is the number of the remote processor for which connectivity is restored.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  One or both of the paths between the indicated processors are restored. An IPC (NonStop Kernel Message System) 111 message with additional information is generated by the local processor.

NOTE: This event is not generated during ServerNet cluster monitor process initialization.

Effect  Direct ServerNet communication between the processors is possible.

Recovery  This is an informational message. No corrective action is required.



1102

Processor remote-cpu in ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has failed.

remote-cpu

is the number of the remote processor that failed.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  A remote processor failed. This event is logged on every other ServerNet cluster member when a processor on a node fails.

Effect  ServerNet paths between all local processors and the remote processor are taken down. The local processors suppress IPC 110 messages in this case. Possibly local processors detected path failures and logged path down events before being informed of the remote processor's failure. The particular sequence of events depends on the speed with which the ServerNet cluster monitor process is informed of the remote processor's failure and the levels of message traffic from the local system to the failed remote processor.

Recovery  Reload the remote processor.
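For example, from a TACL prompt on the system where the processor failed (assuming the failed processor is processor 3):

TACL> RELOAD 3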



1103

Processor remote-cpu in ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has been reloaded. ServerNet direct connectivity to that processor has been restored.

remote-cpu

is the number of the remote processor that is reloaded.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  A remote processor is reloaded, and its connections with this system are restored. This event is logged on every other ServerNet cluster member when a processor on a node is reloaded.

Effect  Direct ServerNet message traffic with the indicated processor can resume.

Recovery  This is an informational message. No corrective action is required.



1104

Local processor local-cpu was reloaded. ServerNet direct connectivity to remote systems has been restored.

local-cpu

is the number of the local processor that is reloaded.

Cause  A local processor is reloaded.

Effect  Local connections of the processor are restored.

Recovery  This is an informational message. No corrective action is required.



1105

Processor remote-cpu in ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has lost fabric ServerNet connectivity.

remote-cpu

is the number of the remote processor that lost its ServerNet connectivity.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

fabric

identifies the ServerNet fabric (X or Y) to which the indicated processor lost connectivity:

1  X-Fabric
2  Y-Fabric
4  Fabric-Unknown

Cause  A remote processor detected that its connection to the indicated ServerNet fabric failed. Possibly the other PMF CRU in the processor enclosure was removed. The indicated processor logs an IPC (NonStop Kernel Message System) 112 message on its local system.

Effect  ServerNet paths on the indicated fabric between all local processors and the remote processor are taken down. Direct ServerNet message traffic is still possible by using the other fabric. When the local processors down the paths to the remote processor on the indicated fabric, IPC 110 message logging is suppressed.

Possibly local processors detected path failures and logged path down events before being informed of the remote processor's fabric failure. The particular sequence of events depends on the speed with which the ServerNet cluster monitor process is informed of the remote processor's fabric failure and the levels of message traffic from the local system to the remote processor.

Recovery  This is an informational message. For recovery information, see the IPC 112 message.



1106

Processor remote-cpu in ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has regained fabric ServerNet connectivity.

remote-cpu

is the number of the remote processor whose fabric connection was restored.

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

fabric

identifies the ServerNet fabric (X or Y) to which the indicated processor regained connectivity:

1  X-Fabric
2  Y-Fabric
4  Fabric-Unknown

Cause  A remote processor regained connectivity to the indicated ServerNet fabric.

Effect  Paths between the local system and the indicated remote processor on the indicated fabric are restored through automatic path recovery. The system on which the processor resides logs an IPC (NonStop Kernel Message System) 113 message.

Recovery  This is an informational message. No corrective action is required.



1107

ServerNet connectivity over the fabric to ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has been lost.

fabric

identifies the ServerNet fabric (X or Y) over which connectivity to the remote system is lost:

1  X-Fabric
2  Y-Fabric
4  Fabric-Unknown

remote-node

is the ServerNet node number of the remote system.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  All individual processor-to-processor paths over the indicated fabric from the local system to the remote system failed. These failures are documented by IPC (NonStop Kernel Message System) 110 messages generated by the individual processors. Possibly a ServerNet router or cable failed. If both fabrics are indicated through the IPC 110 messages, the remote system itself might have failed or lost power.

Effect  All communication with the remote system over the given fabric ceases.

Recovery  If only one fabric is involved, investigate the condition of the intervening ServerNet routers and cables. If both fabrics are involved, investigate the condition of the remote system itself. In either case, the associated IPC 110 messages might provide further recovery information.



1108

ServerNet direct connectivity with ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has been lost due to reason.

remote-node

is the ServerNet node number of the remote system to which ServerNet direct connectivity has been lost.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

reason

indicates why connectivity to the remote system has been lost. Possible values are:

reason                       Description
1   Cause unknown            The disconnection is the result of the failure of each individual processor-to-processor connection between the systems. In this case, the processors generate IPC 110 messages that might provide more information.
2   Cause remote, unknown    The remote system initiated the disconnection; the specific cause is unknown.
4   Remote operator command  The remote system left the ServerNet cluster because of an operator command on the remote system.
6   Power on                 Power fail/recovery on the local system.
16  Duplicate System Number  The remote system left the ServerNet cluster because it has a duplicate system number.

Cause  ServerNet direct connectivity to a remote system is lost for the reason stated in the message. The remote system failed, lost power, or has a duplicate system number. The local system might have lost power. Router or cable failures might have occurred on both ServerNet fabrics.

Effect  All connections with the remote system are shut down.

Recovery  Depending upon the reason indicated, correct the problem.

For example, bring up the failed processors or bring up/replace the failed cable or router.

If a protocol error occurred, there might be an associated HP Tandem Failure Data System (TFDS) (TFDS subsystem ID: DMP) failure capture event for the primary ServerNet cluster monitor process.

If the reason was Duplicate System Number, change the Expand node number of the newly connected node to a unique number (range 0 to 254) in the cluster. For more information about changing the system number, see the SCF Reference Manual for the Kernel Subsystem and contact your service provider.



1109

ServerNet direct connectivity with ServerNet node remote-node [,] [ system name sysname, Expand node remote-sysnum ] has been initialized [WITH WARNINGS].

remote-node

is the ServerNet node number of the remote system with which a ServerNet direct connection is initialized.

sysname

is the name of the remote system.

remote-sysnum

is the Expand node number of the remote system.

Cause  A ServerNet connection is established with a remote system. This connection might be the initial one caused by starting ServerNet cluster services on either of the systems, or it might be the recovery of a failed connection.

Effect  Direct ServerNet message system traffic between the two systems is possible.

Recovery  This is an informational message. No corrective action is required.



1110

ServerNet Cluster subsystem configuration error on fabric fabric while performing the test. [Fabric fabric is not usable for ServerNet connectivity with external nodes.]

fabric

identifies a ServerNet fabric (X or Y) that has a configuration error.

test

is the test that the service processor (SP) was running when it found the configuration error. Possible values are:

test            Description
SP-Router       The router configuration test (for each fabric) verifies that all appropriate routers are configured to route packets out to other clusters.
SP-Node-Number  The node number test (for each fabric) verifies that all SPs are using the same ServerNet node number when assigning ServerNet IDs.

Cause  The service processor found a configuration error when checking each ServerNet fabric prior to starting ServerNet cluster services.

The tokens in the event contain details of the configuration error, including:

  • The number of the enclosure, the module, and the slot containing the CRU that was found to be not configured for inclusion in a ServerNet cluster

  • The error code returned by the service processor for the CRU that was found to be not configured for inclusion in a ServerNet cluster

Effect  If either the X or Y fabric is correctly configured, ServerNet cluster services move to the STARTING state and connections with remote systems are established on the correctly configured fabric.

If both fabrics are incorrectly configured, ServerNet cluster services remain in the STOPPED state, and no connections with remote systems are established.

Recovery  Determine the details of the configuration error and correct it.



1111

No systems were discovered for ServerNet direct connectivity.

Cause  This system is the first in the ServerNet cluster to be started, or a ServerNet connectivity failure occurred.

Effect  The system attains the STARTED state, but there is no ServerNet connectivity with other systems.

Recovery  If this system is the first in the ServerNet cluster to be started, no corrective action is required. Otherwise, repair the ServerNet connectivity problem.



1112

Registration to listen to remote ServerNet Cluster discovery packets failed.

Cause  The MS Driver in the reporting processor could not register to listen to permissive packets.

Effect  Any discovery packets sent to the reporting processor are not delivered to the MS Driver. The MS Driver in the reporting processor cannot find out about any packets received in the permissive Access Validation and Translation Table Entry (AVTTE). If there are no registered processors in the target system, the target system can neither be discovered nor can it generate any SCL 1114 events.

Recovery  Stopping one or more subsystems that register for permissive packets should allow the MS Driver to register itself. MS Driver registration occurs when a processor is loaded or reloaded, when the message monitor process is initialized, and when the ServerNet cluster monitor process is initialized. The recommended long-term solution is to request a TNet Services (T8460) SPR capable of supporting a larger number of registered listeners.



1113

Discovery of remote ServerNet node node failed due to reason. Details of failed discovery attempt: Protocol Stage: stage [ Selected target processors: disc-targs | Sender protocol version: send-prot | Target protocol version: targ-prot | Sender minimum protocol version: send-min-prot | Target minimum protocol version: targ-min-prot | Target processor number: targ-cpu | Node instantiation error: inst-err ]

node

is the target node for which discovery failed in the sender system.

reason

indicates the reason discovery failed. Possible values are:

2  response packet not received
3  ServerNet send errors
6  failure to instantiate remote node

stage

is the protocol stage at the time of discovery failure.

disc-targs

is a 16-bit binary mask representing the processors selected by the sender ServerNet cluster monitor process as discovery targets. (A decoding sketch follows the recovery table for this message.)

send-prot

is the discovery protocol version of the sender ServerNet cluster monitor process.

targ-prot

is the discovery protocol version of the target ServerNet cluster monitor process.

send-min-prot

is the minimum discovery protocol interpretation version of the sender ServerNet cluster monitor process.

targ-min-prot

is the minimum discovery protocol interpretation version of the target ServerNet cluster monitor process.

targ-cpu

is the processor number returned by the target ServerNet cluster monitor process in the discovery response packet payload.

inst-err

is the type of error encountered by the processor of the sender ServerNet cluster monitor process when instantiating the target node. Possible error types are:

1  Memory allocation
2  AVTTE allocation
3  Device installation

Cause  A discovery failure was detected by the sender system. The ServerNet cluster monitor process could not discover the target system.

Effect  A ServerNet cluster connection to the target system is not established. The systems perform periodic retries to discover each other. Once the cause of failure is corrected, discovery should proceed automatically.

Recovery  The recovery procedure depends on the value of reason:

reason                              Recovery
response packet not received        Find the matching SCL 1114 message on the target system to determine the exact cause of the discovery failure and take appropriate action.
ServerNet send errors               Determine whether the target system is down, not connected, or incorrectly connected to the ServerNet cluster. Then bring the system up, and connect or reconnect it to the ServerNet cluster as needed.
failure to instantiate remote node  This failure has several possible causes, such as a shortage of resources (memory, ServerNet AVTTEs, or ServerNet devices) in the processor of the sender ServerNet cluster monitor process.
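The disc-targs processor mask can be decoded into processor numbers with a few lines of code. The following C sketch is illustrative only; the assumed bit ordering (most significant bit corresponds to processor 0) should be verified against the event data on your system.

    /* Sketch: decode a 16-bit disc-targs processor mask from an SCL 1113
       event.  The bit ordering is an assumption for illustration. */
    #include <stdio.h>

    static void print_discovery_targets(unsigned short disc_targs)
    {
        int cpu;
        for (cpu = 0; cpu < 16; cpu++) {
            /* Assumed: the most significant bit corresponds to processor 0. */
            if (disc_targs & (1u << (15 - cpu)))
                printf("processor %d selected as discovery target\n", cpu);
        }
    }

    int main(void)
    {
        print_discovery_targets(0xC000);  /* example mask: processors 0 and 1 */
        return 0;
    }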



1114

Discovery started by remote ServerNet node node failed due to reason. Details of failed discovery attempt: Protocol Stage: stage | [ Sender protocol version: send-prot | Target protocol version: targ-prot | Sender minimum protocol version: send-min-prot | Target minimum protocol version: targ-min-prot | Sender processor number: send-cpu | Sender processor type: send-cpu | ServerNet reported sender node: reported-node | ServerNet reported sender processor: reported-cpu | Expected sender ServerNet ID: exp-snetID | Reported sender ServerNet ID: ret-snetID ]

node

is the remote ServerNet node that initiated the discovery attempt that failed.

reason

indicates the reason discovery failed. Possible values are:

1   protocol version error
4   ServerNet ID mismatch
5   duplicate node number
7   non-existing SNetMon process
8   SCL subsystem in STOPPED state
9   unsupported node number
10  bad processor number
11  bad discovery packet
12  unsupported processor type

stage

is the protocol stage at the time of discovery failure.

send-prot

is the discovery protocol version of the sender ServerNet cluster monitor process.

targ-prot

is the discovery protocol version of the target ServerNet cluster monitor process.

send-min-prot

is the minimum discovery protocol interpretation version of the sender ServerNet cluster monitor process.

targ-min-prot

is the minimum discovery protocol interpretation version of the target ServerNet cluster monitor process.

send-cpu

is the processor number of the sender ServerNet cluster monitor process, as reported in the discovery packet payload.

reported-node

is the node number reported by ServerNet.

reported-cpu

is the processor number reported by ServerNet.

exp-snetID

is the ServerNet node identification (ServerNet ID) stored by the sender ServerNet cluster monitor process for the target (node, processor) during process initialization. This ServerNet ID is calculated by the service processor in the sender system.

ret-snetID

is the actual ServerNet ID of the target processor (determined by the sender on reception of a discovery response packet).

Cause  A discovery failure was detected by the target system. The ServerNet cluster monitor process could not discover the sender system.

Effect  A ServerNet cluster connection to the target system is not established. The systems perform periodic retries to discover each other. Once the cause of failure is corrected, discovery should proceed automatically.

Recovery  The recovery procedure depends on the value of reason:

reason                                                          Recovery
protocol version error                                          Verify ServerNet cluster monitor and T9050 versions and update as needed.
ServerNet ID mismatch or duplicate node number                  Verify the node number configuration and Service Processor versions, and make the necessary changes.
non-existing SNetMon process or SCL subsystem in STOPPED state  Start the ServerNet Cluster subsystem with the SCF START SUBSYS command.
unsupported node number                                         Update the operating system.



1115

Error PQE received From ServerNet node: remote-node, processor: remote-cpu [ SNError: snerror ] X Path Error: xerror Y Path Error: yerror SIB retry count: retry-count

remote-node

is the ServerNet node number of the remote system to which ServerNet cluster monitor process (SNETMON) connectivity is lost.

remote-cpu

is the number of the processor on the remote system on which SNETMON resides.

snerror

indicates the reason ServerNet cluster connectivity failed. Possible values are:

2001  transfer fail
2006  No paths available

xerror

identifies the reason for the transfer failure on the X fabric. This internal error code is used for troubleshooting.

yerror

identifies the reason for the transfer failure on the Y fabric. This internal error code is used for troubleshooting.

retry-count

is the number of times a packet was sent before the transfer was considered failed.

Cause  The connection between the ServerNet cluster monitor process (SNETMON) on the local system and SNETMON on a remote system is lost due to a low-level communication problem.

Effect  The connection between the SNETMON process on the local node and the SNETMON process on the remote node is lost.

Recovery  This is an informational message. No corrective action is required; recovery is automatic.



1116

ServerNet connectivity over the fabric to ServerNet node remote-node[,] [ system name sysname, Expand node sysnum ] has been established.

fabric

indicates which fabric’s connectivity has been established.

remote-node

is the ServerNet node number of the remote system.

sysname

is the Expand name of the remote system.

sysnum

is the Expand system number of the remote system.

Cause  There is at least one working path on the specified fabric from the local node to the specified remote system now, when previously there were none. The path state changes are documented by IPC 111 events generated by the individual processors.

Effect  There is at least partial connectivity with the remote system over the given fabric.

Recovery  This is an informational message only. No corrective action is required.



1117

A call to Service Processor (SP) I/O library routine routine failed.

routine

identifies the SP I/O library routine. These routines are possible:

  • SP-Session-Create

  • SP-Session-Destroy

  • CRU-Handle-Get

  • Device-Handle-Get

  • Cluster-Module-Info-Get

  • Cluster-Id-Info-Get

  • Cluster-Config-Get

Cause  The ServerNet cluster monitor process (SNETMON) called an SP I/O library routine, but the call failed.

Effect  The ServerNet cluster monitor process (SNETMON) automatically retries the SP I/O library routine, possibly after destroying its previous session with the Service Processor and starting a new session.

Recovery  The ServerNet cluster monitor process (SNETMON) automatically retries the SP I/O library routine. The Service Processor might have to be initialized, or the Service Processor firmware might have to be upgraded. Contact your service provider for assistance.



1118

The SMC Driver API interface call returned error err, error detail err-detail.

[ Node Number nodeNum Fabric fabric.]

nodeNum

Identifies the ServerNet node number associated with the failed SMC Driver API interface call.

fabric

Identifies the external ServerNet fabric (X or Y).

err

is a map token containing data relative to the call and its error returns. The data includes:

Data                   Description
SMC API Error Version  The version of the data structure. The structure version is incremented whenever the structure changes.
SMC API                An enumeration of the specific interface call that returned the error.
SMC API Error          An enumeration of the possible error returns from the SMC Driver.
SMC API Error Detail   An integer code for the lower-level error response received by the SMC Driver that resulted in the driver returning an error to SNETMON. These error codes are usually ServerNet millicode codes.

Cause  SNETMON invoked an SMC Driver API interface function. The function returned a code other than SMC_RTN_OK.

Effect  SNETMON will automatically retry the SMC Driver API call.

Recovery  This event is primarily provided for support personnel. SNETMON will automatically retry the SMC Driver API call. If the error persists (as indicated by additional SCL 1118 events), use the error and error detail codes in the event to determine and correct the cause of the failure.



1119

The ServerNet Cluster subsystem monitor process, process-name, detected an error when registering with the SMC Driver. Error: err.

process-name

is the name of the ServerNet Cluster subsystem monitor process ($ZZSCL).

err

contains the reason for the failure. Possible values are:

2   SMC Bad parameter
5   Mismatch between the executing versions of SNETMON and the SMC Driver
8   SMC No Global Space Error
10  SMC Lock Memory Error

Cause  The primary SNETMON process attempted to register with the SMC Driver, but registration failed.

Effect  The primary SNETMON process will not be able to communicate directly with remote nodes via ServerNet reads and will terminate itself. The backup SNETMON process will take over and attempt to register with the SMC Driver when it becomes the new primary. The SNETMON process pair will terminate if registration with the SMC Driver does not succeed in at least one processor in SNETMON's configured CPU list.

Recovery  SNETMON will automatically attempt to register with the SMC Driver on a different processor. A ZZSA* save abend file is created when the primary SNETMON process terminates. Provide the ZZSA* save abend file to your service provider for analysis, along with the ZZSV* service log event file containing the SCL 1119 event.



1120

The ServerNet Cluster subsystem monitor process, process-name, version is not compatible with the current version of the SMC driver.

process-name

is the name of the ServerNet Cluster subsystem monitor process ($ZZSCL).

Cause  SNETMON compared its own version with that of the SMC driver it is using and determined that the versions are incompatible.

Effect  The ServerNet Cluster subsystem monitor process terminates.

Recovery  Check the SPR requisites for the SMC Driver (T2800) and SNETMON (T0294), and ensure that the versions of SNETMON and the SMC Driver are compatible.



1200

ServerNet statistics for {the local node | remote ServerNet node node } logged due to cause.

node

is the ServerNet node number of the remote system that collected the statistics data.

cause

indicates why the statistics were generated. Possible values are:

cause                              Description
sampling period expiration         The sampling timer expired. Statistics events are automatically generated at one-hour intervals.
operator statistic request         Operator request for statistics.
operator statistics reset request  Operator request to reset the statistics counters.

Cause  Statistics data is sent to the EMS service log.

Effect  Statistics counters with nonzero values are written to the EMS service log. If a request to reset the statistics counter was received, all statistics counters are reset to 0.

Recovery  This is an informational message. No corrective action is required.

Information Reported in the Node Statistics Event (1200)

Each processor keeps a set of counters for each system in the ServerNet cluster. To keep the amount of data sent to the service log to a minimum, statistics counters are present in the node statistics event only if they have nonzero values. The node statistics event contains this information:

Statistics Node Identification Information

This information is always included in the statistics event to identify the node that is the source of the statistics counters:

Heading                   Description
Statistics-Node-Number    Node number of the local (0) or remote (1-96) system that collected the statistics data
Statistics-System-Name    System name
Statistics-System-Number  Expand node number
Statistics-Reset-Time     Timestamp indicating when the statistics data counters were last reset
Statistics-Sample-Time    Timestamp indicating when the statistics data counters were collected
Statistics-Event-Reason   Reason the statistics event was generated

Messages Sent and Received Counters

The node statistics event reports the number of messages of each type sent to and received from each system, including the local one. Counts are kept separately for each system.

This information is returned in the Datalist Messages Sent and Datalist Messages Received structures:

Heading                   Description
Sequenced-Requests        Number of sequenced request messages sent or received
Sequenced-Replies         Number of sequenced reply messages sent or received
Sequenced-Abandons        Number of sequenced abandon (cancel) messages sent or received
Sequenced-PIOs            Number of sequenced processor I/O (PIO) messages sent or received
Sequenced-GLUPs           Number of sequenced global update (GLUP) messages sent or received
Unsequenced-Ack-Nacks     Number of unsequenced ACK/NACK messages sent or received
Unseq-Handshake-Requests  Number of unsequenced handshake requests sent or received
Unseq-Handshake-Replies   Number of unsequenced handshake replies sent or received

Message System Error Counters

The node statistics event reports the number of errors detected on connections with each system (including the local one):

Heading                  Description
Wack-Timeouts            Number of timeouts that occurred while the processor was waiting for acknowledgments
Sequence-Errors          Number of sequence errors
R10K-Speculative-Writes  Number of R10K speculative write errors
Unexpected-Packets       Number of unexpected packets received from a processor that was considered to be down

Path Management Counters (X Fabric and Y Fabric)

Path management counters record path events pertaining to paths that originate within the processor. Counts are kept separately for each X and Y fabric.

This information is returned in the Datalist Path Management X and Datalist Path Management Y structures:

Heading             Description
Path-Up-Events      Number of path up events
Path-Down-Events    Number of path down events
Path-Switch-Events  Number of periodic path switches away from the fabric
Fabric-Up-Events    Number of fabric up events
Fabric-Down-Events  Number of fabric down events

ServerNet Path Error Counters (X Fabric and Y Fabric)

ServerNet path error counters are maintained separately for each system, including the local one. Counts are kept separately for each X and Y fabric.

This information is returned in the Datalist TNet Errors X and Datalist TNet Errors Y structures:

Heading              Description
BTE-Timeouts         Number of block transfer engine (BTE) timeouts
Transfer-Nacks       Number of transfer negative acknowledgments (NACKs)
Barrier-Timeouts     Number of barrier timeouts
Barrier-Nacks        Number of barrier NACKs
Reception-Errors     Number of reception errors other than spurious acknowledgments (ACKs) and bad ServerNet destination ID packets
Spurious-Acks        Number of spurious ACKs received
Bad-Destination-IDs  Number of bad destination ServerNet ID packets received

Cause Register Error Counters

Cause register error counters are not traceable to connections with any particular system and are included only in node statistics events generated for the local node. Node statistics events for remote nodes do not contain cause register error counter statistics.

Heading                     Description
Xfer Sidebuffer Corrupt     Number of incoming transfer sidebuffer corruptions
Exception-Queue-Errors      Number of exception interrupt packets received while the corresponding queue was full
Write-Overflow-Errors       Number of write overflow errors
Read-Overflow-Errors        Number of read overflow errors
Queue-Full-Errors           Number of interrupt packets received while the corresponding queue was full
Link-Exception-on-X-Errors  Number of link exception errors on the X fabric
Link-Exception-on-Y-Errors  Number of link exception errors on the Y fabric

Generic Error Counters (X Fabric, Y Fabric, and Unknown Fabric)

Counters are maintained in each processor for errors that are not traceable to connections with any particular system. These generic error counters are included only in statistics events generated for local nodes. Node statistics events for remote nodes do not contain generic error counter statistics. There is a separate set of generic error counter statistics for each X, Y, and unknown fabric.

This information is returned in the Datalist Generic Errors X, Datalist Generic Errors Y, and Datalist Generic Errors U structures:

Heading                          Description
RCV-Ugly-Errors                  Number of ill-formatted packet errors received
TPB-Errors                       Number of packet errors with a “this packet bad” (TPB) symbol
CRC-Errors                       Number of packets with cyclic redundancy check (CRC) errors
TPB-CRC-Errors                   Number of packets with a CRC error and a TPB symbol
Underrun-Errors                  Number of packets with an underrun error
Underrun-TPB-Errors              Number of packets with an underrun error and a TPB symbol
Underrun-CRC-Errors              Number of packets with an underrun error and a bad CRC
Underrun-TPB-CRC-Errors          Number of packets with an underrun error, a TPB symbol, and a bad CRC
Runt-Errors                      Number of packets with a runt error
Runt-TPB-Errors                  Number of packets with a runt error and a TPB symbol
Runt-CRC-Errors                  Number of packets with a runt error and a bad CRC
Overrun-Errors                   Number of packets with an overrun error
Overrun-TPB-Errors               Number of packets with an overrun error and a TPB symbol
Overrun-CRC-Errors               Number of packets with an overrun error and a bad CRC
Overrun-TPB-CRC-Errors           Number of packets with an overrun error, a TPB symbol, and a bad CRC
Unsupported-Type-Errors          Number of packets with an unsupported transaction type error
Unsupported-Length-Errors        Number of packets with an unsupported length error
Bad-Destination-Errors           Number of packets with a bad ServerNet destination ID error
BadSrcId-Errors                  Number of packets with a bad source node ID error
Bad-Rdreqovflo-Errors            Number of packets with a bad read request overflow error
Spurious-Acknowledgment-Errors   Number of packets with spurious acknowledgment errors
Bad-Mask-Errors                  Number of packets with bad mask errors
Bad-Path-Errors                  Number of packets that failed the path bit address validation and translation (AVT) check
Bad-Source-Errors                Number of packets that failed a source node check in the AVT
Bad-Access-Errors                Number of packets that failed a permission check in the AVT
Bad-Interrupt-Errors             Number of packets that failed the interrupt AVT check
Packet-Abnormal-End              Number of badly formatted packets with RCV abnormal end errors
Nonatomic-Wrt-During-Sleep       Number of nonatomic packets received during sleep mode
Packet-Unknown-Error             Number of badly formatted packets with unknown AVT errors
Babble-Detect-Errors             Number of times an interrupt packet source generated too many interrupt packets, causing the interrupt queue to be full
Interrupt-With-No-Device-Errors  Number of interrupt packets that were posted to a subsystem that was not installed