Chapter 45 IPC (NonStop Kernel Operating System Message System) Messages

The messages in this chapter are generated by the NonStop™ Kernel operating system Message System (IPC) subsystem. The subsystem ID displayed by these messages includes IPC as the subsystem name.




	NOTE: Negative-numbered messages are common to most subsystems. If you receive a negative-numbered message that is not described in this chapter, see Chapter 15.

100

Sequence error packets received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Expected sequence number: number-expected and the received sequence number: number-received

`path`	is the path that contains the fabric over which the message was transmitted.
`reporting-cpu`	is the processor number of the processor on which the error was detected.
`problem-cpu`	is the processor in which the message originated.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`number-expected`	is the sequence number that the processor expected to receive.
`number-received`	is the sequence number contained in the message received.

Cause A ServerNet message system interrupt packet was received twice or was dropped.

Effect The out-of-sequence message will not be acknowledged. The message sender will eventually get a WACK timeout and will undergo a recovery protocol to resynchronize the sequence number. The message will be retried.

Recovery This is an informational message only; no corrective action is needed.
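The receive-side check behind this message can be pictured with a small model. This is only an illustrative sketch, not the NonStop Kernel implementation: the function name `classify_packet`, the wraparound `modulus`, and the "duplicate"/"gap" labels are assumptions introduced here to mirror the two causes listed above (a packet received twice, or a packet dropped).

```python
def classify_packet(expected: int, received: int, modulus: int = 2**16) -> str:
    """Compare a received sequence number with the expected one.

    The modulus is a hypothetical wraparound value chosen for illustration.
    """
    delta = (received - expected) % modulus
    if delta == 0:
        return "in-sequence"
    if delta > modulus // 2:
        # Received number is behind the expected one: likely a repeated packet.
        return "duplicate"
    # Received number is ahead of the expected one: an earlier packet was likely dropped.
    return "gap"
```

In this model, any result other than "in-sequence" corresponds to the case where the message is not acknowledged and the sender later resynchronizes and retries.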

101

Bad ServerNet packet received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Details of the bad packet are: status of packet: packet-status, ServerNet transaction type: tr-type, ServerNet address used: snet-addr, bytecount in packet: byte-count

`path`	is the path that contains the fabric over which the message was transmitted.
`reporting-cpu`	is the processor number of the processor on which the error was detected.
`problem-cpu`	is the processor in which the message originated.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`packet-status`	is the ServerNet error code contained in the packet.
`tr-type`	is the type of ServerNet transaction being attempted.
`snet-addr`	is the target ServerNet address in the packet.
`byte-count`	is the length of the packet received, in bytes.

Cause This is probably caused by a protocol error on the part of the NonStop Kernel message system software.

Effect The packet received is ignored and the problem processor is responsible for initiating error recovery.

Recovery This is an informational message only; no corrective action is needed.

102

ServerNet nack received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Details of the nack are: Nack Status code: nack-code, sibs used for the transfer: sib-type, ServerNet address used for transfer: snet-addr, bytecount for the transfer: byte-count

`path`	is the path that contains the fabric over which the message was transmitted.
`reporting-cpu`	is the processor number of the processor on which the error was detected.
`problem-cpu`	is the processor in which the message originated.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`nack-code`	is the ServerNet NACK code contained in the packet.
`sib-type`	is the type of Send Info Block (SIB, a NonStop Kernel message system data structure) that was used by the NonStop Kernel sending logic to send the packet that was rejected in this event.
`snet-addr`	is the target ServerNet address in the packet.
`byte-count`	is the length of the packet received, in bytes.

Cause The problem processor might be down or recovering from a power failure. Also, this event might be caused by a protocol error on the part of the NonStop Kernel message-system.

Effect If the error is due to a power-fail recovery or the NACK status code indicates an interrupt to a full queue, the packet will eventually be resent. Otherwise, both paths will be downed and communication to the problem processor will be severed. If the problem and reporting processors are in the same system, regroup will be invoked, which will halt one of the processors.

Recovery This is an informational message only; no corrective action is needed unless a processor halts. If a processor does halt:

Take a dump of the halted processor.
Take an online dump of the problem or reporting processor.
RELOAD the halted processor.
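The Effect text for message 102 amounts to a small decision table, which the following sketch restates. It is a reading aid under stated assumptions, not kernel code: the enum `NackEffect`, the function `nack_effect`, and its three boolean inputs are hypothetical names for the conditions the text describes.

```python
from enum import Enum


class NackEffect(Enum):
    RESEND_PACKET = "packet is eventually resent"
    SEVER_COMMUNICATION = "paths downed, communication severed"
    SEVER_AND_REGROUP = "paths downed, regroup invoked (one processor halts)"


def nack_effect(power_fail_recovery: bool,
                interrupt_to_full_queue: bool,
                same_system: bool) -> NackEffect:
    """Map the conditions described for message 102 to the documented effect."""
    if power_fail_recovery or interrupt_to_full_queue:
        return NackEffect.RESEND_PACKET
    if same_system:
        return NackEffect.SEVER_AND_REGROUP
    return NackEffect.SEVER_COMMUNICATION
```

Only the `SEVER_AND_REGROUP` outcome leads to the dump and RELOAD steps listed under Recovery.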

103

Bad destination ServerNet packet received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Details of the bad packets are: ServerNet transaction type: tr-type, ServerNet address used: snet-addr, bytecount in the packet: byte-count, Intented destination node: dest-node

`path`	is the path that contains the fabric over which the message was transmitted.
`reporting-cpu`	is the processor number of the processor on which the error was detected.
`problem-cpu`	is the processor in which the message originated.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`tr-type`	is the type of ServerNet transaction being attempted.
`snet-addr`	is the target ServerNet address in the packet.
`byte-count`	is the length of the packet received, in bytes.
`dest-node`	is the ServerNet address of the intended destination processor.

Cause An error might have occurred in the routing tables of the ServerNet network.

Effect The reporting processor ignores the packet.

If the ServerNet routing tables are not reliable, then it is possible that the reporting and problem processors will not be able to communicate. The NonStop Kernel message system will detect the problem and, if both the processors are in the same system, cause one of the involved processors to halt.

Recovery This is an informational message only; no corrective action is needed unless a processor halts. If a processor does halt:

Take a dump of the halted processor.
Take an online dump of the problem or reporting processor.
RELOAD the halted processor.

104

Unexpected packet received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}.

`path`	is the path that contains the fabric over which the message was transmitted.
`reporting-cpu`	is the processor number of the processor on which the error was detected.
`problem-cpu`	is the processor in which the message originated.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

Cause The problem processor was sending “I'm alive” packets to the reporting processor, and the reporting processor has declared the problem processor as down.

Effect Generally, the reporting processor ignores the packet; but for packets originating in the local system, the reporting processor returns a “poison packet” to the problem processor, causing it to halt itself.

Recovery This is an informational message only; no corrective action is needed. If a processor halts, contact your service provider.

110

The path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number} was DOWNED due to reason. OPERATOR ATTENTION NEEDED. Path had excessive failures and will NOT be recovered automatically.

`path`	is the path that contains the fabric over which the message was transmitted.
`reporting-cpu`	is the processor number of the processor on which the error was detected.
`problem-cpu`	is the processor in which the message originated.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`reason`	is the reason the path was taken down.

Cause The NonStop Kernel (NSK) has brought down the path used by the reporting processor in order to communicate with the problem processor.

Effect The downed path will no longer be used for communication between the indicated processors. Another path will be used.

Recovery The operator can try to bring the path back up using the SCF START SERVERNET command. Alternatively, unless the path was downed by the operator (SCF command) or due to fabric failure, NonStop Kernel automatic path recovery will attempt to recover the path.

111

The path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number}, was brought UP due to reason.

`path`	is the path that contains the fabric over which the message was transmitted.
`reporting-cpu`	is the processor number of the processor making this report.
`problem-cpu`	is the processor at the other end of the point-to-point connection.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`reason`	indicates the reason why the path was brought up.

Cause The NonStop Kernel (NSK) message system has resumed using a path from the reporting processor. Generally this event occurs when a processor comes up or when an operator sends a Subsystem Control Facility (SCF) command to bring up the fabric.

Effect The indicated processors may resume communication via the restored path.

Recovery This is an informational message only; no corrective action is needed.

112

Processor reporting-cpu has lost connectivity to the path path due to path-reason.

`reporting-cpu`	is the processor number of the processor that reports the problem.
`path`	identifies the fabric to which connectivity has been lost.
`path-reason`	is the reason that connectivity was lost.

Cause The reporting processor’s connection to the indicated ServerNet fabric was brought down.

Effect The processor no longer attempts to communicate with the rest of the system or ServerNet cluster via the indicated fabric.

Recovery If the fabric was down because of:

A hardware problem: correct the problem, then bring the fabric up using the SCF START SERVERNET command.
An operator SCF command: bring the fabric up using the SCF START SERVERNET command.

113

Processor reporting-cpu has recovered connectivity to the path path due to path-reason.

`reporting-cpu`	is the processor number of the processor which has just regained connectivity.
`path`	identifies the fabric to which the reporting processor has regained connectivity.
`path-reason`	is the reason connectivity was regained.

Cause Connectivity between the indicated processor and the fabric has been restored.

Effect The processor is again able to communicate with other system components through this fabric.

Recovery This is an informational message only; no corrective action is needed.

114

OPERATOR ATTENTION NEEDED. Connectivity on the path path of processor reporting-cpu is still down due to path-reason.

`path`	is the path that contains the fabric to which the reporting processor cannot connect.
`reporting-cpu`	is the processor number of the processor which has no connectivity to the fabric.
`path-reason`	is the reason that connectivity was lost.

Cause The processor has had no connection to the fabric for the duration displayed.

Effect The processor is not able to communicate with the other system components over the indicated fabric.

Recovery If the fabric is down because of:

A hardware problem: correct the problem, then bring the fabric up using the SCF START SERVERNET command.
An operator SCF command: bring the fabric up using the SCF START SERVERNET command.

115

Event logging for path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number} is suppressed due to excessive path state transitions.

`path`	is the fabric over which the message was transmitted.
`reporting-cpu`	is the processor number of the processor making the report.
`problem-cpu`	is the processor at the other end of the point-to-point connection.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

Cause There have been excessive path transitions.

Effect The logging of PATH-UP and PATH-DOWN events on the indicated path is suspended to avoid flooding the logs.

Recovery This is an informational message only; no corrective action is needed.

116

The path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number} had count automatic recoveries since the last log.

`path`	is the fabric over which the message was transmitted.
`reporting-cpu`	is the processor number of the processor making this report.
`problem-cpu`	is the processor at the other end of the point-to-point connection.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and will not be passed if the reporting and problem processors are in the same (local) node.
`count`	is the count of automatic recoveries.

Cause The number of automatic recoveries has been recorded for the indicated path since the last log.

Effect None. The system is simply reporting a count of ongoing automatic recovery actions.

Recovery This is an informational message only; no corrective action is needed.
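If you collect these events for trend analysis, the optional `{in ServerNet node node-number}` clause needs care when the text is parsed. The sketch below shows one way to handle it for message 116. It assumes the message is rendered exactly as shown above, that `path` appears as a single word without spaces, and that the numeric tokens are decimal; the sample strings and the function name `parse_recovery_count` are hypothetical.

```python
import re
from typing import Dict, Optional

# The node clause is optional, so it is wrapped in a non-capturing optional group.
MSG_116 = re.compile(
    r"The path (?P<path>\S+) from processor (?P<reporting_cpu>\d+) "
    r"to processor (?P<problem_cpu>\d+)"
    r"(?: in ServerNet node (?P<node_number>\d+))? "
    r"had (?P<count>\d+) automatic recoveries since the last log"
)


def parse_recovery_count(text: str) -> Optional[Dict[str, object]]:
    """Return the tokens of a message 116 line, or None if it does not match."""
    match = MSG_116.search(text)
    if match is None:
        return None
    node = match.group("node_number")
    return {
        "path": match.group("path"),
        "reporting_cpu": int(match.group("reporting_cpu")),
        "problem_cpu": int(match.group("problem_cpu")),
        "node_number": int(node) if node is not None else None,  # None means local node
        "count": int(match.group("count")),
    }
```

A `node_number` of `None` corresponds to the case where the reporting and problem processors are in the same (local) node.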

120

BTE timeouts reported on the path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number}. Number of BTE timeouts: count

`path`	is the path that contains the fabric over which the message system was attempting to transmit.
`reporting-cpu`	is the processor number of the processor making this report, in this case, the sending processor.
`problem-cpu`	is the processor at the other end of the point-to-point connection, in this case, the target processor.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`count`	is the count of BTE timeout occurrences.

Cause BTE timeouts occurred on the indicated path.

Effect The transmission is automatically retried by the NonStop Kernel message-system.

Recovery This is an informational message only; no corrective action is needed.

121

BARRIER timeouts on the path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number}. Number of BARRIER timeouts: count

`path`	is the path that contains the fabric over which the message system was attempting to transmit.
`reporting-cpu`	is the processor number of the processor making this report, in this case, the sending processor.
`problem-cpu`	is the processor at the other end of the point-to-point connection, in this case, the target processor.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`count`	is the number of barrier timeout occurrences.

Cause Either the network is congested, the problem processor is in a hardware freeze state, or the ServerNet connection is severed or unusable.

Effect The path is downed and the message is retried on the other fabric.




	NOTE: The PATH-DOWN events will be reported as a result of this error.

Recovery This is an informational message only; no corrective action is needed.

122

Spurious ServerNet acks received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Number of Spurious acks: count

`path`	is the path that contains the fabric over which the acknowledgments were received.
`reporting-cpu`	is the processor number of the processor making this report, in this case, the processor receiving the acknowledgments.
`problem-cpu`	is the processor at the other end of the point-to-point connection, in this case, the processor purportedly sending the acknowledgments.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`count`	is the count of spurious acknowledgments.

Cause Spurious ServerNet acknowledgments occurred on the indicated path.

Effect None.




	NOTE: Spurious acks are normally reported if the problem processor is an S70000 processor that is subjected to heavy ServerNet traffic. Spurious acknowledgments may be accompanied by BTE timeouts. In this case, the transmission is automatically retried by the NonStop Kernel message-system.

Recovery This is an informational message only; no corrective action is needed.

123

Sequence errors received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Number of sequence errors: count

`path`	is the path that contains the fabric on which the out-of-sequence message or messages were received.
`reporting-cpu`	is the processor number of the processor making this report; in this case, the processor receiving the out-of-sequence message or messages.
`problem-cpu`	is the processor at the other end of the point-to-point connection; in this case, the processor sending the out-of-sequence message or messages.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`count`	is the count of sequence errors.

Cause One or more out-of-sequence messages occurred.

Effect None. The summary is logged periodically.

Recovery This is an informational message only; no corrective action is needed.

124

Bad ServerNet packets received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. ServerNet Transaction typ: tr-type. Details of the error counts are: Unsupported pkt type: unsup-pkt-type, Unsupported pkt length: unsup-pkt-length, Bad ServerNet address mask: bad-mask, Bad ServerNet source: bad-source, AVT access error: access-error, Bad Interrupt: bad-interrupt, Interrupt to full Queue: int-to-full-q

`path`	is the path that contains the fabric on which the errors occurred.
`reporting-cpu`	is the processor number of the processor making this report.
`problem-cpu`	is the processor at the other end of the point-to-point connection.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`tr-type`	is the ServerNet transaction type (e.g. read, write, etc.).
`unsup-pkt-type`	is the count of unsupported packet type errors detected.
`unsup-pkt-length`	is the count of unsupported packet length errors detected.
`bad-mask`	is the count of bad ServerNet address mask errors detected.
`bad-source`	is the count of bad ServerNet source errors detected.
`access-error`	is the count of AVT access errors that occurred.
`bad-interrupt`	is the count of bad interrupts that occurred.
`int-to-full-q`	is the count of ServerNet interrupts that occurred while the queue was full.

Cause A summary of the bad packet-type errors detected on the indicated path is logged.

Effect None. The summary is logged periodically.

Recovery This is an informational message only; no corrective action is needed.

125

Nacks received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Details of the error counts are: Unsupported pkt type: unsup-pkt-type, Unsupported pkt length: unsup-pkt-length, Bad ServerNet address mask: bad-mask, Bad ServerNet source: bad-source, AVT access error: access-error, Bad Interrupt: bad-interrupt, Interrupt to full Queue: int-to-full-q

`path`	is the path that contains the fabric on which the negative acknowledgments occurred.
`reporting-cpu`	is the processor number of the processor making this report.
`problem-cpu`	is the processor at the other end of the point-to-point connection.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`unsup-pkt-type`	is the count of unsupported packet type errors detected.
`unsup-pkt-length`	is the count of unsupported packet length errors detected.
`bad-mask`	is the count of bad ServerNet address mask errors detected.
`bad-source`	is the count of bad ServerNet source errors detected.
`access-error`	is the count of AVT access errors that occurred.
`bad-interrupt`	is the count of bad interrupts that occurred.
`int-to-full-q`	is the count of ServerNet interrupts that occurred while the queue was full.

Cause A summary of NACKs encountered on the indicated path is logged.

Effect None. The summary is logged periodically.

Recovery This is an informational message only; no corrective action is needed.

126

Bad destination ServerNet packets are received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Number of bad destination packets: count

`path`	is the path that contains the fabric on which the errors were detected.
`reporting-cpu`	is the processor number of the processor making this report.
`problem-cpu`	is the processor at the other end of the point-to-point connection.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`count`	is the count of invalid destination ID errors.

Cause A summary count of packets with an invalid destination ID received on the indicated path is logged.

Effect None. The summary is logged periodically.

Recovery This is an informational message only; no corrective action is needed.

127

Unexpected ServerNet packets received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Number of unexpected packets: count

`path`	is the path that contains the fabric on which the errors were detected.
`reporting-cpu`	is the processor number of the processor making this report.
`problem-cpu`	is the processor at the other end of this point-to-point connection.
`node-number`	is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.
`count`	is the count of unexpected packets.

Cause A summary count of unexpected packets received on the indicated path is logged.

Effect None. The summary is logged periodically.

Recovery This is an informational message only; no corrective action is needed.

140

R10K speculative write problem(s) encountered on reporting-cpu Instances of this problem since: last log log-spec-write, coldload life-spec-write Attempts to use alternate buffer: Since last log: Since coldload: [Successful: log-alt-sw-ok life-alt-sw-ok] [Failed: log-alt-sw-fail life-alt-sw-fail] Last occurrence of the problem: Buffer with error: Address: fail-buf-addr, Type: fail-buf-type, [Source processor: source-cpu, Req Ctrl Size: req-ctrl-size, Req Data Size: req-data-size], [Source processor: source-cpu, Reply Data Size: reply-data-size], [Source processor: source-cpu, Reply Ctrl Size: reply-ctrl-size], [Source processor: source-cpu, Reply Ctrl Size: reply-ctrl-size, Reply Data Size: reply-data-size], [Source: Node: source-cluster, Processor: source-cpu, PIN: pin [1], Destination: pin [2]] [Source: Internal Use: internal-data]

`reporting-cpu`	is the processor number of the processor on which the speculative write error was detected.
`log-spec-write`	is the number of occurrences of this event since the last time one was logged.
`life-spec-write`	is the number of occurrences of this event, in this processor, since the system was coldloaded.
`log-alt-sw-ok`	is the number of times message transmission successfully used the alternate buffer (because an error occurred in the primary buffer) since the last time an occurrence of this event was logged.
`log-alt-sw-fail`	is the number of times the message system was unable to switch to the alternate buffer (after detecting a speculative write error in the primary buffer) since the last time an occurrence of this event was logged.
`life-alt-sw-ok`	is the number of times message transmission successfully used the alternate buffer (because an error occurred in the primary buffer) since the reporting processor was coldloaded or reloaded.
`life-alt-sw-fail`	is the number of times the message system was unable to switch to the alternate buffer (after detecting a speculative write error in the primary buffer) since the reporting processor was coldloaded or reloaded.
`fail-buf-addr`	is the address of the buffer in which the speculative write error was detected.
`fail-buf-type`	is the type of buffer in which the speculative write error was detected.
`source-cpu`	is the processor number of the other processor involved in this event.
`req-ctrl-size`	is the size of the message’s request control element, in bytes.
`req-data-size`	is the size of the message’s request data element, in bytes.
`reply-data-size`	is the size of the message’s data reply element, in bytes.
`reply-ctrl-size`	is the size of the message’s reply control element, in bytes.
`source-cluster`	is the cluster number of the other cluster involved in this event.
`pin [1]`	is the process identifier of the process which originated the message.
`pin [2]`	is the process identifier of the message’s destination process.
`internal-data`	is a piece of internal message system data intended to assist in debugging.

Cause The message-system performed error recovery after detecting a potential buffer corruption during message traffic handling.

Effect The NonStop Kernel message system automatically retries the failing message.

Recovery This is an informational message only; no corrective action is needed.

150

Processor reporting-cpu started regroup because of processor problem-cpu with reason: regroup-reason. Processors: before regroup init-cpu-mask after regroup end-cpu-mask Regroup sequence numbers: Current sequence-no [1] Previous sequence-no [2]. Duration of this incident: duration milliseconds.

`reporting-cpu`	is the processor number of the processor making this report. It is the highest numbered processor which survived the regroup incident.
`problem-cpu`	is the processor to which all communication was lost.
`regroup-reason`	is the reason that the regroup was started.
`init-cpu-mask`	is a bit mask of the “up” processors before the regroup began.
`end-cpu-mask`	is a bit mask of the “up” processors after the regroup completes.
`sequence-no` [1]	is the system regroup sequence number as of the end of the logged regroup incident.
`sequence-no` [2]	is the system regroup sequence number as of the beginning of the logged regroup incident.

Cause A regroup incident has occurred.

Effect One or more processors might have halted.

Recovery Determine the reason why the processors halted and follow the installation procedures to document the problem and reload the processors.

Minimally, any processor which halted should be dumped and the dumps should be transmitted to your service provider along with system documentation files (for example, tsysclr, conflist, and service logs).
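Comparing `init-cpu-mask` with `end-cpu-mask` shows which processors did not survive the regroup incident. The following sketch illustrates that comparison. It assumes a 16-bit mask in which the most significant bit represents processor 0; the chapter does not state the bit ordering or width, so verify both against your system before relying on the result. The function names are hypothetical.

```python
from typing import List


def cpus_in_mask(mask: int, width: int = 16) -> List[int]:
    """List the processor numbers whose bits are set.

    Assumption: the most significant of `width` bits is processor 0.
    """
    return [cpu for cpu in range(width) if mask & (1 << (width - 1 - cpu))]


def cpus_lost_in_regroup(init_mask: int, end_mask: int, width: int = 16) -> List[int]:
    """Processors that were up before the regroup but not after it."""
    before = set(cpus_in_mask(init_mask, width))
    after = set(cpus_in_mask(end_mask, width))
    return sorted(before - after)
```

Under the stated bit-order assumption, the processors returned by `cpus_lost_in_regroup` are the ones to dump and reload.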

160

RCVDUMP/RELOAD failed with the reason fail-cause for the processor cpu with the number of retries num-retries and type of dump is dumptype. Other details are dump-rld-type: Specification of the dump, either Reload or RCVDUMP, slice-id: Slice where error occurred, last-xfab-err: Last X fabric error, last-yfab-err: Last Y fabric error, avt-mapping: AVT mapping status for the dump.

`fail-cause`	indicates the reason for the RCVDUMP/RELOAD failure.
`cpu`	is the processor number where the failure occurred.
`num-retries`	is the number of implicit retries.
`dumptype`	is the type of dump.
`dump-rld-type`	is the dump specification, either Reload or RCVDUMP.
`slice-id`	identifies the slice where the error occurred.
`last-xfab-err`	is the last error that occurred on X fabric.
`last-yfab-err`	is the last error that occurred on Y fabric.
`avt-mapping`	is the AVT mapping status for the dump.

Cause A RCVDUMP/RELOAD failure occurred.

Effect The RCVDUMP/RELOAD did not complete.

Recovery The occurrence of this message indicates a likely hardware problem. Contact your service provider.

170

Message failed due to a request buffer being modified while the message was in transit. Sending Pin: sending-pin, Buffer size: buffer-size, Buffer context: buffer-context, Buffer context-relative address: buffer-craddr, Expected checksum: expected-checksum, Calculated checksum: calculated-checksum, Recalculated checksum: recalculated-checksum, Buffer type:buffer-type, Retry count: Num-retries.

`sending-pin`	is the client processor pin number that owned the buffer that was modified.
`buffer-size`	is the size of the message buffer that was modified.
`buffer-context`	is the buffer CBA context.
`buffer-craddr`	is the buffer CBA context-relative address.
`expected-checksum`	is the expected checksum calculated by the client CPU before the buffer was modified.
`calculated-checksum`	is the checksum calculated by the server CPU upon receiving the modified buffer.
`recalculated-checksum`	is the checksum recalculated by the client CPU after being informed by the server CPU that the expected and calculated checksums did not match.
`buffer-type`	is the type of message buffer that was modified (i.e., either a request control or a request data buffer).
`num-retries`	is the number of retries performed to attempt to recover from modifications in the message buffer.

Cause Failed memory handling check due to a message buffer being modified while the message was in transit.

Effect The message fails with File System error 654 (“A message or I/O operation failed due to a message or I/O buffer being modified while the operation was in progress.”)

Recovery Contact your service provider to determine if the buffer was modified due to a possible programming error in the process represented by the sending-pin. In particular, a programming error is highly likely if a retry count (num-retries) greater than 0 is reported in the event; this signifies that the buffer was modified multiple times, thereby preventing retries from succeeding.

The NonStop Kernel message system automatically retries the failing message (up to a maximum retry limit) if the AUTO_RETRY_ON_ERROR_654 Kernel subsystem parameter is configured with a value of ON. You can determine the value of this parameter by issuing the SCF INFO SUBSYS $ZZKRN, DETAIL command. Conversely, a retry count of 0 signifies that the AUTO_RETRY_ON_ERROR_654 Kernel subsystem parameter is configured with a value of OFF, thereby disallowing retries when the NonStop Kernel message system detects that a message buffer was modified while the message was in transit. A possible programming error in the process represented by the sending-pin should also be suspected even if the retry count is 0.

However, if applications running in the system have a legitimate reason to modify message buffers of in-transit messages, then consider enabling automatic retries for modified message buffers. This can be accomplished by configuring the AUTO_RETRY_ON_ERROR_654 Kernel subsystem parameter with a value of ON through the SCF ALTER SUBSYS $ZZKRN, AUTO_RETRY_ON_ERROR_654 ON command. For more details, refer to the SCF Reference Manual for the Kernel Subsystem.
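The relationship between the retry count and the AUTO_RETRY_ON_ERROR_654 setting can be illustrated with a toy model. This sketch is not the message system's algorithm: `zlib.crc32` stands in for the unspecified checksum, `max_retries` is a hypothetical limit, and `transmit` is a hypothetical callable that models the transfer (and, for a faulty sender, the in-transit modification).

```python
import zlib
from typing import Callable, Tuple

ERROR_654 = 654  # File System error reported when the message fails


def send_with_checksum(buffer: bytearray,
                       transmit: Callable[[bytearray], bytes],
                       auto_retry: bool,
                       max_retries: int = 3) -> Tuple[int, int]:
    """Return (status, retries); status is 0 on success or 654 on failure."""
    retries = 0
    while True:
        expected = zlib.crc32(buffer)        # sender side, before the transfer
        received = transmit(buffer)          # the buffer may change during this call
        calculated = zlib.crc32(received)    # receiver side, on arrival
        if calculated == expected:
            return 0, retries
        if not auto_retry or retries >= max_retries:
            return ERROR_654, retries
        retries += 1
```

In this model, a failure with a retry count of 0 occurs only when retries are disabled, while a failure with a nonzero count means the buffer changed on every attempt, which matches the interpretation given in the Recovery text.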

171

Message failed due to a reply buffer being modified while the message was in transit. Sending Pin: sending-pin, Buffer size: buffer-size, Buffer context: buffer-context, Buffer context-relative address: buffer-craddr, Expected checksum: expected-checksum, Calculated checksum: calculated-checksum, Buffer type:buffer-type, Retry count: Num-retries.

`sending-pin`	is the server processor pin number that owned the buffer that was modified.
`buffer-size`	is the size of the message buffer that was modified.
`buffer-context`	is the buffer CBA context.
`buffer-craddr`	is the buffer CBA context-relative address.
`expected-checksum`	is the expected checksum calculated by the server CPU before the buffer was modified.
`calculated-checksum`	is the checksum calculated by the client CPU upon receiving the modified buffer.
`buffer-type`	is the type of message buffer that was modified (either a reply control or a reply data buffer).
`num-retries`	is the number of retries performed to attempt to recover from modifications in the message buffer.

Cause Failed memory handling check due to a message buffer being modified while the message was in transit.

Effect The message fails with File System error 654 (“A message or I/O operation failed due to a message or I/O buffer being modified while the operation was in progress.”)

Recovery Contact your service provider to determine if the buffer was modified due to a possible programming error in the client process. In particular, a programming error is highly likely if a retry count (num-retries) greater than 0 is reported in the event (this signifies that the buffer was modified multiple times, thereby preventing retries from succeeding). Note that the NonStop Kernel message system automatically retries the failing message (up to a maximum retry limit) for an inflight reply buffer. A possible programming error in the client process must also be suspected even if the retry count is 0. For more information, see the SCF Reference Manual for the Kernel Subsystem.