Operator Messages Manual

Chapter 45 IPC (NonStop Kernel Operating System Message System) Messages

The messages in this chapter are generated by the NonStop™ Kernel operating system Message System (IPC) subsystem. The subsystem ID displayed by these messages includes IPC as the subsystem name.

NOTE: Negative-numbered messages are common to most subsystems. If you receive a negative-numbered message that is not described in this chapter, see Chapter 15.


100

Sequence error packets received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Expected sequence number: number-expected and the received sequence number: number-received

path

is the path that contains the fabric over which the message was transmitted.

reporting-cpu

is the processor number of the processor on which the error was detected.

problem-cpu

is the processor in which the message originated.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

number-expected

is the sequence number that the processor expected to receive.

number-received

is the sequence number contained in the message received.

Cause  A ServerNet message system interrupt packet was received twice or was dropped.

Effect  The out-of-sequence message will not be acknowledged. The message sender will eventually get a WACK timeout and will undergo a recovery protocol to resynchronize the sequence number. The message will be retried.

Recovery  This is an informational message only; no corrective action is needed.
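The Cause for this message distinguishes a packet that was received twice from one that was dropped. The following minimal sketch shows how the number-expected and number-received tokens could be used to tell the two cases apart. It assumes that sequence numbers increase by one per packet and never wrap, which this chapter does not confirm; the function name is hypothetical and is not part of the message system.

```python
def classify_sequence_error(number_expected: int, number_received: int) -> str:
    """Guess why a packet arrived out of sequence.

    Assumption: sequence numbers increase by one per packet and never wrap.
    The real message system may behave differently.
    """
    if number_received == number_expected:
        return "in-sequence"
    if number_received < number_expected:
        # An older number than expected suggests the packet was seen before.
        return "duplicate"
    # A newer number than expected suggests an earlier packet never arrived.
    return "dropped"
```

For example, an event reporting an expected number of 8 and a received number of 7 would be classified as a duplicate under these assumptions.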



101

Bad ServerNet packet received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Details of the bad packet are: status of packet: packet-status, ServerNet transaction type: tr-type, ServerNet address used: snet-addr, bytecount in packet: byte-count

path

is the path that contains the fabric over which the message was transmitted.

reporting-cpu

is the processor number of the processor on which the error was detected.

problem-cpu

is the processor in which the message originated.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

packet-status

is the ServerNet error code contained in the packet.

tr-type

is the type of ServerNet transaction being attempted.

snet-addr

is the target ServerNet address in the packet.

byte-count

is the length of the packet received, in bytes.

Cause  This is probably caused by a protocol error on the part of the NonStop Kernel message system software.

Effect  The packet received is ignored and the problem processor is responsible for initiating error recovery.

Recovery  This is an informational message only; no corrective action is needed.



102

ServerNet nack received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Details of the nack are: Nack Status code: nack-code, sibs used for the transfer: sib-type, ServerNet address used for transfer: snet-addr, bytecount for the transfer: byte-count

path

is the path that contains the fabric over which the message was transmitted.

reporting-cpu

is the processor number of the processor on which the error was detected.

problem-cpu

is the processor in which the message originated.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

nack-code

is the ServerNet NACK code contained in the packet.

sib-type

is the type of Send Info Block (SIB, a NonStop Kernel message system data structure) that was used by the NonStop Kernel sending logic to send the packet that was rejected in this event.

snet-addr

is the target ServerNet address in the packet.

byte-count

is the length of the packet received, in bytes.

Cause  The problem processor might be down or recovering from a power failure. This event might also be caused by a protocol error on the part of the NonStop Kernel message system.

Effect  If the error is due to a power-fail recovery or the nack status code indicates an interrupt to a full queue, the packet will eventually be resent. Otherwise, both paths will be downed and communication to the problem processor will be severed. If the problem and reporting processors are in the same system, regroup will be invoked, which will halt one of the processors.

Recovery  This is an informational message only; no corrective action is needed unless a processor halts. If a processor does halt:

  • Take a dump of the halted processor.

  • Take an online dump of the problem or reporting processor.

  • RELOAD the halted processor.
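The Effect described for this message is a three-way decision, which can be sketched as follows. This is an illustrative reading of the text only: the enum, the function, and its boolean parameters are hypothetical names, and the actual message system logic is not documented here.

```python
from enum import Enum


class NackOutcome(Enum):
    RESEND = "packet is eventually resent"
    SEVER = "communication to the problem processor is severed"
    SEVER_AND_REGROUP = "communication is severed and regroup is invoked"


def nack_outcome(power_fail_recovery: bool,
                 interrupt_to_full_queue: bool,
                 same_system: bool) -> NackOutcome:
    """Map the conditions named in the Effect text to an outcome."""
    if power_fail_recovery or interrupt_to_full_queue:
        # Transient conditions: the sender simply tries again later.
        return NackOutcome.RESEND
    if same_system:
        # Regroup halts one of the two processors.
        return NackOutcome.SEVER_AND_REGROUP
    return NackOutcome.SEVER
```

Only the last two outcomes can lead to a halted processor, which is the case that the dump and RELOAD steps address.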



103

Bad destination ServerNet packet received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Details of the bad packets are: ServerNet transaction type: tr-type, ServerNet address used: snet-addr, bytecount in the packet: byte-count, Intented destination node: dest-node

path

is the path that contains the fabric over which the message was transmitted.

reporting-cpu

is the processor number of the processor on which the error was detected.

problem-cpu

is the processor in which the message originated.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

tr-type

is the type of ServerNet transaction being attempted.

snet-addr

is the target ServerNet address in the packet.

byte-count

is the length of the packet received, in bytes.

dest-node

is the ServerNet address of the intended destination processor.

Cause  An error might have occurred in the routing tables of the ServerNet network.

Effect  The reporting processor ignores the packet.

If the ServerNet routing tables are not reliable, then it is possible that the reporting and problem processors will not be able to communicate. The NonStop Kernel message system will detect the problem and, if both the processors are in the same system, cause one of the involved processors to halt.

Recovery  This is an informational message only; no corrective action is needed unless a processor halts. If a processor does halt:

  • Take a dump of the halted processor.

  • Take an online dump of the problem or reporting processor.

  • RELOAD the halted processor.



104

Unexpected packet received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}.

path

is the path that contains the fabric over which the message was transmitted.

reporting-cpu

is the processor number of the processor on which the error was detected.

problem-cpu

is the processor in which the message originated.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

Cause  The problem processor was sending “I'm alive” packets to the reporting processor, and the reporting processor has declared the problem processor as down.

Effect  Generally, the reporting processor ignores the packet; but for packets originating in the local system, the reporting processor returns a “poison packet” to the problem processor, causing it to halt itself.

Recovery  This is an informational message only; no corrective action is needed. If a processor halts, contact your service provider.



110

The path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number} was DOWNED due to reason. OPERATOR ATTENTION NEEDED. Path had excessive failures and will NOT be recovered automatically.

path

is the path that contains the fabric over which the message was transmitted.

reporting-cpu

is the processor number of the processor on which the error was detected.

problem-cpu

is the processor in which the message originated.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

reason

is the reason the path was taken down.

Cause  The NonStop Kernel (NSK) has brought down the path used by the reporting processor in order to communicate with the problem processor.

Effect  The downed path will no longer be used for communication between the indicated processors. Another path will be used.

Recovery  The operator can try to bring the path back up using the SCF START SERVERNET command. Alternatively, unless the path was downed by the operator (SCF command) or due to fabric failure, NonStop Kernel automatic path recovery will attempt to recover the path.



111

The path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number}, was brought UP due to reason.

path

is the path that contains the fabric over which the message was transmitted.

reporting-cpu

is the processor number of the processor making this report.

problem-cpu

is the processor at the other end of the point-to-point connection.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

reason

indicates the reason why the path was brought up.

Cause  The NonStop Kernel (NSK) message system has resumed using a path from the reporting processor. Generally this event occurs when a processor comes up or when an operator sends a Subsystem Control Facility (SCF) command to bring up the fabric.

Effect  The indicated processors may resume communication via the restored path.

Recovery  This is an informational message only; no corrective action is needed.



112

Processor reporting-cpu has lost connectivity to the path path due to path-reason.

reporting-cpu

is the processor number of the processor that reports the problem.

path

identifies the fabric to which connectivity has been lost.

path-reason

is the reason that connectivity was lost.

Cause  The reporting processor’s connection to the indicated ServerNet fabric was brought down.

Effect  The processor no longer attempts to communicate with the rest of the system or ServerNet cluster via the indicated fabric.

Recovery  If the fabric was down because of:

  • A hardware problem: correct the problem, then bring the fabric up using the SCF START SERVERNET command.

  • An operator SCF command: bring the fabric up using the SCF START SERVERNET command.



113

Processor reporting-cpu has recovered connectivity to the path path due to path-reason.

reporting-cpu

is the processor number of the processor which has just regained connectivity.

path

identifies the fabric to which the reporting processor has regained connectivity.

path-reason

is the reason connectivity was regained.

Cause  Connectivity between the indicated processor and the fabric has been restored.

Effect  The processor is able to again communicate with other system components through this fabric.

Recovery  This is an informational message only; no corrective action is needed.



114

OPERATOR ATTENTION NEEDED. Connectivity on the path path of processor reporting-cpu is still down due to path-reason.

path

is the path that contains the fabric to which the reporting processor cannot connect.

reporting-cpu

is the processor number of the processor which has no connectivity to the fabric.

path-reason

is the reason that connectivity was lost.

Cause  The processor has had no connection to the fabric for the duration displayed.

Effect  The processor is not able to communicate with the other system components over the indicated fabric.

Recovery  If the fabric is down because of:

  • A hardware problem: correct the problem, then bring the fabric up using the SCF START SERVERNET command.

  • An operator (SCF) command: bring the fabric up using the SCF START SERVERNET command.



115

Event logging for path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number} is suppressed due to excessive path state transitions.

path

is the fabric over which the message was transmitted.

reporting-cpu

is the processor number of the processor making the report.

problem-cpu

is the processor at the other end of the point-to-point connection.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

Cause  There have been excessive path transitions.

Effect  The logging of PATH-UP and PATH-DOWN events on the indicated path is suspended to avoid flooding the logs.

Recovery  This is an informational message only; no corrective action is needed.
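Suppressing events after excessive path state transitions is a form of rate limiting. The sketch below shows one way such a throttle could work, using a sliding time window. The class name, the threshold of 5 transitions, and the 60-second window are all assumptions made for illustration; this chapter does not state the limits or the algorithm that the message system uses.

```python
from collections import deque


class PathEventThrottle:
    """Sliding-window limiter for PATH-UP and PATH-DOWN event logging."""

    def __init__(self, max_transitions: int = 5, window_seconds: float = 60.0):
        # Both defaults are assumed values, not documented limits.
        self.max_transitions = max_transitions
        self.window_seconds = window_seconds
        self._times = deque()

    def should_log(self, now: float) -> bool:
        """Record one transition at time `now`; False means suppress the event."""
        self._times.append(now)
        # Forget transitions that have aged out of the window.
        while self._times and now - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) <= self.max_transitions
```

With this design, logging resumes on its own once the path stops changing state often enough for older transitions to age out of the window.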



116

The path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number} had count automatic recoveries since the last log.

path

is the fabric over which the message was transmitted.

reporting-cpu

is the processor number of the processor making this report.

problem-cpu

is the processor at the other end of the point-to-point connection.

node-number

is the node number of the node containing the problem processor. This is an optional token and will not be passed if the reporting and problem processors are in the same (local) node.

count

is the count of automatic recoveries.

Cause  The number of automatic recoveries has been recorded for the indicated path since the last log.

Effect  None. The system is simply displaying a count that indicates ongoing actions.

Recovery  This is an informational message only; no corrective action is needed.



120

BTE timeouts reported on the path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number}. Number of BTE timeouts: count

path

is the path that contains the fabric over which the message system was attempting to transmit.

reporting-cpu

is the processor number of the processor making this report, in this case, the sending processor.

problem-cpu

is the processor at the other end of the point-to-point connection, in this case, the target processor.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

count

is the count of BTE timeout occurrences.

Cause  BTE timeouts occurred on the indicated path.

Effect  The transmission is automatically retried by the NonStop Kernel message system.

Recovery  This is an informational message only; no corrective action is needed.



121

BARRIER timeouts on the path path from processor reporting-cpu to processor problem-cpu {in ServerNet node node-number}. Number of BARRIER timeouts: count

path

is the path that contains the fabric over which the message system was attempting to transmit.

reporting-cpu

is the processor number of the processor making this report, in this case, the sending processor.

problem-cpu

is the processor at the other end of the point-to-point connection, in this case, the target processor.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

count

is the number of barrier timeout occurrences.

Cause  Either the network is congested, the problem processor is in a hardware freeze state, or the ServerNet connection is severed or unusable.

Effect  The path is downed and the message is retried on the other fabric.

NOTE: The PATH-DOWN events will be reported as a result of this error.

Recovery  This is an informational message only; no corrective action is needed.
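The Effect for this message (down the path, then retry on the other fabric) can be sketched as follows. The function and the `send` callable are hypothetical, and a Python TimeoutError stands in for a barrier timeout; the X and Y fabric names are taken from elsewhere in this chapter. This is not the message system's actual implementation.

```python
def send_with_failover(send, message, first_fabric="X"):
    """Try first_fabric; on a timeout, mark it down and retry on the other fabric.

    `send(fabric, message)` is a hypothetical callable that raises
    TimeoutError to stand in for a barrier timeout.
    Returns (fabric_used, downed_fabrics).
    """
    other = {"X": "Y", "Y": "X"}[first_fabric]
    downed = []
    try:
        send(first_fabric, message)
        return first_fabric, downed
    except TimeoutError:
        # Corresponds to the PATH-DOWN event reported for the failing path.
        downed.append(first_fabric)
    # A second timeout here propagates to the caller.
    send(other, message)
    return other, downed
```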



122

Spurious ServerNet acks received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Number of Spurious acks: count

path

is the path that contains the fabric over which the acknowledgments were received.

reporting-cpu

is the processor number of the processor making this report, in this case, the processor receiving the acknowledgments.

problem-cpu

is the processor at the other end of the point-to-point connection, in this case, the processor purportedly sending the acknowledgments.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

count

is the count of spurious acknowledgments.

Cause  Spurious ServerNet acknowledgments occurred on the indicated path.

Effect  None.

NOTE: Spurious acks are normally reported if the problem processor is an S70000 processor that is subjected to heavy ServerNet traffic.

Spurious acknowledgments may be accompanied by BTE timeouts. In this case, the transmission is automatically retried by the NonStop Kernel message system.

Recovery  This is an informational message only; no corrective action is needed.



123

Sequence errors received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Number of sequence errors: count

path

is the path that contains the fabric on which the out-of-sequence message or messages were received.

reporting-cpu

is the processor number of the processor making this report; in this case, the processor receiving the out-of-sequence messages.

problem-cpu

is the processor at the other end of the point-to-point connection; in this case, the processor sending the out-of-sequence message or messages.

node-number

is the cluster node number of the cluster node containing the problem processor. This is an optional parameter and is not passed if the reporting and the problem processors are in the same (local) node.

count

is the count of sequence errors.

Cause  One or more out-of-sequence messages occurred.

Effect  None. The summary is logged periodically.

Recovery  This is an informational message only; no corrective action is needed.



124

Bad ServerNet packets received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. ServerNet Transaction typ: tr-type. Details of the error counts are: Unsupported pkt type: unsup-pkt-type, Unsupported pkt length: unsup-pkt-length, Bad ServerNet address mask: bad-mask, Bad ServerNet source: bad-source, AVT access error: access-error, Bad Interrupt: bad-interrupt, Interrupt to full Queue: int-to-full-q

path

is the path that contains the fabric on which the errors occurred.

reporting-cpu

is the processor number of the processor making this report.

problem-cpu

is the processor at the other end of the point-to-point connection.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

tr-type

is the ServerNet transaction type (for example, read or write).

unsup-pkt-type

is the count of unsupported packet type errors detected.

unsup-pkt-length

is the count of unsupported packet length errors detected.

bad-mask

is the count of bad ServerNet address mask errors detected.

bad-source

is the count of bad ServerNet source errors detected.

access-error

is the count of AVT access errors that occurred.

bad-interrupt

is the count of bad interrupts that occurred.

int-to-full-q

is the count of ServerNet interrupts that occurred while the queue was full.

Cause  A summary of the bad packet-type errors detected on the indicated path is logged.

Effect  None. The summary is logged periodically.

Recovery  This is an informational message only; no corrective action is needed.



125

Nacks received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Details of the error counts are: Unsupported pkt type: unsup-pkt-type, Unsupported pkt length: unsup-pkt-length, Bad ServerNet address mask: bad-mask, Bad ServerNet source: bad-source, AVT access error: access-error, Bad Interrupt: bad-interrupt, Interrupt to full Queue: int-to-full-q

path

is the path that contains the fabric on which the negative acknowledgments occurred.

reporting-cpu

is the processor number of the processor making this report.

problem-cpu

is the processor at the other end of the point-to-point connection.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

unsup-pkt-type

is the count of unsupported packet type errors detected.

unsup-pkt-length

is the count of unsupported packet length errors detected.

bad-mask

is the count of bad ServerNet address mask errors detected.

bad-source

is the count of bad ServerNet source errors detected.

access-error

is the count of AVT access errors that occurred.

bad-interrupt

is the count of bad interrupts that occurred.

int-to-full-q

is the count of ServerNet interrupts that occurred while the queue was full.

Cause   A summary of NACKs encountered on the indicated path is logged.

Effect  None. The summary is logged periodically.

Recovery  This is an informational message only; no corrective action is needed.



126

Bad destination ServerNet packets are received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Number of bad destination packets: count

path

is the path that contains the fabric on which the errors were detected.

reporting-cpu

is the processor number of the processor making this report.

problem-cpu

is the processor at the other end of the point-to-point connection.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

count

is the count of invalid destination ID errors.

Cause  A summary count of packets with an invalid destination ID received on the indicated path is logged.

Effect  None. The summary is logged periodically.

Recovery  This is an informational message only; no corrective action is needed.



127

Unexpected ServerNet packets received on the path path by processor reporting-cpu from processor problem-cpu {in ServerNet node node-number}. Number of unexpected packets: count

path

is the path that contains the fabric on which the errors were detected.

reporting-cpu

is the processor number of the processor making this report.

problem-cpu

is the processor at the other end of this point-to-point connection.

node-number

is the node number of the node containing the problem processor. This is an optional token and is not passed if the reporting and problem processors are in the same (local) node.

count

is the count of unexpected packets.

Cause  A summary count of unexpected packets received on the indicated path is logged.

Effect  None. The summary is logged periodically.

Recovery  This is an informational message only; no corrective action is needed.



140

R10K speculative write problem(s) encountered on reporting-cpu Instances of this problem since: last log log-spec-write, coldload life-spec-write Attempts to use alternate buffer: Since last log: Since coldload: [Successful: log-alt-sw-ok life-alt-sw-ok] [Failed: log-alt-sw-fail life-alt-sw-fail] Last occurrence of the problem: Buffer with error: Address: fail-buf-addr, Type: fail-buf-type, [Source processor: source-cpu, Req Ctrl Size: req-ctrl-size, Req Data Size: req-data-size], [Source processor: source-cpu, Reply Data Size: reply-data-size], [Source processor: source-cpu, Reply Ctrl Size: reply-ctrl-size], [Source processor: source-cpu, Reply Ctrl Size: reply-ctrl-size, Reply Data Size: reply-data-size], [Source: Node: source-cluster, Processor: source-cpu, PIN: pin [1], Destination: pin [2]] [Source: Internal Use: internal-data]

reporting-cpu

is the processor number of the processor on which the speculative write error was detected.

log-spec-write

is the number of occurrences of this event since the last time one was logged.

life-spec-write

is the number of occurrences of this event, in this processor, since the system was coldloaded.

log-alt-sw-ok

is the number of times message transmission successfully used the alternate buffer (because an error occurred in the primary buffer) since the last time an occurrence of this event was logged.

log-alt-sw-fail

is the number of times the message system was unable to switch to the alternate buffer (after detecting a speculative write error in the primary buffer) since the last time an occurrence of this event was logged.

life-alt-sw-ok

is the number of times message transmission successfully used the alternate buffer (because an error occurred in the primary buffer) since the reporting processor was coldloaded or reloaded.

life-alt-sw-fail

is the number of times the message system was unable to switch to the alternate buffer (after detecting a speculative write error in the primary buffer) since the reporting processor was coldloaded or reloaded.

fail-buf-addr

is the address of the buffer in which the speculative write error was detected.

fail-buf-type

is the type of buffer in which the speculative write error was detected.

source-cpu

is the processor number of the other processor involved in this event.

req-ctrl-size

is the size of the message’s request control element, in bytes.

req-data-size

is the size of the message’s request data element, in bytes.

reply-data-size

is the size of the message’s data reply element, in bytes.

reply-ctrl-size

is the size of the message’s reply control element, in bytes.

source-cluster

is the cluster number of the other cluster involved in this event.

pin [1]

is the process identifier of the process which originated the message.

pin [2]

is the process identifier of the message’s destination process.

internal-data

is a piece of internal message system data intended to assist in the debugging.

Cause  The message system performed error recovery after detecting a potential buffer corruption during message traffic handling.

Effect  The NonStop Kernel message system automatically retries the failing message.

Recovery  This is an informational message only; no corrective action is needed.



150

Processor reporting-cpu started regroup because of processor problem-cpu with reason: regroup-reason. Processors: before regroup init-cpu-mask after regroup end-cpu-mask Regroup sequence numbers: Current sequence-no [1] Previous sequence-no [2]. Duration of this incident: duration milliseconds.

reporting-cpu

is the processor number of the processor making this report. It is the highest numbered processor which survived the regroup incident.

problem-cpu

is the processor (the subject token of this event) to which all communication was lost.

regroup-reason

is the reason that the regroup was started.

init-cpu-mask

is a bit mask of the “up” processors before the regroup began.

end-cpu-mask

is a bit mask of the “up” processors after the regroup completes.

sequence-no [1]

is the system regroup sequence number as of the end of the logged regroup incident.

sequence-no [2]

is the system regroup sequence number as of the beginning of the logged regroup incident.

Cause  A regroup incident has occurred.

Effect  One or more processors might have halted.

Recovery  Determine the reason why the processors halted and follow the installation procedures to document the problem and reload the processors.

At a minimum, dump any processor that halted and transmit the dumps to your service provider along with system documentation files (for example, tsysclr, conflist, and service logs).
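Comparing init-cpu-mask with end-cpu-mask shows which processors did not survive the regroup incident. The sketch below decodes the masks under a loudly stated assumption: a 16-bit mask in which the most significant bit represents processor 0. This chapter does not define the bit layout, so verify it against your system before relying on the result; the function names are hypothetical.

```python
def cpus_in_mask(mask: int, width: int = 16) -> list[int]:
    """List processor numbers whose bit is set.

    Assumption: the most significant bit of a `width`-bit mask is processor 0.
    """
    return [cpu for cpu in range(width) if mask & (1 << (width - 1 - cpu))]


def cpus_lost(init_cpu_mask: int, end_cpu_mask: int, width: int = 16) -> list[int]:
    """Processors that were up before the regroup but not after it."""
    return cpus_in_mask(init_cpu_mask & ~end_cpu_mask, width)
```

Under that assumption, masks of 0xF000 before and 0xD000 after the incident would indicate that processor 2 went down.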



160

RCVDUMP/RELOAD failed with the reason fail-cause for the processor cpu with the number of retries num-retries and type of dump is dumptype. Other details are dump-rld-type: Specification of the dump, either Reload or RCVDUMP, slice-id: Slice where error occurred, last-xfab-err: Last X fabric error, last-yfab-err: Last Y fabric error, avt-mapping: AVT mapping status for the dump.

fail-cause

indicates the reason for the RCVDUMP/RELOAD failure.

cpu

is the processor number where the failure occurred.

num-retries

is the number of implicit retries.

dumptype

is the type of dump.

dump-rld-type

is the dump specification, either Reload or RCVDUMP.

slice-id

identifies the slice where the error occurred.

last-xfab-err

is the last error that occurred on X fabric.

last-yfab-err

is the last error that occurred on Y fabric.

avt-mapping

is the AVT mapping status for the dump.

Cause  A RCVDUMP/RELOAD failure occurred.

Effect  The RCVDUMP/RELOAD did not complete.

Recovery  The occurrence of this message indicates a likely hardware problem. Contact your service provider.



170

Message failed due to a request buffer being modified while the message was in transit. Sending Pin: sending-pin, Buffer size: buffer-size, Buffer context: buffer-context, Buffer context-relative address: buffer-craddr, Expected checksum: expected-checksum, Calculated checksum: calculated-checksum, Recalculated checksum: recalculated-checksum, Buffer type: buffer-type, Retry count: num-retries.

sending-pin

is the process identifier (PIN) of the client process that owned the buffer that was modified.

buffer-size

is the size of the message buffer that was modified.

buffer-context

is the buffer CBA context.

buffer-craddr

is the buffer CBA context-relative address.

expected-checksum

is the expected checksum calculated by the client CPU before the buffer was modified.

calculated-checksum

is the checksum calculated by the server CPU upon receiving the modified buffer.

recalculated-checksum

is the checksum recalculated by the client CPU after being informed by the server CPU that the expected and calculated checksums did not match.

buffer-type

is the type of message buffer that was modified (i.e., either a request control or a request data buffer).

num-retries

is the number of retries performed to attempt to recover from modifications in the message buffer.

Cause  Failed memory handling check due to a message buffer being modified while the message was in transit.

Effect  The message fails with File System error 654 (“A message or I/O operation failed due to a message or I/O buffer being modified while the operation was in progress.”)

Recovery  Contact your service provider to determine whether the buffer was modified because of a programming error in the process represented by sending-pin. A programming error is highly likely if the event reports a retry count (num-retries) greater than 0, which signifies that the buffer was modified multiple times, thereby preventing retries from succeeding.

The NonStop Kernel message system automatically retries the failing message (up to a maximum retry limit) if the AUTO_RETRY_ON_ERROR_654 Kernel subsystem parameter is configured with a value of ON. You can determine the value of this parameter by issuing the SCF INFO SUBSYS $ZZKRN, DETAIL command. Conversely, a retry count of 0 signifies that the parameter is configured with a value of OFF, which disallows retries when the message system detects that a message buffer was modified while the message was in transit. Suspect a programming error in the process represented by sending-pin even if the retry count is 0.

However, if applications running in the system have a legitimate reason to modify the buffers of in-transit messages, consider enabling automatic retries for modified message buffers by setting the AUTO_RETRY_ON_ERROR_654 Kernel subsystem parameter to ON with the SCF ALTER SUBSYS $ZZKRN, AUTO_RETRY_ON_ERROR_654 ON command. For more details, see the SCF Reference Manual for the Kernel Subsystem.
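The expected-checksum, calculated-checksum, and recalculated-checksum tokens can be compared to narrow down where the buffer changed. The sketch below is one possible interpretation, not documented behavior. CRC-32 from the Python zlib module is only a stand-in, because this chapter does not say which checksum the message system uses, and the function names and result strings are hypothetical.

```python
import zlib


def checksum(buf: bytes) -> int:
    # CRC-32 is a stand-in; the actual checksum algorithm is not documented here.
    return zlib.crc32(buf)


def triage(expected: int, calculated: int, recalculated: int) -> str:
    """Interpret the three checksum tokens (an assumed reading of the event)."""
    if expected == calculated:
        return "ok"
    if recalculated != expected:
        # The client's own buffer no longer matches what it originally summed.
        return "client-buffer-changed"
    # The client's buffer still matches, so the mismatch arose elsewhere.
    return "client-buffer-unchanged"
```

A result of "client-buffer-changed" would be consistent with the sending process writing to its request buffer while the message was in transit.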



171

Message failed due to a reply buffer being modified while the message was in transit. Sending Pin: sending-pin, Buffer size: buffer-size, Buffer context: buffer-context, Buffer context-relative address: buffer-craddr, Expected checksum: expected-checksum, Calculated checksum: calculated-checksum, Buffer type: buffer-type, Retry count: num-retries.

sending-pin

is the process identifier (PIN) of the server process that owned the buffer that was modified.

buffer-size

is the size of the message buffer that was modified.

buffer-context

is the buffer CBA context.

buffer-craddr

is the buffer CBA context-relative address.

expected-checksum

is the expected checksum calculated by the server CPU before the buffer was modified.

calculated-checksum

is the checksum calculated by the client CPU upon receiving the modified buffer.

buffer-type

is the type of message buffer that was modified (either a reply control or a reply data buffer).

num-retries

is the number of retries performed to attempt to recover from modifications in the message buffer.

Cause  Failed memory handling check due to a message buffer being modified while the message was in transit.

Effect  The message fails with File System error 654 (“A message or I/O operation failed due to a message or I/O buffer being modified while the operation was in progress.”)

Recovery  Contact your service provider to determine if the buffer was modified due to a possible programming error in the client process. In particular, a programming error is highly likely if a retry count (num-retries) greater than 0 is reported in the event (this signifies that the buffer was modified multiple times, thereby preventing retries from succeeding). Note that the NonStop Kernel message system automatically retries the failing message (up to a maximum retry limit) for an inflight reply buffer. A possible programming error in the client process must also be suspected even if the retry count is 0. For more information, see the SCF Reference Manual for the Kernel Subsystem.