Operator Messages Manual

Chapter 70 OSIAPLMG (OSI/Application Manager) Messages

The messages in this chapter are sent by the Tandem Open Systems Interconnection (OSI)/Application Manager (APLMG) subsystem, which works in conjunction with the Tandem OSI/File Transfer, Access, and Management (FTAM) subsystem to form Tandem FTAM. The subsystem ID displayed for these messages includes OSIAPLMG as the subsystem name.

NOTE: Negative-numbered messages are common to most subsystems. If you receive a negative-numbered message that is not described in this chapter, see Chapter 15.

Restoring the Tandem FTAM Environment

The recovery action for several of the APLMG operator messages involves restoring the Tandem FTAM environment when the APLMG process has stopped. To do this, use the following procedure.

  • If your APLMG process manages only responder processes, simply restart and reconfigure the APLMG process.

  • If your APLMG process manages one or more initiator processes, follow these steps:

    1. Restart and reconfigure the APLMG process.

    2. Restarting the APLMG process might cause discrepancies in information tracked by the Tandem FTAM initiator and APLMG processes. To prevent this, you must eliminate any active subdevices of the FTAM initiator processes associated with this APLMG process.

      To determine whether any of the Tandem FTAM initiator processes associated with the APLMG process have active subdevices, issue the Subsystem Control Facility (SCF) NAMES SU command to all associated initiator processes. If the NAMES SU command displays no subdevice names for any of the initiator processes, proceed with normal processing; in this case, there is no further recovery action.

    3. If the NAMES SU command displays subdevice names, you need to eliminate the subdevices either by allowing activities to terminate normally or by aborting the initiator process.

      • To allow normal termination of activities, follow these steps:

        1. Issue SCF STOPOPENS commands to all FTAM initiator processes associated with this APLMG process to prevent any additional open requests.

        2. Continue to issue the NAMES SU command periodically until the command displays no subdevice names for any of the associated FTAM initiator processes. At this point, activities have terminated normally.

        3. You can then issue the SCF ALLOWOPENS command to the FTAM initiator processes associated with this APLMG process and resume normal processing.

      • To abort the process, issue the ABORT command to all the FTAM initiator processes associated with this APLMG process and follow these steps:

        1. Examine files that were being written to see whether they are usable or salvageable.

        2. Delete unusable files.

        3. Perform recovery of files, if possible and as appropriate.

        4. Restart the FTAM initiator processes and rerun applications as required.

      • For more information on aborting and restarting initiator processes, see the Tandem OSI/FTAM Configuration and Management Manual. For details on the SCF NAMES, STOPOPENS, ALLOWOPENS, and ABORT commands, see the SCF Reference Manual for Tandem FTAM and APLMG.
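The decision points in this procedure can be summarized in a short sketch. It is illustrative only: the function name `plan_recovery`, its parameters, and the process and subdevice names in the usage lines are hypothetical, and the code only lists the steps in order. It does not issue SCF commands.

```python
from typing import Dict, List


def plan_recovery(
    manages_initiators: bool,
    subdevices: Dict[str, List[str]],
    allow_normal_termination: bool = True,
) -> List[str]:
    """List the recovery steps, in order, for a stopped APLMG process.

    `subdevices` maps each associated initiator process name to the
    subdevice names that NAMES SU displayed for it. The operator gathers
    this input by hand; the sketch does not communicate with SCF.
    """
    steps = ["Restart and reconfigure the APLMG process."]
    if not manages_initiators:
        return steps  # responder-only configuration: nothing more to do

    steps.append("Issue NAMES SU to all associated initiator processes.")
    if not any(subdevices.values()):
        steps.append("No subdevices displayed: proceed with normal processing.")
        return steps

    if allow_normal_termination:
        steps += [
            "Issue STOPOPENS to all associated initiator processes.",
            "Repeat NAMES SU until no subdevice names are displayed.",
            "Issue ALLOWOPENS to all associated initiator processes.",
        ]
    else:
        steps += [
            "Issue ABORT to all associated initiator processes.",
            "Examine files that were being written.",
            "Delete unusable files.",
            "Recover files where possible and appropriate.",
            "Restart the initiator processes and rerun applications.",
        ]
    return steps


# Hypothetical process and subdevice names, for illustration only.
for step in plan_recovery(True, {"$FI1": ["#SU1"], "$FI2": []}):
    print(step)
```

Passing `allow_normal_termination=False` returns the abort path instead of the STOPOPENS/ALLOWOPENS path.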

The following messages are sent by the APLMG subsystem.



-1

process-name CPU changed from old-cpu to cpu
Takeover.

process-name

is the name of the APLMG process reporting the error.

old-cpu

identifies the processor the process was using before the error that this message is reporting.

cpu

identifies the processor that the process is currently using.

Cause  The backup APLMG process has taken over processing because the primary APLMG process has stopped or abended or because the processor of the primary APLMG process has failed.

Effect  The backup process takes over for the primary process, with the following possible results:

  • If the processor in which the primary process failed is available, the new primary process starts a backup process.

  • If the switch occurred because of a processor failure, the process runs without a backup until you perform the steps under “Recovery.”

Recovery  If it is important to have a backup APLMG process running at all times, either reload the old processor or stop and restart the APLMG process, designating a new backup processor. If you need to restart the APLMG process, follow the procedure given at the beginning of this chapter for restoring the Tandem FTAM environment.



-3

process-name State changed from old-objstate to objstate
{ Operator Request. }
{ Unknown. }

process-name

is the name of the APLMG process reporting the error.

old-objstate

is the state of the process object before the error that this message is reporting.

objstate

is the current state of the process object.

Cause  The APLMG process has changed from one PROCESS object state to another. If the text Operator Request is displayed, the change was the result of a Subsystem Control Facility (SCF) command issued by an operator. If the text Unknown is displayed, the cause of the state change is unknown.

Effect  The state of the process is changed as indicated. See the Tandem OSI/FTAM Configuration and Management Manual for an explanation of process-state transitions.

Recovery  In most cases, state transitions do not require corrective action. However, if for any reason your APLMG process is no longer running, follow the procedure given at the beginning of this chapter to restore the Tandem FTAM environment.



1

process-name Internal Error: File: object-file,
Timestamp: bind-timestamp,
Procedure: entry-point-label, P=%p-register E=%e-register.

process-name

is the name of the APLMG process reporting the error.

object-file

identifies the file containing the object code for the process.

bind-timestamp

is the time at which the object file was bound.

entry-point-label

is the procedure entry-point label where the error occurred.

p-register

identifies the contents of the P (program counter) register when the error was detected. For more information on the P register, see the Guardian Programmer’s Guide.

e-register

identifies the contents of the E (environment) register when the error was detected. For more information on the E register, see the Guardian Programmer’s Guide.

Cause  The APLMG process has detected an unrecoverable internal error.

Effect  The process terminates abnormally, and the backup process takes over as the new primary process and starts a new backup process.

Recovery  Contact the Global NonStop Solution Center (GNSC) and provide all relevant information as follows:

  • Descriptions of the problem and accompanying symptoms

  • Details from the message or messages generated

  • Supporting documentation such as Event Management Service (EMS) logs, trace files, and a processor dump, if applicable

If your local operating procedures require contacting the Global Mission Critical Solution Center (GMCSC), supply your system number and the numbers and versions of all related products as well.



2

process-name Trap #trap-number: File: object-file,
Timestamp: bind-timestamp,
Procedure: entry-point-label, P=%p-register E=%e-register.

process-name

is the name of the APLMG process reporting the error.

trap-number

is the number of the trap encountered.

object-file

identifies the file containing the object code for the process.

bind-timestamp

is the time at which the object file was bound.

entry-point-label

is the procedure entry-point label where the error occurred.

p-register

identifies the contents of the P (program counter) register when the error was detected. For more information on the P register, see the Guardian Programmer’s Guide.

e-register

identifies the contents of the E (environment) register when the error was detected. For more information on the E register, see the Guardian Programmer’s Guide.

Cause  The primary APLMG process has detected a trap and issued this message.

Effect  The primary process terminates abnormally, and the backup process takes over as the new primary process. The new primary process starts a backup process.

Recovery  Contact the Global NonStop Solution Center (GNSC) and provide all relevant information as follows:

  • Descriptions of the problem and accompanying symptoms

  • Details from the message or messages generated

  • Supporting documentation such as Event Management Service (EMS) logs, trace files, and a processor dump, if applicable

If your local operating procedures require contacting the Global Mission Critical Solution Center (GMCSC), supply your system number and the numbers and versions of all related products as well.



3

process-name Backup Up, CPU cpu.

process-name

is the name of the APLMG process reporting the error.

cpu

identifies the processor that the process is currently using.

Cause  The primary APLMG process has started the backup process successfully.

Effect  Processing continues normally.

Recovery  Informational message only; no corrective action is needed. For more information on backup processes, see the subsection on fault tolerance in the Tandem OSI/FTAM Configuration and Management Manual.



4

process-name Backup Lost, CPU old-cpu: Reason:
{ Backup Stopped. }
{ Backup Abended. }
{ Backup CPU Down. }
{ Unknown. }

process-name

is the name of the APLMG process reporting the error.

old-cpu

identifies the processor the process was using before the error that this message is reporting.

NOTE: This message represents a critical event because it warrants immediate attention if it is important to have a backup APLMG process running.

Cause  The primary APLMG process has detected that its backup process is not present for one of the following reasons:

  • The backup process has stopped or terminated abnormally.

  • The backup processor has gone down.

  • The cause is unknown.

Effect  The effect depends on the cause, as follows:

  • If the backup process has stopped or abended and the backup processor is available, the primary process starts a new backup process.

  • If the backup processor has gone down, the process runs without a backup until you perform the steps under “Recovery.”

  • If the cause is unknown, the effect on the subsystem is also unknown.

Recovery  If you wish to operate with a backup processor, either reload the old processor or stop and restart the APLMG process, designating a new backup processor. If you need to restart the APLMG process, follow the procedure given at the beginning of this chapter for restoring the Tandem FTAM environment.

For more information on backup processes, see the subsection on fault tolerance in the Tandem OSI/FTAM Configuration and Management Manual.



5

process-name Unable to Create Backup Process,
Error=%newprocess-error, File=object-file,
Swapvol=swapvol, Priority=priority, CPU=cpu.

process-name

is the name of the APLMG process reporting the error.

%newprocess-error

is an octal value that consists of the NEWPROCESS error number in bits 0 through 7 and, in some cases, a file-system error number or other code in bits 8 through 15. The most common causes of this error are as follows:

  • Backup processor is down (newprocess-error = %5000, or NEWPROCESS error 10).

  • Swap volume does not contain enough space for the swap file of the backup process (newprocess-error = %2453, or NEWPROCESS error 5 with file‑system error 43).
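The two examples above can be checked with a short decoding sketch. It assumes, as those examples imply, that bits 0 through 7 are the high-order byte of a 16-bit word; the function name `decode_newprocess_error` is hypothetical and not part of any Tandem interface.

```python
from typing import Tuple


def decode_newprocess_error(octal_text: str) -> Tuple[int, int]:
    """Split a value such as '%2453' into (NEWPROCESS error, detail code).

    Assumption: bits 0 through 7 are the high-order byte, so the NEWPROCESS
    error number is the upper 8 bits and the file-system error number (or
    other code) is the lower 8 bits.
    """
    value = int(octal_text.lstrip("%"), 8)  # parse the digits as octal
    if not 0 <= value <= 0o177777:
        raise ValueError("expected a 16-bit octal value")
    return value >> 8, value & 0xFF


print(decode_newprocess_error("%5000"))  # (10, 0)
print(decode_newprocess_error("%2453"))  # (5, 43)
```

Leading zeros are accepted, so `%005000` decodes to the same pair as `%5000`.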

object-file

identifies the file containing the object code for the process.

swapvol

is the current swap volume associated with the indicated process name.

priority

is the execution priority specified in the NEWPROCESS call.

cpu

identifies the processor that the process is currently using.

NOTE: This message represents a critical event because it warrants immediate attention if it is important to have a backup APLMG process running.

Cause  The primary APLMG process was unable to create its backup process because it received a NEWPROCESS error.

Effect  The inability to create a backup has one of the following effects on the process:

  • If the backup processor is down, the primary process tries again to create the backup process after you perform the steps under “Recovery.”

  • If the backup processor is available but another error (such as insufficient space on the swap volume) has occurred, the effect depends upon the value of the newprocess-error code. In most cases, the backup process is not created until you correct the problem.

Recovery  If the backup processor is down (NEWPROCESS error %005000) and it is important to have a backup process running, do one of the following:

  • Reload the backup processor before bringing up the backup APLMG process.

  • Stop and restart the APLMG process, specifying a new backup processor. If you need to restart the APLMG process, follow the procedure given at the beginning of this chapter for restoring the Tandem FTAM environment.

If the problem is insufficient space on the specified swap volume, do one of the following:

  • Designate another disk volume with more available space as the swap volume.

  • Create more free space on the specified disk volume as described under file-system error 43 in the Guardian Procedure Errors and Messages Manual.

For any other NEWPROCESS error, see Appendix C for information about the specified error. For more detailed information including recovery actions, see the Guardian Procedure Errors and Messages Manual.

If a file-system error is specified, see Appendix B for a definition of the specified error. For more detailed information including recovery actions, see the Guardian Procedure Errors and Messages Manual.

Contact the Global NonStop Solution Center (GNSC) and provide all relevant information as follows:

  • Descriptions of the problem and accompanying symptoms

  • Details from the message or messages generated

  • Supporting documentation such as Event Management Service (EMS) logs, trace files, and a processor dump, if applicable

If your local operating procedures require contacting the Global Mission Critical Solution Center (GMCSC), supply your system number and the numbers and versions of all related products as well.



6

process-name Checkpoint Failure, Error=error.

process-name

is the name of the APLMG process reporting the error.

error

identifies a file-system error.

NOTE: For some other values of error, this message may indicate a system problem or a problem with the software. For this reason, this message represents a critical event.

Cause  A checkpoint input/output (I/O) message sent by the primary APLMG process to its backup process has returned a file-system error.

If error is 201, the backup process has gone down and the primary process has tried to send a checkpoint message to the backup before it detected that the backup was down.

If error is in the range 30 through 34, a system resource problem has occurred. For instance, if error is 30, all the link control blocks (LCBs) in the primary processor were in use.

Some other values of error may indicate a problem with the software.

Effect  If error is 201, the primary APLMG process creates a new backup process and retries the checkpoint. In this case, the error is self-correcting.

Recovery  Recovery action depends on the file-system error. See Appendix B for a definition of the specified error. For more detailed information including recovery actions, see the Guardian Procedure Errors and Messages Manual.

If error is 201, no corrective action is needed.

If error is in the range 30 through 34, perform the corrective action indicated in the file-system errors chapter of the Guardian Procedure Errors and Messages Manual.

If the corrective action indicated in the Guardian Procedure Errors and Messages Manual does not solve the problem, or if some other file-system error has occurred, contact the Global NonStop Solution Center (GNSC) and provide all relevant information as follows:

  • Descriptions of the problem and accompanying symptoms

  • Details from the message or messages generated

  • Supporting documentation such as Event Management Service (EMS) logs, trace files, and a processor dump, if applicable

If your local operating procedures require contacting the Global Mission Critical Solution Center (GMCSC), supply your system number and the numbers and versions of all related products as well.
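As a rough triage aid for message 6, the error ranges described above can be expressed as a small lookup. This is a sketch only: the names `CheckpointAction` and `checkpoint_action` are hypothetical, and the categories simply restate the Cause and Recovery text for this message.

```python
from enum import Enum


class CheckpointAction(Enum):
    NONE = "no corrective action; the primary creates a new backup and retries"
    RESOURCE = "system resource problem; see the file-system errors chapter"
    ESCALATE = "look up the file-system error; contact support if unresolved"


def checkpoint_action(error: int) -> CheckpointAction:
    """Map the file-system error in message 6 to the documented response."""
    if error == 201:
        return CheckpointAction.NONE  # backup went down; self-correcting
    if 30 <= error <= 34:
        return CheckpointAction.RESOURCE  # for example, 30: no LCBs available
    return CheckpointAction.ESCALATE  # possibly a software problem


for code in (201, 30, 48):
    print(code, checkpoint_action(code).name)
```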



7

APLMG-name Process Creation Error,
Error=%ssid error,
File=filename, Swapvol=swapvol, Priority=priority, CPU=cpu,
NAME=process-name.

APLMG-name

is the name of the APLMG process reporting the error.

ssid

identifies the subsystem, in octal.

error

is the contents of the error parameter returned from the NEWPROCESS call, in octal.

For a detailed description of values in this field, see the NEWPROCESS procedure call in the Guardian Procedure Calls Reference Manual.

filename

is the name of the object file specified in the NEWPROCESS command.

swapvol

is the swap volume specified for the process being created.

priority

is the priority specified for the process being created.

cpu

is the processor number specified for the process being created.

process-name

is the name specified for the process that is being created.

Cause  The APLMG process was asked to create an initiator or responder process, but was unable to do so.

Effect  The process is not created.

Recovery  To determine the cause of the error, convert the octal error number into binary and consult the error output description of the NEWPROCESS procedure call in the Guardian Procedure Calls Reference Manual. In most cases, you simply need to repeat the NEWPROCESS call with corrected parameters.
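The octal-to-binary conversion can be done by hand or with a few lines of code. The sketch below is a convenience only; the function name `octal_to_binary` is hypothetical. It prints the value as 16 bits split into two bytes, on the assumption that the error value fits in one 16-bit word. The sample value %2453 is borrowed from the description of message 5.

```python
def octal_to_binary(octal_text: str) -> str:
    """Render an octal error value such as '%2453' as two 8-bit groups."""
    value = int(octal_text.lstrip("%"), 8)  # parse the digits as octal
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected a 16-bit octal value")
    bits = format(value, "016b")  # zero-padded 16-bit binary string
    return f"{bits[:8]} {bits[8:]}"


print(octal_to_binary("%2453"))  # 00000101 00101011
```

The left group holds bits 0 through 7 and the right group holds bits 8 through 15, which can then be compared with the error output description of the NEWPROCESS procedure call.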



99

process-name: The alternate collector, alt-collector-name, is not functional. All events will be sent to the primary collector, $0.

process-name

identifies the initiator or responder process reporting this event.

alt-collector-name

identifies the name of the alternate collector.

Cause  The specified alternate collector has been defined, but is not running. The most common causes of this error are as follows:

  • The alternate collector was not started before adding the FTC profile.

  • The alternate collector failed to start processing for an unknown reason.

  • The ROTATEFILES attribute was set to OFF, and the value specified for MAXFILES has been reached.

This message is generated only after an APLMG process has performed the following steps:

  1. Attempted to send a message to the alternate collector and received an error.

  2. Closed the alternate collector and then reopened it.

  3. Tried again to send the message and received an error.

Effect  The error message in progress, this message (message 99), and all subsequent messages are sent to the primary collector ($0).

This message is generated once by the APLMG process and once by each initiator and responder process as each process encounters a problem with the alternate collector.
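The send, reopen, retry, and fall-back sequence described for message 99 can be sketched as follows. This is an illustrative model, not the subsystem's implementation: `MemoryCollector`, `EventRouter`, `CollectorError`, and the collector name `$ALT` are all hypothetical stand-ins, and only the primary collector name `$0` comes from the text.

```python
from typing import List


class CollectorError(Exception):
    """Send failure reported by a (hypothetical) collector."""


class MemoryCollector:
    """In-memory stand-in for an event collector, for illustration only."""

    def __init__(self, name: str, working: bool = True) -> None:
        self.name = name
        self.working = working
        self.events: List[str] = []
        self.reopens = 0

    def send(self, event: str) -> None:
        if not self.working:
            raise CollectorError(self.name)
        self.events.append(event)

    def reopen(self) -> None:
        self.reopens += 1  # models closing the collector and opening it again


class EventRouter:
    """Send to the alternate collector until it fails twice in a row."""

    def __init__(self, alternate: MemoryCollector, primary: MemoryCollector) -> None:
        self.alternate = alternate
        self.primary = primary
        self.use_alternate = True

    def send(self, event: str) -> None:
        if self.use_alternate:
            for attempt in range(2):
                try:
                    self.alternate.send(event)
                    return
                except CollectorError:
                    if attempt == 0:
                        self.alternate.reopen()  # step 2: close and reopen
            # Both attempts failed: switch to the primary collector for good.
            self.use_alternate = False
            self.primary.send(event)  # the message that was in progress
            self.primary.send(f"99: alternate collector {self.alternate.name} is not functional")
            return
        self.primary.send(event)  # all subsequent messages


router = EventRouter(MemoryCollector("$ALT", working=False), MemoryCollector("$0"))
router.send("event A")
router.send("event B")
print(router.primary.events)
```

Because the router remembers the failure, message 99 is produced once per router, which mirrors the statement that each process reports the problem once.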

Recovery  If you wish to operate with an alternate collector, you must first determine whether the alternate collector has failed to start processing or whether you need to change the configuration of the alternate collector.

If the alternate collector was not started before adding the FTC profile, you must delete the FTC profile, start the alternate collector, and then add the FTC profile.

For more information on retrieving information about the alternate collector, see the subsection on retrieving configuration information in the Tandem OSI/FTAM Configuration and Management Manual. For more information on alternate collectors in general, see the EMS Manual.