Operator Messages Manual
Chapter 110 SYSH (Syshealth) Messages
The messages in this chapter are sent to $0 by the Syshealth
subsystem. You can view operator messages using either the Viewpoint
application or the Syshealth event viewing screen. To view messages
on the event viewing screen, you must first select the $0 message
log using the File menu Use Log command. Syshealth operator messages
are listed as subsystem SYSH.
NOTE: Negative-numbered messages are common to most subsystems. If
you receive a negative-numbered message that is not described in this
chapter, see Chapter 15.
1000 module-title (process-name) - Module launched [with undefined externals]. | module-title | is the name of the Syshealth module that was launched.
It is derived from the unique module identification number. This ID
is globally defined for all modules in Syshealth. | process-name | is the process name or the processor number and process
identification number (PIN) of the launched process. |
Cause The Persistence Monitor is launching a Syshealth module. Effect The Syshealth module is started. Recovery Informational message only; no corrective action is needed. |
1001 module-title (process-name) - Too many module relaunch attempts. | module-title | is the name of the Syshealth module that failed to
start. It is derived from the unique module identification number.
This ID is globally defined for all modules in Syshealth. | process-name | is the process name of the failed module. It is the
process name or the processor number and process identification number
(PIN) of the launched process. |
Cause The monitored module has failed to start. Effect The module is not running. Recovery Analyze the previous Syshealth MODULE-FAILED error (#1002) and
the error buffer to determine the cause of the problem. Correct the
problem and use the Syshealth Management screen to start the module. |
1002 module-title (process-name) - Failed due to { CPU failure | ABEND } | module-title | is the name of the Syshealth module that failed. It is derived from the unique module identification number.
This ID is globally defined for all modules in Syshealth. | process-name | is the process name or the processor number and process
identification number (PIN) of the launched process. |
Cause The monitored module failed. Effect The module is not running. Recovery Informational message only; no corrective action is needed.
The Persistence Monitor tries to relaunch the module. |
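The relationship among messages 1000, 1001, and 1002 can be illustrated with a minimal sketch. It assumes a simplified model in which each launch either succeeds or fails; the function name `supervise`, the `launch` callback, and the attempt limit are hypothetical, and the manual does not state the real relaunch limit.

```python
from typing import Callable

# Hypothetical limit; the manual does not state the actual number of relaunch attempts.
MAX_RELAUNCH_ATTEMPTS = 3


def supervise(launch: Callable[[], bool],
              max_attempts: int = MAX_RELAUNCH_ATTEMPTS) -> list[int]:
    """Return the SYSH message numbers a supervisor would report.

    `launch` is a stand-in that returns True if the module started and
    stayed up, and False if it failed (CPU failure or ABEND).
    """
    reported = []
    for _ in range(max_attempts):
        if launch():
            reported.append(1000)  # Module launched
            return reported
        reported.append(1002)      # Failed; a relaunch is attempted
    reported.append(1001)          # Too many module relaunch attempts
    return reported
```

In this model, a module that fails once and then starts produces 1002 followed by 1000, and a module that never starts ends with 1001, which is the point at which operator action is needed.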
1003 module-title (process-name) - Internal Error. Error Information: | module-title | is the name of the module that had an error. It is
derived from the unique module identification number. This ID is globally
defined for all modules in Syshealth. | process-name | is the process name of the module in error. It is
the process name or the processor number and process identification
number (PIN) of the launched process. |
Cause The Persistence Monitor failed because of an internal error. Effect The Persistence Monitor is not running. Recovery Analyze the error information in this error message to determine
the cause of the problem. Correct the problem and use the Syshealth
Management screen to start the Persistence Monitor. |
2000 Unable to access Syshealth Command Database file. File error file-error while attempting proc-failing. | file | is the name of the Syshealth command database file
on which the error was detected. | file-error | is the file-system error associated with the failure. | proc-failing | is the file-system procedure associated with the failure. |
Cause When TMDSAUTO is run, it first tries to open the Syshealth command
database. Then it performs a read on the database’s header record.
If the open or read fails, this error is generated. Effect The Syshealth monitoring processes (Persistence Monitor, System
Health Monitor, and Notification system) and the Syshealth user interface
depend on the command database to implement command security. Syshealth
runs, but it uses its default command security. Recovery If the file was not found, it either was not installed or has
been purged. Reinstall the file using the INSTALL program. If the error was other
than “File Not Found,” see Appendix B for a definition of the specified error.
For more detailed information, including recovery actions, see the Guardian Procedure Errors and Messages Manual. |
2001 Unable to access Syshealth Management Database file during init-phase. File error file-error while attempting proc-failing. Syshealth was not started. | file | is the name of the Syshealth management database file
on which the error was detected. | init-phase | indicates what phase of the Syshealth initialization
failed. If message 2001 is generated, this parameter indicates how
the management database was being used when the error occurred. | file-error | is the file-system error associated with the failure. | proc-failing | is the file-system procedure associated with the failure. |
Cause When TMDSAUTO is run, the management database is used to bring
up Syshealth. First the database file is opened. Then the file description
records are read and used in checking the accessibility of all Syshealth
files. The Persistence Monitor process startup information is read
and used to start the monitor. Finally, other processes’ startup
information is read and sent to the Persistence Monitor to start them. If any of these steps fails due to an error when accessing the
management database, message 2001 is generated. Effect Syshealth depends on the management database being accessible.
If it is not accessible, Syshealth is not started, although TMDSAUTO
starts the Tandem Maintenance and Diagnostic Subsystem (TMDS) collector
($ZLOG). Recovery See Appendix B
for a definition of the specified error. For more detailed information,
including recovery actions, see the Guardian Procedure
Errors and Messages Manual. |
2002 Unable to access Syshealth Configuration Database file. File error file-error while attempting proc-failing. | file | is the name of the Syshealth configuration database
file on which the error was detected. | file-error | is the file-system error associated with the failure. | proc-failing | is the file-system procedure associated with the failure. |
Cause When TMDSAUTO is run, it attempts to open the Syshealth configuration
database and perform a read on the database file’s header record.
If the database file is not found, TMDSAUTO attempts to create the
file. If the create, open, or read fails, then message 2002 is generated. Effect The Syshealth user interface depends on the configuration database
being accessible. If it is not, the Syshealth user interface cannot
alter any of the configurable options of Syshealth. Recovery See Appendix B
for a definition of the specified error. For more detailed information,
including recovery actions, see the Guardian Procedure
Errors and Messages Manual. |
2003 Syshealth not properly installed. n Syshealth files missing | n | is the number of Syshealth files that were not found
on the system. |
Cause When TMDSAUTO is run, it checks that all of the appropriate
Syshealth files are present. It does this by reading the file-verification
records in the management database, then by opening each file that
should be on the current system. If any files are missing, this error
is generated. If the open fails for any other reason, this error also
is generated.
NOTE: This error message is generated only if one of the Syshealth
monitor files (Health Monitor, Persistence Monitor, or Notification
module) is inaccessible.
Effect If a monitor file is missing (that is, one critical to the operation
of Syshealth automatic monitoring), then Syshealth does not run. The
effects of other files missing depend on the file: some screens might
not be available, help might be missing, and so on. Note that even
if Syshealth does not run, TMDSAUTO starts the Tandem Maintenance
and Diagnostic Subsystem (TMDS) message collector ($ZLOG). Recovery Examine the error using the TMDS FIND command (or, if sufficient
portions of Syshealth are available, using the Syshealth Event Viewing
screen) to see exactly which files are missing. The missing files
either were not installed or have been purged. Reinstall the files
using the INSTALL program. |
2004 Syshealth not properly installed. n files were inaccessible. | Cause When TMDSAUTO is run, it checks that all of the appropriate
Syshealth files are present. It does this by reading the file-verification
records in the management database, then by opening each file that
should be on the current system. If the open fails for any reason
other than “File Not Found,” then message 2004 is generated. Effect Syshealth may not be installed or secured correctly. If Syshealth
cannot access a monitor file (a file critical to the operation of
Syshealth automatic monitoring), it does not run. If other files are
inaccessible, some Syshealth functions may be lost, depending on which
files are not available. Note that even if Syshealth does not run,
TMDSAUTO starts the Tandem Maintenance and Diagnostic Subsystem (TMDS)
collector ($ZLOG). Recovery See Appendix B
for a definition of the specified error. For more detailed information,
including recovery actions, see the Guardian Procedure
Errors and Messages Manual. |
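The distinction between messages 2003 and 2004 can be sketched as follows. This is an illustrative model only: it follows the message 2004 description, which reserves 2004 for open failures other than “File Not Found.” The function name, the input format, and the use of 11 as the “File Not Found” file-system error number are assumptions made for the example.

```python
# Assumed file-system error number for "File Not Found"; confirm against Appendix B.
FILE_NOT_FOUND = 11


def check_installation(open_results: dict[str, int]) -> list[str]:
    """Classify file-open results into message 2003 and 2004 texts.

    `open_results` maps each file name to the file-system error returned
    by the open attempt, with 0 meaning the open succeeded.
    """
    missing = [name for name, err in open_results.items()
               if err == FILE_NOT_FOUND]
    inaccessible = [name for name, err in open_results.items()
                    if err not in (0, FILE_NOT_FOUND)]
    messages = []
    if missing:
        messages.append(
            f"2003 Syshealth not properly installed. "
            f"{len(missing)} Syshealth files missing")
    if inaccessible:
        messages.append(
            f"2004 Syshealth not properly installed. "
            f"{len(inaccessible)} files were inaccessible.")
    return messages
```

A system on which every open succeeds produces neither message.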
2005 Syshealth Persistence Monitor could not be started
due to [ No definition of the process in the Syshealth Management
Database.] [ Newprocess error newprocess-error, with program file filename. ] [ File
error file-error when performing proc-failing. ] [ SPI error spi-error during proc-failing. ] [ TMDSAUTO internal error during proc-failing. ] Syshealth was
not started. | newprocess-error | is the error associated with the failure of a NEWPROCESS
procedure call. | filename | is the name of the program file that experienced the
NEWPROCESS error. | file-error | is the file-system error associated with the failure. | proc-failing | is the procedure associated with the failure. | spi-error | is the Subsystem Programmatic Interface (SPI) error
code associated with the failure. |
Cause When TMDSAUTO is run, it performs initial checks and then starts
the Syshealth Persistence Monitor. If any of the functions (memory
allocation, NEWPROCESS, OPEN, or WRITEREAD) fail, then Syshealth does
not start and message 2005 is generated. Also, if the Syshealth management
database was improperly built or corrupted, the Persistence Monitor
definition may not be accessible. Effect The Syshealth Persistence Monitor is a process pair that starts
the remaining Syshealth processes (which are not process pairs) and
monitors them to ensure that they remain running. If the Persistence
Monitor cannot be started, the remaining Syshealth processes are not
started and Syshealth automatic fault monitoring and reporting is
not operational, although TMDSAUTO starts the Tandem Maintenance and
Diagnostic Subsystem (TMDS) message collector ($ZLOG) and TMDS fault
analysis ($ZMOM). Recovery If a file-system error is specified, see Appendix B for a definition
of the error. See the Guardian Procedure Errors and Messages
Manual for a description of the file-system and NEWPROCESS
errors and their recovery actions. If the problem resulted from an internal error, contact the
Global NonStop Solution Center (GNSC) and provide all relevant information
as follows:
- Descriptions of the problem and accompanying symptoms
- Details from the message or messages generated
- Supporting documentation such as Event Management
  Service (EMS) logs, trace files, and a processor dump, if applicable
If your local operating procedures require contacting the Global
Mission Critical Solution Center (GMCSC), supply your system number
and the numbers and versions of all related products as well. |
2006 Syshealth Process Startup Error. target-process Not Started. { File Error | SPI Error
} error when attempting proc‑failing
while communicating with the Persistence Monitor. | target-process | is the name of a Syshealth process that was not started
due to the failure to communicate with the Persistence Monitor. | error | is the file-system error or the Subsystem Programmatic
Interface (SPI) error code associated with the failure, if any. | proc-failing | is the procedure associated with the failure. |
Cause This message reports a failure when an SPI ADD MODULE or SPI
LAUNCH command is sent to the Syshealth Persistence Monitor. The failure
can be either an SPI error or a file-system OPEN or WRITEREAD error.
Each Syshealth process is added to and then launched by the Persistence
Monitor when starting Syshealth. If any of the SPI commands fail,
the startup terminates. Effect If any portion of Syshealth fails to start, all of Syshealth
is shut down. In this case, Syshealth is not running. Nevertheless,
TMDSAUTO starts the Tandem Maintenance and Diagnostic Subsystem (TMDS)
message collector ($ZLOG). Recovery See Appendix B
for a definition of the specified error. For more detailed information,
including recovery actions, see the Guardian Procedure
Errors and Messages Manual. If the error is an SPI error,
contact your service provider. |
2007 Syshealth Startup Verify Error. target-process failed verification. [{SPI error | File
Error} error when attempting proc‑failing.] [Verify code = verify-code] | target-process | is the name of a Syshealth process that was not started
due to the failure to communicate with the Persistence Monitor. | error | is the file-system error or the Subsystem Programmatic
Interface (SPI) error code associated with the failure, if any. | proc-failing | is the procedure associated with the failure. | verify-code | is the error returned by the Syshealth target process,
describing its internal verification failure. Verify codes are defined
under “Recovery.” |
Cause Syshealth processes are started one at a time and are verified
(using an SPI VERIFY command) prior to starting the next process.
If the verify fails because the process is down, does not respond
within a 60-second time-out, or returns a verify code indicating that
it is not functioning correctly, then message 2007 is generated. Effect If Syshealth processes are down or did not verify, Syshealth
is not started, although TMDSAUTO starts the Tandem Maintenance and
Diagnostic Subsystem (TMDS) message collector ($ZLOG). Recovery If a Syshealth process is down, check to see whether a NEWPROCESS
error occurred, and examine the NEWPROCESS error codes in the Guardian Procedure Errors and Messages Manual to determine
the fault and corrective action. If the process failed its verification, check the codes below
for the fault and corrective actions: If the process was restarted the maximum number of times and
failed, check to see whether there is some reason (such as processors
failing) that the process stopped. If not, contact the Global NonStop
Solution Center (GNSC) and provide all relevant information as follows:
- Descriptions of the problem and accompanying symptoms
- Details from the message or messages generated
- Supporting documentation such as Event Management
  Service (EMS) logs, trace files, and a processor dump, if applicable
If your local operating procedures require contacting the Global
Mission Critical Solution Center (GMCSC), supply your system number
and the numbers and versions of all related products as well. |
2008 Syshealth started OK. | Cause When TMDSAUTO is run, it performs the initial verification of
Syshealth to check that:
- The command and management databases are accessible.
- All Syshealth files are present and accessible.
- Both Tandem Maintenance and Diagnostic Subsystem (TMDS) and
  the Syshealth user interface can start the Persistence Monitor.
Then
the remainder of the Syshealth processes are registered with the Persistence
Monitor using Subsystem Programmatic Interface (SPI) ADD MODULE commands
and are launched by the Persistence Monitor. TMDSAUTO (or the user
interface) sends SPI VERIFY commands to all Syshealth processes. If everything is working and there were no errors, message 2008
is generated. Effect Syshealth is operational on the system. Recovery Informational message only; no corrective action is needed.
This message indicates that Syshealth was started correctly. |
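The startup messages 2000 through 2008 can be read as one sequence of checks. The following sketch models that sequence under stated assumptions: the step names, the exact ordering, and the `fatal` flags are inferred from the Cause and Effect text of each message and are not a description of the actual TMDSAUTO implementation. In particular, it treats any missing or inaccessible file as a monitor file, which is the case that prevents Syshealth from running.

```python
from typing import Callable

# (step name, message reported on failure, whether the failure stops startup).
# Names, ordering, and fatal flags are assumptions drawn from the message text.
STARTUP_STEPS = [
    ("access command database", 2000, False),       # runs with default security
    ("access management database", 2001, True),
    ("access configuration database", 2002, False),  # only the user interface is affected
    ("check files present", 2003, True),
    ("check files accessible", 2004, True),
    ("start Persistence Monitor", 2005, True),
    ("add and launch processes", 2006, True),
    ("verify processes", 2007, True),
]


def run_startup(step_ok: Callable[[str], bool]) -> list[int]:
    """Return the message numbers reported during a modeled startup.

    `step_ok` is a stand-in that reports whether the named step succeeded.
    Message 2008 is reported only if no step reported an error.
    """
    reported = []
    for name, message, fatal in STARTUP_STEPS:
        if not step_ok(name):
            reported.append(message)
            if fatal:
                return reported
    if not reported:
        reported.append(2008)  # Syshealth started OK
    return reported
```

For example, `run_startup(lambda name: True)` yields only 2008, while a failure of the management database step yields 2001 and ends the sequence.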
2009 A Syshealth Shutdown command was issued for target-system. | target-system | is the name of the system to which the Syshealth user
interface commands are directed. |
Cause The Syshealth user interface provides commands to shut down
and start up the Syshealth processes. Syshealth, $ZLOG, and $ZMOM
have been shut down. Effect Syshealth has been shut down. If the Tandem Maintenance and
Diagnostic Subsystem (TMDS) alternate collector was shut down, no
TMDS errors are logged on the system, so information about hardware
faults is lost. With Syshealth shut down, no hardware fault coverage
is provided and there are no dial-outs caused by system faults. Recovery None, if the shutdown was anticipated. Otherwise, the operators
should check to ensure that Syshealth has not been left inoperative,
which would compromise system fault detection and reporting. |
3000 Coldload report of report-severity
severity for system‑serial‑number system. resource-name: specific-problem. There are related-alarms related alarms. | report-severity | defines the perceived severity of the system-load
report. The levels of severity correspond to the Open Systems Interconnection
(OSI) perceived severity definitions for managed objects. This value
is determined by using the highest (most critical) perceived severity
of all the alarms contained in the system-load report. This variable
can have one of the following values: CRITICAL-SEVERITY | indicates that a service-affecting condition has occurred
and an immediate corrective action is needed. Such a severity can
be reported, for example, when a resource becomes totally out of service
and its capability must be restored. | MAJOR-SEVERITY | indicates that a service-affecting condition has developed
and urgent corrective action is required. Such a severity can be reported,
for example, when there is the potential for a single point of failure
(fault tolerance lost) or severe degradation in resource capability,
and full capability must be restored. | MINOR-SEVERITY | indicates that a non-service-affecting fault condition
exists and that corrective action should be taken in order to prevent
a more serious (for example, service-affecting) fault. Such a severity
can be reported, for example, when the dedicated alarm condition is
not currently degrading the capacity of the resource. | WARNING-SEVERITY | indicates the detection of a potential or impending
service-affecting fault before any significant effects have been felt.
Action should be taken to further diagnose (if necessary) and correct
the problem in order to prevent it from becoming a more serious service-affecting
fault. |
| report-severity | is the highest (most critical) perceived severity
of all the alarms contained in the system-load report. The values
for this variable are defined for message 3001. | system-serial-number | is the serial number of the NonStop Kernel from which
the system-load report originated. | resource-name | is the name of the system resource involved in the
most critical alarm in the system‑load report. | specific-problem | is the specific problem of the most critical alarm
in the system-load report. | related-alarms | is a count of the number of alarms in the system-load
report. |
Cause The NonStop Kernel identified by system-serial-number has completed a system load. The Syshealth Health Monitor has generated
a summary of the outstanding alarms after the system load finished. Effect If the system-load report contains alarms, then one or more
system resources have been compromised. The system-load report contains
a detailed status for each problem resource. Recovery On a remote system, use Syshealth to examine this system-load
report on the Syshealth Main screen and the Alarm Viewing screen.
If corrective action is needed, dial in to the system that generated
the report. On a local system or while dialed in remotely, use Syshealth
and other Tandem Maintenance and Diagnostic Subsystem (TMDS) diagnostic
tools to locate and repair the problem. |
3001 Problem report of report-severity severity for system‑serial-number system.
resource-name:specific-problem. There are related-alarms related alarms. | system-serial-number | is the serial number of the NonStop Kernel from which
the problem report originated. | report-severity | defines the perceived severity of the problem report.
The levels of severity correspond to the Open Systems Interconnection
(OSI) perceived severity definitions for managed objects. This value
is determined by using the highest (most critical) perceived severity
of all the alarms contained in the problem report. This variable
can have one of the following values: CRITICAL-SEVERITY | indicates that a service-affecting condition has occurred
and an immediate corrective action is required. Such a severity can
be reported, for example, when a resource becomes totally out of service
and its capability must be restored. | MAJOR-SEVERITY | indicates that a service-affecting condition has developed
and urgent corrective action is required. Such a severity can be reported,
for example, when there is the potential for a single point of failure
(fault tolerance lost) or severe degradation in resource capability,
and full capability must be restored. | MINOR-SEVERITY | indicates that a non-service-affecting fault condition
exists and that corrective action should be taken in order to prevent
a more serious (for example, service-affecting) fault. Such a severity
can be reported, for example, when the dedicated alarm condition is
not currently degrading the capacity of the resource. | WARNING-SEVERITY | indicates the detection of a potential or impending
service-affecting fault, before any significant effects have been
felt. Action should be taken to further diagnose (if necessary) and
correct the problem in order to prevent it from becoming a more serious
service-affecting fault. |
| resource-name | is the name of the system resource involved in the
most critical alarm in the problem report. | specific-problem | is the specific problem of the most critical alarm
in the problem report. | related-alarms | is a count of the number of alarms in the problem
report. |
Cause The Health Monitor on the indicated system has detected faults
or error conditions in one or more system resources. The severity
field defines the urgency of the problem report. Effect System resources have been compromised. The problem report contains
a detailed status for each problem resource. Recovery On a remote system, use Syshealth to decode this problem report
on the Syshealth Main screen and the Alarm Viewing screen. If corrective
action is needed, dial in to the system that generated the report.
On a local system or while dialed in remotely, use Syshealth and other
Tandem Maintenance and Diagnostic Subsystem (TMDS) diagnostic tools
to locate and repair the problem. |
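The report-severity value in messages 3000 and 3001 is the highest perceived severity among the alarms in the report. A minimal sketch of that rule follows; the function name and the representation of alarms as a list of severity strings are assumptions for illustration.

```python
# Severities from least to most critical, as listed for report-severity.
SEVERITY_ORDER = [
    "WARNING-SEVERITY",
    "MINOR-SEVERITY",
    "MAJOR-SEVERITY",
    "CRITICAL-SEVERITY",
]


def report_severity(alarm_severities: list[str]) -> str:
    """Return the most critical severity among the alarms in a report."""
    if not alarm_severities:
        raise ValueError("report contains no alarms")
    # list.index gives each severity its rank; max picks the highest rank.
    return max(alarm_severities, key=SEVERITY_ORDER.index)
```

A report containing one MINOR-SEVERITY alarm and one MAJOR-SEVERITY alarm is therefore reported as MAJOR-SEVERITY.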
3002 System summary report for system-serial-number system. | system-serial-number | is the serial number of the NonStop Kernel from which
the system report originated. |
Cause The Syshealth Health Monitor is reporting a summary of system
activity since the last system summary report. Effect None Recovery Informational message only; no corrective action is needed. |
3100 System Health Monitor Internal Error. Error
Level: error‑level Error Type: error-type Error Code: error-code Error Tag: error-tag | error-level | is the severity level of the error. | error-type | is the type of internal error that occurred; for example,
Event Dispatcher Error. | error-code | further defines the cause of the error. This value
is different for each error type. | error-tag | is the location in code at which the internal error
occurred. The value has the format FILENAME_LINE-NUMBER. For example,
if an error occurred in source file DISPC at line 25, the value for error-tag is DISPC_25. |
Cause The Syshealth Health Monitor encountered an unexpected internal
error during execution. Effect The Syshealth Health Monitor abends. Recovery The Persistence Monitor restarts the Health Monitor a certain
number of times in case the condition is due to an intermittent problem.
This message should be reported to Tandem Development. |
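When reporting this message, it can help to split the error-tag value into its source file and line number. The following sketch assumes only the FILENAME_LINE-NUMBER format described above; the function name is hypothetical, and splitting at the last underscore is a precaution in case a file name itself contains an underscore.

```python
def parse_error_tag(tag: str) -> tuple[str, int]:
    """Split an error-tag such as DISPC_25 into (file name, line number)."""
    # Split at the last underscore so the line number is always the final part.
    filename, _, line = tag.rpartition("_")
    if not filename or not line.isdecimal():
        raise ValueError(f"not in FILENAME_LINE-NUMBER format: {tag!r}")
    return filename, int(line)
```

Applied to the example in the text, `parse_error_tag("DISPC_25")` returns the pair `("DISPC", 25)`.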
3101 System Health Monitor External Error. Error
Level: error‑level Error Type: error-type Error Code: error-code Error Tag: error-tag | error-level | is the severity level of the error. | error-type | is the type of external error that occurred. The following
types of external errors are defined:
- Subsystem Programmatic Interface (SPI) error.
- Operating system procedure call failed.
- Library function failed.
- Variable overflowed.
- Event Management Service (EMS) Distributor failed.
- Dynamic System Configuration (DSC) NEWPROCESS call failed.
- DSC SPI open failed.
- DSC communication error occurred.
- DSC returned unexpected error.
- Subsystem Control Point (SCP) NEWPROCESS call failed.
- SCP SPI open failed.
- SCP communication error occurred.
- SCP returned unexpected error.
- Memory error occurred.
- String list error occurred.
- Task Queue error occurred.
- Event missing required tokens.
- Required resource object missing.
- Open of $0 collector failed.
- Open of Tandem Maintenance and Diagnostic Subsystem (TMDS) collector failed.
- Unable to load required filter file.
- Alarm missing required tokens.
- Error accessing help file.
- Alarm database operation error.
- Error accessing configuration file.
- $RECEIVE input/output (I/O) error.
- Scripting error.
- Program abnormally terminated.
- Event Dispatcher error.
- Program terminated because of too many takeovers.
| error-code | further defines the cause of the error. This value
is different for each error type. | error-tag | is the location in code at which the error
occurred. The value has the format FILENAME_LINE-NUMBER. For example,
if an error occurred in source file DISPC at line 25, the value for error-tag is DISPC_25. |
Cause The Syshealth Health Monitor encountered an external error during
execution. Effect The Syshealth Health Monitor stops.
NOTE: The Persistence Monitor does not restart
the Health Monitor for an external error condition, because the Health
Monitor cannot continue to execute until the external problem is resolved.
Recovery Fix the external error condition and then start the Health Monitor
from the Syshealth Management screen. |
4001 Authorization to Deliver Remote Notification
Needed. Notification ID = action-id | action-id | is the ID (a 16-bit integer) of the notification that
needs to be authorized or denied authorization. This value is the
same as the notification identifier used in notification Subsystem
Programmatic Interface (SPI) commands. |
Cause An occurrence on the indicated system has triggered a remote
notification. What occurrences result in a notification depends on
the configuration of the notification module on that system. Typically,
notification triggers include problem reports (arising from system
resources that have encountered a problem), system-load summary reports,
or periodic system reports. Effect A system resource on the specified system may be unavailable. Recovery Run Syshealth on the indicated system, and examine unauthorized
notifications through the notification screen. Either authorize or
deny authorization to pending notifications. Authorized notifications
are forwarded through the configured notification ports (typically
to the NonStop Support Center). |
4002 Authorization to Deliver Remote Notification disposition. Notification ID = action-id. | disposition | indicates whether authorization has been granted.
It can have the value GRANTED or DENIED. | action-id | is the ID (a 16-bit integer) of the notification that
was authorized or denied authorization. This value is the same as
the notification identifier used in notification Subsystem Programmatic
Interface (SPI) commands. |
Cause A pending notification has been granted or denied authorization for delivery. Effect If authorization was granted, the indicated notification is delivered to all appropriate destinations. Recovery Informational message only; no corrective action is needed. |
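A script that scans the message log might extract the disposition and notification ID from message 4002. The following sketch assumes the logged text matches the template shown above exactly; the function name is hypothetical. Because the manual does not say whether the 16-bit ID is signed, the range check accepts both interpretations.

```python
import re
from typing import Optional

# Pattern built from the message 4002 template; the exact logged layout is assumed.
MESSAGE_4002 = re.compile(
    r"Authorization to Deliver Remote Notification (GRANTED|DENIED)\. "
    r"Notification ID = (-?\d+)"
)


def parse_4002(text: str) -> Optional[tuple[str, int]]:
    """Return (disposition, notification ID), or None if text is not message 4002."""
    match = MESSAGE_4002.search(text)
    if match is None:
        return None
    action_id = int(match.group(2))
    # Accept either a signed or an unsigned 16-bit value.
    if not -32768 <= action_id <= 65535:
        return None
    return match.group(1), action_id
```

Text that does not match the template, or whose ID falls outside the 16-bit range, yields `None` instead of a partial result.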
4003 Dial-out test performed | Cause A Syshealth user has issued a Test Dial-out command. Effect This command should cause a dial-out to occur, unless dial-out
is not enabled. Use the Syshealth Remote Notification screen to examine
the test results. Recovery Informational message only; no corrective action is needed. |
4004 Remote Notification Port Failing. error-text | error-text | is the text of the last port error message. |
Cause A notification path is failing. Examine the error-text portion of the message to determine the exact cause. Effect Remote notification for system resource problems is not occurring
on the specified system. Recovery Examine the error message and take corrective action. If no
corrective action is possible, disable the notification port through
the Syshealth user interface. Notify the Global NonStop Solution Center
(GNSC) of system failures manually until the port is repaired. |
4005 REMOTE NOTIFICATION PROCESS notif-process ENCOUNTERED EXCEPTION. CODE = module-error:text | notif-process | contains the name of the notification port process
that failed. | module-error | is the fatal error that was encountered by the notification
module. | text | is the text of the last notification module error
message. |
Cause The notification module encountered an exception that it recovered
from. Typical exceptions include the inability of the module to find
or read the notification database, a filter file, or other file. Effect The effect of this error depends on the nature of the exception.
If the notification module does not find the database or cannot read
it, the notification module creates a new one. Filter
files that cannot be found or read are ignored, leading to an increased
number of dial-outs. In any case, the notification function continues. Recovery Recovery action depends on the type of exception. |
4006 REMOTE NOTIFICATION PROCESS notif-process FAILED. ERROR = module-error:text | notif-process | contains the name of the notification port process
that failed. | module-error | is the fatal error that was encountered by the notification
module. | text | is the text of the last notification module error
message. |
Cause The notification module encountered an unrecoverable internal
error. Effect Remote notification is unavailable for a brief period, until
the Syshealth Persistence Monitor restarts it. Recovery Informational message only; no corrective action is needed. |
4007 Remote notification redelivery started. Notification
ID = action-id. | action-id | is the ID (a 16-bit integer) of the notification that
was redelivered. This value is the same as the notification identifier
used in notification Subsystem Programmatic Interface (SPI) commands. |
Cause A Syshealth user has requested redelivery for a remote notification
which could not be delivered earlier. Effect The indicated notification will be delivered to all appropriate
destinations. Recovery Informational message only; no corrective action is needed. |