Operator Messages Manual
Chapter 17 CPU Messages
The messages in this chapter are sent by the CPU subsystem.
The subsystem ID displayed by these messages includes CPU as the subsystem
name.
NOTE: Negative-numbered messages are common to most subsystems. If
you receive a negative-numbered message that is not described in this
chapter, see Chapter 15.
100 Processor Up, CPU cpu | Cause The specified processor was loaded. Effect The system reports the event and continues processing. Recovery This is an informational message only; no corrective action
is needed. |
101 Processor Down, CPU cpu | Cause The sending processor failed to receive an “I'm alive”
message from the specified processor within twice the configured polling
interval. Effect The sending processor assumes that the specified processor is
inoperable and transmits no further messages to it. Recovery If the specified processor was manually reset, this message
is informational only; no corrective action is needed. If this message
is generated for any other reason, there is a problem with the specified
processor. If a halt code is indicated, refer to the Processor
Halt Codes Manual for recovery information. If the halt
code is %000010, there is a hardware problem; use the Service Processor
(SP) Remote Maintenance Interface (RMI) screen to determine which
components are in error. |
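The detection rule behind message 101 can be sketched as a simple timeout check. This is a minimal illustration only: the function name, parameter names, and the use of seconds are hypothetical and are not part of the CPU subsystem. Only the "twice the configured polling interval" rule comes from the message description.

```python
def is_processor_down(last_alive_time, now, polling_interval):
    """Return True if no "I'm alive" message arrived within twice the polling interval.

    All arguments are in seconds. The names and units are illustrative
    assumptions, not an actual system interface.
    """
    elapsed = now - last_alive_time
    return elapsed > 2 * polling_interval
```

For example, with a hypothetical polling interval of 1 second, a processor last heard from 2.5 seconds ago would be treated as down, while one heard from 1.5 seconds ago would not.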
102 Processor Power On, CPU cpu | Cause Power was restored to the specified processor after a power
failure occurred. Effect The system reports the event and continues processing. Recovery This is an informational message only; no corrective action
is needed. |
103 Tandem Internal Maintenance Information (Microcode
Counters) | Cause This event generates a routine message every 24 hours. Effect None. Recovery This is an informational message only; no corrective action
is needed. |
104 Tandem Internal Maintenance Information (SCACHE or SP Checksum Microcode Counters are Nonzero) | Cause A nonzero value was detected in the secondary cache tokens or
the service processor (SP) checksum token. Effect This event is generated every two hours as long as the nonzero
condition remains. The nonzero values represent the total count of occurrences
in the last two hours. Recovery Contact your service provider. |
105 Processor Power Fail Warning, CPU cpu Computed Ride Through Time: n seconds Actual Ride Through Time : n seconds
Minimum Ride Through Time (All Cabinets): n seconds Maximum Ride Through Time (All Cabinets): n seconds | Cause A power-fail signal was detected by the service processor (SP)
and propagated to the processor module. Effect None. The system starts operating using power from the backup
battery. The indicated processor runs as long as there is enough battery
capacity to keep the processor running. Recovery This is an informational message only; no corrective action
is needed. |
106 Processor PON, CPU cpu Remaining Ride Through Time: n seconds | Cause Power was restored to the processor while in battery-backup
mode. Effect None. Recovery This is an informational message only; no corrective action
is needed. |
107 Shout Received Without PWARN by CPU cpu | Cause A processor received a power-fail shout packet without receiving
an earlier PWARN signal. Effect The processor indicated is allowed to regroup when power returns
to the processor; it is not marked “down.” Recovery This is an informational message only; no corrective action
is needed. |
200 CPU cpu-number taken
down by DIVER from terminal terminal‑name. | Cause An operator executed a DIVER command (executable only by a super-group
user (255, n)). Effect The DIVER command shuts down the processor specified in the
command. Recovery If the action was intentional, no corrective action is needed.
Otherwise, reload the processor. |
300 Hardware error freeze on processor
n | Cause A hardware error freeze (HEF) occurred in the specified processor.
The hardware has a lock-step error. Effect The processor is down. Recovery Replace the PMF or the memory that has the problem, and then
bring the processor back up again. Contact your service provider for
more assistance. |
306 Software halt halt-err-code,
CPU cpu | halt-err-code | is the halt error code. |
Cause The processor halted or the system froze because of a software
problem. Effect The processor is down. Recovery Take a memory dump and reload the processor. Contact your service
provider if you need help. |
311 (D-Series and G-Series RVUs) Correctable Memory Error, CPU cpu
311 (H-Series RVUs) CPU cpu, Slice slice had a {hard } CME
error at physical address 0xaddress in
memory module with DIMM dimm The memory
syndrome was 0xsyndrome1 0xsyndrome2. The number of times CME has occurred is times. The slice tracking id is track-id. {Page was not stealable. } {Too many CMEs. } | Cause A hard correctable memory error (CME) occurred in the specified
processor. A CME is a single-bit failure that remains in error after
the memory access is retried. Effect If possible, pages with hard CMEs are deallocated, which means
that they are removed from the pool of pages available for future
use. If the page is currently in use, the contents are copied to a
new page, and the page table is updated to point to the new page.
Pages with hard CMEs are not deallocated if more than 2 percent of
the physical pages are already deallocated or if the page is a locked
data page. Locked data pages are deallocated when the last byte on the
page is unlocked, which might never occur. If deallocation is impossible,
the interrupt mask disables CMEs for up to 11 minutes. Unlocked pages
and locked code pages are deallocated on hard CMEs until the 2 percent
limit is reached. Recovery The error is informational unless it occurs frequently. It is
correctable, but the condition is permanent and will always need correction. If this event occurs frequently, it can indicate that one or
more components are going bad. When there are many occurrences of
this event, look for patterns of occurrence. For example, check whether
the event occurs in the same physical page. |
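The deallocation rules described for message 311 can be summarized in a short sketch. This is an illustration under stated assumptions, not the operating system's implementation: the function and parameter names are hypothetical, and only the 2 percent limit and the locked-data-page exception are taken from the text above.

```python
def can_deallocate_page(deallocated_pages, total_pages, is_locked, is_data_page):
    """Decide whether a page with a hard CME may be deallocated.

    Rules taken from the message description:
      - no deallocation if more than 2 percent of physical pages
        are already deallocated;
      - locked data pages are not deallocated (until they are unlocked);
      - unlocked pages and locked code pages are deallocated.
    All names here are hypothetical.
    """
    # Integer arithmetic avoids floating-point rounding at the 2 percent boundary.
    if deallocated_pages * 100 > 2 * total_pages:
        return False
    if is_locked and is_data_page:
        return False
    return True
```

When this check fails, the text states that CMEs are masked for up to 11 minutes; that behavior is not modeled here.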
312 (D-Series and G-Series RVUs) Uncorrectable Memory Error, CPU cpu | Cause An uncorrectable memory error (UCME) occurred in the specified
processor. Effect The affected page of memory is deallocated from use, and the
process that used it abends. If the failing memory is essential to
system operation, the sending processor is brought down until the
UCME is fixed. Recovery If the error continues to occur, retain a copy of the message,
contact your service provider, and provide all relevant information
including:
Descriptions of the problem and accompanying symptoms
Details from the message or messages generated
Supporting documentation such as Event Management
Service (EMS) logs, trace files, and a processor dump, if applicable
If your local operating procedures require you to contact
the HP Global Mission Critical Solution Center (GMCSC), also supply
your system number and the numbers and versions of all related products. |
312 (H-Series RVUs) CPU cpu, Slice slice had a {hard} UCME error at physical address address in memory module with DIMM dimm. The slice tracking id is track-id.
{Page was not stealable. } | Cause A UCME occurred in the sending processor. Effect The NonStop Blade Element is reset. Recovery If the error continues to occur, retain a copy of the message,
contact your service provider, and provide all relevant information
including:
Descriptions of the problem and accompanying symptoms
Details from the message or messages generated
Supporting documentation such as Event Management
Service (EMS) logs, trace files, and a processor dump, if applicable
If your local operating procedures require you to contact
the HP Global Mission Critical Solution Center (GMCSC), also supply
your system number and the numbers and versions of all related products. |
313 CPU cpu, had a miscompare
of data at physical address address. {
All slices were different. } { Slice slice contained different data. } | cpu | is the processor that had the memory miscompare problem. | address | is the physical address where the problem occurred. | slice | is the NonStop Blade Element of the processor that
has a different value. If the value is -1, it is not possible to determine
which NonStop Blade Element has a problem, and the Sniffer does not
scrub the area. |
Cause The data in one or more NonStop Blade Elements of the processor
differs from the data in the other NonStop Blade Elements. Effect If the Sniffer can identify which NonStop Blade Element has
the bad data, Sniffer replaces the bad data with data copied from
another NonStop Blade Element; otherwise, Sniffer only marks the message
as critical. Recovery This is an informational message only; no corrective action
is needed. |
314 The Sniffer in CPU cpu has been instructed to sniff all of memory. It is starting at location address. There will be degradation in performance until
the sniffing is completed. | cpu | is the processor in which the Sniffer is running. | address | is the address where the rapid sniffing begins. |
Cause A privileged process has called the procedure SNIFFER_RAPIDCYCLE_
in order to cause the Sniffer to sniff all of memory as rapidly as
possible. Effect The Sniffer dominates the processor, beginning at the location
reported in the message and sniffing at a rapid pace until it reaches
that location again. The Sniffer does rendezvous and allow “I
am alive” messages to be delivered, but all other tasks become
extremely slow until the rapid cycle completes. Recovery This is an informational message only. |
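The rapid cycle described for message 314 is a wrap-around scan: it begins at the reported location and continues until it reaches that location again. A minimal sketch of that traversal order follows. The function name and the treatment of memory as a simple range of numbered locations are assumptions made for illustration; the actual granularity and addressing used by the Sniffer are not described in this chapter.

```python
def rapid_cycle_order(start, memory_size):
    """Yield every location exactly once, starting at `start` and wrapping around.

    `memory_size` is the number of locations to cover; both parameters
    are hypothetical stand-ins for the Sniffer's real address range.
    """
    for offset in range(memory_size):
        yield (start + offset) % memory_size
```

With five locations and a start of 2, the locations are visited in the order 2, 3, 4, 0, 1, after which the scan is back at its starting point.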
315 The Sniffer in CPU cpu has completed its rapid cycle of memory. Sniffing will now run at
its normal pace. | cpu | is the processor in which the Sniffer is running. |
Cause A privileged process called the procedure SNIFFER_RAPIDCYCLE_
in order to cause the Sniffer to sniff all of memory as rapidly as
possible. The cycle has completed, and sniffing returns to normal
speed. Effect The Sniffer and all other tasks resume their normal paces. Recovery This is an informational message only. |
316 The Sniffer in CPU cpu has been requested to stop sniffing. | cpu | is the processor in which the Sniffer is running. |
Cause A privileged process called the procedure SNIFFER_SUSPEND_ in
order to cause the Sniffer to stop sniffing. Effect While the Sniffer is suspended, CMEs, UCMEs, and miscomparisons
are not detected. Recovery This is an informational message only. |
317 The Sniffer in CPU cpu has started sniffing again. | cpu | is the processor in which the Sniffer is running. |
Cause A privileged process called the procedure SNIFFER_ACTIVATE_
in order to cause the Sniffer to resume sniffing. Effect The Sniffer resumes normal operation. Recovery This is an informational message only. |
318 The Sniffer in CPU cpu has asked to begin sniffing at location address. | cpu | is the processor in which the Sniffer is running. | address | is the address where the sniffing begins. |
Cause A privileged process has called the procedure SNIFFER_STARTHERE_
in order to cause the Sniffer to begin sniffing at a specific address. Effect The Sniffer begins sniffing at the requested location at its
normal pace. Skipped locations might remain unsniffed for as long
as 24 hours. Recovery This is an informational message only. |
319 CPU cpu, slice slice had a miscompare of data at physical address address. It has been successfully scrubbed. And now
contains the same data as the other slices. | cpu | is the processor that has had the memory miscompare
problem. | address | is the physical address where the problem occurred. | slice | is the NonStop Blade Element of the processor that
has a different value. |
Cause The data in one NonStop Blade Element of the processor differs
from the data in the other NonStop Blade Elements. Effect The problem is fixed. Recovery This is an informational message only. |
323 Tandem Internal Information (TSM) | Cause The specified processor stopped sending an “I am alive”
message. Effect The specified processor is halted by TSM or OSM. Recovery Take a memory dump of the specified processor and contact your
service provider. |
326 Millicode halt halt-err-code, CPU
cpu | halt-err-code | is the halt error code. |
Cause A software error caused the indicated processor to halt. Effect The processor is down. Recovery Take a memory dump of the affected processor and reload the
processor. If you need further assistance, contact your service provider. |
400 CPU cpu PIN pin reported [(millicode log)] with reason. Program File Name: program-file-name Detection Address: 0xda (DA - program
counter) [ DA File Name: da-file-name ] [ Detection Return: 0xra (RA - return address) ] [ DR File Name: dr-file-name ] [ History Address: 0xha (HA - program counter) ] [ HA File Name: ha-file-name ] [ History Return: 0xhr (HR - return address) ] [ HR File Name: hr-file-name ] | cpu | is the processor in which the VRO-related performance
issue was reported. | pin | is the PIN of the process for which the VRO-related
performance issue was reported. | reason | is the reason for the event. (See Table 17-1.) | program-file-name | is the program file name associated with the process
for which the VRO-related performance issue was reported. | da | is the value (obtained by means of the History API
or from millicode logged information) of the program counter (PC)
register at the time the VRO-related performance issue in the process
was detected. This address (native) might be in the program itself
or might be in one of the DLLs (ordinary, public, or implicit) associated
with the program. | da-file-name | is the file name of the program or DLL associated
with da. | ra | is the value (obtained by means of the History API
or from millicode logged information) of the procedure return address
at the time the VRO-related performance issue was detected. This address
(native) might be in the program itself or in one of the DLLs (ordinary,
public, or implicit) associated with the program. | dr-file-name | is the file name of the program or DLL associated
with ra. | ha | is another PC register value obtained by unwinding
history information in an attempt to find a value that is nonnegative
(probably outside public or implicit DLL space). If found (or if the
end of the history information is reached), this token is included
in the event. This address (native) might be in the program itself
or in one of the ordinary DLLs associated with the program, but it
is less likely than ZCPU-TKN-UNCP-DETN-ADDR to be in a public or implicit
DLL. | ha-file-name | is the file name of the program or DLL associated
with ha. | hr | is the value of the procedure return address associated
with the PC register value in ZCPU-TKN-UNCP-HIST-ADDR. This address
(native) might be in the program itself or in one of the ordinary
DLLs associated with the program, but it is less likely than ZCPU-TKN-UNCP-DETN-RTRN
to be in a public or implicit DLL. | hr-file-name | is the file name of the program or DLL associated
with hr.
Table 17-1 Reasons for VROs
Reason Displayed | Enumeration Name | Enumeration Number |
---|---|---|
too infrequent VROs | CONSTANT ZCPU-VAL-UNCP-REASON-INFRQ-VRO | 0 |
too frequent breakpoint register VROs | CONSTANT ZCPU-VAL-UNCP-REASON-FREQ-BPTV | 1 |
too frequent Break/VRO_SET_ calls | CONSTANT ZCPU-VAL-UNCP-REASON-FREQ-BRKV | 2 |
too frequent AutoVROs | CONSTANT ZCPU-VAL-UNCP-REASON-FREQ-AVRO | 3 |
|
Cause See Table 17-1. Effect The process continues to run, but the reported issue might affect
both its own performance and system performance. Recovery This is an informational event message only. |
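An application that processes message 400 might translate the reason number into the names listed in Table 17-1. The following is a minimal sketch of such a lookup. The dictionary and function names are hypothetical; the numbers, constant names, and displayed text are copied from the table.

```python
# Reason numbers, constant names, and displayed text as listed in Table 17-1.
VRO_REASONS = {
    0: ("ZCPU-VAL-UNCP-REASON-INFRQ-VRO", "too infrequent VROs"),
    1: ("ZCPU-VAL-UNCP-REASON-FREQ-BPTV", "too frequent breakpoint register VROs"),
    2: ("ZCPU-VAL-UNCP-REASON-FREQ-BRKV", "too frequent Break/VRO_SET_ calls"),
    3: ("ZCPU-VAL-UNCP-REASON-FREQ-AVRO", "too frequent AutoVROs"),
}


def describe_vro_reason(number):
    """Return (constant name, displayed text) for a reason number, or None if unknown."""
    return VRO_REASONS.get(number)
```

Returning None for an unlisted number, rather than raising an error, is a design choice of this sketch so that unexpected values can be logged and skipped.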
401 CPU cpu reported
rendezvous fault of type fault-type. Master
Control & Status (old): 0xold-status Master Control & Status (new): 0xnew-status | cpu | is the processor in which millicode reported the rendezvous
fault. | fault-type | is the type of rendezvous fault. (See Table 17-2.) | old-status | is the old value of the Master Control and Status Register. | new-status | is the new value of the Master Control and Status Register.
Table 17-2 Rendezvous Fault Types
Fault Type Displayed | Enumeration Name | Enumeration Number |
---|---|---|
Voter Error | CONSTANT ZCPU-VAL-RVF-TYPE-VOTER-ERROR | 1 |
Rendezvous Timeout | CONSTANT ZCPU-VAL-RVF-TYPE-RV-TIMEOUT | 2 |
Aging Error | CONSTANT ZCPU-VAL-RVF-TYPE-AGING-ERROR | 3 |
Sequence Number Mismatch | CONSTANT ZCPU-VAL-RVF-TYPE-SEQ-MISMATCH | 4 |
|
Cause A fault of one of the types listed in Table 17-2 occurred. Effect None. Recovery This is an informational event message only; recovery is automatic. |
402 CPU cpu VRO and RV
statistics for minutes elapsed minutes.
Ownership UNCP counts: GFH gfh-1 MER mer-1 report r-1 Ownership
N BPT VRO counts: GFH gfh-2 MER mer-2 report r-2 Ownership
N VRO counts: GFH gfh-3 MER mer-3 report r-3 Ownership
N AutoVRO counts: GFH gfh-4 MER mer-4 report r-4 Millicode
logged UNCP counts: MER mer-5 report r-5 Millicode logged UNCP PCB Not Active count: MER mer-6 VRO counts: Total total-vro Auto auto VRO calls: Break/VRO_SET_ vro VRO_SET_PRIV_ vro-priv UP counts: Total total-up Unique unique-up UP Inserted VRO count: inserted-vro UP In Priv counts: Total total-priv
Unique unique-priv UP In Priv With Lock
count: priv-with-lock UP In Priv Not Recorded
count: not-recorded UP False count: false UP Marginal count: marginal UP IIP PMU counts: Mismatch mismatch Near Match near-match | cpu | is the processor for which the statistics are reported. | minutes | is the elapsed time (in minutes) in which these statistics
were collected. The value is usually 1440, but it might differ for
the first such event after a processor is loaded. | gfh-1 | is the number of times GFH received ownership of a
process with too infrequent VROs. | mer-1 | is the number of times MER AP received ownership of
a process with too infrequent VROs. | r-1 | is the number of times MER AP reported (to $ZLOG)
a process with too infrequent VROs, as a result of receiving process
ownership. | gfh-2 | is the number of times GFH received ownership of a
process that reached a multiple of 0x20000 breakpoint-register VROs. | mer-2 | is the number of times MER AP received ownership of
a process that reached a multiple of 0x20000 breakpoint-register VROs. | r-2 | is the number of times MER AP reported (to $ZLOG)
a process with too frequent breakpoint-register VROs, as a result
of receiving process ownership. | gfh-3 | is the number of times GFH received ownership of a
process that reached a multiple of 0x20000 Break calls (including
VRO_SET_ calls). | mer-3 | is the number of times MER AP received ownership of
a process that reached a multiple of 0x20000 Break calls (including
VRO_SET_ calls). | r-3 | is the number of times MER AP reported (to $ZLOG)
a process with too frequent Break calls (including VRO_SET_ calls),
as a result of receiving process ownership. | gfh-4 | is the number of times GFH received ownership of a
process that reached a multiple of 0x20000 AutoVROs. | mer-4 | is the number of times MER AP received ownership of
a process that reached a multiple of 0x20000 AutoVROs. | r-4 | is the number of times MER AP reported (to $ZLOG)
a process with too frequent AutoVROs, as a result of receiving process
ownership. | mer-5 | is the number of times MER AP obtained information
from the millicode in-memory log about a process with too infrequent
VROs. | r-5 | is the number of times MER AP reported (to $ZLOG)
a process with too infrequent VROs, as a result of information obtained
from the millicode in-memory log. | mer-6 | is the number of times MER AP obtained information
from the millicode in-memory log about a process with too infrequent
VROs but found that the PCB associated with the PIN obtained from
the millicode was no longer active. | total-vro | is the number of VROs. | auto | is the number of AutoVROs (triggered by page fault
resulting from compiler-inserted instructions). | vro | is the number of VRO_SET_ calls. | vro-priv | is the number of VRO_SET_PRIV_ calls. | total-up | is the number of times a process with too infrequent
VROs was detected. | unique-up | is the number of times a unique process with too infrequent
VROs was detected. | inserted-vro | is the number of breakpoint-register VROs. | total-priv | is the number of times a privileged process with too
infrequent VROs was detected. | unique-priv | is the number of times a unique privileged process
with too infrequent VROs was detected. | priv-with-lock | is the number of times a privileged process with too
infrequent VROs was detected while it held a lock. | not-recorded | is the number of times a privileged process with too
infrequent VROs was not recorded in the millicode in-memory log (because
the log was full). | false | is the number of times a process with too infrequent
VROs was detected on this PE but was found to have adequate VROs on
at least one other PE. | marginal | is the number of times a process with too infrequent
VROs was detected by means of the timer interrupt but was found to
have adequate VROs during the synchronization phase. | mismatch | is the number of times that the IIP did not match
among the PEs after the process reached the leader's PMU instruction
count. | near-match | is the number of times that the IIP was within one
bundle (+ or -) among the PEs after the process reached the leader's
PMU instruction count. |
Cause This event is reported once per day for each processor at 4:00
AM local civil time (LCT). Effect None. Recovery This is an informational event message only; no corrective action
is needed. |
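Several of the ownership counts in message 402 are incremented when a per-process count reaches a multiple of 0x20000 (131072 in decimal). The check can be sketched as below. This is an illustration only: the constant and function names are hypothetical, and excluding a count of zero is an assumption of this sketch rather than something stated in the message description.

```python
# 0x20000 is the multiple named in the message 402 field descriptions.
REPORT_MULTIPLE = 0x20000  # 131072 decimal


def reached_report_multiple(count):
    """Return True when `count` is a positive multiple of 0x20000.

    Treating zero as "not reached" is an assumption made for this sketch.
    """
    return count > 0 and count % REPORT_MULTIPLE == 0
```

The usual elapsed time of 1440 minutes in the same message corresponds to one 24-hour reporting period (24 x 60 = 1440).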
440 CPU cpu-num Power Regulator setting has been changed (reason).
Power Regulator setting (old): old-setting
Power Regulator setting (new): new-setting |
| cpu-num | is the processor for which the Power Regulator setting
has changed. | reason | is the reason the Power Regulator setting has changed. | old-setting | is the old Power Regulator setting. One of the following
values: | new-setting | is the new Power Regulator setting. One of the following
values: |
Cause The processor Power Regulator setting has been changed, for
one of the following three reasons:
The user has disabled the Power Regulator using OSM
Service Connection, thus reverting the processor Power Regulator setting
to HP Static High Power. In OSM, this is identified as HP Power Regulator
Not Enabled (High Power) and the setting is Static High Performance.
The user has changed the Power Regulator setting via
iLO (Integrated Lights Out) using HP Insight Control power management.
The user has enabled the Power Regulator using OSM
Service Connection.
NOTE: Power efficiency is managed by HP Insight Control power management's
Power Regulator feature.
Effect The system reports this event and the processor continues operating
(in the new Power Regulator setting). Recovery This is an informational message only; no corrective action
is needed. |
|