Chapter 17 CPU Messages

The messages in this chapter are sent by the CPU subsystem. The subsystem ID displayed by these messages includes CPU as the subsystem name.




	NOTE: Negative-numbered messages are common to most subsystems. If you receive a negative-numbered message that is not described in this chapter, see Chapter 15.

100

Processor Up, CPU cpu

Cause The specified processor was loaded.

Effect The system reports the event and continues processing.

Recovery This is an informational message only; no corrective action is needed.

101

Processor Down, CPU cpu

Cause The sending processor failed to receive an “I'm alive” message from the specified processor within twice the configured polling interval.

Effect The sending processor assumes that the specified processor is inoperable and transmits no further messages to it.

Recovery If the specified processor was manually reset, this message is informational only; no corrective action is needed. If this message is generated for any other reason, there is a problem with the specified processor. If a halt code is indicated, refer to the Processor Halt Codes Manual for recovery information. If the halt code is % 000010, there is a hardware problem; use the Service Processor (SP) Remote Maintenance Interface (RMI) screen to determine which components are in error.
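The detection rule in the Cause for message 101 can be sketched as a simple timeout check. This is an illustrative sketch only, not the system's implementation: the function name, the parameter names, and the use of seconds as the time unit are assumptions, and so is the choice to treat a gap of exactly twice the polling interval as still alive.

```python
def is_processor_presumed_down(last_alive_time, current_time, polling_interval):
    """Return True when no "I'm alive" message arrived within twice the polling interval.

    All three arguments are hypothetical values in seconds. A gap of exactly
    2 * polling_interval is treated as still alive (an assumption).
    """
    silence = current_time - last_alive_time
    return silence > 2 * polling_interval


# Example with made-up numbers: polling every 1 second.
print(is_processor_presumed_down(last_alive_time=10, current_time=12, polling_interval=1))
print(is_processor_presumed_down(last_alive_time=10, current_time=13, polling_interval=1))
```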

102

Processor Power On, CPU cpu

Cause Power was restored to the specified processor after a power failure occurred.

Effect The system reports the problem and continues processing.

Recovery This is an informational message only; no corrective action is needed.

103

Tandem Internal Maintenance Information (Microcode Counters)

Cause This event generates a routine message every 24 hours.

Effect None.

Recovery This is an informational message only; no corrective action is needed.

104

Tandem Internal Maintenance Information (SCACHE or SP Checksum Microcode Counters are Nonzero)

Cause A nonzero value was detected in the secondary cache tokens or the service processor (SP) checksum token.

Effect This event is generated every two hours as long as the nonzero condition remains. The nonzero values represent the total count that occurred in the last two hours.

Recovery Contact your service provider.

105

Processor Power Fail Warning, CPU cpu
Computed Ride Through Time: n seconds
Actual Ride Through Time : n seconds
Minimum Ride Through Time (All Cabinets): n seconds
Maximum Ride Through Time (All Cabinets): n seconds

Cause A power-fail signal was detected by the service processor (SP) and propagated to the processor module.

Effect None. The system starts operating using power from the backup battery. The indicated processor runs as long as there is enough battery capacity to keep the processor running.

Recovery This is an informational message only; no corrective action is needed.
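If you collect message 105 text for later analysis, the four ride-through times can be extracted with a regular expression. The following is a minimal sketch that assumes the logged text matches the message text shown above; the function name and the sample values are hypothetical.

```python
import re

# Matches "<Kind> Ride Through Time [(All Cabinets)] : <n> seconds".
RIDE_THROUGH_PATTERN = re.compile(
    r"(Computed|Actual|Minimum|Maximum) Ride Through Time"
    r"\s*(?:\(All Cabinets\))?\s*:\s*(\d+) seconds"
)


def parse_ride_through_times(message_text):
    """Return a dict such as {"computed": 30, ...} with times in seconds."""
    return {
        kind.lower(): int(seconds)
        for kind, seconds in RIDE_THROUGH_PATTERN.findall(message_text)
    }


# Made-up sample text for illustration only.
sample = (
    "Processor Power Fail Warning, CPU 3 "
    "Computed Ride Through Time: 30 seconds "
    "Actual Ride Through Time : 28 seconds "
    "Minimum Ride Through Time (All Cabinets): 25 seconds "
    "Maximum Ride Through Time (All Cabinets): 40 seconds"
)
print(parse_ride_through_times(sample))
```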

106

Processor PON, CPU cpu
Remaining Ride Through Time: n seconds

Cause Power was restored to the processor while in battery-backup mode.

Effect None.

Recovery This is an informational message only; no corrective action is needed.

107

Shout Received Without PWARN by CPU cpu

Cause A processor received a power-fail shout packet without receiving an earlier PWARN signal.

Effect The processor indicated is allowed to regroup when power returns to the processor; it is not marked “down.”

Recovery This is an informational message only; no corrective action is needed.

200

CPU cpu-number taken down by DIVER from terminal terminal‑name.

Cause An operator executed a DIVER command. Only a super-group user (255, n) can execute this command.

Effect The DIVER command shuts down the processor specified in the command.

Recovery If the action was intentional, no corrective action is needed. Otherwise, reload the processor.

300

Hardware error freeze on processor n

Cause A hardware error freeze (HEF) occurred in the specified processor. The hardware has a lock-step error.

Effect The processor is down.

Recovery Replace the PMF or the memory that has the problem, and then bring the processor back up again. Contact your service provider for more assistance.

306

Software halt halt-err-code, CPU cpu

halt-err-code

is the halt error code.

Cause The processor halted or the system froze because of a software problem.

Effect The processor is down.

Recovery Take a memory dump and reload the processor. Contact your service provider if you need help.

311

D-Series and G-Series RVUs Message Text

Correctable Memory Error, CPU cpu

H-Series RVUs Message Text

CPU cpu, Slice slice had a {hard } CME error at physical address 0xaddress in memory module with DIMM dimm. The memory syndrome was 0xsyndrome1 0xsyndrome2. The number of times CME has occurred is times. The slice tracking id is track-id. {Page was not stealable. } {Too many CMEs. }

Cause A hard correctable memory error (CME) occurred in the specified processor. A CME is a single-bit failure that remains in error after the memory access is retried.

Effect If possible, pages with hard CMEs are deallocated, which means that they are removed from the pool of pages available for future use. If the page is currently in use, the contents are copied to a new page, and the page table is updated to point to the new page. Pages with hard CMEs are not deallocated if more than 2 percent of the physical pages are already deallocated or if the page is a locked data page.

Locked data pages are deallocated when the last byte on the page is unlocked, which might never occur. If deallocation is impossible, the interrupt mask disables CMEs for up to 11 minutes. Unlocked pages and locked code pages are deallocated on hard CMEs until the 2 percent limit is reached.

Recovery The error is informational unless it occurs frequently. It is correctable, but the condition is permanent and will always need correction.

If this event occurs frequently, it can indicate that one or more components are going bad. When there are many occurrences of this event, look for patterns of occurrence. For example, check whether the event occurs in the same physical page.
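The deallocation rules in the Effect for message 311 can be summarized as a small decision function. This is a sketch of the documented rules only, not the operating system's code: the function name, parameter names, and return strings are hypothetical, and checking the 2 percent limit before the locked-data test is an assumption.

```python
def hard_cme_page_action(is_locked_data_page, deallocated_pages, total_physical_pages):
    """Return the documented handling for a page that reported a hard CME."""
    # More than 2 percent of physical pages already deallocated: do not deallocate.
    if deallocated_pages * 100 > 2 * total_physical_pages:
        return "not deallocated (2 percent limit)"
    # Locked data pages wait until the last byte on the page is unlocked.
    if is_locked_data_page:
        return "deallocate when unlocked"
    # Unlocked pages and locked code pages are deallocated.
    return "deallocate"


# Made-up page counts for illustration.
print(hard_cme_page_action(False, 10, 1000))
print(hard_cme_page_action(True, 10, 1000))
print(hard_cme_page_action(False, 21, 1000))
```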

312 (D-Series and G-Series RVUs)

Uncorrectable Memory Error, CPU cpu

Cause An uncorrectable memory error (UCME) occurred in the specified processor.

Effect The affected page of memory is deallocated from use, and the process that used it abends. If the failing memory is essential to system operation, the sending processor is brought down until the UCME is fixed.

Recovery If the error continues to occur, retain a copy of the message, contact your service provider, and provide all relevant information including:

Descriptions of the problem and accompanying symptoms
Details from the message or messages generated
Supporting documentation such as Event Management Service (EMS) logs, trace files, and a processor dump, if applicable

If your local operating procedures require you to contact the HP Global Mission Critical Solution Center (GMCSC), also supply your system number and the numbers and versions of all related products.

312 (H-Series RVUs)

CPU cpu, Slice slice had a {hard} UCME error at physical address address in memory module with DIMM dimm. The slice tracking id is track-id. {Page was not stealable. }

Cause A UCME occurred in the sending processor.

Effect The NonStop Blade Element is reset.

Recovery If the error continues to occur, retain a copy of the message, contact your service provider, and provide all relevant information including:

Descriptions of the problem and accompanying symptoms
Details from the message or messages generated
Supporting documentation such as Event Management Service (EMS) logs, trace files, and a processor dump, if applicable

If your local operating procedures require you to contact the HP Global Mission Critical Solution Center (GMCSC), also supply your system number and the numbers and versions of all related products.

313

CPU cpu, had a miscompare of data at physical address address. { All slices were different. } { Slice slice contained different data. }

`cpu`	is the processor that had the memory miscompare problem.
`address`	is the physical address where the problem occurred.
`slice`	is the NonStop Blade Element of the processor that has a different value. If the value is -1, it is not possible to determine which NonStop Blade Element has a problem, and the Sniffer does not scrub the area.

Cause The data in one or more NonStop Blade Elements of the processor differs from the data in the other NonStop Blade Elements.

Effect If the Sniffer can identify which NonStop Blade Element has the bad data, Sniffer replaces the bad data with data copied from another NonStop Blade Element; otherwise, Sniffer only marks the message as critical.

Recovery This is an informational message only; no corrective action is needed.
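A monitoring script that reads message 313 might interpret the slice value as follows. This is a minimal sketch: the function name and return strings are hypothetical, and only the meaning of -1 comes from the description above.

```python
UNKNOWN_SLICE = -1  # documented value meaning the failing Blade Element is unknown


def sniffer_handling(slice_value):
    """Describe how the Sniffer handles a miscompare for the given slice value."""
    if slice_value == UNKNOWN_SLICE:
        # The Sniffer cannot tell which Blade Element is bad, so it does not scrub.
        return "not scrubbed; message marked critical"
    # The bad data is replaced with data copied from another Blade Element.
    return "scrubbed from another Blade Element"


print(sniffer_handling(-1))
print(sniffer_handling(1))
```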

314

The Sniffer in CPU cpu has been instructed to sniff all of memory. It is starting at location address. There will be degradation in performance until the sniffing is completed.

`cpu`	is the processor in which the Sniffer is running.
`address`	is the address where the rapid sniffing begins.

Cause A privileged process has called the procedure SNIFFER_RAPIDCYCLE_ in order to cause the Sniffer to sniff all of memory as rapidly as possible.

Effect The Sniffer dominates the processor, beginning at the location reported in the message and sniffing at a rapid pace until it reaches that location again. The Sniffer does rendezvous and allow “I am alive” messages to be delivered, but all other tasks become extremely slow until the rapid cycle completes.

Recovery This is an informational message only.

315

The Sniffer in CPU cpu has completed its rapid cycle of memory. Sniffing will now run at its normal pace.

cpu

is the processor in which the Sniffer is running.

Cause A privileged process called the procedure SNIFFER_RAPIDCYCLE_ in order to cause the Sniffer to sniff all of memory as rapidly as possible. The cycle has completed, and sniffing returns to normal speed.

Effect The Sniffer and all other tasks resume their normal paces.

Recovery This is an informational message only.

316

The Sniffer in CPU cpu has been requested to stop sniffing.

cpu

is the processor in which the Sniffer is running.

Cause A privileged process called the procedure SNIFFER_SUSPEND_ in order to cause the Sniffer to stop sniffing.

Effect While the Sniffer is suspended, CMEs, UCMEs, and miscomparisons are not detected.

Recovery This is an informational message only.

317

The Sniffer in CPU cpu has started sniffing again.

cpu

is the processor in which the Sniffer is running.

Cause A privileged process called the procedure SNIFFER_ACTIVATE_ in order to cause the Sniffer to resume sniffing.

Effect The Sniffer resumes normal operation.

Recovery This is an informational message only.

318

The Sniffer in CPU cpu has asked to begin sniffing at location address.

`cpu`	is the processor in which the Sniffer is running.
`address`	is the address where the sniffing begins.

Cause A privileged process has called the procedure SNIFFER_STARTHERE_ in order to cause the Sniffer to begin sniffing at a specific address.

Effect The Sniffer begins sniffing at the requested location at its normal pace. Skipped locations might remain unsniffed for as long as 24 hours.

Recovery This is an informational message only.

319

CPU cpu, slice slice had a miscompare of data at physical address address. It has been successfully scrubbed. And now contains the same data as the other slices.

`cpu`	is the processor that has had the memory miscompare problem.
`address`	is the physical address where the problem occurred.
`slice`	is the NonStop Blade Element of the processor that has a different value.

Cause The data in one NonStop Blade Element of the processor differs from the data in the other NonStop Blade Elements.

Effect The problem is fixed.

Recovery This is an informational message only.

323

Tandem Internal Information (TSM)

Cause The specified processor stopped sending an “I am alive” message.

Effect The specified processor is halted by TSM or OSM.

Recovery Take a memory dump of the specified processor and contact your service provider.

326

Millicode halt halt-err-code, CPU cpu

halt-err-code

is the halt error code.

Cause A software error caused the indicated processor to halt.

Effect The processor is down.

Recovery Take a memory dump of the affected processor and reload the processor. If you need further assistance, contact your service provider.

400

CPU cpu PIN pin reported [(millicode log)] with reason.
Program File Name: program-file-name
Detection Address: 0xda (DA - program counter)
[ DA File Name: da-file-name ]
[ Detection Return: 0xra (RA - return address) ]
[ DR File Name: dr-file-name ]
[ History Address: 0xha (HA - program counter) ]
[ HA File Name: ha-file-name ]
[ History Return: 0xhr (HR - return address) ]
[ HR File Name: hr-file-name ]

cpu

is the processor in which the VRO-related performance issue was reported.

pin

is the PIN of the process for which the VRO-related performance issue was reported.

reason

is the reason for the event. (See Table 17-1.)

program-file-name

is the program file name associated with the process for which the VRO-related performance issue was reported.

da

is the value (obtained by means of the History API or from millicode logged information) of the program counter (PC) register at the time the VRO-related performance issue in the process was detected. This address (native) might be in the program itself or might be in one of the DLLs (ordinary, public, or implicit) associated with the program.

da-file-name

is the file name of the program or DLL associated with da.

ra

is the value (obtained by means of the History API or from millicode logged information) of the procedure return address at the time the VRO-related performance issue was detected. This address (native) might be in the program itself or in one of the DLLs (ordinary, public, or implicit) associated with the program.

dr-file-name

is the file name of the program or DLL associated with ra.

ha

is another PC register value obtained by unwinding history information in an attempt to find a value that is nonnegative (probably outside public or implicit DLL space). If found (or if the end of the history information is reached), this token is included in the event. This address (native) might be in the program itself or in one of the ordinary DLLs associated with the program, but it is less likely than ZCPU-TKN-UNCP-DETN-ADDR to be in a public or implicit DLL.

ha-file-name

is the file name of the program or DLL associated with ha.

hr

is the value of the procedure return address associated with the PC register value in ZCPU-TKN-UNCP-HIST-ADDR. This address (native) might be in the program itself or in one of the ordinary DLLs associated with the program, but it is less likely than ZCPU-TKN-UNCP-DETN-RTRN to be in a public or implicit DLL.

hr-file-name

is the file name of the program or DLL associated with hr.

Table 17-1 Reasons for VROs

Reason Displayed	Enumeration Name	Enumeration Number
too infrequent VROs	CONSTANT ZCPU-VAL-UNCP-REASON-INFRQ-VRO	0
too frequent breakpoint register VROs	CONSTANT ZCPU-VAL-UNCP-REASON-FREQ-BPTV	1
too frequent Break/VRO_SET_ calls	CONSTANT ZCPU-VAL-UNCP-REASON-FREQ-BRKV	2
too frequent AutoVROs	CONSTANT ZCPU-VAL-UNCP-REASON-FREQ-AVRO	3

Cause See Table 17-1.

Effect The process continues to run, but the reported issue might affect both its own performance and system performance.

Recovery This is an informational event message only.
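When processing message 400 programmatically, the enumeration numbers in Table 17-1 can be mapped back to their names and displayed text. The sketch below copies the values from the table; the dictionary and function names are hypothetical, and the fallback for numbers outside the table is an assumption.

```python
# Enumeration number -> (enumeration name, reason displayed), from Table 17-1.
VRO_REASONS = {
    0: ("ZCPU-VAL-UNCP-REASON-INFRQ-VRO", "too infrequent VROs"),
    1: ("ZCPU-VAL-UNCP-REASON-FREQ-BPTV", "too frequent breakpoint register VROs"),
    2: ("ZCPU-VAL-UNCP-REASON-FREQ-BRKV", "too frequent Break/VRO_SET_ calls"),
    3: ("ZCPU-VAL-UNCP-REASON-FREQ-AVRO", "too frequent AutoVROs"),
}


def describe_vro_reason(number):
    """Return a readable description for a Table 17-1 enumeration number."""
    name, text = VRO_REASONS.get(number, ("unknown", "unknown reason"))
    return f"{text} ({name} = {number})"


print(describe_vro_reason(0))
print(describe_vro_reason(3))
```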

401

CPU cpu reported rendezvous fault of type fault-type.
Master Control & Status (old): 0xold-status
Master Control & Status (new): 0xnew-status

cpu

is the processor in which millicode reported the rendezvous fault.

fault-type

is the type of rendezvous fault. (See Table 17-2.)

old-status

is the old value of Master Control and Status Register.

new-status

is the new value of Master Control and Status Register.

Table 17-2 Rendezvous Fault Types

Fault Type Displayed	Enumeration Name	Enumeration Number
Voter Error	CONSTANT ZCPU-VAL-RVF-TYPE-VOTER-ERROR	1
Rendezvous Timeout	CONSTANT ZCPU-VAL-RVF-TYPE-RV-TIMEOUT	2
Aging Error	CONSTANT ZCPU-VAL-RVF-TYPE-AGING-ERROR	3
Sequence Number Mismatch	CONSTANT ZCPU-VAL-RVF-TYPE-SEQ-MISMATCH	4

Cause A rendezvous fault of one of the types listed in Table 17-2 occurred.

Effect None.

Recovery This is an informational event message only; recovery is automatic.
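Message 401 reports the old and new values of the Master Control and Status Register in hexadecimal. When comparing the two, it can help to list which bit positions changed. The sketch below is a generic bit comparison under the assumption that both values are plain hexadecimal strings; it does not interpret the meaning of any bit, which this chapter does not define. The function name and sample values are hypothetical.

```python
def changed_bit_positions(old_status_hex, new_status_hex):
    """Return the bit positions (0 = least significant) that differ."""
    old_value = int(old_status_hex, 16)  # accepts an optional "0x" prefix
    new_value = int(new_status_hex, 16)
    difference = old_value ^ new_value   # 1 bits mark positions that changed
    return [bit for bit in range(difference.bit_length()) if (difference >> bit) & 1]


# Made-up register values for illustration only.
print(changed_bit_positions("0x00F0", "0x00F1"))
print(changed_bit_positions("0x8", "0x1"))
```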

402

CPU cpu VRO and RV statistics for minutes elapsed minutes.
Ownership UNCP counts: GFH gfh-1 MER mer-1 report r-1
Ownership N BPT VRO counts: GFH gfh-2 MER mer-2 report r-2
Ownership N VRO counts: GFH gfh-3 MER mer-3 report r-3
Ownership N AutoVRO counts: GFH gfh-4 MER mer-4 report r-4
Millicode logged UNCP counts: MER mer-5 report r-5
Millicode logged UNCP PCB Not Active count: MER mer-6
VRO counts: Total total-vro Auto auto
VRO calls: Break/VRO_SET_ vro VRO_SET_PRIV_ vro-priv
UP counts: Total total-up Unique unique-up
UP Inserted VRO count: inserted-vro
UP In Priv counts: Total total-priv Unique unique-priv
UP In Priv With Lock count: priv-with-lock
UP In Priv Not Recorded count: not-recorded
UP False count: false
UP Marginal count: marginal
UP IIP PMU counts: Mismatch mismatch Near Match near-match

`cpu`	is the processor for which the statistics are reported.
`minutes`	is the elapsed time (in minutes) in which these statistics were collected. The value is usually 1440, but it might differ for the first such event after a processor is loaded.
`gfh-1`	is the number of times GFH received ownership of a process with too infrequent VROs.
`mer-1`	is the number of times MER AP received ownership of a process with too infrequent VROs.
`r-1`	is the number of times MER AP reported (to $ZLOG) a process with too infrequent VROs, as a result of receiving process ownership.
`gfh-2`	is the number of times GFH received ownership of a process that reached a multiple of 0x20000 breakpoint-register VROs.
`mer-2`	is the number of times MER AP received ownership of a process that reached a multiple of 0x20000 breakpoint-register VROs.
`r-2`	is the number of times MER AP reported (to $ZLOG) a process with too frequent breakpoint-register VROs, as a result of receiving process ownership.
`gfh-3`	is the number of times GFH received ownership of a process that reached a multiple of 0x20000 Break calls (including VRO_SET_ calls).
`mer-3`	is the number of times MER AP received ownership of a process that reached a multiple of 0x20000 Break calls (including VRO_SET_ calls).
`r-3`	is the number of times MER AP reported (to $ZLOG) a process with too frequent Break calls (including VRO_SET_ calls), as a result of receiving process ownership.
`gfh-4`	is the number of times GFH received ownership of a process that reached a multiple of 0x20000 AutoVROs.
`mer-4`	is the number of times MER AP received ownership of a process that reached a multiple of 0x20000 AutoVROs.
`r-4`	is the number of times MER AP reported (to $ZLOG) a process with too frequent AutoVROs, as a result of receiving process ownership.
`mer-5`	is the number of times MER AP obtained information from the millicode in-memory log about a process with too infrequent VROs.
`r-5`	is the number of times MER AP reported (to $ZLOG) a process with too infrequent VROs, as a result of information obtained from the millicode in-memory log.
`mer-6`	is the number of times MER AP obtained information from the millicode in-memory log about a process with too infrequent VROs but found that the PCB associated with the PIN obtained from the millicode was no longer active.
`total-vro`	is the number of VROs.
`auto`	is the number of AutoVROs (triggered by page fault resulting from compiler-inserted instructions).
`vro`	is the number of VRO_SET_ calls.
`vro-priv`	is the number of VRO_SET_PRIV_ calls.
`total-up`	is the number of times a process with too infrequent VROs was detected.
`unique-up`	is the number of times a unique process with too infrequent VROs was detected.
`inserted-vro`	is the number of breakpoint-register VROs.
`total-priv`	is the number of times a privileged process with too infrequent VROs was detected.
`unique-priv`	is the number of times a unique privileged process with too infrequent VROs was detected.
`priv-with-lock`	is the number of times a privileged process with too infrequent VROs was detected while it held a lock.
`not-recorded`	is the number of times a privileged process with too infrequent VROs was not recorded in the millicode in-memory log (because the log was full).
`false`	is the number of times a process with too infrequent VROs was detected on this PE but was found to have adequate VROs on at least one other PE.
`marginal`	is the number of times a process with too infrequent VROs was detected by means of the timer interrupt but was found to have adequate VROs during the synchronization phase.
`mismatch`	is the number of times that the IIP did not match among the PEs after the process reached the leader's PMU instruction count.
`near-match`	is the number of times that the IIP was within one bundle (+ or -) among the PEs after the process reached the leader's PMU instruction count.

Cause This event is reported once per day for each processor at 4:00 AM local civil time (LCT).

Effect None.

Recovery This is an informational event message only; no corrective action is needed.
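Two numbers in the message 402 descriptions lend themselves to a quick check: the 0x20000 multiple used for the "too frequent" counts, and the usual 1440-minute collection period (24 hours of 60 minutes). The sketch below shows the arithmetic only; the function names are hypothetical, and treating a count of zero as not having reached a multiple is an assumption.

```python
VRO_COUNT_MULTIPLE = 0x20000   # 131072 in decimal
MINUTES_PER_DAY = 24 * 60      # the usual value of the minutes field


def reached_multiple(event_count):
    """Return True when a nonzero count is an exact multiple of 0x20000."""
    return event_count > 0 and event_count % VRO_COUNT_MULTIPLE == 0


def events_per_minute(event_count, elapsed_minutes=MINUTES_PER_DAY):
    """Average a reported count over the collection period."""
    return event_count / elapsed_minutes


print(VRO_COUNT_MULTIPLE, MINUTES_PER_DAY)
print(reached_multiple(0x40000), reached_multiple(0x20001))
print(events_per_minute(2880))
```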

440

CPU cpu-num Power Regulator setting has been changed (reason). 
Power Regulator setting (old): old-setting 
Power Regulator setting (new): new-setting

`cpu-num`	is the processor for which the Power Regulator setting has changed.
`reason`	is the reason the Power Regulator setting has changed.
`old-setting`	is the old Power Regulator setting. One of the following values: Static Low Power, Static High Performance, or Dynamic Power Savings.
`new-setting`	is the new Power Regulator setting. One of the following values: Static Low Power, Static High Performance, or Dynamic Power Savings.

Cause The processor Power Regulator setting has been changed, for one of the following three reasons:




	NOTE: Power efficiency is managed by HP Insight Control power management's Power Regulator feature.

The user has disabled the Power Regulator using OSM Service Connection, thus reverting the processor Power Regulator setting to HP Static High Power. In OSM, this is identified as HP Power Regulator Not Enabled (High Power) and the setting is Static High Performance.
The user has changed the Power Regulator setting via iLO (Integrated Lights Out) using HP Insight Control power management.
The user has enabled the Power Regulator using OSM Service Connection.

Effect The system reports this event and the processor continues operating (in the new Power Regulator setting).

Recovery This is an informational message only; no corrective action is needed.