Operator Messages Manual

Chapter 101 SPR (Service Processor) Messages

The messages in this chapter are generated by the Service Processor subsystem. The SPR subsystem records messages sent by the service processor (SP) concerning:

  • Customer-replacement unit (CRU) insertion and removal

  • Power and environment problems

  • ServerNet configuration and hardware problems

The $ZSPE process:

  • Receives service processor messages.

  • Translates them to event management subsystem (EMS) format.

  • Logs them to $ZLOG (service event log) and in some cases $0 (operator log); messages sent to $0 are noted within the message effect.

Messages sent to $0 and/or $ZLOG are viewable using the OSM or TSM Event Viewer applications.

The subsystem ID displayed by these messages includes SPR as the subsystem name.

NOTE: Negative-numbered messages are common to most subsystems. If you receive a negative-numbered message that is not described in this chapter, see Chapter 15.


101

CRU in slot group.module.slot has lost power.

group.module.slot

is the three-part slot location identifier of the CRU that lost power, in the format GRP-nn.MOD-nn.SLOT-nn.

Cause  (1) An external power failure occurred; or (2) the CRU has failed; or (3) the NonStop™ S70000 power supply in the specified location has failed; or (4) the DC power cable has been disconnected from the NonStop S70000 PMF CRU in the specified location (possibly as part of a replacement operation).

Effect  (1) None; the CRU gets power from the battery packs until the battery charge is exhausted. (2) None; the backup CRU takes over until the failed CRU can be replaced. (3) None; the NonStop S70000 PMF CRU continues operation, powered by the second power supply in the enclosure. (4) The processor and all components in this NonStop S70000 PMF CRU halt.

Recovery  (1) When power is restored, replace the batteries. (2) Use OSM or TSM to determine which CRU has failed, and replace it following the detailed procedure in the HP NonStop S-Series Hardware Support Guide. (3) Replace the failed S70000 power supply following the detailed procedure in the HP NonStop S-Series Hardware Support Guide. (4) If the DC power cable was disconnected as part of a replacement procedure, complete the replacement operation; otherwise, reconnect the DC power cable.
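The group.module.slot identifier used in this and later messages can be split into its three parts by scripts that post-process logged events. The following is a minimal sketch only. It assumes that nn stands for one or more decimal digits, which this chapter does not state explicitly, and the name parse_slot_location is hypothetical and not part of any OSM or TSM interface.

```python
import re

# Assumption: "nn" in GRP-nn.MOD-nn.SLOT-nn is one or more decimal digits.
_SLOT_RE = re.compile(r"GRP-(\d+)\.MOD-(\d+)\.SLOT-(\d+)")


def parse_slot_location(text):
    """Return (group, module, slot) as integers, or None if text does not match.

    Hypothetical helper for illustration; not part of any HP tool.
    """
    match = _SLOT_RE.fullmatch(text.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())
```

Returning None for anything that does not match the whole pattern keeps malformed or truncated identifiers from being silently misread.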



103

System power was lost.

Cause  An external AC power failure occurred. This can be caused by (1) the AC power cord being pulled from the AC receptacle or from the NonStop™ S7000 processor multifunction (PMF) CRU or the NonStop S70000 power supply in the specified location, or (2) the loss of AC power to the receptacle, or (3) the loss of AC power to the building.

Effect  (1), (2) For the NonStop S7000 PMF CRU: the components in the CRU continue to operate using power supplied by the battery until the battery charge is exhausted; if the power outage is longer than this time, the components in the S7000 PMF CRU halt.

For the NonStop S70000 PMF CRU: none; the CRU continues to operate using the second power supply in the enclosure.

(3) For all CRUs: the system continues to operate during the calculated power-fail delay time. Then it saves the state of all registers in memory, the processors are reset, the disk and adapter CRUs are powered off, and the batteries maintain power only to the service processor and the processor memories. If power is restored before the battery charges are exhausted, recovery is automatic. If power is not restored before the battery charges are exhausted, the contents of memory are lost.

Recovery  (1) Check the AC power cord and reconnect it as necessary. (2) Connect the AC power cord to another receptacle that provides a dedicated circuit with sufficient amperage. (3) Work with the power provider to correct the problem.



104

System power has returned.

Cause  External power was restored to the CRU after a power loss. This can be caused by (1) the AC power cord being connected to the AC receptacle or to the CRU (NonStop™ S7000 processor multifunction (PMF) CRU or NonStop S70000 power supply) in the specified location, (2) the restoration of AC power to the receptacle or the insertion of the AC power cord in a working receptacle, or (3) the restoration of AC power to the building.

Effect  (1), (2) For the NonStop S7000 PMF CRU: if the battery was able to supply power during the power outage time, none. If the battery charge was exhausted, the processor is halted, and the contents of its memory are lost.

For the NonStop S70000 power supply: none; the CRUs in the enclosure continue to operate using power supplied by the second power supply in the enclosure.

(3) If the power outage time was less than the calculated power-fail delay time, none. If the power outage was longer than this time, but the batteries were able to maintain the processors’ memory, none. If the batteries were exhausted, the system is halted.

Recovery  (1), (2) For NonStop S7000 PMF CRU: informational message only; no corrective action is needed unless the power outage was long enough to exhaust the charge of the battery. In this case you must reload the processor. See the HP NonStop S-Series Operations Guide or the OSM or TSM online help for instructions on reloading a processor.

For NonStop S70000 power supply: informational message only; no corrective action is needed.

(3) If the power outage time was less than the calculated power-fail delay time, this is an informational message only; no corrective action is needed. If the power outage was longer than this time, but the batteries were able to maintain the processors’ memory, this is an informational message only; no corrective action is needed, because the system is automatically restarted at the point of power interruption. If the batteries were exhausted, you must perform a system load.
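The three outcomes of recovery case (3) can be summarized as a small decision function. This is an illustrative sketch only: the function and parameter names are hypothetical, and the chapter does not say how an operator would obtain the outage duration or the calculated power-fail delay time. An outage exactly equal to the delay time is not covered by the text; the sketch treats it like a longer outage.

```python
def recovery_after_building_power_return(outage_seconds,
                                         power_fail_delay_seconds,
                                         batteries_exhausted):
    """Map the case (3) conditions of message 104 to the documented recovery.

    All names here are hypothetical; the inputs are assumed to be known.
    """
    if outage_seconds < power_fail_delay_seconds:
        # The system kept running for the whole outage.
        return "informational; no corrective action"
    if not batteries_exhausted:
        # Memory was preserved, so the system restarts where it stopped.
        return "informational; automatic restart"
    # Memory contents were lost.
    return "perform a system load"
```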



105

Bulk power was lost from power supply group.module.slot.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU that has lost bulk power from its power supply.

Cause  (1) The NonStop™ S70000 power supply in the specified location has failed; or (2) the DC power cable has been disconnected from the NonStop S70000 PMF CRU in the specified location (possibly as part of a replacement operation).

Effect  (1) None; the NonStop S70000 PMF CRU continues operation, powered by the second power supply in the enclosure. (2) The processor and all components in this NonStop S70000 PMF CRU halt.

Recovery  (1) Replace the NonStop S70000 power supply in the specified location. See the HP NonStop S-Series Hardware Support Guide for the detailed procedure. (2) If the cable was disconnected as part of a replacement operation, complete the replacement operation; otherwise reconnect the DC power cable. For more information, see HP NonStop S-Series Hardware Support Guide.



106

Battery group.module.slot has exhausted all of its available power.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the battery that has exhausted its available power.

Cause  The battery in slot 23 or 28 of the NonStop™ S7000 or the NonStop S70000 has exhausted its time limit or the battery has malfunctioned.

Effect  The contents of memory are lost.

Recovery  Replace the battery in the specified location. For more information, see the HP NonStop S‑Series Hardware Support Guide. Reload the processor that halted and that receives power from the replacement battery. See the HP NonStop S‑Series Operations Guide or the OSM or TSM online help for instructions on reloading a processor. Make sure that the AC power cord is connected and that the PMF CRU is operating correctly.



108

Room temperature is out of limits. Estimated temperature outside enclosure is n degrees Centigrade.

n

is the temperature, in degrees Centigrade, of the room in which the enclosure containing the specified module is located. This temperature may vary from the actual room temperature by two degrees Centigrade.

Cause  Room temperature, as estimated by measurements made by a power monitor and control unit (PMCU) CRU on air that has passed the disk drives, is above or below acceptable limits. Either the room temperature surrounding the enclosure is excessively high or low, or the enclosure door has been open too long for proper air circulation inside the enclosure.

Effect  This event is delivered to both the service log, $ZLOG, and to the operator log, $0. System hardware components may fail if the condition is not corrected. The hardware failures may lead to a subsystem outage or a full system outage. The hardware damage may be permanent, requiring multiple CRU replacement, if the condition persists for an extended period of time.

Recovery  If the room temperature is out of the acceptable range, restore the proper environment as soon as possible. If the room temperature cannot be controlled, the affected system or group must be shut down and powered off to prevent a system failure and damage to the hardware. If the enclosure door is open and the actual room temperature is within limits, close the enclosure door and monitor the service event log ($ZLOG) or the operator log ($0) for SPR message 109.



109

Room temperature is back within acceptable limits. Estimated temperature outside enclosure is n degrees Centigrade.

n

is the temperature, in degrees Centigrade, of the room in which the enclosure containing the specified module is located. This temperature may vary from the actual room temperature by two degrees Centigrade.

Cause  Room temperature, estimated by a power monitor and control unit (PMCU) CRU, is back in limits after having gone out of limits. Whatever action was taken to bring temperature back in limits has succeeded for the measurement taken at one of the two sensors in an S7000/S70000 enclosure. (Note: Temperature sensors for S7000/S70000 enclosures reside on PMCU CRUs. There are two sensors in each enclosure, each of which independently estimates the temperature outside its enclosure.)

Effect  This event is delivered to both the service log, $ZLOG, and to the operator log, $0. System failure is no longer imminent, unless another PMCU still reports an unacceptable temperature.

Recovery  Make certain that both temperature sensors in the enclosure report acceptable values.



110

CRU insertion for group.module.slot passed preliminary self‑tests.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU that was inserted.

Cause  A CRU was physically inserted in a slot monitored by a service processor, and the basic Service Processor Power-On step for the CRU succeeded.

Effect  A successful insertion was performed. Software can now initialize the CRU and add its location to ServerNet router tables, if necessary for system expansion, but these steps are not yet complete; system software will proceed with the initialization and configuration of the CRU.

Recovery  Informational message only; no corrective action is needed.



111

CRU insertion for group.module.slot failed preliminary self‑tests.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU that was inserted.

Cause  A CRU was physically inserted in a slot monitored by a service processor, but the basic CRU Power-On step attempted by the service processor failed. The firmware information record (FIR) on the CRU may be unreadable or corrupt.

Effect  An unsuccessful insertion was performed. Software cannot initialize the CRU. If the CRU was inserted as part of a system expansion operation, the CRU has not yet been added to the ServerNet router tables.

Recovery  Remove the CRU and return it with the following information:

  • A copy of all the files, including the service event log ($ZLOG), on $SYSTEM.ZSERVICE

  • The operator log ($0)

Obtain another CRU for repair or expansion purposes.



112

CRU group.module.slot was removed.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU that was removed.

Cause  A CRU was physically removed from a slot monitored by a service processor.

Effect  The CRU is no longer available to the system; it is removed from the OSM or TSM inventory. You can proceed with the replacement of the CRU if this action was part of a CRU replacement operation or with the installation of the CRU if this action was part of a system reconfiguration operation.

Recovery  Informational message only; no corrective action is needed.



114

A voltage measurement on battery group.module.slot is out of limits.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the battery.

Cause  Voltage measurements on a battery CRU are out of acceptable limits. Either the total voltage drop across a bank of cells (the upper or lower half of the battery cells) or the differential voltage between the upper and lower bank of cells has gone out of the acceptable range. A cell may have become weak within the battery. In addition, a test, system power outage, or a power monitor and control unit (PMCU) failure can cause battery voltages to go out of limits.

Effect  The battery may not be capable of supplying sufficient power during a system power outage, so that an operating system resource such as processor memory contents may be unavailable after power is restored.

Recovery  If there was no interruption of external AC power, replace the battery CRU and monitor its readings over the next 24 hours. If there was an interruption of AC power to this power rail when this event occurred, the battery may appear to be out of limits while charging during the 24 hours following the restoration of AC power. A replacement battery should have acceptable readings after 24 hours have elapsed, as indicated by the Battery-Back-In-Limits event (SPR message 115). If not, a second replacement battery CRU may be required.

NOTE: Event 115 may not be generated if the system is down or if a service processor failure and system load occur.


115

Voltage measurements on battery group.module.slot are back in limits.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the battery.

Cause  The measured voltages of the battery in the specified location are back within normal limits after being out of limits. Most probably, the battery in the specified location was replaced with a good battery. Otherwise, the battery voltage measurements are fluctuating because of a test, normal system power fail recovery, or possibly a power monitor and control unit (PMCU) failure.

Effect  None. The battery is functioning properly, as far as can be determined without a load-bearing test.

Recovery  Informational message only; no corrective action is needed.



116

A power measurement for S7000 power supply group.module.slot is out of limits.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the processor multifunction (PMF) or I/O multifunction (IOMF) customer‑replacement unit (CRU) that contains the power supply.

Cause  Voltage measurements on an S7000 (see note) power supply are out of acceptable limits. The power supply in the PMF CRU or the IOMF CRU is not providing correct voltages to the power rail. The power supply may be failing. This event may occur when removing a PMF or IOMF CRU, in which case the power supply is not necessarily defective.

NOTE: The S7000 in this message refers to all CRUs with an internal S7000 style power supply (including S7000 PMF and IOMF CRUs).

Effect  The PMF or IOMF CRU is in danger of losing sufficient power to run the components in the CRU, or may have already lost power and switched to battery backup. The components, such as the processor (PMF CRU only), the small computer system interface (SCSI) controllers, the ServerNet router, and the service processor all may become unavailable in that PMF or IOMF CRU.

Recovery  Replace the PMF or IOMF CRU in the specified location with a new or repaired PMF or IOMF CRU.



117

Power measurements for S7000 power supply group.module.slot are back in limits.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the processor multifunction (PMF) or I/O multifunction (IOMF) customer‑replacement unit (CRU) that contains the power supply.

Cause  Voltage measurements on an S7000 (see note) power supply are back within acceptable limits after being out of limits. Most probably, the PMF or IOMF CRU in the specified location was replaced with a good PMF or IOMF CRU. Otherwise, the power voltage measurements are fluctuating because of a test, normal system power fail recovery, or possibly a power monitor and control unit (PMCU) failure.

NOTE: The S7000 in this message refers to all CRUs with an internal S7000 style power supply (including S7000 PMF and IOMF CRUs).

Effect  The power supply and mixing diodes are functioning properly, as far as can be determined without a power-margining test.

Recovery  If an IOMF CRU was just inserted, this is an informational message only; no corrective action is needed.

If a PMF CRU was just inserted, you must reload the processor. See the HP NonStop S-Series Hardware Support Guide, the HP NonStop S-Series Operations Guide, or the OSM or TSM online help for instructions on reloading a processor. No further corrective action is needed, unless unexplained fluctuations in power voltage measurements are occurring. If unexplained power voltage fluctuations occur (in and out of limits repeatedly), then the PMF CRU power supply or the monitoring PMCU may be defective.



118

Fan group.module.slot speed is out of acceptable limits. Estimated fan speed is s +- 300 rpm.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the fan CRU that is operating at an incorrect speed.

s

is the estimated speed of the specified fan in revolutions per minute. This fan speed measurement may vary by 300 rpm from the actual fan speed.

Cause  A fan CRU is operating at an incorrect speed. The fan may be defective or failing because of age, a test may have been executed on the fan, or the fan was shut off during a system power failure to preserve battery power for processor memory.

Effect  This event is delivered to both the service log, $ZLOG, and to the operator log, $0. The CRUs in the same enclosure as the fan may overheat, especially if more than one fan is operating below the expected range of speeds. Fans are normally turned off during a system power failure, but there is no danger of an enclosure overheating during a system power failure because most CRUs are powered off during this time. Very brief (less than a minute) tests that turn off a fan do not endanger the system either.

Recovery  Replace the fan CRU.
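Because the reported speed s may differ from the actual speed by 300 rpm, a script that reads this message can derive the range in which the actual speed lies. The sketch below is illustrative only. It assumes that s is rendered as a plain decimal integer in the logged text, which this chapter does not specify, and the name fan_speed_bounds is hypothetical.

```python
import re

FAN_TOLERANCE_RPM = 300  # tolerance stated in the message text

# Assumption: the speed "s" appears as an unsigned decimal integer.
_SPEED_RE = re.compile(r"Estimated fan speed is (\d+) \+- 300 rpm")


def fan_speed_bounds(message_text):
    """Return (lowest, highest) possible actual fan speed in rpm, or None.

    Hypothetical helper for illustration; not part of any HP tool.
    """
    match = _SPEED_RE.search(message_text)
    if match is None:
        return None
    estimate = int(match.group(1))
    # A fan cannot spin at a negative speed, so clamp the lower bound at 0.
    return (max(0, estimate - FAN_TOLERANCE_RPM), estimate + FAN_TOLERANCE_RPM)
```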



119

Fan group.module.slot speed is back within acceptable limits. Estimated fan speed is s +- 300 rpm.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the fan CRU that is operating within normal RPM limits, after being out of limits.

s

is the estimated speed of the specified fan in revolutions per minute. This fan speed measurement may vary by 300 rpm from the actual fan speed.

Cause  The fan CRU in the specified location is operating within normal RPM limits, after the fan in that location was out of limits. Either the fan was replaced with a good fan, or else the original fan is fluctuating in speed because of a test, normal system power fail recovery, or equipment failure.

Effect  This event is delivered to both the service log, $ZLOG, and to the operator log, $0. The CRUs in the same enclosure as the fan should now be safe from overheating. If the fan fluctuates in speed with no evidence of a system power failure or a test, then the fan may go out of limits permanently unless the fan is replaced or the underlying problem is repaired.

Recovery  Informational message only; no corrective action is needed, unless unexplained fluctuations in fan speed are occurring. If unexplained fan speed fluctuations occur, then the fan or controlling power monitor and control unit (PMCU) may be defective and need replacing.



120

A power measurement for S70000 power supply group.module.slot is out of limits.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the power supply CRU that has stopped delivering power to the module in the specified group.

Cause  Voltage measurements on an S70000 (see note) power supply (located on power shelf at bottom of enclosure) are out of acceptable limits. The power supply CRU is not providing correct voltages to the power rail. The power supply may be failing. This event may occur when removing an S70000 power supply CRU, in which case the power supply is not necessarily defective.

NOTE: The S70000 in this message refers to all CRUs with an external S70000 style power supply including (but not limited to) S7000, S70000, S72000, S7400, S74000, S7600, S76000, S86000, and the IOMF2 CRU.

Effect  The S70000 enclosure is in danger of losing fault tolerance for power supply problems. The other power supply in the enclosure may become the only power source in the enclosure.

Recovery  Replace the power supply CRU in the specified location with a new or repaired power supply.



121

Power measurements for S70000 power supply group.module.slot are back in limits.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the power supply CRU that is now back within acceptable limits.

Cause  Voltage measurements on an S70000 (see note) power supply are back within acceptable limits after being out of limits. Most probably, the power supply CRU in the specified location was replaced with a good power supply. Otherwise, the power voltage measurements are fluctuating because of a test, normal system power fail recovery, or possibly a power monitor and control unit (PMCU) failure.

NOTE: The S70000 in this message refers to all CRUs with an external S70000 style power supply including (but not limited to) S7000, S70000, S72000, S7400, S74000, S7600, S76000, S86000, and the IOMF2 CRU.

Effect  The power supply and mixing diodes are functioning properly, as far as can be determined without a power-margining test.

Recovery  Informational message only; no corrective action is needed, unless unexplained fluctuations in power voltage measurements are occurring. If unexplained power voltage fluctuations occur (in and out of limits repeatedly), then the power supply CRU or the monitoring PMCU may be defective.



122

Battery group.module.slot has changed state to: state

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the battery CRU that has changed its state.

state

is the new state of the battery CRU.

Cause  (1) System power has failed; or (2) a single power supply has failed or the AC or DC power cables have been disconnected; or (3) the battery state has been disabled or enabled using the OSM Service Connection or the TSM Service Application; or (4) a battery has been inserted or removed; or (5) the battery is operating out of specified measurement limits or is fluctuating while charging (fault state).

Effect  (1) The batteries are discharging. A power state change event will occur in this case. (2) The battery is left as the only source of power on the S7000 PMF CRU or on the IOMF CRU. When the AC power supply fails on an S70000 PMF CRU, the other power supply provides redundant power (a power state change event occurs). (3) If the battery state is disabled, the battery cannot supply power and cannot be charged. If the battery state is enabled, the battery can supply power and recharges itself automatically as needed. (4) The battery state changes to enabled when a battery is inserted. The battery state changes to fault if the battery is removed before it is disabled using the OSM Service Connection or TSM Service Application. (5) If the battery is in a fault state, the battery may not be capable of supplying sufficient power during a power failure.

Recovery  (1) Restore external power before the battery is exhausted. Check that all batteries are in the enabled state 24 hours after power returns.

(2) Check to see if the power cord has been disconnected. If it has been disconnected, reconnect the power cord. If the AC power cord has been disconnected for a long period, then the PMF CRU will stop functioning and the system resources on that CRU will stop functioning. Reload the processor associated with the stopped PMF CRU and check I/O paths (for example, disk paths). If the power cord has not been disconnected, check the AC power supply for failure. Use the OSM or TSM Event Viewer to diagnose this problem.

(3) The new battery should be automatically enabled when you insert it. If it is not, enable it using OSM or TSM.

(4) Check the state of the battery within the OSM Service Connection or the TSM Service Application. Verify that the battery is present. State changes will be reflected within OSM or TSM when the battery state change event occurs.

(5) If the battery is in a fault state and there has been a power loss of some form, then you should wait 24 hours after power returns for the state to change to enabled. If the state remains fault, then you should replace the battery because the battery may be incapable of providing power in the event of another power failure.

If the battery state is in fault, and it has been in use for an extended period of time without a power failure, you should replace the battery.

After replacing a battery or after a power failure occurs, a completely discharged battery may fluctuate in and out of a fault state while it is charging. However, after charging 24 hours the battery should not be in a fault state. If after 24 hours, the battery is still in a fault state, replace the battery.

For instructions on replacing a battery, see the HP NonStop S-Series Hardware Support Guide. As part of the replacement procedure, you will probably see a battery state of disabled. After replacing a battery, you may receive a state change of fault or charging; wait 24 hours, and then review the event log for the following battery state changes: enabled or charged.
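The 24-hour charging rule in recovery case (5) can be expressed as a short check. This is a minimal sketch under stated assumptions: the function name, the state string comparison, and the charging_since parameter (the time at which power returned or the battery was replaced, or None if neither happened recently) are all hypothetical and are not taken from OSM or TSM.

```python
from datetime import datetime, timedelta

CHARGE_WINDOW = timedelta(hours=24)  # waiting period given in the recovery text


def battery_fault_action(state, charging_since, now):
    """Suggest an action for a battery state, following recovery case (5).

    charging_since: datetime when power returned or the battery was replaced,
    or None if there was no recent power loss or replacement (assumption).
    """
    if state.lower() != "fault":
        return "no action"
    if charging_since is None:
        # Fault with no recent power failure or replacement: replace the battery.
        return "replace battery"
    if now - charging_since < CHARGE_WINDOW:
        # The battery may move in and out of fault while it is still charging.
        return "wait"
    return "replace battery"
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the check easy to test.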



123

Fan group.module.slot has changed state to: state

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the fan CRU that has changed its state.

state

is the new state of the fan CRU.

Cause  (1) The SP disabled the fan to conserve power during a system power failure; or (2) the SP enabled the fan after external power was restored; or (3) the fan has been disabled using the OSM Service Connection or TSM Service Application; or (4) the fan has been enabled using the OSM Service Connection or TSM Service Application; or (5) the SP has placed the fan into a fault state for one of the following reasons: (5a) the fan is not spinning fast enough or is spinning too fast while it is enabled (this is a serious fan failure); or (5b) the fan has been removed without first being disabled using the OSM Service Connection or TSM Service Application; or (5c) the fan monitoring ability of the PMCU CRU has failed.

Effect  (1) The fan is not spinning. This is expected behavior during a system power failure. (2) The fan is spinning. (3) The fan is not spinning. This is expected during a fan replacement procedure. The other fan should speed up and may sound louder. Fault tolerant cooling of system components has been lost. (4) The fan is now spinning. The other fan that was not replaced should return to normal speed. Fault tolerant cooling of system components has been restored. (5) Fault tolerant cooling of system components has been lost. The other fan should speed up and may sound louder.

Recovery  (1) and (2) After power is restored, the SPs should place all fans in the enabled state. Use the OSM Service Connection or TSM Service Application to verify that all fans are enabled. If a fan is disabled/off, use the TSM Service Application to issue the Fan On command. (3) Informational message only; proceed with fan replacement procedure. (4) Informational message only; proceed with fan replacement procedure. (5a) Replace the fan using the procedure documented in the HP NonStop S-Series Hardware Support Guide. (5b) Use the fan replacement procedure documented in the HP NonStop S-Series Hardware Support Guide.

(5c) Use the OSM or TSM Event Viewer to check for a fan out of limits event and to determine whether the fan itself is out of limits or the PMCU is incapable of monitoring the fan.



124

Bulk power supply group.module.slot has changed state to: state

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the bulk power supply CRU that has changed its state.

state

is the new state of the bulk power supply CRU.

Cause  The AC power cord has been pulled or damaged or there is an external power loss to that AC cable. The DC power cord has been pulled or damaged or the power supply has been removed or has failed. You will also receive this event when you restore power to the bulk power supply or after you install an S7000 PMF CRU, IOMF CRU, or an S70000 AC power supply.

Effect  AC power is not being supplied to the bulk power supply. If the state goes to fault and it is not caused by an AC power loss, then the bulk power supply needs to be replaced and the PMF CRU or IOMF CRU that contains the bulk power supply needs to be replaced. If the state goes to fault, then the S7000 PMF CRU or IOMF CRU that contains the power supply will continue to run on battery power.

Recovery  If the bulk power supply has gone into a fault state, you need to replace the power supply and on the S70000 the PMF CRU.



125

Alarm panel LED is on check for error

Cause  An event has triggered an alarm on a Central Office server.

Effect  Depends on the cause.

Recovery  See the ERAD Database Utility User’s Guide to determine which component generated the event.



126

Cru group.module.slot changed state to: state

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU that has changed its state.

state

is the new state of the CRU.

Cause  A change in the CRU's power state has been detected. This could be the result of a user command (for example, an OSM or TSM Disable Battery action), a physical CRU insertion or removal, or a fault (for example, a DC/DC converter failure).

Effect  The effect depends on the nature of the specific power state transition and the CRU against which the power state transition is being reported. If the CRU reporting the state change is a battery, then battery backup capability may be lost. If the state change is being reported against a Bulk Power supply, then the system may be running on the battery, the redundant power capabilities in the enclosure may be lost, or the ability to manage the CRU through the maintenance bus may be lost.

Recovery  Recovery also depends on the nature of the specific power state transition.

  • If the CRU power state was previously compromised (for example, reported as ABSENT, DISABLED, DISCHARGING, FAULT) and is now healthy (for example, ENABLED, CHARGED, or CHARGING) then this is an informational message only; no corrective action is needed.

  • If the CRU power state had previously been healthy, but is now reported as bad, and if this event was generated as the result of a CRU removal, complete the CRU replacement service activity.

  • If this event was generated as the result of a fault, then isolate the fault and replace the defective CRU.
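The recovery rules above can be sketched as a small decision function. This is an illustration only. The state sets contain just the example states named in the text, so the real set may be larger. The trigger labels ("removal", "fault") and the function name recovery_hint are hypothetical.

```python
# State names are the examples given in the Recovery text; the full set of
# states reported by the SP may be larger (assumption).
COMPROMISED = {"ABSENT", "DISABLED", "DISCHARGING", "FAULT"}
HEALTHY = {"ENABLED", "CHARGED", "CHARGING"}


def recovery_hint(old_state, new_state, trigger=None):
    """Suggest a recovery action for an SPR 126 power state change.

    trigger is a hypothetical label: "removal", "fault", or None if unknown.
    """
    if old_state in COMPROMISED and new_state in HEALTHY:
        return "informational; no corrective action needed"
    if trigger == "fault":
        return "isolate the fault and replace the defective CRU"
    if old_state in HEALTHY and new_state in COMPROMISED and trigger == "removal":
        return "complete the CRU replacement service activity"
    # Anything else needs a person to look at the specific transition.
    return "examine the specific power state transition"
```

Transitions that match none of the documented cases fall through to a neutral hint instead of guessing an action.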



127

CRU group.module.slot reported a Group ID Switch change, from old-switch-val to new-switch-val.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU with the Group ID Switch change.

old-switch-val

is the old Group ID Switch value.

new-switch-val

is the new Group ID Switch value.

Cause  The service processor (SP) has detected a change in the setting of an enclosure’s Group ID Switch (used to set an enclosure’s group number). One of the following has occurred:

  • A user has changed the enclosure's Group ID Switch value.

  • A Group ID Switch has failed and the SP is unable to read the Group ID Switch value.

  • The SP is initializing the enclosure (typically done at Enclosure Power On, or whenever an SP takes over as this enclosure's Primary SP).

Effect  The SP will not update the actual enclosure number until the enclosure is powered on and both of the enclosure's Group ID Switch values agree.

Recovery  If the user’s intent was to change the Group ID Switch value, or if this message is generated as part of the normal enclosure Power On sequence, then this is an informational message only; no corrective action is needed.

If this message was generated because of a failure to access the Group ID Switch value, then the failed CRU must be identified and replaced (the most likely candidates are either the power monitor and control unit (PMCU) or the enclosure's backplane).



128

A CPU power configuration exceptions has been detected for CRU group.module.slot, the default configuration will be used.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU with the affected processor.

Cause  During S76000 or S86000 processor initialization, the primary SP was either unable to retrieve the target processor’s power configuration from its associated PMF CRU FIR, or an error was encountered while adjusting this processor’s power voltage regulators.

Effect  The processor’s default hardware power configuration will not be adjusted by software.

Recovery  If the problem persists, it could be an indication of a defective PMF CRU.



129

An exception has been detected on the maintenance bus for CRU group.module.slot.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU that generated this maintenance bus exception.

Cause  The primary SP has detected an error while configuring or monitoring the target CRU over the maintenance bus. This error can also occur during CRU replacement.

Effect  The current SP configuration or monitoring activity fails.

Recovery  If the problem persists, it could be an indication of a defective CRU.



130

CRU insertion for group.module.slot passed Initialization phase.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU being initialized.

Cause  The first phase of the CRU Initialization process has succeeded.

Effect  The CRU Initialization process continues.

Recovery  Informational message only; no corrective action needed.



131

CRU insertion for group.module.slot failed Initialization phase.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU being initialized.

Cause  The first phase of the CRU Initialization process has failed.

Effect  The CRU is not fully Initialized. It is likely to be inoperable.

Recovery  If you are inserting a CRU, retry the operation. If this event occurs at PON, consider replacing the CRU. If the problem persists, replace the CRU.



132

CRU insertion for group.module.slot passed Configuration phase.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU being initialized.

Cause  The configuration phase of the CRU Initialization process has succeeded.

Effect  The CRU Initialization process continues.

Recovery  Informational message only; no corrective action needed.



133

CRU insertion for group.module.slot failed Configuration phase.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU being initialized.

Cause  The configuration phase of the CRU Initialization process has failed.

Effect  The CRU is not fully Initialized. It is likely to be inoperable.

Recovery  If you are inserting a CRU, retry the operation. If this event occurs at PON, consider replacing the CRU. If the problem persists, replace the CRU.
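Messages 130 through 133 report the pass or fail result of the two CRU insertion phases described above. A minimal Python sketch of how a log-review script might summarize them follows. The function name insertion_status and the idea of passing bare event numbers in log order are assumptions made for illustration.

```python
# Event numbers and phases as listed in this chapter (SPR 130 to 133).
PHASE_EVENTS = {
    130: ("Initialization", True),
    131: ("Initialization", False),
    132: ("Configuration", True),
    133: ("Configuration", False),
}


def insertion_status(event_numbers):
    """Summarize CRU insertion progress from SPR event numbers in log order."""
    status = {}
    for number in event_numbers:
        if number in PHASE_EVENTS:
            phase, passed = PHASE_EVENTS[number]
            # A later event for the same phase (for example, a retry)
            # overrides the earlier result.
            status[phase] = "passed" if passed else "failed"
    return status
```

For a retried insertion, the last reported result for each phase wins, which matches the "retry the operation" guidance in the Recovery text.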



301

message

message

is a message string from Fault Insertion and Simulation Testing (FIST) used by QA.

Cause  This event is generated by HP internal testing. It should never be seen in the field.

Effect  Various; dependent on the specific test.

Recovery  Various; dependent on the specific test.



326

CPU Millicode reported a state change for group.module.slot.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU whose processor has changed state.

Cause  This event is generated during an operator action to change the processor state (including power on), except an action to halt the processor. (Halt events continue to be reported as 306 or 326 events in the CPU subsystem.)

Effect  The processor changes to the requested state.

Recovery  Informational message only; no corrective action is needed.



501

ServerNet configuration started for unit in group.module.slot

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU for which ServerNet configuration has been started.

Cause  The primary service processor has started the configuration of a ServerNet addressable component that is being replaced or added to the ServerNet system area network.

Effect  The ServerNet configuration may succeed (look for an SPR 503 message) or fail (look for an SPR 502 message).

Recovery  Informational message only; no corrective action is needed.



502

ServerNet configuration failed for unit in group.module.slot

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU for which ServerNet configuration failed.

Cause  The service processor configuration of a ServerNet addressable component has failed.

Effect  The CRU is not operational. A service processor could not complete the replacement or the addition of the CRU to the ServerNet system area network.

Recovery  Return the CRU to your service provider and try another CRU.



503

ServerNet configuration succeeded for unit in group.module.slot

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU for which ServerNet configuration succeeded.

Cause  The service processor successfully completed the integration of a replacement or new CRU into the ServerNet system area network (SAN).

Effect  The ServerNet software in the operating system can communicate with the CRU.

Recovery  Informational message only; no corrective action is needed.
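Going by the message texts, SPR 501 marks the start of ServerNet configuration for a CRU, SPR 502 reports failure, and SPR 503 reports success. The following Python sketch shows one way to correlate the three per slot. The function name and the (event_number, slot) tuple layout are hypothetical and do not reflect an actual EMS record format.

```python
def servernet_config_outcome(events):
    """Map each slot to 'started', 'failed', or 'succeeded'.

    events: iterable of (event_number, slot) tuples in log order
            (hypothetical layout, for illustration only).
    """
    labels = {501: "started", 502: "failed", 503: "succeeded"}
    outcome = {}
    for number, slot in events:
        if number in labels:
            # The most recent event for a slot determines its outcome.
            outcome[slot] = labels[number]
    return outcome
```

A slot whose outcome is still "started" has an SPR 501 with no later SPR 502 or SPR 503, which may be worth investigating.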



504

ServerNet link from router port on group.module.slot is now alive.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU whose router port is now alive.

Cause  A service processor has successfully added and configured the CRU.

Effect  The CRU is accessible over the new ServerNet link.

Recovery  Informational message only; no corrective action is needed.



505

ServerNet SBI ASIC on group.module.slot has detected a controller port error.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU with the ServerNet bus interface (SBI) application-specific integrated circuit (ASIC).

Cause  An SBI controller port error has occurred, not on the ServerNet link, but on the ASIC’s port to the bus connecting the ASIC to a service processor or to a small computer system interface (SCSI) controller.

Effect  An I/O operation may be retried by software, causing the operation to either succeed or fail. A path switch may occur in an affected operating system I/O process. Performance may be affected.

Recovery  This event can be ignored if it is an isolated instance. Replacement of the CRU containing the SBI ASIC may be required if an excessive number of controller port errors occur between the SBI ASIC and one or more controllers.



506

Servernet SBI ASIC on group.module.slot has detected a ServerNet port error.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU with the ServerNet bus interface (SBI) application-specific integrated circuit (ASIC).

Cause  The SBI ASIC detected a ServerNet port error, such as a bad packet, misaddressed packet, or a ServerNet link protocol error. A hardware problem may have occurred on the SBI itself, on another ServerNet component connected to the SBI and containing the port reporting the error, or on the physical link between the two. Alternatively, a programming defect may cause this error to occur because of a problem such as an incorrect destination address on a packet.

Effect  The operating system retries the operation that led to this low-level ServerNet exception. If the retry is not successful, the CRU or components on the CRU in the specified location may no longer be available, or a path to a component may become unavailable. In this case, a system resource limitation occurs. A ServerNet related timeout may be reported by an I/O process or the interprocessor communications (IPC) subsystem on the system. Multiple occurrences may lead to a path switch or a path down in one or more I/O processes or the IPC subsystem.

Recovery  If this event occurs multiple times with path switches occurring after attempting to use the original path, try the following troubleshooting sequence:

  1. Reseat the ServerNet cable.

  2. Check operation.

  3. Replace the remote SEB, which is the CRU that is corrupting packets on a path to the SBI ASIC.

  4. Check operation.

  5. If an excessive percentage of ServerNet packets are damaged, or if the link to the port becomes disabled, replace the CRU containing the SBI ASIC.



507

ServerNet router ASIC on group.module.slot has detected a ServerNet port error on port port-num.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU with the router application-specific integrated circuit (ASIC).

port-num

is the internal Router ASIC port number on which the port error was detected.

NOTE: Do not confuse the internal Router ASIC port-num reported in this message with the external CRU connector number that is visible on the outside of SEB and MSEB CRUs. The internal port-num is not visible on the outside.

Cause  The router detected a ServerNet port error, such as a bad packet, misaddressed packet, or a ServerNet link protocol error. A hardware problem may have occurred on the router itself, on the router or ASIC connected to the router on the port reporting the error, or on the physical link between the two. Alternatively, a programming defect could cause this error to occur because of a problem such as an incorrect destination address on a packet.

Effect  The operating system retries the operation that led to this low-level ServerNet exception. If the retry is not successful, the CRU or components on the CRU in the specified location may no longer be available, or a path to a component may become unavailable. In this case, a system resource limitation occurs. If this event is isolated, the problem is transient. A ServerNet related timeout may be reported by an I/O process or the interprocessor communications (IPC) subsystem on the system. Multiple occurrences of this event are more serious and may lead to a path switch or a path down in one or more I/O processes or the IPC subsystem.

Recovery  If this event occurs multiple times with path switches occurring after attempting to use the original path, try the following troubleshooting sequence:

  1. Reseat the ServerNet cable.

  2. Check operation.

  3. Replace the remote SEB, which is the CRU that is corrupting packets on a path to the SBI ASIC.

  4. Check operation.

  5. If an excessive percentage of ServerNet packets are damaged, or if the link to the port becomes disabled, replace the CRU containing the SBI ASIC.



508

ServerNet ASIC on group.module.slot has frozen with an internally detected error.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU that contains the ASIC that has frozen.

Cause  A ServerNet application-specific integrated circuit (ASIC) has ceased to function because of an internally detected error.

Effect  Fault tolerance has been lost because part of a ServerNet fabric has become unavailable. The ASIC halts all ServerNet activity through this ASIC. All ServerNet paths through this ASIC are inaccessible. I/O processes and the IPC subsystem will shut down paths using this ASIC, but some I/O processes may be slow in doing so.

Recovery  Replace the CRU in the specified location that contains the ASIC.



509

TNet new domain at group and module group.module on net side net inserted OK

group.module

is the group and module where the ServerNet domain insertion completed successfully.

net

identifies the ServerNet fabric (X or Y) of the groups into which the new group has been integrated.

Cause  A Service Processor (SP) detected ServerNet connectivity to a new domain (enclosure), or previously lost ServerNet connectivity to an existing domain (enclosure) was reestablished. Typically, this message is generated during:

  • PMF or SEB CRU insertion

  • System expansion while adding a new enclosure to a system

  • System or enclosure power on.

The new group has been integrated into the collection of known groups that comprise this system's fabric (X or Y).

This message is also generated during the normal initialization phase of a Primary SP; therefore an SP RESET may result in the generation of this message if it triggers a Primary takeover.

Effect  Processors and IO can begin to access devices within this group through the specified ServerNet fabric.

Recovery  Informational message only; no corrective action is needed.



510

TNet newdomain at group and module group.module on net side {X|Y} inserted FAILED.

group.module

identifies the enclosure no longer accessible from the reported ServerNet fabric.

Cause  During PMF or SEB CRU removal, enclosure power-off, or ServerNet cable removal or failure, a loss of connectivity with the enclosure over the specified ServerNet fabric was detected.

Effect  The system is unable to communicate with the enclosure over the specified ServerNet fabric.

Recovery  If this event is the result of an operator command or action intended to terminate this fabric's ServerNet connectivity with this enclosure, then this is informational only.

Otherwise, this event indicates a serious ServerNet fabric failure which may require any of the following recovery actions: power-on of the enclosure, reinsertion of the PMF or SEB CRU, or reconnection or replacement of the ServerNet cable.



511

message

message

is a message string from an internal testing program.

Cause  This event is generated by HP internal testing. It should never be seen in the field.

Effect  Various; dependent on the specific test.

Recovery  Various; dependent on the specific test.



512

Enclosure group.module was powered off by an action through TSM.

group.module

identifies the enclosure that was powered off.

Cause  An operator executed an enclosure power-off action using OSM or TSM.

Effect  The enclosure powers off and all its devices are inaccessible.

Recovery  Provided the enclosure was intentionally powered off, this is an informational message only; no corrective action is needed. If the enclosure power off was not intentional, push one of the enclosure's Power On buttons.



513

ServerNet Router2 ASIC on group.module.slot has detected a Packet Grabber Interrupt.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU containing the Router2 ASIC that detected the interrupt.

Cause  The service processor (SP) has serviced a Router2 Packet Grabber Interrupt through an OSM or TSM Internal Loopback Test action on a ServerNet PIC.

Effect  The ServerNet PIC that is the target of the OSM or TSM Internal Loopback Test action is taken out of service. This may result in the loss of ServerNet fault tolerance and induce IO path switches and IPC retries.

Recovery  Run the OSM or TSM Clear Loopback Test action to reestablish ServerNet communications over all affected ports.



514

ServerNet Router2 ASIC on group.module.slot has detected an IBC Symbol Interrupt on port port-num external ServerNet connector conn-num.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU containing the Router2 ASIC that detected the interrupt.

port-num

is the port number on which the interrupt was detected.

conn-num

is the external ServerNet connector number.

Cause  The ServerNet Router ASIC has generated an IBC (In-Band Control) Interrupt. S‑Series systems do not currently use the in-band control (IBC) feature; therefore this event should not be seen in the field.

Effect  None

Recovery  Informational message only; no corrective action needed.



515

TNet domain at group and module group.module on net side net was deleted.

group.module

identifies the group and module of the deleted ServerNet domain.

net

identifies the ServerNet fabric (X or Y) of the deleted domain.

Cause  A Service Processor (SP) detected the loss of ServerNet connectivity to a domain (enclosure). Typically, this message is generated as a result of:

  • CRU (MF or SEB type) removal or failure.

  • Power loss of an individual enclosure.

Effect  There will be a loss of ServerNet fault-tolerance to the reported group (enclosure), possibly resulting in path switches.

Recovery  Depending on the cause of the message, either complete the CRU replacement, fix the ServerNet failure, or wait for the reset to complete.



517

ServerNet Router2 ASIC on group.module.slot has detected a Port Exception Interrupt on port port-num external ServerNet connector conn-num.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU containing the Router2 ASIC that detected the interrupt.

port-num

is the port number on which the interrupt was detected.

conn-num

is the external ServerNet connector number.

Cause  The router detected a ServerNet port error, such as a bad packet or a ServerNet link protocol error. A hardware problem could have occurred on the router itself, on the router or ASIC connected to the router on the port reporting the error, or on the physical link between the two.

Effect  The operating system retries the operation that led to this low-level ServerNet exception. If the retry is unsuccessful, the CRU or components on the CRU in the specified location may no longer be available, or a path to a component may become unavailable, resulting in a resource limitation. If this event is isolated, the problem is transient. A ServerNet related timeout may be reported by an I/O process or the interprocessor communications (IPC) subsystem on the system. Multiple occurrences of this event are more serious and may lead to a path switch or a path down in one or more I/O processes or the IPC subsystem.

Recovery  If this event occurs multiple times with path switches occurring after attempting to use the original path, try the following troubleshooting sequence:

  1. Reseat the ServerNet cable.

  2. Check operation.

  3. Replace the remote ServerNet expansion board (SEB), which is the CRU that is corrupting packets on a path to the ServerNet bus interface (SBI) ASIC.

  4. Check operation.

  5. If an excessive percentage of ServerNet packets are damaged, or if the link to the port becomes disabled, replace the CRU containing the SBI ASIC.



518

ServerNet Router2 ASIC on group.module.slot has detected a Misc Exception Interrupt.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU containing the Router2 ASIC that detected the interrupt.

Cause  The ServerNet bus interface (SBI) ASIC detected a ServerNet link protocol error or a misaddressed packet. A hardware problem may have occurred on the SBI itself, on another ServerNet component connected to the SBI and containing the port reporting the error, or on the physical link between the two. Alternatively, a programming defect may cause this error to occur because of a problem such as an incorrect destination address on a packet.

Effect  The operating system retries the operation that led to this low-level ServerNet exception. If the retry is unsuccessful, the CRU or components on the CRU in the specified location may no longer be available, or a path to a component may become unavailable, causing a system resource limitation. If this event is isolated, the problem is transient. A ServerNet related timeout may be reported by an I/O process or the interprocessor communication (IPC) subsystem on the system. Multiple occurrences of this event are more serious and may lead to a path switch or a path down in one or more I/O processes or the IPC subsystem.

Recovery  If this event occurs multiple times with path switches occurring after attempts to use the original path, try the following troubleshooting sequence:

  1. Reseat the ServerNet cable.

  2. Check operation.

  3. Replace the remote ServerNet expansion board (SEB), which is the CRU that is corrupting packets on a path to the SBI ASIC.

  4. Check operation.

  5. If an excessive percentage of ServerNet packets are damaged, or if the link to the port becomes disabled, replace the CRU containing the SBI ASIC.



519

ServerNet Router2 ASIC on group.module.slot has detected a Perf Overflow Interrupt.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU containing the Router2 ASIC that detected the interrupt.

Cause  The ServerNet Router ASIC has generated a Performance Counter Overflow Interrupt. S‑Series systems do not currently use the Performance Counter Overflow feature; therefore this event should not be seen in the field.

Effect  None

Recovery  Informational message only; no corrective action is needed.



520

SP has { serviced | rejected } a command request from { a remote client at IP address ip-address | local NSK process process }.

ip-address

is the IP address of a remote client whose command request was serviced or rejected.

process

identifies a local NonStop™ Kernel process whose command request was serviced or rejected.

Cause  A service processor (SP) client (typically, but not limited to, OSM or TSM) has requested that the SP perform an action (for example, to disable a battery, halt a processor, or enable system freeze).

NOTE: The SP does not generate this event if the client is merely requesting information from the SP.

Effect  If the client cannot be authenticated (if, for example, a NonStop™ Kernel client is not in the SUPER group, or a LAN client does not have a valid SP session), the SP rejects the request. Otherwise, the SP services the client’s request.

Recovery  Informational message only; no corrective action is needed.
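The serviced-or-rejected decision described in the Effect text can be sketched as follows. This is an illustration of the two example checks only (SUPER group membership for a NonStop Kernel client, a valid SP session for a LAN client) and is not the SP's actual authentication logic. The function name, the "nsk" and "lan" labels, and the boolean parameters are hypothetical.

```python
def sp_command_decision(client_kind, in_super_group=False, has_valid_session=False):
    """Return 'serviced' or 'rejected' for an SP command request.

    client_kind: "nsk" for a local NonStop Kernel process, "lan" for a
                 remote client (hypothetical labels).
    """
    if client_kind == "nsk":
        authenticated = in_super_group
    elif client_kind == "lan":
        authenticated = has_valid_session
    else:
        # Assumption: unknown client kinds are treated as unauthenticated.
        authenticated = False
    return "serviced" if authenticated else "rejected"
```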



522

A miscellaneous SP event has been logged.

Cause  The service processor (SP) has logged an event that cannot be decoded.

Effect  The raw data associated with the unrecognized SP event is logged as a miscellaneous event.

Recovery  Informational message only; no corrective action is needed.



523

ServerNet Interrupt Burst Suppression started for router port port-num of group.module.slot.

port-num

is the port number for which port event burst suppression has been started.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the router port.

Cause  The service processor has detected too many ServerNet interrupts (typically SPR 507 or SPR 517 events) from the specified ServerNet port and has started ServerNet port event burst suppression.

Effect  The ServerNet interrupts for this port are not re-enabled, and will not be serviced by the SP for 5 minutes. No new ServerNet error events will be generated for this port for 5 minutes. Any state changes driven by these ServerNet Port interrupts will not occur for 5 minutes (for example, no Link-Alive or Keep-Alive port events will be logged for this port during this time, and the ServerNet Link-Alive LED associated with ServerNet plug-in cards will not change state during this time).

NOTE: Every SPR 523 message should be followed by a matching SPR 524 message 5 minutes later.

Recovery  Informational message only; however the root cause of the burst of ServerNet port errors should be analyzed, diagnosed, and corrected.



524

ServerNet Interrupt Burst Suppression stopped for router port port-num of group.module.slot.

port-num

is the port number for which port event burst suppression has been stopped.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the router port.

Cause  The service processor (SP) 5-minute timer for the ServerNet port burst suppression has expired and the SP has stopped ServerNet port event burst suppression for the specified ServerNet port.

Effect  The ServerNet interrupts for this particular port are re-enabled. All new interrupts associated with this port will be serviced by the SP.

Recovery  Informational message only; however, if the underlying problem that triggered the original burst of ServerNet port interrupts has not been corrected, this message will be followed by a series of ServerNet port errors (typically SPR 507 or SPR 517 messages) and a new SPR 523 message.
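Because every SPR 523 message should be followed by a matching SPR 524 message about 5 minutes later, a log-review script can flag suppression windows that never closed. A minimal Python sketch follows. The event tuple layout, the slack allowance for logging delay, and the function name are assumptions for illustration.

```python
SUPPRESSION_SECONDS = 5 * 60  # 5-minute window stated in the SPR 523 text


def unmatched_suppressions(events, now, slack=30):
    """Return SPR 523 events that have no later SPR 524 for the same port.

    events: iterable of (timestamp_seconds, event_number, slot, port_num)
            tuples (hypothetical layout).
    now:    current time in seconds; windows that may still be open are
            not reported.
    slack:  hypothetical allowance, in seconds, for logging delay.
    """
    open_starts = {}
    for timestamp, number, slot, port in sorted(events):
        key = (slot, port)
        if number == 523:
            open_starts[key] = timestamp
        elif number == 524:
            open_starts.pop(key, None)
    deadline = SUPPRESSION_SECONDS + slack
    return sorted(
        (key, started)
        for key, started in open_starts.items()
        if now - started > deadline
    )
```

Sorting the events by timestamp first means an SPR 524 only closes an SPR 523 that was logged before it for the same slot and port.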



525

ServerNet SBI ASIC on group.module.slot reported a state change.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU with the ServerNet bus interface (SBI) application-specific integrated circuit (ASIC).

Cause  The SBI ASIC detected a ServerNet port error, such as a bad packet, misaddressed packet, or a ServerNet link protocol error. A hardware problem may have occurred on the SBI itself, on another ServerNet component connected to the SBI and containing the port reporting the error, or on the physical link between the two. Alternatively, a programming defect may cause this error to occur because of a problem such as an incorrect destination address on a packet.

Effect  The operating system retries the operation that led to this low-level ServerNet exception. If the retry is not successful, the CRU or components on the CRU in the specified location may no longer be available, or a path to a component may become unavailable. In this case, a system resource limitation occurs. A ServerNet related timeout may be reported by an I/O process. Multiple occurrences may lead to a path switch or a path down in one or more I/O processes or the IPC subsystem.

Recovery  If this event occurs multiple times with path switches occurring after attempting to use the original path, try the following troubleshooting sequence:

  1. Reseat the ServerNet cable.

  2. Check operation.

  3. Replace the remote CRU, which is corrupting packets on a path to the SBI ASIC.

  4. Check operation.

  5. If an excessive percentage of ServerNet packets are damaged, or if the link to the port becomes disabled, replace the CRU containing the SBI ASIC.



526

ServerNet SBI ASIC on group.module.slot reported a packet exception.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU with the ServerNet bus interface (SBI) application-specific integrated circuit (ASIC).

Cause  The SBI ASIC detected a ServerNet error, such as a bad packet or a misaddressed packet. A programming defect may cause this error to occur because of a problem such as an incorrect destination address on a packet.

Effect  Maintenance Subsystem Firmware automatically handles this error condition.

Recovery  Informational message only; no corrective action needed.



527

ServerNet Router ASIC Port State Change on group.module.slot internal port port-num [at external connector slot.connector].

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU containing the Router ASIC that detected the interrupt.

port-num

is the Router internal port number on which the interrupt was detected.

slot.connector

is the external slot and connector number associated with the port on which the interrupt was detected.

Cause  The ServerNet Router ASIC detected a port error, such as a bad packet or a ServerNet link protocol error. A hardware problem could have occurred on the ServerNet Router ASIC itself, or on the ServerNet ASIC connected to the port reporting the error, or on the physical link between the two.

Effect  The operating system retries the operation that led to this low-level ServerNet exception. If the retry is unsuccessful, the CRU or components on the CRU in the specified location may no longer be available, or a path to a component may become unavailable, resulting in a resource limitation. If this event is isolated, the problem is transient. A ServerNet related timeout may be reported by an I/O process on the system. Multiple occurrences of this event are more serious and may lead to a path switch or a path down in one or more I/O processes or the IPC subsystem.

Recovery  If this event occurs multiple times with path switches occurring after attempting to use the original path, try the following troubleshooting sequence:

  1. Reseat the ServerNet cable.

  2. Check operation.

  3. Replace the remote CRU, which is corrupting packets on a path to the ServerNet Router ASIC.

  4. Check operation.

  5. If an excessive percentage of ServerNet packets are damaged, or if the link to the port becomes disabled, replace the CRU containing the ServerNet Router ASIC.



531

ServerNet-Neighbor-Check-Passed

numberofNCEntries

is the number of ncEntries (array elements) present in this event.

reserved

is a field reserved for future use.

ncEntries

is a variable length array that should not be indexed beyond the numberofNCEntries entry.

Cause  The ServerNet Processor Subsystem has detected a change in the ServerNet connection within the neighbor check logic flow and is reporting that the neighbor check of ports and nodes passed.

Effect  Neighbor check was successful and the port(s) display a “LinkEnabled” status.

Recovery  Informational message only; no corrective action is needed.



532

ServerNet-Neighbor-Check-Failed

numberofNCEntries

is the number of ncEntries (array elements) present in this event.

reserved

is a field reserved for future use.

ncEntries

is a variable length array that should not be indexed beyond the numberofNCEntries entry.

Cause  The ServerNet Processor Subsystem has detected a change in the ServerNet connection within the neighbor check logic flow and is reporting that the neighbor check of ports or nodes failed.

Effect  Neighbor check was unsuccessful and the port(s) display a “LinkDisabled” status.

Recovery  Check your configuration to make sure the correct node is connected to the correct port(s).



601

PowerOn self test (POST) of CRU group.module.slot has succeeded.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU that had a successful POST.

Cause  A CRU was physically inserted in a slot monitored by a service processor, and the POST for the CRU succeeded.

Effect  A successful insertion was performed and the POST was successful.

Recovery  Informational message only; no corrective action is needed.



602

PowerOn self test (POST) of CRU group.module.slot has failed.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU that had a failed POST.

Cause  A PMF CRU was physically inserted in a slot monitored by a service processor, and the POST for the CRU failed.

Effect  For a PMF CRU, a processor halt code of %100237 or %100236, MICROHALT_UPSBAD_RESET_POST_COMPLETE, occurs. For a failed POST on an IOMF CRU, a halt code occurs.

Recovery  For a PMF CRU-related halt code (%100237), reload the processor. For instructions on reloading a processor, see the HP NonStop S-Series Hardware Support Guide, the HP NonStop S-Series Operations Guide, or the OSM or TSM online help. For a PMF CRU-related halt code (%100237), which is a correctable memory error (CME), see the HP NonStop S-Series Hardware Support Guide for more information on dealing with CMEs. For an IOMF CRU-related halt code, replace the IOMF CRU.



611

Power-On self test of CRU group.module.slot has succeeded.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU/FRU that had a successful POST.

Cause  This event is generated for a CRU/FRU monitored by the Maintenance Subsystem, and the POST for the CRU/FRU succeeded.

Effect  A successful CRU/FRU initialization was performed and the POST was successful.

Recovery  Informational message only; no corrective action needed.



612

Power-On self test of CRU group.module.slot has failed.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU/FRU that had a failed POST.

Cause  This event is generated for a CRU/FRU monitored by the Maintenance Subsystem, and the POST for the CRU/FRU failed.

Effect  CRU/FRU initialization was performed, but the POST failed.

Recovery  Replace the CRU/FRU for which the POST failure was logged.



700

Service processor group.module.slot has changed state to new-state.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the service processor that has changed its state.

new-state

is the new state of the service processor.

Cause  This event is generated for any SP state change.

Effect  The SP changes its state.

Recovery  Informational message only; no corrective action is needed.



701

Service processor group.module.slot is disabled; it is not responding to peer.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the disabled service processor.

Cause  This event (SpEvAdmSpDisabled) is generated in any of the following cases:

  1. The SP detects that its peer is down, because the peer has failed to communicate that it is online.

  2. The SP detects that the peer is requesting an image load because the peer has a corrupt FLASH image. The flag field in this event will be one of the following: SP_EV_BAD_IMAGE, SP_EV_IMAGE_OK, SP_EV_IMAGE_DUMP, or SP_EV_IP (IP stands for in progress).

  3. The SP has begun an image load to the peer that has a corrupt FLASH image. The flag field in this event will be SP_EV_BAD_IMAGE.

  4. The SP has completed an image load to the peer that has a corrupt FLASH image. The flag field in this event will be SP_EV_IMAGE_DUMP_DONE.

Effect  For more information, see OSM or TSM Event Viewer online help.

Recovery  For Cause 1, no action is required; this is an informational message. For Causes 2, 3, and 4, update the SP firmware.
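
The relationship between the flag field, the numbered causes, and the recovery action can be sketched as a small lookup. This is an illustration only: the flag names are taken from the cause list above, but the FLAGS_BY_CAUSE table and both helper functions are hypothetical and do not describe how $ZSPE or the Event Viewer decode the event.

```python
from typing import Dict, FrozenSet, List

# Flag values listed for each numbered cause of event 701.
# Cause 1 (peer not communicating) has no flag value listed.
FLAGS_BY_CAUSE: Dict[int, FrozenSet[str]] = {
    2: frozenset({"SP_EV_BAD_IMAGE", "SP_EV_IMAGE_OK",
                  "SP_EV_IMAGE_DUMP", "SP_EV_IP"}),
    3: frozenset({"SP_EV_BAD_IMAGE"}),
    4: frozenset({"SP_EV_IMAGE_DUMP_DONE"}),
}


def possible_causes_701(flag: str) -> List[int]:
    """Return the numbered causes whose listed flag values include `flag`."""
    return sorted(c for c, flags in FLAGS_BY_CAUSE.items() if flag in flags)


def recovery_for_701(cause: int) -> str:
    """Return the documented recovery action for a numbered cause."""
    if cause == 1:
        return "No action required; informational message."
    if cause in (2, 3, 4):
        return "Update SP firmware."
    raise ValueError(f"unknown cause number: {cause}")
```

Note that SP_EV_BAD_IMAGE is listed under both Cause 2 and Cause 3, so the flag alone does not always identify a single cause; the recovery action is the same for both.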



702

Service processor group.module.slot is enabled; it is now responding to peer.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the newly-enabled service processor.

Cause  The peer SP is now back online and communicating with the other SP.

Effect  The service processor is now operating correctly.

Recovery  Informational message only; no corrective action is needed.



704

Service processor group.module.slot RESET complete.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the successfully reset service processor.

Cause  A successful reset of the SP is now complete.

Effect  The SP boot reset has completed.

Recovery  Informational message only; no corrective action is needed.



705

Service processor group.module.slot dumped stack trace and registers.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the service processor.

Cause  The SP detected a software (sw) exception.

Effect  None

Recovery  See the stack trace dump for the cause of this software exception.



706

An SP-Info event-num event has been generated for CRU group.module.slot.

event-num

is the number of an SP Info event.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU for which this event was generated.

Cause  A generic SP-Info event has been generated that the running version of T5808 ($ZSPE) does not know how to decode. This most likely occurs when the SP firmware is newer than the running version of $ZSPE.

Effect  None

Recovery  Informational message only; no corrective action needed.



707

CRU group.module.slot { was | is } running type version vproc

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU whose Service Processor reported this event.

type

is currently SPAPPL, the Application portion of the SP Firmware.

vproc

is the vproc of the SP Firmware.

Cause  This event is generated under one of the following two conditions: (1) after a 704 Reset Complete event, to indicate the version of SP Firmware currently running, or (2) following a 705 SPAR dump event, to indicate the version of SP Firmware running at the time of the SP failure.

Effect  None

Recovery  Informational message only; no corrective action is needed.
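
When reviewing an event log, it can help to pair each 707 event with the 704 or 705 event that preceded it. The following is a minimal sketch under stated assumptions: events are represented as hypothetical (event number, slot location) pairs in log order, and the 707 event is assumed to report the same slot location as the 704 or 705 event it follows. Neither assumption is confirmed by this manual.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Event = Tuple[int, str]  # (event number, slot location); hypothetical shape


def explain_707_events(events: Iterable[Event]) -> List[Tuple[str, Optional[int]]]:
    """Pair each 707 event with the most recent 704 or 705 for the same slot.

    Returns (slot, trigger) tuples, where trigger is 704 (reset complete),
    705 (stack trace dump), or None if no preceding trigger was seen.
    """
    last_trigger: Dict[str, int] = {}
    explained: List[Tuple[str, Optional[int]]] = []
    for number, slot in events:
        if number in (704, 705):
            last_trigger[slot] = number
        elif number == 707:
            # Consume the trigger so a later 707 is not paired with it again.
            explained.append((slot, last_trigger.pop(slot, None)))
    return explained
```

A 707 event paired with 704 reports the firmware version now running; one paired with 705 reports the version that was running when the SP failed.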



708

type Firmware update complete for CRU group.module.slot

type

is SP Application, SP Boots, or boot millicode.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU which was the target of this firmware update.

Cause  The requested firmware update operation has completed.

Effect  None

Recovery  Informational message only; no corrective action is needed.



709

The SBI ASIC on CRU group.module.slot has been automatically reset.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU whose ServerNet bus interface (SBI) ASIC was reset.

Cause  In response to a loss of ServerNet connectivity to its own SBI ASIC, the SP has RESET the SBI ASIC.

Effect  All outstanding ServerNet communications over this SBI ASIC are terminated, resulting in 506 events. The SP automatically reestablishes required SBI ASIC ServerNet communications.

Recovery  If the problem occurs in isolation, it may indicate transient backpressure. If the problem persists, it could be an indication of a defective PMF or IOMF CRU.



710

Service Processor group.module.slot failed to change to the UPDATING state.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the service processor that failed to update.

Cause  The requested firmware update was not performed, typically because the requested firmware revision level was too old to support the target CRU.

Effect  None. No firmware is updated.

Recovery  Select a firmware version that supports the CRU.



711

Service Processor group.module.slot can not access its network configuration.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the service processor that failed to retrieve its network configuration.

Cause  The primary SP is unable to access the PMF or IOMF CRU network configuration information stored in the enclosure backplane FIR. This information is either missing or inaccessible.

Effect  None, if the redundant network configuration information can be successfully retrieved from the peer CRU.

However, if this information cannot be retrieved, and this is an MSP (enclosure 01), then the SP will fail every 10 minutes (logging a 705 event) until this problem is corrected.

Recovery  If this event occurs for an MSP (enclosure 01), use the OSM or TSM Low-Level Link application to reconfigure the network settings.

For any enclosure other than enclosure 01, this is an informational message only; no corrective action is needed.



712

Maintenance Subsystem Restart for group.module.slot.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU containing the Maintenance Subsystem Processor.

Cause  The Maintenance Subsystem in the specified slot detected a software or hardware inconsistency that prevented further execution, and it restarted.

Effect  None

Recovery  See the Maintenance Subsystem Processor memory dump for the cause of this exception.



713

Maintenance Subsystem Periodic Statistics for group.module.slot.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU containing the Maintenance Subsystem Processor.

Cause  The Maintenance Subsystem generates this periodic event, which contains Maintenance Subsystem statistics, once a day.

Effect  None

Recovery  Informational message only; no corrective action needed.



718

Maintenance Subsystem has detected an EAT Insertion event in CRU group.module.slot. at { internal port router-port | external connector connector-number. fiber-number.}

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU containing the Maintenance Subsystem Processor.

router-port

is the Router internal port number.

connector-number. fiber-number

is the physical connector and fiber number.

Cause  The Maintenance Subsystem has detected a viable ServerNet link connection to an external adapter, such as a CLIM, and that connection has completed its neighbor check protocol.

Effect  This event indicates that the ServerNet connection between Maintenance Subsystem and an external adapter, such as a CLIM, has been established.

Recovery  Informational message only; no corrective action needed.



719

Maintenance Subsystem has detected an EAT Deletion event in CRU group.module.slot. at { internal port router-port | external connector connector-number. fiber-number.}

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU containing the Maintenance Subsystem Processor.

router-port

is the Router internal port number.

connector-number. fiber-number

is the physical connector and fiber number.

Cause  The Maintenance Subsystem has detected that the ServerNet link connection to an external adapter, such as a CLIM, has lost Link Alive.

Effect  This event indicates that the ServerNet connection between Maintenance Subsystem and an external adapter, such as a CLIM, no longer exists.

Recovery  Informational message only; no corrective action needed.



722

An ME-Info 1114115 event has been generated for CRU group.module.slot.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) where the CRU is located.

Cause  An SPIOLib client (probably OSM) has created a new session with the ME in group.module.slot. The event token details contain a client identifier, either a NonStop PID or an IP Address.

Effect  None.

Recovery  Informational message only; no corrective action is needed.



743

Mini IO Bulk Power Supply PWRCRU.group.module.slot has changed its state.

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) where the CRU is located.

Cause  Loss of power on Power Supply, PWRCRU.group.module.slot.

Effect  Loss of power redundancy within the VIO enclosure.

Recovery  Check the power cord for bends and kinks. If necessary, replace the power supply.



745

ServerNet-Connector-Deleted [at external connector slot.connector].

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU from which the ServerNet connector was extracted.

Cause  This event (SpEvSNetConnectorDeletion) is generated when a CRU's removable ServerNet connector is extracted from the CRU.

Effect  A hardware interrupt is detected and this event is generated once the Service Processor subsystem finishes the logical deletion process.

Recovery  Informational message only; no corrective action needed.



746

ServerNet-Connector-Inserted on group.module.slot [at external connector slot.connector].

group.module.slot

is the three-part slot location identifier (in the format GRP-nn.MOD-nn.SLOT-nn) of the CRU into which the ServerNet connector was inserted.

Cause  This event (SpEvSNetConnectorInsertion) is generated when a CRU's removable ServerNet connector is inserted into the CRU.

Effect  A hardware interrupt is detected and this event is generated once the Service Processor subsystem finishes the logical insertion process.

Recovery  Informational message only; no corrective action needed.