Chapter 59 NFS (Network File System) for Open System Services (OSS) Messages

The event messages in this chapter are generated by the Network File System (NFS) for the Open System Services (OSS) subsystem. This chapter completely replaces the former chapter of the same name, which described NFS error messages rather than NFS event messages. Descriptions of NFS error messages can be found instead in the NFS Management and Operations Manual for Open System Services.

The subsystem ID displayed by the event messages described herein includes NFS as the subsystem name.




	NOTE: Negative-numbered messages are common to most subsystems. If you receive a negative-numbered message that is not described in this chapter, see Chapter 15.

Proc: Backup open without primary open, process Issuing_process_name

`Proc`	is the name of the manager process.
`Issuing_process`	is the name of the process issuing the open.

Cause An OPEN message was received from the backup manager process, but no corresponding primary open was found. There are two possible reasons:

A CHECKOPEN was attempted after the OPEN failed.
The primary and backup manager processes are out of synchronization.

Effect The OPEN is rejected with Guardian File-System error 17. For more details, see the chapter on file-system errors in the Guardian Procedure Errors and Messages Manual.

Recovery Informational only; no action is needed. If the problem recurs, stop the backup manager process. It will automatically restart.

File: Invalid type_of_file file

`File`	is the name of the OSS NFS file.
`type_of_file`	specifies the file type, which can be either of the following: NFS subsystem user alias file (ZNFS User Alias) or NFS subsystem configuration file (ZNFS Config).

Cause An invalid file was encountered in the NFS file system, or an invalid management file was found in a ZZNFSnnn or ZNFSUSR file. Either the file contains invalid entries, or it has the wrong file code for a management or NFS file.

Effect The requested operation is not completed, and one of the following occurs:

If an invalid configuration file is encountered when starting a process, the process is not started.
A process that encounters invalid entries will terminate.
Encountering other conditions causes an error to be returned to the process that attempted the operation.

Recovery Perform one of the following actions:

If the specified NFS configuration subvolume was incorrect, specify the correct one and restart the system.
If an invalid management file was named ZZNFSnnn or ZNFSUSR, rename it to a nonconflicting name. When an NFS management file is missing, NFS automatically creates one; however, all the information from the original file must be inserted into it.
If a valid management file contains invalid entries, report this situation to HP in a Genesis solution that includes the complete message and the EMS log.
If a valid file contains invalid entries, either repair it or rename it out of the NFS file system and replace it with a valid one.

Proc: Open from: \nodename, cpu, pin, name: Issuing_process, paid: paid rejected with error code: error_code

`Proc`	is the name of the LAN or server process.
`nodename`	is the node name of the process issuing the open. Examples of node names are \IDEV, \IGATE, and \IDC12.
`cpu`	is the CPU number of the process issuing the open.
`pin`	is the process identification number (PIN) of the process issuing the open. For more information, see the Guardian Procedure Calls Reference Manual.
`Issuing_process`	is the name of the process issuing the open.
`paid`	is the process Access ID (PAID) of the process issuing the open. For more information, see the chapter on Guardian System security in the Security Management Guide.
`error_code`	is the optional Guardian error code. For more details, see the chapter on file-system errors in the Guardian Procedure Errors and Messages Manual.

Cause An attempt by the indicated process to open an NFS process was rejected, which indicates either an incorrect open request or a possible security violation. An attempt to open the LAN process by any process other than the manager process is a security violation.

Effect The open is rejected with an error, and a bad-open event is logged.

Recovery Verify that the open is correctly specified and that the name of the issuing process is valid. Correct these items as necessary.

Proc: This product requires GUARDIAN 90 XF release C00 or later.

`Proc`	is the name of the LAN process.

Cause The indicated NFS process was initiated on a version of Guardian earlier than C00.

Effect The NFS process terminates.

Recovery Restart on a suitable processor.

Proc: Backup process in CPU Backup_CPU died because cause

`Proc`	is the name of the manager process.
`Backup_CPU`	is the CPU where backup was running.
`cause`	is the reason the backup terminated.

Cause The backup manager process terminated, causing a loss of fault tolerance. The backup process has halted or abended, or the backup processor has gone down.

Effect The manager process no longer has backup, and fault tolerance is lost.

Recovery If the backup processor has gone down, either wait until the original backup processor is restored or, if appropriate, change the assigned backup processor. The primary manager process will automatically attempt to restart the backup on the designated processor. This event should not occur during normal operation and should be reported to HP in a Genesis case that includes the complete message and the EMS log.

Proc: Unable to start backup because Termination_cause procedure error_code error Procedure_number

`Proc`	is the name of the manager process.
`Termination_cause`	is the reason why backup terminated.
`error_code`	is the NEWPROCESS error code. See the Guardian Procedure Errors and Messages Manual for more information.
`Procedure_number`	is the procedure number.

Cause The primary manager process was unable to create its backup process, and the backup process terminated with the indicated error. The procedure number and error can give a more detailed explanation. Procedure numbers are defined in the ZGRDDDL file for Guardian procedures and in the ZFILDDL file for file-system procedures. Typical procedure numbers for this event include:

Guardian procedure 3	NEWPROCESS
FILESYS procedure 4	CHECKOPEN
FILESYS procedure 5	CHECKPOINT
FILESYS procedure 27	CHECKPOINTX

Effect The attempt to start the backup is abandoned and fault tolerance is lost.

Recovery If the backup CPU is down, a CPU RELOAD is needed to bring up the backup process. NFS will attempt to create the backup when it receives the CPU RELOADED system message. If it is unsuccessful, there may be insufficient resources on the backup processor. Appropriate recovery depends on the error cause, which is revealed in the error code.
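When triaging this event, the typical procedure numbers listed for it can be kept in a small lookup table. The following Python sketch is illustrative only: `PROCEDURE_NAMES` and `describe_procedure` are hypothetical names, and the table holds only the four numbers listed in this chapter. The authoritative definitions remain in the ZGRDDDL and ZFILDDL files.

```python
# Hypothetical lookup for the procedure numbers listed for this event.
# Only the four entries documented in this chapter are included.
PROCEDURE_NAMES = {
    ("GUARDIAN", 3): "NEWPROCESS",
    ("FILESYS", 4): "CHECKOPEN",
    ("FILESYS", 5): "CHECKPOINT",
    ("FILESYS", 27): "CHECKPOINTX",
}


def describe_procedure(domain: str, number: int) -> str:
    """Return the procedure name, or a pointer to the DDL files if unknown."""
    name = PROCEDURE_NAMES.get((domain.upper(), number))
    if name is None:
        return f"{domain} procedure {number}: not listed here; see ZGRDDDL or ZFILDDL"
    return f"{domain} procedure {number}: {name}"
```

For example, `describe_procedure("FILESYS", 27)` identifies CHECKPOINTX as the procedure that failed.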

First_proc: Unable to communicate with Second_process

`First_proc`	is the name of the manager, LAN, or server process that reports the communication failure.
`Second_process`	is the name of the manager, LAN, or server process with which the first process is unable to communicate.

Cause An NFS process (manager process, LAN interface process, or server process) was unable to communicate with another NFS process. The reason is usually one of the following:

NFS subsystem processes are running on different processors, and a processor failure occurred.
NFS components are running on different systems, and the network connection is lost.
An NFS process terminated abnormally, and the manager process has not yet started a replacement process.

Effect The NFS subsystem is inaccessible.

Recovery Perform one of these actions:

If the connection was lost because of a processor failure, reload the failed processor. When the manager process detects that the processor has been reloaded, it automatically restarts any NFS processes that were running there.
If the connection was lost because of a network connection failure, restore the network connection. Under most conditions, the process that generated the event will periodically try to re-establish communication.

Proc: EMS recording has been stopped.

`Proc`	is the name of the manager process.

Cause An interactive or programmatic command was issued to stop event collection. This event occurs after all NFS subsystem components (manager process, LAN interface process, and server process or processes) have received the stop message.

Effect No NFS-subsystem components will generate further EMS events.

Recovery Informational message only; no corrective action is needed.

Proc: EMS recording switched from previous_collector to new_collector

`Proc`	is the name of the manager process.
`previous_collector`	is the EMS collector that formerly received events.
`new_collector`	is the EMS collector that now receives events.

Cause An interactive or programmatic ALTER PROCESS command named the new collector and moved the EMS collection point.

Effect The event collection functions switch from the indicated old collector to the new collector. This message is both the last message sent to the old collector, if it is accessible, and the first message sent to the new one.

Recovery Informational message only; no corrective action is needed.

Proc: Error File_System_error_num encountered on procedure proc_name

`Proc`	is the name of a file or of the manager, LAN, or server process.
`File_System_error_num`	is the Guardian file-system error number, as specified in the Guardian Procedure Errors and Messages Manual.
`proc_name`	is the name of the procedure that encountered the file-system error. Possibilities are listed in the Guardian Procedure Calls Reference Manual.

Cause The error was returned for an I/O operation on the indicated file or process. The most likely reason is that the device on which the file or process exists is inaccessible (the device is down or the network connection is lost).

Effect If the file or process is critical to the NFS subsystem, the process abends and the subsystem becomes inaccessible.

Recovery Restore the network connection or bring up the device. An error on the $RECEIVE file should not occur during normal operation and should be reported to HP in a Genesis case that includes the complete message and the EMS log.

Proc: Memory full, text

`Proc`	is the name of the LAN or server process that detected the error.
`text`	is the message text that indicates the reason for the memory shortage.

Cause Insufficient memory is available to satisfy internal needs for any of the following reasons:

Insufficient configured memory.
Insufficient disk space. The swap volume may not have enough space to accommodate the needs of the process.
Process overload. Too many demands were made of the process, causing it to use up internal resources.

Effect If this situation occurs when a process is starting up, the process will terminate. Otherwise, while the problem exists, requests that cannot be handled are rejected with the appropriate error.

Recovery Perform one of these actions:

If insufficient memory is available, stop the NFS subsystem, adjust the DATAPAGES start-up argument to a larger value, and restart NFS.
If insufficient disk space is available, move the data swap volume to a disk with more space available.
If an overload exists, examine the load configuration for problems in calling applications, such as looping on certain requests or making unnecessary demands.

Proc: Internal error err_code - file: file_name, Timestamp: timestamp, Procedure: entry_point

`Proc`	is the name of the manager, LAN, or server process.
`err_code`	is the code identifying the internal program error.
`file_name`	is the name of the object file.
`timestamp`	is the object file’s bind timestamp.
`entry_point`	is the entry-point label in the procedure where the error occurred.

Cause The indicated NFS process detected an internal error that must be corrected by HP.

Effect One or more NFS components will abend, and the operation in progress cannot be completed.

Recovery This event should not occur during normal operation, and it must be corrected by HP. Report this situation to HP in a Genesis solution that includes the complete message and the EMS log.

Proc: RPC procedure proc_num failed; errno = errno_code

`Proc`	is the name of the LAN or server process.
`proc_num`	is the RPC procedure number, which is documented in the program’s protocol specification and identifies the calling (client) procedure.
`errno_code`	is an error code that can be interpreted as follows:

errno < -2000	RPC library error (ZRPC.RPCPARMH)
-2000 < errno < 0	C library error (SYSTEM.ERRNOH)
1 < errno < 300	File-system error number (ZSPIDEF.ZFILDDL)
300 < errno	TCP/IP socket library error (ZTCPIP.PARAMH)

Cause An RPC library procedure failed and returned the indicated value.

Effect A client program’s request cannot be serviced, and the outcome depends on how the client responds.

Recovery The outcome depends on how the client program deals with this error.
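The errno ranges given for this event can be applied mechanically when scanning a log. The following Python sketch is a minimal illustration; `classify_errno` is a hypothetical helper, not part of any NFS or Guardian interface. It applies the strict inequalities exactly as printed, so the boundary values -2000, 0, 1, and 300, which the table does not assign to any range, are reported as unspecified rather than guessed.

```python
def classify_errno(errno_code: int) -> str:
    """Map an errno value to its source, using the strict ranges in the table."""
    if errno_code < -2000:
        return "RPC library error (ZRPC.RPCPARMH)"
    if -2000 < errno_code < 0:
        return "C library error (SYSTEM.ERRNOH)"
    if 1 < errno_code < 300:
        return "File-system error number (ZSPIDEF.ZFILDDL)"
    if errno_code > 300:
        return "TCP/IP socket library error (ZTCPIP.PARAMH)"
    # -2000, 0, 1, and 300 fall outside every documented range.
    return "unspecified (value lies on a boundary the table does not cover)"
```

For example, `classify_errno(17)` points to the file-system error definitions, and `classify_errno(-2001)` points to the RPC library header.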

Proc: Message with old sync ID, process: Manager_proc, file number file_num

`Proc`	is the name of the LAN or server process.
`Manager_proc`	is the name of the manager process.
`file_num`	is the number identifying the file that the request tried to access. This number was originally returned when the file was opened.

Cause The SYNC-LEVEL parameter, supplied on the OPEN call, tells the NFS subsystem component how many completed requests must be retained for protection in the event of a takeover. This event indicates that the manager process reissued a request older than the set of saved replies, and it might signal that the process is performing checkpoints improperly.

Effect The file system has rejected a client’s NFS request and returned a Guardian file-system error. The outcome depends on how the client is programmed to deal with this rejection.

Recovery This event should not occur during normal operation and should be reported to HP in a Genesis case that includes the complete message and the EMS log.
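The role of the sync level can be pictured with a small model. The following Python sketch is not the Guardian or NFS implementation; `ReplyCache` and its status strings are hypothetical, and the sketch assumes that sync IDs increase with each new request. It keeps the most recent replies up to the sync level, resends a saved reply when a request is retried, and flags a request whose sync ID is older than every saved reply, which is the condition this event reports.

```python
from collections import OrderedDict


class ReplyCache:
    """Keeps the most recent `sync_level` replies for one opener (illustrative)."""

    def __init__(self, sync_level: int) -> None:
        self.sync_level = sync_level
        self.saved = OrderedDict()  # sync ID -> saved reply, oldest first

    def handle(self, sync_id: int, compute_reply):
        """Return (status, reply) for a request carrying `sync_id`."""
        if sync_id in self.saved:
            # Retry of a request that already completed: resend the saved reply.
            return "duplicate", self.saved[sync_id]
        if self.saved and sync_id < next(iter(self.saved)):
            # Older than every saved reply: the condition this event reports.
            return "old-sync-id", None
        reply = compute_reply()
        self.saved[sync_id] = reply
        while len(self.saved) > self.sync_level:
            self.saved.popitem(last=False)  # discard the oldest saved reply
        return "new", reply
```

With a sync level of 2, a retry of the most recent request is answered from the cache, while a retry of a request three steps back can no longer be matched and is rejected.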

Proc: Trap # trap_num - File: file_name. Timestamp: time_stamp, Procedure: entry_point

`Proc`	is the name of the manager, LAN, or server process.
`trap_num`	is the trap number.
`file_name`	is the object filename of the running process: manager, LAN, or server process.
`time_stamp`	is the object file’s bind timestamp.
`entry_point`	is the entry-point label in the procedure where the error occurred.

Cause A hardware or software trap was detected, which should not occur during normal operation.

Effect The process abends.

Recovery This event should not occur during normal operation and should be reported in a Genesis case that includes the complete message and the EMS log.

Proc: Unrecognized systems message on $RECEIVE. code: code

`Proc`	is the name of the LAN or server process.
`code`	is the first word of the message, which is the message code.

Cause An unexpected system message was detected on $RECEIVE, which should not occur during normal operation.

Effect The system message is ignored, and a dummy reply is generated.

Recovery This event should not occur during normal operation and should be reported to HP in a Genesis case that includes the complete message and the EMS log.

Proc: Message from unknown source, \nodename, cpu, pin, name: proc_name, paid: paid

`Proc`	is the name of the LAN or server process issuing the message.
`nodename`	is the node name of the process issuing the message.
`cpu`	is the CPU number of the process issuing the message.
`pin`	is the process identification number (PIN) of the process issuing the message. For more information, see the Guardian Procedure Calls Reference Manual.
`paid`	is the process access ID (PAID) of the process issuing the message. For more information, see the chapter on Guardian system security in the Security Management Guide.
`proc_name`	is the name of the process issuing the message.

Cause An inbound message was rejected because it originated from a process that does not have a current open of the indicated NFS process. This situation can occur after the NFS subsystem starts when a previously running subsystem had the same name.

Effect The request is rejected with file system error 60. For more information, see the Guardian Procedure Errors and Messages Manual.

Recovery Before restarting the NFS subsystem, stop all previously running subsystem components.
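The check behind this event can be modeled as a table of current opens. The following Python sketch is a simplified illustration under stated assumptions: `OpenTable` and the constant name `FE_NOT_OPEN` are hypothetical, and an opener is identified here only by its node name, CPU, and PIN. The value 60 is the file-system error cited in the Effect text.

```python
FE_NOT_OPEN = 60  # file-system error cited in the Effect text (constant name is hypothetical)


class OpenTable:
    """Tracks which (nodename, cpu, pin) triples hold a current open (illustrative)."""

    def __init__(self) -> None:
        self.openers = set()

    def open(self, nodename: str, cpu: int, pin: int) -> None:
        """Record that the given process has opened this NFS process."""
        self.openers.add((nodename, cpu, pin))

    def close(self, nodename: str, cpu: int, pin: int) -> None:
        """Forget an opener; unknown openers are ignored."""
        self.openers.discard((nodename, cpu, pin))

    def accept(self, nodename: str, cpu: int, pin: int) -> int:
        """Return 0 to accept a message, or error 60 if the sender has no open."""
        return 0 if (nodename, cpu, pin) in self.openers else FE_NOT_OPEN
```

A freshly started subsystem begins with an empty table, which is why messages from processes that opened a previous subsystem of the same name are rejected.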

Proc: Socket Library Routine: sock_lib_routine, Socket name: sock_name, Socket type: sock_type, Wait type: wait_type, Remote IP addr: IP_addr

`Proc`	is the name of the LAN process.
`sock_lib_routine`	indicates the failed socket library routine. For more information, see the TCP/IP and IPX/SPX Programming Manual.
`sock_name`	indicates the socket number.
`sock_type`	is the socket type. Possible values:

1 -	A transmission control protocol (TCP) stream socket.
2 -	A user datagram socket.

`wait_type`	is a numeric value indicating whether the failed operation was waited or nowaited. Possible values:

1 -	The failed operation was waited.
2 -	The failed operation was nowaited.

`IP_addr`	is the IP address of the remote host where the NFS client is running.

Cause A socket call or I/O completion routine failed because of a failed socket library routine.

Effect The LAN process is unable to communicate with a client process, so clients cannot access NFS.

Recovery Ensure that the TCP/IP process is running and configured properly. If it is not, start it and retry the socket call.
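The numeric socket type and wait type values in this event can be translated into readable text when reviewing a log. The following Python sketch is illustrative; `describe_socket_failure` and the two lookup tables are hypothetical names, and the routine name passed in the usage example is a placeholder rather than a value taken from a real event.

```python
# Values as documented for this event; anything else is reported as unknown.
SOCKET_TYPES = {1: "TCP stream socket", 2: "user datagram socket"}
WAIT_TYPES = {1: "waited", 2: "nowaited"}


def describe_socket_failure(routine: str, sock_type: int, wait_type: int) -> str:
    """Build a one-line summary of a socket library failure."""
    sock = SOCKET_TYPES.get(sock_type, f"unknown socket type {sock_type}")
    wait = WAIT_TYPES.get(wait_type, f"unknown wait type {wait_type}")
    return f"{routine} failed on a {sock} ({wait} operation)"
```

For example, `describe_socket_failure("connect", 1, 2)` summarizes a nowaited failure on a TCP stream socket.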

Proc: Connection reestablished with Process

`Proc`	is the name of the LAN or server process.
`Process`	is the process name.

Cause The connection to an NFS client process is reestablished.

Effect The NFS subsystem is up, and the client can start communicating.

Recovery Informational message only; no corrective action is needed.

Proc: Initialization failure due to text

`Proc`	is the name of the LAN or server process.
`text`	is the message text, which can be any of the following:

invalid manager devtype -	Invalid manager process name
invalid TCP/IP devtype -	Invalid TCP/IP process name
process must be named -	Process is unnamed
process must be owned by super -	Process was not owned by super-user ID

Cause If logged by the LAN process, one of the following was true when the LAN process was started:

An invalid TCP/IP process name was given as an argument.
The LAN process was unnamed.

If logged by the server process, one of the following was true when the server process was started:

The server process was not started with the super-user ID.
The server process was unable to start the timer because the system call SIGNALTIMEOUT failed.

Effect The NFS process fails to initialize and abends.

Recovery Based on the cause described in text, correct the problem and restart the process.

Proc: Too many takeovers -- backup not restarted

`Proc`	is the name of the manager process.

Cause Within the last 30 minutes (since the process was initiated or since the operator caused its primary instance to fail) multiple takeover attempts have been made on the process. Each takeover was due to abnormal termination of the primary manager process. When the takeover count exceeds five (MAX_TAKEOVERS), this event is generated, and the primary NFS manager assumes that recovery is not possible.

Effect NFS stops.

Recovery Assign a different backup CPU. This event is preceded by some other event, which specifies the reason for the primary abend. Examine the previous events and take appropriate action. If the problem persists, there might be an internal programming error in the subsystem. This event should not occur during normal operation and should be reported in a Genesis case that includes the complete message, the EMS log, and the preceding event that caused the primary abend.
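The takeover limit can be pictured as a counter over a time window. The following Python sketch is a simplification, not the subsystem's logic: `TakeoverMonitor` is a hypothetical class, and it treats the 30 minutes as a sliding window, whereas the Cause text also ties the interval to process initiation or an operator-induced primary failure. `MAX_TAKEOVERS` takes its value of five from the Cause text.

```python
from collections import deque

MAX_TAKEOVERS = 5          # limit named in the Cause text
WINDOW_SECONDS = 30 * 60   # "within the last 30 minutes" (sliding window is an assumption)


class TakeoverMonitor:
    """Counts takeovers inside the window and reports when the limit is exceeded."""

    def __init__(self) -> None:
        self.times = deque()  # takeover timestamps in seconds, oldest first

    def record_takeover(self, now: float) -> bool:
        """Record a takeover at time `now`; return True if the limit is exceeded."""
        self.times.append(now)
        # Drop takeovers that have aged out of the window.
        while self.times and now - self.times[0] > WINDOW_SECONDS:
            self.times.popleft()
        return len(self.times) > MAX_TAKEOVERS
```

Under these assumptions, the sixth takeover inside one window is the first to exceed the limit.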

Proc: Too many backup failures -- backup not restarted.

`Proc`	is the name of the manager process.

Cause The primary server process has detected that its backup has stopped or abended multiple times. When that number exceeds 20 (MAX_BACKUP_FAILS), this event is generated. The cause can be hardware failure or an internal programming error.

Effect The backup is not restarted, and the primary process continues to run, but without backup.

Recovery Perform one of these actions:

If the processor in which the backup was running went down, either change the backup processor assignment to restore the process to full backup status, or wait until the processor is restored. The primary server process will automatically attempt to restart the backup on the designated processor.
Otherwise, this event should not occur during normal operation and should be reported in a Genesis case that includes the complete message and the EMS log.

Proc: Backup process started in CPU cpu

`Proc`	is the name of the manager process.
`cpu`	is the CPU where the backup process was started.

Cause The NFS manager process has started its backup process.

Effect The backup process is running.

Recovery Informational message only; no corrective action is needed.

Proc: Netgroup netgroup is nested too deep. Maximum of 10 levels allowed.

`Proc`	is the name of the manager process.
`netgroup`	is the netgroup name.

Cause Netgroups can contain other netgroup objects as members, and the depth of these recursive netgroup definitions cannot exceed 10. This event is generated when this maximum recursive depth is reached.

Effect Further addition of netgroups to that netgroup will fail.

Recovery Reduce the number of netgroups in this netgroup.
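A configuration can be checked for excessive nesting before it is loaded. The following Python sketch is a minimal illustration, not the manager's algorithm: `nesting_depth` and `is_nested_too_deep` are hypothetical helpers, netgroups are modeled as a dictionary that maps each netgroup name to its member names, and a netgroup with no netgroup members is assumed to have depth 1. The limit of 10 comes from the event text; treating a depth of exactly 10 as acceptable is an assumption.

```python
MAX_NETGROUP_DEPTH = 10  # limit quoted in the event text


def nesting_depth(name, members, _seen=()):
    """Return the nesting depth of `name`; a netgroup without netgroup members has depth 1."""
    if name in _seen:
        raise ValueError(f"netgroup cycle through {name}")
    # Only members that are themselves netgroups add a nesting level.
    children = [m for m in members.get(name, []) if m in members]
    if not children:
        return 1
    return 1 + max(nesting_depth(c, members, _seen + (name,)) for c in children)


def is_nested_too_deep(name, members):
    """Return True if `name` nests netgroups beyond the documented limit."""
    return nesting_depth(name, members) > MAX_NETGROUP_DEPTH
```

For example, with `members = {"eng": ["build", "hostA"], "build": ["hostB"]}`, `nesting_depth("eng", members)` is 2.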

Proc: Unable to recover corrupted database file

`Proc`	is the name of the manager process.

Cause When the NFS manager is restarted, it checks the integrity of the NFS configuration database by searching the file for a recovery record. If one is found and if the NFS manager does not have write access to the database, this event is generated to indicate a corrupt database. On the other hand, if a recovery record is found and if the NFS manager has write access to the database, it performs the operation indicated by the recovery record, and NFS operation continues.

Effect The NFS manager abends.

Recovery The ZNFSUSR and ZNFSUSR1 files, where user information is stored, are Safeguard protected. Therefore, proceed as follows:

Permissions for these files must be set so that the NFS manager can access them.
The NFS subsystem must be started with the proper user ID.
As a last resort, recreate these two files.
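The restart check described in the Cause text reduces to a decision on two conditions. The following Python sketch restates that decision only; `startup_action` and its return strings are hypothetical and do not correspond to any NFS interface.

```python
def startup_action(recovery_record_found: bool, has_write_access: bool) -> str:
    """Return the manager's next step after checking the configuration database."""
    if not recovery_record_found:
        # No recovery record: nothing to repair, so startup proceeds.
        return "continue"
    if has_write_access:
        # Perform the operation named by the recovery record, then continue.
        return "apply-recovery"
    # A recovery record exists but cannot be applied: this event is logged.
    return "abend"
```

Only the combination of a recovery record and missing write access leads to this event, which is why the recovery steps concentrate on file permissions and the user ID under which NFS is started.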

Proc: Security violation on user userid, host hostname, NFS operation NFS_op, NFS Filename NFS_filename, Remote IP addr IP_addr

`Proc`	is the name of the manager process.
`userid`	is the NFS user ID of the user that attempted to violate security.
`hostname`	is the name of the host where the NFS client that caused the violation is running.
`NFS_op`	is the NFS operation that failed. For more information, see the subsection on the NFS protocol specification in the chapter on requests for comments reference specification in the Overview of NFS for Open System Services manual.
`NFS_filename`	is the optional name of the file to which access was attempted.
`IP_addr`	is the IP address of the remote host that runs the NFS client that attempted to violate security.

Cause The identified user has tried to access the named file without having access permission for it.

Effect Access to this file by this user is denied.

Recovery None required, but users needing access should contact the NFS administrator to obtain access permission.

Proc: QIO Monitor process QIO_proc is not running; NFS will run slower

`Proc`	is the name of the server process.
`QIO_proc`	is the name of the QIO monitor process.

Cause The QIO monitor process (QIOMON) is not running.

Effect NFS runs slower. The master NFS process for a single OSS file system is called the headpin. It manages a pool of slave processes, called workerbees. If the QIO monitor process (QIOMON) is not running in the CPU assigned to the OSS NFS server, all messages between the headpin process and workerbee processes are sent over $RECEIVE.

Recovery Restart the QIO monitor in the CPU that runs the OSS NFS server.

Server: Internal error: Caller caller_fn, Callee callee_fn, Return status ret_status

`Server`	is the name of the NFS server that initiated the event.
`caller_fn`	is the caller function name.
`callee_fn`	is the name of the function called.
`ret_status`	is the status returned from the function called.

Cause If any call to the OSS file-system API fails, this event is generated. A hardware error, perhaps a disk error, occurred when the operation was in progress.

Effect The NFS operation fails.

Recovery Appropriate recovery depends on the return status of the event message. This event should not occur during normal operation and should be reported to HP in a Genesis case that includes the complete message and the EMS log.

Server: Internal error: EFS is not configured or started.

`Server`	is the name of the edit file server that is not started.

Cause The user tried to start the edit file server when it was not properly configured. The edit file server is used to access and operate on edit files.

Effect The named edit file server does not start.

Recovery Adjust the configuration of the edit file server object. The edit file server object is NFSEFS and is stored on the installation subvolume, typically $SYSTEM.ZOSSNFS. This object must be licensed and have the PROGID of SUPER.SUPER. Only one EFS server can be run for each volume to be converted from EDIT to UNIX or DOS format. For good performance, the EFS process priority must be high.

Server: Unable to start worker bee because text

`Server`	is the name of the NFS server process that tries to start the workerbee.
`text`	is the brief description of the reason why the server is unable to start the workerbee.

Cause The master NFS process for a single OSS file system is called the headpin. It manages a pool of slave processes called workerbees, which allow multithreading of requests to NFS. This failure can occur for either of the following reasons:

Excessive load on the headpin process
Lack of memory

Effect The request made by the remote client will not be serviced. The headpin does not stay alive if it cannot create at least one workerbee, and NFS operation fails.

Recovery Depends on the cause of workerbee failure. Try the following:

Reduce the load on the server process indicated by Server.
Make more memory available.