Jobs fail with 'Too many open files' error in a UNIX NetBackup environment configured to use a service user account.

Article: 100052601
Last Published: 2023-05-01
Product(s): NetBackup & Alta Data Protection

Problem

Jobs fail with 'Too many open files' (errno 24 = EMFILE) in a UNIX NetBackup domain when the primary server or media servers are configured to use a non-root service account. The condition may also cause connectivity issues between NetBackup hosts.

Error Message

Lines similar to the following appear in the nbpxyhelper log on the primary server or media server. The log is located at:

/usr/openv/logs/nbpxyhelper/*.log

02/03/2022 14:40:32.333 [Application] NB 51216 nbpxyhelper 486 PID:3383 TID:140617991476992 File ID:486 [{880139A2-8531-11EC-AF42-292BC378EEA5}:INBOUND] [Error] V-486-90 ERR - Unable to read json file [/usr/openv/var/vxss/certmapinfo.json]: error details: text:[unable to open /usr/openv/var/vxss/certmapinfo.json: Too many open files] line:[-1] source:[/usr/openv/var/vxss/certmapinfo.json] column:[-1] position:[0]
02/03/2022 14:40:32.333 [Application] NB 51216 nbpxyhelper 486 PID:3383 TID:140617991476992 File ID:486 [{880139A2-8531-11EC-AF42-292BC378EEA5}:INBOUND] [Error] V-486-90 ERR - Error while reading mapping file: json invalid
02/03/2022 14:40:32.333 [Debug] NB 51216 nbpxyhelper 486 PID:3383 TID:140617991476992 File ID:486 [{880139A2-8531-11EC-AF42-292BC378EEA5}:INBOUND] 1 [JsonRequest::populateCertInfo] Error: Failed to read certificate information from certificate mapping file. (../machines/LibNbPxyProtocol.cpp:477)

The following error is sometimes also seen in the system dmesg log:

[Tue May 17 09:16:16 2022] VFS: file-max limit 65536 reached
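
A quick way to confirm the symptom is to search the logs for these messages. The following is a minimal sketch only: count_emfile_lines is a hypothetical helper name, not a NetBackup command, and the log path is the one given above.

```shell
#!/bin/sh
# Sketch only: the helper name is illustrative, not part of NetBackup.

# Count lines reporting "Too many open files" across the given log files.
# Unreadable or missing files are skipped silently.
count_emfile_lines() {
    cat "$@" 2>/dev/null | grep -c "Too many open files"
}

# Example usage on a primary or media server:
#   count_emfile_lines /usr/openv/logs/nbpxyhelper/*.log
#   dmesg | grep "file-max limit"
```

A non-zero count from the first command, or any output from the second, suggests that one of the two causes described below applies.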

Cause

Two separate problems related to open file limits can cause this error.

  1. The vnetd -proxy inbound and outbound processes are not able to increase their open file limit to 8192 as expected.

    On server reboot, NetBackup services are started in a systemd session, so they inherit the default open file limit configuration of that session.

    Observe which user account owns the vnetd -proxy processes.  It may be root, or the NetBackup SERVICE_USER beginning with version 9.1.

    $ bpps vnetd
    root      1882144  1  0  18:37  ?  00:00:00 /usr/openv/netbackup/bin/vnetd -standalone
    nbsvcusr  1882148  1  1  18:37  ?  00:00:00 /usr/openv/netbackup/bin/vnetd -proxy inbound_proxy -number 0
    nbsvcusr  1882151  1  1  18:37  ?  00:00:00 /usr/openv/netbackup/bin/vnetd -proxy outbound_proxy -number 0


    The normal ulimit (nofile) for the root and SERVICE_USER accounts can be observed using these commands:

    root@server$ ulimit -n
    8192
    root@server$ su nbsvcusr --shell /bin/bash --command "ulimit -n"
    8192

    Notice, however, that the vnetd proxy processes (both inbound and outbound) have an open file limit that is lower than the ulimit (nofile) for the root and service user accounts.  In this example it is 4096.

    root@server$ prlimit --pid=`pgrep -f "vnetd -proxy inbound_proxy -number 0"` | grep open
              NOFILE   max number of open files   4096   4096 files

  2. The system-wide file-max limit is set to a low value.

    On the RHEL platform, the file-max limit is observed to be strictly enforced for processes running as non-root users, while processes running as root are not subject to it. Changing the NetBackup SERVICE_USER to a non-root account may therefore cause "Too many open files" errors.

    For example, file-max (the third field of fs.file-nr) is set to 65536 on the RHEL server below. This limit may be reached if the server is heavily loaded.

    root@server$ sysctl fs.file-nr 
    fs.file-nr = 21552 0 65536
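
Both conditions above can be checked together with a small script. The following is a minimal sketch, not a NetBackup utility: the function names soft_nofile, file_nr_usage_pct, and report_open_file_limits are hypothetical, the expected limit of 8192 is taken from this article, and a Linux host with /proc mounted and pgrep available is assumed. The three fs.file-nr fields are the allocated file handles, the allocated but unused file handles, and the system-wide maximum (fs.file-max).

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of NetBackup.

# Print the soft "Max open files" limit found in a /proc/<pid>/limits file.
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Print allocated file handles as a whole percentage of fs.file-max.
# Arguments are the three fs.file-nr fields: allocated, unused, maximum.
file_nr_usage_pct() {
    echo $(( $1 * 100 / $3 ))
}

# Report both conditions. The default of 8192 is the limit this article expects.
report_open_file_limits() {
    expected=${1:-8192}
    for pid in $(pgrep -f "vnetd -proxy"); do
        limit=$(soft_nofile "/proc/$pid/limits")
        case $limit in
            ''|*[!0-9]*)
                echo "PID $pid: limit not readable" ;;
            *)
                if [ "$limit" -lt "$expected" ]; then
                    echo "PID $pid: nofile $limit is below $expected"
                else
                    echo "PID $pid: nofile $limit is sufficient"
                fi ;;
        esac
    done
    # Word splitting of the file content supplies the three fields.
    echo "System-wide file handle usage: $(file_nr_usage_pct $(cat /proc/sys/fs/file-nr))%"
}
```

Calling report_open_file_limits as root prints one line per vnetd proxy process and one line for the system-wide usage. For the fs.file-nr values shown above (21552 0 65536), file_nr_usage_pct yields 32.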

Solution

Problem 1:

Note: Steps 1 and 2 apply only to NetBackup versions earlier than 10.1.1.  NetBackup 10.1.1 vnetd proxy processes detect the current (lower than expected) ulimit setting at startup and lower the fd-in-use-threshold to match, so that a second, third, or fourth copy of the process can be started instead of reaching EMFILE.  Step 3 applies to, and should be implemented for, all NetBackup versions 8.1 and later to avoid encountering EMFILE in any NetBackup process.

  1. (Linux) Temporarily increase the open file limit for already running vnetd proxy processes to 8192.  This change will not persist through a process restart such as a host reboot.

    root@server$ prlimit --pid=`pgrep -f "vnetd -proxy inbound_proxy -number 0"` --nofile=8192:8192

    root@server$ prlimit --pid=`pgrep -f "vnetd -proxy outbound_proxy -number 0"` --nofile=8192:8192
     
  2. (Linux) Verify the open file limit is increased for the processes.

    root@server$ prlimit --pid=`pgrep -f "vnetd -proxy inbound_proxy -number 0"` | grep open
            NOFILE     max number of open files                8192      8192 files

    root@server$ prlimit --pid=`pgrep -f "vnetd -proxy outbound_proxy -number 0"` | grep open
            NOFILE     max number of open files                8192      8192 files
     
  3. (Linux/UNIX) Permanently solve the problem by appropriately configuring the operating system and any clusterware used to start NetBackup.  The open file (nofile) ulimit should be 8192 or higher for any O/S or cluster utilities that start NetBackup processes, including command line shells, systemctl, etc.

    For details, see the Related Article: Minimum O/S ulimit settings on primary and media server Linux/UNIX platforms. 
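
Steps 1 and 2 above can be combined into a loop that covers every running vnetd proxy process, which is convenient when more than one inbound or outbound proxy is running. The following is a minimal sketch, assuming a Linux host with /proc, pgrep, and prlimit, run as root. The function names are hypothetical and the 8192 target comes from this article. As with step 1, the change does not persist through a process restart.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of NetBackup.
TARGET=8192   # open file limit this article expects for vnetd proxies

# Succeed only when the current soft limit is numeric and below the target.
needs_raise() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: leave untouched
    esac
    [ "$1" -lt "$2" ]
}

# Raise the limit on every vnetd proxy process, then display the result.
raise_vnetd_proxy_limits() {
    for pid in $(pgrep -f "vnetd -proxy"); do
        current=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
        if needs_raise "$current" "$TARGET"; then
            # Step 1: set the soft and hard limits to the target.
            prlimit --pid="$pid" --nofile="$TARGET:$TARGET"
        fi
        # Step 2: show the limit now in effect for this process.
        prlimit --pid="$pid" --nofile
    done
}
```

Processes that already have a limit of 8192 or higher are left unchanged, so the loop is safe to run repeatedly.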

Problem 2:

  1. Based on the application load on the server, determine the number of concurrently open files required. The first field of fs.file-nr shows the number of file handles currently allocated.

    root@server$ sysctl fs.file-nr 
     
  2. Increase the file-max limit to a value appropriate for the observed load.

    For example, the steps below quadruple the value to 262144 (4 * 65536).

    - Edit /etc/sysctl.conf and set fs.file-max = 262144
    - Run 'sysctl -p' to apply the change.
    - Verify the change via 'sysctl fs.file-nr'.
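
The edit in step 2 can be scripted so that it is repeatable across servers. The following is a minimal sketch: set_file_max is a hypothetical helper, not a NetBackup or operating system command, and keeping a backup copy with a .bak suffix is an assumption of this example. The helper replaces an existing fs.file-max line, or appends one if none exists.

```shell
#!/bin/sh
# Sketch only: the helper name and the .bak backup are illustrative choices.

# Set fs.file-max in a sysctl.conf-style file.
# Usage: set_file_max <config-file> <value>
set_file_max() {
    conf=$1
    value=$2
    pattern='^[[:space:]]*fs\.file-max[[:space:]]*='
    # Keep a copy of the original file before changing it.
    cp -p "$conf" "$conf.bak" || return 1
    if grep -q "$pattern" "$conf.bak"; then
        # Rewrite the existing setting, keeping every other line as it was.
        sed "s/$pattern.*/fs.file-max = $value/" "$conf.bak" > "$conf"
    else
        # No active setting found: append one.
        printf 'fs.file-max = %s\n' "$value" >> "$conf"
    fi
}

# Example (as root), followed by the apply and verify steps from this article:
#   set_file_max /etc/sysctl.conf 262144
#   sysctl -p
#   sysctl fs.file-nr
```

Lines that are commented out are not treated as an existing setting, so a commented fs.file-max entry is left in place and a new active line is appended.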
