Problem
On a Flex Appliance, when a 4th (or 5th) NetBackup Master Server instance is started, the NetBackup Web Management service (nbwmc) fails to start. As a result, that Master Server instance does not function properly.
The issue affects whichever Master Server instances start last. If the Master Server instances are shut down and started in a different order, the issue again affects the last instances to start.
Error Message
Messages around this error can be found in two logs:
/usr/openv/wmc/webserver/logs/catalina.<date>.log
24-May-2021 19:04:11.626 SEVERE [Catalina-utility-1] org.apache.catalina.core.ContainerBase.startInternal A child container failed during start
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: unable to create new native thread
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:916)
at org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:843)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1384)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1374)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
at java.util.concurrent.ThreadPoolExecutor.ensurePrestart(ThreadPoolExecutor.java:1603)
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:334)
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
at com.netbackup.common.threadpool.NbuScheduledThreadPool.schedule(NbuScheduledThreadPool.java:70)
/usr/openv/wmc/webserver/logs/catalina.out
[Dynamic-linking native method java.net.PlainSocketImpl.socketShutdown ... JNI]
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
at java.util.concurrent.ThreadPoolExecutor.ensurePrestart(ThreadPoolExecutor.java:1603)
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:334)
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:549)
at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:648)
at org.apache.tomcat.util.threads.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:74)
at org.apache.catalina.core.ContainerBase.stopInternal(ContainerBase.java:976)
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:257)
at org.apache.catalina.core.StandardService.stopInternal(StandardService.java:498)
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:257)
at org.apache.catalina.core.StandardServer.stopInternal(StandardServer.java:982)
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:257)
at org.apache.catalina.util.LifecycleBase.destroy(LifecycleBase.java:293)
at org.apache.catalina.startup.Catalina.start(Catalina.java:776)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:342)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:473)
Exception in thread "STP-WEB-SERVICE-thread-1" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
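To check quickly whether a given log contains this error, the message text can be searched for directly. The following is a minimal sketch, not a Veritas-provided tool; the function name count_native_thread_errors is hypothetical, and it only counts lines containing the exact message shown in the traces above.

```shell
# Hypothetical helper: count the lines in one log file that report the
# "unable to create new native thread" OutOfMemoryError.
count_native_thread_errors() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps that from being treated as a failure.
    grep -c -F 'java.lang.OutOfMemoryError: unable to create new native thread' "$1" || true
}

# Example call (path taken from this article):
#   count_native_thread_errors /usr/openv/wmc/webserver/logs/catalina.out
```

A non-zero count on an instance where nbwmc failed to start suggests the instance is hitting the condition described in this article.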
Cause
This issue is caused by the limit on 'max user processes' (nproc) associated with the nbwebsvc account.
Upon creation, the nbwebsvc account defaults to:
max user processes (-u) 4096
Due to the way containers work on a Flex Appliance, although each Master Server instance is unique, the processes owned by the nbwebsvc users in all of the containers on the Appliance collectively count towards that limit. This is why the issue does not affect the first few Master Server instances to start.
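The shared count can be illustrated with a small sketch. It rests on several assumptions that this article does not confirm: that a shell is available which can see the processes of every container, that the procps version of ps is installed, and that nbwebsvc maps to the same numeric UID in each container. The function name count_threads_for_uid is hypothetical. On Linux, threads count towards nproc, so the sketch counts threads rather than processes.

```shell
# Hypothetical helper: count the threads whose real UID matches the first argument.
# Input (stdin): one numeric UID per line, such as the output of "ps -eLo ruid=".
count_threads_for_uid() {
    awk -v uid="$1" '$1 == uid { n++ } END { print n + 0 }'
}

# Example call (assumes procps "ps" and visibility of all containers' processes):
#   ps -eLo ruid= | count_threads_for_uid "$(id -u nbwebsvc)"
```

If the reported number approaches 4096 while the last instance is starting, that is consistent with the cause described above.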
Solution
Log on to each Master Server instance and complete the steps below to check whether the current value of 'max user processes' associated with 'nbwebsvc' remains at the default value of 4096 and, if necessary, increase the value.
1. Execute the following command to verify the current value of 'max user processes' for the nbwebsvc account.
$ sudo -u nbwebsvc sh -c "ulimit -u"
Note: Enter the 'sudo' password when prompted
Note: The default is: max user processes (-u) 4096
Note: If the value has already been increased to 65536 or higher, then there is no need to proceed with steps 2-7.
2. Stop all NetBackup processes, and verify no processes remain running.
$ sudo /usr/openv/netbackup/bin/goodies/netbackup stop
$ sudo /usr/openv/netbackup/bin/bpps
3. Edit the file /etc/security/limits.conf
$ sudo vi /etc/security/limits.conf
4. Add the following two lines to the end of the file, just above the line "# End of file":
nbwebsvc soft nproc 65536
nbwebsvc hard nproc 65536
Example:
#@student hard nproc 20
#@faculty soft nproc 20
#@faculty hard nproc 50
#ftp hard nproc 0
#@student - maxlogins 4
nbwebsvc soft nproc 65536
nbwebsvc hard nproc 65536
# End of file
5. Confirm the change is reflected by executing the following.
$ sudo -u nbwebsvc sh -c "ulimit -u"
Note: This should be the new value: max user processes (-u) 65536
6. Start NetBackup services.
$ sudo /usr/openv/netbackup/bin/goodies/netbackup start
7. Run commands to verify nbwmc is responsive:
$ sudo /usr/openv/netbackup/bin/nbcertcmd -ping
$ sudo /usr/openv/netbackup/bin/nbcertcmd -getCACertificate
$ sudo /usr/openv/netbackup/bin/nbcertcmd -getCertificate -force
Note: For additional details, see the Related Article: Minimum O/S ulimit settings on primary and media server Linux/UNIX platforms.
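The checks in steps 1, 4 and 5 can also be scripted when several Master Server instances need to be reviewed. The following is a minimal sketch under stated assumptions: the function names nproc_limit_sufficient and has_nbwebsvc_nproc_entries are hypothetical, the 65536 threshold is the value used in this article, and only /etc/security/limits.conf is inspected.

```shell
# Hypothetical helper: succeed when a "ulimit -u" value is already at or above
# the target (default 65536, the value used in this article).
nproc_limit_sufficient() {
    value=$1
    target=${2:-65536}
    [ "$value" = "unlimited" ] && return 0
    case $value in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric input: treat as insufficient
    esac
    [ "$value" -ge "$target" ]
}

# Hypothetical helper: succeed when the file contains uncommented soft and hard
# nproc entries for nbwebsvc.
has_nbwebsvc_nproc_entries() {
    file=$1
    grep -Eq '^[[:space:]]*nbwebsvc[[:space:]]+soft[[:space:]]+nproc[[:space:]]+[0-9]+' "$file" &&
    grep -Eq '^[[:space:]]*nbwebsvc[[:space:]]+hard[[:space:]]+nproc[[:space:]]+[0-9]+' "$file"
}

# Example calls:
#   nproc_limit_sufficient "$(sudo -u nbwebsvc sh -c "ulimit -u")" || echo "steps 2-7 are needed"
#   has_nbwebsvc_nproc_entries /etc/security/limits.conf || echo "nproc entries missing"
```

The helpers only report status. Editing the file and restarting NetBackup remain the manual steps described above.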