Problem
- Backup is slow.
- Backup takes too long.
- Backups do not run because the scheduled backup window closes before they can start or finish.
Error Message
Status codes: 23, 24, 25, 196 and others.
Windows socket errors: 10054, 10053
Cause
If only one host is slow, the cause is usually environmental rather than specific to NetBackup. Common causes include slow or failing disks, old NICs or network drivers, network infrastructure issues, misconfigured or overloaded SANs, and overloaded NAS devices.
If backups of all hosts in one or more policies are affected, some NetBackup tuning parameters can be adjusted to try to improve performance.
Solution
This is a typical NetBackup environment. In some environments, however, the master and media server roles are combined on one host.
A performance bottleneck can occur in several locations:
1. First, establish the speed at which the client can read data off the disk(s) (zone 3). If the data resides on NAS2 (fiber drive, NFS, or UNC) (zone 4), the speed must also be tested across that network. Use a bpbkar null test to measure this speed. The Backup Planning and Performance Tuning Guide for your NetBackup version describes how to run this and other tests: https://www.veritas.com/support/en_US/doc/21414900-146141073-0/id-SF0S0156465-146141073
If the null tests on the client(s) are slow, the bottleneck is the NAS network or the local disk/file system. In that case, no NetBackup setting can improve performance; examine the disks, NAS, SAN, or NFS drives to find the bottleneck.
Note: Reading local data can be slow when a single folder contains many thousands of files. This is a limitation of the file system and cannot be overcome with the standard client software. In that case, use FlashBackup, which performs a block-level backup of the entire disk. (FlashBackup cannot be used with NFS or UNC mounted data.)
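A bpbkar null test reads the client's data and discards the output, so the result reflects read speed alone. The Python sketch below applies the same idea outside NetBackup: it walks a directory tree, reads every regular file, throws the data away, and reports throughput. It is not a substitute for bpbkar, which exercises the actual client read path; the function name `read_throughput_mb_s` is hypothetical, and repeated runs can look faster than reality because the operating system caches recently read files.

```python
import os
import time


def read_throughput_mb_s(root: str, chunk_size: int = 1024 * 1024) -> tuple[int, float]:
    """Read every regular file under `root`, discard the data, and return
    (bytes_read, throughput in MB/s where 1 MB = 1,048,576 bytes).

    Hypothetical helper: a rough stand-in for the idea behind a bpbkar
    null test (read the data, write it nowhere), not a replacement for it.
    """
    total = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files so only real file data is read.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while chunk := f.read(chunk_size):
                        total += len(chunk)  # count the bytes, then discard them
            except OSError:
                continue  # unreadable file: skip it in this sketch
    elapsed = time.perf_counter() - start
    rate = (total / (1024 * 1024)) / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

For example, `read_throughput_mb_s("/data")` returns the number of bytes read under `/data` and the rate. Pointing it at a folder with many thousands of small files versus one with a few large files gives a quick feel for the file-system effect described in the note above.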
2. Test zone 2 by using a null disk storage unit (DSU). Suspend backups, then run a test policy backup of the client to a disk storage unit with the 'null write' touch file in place. Because a null-write DSU is, for practical purposes, an infinitely fast drive, this test isolates the network in zone 2 between the client and the media server. If the null DSU backup writes at 80% or more of the client null-test speed, the problem lies in zone 1 (DSU/TSU); otherwise, it lies in zone 2.
If the issue is in zone 2, there are many areas to check:
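The 80% rule in step 2 reduces to a single ratio. A minimal Python sketch, assuming both speeds are measured in the same unit (MB/s here); the function name `locate_bottleneck` and the example figures are hypothetical:

```python
def locate_bottleneck(client_null_mb_s: float, null_dsu_mb_s: float,
                      threshold: float = 0.80) -> str:
    """Apply the 80% rule from step 2.

    client_null_mb_s: read speed from the bpbkar null test on the client.
    null_dsu_mb_s:    write speed of the test backup to the null DSU.
    Both values must use the same unit.
    """
    if client_null_mb_s <= 0:
        raise ValueError("client null-test speed must be positive")
    ratio = null_dsu_mb_s / client_null_mb_s
    if ratio >= threshold:
        # The network keeps up with the client, so look at the storage unit side.
        return "zone 1 (DSU/TSU)"
    # Throughput is lost on the path between the client and the media server.
    return "zone 2 (client to media server network)"
```

For example, a client that reads at 100 MB/s in the null test and writes at 85 MB/s to the null DSU points to zone 1, while 50 MB/s to the null DSU points to zone 2.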
- NIC drivers on the client and media server
- TCP configuration on the client and media servers (TCP buffer sizes, DNS, segment size, windowing, etc.)
- Network configuration (ports, speed, firewalls, MSS, etc.)
- NetBackup configuration (SIZE_DATA_BUFFERS, NUMBER_DATA_BUFFERS)
Note: Adjusting the NetBackup buffers can improve speed. However, it cannot make up for slow reads from disk, poor network performance, or poorly performing storage units.
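Larger or more numerous data buffers also consume shared memory on the media server. The NetBackup tuning guides estimate this, as we understand them, as roughly SIZE_DATA_BUFFERS × NUMBER_DATA_BUFFERS × number of drives × multiplexing (MPX) per drive; treat that formula and the example values below as assumptions to verify against the guide for your version. `buffer_shared_memory_bytes` is a hypothetical helper.

```python
def buffer_shared_memory_bytes(size_data_buffers: int, number_data_buffers: int,
                               drives: int, mpx_per_drive: int = 1) -> int:
    """Estimate media server shared memory used by NetBackup data buffers.

    Assumed formula: SIZE_DATA_BUFFERS * NUMBER_DATA_BUFFERS * drives * MPX.
    size_data_buffers is in bytes; the result is in bytes.
    """
    for name, value in (("size_data_buffers", size_data_buffers),
                        ("number_data_buffers", number_data_buffers),
                        ("drives", drives),
                        ("mpx_per_drive", mpx_per_drive)):
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    return size_data_buffers * number_data_buffers * drives * mpx_per_drive


if __name__ == "__main__":
    # Hypothetical values: 256 KB buffers, 64 of them, 4 drives, no multiplexing.
    total = buffer_shared_memory_bytes(262144, 64, 4, 1)
    print(f"{total} bytes ({total / 2**20:.0f} MiB)")  # 67108864 bytes (64 MiB)
```

Running the estimate before raising either value helps confirm the media server has enough memory for every drive and stream that may run at once.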
3. In some cases, the problem is in zone 1. This is usually caused by poor-quality or outdated tape drive drivers or by robotic library issues. Other tape drive issues include outdated firmware, bad tapes, faulty tape drives, controller configuration, and problems with the storage network on the media server side.
4. The least favorable location for data is the drive labeled NAS in the diagram above. If the client mounts data (via NFS or UNC) and the media server then backs it up, the data must cross the network twice. If the NAS traffic shares a NIC with the backup network, expect 40% or less of the speed the network is capable of, because the data stream travels over that network twice.
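The effect of NAS data crossing the backup NIC twice can be put into rough numbers. The Python sketch below converts a link speed in Gbit/s to MB/s (decimal, so 1 Gbit/s = 125 MB/s) and applies the 40% rule of thumb from step 4 when the NAS traffic shares the backup NIC. It ignores protocol overhead and competing traffic, so treat the result as an optimistic ceiling; the function name `backup_ceiling_mb_s` is hypothetical.

```python
# Rule of thumb from step 4: 40% or less of link speed when NAS traffic shares the backup NIC.
SHARED_NIC_FACTOR = 0.40


def backup_ceiling_mb_s(link_gbit_s: float, nas_on_backup_nic: bool) -> float:
    """Rough upper bound on backup throughput in MB/s (1 MB = 10**6 bytes)."""
    link_mb_s = link_gbit_s * 1000 / 8  # 1 Gbit/s = 125 MB/s before protocol overhead
    return link_mb_s * SHARED_NIC_FACTOR if nas_on_backup_nic else link_mb_s


if __name__ == "__main__":
    for shared in (False, True):
        print(f"10 GbE, NAS on backup NIC={shared}: ~{backup_ceiling_mb_s(10, shared):.0f} MB/s ceiling")
```

On a 10 GbE link, this gives a ceiling of about 1250 MB/s with a dedicated path and about 500 MB/s when the NAS data shares the backup NIC.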
5. Specific verbose NetBackup logs may also contain useful information. The bpbkar and bptm/bpdm logs contain messages about NetBackup buffer waits ("waited for full buffer" / "waited for empty buffer"). The admin logs can show whether system or NetBackup errors slowed the backup, and other logs can reveal resource issues.
If the wait counts in the "waited for full buffer" and "waited for empty buffer" messages are high, adjusting SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS can improve performance for all clients backed up by that media server.
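To judge whether those counts are high, it helps to total them across a log. The Python sketch below assumes the messages take the form "waited for full buffer N times, delayed M times" (and the same with "empty"); that format is an assumption to check against your own logs, and `summarize_buffer_waits` is a hypothetical helper. Which side is the bottleneck depends on which process wrote the message, so interpret the totals with the tuning guide.

```python
import re

# Assumed message format, e.g. "waited for full buffer 1681 times, delayed 12296 times".
WAIT_RE = re.compile(r"waited for (full|empty) buffer (\d+) times, delayed (\d+) times")


def summarize_buffer_waits(lines):
    """Sum wait and delay counts per buffer type across an iterable of log lines."""
    totals = {"full": {"waits": 0, "delays": 0}, "empty": {"waits": 0, "delays": 0}}
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            kind = match.group(1)
            totals[kind]["waits"] += int(match.group(2))
            totals[kind]["delays"] += int(match.group(3))
    return totals
```

Because an open file is an iterable of lines, `with open(path, errors="replace") as log: summarize_buffer_waits(log)` works on a log file directly; on UNIX media servers the legacy bptm logs typically live under `/usr/openv/netbackup/logs/bptm/`.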
Note: If you open a case with NetBackup Technical Support for performance issues, provide NBSU output. It gives TSEs a wealth of useful information very quickly, including:
- Operating system (OS) version and build number
- OS patches or hot fixes
- OS settings for some TCP parameters
- NIC driver version
- NetBackup version
- Mixed NetBackup versions on the hosts
- Snapshot settings for Windows hosts
- Policy scheduling information
A Microsoft Product Support Report (MPSR) may also be useful for Windows-based environments.