Problem
Individual backup and restore streams across an OC-3 WAN link (155 Mb/s) have a transfer rate of only 2 MB/sec. Six concurrent streams can consume the full 12 MB/sec of usable bandwidth, but a single stream cannot.
The NetBackup LIMIT_BANDWIDTH feature is not in use.
The issue affects backups and restores from multiple remote client hosts to two local media servers.
Error Message
No specific error messages are encountered; the backup or restore is simply slower than expected. The following evidence is from a backup, but restores are similarly affected.
The bpbrm debug log shows that limit bandwidth settings were not received from the master server. It also shows the IP addresses for the sockets from the media server to the client.
09:44:31.421 [6219] <2> logparams: -backup -S mymaster -c myclient -ct 0 -ru root -cl mypolicy -sched full -bt 1337694270 -dt 0 -st 0 -b myclient_1337694270 -mediasvr mymm -jobid 1851559 -jobgrpid 1851559 -masterversion 710000 -maxfrag 20480 -reqid -1336758610 -mt 2 -to 10800 -stunit mySTU -rl 10 -rp 259200 -eari 0 -cj 1 -D 15 -rt 1 -rn 0 -pool entbkup-suit -use_ofb -use_otm -jm -secure 1 -kl 5 -rg root -ckpt_time 900 -connect_options 16974338 -ri mymaster
...snip...
09:44:33.411 [6219] <2> logconnections: BPCD CONNECT FROM 2.2.2.2.39302 TO 3.3.3.3.13724 fd = 5
09:44:33.526 [6219] <2> do_vnetd_service: ... VNETD CONNECT FROM 2.2.2.2.36229 TO 3.3.3.3.13724 fd = 7
09:44:33.526 [6219] <2> vnet_vnetd_connect_forward_socket_begin: ... 0: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
...snip...
09:44:33.913 [6219] <2> set_job_details: Tfile (1851559): LOG 1337694273 4 bpbrm 6219 starting bpbkar on client
The bptm debug log shows that the NET_BUFFER_SZ file is present and contains 262144, but the O/S only accepts a value of 262142 bytes.
09:44:34.414 [6246] <2> read_config_file: using 262144 value from /usr/openv/netbackup/NET_BUFFER_SZ
09:44:34.414 [6246] <2> io_set_recvbuf: setting receive network buffer to 262144 bytes
09:44:34.414 [6246] <2> io_set_recvbuf: receive network buffer is 262142 bytes
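The kernel is free to adjust or cap a requested socket buffer size, which is why io_set_recvbuf logs both the size requested and the size actually in effect. A minimal C sketch of that set-then-verify pattern (an illustration only, not NetBackup source):

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustration only: request a receive buffer size, then read back
 * what the kernel actually granted, as io_set_recvbuf logs. */
int main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    int requested = 262144;              /* value from NET_BUFFER_SZ */
    int granted = 0;
    socklen_t len = sizeof(granted);

    /* Ask for the requested size; the kernel may adjust or cap it. */
    setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested));

    /* Read back the effective size, which may differ from the request. */
    getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &granted, &len);

    printf("requested %d bytes, granted %d bytes\n", requested, granted);
    close(sock);
    return 0;
}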
The backup was 240 MB of data. There are 0 delays waiting for the storage unit to accept data. However, inbound from the network, there are 883 waits for a full buffer, which incurred 6,872 delays of 10 milliseconds in total (the initial 883 plus an additional 5,989). All delays are either within the network layers or on the client host.
09:44:34.414 [6246] <2> io_init: child delay = 10, parent delay = 15 (milliseconds)
...snip...
09:47:04.196 [6249] <2> fill_buffer: [6246] socket is closed, waited for empty buffer 0 times, delayed 0 times, read 239,730,688 bytes
09:47:04.206 [6246] <2> write_data: waited for full buffer 883 times, delayed 6872 times
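At 10 milliseconds each, those 6,872 delays amount to roughly 69 seconds of the approximately 150 seconds (09:44:34 to 09:47:04) the 240 MB backup required, so nearly half of the elapsed time was spent waiting for data to arrive from the network.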
The bpbkar debug log confirms that NET_BUFFER_SZ is 262144. Other than transferring file contents onto the network, there are only two delays longer than 5 ms present, shown below; both occurred while selecting the next file to back up. All other delays are therefore below the application layer, either in the TCP stack on one of the hosts or in the network between them.
09:44:33.943 [28231] <4> bpbkar initialize: INF - Setting network send buffer size to 262144 bytes
...snip...
09:44:34.017 [28231] <4> bpbkar PrintFile: /NBU/openv/netbackup/client/INTEL/
09:44:34.020 [28231] <2> bpbkar SelectFile: INF - cwd = /NBU/openv/netbackup/client/INTEL
...snip...
09:46:20.733 [28231] <4> bpbkar PrintFile: /NBU/openv/netbackup/client/NDMP/NDMP/
09:46:20.739 [28231] <2> bpbkar SelectFile: INF - cwd = /NBU/openv/netbackup/client
An strace of the bpbkar process shows that it is providing full 512 KB buffers to the O/S with a consistent delay of ~0.2 seconds awaiting the transmission of each buffer.
09:45:01.364139 read(7, "s"..., 524288) = 524288
09:45:01.364371 write(1, "s"..., 524288) = 524288
09:45:01.586356 access("", F_OK) = -1 ENOENT (No such file or directory)
09:45:01.586541 time(NULL) = 1337713261
09:45:01.586614 read(7, "\26"..., 524288) = 524288
09:45:01.586836 write(1, "\26"..., 524288) = 524288
09:45:01.810531 access("", F_OK) = -1 ENOENT (No such file or directory)
09:45:01.810652 time(NULL) = 1337713261
09:45:01.810716 read(7, "\305"..., 524288) = 524288
09:45:01.810937 write(1, "\305"..., 524288) = 524288
09:45:02.063118 access("", F_OK) = -1 ENOENT (No such file or directory)
09:45:02.063329 time(NULL) = 1337713262
09:45:02.063390 read(7, "\20"..., 524288) = 524288
09:45:02.063588 write(1, "\20"..., 524288) = 524288
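The intervals between successive 512 KB buffers above are roughly 0.22-0.25 seconds, which works out to approximately 2.0-2.3 MB/sec, matching the observed single-stream transfer rate.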
An strace of the bptm.child process shows that although the O/S is requested to fill a 256 KB buffer, the data comes off the network 1448 bytes at a time (the 1460-byte MSS less 12 bytes for the TCP timestamp option), with a 10-30 millisecond delay between each burst of segments. This further confirms that the delays are below the application layer.
09:45:01.872693 recvfrom(0, "\346"..., 47912, 0, NULL, NULL) = 1448
09:45:01.872780 recvfrom(0, "c"..., 46464, 0, NULL, NULL) = 1448
09:45:01.872865 recvfrom(0, "\227"..., 45016, 0, NULL, NULL) = 1448
09:45:01.872945 recvfrom(0, "p"..., 43568, 0, NULL, NULL) = 1448
09:45:01.873027 recvfrom(0, "\356"..., 42120, 0, NULL, NULL) = 1448
09:45:01.873116 recvfrom(0, "\304"..., 40672, 0, NULL, NULL) = 1448
The network capture from the client host shows the following details.
· A consistent 27-28 millisecond delay for the network to acknowledge each outbound segment, due to the long 155 Mb/s OC-3 ATM link between the hosts.
· The number of unacknowledged bytes in flight is never more than 64 KB.
· The TCP Window advertised by the media server TCP stack is never larger than 5-10 KB.
A network capture from the media server host confirmed the same details. The behavior during a restore is the same, except that the small window is advertised by the client host.
The network captures also confirm that during the TCP SYN exchange, the TCP stacks on the two hosts negotiated a TCP Window Size of 5,800 bytes and a TCP Window Scale of 7 (x 128), which should allow a much larger number of bytes in flight. The small window advertised by the TCP stack on the receiving Red Hat host, combined with the round trip time, appears to be the cause of the delay.
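With at most 64 KB unacknowledged and a 27-28 millisecond round trip, the maximum achievable throughput is approximately 64 KB / 0.028 s, or about 2.3 MB/sec, which matches the observed rate. Sustaining 12 MB/sec over the same round trip would require roughly 12 MB/sec x 0.028 s, or about 340 KB, in flight.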
The TCP settings from both the media server and client hosts show that the default TCP send/receive space is 80-160 KB, which should also allow for a larger outstanding window.
$ sysctl -a | grep 'tcp_.*mem'
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_mem = 196608 262144 393216
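The three values for tcp_rmem and tcp_wmem are the minimum, default, and maximum socket buffer sizes in bytes; the kernel auto-tunes each connection within that range unless the application fixes the buffer size with SO_SNDBUF or SO_RCVBUF.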
Increasing the NET_BUFFER_SZ value to 512 KB or 1024 KB had no effect. Neither did decreasing it to 32 KB, 64 KB, or 128 KB.
A netperf test between the hosts showed better performance in some cases and confirmed that a larger TCP Window Size was utilized.
Cause
The Red Hat implementation of SO_SNDBUF and SO_RCVBUF requires that they be adjusted before the socket listen/accept or connect calls. This differs from other platforms and is not compatible with the NetBackup NET_BUFFER_SZ touch file.
In addition, setting SO_SNDBUF or SO_RCVBUF disables the Red Hat TCP buffer auto-tuning feature.
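For illustration, the following minimal C sketch shows the ordering the Red Hat stack requires; the address and port are simply the sanitized values from the log excerpts above, used only as an example.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_in peer;             /* example peer from the logs above */
    int rcvbuf = 262144;                 /* fixed buffer; disables auto-tuning */
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0) {
        perror("socket");
        return 1;
    }

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(13724);        /* destination port from the log excerpts */
    inet_pton(AF_INET, "3.3.3.3", &peer.sin_addr);

    /* Per tcp(7), the buffer size must be set BEFORE connect() (or before
     * listen() on the accepting side) for it to influence the window scale
     * negotiated in the SYN exchange. Setting it after connect() is too late. */
    setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    if (connect(sock, (struct sockaddr *)&peer, sizeof(peer)) != 0)
        perror("connect");

    close(sock);
    return 0;
}

Even with the correct ordering, fixing the buffer size this way still disables the kernel's auto-tuning for that socket, as noted above.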
Solution
This problem can be avoided by ensuring that NetBackup does not attempt to adjust SO_SNDBUF or SO_RCVBUF on Red Hat platforms. NetBackup attempts to use an optimal value by default. To disable this default behavior, place a zero into the NET_BUFFER_SZ file; removing the file is not sufficient, because that re-enables the default behavior.
echo "0" > /usr/openv/netbackup/NET_BUFFER_SZ
Once implemented, a network capture in the above environment showed that the TCP Window Size ranged from 100-300 KB and that the performance of a single-stream backup improved from 2 MB/sec to 12 MB/sec, which was the maximum permitted on the shared OC-3 WAN link.
The bptm debug log then confirms that the TCP receive space is defaulting per the sysctl settings.
11:14:38.092 [5392] <2> io_set_recvbuf: receive network buffer is 87380 bytes
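With NET_BUFFER_SZ set to zero, bptm no longer fixes the buffer size, so the socket starts at the 87380-byte tcp_rmem default shown above and the Red Hat auto-tuning is free to grow the receive buffer as needed, up to the 4 MB maximum from the sysctl output.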
Note: If desired, the Red Hat settings for tcp_rmem and tcp_wmem could be further adjusted to increase the TCP receive space for all sockets. See the TCP(7) man page for further details.
Applies To
Red Hat 5.6 and 5.8 media servers with NetBackup 7.1.0.3
A mix of Red Hat and Windows clients.