NetBackup™ Backup Planning and Performance Tuning Guide

Last Published:
Product(s): NetBackup & Alta Data Protection (10.4, 10.3.0.1, 10.3, 10.2.0.1, 10.2, 10.1.1, 10.1, 10.0.0.1, 10.0, 9.1.0.1, 9.1, 9.0.0.1, 9.0, 8.3.0.2, 8.3.0.1, 8.3)

Conclusions

When specifying and building systems, understanding the use case is imperative. The following are recommended courses of action depending on the use case.

Processors

The large number of concurrent streams needed for nightly backups requires a higher number of cores per processor. For an enterprise-level backup environment, 40 to 60 cores per compute node are recommended. More is not necessarily better, but if the user is backing up very large numbers of highly deduplicatable files, a high number of cores is required.

Mid-range stream requirements indicate a 12 to 36 core system. This assumes that the requirements are approximately 20 to 70% of the workload of the enterprise environment as shown above.

Small systems should look at 8 to 18 core systems and single processor motherboards as they will reduce cost and accommodate today's processor core count.

DRAM memory

Quality dynamic RAM (DRAM) is extremely important to ensure accurate operation. Because of the number of concurrent backups that users look to accomplish, Error Correction Code (ECC) and Registered (R) DRAM are required to ensure trouble-free operation. Current systems use DDR4 SDRAM, an abbreviation of "Double Data Rate Synchronous Dynamic Random-Access Memory", with the 4 representing the fourth generation of DDR memory. Users must use DDR4 ECC RDIMMs with processors that are current as of the writing of this document. The frequency and generation of the DRAM must align with the processor recommendation, and the DIMMs must be of the same manufacturing lot to ensure smooth operation.

Current RAM requirements in backup solutions are tied to the amount of MSDP data that is stored on the solution. To ensure proper and performant operation, 1 GB of RAM for every terabyte of MSDP data is recommended. For instance, a system with 96TB of MSDP capacity requires at least 96GB of RAM. DDR4 ECC RDIMMs come in 8, 16, 32, 64 and 128GB capacities. For this example, twelve 8GB DIMMs would suffice, but may not be the most cost effective. Production volumes of the different sizes change the cost per GB, and the user may find that a population of six 16GB DIMMs (96GB total) or even eight 16GB DIMMs (128GB total) is more cost effective and provides a future path to larger MSDP pools as the need for them increases.
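The sizing rule above can be sketched as a small calculation. This is a minimal illustration only, assuming uniform DIMM populations; the function names are hypothetical and are not part of any NetBackup tool. It applies the 1 GB of RAM per terabyte of MSDP data guideline and lists how many DIMMs of each capacity named above would be needed.

```python
import math

# DDR4 ECC RDIMM capacities listed in this guide, in GB.
DIMM_SIZES_GB = (8, 16, 32, 64, 128)


def required_ram_gb(msdp_capacity_tb, gb_per_tb=1.0):
    """Minimum RAM in GB for a given MSDP capacity (1 GB per TB guideline)."""
    return math.ceil(msdp_capacity_tb * gb_per_tb)


def dimm_options(required_gb, sizes=DIMM_SIZES_GB):
    """Map each DIMM size to (DIMM count, total GB) for a uniform population."""
    options = {}
    for size in sizes:
        count = math.ceil(required_gb / size)
        options[size] = (count, count * size)
    return options


need = required_ram_gb(96)  # the 96TB example from the text
for size, (count, total) in dimm_options(need).items():
    print(f"{count} x {size}GB DIMMs = {total}GB total")
```

The sketch does not model cost per GB, available DIMM slots, or memory channel balance, all of which affect the final choice.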

PCIe

When selecting a system or motherboard, it is recommended that a PCIe 4 compliant system be chosen. In addition to the doubling of speed of the PCIe lanes, the number of lanes on processors will increase, thereby creating a more than 2X performance enhancement. PCIe 4 Ethernet NICs up to 200Gb, Fibre Channel HBAs up to 32Gb, and SAS HBAs and RAID controllers at 4x10Gb per port, all with up to 4 ports or port groups, can take advantage of this higher bandwidth. This level of system will be applicable for 7 to 10 years, as opposed to PCIe 3 level systems, which will likely disappear in the 2023 time frame. Users will be able to continue to utilize PCIe 3 based components because PCIe 4 is backward compatible. However, it appears that PCIe 4 components are in the same price range as PCIe 3, so the user is encouraged to utilize the newer protocol.

Disk drives and RAID storage

Disk drives have the potential to reach much larger capacities in the future. HAMR and MAMR, as noted earlier, are technologies poised to create large, petabyte to exabyte scale repositories with up to 50TB drives. Assuming that consumption continues to expand at 30% per year, these sizes will fulfill the needs of backup storage for the foreseeable future.

For build-your-own (BYO) systems with present day 256TB capacity, the best solution is to design storage that brackets the 32 TiB volumes. For instance, when using RAID 6 volumes with a hot spare, as the Veritas NetBackup and Flex appliances do, it is wise to create RAID groups that can contain volumes of that size efficiently. As an example, the NetBackup and Flex 5250 appliances utilize a 12 drive JBOD connected to a RAID controller in the Main Node. The JBOD uses 8TB drives, and with RAID 6 across 11 of the drives plus 1 hot spare, the resultant capacity is 72TB / 65.5 TiB. With this, two volumes of 32 TiB fit well into the JBOD, and JBODs can easily be stacked to arrive at the maximum capacity.
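The arithmetic behind the 5250 example can be checked with a short sketch. It assumes the standard RAID 6 layout, in which two drives' worth of capacity in the group is consumed by parity, and it ignores file system and controller overhead. The function names are hypothetical.

```python
def raid6_usable_tb(total_drives, drive_tb, hot_spares=1):
    """Usable decimal TB of one RAID 6 group: members minus 2 parity drives."""
    members = total_drives - hot_spares
    if members < 4:
        raise ValueError("RAID 6 needs at least 4 member drives")
    return (members - 2) * drive_tb


def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


usable_tb = raid6_usable_tb(total_drives=12, drive_tb=8)  # 11 members, 9 data
usable_tib = tb_to_tib(usable_tb)
volumes = int(usable_tib // 32)  # whole 32 TiB volumes that fit

print(f"{usable_tb} TB = {usable_tib:.1f} TiB, fits {volumes} x 32 TiB volumes")
```

With 12 drives of 8TB and one hot spare, this reproduces the 72TB / 65.5 TiB figure and the two 32 TiB volumes per JBOD described above.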

Solid-state disks (SSDs)

SSDs introduce a new variable into the solution as they act like disk drives but are not mechanical devices. They offer lower power, high capacity, smaller size, significant access time improvement over disk and higher field reliability. The one downside, as compared to disks, is cost. For certain implementations, though, they are the best solution. Customers who require speed are finding that SSDs used for tape out of deduplicated data are 2.7 times faster than disk storage. If concurrent operations are required, such as backup followed by immediate replication to an off-site location, the access time of the SSDs used as the initial target makes this possible in the time window necessary. Another use case is to use the SSDs as an AdvancedDisk pool and then, when the user feels the time is appropriate, deduplicate the data to a disk pool for medium or long-term retention.

As noted earlier, NVMe should be the choice for the best performance. Expectations are that the Whitley version of the Intel reference design, due for release in 2021, will be the best Intel platform as it will feature PCIe 4. With the incremental doubling of speed, only 2 lanes would be necessary per SSD, allowing for an architecture that can handle a large number of SSDs (24 in a 2U chassis) as well as accommodate the requisite Ethernet and Fibre Channel NICs/HBAs to connect to clients.
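The lane count reasoning can be illustrated with a rough bandwidth estimate. This sketch assumes the published PCIe signaling rates (8 GT/s per lane for PCIe 3, 16 GT/s for PCIe 4, both with 128b/130b encoding) and assumes, as an illustration only, that the comparison point is an NVMe SSD on a four-lane PCIe 3 link. Protocol overhead is ignored, so real throughput is lower.

```python
# Raw signaling rate per lane in gigatransfers per second.
GT_PER_S = {3: 8.0, 4: 16.0}
# PCIe 3 and PCIe 4 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def lane_gbytes_per_s(generation):
    """Approximate usable GB/s for one lane, one direction."""
    return GT_PER_S[generation] * ENCODING_EFFICIENCY / 8


def link_gbytes_per_s(generation, lanes):
    """Approximate usable GB/s for a link with the given lane count."""
    return lane_gbytes_per_s(generation) * lanes


gen3_x4 = link_gbytes_per_s(3, 4)  # assumed PCIe 3 NVMe SSD link
gen4_x2 = link_gbytes_per_s(4, 2)  # two-lane PCIe 4 link from the text
lanes_for_24_ssds = 24 * 2         # SSD lanes in the 24-drive 2U example

print(f"PCIe 3 x4: {gen3_x4:.2f} GB/s, PCIe 4 x2: {gen4_x2:.2f} GB/s")
print(f"24 SSDs at 2 lanes each use {lanes_for_24_ssds} lanes")
```

Under these assumptions a two-lane PCIe 4 link matches a four-lane PCIe 3 link, which is why halving the lanes per SSD leaves lanes available for NICs and HBAs.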

Ethernet

As the predominant transport for backup, Ethernet NICs are of critical importance. Fortunately, there are a number of quality NIC manufacturers. For the time being, greater than 90% of the ports used will be 10GBASE-T, 10Gb optical or direct-attached copper (DAC), and 25Gb optical / DAC. Broadcom and Marvell have NICs that support all three configurations. Intel and NVIDIA have 25/10Gb optical / DAC NICs as well as 10GBASE-T equipped NICs. Any of these can be used to accommodate the user's particular needs. Forecasts show that 50 and 100Gb and, to a lesser extent, 200 and 400Gb Ethernet will grow quickly as the technology advances.

Fibre Channel

Fibre Channel (FC) will continue to exist for the foreseeable future, but much of its differentiation from other transports is lessening as NVMe over Fabrics becomes a more prevalent solution. FC is one of the supported transports, but it appears that Ethernet will have the speed advantage and will likely win out as the favored transport. For customers with FC SANs, Marvell and Broadcom are the two choices for Host Bus Adapters as initiators and targets. Both are very good initiators, and the choice is up to the user, as many sites have settled on a single vendor.

Figure: Media server block diagram
