NetBackup™ Backup Planning and Performance Tuning Guide

Product(s): NetBackup & Alta Data Protection (10.4, 10.3.0.1, 10.3, 10.2.0.1, 10.2, 10.1.1, 10.1, 10.0.0.1, 10.0, 9.1.0.1, 9.1, 9.0.0.1, 9.0, 8.3.0.2, 8.3.0.1, 8.3)

Predictive and sampling cache scheme

Beginning with NetBackup 10.1, a new fingerprint (FP) cache lookup data scheme was introduced. The new scheme splits the current maximum cache size MaxCacheSize into two components, predictive cache (P-cache) and sampling cache (S-cache).

The P-cache is used to cache the fingerprints that are most likely to be used in the immediate future.

The S-cache is used to cache a sample of the fingerprints from each backup: a percentage of each backup's fingerprints is selected, and that subset is inserted into the S-cache. The P-cache is searched first to find duplicates. When lookup misses reach a threshold, the S-cache is searched for possible matches. If a match is found, the fingerprints predicted to be relevant are loaded from disk into the P-cache for deduplication.
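The two-level lookup described above can be illustrated with a toy model. This is a minimal sketch, not NetBackup code: the class name, the grouping of on-disk fingerprints, the sampling interval, and the miss threshold are all hypothetical and chosen only to show the flow from P-cache miss to S-cache match to P-cache load.

```python
class PSCacheSketch:
    """Toy model of a predictive/sampling fingerprint lookup (illustrative only)."""

    def __init__(self, disk_groups, sample_every=2, miss_threshold=2):
        # disk_groups: hypothetical on-disk index, group id -> list of fingerprints
        self.disk_groups = disk_groups
        self.miss_threshold = miss_threshold
        self.p_cache = set()   # fingerprints expected to be needed soon
        self.misses = 0        # P-cache misses since the last S-cache search
        # S-cache: keep every Nth fingerprint of each group, mapped to its group
        self.s_cache = {}
        for group_id, fingerprints in disk_groups.items():
            for fp in fingerprints[::sample_every]:
                self.s_cache[fp] = group_id

    def is_duplicate(self, fp):
        # 1. The P-cache is always searched first.
        if fp in self.p_cache:
            return True
        # 2. Count the miss; below the threshold, treat the data as new.
        self.misses += 1
        if self.misses < self.miss_threshold:
            return False
        # 3. Threshold reached: search the S-cache for a sampled match.
        self.misses = 0
        group_id = self.s_cache.get(fp)
        if group_id is None:
            return False
        # 4. Match found: load the related fingerprints from "disk" into the P-cache.
        self.p_cache.update(self.disk_groups[group_id])
        return True


cache = PSCacheSketch({"g1": ["a", "b", "c", "d"]})
print(cache.is_duplicate("a"))  # first miss, below threshold
print(cache.is_duplicate("c"))  # threshold reached, sampled match loads group g1
print(cache.is_duplicate("b"))  # now served from the P-cache
```

In this model an early lookup can miss a fingerprint that exists on disk, which mirrors the trade-off in the text: caches that are too small lower the deduplication ratio.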

For more information about P-cache and S-cache, refer to the NetBackup Deduplication Guide for 10.1 or later.

With NetBackup 10.1, the P-and-S-cache is the default FP lookup scheme for cloud LSUs (logical storage units), while the local LSU volume still defaults to MaxCacheSize. The configuration changes and default values for the P-and-S-cache are listed in the following table:

Table: Configuration change and default value for P-and-S-cache

Configuration                                              Default value
MaxCacheSize                                               512 MB
MaxPredictiveCacheSize                                     40% in NetBackup 10.1 (10% in NetBackup 10.1.1)
MaxSamplingCacheSize                                       10%
EnableLocalPredictiveSamplingCache in spa.cfg              true
EnableLocalPredictiveSamplingCache in contentrouter.cfg    true
MaxCloudCacheSize                                          Deprecated and replaced with Max P-cache size and Max S-cache size

With the above change, the formula used before NetBackup 10.1 is replaced by the following rule, which ensures that memory remains available for uploading:

MaxCacheSize + MaxPredictiveCacheSize + MaxSamplingCacheSize + MaxCloudCacheSize (Cloud in-memory upload cache size) must be less than or equal to the value of UsableMemoryLimit.
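The constraint above is a simple sum check. The following minimal sketch shows it as a helper function; the function name and the sample figures are hypothetical, and all values are assumed to be expressed in the same unit (MB here).

```python
def fits_usable_memory(max_cache, predictive_cache, sampling_cache,
                       cloud_upload_cache, usable_memory_limit):
    """Return True if the combined cache sizes stay within UsableMemoryLimit.

    All arguments must use the same unit (for example, MB).
    """
    total = max_cache + predictive_cache + sampling_cache + cloud_upload_cache
    return total <= usable_memory_limit


# Hypothetical figures in MB, for illustration only.
print(fits_usable_memory(512, 4096, 1024, 2048, usable_memory_limit=8192))
```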

With P-and-S-cache in NetBackup 10.2, the local LSU and all cloud LSUs share the same P-and-S-cache, and the previous MaxCacheSize can be ignored. The P-cache and S-cache sizes must be set carefully. Setting them too high wastes memory, while setting them too low leads to a poor deduplication ratio and degrades backup performance.

In general, S-cache size should be proportional to the backend storage size, while P-cache size is determined by the maximum number of concurrent jobs. Use the following rules of thumb for the P-and-S-cache tuning:

  • For each 10 TB of backend storage, allocate 1 GB of RAM for S-cache

  • For each backup stream, allocate 250 MB of RAM for P-cache. So, the total P-cache allocated should be (250 MB) * (maximum number of concurrent jobs)
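The two rules of thumb above reduce to simple arithmetic. The sketch below applies them; the function name and the example environment (100 TB of backend storage, 40 concurrent jobs) are hypothetical. The units follow the text: GB for S-cache, MB for P-cache.

```python
def recommended_cache_sizes(backend_storage_tb, max_concurrent_jobs):
    """Apply the rules of thumb: 1 GB of S-cache per 10 TB, 250 MB of P-cache per job."""
    s_cache_gb = backend_storage_tb / 10 * 1   # 1 GB of RAM per 10 TB of backend storage
    p_cache_mb = 250 * max_concurrent_jobs     # 250 MB of RAM per backup stream
    return s_cache_gb, p_cache_mb


s_cache_gb, p_cache_mb = recommended_cache_sizes(backend_storage_tb=100,
                                                 max_concurrent_jobs=40)
print(f"S-cache: {s_cache_gb} GB, P-cache: {p_cache_mb} MB")
```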

To ensure enough memory for other processes running on the system, P-and-S-cache size together should not exceed the MaxUsableMemory value.

Other processes that also need memory include:

  • Basic operating system with NetBackup if running as a media server

  • NetBackup processes if NetBackup runs in the same node

  • spad cache for the opt-dup source

  • mtstrd cache for the backup source

  • Spooler cache

Disk cache for cloud upload and download

The NetBackup cloud tier allows each media server to create one or more cloud logical storage units (LSUs). It is important to know that for each cloud LSU created, roughly 1 TB of the MSDP storage pool is reserved for the LSU to be used as cloud disk cache.

Starting with NetBackup 10.2, this reserved disk cache can be configured from the NetBackup web UI during LSU creation. The default disk cache size for upload is 12 GB and is set by the parameter UploadCacheGB, while the default disk cache size for cloud download is 1 TB and is set by the parameters DownloadDataCacheGB and DownloadMetaCacheGB. The default values for these parameters are set in contentrouter.cfg with CloudUploadCacheSize, CloudDataCacheSize, and CloudMetaCacheSize, respectively.

As mentioned earlier, the disk caches occupy space in the MSDP pool. In an MSDP pool with limited storage, the reserved disk cache can consume so much space that little usable space remains for regular backup jobs. Jobs that fail with error codes 129 and 84 may indicate that no space is left on the device, even though df -h and dsstat still report plenty of space in the MSDP pool. In such cases, we recommend the following:

  • Limit the number of cloud LSUs created per media instance, especially if the storage pool is relatively small.

  • Reduce the default CloudDataCacheSize and CloudMetaCacheSize values.

If there is enough memory for uploads to go through the memory cache, UploadCacheGB can be set to (maximum number of concurrent streams * MaxFileSizeMB * 2) in the cloud.json file. For example, if the maximum number of concurrent streams is 100, the UploadCacheGB value can be set to 12 GB. The DownloadDataCacheGB and DownloadMetaCacheGB values, which size the download cache used for restore and opt-dup, can be as small as a few GB and still function. A larger download disk cache can improve restore and opt-dup performance because it helps avoid downloading the same data object more than once.
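The upload cache formula can be worked through as follows. This is a sketch under one stated assumption: MaxFileSizeMB is taken to be 64, a value inferred from the 64 MB container size and from the 12 GB result quoted for 100 streams, not from a documented default. The function name is hypothetical.

```python
def upload_cache_gb(max_concurrent_streams, max_file_size_mb=64):
    """Estimate UploadCacheGB as streams * MaxFileSizeMB * 2, converted to whole GB.

    max_file_size_mb=64 is an assumption inferred from the 64 MB container size.
    """
    cache_mb = max_concurrent_streams * max_file_size_mb * 2
    return cache_mb // 1024  # rounded down to a whole number of GB


# 100 streams * 64 MB * 2 = 12800 MB, which rounds down to 12 GB.
print(upload_cache_gb(100))
```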

Tuning the DownloadDataCacheGB and DownloadMetaCacheGB values requires knowing the maximum number of concurrent download streams. In most cases, restoring from the cloud requires downloading the entire data container (64 MB). This is because the container created at backup time usually consists of data from a single client, and MSDP-C will download the entire container so that the same container is only fetched once during a restore.

The default values of the parameters are set in <storage>/etc/puredisk/contentrouter.cfg and apply to all LSUs created in the future. The parameters in <storage>/etc/puredisk/cloud.json set the values for each LSU that has already been created.