NetBackup™ Backup Planning and Performance Tuning Guide

Last Published:
Product(s): NetBackup & Alta Data Protection (10.4, 10.3.0.1, 10.3, 10.2.0.1, 10.2, 10.1.1, 10.1, 10.0.0.1, 10.0, 9.1.0.1, 9.1, 9.0.0.1, 9.0, 8.3.0.2, 8.3.0.1, 8.3)

Data gathering

Workloads

Define the workload types in your environment and the front-end terabytes (FETB) of each. Workload types are classified as VMware, Oracle, SQL, MS-Exchange, NDMP, and so on. For each workload type, determine whether it has key data characteristics that significantly affect sizing.

For instance, if Enterprise Vault (EV) is one of the workload types, a key data characteristic is that it can produce millions of small files, which makes backups slower and more resource intensive. EV data can sometimes be grouped into Windows Cabinet Files (CAB), but if WORM storage is a factor, CAB collections are not an option. In that case you are protecting millions of files of approximately 40 KB each, in addition to the databases and indexes that must be protected as part of a full EV backup.
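To illustrate the scale involved, a quick back-of-the-envelope calculation (with a hypothetical store size; only the ~40 KB average file size comes from the text above) shows how many files such a workload implies:

```python
# Hypothetical illustration: file count implied by an Enterprise Vault
# workload made up of ~40 KB files.
ev_data_tb = 2                                 # assumed EV store size in TB
avg_file_kb = 40                               # approximate per-file size
files = ev_data_tb * 1024**3 // avg_file_kb    # TB -> KB, then divide
print(f"{files:,} files")                      # ~53.7 million files to enumerate and back up
```

Even a modest 2 TB store at that file size means tens of millions of files to enumerate, which is why this data characteristic dominates backup performance.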

If the workload types are Oracle and/or SQL, it is also important to understand whether transaction logs or archive logs must be protected, because protecting them can generate thousands of small jobs. The overhead of thousands of small jobs, compared to a smaller number of larger jobs, has been observed to be a significant factor in determining compute requirements.
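The job-count impact of frequent log backups is easy to underestimate. A minimal sketch, using assumed figures (database count and log-backup frequency are hypothetical, not from the text):

```python
# Hypothetical illustration: daily job count generated by frequent
# transaction-log backups across a database estate.
databases = 200                                   # assumed number of protected databases
log_interval_min = 15                             # assumed log-backup frequency in minutes
jobs_per_day = databases * (24 * 60 // log_interval_min)
print(f"{jobs_per_day:,} log-backup jobs per day")
```

At 200 databases with 15-minute log backups, the primary server must schedule and track roughly 19,200 jobs per day from logs alone, before any full or incremental backups are counted.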

If VMware is the workload type, it often makes sense to leverage the Accelerator feature to improve backup performance. Accelerator consumes additional compute resources, but only for the first copy of the data.

If a specific workload uses third-party encryption, consider using NetBackup native encryption instead. In some cases, a customer may require a specific workload to be encrypted at the source. If that is the case, backups of this data will experience very poor deduplication rates, so MSDP is not a good fit for that workload. Understanding this requirement is important because it can significantly affect solution design, feature use, and solution sizing.

Once workload qualification is done, calculate how large a complete full backup would be for each workload type. Then discuss with stakeholders the estimated daily rate of change for each workload type; that information is equally important.
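The full-backup size and daily change rate combine into a simple ingest model. A minimal sketch, assuming a weekly-full/daily-incremental schedule (the function name and example figures are hypothetical):

```python
def estimate_weekly_ingest_tb(full_tb, daily_change_rate, incrementals_per_week=6):
    """Front-end data ingested in one weekly cycle: one full backup plus
    daily incrementals sized by the change rate (simplified model)."""
    return full_tb + full_tb * daily_change_rate * incrementals_per_week

# Example: a 100 TB FETB workload with a 2% daily change rate
print(estimate_weekly_ingest_tb(100, 0.02), "TB per weekly cycle")
```

Running the example yields 112 TB of front-end data per weekly cycle, a starting point that later steps (deduplication, retention, secondary copies) refine.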

Data lifecycle

Complete a data lifecycle for the protection strategy. A data lifecycle traces each step of the primary and secondary processing of a specific type of workload. For example, does the workload require a secondary operation? If so, is the secondary operation a duplication, or a replication via Auto Image Replication (AIR)? Are there additional steps that write an N+1 copy of the data to S3, or to tape for offsite storage? Tracing each step is critical because each step becomes part of a NetBackup storage lifecycle policy (SLP), which then requires resources to complete.
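A traced lifecycle can be written down as an ordered list of operations, mirroring the steps an SLP would perform. The sketch below is purely illustrative (the targets and retentions are hypothetical, not a NetBackup API):

```python
# Hypothetical sketch: a workload's data lifecycle as an ordered list of
# operations, mirroring the steps of a storage lifecycle policy (SLP).
lifecycle = [
    {"op": "backup",      "target": "MSDP pool (primary)", "retention_days": 35},
    {"op": "replication", "target": "DR domain via AIR",   "retention_days": 90},
    {"op": "duplication", "target": "S3 bucket (LTR)",     "retention_days": 7 * 365},
]
for step in lifecycle:
    print(f'{step["op"]:12} -> {step["target"]} ({step["retention_days"]} days)')
```

Writing the lifecycle out this way makes it obvious that each row consumes compute, network, and storage resources, which is exactly what the sizing exercise has to account for.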

Backup methodology and retention

It is important to consider the retention requirements for each copy. For example, the primary copy of the data might be retained in an MSDP pool for 35 days while the secondary copy, replicated via AIR, is kept for 90 days. The workload may also require long-term retention (LTR), whereby a third copy is sent to an S3 bucket or out to tape for a period of several years.

As part of defining the retention requirements, determine what type of backup is required and how often, as well as which backups are targeted for LTR. Is it a weekly full? A monthly full? Frequent backups are typically not a good fit for duplication or replication to LTR because of the cost and long-term management of such a large amount of data. Consider the S3 storage costs of keeping large amounts of data for an LTR of many years. For customers that require a tape copy, consider the management implications of keeping potentially hundreds of tapes offsite for many years: the time and compute resources to produce the tape copy, the cost of maintaining that data in the NetBackup catalog, and the cost of storing the tapes in an offsite vault.

Some customers require incrementals or transaction logs to be subject to secondary operations such as replication or duplication. If such a requirement exists, narrow it to the exact workloads or specific host clients that actually need it. Do not apply the requirement with a broad brush simply because it seems expedient.

Another factor that many customers don't fully consider is the implication of an infinite retention. Infinite retention is not realistic.

Consider a large database containing patient records. That database should be protected by a variety of primary and secondary methods: high availability and disaster recovery are the primary protection methods, while backup is a secondary method. It is reasonable to require the database itself to maintain patient records indefinitely, but the backups of that data should not require an infinite retention. Most states and countries have requirements around record retention; understand those requirements rather than presuming an infinite retention is required. A more reasonable LTR for monthly full backups is up to 7 years. If the patient data is maintained in the database indefinitely, that data is present in every full backup taken. Why would a customer restore from a backup more than 30 days old if the data in the source is never archived or purged? In the unlikely event that the primary copy of the database is corrupted, the business owner would want the most recent consistent copy restored.

Keep this in mind when retention levels are set for N+1 copies of specific workloads.

Timing

Determining when backups must run, based on the customer's internal RPO, RTO, and SLA requirements, is extremely important: it is the variable that drives when, and how fast, backups and secondary operations such as replication and duplication must complete. When MSDP pools cannot keep up with backup and secondary operations, the result can be missed backup windows and SLP backlog.

Consider a scenario in which backups are kept for only 7 days, but the N+1 copy created by an SLP's secondary operation is kept for 35 days. If backups miss their backup windows because of workload imbalance or under-sizing of the solution, that degradation can also slow the secondary operations. In extreme situations, data may still be replicating or duplicating after its retention period has already passed. Avoiding that scenario is why sizing for compute, as well as capacity, is paramount.
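Whether a window is achievable reduces to a throughput check. A minimal sketch of that check, with hypothetical example figures (it ignores deduplication, concurrency, and protocol overhead, so treat the result as a floor):

```python
def required_throughput_mb_s(tb_to_move, window_hours):
    """Minimum sustained rate (MB/s) needed to move `tb_to_move` TB of
    front-end data within the window. Simplified sizing check: ignores
    dedupe, stream concurrency, and protocol overhead."""
    mb = tb_to_move * 1024 * 1024          # TB -> MB
    return mb / (window_hours * 3600)      # hours -> seconds

# Example: 40 TB of backups plus replication inside an 8-hour window
rate = required_throughput_mb_s(40, 8)
print(f"{rate:,.0f} MB/s sustained")
```

If the MSDP pool, network, and clients cannot jointly sustain roughly 1.4 GB/s in this example, the window will be missed and SLP backlog will begin to accumulate.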