NetBackup™ Deduplication Guide
About the configuration items in cloud.json, contentrouter.cfg, and spa.cfg
The cloud.json file is available at: <STORAGE>/etc/puredisk/cloud.json.
The file has the following parameters:
Parameter | Details | Default value |
|---|---|---|
UseMemForUpload | If set to true, the upload cache directory is mounted in memory as tmpfs. This is especially useful for a high-speed cloud where disk speed is the bottleneck. It can also reduce disk contention with the local LSU. The default value is true if enough system memory is available. | true |
CachePath | The path of the cache. It is created under an MSDP volume that is selected according to the space usage of the MSDP volumes, and it reserves some space that the local LSU cannot write into. You usually do not need to change this path. However, if some volumes have much more free space than others, multiple cloud LSUs may be placed on the same disk volume. For better performance, you may need to change this option to distribute them across different volumes. This path can also be changed to reside in a non-MSDP volume. | NA |
UploadCacheGB | The maximum space usage of the upload cache. The upload cache is a subdirectory named "upload" under CachePath. For performance reasons, set it to a value larger than: (maximum number of concurrent write streams) * MaxFileSizeMB * 2. For example, with 100 concurrent streams and the default MaxFileSizeMB of 64, about 13 GB is enough. Note: When you add a new cloud LSU, the initial value of UploadCacheGB is equal to CloudUploadCacheSize. You can later change this value in the cloud.json file. | 12 |
DownloadDataCacheGB | The maximum space usage of data files. Note: When you add a new cloud LSU, the initial value of DownloadDataCacheGB is equal to CloudDataCacheSize. You can later change this value in the cloud.json file. | 500 |
DownloadMetaCacheGB | The maximum space usage of metadata files. Note: When you add a new cloud LSU, the initial value of DownloadMetaCacheGB is equal to CloudMetaCacheSize. You can later change this value in the cloud.json file. | 500 |
MapCacheGB | The maximum space usage of the map cache. Note: When you add a new cloud LSU, the initial value of MapCacheGB is equal to CloudMapCacheSize. You can later change this value in the cloud.json file. | 5 |
UploadConnNum | The maximum number of concurrent connections to the cloud provider for uploading. Increasing this value is especially helpful on high-latency networks. | 60 |
DataDownloadConnNum | The maximum number of concurrent connections to the cloud provider for downloading data. Increasing this value is especially helpful on high-latency networks. | 40 |
MetaDownloadConnNum | The maximum number of concurrent connections to the cloud provider for downloading metadata. Increasing this value is especially helpful on high-latency networks. | 40 |
MapConnNum | The maximum number of concurrent connections to the cloud provider for downloading the map. | 40 |
DeleteConnNum | The maximum number of concurrent connections to the cloud provider for deleting. Increasing this value is especially helpful on high-latency networks. | 100 |
KeepData | Keep uploaded data in the data cache. The value is always false if UseMem is true. | false |
KeepMeta | Keep uploaded metadata in the meta cache. The value is always false if UseMem is true. | false |
ReadOnly | The LSU is read-only. Data cannot be written to or deleted from this LSU. | false |
MaxFileSizeMB | The maximum size of a bin file, in MB. | 64 |
WriteThreadNum | The number of threads that write data to the data container in parallel. Parallel writes can improve I/O performance. | 2 |
RebaseThresholdMB | The rebasing threshold, in MB. When the image data in a container is less than the threshold, none of the image data in that container is used for deduplication, which helps achieve good locality. Allowed values: 0 to half of MaxFileSizeMB. 0 = disabled. | 4 |
AgingCheckContainerIntervalDay | The interval, in days, for checking a container for this cloud LSU. Note: For an upgraded system, you must add this parameter manually if you want to change the value for a cloud LSU. | 180 |
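The sizing guidance for UploadCacheGB and the allowed range for RebaseThresholdMB can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that 1 GB equals 1024 MB when rounding the result to whole gigabytes.

```python
import math


def min_upload_cache_gb(max_write_streams: int, max_file_size_mb: int = 64) -> int:
    """Return the smallest whole-GB value that is strictly larger than
    (concurrent write streams) * MaxFileSizeMB * 2.

    Assumption: 1 GB = 1024 MB.
    """
    required_mb = max_write_streams * max_file_size_mb * 2
    return math.floor(required_mb / 1024) + 1


def rebase_threshold_is_valid(rebase_threshold_mb: float, max_file_size_mb: int = 64) -> bool:
    """Check the documented range: 0 (disabled) up to half of MaxFileSizeMB."""
    return 0 <= rebase_threshold_mb <= max_file_size_mb / 2


if __name__ == "__main__":
    # 100 streams * 64 MB * 2 = 12800 MB, which is 12.5 GB, so 13 GB is enough.
    print(min_upload_cache_gb(100))
    print(rebase_threshold_is_valid(4))
```

With the default MaxFileSizeMB of 64, 100 concurrent streams yield 13, which agrees with the "about 13 GB" figure in the UploadCacheGB description.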
The contentrouter.cfg file is available at: <STORAGE>/etc/puredisk/contentrouter.cfg.
The file has the following parameters:
Parameter | Details | Default value |
|---|---|---|
CloudDataCacheSize | Default data cache size when adding Cloud LSU. Decrease this value if enough free space is not available. | 500 GiB |
CloudMapCacheSize | Default map cache size when adding Cloud LSU. Decrease this value if enough free space is not available. | 5 GiB |
CloudMetaCacheSize | Default meta cache size when adding Cloud LSU. Decrease this value if enough free space is not available. | 500 GiB |
CloudUploadCacheSize | Default upload cache size when adding Cloud LSU. The minimum value is 12 GiB. | 12 GiB |
MaxCloudCacheSize | Specify the maximum cloud cache size in percentage. It is based on total system memory, swap space excluded. | 20 % |
CloudBits | The number of top-level entries in the cloud cache. This number is (2^CloudBits). Increasing this value improves cache performance, at the expense of extra memory usage. Minimum value = 16, maximum value = 48. | Auto-sized according to MaxCloudCacheSize |
DCSCANDownloadTmpPath | When you use dcscan to check a cloud LSU, data is downloaded to this folder. For details, see the dcscan tool in the cloud support section. | disabled |
UsableMemoryLimit | Specify the maximum usable memory size in percentage. MaxCacheSize + MaxCloudCacheSize + cloud in-memory upload cache size must be less than or equal to the value of UsableMemoryLimit. | 80% |
MaxSamplingCacheSize | Specify the maximum sampling cache size in percentage for all cloud LSUs here. UsableMemoryLimit + MaxSamplingCacheSize must be less than or equal to 95%. If you want to limit the maximum sampling cache size for a cloud LSU, you can configure LSUSamplingCachePercent in | 5% |
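The two memory constraints described for UsableMemoryLimit and MaxSamplingCacheSize can be expressed as a simple check. This is a sketch under stated assumptions: the function name is hypothetical, all values are treated as percentages of total system memory (including the cloud in-memory upload cache size), and the MaxCacheSize value of 50 in the example is illustrative only.

```python
def memory_limits_ok(
    max_cache_pct: float,
    max_cloud_cache_pct: float,
    upload_cache_pct: float,
    usable_memory_limit_pct: float = 80.0,
    max_sampling_cache_pct: float = 5.0,
) -> bool:
    """Return True if both documented constraints hold.

    1. MaxCacheSize + MaxCloudCacheSize + in-memory upload cache <= UsableMemoryLimit
    2. UsableMemoryLimit + MaxSamplingCacheSize <= 95%
    """
    within_usable = (
        max_cache_pct + max_cloud_cache_pct + upload_cache_pct
        <= usable_memory_limit_pct
    )
    within_total = usable_memory_limit_pct + max_sampling_cache_pct <= 95.0
    return within_usable and within_total


if __name__ == "__main__":
    # Hypothetical MaxCacheSize of 50%, default MaxCloudCacheSize of 20%,
    # and an upload cache that takes 5% of memory.
    print(memory_limits_ok(50, 20, 5))
```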
Adding a new cloud LSU fails if no partition has more free space than the following:
CloudDataCacheSize + CloudMapCacheSize + CloudMetaCacheSize + CloudUploadCacheSize + WarningSpaceThreshold * partition size
Use the crcontrol --dsstat 2 --verbosecloud command to check the space of each partition.
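The free-space requirement above can be worked through numerically. The following is a minimal sketch with hypothetical function names. It assumes that WarningSpaceThreshold is expressed as a fraction of the partition size and that all sizes are in GiB; the threshold value of 0.1 and the partition sizes are illustrative, not documented defaults.

```python
import math


def required_free_gib(
    partition_size_gib: float,
    warning_space_threshold: float,
    data_cache_gib: float = 500,
    map_cache_gib: float = 5,
    meta_cache_gib: float = 500,
    upload_cache_gib: float = 12,
) -> float:
    """CloudDataCacheSize + CloudMapCacheSize + CloudMetaCacheSize
    + CloudUploadCacheSize + WarningSpaceThreshold * partition size."""
    caches = data_cache_gib + map_cache_gib + meta_cache_gib + upload_cache_gib
    return caches + warning_space_threshold * partition_size_gib


def can_add_cloud_lsu(partitions, warning_space_threshold: float) -> bool:
    """partitions is a list of (size_gib, free_gib) pairs.

    Adding a cloud LSU succeeds only if at least one partition has more
    free space than its required amount.
    """
    return any(
        free > required_free_gib(size, warning_space_threshold)
        for size, free in partitions
    )


if __name__ == "__main__":
    # Hypothetical 10 TiB partition with a threshold of 0.1:
    # 1017 GiB of caches + 1024 GiB = about 2041 GiB required.
    print(required_free_gib(10240, 0.1))
    print(can_add_cloud_lsu([(10240, 2000), (10240, 3000)], 0.1))
```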
Note:
Each cloud LSU has a cache directory. The directory is created under an MSDP volume that is selected according to the disk space usage of all the MSDP volumes. The cloud LSU reserves some disk space on that volume for its cache, and the local LSU cannot use the reserved space.
The initial reserved disk space for each cloud LSU is the sum of the values of UploadCacheGB, DownloadDataCacheGB, DownloadMetaCacheGB, and MapCacheGB in the <STORAGE>/etc/puredisk/cloud.json file. The disk space decreases when the caches are used.
The crcontrol --dsstat 2 --verbosecloud output includes a Cache column:
# crcontrol --dsstat 2 --verbosecloud
=============== Mount point 2 ===============
Path = /msdp/data/dp1/1pdvol
Data storage
Raw Size Used Avail Cache Use%
48.8T 46.8T 861.4G 46.0T 143.5G 2%
Number of containers : 3609
Average container size : 252685915 bytes (240.98MB)
Space allocated for containers : 911943468161 bytes (849.31GB)
Reserved space : 2156777086976 bytes (1.96TB)
Reserved space percentage : 4.0%
The Cache value is the disk space that the cloud currently reserves on this volume. It is the sum of the reserved space for all cloud LSUs that have cache directories on this volume. The space that is actually available for the local LSU on this volume is Avail - Cache.
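As an illustration of the Avail - Cache calculation, the following sketch converts the size strings from the sample output above and subtracts them. The function names are hypothetical, and the sketch assumes that the suffixes in the output are binary multiples (1T = 1024G).

```python
# Assumption: the size suffixes are binary multiples, expressed here in GiB.
UNIT_TO_GIB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0, "P": 1024.0**2}


def to_gib(text: str) -> float:
    """Convert a size string such as '46.0T' or '143.5G' to GiB."""
    return float(text[:-1]) * UNIT_TO_GIB[text[-1]]


def local_lsu_available_gib(avail: str, cache: str) -> float:
    """Space that the local LSU can actually use: Avail - Cache."""
    return to_gib(avail) - to_gib(cache)


if __name__ == "__main__":
    # Avail and Cache values taken from the sample output above.
    print(local_lsu_available_gib("46.0T", "143.5G"))
```

For the sample output, 46.0T of Avail minus 143.5G of Cache leaves 46960.5 GiB, or roughly 45.9T, for the local LSU.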
The contentrouter.cfg file has the following aging check related parameters:
Parameter | Details | Default value |
|---|---|---|
EnableAgingCheck | Enable or disable Cloud LSU container aging check. | true |
AgingCheckAllContainers | Determines whether to check all containers. If set to false, only the containers in some of the latest images are checked. | false |
AgingCheckSleepSeconds | Aging check thread wakes up periodically with this time interval (in seconds). | 20 |
AgingCheckBatchNum | The number of containers for aging check each time. | 400 |
AgingCheckContainerInterval | Default interval value of checking a container when adding Cloud LSU (in days). | 180 |
AgingCheckSizeLowBound | This threshold is used to filter the containers whose size is less than this value for aging check. | 8 MiB |
AgingCheckLowThreshold | This threshold is used to filter the containers whose garbage percentage is less than this value (in percentage). | 10% |
After you update the aging check related parameters in the contentrouter.cfg file, you must restart the MSDP service. Alternatively, you can use the crcontrol command line to update those parameters without restarting the MSDP service.
To update the aging parameters using crcontrol command line
- Enable cloud aging check for all cloud LSUs.
/usr/openv/pdde/pdcr/bin/crcontrol --cloudagingcheckon
- Enable cloud aging check for a specified cloud LSU.
/usr/openv/pdde/pdcr/bin/crcontrol --cloudagingcheckon <dsid>
- Disable cloud aging check for all cloud LSUs.
/usr/openv/pdde/pdcr/bin/crcontrol --cloudagingcheckoff
- Disable cloud aging check for a specified cloud LSU.
/usr/openv/pdde/pdcr/bin/crcontrol --cloudagingcheckoff <dsid>
- Show cloud aging check state for all cloud LSUs.
/usr/openv/pdde/pdcr/bin/crcontrol --cloudagingcheckstate
- Show cloud aging check state for a specified cloud LSU.
/usr/openv/pdde/pdcr/bin/crcontrol --cloudagingcheckstate <dsid>
- Change cloud aging check to fast mode for all cloud LSUs.
/usr/openv/pdde/pdcr/bin/crcontrol --cloudagingfastcheck
- Change cloud aging check to fast mode for a specified cloud LSU.
/usr/openv/pdde/pdcr/bin/crcontrol --cloudagingfastcheck <dsid>
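If you script these operations, the command lines above can be assembled programmatically. The following is a minimal sketch: the helper name and the action keys are hypothetical, and it uses only the crcontrol options listed above. It builds the argument list without running it.

```python
CRCONTROL = "/usr/openv/pdde/pdcr/bin/crcontrol"

# Hypothetical action keys mapped to the crcontrol options listed above.
AGING_FLAGS = {
    "on": "--cloudagingcheckon",
    "off": "--cloudagingcheckoff",
    "state": "--cloudagingcheckstate",
    "fast": "--cloudagingfastcheck",
}


def aging_check_command(action: str, dsid=None) -> list:
    """Build the crcontrol argument list for a cloud aging check action.

    Without a dsid the command applies to all cloud LSUs; with a dsid it
    applies only to the specified cloud LSU.
    """
    command = [CRCONTROL, AGING_FLAGS[action]]
    if dsid is not None:
        command.append(str(dsid))
    return command


if __name__ == "__main__":
    print(" ".join(aging_check_command("state")))
    print(" ".join(aging_check_command("on", 2)))
```

On the storage server, the returned list could be passed to subprocess.run(..., check=True). The dsid value 2 in the example is illustrative only.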
The spa.cfg file is available at: <STORAGE>/etc/puredisk/spa.cfg.
The file has the following parameters:
Parameter | Details | Default value |
|---|---|---|
CloudLSUCheckInterval | The interval, in seconds, at which the cloud LSU status is checked. | 1800 |
EnablePOIDListCache | Whether the POID (Path Object ID) list cache is enabled or disabled. A Path Object contains the metadata that is associated with an image. | true |