Veritas NetBackup™ Deduplication Guide
- Introducing the NetBackup media server deduplication option
- Planning your deployment
- About MSDP storage and connectivity requirements
- About NetBackup media server deduplication
- About NetBackup Client Direct deduplication
- About MSDP remote office client deduplication
- About MSDP performance
- MSDP deployment best practices
- Provisioning the storage
- Licensing deduplication
- Configuring deduplication
- Configuring the Deduplication Multi-Threaded Agent behavior
- Configuring the MSDP fingerprint cache behavior
- Configuring MSDP fingerprint cache seeding on the storage server
- About MSDP Encryption using KMS service
- Configuring a storage server for a Media Server Deduplication Pool
- Configuring a disk pool for deduplication
- Configuring a Media Server Deduplication Pool storage unit
- About MSDP optimized duplication within the same domain
- Configuring MSDP optimized duplication within the same NetBackup domain
- Configuring MSDP replication to a different NetBackup domain
- About NetBackup Auto Image Replication
- Configuring a target for MSDP replication to a remote domain
- Creating a storage lifecycle policy
- Resilient Network properties
- Editing the MSDP pd.conf file
- About protecting the MSDP catalog
- Configuring an MSDP catalog backup
- Configuring deduplication to the cloud with NetBackup CloudCatalyst
- Using NetBackup CloudCatalyst to upload deduplicated data to the cloud
- Configuring a CloudCatalyst storage server for deduplication to the cloud
- Monitoring deduplication activity
- Viewing MSDP job details
- Managing deduplication
- Managing MSDP servers
- Managing NetBackup Deduplication Engine credentials
- Managing Media Server Deduplication Pools
- Changing a Media Server Deduplication Pool properties
- Configuring MSDP data integrity checking behavior
- About MSDP storage rebasing
- Recovering MSDP
- Replacing MSDP hosts
- Uninstalling MSDP
- Deduplication architecture
- Troubleshooting
- About unified logging
- About legacy logging
- Troubleshooting MSDP installation issues
- Troubleshooting MSDP configuration issues
- Troubleshooting MSDP operational issues
- Troubleshooting CloudCatalyst issues
- CloudCatalyst logs
- Problems encountered while using the Cloud Storage Server Configuration Wizard
- Disk pool problems
- Problems during cloud storage server configuration
- CloudCatalyst troubleshooting tools
- Appendix A. Migrating to MSDP storage
About the CloudCatalyst cache
The administrator configures a local cache directory as part of configuring a CloudCatalyst storage server. The primary function of the local cache directory (the CloudCatalyst cache) is to allow CloudCatalyst to continue to deduplicate data even if the ingest rate from targeted backup and duplication jobs temporarily exceeds the available upload throughput to the destination cloud storage.
For example, if backup and duplication jobs transfer 10 TB of data per hour to the CloudCatalyst storage server and CloudCatalyst deduplicates the data at a ratio of 10:1, the resulting 1 TB per hour of deduplicated data may still exceed an upload capacity of 0.7 TB per hour to cloud storage. The cache allows the jobs to continue to send and process data, on the assumption that the incoming data rate slows at some point. The CloudCatalyst cache stores only the deduplicated data. Jobs are not marked as complete until all data is uploaded to the cloud.
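To make the arithmetic in this example concrete, the following sketch recomputes it with the same figures. The values are the illustrative numbers from the example above, not product limits or measured throughput.

```python
# Illustrative arithmetic only; the 10 TB/h ingest, 10:1 deduplication ratio, and
# 0.7 TB/h upload figures come from the example above, not from the product.

ingest_tb_per_hour = 10.0      # data arriving from backup and duplication jobs
dedup_ratio = 10.0             # 10:1 reduction before upload
upload_tb_per_hour = 0.7       # sustained write throughput to cloud storage
cache_size_tb = 4.0            # recommended CloudCatalyst cache size

deduped_tb_per_hour = ingest_tb_per_hour / dedup_ratio               # 1.0 TB/h
cache_growth_tb_per_hour = deduped_tb_per_hour - upload_tb_per_hour  # 0.3 TB/h

if cache_growth_tb_per_hour > 0:
    hours_until_full = cache_size_tb / cache_growth_tb_per_hour
    print(f"Cache absorbs the backlog for about {hours_until_full:.1f} hours "
          "before the ingest rate must slow down.")
else:
    print("Upload keeps pace with deduplicated ingest; the cache does not grow.")
```

With these numbers the cache absorbs roughly 13 hours of backlog, which is why the example depends on the incoming data rate slowing at some point.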
While a CloudCatalyst cache of 4 TB is recommended, a larger cache has the following benefits:
- For restores: If the data exists in the CloudCatalyst cache, it is restored from the cache instead of from the cloud. The larger the cache, the more deduplicated objects can reside in it.
- For data with poor deduplication rates: A larger cache may be required because poor deduplication ratios mean that larger amounts of data must be uploaded to the cloud.
- For job windows that experience bursts of activity: A larger cache can be helpful if many jobs target the CloudCatalyst storage server within a narrow window of time.
While a larger cache can be beneficial, jobs are not marked as complete until all data is uploaded to the cloud. Data is uploaded from the cache to the cloud when an MSDP container file is full. This occurs soon after the backup or duplication job begins, but not immediately. Deduplication makes it possible for second and subsequent backup jobs to transfer substantially less data to the cloud, depending on the deduplication rate.
For example, 4 TB of cache is expected to manage 1 PB of data in the cloud without issue.
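The 4 TB of cache per 1 PB of cloud data figure above is a rule of thumb. The sketch below simply restates it as a ratio; scaling that ratio to other amounts of cloud-resident data is an assumption made for illustration, not documented sizing guidance.

```python
# Restates the "4 TB of cache per ~1 PB of cloud data" guideline as a ratio.
# Linear scaling beyond 1 PB is an assumption for illustration only.

RECOMMENDED_CACHE_TB = 4.0
CLOUD_TB_PER_CACHE_TB = 1000.0 / RECOMMENDED_CACHE_TB   # ~250 TB of cloud data per TB of cache

def suggested_cache_tb(expected_cloud_data_tb: float) -> float:
    """Scale the guideline to a given amount of cloud-resident data, never below 4 TB."""
    return max(RECOMMENDED_CACHE_TB, expected_cloud_data_tb / CLOUD_TB_PER_CACHE_TB)

print(suggested_cache_tb(500.0))     # 4.0 -- below 1 PB, the 4 TB recommendation holds
print(suggested_cache_tb(2000.0))    # 8.0 -- ~2 PB, extrapolating the same ratio
```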
Note:
If you initiate a restore from Glacier or Glacier Deep Archive, NetBackup initiates a warming step. NetBackup does not proceed with the restore until all the data is available in S3 storage to be read.
The warming step is always performed when Amazon is used, even if the data is in the CloudCatalyst cache. For storage classes other than Glacier and Glacier Deep Archive, the warming step is almost immediate, with no meaningful delay. For Glacier and Glacier Deep Archive, the warming step may be immediate if the files were previously warmed and are still in S3 Standard storage. However, it may take several minutes, hours, or days, depending on the settings being used.
The CloudCatalyst manages the cache based on the configuration settings in the esfs.json file. Once the high watermark is reached, data is purged when the used space reaches the midpoint between HighWatermark and LowWatermark ((high + low) / 2), and purging continues until LowWatermark is reached. If the rate of incoming data exceeds the rate at which the watermark can be maintained, the jobs begin to fail. Administrators should not manually delete or purge the managed data in the cache storage unless directed to do so by NetBackup Technical Support.
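As a rough restatement of the purge-trigger arithmetic described above: the watermark values below (90 and 70) are hypothetical placeholders and the function names are illustrative. The actual purging is performed internally by CloudCatalyst according to esfs.json, not by user code.

```python
# Sketch of the purge-trigger arithmetic only; not the CloudCatalyst implementation.
# HighWatermark/LowWatermark come from esfs.json; 90 and 70 are hypothetical values.

def purge_start_threshold(high_watermark: float, low_watermark: float) -> float:
    """Used-space level at which purging of already-uploaded data begins."""
    return (high_watermark + low_watermark) / 2

def purge_stop_threshold(low_watermark: float) -> float:
    """Purging continues until used space falls back to LowWatermark."""
    return low_watermark

high, low = 90.0, 70.0                   # hypothetical watermark settings
print(purge_start_threshold(high, low))  # 80.0 -- midpoint (high + low) / 2
print(purge_stop_threshold(low))         # 70.0 -- purge runs down to LowWatermark
```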