NetBackup™ Deduplication Guide
About sampling and predictive cache
MSDP uses memory up to the size that is configured in MaxCacheSize to cache fingerprints for efficient deduplication lookup. A new fingerprint cache lookup scheme, introduced in NetBackup release 10.1, reduces the memory usage. It splits the existing memory cache into two components: the sampling cache (S-cache) and the predictive cache (P-cache). S-cache caches a percentage of the fingerprints from each backup and is used to find similar data from the samples of previous backups for deduplication. P-cache caches the fingerprints that are most likely to be used in the immediate future for deduplication lookup.
At the start of a job, a small portion of the fingerprints from its last backup is loaded into P-cache as initial seeding. Fingerprint lookups go to P-cache first to find duplicates. Lookups that miss in P-cache are then searched against the S-cache samples to find possible matches in previous backup data. If a match is found, part of the matched backup's fingerprints is loaded into P-cache for future deduplication.
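The lookup flow described above can be sketched as a small in-memory model. This is an illustrative sketch only, not the MSDP implementation: the class name, the `prefetch` and `sample_every` parameters, the in-memory `backups` dictionary, and the choice to load fingerprints that follow the matched sample are all assumptions made for the example. The sketch also omits the P-cache capacity limit.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTierFingerprintCache:
    """Hypothetical model of S-cache and P-cache lookup (not MSDP code)."""
    prefetch: int = 4                             # assumed number of fingerprints loaded on an S-cache hit
    s_cache: dict = field(default_factory=dict)   # sampled fingerprint -> backup id
    p_cache: set = field(default_factory=set)     # fingerprints expected to be used soon
    backups: dict = field(default_factory=dict)   # stand-in for stored backup fingerprint lists

    def add_backup(self, backup_id, fingerprints, sample_every=4):
        """Record a backup and keep only a sample of its fingerprints in S-cache."""
        fingerprints = list(fingerprints)
        self.backups[backup_id] = fingerprints
        for fp in fingerprints[::sample_every]:
            self.s_cache[fp] = backup_id

    def seed(self, backup_id, count):
        """Initial seeding: load a small portion of the last backup into P-cache."""
        self.p_cache.update(self.backups[backup_id][:count])

    def lookup(self, fp):
        """Check P-cache first, then fall back to the S-cache samples."""
        if fp in self.p_cache:
            return "p-cache hit"
        backup_id = self.s_cache.get(fp)
        if backup_id is None:
            return "miss"
        # Sample matched: load part of that backup's fingerprints into P-cache.
        fps = self.backups[backup_id]
        start = fps.index(fp)
        self.p_cache.update(fps[start:start + self.prefetch])
        return "s-cache hit"


cache = TwoTierFingerprintCache()
cache.add_backup("last-backup", [f"fp{i}" for i in range(16)])
cache.seed("last-backup", count=2)
results = [cache.lookup(fp) for fp in ("fp0", "fp5", "fp4", "fp5")]
print(results)
```

In this model, the first lookup of `fp5` misses because it is neither seeded nor sampled. After the sampled fingerprint `fp4` is matched in S-cache, its neighbors are loaded into P-cache, so the second lookup of `fp5` is a P-cache hit.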
The S-cache and P-cache fingerprint lookup method is enabled for local and cloud storage volumes on MSDP non-BYO deployments, including Flex, Flex WORM, Flex Scale, NetBackup Appliance, AKS, and EKS deployments. This method is also enabled for cloud-only volumes on MSDP BYO platforms. On platforms with cloud-only volume support, the local volume still uses the original cache lookup method. You can find the S-cache and P-cache configuration parameters under the Cache section of the configuration file contentrouter.cfg.
Starting with NetBackup 10.2, the S-cache and P-cache fingerprint lookup method is used for local storage on new setups of Flex, Flex WORM, and NetBackup Appliance. An upgrade does not change the S-cache and P-cache fingerprint lookup method.
The default values for S-cache and P-cache:
Configuration | Default value |
---|---|
MaxCacheSize | 512MiB |
MaxPredictiveCacheSize | 40% |
MaxSamplingCacheSize | 20% |
EnableLocalPredictiveSamplingCache in | true |
EnableLocalPredictiveSamplingCache in | true |
On systems that use S-cache and P-cache, the local volume and cloud volumes share the same S-cache and P-cache size, and the overall memory is limited by UsableMemoryLimit.
The S-cache size is determined by the back-end MSDP capacity or by the number of fingerprints from the back-end data. Assuming an average segment size of 32KB, the S-cache size is about 100MB per TB of back-end capacity. The P-cache size is determined by the number of concurrent jobs and by the data locality, or working set, of the incoming data. With a working set of 250MB per stream (about 5 million fingerprints), 100 concurrent streams, for example, need a minimum of 25GB of memory (100 * 250MB). The working set can be larger for certain applications with multiple streams and large data sets. Because P-cache is used for fingerprint deduplication lookup, and all fingerprints that are loaded into P-cache stay there until its allocated capacity is reached, a larger P-cache gives a better potential lookup hit rate but uses more memory. Under-sizing S-cache or P-cache reduces deduplication rates, and over-sizing increases the memory cost.
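The sizing guidance above can be turned into a quick estimate. The following is a rough sketch that only restates the figures from this topic (about 100MB of S-cache per TB of back-end capacity at a 32KB average segment size, and a 250MB working set per stream). The function names and the example inputs are hypothetical, and actual requirements depend on the workload.

```python
# Figures taken from the sizing guidance in this topic.
S_CACHE_MB_PER_TB = 100      # assumes an average segment size of 32KB
P_CACHE_MB_PER_STREAM = 250  # working set per stream, about 5 million fingerprints


def estimate_s_cache_mb(backend_capacity_tb):
    """Approximate S-cache size in MB for a given back-end capacity in TB."""
    return backend_capacity_tb * S_CACHE_MB_PER_TB


def estimate_p_cache_mb(concurrent_streams):
    """Approximate minimum P-cache size in MB for a number of concurrent streams."""
    return concurrent_streams * P_CACHE_MB_PER_STREAM


# Hypothetical example inputs: 100 concurrent streams, 200 TB of back-end capacity.
p_cache_mb = estimate_p_cache_mb(100)
s_cache_mb = estimate_s_cache_mb(200)
print(f"P-cache: {p_cache_mb / 1000:g}GB, S-cache: {s_cache_mb / 1000:g}GB")
```

For 100 concurrent streams, the P-cache estimate matches the 25GB minimum given in the text. Streams with larger working sets would need proportionally more.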