NetBackup Flex Scale Provides Enterprise Resiliency Protecting Against Multiple Concurrent Failures

Protection March 16, 2021

When evaluating a scale-out data protection solution, you must consider not only its backup and scaling capabilities but also its resiliency. That is why Veritas designed NetBackup Flex Scale to provide enterprise-level resiliency while also maximizing performance and usable capacity.

NetBackup Flex Scale includes self-healing capabilities that keep operations running through multiple concurrent failures without losing data. These resiliency features are configured automatically and protect against failures of nodes, disks, containers, and networks, as well as a full site failure.

As a cluster grows in size, so does the number of components that could potentially fail. With this complexity in mind, Veritas designed NetBackup Flex Scale’s integrated resiliency features to scale automatically with cluster size, so larger clusters become more resilient. NetBackup will continue running and maintain data integrity in any of the failure scenarios described below.

Check out my video showing NetBackup jobs continuing on a four-node cluster that experiences both a disk and a node failure.

You can also configure NetBackup Flex Scale to protect against site-wide failures in either of two configurations:

  • Single NetBackup domain, dual-site
  • Dual NetBackup domain, dual-site

Resiliency starts with the clustered file system that stores your backup data, which is protected by erasure coding with an 8:4 data-to-parity ratio.

Here is how that works:

  • First, NetBackup Flex Scale takes a 2 MB slice of the deduplicated backup data.
  • Next, it runs erasure coding on the slice, which involves:
    • dividing the data into eight equal-sized data fragments of 256 KB each (2 MB / 8)
    • creating four parity fragments of the same size to protect the data
  • Finally, the twelve fragments are written and committed to the storage unit as a stripe, a logical sequence of blocks laid out across independent devices. The fragments are distributed evenly across the nodes and disks in the cluster.

Note: In the event of a failure, such as a disk failure, access to any eight of the twelve fragments preserves data integrity and allows the missing fragments to be recreated automatically.
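To make the slice arithmetic concrete, here is a minimal Python sketch of the splitting step only. It assumes the 2 MB slice is 2 MiB (so each fragment is 256 KiB), and the names `split_slice`, `SLICE_SIZE`, and so on are hypothetical, not NetBackup Flex Scale identifiers. It does not compute the parity fragments; a real erasure code would derive those from the data fragments.

```python
from __future__ import annotations

# Assumed sizes: the post's "2 MB" slice is treated as 2 MiB here.
SLICE_SIZE = 2 * 1024 * 1024
DATA_FRAGMENTS = 8      # the "8" in 8:4
PARITY_FRAGMENTS = 4    # the "4" in 8:4


def split_slice(slice_bytes: bytes, k: int = DATA_FRAGMENTS) -> list[bytes]:
    """Divide one slice into k equal-sized data fragments (illustrative only)."""
    if len(slice_bytes) % k != 0:
        raise ValueError("slice length must be a multiple of the fragment count")
    size = len(slice_bytes) // k
    return [slice_bytes[i * size:(i + 1) * size] for i in range(k)]


fragments = split_slice(bytes(SLICE_SIZE))
fragment_size = len(fragments[0])                  # bytes per data fragment
stripe_width = DATA_FRAGMENTS + PARITY_FRAGMENTS   # fragments written per stripe

print(f"{len(fragments)} data fragments of {fragment_size // 1024} KiB each")
print(f"{stripe_width} fragments per stripe, any {DATA_FRAGMENTS} suffice to rebuild")
```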

If a node receives more than one fragment from a stripe, NetBackup Flex Scale places those fragments on different drives for the highest resiliency. That way, a single disk failure costs only one fragment from a stripe, and a stripe can lose up to 4 fragments and still be recovered.

In short:  you can lose any 4 disks in a cluster and your data remains intact.
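The placement rule can be checked with a small simulation. The sketch below is an assumption-laden illustration, not the product's placement algorithm: it spreads 12 fragments round-robin across a hypothetical four-node cluster with three disks per node, puts each additional fragment on a node onto a different disk, and then tests every possible combination of four failed disks.

```python
from itertools import combinations


def place_stripe(nodes, disks_per_node, fragments=12):
    """Round-robin fragments over nodes; each extra fragment on a node
    goes to a different disk. Returns a list of (node, disk) pairs."""
    placement = []
    for i in range(fragments):
        node = i % nodes
        disk = i // nodes  # n-th fragment on a node lands on its n-th disk
        if disk >= disks_per_node:
            raise ValueError("not enough disks to keep fragments on separate drives")
        placement.append((node, disk))
    return placement


def survives(placement, failed_disks, needed=8):
    """A stripe is recoverable if at least `needed` fragments remain readable."""
    remaining = [p for p in placement if p not in failed_disks]
    return len(remaining) >= needed


# Hypothetical cluster: 4 nodes with 3 disks each.
placement = place_stripe(nodes=4, disks_per_node=3)
all_disks = [(n, d) for n in range(4) for d in range(3)]

# Every combination of 4 failed disks still leaves 8 or more fragments.
any_four_ok = all(survives(placement, set(c)) for c in combinations(all_disks, 4))

# A whole node (3 fragments) plus one disk on another node is also 4 fragments.
node_plus_disk = {(0, d) for d in range(3)} | {(1, 0)}
node_and_disk_ok = survives(placement, node_plus_disk)

print(any_four_ok, node_and_disk_ok)
```

Because no disk ever holds two fragments of the same stripe, each failed disk removes at most one fragment, which is why four failures stay within the four-fragment budget.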

In addition to high data durability, 8:4 erasure coding also optimizes NetBackup Flex Scale performance for both backups and restores by using the I/O resources of the entire disk pool to ingest backup data. This eliminates the bottlenecks that arise when data protection jobs are limited to a single node’s disks. Also, during a restore or when a failed disk is rebuilt, many disks share the workload by reading erasure-coded data fragments simultaneously.

Check out my video showing four concurrent disk failures and NetBackup jobs continuing uninterrupted.

NetBackup Flex Scale uses erasure coding 8:4 to provide the best balance of performance, usable capacity, and data durability:

  • Optimal performance for both backups and restores: Performance is optimized by using resources across the whole cluster. Each 2 MB slice of a backup is written to 12 disks on up to 12 nodes, and each slice of a restore is read from 8 disks on up to 8 nodes.
  • High usable capacity: 67% of raw capacity is usable for storing your backup data
  • Data durability: survives the loss of any 4 data disks in your cluster
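The 67% figure follows directly from the 8:4 ratio: 8 of every 12 fragments hold data. A short Python sketch of that arithmetic is below; the function names and the 120 TB raw capacity are made up for illustration, and the calculation ignores any other overhead a real cluster might have.

```python
def usable_fraction(data_fragments=8, parity_fragments=4):
    """Share of raw capacity that holds data rather than parity."""
    return data_fragments / (data_fragments + parity_fragments)


def usable_capacity(raw_capacity, data_fragments=8, parity_fragments=4):
    """Usable capacity in the same unit as raw_capacity (erasure coding only)."""
    return raw_capacity * data_fragments / (data_fragments + parity_fragments)


percent = round(usable_fraction() * 100)   # 8 / 12, rounded to a whole percent
example_tb = usable_capacity(120)          # hypothetical 120 TB of raw disk

print(f"{percent}% usable; 120 TB raw gives {example_tb:.0f} TB usable")
```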

Next, the metadata must be protected. In NetBackup Flex Scale, the metadata is stored in the catalog on the SSDs. The initial cluster setup automatically configures protection for the catalog in not one or two but four different ways:

  • Triple mirroring: three copies of the catalog are kept, so it survives the failure of SSDs on any two nodes.
  • Backup copy: the initial configuration also sets up a NetBackup policy that backs up the catalog regularly and stores a copy on the erasure-coded hard drives.
  • Snapshot copies: snapshots are taken automatically every two hours, and 36 are retained, which is three days’ worth.
  • Site replication: if you have a secondary site configured, the catalog is replicated to the secondary site.
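The snapshot numbers fit together as 36 snapshots at two-hour intervals covering 72 hours. As a rough illustration, the Python sketch below models a rolling retention window with a bounded queue. This is an assumed model of "keep the newest 36", not a description of how the product prunes snapshots, and the names are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(hours=2)
RETAINED_SNAPSHOTS = 36

# 36 snapshots x 2 hours = 72 hours of coverage.
retention_window = SNAPSHOT_INTERVAL * RETAINED_SNAPSHOTS


def simulate(start, count):
    """Take `count` snapshots; the deque silently drops the oldest beyond 36."""
    kept = deque(maxlen=RETAINED_SNAPSHOTS)
    for i in range(count):
        kept.append(start + i * SNAPSHOT_INTERVAL)
    return kept


kept = simulate(datetime(2021, 3, 16), count=100)
print(retention_window, len(kept), kept[0], kept[-1])
```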

The NetBackup services are containerized, which allows them to be monitored and managed by the built-in cluster management software, InfoScale. InfoScale makes the services highly resilient by automating their recovery. It monitors the containers, and when it detects a container failure, it first tries to restart the container in place. If that isn’t possible, for example because of a node failure, it restarts the container on another node in the cluster. The built-in intelligent load balancer is aware of any cluster changes and automatically updates its algorithms for optimal job distribution.
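The recovery order described above (restart in place first, fail over to another node second) can be sketched as a small decision function. This is a hypothetical model in Python, not InfoScale's API or its actual logic: the `Node` class, the `healthy` flag, and `choose_restart_node` are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


def choose_restart_node(current: Node, cluster: list[Node]) -> Node | None:
    """Pick where to restart a failed container (illustrative policy only)."""
    # First choice: restart in place if the container's node is still up.
    if current.healthy:
        return current
    # Otherwise fail over to any other healthy node in the cluster.
    for node in cluster:
        if node is not current and node.healthy:
            return node
    return None  # nothing healthy left to run on


cluster = [Node("node1"), Node("node2"), Node("node3"), Node("node4")]

in_place = choose_restart_node(cluster[0], cluster)   # node is fine: stay put

cluster[0].healthy = False                            # simulate a node failure
failover = choose_restart_node(cluster[0], cluster)   # move to another node

print(in_place.name, failover.name)
```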

The nodes themselves have redundant power supplies and two dual-port network adapters.  The network cabling connects one port from each NIC to different switches or VLANs for protection against switch failure.  

All in all, the NetBackup Flex Scale architecture delivers the enterprise resiliency you need in scale-out data protection solutions.

For more information on NetBackup Flex Scale, check out this white paper or visit the Appliance Solutions page on Veritas.com and watch the recorded NetBackup Flex Scale announcement and technical breakout sessions from the Conquer Every Cloud Virtual Conference.

Sandra Moulton
Dir, Solutions Architect