NetBackup™ for Hadoop Administrator's Guide

Product(s): NetBackup & Alta Data Protection (10.4)
  1. Introduction
    1. Protecting NetBackup for Hadoop data using NetBackup
    2. Backing up NetBackup for Hadoop data
    3. Restoring NetBackup for Hadoop data
    4. NetBackup for Hadoop terms
    5. Limitations
  2. Prerequisites and best practices for the NetBackup for Hadoop plug-in for NetBackup
    1. About deploying the Hadoop plug-in
    2. Prerequisites for the NetBackup for Hadoop plug-in
      1. Operating system and platform compatibility
      2. License for NetBackup for Hadoop plug-in for NetBackup
    3. Preparing the NetBackup for Hadoop cluster
    4. Best practices for deploying the NetBackup for Hadoop plug-in
  3. Configuring NetBackup for Hadoop
    1. About configuring NetBackup for Hadoop
    2. Managing backup hosts
      1. Including a NetBackup client on NetBackup primary server allowed list
      2. Configure a NetBackup Appliance as a backup host
    3. Adding NetBackup for Hadoop credentials in NetBackup
    4. Configuring the NetBackup for Hadoop plug-in using the NetBackup for Hadoop configuration file
      1. Configuring NetBackup for a highly-available NetBackup for Hadoop cluster
      2. Configuring a custom port for the NetBackup for Hadoop cluster
      3. Configuring number of threads for backup hosts
      4. Configuring number of streams for backup hosts
      5. Configuring distribution algorithm and golden ratio for backup hosts
      6. Configuring communication between NetBackup and Hadoop clusters that are SSL-enabled (HTTPS)
        1. ECA_TRUST_STORE_PATH for NetBackup servers and clients
        2. ECA_CRL_PATH for NetBackup servers and clients
        3. HADOOP_SECURE_CONNECT_ENABLED for servers and clients
        4. HADOOP_CRL_CHECK for NetBackup servers and clients
        5. Example values for the parameters in the bp.conf file
    5. Configuration for a NetBackup for Hadoop cluster that uses Kerberos
    6. Hadoop.conf configuration for parallel restore
    7. Create a BigData policy for Hadoop clusters
    8. Disaster recovery of a NetBackup for Hadoop cluster
  4. Performing backups and restores of Hadoop
    1. About backing up a NetBackup for Hadoop cluster
      1. Prerequisites for running backup and restore operations for a NetBackup for Hadoop cluster with Kerberos authentication
      2. Best practices for backing up a NetBackup for Hadoop cluster
      3. Backing up a NetBackup for Hadoop cluster
    2. About restoring a NetBackup for Hadoop cluster
      1. Best practices for restoring a Hadoop cluster
      2. Restoring Hadoop data on the same Hadoop cluster
        1. Restore Hadoop data on the same Hadoop cluster
      3. Restoring Hadoop data on an alternate Hadoop cluster
    3. Best practice for improving performance during backup and restore
  5. Troubleshooting
    1. About troubleshooting NetBackup for Hadoop issues
    2. About NetBackup for Hadoop debug logging
    3. Troubleshooting backup issues for NetBackup for Hadoop data
      1. Backup operation fails with error 6609
      2. Backup operation failed with error 6618
      3. Backup operation fails with error 6647
      4. Extended attributes (xattrs) and Access Control Lists (ACLs) are not backed up or restored for Hadoop
      5. Backup operation fails with error 6654
      6. Backup operation fails with bpbrm error 8857
      7. Backup operation fails with error 6617
      8. Backup operation fails with error 6616
      9. Backup operation fails with error 84
      10. NetBackup configuration and certificate files do not persist after the container-based NetBackup appliance restarts
      11. Unable to see incremental backup images during restore even though the images are seen in the backup image selection
      12. One of the child backup jobs goes in a queued state
    4. Troubleshooting restore issues for NetBackup for Hadoop data
      1. Restore fails with error code 2850
      2. NetBackup restore job for NetBackup for Hadoop completes partially
      3. Extended attributes (xattrs) and Access Control Lists (ACLs) are not backed up or restored for Hadoop
      4. Restore operation fails when Hadoop plug-in files are missing on the backup host
      5. Restore fails with bpbrm error 54932
      6. Restore operation fails with bpbrm error 21296
      7. Hadoop with Kerberos restore job fails with error 2850
      8. Configuration file is not recovered after a disaster recovery

Protecting NetBackup for Hadoop data using NetBackup

Using the NetBackup Parallel Streaming Framework (PSF), you can protect Hadoop data with NetBackup.

The following diagram provides an overview of how NetBackup protects Hadoop data.

Also, review the related terms for Hadoop.

See NetBackup for Hadoop terms.

Figure: Architectural overview


As illustrated in the diagram:

  • The data is backed up in parallel streams: the DataNodes stream data blocks simultaneously to multiple backup hosts. Using multiple backup hosts and parallel streams accelerates job processing.

  • Communication between the Hadoop cluster and NetBackup is enabled by the NetBackup plug-in for Hadoop.

    The plug-in is installed as part of the NetBackup installation.

  • For NetBackup communication, you need to configure a BigData policy and add the related backup hosts.

  • You can configure a NetBackup media server, client, or primary server as a backup host. Also, depending on the number of DataNodes, you can add or remove backup hosts. You can scale up your environment easily by adding more backup hosts.

  • The NetBackup Parallel Streaming Framework enables agentless backup, in which the backup and restore operations run on the backup hosts. There is no agent footprint on the cluster nodes. Also, NetBackup is not affected by Hadoop cluster upgrades or maintenance.
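
The parallel-stream idea in the list above can be pictured with a small, purely conceptual Python sketch. It is not NetBackup code and does not use any NetBackup or Hadoop API. The function names, host names, block names, and the round-robin assignment are all assumptions made for illustration; the actual distribution logic is controlled by the plug-in configuration.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_streams(blocks, backup_hosts):
    """Split blocks across backup hosts.

    Round-robin is an illustrative assumption, not NetBackup's algorithm.
    """
    if not backup_hosts:
        raise ValueError("at least one backup host is required")
    plan = {host: [] for host in backup_hosts}
    for index, block in enumerate(blocks):
        plan[backup_hosts[index % len(backup_hosts)]].append(block)
    return plan


def back_up_stream(host, blocks):
    """Hypothetical stand-in for one backup host receiving its stream."""
    total_mb = sum(size_mb for _name, size_mb in blocks)
    return host, total_mb


def run_backup(blocks, backup_hosts):
    """Run one stream per backup host concurrently and collect totals."""
    plan = assign_streams(blocks, backup_hosts)
    with ThreadPoolExecutor(max_workers=len(backup_hosts)) as pool:
        futures = [pool.submit(back_up_stream, host, assigned)
                   for host, assigned in plan.items()]
        return dict(future.result() for future in futures)


# Hypothetical data: six 128 MB blocks and three backup hosts.
blocks = [(f"blk_{n}", 128) for n in range(6)]
hosts = ["backup-host-1", "backup-host-2", "backup-host-3"]
totals = run_backup(blocks, hosts)
```

In this sketch, adding a host to `hosts` reduces the amount of data each stream carries, which mirrors the point that you can scale the environment by adding backup hosts. All work runs on the backup-host side of the model, and nothing is installed on the cluster nodes.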

For more information: