Veritas NetBackup for Hadoop Administrator's Guide

Last Published:
Product(s): NetBackup (8.1)
  1. Introduction
    1. Protecting Hadoop data using NetBackup
    2. Backing up Hadoop data
    3. Restoring Hadoop data
    4. Deploying the Hadoop plug-in
    5. NetBackup for Hadoop terminologies
    6. Limitations
  2. Installing and deploying Hadoop plug-in for NetBackup
    1. About installing and deploying the Hadoop plug-in
    2. Pre-requisites for installing the Hadoop plug-in
      1. Operating system and platform compatibility
      2. License for Hadoop plug-in for NetBackup
    3. Best practices for deploying the Hadoop plug-in
    4. Preparing the Hadoop cluster
    5. Downloading the Hadoop plug-in
    6. Installing the Hadoop plug-in
    7. Verifying the installation of the Hadoop plug-in
  3. Configuring NetBackup for Hadoop
    1. About configuring NetBackup for Hadoop
    2. Managing backup hosts
      1. Whitelisting a NetBackup client on NetBackup master server
      2. Configure a NetBackup Appliance as a backup host
    3. Adding Hadoop credentials in NetBackup
    4. Configuring the Hadoop plug-in using the Hadoop configuration file
      1. Configuring NetBackup for a highly-available Hadoop cluster
      2. Configuring a custom port for the Hadoop cluster
      3. Configuring number of threads for backup hosts
    5. Configuration for a Hadoop cluster that uses Kerberos
    6. Configuring NetBackup policies for Hadoop plug-in
      1. Creating a BigData backup policy
        1. Creating BigData policy using the NetBackup Administration Console
          1. Using the Policy Configuration Wizard to create a BigData policy for Hadoop clusters
          2. Using the NetBackup Policies utility to create a BigData policy for Hadoop clusters
        2. Using NetBackup Command Line Interface (CLI) to create a BigData policy for Hadoop clusters
    7. Disaster recovery of a Hadoop cluster
  4. Performing backups and restores of Hadoop
    1. About backing up a Hadoop cluster
      1. Pre-requisite for running backup and restore operations for a Hadoop cluster with Kerberos authentication
      2. Backing up a Hadoop cluster
      3. Best practices for backing up a Hadoop cluster
    2. About restoring a Hadoop cluster
      1. Restoring Hadoop data on the same Hadoop cluster
        1. Using the Restore Wizard to restore Hadoop data on the same Hadoop cluster
        2. Using the bprestore command to restore Hadoop data on the same Hadoop cluster
      2. Restoring Hadoop data on an alternate Hadoop cluster
      3. Best practices for restoring a Hadoop cluster
  5. Troubleshooting
    1. About troubleshooting NetBackup for Hadoop issues
    2. About NetBackup for Hadoop debug logging
    3. Troubleshooting backup issues for Hadoop data
      1. Backup operation for Hadoop fails with error code 6599
      2. Backup operation fails with error 6609
      3. Backup operation fails with error 6618
      4. Backup operation fails with error 6647
      5. Extended attributes (xattrs) and Access Control Lists (ACLs) are not backed up or restored for Hadoop
      6. Backup operation fails with error 6654
      7. Backup operation fails with bpbrm error 8857
      8. Backup operation fails with error 6617
      9. Backup operation fails with error 6616
    4. Troubleshooting restore issues for Hadoop data
      1. Restore fails with error code 2850
      2. NetBackup restore job for Hadoop completes partially
      3. Extended attributes (xattrs) and Access Control Lists (ACLs) are not backed up or restored for Hadoop
      4. Restore operation fails when Hadoop plug-in files are missing on the backup host
      5. Restore fails with bpbrm error 54932
      6. Restore operation fails with bpbrm error 21296

NetBackup for Hadoop terminologies

The following tables define the terms that you will come across when using NetBackup to protect a Hadoop cluster.

Table: NetBackup terminologies

Terminology

Definition

Compound job

A backup job for Hadoop data is a compound job.

  • The backup job runs a discovery job to gather information about the data to be backed up.

  • Child jobs are created for each backup host that performs the actual data transfer.

  • After the backup is complete, the job cleans up the snapshots on the NameNode and is then marked complete.

Discovery job

When a backup job is executed, a discovery job is created first. The discovery job communicates with the NameNode and gathers information about the blocks that need to be backed up and the associated DataNodes. At the end of the discovery, the job populates a workload discovery file that NetBackup then uses to distribute the workload amongst the backup hosts.

Child job

For backup, a separate child job is created for each backup host to transfer data to the storage media. A child job can transfer data blocks from multiple DataNodes.

Workload discovery file

During discovery, when the backup host communicates with the NameNode, a workload discovery file is created. The file contains information about the data blocks to be backed up and the associated DataNodes.

Workload distribution file

After the discovery is complete, NetBackup creates a workload distribution file for each backup host. These files contain information about the data that is transferred by the respective backup host.

Parallel streams

The NetBackup parallel streaming framework allows data blocks from multiple DataNodes to be backed up using multiple backup hosts simultaneously.

Backup host

The backup host acts as a proxy client. All the backup and restore operations are executed through the backup host.

You can configure media servers, clients, or a master server as a backup host.

The backup host is also used as the destination client during restores.

BigData policy

The BigData policy is introduced to:

  • Specify the application type.

  • Allow backing up distributed multi-node environments.

  • Associate backup hosts.

  • Perform workload distribution.

Application server

The NameNode is referred to as an application server in NetBackup.

Primary NameNode

In a high-availability scenario, you need to specify one NameNode with the BigData policy and with the tpconfig command. This NameNode is referred to as the primary NameNode.

Fail-over NameNode

In a high-availability scenario, the NameNodes other than the primary NameNode that are updated in the hadoop.conf file are referred to as fail-over NameNodes.
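The relationship between the primary NameNode and the fail-over NameNodes can be illustrated with a sketch of the hadoop.conf file, which is a JSON file on the backup host. The host names and the port below are placeholders (50070 is the default WebHDFS port), and the field layout is a sketch, not an authoritative template; consult the configuration chapter of this guide for the exact file location and keys for your release:

```python
import json

# A minimal, hypothetical hadoop.conf for a highly-available cluster.
# "primary-nn.example.com" stands in for the primary NameNode (the one
# that is specified in the BigData policy and with tpconfig); the hosts
# under "failover_namenodes" stand in for the fail-over NameNodes.
hadoop_conf = {
    "application_servers": {
        "primary-nn.example.com": {
            "port": 50070,
            "failover_namenodes": [
                {"hostname": "failover-nn.example.com", "port": 50070}
            ],
        }
    }
}

# Render the configuration as it would appear on disk.
print(json.dumps(hadoop_conf, indent=2))
```

When the primary NameNode is unreachable, the plug-in falls back to the hosts listed under the fail-over entries, which is why every NameNode in the HA pair must appear in this file.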

Table: Hadoop terminologies

Terminology

Definition

NameNode

The NameNode manages the file system namespace and metadata in Hadoop. The NameNode is also used as a source client during restores.

DataNode

The DataNode is responsible for storing the actual data in Hadoop.

Snapshot-enabled directories (snapshottable)

Snapshots can be taken on any directory once the directory is snapshot-enabled.

  • Each snapshot-enabled directory can accommodate 65,536 simultaneous snapshots. There is no limit on the number of snapshot-enabled directories.

  • Administrators can set any directory to be snapshot-enabled.

  • If there are snapshots in a snapshot-enabled directory, the directory cannot be deleted or renamed until all of the snapshots are deleted.

  • A directory cannot be snapshot-enabled if one of its ancestors or descendants is a snapshot-enabled directory.
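Snapshot-enabled directories are managed with standard HDFS commands on the cluster itself, outside of NetBackup. The following is a minimal sketch of the life cycle, assuming the `hdfs` client is on the PATH and the user has HDFS administrator rights; the directory path and snapshot name are placeholders:

```shell
# Enable snapshots on a directory (administrator operation).
hdfs dfsadmin -allowSnapshot /data/sales

# Create a named snapshot of the directory.
hdfs dfs -createSnapshot /data/sales backup-snap

# List all snapshot-enabled directories on the cluster.
hdfs lsSnapshottableDir

# All snapshots must be deleted before the directory can be
# renamed, deleted, or have snapshots disallowed again.
hdfs dfs -deleteSnapshot /data/sales backup-snap
hdfs dfsadmin -disallowSnapshot /data/sales
```

The backup job relies on this mechanism: the snapshots that NetBackup takes during backup are created on such directories and are cleaned up on the NameNode when the compound job completes.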