Veritas NetBackup for Hadoop Administrator's Guide
- Introduction
- Installing and deploying Hadoop plug-in for NetBackup
- Configuring NetBackup for Hadoop
- Managing backup hosts
- Configuring the Hadoop plug-in using the Hadoop configuration file
- Configuring NetBackup policies for Hadoop plug-in
- Performing backups and restores of Hadoop
- Troubleshooting
- Troubleshooting backup issues for Hadoop data
- Troubleshooting restore issues for Hadoop data
NetBackup for Hadoop terminologies
The following tables define the terms that you come across when you use NetBackup to protect a Hadoop cluster.
Table: NetBackup terminologies
Terminology | Definition |
---|---|
Compound job | A backup job for Hadoop data is a compound job. The backup job runs a discovery job to gather information about the data to be backed up, and then runs child jobs for each backup host that performs the actual data transfer. |
Discovery job | When a backup job is executed, a discovery job is created first. The discovery job communicates with the NameNode and gathers information about the blocks that need to be backed up and the associated DataNodes. At the end of the discovery, the job populates a workload discovery file that NetBackup then uses to distribute the workload among the backup hosts. |
Child job | For backup, a separate child job is created for each backup host to transfer data to the storage media. A child job can transfer data blocks from multiple DataNodes. |
Workload discovery file | During discovery, when the backup host communicates with the NameNode, a workload discovery file is created. The file contains information about the data blocks to be backed up and the associated DataNodes. |
Workload distribution file | After the discovery is complete, NetBackup creates a workload distribution file for each backup host. These files contain information about the data that is transferred by the respective backup host. |
Parallel streams | The NetBackup parallel streaming framework allows data blocks from multiple DataNodes to be backed up using multiple backup hosts simultaneously. |
Backup host | The backup host acts as a proxy client. All the backup and restore operations are executed through the backup host. You can configure media servers, clients, or a master server as a backup host. The backup host is also used as the destination client during restores. |
BigData policy | The BigData policy is introduced to specify the application type, to allow backing up and restoring of big data applications such as Hadoop, to associate backup hosts, and to distribute the backup workload. |
Application server | The NameNode is referred to as an application server in NetBackup. |
Primary NameNode | In a high-availability scenario, you need to specify one NameNode with the BigData policy and with the tpconfig command. This NameNode is referred to as the primary NameNode. |
Fail-over NameNode | In a high-availability scenario, the NameNodes other than the primary NameNode that are updated in the Hadoop configuration file are referred to as fail-over NameNodes (see the configuration sketch after this table). |
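In a high-availability deployment, the primary and fail-over NameNode entries come together in the Hadoop configuration file on the backup host. The sketch below is illustrative only and assumes JSON-style entries: the file name, key names (application_servers, failover_namenodes), host names, and port values are placeholders, and the authoritative file location and schema are described in the chapter "Configuring the Hadoop plug-in using the Hadoop configuration file".

```sh
# Illustrative sketch only -- not the authoritative format.
# Key names, host names, port, and file location are assumptions; verify them
# in "Configuring the Hadoop plug-in using the Hadoop configuration file".
cat > hadoop.conf <<'EOF'
{
  "application_servers": {
    "primary-namenode.example.com": {
      "port": 8020,
      "failover_namenodes": [
        { "hostname": "failover-namenode.example.com", "port": 8020 }
      ]
    }
  }
}
EOF
```

In this sketch, the NameNode listed at the top level is the primary NameNode that you also specify in the BigData policy and with the tpconfig command; the entries nested under it identify the fail-over NameNodes that the backup host can use when the primary NameNode is unavailable.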
Table: Hadoop terminologies
Terminology | Definition |
---|---|
NameNode | The NameNode manages the file system namespace of the Hadoop cluster. The NameNode is also used as the source client during restores. |
DataNode | DataNode is responsible for storing the actual data in Hadoop. |
Snapshot-enabled directories (snapshottable) | Snapshots can be taken on any directory once the directory is snapshot-enabled. Such a directory is referred to as snapshottable (see the example commands after this table). |
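Snapshot-enabled directories are prepared on the Hadoop cluster itself with the standard HDFS command line. The commands below are standard Hadoop hdfs commands, not NetBackup commands, and the /data path and snap1 name are hypothetical; they show how an HDFS administrator allows snapshots on a directory, takes a snapshot, and lists the snapshottable directories.

```sh
# Allow snapshots on a directory (run with HDFS administrator privileges).
# "/data" is a hypothetical example path.
hdfs dfsadmin -allowSnapshot /data

# Take a snapshot of the snapshot-enabled directory; "snap1" is an arbitrary name.
hdfs dfs -createSnapshot /data snap1

# List all directories on which snapshots are currently allowed.
hdfs lsSnapshottableDir
```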