Veritas NetBackup for Hadoop Administrator's Guide
 
Protecting Hadoop data using NetBackup
NetBackup protects Hadoop data by using the NetBackup Parallel Streaming Framework (PSF).
The following diagram provides an overview of how Hadoop data is protected by NetBackup.
Also, review the definitions of the terms used in this guide. See NetBackup for Hadoop terminologies.
As illustrated in the diagram:
- Data is backed up in parallel streams: the DataNodes stream data blocks simultaneously to multiple backup hosts. Using multiple backup hosts and parallel streams accelerates job processing. 
- Communication between the Hadoop cluster and NetBackup is enabled by the NetBackup plug-in for Hadoop.
  - The plug-in is available separately and must be installed on all the backup hosts. 
- For NetBackup communication, you need to configure a Big Data policy and add the related backup hosts. 
- You can configure a NetBackup media server, client, or master server as a backup host. Depending on the number of DataNodes, you can add or remove backup hosts, which makes it easy to scale up your environment. 
- The NetBackup Parallel Streaming Framework enables agentless backup: the backup and restore operations run on the backup hosts, and there is no agent footprint on the cluster nodes. NetBackup is also not affected by Hadoop cluster upgrades or maintenance. 
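The parallel-streaming and scaling points above can be sketched in Python as a toy model. The model assigns HDFS blocks held by the DataNodes to the configured backup hosts, and each backup host receives its share in its own concurrent stream. All work happens in code that stands in for the backup hosts, which mirrors the agentless design. The names `Block`, `assign_blocks`, `stream_to_host`, and `run_backup`, the greedy least-loaded assignment, and the host names are hypothetical illustrations. They are not the actual NetBackup or PSF API or distribution logic. `stream_to_host` only totals block sizes and does not transfer any data.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical HDFS data block as seen by a backup host."""
    block_id: str
    datanode: str
    size_mb: int


def assign_blocks(blocks, backup_hosts):
    """Greedy sketch: give each block (largest first) to the least-loaded host."""
    if not backup_hosts:
        raise ValueError("at least one backup host is required")
    load = {host: 0 for host in backup_hosts}
    plan = {host: [] for host in backup_hosts}
    for block in sorted(blocks, key=lambda b: b.size_mb, reverse=True):
        host = min(backup_hosts, key=lambda h: load[h])  # ties go to the first host
        plan[host].append(block)
        load[host] += block.size_mb
    return plan


def stream_to_host(host, blocks):
    """Placeholder for one backup stream; it only totals the MB received."""
    return host, sum(b.size_mb for b in blocks)


def run_backup(blocks, backup_hosts):
    """Run one stream per backup host concurrently; return MB per host."""
    plan = assign_blocks(blocks, backup_hosts)
    with ThreadPoolExecutor(max_workers=len(backup_hosts)) as pool:
        results = pool.map(lambda item: stream_to_host(*item), plan.items())
    return dict(results)


if __name__ == "__main__":
    # Six 128 MB blocks spread over three hypothetical DataNodes.
    blocks = [Block(f"blk_{i}", f"datanode{i % 3}", 128) for i in range(6)]
    print(run_backup(blocks, ["bkhost1", "bkhost2"]))
    # {'bkhost1': 384, 'bkhost2': 384}
    print(run_backup(blocks, ["bkhost1", "bkhost2", "bkhost3"]))
    # {'bkhost1': 256, 'bkhost2': 256, 'bkhost3': 256}
```

In this toy example, adding a third backup host lowers the largest per-host load from 384 MB to 256 MB. This is the kind of effect the scaling point above describes.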
For more information:
- See Limitations. 
- For information about the NetBackup Parallel Streaming Framework (PSF) refer to the NetBackup Administrator's Guide, Volume I.