Predefined risks in Resiliency Platform
Table: Predefined risks lists the predefined risks available in Resiliency Platform. These risks are reflected in the current risk report and the historical risk report.
Table: Predefined risks
Risks | Description | Risk detection time | Risk type | Affected operation | Fix if violated |
---|---|---|---|---|---|
Veritas Infoscale Operations Manager disconnected | Checks for Veritas Infoscale Operations Manager to Resiliency Manager connection state | 1 minute | Error | All operations | Check Veritas Infoscale Operations Manager reachability. Try to reconnect Veritas Infoscale Operations Manager. |
vCenter Password Incorrect | Checks if vCenter password is incorrect | 15 minutes | Error | | In case of a password change, resolve the password issue and refresh the vCenter configuration |
VM tools not installed | Checks if VM Tools are not installed. It may affect IP Customization and VM Shutdown | 5 minutes | Error | | |
Snapshot reverted on Virtual Machine | Checks if snapshot has been reverted on virtual machine | 5 minutes | Error | Resiliency Platform Data Mover replication | Perform the Resync operation on the resiliency group. |
Resiliency Platform Data Mover daemon crashed | Resiliency Platform Data Mover filter is not able to connect to its counterpart in ESX. The replication process has stopped and is at risk | 5 minutes | Error | Resiliency Platform Data Mover replication | |
DataMover virtual machine in no-op mode | Checks if VM Data Mover filter is not able to connect to its counterpart in ESX | 5 minutes | Error | Resiliency Platform Data Mover replication | In order to continue the replication, you can move (VMotion) the VM to a different ESX node in the cluster and either troubleshoot the issue with this ESX node or raise a support case with Veritas |
Veritas Replication policy has been detached | Veritas Replication policy has been detached from the disk associated with virtual machine. | 5 minutes | Error | Migrate | Perform Resync operation on the affected resiliency group. |
Asset disk configuration changed | Checks if disk configuration of any of the assets in the resiliency group has changed. | 30 minutes | Error | | Refresh the respective hosts, vCenter servers or Hyper-V servers and the cloud discovery. After the refresh, probe the risk. If the risk still exists, edit the resiliency group to first remove the impacted virtual machine from the resiliency group and then add it back to the resiliency group. |
Asset NIC configuration changed | Checks if NIC configuration of any of the assets in the resiliency group has changed. | 30 minutes | Error | | If the resiliency group is online on the target data center, then either revert the NIC changes done on the virtual machines or suppress the risk to be able to migrate the assets back to the source data center. If the resiliency group is online on the source data center, edit the resiliency group with the Edit Configuration or Customize Network option to update the NIC configuration. |
Invalid NIC Configuration | One or more NICs on the host are not configured properly. | Real time, while creating resiliency group | Error | Create resiliency group | Ensure that the keys NAME, DEVICE and HWADDR have appropriate values as per the details of each NIC in its configuration file. |
Global user deleted | Checks if there are no global users. In this case, the user will not be able to customize the IP for Windows machines in VMware environment | Real time | Warning | | Edit the resiliency group or add a Global user |
Failure to validate Windows Global User credentials for IP customization | This risk is raised if: | After the resiliency group is configured for disaster recovery | Warning | | Add Windows Global Users with appropriate credentials. Edit the resiliency group using the Network Customization option to resolve the risk. |
Missing heartbeat from Resiliency Manager | Checks for heartbeat failure from a Resiliency Manager | 5 minutes | Error | All | Fix the Resiliency Manager connectivity issue |
Infrastructure Management Server disconnected | Checks for Infrastructure Management Server (IMS) to Resiliency Manager (RM) connection state | 1 minute | Error | All | Check IMS reachability. Try to reconnect IMS. |
Storage Discovery Host down | Checks if the discovery daemon is down on the storage discovery host | 15 minutes | Error | Migrate | Resolve the discovery daemon issue |
DNS removed | Checks if DNS is removed from the resiliency group where DNS customization is enabled | Real time | Warning | | Edit the Resiliency Group and disable DNS customization |
IOTap driver not configured | Checks if the IOTap driver is not configured | 2 hours | Error | None | Configure the IOTap driver. This risk is removed when the workload is configured for disaster recovery. |
VMware Discovery Host Down | Checks if the discovery daemon is down on the VMware Discovery Host | 15 minutes | Error | Migrate | Resolve the discovery daemon issue |
VM restart is pending | Checks if the virtual machine has not been restarted after add host operation | 2 hours | Error | Create resiliency group | Restart the virtual machine after add host operation |
New virtual machine added to replication storage | Checks if a virtual machine that is added to a Veritas Replication Set on a primary site is not a part of the resiliency group | 5 minutes | Error | | Add the virtual machine to the resiliency group |
Replication lag exceeding RPO | Checks if the replication lag exceeds the thresholds defined for the resiliency group. This risk affects the SLA for the services running on your production data center | 5 minutes | Warning | | Check if the replication lag exceeds the RPO that is defined in the Service Objective |
Replication state broken/critical | Checks if the replication is not working or is in a critical condition for each resiliency group | 5 minutes | Error | | Contact the enclosure vendor. In case of Resiliency Platform Data Mover, see Admin Wait state codes, or raise a support case with Veritas |
Remote mount point already mounted | Checks if the mount point is not available for mounting on target site for any of the following reasons: | | Warning | | Unmount the mount point that is already mounted or is being used by other assets. Risk gets resolved after 30 minutes if a successful cleanup rehearsal, migrate, or takeover operation is performed and VMware vCenter gets refreshed within 30 minutes. |
Disk utilization critical | Checks if at least 80% of the disk capacity is being utilized. The risk is generated for all the resiliency groups associated with that particular file system | | Warning | | Delete or move some files or uninstall some non-critical applications to free up some disk space |
ESX not reachable | Checks if the ESX server is in a disconnected state | 5 minutes | Error | | Resolve the ESX server connection issue |
vCenter Server not reachable | Checks if the virtualization server is unreachable or if the password for the virtualization server has changed | 5 minutes | Error | | Resolve the virtualization server connection issue. In case of a password change, resolve the password issue |
Insufficient compute resources on failover target | Checks if there are insufficient CPU resources on failover target in a virtual environment | 6 hours | Warning | | Reduce the number of CPUs assigned to the virtual machines on the primary site to match the available CPU resources on failover target |
Host not added on recovery data center | Checks if the host is not added to the IMS on the recovery data center | 30 minutes | Error | Migrate | Check the following and fix: |
NetBackup Notification channel disconnected | Checks for NetBackup Notification channel connection state | 5 minutes | Error | Restore | Check if the NetBackup Notification channel is added to the NetBackup master server |
Backup image violates the defined RPO | Checks if the backup image violates the defined RPO | 30 minutes | Warning | No operation | |
NetBackup master server disconnected | Checks if NetBackup master server is disconnected or not reachable | 5 minutes | Error | Restore | Check if IMS is added as an additional server to the NetBackup master server |
Assets do not have copy policy | Checks if the assets do not have a copy policy | 3 hours | Warning | No operation | Set up copy policy and then refresh the NetBackup master server |
Target replication is not configured | Checks if the target replication is not configured | 3 hours | Warning | No operation | Configure target replication and then refresh the NetBackup master server |
Disabled NetBackup Policy | Checks if NetBackup policy associated with the virtual machine is disabled | 3 hours | Warning | No operation | Fix the disabled policy |
Replication block tracking disk not found | Checks for the replication block tracking (RBT) disk. If the replication block tracking disk is not found, then the virtual machine does not get configured for remote recovery and the replication stops | 30 minutes | Error | Migrate | Ensure that the RBT disk is attached to the virtual machine. After the risk gets resolved, reboot the VM and then perform the resync operation to avoid disk corruption during migrate or migrate back. If you are not able to locate the RBT disk, then perform the following steps in the order listed: |
Members are manually deleted from network groups | Network group goes into faulted state when a member is manually removed. The risk is circulated to resiliency group | Immediate | Warning | Migrate, Rehearse | Edit the network group by adding the missing member and then edit the resiliency group details |
Members deleted from network groups | Network group goes into faulted state when a discovered member gets deleted from IMS. The risk is circulated to resiliency group | 5 minutes | Warning | Migrate, Rehearse | Edit the network group by adding the missing member and then edit the resiliency group details |
Virtual machine configuration not backed up | Unable to take a backup of virtual machine configuration file. | Immediate | Error | | Check the state of the IMS and its corresponding assets such as the hypervisors and vCenter servers. Perform edit resiliency group operation. |
Unable to backup latest Virtual machine configuration | Unable to take a backup of the latest configuration file of the virtual machine. | Immediate | Warning | | Check the state of the IMS and its corresponding assets such as the hypervisors and vCenter servers. Perform edit resiliency group operation. |
Datastore for disk has changed to X, this datastore is not part of resiliency group | If virtual disk is moved to a non-compliant datastore. Applicable for 3rd party replication technology | 5 to 15 minutes | Error | All operations except start and stop resiliency group | Edit the resiliency group or move the disk to a datastore which is part of the resiliency group. |
Datastore for configuration file has changed to X, this datastore is not part of resiliency group. Previous datastore was Y. | If the virtual machine configuration file is moved to a non-compliant datastore. Applicable for 3rd party replication technology | 5 to 15 minutes | Error | All operations except start and stop resiliency group | Edit the resiliency group or move the disk to a datastore which is part of the resiliency group. |
Disk path has changed | Displayed when virtual machine snapshot is taken. Risk is resolved automatically after updating the blob. | 5 to 15 minutes | Error | All operations | Risk is automatically resolved. |
New datastore added to the consistency group is not part of resiliency group | New datastore added to consistency group. Applicable for 3rd party replication technology | 6 hours | Error | | Edit the resiliency group |
Datastore removed from resiliency group | Datastore removed from consistency group. Applicable for 3rd party replication technology | 6 hours | Error | | Edit the resiliency group |
Veritas Replication VIB upgrade pending | Checks if the Veritas Replication VIB version on ESXi cluster has latest version installed. | 6 hours | Error | None | Upgrade the Veritas Replication VIB to the latest version. |
Veritas Replication VIB is in partial state. | Checks if the Veritas Replication VIB installation on ESXi cluster is in partial or unknown state. | 6 hours | Error | | Perform Resolve and Verify operation on the ESXi cluster to fix the installation issues. |
Insufficient privileges on vCenter server | Operations on the resiliency group may fail because of missing privileges on vCenter server data centers. | 6 hours | Warning | One or more operations on resiliency group may fail because of missing privileges on vCenter server data center. | Ensure that appropriate privileges are configured on vCenter server data center before invoking any operation. Refer to the documentation for the required privileges. |
Infrastructure Management Server data reporting disabled | Infrastructure Management Server cannot report data to Resiliency Manager due to version incompatibility | As soon as IMS connects to the Resiliency Manager after the Resiliency Manager upgrade | Error | All | Upgrade IMS to the latest version that is specified in the risk message |
DRS Datastore Is Added Or Removed | New datastore is added to the cluster or is removed from the cluster | 6 hours | Warning | None | Edit the resiliency group |
Datastore Cluster Deleted | Datastore cluster is deleted from the data center | 6 hours | Error | | Edit the resiliency group |
SNMP Trap Receiver Not Added Or Deleted | SNMP trap receiver is either not added or is deleted | 6 hours | Error | | Add the SNMP trap receiver |
vCloud Director discovery failed | Checks whether vCloud Director assets can be discovered using the vCloud Director configuration | 10 minutes | Error | None | Check the user privileges and then refresh the discovery for vCloud Director. If the password has changed, edit the cloud configuration to update the password. |
All the hosts on the applications are not reachable | All the hosts for the application are not reachable | 15 minutes | Error | None | Check the connectivity with the application hosts |
Application host is disconnected due to change in MAC address | Application Host is in Disconnected state | 15 minutes | Error | | Retry Add Host operation |
Assets does not have copy policy | Assets do not have a copy policy | When vrp_host is not associated with a copy policy | Warning | None | Check if any asset has no copy policy |
Backup image violates the defined RPO | Checks if the backup image violates the defined RPO | Immediate | Warning | | |
CPU Usage Critical | Available compute capacity on the recovery site may be inadequate for recovering this application. This risk affects the recoverability of the services running on your production data center. | 6 hours | Warning | None | Reduce the number of CPUs assigned to the virtual machines on the primary site to match the available CPU resources on failover target |
Incorrect .Net version is installed | The expected .NET version is not installed or it is not compatible with the PowerShell version | 2 hours | Error | | Ensure that the .NET version is installed with its compatible PowerShell version. Refer to the HSCL for compatible versions of .NET and PowerShell. |
Editing the resiliency group is required | Resiliency group needs an upgrade or an Edit operation. | Immediate | Warning | None | Edit the resiliency group using the Edit Configuration intent. Ensure that the resiliency group is online on the source datacenter before performing the edit operation |
Evacuation plan for data center has been invalidated. | Evacuation plan for data center has been invalidated due to adding, deleting, or updating a resiliency group or a VBS | Immediate | Error | None | Regenerate the evacuation plan. |
Host reboot is pending after upgrade | The OS is not rebooted after upgrade operation | Immediate | Warning | None | Reboot the virtual machine after the upgrade operation |
Mount point is deleted | Checks if the mount point on which the assets of the resiliency group are configured is deleted or renamed | 6 hours | Error | | Remount using the same mount point; otherwise, edit the resiliency group |
PowerShell is not initialized | PowerShell is not initialized | 2 hours | Error | | Check PowerShell Initialization on host |
PowerShell is not installed | PowerShell is not installed | 2 hours | Error | | Install PowerShell (version > 2.0) on host |
Powershell Version is incorrect | Expected PowerShell version not found | 2 hours | Error | | Install PowerShell version 2.0 or above |
Registry Parameter LSI_SAS is not set | Registry Parameter LSI_SAS is not set | 2 hours | Error | | Change the value for registry parameter LSI_SAS->Start to 0 and refresh host discovery |
Replication Gateway is not reachable | The Replication Gateway is down or not reachable from the IMS | 15 minutes | Error | None | Make sure the replication gateway appliance is running and is reachable from the IMS |
Replication state synchronizing | Data synchronization is in progress. | 5 minutes | Warning | None | Wait for synchronization to complete (Replication state should be Active (Connected \| Consistent)) |
Resync operation is pending on a resiliency group | Resync operation is pending on current resiliency group | Immediate | Error | On Secondary site: migrate operation | Execute Resync operation on current resiliency group |
Resiliency group configuration drift | Disk configuration for asset(s) in the resiliency group is changed. This is a configuration drift. | 2 minutes | Error | | Refresh the respective hosts, vCenter servers or Hyper-V servers and the cloud discovery. After refresh, probe the risk. If the risk still exists, remove the virtual machine from the resiliency group and re-add using the Edit operation |
Resiliency group configuration error | The disk size of the virtual machine in the resiliency group has changed. This is a configuration error | 2 hours | Error | | |
Resiliency group outage in datacenter | Outage has been declared for the resiliency group in the datacenter | Immediate | Error | None | Perform remediation steps to clear outage in the specified datacenter. Run a Resync or Clear outage operation (as applicable) to indicate that the outage has been cleared |
Data sync failed between Resiliency Manager and database. | Data sync failed between Resiliency Manager and database. | As soon as the vrp_rm vertex gets updated with property db_status as value "Data sync failed" | Error | None | Perform Resync operation for Resiliency Manager |
SAN Policy Offline Shared | SAN policy on the Windows host is Offline Shared | 2 hours | Warning | None | Change the SAN policy on the Windows host to Online Shared and refresh the host discovery information |
Stale configuration :: Object Deleted | Asset is unavailable | As soon as discovery reports delete of addressable objects. | Error | | Reconfigure the asset |
Stale configuration :: Object Unreachable | Asset is unreachable | As soon as discovery reports DISCONNECTED or NOT REACHABLE fault for addressable objects. | Error | | Check the connectivity of the asset. |
The migrated virtual machine is not added to the target IMS. | The migrated virtual machine is not added to the target IMS. | 45 minutes | Error | | Refer to the documentation to know the possible reasons for failure of add host operation |
Unable to get VMX | Unable to back up virtual machine configuration file | Immediate | Error | | Check the state of the IMS and its corresponding assets such as the hypervisors and vCenter servers. Perform edit resiliency group operation. |
Unable to update virtual machine configurations file | Unable to back up latest virtual machine configuration | Immediate | Error | None | Check the state of the IMS and its corresponding assets such as the hypervisors and vCenter servers. Perform edit resiliency group operation. |
vCenter server is removed from IMS | vCenter server is removed from IMS | Immediate | Error | | Add the vCenter server to the IMS. |
VCS Servicegroup Faulted | VCS Servicegroup is in Faulted state | 1 hour | Error | None | Resolve the fault on VCS Servicegroup |
Insufficient quota on target vCloud Director | Sufficient quota (CPUs/Memory/Storage) is not available on target vCloud Director. | 5 minutes | Error | None | Ensure that sufficient quota is available on the target vCloud Director |
Virtual machine is deleted | One or more virtual machines are deleted or unregistered. The virtual machines belong to a resiliency group that is configured for remote recovery. This affects the recoverability of the resiliency group. | 6 hours | Error | On Secondary site: migrate operation | Edit the resiliency group to remove the virtual machines that are deleted or unregistered. |
Virtual machine is not protected | Virtual machine is not configured for remote recovery | Immediate | Error | None | If the virtual machine is in production data center then configure the virtual machine for remote recovery. If the virtual machine is in vCloud data center then ensure that disk.EnableUUID property is set to TRUE on the VRP_VAPP_TEMPLATE virtual machine as well as on the migrated virtual machine. After the risk is resolved, perform the Resync operation to avoid disk corruption during migrate or migrate back operation. |
VMware discovery failed | VMware discovery has failed | 6 hours | Error | None | In case of a password change, resolve the password issue and refresh the vCenter server configuration |
IO Filter is not replicating the IOs from the virtual machine | IO Filter has encountered a fatal error | When the IMS receives a NOOP SNMP event. | Error | | If the IO filter has encountered errors, either invoke the edit resiliency group workflow to remove and re-add the asset from the resiliency group, or delete the resiliency group and create it again |
Cloud discovery failed | Cloud discovery has failed. | After 5 minutes | Error | | Edit the cloud configuration to resolve the issue. If the risk persists, contact Veritas Support. |
Cloud authentication failed | Cloud credentials are incorrect | After 5 minutes | Error | | Edit the cloud configuration and provide correct credentials to resolve the issue. In case of AWS, check that an IAM role with proper privileges is attached to the IMS. |
Cloud connection timeout | Connection timed out fetching information about cloud resources. | After 5 minutes | Error | | Resolve network connectivity between IMS and cloud data center and then refresh the cloud configuration. |
NTP Time Sync Failed | NTP time skew. Time skew must be less than 3 seconds. | 5 minutes | Warning | None | Synchronize with NTP server. |
NTP Time Unsynchronized | Not able to synchronize with the NTP server. | 5 minutes | Warning | None | Synchronize with NTP server. |
NTP Time Indeterminate | NTP status indeterminate | 5 minutes | Warning | None | Synchronize with NTP server. |
Resiliency Group Configuration Drift for Network changed of some of the assets in the Resiliency Group | This risk is raised if the network of some of the assets in the Resiliency Group is changed after the Resiliency Group is created. The change can be in the VLAN, vSwitch or cloud network settings. | | Error | | The risk is resolved when the deleted network gets discovered in Veritas Resiliency Platform. Alternatively, the risk is resolved after the Resiliency Group is edited successfully. |
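
Several risks in the table depend on simple numeric thresholds: disk utilization of at least 80%, NTP time skew that must stay under 3 seconds, and replication lag compared against the RPO. The following Python sketch illustrates only those comparisons. It is not how Resiliency Platform implements risk detection. All function and constant names are hypothetical, and the input values are assumed to come from your own monitoring tools.

```python
# Illustrative threshold checks based on values quoted in the table above.
# All names here are hypothetical; this is not Resiliency Platform code.

DISK_UTILIZATION_THRESHOLD_PERCENT = 80  # "at least 80% of the disk capacity"
NTP_MAX_SKEW_SECONDS = 3.0               # "time skew must be less than 3 seconds"


def disk_utilization_critical(used_bytes: int, total_bytes: int) -> bool:
    """Return True when at least 80% of the disk capacity is in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return used_bytes * 100 >= total_bytes * DISK_UTILIZATION_THRESHOLD_PERCENT


def ntp_skew_violation(skew_seconds: float) -> bool:
    """Return True when the absolute skew is not less than 3 seconds."""
    return abs(skew_seconds) >= NTP_MAX_SKEW_SECONDS


def replication_lag_exceeds_rpo(lag_seconds: float, rpo_seconds: float) -> bool:
    """Return True when the replication lag is greater than the RPO."""
    return lag_seconds > rpo_seconds


if __name__ == "__main__":
    # Sample values, assumed for illustration only.
    print("Disk critical:", disk_utilization_critical(850, 1000))
    print("NTP skew violation:", ntp_skew_violation(-1.2))
    print("Lag exceeds RPO:", replication_lag_exceeds_rpo(600, 900))
```

The checks are kept as separate pure functions so that each one can be compared directly against the corresponding row of the table.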