Important Update: Cohesity Products Documentation

All Cohesity product documentation is now managed via the Cohesity Docs Portal: https://docs.cohesity.com/HomePage/Content/home.htm. Some documentation available here may not reflect the latest information or may no longer be accessible.

Veritas NetBackup™ OpsCenter Administrator's Guide

Last Published: 2021-06-07

Product(s): NetBackup (9.1)

OpsCenter Alert conditions

OpsCenter comes with a set of predefined alert conditions. You can create alert policies based on these alert conditions to detect when something goes wrong in your NetBackup environment and troubleshoot NetBackup. The alerts help you to anticipate and handle problems before they occur. You can receive these alerts by logging on to OpsCenter, and also by email or SNMP traps. You can specify email and SNMP recipients while creating an alert policy.

Alert conditions can be divided into the following categories:

Event-based alert conditions	For these alert conditions, OpsCenter retrieves data from NetBackup based on notifications from NBSL.
Periodic alert conditions	For these alert conditions, OpsCenter retrieves data from NetBackup based on a wait time (of up to 15 minutes).

Table: Alert conditions in OpsCenter lists the alert conditions, alert category, and descriptions.

Table: Alert conditions in OpsCenter

Alert type	Alert condition	Alert category	Description
Job	High job failure rate	Event-based	An alert is generated when the job failure rate becomes more than the specified rate.
	Hung job	Periodic	An alert is generated when a job hangs (runs for more than the specified time) for a selected policy or client for a specified period. The Hung Job condition is checked every 15 minutes, so depending on when a job starts within a check cycle, an alert may not occur. By default, OpsCenter measures a job's run time from its start time, which includes any time the job spends in a queued state. A job is not always active as soon as it starts: if resources are unavailable, it may first be queued before it becomes active. You can configure OpsCenter to ignore the queued time for the Hung Job alert, in which case OpsCenter measures run time from the time the job becomes active. Note that the active start time of the first attempt is considered. For example, suppose a policy is created with a job threshold of 25 minutes. A job starts 10 minutes after the first check and ends 13 minutes after the third check. The job runs for a total of 33 (5 + 15 + 13) minutes, but no alert is raised. In this case, the policy is checked four times: the job had not yet started at the first check, had run for less than the threshold at the second check (job duration = 5 minutes) and the third check (job duration = 20 minutes), and completed (job duration = 33 minutes) before the fourth check. If a job instead starts 4 minutes after the first check, an alert is raised at the third check, because the job has then run for 26 minutes (11 + 15 minutes).
	Job finalized	Event-based	An alert is generated when a job of the specified type, for the specified policy or client, ends in the specified status.
	Incomplete Job	Event-based	An alert is generated when a job of the specified type, for the specified policy or client, moves to an Incomplete state.
Media	Frozen media	Event-based	An alert is generated when any of the selected media is frozen.
	Suspended media	Event-based	An alert is generated when any of the selected media is suspended.
	Exceeded max media mounts	Event-based	An alert is generated when a media exceeds the threshold number of mounts.
	Media required for restore	Event-based	An alert is generated when a restore operation requires media. The restore operation may require a specific media that contains the image to be restored.
	Low available media	Periodic	An alert is generated when the number of available media becomes less than the predefined threshold value. Note: When you select All Primary Servers from the View drop-down list, the low available media condition raises a separate alert for each primary server listed under All Primary Servers. For example, if there are 5 primary servers under the All Primary Servers view, OpsCenter raises 5 alerts, one for each primary server.
	High suspended media	Periodic	An alert is generated when the percentage of suspended media exceeds the predefined threshold value.
	High frozen media	Periodic	An alert is generated when the percentage of frozen media exceeds the predefined threshold value.
	Zero Cleaning Left	Event-based	An alert is generated when a cleaning tape has zero cleanings left.
Catalog	Catalog Space low	Periodic	An alert is generated when space available for catalogs becomes less than the threshold value or size. For Catalog Space low condition, you can specify the threshold value for a particular policy in percentage, bytes, kilobytes (KB), megabytes (MB), gigabytes (GB), terabytes (TB) or petabytes (PB) and generate alerts. The generated alert can also show available catalog space using these units.
	Catalog not Backed up	Periodic	An alert is generated when catalog backup does not take place for a predefined time period. Not receiving this alert does not necessarily mean that the catalog backup was successful.
	Catalog Backup Disabled	Periodic	An alert is generated when all the catalog backup policies are disabled. If the policy has been defined for a server group, an alert is generated for every primary server within the group that satisfies this criterion. The alert is not generated if no catalog backup policy exists for a primary server.
Device	Mount Request	Event-based	An alert is generated on a media mount request.
	No Cleaning Tape	Periodic	An alert is generated when no cleaning tapes are left.
	Drive is Down	Event-based	An alert is generated when a drive in a specified robot or media server in the selected server context goes down.
	High Down Drives	Periodic	An alert is generated when the percentage of down drives exceeds the predefined threshold value.
	OpenStorage	Event-based	An alert is generated when specific events occur in the NetApp devices. See About the Open Storage alert condition. See Adding an alert policy.
Disk	Disk Pool Full	Event-based	An alert is generated when a disk pool reaches the high water mark. An alert policy based on the Disk Pool Full condition generates an alert only when the used capacity of the disk pool reaches the high water mark.
	Disk Volume Down	Event-based	An alert is generated when a selected disk volume is down.
	Low Disk Volume Capacity	Periodic	An alert is generated when a disk volume capacity is running below the threshold limit.
	Primary Server Unreachable	Event-based	An alert is generated when OpsCenter loses contact with the primary server. This alert condition means that the connection between OpsCenter and the managed NetBackup primary server is lost. It does not necessarily mean that NetBackup backups are not working.
	Lost Contact with Media Server	Event-based	An alert is generated when OpsCenter loses contact with the media server.
	Appliance Hardware Failure	Event-based	An alert is generated in case of OpsCenter Appliance hardware failure.
Others	Service Stopped	Event-based	An alert is generated when the selected appliance hardware fails. You can set this alert condition to monitor your NetBackup or deduplication appliance hardware.
	Job Policy Change	Event-based	An alert is generated when a policy attribute for a job policy is changed. Multiple alerts are generated if multiple attributes are changed for a job policy. See Additional information on job policy change condition. If you select a particular job policy, only the selected job policy is monitored for change. If you do not select any job policy, all the job policies are monitored for changes.
	OpsCenter Tuning	Event-based	An alert is generated when the currently allocated memory parameters are less than the recommended ones.
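
The Hung Job timing example in the table can be checked with a short simulation. The following is a minimal sketch, not OpsCenter code: the function name `first_hung_job_alert` and its parameters are hypothetical, time is measured in minutes from the first check, and it assumes that a job ending exactly at a check time is no longer seen as running by that check.

```python
from typing import Optional

CHECK_INTERVAL_MIN = 15  # the Hung Job condition is checked every 15 minutes


def first_hung_job_alert(start_min: int, end_min: int, threshold_min: int,
                         check_interval_min: int = CHECK_INTERVAL_MIN) -> Optional[int]:
    """Return the check time (minutes after the first check) at which a
    Hung Job alert would be raised, or None if no check catches the job.

    Hypothetical helper for illustration only.
    start_min: job start time (or active start time of the first attempt,
               if queued time is ignored).
    end_min:   time at which the job ends.
    """
    check = 0  # the first check happens at time 0
    while check < end_min:  # assumption: a finished job is never flagged
        if check >= start_min:
            runtime = check - start_min
            # "runs for more than the specified time" -> strict comparison
            if runtime > threshold_min:
                return check
        check += check_interval_min
    return None


# Job starts 10 minutes after the first check and ends 13 minutes after
# the third check (time 30 + 13 = 43): it runs 33 minutes, yet no check sees
# it above the 25-minute threshold.
print(first_hung_job_alert(start_min=10, end_min=43, threshold_min=25))

# Job starts 4 minutes after the first check; the end time of 60 is an
# assumed value, since the example in the table does not give one.
print(first_hung_job_alert(start_min=4, end_min=60, threshold_min=25))
```

The first call returns `None` and the second returns `30`, the time of the third check. This matches the two cases described for the Hung job condition and shows why a job can exceed the threshold without raising an alert.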