Recovering SRDF Sessions Using SYMCLI

 


Contents

Recovering SRDF sessions

Overview

About this procedure

Task 1:   Set up scheduled task and PATH environment variable

Task 2:   Create an options file including your command options

Task 3:   Manually start symrecover for the SRDF/A composite group

Task 4:   Stop symrecover

 


 

Recovering SRDF sessions

Overview

Starting with Solutions Enabler version 6.4, single SRDF sessions can be monitored and restarted using the symrecover command.

symrecover can be run manually from the command line, but it is more commonly configured to run continuously in the background to monitor the state of synchronous (SRDF/S) or asynchronous (SRDF/A) sessions.

If the utility detects a session failure, it attempts to recover and restart the session automatically, based on the settings in the symrecover options file. The options file contains a list of parameters, which perform compound commands. Actions are performed in the exact order in which they are listed in the options file.

The symrecover command requires a device group or composite group to manage devices. It can be run from either the local (R1) or the remote (R2) site, provided that all devices in the monitored group are fully visible from the host running symrecover.

About this procedure

·    This procedure was created using Solutions Enabler version 6.4.

·    This procedure explains how to use the symrecover command to monitor and restart an SRDF/A session using a composite group. You should be familiar with the concepts of SRDF and have some experience using Solutions Enabler SYMCLI commands and scripts before attempting this procedure.

·    This procedure is based on content in the following guides:

·    EMC Solutions Enabler Symmetrix SRDF Family CLI Product Guide

·    EMC Solutions Enabler Symmetrix CLI Command Reference

You can download the guides from EMC Online Support (registration required): https://support.EMC.com.

Task 1:    Set up scheduled task and PATH environment variable

To run the utility continuously in the background, set up a Windows Scheduled Task, a UNIX cron job, or a UNIX RC.2 startup file, depending on your operating system.

The PATH environment variable must include the SYMCLI binary directory. Append that directory to PATH as appropriate for your operating system.

UNIX

For the UNIX C shell, ensure that the SYMCLI binary directory is appended to the path variable:

     set path = ($path /usr/symcli/bin)

For the UNIX Korn or Bourne shell, ensure that the SYMCLI binary directory is appended to the PATH variable:

     PATH=$PATH:/usr/symcli/bin

     export PATH
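If the PATH lines above go into a login profile that can be read more than once, a guarded append keeps PATH from accumulating duplicate entries. A minimal sketch for the Korn or Bourne shell, assuming the default /usr/symcli/bin location; the SYMCLI_BIN variable and both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers (not part of Solutions Enabler).
# SYMCLI_BIN is an assumed variable; override it if SYMCLI is installed elsewhere.
SYMCLI_BIN=${SYMCLI_BIN:-/usr/symcli/bin}

# Append SYMCLI_BIN to PATH only if it is not already one of its entries.
add_symcli_to_path() {
    case ":$PATH:" in
        *":$SYMCLI_BIN:"*) ;;                  # already present: do nothing
        *) PATH=${PATH:+$PATH:}$SYMCLI_BIN ;;  # append (also handles an empty PATH)
    esac
    export PATH
}

# Report whether symrecover can be found through the current PATH.
check_symrecover() {
    if command -v symrecover >/dev/null 2>&1; then
        echo "symrecover found: $(command -v symrecover)"
    else
        echo "symrecover not found on PATH" >&2
        return 1
    fi
}
```

After sourcing these functions, run add_symcli_to_path followed by check_symrecover to confirm that symrecover resolves before you schedule it.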

Windows

For Windows, ensure that the following SYMCLI directory is appended to the PATH environment variable:

     C:\Program Files\EMC\SYMCLI\bin

OpenVMS

For OpenVMS, ensure that the SYMCLI$BIN logical name is defined for all users (invoke emc_cli.com from the system login.com). To check the definition, enter:

     SHOW LOGICAL SYMCLI$BIN

Task 2:    Create an options file including your command options

On your host, create a plain-text options file using vi or another text editor. Include in the options file all of the settings needed to monitor and recover a failed SRDF session. The options file also accepts parameters for error logging, email event notification, and monitor, recovery, and restart behavior.

In the following example, a symrecover options file named recover_srdf.txt is created. The option parameters in the file perform the following operations:

·         Creates a backup copy (gold copy) on the R2 side using BCVs,

·         Sets the logging level to report errors, warnings, and informational messages,

·         Sets the monitor cycle time,

·         Specifies the concurrent RDF definition,

·         Specifies the restart parameters, and

·         Sends email alerts to the specified address through the specified mail server.

 

 

     # Option file for symrecover
     #######################################################
     goldcopy_type_r2 = bcv
     goldcopy_bcv_r2_mirror_state_startup = establish
     goldcopy_bcv_r2_mirror_state_post_restart = split
     goldcopy_max_wait_bcv = 3600
     log_level = 3
     monitor_cycle_time = 600
     monitor_only = 0
     rdfg = name:London
     restart_window = 3600
     restart_max_attempts = 10
     restart_max_wait_state_change = 3600
     restart_max_wait_adcopy_sync = 3600
     restart_delay = 10
     restart_attempt_pause = 60
     restart_group_on_startup = 0
     email_addr_target = operations_alert@somewhere.com
     email_server = mailhost.somewhere.com
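To catch obvious mistakes before starting symrecover, the options file can be checked against a few rules stated in Table 1: the ranges for monitor_cycle_time (30 to 3600), restart_window (1800 to 86400), and restart_attempt_pause (30 to 3600); the mutual exclusion of monitor_only, run_once, and run_until_first_failure; and the requirement that email_addr_target and email_server both be set when any email_* option is used. The POSIX shell functions below are a hypothetical sketch, not part of Solutions Enabler. They assume one key = value pair per line with # comment lines, as in the example above, take the last occurrence of a key, and ignore any options given on the command line.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a symrecover options file (not an EMC tool).

# Print the value of KEY ($2) from options file $1; return 1 if KEY is absent.
opt_get() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }                       # skip comment lines
        index($0, "=") == 0 { next }              # skip lines without "="
        {
            eq = index($0, "=")
            k = substr($0, 1, eq - 1); v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k); gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { val = v; found = 1 }  # last occurrence wins
        }
        END { if (found) print val; else exit 1 }' "$1"
}

# If KEY ($2) is set in file $1, require an integer within [$3, $4].
check_range() {
    val=$(opt_get "$1" "$2") || return 0          # unset: the default applies
    case $val in
        ''|*[!0-9]*) echo "$2: not a non-negative integer: $val" >&2; return 1 ;;
    esac
    if [ "$val" -lt "$3" ] || [ "$val" -gt "$4" ]; then
        echo "$2 = $val is outside $3..$4" >&2
        return 1
    fi
}

# At most one of the mutually exclusive run-control options may be set to 1.
check_run_mode() {
    n=0
    for key in monitor_only run_once run_until_first_failure; do
        [ "$(opt_get "$1" "$key")" = 1 ] && n=$((n + 1))
    done
    if [ "$n" -gt 1 ]; then
        echo "monitor_only, run_once, run_until_first_failure are mutually exclusive" >&2
        return 1
    fi
}

# Any email_* option requires both email_addr_target and email_server.
check_email() {
    grep -q '^[[:space:]]*email_' "$1" || return 0    # no email options at all
    if ! opt_get "$1" email_addr_target >/dev/null ||
       ! opt_get "$1" email_server >/dev/null; then
        echo "email_* set without both email_addr_target and email_server" >&2
        return 1
    fi
}

# Run every check against options file $1, reporting all problems found.
validate_options() {
    [ -r "$1" ] || { echo "cannot read options file: $1" >&2; return 1; }
    rc=0
    check_range "$1" monitor_cycle_time    30   3600  || rc=1
    check_range "$1" restart_window        1800 86400 || rc=1
    check_range "$1" restart_attempt_pause 30   3600  || rc=1
    check_run_mode "$1" || rc=1
    check_email "$1"    || rc=1
    return $rc
}
```

For example, validate_options recover_srdf.txt returns success for the example file above and prints one message per problem otherwise. It checks only the rules listed here; symrecover itself remains the authority on what it accepts.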

 

In addition to the settings in the options file, you must specify the following on the command line:

·         the mode (-mode),

·         the group, either a device group (-g) or a composite group (-cg), and

·         the options file (-options).

 

You can also specify options directly on the command line. Options specified on the command line override those in the options file.

Table 1 lists the symrecover options file parameters, their syntax, and their descriptions.

 

Table 1 Option file parameters

 

Setting

Description

email_addr_target= <e_addr1, e_addr2, ...>

Email notification address on errors. If any of the email_* options are specified, then this option must also be specified to activate email alerts. Multiple comma-delimited addresses may be specified. There is no default value.

email_server= <e_srvr_addr>

Specifies the target email server host. If any of the email_* options are specified, then this option must also be specified to activate email alerts. There is no default value.

email_subject= <err_subject_string>

 

Specifies the email notification subject on errors. The default value is: SymRecover Alert: Host [HostName] Group [GrpName]

email_log_level= <SeverityLevel>

The logging level that determines which messages trigger an email alert. Possible values are:

0 = Off.

1 = Only Errors will be reported.

2 = Errors and Warnings will be reported.

3 = Errors, Warnings, and Informational messages will be reported.

4 = All messages will be reported including all SYMCLI commands and responses.

Note: An email is sent for each message that meets the specified logging level. It is strongly recommended that this be set to 1 or 2.

If the required email options (email_server and email_addr_target) are not specified, then the default value is 0.  If they are specified then the default value is 1.

goldcopy_type_r2= <CopyType>

Specifies the type of backup (gold copy) to be created on the R2 side. Possible values are:

none = no gold copy is desired. All other goldcopy_* options are ignored.

bcv   = a BCV gold copy on the R2 side is desired

The default is bcv and this value is case insensitive.

Note: The R2 BCVs must be paired with the R2 devices before you run symrecover.

goldcopy_bcv_r2_mir_state_startup= <CopyState>

Specifies the desired state of the R2 BCV gold copy when symrecover starts. Possible values are:

establish = the devices should be established

split         = the devices should be split

none        = the devices should be unchanged

The default is none and this value is case insensitive.

Note: If the gold copy type is BCV, keeping the BCVs in the established state has been shown to exacerbate SRDF/A session drops.

goldcopy_bcv_r2_mir_state_post_restart= <CopyState>

Specifies the state in which the R2 gold copy should be left following a successful SRDF/A session restart or BCV resync. Possible values are:

establish = the devices should be left established

split         = the devices should be split

The default is split and this value is case insensitive.

Note: If the gold copy type is BCV, keeping the BCVs in the established state has been shown to exacerbate SRDF/A session drops.

goldcopy_max_wait_bcv= <MaxWaitTime>

Specifies the maximum length of time, in seconds, that the program waits during a restart for a group to finish synchronizing the standard devices with the BCVs.

Possible values are 0 to maxint. The default is 0, which is to wait forever.

goldcopy_bcv_r2_mirror_resync_interval= <resynctime>

Defines the interval, in minutes, at which the gold copy BCV mirror is automatically resynchronized and then split. This action takes place only during non-error periods.

Valid values are 0 and 15 to maxint. Zero (0) indicates that the mirrors are never automatically synchronized outside of error-producing events. The default is 0.

Note: If the gold copy type is BCV, then the act of frequently synchronizing the R2 BCVs has been shown to exacerbate SRDF/A session drops.

log_level= <level>

The desired logging level. Possible values are:

0 = Off.

1 = Only Errors will be reported.

2 = Errors and Warnings will be reported.

3 = Errors, Warnings, and Informational messages will be reported.

4 = All messages will be reported.

The default is 3.

monitor_cycle_time= <cycletime>

Defines the number of seconds to pause between monitor status scans. The minimum value is 30 seconds, the maximum is 3600 seconds. The default value is 300 seconds.

monitor_only= [0 | 1]

Monitors the state of the specified group only; no recovery actions take place. This option is not enabled by default.

Note: monitor_only, run_once, and run_until_first_failure are mutually exclusive options.

run_once= [0 | 1]

Checks the status of the group once, performs any needed recovery actions, and then exits. This option is not enabled by default. This option ignores the setting of restart_max_attempts.

Note: monitor_only, run_once, and run_until_first_failure are mutually exclusive options.

run_until_first_failure= [0 | 1]

Monitors the group until the first failure occurs, then exits without performing any recovery action. This option is not enabled by default. This option ignores the setting of restart_max_attempts.

Note: monitor_only, run_once, and run_until_first_failure are mutually exclusive options.

rdfg= <rdfgvalue>

Specifies the concurrent RDF definition for the group. This value is taken directly as specified and no data validation is performed on it. Monitoring of concurrent RDF defined groups is only supported when symrecover is executed from the R1 side of the session.

This option is not set by default and non-concurrent RDF groups are assumed.

Note: If the group is a composite group, and consistency is enabled, this must be of the "name:" format and this value is case sensitive.

restart_adcopy_resynch_threshold= <tracks>

Specifies the number of outstanding tracks that, during recovery, triggers the switch over to SRDF/A or SRDF/S. The default value is 30000.

restart_attempt_pause= <time>

Inserts the specified wait time before an attempt is made to restart a failed session, to allow conditions to settle. After the restart_attempt_pause is complete, symrecover redrives the overall monitor loop. If there is still a problem, the restart failure count is incremented and a restart is attempted.

Valid values are 30 to 3600 seconds. The default is 60 seconds.

restart_delay= <time>

Inserts a specified wait time after an attempt is made to restart a failed session and the attempt itself fails.

Valid values are 0 (no delay, immediately restart) to maxint. The default is 30 seconds.

restart_group_on_startup= [0 | 1]

On symrecover startup, if the group being monitored is not initially in a CONSISTENT state (for SRDF/A) or the SYNCHRONIZED state (for SRDF/S), symrecover will consider that an error condition and exit.  If this option is specified, symrecover will attempt to recover the group on startup. This option is not enabled by default.

restart_max_attempts= <attempts>

Specifies the maximum number of restart attempts that are performed within the restart_window interval. After this limit is reached the program will terminate.

The range is 0 to maxint. A value of 0 means unlimited attempts. The default is 5 attempts.

restart_max_wait_adcopy_sync= <time>

Specifies the length of time (in seconds) that, during a restart, the program waits for a group to reach the restart_adcopy_resync_threshold number of tracks pending.

Valid values are 0 to maxint. A value of 0 means wait indefinitely. The default is 0.

restart_max_wait_state_change= <statetime>

Specifies the length of time (in seconds) that during a restart, the program will wait for a group to change to a desired state once requested.

Valid values are 0 to maxint. A value of 0 means wait indefinitely. The default is 0.

restart_max_wait_warn_interval= <warntime>

Specifies the interval (in seconds) at which a progress warning message is displayed while the program waits for a state change during a restart.

Valid values are 0 and 30 to maxint. The value of 0 means to wait forever. The default is 600 seconds.

restart_rdfa_min_cycle_warn_interval= <cyclewarntime>

Specifies the interval (in seconds) at which a warning message is displayed while the RDFA minimum cycle time exceeds the restart_rdfa_min_cycle_warn_value parameter.

Valid values are 30 to maxint. The default is 600.

restart_rdfa_min_cycle_warn_value= <warntime>

Specifies the threshold (in seconds) above which a warning message is triggered, indicating that the RDFA minimum cycle time has exceeded this value.

Valid values are 0 and 30 to maxint. The value of 0 means this feature is turned off. The default is 0.

restart_state_syncinprog_wait_time= <time>

Specifies the maximum length of time (in seconds) that the program sleeps, while a group is in the SyncInProg state, before rechecking the group status.

Valid values are 30 to maxint. The default is 120 seconds.

restart_state_transmit_warn_interval= <time>

Specifies the interval (in seconds) at which a warning message is generated while a group remains in the transmit idle state.

Possible values are 0 to maxint. The default is 300 seconds.

restart_state_transmit_wait_time= <transwaittime>

Specifies the maximum length of time (in seconds) that the program sleeps, while a group is in the transmit idle state, before rechecking the group status.

Valid values are 30 to maxint. The default is 120 seconds.

restart_sync_type= <synctype>

Specifies the type of synchronization to be used following the detection of a failed SRDF/A session.  Possible values are:

ADCOPY = adaptive copy disk

SYNC      = synchronous mode

NONE     = No intermediate track resynch stage will be attempted. A direct re-establish using the existing SRDF session mode will be attempted.

The default is ADCOPY.

restart_window= <time>

Specifies the length of time (in seconds), starting with the first failure, during which successive failures are counted together. Failures that occur within this span are considered grouped, and the window determines the maximum number of restarts (restart_max_attempts) permitted per window of time.

The minimum value is 1800 seconds, the maximum is 86400 seconds. The default is 3600 seconds.
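The interaction between restart_window and restart_max_attempts is easier to reason about with a concrete timeline. The function below is a simplified model under stated assumptions, not symrecover's code: a window opens at a failure, later failures inside the window are counted against restart_max_attempts, a failure after the window has elapsed opens a new window, and the run ends once a window would need more than restart_max_attempts restarts. The exact boundary behavior (for example, whether a failure exactly at the end of the window still belongs to it) is an assumption.

```shell
#!/bin/sh
# Simplified model (an assumption, not symrecover's implementation) of how
# restart_window and restart_max_attempts limit restarts.
# Usage: restart_model WINDOW MAX_ATTEMPTS T1 [T2 ...]
#   T1, T2, ... are failure times in seconds, in ascending order.
# Prints "gave up at T" (and returns 1) or "kept running".
restart_model() {
    window=$1 max=$2
    shift 2
    start= count=0
    for t in "$@"; do
        # The first failure, or a failure after the current window has
        # elapsed, opens a new window and resets the count.
        if [ -z "$start" ] || [ "$t" -ge $((start + window)) ]; then
            start=$t count=0
        fi
        count=$((count + 1))
        # A MAX_ATTEMPTS of 0 means unlimited restart attempts.
        if [ "$max" -gt 0 ] && [ "$count" -gt "$max" ]; then
            echo "gave up at $t"
            return 1
        fi
    done
    echo "kept running"
}
```

With the values from the example options file (restart_window = 3600, restart_max_attempts = 10), this model tolerates ten failures within the hour that starts at the first failure and ends the run on an eleventh; failures spread further apart open new windows and reset the count.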

Task 3:    Manually start symrecover for the SRDF/A composite group

Note:  If an SRDF/A group becomes synchronous (SRDF/S) while being monitored by symrecover, symrecover will attempt to reset the RDF link to SRDF/A mode.

                   

Type the symrecover start -cg GroupName -mode Mode -options FileName command to begin monitoring and recovery operations for the specified composite group.

To start monitoring and recovery for composite group RDFAmon:

symrecover start -cg RDFAmon -mode async -options recover_srdf.txt

 

Note:  The group, mode and options file must be specified.
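Task 1 suggests running symrecover continuously in the background from a cron job or RC.2 file. A minimal sketch of a wrapper such a job could call is shown below; it is a hypothetical helper, not part of Solutions Enabler. It runs the same symrecover start command as above under nohup, appends console output to a per-group file, records the process ID in a file, and skips the start if the recorded process is still alive. The SYMRECOVER, PID_DIR, and LOG_DIR variables and the file names are assumptions.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Solutions Enabler): start symrecover in
# the background for one composite group unless the instance recorded in a
# PID file is still alive. SYMRECOVER, PID_DIR, and LOG_DIR are assumptions.
SYMRECOVER=${SYMRECOVER:-symrecover}
PID_DIR=${PID_DIR:-/var/tmp}
LOG_DIR=${LOG_DIR:-/var/tmp}

start_symrecover() {
    cg=$1 mode=$2 opts=$3
    if [ -z "$cg" ] || [ -z "$mode" ] || [ -z "$opts" ]; then
        # The group, mode, and options file are all required.
        echo "usage: start_symrecover GroupName Mode FileName" >&2
        return 2
    fi
    if [ ! -r "$opts" ]; then
        echo "cannot read options file: $opts" >&2
        return 1
    fi
    pidfile=$PID_DIR/symrecover_$cg.pid
    # kill -0 sends no signal; it only tests whether the PID is alive.
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "symrecover already running for $cg (PID $(cat "$pidfile"))"
        return 0
    fi
    # nohup keeps symrecover running after the invoking shell exits.
    nohup "$SYMRECOVER" start -cg "$cg" -mode "$mode" -options "$opts" \
        >>"$LOG_DIR/symrecover_$cg.out" 2>&1 &
    echo $! >"$pidfile"
    echo "started symrecover for $cg (PID $!)"
}
```

For example, a script that sets PATH as in Task 1 (cron jobs run with a minimal environment), defines this function, and calls start_symrecover RDFAmon async /path/to/recover_srdf.txt could be scheduled with a portable crontab entry such as the following (the script path is hypothetical):

     0,15,30,45 * * * * /usr/local/bin/start_rdfamon.sh

The wrapper only confirms that the process was launched, so check the symrecover log to confirm that monitoring is active. Also note that symrecover deliberately terminates once restart_max_attempts is reached within restart_window; a periodic job like this would start it again, so decide whether that behavior is acceptable before scheduling it.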

Task 4:    Stop symrecover

Press Ctrl+C to stop symrecover manually.

To stop a symrecover task running in the background, use one of the following options per your operating system:

·         Windows – Cancel the task in Scheduled Tasks, or use End Task in Task Manager.

·         UNIX – Use the kill command to terminate the symrecover process.
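On UNIX, kill needs the process ID of the background symrecover. The sketch below assumes that whoever started symrecover recorded its PID in a file (a hypothetical convention; this procedure does not state that symrecover writes one). It sends kill's default SIGTERM, waits a bounded time for the process to exit, and deliberately does not escalate to kill -9, because stopping symrecover in the middle of a recovery leaves device state that must be reviewed, as the note that follows explains.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solutions Enabler): stop a background
# symrecover whose PID was recorded in a file.
# Usage: stop_symrecover PIDFILE [TIMEOUT_SECONDS]
stop_symrecover() {
    pidfile=$1
    timeout=${2:-30}
    if [ ! -f "$pidfile" ]; then
        echo "no PID file: $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid is not running; removing stale PID file"
        rm -f "$pidfile"
        return 0
    fi
    kill "$pid"                      # SIGTERM, the default signal sent by kill
    # Poll once per second until the process exits or the timeout runs out.
    while [ "$timeout" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        timeout=$((timeout - 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid still running after SIGTERM; investigate before forcing" >&2
        return 1
    fi
    rm -f "$pidfile"
    echo "stopped PID $pid"
}
```

For example, stop_symrecover /var/tmp/symrecover_RDFAmon.pid 60 (hypothetical path) waits up to 60 seconds. A PID file can go stale if the process exits and its PID is later reused, so confirm with ps that the PID still belongs to symrecover before relying on this in production.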

 

Note:  If symrecover is only monitoring (that is, no failure recovery or BCV resynchronization is in progress), no additional action is necessary. If symrecover is stopped in the middle of a BCV resynchronization or a recovery effort, review the state of the devices: evaluate the device group and BCVs to determine exactly where the recovery process stopped.

When symrecover exits, information is written to the log file and can be reviewed.

If there is an error condition, and email notification has been turned on, an email is sent out.