exsingleckl

Name

exsingleckl — Support server for checking local node singleton process group lock status

Synopsis

exsinglesv exsingleckl exsingleckr

DESCRIPTION

exsinglesv is a special XATMI server which provides lock services for singleton process groups. One copy of the server must be configured for each singleton group.

The lock server supports file system-based fcntl() locking. When the process acquires such a lock, it reports the status to shared memory, so that ndrxd(8) and cpmsrv(8) can perform the boot or failover sequences. At lock time, the secondary "heartbeat" lock file is updated: the process takes the maximum counter value seen in the file and increments it by a value larger than 1 (e.g. as set by the NDRX_SGLOCKINC ex_env(5) parameter). This ensures that if, due to some error, another node continues to write heartbeats, it will see that it has lost the lock.
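
The locking step described above can be illustrated with a minimal Python sketch. It is not the exsinglesv implementation: the function names, the in-memory counter list and the demo file name are hypothetical, and the real on-disk heartbeat format is not shown here. It only demonstrates a non-blocking fcntl() advisory write lock and the "highest counter seen plus an increment larger than 1" rule.

```python
import fcntl
import os
import tempfile


def try_write_lock(path):
    """Open the file (creating it if missing) and try a non-blocking
    exclusive fcntl() lock. Returns the open descriptor on success,
    or None if another process already holds the lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o660)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


def next_lock_counter(seen_counters, lock_inc):
    """Counter to write at lock time: highest counter seen in the
    heartbeat file plus an increment larger than 1 (hypothetical helper)."""
    if lock_inc <= 1:
        raise ValueError("lock_inc must be larger than 1")
    return max(seen_counters, default=0) + lock_inc


# Demo on a throwaway directory (the file name is only an illustration).
demo_dir = tempfile.mkdtemp()
demo_fd = try_write_lock(os.path.join(demo_dir, "GRPV_lock_1"))
demo_counter = next_lock_counter([3, 10, 7], lock_inc=5)
```

Note that fcntl() locks are held per process, so a second lock attempt from the same process would succeed; contention is only visible between different processes or nodes.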

The lock server periodically checks the lock health, by doing the following actions:

  • Checking the primary lock file and additionally performing regular increments of the counter stored in the secondary heartbeat (ping) lock file. Each cluster node stores two checksummed values of the current counter for its own node id. The currently active exsinglesv instance verifies that its counter is the highest of all counters found in the file. If this condition is false, the group is unlocked and the process exits with failure.
  • Verifying the lock state against the process state and the shared memory state.

If any check fails, the singleton process group in shared memory (if it was locked) and the lock files are unlocked, and the process exits with failure.
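
The counter part of the health check can be sketched as follows. This is an assumption-laden illustration, not the actual exsinglesv code: the heartbeat file is modelled as a plain dictionary of node id to counter, the checksummed duplicate values are left out, and a strict "greater than" comparison is assumed because the text only says "highest".

```python
def holds_highest_counter(counters, my_node_id):
    """Return True if this node's counter is strictly higher than the
    counter of every other node (hypothetical helper; strictness is an
    assumption)."""
    mine = counters[my_node_id]
    return all(
        mine > value
        for node_id, value in counters.items()
        if node_id != my_node_id
    )


# Node 1 is ahead of node 4, so node 1 keeps the lock.
still_owner = holds_highest_counter({1: 40, 4: 12}, my_node_id=1)

# Node 4 wrote a higher counter, so node 1 must unlock and exit.
lost_lock = not holds_highest_counter({1: 40, 4: 45}, my_node_id=1)
```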

The lock provider distinguishes two locking modes:

  1. If the process is started by normal boot (i.e. it is not respawned after a crash), the process locks the primary file and immediately reports the locking status to the shared memory.
  2. If the process is respawned after a crash, it locks the primary file but does not immediately report the status to the shared memory. Instead, it waits for a number of lock health checks (effectively a time delay), and only after that does the group become locked. This gives the other (previously active) Enduro/X node time to kill the processes of the group that lost the lock.
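
The two modes can be modelled with a small sketch. The class and method names are hypothetical; only the behaviour (immediate reporting at normal boot, delayed reporting after a respawn) comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class LockReporter:
    """Tracks when a held file lock may be reported to shared memory."""
    normal_boot: bool      # True: started by normal boot, False: respawned
    wait_checks: int       # health checks to wait for after a respawn
    checks_passed: int = 0

    def health_check_passed(self):
        """Record one successful lock health check."""
        self.checks_passed += 1

    def group_locked(self):
        """Normal boot reports at once; a respawn waits for the checks."""
        if self.normal_boot:
            return True
        return self.checks_passed >= self.wait_checks


boot = LockReporter(normal_boot=True, wait_checks=6)
respawn = LockReporter(normal_boot=False, wait_checks=6)
locked_before_wait = respawn.group_locked()
for _ in range(6):
    respawn.health_check_passed()
locked_after_wait = respawn.group_locked()
```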

The status for the process groups can be checked by running the following xadmin(8) command:

$ xadmin psg

exsinglesv has two user exits which may be set in the configuration file with the parameters exec_on_bootlocked and exec_on_locked. The first program is executed if the lock was acquired during normal startup. The second (exec_on_locked) is executed if a failover has happened (lock files that were locked by another node became unlocked) or if exsinglesv crashed, was restarted and acquired the lock. If a program specified by these parameters returns a non-zero status, exsinglesv unlocks any resources it locked and exits with failure.
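
As a rough illustration of the user exit handling, the sketch below picks the exit by lock mode and treats any non-zero status as failure. The helper names are hypothetical, and Python's subprocess.run() with shell=True stands in for the C system() call, so this is only an approximation of the behaviour described above.

```python
import subprocess


def select_user_exit(normal_boot, exec_on_bootlocked, exec_on_locked):
    """Pick the configured program for the way the lock was acquired."""
    return exec_on_bootlocked if normal_boot else exec_on_locked


def run_user_exit(command):
    """Run the user exit through the shell. Returns True when the exit
    is not configured or the program returns status 0."""
    if not command:
        return True
    return subprocess.run(command, shell=True).returncode == 0


# The commands below are placeholders for real user exit programs.
chosen = select_user_exit(False, "exit 0", "exit 3")
keep_lock = run_user_exit(chosen)  # False: unlock and exit with failure
```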

exsingleckl provides the cluster consistency check service used by tmsrv(8) when, in ndrxconfig.xml(5), the sg_nodes_verify attribute of the <procgroups>/<procgroup> tag is set to Y. For load balancing, the service can be started in multiple copies. Separate process instances are required for each singleton group.

The advertised service name is "@SGLOC-<node_id>-<grpno>". When the service receives a request, it checks whether the local node process group is locked, and if so, it tries to call the network services of the other nodes to verify their group statuses (none of them may be locked). If the network service is not available for a node, the check is done in read-only mode against the heartbeat file (lockfile_2). The current group must have the highest counter number in the whole cluster. If that is not met, the group is unlocked and the service reports the group as not locked.
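
The decision flow of the check can be sketched as a pure function. This is a simplified model under stated assumptions: remote answers are passed in as a dictionary (True for locked, False for not locked, None when the network service is unavailable), the heartbeat file is modelled as a dictionary of counters, and the function name is hypothetical. How the real service reacts when a remote node reports its group as locked is not detailed in the text; the sketch simply reports "not locked" in that case.

```python
def verify_group_locked(local_locked, remote_status, counters, my_node_id):
    """Return True if the local group may be considered locked."""
    if not local_locked:
        return False

    need_file_check = False
    for status in remote_status.values():
        if status is True:
            # Another node claims the lock: cluster is inconsistent.
            return False
        if status is None:
            # Service unavailable: fall back to the heartbeat file.
            need_file_check = True

    if need_file_check:
        mine = counters[my_node_id]
        return all(
            mine > value
            for node_id, value in counters.items()
            if node_id != my_node_id
        )
    return True


ok_remote = verify_group_locked(True, {4: False}, {1: 40, 4: 12}, 1)
conflict = verify_group_locked(True, {4: True}, {1: 40, 4: 12}, 1)
ok_file = verify_group_locked(True, {4: None}, {1: 40, 4: 12}, 1)
lost_file = verify_group_locked(True, {4: None}, {1: 40, 4: 45}, 1)
```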

exsingleckr provides remote check services for the locks. Requests are received from exsinglesv and exsingleckl servers via tpbridge(8). The remote check service reads the local shared memory singleton process group table and returns the group status to the caller. The service name "@SGREM-<node_id>-<grpno>" is advertised. Several instances of the server can be started. This service is optional; however, for performance reasons it is recommended to configure exsingleckr regardless of the sg_nodes_verify attribute value, because exsinglesv always attempts the network call first during its periodic checks.

The binaries exsinglesv, exsingleckl and exsingleckr use the same configuration section. For these XATMI servers, the <procgrp_lp> tag must be set to the name of the group for which they provide locking services. However, sections may be separated by CCTAGs.

For a more detailed description of singleton group setup, functions and limitations, please see the Enduro/X Administration Manual, High availability features section.

Binary is available for Enduro/X version 8.0.10.

CONFIGURATION

The process requires common configuration (ini-files) to be set up for the Enduro/X instance where exsinglesv is booted.

The process reads settings from the "[@exsinglesv[/<NDRX_CCTAG>]]" section.

lockfile_1=FILE_PATH
Path to the primary lock file on which the fcntl() advisory write lock is acquired. The parameter is mandatory. If the file is missing, it will be created. For distributed servers, the file must be located on a shared file system, such as GPFS, GFS2 or any other shared file system which supports fcntl().
lockfile_2=FILE_PATH
Heartbeat lock file, to which the lock counter update is written periodically (every chkinterval). The file is additionally used to check the lock status of remote nodes when the remote check service (@SGREM) is not available or the noremote parameter is set to 1. The file system must be writable by the process. If the file is missing, it will be created. The file is expected to reside in the same directory as lockfile_1.
exec_on_bootlocked=FILE_PATH
Optional user exit, which is executed if the lock was acquired during normal startup. The program file is executed by the system() call. If the program returns a non-zero status, exsinglesv unlocks the files and restarts.
exec_on_locked=FILE_PATH
Optional user exit, which is executed if the lock was acquired during failover or after a crash. The program file is executed by the system() call. If the program returns a non-zero status, exsinglesv unlocks the files and restarts.
chkinterval=NUMBER_OF_SECONDS
Interval in seconds between lock checks. The parameter is optional. The default value is the value of the NDRX_SGREFRESH environment variable divided by 3. The final parameter value must be greater than 0. Additionally, a ULOG warning is generated if the check interval multiplied by 3 is longer than the time specified in the NDRX_SGREFRESH setting. Note that the default NDRX_SGREFRESH value is 30 seconds, so when the parameter is not specified the default check interval is calculated as 10 seconds.
locked_wait=NUMBER_OF_CHECKS
The number of check cycles after which the file locked status is reported to the shared memory of the Enduro/X application domain. This parameter is only used if the process acquired the lockfile_1 lock while the binary was already running (e.g. another node failed over to this one) or if exsinglesv itself was restarted for some reason, such as a crash (failed exit, etc.). At normal application domain boot, this wait does not apply and the file lock is reported to the shared memory immediately. The parameter is optional, and the default value is calculated as the NDRX_SGREFRESH value divided by chkinterval and multiplied by 2. With all defaults, this value is 6, which means that after a failover the singleton group depending on this lock provider is booted only after 60 seconds (6 checks at 10-second intervals).
noremote=NO_REMOTE_SETTING
If set to 1, exsinglesv and exsingleckl check the cluster lock status in the heartbeat file instead of calling remote machines. For file systems such as GPFS, this might give a performance benefit (as local checks on such a file system can be faster than a remote call). The need for this flag should be evaluated by testing on the given system configuration. The default value is 0, meaning that the cluster check first attempts to call the remote services and, if that fails, performs the checks in the heartbeat file.
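
The default calculations for chkinterval and locked_wait can be worked through with a short sketch. The function names are hypothetical, and integer division is an assumption, since the rounding rule is not stated above.

```python
def default_chkinterval(sgrefresh):
    """Default check interval: NDRX_SGREFRESH divided by 3."""
    return sgrefresh // 3


def default_locked_wait(sgrefresh, chkinterval):
    """Default wait: NDRX_SGREFRESH divided by chkinterval, times 2."""
    return sgrefresh // chkinterval * 2


def interval_too_long(sgrefresh, chkinterval):
    """Condition under which a ULOG warning is described as generated."""
    return chkinterval * 3 > sgrefresh


sgrefresh = 30                                            # default NDRX_SGREFRESH
chkinterval = default_chkinterval(sgrefresh)              # 10 seconds
locked_wait = default_locked_wait(sgrefresh, chkinterval) # 6 checks
failover_delay = locked_wait * chkinterval                # 60 seconds
```

With the defaults, the failover delay of 60 seconds is twice the NDRX_SGREFRESH value.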

LIMITATIONS

When exsinglesv is started manually (or restarted/sreloaded) with an xadmin(8) command on an already booted application, the command may be slow to return: depending on the current ndrxd(8) sanity cycle, the singleton process group may be passed for startup first, and only then does xadmin return results to the command line. This is subject to change in future releases, where after the first boot all start/stop operations on exsinglesv might be treated as failover recovery. Currently, manual starts and stops are treated as fresh boot operations (i.e. the group is locked immediately).

EXIT STATUS

0
Success
1
Failure

EXAMPLE

This section demonstrates a simple configuration for one group. Note that this configuration must match on all involved cluster nodes which serve the given singleton group.

ndrxconfig.xml demonstrates configuration for the group named "GRPV":

<?xml version="1.0" ?>
<endurox>
    <procgroups>
        <procgroup grpno="5" name="GRPV" singleton="Y" sg_nodes="1,4" sg_nodes_verify="Y"/>
    </procgroups>
    <servers>

        <!-- lock provider for group 5 -->
        <server name="exsinglesv">
            <!-- only one lock provider for the group! -->
            <min>1</min>
            <max>1</max>
            <srvid>10</srvid>
            <sysopt>-e ${NDRX_ULOG}/exsinglesv.log -r</sysopt>
            <procgrp_lp>GRPV</procgrp_lp>
            <cctag>GRPVCCT</cctag>
        </server>

        <!-- support servers, local -->
        <server name="exsingleckl">
            <min>10</min>
            <max>10</max>
            <srvid>15</srvid>
            <sysopt>-e ${NDRX_ULOG}/exsingleckl.log -r</sysopt>
            <procgrp_lp>GRPV</procgrp_lp>
            <cctag>GRPVCCT</cctag>
        </server>

        <!-- support servers, remote -->
        <server name="exsingleckr">
            <min>10</min>
            <max>10</max>
            <srvid>30</srvid>
            <sysopt>-e ${NDRX_ULOG}/exsingleckr.log -r</sysopt>
            <procgrp_lp>GRPV</procgrp_lp>
            <cctag>GRPVCCT</cctag>
        </server>

        <!-- banksv1 is configured as singleton in the cluster -->
        <server name="banksv1">
            <min>1</min>
            <max>1</max>
            <srvid>120</srvid>
            <sysopt>-e ${NDRX_ULOG}/banksv1.log -r</sysopt>
            <procgrp>GRPV</procgrp>
        </server>

        ...

        <!-- for demo purposes, we show configuration for client daemon processes too -->
        <server name="cpmsrv">
            <min>1</min>
            <max>1</max>
            <srvid>9999</srvid>
            <sysopt>-e ${NDRX_ULOG}/cpmsrv.log -r -- -k3 -i1</sysopt>
        </server>

    </servers>
    <clients>
        <!-- bankcl is also singleton in the cluster -->
        <client cmdline="bankcl" procgrp="GRPV">
            <exec tag="BANK1" subsect="1" autostart="Y" log="${NDRX_ULOG}/bankcl-1.log"/>
        </client>
        ...
    </clients>
</endurox>

app.ini

...
[@exsinglesv/GRPVCCT]
lockfile_1=/path/to/shared/file/system/GRPV_lock_1
lockfile_2=/path/to/shared/file/system/GRPV_lock_2
...

BUGS

Report bugs to support@mavimax.com

COPYING

© Mavimax, Ltd