Bug #123

Transaction Manager counts re-tries incorrectly.

Added by Pablo about 7 years ago. Updated almost 7 years ago.

Status:ClosedStart date:04/11/2017
Priority:High (Code 3)Due date:
Assignee:-% Done:

100%

Category:-
Target version:-

Description

Transaction has failed (due to some unknown reason - maybe DB):

N:NDRX:5:10465:001:20170328:161119000:autils.c:0401:Serialized xid: [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A]
N:NDRX:5:10465:001:20170328:161119000:autils.c:0405:xid deserialization total len: 27
U:UBF :5:10465:001:20170328:161119000:autils.c:0406:XID data for deserialization
  0000  61 94 f7 a1 10 05 0d 79 7a f4 89 77 42 39 9d b4  a......yz..wB9..
  0010  44 50 b8 41 1e b8 01 01 00 0e 00                 DP.A.......
U:UBF :5:10465:001:20170328:161119000:autils.c:0435:Original XID restored 
  0000  a1 f7 94 61 00 00 00 00 10 00 00 00 00 00 00 00  ...a............
  0010  05 00 00 00 00 00 00 00 0d 79 7a f4 89 77 42 39  .........yz..wB9
  0020  9d b4 44 50 b8 41 1e b8 01 01 00 0e 00 00 00 00  ..DP.A..........
  0030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  0040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  0060  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  0070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  0080  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  0090  00 00 00 00 00 00 00 00                          ........
N:NDRX:5:10465:001:20170328:161119000:tmi/xa.c:0379:atmi_xa_prepare_entry
N:NDRX:2:10465:001:20170328:161119001:tmi/xa.c:0385:xa_prepare_entry - fail: -6 [routine invoked in an improper context]
N:NDRX:3:10465:001:20170328:161119001:perror.c:0309:_TPset_error_fmt_rsn: 16 (TPERMERR) reason: -6 [xa_prepare_entry - fail: -6 [routine invoked in an improper context]]

then sort of fixed:
N:NDRX:5:10465:000:20170328:161111011:ed_buf.c:0377:Free-up auto buffer : 0xb9f520 typeid: 0
TMNODEID        1
TMRMID  1
TMSRVID 14
TMCALLERRM      1
TMCMD   c
TMXID   YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A
TMKNOWNRMS      \01
TMPROCESSID     srv,crdstatsv,8141,10555,1
N:NDRX:4:10453:007:20170328:161111011:atedrv.c:0148:Operation tperrno: 0, xa return code: 0
N:NDRX:4:10465:005:20170328:161111011:/tmsrv.c:0151:Got command code: [c]
N:NDRX:5:10465:005:20170328:161111011:/tmapi.c:0151:tm_tpcommit() called
N:NDRX:4:10453:007:20170328:161111011:atedrv.c:0211:RM 1 votes for stage: 70
N:NDRX:5:10453:007:20170328:161111011:rv/log.c:0637:tms_log_stage: new stage - 70
N:NDRX:4:10465:005:20170328:161111011:rv/log.c:0138:Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] locked for thread_id: 140048651441920
N:NDRX:4:10453:007:20170328:161111011:atedrv.c:0097:Entered in stage: COMMITTED

But then I get information that transaction is locked:
N:NDRX:4:10465:001:20170329:083859000:ground.c:0228:XID: [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] stage: [40]. Try: 1, max: 100
N:NDRX:2:10465:001:20170329:083859000:rv/log.c:0126:Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
N:NDRX:5:10465:001:20170329:083859000:ground.c:0253:Transaction locked or already processed by foreground...
N:NDRX:5:10465:001:20170329:083859000:ground.c:0262:background - sleep 10

this also could be OK if counter would be counted properly. But it's not. I'm getting same counter (try 1, max 100) all the time:
N:NDRX:5:10429:001:20170329:083728000:ground.c:0262:background - sleep 10
N:NDRX:4:10465:001:20170329:083729000:ground.c:0228:XID: [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] stage: [40]. Try: 1, max: 100
N:NDRX:2:10465:001:20170329:083729000:rv/log.c:0126:Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
N:NDRX:5:10465:001:20170329:083729000:ground.c:0253:Transaction locked or already processed by foreground...
N:NDRX:5:10465:001:20170329:083729000:ground.c:0262:background - sleep 10
...
N:NDRX:4:10465:001:20170329:083859000:ground.c:0228:XID: [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] stage: [40]. Try: 1, max: 100
N:NDRX:2:10465:001:20170329:083859000:rv/log.c:0126:Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
N:NDRX:5:10465:001:20170329:083859000:ground.c:0253:Transaction locked or already processed by foreground...
N:NDRX:5:10465:001:20170329:083859000:ground.c:0262:background - sleep 10
...
etc.

and I see following in user log:
...
    0:20170328:17521900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17522900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17523900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17524900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17525900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17530900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17531900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17532900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17533900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17534900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17535900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17540900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17541900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17542900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17543900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17544900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17545900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17550900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
    0:20170328:17551900:tmsrv       :tmsrv: Transaction [YZT3oRAFDXl69Il3QjmdtERQuEEeuAEBAA4A] already locked for thread_id: 140048643049216
...

Please fix

History

#1 Updated by Madars about 7 years ago

Hmm, seems that problem is here http://www.silodev.com/lxr/source/tmsrv/background.c :

0201 public int background_loop(void)
0202 {
0203     int ret = SUCCEED;
0204     atmi_xa_log_list_t *tx_list;
0205     atmi_xa_log_list_t *el, *tmp;
0206     atmi_xa_tx_info_t xai;
0207     atmi_xa_log_t *p_tl;
0208     
0209     memset(&xai, 0, sizeof(xai));
0210     
0211     while(!G_bacground_req_shutdown)
0212     {
0213         /* Check the list of transactions (iterate over...) 
0214          * Seems anyway, we need a list of background ops here...
0215          */
0216         
0217         /* Lock for processing... (cose xadmin might want to do some stuff in middle
0218          * Might want to think something better (so that it does not lock all process)
0219          */
0220         background_lock();
0221         tx_list = tms_copy_hash2list(COPY_MODE_BACKGROUND | COPY_MODE_ACQLOCK); <<<< it does the copy of the transaction list (for concurrency access) 
0222         
0223         LL_FOREACH_SAFE(tx_list,el,tmp)
0224         {
0225             el->p_tl.trycount++; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< This increments the copy and NOT the original transaction object.
0226             NDRX_LOG(log_info, "XID: [%s] stage: [%hd]. Try: %d, max: %d", 
0227                     el->p_tl.tmxid, el->p_tl.txstage, el->p_tl.trycount, 
0228                     G_tmsrv_cfg.max_tries);
0229             

#2 Updated by Madars almost 7 years ago

  • Priority changed from Normal (Code 4) to High (Code 3)

#3 Updated by Madars almost 7 years ago

But it looks like the actual transaction was in progress for some long time. We might want to reset tries counter back to 0 when actual action (prepare for example) on DB is finished.

#4 Updated by Madars almost 7 years ago

Also I guess we need to perform county only when we got the lock, not counting all locked attempts.
Thus we could better state what does the retry attempt setting does.

#5 Updated by Madars almost 7 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Fixed.

#6 Updated by Madars almost 7 years ago

  • Status changed from Resolved to Closed

Also available in: Atom PDF