Bug #532

tpcancelTest failure on macos

Added by Madars Vitolins 8 months ago. Updated 5 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 03/23/2020
Due date: -
% Done: 0%

Description

JUnit version 4.12
....E.
Time: 28.023
There was 1 failure:
1) tpcancelTest(TpacallTests)
org.endurox.exceptions.AtmiTPETIMEException: 13:TPETIME (last error 13: ndrx_tpacall: Failed to send, os err: Operation timed out)
    at org.endurox.AtmiCtx.tpacall(Native Method)
    at TpacallTests.tpcancelTest(TpacallTests.java:186)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:27)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
    at org.junit.runner.JUnitCore.runMain(JUnitCore.java:77)
    at org.junit.runner.JUnitCore.main(JUnitCore.java:36)

FAILURES!!!
Tests run: 5,  Failures: 1

01_basic_server.tar.gz (1.17 MB) Madars Vitolins, 03/23/2020 09:08 AM

jserver01_2b_2020-07-05-131740_MacBook-Pro.crash (49.1 KB) Madars Vitolins, 07/05/2020 09:56 PM

endurox-7.0.38-2.DARWIN15_4_0_Clang_emq.x86_64.tar.gz (1.33 MB) Madars Vitolins, 07/05/2020 10:50 PM

History

#1 Updated by Madars Vitolins 8 months ago

  • Subject changed from TpacallTests failure on macos to tpcancelTest failure on macos

#7 Updated by Madars Vitolins 5 months ago

libnstd.dylib[0x11866] <+54>:  movq   0x38(%r14), %r12
libnstd.dylib[0x1186a] <+58>:  leaq   0x8(%r14), %rsi
libnstd.dylib[0x1186e] <+62>:  callq  0x123c0                   ; emq_notify
libnstd.dylib[0x11873] <+67>:  cmpl   $-0x1, %eax <<<<<<<<<<<<< this is the last point in the stack trace
libnstd.dylib[0x11876] <+70>:  jne    0x118f0                   ; <+192>
libnstd.dylib[0x11878] <+72>:  callq  0x2e518                   ; symbol stub for: __error

So could it be a broken stack at some point inside emq_notify?

#8 Updated by Madars Vitolins 5 months ago

$ git diff
diff --git a/libnstd/sys_emqueue.c b/libnstd/sys_emqueue.c
index 209db22..20915e9 100644
--- a/libnstd/sys_emqueue.c
+++ b/libnstd/sys_emqueue.c
@@ -379,7 +379,7 @@ expublic int emq_getattr(mqd_t emqd, struct mq_attr *emqstat)
 }

 /**
- * Confiugre notification
+ * Configure notification
  * @param emqd
  * @param notification
  * @return 
diff --git a/libnstd/sys_poll.c b/libnstd/sys_poll.c
index 7ad91d2..69e8006 100644
--- a/libnstd/sys_poll.c
+++ b/libnstd/sys_poll.c
@@ -187,7 +187,22 @@ exprivate void *sigthread_enter(void *arg)
     return NULL;
 }

-
+/**
+ * TODO: This shall be empty. As pthread_create() is not safe from signal
+ * handler.
+ * And for osx we shall not use signals for queue event dispatching, but
+ * instead we shall use threads.
+ * Another nice way of this would be to use thread pool for performance reasons
+ * not? I.e. SIGEV_THREAD for emq shall use thread pool (with one thread) to
+ * dispatch events through.
+ * 
+ * And this particular function shall be set to empty...
+ * 
+ * While thread pool is not available in Enduro/X 7.0, thus as part of the
+ * maintenance, we could just start new thread.
+ * 
+ * @param sig
+ */
 exprivate void slipSigHandler (int sig)
 {
     pthread_t thread;
@@ -266,7 +281,7 @@ exprivate int signal_handle_event(void)
     MUTEX_UNLOCK_V(M_psets_lock);

 out:
-        return ret;
+    return ret;
 }

 /**
@@ -308,7 +323,7 @@ exprivate void * signal_process(void *arg)
 out:
     return NULL;
 }
-    
+
 /**
  * Install notifications for all Qs
  */

The change should probably be done in 7.5, as a thread pool is available there and the other pthread_create() call sites have been fixed as well.

#9 Updated by Madars Vitolins 5 months ago

The 7.0 version could go with spawning a new thread from emq, and with 7.5 we shall port the code to a thread pool. - This will not work, as the destination process is the one that must be notified, while we can spawn a thread only in the local process.

Thus, to avoid spawning a thread during the signal wait, we could instead write the notification to another pipe which the signal thread reads.

#10 Updated by Madars Vitolins 5 months ago

It seems the most proper way for emq would be not to use signals, but instead to have emq deliver the message notification via a pipe to the other process. As we are processing only one epoll set (though the API allows more), we could open a named pipe for each process and let the main process poll on this named pipe, thus processing events in the main thread.

#11 Updated by Madars Vitolins 5 months ago

- Or for each emq queue we create a pipe.
- Thus when emq sends a msg: we lock the queue, write the data to memory, and put a 1-byte notification in the pipe...
- When reading a msg from the queue: we lock the queue at the point where we are ready to read from memory, and we also read 1 byte from the pipe.

Thus servers may run poll() on emq, as is done with the Linux epoll sub-system.
Also, with this scheme NDRX_MSGMAX cannot be larger than PIPE_BUF if we send 1-byte notifications.
