[pjsip] Deadlocks: solved
Turnaev Eugeny
turnaev at t72.ru
Tue Jul 8 01:11:29 EDT 2008
Hi.
I have found why my program was deadlocking.
In py_pjsua.c, in the function
static PyObject *py_pjsua_handle_events(PyObject *pSelf, PyObject *pArgs)
I have commented out the lines
Py_BEGIN_ALLOW_THREADS
and Py_END_ALLOW_THREADS.
Those are Python macros which release the GIL and so allow other Python threads to run
while the current thread is in some long-running I/O operation, for example.
Also I got rid of the worker_thread that polled with handle_events(). Now I am
polling from the thread where all other calls to the py_pjsua lib are located.
Maybe the deadlocking situation was like this:
a worker thread called handle_events();
inside handle_events() some pjsua internal functions
are called to get events, and mutexes are acquired.
Now a context switch happens (because we have other Python threads and Py_BEGIN_ALLOW_THREADS was called).
The context switches to another Python thread calling another py_pjsua function,
so another pjsua internal function is entered.
This should not be a problem as long as mutexes are acquired in the same order,
and the FAQ states that pjsip acquires mutexes in one order.
I don't know, I think I saw 2 different macros in pjsua, a try_to_get_mutex
and a get_mutex. Maybe the problem is in this.
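To make the lock-order point concrete, here is a small generic sketch in plain Python. It is not pjsua code and does not model pjsua's real mutexes; lock_a, lock_b and the function names are made up for illustration. Two threads take two locks in opposite order. A timed acquire stands in for a "try" style lock so the sketch terminates; with a plain blocking acquire both threads would hang forever.

```python
import threading


def opposite_order_attempt(first, second, both_holding, both_tried, results, name):
    """Hold `first`, then try to take `second` while the peer does the reverse."""
    with first:
        # Wait until the other thread also holds its first lock.
        both_holding.wait(timeout=5)
        # Timed acquire: a blocking acquire() here would never return.
        got_second = second.acquire(timeout=0.2)
        if got_second:
            second.release()
        results[name] = got_second
        # Keep `first` until the peer has finished its own attempt.
        both_tried.wait(timeout=5)


def demo_opposite_order():
    """Return which thread managed to take its second lock (neither does)."""
    lock_a, lock_b = threading.Lock(), threading.Lock()  # hypothetical locks
    both_holding = threading.Barrier(2)
    both_tried = threading.Barrier(2)
    results = {}
    worker = threading.Thread(
        target=opposite_order_attempt,
        args=(lock_a, lock_b, both_holding, both_tried, results, "worker"),
    )
    caller = threading.Thread(
        target=opposite_order_attempt,
        args=(lock_b, lock_a, both_holding, both_tried, results, "caller"),
    )
    worker.start()
    caller.start()
    worker.join()
    caller.join()
    return results


if __name__ == "__main__":
    print(demo_opposite_order())
```

If both threads took lock_a first and lock_b second, one of them would simply wait for the other to finish, which is why a single acquisition order is safe.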
OK. I am not stating that pjsua has a deadlock bug; maybe this was my bad
application design, or I messed up somewhere else.
But my app design more or less mimics the design of the example app
http://svn.pjsip.org/repos/pjproject/trunk/pjsip-apps/src/py_pjsua/pjsua_app.py
1 worker thread, and also calls from another thread.
So I assume that the example also has a deadlock problem;
it is just a matter of load. The architecture given in the example worked for me
when I had 1 simultaneous call, and when I started with 16 simultaneous calls it deadlocked
in about 1-5 minutes (and as I can see from debugging, it deadlocked in a call to py_pjsua).
Btw, in the example the other thread (main in the example, not main in my app) is not registered
with py_pjsua.thread_register(), so I am not getting which threads exactly
must be registered.
Now I have no worker thread. I am polling right from the thread where all other
calls to py_pjsua are located, and I also removed Py_BEGIN_ALLOW_THREADS from handle_events
to disallow interleaving of threads while I am in handle_events().
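A rough sketch of this single-thread layout, under stated assumptions: poll_events is a hypothetical stand-in for the binding's handle_events() call (its real signature is not shown here), and the command queue is my own illustration of how other threads could hand work over instead of calling the library themselves.

```python
import queue
import threading


def run_single_threaded(poll_events, commands, stop, poll_timeout_ms=10):
    """Run every library call, including event polling, on the calling thread."""
    while not stop.is_set():
        # First run whatever other threads have asked for; they only enqueue
        # (func, args) pairs and never call into the library directly.
        while True:
            try:
                func, args = commands.get_nowait()
            except queue.Empty:
                break
            func(*args)
        # Then poll for events from this same thread.
        poll_events(poll_timeout_ms)


if __name__ == "__main__":
    stop = threading.Event()
    commands = queue.Queue()
    polls = []

    def fake_poll(timeout_ms):
        # Hypothetical stand-in for handle_events(); stops after 3 polls.
        polls.append(threading.get_ident())
        if len(polls) >= 3:
            stop.set()

    commands.put((print, ("command ran on the polling thread",)))
    run_single_threaded(fake_poll, commands, stop)
```

With this shape only one thread ever enters the library, so two library calls cannot interleave, whatever the library's internal locking looks like.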
So anyone working with pjsua from Python can do the same if they have unexpected problems with deadlocks :)
Best wishes to the developers of pjsua, nice work :)