pjsip-2.7.1 SEGV Crash in pjmedia_transport_send_rtcp() on PJSIP_INV_STATE_CONNECTING

MO
Martin Oberhuber
Thu, Nov 12, 2020 7:54 PM

Dear PJSIP experts,

I'm running pjsip-2.7.1 on armhf, and got a crash today that looks like a
race condition in PJSIP. This is a one-time occurrence from our stress
tests and has not been reproduced. But I analyzed the coredump and logs to
get below information, hoping that we might find a possible issue.

My application was running a coverage build, and made a normal outgoing
call - timings included:

17:30:08.248242 |Call CALLING(1) to sip:1021@10.199.70.8

17:30:08.634752 |Call EARLY(3) to sip:1021@10.199.70.8

17:30:09.442568 |Call CONNECTING(4) to sip:1021@10.199.70.8

At this point, it ran into a SEGV because the transport (tp) was still NULL
here:

  • #0  0x76ceaa40 in pjmedia_transport_send_rtcp (size=64,
    pkt=0x663dee04, tp=0x0) at ../include/pjmedia/transport.h:792
    792    return (*tp->op->send_rtcp)(tp, pkt, size);

  • (gdb) bt

  • #0  0x76ceaa40 in pjmedia_transport_send_rtcp (size=64,
    pkt=0x663dee04, tp=0x0) at ../include/pjmedia/transport.h:792

  • #1  send_rtcp (stream=stream@entry=0x71f0fc4c, with_sdes=<optimized out>, with_bye=0, with_xr=0) at ../src/pjmedia/stream.c:1041

  • #2  0x76ceacd0 in on_rx_rtp (data=0x71f0fc4c, pkt=<optimized out>,
    bytes_read=<optimized out>) at ../src/pjmedia/stream.c:1882

  • #3  0x76cf7c88 in srtp_rtp_cb (user_data=0x2807234, pkt=0x27ef06c,
    size=172) at ../src/pjmedia/transport_srtp.c:1236

  • #4  0x76cf89d0 in on_rx_rtp (key=<optimized out>, op_key=<optimized out>, bytes_read=172) at ../src/pjmedia/transport_udp.c:540

  • #5  0x76bca758 in ioqueue_dispatch_read_event (ioqueue=<optimized out>, h=0x28460d4) at ../src/pj/ioqueue_common_abs.c:605

  • #6  0x76bcc1c8 in pj_ioqueue_poll (ioqueue=0x27e99b4,
    timeout=timeout@entry=0x7321ad80) at ../src/pj/ioqueue_epoll.c:812

  • #7  0x76cd2c64 in worker_proc (arg=0x2849044) at
    ../src/pjmedia/endpoint.c:339

  • #8  0x76bcdbec in thread_main (param=0x27df418) at
    ../src/pj/os_core_unix.c:541

  • #9  0x769d707c in start_thread (arg=0x7321b3c0) at pthread_create.c:339

There was another Thread active in PJSIP, and I noticed that its frame #5
would have assigned the transport just 5 statements later in stream.c:2363
if it had not already crashed:

  • #0  transport_attach2 (tp=0x2807234, param=<optimized out>) at
    ../src/pjmedia/transport_srtp.c:1069
  • #1  0x76cee030 in pjmedia_transport_attach2 (att_param=0x6fefde20,
    tp=0x2807234) at ../include/pjmedia/transport.h:688
  • #2  pjmedia_stream_create (endpt=<optimized out>, pool=<optimized out>, pool@entry=0x0, info=info@entry=0x6fefe068, tp=0x2807234,
    user_data=0x0,
  • p_stream=0x76e4b4b0 <pjsua_var+22496>) at
    

../src/pjmedia/stream.c:2363

  • #3  0x76e24810 in pjsua_aud_channel_update
    (call_med=call_med@entry=0x76e4b498
    <pjsua_var+22472>, tmp_pool=<optimized out>, si=si@entry=0x6fefe068,
  • local_sdp=local_sdp@entry=0x28138e4,
    

remote_sdp=remote_sdp@entry=0x71f16db4)
at ../src/pjsua-lib/pjsua_aud.c:624

  • #4  0x76e1cc7c in pjsua_media_channel_update (call_id=<optimized out>,
    local_sdp=0x28138e4, remote_sdp=0x71f16db4)
  • at ../src/pjsua-lib/pjsua_media.c:3092
    
  • #5  0x76e0eef4 in pjsua_call_on_media_update (inv=0x28505cc,
    status=<optimized out>) at ../src/pjsua-lib/pjsua_call.c:4008
  • #6  0x76dce0dc in inv_negotiate_sdp (inv=0x28505cc) at
    ../src/pjsip-ua/sip_inv.c:1948
  • #7  0x76dcf1d0 in inv_check_sdp_in_incoming_msg (inv=inv@entry=0x28505cc,
    tsx=tsx@entry=0x71f1a02c, rdata=0x71f24f0c)
  • at ../src/pjsip-ua/sip_inv.c:2163
    
  • #8  0x76dd59d8 in inv_on_state_early (inv=0x28505cc, e=0x6fefe73c) at
    ../src/pjsip-ua/sip_inv.c:4448
  • #9  0x76dce67c in mod_inv_on_tsx_state (tsx=0x71f1a02c, e=0x6fefe73c)
    at ../src/pjsip-ua/sip_inv.c:723
  • #10 0x76d83af8 in pjsip_dlg_on_tsx_state (dlg=0x27f92bc,
    tsx=0x71f1a02c, e=0x6fefe73c) at ../src/pjsip/sip_dialog.c:2066
  • #11 0x76d7bd38 in tsx_set_state (tsx=tsx@entry=0x71f1a02c,
    state=state@entry=PJSIP_TSX_STATE_TERMINATED,
  • event_src_type=event_src_type@entry=PJSIP_EVENT_RX_MSG,
    

event_src=0x71f24f0c, flag=flag@entry=0) at
../src/pjsip/sip_transaction.c:1267

  • #12 0x76d7dad4 in tsx_on_state_proceeding_uac (tsx=0x71f1a02c,
    event=0x6fefe7cc) at ../src/pjsip/sip_transaction.c:2974
  • #13 0x76d80350 in pjsip_tsx_recv_msg (tsx=tsx@entry=0x71f1a02c,
    rdata=rdata@entry=0x71f24f0c) at ../src/pjsip/sip_transaction.c:1827
  • #14 0x76d80400 in mod_tsx_layer_on_rx_response (rdata=0x71f24f0c) at
    ../src/pjsip/sip_transaction.c:893
  • #15 0x76d66a2c in pjsip_endpt_process_rx_data (endpt=0x27b43ac,
    rdata=0x71f24f0c, p=<optimized out>, p_handled=0x6fefe8a4)
  • at ../src/pjsip/sip_endpoint.c:901
    
  • #16 0x76d66bd4 in endpt_on_rx_msg (endpt=0x27b43ac, status=<optimized out>, rdata=0x71f24f0c) at ../src/pjsip/sip_endpoint.c:1043
  • #17 0x76d6e000 in pjsip_tpmgr_receive_packet (mgr=0x282e4a0,
    rdata=rdata@entry=0x71f24f0c) at ../src/pjsip/sip_transport.c:1965
  • #18 0x76d70dc0 in udp_on_read_complete (key=0x282c328,
    op_key=<optimized out>, bytes_read=1001) at
    ../src/pjsip/sip_transport_udp.c:170
  • #19 0x76bca7b4 in ioqueue_dispatch_read_event (ioqueue=<optimized out>, h=0x282c328) at ../src/pj/ioqueue_common_abs.c:605
  • #20 0x76bcc1c8 in pj_ioqueue_poll (ioqueue=0x282c078,
    timeout=timeout@entry=0x6fefed38) at ../src/pj/ioqueue_epoll.c:812
  • #21 0x76d666fc in pjsip_endpt_handle_events2 (endpt=0x27b43ac,
    max_timeout=max_timeout@entry=0x6fefed70, p_count=p_count@entry
    =0x6fefed6c)
  • at ../src/pjsip/sip_endpoint.c:744
    
  • #22 0x76e11704 in pjsua_handle_events (msec_timeout=<optimized out>)
    at ../src/pjsua-lib/pjsua_core.c:1988
  • #23 0x76e11744 in worker_thread (arg=<optimized out>) at
    ../src/pjsua-lib/pjsua_core.c:704
  • #24 0x76bcdbec in thread_main (param=0x27d3b44) at
    ../src/pj/os_core_unix.c:541
  • #25 0x769d707c in start_thread (arg=0x6feff3c0) at pthread_create.c:339

I have checked my application, and I believe that none of my callbacks are
involved here. Also noticed from the Coredump that
PjCall::onStreamCreated() had not been called yet.

These facts are making me believe that I might have found a race condition
or insufficient lock protection in PJSIP. Is this ringing a bell for
anyone, has such a problem been seen before? - I have searched existing
issues on Github and not found anything related, but maybe some of the
experts could shed some light on this.

PS I cannot currently upgrade from pjsip-2.7.1 :( so would appreciate any
ideas that might point me in the right direction!

Thanks a lot,
Martin

Dear PJSIP experts, I'm running pjsip-2.7.1 on armhf, and got a crash today that looks like a race condition in PJSIP. This is a one-time occurrence from our stress tests and has not been reproduced. But I analyzed the coredump and logs to get below information, hoping that we might find a possible issue. My application was running a coverage build, and made a normal outgoing call - timings included: - 17:30:08.248242 |Call CALLING(1) to sip:1021@10.199.70.8 - 17:30:08.634752 |Call EARLY(3) to sip:1021@10.199.70.8 - 17:30:09.442568 |Call CONNECTING(4) to sip:1021@10.199.70.8 At this point, it ran into a SEGV because the transport (tp) was still NULL here: - #0 0x76ceaa40 in pjmedia_transport_send_rtcp (size=64, pkt=0x663dee04, tp=0x0) at ../include/pjmedia/transport.h:792 792 return (*tp->op->send_rtcp)(tp, pkt, size); - (gdb) bt - #0 0x76ceaa40 in pjmedia_transport_send_rtcp (size=64, pkt=0x663dee04, tp=0x0) at ../include/pjmedia/transport.h:792 - #1 send_rtcp (stream=stream@entry=0x71f0fc4c, with_sdes=<optimized out>, with_bye=0, with_xr=0) at ../src/pjmedia/stream.c:1041 - #2 0x76ceacd0 in on_rx_rtp (data=0x71f0fc4c, pkt=<optimized out>, bytes_read=<optimized out>) at ../src/pjmedia/stream.c:1882 - #3 0x76cf7c88 in srtp_rtp_cb (user_data=0x2807234, pkt=0x27ef06c, size=172) at ../src/pjmedia/transport_srtp.c:1236 - #4 0x76cf89d0 in on_rx_rtp (key=<optimized out>, op_key=<optimized out>, bytes_read=172) at ../src/pjmedia/transport_udp.c:540 - #5 0x76bca758 in ioqueue_dispatch_read_event (ioqueue=<optimized out>, h=0x28460d4) at ../src/pj/ioqueue_common_abs.c:605 - #6 0x76bcc1c8 in pj_ioqueue_poll (ioqueue=0x27e99b4, timeout=timeout@entry=0x7321ad80) at ../src/pj/ioqueue_epoll.c:812 - #7 0x76cd2c64 in worker_proc (arg=0x2849044) at ../src/pjmedia/endpoint.c:339 - #8 0x76bcdbec in thread_main (param=0x27df418) at ../src/pj/os_core_unix.c:541 - #9 0x769d707c in start_thread (arg=0x7321b3c0) at pthread_create.c:339 There was another Thread active in PJSIP, and I noticed that its frame #5 would have assigned the transport just 5 statements later in stream.c:2363 if it had not already crashed: - - #0 transport_attach2 (tp=0x2807234, param=<optimized out>) at ../src/pjmedia/transport_srtp.c:1069 - #1 0x76cee030 in pjmedia_transport_attach2 (att_param=0x6fefde20, tp=0x2807234) at ../include/pjmedia/transport.h:688 - #2 pjmedia_stream_create (endpt=<optimized out>, pool=<optimized out>, pool@entry=0x0, info=info@entry=0x6fefe068, tp=0x2807234, user_data=0x0, - p_stream=0x76e4b4b0 <pjsua_var+22496>) at ../src/pjmedia/stream.c:2363 - #3 0x76e24810 in pjsua_aud_channel_update (call_med=call_med@entry=0x76e4b498 <pjsua_var+22472>, tmp_pool=<optimized out>, si=si@entry=0x6fefe068, - local_sdp=local_sdp@entry=0x28138e4, remote_sdp=remote_sdp@entry=0x71f16db4) at ../src/pjsua-lib/pjsua_aud.c:624 - #4 0x76e1cc7c in pjsua_media_channel_update (call_id=<optimized out>, local_sdp=0x28138e4, remote_sdp=0x71f16db4) - at ../src/pjsua-lib/pjsua_media.c:3092 - #5 0x76e0eef4 in pjsua_call_on_media_update (inv=0x28505cc, status=<optimized out>) at ../src/pjsua-lib/pjsua_call.c:4008 - #6 0x76dce0dc in inv_negotiate_sdp (inv=0x28505cc) at ../src/pjsip-ua/sip_inv.c:1948 - #7 0x76dcf1d0 in inv_check_sdp_in_incoming_msg (inv=inv@entry=0x28505cc, tsx=tsx@entry=0x71f1a02c, rdata=0x71f24f0c) - at ../src/pjsip-ua/sip_inv.c:2163 - #8 0x76dd59d8 in inv_on_state_early (inv=0x28505cc, e=0x6fefe73c) at ../src/pjsip-ua/sip_inv.c:4448 - #9 0x76dce67c in mod_inv_on_tsx_state (tsx=0x71f1a02c, e=0x6fefe73c) at ../src/pjsip-ua/sip_inv.c:723 - #10 0x76d83af8 in pjsip_dlg_on_tsx_state (dlg=0x27f92bc, tsx=0x71f1a02c, e=0x6fefe73c) at ../src/pjsip/sip_dialog.c:2066 - #11 0x76d7bd38 in tsx_set_state (tsx=tsx@entry=0x71f1a02c, state=state@entry=PJSIP_TSX_STATE_TERMINATED, - event_src_type=event_src_type@entry=PJSIP_EVENT_RX_MSG, event_src=0x71f24f0c, flag=flag@entry=0) at ../src/pjsip/sip_transaction.c:1267 - #12 0x76d7dad4 in tsx_on_state_proceeding_uac (tsx=0x71f1a02c, event=0x6fefe7cc) at ../src/pjsip/sip_transaction.c:2974 - #13 0x76d80350 in pjsip_tsx_recv_msg (tsx=tsx@entry=0x71f1a02c, rdata=rdata@entry=0x71f24f0c) at ../src/pjsip/sip_transaction.c:1827 - #14 0x76d80400 in mod_tsx_layer_on_rx_response (rdata=0x71f24f0c) at ../src/pjsip/sip_transaction.c:893 - #15 0x76d66a2c in pjsip_endpt_process_rx_data (endpt=0x27b43ac, rdata=0x71f24f0c, p=<optimized out>, p_handled=0x6fefe8a4) - at ../src/pjsip/sip_endpoint.c:901 - #16 0x76d66bd4 in endpt_on_rx_msg (endpt=0x27b43ac, status=<optimized out>, rdata=0x71f24f0c) at ../src/pjsip/sip_endpoint.c:1043 - #17 0x76d6e000 in pjsip_tpmgr_receive_packet (mgr=0x282e4a0, rdata=rdata@entry=0x71f24f0c) at ../src/pjsip/sip_transport.c:1965 - #18 0x76d70dc0 in udp_on_read_complete (key=0x282c328, op_key=<optimized out>, bytes_read=1001) at ../src/pjsip/sip_transport_udp.c:170 - #19 0x76bca7b4 in ioqueue_dispatch_read_event (ioqueue=<optimized out>, h=0x282c328) at ../src/pj/ioqueue_common_abs.c:605 - #20 0x76bcc1c8 in pj_ioqueue_poll (ioqueue=0x282c078, timeout=timeout@entry=0x6fefed38) at ../src/pj/ioqueue_epoll.c:812 - #21 0x76d666fc in pjsip_endpt_handle_events2 (endpt=0x27b43ac, max_timeout=max_timeout@entry=0x6fefed70, p_count=p_count@entry =0x6fefed6c) - at ../src/pjsip/sip_endpoint.c:744 - #22 0x76e11704 in pjsua_handle_events (msec_timeout=<optimized out>) at ../src/pjsua-lib/pjsua_core.c:1988 - #23 0x76e11744 in worker_thread (arg=<optimized out>) at ../src/pjsua-lib/pjsua_core.c:704 - #24 0x76bcdbec in thread_main (param=0x27d3b44) at ../src/pj/os_core_unix.c:541 - #25 0x769d707c in start_thread (arg=0x6feff3c0) at pthread_create.c:339 I have checked my application, and I believe that none of my callbacks are involved here. Also noticed from the Coredump that PjCall::onStreamCreated() had not been called yet. These facts are making me believe that I might have found a race condition or insufficient lock protection in PJSIP. Is this ringing a bell for anyone, has such a problem been seen before? - I have searched existing issues on Github and not found anything related, but maybe some of the experts could shed some light on this. PS I cannot currently upgrade from pjsip-2.7.1 :( so would appreciate any ideas that might point me in the right direction! Thanks a lot, Martin