WebRTC Calls dropping

    Posted 08-18-2023 14:17
    Edited by Paul Simpson 08-21-2023 11:29


    I have a somewhat perplexing issue with one of my customers and I'm hoping someone may be able to shed some light on it / suggest something we haven't tried.

    On three separate occasions, they have experienced a sharp rise in WebRTC calls just randomly disconnecting. On the first two occasions (beginning of July and beginning of August), this lasted for one day and then went away. We are in the middle of the third occurrence right now. We have a ticket open with Genesys, but of course today is Mental Health Friday and they are essentially closed. I have been on hold for over 30 minutes trying to call them!

    So, specifics (not sure which of the following is relevant, but here you go!)

    • All affected users are remote.
    • Customer has a very strict security policy, so connection to Genesys is from Whitelisted IPs only, therefore users connect over a VPN. They use Split Tunneling to divert the WebRTC audio stream directly to AWS. Signaling goes over the VPN though.
    • Until about half an hour ago, they were blocking access to the Google STUN/TURN service, relying only on Genesys' own (which Care said should not be an issue). They do, however, see a lot of error 701s for both services.
    • We are seeing a lot of warnings "HTML5 Audio pool exhausted, returning potentially locked audio object".
    • We have seen some errors from Phone Integration service "Request was rejected because user is not permitted to perform this operation"
    • It's intermittent.
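
    For anyone wanting to check STUN reachability outside the browser (an ICE candidate error 701 means the browser could not reach the STUN/TURN server at all), here is a minimal sketch of a STUN Binding Request probe in Python, following the RFC 5389 message layout. It is an illustration, not anything supplied by Genesys: the function names are mine, the default port 3478 is only the standard STUN port, and the server host you pass in is whatever your own ICE configuration lists.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding Request with no attributes."""
    tid = transaction_id or os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + tid, tid


def parse_binding_response(data, transaction_id):
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    if len(data) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", data[:8])
    if (msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE
            or data[8:20] != transaction_id):
        return None
    pos, end = 20, min(len(data), 20 + msg_len)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            # Port and address are XORed with the magic cookie on the wire.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def probe_stun(host, port=3478, timeout=2.0):
    """Send one Binding Request over UDP; return the reflexive (ip, port) or None."""
    packet, tid = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (host, port))
            data, _ = sock.recvfrom(2048)
        except OSError:  # covers timeouts, DNS failures and unreachable routes
            return None
    return parse_binding_response(data, tid)
```

    Calling probe_stun() with the STUN host from the client's ICE configuration, once with the VPN up and once with it down, would show whether the server answers on each path and which public address it sees. A None result on a path is only a hint (UDP loss looks the same as a block), so it is worth repeating a few times.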

    I am waiting for them to install Wireshark on remote PCs to get a packet capture. Does anyone have any ideas?


    Paul Simpson
    Eventus Solutions Group

    Posted 08-20-2023 17:51

    Some of this sounds like an issue with the local device, but if it is intermittent, it could be other applications conflicting.  The best way to see this would be the console and network logs on the local machine.  Also, make sure they are all using audio profiles in Genesys to define the device.  The other thing it sounds like is a web proxy or load balancer like an F5 causing redirects in the SIP traffic.  SIP ALGs can also cause this.  I don't think the Wireshark capture will show too much beyond what the console log or the WebRTC Sysinternals would show.  The HTML5 issue looks suspiciously like a problem with a button press not getting through to Genesys Cloud.  One thing you could try is to allow direct signaling and https to split out from the VPN to or (or your home region) and see if the problem goes away; if it does, the cause is the router, firewall, or VPN fabric.  

    Robert Wakefield-Carl
    ttec Digital
    Sr. Director - Innovation Architects

    Posted 08-21-2023 17:26

    Thanks, Robert.

    The first problem we have is that it's random. It's not even "intermittent" as such. It lasts for a day and then goes away. (The customer normally sees fewer than 3% of calls marked as "dropped"; on the days in question, this rises to 10-15%.) So any tests / attempted fixes are difficult to verify! The fact that this most recent occurrence coincided with Genesys' "August Mental Health Fridays" really didn't help either....
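
    One way to make a test or fix easier to verify is to ask how likely a given day's drop count would be at the normal rate. The sketch below computes a binomial tail probability for that. It assumes drops are independent of each other (which may well not hold if one network event knocks out many calls at once), and the function names are mine. The 3% baseline comes from the figures above; any call volume you plug in is your own number.

```python
from math import exp, lgamma, log


def log_binomial_pmf(n, k, p):
    """Log of P(X == k) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)


def drop_tail_probability(total_calls, dropped_calls, baseline_rate):
    """P(at least `dropped_calls` drops out of `total_calls`) at the baseline rate.

    Working in log space avoids overflow in the binomial coefficient
    when the number of calls is in the thousands.
    """
    return sum(
        exp(log_binomial_pmf(total_calls, k, baseline_rate))
        for k in range(dropped_calls, total_calls + 1)
    )
```

    A very small result for a "bad" day says the spike is unlikely to be ordinary day-to-day variation; a result that is not small says the same count could easily occur on a normal day, so a fix applied that day cannot be judged from the drop rate alone.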

    We have supplied the logs you mention to Genesys and it was they who requested Wireshark captures. I think they are hoping to have verifiable proof of the path the packets are taking. Of course, they requested captures from various places along the way, but given the RTP is going over the public internet, that's not happening any time soon! The only thing they can tell us from the logs is that they get an "ICE Disconnect Error", but they are unable to say if that's due to RTP being interrupted, or an issue with the signaling. Beyond "it's the network", they haven't been able to shed any light on the matter.

    Unfortunately, we are unable to bypass the VPN for Gen Cloud entirely. They have IP whitelisting in place, which means the client will only run on their site, or via the VPN. It's the WebRTC where it starts to get more murky. From what I am told (yeah, I have to rely on information from their team on the ground!) the STUN traffic goes out both directly and via the VPN. Now, I'm thinking that when it goes over the VPN, that would cause problems, but even if it is related, it doesn't explain the 1-day peaks, followed by normal operations for a couple of weeks. I suggested blocking STUN from the VPN (just to be sure) but they are not keen to do that without some evidence that it will help.
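
    To get some of that evidence without a packet capture, a remote PC can be asked which local interface address the OS would use to reach a given destination. A minimal sketch follows, assuming IPv4 and a known VPN address pool; the function names, the default port, and the vpn_subnet argument are all hypothetical and need to be replaced with the customer's real values. Connecting a UDP socket sends no packets, it only makes the OS pick a route.

```python
import ipaddress
import socket


def local_address_for(dest_ip, dest_port=3478):
    """Return the local source IP the OS routing table selects for dest_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((dest_ip, dest_port))  # UDP connect: no traffic is sent
        return sock.getsockname()[0]


def routed_via_vpn(dest_ip, vpn_subnet, dest_port=3478):
    """True if traffic to dest_ip would leave from an address in vpn_subnet.

    vpn_subnet is the address pool handed out by the VPN, e.g. "10.8.0.0/16"
    (a made-up value for illustration).
    """
    source = ipaddress.ip_address(local_address_for(dest_ip, dest_port))
    return source in ipaddress.ip_network(vpn_subnet)
```

    Running routed_via_vpn() against the resolved STUN/TURN and media addresses, on an affected PC during a bad day and again on a good day, would show whether the split tunnel is sending that traffic where everyone thinks it is. It only reflects the routing table at that moment, so it complements a capture rather than replacing one.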

    I agree about load balancers etc., however this does not affect office-based users at all, so the issue has to be something to do with either the users' home networks / ISP or the VPN. One thing we have noted is that it would seem all affected users share the same ISP. Of course, the ISP denies any responsibility for this!

    I haven't mentioned Audio Profiles to them, but I will (can't hurt, right?) but again, that doesn't account for the sporadic nature of it all. The only thing I can think of is some localized "event" that is delaying traffic. Maybe a heavy internet user who is on the same ISP? I have asked if they were pushing out Windoze updates, or performing backups, or anything else at the time, but apparently not. No unusual or different applications being run.

    We have requested involvement from Genesys' PS to perform a network review. I'm hoping that if I can get some folks who thoroughly understand the networking from Genesys' side to talk to the customer's network team, we might make some progress! (For example, we know we have IP address whitelisting in place, but it isn't clear exactly which services / ports are affected by that.) My gut is telling me the Split Tunnel isn't helping, but the only alternative is to push everything over the VPN, which will overload their firewalls and probably introduce an unacceptable level of latency.

    Oh, and to cap it all, Micro$oft Teams' WebRTC phones work without any issues! One thing I've learned from this is that every vendor's implementation of WebRTC is slightly different - so much for standards!!!

    It's a head-scratcher, that's for sure! 😲

    Paul Simpson
    Eventus Solutions Group

