    Posted 06-10-2021 13:52

    Curious if anyone else has run into this data issue.

    We have a 2020R2 switch pair configured and communicating across our WAN. We typically have no issues.
    There are times, however, when our MPLS/WAN has an issue, and during that time each IC server "stands alone".
    During this stand alone period, let's assume 50% of our calls route into the location of one IC server and the other 50% route into the location of our second IC server.
    What we have discovered is that calls routing into the IC server that has write access to our SQL server show workgroup interval data for the stand alone period.
    However, calls that come into the second IC server, which doesn't have real-time SQL write access during the WAN outage, do NOT get written to the workgroup interval data tables.
    Call Detail and Agent summary data both seem to be fine; the issue seems to affect only workgroup summary data.
    We typically reprocess PMQ once we restore the switch pair after the WAN/MPLS issue has been resolved.

    Has anyone run into data/reporting issues similar to this during a stand alone period?


    Posted 07-26-2021 16:13
    Received an update from support. They have suggested adding the following server parameter to help mitigate the issue of queue summary data showing zeros when we have a stand alone event and both IC servers operate independently.

    StatServer_DisableQPSLoggingOnBackup = Yes

    Does anyone have experience with this configuration and server parameter?

    GCAP Member
    Posted 07-26-2021 16:19
    We ran into an issue earlier this year as a side effect of some SQL performance issues, and we also have backup logging disabled with this setting.

    The idea is that since stats require processing before being written to the database, why not let the backup server do it and offload that work from the current primary CIC? The way they apparently avoid duplicate entries from both the backup and the primary writing the same data is by using a unique key in the database and allowing the write attempt to fail if the key is already present, on the assumption that the backup already completed the task.

    The problem is that if you have communication or performance issues with your database, data that should be retried via PMQ can instead be discarded entirely, because the key was already present from a failed/partial write by the backup server.

    I'm probably missing some points with the explanation, but the gist is that since only the primary will be writing stats to the DB, it can manage failures properly and consistently.
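
    To make the mechanism described above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the actual CIC schema or StatServer logic: the table `interval_stats`, its columns, the key format, and the idea that a partial write shows up as a row of zeros are all assumptions made purely for illustration.

```python
import sqlite3

def write_interval(conn, key, calls, source):
    """Insert one interval row; a duplicate key is treated as 'already written'."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute(
                "INSERT INTO interval_stats (interval_key, calls, source) "
                "VALUES (?, ?, ?)",
                (key, calls, source),
            )
        return "written"
    except sqlite3.IntegrityError:
        # Unique key already present: assume the other server finished the job.
        return "skipped"

conn = sqlite3.connect(":memory:")
# Hypothetical table; the unique key is what prevents duplicate rows.
conn.execute(
    "CREATE TABLE interval_stats ("
    "interval_key TEXT PRIMARY KEY, calls INTEGER, source TEXT)"
)

# Normal case: the backup writes first, the primary's attempt is skipped.
r1 = write_interval(conn, "WG1|16:00", 12, "backup")
r2 = write_interval(conn, "WG1|16:00", 12, "primary")

# Failure case (assumed): the backup leaves an incomplete row of zeros,
# so the primary's complete row hits the key and is dropped, not retried.
r3 = write_interval(conn, "WG1|16:30", 0, "backup")
r4 = write_interval(conn, "WG1|16:30", 15, "primary")

stored = conn.execute(
    "SELECT calls, source FROM interval_stats WHERE interval_key = ?",
    ("WG1|16:30",),
).fetchone()
print(r1, r2, r3, r4, stored)
```

    In this sketch the second interval keeps the backup's incomplete row, which is the kind of silent loss described above. With only one writer, a failed insert can be retried rather than mistaken for a completed one.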

    State of Utah

    Posted 07-26-2021 16:27
    Thanks for the reply, Aaron.

    Any ideas what disabling this server parameter would do in the event we have both IC servers standing alone due to a WAN event?

    Currently, we do not get accurate summary data for the former backup server; its summary data shows up as 0s for the time we stood alone. We do get accurate data from the former primary during this stand alone time frame.

    We're trying to avoid seeing no summary data for that former backup server, since the call detail records clearly show that calls came into the backup server during this stand alone window.

    Does that make any sense?

    Posted 07-30-2021 09:10
    Good discussion. This topic has come up a couple times within Product Management, and I think we need to take a deeper look at what's going wrong with Interaction Recovery such that stat server offloading has some sort of edge case where data is lost. I've spoken with Development and with Care. The next time this is reported to Care by a customer, Care is going to attempt to investigate the root cause with Development instead of simply asking for the flag to be set. Hopefully we can get to the bottom of things and issue a fix into the product in the future.

    Scott Thomas
    Director of PM

    Posted 09-15-2021 10:14
    Good morning all,

    Just to update this thread, we previously set the server parameter, StatServer_DisableQPSLoggingOnBackup = Yes.

    Since doing that we've switched over several times.

    Last week we had another network event in which our switch pair servers stood alone. During this stand alone window, the same issue seems to have occurred. Calls that came into our former backup server during this time do not show up in the interval data (SQL db), but call detail data does show up properly. Calls that came into the former primary server seem to show both interval and detail data properly in the SQL database.

    We have opened a new ticket with Customer Care on this event to try to understand why the interval data for calls flowing into one of the IC servers during this time frame shows as 0 while the other seems perfectly fine. It's also strange to us that detail data seems accurate regardless of which server handled the call; only the interval data is inaccurate.
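
    For anyone wanting to run the same comparison, here is a rough sketch of cross-checking call detail records against interval rows to find the affected windows. Everything in it is hypothetical: the server names `IC1`/`IC2`, the 30-minute interval length, the made-up timestamps, and the assumption that both data sets can be broken out per server. In practice the inputs would come from your own reporting queries.

```python
from collections import Counter
from datetime import datetime

def interval_start(ts, minutes=30):
    """Round a timestamp down to the start of its reporting interval."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)

def find_gaps(call_details, interval_rows, minutes=30):
    """Return (server, interval) pairs where detail records show calls
    but the interval summary is missing or reports zero calls."""
    detail_counts = Counter(
        (server, interval_start(ts, minutes)) for server, ts in call_details
    )
    summary = {(server, start): calls for server, start, calls in interval_rows}
    return sorted(
        key for key, n in detail_counts.items()
        if n > 0 and summary.get(key, 0) == 0
    )

# Made-up sample data: (server, call start time)
call_details = [
    ("IC1", datetime(2021, 1, 1, 10, 5)),
    ("IC1", datetime(2021, 1, 1, 10, 20)),
    ("IC2", datetime(2021, 1, 1, 10, 7)),
    ("IC2", datetime(2021, 1, 1, 10, 40)),
]
# Made-up sample data: (server, interval start, calls in summary)
interval_rows = [
    ("IC1", datetime(2021, 1, 1, 10, 0), 2),
    ("IC2", datetime(2021, 1, 1, 10, 0), 0),
]

gaps = find_gaps(call_details, interval_rows)
for server, start in gaps:
    print(server, start.isoformat())
```

    With this sample data, both IC2 intervals are flagged (one reports zero, one has no row at all), while IC1 is not.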

