Genesys Cloud CX

Edges kernel panic

  • 1.  Edges kernel panic

    Posted 12-09-2021 09:22

    Hi there

    I'm wondering whether anyone else has had issues with their Edges, specifically kernel panics, and whether there is a known reason for them.

    I have 4 Edges with apparently the same issue, and no way to tell why it happens or since when.

    Care mentioned a fix in an upcoming update, with no ETA.



    #ArchitectureandDesign

    ------------------------------
    Mariano Martinez
    Interaxa S.A.
    ------------------------------


  • 2.  RE: Edges kernel panic

    Posted 12-10-2021 04:00
    Same here... really bad...
    I opened a case and am waiting for the fix. You will probably never know the root cause. As far as I understood from the case, it is something related to pairing...

    ------------------------------
    Gennaro Montanino
    ------------------------------



  • 3.  RE: Edges kernel panic

    Posted 12-10-2021 08:51

    Thanks for your answer, Gennaro.

    Question: do or did your Edges have any network connectivity issues, such as frequent disconnects?



    ------------------------------
    Mariano Martinez
    Interaxa S.A.
    ------------------------------



  • 4.  RE: Edges kernel panic

    Posted 12-10-2021 13:49
    There is no clear pattern to when it happens...

    ------------------------------
    Gennaro Montanino
    ------------------------------



  • 5.  RE: Edges kernel panic

    Posted 12-10-2021 14:20

    Thanks again for your help, Gennaro.

    Care informed me of a kernel update that solves this issue, but I'm concerned that it was not offered through automatic update and that the update process may damage the Edge.

    If your team has done the update, was it difficult?



    ------------------------------
    Mariano Martinez
    Interaxa S.A.
    ------------------------------



  • 6.  RE: Edges kernel panic

    Posted 12-11-2021 09:05
    Hi Mariano,
    I've just upgraded the Edge kernel for other reasons; it's very easy to do. You can use the instructions here: Manually update the software for a BYOC Premises Edge.
    Hope this helps you.

    ------------------------------
    Giuliano Ferri
    Indra Italia spa
    ------------------------------



  • 7.  RE: Edges kernel panic

    Posted 12-11-2021 09:33

    Hi Giuliano!
    What a pleasure to hear from you. Yes, indeed it is not complex. Right now we are struggling to get iLO working.

    The software should be stable, as far as I know from Support.



    ------------------------------
    Gennaro Montanino
    ------------------------------



  • 8.  RE: Edges kernel panic

    Posted 12-11-2021 14:57
    Hello Giuliano Ferri,
    Sorry, but the document you sent describes how to upgrade the Edge software, not the kernel. The kernel upgrade is a much more complicated procedure.

    ------------------------------
    Best regards,

    Yvgeni Liberman
    ITNavPro
    ------------------------------



  • 9.  RE: Edges kernel panic

    Posted 12-11-2021 16:14
    Sorry for leading you astray. I just did the firmware upgrade and I thought it was equivalent to the kernel upgrade. There's always something to learn...
    I apologize for that.

    ------------------------------
    Giuliano Ferri
    Indra Italia spa
    ------------------------------



  • 10.  RE: Edges kernel panic

    Posted 12-11-2021 14:45
    Hi,
    We also had this problem.
    It's a "well-known issue".
    The proposed solution was to replace the kernel with a new one, but the procedure is not that simple.
    You must open a case, and a Genesys engineer will help you.
    Fixing the first Edge took more than 4 hours; the third (and last) we fixed in 45 minutes (it takes time to learn).

    ------------------------------
    Best regards,

    Yvgeni Liberman
    ITNavPro
    ------------------------------



  • 11.  RE: Edges kernel panic

    Posted 12-19-2021 19:26
    Edited by Vaun McCarthy 12-19-2021 19:32
    For anyone else who comes looking for this thread, the following is the feedback I got from my own case with Genesys on this:

    Thank you for your patience thus far. After reviewing the ipmisel log, I do see that this Edge had an OS critical stop on 12/17/21 at 15:41:47 and went into panic mode.

     

    This usually happens to V3 Edges because they have a different CPU that enables a new instruction set called TSX. This has caused instability in the CentOS kernel. The latest kernel disables this instruction set, which stops the Edges from crashing. However, disabling TSX also changes the CPUID, which is used to calculate the pairing ID. As a result, the Edge gets a new pairing ID that does not match the one on its label and invalidates its current paired status, so the Edge shows an invalid configuration.
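
    As a rough way to see whether a given Linux kernel currently exposes TSX, one can look for the `hle` and `rtm` feature flags in `/proc/cpuinfo`. The sketch below assumes shell access to a Linux host, which may not be available on an Edge appliance; the function names are illustrative and not part of any Genesys tooling.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def tsx_flags_present(cpuinfo_text: str) -> set[str]:
    """Return whichever TSX-related flags (hle, rtm) the kernel reports."""
    return cpu_flags(cpuinfo_text) & {"hle", "rtm"}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        found = tsx_flags_present(path.read_text())
        print("TSX flags reported:", sorted(found) if found else "none")
    else:
        print("/proc/cpuinfo is not available on this system")
```

    An empty result means the running kernel is not advertising TSX, either because the CPU lacks it or because it has been disabled.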

     

    Our Dev team is aware of this issue and is working on the fix, but it is an OS fix and takes a while; the current ETA for the fix to be released is January 2022. I have attached the SERVOPS ticket to this case to monitor their progress, and I will update you when the fix is released. In the meantime, we will have to keep an eye on the Edge.

     

    Additionally, this issue can be resolved by manually updating the Edge kernel image; the latest kernel image disables the instruction set, which stops the Edges from crashing. It is important to note, however, that several issues can be encountered during manual application of the kernel image:

     

    1. Boot loop  

    After an attempt to manually upgrade the Linux kernel, an Edge v3 can get stuck in an infinite boot loop.

    The device goes through two POST screens, runs file validation and patching, completes, and then reboots into the exact same cycle again. When a boot loop happens, the network configuration is also lost.

     

    2. Invalid Configuration

    Disabling TSX also changes the CPUID, which is used to calculate the pairing ID. As a result, the Edge gets a new pairing ID that does not match the one on its label and invalidates its current paired status, so the Edge shows an invalid configuration.
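
    To illustrate why hiding CPU features can change a derived identifier, here is a purely hypothetical sketch. It is not how Genesys computes the pairing ID (that algorithm is not described in this thread); the function `fingerprint_id`, the serial value, and the use of SHA-256 are all assumptions made for illustration only.

```python
import hashlib


def fingerprint_id(cpu_features: set[str], serial: str) -> str:
    """Hypothetical ID: hash of a serial number plus the sorted CPU feature list."""
    material = serial + "|" + ",".join(sorted(cpu_features))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:12]


# Same (made-up) hardware serial, but the second call omits the TSX flags,
# as a kernel that disables TSX would no longer report them.
before = fingerprint_id({"fpu", "avx2", "hle", "rtm"}, serial="EXAMPLE-0001")
after = fingerprint_id({"fpu", "avx2"}, serial="EXAMPLE-0001")

print("ID changed:", before != after)
```

    Any scheme that mixes reported CPU features into an identifier would behave this way, which is consistent with the Edge losing its paired status after the kernel change.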

     

    So, even though all of the issues listed above have corrective actions, I don't think this temporary fix is ideal for your Org, because of the other issues we may encounter while trying to solve this one.

     

    Development is working on a fix to change how CPUID is calculated. This will allow you to do a manual update of CentOS without having to unpair. As I mentioned earlier, this release is targeted to be available by January 2022 (that should be in 4 weeks' time or less).

     

    Lastly, development will also be working on a fix that can be included in a regular Edge build. This will not require a manual update. This fix has a high risk of affecting all Edges, so it will need to be tested rigorously to ensure that it works on all versions. The target is Q1 2022, but that might change depending on testing.

     

    ------------------------------

    Vaun McCarthy
    ------------------------------



  • 12.  RE: Edges kernel panic

    Posted 12-20-2021 09:28

    Thanks for this info, Vaun. It's infinitely more useful than what customer care gave me when I was trying to understand this issue.

    But so far, is there any known condition that causes the kernel to go into panic mode, or is it random?
    I have 4 Edges ready to go live and no way to fix this issue.
    I've been waiting for the kernel upgrade instructions for the last few days, but all this information is useful.

    Thanks and kindest regards



    ------------------------------
    Mariano Martinez
    Interaxa S.A.
    ------------------------------



  • 13.  RE: Edges kernel panic

    Posted 12-21-2021 12:39
    Well, since the ETA is January, I'm not sure it makes sense. I also got the instructions to upgrade manually, but they are quite "manual" and you need engineering support from Technical Support, meaning you must do it with them.
    I will wait until January. What I find interesting is that such a huge bug takes so long to fix...

    ------------------------------
    Gennaro Montanino
    ------------------------------



  • 14.  RE: Edges kernel panic

    Posted 12-21-2021 13:21
    I'm worried about the same thing.
    I have 4 Edges with the same issue, and I still have not received any instructions.

    ------------------------------
    Mariano Martinez
    Interaxa S.A.
    ------------------------------



  • 15.  RE: Edges kernel panic

    Posted 03-23-2022 11:25
    Hello, we are facing the same issue, and the latest update is that there is still no ETA for the fix.

    ------------------------------
    Charis Sideridis
    Intracom S.A. Telecom Solutions
    ------------------------------



  • 16.  RE: Edges kernel panic

    Posted 5 days ago
    Are you still having the kernel panic issues? I have at least one Edge that may be experiencing this issue.

    The problem is the Edge will show Disconnected in Genesys Cloud, but will still respond to SIP requests from our Ingate SBC, which prevents SIP requests from rolling to the next Edge in the list.

    Also, the console shows the following repeating error:
    rcu_sched self-detected stall on CPU
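
    If the console output can be captured to text (for example, saved from a remote console session), a small script can count how often this message appears. This is a minimal sketch under that assumption; the `kernel_panic` pattern is the usual prefix Linux prints on a panic and is included as a guess at what else might be worth counting, not something reported in this thread.

```python
import re

# Signatures to look for. The first is the console message quoted above;
# the second is the standard prefix of a Linux kernel panic message.
SIGNATURES = {
    "rcu_stall": re.compile(r"rcu_sched self-detected stall on CPU"),
    "kernel_panic": re.compile(r"Kernel panic - not syncing"),
}


def count_signatures(log_text: str) -> dict[str, int]:
    """Count how many lines of the captured console text match each signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

    Passing the captured text to `count_signatures` returns a per-signature count, which may help when describing the frequency of the stall in a support case.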

    ------------------------------
    George Beikler
    EFG Companies
    ------------------------------