For anyone else who comes looking for this thread, here is the feedback I got from my own case with Genesys on this:
Thank you for your patience thus far. After reviewing the ipmisel log, I do see that this Edge had an OS critical stop on 12/17/21 at 15:41:47 and went into panic mode.
This usually happens to V3 Edges because they have a different CPU that enables a new instruction set called TSX. This has caused instability in the CentOS kernel. The latest kernel disables this instruction set, which will stop the Edges from crashing. However, disabling TSX also changes the CPUID, which is used to calculate the pairing ID. This causes the Edge to get a new pairing ID that does not match the one on its label and that invalidates its current paired status, causing the Edge to show invalid config.
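As a rough illustration of the point above (my own sketch, not something Genesys provided): on a generic Linux host, the kernel lists TSX support in /proc/cpuinfo as the flags `rtm` and `hle`, and those flags are normally absent once the kernel has TSX disabled. The helper names `tsx_flags_in` and `tsx_exposed` are made up for this example, and whether you can get a shell on an Edge appliance to run anything like this is an assumption; check with Genesys support first.

```python
from pathlib import Path

# Flag names Linux reports in /proc/cpuinfo for the two parts of Intel TSX.
TSX_FLAGS = {"rtm", "hle"}


def tsx_flags_in(cpuinfo_text: str) -> set:
    """Return the TSX-related flags found on any 'flags' line of cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only inspect the per-CPU "flags" lines; ignore model name, etc.
        if sep and key.strip() == "flags":
            found |= TSX_FLAGS & set(value.split())
    return found


def tsx_exposed(path: str = "/proc/cpuinfo") -> bool:
    """True if the running kernel still exposes TSX on at least one CPU."""
    return bool(tsx_flags_in(Path(path).read_text()))
```

Calling `tsx_exposed()` before and after a kernel update would be one way to see whether the instruction set is still visible to the OS. It says nothing about the pairing ID itself, which only Genesys can confirm.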
Our Dev team is aware of this issue and is working on a fix, but it is an OS fix and will take a while. The current ETA for the fix to be released is Jan 2022. I have attached the SERVOPS ticket to this case to monitor their progress, and I will update you when the fix is released. In the meantime, we will have to keep an eye on the Edge.
Additionally, this issue can be resolved by manually updating the Edge kernel image; the latest kernel image disables the instruction set, which will stop the Edges from crashing. It is important to note that several issues can be encountered during manual application of the kernel image:
- Boot loop
After an attempt to manually upgrade the Linux kernel, an Edge v3 can get stuck in an infinite boot loop.
The device goes through two POST screens, then runs file validation and patching, completes, and reboots into the exact same cycle again. When a boot loop happens, the network configuration is also lost.
- Invalid Configuration
Disabling TSX also changes the CPUID, which is used to calculate the pairing ID. This causes the Edge to get a new pairing ID that does not match the one on its label and that invalidates its current paired status, causing the Edge to show invalid config.
So, even though all of the issues listed above have corrective actions, I don't think this temporary fix is ideal for your Org because of the other issues we may encounter while trying to solve this one.
Development is working on a fix to change how CPUID is calculated. This will allow you to do a manual update of CentOS without having to unpair. Just as I mentioned earlier, this release is targeted to be available by January 2022 (that should be in 4 weeks' time or less).
Lastly, development will also be working on a fix that can be included in a regular Edge build. This will not require a manual update. This fix has a high risk of affecting all Edges, so it will need to be tested rigorously to ensure that it works on all versions. The target is Q1 2022, but that might change depending on testing.
------------------------------
Vaun McCarthy
------------------------------
Original Message:
Sent: 12-11-2021 14:44
From: Yvgeni Liberman
Subject: Edges kernel panic
Hi,
We also had this problem.
It's a "well-known issue".
The proposed solution was to replace the kernel with the new one, but it's not a simple procedure.
You must open a case, and a Genesys engineer will help you.
"Fixing" the first Edge took more than 4 hours; the third (last) one we fixed in 45 minutes (it takes time to learn).
------------------------------
Best regards,
Yvgeni Liberman
ITNavPro
------------------------------