MCE hardware error finally resolved

My KDE Neon system has been crashing intermittently for the past 2 years, without a clear pattern or trigger. The only hint, I suppose, was that it often (though not always) happened while I was using graphics-intensive software, such as a video player or remote desktop. But it also happened during a cold boot, or when the monitors woke up from sleep. It was mostly unpredictable.

What made the diagnosis confusing is that the kernel log mostly showed errors like this:

mce: [Hardware Error]: CPU 19: Machine Check: 0 Bank 5: bea0000001000108
mce: [Hardware Error]: TSC 0 ADDR 7f0d1a36c7e5 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1689465218 SOCKET 0 APIC 7 microcode a201025
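To see why this reads like a CPU fault, here is a minimal Python sketch that decodes only the architectural flag bits (63 to 57) of the x86 machine-check status register, plus the low 16-bit error code. The function name `decode_mce_status` is my own, and everything else in the register is vendor- and bank-specific, so this is not a full AMD decode. For the logged value it reports VAL, UC, EN, MISCV, ADDRV and PCC: a valid, uncorrected error whose processor context may be corrupt, which fits these being fatal crashes. It also shows that the logged value and the one quoted by AMD's Richard T further down differ in just one bit (bit 24), whose meaning I'm not interpreting here.

```python
# Architectural flag bits of the x86 MCi_STATUS register (bits 63..57).
# Bits 15:0 hold the MCA error code; all other bits are vendor/bank
# specific and are deliberately NOT decoded in this sketch.
STATUS_FLAGS = {
    63: "VAL",    # the register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # the error was not corrected
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # the MISC register holds extra information
    58: "ADDRV",  # the ADDR register holds an address
    57: "PCC",    # processor context may be corrupt
}


def decode_mce_status(status: int) -> dict:
    """Return the set architectural flags and the 16-bit MCA error code."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {"flags": flags, "error_code": status & 0xFFFF}


if __name__ == "__main__":
    logged = 0xBEA0000001000108  # status from my kernel log
    quoted = 0xBEA0000000000108  # status quoted by AMD's Richard T

    for value in (logged, quoted):
        info = decode_mce_status(value)
        print(f"{value:016x}: flags={info['flags']} "
              f"error_code=0x{info['error_code']:04x}")

    # List the bit positions where the two status values differ.
    diff = logged ^ quoted
    print("differing bits:", [bit for bit in range(64) if (diff >> bit) & 1])
```

The MISC and ADDR fields in the second log line are populated, which matches the MISCV and ADDRV flags being set.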

On its face, this points to an unstable, crashing CPU. I have tried:

Nothing worked.

Recently, after another fatal crash, I investigated the matter further and came across these 2 pages, where a group of people had been debugging this issue together for almost 2 years:

Most of the users there described the same issue I had experienced, and they had made some amazing discoveries.

They found this comment from AMD's Richard T:

bea0000000000108 means the thread has stopped executing…this is longest timeout, all other hardware fault timers would/should fire before this. […] this case has lots of possible causes…OS, App, voltage , temp, board hardware(power delivery cases), memory (are you running ECC memory ?)

And a genius named Leonardo Gates commented:

This issue has been prevalent across all the RX 5000 series cards it seems (I've seen this on the 5500XT/5600XT/5700/5700XT) as on Windows, it will trigger a WHEA-18 error (Cache-Hierarchy error) while on Linux it gives this MCE, bea0000000000108. I highly suspect this is either a hardware errata or due to faulty hardware (as even people I've spoken to that did RMAs, still got the error). I'm genuinely wondering if there will ever be a fix for this since it's been almost 2 years and this is pretty much the only problem I have with my GPU.

In other words, the root cause was probably my GPU itself, an RX 5700XT, and had nothing to do with my CPU, RAM or motherboard.

The solution, then, was to upgrade to a different GPU, such as one from the RX 6000 or 7000 series.

So I picked up an RX 7600 today. Let's see how long this solution holds up.