The mitigation for the instability of STM8 EEPROM
I hope you read this article if you have a project involving EEPROM on an STM8 controller. Let me walk you through the hardware bug I found, along with my proposed mitigation.
The symptom of the bug
You might see no symptoms at all on your hardware and software combination. However, it is not that uncommon to stumble on a faulty STM8 chip that occasionally doesn’t let you write to the EEPROM. In other words, you can’t reliably write data to the EEPROM: sometimes it works, sometimes it doesn’t. If this is precisely why you are here, you have come to the right place.
Encountering the bug
I am not the first one to find the bug; I read about it in a post on Mark Stevens’ blog.
I guess he is a big fan of STM8. His tutorial articles about STM8 are very detailed, which is why they also cover the strange behavior he found.
Basically, I did the same thing and witnessed the same thing as he did. I was following the instructions from the STM8 reference manual and very carefully made sure I did everything right.
I compared my code with the references I found, I compared it with Mark’s, and I walked through the standard peripheral driver code released by STMicroelectronics themselves, but I saw nothing wrong with my code.
However, from reading Mark’s article, I knew I was not the only one. After some struggling and a bit of luck, I finally figured out what was going on.
What went wrong?
Our KEY1 is the CPU’s KEY2, and our KEY2 is the CPU’s KEY1.
This is what happened:
The hardware keys (MASS keys) should always be 0x56 followed by 0xAE for both EEPROM and FLASH memory.
There is no need to reverse the order for EEPROM.
The document is wrong; the official driver is wrong; everything else is just wrong. As simple as that.
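To make the difference concrete, here is a minimal sketch of the two write orders in C. The function names are hypothetical, and the register is passed in as a pointer so that no address is hard-coded; on a real part you would pass the address of FLASH_DUKR taken from your device's datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MASS_KEY_56 0x56u
#define MASS_KEY_AE 0xAEu

/* Reversed order, as given for data EEPROM in the reference manual
 * and the official driver: 0xAE first, then 0x56. */
void eeprom_unlock_documented(volatile uint8_t *dukr)
{
    assert(dukr != NULL);
    *dukr = MASS_KEY_AE;
    *dukr = MASS_KEY_56;
}

/* Same order as for FLASH: 0x56 first, then 0xAE.
 * This is the order that the chip described in this article accepted. */
void eeprom_unlock_same_as_flash(volatile uint8_t *dukr)
{
    assert(dukr != NULL);
    *dukr = MASS_KEY_56;
    *dukr = MASS_KEY_AE;
}
```

The only difference between the two functions is the order of the two writes, which is exactly what makes the problem so easy to overlook in a code review.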
But on second thought, I was wrong
The world is not full of simpletons, and ST’s engineers are hardly among them.
My speculative instinct kicked in. My guess is that what we see here is a deliberate fix for the bug, but one applied the wrong way around and based on wrong assumptions about the underlying flaw.
What did ST do?
I guess that they, just like me, concluded that something had gone wrong and reversed the order of the MASS keys for EEPROM everywhere: they edited the documents, made changes to the official drivers, and so on. This is probably the reason why we see the reverse-ordered keys for EEPROM unlocking in the reference manual and, ultimately, a comment to that effect in the official driver.
However, at this point, we know that the fix failed. There is hardware that accepts the keys in ascending order (mine, for instance) and hardware that accepts them in reverse order.
The root of everything
At this point, I can only guess that the real underlying flaw is that the internal state of the FLASH_DUKR register does not get reset properly under some circumstances.
This can’t be solved merely by reversing the order of the MASS keys; no matter what the order is, there will always be exceptional cases.
The workaround/mitigation
Fortunately, a software mitigation for the bug is possible. We can exploit the fact that the STM8 allows a wrong attempt at the first byte of the MASS keys without locking up the device.
Even if we don’t know which byte the FLASH_DUKR register will consider the first byte of the unlocking sequence, we can just brute-force the keys.
Now, there are two possible scenarios with the workaround, depending on which byte the chip accepts as the first key: the device unlocks with the keys in one order or in the other. If you want to be more cautious, you can also use the watchdog timer to guard against the device locking up.
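As a rough illustration, the brute-force idea can be tried out on a host machine with a small model of the key check. Everything in this sketch is an assumption made for illustration: the `dukr_model` type and the function names are hypothetical, the state machine is a simplification of the behavior described above (a wrong first byte is tolerated, a wrong second byte locks the EEPROM until reset), and the three-write sequence is one possible order, not necessarily the exact one used in the original project.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the FLASH_DUKR key check (an assumption, see above). */
typedef enum { WAIT_FIRST, WAIT_SECOND, UNLOCKED, LOCKED_UNTIL_RESET } dukr_state;

typedef struct {
    uint8_t first_key;   /* byte this particular chip expects first  */
    uint8_t second_key;  /* byte this particular chip expects second */
    dukr_state state;
} dukr_model;

/* Model of a single write to FLASH_DUKR. */
void dukr_write(dukr_model *chip, uint8_t value)
{
    assert(chip != NULL);
    switch (chip->state) {
    case WAIT_FIRST:
        /* A wrong first byte is tolerated: keep waiting for the first key. */
        if (value == chip->first_key)
            chip->state = WAIT_SECOND;
        break;
    case WAIT_SECOND:
        /* A wrong second byte locks the EEPROM until the next reset. */
        chip->state = (value == chip->second_key) ? UNLOCKED
                                                  : LOCKED_UNTIL_RESET;
        break;
    default:
        break; /* further writes are ignored in this model */
    }
}

bool dukr_unlocked(const dukr_model *chip)
{
    return chip->state == UNLOCKED;
}

/* Plain two-byte unlock in a fixed order, for comparison. */
bool eeprom_unlock_fixed(dukr_model *chip, uint8_t k1, uint8_t k2)
{
    dukr_write(chip, k1);
    dukr_write(chip, k2);
    return dukr_unlocked(chip);
}

/* Brute-force unlock: if the chip expects 0xAE then 0x56, the first two
 * writes unlock it. If it expects 0x56 then 0xAE, the first 0xAE is a
 * tolerated wrong first byte, and 0x56 followed by 0xAE completes the
 * sequence. */
bool eeprom_unlock_bruteforce(dukr_model *chip)
{
    dukr_write(chip, 0xAE);
    dukr_write(chip, 0x56);
    if (!dukr_unlocked(chip))
        dukr_write(chip, 0xAE);
    return dukr_unlocked(chip);
}
```

In this model, `eeprom_unlock_bruteforce` succeeds on both a chip that expects 0xAE then 0x56 and one that expects 0x56 then 0xAE, whereas `eeprom_unlock_fixed` with the reversed order leaves the second kind locked. On real hardware, the `dukr_write` calls would become writes to the FLASH_DUKR register, and `dukr_unlocked` would become a check of the EEPROM unlock status flag.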
The faulty chip gradually changed
This old hardware bug managed to stay under the radar for at least five years.
Mark’s article was published in 2013, and my discovery and mitigation came in 2018.
After running the mitigated solution for some time, the faulty STM8 gradually changed its behavior to match the reference manual. Now the buggy behavior is gone; the standard driver works just fine on the same faulty hardware that previously needed the mitigation. However, since we do not yet understand why it changed, I recommend using the mitigated solution for EEPROM unlocking regardless of the situation.
The kind of flaw that is hardest to catch is the one that surfaces inconsistently.
See some code in an actual project
This article is a rewrite of my 2018 article on GitHub.
The original article contains more information about my project back then, along with actual code you can compile.
Head there if you want to learn a few more things about this bug.