Solana’s latest downtime caused by code bug and malfunctioning node
The Solana blockchain downtime on Sept. 30 was caused by a bug in the blockchain’s code and triggered by a malfunctioning hot-spare node that produced duplicate blocks.
A hot-spare node is a second node that a validator keeps online as a backup in case the main one fails. In this case, however, the spare node became active and ran alongside the main one, according to an update from the Solana Foundation. As a result, the two nodes submitted different blocks to the network, producing parallel blocks.
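The duplicate-block situation described above can be illustrated with a short sketch. This is a simplified illustration, not Solana's actual implementation: the `Block` type, the `find_duplicate_slots` helper, and the validator name, slot numbers and hashes are all hypothetical. It assumes only that each block records the slot it claims, who produced it and a content hash, and it flags any slot where the same producer submitted more than one distinct block.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical, minimal block record used only for this illustration."""
    slot: int        # position in the chain that the block claims
    producer: str    # identity of the validator that produced it
    block_hash: str  # hash of the block's contents


def find_duplicate_slots(blocks):
    """Map (producer, slot) to the sorted hashes seen, for conflicting slots only."""
    seen = defaultdict(set)
    for block in blocks:
        seen[(block.producer, block.slot)].add(block.block_hash)
    # A slot is a duplicate when one producer submitted more than one distinct block.
    return {key: sorted(hashes) for key, hashes in seen.items() if len(hashes) > 1}


# Made-up data: the main node and the hot spare both produce a block for slot 11.
blocks = [
    Block(slot=10, producer="validator-A", block_hash="aa11"),
    Block(slot=11, producer="validator-A", block_hash="bb22"),  # main node
    Block(slot=11, producer="validator-A", block_hash="cc33"),  # hot spare
]
duplicates = find_duplicate_slots(blocks)
print(duplicates)
```

In this toy example, only slot 11 is flagged, because it is the only slot with two different blocks from the same producer. A real network would then have to choose one of them, as it does with any small fork.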
The network handled this well for the first 24 hours, choosing between the two competing blocks as it would with any small fork. At one point, however, the bug in the blockchain’s code left it unable to produce any more blocks after one of these choices.
“Even though the correct version of the block 221 was confirmed, a bug in the fork selection logic prevented block producers from building on top of 221 and prevented the cluster from achieving consensus,” said Austin Federa, head of comms at the Solana Foundation, in the update.
As a result, the blockchain went down for about seven hours, until the validators agreed on and implemented a fix to the code.