simd: '0384' title: Alpenglow migration authors:
Migrate to Alpenglow from TowerBFT
Migrating from TowerBFT to Alpenglow consensus requires a safe handoff mechanism that does not roll back TowerBFT-confirmed user transactions.
This proposal depends on the following accepted proposals:
[SIMD-0326]: Alpenglow
Requires Alpenglow to be implemented in order to migrate
[SIMD-0307]: Add Block Footer
Specifies BlockMarker, a means of disseminating metadata in a block
Migration boundary slot:
Alpenglow genesis block:
Strong optimistic confirmation:
B is strong OC if there exists a block T such that B is the parent
of T and slot(B) + 1 = slot(T) and T contains vote transactions for B from
at least 82% of stake.

Pick a "migration boundary slot" (defined above) S as follows. Let X be
the feature activation slot. Let the migration boundary slot S be X + 5000, so as to
avoid the beginning of an epoch.
After the migration boundary slot S, wait for some block B with slot(B) >= S to reach
strong optimistic confirmation.
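The boundary rule and the strong optimistic confirmation check can be sketched as follows. This is only an illustration under stated assumptions, not the validator implementation: `BlockInfo`, its fields, and both function names are hypothetical, and a block is identified by its slot alone, which ignores forks.

```rust
/// Slots between feature activation and the migration boundary (X + 5000).
const MIGRATION_SLOT_OFFSET: u64 = 5000;
/// Strong optimistic confirmation threshold, in percent of total stake.
const STRONG_OC_THRESHOLD_PCT: u128 = 82;

/// Hypothetical summary of a block, for illustration only.
pub struct BlockInfo {
    pub slot: u64,
    pub parent_slot: u64,
    /// Stake whose vote transactions for the parent block are contained in this block.
    pub vote_stake_for_parent: u64,
}

/// Migration boundary slot S = X + 5000, where X is the feature activation slot.
pub fn migration_boundary_slot(feature_activation_slot: u64) -> u64 {
    feature_activation_slot + MIGRATION_SLOT_OFFSET
}

/// Returns true if the block at `b_slot` is strong OC as witnessed by block `t`:
/// `t` is its child in the very next slot and carries votes from at least 82% of stake.
pub fn is_strong_oc(b_slot: u64, t: &BlockInfo, total_stake: u64) -> bool {
    let is_direct_child = t.parent_slot == b_slot && b_slot.checked_add(1) == Some(t.slot);
    // Integer form of vote_stake / total_stake >= 82 / 100, avoiding floating point.
    let enough_stake = total_stake > 0
        && (t.vote_stake_for_parent as u128) * 100
            >= (total_stake as u128) * STRONG_OC_THRESHOLD_PCT;
    is_direct_child && enough_stake
}
```

Note that a child block that skips a slot does not count, even if it carries votes from all of the stake, because the definition requires slot(B) + 1 = slot(T).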
Find the most recent ancestor block G of B from before the migration
boundary slot S. This is the Alpenglow genesis block. Cast a BLS vote,
the "genesis vote", for G via all-to-all broadcast. Note that validators should have
filled out their BLS keys prior to the feature flag activation, and will
sign this genesis vote with that BLS key.
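Selecting the genesis block amounts to walking parent links from B until the first block before the boundary. A minimal sketch, assuming a hypothetical `parents` map from a block's slot to its parent's slot on a single fork (a real implementation would track block IDs, not just slots):

```rust
use std::collections::HashMap;

/// Walks parent links from `b_slot` until reaching a block whose slot is before
/// the migration boundary slot `s`, and returns that slot.
/// Expects `b_slot >= s`; returns `None` if an ancestor is missing from `parents`
/// or a parent link does not point to an earlier slot.
pub fn find_genesis_slot(parents: &HashMap<u64, u64>, b_slot: u64, s: u64) -> Option<u64> {
    let mut current = b_slot;
    while current >= s {
        let parent = *parents.get(&current)?;
        if parent >= current {
            // Malformed link: a parent must always be in an earlier slot.
            return None;
        }
        current = parent;
    }
    Some(current)
}
```

For example, with S = 5000 and a fork 4997 <- 4998 <- 5001 <- 5003, starting from the block in slot 5003 yields the block in slot 4998.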
If we observe >=82% genesis votes for the ancestor block G, this
constitutes the genesis certificate, and G is the genesis block for
Alpenglow. Validators will rebroadcast their genesis votes every
GENESIS_VOTE_REFRESH = 400ms (i.e. once a slot) until this
genesis certificate is observed. During this period they perform regular
TowerBFT consensus for all blocks.
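The certificate condition can be illustrated with a small tally. This sketch makes several assumptions that the text does not state: the 82% threshold is measured in stake, validators are identified by a plain integer, the genesis block is identified by its slot, and BLS signature verification and aggregation are omitted. `GenesisVoteTally` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// Genesis certificate threshold, in percent (assumed here to be stake-weighted).
const GENESIS_CERT_THRESHOLD_PCT: u128 = 82;

/// Hypothetical tally of received genesis votes.
#[derive(Default)]
pub struct GenesisVoteTally {
    /// validator id -> (genesis slot voted for, validator stake)
    votes: HashMap<u32, (u64, u64)>,
}

impl GenesisVoteTally {
    /// Records a genesis vote. Only the first vote per validator is kept, so the
    /// periodic refreshes of the same vote are not counted twice.
    pub fn record(&mut self, validator: u32, genesis_slot: u64, stake: u64) {
        self.votes.entry(validator).or_insert((genesis_slot, stake));
    }

    /// Returns true once votes for `genesis_slot` reach 82% of `total_stake`.
    pub fn has_certificate(&self, genesis_slot: u64, total_stake: u64) -> bool {
        let voted: u128 = self
            .votes
            .values()
            .filter(|(slot, _)| *slot == genesis_slot)
            .map(|(_, stake)| *stake as u128)
            .sum();
        total_stake > 0 && voted * 100 >= (total_stake as u128) * GENESIS_CERT_THRESHOLD_PCT
    }
}
```

Keying the tally by validator is what makes the 400ms refresh harmless: a rebroadcast vote overwrites nothing and adds no stake.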
Anytime a correct validator receives a genesis certificate for a slot G
(either constructed themselves, received through replaying a block, or received
from all-to-all broadcast), they:

- Start the votor module with G as genesis, and disable
TowerBFT for any slots past G.
- Include the genesis certificate as a GenesisBlockMarker
for any blocks that are direct children of G. This means anybody
replaying any of the initial Alpenglow blocks must see the
genesis certificate.
- Remove any blocks past slot(G) from
blockstore, and reset all associated state (replay, AccountsDB) to block
G, as is currently done when we roll back duplicate blocks.

Anytime a validator receives a genesis certificate validated through
replaying the header of a block, they store the certificate in a
migration success off-curve account at
Pubkey::find_program_address(&[b"carlgration"], &alpenglow::id()).
This means all snapshots descended from the block will contain this account
and signal to validators that they should initiate Alpenglow after unpacking
the snapshot.
Alternatively, whenever a correct validator that has not yet detected a
genesis certificate receives an Alpenglow finalization certificate for
some block X that it can verify, it should repair/replay all the
ancestors of X.
Once an Alpenglow finalization certificate is received via all-to-all or via replaying a block, validators can stop broadcasting the genesis certificate, as the Alpenglow finalization certificate is sufficient proof of the cluster's successful migration.
On validator restart from a snapshot, if the migration feature flag is active:

- If the migration success account is empty in the snapshot, we
enter step 1.
- If the migration success account contains a certificate and the
certificate is valid, immediately enter Alpenglow.

In order to disseminate the genesis certificate in the initial Alpenglow block,
we add a new BlockMarker to the specification of SIMD-0307 with variant ID 3:
GenesisBlockMarker:
+---------------------------------------+
| Genesis Slot                (8 bytes) |
+---------------------------------------+
| Genesis Block ID           (32 bytes) |
+---------------------------------------+
| BLS Signature             (192 bytes) |
+---------------------------------------+
| Validator bitmap length     (8 bytes) |
+---------------------------------------+
| Validator bitmap      (max 512 bytes) |
+---------------------------------------+
Total size: max 752 bytes
The full serialization of this component is:
+---------------------------------------+
| Entry Count = 0             (8 bytes) |
+---------------------------------------+
| Marker Version = 1          (2 bytes) |
+---------------------------------------+
| Variant ID = 3               (1 byte) |
+---------------------------------------+
| Length = max 752            (2 bytes) |
+---------------------------------------+
| Genesis Slot                (8 bytes) |
+---------------------------------------+
| Genesis Block ID           (32 bytes) |
+---------------------------------------+
| BLS Signature             (192 bytes) |
+---------------------------------------+
| Bitmap length (max 512)     (8 bytes) |
+---------------------------------------+
| Validator bitmap      (max 512 bytes) |
+---------------------------------------+
Total size: max 765 bytes
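The layout above can be exercised with a short serialization sketch that reproduces the 752-byte and 765-byte maxima. It assumes little-endian encoding for all integer fields, which the layout does not state; the struct and method names are illustrative and are not taken from SIMD-0307 or the validator code.

```rust
/// Field sizes and constants taken from the layout above.
const BLS_SIGNATURE_LEN: usize = 192;
const MAX_BITMAP_LEN: usize = 512;
const MARKER_VERSION: u16 = 1;
const GENESIS_VARIANT_ID: u8 = 3;

/// Illustrative in-memory form of the marker (names are hypothetical).
pub struct GenesisBlockMarker {
    pub genesis_slot: u64,
    pub genesis_block_id: [u8; 32],
    pub bls_signature: [u8; BLS_SIGNATURE_LEN],
    pub validator_bitmap: Vec<u8>,
}

impl GenesisBlockMarker {
    /// Serializes the marker body (at most 752 bytes).
    /// Returns `None` if the bitmap exceeds 512 bytes.
    pub fn payload(&self) -> Option<Vec<u8>> {
        if self.validator_bitmap.len() > MAX_BITMAP_LEN {
            return None;
        }
        let mut out = Vec::with_capacity(8 + 32 + BLS_SIGNATURE_LEN + 8 + self.validator_bitmap.len());
        // ASSUMPTION: integers are little-endian; the layout does not specify endianness.
        out.extend_from_slice(&self.genesis_slot.to_le_bytes());
        out.extend_from_slice(&self.genesis_block_id);
        out.extend_from_slice(&self.bls_signature);
        out.extend_from_slice(&(self.validator_bitmap.len() as u64).to_le_bytes());
        out.extend_from_slice(&self.validator_bitmap);
        Some(out)
    }

    /// Serializes the full component: 13 bytes of framing followed by the body
    /// (at most 765 bytes in total).
    pub fn serialize(&self) -> Option<Vec<u8>> {
        let payload = self.payload()?;
        let mut out = Vec::with_capacity(8 + 2 + 1 + 2 + payload.len());
        out.extend_from_slice(&0u64.to_le_bytes()); // Entry Count = 0
        out.extend_from_slice(&MARKER_VERSION.to_le_bytes()); // Marker Version = 1
        out.push(GENESIS_VARIANT_ID); // Variant ID = 3
        out.extend_from_slice(&(payload.len() as u16).to_le_bytes()); // Length (<= 752)
        out.extend_from_slice(&payload);
        Some(out)
    }
}
```

The cast of the body length to `u16` cannot truncate, because the bitmap length check bounds the body at 752 bytes.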
First, note that it is always safe to roll back a block past the migration boundary slot, because we have stopped packing user transactions.
Next we show that if two correct validators switch to Alpenglow, they must pick
the same genesis block G.
To switch to Alpenglow, both correct validators must observe strong optimistic
confirmation on some blocks B and B' past the migration boundary. It is
guaranteed by optimistic confirmation that B and B' are on the same fork,
and must have the same ancestors. This means all correct validators must cast
a genesis vote on the same ancestor block.
Because at most 19% of the cluster is malicious, it is impossible to construct two
>=82% genesis certificates for different blocks, so all correct validators that switch must have
observed the same genesis certificate for the same block.
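The arithmetic behind this claim is a quorum intersection bound: two certificates that each carry at least 82% must share at least 2 * 82% - 100% = 64% of voters, and since correct validators cast a genesis vote for only one block, that overlap would have to be entirely Byzantine. A small sketch of the check, with hypothetical function names and percentages as plain integers:

```rust
/// Minimum percentage that must appear in both of two certificates
/// that each carry at least `threshold_pct`.
pub fn min_overlap_pct(threshold_pct: u32) -> u32 {
    (2 * threshold_pct).saturating_sub(100)
}

/// Two certificates for different blocks can only exist if the Byzantine share
/// covers the whole overlap, because correct validators vote for a single block.
pub fn conflicting_certificates_possible(threshold_pct: u32, byzantine_pct: u32) -> bool {
    byzantine_pct >= min_overlap_pct(threshold_pct)
}
```

With the values used in this proposal, `min_overlap_pct(82)` is 64, well above the assumed 19% Byzantine bound, so conflicting genesis certificates cannot be formed.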
We assume the cluster must eventually run under normal network conditions,
so blocks past the migration boundary slot S should be strongly optimistically
confirmed.
Next, until a genesis certificate is observed, no correct validators will migrate, so all correct validators will vote as normal and contribute to optimistic confirmation. From the correctness argument, we know that if a correct validator casts a genesis vote, it must vote for the same Alpenglow genesis block as every other correct validator.
This means that eventually 82% of validators will cast a genesis vote for the same genesis block. Because these genesis votes are reliably delivered via all-to-all, some correct validator will eventually get a genesis certificate.
There are two ways for correct validators to migrate:
The first correct validator to migrate must have received an 82% genesis
certificate. Since we assume at most 19% of the cluster is Byzantine, at least
this correct validator will continuously broadcast the genesis certificate until
it sees an Alpenglow finalization certificate.
This means that at least 82% - 19% = 63% of the cluster, consisting of correct nodes, will eventually receive
the genesis certificate and will migrate to Alpenglow upon receiving the
certificate via all-to-all broadcast, which guarantees delivery.
These 63% correct validators are then sufficient to run Alpenglow and produce
a finalized Alpenglow block, which will induce a repair/transition from any
other correct/lagging validators.
Thus, from 1 we know that some correct validator must eventually migrate, because it must eventually receive a genesis certificate, and from 2 we know that eventually all correct validators must migrate.
When switching to the first Alpenglow block, we want to deprecate PoH. This will be done in a few steps to limit the amount of code changes:

1. Set the bank's tick height to bank.max_tick_height() - 1.
2. Call tick_alpenglow() at the end of every Alpenglow block, which
makes 1 ending tick per block. Because of 1. above, this guarantees that
each bank will still think it has reached bank.max_tick_height(). This
last tick is necessary to coordinate with banking stage and broadcast to
properly end the packing/dispersion of a block. Eliminating it is possible,
but would require a large amount of risky work.
3. Change blockstore_processor::verify_ticks() to turn off tick verification.

On all blocks descended from the Alpenglow genesis block:
Alpenswitch, where we pick fixed slot intervals N at which to attempt to
optimistically migrate to Alpenglow. On failure, we fall back to TowerBFT and try
again at the next slot interval. This is more painful to implement because
of the transition back and forth between Alpenglow and TowerBFT.
There will be a liveness impact for the duration of the migration, which is optimistically only one slot past the migration boundary slot.
Validators will run Alpenglow after discovering the genesis block after the migration boundary.
N/A
This feature is not backwards compatible.