|
|
@@ -0,0 +1,289 @@
|
|
|
+---
|
|
|
+simd: '0384'
|
|
|
+title: Alpenglow migration
|
|
|
+authors:
|
|
|
+ - Kobi Sliwinski (Anza)
|
|
|
+ - Ashwin Sekar (Anza)
|
|
|
+ - Carl Lin (Anza)
|
|
|
+category: Standard
|
|
|
+type: Core
|
|
|
+status: Review
|
|
|
+created: 2025-10-21
|
|
|
+feature: a1penGLz8Vm2QHYB3JPefBiU4BY3Z6JkW2k3Scw5GWP
|
|
|
+---
|
|
|
+
|
|
|
+## Summary
|
|
|
+
|
|
|
+Migrate from TowerBFT to Alpenglow.
|
|
|
+
|
|
|
+## Motivation
|
|
|
+
|
|
|
+Migrating from TowerBFT to Alpenglow consensus requires a safe handoff
|
|
|
+mechanism that does not roll back user transactions already confirmed by TowerBFT.
|
|
|
+
|
|
|
+## Dependencies
|
|
|
+
|
|
|
+This proposal depends on the following accepted proposals:
|
|
|
+
|
|
|
+- **[SIMD-0326]: Alpenglow**
|
|
|
+
|
|
|
+ Requires Alpenglow to be implemented in order to migrate
|
|
|
+
|
|
|
+- **[SIMD-0307]: Add Block Footer**
|
|
|
+
|
|
|
+ Specifies `BlockMarker`, a means of disseminating metadata in a block
|
|
|
+
|
|
|
+
|
|
|
+## New Terminology
|
|
|
+
|
|
|
+Migration boundary slot:
|
|
|
+
|
|
|
+- Slot at which Alpenglow migration begins
|
|
|
+- For all slots at or after this boundary slot:
|
|
|
+ - Turn off packing anything other than simple vote transactions in blocks.
|
|
|
+ Any transactions not
|
|
|
+ belonging to the vote program will cause a block to be marked dead during
|
|
|
+ replay by all correct validators.
|
|
|
+ - Core code will stop notifying RPC services of any optimistic confirmation/
|
|
|
+ commitment updates.
|
|
|
+ - TowerBFT will stop rooting blocks to prevent losing optimistically
|
|
|
+ confirmed blocks that could qualify as the Alpenglow genesis blocks.
|
|
|
+
|
|
|
+Alpenglow genesis block:
|
|
|
+
|
|
|
+- The last TowerBFT block before the "migration boundary slot" which is the
|
|
|
+parent of the first Alpenglow block. This is picked via the process described in
|
|
|
+the "Detailed Design" section below.
|
|
|
+
|
|
|
+Strong optimistic confirmation:
|
|
|
+
|
|
|
+- A block `B` is strong OC if there exists a block `T` such that `B` is the parent
|
|
|
+ of `T` and `slot(B) + 1 = slot(T)` and `T` contains vote transactions for `B` from
|
|
|
+ at least `82%` of stake.
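
The definition above can be sketched as a predicate over a parent/child pair. This is a minimal illustration, not the Agave implementation: `BlockInfo`, `parent_vote_stake`, and `is_strong_oc` are hypothetical names, and the sketch assumes the caller has already extracted the stake behind each vote transaction in `T`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a block `T` for this sketch.
pub struct BlockInfo {
    pub slot: u64,
    pub parent_slot: u64,
    /// Stake of each validator whose vote transaction for the parent block
    /// is included in this block, keyed by an illustrative validator index.
    pub parent_vote_stake: HashMap<u32, u64>,
}

/// Threshold from the definition above, as a percentage of total stake.
pub const STRONG_OC_THRESHOLD_PERCENT: u128 = 82;

/// Returns true if `child` (the block `T`) makes the block in `parent_slot`
/// (the block `B`) strongly optimistically confirmed.
pub fn is_strong_oc(parent_slot: u64, child: &BlockInfo, total_stake: u64) -> bool {
    // `T` must be the child of `B` and sit in the very next slot.
    if child.parent_slot != parent_slot || parent_slot.checked_add(1) != Some(child.slot) {
        return false;
    }
    let voted: u128 = child.parent_vote_stake.values().map(|&s| s as u128).sum();
    // voted / total >= 82 / 100, compared in integers to avoid rounding.
    total_stake > 0 && voted * 100 >= STRONG_OC_THRESHOLD_PERCENT * total_stake as u128
}
```

Comparing `voted * 100` against `82 * total_stake` in `u128` keeps the check exact at the threshold and avoids overflow for `u64` stake values.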
|
|
|
+
|
|
|
+## Detailed Design
|
|
|
+
|
|
|
+### Migration Handoff
|
|
|
+
|
|
|
+1. Pick a "migration boundary slot" (defined above) `S` as follows. Let `X` be
|
|
|
+ the feature activation slot. Let the migration boundary slot `S` be `X + 5000`, so as to
|
|
|
+ avoid the beginning of an epoch.
|
|
|
+
|
|
|
+2. After the migration boundary slot `S`, wait for some block `B` with `slot(B) >= S` to reach
|
|
|
+ strong optimistic confirmation.
|
|
|
+
|
|
|
+3. Find the most recent ancestor block `G` of `B` from before the migration
|
|
|
+ boundary slot `S`. This is the Alpenglow genesis block. Cast a BLS vote,
|
|
|
+ the "genesis vote", for `G` via all-to-all. Note that validators should have
|
|
|
+ filled out their BLS keys prior to the feature flag activation, and will
|
|
|
+ sign this genesis vote with this BLS key.
|
|
|
+
|
|
|
+4. If we observe `>=82%` genesis votes for the ancestor block `G`, this
|
|
|
+ constitutes the `genesis certificate`, and `G` is the genesis block for
|
|
|
+ Alpenglow. Validators will periodically refresh genesis votes every
|
|
|
+ `GENESIS_VOTE_REFRESH` = 400ms (i.e. once a slot) until this
|
|
|
+ `genesis certificate` is observed. During this period they perform regular
|
|
|
+ TowerBFT consensus for all blocks.
|
|
|
+
|
|
|
+5. Anytime a correct validator receives a `genesis certificate` for a block `G`
|
|
|
+ (either constructed themselves, received through replaying a block, or received
|
|
|
+ from all-to-all broadcast), they:
|
|
|
+ - Verify the certificate against the BLS keys for the epoch.
|
|
|
+ - Broadcast the certificate to all other validators via the Alpenglow
|
|
|
+ all-to-all mechanism. Validators will continually retry broadcasting this
|
|
|
+ certificate every 10 seconds via the certificate pool standstill timer so
|
|
|
+ long as a finalized Alpenglow certificate for a higher slot isn't detected.
|
|
|
+ - We initialize the Alpenglow `votor` module with `G` as genesis, and disable
|
|
|
+ TowerBFT for any slots past `G`
|
|
|
+ - In block production pack the `genesis certificate` as a `GenesisBlockMarker`
|
|
|
+ for any blocks that are *direct* children of `G`. This means anybody
|
|
|
+ replaying any of the initial Alpenglow blocks must see the
|
|
|
+ `genesis certificate`.
|
|
|
+ - Delete all blocks and shreds with slot greater than `slot(G)` from
|
|
|
+ blockstore, and reset all associated state (replay, AccountsDB) to block
|
|
|
+ `G`, as is currently done when we roll back duplicate blocks.
|
|
|
+ - We re-enable packing non-vote transactions, enable Alpenglow rooting, and
|
|
|
+ re-enable RPC commitment/confirmation reporting.
|
|
|
+
|
|
|
+6. Anytime a validator receives a `genesis certificate` validated through
|
|
|
+ *replaying* the header of a block, they store the certificate in a
|
|
|
+ `migration success` off-curve account
|
|
|
+ `Pubkey::find_program_address(&[b"carlgration"], &alpenglow::id())`.
|
|
|
+ This means all snapshots descended from the block will contain this account
|
|
|
+ and signal to validators that they should initiate Alpenglow after unpacking
|
|
|
+ the snapshot.
|
|
|
+
|
|
|
+7. Alternatively, anytime a correct validator that has not yet detected a
|
|
|
+ `genesis certificate` receives an Alpenglow finalization certificate for
|
|
|
+ some block `X` that they can verify, they should repair/replay all the
|
|
|
+ ancestors of `X`.
|
|
|
+
|
|
|
+8. Once an Alpenglow finalization certificate is received via all-to-all or via
|
|
|
+ replaying a block, validators can stop broadcasting the genesis certificate
|
|
|
+ as the Alpenglow finalization certificate is sufficient proof of the
|
|
|
+ cluster's successful migration.
|
|
|
+
|
|
|
+9. On validator restart from a snapshot, if the migration feature flag is
|
|
|
+ active:
|
|
|
+ - If the off-curve `migration success` account is empty in the snapshot, we
|
|
|
+ enter step 1.
|
|
|
+ - If the off-curve `migration success` account contains a certificate and the
|
|
|
+ certificate is valid, immediately enter Alpenglow.
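
Steps 1, 3, and 4 above reduce to three small computations, sketched here under stated assumptions. The `parents` map is a hypothetical stand-in for fork state (slot to parent slot), the function names are illustrative, and the `82%` threshold is measured against a single `total_stake` value supplied by the caller; none of these are taken from the Agave codebase.

```rust
use std::collections::HashMap;

/// Offset from the feature activation slot, per step 1.
pub const MIGRATION_SLOT_OFFSET: u64 = 5000;
/// Percentage required for a genesis certificate, per step 4.
pub const GENESIS_CERT_THRESHOLD_PERCENT: u128 = 82;

/// Step 1: migration boundary slot `S = X + 5000`.
pub fn migration_boundary_slot(feature_activation_slot: u64) -> u64 {
    feature_activation_slot.saturating_add(MIGRATION_SLOT_OFFSET)
}

/// Step 3: walk parent links from the strongly confirmed block in slot `b`
/// until reaching the first ancestor whose slot is before `boundary`.
/// Returns `None` if a parent link is missing or malformed.
pub fn genesis_ancestor(b: u64, boundary: u64, parents: &HashMap<u64, u64>) -> Option<u64> {
    let mut slot = b;
    while slot >= boundary {
        let parent = *parents.get(&slot)?;
        if parent >= slot {
            return None; // a parent must always be in an earlier slot
        }
        slot = parent;
    }
    Some(slot)
}

/// Step 4: do the observed genesis votes reach the `>=82%` threshold?
pub fn has_genesis_certificate(voted_stake: u64, total_stake: u64) -> bool {
    total_stake > 0
        && (voted_stake as u128) * 100 >= GENESIS_CERT_THRESHOLD_PERCENT * (total_stake as u128)
}
```

Because skipped slots are allowed, the genesis block `G` is not necessarily in slot `S - 1`; the walk simply stops at the first ancestor below the boundary.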
|
|
|
+
|
|
|
+### GenesisBlockMarker
|
|
|
+
|
|
|
+In order to disseminate the `genesis certificate` in the initial Alpenglow block
|
|
|
+we add a new `BlockMarker` to the specification of SIMD-0307 with variant ID `3`:
|
|
|
+
|
|
|
+```
|
|
|
+GenesisBlockMarker:
|
|
|
++---------------------------------------+
|
|
|
+| Genesis Slot (8 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+| Genesis Block ID (32 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+| BLS Signature (192 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+| Validator bitmap length (8 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+| Validator bitmap (max 512 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+
|
|
|
+Total size: max 752 bytes
|
|
|
+```
|
|
|
+
|
|
|
+The full serialization of this component is:
|
|
|
+
|
|
|
+```
|
|
|
++---------------------------------------+
|
|
|
+| Entry Count = 0 (8 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+| Marker Version = 1 (2 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+| Variant ID = 3 (1 byte) |
|
|
|
++---------------------------------------+
|
|
|
+| Length = max 752 (2 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+| Genesis Slot (8 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+| Genesis Block ID (32 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+| BLS Signature (192 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+| Bitmap length (max 512) (8 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+| Validator bitmap (max 512 bytes) |
|
|
|
++---------------------------------------+
|
|
|
+
|
|
|
+Total size: max 765 bytes
|
|
|
+```
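
As a rough illustration of the layout above, the following sketch serializes the full component into bytes. The struct and function names are hypothetical, and little-endian integer encoding is an assumption of this sketch; the layout itself does not state the byte order.

```rust
/// Values taken from the layout above.
pub const MARKER_VERSION: u16 = 1;
pub const GENESIS_VARIANT_ID: u8 = 3;
pub const BLS_SIGNATURE_LEN: usize = 192;
pub const MAX_BITMAP_LEN: usize = 512;

/// Hypothetical in-memory form of the genesis certificate.
pub struct GenesisCertificate {
    pub genesis_slot: u64,
    pub genesis_block_id: [u8; 32],
    pub bls_signature: [u8; BLS_SIGNATURE_LEN],
    pub validator_bitmap: Vec<u8>,
}

/// Serializes the full component. Integers are written little-endian,
/// which is an assumption. Returns `None` if the bitmap exceeds 512 bytes.
pub fn serialize_genesis_marker(cert: &GenesisCertificate) -> Option<Vec<u8>> {
    if cert.validator_bitmap.len() > MAX_BITMAP_LEN {
        return None;
    }
    // Payload: slot + block id + signature + bitmap length + bitmap.
    let payload_len = 8 + 32 + BLS_SIGNATURE_LEN + 8 + cert.validator_bitmap.len();
    let mut out = Vec::with_capacity(8 + 2 + 1 + 2 + payload_len);
    out.extend_from_slice(&0u64.to_le_bytes()); // Entry Count = 0
    out.extend_from_slice(&MARKER_VERSION.to_le_bytes()); // Marker Version = 1
    out.push(GENESIS_VARIANT_ID); // Variant ID = 3
    out.extend_from_slice(&(payload_len as u16).to_le_bytes()); // Length, at most 752
    out.extend_from_slice(&cert.genesis_slot.to_le_bytes());
    out.extend_from_slice(&cert.genesis_block_id);
    out.extend_from_slice(&cert.bls_signature);
    out.extend_from_slice(&(cert.validator_bitmap.len() as u64).to_le_bytes());
    out.extend_from_slice(&cert.validator_bitmap);
    Some(out)
}
```

With a full 512-byte bitmap the payload is 752 bytes and the serialized component is 765 bytes, matching the totals in the two diagrams.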
|
|
|
+
|
|
|
+
|
|
|
+#### Correctness argument
|
|
|
+
|
|
|
+First, note that it is always safe to roll back a block at or after the migration
|
|
|
+boundary slot because we stopped packing user transactions.
|
|
|
+
|
|
|
+Next we show that if two correct validators switch to Alpenglow, they must pick
|
|
|
+the same genesis block `G`.
|
|
|
+
|
|
|
+To switch to Alpenglow, both correct validators must observe strong optimistic
|
|
|
+confirmation on some blocks `B` and `B'` at or past the migration boundary. It's
|
|
|
+guaranteed by optimistic confirmation that `B` and `B'` are on the same fork,
|
|
|
+and must have the same ancestors. This means all correct validators must cast
|
|
|
+a genesis vote on the same ancestor block.
|
|
|
+
|
|
|
+Because at most `19%` of the cluster is malicious, it is impossible to construct two
|
|
|
+`>=82%` genesis certificates for different blocks, so all correct validators that switch must have
|
|
|
+observed the same genesis certificate for the same block.
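
The counting step can be spelled out as follows. This is a sketch that treats both percentages as fractions of the same total (the argument above does not say whether they are measured in stake or in validator count). Suppose certificates for two different blocks `G` and `G'` were signed by fractions $s_1, s_2 \ge 0.82$ of the cluster. Then the fraction that signed both is at least

$$
s_1 + s_2 - 1 \;\ge\; 0.82 + 0.82 - 1 \;=\; 0.64 \;>\; 0.19 .
$$

At least $0.64 - 0.19 = 0.45$ of the cluster would therefore be correct validators that cast genesis votes for both `G` and `G'`, contradicting the earlier observation that all correct validators cast their genesis vote on the same ancestor block.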
|
|
|
+
|
|
|
+#### Liveness argument
|
|
|
+
|
|
|
+1. First we show that eventually at least one correct validator should see an
|
|
|
+ Alpenglow genesis certificate.
|
|
|
+
|
|
|
+We assume the cluster must eventually run under normal network conditions,
|
|
|
+so blocks past the migration boundary slot `S` should be `strongly optimistically
|
|
|
+confirmed`.
|
|
|
+
|
|
|
+Next, until a genesis certificate is observed, no correct validators will
|
|
|
+migrate, so all correct validators will vote as normal and contribute to
|
|
|
+optimistic confirmation. From the correctness argument, we know if a correct
|
|
|
+validator casts a genesis vote, they must vote for the same Alpenglow genesis
|
|
|
+block.
|
|
|
+
|
|
|
+This means that eventually 82% of validators will cast a genesis vote for the
|
|
|
+same genesis block. Because these genesis votes are reliably delivered via
|
|
|
+all-to-all, some correct validator will eventually get a genesis certificate.
|
|
|
+
|
|
|
+2. Next we show that once a correct validator migrates, then all correct
|
|
|
+ validators will eventually migrate.
|
|
|
+
|
|
|
+There are two ways for correct validators to migrate:
|
|
|
+
|
|
|
+1. A genesis certificate
|
|
|
+2. An Alpenglow finalization certificate
|
|
|
+
|
|
|
+The first correct validator to migrate must have gotten an `82%` genesis
|
|
|
+certificate. Since we assume at most `19%` of the cluster is Byzantine, at least
|
|
|
+this correct validator will continuously broadcast the genesis certificate until
|
|
|
+they see an Alpenglow finalization certificate.
|
|
|
+
|
|
|
+This means `82% - 19% = 63%` correct nodes will eventually receive
|
|
|
+the genesis certificate and will migrate to Alpenglow upon receiving the
|
|
|
+certificate via all-to-all broadcast, which guarantees delivery.
|
|
|
+
|
|
|
+This `63%` of correct validators is then sufficient to run Alpenglow and produce
|
|
|
+a finalized Alpenglow block, which will induce a repair/transition from any
|
|
|
+other correct/lagging validators.
|
|
|
+
|
|
|
+Thus from 1. we know some correct validator must eventually migrate because
|
|
|
+eventually they must receive a genesis certificate, and we know from 2. that
|
|
|
+eventually all correct validators must migrate.
|
|
|
+
|
|
|
+### Poh Migration
|
|
|
+
|
|
|
+When switching to the first Alpenglow block, we want to deprecate Poh. This will
|
|
|
+be done in a few steps to mitigate the amount of code changes:
|
|
|
+
|
|
|
+1. Before the end of each Alpenglow block, set the bank tick height to
|
|
|
+ `bank.max_tick_height() - 1`
|
|
|
+2. Set tick producer to low power mode, which only has one hash per tick.
|
|
|
+ We then call `tick_alpenglow()` at the end of every Alpenglow block, which
|
|
|
+ makes 1 ending tick per block. Because of 1. above, this guarantees that
|
|
|
+ each bank will still think it has reached `bank.max_tick_height()`. This
|
|
|
+ last tick is necessary to coordinate with banking stage and broadcast to
|
|
|
+ properly end the packing/dispersion of a block. Eliminating it is possible,
|
|
|
+ but would require a large amount of risky work.
|
|
|
+3. Change `blockstore_processor::verify_ticks()` to turn off tick verification.
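
The interaction between steps 1 and 2 can be illustrated with a toy model of tick bookkeeping. `TickState` and its methods are hypothetical stand-ins rather than the actual `Bank` API; only `bank.max_tick_height()` and `tick_alpenglow()` are names from the text above, and the sketch mirrors their roles rather than their signatures.

```rust
/// Hypothetical, heavily simplified stand-in for a bank's tick bookkeeping.
pub struct TickState {
    tick_height: u64,
    max_tick_height: u64,
}

impl TickState {
    /// A block starting at `start_tick_height` that is complete after
    /// `ticks_per_slot` further ticks.
    pub fn new(start_tick_height: u64, ticks_per_slot: u64) -> Self {
        Self {
            tick_height: start_tick_height,
            max_tick_height: start_tick_height.saturating_add(ticks_per_slot),
        }
    }

    /// Step 1: before the block ends, jump to one tick short of the maximum.
    pub fn skip_to_last_tick(&mut self) {
        self.tick_height = self.max_tick_height.saturating_sub(1);
    }

    /// Step 2: register the single ending tick (the role `tick_alpenglow()`
    /// plays in the text above).
    pub fn register_ending_tick(&mut self) {
        if self.tick_height < self.max_tick_height {
            self.tick_height += 1;
        }
    }

    /// The block is considered complete once the maximum tick height is hit.
    pub fn is_complete(&self) -> bool {
        self.tick_height == self.max_tick_height
    }
}
```

In this model one ending tick is enough to complete the block regardless of how many ticks a slot nominally contains, which is the property step 2 relies on.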
|
|
|
+
|
|
|
+### Duplicate block handling
|
|
|
+
|
|
|
+On all blocks descended from the Alpenglow genesis block:
|
|
|
+
|
|
|
+1. Turn off tower duplicate block handling
|
|
|
+2. Turn off epoch slots
|
|
|
+
|
|
|
+## Alternatives Considered
|
|
|
+
|
|
|
+Alpenswitch where we pick fixed slot intervals `N` at which to attempt to
|
|
|
+optimistically migrate to Alpenglow. On failure, we fall back to TowerBFT and try
|
|
|
+again at the next slot interval. This is more painful to implement because
|
|
|
+of the transition back and forth between Alpenglow and TowerBFT.
|
|
|
+
|
|
|
+## Impact
|
|
|
+
|
|
|
+During the migration process, user transactions will not be processed for the duration
|
|
|
+of the migration, which is optimistically only one slot past the migration
|
|
|
+boundary slot.
|
|
|
+
|
|
|
+Validators will run Alpenglow after discovering the genesis block after the
|
|
|
+migration boundary.
|
|
|
+
|
|
|
+## Security Considerations
|
|
|
+
|
|
|
+N/A
|
|
|
+
|
|
|
+## Backwards Compatibility
|
|
|
+
|
|
|
+This feature is not backwards compatible.
|