

---
simd: '0321'
title: VM Register 2 Instruction Data Pointer
authors:
  - Joe Caulfield (Anza)
category: Standard
type: Core
status: Review
created: 2025-07-11
feature: 5xXZc66h4UdB6Yq7FzdBxBiRAFMMScMLwHxk2QZDaNZL
---

Summary

Provide a pointer to instruction data in VM register 2 (r2) at program entrypoint, enabling direct access to instruction data without having to parse the accounts section of the serialized input region.

Motivation

Currently, sBPF programs must parse the accounts section of the serialized input region to locate instruction data. The serialization layout places accounts before instruction data, and because each non-duplicate account entry embeds that account's data (followed by realloc padding), the position of every entry depends on the lengths of the entries before it. Programs must therefore iterate through all accounts before reaching the instruction data section. This is inefficient for programs that primarily or exclusively need to access instruction data.

By providing a direct pointer to instruction data in r2, programs can immediately access this data without any parsing overhead, resulting in improved performance and reduced compute unit consumption.
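To make the parsing cost concrete, the following Rust sketch walks the accounts section to find the instruction data length field. It is modeled on the aligned serialization layout that the `solana-program` entrypoint deserializes (a `u8::MAX` non-duplicate marker, 10 KiB of realloc headroom after each account's data, 8-byte alignment before `rent_epoch`); `find_instruction_data_len_offset` and `read_u64` are hypothetical helpers, not existing APIs, and the sketch is an illustration under those assumptions rather than a reference deserializer.

```rust
use std::convert::{TryFrom, TryInto};

/// Marker byte for a non-duplicate account entry (u8::MAX).
const NON_DUP_MARKER: u8 = u8::MAX;
/// Realloc headroom reserved after each account's data (10 KiB).
const MAX_PERMITTED_DATA_INCREASE: usize = 10 * 1024;
/// Alignment applied after each account's data and headroom.
const ACCOUNT_DATA_ALIGN: usize = 8;

/// Reads a little-endian u64 at `offset`, or None if it would be out of bounds.
fn read_u64(input: &[u8], offset: usize) -> Option<u64> {
    let bytes = input.get(offset..offset.checked_add(8)?)?;
    Some(u64::from_le_bytes(bytes.try_into().ok()?))
}

/// Hypothetical helper: walks every account entry and returns the offset of the
/// instruction data length field (offsets are relative to the region start,
/// which is assumed to be 8-byte aligned).
fn find_instruction_data_len_offset(input: &[u8]) -> Option<usize> {
    let num_accounts = read_u64(input, 0)?;
    let mut offset: usize = 8;
    for _ in 0..num_accounts {
        if *input.get(offset)? != NON_DUP_MARKER {
            // Duplicate entry: 1 index byte + 7 bytes of padding.
            offset += 8;
            continue;
        }
        // Marker, is_signer, is_writable, executable (1 byte each), 4 bytes of
        // padding, key (32), owner (32), lamports (8).
        offset += 4 + 4 + 32 + 32 + 8;
        let data_len = usize::try_from(read_u64(input, offset)?).ok()?;
        offset += 8;
        // Skip the account data and its realloc headroom...
        offset = offset
            .checked_add(data_len)?
            .checked_add(MAX_PERMITTED_DATA_INCREASE)?;
        if offset > input.len() {
            return None;
        }
        // ...then pad to the next 8-byte boundary and skip rent_epoch (8).
        offset = (offset + ACCOUNT_DATA_ALIGN - 1) / ACCOUNT_DATA_ALIGN * ACCOUNT_DATA_ALIGN;
        offset += 8;
    }
    // Only report the offset if the length field itself is in bounds.
    read_u64(input, offset).map(|_| offset)
}
```

Every account entry, including its data and padding, has to be visited before the length field is reached; the r2 pointer replaces this whole walk with a single register read.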

New Terminology

  • Instruction data pointer: A 64-bit pointer (8 bytes) stored in VM register 2 that points directly to the start of the instruction data section in the input region.

Detailed Design

When the feature is activated, the VM shall set register 2 (r2) to contain a pointer to the beginning of the instruction data section within the input region. The instruction data format remains unchanged:

[8 bytes: data length (little-endian)][N bytes: instruction data]

This pointer in r2 is made available to all programs, under all loaders, regardless of whether or not the value is read. Before this feature is activated, r2 contains uninitialized data at program entrypoint.

Despite technically being a breaking change, mainnet-beta testing with a modified Agave validator confirms no divergence in execution or consensus. This is because an uninitialized r2 can typically only be read in contrived cases, such as assembly manipulation or compiler bugs. The performance benefits are considered a reasonable tradeoff. See the Security Considerations section for more details.

Register Assignment:

  • r1: Input region pointer (existing behavior)
  • r2: Pointer to instruction data section (new)

Pointer Details:

  • The pointer in r2 points to the first byte of the actual instruction data, NOT the length field.
  • The pointer value in r2 is stored as a native 64-bit pointer (8 bytes) in little-endian format.
  • When there is no instruction data (length = 0), r2 still points to the offset immediately following the instruction data length field, which in this case is the first byte of the program ID. This ensures r2 always points to valid, readable memory within the bounds of the input region.
  • The pointer must always point to valid memory within the input region bounds.
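The pointer rules above can be illustrated with a minimal VM-side sketch. It assumes the accounts section has already been serialized into `buf` and that the input region is mapped at `MM_INPUT_START` (`0x4_0000_0000` in Agave's sBPF memory map); `append_instruction_data` and `entrypoint_registers` are hypothetical names, not Agave functions. Because r2 is derived from the offset just past the length field, empty instruction data leaves r2 on the first byte of the program ID.

```rust
/// Virtual address where the input region is mapped (MM_INPUT_START in Agave's memory map).
const MM_INPUT_START: u64 = 0x4_0000_0000;

/// Hypothetical helper: appends the instruction data section
/// ([u64 little-endian length][data]) and the program ID to an already
/// serialized accounts section, returning the buffer offset of the first
/// instruction data byte.
fn append_instruction_data(buf: &mut Vec<u8>, data: &[u8], program_id: &[u8; 32]) -> usize {
    buf.extend_from_slice(&(data.len() as u64).to_le_bytes());
    // First byte after the length field: the start of the data, or the start
    // of the program ID when `data` is empty.
    let data_offset = buf.len();
    buf.extend_from_slice(data);
    buf.extend_from_slice(program_id);
    data_offset
}

/// Hypothetical helper: the values the VM would place in r1 and r2 at entrypoint.
fn entrypoint_registers(data_offset: usize) -> (u64, u64) {
    let r1 = MM_INPUT_START; // input region pointer (existing behavior)
    let r2 = MM_INPUT_START + data_offset as u64; // instruction data pointer (new)
    (r1, r2)
}
```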

Alternatives Considered

  1. Provide a pointer to instruction data length: Store a pointer to the instruction data length field in r2. However, providing a direct pointer to the start of instruction data is more ergonomic.

  2. Provide optional entrypoint parameter: Allow programs to opt-in via a different entrypoint signature. The current approach is simpler as it avoids supporting multiple entrypoint signatures and makes the pointer universally available. This relies on the assumption that no programs depend on the garbage value previously in r2.

  3. Modify serialization layout: The serialization layout will eventually be overhauled with ABI v2, a comprehensive upgrade that could resolve this issue among many others. Given the significant scope of ABI v2 and potential for delays, this targeted optimization provides immediate value and remains compatible with ABI v2.

Impact

On-chain programs are positively impacted by this change. The new r2 pointer gives programs the ability to efficiently read instruction data, further customize their control flow, and maximize compute unit efficiency. However, any programs that currently depend on the uninitialized/garbage value in r2 at entrypoint will break when this feature is activated.

Core contributors must implement this feature; the change should be minimally invasive, though the exact effort depends on the VM implementation.

Security Considerations

Programs should read and validate the instruction data length (stored at r2 - 8) before accessing data via the r2 pointer. Failing to check the length could result in reading unintended memory contents or out-of-bounds access attempts.
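A program-side read that follows this advice might look like the sketch below. It models the input region as a byte slice mapped at `region_vaddr` instead of dereferencing raw pointers as a real sBPF program would, and `instruction_data` is a hypothetical helper. The length at r2 - 8 is decoded as a little-endian u64 and checked against the end of the region before any slice is formed.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: returns the instruction data that r2 points at, after
/// validating the length field stored at r2 - 8. `region` holds the bytes of
/// the input region, assumed to be mapped at `region_vaddr`.
fn instruction_data(region: &[u8], region_vaddr: u64, r2: u64) -> Option<&[u8]> {
    // Translate r2 from a VM address into an index into `region`.
    let data_start = usize::try_from(r2.checked_sub(region_vaddr)?).ok()?;
    // The u64 length field occupies the 8 bytes immediately before r2.
    let len_start = data_start.checked_sub(8)?;
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(region.get(len_start..data_start)?);
    let len = usize::try_from(u64::from_le_bytes(len_bytes)).ok()?;
    // Reject a length that would run past the end of the region.
    region.get(data_start..data_start.checked_add(len)?)
}
```

In an actual program the same ordering applies: read the u64 at r2 - 8 first, and only then build a slice starting at r2; the slice model simply makes the bounds check explicit.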

Additionally, programs that currently rely on r2 containing uninitialized or garbage data at entrypoint will break when this feature is activated. Although such reliance is technically possible through assembly manipulation, no compiled code uses r2 with an uninitialized value except in the case of sol_log_64_, which is not a direct security concern because logs are not enshrined by consensus.

Backwards Compatibility

This feature is backwards compatible only for programs that do not currently read from r2 at program entrypoint.

This feature is NOT backwards compatible for any programs that depend on the uninitialized/garbage data previously in r2.