Unveiling PolkaVM’s Secrets:How Programs Really Work ⚙️

This article describes how executable programs are represented in PolkaVM, covering the structure, encoding, and transformation of programs. We focuse on the internal representation of compiled code, not the process of compilation itself here. For information about the compilation pipeline, see Compilation Pipeline.

Overview

PolkaVM represents programs using a custom binary format called ProgramBlob. This format encapsulates all necessary components for program execution: code, data, jump tables, and symbols. Programs originate from either ELF binaries (compiled from languages like Rust) or from assembly source.

Sources:

ProgramBlob Format

The ProgramBlob is the core representation of a program in PolkaVM. It’s a binary format that contains all the information needed to execute a program.

Key Components

Component	Description
Magic+Version Header	Identifies the blob as a PolkaVM program and its version
Memory Config	Defines memory layout parameters (page size, section sizes)
Code Section	Contains the executable instructions
Read-only Data	Constant data that doesn’t change during execution
Read-write Data	Data that can be modified during program execution
Jump Table	Maps basic block indices to instruction offsets
Bitmask	Bitfield that marks instruction boundaries
Import/Export Symbols	Symbols for interfacing with external code

Program Header Structure

The blob starts with a header that contains:

Magic identifier “PVM\0” (4 bytes)
Size of the entire blob (variable-length encoding)
Memory configuration data

Building ProgramBlobs

The ProgramBlobBuilder class handles the construction of program blobs, managing the serialization of all components into the final binary format.

Sources:

Code Representation

Instruction Encoding

PolkaVM uses a variable-length instruction encoding scheme. Each instruction starts with an opcode byte, followed by variable-length operands. This compact representation balances code density with execution efficiency.

Key characteristics:

Variable-length instructions (1 to 17 bytes)
First byte is always the opcode
Operands use variable-length encoding for integers
Register operands are encoded efficiently
Relative jumps use offset-based encoding

Instruction Bitmask

To efficiently identify instruction boundaries without full decoding, PolkaVM uses a bitmask:

A bit is set (1) for the first byte of each instruction
Continuation bytes are marked with 0
Allows quick jumping to instruction boundaries

Code:    [Instr1][Instr2  ][Instr3][Instr4    ]
Bitmask: 1      1         1      1            
         ^      ^         ^      ^
         |      |         |      |
         Start  Start     Start  Start

Jump Table

The jump table maps basic block indices to their positions in the code section:

Enables efficient control flow transfers (jumps, branches)
Abstracts away the actual instruction offsets
Contains entries of variable size depending on offset magnitude

Sources:

Register Representation

PolkaVM uses a fixed set of registers that map to RISC-V registers. These are encoded efficiently in instructions.

Register Set

PolkaVM Register	RISC-V ABI Name	Description
`RA`	ra	Return address
`SP`	sp	Stack pointer
`T0`-`T2`	t0-t2	Temporary registers
`S0`-`S1`	s0-s1	Saved registers
`A0`-`A5`	a0-a5	Argument/return registers

Sources:

Memory Layout

The memory layout for PolkaVM programs follows a specific structure to ensure proper isolation and efficient access.

Memory Sections

Section	Description	Access
Read-only Data	Constants, string literals, etc.	Read-only
Read-write Data	Global variables, mutable data	Read-write
Stack	Local variables, call frames	Read-write
Heap	Dynamically allocated memory	Read-write
Auxiliary Data	Implementation-specific data	Implementation-defined

Memory Map Builder

The MemoryMapBuilder class handles the calculation of memory layout based on program requirements:

Ensures proper alignment of sections
Handles page size requirements
Calculates maximum heap size based on available address space

Sources:

Program Transformation Process

The process of transforming source programs (ELF binaries or assembly) into the PolkaVM representation involves several steps.

ELF to ProgramBlob Conversion

Parse the ELF file to extract sections and symbols
Process code sections by converting RISC-V instructions to PolkaVM instructions
Handle relocations and symbol references
Place data sections in appropriate locations
Generate the jump table and bitmask
Construct the final ProgramBlob

Assembly to ProgramBlob Conversion

Assembly is parsed directly into PolkaVM instructions, which are then assembled into a ProgramBlob. This provides a more direct route for creating PolkaVM programs.

Sources:

Instruction Visitor Pattern

PolkaVM uses the Visitor pattern to process instructions. The OpcodeVisitor trait defines an interface for traversing and operating on instructions:

This pattern enables:

Efficient instruction traversal
Different operations (execution, disassembly, analysis) sharing the same traversal mechanism
Separation of instruction decoding from operation

Sources:

Symbol Handling

PolkaVM programs can import and export symbols for interaction with host functions and to expose functionality.

Import Symbols

Imports represent external functions that the program can call:

Each import has a unique name
Host environments provide implementations for these functions
Calls to imports are handled via special ecalli instructions

Export Symbols

Exports are functions within the program that can be called from outside:

Each export has a name and a target (basic block index)
The main entry point of a program is typically an export
External code can invoke exports by name

Sources:

Variable-Length Integer Encoding

PolkaVM uses variable-length encoding for integer values to save space in instructions. Two main encoding schemes are used:

Varint Encoding

Used for general-purpose variable-length integers:

Small values use fewer bytes
First byte contains size information and part of the value
Can encode 32-bit values in 1-5 bytes

Simple Varint Encoding

Used for instruction operands:

Optimized for the common case of small values
Number of bytes determined by the instruction’s “skip” value
Efficiently handles sign extension

Sources:

Conclusion

The program representation in PolkaVM is designed to efficiently store and execute programs. The ProgramBlob format encapsulates all necessary components (code, data, jump table, etc.) in a compact binary representation. The variable-length instruction encoding and efficient memory layout ensure that programs can be executed with minimal overhead.

This representation is a core part of the PolkaVM system, bridging the gap between source programs (ELF binaries or assembly) and the execution engine. Understanding this representation is essential for working with the PolkaVM ecosystem.

Unveiling PolkaVM's Secrets:How Programs Really Work ⚙️

A Deep Dive into the ProgramBlob:Code, Data, and Everything In Between 📦

Overview

ProgramBlob Format

Key Components

Program Header Structure

Building ProgramBlobs

Code Representation

Instruction Encoding

Instruction Bitmask

Jump Table

Register Representation

Register Set

Memory Layout

Memory Sections

Memory Map Builder

Program Transformation Process

ELF to ProgramBlob Conversion

Assembly to ProgramBlob Conversion

Instruction Visitor Pattern

Symbol Handling

Import Symbols

Export Symbols

Variable-Length Integer Encoding

Varint Encoding

Simple Varint Encoding

Conclusion