Capability representation in memory

Underlying implementations of CHERI are diverse, ranging from 32-bit microcontrollers (such as Microsoft's CHERIoT) to 64-bit server-class processors (such as Arm's Morello). CHERI C/C++ gives implementations broad flexibility to represent capability metadata in whatever way best suits their individual requirements. One area in which CHERI implementations differ is the in-memory representation of capabilities, reflecting not only different address sizes but also different tradeoffs in bounds compression, permissions, and so on.

CHERI C/C++ generally expects capabilities to be accessed via pointer types, with operations such as dereferencing a pointer or performing pointer arithmetic implemented by compiler-generated code. Broadly, a capability consists of three parts: an address, inline metadata, and a validity tag. When stored in memory, a CHERI capability is twice the size of the native address type (e.g., 128 bits on 64-bit systems and 64 bits on 32-bit systems), in addition to an unaddressable tag bit. There is one tag per capability-aligned region of memory, so capabilities must themselves be stored at capability alignment.
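The size and alignment rules above can be observed from C. The following is a minimal sketch rather than anything taken from an architecture specification: the static assertions are only compiled for pure-capability targets (detected via the __CHERI_PURE_CAPABILITY__ macro) and assume that ptraddr_t, the address-sized integer type, is provided by <stddef.h>. struct record and record_ptr_offset() are hypothetical names used for illustration. The layout check itself also holds on conventional targets, where pointers are likewise naturally aligned.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __CHERI_PURE_CAPABILITY__
/*
 * Pure-capability CHERI only: a pointer is a capability twice the width
 * of its address, and it is aligned to its own size. ptraddr_t is
 * assumed here to be visible through <stddef.h>.
 */
_Static_assert(sizeof(void *) == 2 * sizeof(ptraddr_t),
    "capabilities are twice the width of an address");
_Static_assert(_Alignof(void *) == sizeof(void *),
    "capabilities are stored at capability alignment");
#endif

/* Hypothetical structure mixing a one-byte field with a pointer. */
struct record {
	char	 flag;
	void	*ptr;
};

/*
 * Return the offset of 'ptr' within struct record. The compiler pads
 * after 'flag' so that 'ptr' starts at pointer (capability) alignment.
 */
size_t
record_ptr_offset(void)
{
	size_t off = offsetof(struct record, ptr);

	assert(off % _Alignof(void *) == 0);
	return (off);
}
```

On a 64-bit pure-capability target, record_ptr_offset() would be expected to return 16 and sizeof(struct record) to be 32, compared with 8 and 16 on a conventional LP64 target; the extra padding is the cost of keeping ptr capability-aligned so that its tag can be preserved.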

Non-portability of the in-memory representation

To the greatest extent possible, CHERI C/C++ code should be portable and should never directly interpret the in-memory representation of a capability, with the exception of NULL values (see below). Portable code must instead access capability fields through the CHERI C APIs that get and set capability properties, such as cheri_address_get().
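As an illustration of reading capability fields through the API rather than by decoding bytes, the sketch below assumes a pure-capability CHERI LLVM toolchain whose <cheriintrin.h> provides cheri_address_get(), cheri_base_get(), cheri_length_get(), and cheri_tag_get(). pointer_address() and print_capability() are hypothetical helpers; the conventional fallback in pointer_address() lets the same source also build for non-CHERI targets, where a pointer is simply an integer address.

```c
#include <stdint.h>
#include <stdio.h>
#ifdef __CHERI_PURE_CAPABILITY__
#include <cheriintrin.h>
#endif

/*
 * Hypothetical helper: return the address of p without inspecting its
 * in-memory bytes. On pure-capability CHERI this uses the CHERI C API;
 * on conventional targets the pointer converts directly to an integer.
 */
unsigned long long
pointer_address(const void *p)
{
#ifdef __CHERI_PURE_CAPABILITY__
	return (cheri_address_get(p));
#else
	return ((uintptr_t)p);
#endif
}

#ifdef __CHERI_PURE_CAPABILITY__
/*
 * Hypothetical helper: print capability properties obtained through the
 * CHERI C API, never by reinterpreting the capability's raw bytes.
 */
void
print_capability(const void *p)
{
	printf("address %#llx base %#llx length %#llx tag %d\n",
	    (unsigned long long)cheri_address_get(p),
	    (unsigned long long)cheri_base_get(p),
	    (unsigned long long)cheri_length_get(p),
	    (int)cheri_tag_get(p));
}
#endif
```

Because nothing here reads the bytes of p directly, the code does not depend on how a particular implementation compresses bounds or encodes permissions.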

However, there are cases in which writing non-portable CHERI C/C++ code is both acceptable and essential, such as in the implementation of compilers, linkers, debuggers, and tracing tools that intentionally target specific architectures. This is especially true when the code does not run on the target architecture itself, as in cross-compilation, cross-linking, and cross-debugging, including the analysis of core dumps. In these cases, encoding and decoding code must be written against the relevant architecture specifications, as encodings vary significantly between platforms, and individual platforms may also have parameterizable elements in their encoding.

In-memory representation of NULL pointers

Conventional, integer-based architectures implement NULL pointers as integers with a value of 0. CHERI C/C++ similarly represents NULL as an all-zero capability value with a zero (cleared) tag; this is the only portable aspect of the in-memory representation of a CHERI capability.

This has a number of implications, including that zero-filled memory with cleared tags will be interpreted as NULL-filled, as is the case with conventional C/C++ runtimes. This is particularly relevant for implicitly zero-initialized variables (such as global variables declared without an initializer), pre-zeroed memory allocated by calloc(), and memory explicitly zeroed using memset(p, 0, n). Conversely, storing a NULL pointer value in memory results in that memory being zeroed.
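The following sketch exercises each of these cases, checking that pointer fields read from zero-filled memory compare equal to NULL. struct node and check_zeroed_pointers() are hypothetical names; the checks rely only on NULL being represented as all zeros, so they hold on CHERI as well as on conventional targets with an all-zero NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure containing pointer fields. */
struct node {
	struct node	*next;
	void		*data;
};

/* A global without an initializer is zero-filled, so its pointers are NULL. */
static struct node global_node;

/*
 * Check that pointers read from zero-filled memory are NULL, whether the
 * memory was zeroed implicitly, by calloc(), or by memset().
 * Returns 0 on success, or -1 if the allocation fails.
 */
int
check_zeroed_pointers(void)
{
	struct node *n, local;

	/* Implicitly zero-initialized global. */
	assert(global_node.next == NULL && global_node.data == NULL);

	/* Pre-zeroed heap memory from calloc(). */
	n = calloc(1, sizeof(*n));
	if (n == NULL)
		return (-1);
	assert(n->next == NULL && n->data == NULL);
	free(n);

	/* Explicitly zeroed stack memory. */
	memset(&local, 0, sizeof(local));
	assert(local.next == NULL && local.data == NULL);
	return (0);
}
```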

The following code will always succeed:

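	#include <assert.h>
	#include <cheriintrin.h>	/* cheri_address_get() */
	#include <stddef.h>
	#include <string.h>

	void
	check_null_representation(void)
	{
		void *p = NULL;
		char zeroes[sizeof(p)];

		/* NULL == 0, and its address is 0. */
		assert(p == 0);
		assert(cheri_address_get(p) == 0);

		/* All bytes in the NULL pointer are 0. */
		memset(zeroes, 0, sizeof(zeroes));
		assert(memcmp(&p, zeroes, sizeof(p)) == 0);
	}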