Pointer provenance validity

CHERI C/C++ implement pointers using architectural capabilities, rather than using conventional 32-bit or 64-bit integers. This allows the provenance validity of language-level pointers to be protected by the provenance properties of CHERI architectural capabilities: only pointers implemented using valid capabilities can be dereferenced. Other types that can hold pointers, uintptr_t and intptr_t, are similarly implemented using architectural capabilities, so that casts through these types can retain capability properties. When a dereference (a load, store, or instruction fetch) is attempted on a capability without a valid tag, a hardware exception fires (see Capability-related faults).
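As a minimal sketch of a cast through uintptr_t, the following passes a pointer to a callback as an opaque integer argument and dereferences it on the other side. The names callback_fn, read_int_cb, and invoke are hypothetical and exist only for illustration. The code is plain ISO C, so it also builds on conventional architectures; on a CHERI target the uintptr_t value would be expected to carry the capability, so the final dereference remains valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: the opaque argument is a uintptr_t, which
 * can hold a pointer (and, on CHERI, is implemented as a capability). */
typedef int (*callback_fn)(uintptr_t opaque);

/* Hypothetical callback: recover the pointer and dereference it. */
static int read_int_cb(uintptr_t opaque)
{
    int *p = (int *)opaque;
    return *p;
}

/* Hypothetical dispatcher: hand the pointer over as an opaque integer.
 * Had the argument type been long instead, a CHERI target would drop the
 * tag and bounds here, and the dereference in the callback would fault. */
static int invoke(callback_fn fn, void *arg)
{
    assert(fn != NULL);
    return fn((uintptr_t)arg);
}
```

A caller would write something like invoke(read_int_cb, &value), where value is an int owned by the caller.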

On the whole, the effects of pointer provenance validity are non-disruptive to C/C++ source code. However, language runtimes and other (typically less portable) C code sometimes conflate integers and pointers in ways that can disrupt provenance validity. In general, generated code will propagate provenance validity in only two situations:

  • Pointer types: The compiler will generate suitable code to propagate the provenance validity of pointers by using capability load and store instructions. This occurs when using a pointer type (e.g., void *) or an integer type defined as being able to hold a pointer (e.g., intptr_t). As with attempting to store 64-bit pointers in 32-bit integers on 64-bit architectures, passing a pointer through an inappropriate type will lead to truncation of metadata (e.g., the validity tag and bounds). It is therefore important that a suitable type be used to hold pointers.

    This pattern often occurs where an opaque field exists in a data structure (e.g., a long argument to a callback in older C code) that needs to be changed to use a capability-oblivious type such as intptr_t.

  • Capability-oblivious code: In some portions of the C/C++ runtime and compiler-generated code, it may not be possible to know whether memory is intended to contain a pointer or not, yet preserving pointers is desirable. In those cases, memory accesses must be performed in a way that preserves pointer provenance. In the C runtime itself, this includes memcpy, which must use capability load and store instructions to transparently propagate capability metadata and tags.

    A useful example of potentially surprising code requiring modification for CHERI C/C++ is qsort. Some C programs assume that qsort on an array of data structures containing pointers will preserve the usability of those pointers. As a result, qsort must be modified to perform memory copies using pointer-based types, such as intptr_t, when size and alignment require it.
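The qsort case can be illustrated with a sketch of an element swap that copies in uintptr_t-sized units when size and alignment allow, and falls back to bytes otherwise. This is not the implementation of any particular C library; the names can_swap_as_uintptr and swap_elems are hypothetical. It also ignores the language's aliasing rules, which a production implementation would have to address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when both elements are suitably aligned and
 * the element size is a whole number of uintptr_t units. */
static int can_swap_as_uintptr(const void *a, const void *b, size_t size)
{
    return size % sizeof(uintptr_t) == 0 &&
           (uintptr_t)a % _Alignof(uintptr_t) == 0 &&
           (uintptr_t)b % _Alignof(uintptr_t) == 0;
}

/* Hypothetical swap routine for a qsort-style sort. */
static void swap_elems(void *a, void *b, size_t size)
{
    assert(a != NULL && b != NULL);

    if (can_swap_as_uintptr(a, b, size)) {
        /* Pointer-sized copies: on CHERI these would be capability loads
         * and stores, so any pointers inside the elements keep their tags. */
        uintptr_t *pa = a, *pb = b;
        for (size_t i = 0; i < size / sizeof(uintptr_t); i++) {
            uintptr_t tmp = pa[i];
            pa[i] = pb[i];
            pb[i] = tmp;
        }
    } else {
        /* Byte copies: such elements are too small or too misaligned to
         * hold a valid capability in the first place. */
        unsigned char *ca = a, *cb = b;
        for (size_t i = 0; i < size; i++) {
            unsigned char tmp = ca[i];
            ca[i] = cb[i];
            cb[i] = tmp;
        }
    }
}
```

The byte-wise fallback is acceptable only because elements that fail the size or alignment test cannot contain a capability; copying a pointer-bearing element byte by byte would clear its tags on a CHERI target.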