Dynamic performance
A rich understanding of the dynamic performance effects of CHERI was also an important aim of the work, but unlike the former goals, this goal was necessarily constrained by the limited one-year development timeline imposed by Digital Security by Design programme, driven by the need to carry Morello use into a broad range of industrial, government, and academic labs. Hence it is necessary to make the distinction between the industrial prototype built to gain understanding of new architecture versus what one would expect from a future first or second generation commercial product optimized for power and performance.
The dynamic performance aims for Morello were to create a hardware design able to enable the evaluation of the usage of capabilities within rich established software ecosystems and to demonstrate their practical viability and security benefit. A secondary aim was to enable the creation of software workloads suitable to drive future hardware optimisation, and to allow realistic projections of future performance — while not necessarily achieving optimal results in the current generation. The design-time constraints were:
-
Limited existing CHERI-enabled workloads against which to tune the implementation during initial development;
-
Limitations on changes to microarchitecture from the baseline N1SDP achievable in 12 months; and
-
Classical Armv8.2-A elements of the microarchitecture have seen significantly more optimization than added CHERI extensions, meaning that care is required when performing side-by-side comparisons.
These limitations mean that run-time performance measurements observed on Morello should not be regarded as a good predictor of expected CHERI performance overheads when creating a production-quality, later-generation microarchitecture, at least without further analysis in the style of this report.
Microarchitectural research and development has, however, continued following tapeout. Essential to our analysis has been the creation of post-tapeout Morello variations implemented on FPGA, based on the same RTL used in the shipped chip, allowing both more detailed performance analysis and also optimization of key parameters and behaviors driven by profiling data that we have already been able to collect on the shipped chip. This lets us use Morello-derived measurements to estimate performance of more optimized potential future implementations.
Please see later sections Performance methodology and Performance analysis of SPECint 2006 for significant greater detail on our approach and results presented in the remainder of this section.