CAPcelerate Project

Department of Computer Science and Technology

Outline

CAPcelerate is a £1.2m project funded by the UK Industrial Strategy Challenge Fund's Digital Security by Design (DSbD) programme, led by Prof. Timothy Jones and Dr A. Theodore Markettos. We are considering the implications of using CHERI capabilities in various classes of accelerators, such as GPUs, AI and crypto accelerators, and FPGAs. In particular, we are interested in workloads with rich software stacks that interwork with CPU code in shared memory. We are considering whether it is feasible to add capability support to such accelerators and their software stacks (compilers, drivers, libraries, etc.), or whether alternative schemes can be used to protect against accelerators that might be running malicious software, or that might themselves be malicious hardware (for example, a malicious external Thunderbolt GPU).

Team

Timothy Jones is the Principal Investigator
Theo Markettos is a Co-Investigator working on GPU characterisation and tracing
Matthew Naylor and Alexandre Joannou are exploring CHERI-enabled GPGPU hardware
Paul Metzger is working on GPU driver compartments
Jianyi Cheng is working on CHERI protections for custom accelerators

News

14 Mar 2025: A technical report about adding CHERI support to SIMTight is now available.
1 Aug 2024: A paper about our SIMTight GPGPU has been accepted to ICCD 2024. This paper covers our baseline design and dynamic scalarisation features. A separate paper about adding CHERI support to SIMTight is in preparation.
16 Feb 2023: We implemented a low-level GPU device driver that runs in user space and uses one of the CHERI compartmentalisation techniques that are in development. We tested it with a set of OpenGL game traces on Morello. We plan to implement support for multiple client applications in the coming months.
25 Nov 2022: We finished a security review of the interfaces between our user space driver and the kernel, and between our driver and the layer on top of it in user space.
4 Nov 2022: Our SIMTight GPGPU now supports functional unit sharing between vector lanes. This can be used to implement area-expensive instructions, such as CHERI bounds-setting instructions, at low hardware cost, while still allowing maximum run-time performance when the instruction is scalarisable.
25 Oct 2022: We fixed a bug in CheriBSD's virtual memory system that caused significant performance issues with some of the game traces that we use for our device driver work.
12 Oct 2022: We presented a poster about GPU driver compartments at the DSbD all hands meeting in Wolverhampton.
7 Oct 2022: Our SIMTight GPGPU now fully implements register file compression by supporting a hardware mechanism to dynamically spill registers to main memory when the on-chip storage available to the register file is exhausted. This extends our earlier work on dynamic scalarisation.
6 Sep 2022: We recorded OpenGL traces with a set of games for our GPU device driver work and ran them on Morello with a purecap graphics stack.
26 Aug 2022: Our SIMTight GPGPU now supports a scalarised vector store buffer, reducing the cost of compiler-inserted register spills at low hardware cost. This is particularly useful when the register being spilled is a capability, reducing the memory bandwidth overhead of CHERI's double-sized pointers. This extends our earlier work on dynamic scalarisation.
21 May 2022: Our SIMTight GPGPU now supports parallel scalar/vector pipelines, building upon our earlier work on dynamic scalarisation. It allows scalarisable instructions to be executed in parallel with vector ones, doubling performance density in workloads with sufficient scalar behaviour. In future, this feature may be used to implement commonly-scalarisable CHERI instructions at low hardware cost.
12 May 2022: The first Morello board arrived.
7 Apr 2022: We presented a poster about capabilities for heterogeneous accelerators at the DSbD all hands meeting in London.
7 Feb 2022: We implemented a proof of concept for our GPU device driver work that runs in user space a portion of the low-level GPU-specific code that is normally part of the kernel. We implemented this on Linux on an ARM development board. We used a set of compute kernels for preliminary performance investigations.
8 Dec 2021: Our SIMTight GPGPU now implements dynamic scalarisation. This has the potential to largely eliminate the on-chip storage overhead of CHERI's double-sized registers by exploiting the redundancy of capability meta-data between hardware threads.
10 Aug 2021: We have released SIMTight, a fully-synthesisable RISC-V GPGPU with support for CHERI! This includes a CUDA-like programming API called NoCL along with a suite of benchmarks which run in pure capability mode.
7 May 2021: We presented a poster about adding CHERI support to GPGPUs at the DSbD all hands virtual meeting.
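
The 8 Dec 2021 item above describes dynamic scalarisation as exploiting the redundancy of capability meta-data between hardware threads. The following is a minimal software sketch of that idea only, not SIMTight's hardware implementation: if every lane of a vector register holds the same meta-data, one copy can be stored instead of one per lane. The `CapMeta` layout, the lane count `kLanes`, and the function names `scalarise` and `entriesNeeded` are all hypothetical and chosen purely for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical number of hardware threads (lanes) sharing one vector register.
constexpr std::size_t kLanes = 32;

// Hypothetical, simplified capability meta-data (bounds and permissions).
// Real CHERI capabilities use a compressed encoding; this is illustrative only.
struct CapMeta {
    std::uint64_t base;
    std::uint64_t length;
    std::uint16_t perms;

    bool operator==(const CapMeta& other) const {
        return base == other.base && length == other.length && perms == other.perms;
    }
};

// Returns the shared meta-data if every lane holds the same value, in which
// case the register can be stored once; returns std::nullopt if lanes differ.
std::optional<CapMeta> scalarise(const std::array<CapMeta, kLanes>& lanes) {
    for (std::size_t i = 1; i < kLanes; ++i) {
        if (!(lanes[i] == lanes[0])) {
            return std::nullopt;
        }
    }
    return lanes[0];
}

// Number of meta-data entries needed to hold one vector register:
// 1 when the register is uniform across lanes, kLanes otherwise.
std::size_t entriesNeeded(const std::array<CapMeta, kLanes>& lanes) {
    return scalarise(lanes).has_value() ? 1 : kLanes;
}
```

In this sketch, a register in which all threads hold pointers into the same buffer needs a single meta-data entry, whereas a register whose lanes hold different bounds falls back to one entry per lane.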

Posters

Better Security for GPU Drivers

Capabilities for Heterogeneous Accelerators

Towards a CHERI-enabled GPU

Releases

SIMTight, our fully synthesisable CHERI-enabled RISC-V GPGPU with high performance density on Intel's Stratix 10 FPGA board.
NoCL, our CUDA-like programming API (and benchmark suite) in plain C++ that runs in pure capability mode.
Blarney, our Haskell library for hardware description, used to implement SIMTight.