Memclave Artifact Documentation
The Memclave Client Library

The Memclave Client Library is used to interact with PIM ranks running Memclave. It serves as a replacement for UPMEM's host library and follows a similar programming paradigm. These pages document the library's usage.

Building the Client Library

The Memclave Client Library is meant to be used together with other CMake-based projects. A viable CMakeLists.txt for the example below could be

cmake_minimum_required(VERSION 3.20)
project(add-example C)
set(CMAKE_C_STANDARD 11)
add_subdirectory(ime-client-lib EXCLUDE_FROM_ALL)
add_executable(add add.c)
target_link_libraries(add PUBLIC ime-client-lib)

which would build the addition example. The subkernel still has to be built separately.

Usage Example

We demonstrate a simple use case as a usage example: a program that adds two integer vectors by transferring them to a PIM rank and then reading back the result. First, just like in UPMEM's case, we have to allocate a rank of DPUs:

vud_rank r = vud_rank_alloc(VUD_ALLOC_ANY);
if (r.err) {
    puts("Cannot allocate rank.");
    return 1;
}

vud_rank vud_rank_alloc(int rank_nr)
    allocate a single vud rank
#define VUD_ALLOC_ANY

Here you may already notice that Memclave uses a different mechanism for handling errors. Instead of explicitly returning error codes, errors are stored in a single variable of the rank. While an error value is set, future operations on the rank are no-ops. This allows chaining multiple PIM operations without explicit error handling after each one.

After allocation, we wait for the rank to become available. This is important for edge cases where the loader has just started and is not yet ready. We also set the number of worker threads responsible for copying data.

vud_rank_nr_workers(&r, 4); // worker count is an example value
vud_ime_wait(&r);

void vud_rank_nr_workers(vud_rank *rank, unsigned n)
    specify the number of worker threads
void vud_ime_wait(vud_rank *r)
    wait until the whole rank has exposed the MUX to the guest system

Once we know the rank is ready, we can exchange a key with the DPU rank. For the example, we'll use a random key fetched from /dev/urandom.

uint8_t key[32];
random_key(key);
vud_ime_install_key(&r, key, NULL, NULL);
if (r.err) {
    puts("key exchange failed");
    goto error;
}

void vud_ime_install_key(vud_rank *r, const uint8_t key[32], const uint64_t common_pk[32], const uint64_t pk[64][32])
    perform a key exchange with the rank and install a new user key

The key exchange takes roughly 10 seconds. Once a session key is established, we can deploy a subkernel, in our case the addition subkernel, to the rank and transfer the input data.

// create some input data
uint64_t a[64];
uint64_t b[64];
for (int i = 0; i < 64; ++i) {
    a[i] = i;
    b[i] = 2 * i;
}
vud_ime_load(&r, "../add");
vud_broadcast_to(&r, 64, &a, "a");
vud_broadcast_to(&r, 64, &b, "b");

void vud_ime_load(vud_rank *r, const char *path)
    set the next subkernel (ELF file not .sk) to load
void vud_broadcast_to(vud_rank *r, vud_mram_size sz, const uint64_t (*src)[sz], const char *symbol)
    broadcast data to some variable in MRAM

Once all inputs are transferred and the kernel is loaded, we can start processing on the DPU side and wait for it to finish.

vud_ime_launch(&r);

void vud_ime_launch(vud_rank *r)
    load a subkernel (ELF file not .sk) on a rank of DPUs

Finally, all that is left is fetching back the data and confirming that everything worked out.

uint64_t c[64][64];
uint64_t *c_ptr[64];
for (int i = 0; i < 64; ++i) { c_ptr[i] = &c[i][0]; }
vud_gather_from(&r, 64, "c", &c_ptr);
for (int i = 0; i < 64; ++i) {
    for (int j = 0; j < 64; ++j) {
        assert(c[i][j] == 3 * j);
    }
}
vud_rank_free(&r);

void vud_gather_from(vud_rank *r, vud_mram_size sz, const char *symbol, uint64_t *(*tgt)[64])
    gather data per-DPU from some variable in MRAM
void vud_rank_free(vud_rank *rank)
    release the rank back to the OS and free associated resources

Here you may also notice that our memory transfer functions have a different signature: we use C99 array pointers to describe data movement instead of UPMEM's transfer-matrix approach.