Plasma: Storing objects in memory

Plasma is a shared region of memory that allows multiple processes running on the same machine to access shared data objects.

It can be used both as a Ray service and as a library in your own programs.

An object is created in two distinct phases:

  1. Allocate memory and write data into allocated memory. If the size of the data is not known in advance, the buffer can be resized. Note that during this phase the buffer is writable, but only by its creator. No one else can access the buffer during this phase.
  2. Seal the buffer. Once the creator finishes writing data into the buffer, it seals the buffer. From this moment on, the buffer is immutable and other processes can read it.
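The access rules of the two phases can be modeled with a small in-process sketch. This is not Plasma code: the class SketchObject, its methods and the integer caller IDs are hypothetical names chosen for illustration, and a plain bytearray stands in for shared memory.

```python
class SketchObject:
    """Toy model of the create-then-seal lifecycle (not the Plasma API)."""

    def __init__(self, creator_id, size):
        self.creator_id = creator_id      # hypothetical ID of the creating process
        self._data = bytearray(size)      # stand-in for the shared memory region
        self.sealed = False

    def write(self, caller_id, offset, payload):
        # Phase 1: the buffer is writable, but only by its creator.
        if self.sealed:
            raise PermissionError("object is sealed and immutable")
        if caller_id != self.creator_id:
            raise PermissionError("only the creator may write before sealing")
        if offset < 0 or offset + len(payload) > len(self._data):
            raise ValueError("write does not fit into the allocated buffer")
        self._data[offset:offset + len(payload)] = payload

    def seal(self, caller_id):
        # Phase 2: sealing makes the object immutable and readable by others.
        if caller_id != self.creator_id:
            raise PermissionError("only the creator may seal")
        self.sealed = True

    def read(self, caller_id):
        # Before sealing, nobody except the creator can access the buffer.
        if not self.sealed and caller_id != self.creator_id:
            raise PermissionError("object is not sealed yet")
        return bytes(self._data)
```

In this model a reader with a different ID is rejected until seal has been called, and every write after that point is rejected, including writes by the creator.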

To create an object, the user specifies a unique identifier for the object and an optional name. Plasma keeps track of the ID of the process that created the object, the creation timestamp, how long the creation of the object took, and the size of the object. During creation, the user can also specify metadata that will be associated with the object.

Other processes can request an object by its unique identifier (later also by name). If the object has not been created or sealed yet, the process requesting the object will block until the object has been sealed.
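The blocking behavior can be illustrated with threads inside a single process. The sketch below is an assumption-laden stand-in, not how Plasma is implemented: SketchStore, seal and get are hypothetical names, and the timeout parameter is an addition for the example that the text above does not mention.

```python
import threading


class SketchStore:
    """Toy in-process store whose get() blocks until the object is sealed."""

    def __init__(self):
        self._sealed = {}                      # object_id -> immutable bytes
        self._cond = threading.Condition()

    def seal(self, object_id, data):
        with self._cond:
            self._sealed[object_id] = bytes(data)
            self._cond.notify_all()            # wake up every blocked reader

    def get(self, object_id, timeout=None):
        with self._cond:
            # wait_for re-checks the predicate after each wake-up and
            # returns immediately if the object is already sealed.
            found = self._cond.wait_for(
                lambda: object_id in self._sealed, timeout)
            if not found:
                raise TimeoutError("object %r was not sealed in time" % object_id)
            return self._sealed[object_id]


def demo():
    store = SketchStore()
    results = []
    reader = threading.Thread(target=lambda: results.append(store.get(7)))
    reader.start()                 # blocks if object 7 has not been sealed yet
    store.seal(7, b"payload")      # the creator seals; the reader wakes up
    reader.join()
    return results
```

The reader thread returns only after seal has stored the object, regardless of which of the two threads runs first.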

The Buffer interface

A buffer is the region of memory associated with a data object, as determined by a start address and a size in bytes. There are two kinds of buffers: read-only buffers and read-write buffers.

class plasma::Buffer

Read-only view on data

Subclassed by plasma::MutableBuffer

Public Functions

const uint8_t *data()

Return the start address of the buffer.

const uint8_t *data(uint64_t offset)

Return the address corresponding to “offset” in this buffer.

int64_t size()

Return the size of the object in bytes.
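A read-only view defined by a start address and a size can be approximated in Python with a read-only memoryview slice. This is a minimal sketch, assuming Python 3.8 or later for memoryview.toreadonly(); the class name ReadOnlyBufferSketch is hypothetical and only mirrors the data() and size() functions listed above.

```python
class ReadOnlyBufferSketch:
    """Read-only window of `size` bytes starting at `start` in `backing`."""

    def __init__(self, backing, start, size):
        # Slicing a memoryview does not copy; toreadonly() forbids writes
        # through this view even if the backing object is writable.
        self._view = memoryview(backing)[start:start + size].toreadonly()

    def data(self, offset=0):
        # Counterpart of data() and data(offset): a view beginning at `offset`.
        if not 0 <= offset <= len(self._view):
            raise IndexError("offset outside of buffer")
        return self._view[offset:]

    def size(self):
        return len(self._view)
```

For example, a view created with start 2 and size 5 over the bytes b"0123456789" exposes b"23456", and assigning to any element of the view raises TypeError.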

MutableBuffers have a richer interface: they allow writing to and resizing the object. When the object creator has finished modifying the object, it calls the Seal method to make the object immutable, which allows other processes to read the object.

class plasma::MutableBuffer

Mutable view on data

Inherits from plasma::Buffer

Public Functions

MutableBuffer()

A default-constructed MutableBuffer is not functional, and all of its methods will raise errors. It can be used only after it has been initialized by ClientContext::BuildObject.

uint8_t *mutable_data()

Return the start address of the buffer (mutable).

uint8_t *mutable_data(uint64_t offset)

Return an address corresponding to an “offset” in this buffer (mutable).

Status Resize(int64_t new_size)

Resize the buffer.

Parameters
  • new_size -

    New size of the buffer (in bytes).

Status Seal()

Make the data contained in this buffer immutable. After the buffer has been sealed, it is illegal to modify data in the buffer or to resize the buffer.

bool sealed()

Has this MutableBuffer been sealed?
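Resizing is mainly useful when the final size is not known up front. The following sketch, written under stated assumptions, grows a buffer while chunks arrive, shrinks it to the exact size and then seals it. MutableBufferSketch, write and build_from_chunks are hypothetical names, the doubling strategy is a choice made for this example, and the sketch raises exceptions where the real Resize and Seal return a Status whose semantics are not described here.

```python
class MutableBufferSketch:
    """Toy resizable buffer that becomes immutable once sealed."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._sealed = False

    def size(self):
        return len(self._data)

    def sealed(self):
        return self._sealed

    def data(self):
        return bytes(self._data)

    def write(self, offset, payload):
        if self._sealed:
            raise RuntimeError("buffer is sealed")
        if offset < 0 or offset + len(payload) > len(self._data):
            raise ValueError("write does not fit; resize first")
        self._data[offset:offset + len(payload)] = payload

    def resize(self, new_size):
        if self._sealed:
            raise RuntimeError("buffer is sealed")
        if new_size < len(self._data):
            del self._data[new_size:]                       # shrink, keep prefix
        else:
            self._data.extend(bytes(new_size - len(self._data)))  # zero-fill

    def seal(self):
        self._sealed = True


def build_from_chunks(chunks, initial_size=4):
    buf = MutableBufferSketch(initial_size)
    used = 0
    for chunk in chunks:
        while used + len(chunk) > buf.size():
            buf.resize(max(1, buf.size() * 2))   # grow geometrically
        buf.write(used, chunk)
        used += len(chunk)
    buf.resize(used)                             # trim unused capacity
    buf.seal()
    return buf
```

Existing contents are preserved across every resize, so the sealed buffer holds exactly the concatenation of the chunks.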

The Plasma client interface

The developer interacts with Plasma through the Plasma API. Each process needs to instantiate a ClientContext, which gives the process access to objects and their metadata and allows it to write objects.

class plasma::ClientContext

A client context is the primary interface through which clients interact with Plasma.

Public Functions

ClientContext(const std::string &address)

Create a new client context.

Parameters
  • address -

    Address of the Ray shell socket we are connecting to.

Status BuildObject(ObjectID object_id, int64_t size, MutableBuffer &buffer, const std::string &name = "", const std::map<std::string, Buffer> &metadata = EMPTY)

Build a new object. Building an object involves multiple steps. Once the creator process finishes constructing the object, it seals the object. Only after that can it be shared with other processes.

Parameters
  • object_id -

    The object ID of the newly created object. Provided by the client, which must ensure it is globally unique.

  • size -

    The number of bytes initially allocated for the object. The buffer can be resized later through the MutableBuffer interface.

  • buffer -

    The function will pass the allocated buffer to the client using this argument.

  • name -

    An optional name for the object through which it can be accessed without knowing its object ID.

  • metadata -

    An optional dictionary of metadata for the object. The keys of the dictionary are strings and the values are arbitrary binary data represented by Buffer objects.

Status GetObject(ObjectID object_id, Buffer &buffer)

Get the buffer associated with an object ID. If the object has not been sealed yet, this function will block the current thread until the object has been sealed.

Parameters
  • object_id -

    The object ID of the object that shall be retrieved.

  • buffer -

    The argument is used to pass the read-only buffer to the client.

Status ListObjects(std::vector<ObjectInfo> *objects)

Fill the vector objects with information about the objects in the store.

Status GetMetadata(ObjectID object_id, const std::string &key, Buffer &data)

Retrieve metadata for a given object.

Return
A view on the metadata associated with that key.
Parameters
  • key -

    The key of the metadata information to be retrieved.

Plasma metadata

There are two parts to the object metadata: one maintained internally by Plasma and one provided by the user. The first part is represented by the ObjectInfo class.

class plasma::ObjectInfo

Public Members

std::string name

Name of the object as provided by the user during object construction.

int64_t size

Size of the object in bytes.

int64_t create_time

Time when object construction started, in microseconds since the Unix epoch.

int64_t construct_duration

Time in microseconds between object creation and sealing.

int64_t creator_id

Process ID of the process that created the object.

std::string creator_address

Cluster-wide unique address for the process that created the object.
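To make the units of these fields concrete, the sketch below mirrors ObjectInfo as a Python dataclass and fills it in for a locally built payload. It is illustrative only: ObjectInfoSketch and record_construction are hypothetical names, and the address string passed by the caller is a placeholder, since the format of creator_address is not specified here.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class ObjectInfoSketch:
    name: str                 # user-provided name
    size: int                 # size in bytes
    create_time: int          # microseconds since the Unix epoch
    construct_duration: int   # microseconds between creation and sealing
    creator_id: int           # process ID of the creator
    creator_address: str      # placeholder; real format not specified


def record_construction(name, build_payload, creator_address):
    # Wall-clock time for the timestamp, monotonic clock for the duration
    # so that clock adjustments cannot produce a negative value.
    create_time = time.time_ns() // 1000
    started = time.monotonic_ns()
    payload = build_payload()                 # "construction" of the object
    construct_duration = (time.monotonic_ns() - started) // 1000
    return ObjectInfoSketch(
        name=name,
        size=len(payload),
        create_time=create_time,
        construct_duration=construct_duration,
        creator_id=os.getpid(),
        creator_address=creator_address,
    )
```

Calling record_construction("object-1", lambda: bytes(1000), "node-a:1234") yields a record with size 1000 and the current process ID as creator_id.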

Each object has a small dictionary that holds metadata provided by users. Users can store arbitrary information here. It will most likely be used for entries such as format (binary, arrow, protobuf, json) and schema, which could hold a schema for the data.
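A user metadata dictionary with string keys and binary values might look like the following sketch. The keys "format" and "schema" follow the examples named above; encoding the schema as UTF-8 JSON, and the helper names make_metadata and read_schema, are assumptions made for illustration rather than a Plasma convention.

```python
import json


def make_metadata(fmt, schema):
    """Build a metadata dictionary: string keys, arbitrary binary values."""
    return {
        "format": fmt.encode("utf-8"),
        # Assumed encoding for this sketch: the schema serialized as JSON.
        "schema": json.dumps(schema, sort_keys=True).encode("utf-8"),
    }


def read_schema(metadata):
    """Decode the schema entry written by make_metadata."""
    return json.loads(metadata["schema"].decode("utf-8"))
```

A reader that retrieves the "format" entry first can decide how to interpret the object's bytes before looking at the schema.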

An example application

More examples will be added here. Currently, the best way to understand the API is to look at libplasma, the Python C extension for Plasma. It can be found at https://github.com/amplab/ray-core/blob/master/src/plasma/client/plasma.cc.

Note that libplasma is not the Python API that users will interact with. It can be used like this:

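import libplasma

# Connect to the Ray shell socket at this address.
plasma = libplasma.connect("/home/pcmoritz/shell")

# Build and seal two objects. The arguments after the connection appear to be
# the object ID, the size in bytes and the object name.
A = libplasma.build_object(plasma, 1, 1000, "object-1")
libplasma.seal_object(A)
B = libplasma.build_object(plasma, 2, 2000, "object-2")
libplasma.seal_object(B)

# List the objects in the store.
libplasma.list_objects(plasma)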