Developer Documentation for Numbuf
Numbuf is a library for the fast serialization of primitive Python objects (lists, tuples, dictionaries, NumPy arrays) to the Apache Arrow format.
- class numbuf::DictBuilder
Constructing dictionaries of key/value pairs. Sequences of keys and values are built separately using a pair of SequenceBuilders. The resulting Arrow representation can be obtained via the Finish method.
Public Functions
- SequenceBuilder &keys()
Builder for the keys of the dictionary.
- SequenceBuilder &vals()
Builder for the values of the dictionary.
Finish: construct an Arrow StructArray representing the dictionary. It contains a field “keys” for the keys and a field “vals” for the values.
- Parameters
  list_data - List containing the data from nested lists in the value list of the dictionary
  dict_data - List containing the data from nested dictionaries in the value list of the dictionary
- SequenceBuilder &
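The split into a key sequence and a value sequence can be pictured with a small stand-alone model. The sketch below is not numbuf code: ToySequence and ToyDictBuilder are hypothetical names, a plain std::vector<std::string> stands in for a SequenceBuilder, and a pair of vectors stands in for the “keys” and “vals” fields of the Arrow StructArray.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a SequenceBuilder: a growable list of strings.
using ToySequence = std::vector<std::string>;

// Hypothetical model of the DictBuilder idea: keys and values are collected
// in two separate sequences, and entry i is the pair (keys[i], vals[i]).
class ToyDictBuilder {
 public:
  ToySequence &keys() { return keys_; }
  ToySequence &vals() { return vals_; }

  // Stand-in for Finish: return the two columns ("keys", "vals") together.
  // Both columns must have the same length.
  std::pair<ToySequence, ToySequence> Finish() const {
    assert(keys_.size() == vals_.size());
    return {keys_, vals_};
  }

 private:
  ToySequence keys_;
  ToySequence vals_;
};

// Build the dictionary {"a": "1", "b": "2"} column by column.
inline std::pair<ToySequence, ToySequence> BuildExampleDict() {
  ToyDictBuilder builder;
  builder.keys().push_back("a");
  builder.vals().push_back("1");
  builder.keys().push_back("b");
  builder.vals().push_back("2");
  return builder.Finish();
}
```

Calling BuildExampleDict() returns the key column and the value column side by side. Keeping them as two parallel sequences is what allows each side to be built with the same sequence machinery.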
- class numbuf::SequenceBuilder
A Sequence is a heterogeneous collection of elements. It can contain scalar Python types, lists, tuples, dictionaries and tensors.
Public Functions
- Status Append()
Append a None value to the sequence.
- Status Append(bool data)
Append a boolean to the sequence.
- Status Append(int64_t data)
Append an int64_t to the sequence.
- Status Append(uint64_t data)
Append a uint64_t to the sequence.
- Status Append(const char *data, int32_t length)
Append a string to the sequence.
- Status Append(float data)
Append a float to the sequence.
- Status Append(double data)
Append a double to the sequence.
- arrow::Status Append(const std::vector<int64_t> &dims, uint8_t *data)
Append a tensor to the sequence.
- Parameters
  dims - A vector of dimensions
  data - A pointer to the start of the data block. The length of the data block is the product of the dimensions.
-
Status
AppendList
(int32_t size)¶ Add a sublist to the sequenc. The data contained in the sublist will be specified in the “Finish” method.
To construct l = [[11, 22], 33, [44, 55]] you would for example run list = ListBuilder(); list.AppendList(2); list.Append(33); list.AppendList(2); list.Finish([11, 22, 44, 55]); list.Finish();
- Parameters
size
-The size of the sublist
Finish building the sequence and return the result.
- Status
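The sublist mechanism described for AppendList can be illustrated with a self-contained model. This is a sketch under assumptions, not the numbuf implementation: ToySlot, ToySequence and ToySequenceBuilder are hypothetical names, only int64_t scalars are supported, and Finish here takes the flattened sublist contents directly as a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One top-level entry: either a scalar or a sublist given by an offset range.
struct ToySlot {
  bool is_list;    // true if the slot is a sublist
  int64_t scalar;  // value if the slot is a scalar
  int32_t begin;   // start offset into the flat child list
  int32_t end;     // one-past-the-end offset into the flat child list
};

// Finished sequence: the slots plus the flattened contents of all sublists.
struct ToySequence {
  std::vector<ToySlot> slots;
  std::vector<int64_t> list_data;

  // Materialize the sublist stored in slot i.
  std::vector<int64_t> Sublist(std::size_t i) const {
    const ToySlot &slot = slots.at(i);
    assert(slot.is_list);
    return std::vector<int64_t>(list_data.begin() + slot.begin,
                                list_data.begin() + slot.end);
  }
};

// Hypothetical builder: AppendList only records the size of a sublist,
// and the contents of all sublists are supplied together in Finish.
class ToySequenceBuilder {
 public:
  void Append(int64_t value) { slots_.push_back({false, value, 0, 0}); }

  void AppendList(int32_t size) {
    slots_.push_back({true, 0, next_offset_, next_offset_ + size});
    next_offset_ += size;
  }

  ToySequence Finish(const std::vector<int64_t> &list_data) const {
    // The flat child list must cover exactly the reserved sublist sizes.
    assert(static_cast<int32_t>(list_data.size()) == next_offset_);
    return ToySequence{slots_, list_data};
  }

 private:
  std::vector<ToySlot> slots_;
  int32_t next_offset_ = 0;
};

// Build l = [[11, 22], 33, [44, 55]] as in the description above.
inline ToySequence BuildExampleSequence() {
  ToySequenceBuilder builder;
  builder.AppendList(2);
  builder.Append(33);
  builder.AppendList(2);
  return builder.Finish({11, 22, 44, 55});
}
```

In this model the top-level sequence stores only offsets for its sublists, so the nested elements [11, 22] and [44, 55] end up contiguous in a single child list.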
- template <typename T> class numbuf::TensorBuilder
This is a class for building a dataframe where each row corresponds to a Tensor (a multidimensional array) of numerical data. There are two columns: “dims”, which contains an array of dimensions for each Tensor, and “data”, which contains the data buffer of the Tensor as a flattened array.
Public Functions
- Status Append(const std::vector<int64_t> &dims, const elem_type *data)
Append a new tensor.
- Parameters
  dims - The dimensions of the Tensor
  data - Pointer to the beginning of the data buffer of the Tensor. The total length of the buffer is sizeof(elem_type) * product of dims[i] over i
- std::shared_ptr<Array> Finish()
Convert the tensors to an Arrow StructArray.
- int32_t length()
Number of tensors in the column.
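The two-column layout and the buffer size rule (sizeof(elem_type) times the product of the dimensions) can be made concrete with a stand-alone model. The names NumElements and ToyTensorColumn are hypothetical, and std::vector replaces the Arrow arrays; this is a sketch of the idea rather than the TensorBuilder implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Number of elements in a tensor: the product of its dimensions.
inline int64_t NumElements(const std::vector<int64_t> &dims) {
  return std::accumulate(dims.begin(), dims.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

// Hypothetical model of the two-column layout: one "dims" entry and one
// flattened "data" entry per tensor.
template <typename ElemType>
class ToyTensorColumn {
 public:
  // Copy NumElements(dims) elements starting at `data` into a new row.
  void Append(const std::vector<int64_t> &dims, const ElemType *data) {
    const int64_t n = NumElements(dims);
    dims_.push_back(dims);
    data_.emplace_back(data, data + n);
  }

  // Number of tensors stored so far.
  int32_t length() const { return static_cast<int32_t>(dims_.size()); }

  // Size in bytes of the data buffer of tensor i.
  std::size_t ByteSize(std::size_t i) const {
    return sizeof(ElemType) * data_.at(i).size();
  }

  const std::vector<int64_t> &dims(std::size_t i) const { return dims_.at(i); }
  const std::vector<ElemType> &data(std::size_t i) const { return data_.at(i); }

 private:
  std::vector<std::vector<int64_t>> dims_;
  std::vector<std::vector<ElemType>> data_;
};

// Store a single 2 x 3 tensor of doubles, passed as a flat buffer.
inline ToyTensorColumn<double> BuildExampleColumn() {
  const double values[6] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
  ToyTensorColumn<double> column;
  column.Append({2, 3}, values);
  return column;
}
```

For the 2 x 3 tensor of doubles in BuildExampleColumn(), the data row holds 6 elements, so its buffer occupies 6 * sizeof(double) bytes, while the dims row records {2, 3} so the shape can be restored from the flat buffer.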
-
Status