Developer Documentation for Numbuf
Numbuf is a library for the fast serialization of primitive Python objects (lists, tuples, dictionaries, NumPy arrays) to the Apache Arrow format.
-
class
numbuf::DictBuilder
Builds dictionaries of key/value pairs. The keys and the values are built separately, each with its own SequenceBuilder. The resulting Arrow representation is obtained via the Finish method.
Public Functions
-
SequenceBuilder &
keys()
Builder for the keys of the dictionary.
-
SequenceBuilder &
vals()
Builder for the values of the dictionary.
The Finish method constructs an Arrow StructArray representing the dictionary, with a field “keys” for the keys and a field “vals” for the values.
- Parameters
list_data - List containing the data from nested lists in the value list of the dictionary
dict_data - List containing the data from nested dictionaries in the value list of the dictionary
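The split into a “keys” column and a “vals” column can be illustrated with a small self-contained sketch. Every name below (ToySequence, ToyStruct, ToyDictBuilder) is hypothetical and stands in for the real classes; this is not numbuf's implementation, it only stores strings, and its Finish takes no list_data or dict_data arguments because nested values are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a SequenceBuilder; it only stores strings.
class ToySequence {
 public:
  void Append(const std::string& item) { items_.push_back(item); }
  const std::vector<std::string>& items() const { return items_; }

 private:
  std::vector<std::string> items_;
};

// Hypothetical result type with the two named fields "keys" and "vals".
struct ToyStruct {
  std::vector<std::string> keys;
  std::vector<std::string> vals;
};

// Hypothetical stand-in for DictBuilder (an assumption, not numbuf code).
class ToyDictBuilder {
 public:
  ToySequence& keys() { return keys_; }
  ToySequence& vals() { return vals_; }

  // Entry i of "keys" pairs with entry i of "vals", so both columns
  // must have the same length when the dictionary is finished.
  ToyStruct Finish() const {
    assert(keys_.items().size() == vals_.items().size());
    return ToyStruct{keys_.items(), vals_.items()};
  }

 private:
  ToySequence keys_;
  ToySequence vals_;
};
```

In this sketch, a dictionary such as {"a": "1", "b": "2"} would be built by appending "a" and "b" through keys() and "1" and "2" through vals(), then calling Finish().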
-
SequenceBuilder &
-
class
numbuf::SequenceBuilder
Public Functions
-
Status
Append()
Append a None value to the list.
-
Status
Append(bool data)
Append a boolean to the list.
-
Status
Append(int64_t data)
Append an int64_t to the list.
-
Status
Append(const char *data)
Append a null-terminated string to the list.
-
arrow::Status
Append(const std::string &data)
Append a C++ string to the list.
-
Status
Append(float data)
Append a float to the list.
-
Status
Append(double data)
Append a double to the list.
-
Status
Append(const std::vector<int64_t> &dims, double *data)
Append a tensor to the list.
- Parameters
dims - A vector of dimensions
data - A pointer to the start of the data block. The data block must contain as many elements as the product of the dimensions.
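Since the caller is responsible for passing a data block of the right length, it can help to compute the expected element count from dims before calling Append. The helper below is a hypothetical illustration (NumElements is not part of numbuf) and assumes all dimensions are non-negative.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of numbuf: number of elements in a
// tensor whose shape is given by dims. An empty dims vector yields 1,
// which corresponds to a scalar.
int64_t NumElements(const std::vector<int64_t>& dims) {
  return std::accumulate(dims.begin(), dims.end(), int64_t{1},
                         std::multiplies<int64_t>());
}
```

For dims = {2, 3, 4} the data pointer would need to reference a block of 24 doubles.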
-
Status
AppendList(int32_t size)
Add a sublist to the list. The data contained in the sublist is specified later, in the call to the “Finish” method.
To construct l = [[11, 22], 33, [44, 55]] you would, for example, run:
    list = SequenceBuilder();
    list.AppendList(2);
    list.Append(33);
    list.AppendList(2);
    list.Finish([11, 22, 44, 55]);
    list.Finish();
- Parameters
size - The size of the sublist
Finish building the list and return the result.
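The idea of recording only sublist sizes during building and supplying the flattened sublist contents at the end can be modelled with a toy class. ToyListBuilder and its string-returning Finish are assumptions made for illustration; the real builder produces Arrow data, and its Finish signature is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy model, not numbuf code. Each top-level entry is either
// an integer or a sublist of a recorded size.
class ToyListBuilder {
 private:
  struct Entry {
    bool is_list;
    int64_t value;  // the integer itself, or the sublist size if is_list
  };
  std::vector<Entry> entries_;

 public:
  void Append(int64_t value) { entries_.push_back({false, value}); }
  void AppendList(int32_t size) { entries_.push_back({true, size}); }

  // Rebuilds the nested list as text. sublist_data holds the contents of
  // all sublists, concatenated in the order the sublists were appended.
  std::string Finish(const std::vector<int64_t>& sublist_data) const {
    std::string out = "[";
    std::size_t next = 0;  // read position in sublist_data
    for (std::size_t i = 0; i < entries_.size(); ++i) {
      if (i > 0) out += ", ";
      const Entry& e = entries_[i];
      if (!e.is_list) {
        out += std::to_string(e.value);
        continue;
      }
      out += "[";
      for (int64_t j = 0; j < e.value; ++j) {
        if (j > 0) out += ", ";
        out += std::to_string(sublist_data.at(next++));
      }
      out += "]";
    }
    return out + "]";
  }
};
```

In this model, AppendList(2), Append(33), AppendList(2) followed by Finish({11, 22, 44, 55}) assigns the first two values to the first sublist and the last two to the second.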
-
Status
- template <typename T>
-
class
numbuf::TensorBuilder
Builds a dataframe in which each row corresponds to a Tensor (a multidimensional array) of numerical data. There are two columns: “dims”, which contains the array of dimensions of each Tensor, and “data”, which contains the data buffer of each Tensor as a flattened array.
Public Functions
-
Status
Append(const std::vector<int64_t> &dims, const elem_type *data)
Append a new tensor.
- Parameters
dims - The dimensions of the Tensor
data - Pointer to the beginning of the data buffer of the Tensor. The total size of the buffer in bytes is sizeof(elem_type) times the product of dims[i] over all i.
-
std::shared_ptr<Array>
Finish()
Convert the tensors to an Arrow StructArray.
-
int32_t
length()
Number of tensors in the column.
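The two-column layout (“dims” and “data”) and the buffer-size rule can be sketched with a small in-memory model. ToyTensorBuilder, its accessors, and byte_size are hypothetical names introduced for this illustration; the sketch keeps plain vectors instead of producing an Arrow StructArray and assumes non-negative dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of the two columns, not numbuf code.
template <typename T>
class ToyTensorBuilder {
 public:
  // Copies the shape into the "dims" column and the flattened elements
  // into the "data" column. The element count is the product of dims.
  void Append(const std::vector<int64_t>& dims, const T* data) {
    int64_t count = 1;
    for (int64_t d : dims) count *= d;
    dims_.push_back(dims);
    data_.emplace_back(data, data + count);
  }

  // Number of tensors (rows) appended so far.
  int32_t length() const { return static_cast<int32_t>(dims_.size()); }

  const std::vector<int64_t>& dims(std::size_t row) const { return dims_[row]; }
  const std::vector<T>& data(std::size_t row) const { return data_[row]; }

  // Size in bytes of one row's buffer: sizeof(T) times the element count.
  std::size_t byte_size(std::size_t row) const {
    return sizeof(T) * data_[row].size();
  }

 private:
  std::vector<std::vector<int64_t>> dims_;
  std::vector<std::vector<T>> data_;
};
```

Appending a 2 x 3 tensor of doubles in this model stores {2, 3} in the “dims” column and six values in the “data” column, for a buffer of 6 * sizeof(double) bytes.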
-
Status