Layers API

Layers are the basic building blocks of neural networks in FlexFlow. The input of a layer consists of a tensor or a list of tensors together with some state variables, and the output of a layer is a tensor or a list of tensors.

Conv2D

class flexflow.core.flexflow_cffi.FFModel
conv2d(input, out_channels, kernel_h, kernel_w, stride_h, stride_w, padding_h, padding_w, activation=ActiMode.AC_MODE_NONE, groups=1, use_bias=True, shared_op=None, kernel_initializer=None, bias_initializer=None, name=None)

This layer creates a 2D convolution kernel that is convolved with the layer input to produce an output tensor.

The size of the input tensor is \((N, C_{in}, H, W)\) and the size of the output tensor is \((N, C_{out}, H_{out}, W_{out})\), which can be calculated by:

\[C_{out} = out\_channels\]
\[K_{H} = kernel\_h\]
\[K_{W} = kernel\_w\]
\[S_{H} = stride\_h\]
\[S_{W} = stride\_w\]
\[P_{H} = padding\_h\]
\[P_{W} = padding\_w\]
\[H_{out} = (H - K_{H} + 2 * P_{H}) / S_{H} + 1 \]
\[W_{out} = (W - K_{W} + 2 * P_{W}) / S_{W} + 1 \]
Parameters
  • input (Tensor) – the input Tensor.

  • out_channels (int) – the dimensionality of the output space (i.e. the number of output filters in the convolution).

  • kernel_h (int) – the height of the 2D convolution window: \(K_{H}\).

  • kernel_w (int) – the width of the 2D convolution window: \(K_{W}\).

  • stride_h (int) – the stride of the convolution along the height: \(S_{H}\).

  • stride_w (int) – the stride of the convolution along the width: \(S_{W}\).

  • padding_h (int) – the amount of implicit zero-paddings along the height: \(P_{H}\).

  • padding_w (int) – the amount of implicit zero-paddings along the width: \(P_{W}\).

  • activation (ActiMode) – Activation function to use. Default is ActiMode.AC_MODE_NONE.

  • groups (int) – the number of groups in this convolution. Default is 1.

  • use_bias (bool) – whether the layer uses a bias vector. Default is True.

  • shared_op (Op) – the layer with which this layer shares parameters. Default is None.

  • kernel_initializer (Initializer) – Initializer for the kernel weights matrix. If it is set to None, the GlorotUniformInitializer is applied.

  • bias_initializer (Initializer) – Initializer for the bias vector. If it is set to None, the ZeroInitializer is applied.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
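
For illustration, a minimal sketch of how conv2d might be used. The FFConfig, FFModel constructor, and create_tensor calls are assumed from the wider FlexFlow Python API (they are not documented in this section), and the code is expected to run under the FlexFlow Python runtime:

    from flexflow.core import *

    ffconfig = FFConfig()
    ffmodel = FFModel(ffconfig)

    # Assumed helper: an NCHW float input of size (64, 3, 32, 32).
    input_tensor = ffmodel.create_tensor([64, 3, 32, 32], DataType.DT_FLOAT)

    # 32 output filters, 3x3 kernel, stride 1, padding 1:
    # H_out = (32 - 3 + 2*1)/1 + 1 = 32, so the output size is (64, 32, 32, 32).
    conv = ffmodel.conv2d(input_tensor, 32, 3, 3, 1, 1, 1, 1,
                          activation=ActiMode.AC_MODE_RELU)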

Pool2D

class flexflow.core.flexflow_cffi.FFModel
pool2d(input, kernel_h, kernel_w, stride_h, stride_w, padding_h, padding_w, pool_type=PoolType.POOL_MAX, activation=ActiMode.AC_MODE_NONE, name=None)

Pooling operation for 2D spatial data.

The size of the input tensor is \((N, C_{in}, H, W)\) and the size of the output tensor is \((N, C_{out}, H_{out}, W_{out})\), which can be calculated by:

\[C_{out} = C_{in}\]
\[K_{H} = kernel\_h\]
\[K_{W} = kernel\_w\]
\[S_{H} = stride\_h\]
\[S_{W} = stride\_w\]
\[P_{H} = padding\_h\]
\[P_{W} = padding\_w\]
\[H_{out} = (H - K_{H} + 2 * P_{H}) / S_{H} + 1 \]
\[W_{out} = (W - K_{W} + 2 * P_{W}) / S_{W} + 1 \]
Parameters
  • input (Tensor) – the input Tensor.

  • kernel_h (int) – the height of the 2D pooling window: \(K_{H}\).

  • kernel_w (int) – the width of the 2D pooling window: \(K_{W}\).

  • stride_h (int) – the stride of the pooling along the height: \(S_{H}\).

  • stride_w (int) – the stride of the pooling along the width: \(S_{W}\).

  • padding_h (int) – the amount of implicit zero-paddings along the height: \(P_{H}\).

  • padding_w (int) – the amount of implicit zero-paddings along the width: \(P_{W}\).

  • pool_type (PoolType) – the type of pooling function to use. If you don’t specify anything, PoolType.POOL_MAX is applied.

  • activation (ActiMode) – Activation function to use. Default is ActiMode.AC_MODE_NONE.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
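
A hedged sketch of pool2d under the same assumptions as the conv2d example above (FFModel and create_tensor come from the wider FlexFlow Python API):

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    t = ffmodel.create_tensor([64, 32, 32, 32], DataType.DT_FLOAT)  # assumed NCHW float input

    # 2x2 max pooling with stride 2 and no padding:
    # H_out = (32 - 2 + 0)/2 + 1 = 16, so the output size is (64, 32, 16, 16).
    pooled = ffmodel.pool2d(t, 2, 2, 2, 2, 0, 0, pool_type=PoolType.POOL_MAX)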

Dense

class flexflow.core.flexflow_cffi.FFModel
dense(input, out_dim, activation=ActiMode.AC_MODE_NONE, use_bias=True, datatype=DataType.DT_NONE, shared_op=None, kernel_initializer=None, bias_initializer=None, kernel_regularizer=None, name=None)

Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).

The size of the input tensor is \((N, C_{in})\) and the size of the output tensor is \((N, C_{out})\), where \(C_{out} = out\_dim\).

Parameters
  • input (Tensor) – the input Tensor.

  • out_dim (int) – dimensionality of the output space.

  • activation (ActiMode) – Activation function to use. Default is ActiMode.AC_MODE_NONE.

  • use_bias (bool) – whether the layer uses a bias vector. Default is True.

  • shared_op (Op) – the layer with which this layer shares parameters. Default is None.

  • kernel_initializer (Initializer) – Initializer for the kernel weights matrix. If it is set to None, the GlorotUniformInitializer is applied.

  • bias_initializer (Initializer) – Initializer for the bias vector. If it is set to None, the ZeroInitializer is applied.

  • kernel_regularizer (Regularizer) – Regularizer for the kernel weights matrix. Default is None.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
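
A minimal sketch of dense, again assuming FFModel and create_tensor from the wider FlexFlow Python API:

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    t = ffmodel.create_tensor([64, 512], DataType.DT_FLOAT)  # assumed (N, C_in) float input

    # out_dim = 128, so the output size is (64, 128); by default the kernel uses the
    # GlorotUniformInitializer and the bias uses the ZeroInitializer.
    d = ffmodel.dense(t, 128, activation=ActiMode.AC_MODE_RELU)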

Embedding

class flexflow.core.flexflow_cffi.FFModel
embedding(input, num_embeddings, embedding_dim, aggr, dtype=DataType.DT_FLOAT, shared_op=None, kernel_initializer=None, name=None)

Layer that turns positive integers into dense vectors of fixed size.

Parameters
  • input (Tensor) – the input Tensor.

  • num_embeddings (int) – size of the vocabulary, i.e. the maximum integer index + 1.

  • embedding_dim (int) – dimension of the dense embedding.

  • aggr (AggrMode) – aggregation mode. Options are AGGR_MODE_NONE, AGGR_MODE_SUM and AGGR_MODE_AVG.

  • dtype (DataType) – the tensor data type. Options are DT_BOOLEAN, DT_INT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_INT4, DT_INT8, and DT_NONE.

  • shared_op (Op) – the layer with which this layer shares parameters. Default is None.

  • kernel_initializer (Initializer) – Initializer for the kernel weights matrix. If it is set to None, the GlorotUniformInitializer is applied.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
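
A hedged sketch of embedding; the integer index input and its DT_INT64 data type are illustrative assumptions, not part of this section:

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    # Assumed index input of shape (64, 1) holding vocabulary indices.
    idx = ffmodel.create_tensor([64, 1], DataType.DT_INT64)

    # 10000-entry vocabulary, 64-dimensional embeddings, summed over the index
    # dimension, giving an output of size (64, 64).
    emb = ffmodel.embedding(idx, 10000, 64, AggrMode.AGGR_MODE_SUM)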

Transpose

class flexflow.core.flexflow_cffi.FFModel
transpose(input, perm, name=None)

Transposes the input tensor, permuting its dimensions according to perm.

Parameters
  • input (Tensor) – the input Tensor.

  • perm (list of int) – a permutation of the dimensions of the input tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
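
A minimal sketch of transpose under the same assumptions as the earlier examples:

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    t = ffmodel.create_tensor([64, 10, 20], DataType.DT_FLOAT)  # assumed float input

    # Swap the last two dimensions: (64, 10, 20) -> (64, 20, 10).
    tt = ffmodel.transpose(t, [0, 2, 1])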

Reverse

class flexflow.core.flexflow_cffi.FFModel
reverse(input, axis, name=None)

Layer that reverses specific dimensions of a tensor.

Given an input tensor, this operation reverses the dimension axis.

Parameters
  • input (Tensor) – the input Tensor.

  • axis (int) – the dimension to reverse.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.

Concatenate

class flexflow.core.flexflow_cffi.FFModel
concat(tensors, axis, name=None)

Layer that concatenates a list of inputs.

It takes as input a list of tensors, all of the same shape except for the concatenation axis, and returns a single tensor that is the concatenation of all inputs.

Parameters
  • tensors (list of Tensors) – the list of input Tensors.

  • axis (int) – the dimension along which to concatenate.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
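
A hedged sketch of concat, assuming the same FFModel/create_tensor boilerplate as above:

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    a = ffmodel.create_tensor([64, 16], DataType.DT_FLOAT)
    b = ffmodel.create_tensor([64, 32], DataType.DT_FLOAT)

    # Concatenate along axis 1: (64, 16) and (64, 32) -> (64, 48).
    c = ffmodel.concat([a, b], 1)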

Split

class flexflow.core.flexflow_cffi.FFModel
split(input, sizes, axis, name=None)

Layer that splits an input tensor into a list of tensors.

Parameters
  • input (Tensor) – the input Tensor.

  • sizes (int or list of int) – either an int indicating the number of splits along axis or a Python list containing the sizes of each output tensor along axis. If a scalar, then it must evenly divide input.dims[axis]; otherwise the sum of sizes along the split axis must match that of the input.

  • axis (int) – the dimension along which to split.

  • name (string) – the name of the layer. Default is None.

Returns

list of Tensors – the output tensors.
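
A minimal sketch of split with an explicit list of sizes (same assumptions as the earlier examples):

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    t = ffmodel.create_tensor([64, 48], DataType.DT_FLOAT)

    # Split axis 1 into sizes 16 and 32: (64, 48) -> (64, 16) and (64, 32).
    t1, t2 = ffmodel.split(t, [16, 32], 1)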

Reshape

class flexflow.core.flexflow_cffi.FFModel
reshape(input, shape, name=None)

Layer that reshapes inputs into the given shape.

Given an input tensor, this operation returns an output tensor that has the same values as the input in the same order, but with the new shape given by shape.

Parameters
  • input (Tensor) – the input Tensor.

  • shape (list of int) – A list defining the shape of the output tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
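
A hedged sketch of reshape (same assumptions as above):

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    t = ffmodel.create_tensor([64, 3, 32, 32], DataType.DT_FLOAT)

    # Collapse the channel and spatial dimensions: (64, 3, 32, 32) -> (64, 3072).
    r = ffmodel.reshape(t, [64, 3 * 32 * 32])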

Flat

class flexflow.core.flexflow_cffi.FFModel
flat(input, name=None)

Flattens the input. Does not affect the batch size.

Parameters
  • input (Tensor) – the input Tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
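
A minimal sketch of flat (same assumptions as above); it behaves like a reshape that merges all non-batch dimensions into one axis:

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    t = ffmodel.create_tensor([64, 32, 16, 16], DataType.DT_FLOAT)

    # Flatten all but the batch dimension: (64, 32, 16, 16) -> (64, 8192).
    f = ffmodel.flat(t)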

BatchNorm

class flexflow.core.flexflow_cffi.FFModel
batch_norm(input, relu=True, name=None)

Layer that normalizes its inputs.

Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1.

Parameters
  • input (Tensor) – the input Tensor.

  • relu (bool) – whether a ReLU function is applied. Default is True.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.

BatchMatMul

class flexflow.core.flexflow_cffi.FFModel
batch_matmul(A, B, a_seq_length_dim=None, b_seq_length_dim=None, name=None)

Layer that applies batched matrix multiplication to two input Tensors, output = A * B.

Parameters
  • A (Tensor) – the first input Tensor.

  • B (Tensor) – the second input Tensor.

  • a_seq_length_dim (int) – when set, indicates that dimension a_seq_length_dim of A is a sequence-length dimension. Default is None.

  • b_seq_length_dim (int) – when set, indicates that dimension b_seq_length_dim of B is a sequence-length dimension. Default is None.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
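
A hedged sketch of batch_matmul; the output shape in the comment assumes standard batched matrix-multiplication semantics over the last two dimensions:

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    A = ffmodel.create_tensor([64, 10, 20], DataType.DT_FLOAT)
    B = ffmodel.create_tensor([64, 20, 30], DataType.DT_FLOAT)

    # Assuming (batch, m, k) x (batch, k, n) -> (batch, m, n): output is (64, 10, 30).
    C = ffmodel.batch_matmul(A, B)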

Add

class flexflow.core.flexflow_cffi.FFModel
add(x, y, inplace_a=False, name=None)

Layer that adds two input Tensors, output = x + y.

Parameters
  • x (Tensor) – the first input Tensor.

  • y (Tensor) – the second input Tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.

Subtract

class flexflow.core.flexflow_cffi.FFModel
subtract(x, y, inplace_a=False, name=None)

Layer that subtracts two input Tensors, output = x - y.

Parameters
  • x (Tensor) – the first input Tensor.

  • y (Tensor) – the second input Tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.

Multiply

class flexflow.core.flexflow_cffi.FFModel
multiply(x, y, inplace_a=False, name=None)

Layer that multiplies (element-wise) two input Tensors, output = x * y.

Parameters
  • x (Tensor) – the first input Tensor.

  • y (Tensor) – the second input Tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.

Divide

class flexflow.core.flexflow_cffi.FFModel
divide(x, y, inplace_a=False, name=None)

Layer that divides (element-wise) two input Tensors, output = x / y.

Parameters
  • x (Tensor) – the first input Tensor.

  • y (Tensor) – the second input Tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
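
The four element-wise binary layers above (add, subtract, multiply, divide) share the same call pattern; a minimal sketch under the same assumptions as the earlier examples:

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    x = ffmodel.create_tensor([64, 128], DataType.DT_FLOAT)
    y = ffmodel.create_tensor([64, 128], DataType.DT_FLOAT)

    s = ffmodel.add(x, y)       # x + y
    d = ffmodel.subtract(x, y)  # x - y
    p = ffmodel.multiply(x, y)  # x * y, element-wise
    q = ffmodel.divide(x, y)    # x / y, element-wise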

Exponential

class flexflow.core.flexflow_cffi.FFModel
exp(x, name=None)

Exponential activation function.

Parameters
  • x (Tensor) – the input Tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.

ReLU

class flexflow.core.flexflow_cffi.FFModel
relu(input, inplace=True, name=None)

Rectified Linear Unit activation function.

Parameters
  • input (Tensor) – the input Tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.

ELU

class flexflow.core.flexflow_cffi.FFModel
elu(input, inplace=True, name=None)

Exponential Linear Unit (ELU) activation function.

Parameters
  • input (Tensor) – the input Tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.

Sigmoid

class flexflow.core.flexflow_cffi.FFModel
sigmoid(input, name=None)

Sigmoid activation function, \(sigmoid(x) = 1 / (1 + exp(-x))\).

Parameters
  • input (Tensor) – the input Tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.

Tanh

class flexflow.core.flexflow_cffi.FFModel
tanh(input, name=None)

Hyperbolic tangent activation function.

Parameters
  • input (Tensor) – the input Tensor.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.

Softmax

class flexflow.core.flexflow_cffi.FFModel
softmax(input, axis=-1, name=None)

Softmax activation function.

Parameters
  • input (Tensor) – the input Tensor.

  • axis (int) – the dimension along which the softmax is computed. Default is -1.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
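
The unary activation layers above (exp, relu, elu, sigmoid, tanh, softmax) all take a single input tensor; a hedged sketch under the same assumptions as the earlier examples:

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    t = ffmodel.create_tensor([64, 128], DataType.DT_FLOAT)

    r = ffmodel.relu(t)
    s = ffmodel.sigmoid(t)
    h = ffmodel.tanh(t)
    e = ffmodel.exp(t)
    u = ffmodel.elu(t)
    probs = ffmodel.softmax(t)  # computed along the last dimension by default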

Dropout

class flexflow.core.flexflow_cffi.FFModel
dropout(input, rate, seed, name=None)

The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.

Parameters
  • input (Tensor) – the input Tensor.

  • rate (float(0-1)) – Fraction of the input units to drop.

  • seed (int) – random seed.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
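
A minimal sketch of dropout (same assumptions as above):

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    t = ffmodel.create_tensor([64, 128], DataType.DT_FLOAT)

    # Drop 50% of the units during training; kept units are scaled by 1/(1 - 0.5).
    d = ffmodel.dropout(t, 0.5, 0)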

MultiheadAttention

class flexflow.core.flexflow_cffi.FFModel
multihead_attention(query, key, value, embed_dim, num_heads, kdim=0, vdim=0, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, kernel_initializer=None, name=None)

Defines the MultiHead Attention operation as described in Attention Is All You Need, which takes in the tensors query, key, and value, and returns the dot-product attention between them.

Parameters
  • query (Tensor) – the query Tensor.

  • key (Tensor) – the key Tensor.

  • value (Tensor) – the value Tensor.

  • embed_dim (int) – total dimension of the model

  • num_heads (int) – Number of attention heads.

  • kdim (int) – total number of features in key. Default is 0

  • vdim (int) – total number of features in value. Default is 0

  • dropout (float(0-1)) – the dropout probability applied to attn_output_weights. Default is 0.0.

  • bias (bool) – Whether the dense layers use bias vectors. Default is True.

  • add_bias_kv (bool) – add bias to the key and value sequences at dim=0. Default is False.

  • add_zero_attn (bool) – add a new batch of zeros to the key and value sequences at dim=1. Default is False.

  • kernel_initializer (Initializer) – Initializer for dense layer kernels. If it is set to None, the GlorotUniformInitializer is applied.

  • name (string) – the name of the layer. Default is None.

Returns

Tensor – the output tensor.
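
A hedged sketch of self-attention with multihead_attention; the (batch, sequence length, embed_dim) input layout is an assumption for illustration:

    from flexflow.core import *

    ffmodel = FFModel(FFConfig())
    # Assumed float input of size (64, 16, 256) used as query, key, and value.
    seq = ffmodel.create_tensor([64, 16, 256], DataType.DT_FLOAT)

    # Self-attention with embed_dim = 256 split across 8 heads;
    # the output keeps the input shape (64, 16, 256).
    attn = ffmodel.multihead_attention(seq, seq, seq, 256, 8, dropout=0.1)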