Layers API
Layers are the basic building blocks of neural networks in FlexFlow. The input of a layer consists of a tensor or a list of tensors together with some state variables, and the output of a layer is a tensor or a list of tensors.
Conv2D
- class flexflow.core.flexflow_cffi.FFModel
- conv2d(input, out_channels, kernel_h, kernel_w, stride_h, stride_w, padding_h, padding_w, activation=ActiMode.AC_MODE_NONE, groups=1, use_bias=True, shared_op=None, kernel_initializer=None, bias_initializer=None, name=None)
This layer creates a 2D convolution kernel that is convolved with the layer input to produce an output tensor.
The size of the input tensor is \((N, C_{in}, H, W)\) and the size of the output tensor is \((N, C_{out}, H_{out}, W_{out})\), which can be calculated by:
\[C_{out} = out\_channels\]
\[K_{H} = kernel\_h, \qquad K_{W} = kernel\_w\]
\[S_{H} = stride\_h, \qquad S_{W} = stride\_w\]
\[P_{H} = padding\_h, \qquad P_{W} = padding\_w\]
\[H_{out} = (H - K_{H} + 2 * P_{H}) / S_{H} + 1\]
\[W_{out} = (W - K_{W} + 2 * P_{W}) / S_{W} + 1\]
- Parameters
input (Tensor) – the input Tensor.
out_channels (int) – the dimensionality of the output space (i.e. the number of output filters in the convolution).
kernel_h (int) – the height of the 2D convolution window: \(K_{H}\).
kernel_w (int) – the width of the 2D convolution window: \(K_{W}\).
stride_h (int) – the stride of the convolution along the height: \(S_{H}\).
stride_w (int) – the stride of the convolution along the width: \(S_{W}\).
padding_h (int) – the amount of implicit zero-paddings along the height: \(P_{H}\).
padding_w (int) – the amount of implicit zero-paddings along the width: \(P_{W}\).
activation (ActiMode) – Activation function to use. Default is ActiMode.AC_MODE_NONE.
groups (int) – the number of groups in this convolution. Default is 1.
use_bias (bool) – whether the layer uses a bias vector. Default is True.
shared_op (Op) – the layer whose parameters are shared with. Default is None.
kernel_initializer (Initializer) – Initializer for the kernel weights matrix. If it is set to None, the GlorotUniformInitializer is applied.
bias_initializer (Initializer) – Initializer for the bias vector. If it is set to None, the ZeroInitializer is applied.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
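Example (a minimal sketch, not a definitive recipe): the FFConfig, FFModel, and create_tensor setup lines below are assumptions borrowed from the usual FlexFlow Python examples and are not part of this entry; only the conv2d call itself is documented here.

    from flexflow.core import *

    # Assumed setup: an FFModel and a float NCHW input of shape (N, C_in, H, W) = (64, 3, 32, 32).
    ffconfig = FFConfig()
    ffmodel = FFModel(ffconfig)
    input_tensor = ffmodel.create_tensor([64, 3, 32, 32], DataType.DT_FLOAT)

    # 64 output filters, 3x3 kernel, stride 1, padding 1:
    # H_out = (32 - 3 + 2 * 1) / 1 + 1 = 32, so the output shape is (64, 64, 32, 32).
    conv = ffmodel.conv2d(input_tensor, 64, 3, 3, 1, 1, 1, 1)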
Pool2D
- class flexflow.core.flexflow_cffi.FFModel
- pool2d(input, kernel_h, kernel_w, stride_h, stride_w, padding_h, padding_w, pool_type=PoolType.POOL_MAX, activation=ActiMode.AC_MODE_NONE, name=None)
Pooling operation for 2D spatial data.
The size of input tensor is \((N, C_{in}, H, W)\) and the size of output tensor is \((N, C_{out}, H_{out}, W_{out})\), which can be calculated by:
\[C_{out} = C_{in}\]
\[K_{H} = kernel\_h, \qquad K_{W} = kernel\_w\]
\[S_{H} = stride\_h, \qquad S_{W} = stride\_w\]
\[P_{H} = padding\_h, \qquad P_{W} = padding\_w\]
\[H_{out} = (H - K_{H} + 2 * P_{H}) / S_{H} + 1\]
\[W_{out} = (W - K_{W} + 2 * P_{W}) / S_{W} + 1\]
- Parameters
input (Tensor) – the input Tensor.
kernel_h (int) – the height of the 2D pooling window: \(K_{H}\).
kernel_w (int) – the width of the 2D pooling window: \(K_{W}\).
stride_h (int) – the stride of the pooling along the height: \(S_{H}\).
stride_w (int) – the stride of the pooling along the width: \(S_{W}\).
padding_h (int) – the amount of implicit zero-paddings along the height: \(P_{H}\).
padding_w (int) – the amount of implicit zero-paddings along the width: \(P_{W}\).
pool_type (PoolType) – the type of pooling function to use. If you don’t specify anything, PoolType.POOL_MAX is applied.
activation (ActiMode) – Activation function to use. Default is ActiMode.AC_MODE_NONE.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
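Example (a minimal sketch): this reuses the conv tensor from the Conv2D sketch above; PoolType.POOL_AVG is assumed to be the averaging counterpart of PoolType.POOL_MAX.

    # 2x2 max pooling, stride 2, no padding:
    # H_out = (32 - 2 + 2 * 0) / 2 + 1 = 16, so the output shape is (64, 64, 16, 16).
    pool = ffmodel.pool2d(conv, 2, 2, 2, 2, 0, 0)

    # Average pooling instead of the default max pooling (PoolType.POOL_AVG is assumed).
    avg_pool = ffmodel.pool2d(conv, 2, 2, 2, 2, 0, 0, pool_type=PoolType.POOL_AVG)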
Dense
- class flexflow.core.flexflow_cffi.FFModel
- dense(input, out_dim, activation=ActiMode.AC_MODE_NONE, use_bias=True, datatype=DataType.DT_NONE, shared_op=None, kernel_initializer=None, bias_initializer=None, kernel_regularizer=None, name=None)
Dense implements the operation:
output = activation(dot(input, kernel) + bias)
where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
The size of the input tensor is \((N, C_{in})\) and the size of the output tensor is \((N, C_{out})\), where \(C_{out} = out\_dim\).
- Parameters
input (Tensor) – the input Tensor.
out_dim (int) – dimensionality of the output space.
activation (ActiMode) – Activation function to use. Default is ActiMode.AC_MODE_NONE.
use_bias (bool) – whether the layer uses a bias vector. Default is True.
shared_op (Op) – the layer whose parameters are shared with. Default is None.
kernel_initializer (Initializer) – Initializer for the kernel weights matrix. If it is set to None, the GlorotUniformInitializer is applied.
bias_initializer (Initializer) – Initializer for the bias vector. If it is set to None, the ZeroInitializer is applied.
kernel_regularizer (Regularizer) – Regularizer for the kernel weights matrix. Default is None.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
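Example (a minimal sketch): features is assumed to be a 2D tensor of shape (N, 512) produced elsewhere (for instance by flat()), and ActiMode.AC_MODE_RELU is assumed to be the ReLU member of ActiMode.

    # (N, 512) -> (N, 128) with a fused ReLU, then (N, 128) -> (N, 10) with no activation.
    hidden = ffmodel.dense(features, 128, activation=ActiMode.AC_MODE_RELU)
    logits = ffmodel.dense(hidden, 10)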
Embedding
- class flexflow.core.flexflow_cffi.FFModel
- embedding(input, num_embeddings, embedding_dim, aggr, dtype=DataType.DT_FLOAT, shared_op=None, kernel_initializer=None, name=None)
Layer that turns positive integers into dense vectors of fixed size
- Parameters
input (Tensor) – the input Tensor.
num_embeddings (int) – size of the vocabulary, i.e. maximum integer index + 1
embedding_dim (int) – dimension of the dense embedding.
aggr (AggrMode) – aggregation mode. Options are AGGR_MODE_NONE, AGGR_MODE_SUM and AGGR_MODE_AVG.
dtype (DataType) – the tensor data type. Options are DT_BOOLEAN, DT_INT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_INT4, DT_INT8, DT_NONE
shared_op (Op) – the layer whose parameters are shared with. Default is None.
kernel_initializer (Initializer) – Initializer for the kernel weights matrix. If it is set to None, the GlorotUniformInitializer is applied.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
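Example (a minimal sketch): the integer index tensor below is an assumption; how index tensors are created and fed is outside the scope of this entry.

    # Assumed: `tokens` is an integer tensor of shape (N, 1) holding vocabulary indices.
    tokens = ffmodel.create_tensor([64, 1], DataType.DT_INT32)

    # A 10000-entry vocabulary mapped to 128-dimensional vectors, aggregated by sum.
    emb = ffmodel.embedding(tokens, 10000, 128, AggrMode.AGGR_MODE_SUM)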
Transpose
- class flexflow.core.flexflow_cffi.FFModel
- transpose(input, perm, name=None)
Transposes the input tensor, permuting its dimensions according to perm.
- Parameters
input (Tensor) – the input Tensor.
perm (List of int) – a permutation of the dimensions of the input tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
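Example (a minimal sketch): t is assumed to be a 3D tensor of shape (N, H, W) created earlier.

    # Swap the last two dimensions: (N, H, W) -> (N, W, H).
    t_transposed = ffmodel.transpose(t, [0, 2, 1])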
Reverse
- class flexflow.core.flexflow_cffi.FFModel
- reverse(input, axis, name=None)
Layer that reverses specific dimensions of a tensor.
Given an input tensor, this operation reverses the dimension axis.
- Parameters
input (Tensor) – the input Tensor.
axis (int) – the dimension to reverse.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
Concatenate
- class flexflow.core.flexflow_cffi.FFModel
- concat(tensors, axis, name=None)
Layer that concatenates a list of inputs.
It takes as input a list of tensors, all of the same shape except for the concatenation axis, and returns a single tensor that is the concatenation of all inputs.
- Parameters
tensors (List of Tensors) – the list of input Tensors.
axis (int) – the dimension along which to concatenate.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
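Example (a minimal sketch): t1 and t2 are assumed to be tensors of shape (N, 64, 16, 16) created earlier.

    # Concatenate along the channel dimension (axis 1): the result has shape (N, 128, 16, 16).
    merged = ffmodel.concat([t1, t2], 1)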
Split
- class flexflow.core.flexflow_cffi.FFModel
- split(input, sizes, axis, name=None)
Layer that splits an input tensor into a list of tensors.
- Parameters
input (Tensor) – the input Tensor.
sizes (int or list of int) – either an int indicating the number of splits along axis, or a Python list containing the sizes of each output tensor along axis. If a scalar, it must evenly divide input.dims[axis]; otherwise the sum of the sizes along the split axis must match that of the input.
axis (int) – the dimension along which to split.
name (string) – the name of the layer. Default is None.
- Returns
list of Tensors – the output tensors.
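Example (a minimal sketch): merged is assumed to be the (N, 128, 16, 16) tensor from the Concatenate sketch above.

    # An int splits the axis evenly: two tensors of shape (N, 64, 16, 16) each.
    left, right = ffmodel.split(merged, 2, 1)

    # A list gives explicit sizes along the axis: (N, 96, 16, 16) and (N, 32, 16, 16).
    wide, narrow = ffmodel.split(merged, [96, 32], 1)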
Reshape
- class flexflow.core.flexflow_cffi.FFModel
- reshape(input, shape, name=None)
Layer that reshapes inputs into the given shape.
Given an input tensor, this operation returns an output tensor with the same values, in the same order, but with a new shape given by shape.
- Parameters
input (Tensor) – the input Tensor.
shape (list of int) – A list defining the shape of the output tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
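Example (a minimal sketch): pool is assumed to be the (64, 64, 16, 16) tensor from the Pool2D sketch above, and the shape list is assumed to include the batch dimension.

    # Collapse the channel and spatial dimensions: 64 * 16 * 16 = 16384.
    flattened = ffmodel.reshape(pool, [64, 16384])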
Flat
- class flexflow.core.flexflow_cffi.FFModel
- flat(input, name=None)
Flattens the input. Does not affect the batch size.
- Parameters
input (Tensor) – the input Tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
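Example (a minimal sketch): pool is assumed to be the (64, 64, 16, 16) tensor from the Pool2D sketch above.

    # Flatten everything except the batch dimension: (64, 64, 16, 16) -> (64, 16384).
    flat_features = ffmodel.flat(pool)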
BatchNorm
- class flexflow.core.flexflow_cffi.FFModel
- batch_norm(input, relu=True, name=None)
Layer that normalizes its inputs.
Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1.
- Parameters
input (Tensor) – the input Tensor.
relu (bool) – whether a ReLU function is applied. Default is True.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
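Example (a minimal sketch): conv is assumed to be the convolution output from the Conv2D sketch above.

    # Batch normalization with the fused ReLU enabled by default (relu=True).
    bn = ffmodel.batch_norm(conv)

    # Normalization only, without the fused ReLU.
    bn_linear = ffmodel.batch_norm(conv, relu=False)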
BatchMatMul
- class flexflow.core.flexflow_cffi.FFModel
- batch_matmul(A, B, a_seq_length_dim=None, b_seq_length_dim=None, name=None)
Layer that applies batched matrix multiplication to two input Tensors, output = A * B.
- Parameters
A (Tensor) – the first input Tensor.
B (Tensor) – the second input Tensor.
a_seq_length_dim (int) – when set, indicates that dimension a_seq_length_dim of A is a sequence-length dimension.
b_seq_length_dim (int) – when set, indicates that dimension b_seq_length_dim of B is a sequence-length dimension.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
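Example (a minimal sketch): the operand shapes below follow the usual batched matrix-multiplication convention and are an assumption, as are the tensors A and B themselves.

    # Assumed: A has shape (N, 8, 16) and B has shape (N, 16, 32);
    # each batch entry is multiplied independently, giving an output of shape (N, 8, 32).
    product = ffmodel.batch_matmul(A, B)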
Add
- class flexflow.core.flexflow_cffi.FFModel
- add(x, y, inplace_a=False, name=None)
Layer that adds two input Tensors, output = x + y.
- Parameters
x (Tensor) – the first input Tensor.
y (Tensor) – the second input Tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
Subtract
- class flexflow.core.flexflow_cffi.FFModel
- subtract(x, y, inplace_a=False, name=None)
Layer that subtracts two input Tensors, output = x - y.
- Parameters
x (Tensor) – the first input Tensor.
y (Tensor) – the second input Tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
Multiply
- class flexflow.core.flexflow_cffi.FFModel
- multiply(x, y, inplace_a=False, name=None)
Layer that multiplies (element-wise) two input Tensors, output = x * y.
- Parameters
x (Tensor) – the first input Tensor.
y (Tensor) – the second input Tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
Divide
- class flexflow.core.flexflow_cffi.FFModel
- divide(x, y, inplace_a=False, name=None)
Layer that divides (element-wise) two input Tensors, output = x / y.
- Parameters
x (Tensor) – the first input Tensor.
y (Tensor) – the second input Tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
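Example (a minimal sketch covering add, subtract, multiply, and divide): x and y are assumed to be tensors of identical shape created earlier.

    s = ffmodel.add(x, y)        # element-wise x + y
    d = ffmodel.subtract(x, y)   # element-wise x - y
    p = ffmodel.multiply(x, y)   # element-wise x * y
    q = ffmodel.divide(x, y)     # element-wise x / y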
Exponential
- class flexflow.core.flexflow_cffi.FFModel
- exp(x, name=None)
Exponential activation function.
- Parameters
x (Tensor) – the input Tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
ReLU
- class flexflow.core.flexflow_cffi.FFModel
- relu(input, inplace=True, name=None)
Rectified Linear Unit activation function.
- Parameters
input (Tensor) – the input Tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
ELU
- class flexflow.core.flexflow_cffi.FFModel
- elu(input, inplace=True, name=None)
Exponential Linear Unit (ELU) activation function.
- Parameters
input (Tensor) – the input Tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
Sigmoid
- class flexflow.core.flexflow_cffi.FFModel
- sigmoid(input, name=None)
Sigmoid activation function, \(sigmoid(x) = 1 / (1 + exp(-x))\).
- Parameters
input (Tensor) – the input Tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
Tanh
- class flexflow.core.flexflow_cffi.FFModel
- tanh(input, name=None)
Hyperbolic tangent activation function.
- Parameters
input (Tensor) – the input Tensor.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
Softmax
- class flexflow.core.flexflow_cffi.FFModel
- softmax(input, axis=-1, name=None)
Softmax activation function.
- Parameters
input (Tensor) – the input Tensor.
axis (int) – the dimension along which the softmax is computed. Default is -1.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
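Example (a minimal sketch covering the activation layers above): t and logits are assumed to be tensors created earlier, e.g. by the Dense sketch.

    # Element-wise activations; each output has the same shape as its input.
    r = ffmodel.relu(t)
    g = ffmodel.sigmoid(t)
    h = ffmodel.tanh(t)
    e = ffmodel.exp(t)

    # Softmax over the last dimension of the class scores.
    probs = ffmodel.softmax(logits)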
Dropout
- class flexflow.core.flexflow_cffi.FFModel
- dropout(input, rate, seed, name=None)
The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.
- Parameters
input (Tensor) – the input Tensor.
rate (float(0-1)) – Fraction of the input units to drop.
seed (int) – random seed.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
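Example (a minimal sketch): hidden is assumed to be the tensor from the Dense sketch above.

    # Drop 50% of the units; the surviving units are scaled by 1 / (1 - 0.5) = 2.
    dropped = ffmodel.dropout(hidden, 0.5, 0)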
MultiheadAttention
- class flexflow.core.flexflow_cffi.FFModel
- multihead_attention(query, key, value, embed_dim, num_heads, kdim=0, vdim=0, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, kernel_initializer=None, name=None)
Defines the MultiHead Attention operation as described in Attention Is All You Need, which takes in the tensors query, key, and value, and returns the dot-product attention between them.
- Parameters
query (Tensor) – the query Tensor.
key (Tensor) – the key Tensor.
value (Tensor) – the value Tensor.
embed_dim (int) – total dimension of the model
num_heads (int) – Number of attention heads.
kdim (int) – total number of features in key. Default is 0.
vdim (int) – total number of features in value. Default is 0.
dropout (float(0-1)) – dropout probability applied to attn_output_weights. Default is 0.0.
bias (bool) – Whether the dense layers use bias vectors. Default is True.
add_bias_kv (bool) – add bias to the key and value sequences at dim=0. Default is False.
add_zero_attn (bool) – add a new batch of zeros to the key and value sequences at dim=1. Default is False.
kernel_initializer (Initializer) – Initializer for dense layer kernels. If it is set to None, the GlorotUniformInitializer is applied.
name (string) – the name of the layer. Default is None.
- Returns
Tensor – the output tensor.
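Example (a minimal sketch): the query, key, and value tensors and their (batch, sequence length, embedding) layout are assumptions; only the multihead_attention call itself is documented here.

    # Assumed: q, k, v each have shape (N, seq_len, 64).
    # A 64-dimensional model split across 8 attention heads, with dropout on the attention weights.
    attn = ffmodel.multihead_attention(q, k, v, 64, 8, dropout=0.1)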