# mT5 Model

We mention a few prerequisites and tips for setting up. We assume access to at least one GPU, an installation of Anaconda, and PyTorch version 1.9. Note that using PyTorch and FlexFlow concurrently requires a CPU version of PyTorch.
To install the CPU version of `torch` (and `torchvision`), run:

```shell
conda install pytorch==1.9.0 torchvision==0.10.0 cpuonly -c pytorch
```

To install the CPU version of `torch` from source, clone the repository and run:

```shell
export USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
git submodule sync
git submodule update --init --recursive
python setup.py develop  # or: python setup.py install
```
We also need an installation of the HuggingFace `transformers` library. To install `transformers`, run:

```shell
conda install -c conda-forge transformers
```

To install `transformers` from source, clone the repository and run `python setup.py develop` (or `python setup.py install`).
To run the PyTorch-FlexFlow examples, make sure to run

```shell
export FF_USE_CFFI=1
```

to use `cffi` instead of `pybind11`.

Additional notes:

- You may need to update `huggingface_hub` with `conda update huggingface_hub`.
- If you encounter `ImportError: Found an incompatible version of torch.`, try updating to a later version of `transformers`.
## mT5 in PyTorch

We present an example of training mT5 for the Sinhalese-English translation task from here, reusing some code from here. In this section, we walk through the training script using PyTorch, and in the next section, we walk through the training script using FlexFlow. The corresponding code may be found in `mt5_torch.py` and `mt5_ff.py`, respectively.
To download and uncompress the dataset, run:

```shell
cd examples/python/pytorch/mt5
wget https://object.pouta.csc.fi/Tatoeba-Challenge/eng-sin.tar
tar -xvf eng-sin.tar
gzip -d data/eng-sin/*.gz
```
This will create a directory `data/` containing a single subdirectory `data/eng-sin/`, which in turn contains `test.id`, `test.src`, `test.trg`, `train.id`, `train.src`, and `train.trg`.
We extract, prepare, and save the data to `.tsv` using `DataPreparer.data_to_tsv()`. This creates two new files, `data/train.tsv` and `data/eval.tsv`, and only needs to be done once. Then, we can train using those `.tsv` files. A base implementation for this may be found in `mt5_torch.py`, which saves the `.tsv` files, trains for some number of epochs, and outputs a `.csv` containing the predicted and actual text on the evaluation data:

```shell
python examples/python/pytorch/mt5/mt5_torch.py
```

Note: Running `mt5_torch.py` requires a GPU version of PyTorch.
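For intuition, a `.tsv` file of this kind simply pairs one source sentence with one target sentence per tab-separated row. The sketch below reads such a file with the standard `csv` module; the exact column layout written by `DataPreparer.data_to_tsv()` is an assumption here, and the sentences are placeholders, not real data:

```python
import csv
import io

# Hypothetical layout: source text <TAB> target text, one pair per row.
# The actual columns produced by DataPreparer.data_to_tsv() may differ.
sample = (
    "source sentence one\ttarget sentence one\n"
    "source sentence two\ttarget sentence two\n"
)

rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
for src, trg in rows:
    print(f"{src!r} -> {trg!r}")
```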
## mT5 in FlexFlow

Now, we examine how to write a similar training script using FlexFlow. To begin, FlexFlow dataloaders expect the data to be passed in as numpy arrays, already preprocessed so that batches may be given directly to the model. In `mt5_ff.py`, `data_to_numpy()` converts the `.tsv` files to `.npy`, and `preprocess_train()` performs the necessary preprocessing.

Note: `data_to_numpy()` takes a while to run.
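Because the dataloaders consume fixed-shape arrays, variable-length token sequences must be padded (or truncated) to a common length before being stacked. Below is a minimal sketch of that padding step, assuming a pad id of 0 and illustrative token ids; the real `preprocess_train()` in `mt5_ff.py` may do more than this:

```python
import numpy as np

def pad_batch(sequences, seq_length, pad_id=0):
    """Pad/truncate each token-id list to seq_length and stack into an int64 array."""
    out = np.full((len(sequences), seq_length), pad_id, dtype=np.int64)
    for i, seq in enumerate(sequences):
        trimmed = seq[:seq_length]  # truncate sequences longer than seq_length
        out[i, : len(trimmed)] = trimmed
    return out

# Illustrative token ids (not real tokenizer output).
batch = pad_batch([[5, 7, 9], [3, 4, 6, 8, 2, 1]], seq_length=4)
print(batch.shape)        # (2, 4)
print(batch[0].tolist())  # [5, 7, 9, 0]
print(batch[1].tolist())  # [3, 4, 6, 8]
```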
Next, following the conventional FlexFlow terminology, we define a top-level task to train the mT5 model. The key steps are as follows (including some notable code snippets):
1. Define `ffconfig = FFConfig()` and `ffmodel = FFModel(ffconfig)` -- `ffmodel` is the Python object for the FlexFlow model.
2. Define the PyTorch mT5 model:
   ```python
   model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
   ```
3. Load the preprocessed training data from the `.npy` files.
4. Use `ffmodel.create_tensor()` for the `input_ids`, `attention_mask`, and `decoder_input_ids` -- these are the input tensors to the model.
5. Construct a `PyTorchModel()` object wrapping the PyTorch model `model` to enable conversion to FlexFlow:
   ```python
   hf_model = PyTorchModel(
       model,
       is_hf_model=True,
       batch_size=ffconfig.batch_size,
       seq_length=seq_length,
   )
   ```
   We pass `is_hf_model=True` since HuggingFace models require a special `symbolic_trace()` distinct from the native PyTorch one. `seq_length` is a tuple `(encoder_seq_length, decoder_seq_length)`.
6. Convert the model to FlexFlow:
   ```python
   output_tensors = hf_model.to_ff(ffmodel, input_tensors)
   ```
7. Define the optimizer `ffoptimizer`.
8. Compile the model:
   ```python
   ffmodel.compile(
       optimizer=ffoptimizer,
       loss_type=LossType.LOSS_SPARSE_CATEGORICAL_CROSSENTROPY,
       metrics=[
           MetricsType.METRICS_ACCURACY,
           MetricsType.METRICS_SPARSE_CATEGORICAL_CROSSENTROPY,
       ],
   )
   ```
9. Create the dataloaders for the `input_ids`, `attention_mask`, `decoder_input_ids`, and `labels`.
10. Initialize the model layers: `ffmodel.init_layers()`.
11. Train the model, passing the appropriate dataloaders into `fit()`:
    ```python
    ffmodel.fit(
        x=[input_ids_dl, attention_mask_dl, decoder_ids_dl],
        y=labels_dl,
        batch_size=batch_size,
        epochs=epochs,
    )
    ```
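FlexFlow's dataloaders handle batching internally, but as a rough mental model, `fit()` draws aligned minibatches from each input array together with the labels. The sketch below illustrates that aligned iteration with dummy numpy arrays standing in for `input_ids`, `attention_mask`, `decoder_input_ids`, and `labels` (shapes and names are illustrative only, not FlexFlow's actual implementation):

```python
import numpy as np

def iter_minibatches(xs, y, batch_size):
    """Yield aligned minibatches from several input arrays and a label array."""
    n = y.shape[0]
    for start in range(0, n, batch_size):
        end = start + batch_size
        yield [x[start:end] for x in xs], y[start:end]

# Dummy stand-ins for the preprocessed .npy arrays.
n, enc_len, dec_len = 6, 4, 3
input_ids = np.zeros((n, enc_len), dtype=np.int64)
attention_mask = np.ones((n, enc_len), dtype=np.int64)
decoder_input_ids = np.zeros((n, dec_len), dtype=np.int64)
labels = np.zeros((n, dec_len), dtype=np.int64)

batches = list(
    iter_minibatches([input_ids, attention_mask, decoder_input_ids], labels, batch_size=4)
)
print(len(batches))            # 2 (one full batch of 4, one partial batch of 2)
print(batches[0][0][0].shape)  # (4, 4)
print(batches[1][1].shape)     # (2, 3)
```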
A base implementation may be found in `mt5_ff.py`:

```shell
./python/flexflow_python examples/python/pytorch/mt5/mt5_ff.py -ll:py 1 -ll:gpu 1 -ll:fsize 14000 -ll:zsize 4096
```

Note: Running `mt5_ff.py` requires a CPU version of PyTorch.