mT5 Model
We mention a few prerequisites and tips for setting up.
We assume access to at least one GPU and an installation of Anaconda.
We assume PyTorch version 1.9.
Using PyTorch and FlexFlow concurrently requires a CPU version of PyTorch.
To install the CPU version of torch (and torchvision), run:
conda install pytorch==1.9.0 torchvision==0.10.0 cpuonly -c pytorch
To install the CPU version of torch from source, clone the repository, then run:
export USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
git submodule sync; git submodule update --init --recursive
python setup.py develop
(or python setup.py install instead of python setup.py develop).
We need an installation of the HuggingFace transformers library. To install transformers, run:
conda install -c conda-forge transformers
To install transformers from source, clone the repository and run python setup.py develop (or python setup.py install).
To run the PyTorch-FlexFlow examples, make sure to run
export FF_USE_CFFI=1
to use cffi instead of pybind11.
Additional notes:
You may need to update huggingface_hub with:
conda update huggingface_hub
If you encounter "ImportError: Found an incompatible version of torch.", try updating to a later version of transformers.
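The setup requirements above are easy to get partially wrong, so a quick pre-flight check can help before launching the FlexFlow example. The sketch below is a hypothetical helper (preflight_problems() is not part of the example code). It checks three things: whether FF_USE_CFFI is set to 1, whether transformers and huggingface_hub are importable, and whether the installed torch is a CPU-only build. It relies on torch.version.cuda being None for CPU-only builds.

```python
import importlib.util
import os


def preflight_problems(env=None):
    """Return human-readable problems with the FlexFlow setup (empty list if none).

    Hypothetical helper; `env` defaults to os.environ so it can be tested with a dict.
    """
    env = os.environ if env is None else env
    problems = []

    # The PyTorch-FlexFlow examples expect cffi bindings instead of pybind11.
    if env.get("FF_USE_CFFI") != "1":
        problems.append("FF_USE_CFFI is not set to 1")

    # transformers (and huggingface_hub, which it depends on) must be importable.
    for package in ("transformers", "huggingface_hub"):
        if importlib.util.find_spec(package) is None:
            problems.append(f"{package} is not installed")

    # mt5_ff.py needs a CPU-only torch; torch.version.cuda is None for such builds.
    if importlib.util.find_spec("torch") is None:
        problems.append("torch is not installed")
    else:
        import torch  # deferred so the other checks still run without torch

        if torch.version.cuda is not None:
            problems.append(f"torch {torch.__version__} is a CUDA build, not CPU-only")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("warning:", problem)
```

Note that the CPU-only check applies to the FlexFlow script only. The plain PyTorch script described later needs a GPU build instead.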
mT5 in PyTorch
We present an example of training mT5 for the Sinhalese-English translation
task from
here,
reusing some code from
here. In
this section, we walk through the training script using PyTorch, and in the
next section, we walk through the training script using FlexFlow. The
corresponding code may be found in mt5_torch.py and mt5_ff.py,
respectively.
To download and uncompress the dataset, run:
cd examples/python/pytorch/mt5
wget https://object.pouta.csc.fi/Tatoeba-Challenge/eng-sin.tar
tar -xvf eng-sin.tar
gzip -d data/eng-sin/*.gz
This creates a directory data/ with a single subdirectory, data/eng-sin/,
which contains test.id, test.src, test.trg, train.id, train.src, and
train.trg.
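To confirm that the download and extraction worked, one can check that the expected files exist. One can also check that each .src file has as many lines as its .trg file. The sketch below assumes the layout described above, and it assumes that line i of a .src file is the translation of line i of the matching .trg file. The helper names are hypothetical.

```python
from pathlib import Path

# File names listed above for data/eng-sin/.
EXPECTED_FILES = ("test.id", "test.src", "test.trg", "train.id", "train.src", "train.trg")


def missing_dataset_files(data_dir="data/eng-sin"):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def count_lines(path):
    """Count the lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def split_is_aligned(data_dir="data/eng-sin", split="train"):
    """Check that <split>.src and <split>.trg contain the same number of lines."""
    root = Path(data_dir)
    return count_lines(root / f"{split}.src") == count_lines(root / f"{split}.trg")


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("missing:", missing)
    else:
        for split in ("train", "test"):
            print(split, "aligned:", split_is_aligned(split=split))
```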
We extract, prepare, and save the data as .tsv files using
DataPreparer.data_to_tsv(). This creates two new files, data/train.tsv and
data/eval.tsv, and only needs to be done once. Then, we can train using those
.tsv files. A base implementation for this may be found in mt5_torch.py,
which saves the .tsv files, trains for some number of epochs, and outputs a
.csv containing the predicted and actual text on the evaluation data.
python examples/python/pytorch/mt5/mt5_torch.py
Note: Running mt5_torch.py requires a GPU version of PyTorch.
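The exact columns written by DataPreparer.data_to_tsv() are not described here. The sketch below only illustrates the general idea of turning a line-aligned pair of files into a .tsv with one sentence pair per row. The function names and the column names source_text and target_text are hypothetical.

```python
import csv


def make_pairs(src_lines, trg_lines):
    """Pair line i of the source with line i of the target, stripping newlines."""
    src = [line.rstrip("\n") for line in src_lines]
    trg = [line.rstrip("\n") for line in trg_lines]
    if len(src) != len(trg):
        raise ValueError(f"length mismatch: {len(src)} source vs {len(trg)} target lines")
    return list(zip(src, trg))


def write_tsv(pairs, tsv_path, header=("source_text", "target_text")):
    """Write sentence pairs as tab-separated rows under a (hypothetical) header."""
    with open(tsv_path, "w", encoding="utf-8", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(header)
        writer.writerows(pairs)


def parallel_to_tsv(src_path, trg_path, tsv_path):
    """Convert a .src/.trg file pair into a single .tsv; returns the number of pairs."""
    with open(src_path, encoding="utf-8") as src, open(trg_path, encoding="utf-8") as trg:
        pairs = make_pairs(src, trg)
    write_tsv(pairs, tsv_path)
    return len(pairs)
```

For example, parallel_to_tsv("data/eng-sin/train.src", "data/eng-sin/train.trg", "data/train.tsv") would produce a training file. Building data/eval.tsv from the test.* files is one plausible mapping, but it is an assumption rather than what the script necessarily does. Using csv.writer instead of joining with "\t" by hand means quoting is handled if a sentence happens to contain a tab.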
mT5 in FlexFlow
Now, we examine how to write a similar training script using FlexFlow. To
begin, FlexFlow dataloaders expect the data to be passed in as numpy arrays
and to be already preprocessed so that batches may be directly given to the
model. In mt5_ff.py, data_to_numpy() converts the .tsv files to .npy,
and preprocess_train() performs the necessary preprocessing.
Note: data_to_numpy() takes a while to run.
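The model's input tensors are created with fixed shapes, so every example must be padded or truncated to a fixed length ahead of time. The sketch below shows what such preprocessing typically involves for a T5-style model. It is not the code of preprocess_train(). It assumes mT5's pad token id is 0 and that decoding starts from that same id, as in HuggingFace's T5/mT5 configs. It builds decoder_input_ids by shifting the labels right, mirroring the shift-right convention HuggingFace uses for T5. All helper names are hypothetical.

```python
PAD_ID = 0            # assumption: mT5's pad token id
DECODER_START_ID = 0  # assumption: mT5 starts decoding from the pad token


def pad_or_truncate(ids, length, pad_id=PAD_ID):
    """Cut or right-pad ids to exactly `length` tokens; also return the attention mask."""
    ids = list(ids[:length])
    mask = [1] * len(ids) + [0] * (length - len(ids))  # 1 = real token, 0 = padding
    ids = ids + [pad_id] * (length - len(ids))
    return ids, mask


def shift_right(label_ids, start_id=DECODER_START_ID):
    """Prepend the decoder start token and drop the last label token."""
    return [start_id] + list(label_ids[:-1])


def preprocess_example(source_ids, target_ids, enc_len, dec_len):
    """Turn one tokenized sentence pair into fixed-length model inputs and labels."""
    input_ids, attention_mask = pad_or_truncate(source_ids, enc_len)
    labels, _ = pad_or_truncate(target_ids, dec_len)
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "decoder_input_ids": shift_right(labels),
        "labels": labels,
    }
```

Stacking these per-example lists for the whole dataset gives arrays of shape (num_samples, enc_len) or (num_samples, dec_len), which is the form the dataloaders expect. Whether padded label positions should be excluded from the loss is not addressed by this sketch.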
Next, following the conventional FlexFlow terminology, we define a top-level task to train the mT5 model. The key steps are as follows (including some notable code snippets):
Define ffconfig = FFConfig() and ffmodel = FFModel(ffconfig); ffmodel is the Python object for the FlexFlow model.
Define the PyTorch mT5 model:
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
Load the preprocessed training data from the .npy files.
Use ffmodel.create_tensor() for the input_ids, attention_mask, and decoder_input_ids; these are the input tensors to the model.
Construct a PyTorchModel() object wrapping the PyTorch model (model) to enable conversion to FlexFlow:
hf_model = PyTorchModel(
    model,
    is_hf_model=True,
    batch_size=ffconfig.batch_size,
    seq_length=seq_length,
)
We pass is_hf_model=True since HuggingFace models require a special symbolic_trace() distinct from the native PyTorch one. seq_length is a tuple (encoder_seq_length, decoder_seq_length).
Convert the model to FlexFlow, passing the input tensors created with ffmodel.create_tensor():
output_tensors = hf_model.to_ff(ffmodel, input_tensors)
Define the optimizer ffoptimizer.
Compile the model:
ffmodel.compile(
    optimizer=ffoptimizer,
    loss_type=LossType.LOSS_SPARSE_CATEGORICAL_CROSSENTROPY,
    metrics=[
        MetricsType.METRICS_ACCURACY,
        MetricsType.METRICS_SPARSE_CATEGORICAL_CROSSENTROPY,
    ],
)
Create the dataloaders for the input_ids, attention_mask, decoder_input_ids, and labels.
Initialize the model layers:
ffmodel.init_layers()
Train the model, passing the appropriate dataloaders into fit():
ffmodel.fit(
    x=[input_ids_dl, attention_mask_dl, decoder_ids_dl],
    y=labels_dl,
    batch_size=batch_size,
    epochs=epochs,
)
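Keeping the tensor shapes consistent across these steps is where mistakes tend to happen. The plain-Python sketch below relates the seq_length tuple passed to PyTorchModel to the shapes of the three input tensors, and shows how a dataset splits into whole batches. Two points are assumptions. The first is that each input tensor is two-dimensional, (batch, sequence). The second is that only complete batches are consumed; we have not verified how FlexFlow's dataloaders treat a trailing partial batch. The helper names and the example sizes are hypothetical.

```python
def input_shapes(batch_size, seq_length):
    """Map each model input to an assumed (batch, sequence) shape.

    seq_length is the (encoder_seq_length, decoder_seq_length) tuple given to PyTorchModel.
    """
    enc_len, dec_len = seq_length
    return {
        "input_ids": (batch_size, enc_len),           # encoder side
        "attention_mask": (batch_size, enc_len),      # same length as input_ids
        "decoder_input_ids": (batch_size, dec_len),   # decoder side
    }


def iter_full_batches(num_samples, batch_size):
    """Yield (start, stop) index ranges that cover only complete batches."""
    for start in range(0, num_samples - batch_size + 1, batch_size):
        yield start, start + batch_size


if __name__ == "__main__":
    print(input_shapes(8, (128, 128)))
    print(list(iter_full_batches(10, 4)))  # the last 2 samples do not fill a batch
```

If the .npy arrays are trimmed to a multiple of the batch size up front, every batch handed to fit() has exactly the shape the tensors were created with.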
A base implementation may be found in mt5_ff.py.
./python/flexflow_python examples/python/pytorch/mt5/mt5_ff.py -ll:py 1 -ll:gpu 1 -ll:fsize 14000 -ll:zsize 4096
Note: Running mt5_ff.py requires a CPU version of PyTorch.
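The launch command above passes Legion runtime flags. As we understand them, -ll:py is the number of Python processors and -ll:gpu is the number of GPUs. -ll:fsize is the device (framebuffer) memory per GPU in MB, and -ll:zsize is the zero-copy (pinned host) memory in MB. Treat these descriptions as our reading of the flags rather than authoritative documentation. The hypothetical helper below rebuilds the same command with named parameters, which makes it easier to adjust memory sizes or the GPU count.

```python
import shlex


def flexflow_command(
    script,
    launcher="./python/flexflow_python",
    python_procs=1,   # -ll:py: number of Python processors
    gpus=1,           # -ll:gpu: number of GPUs
    fb_mem_mb=14000,  # -ll:fsize: device (framebuffer) memory per GPU, in MB
    zc_mem_mb=4096,   # -ll:zsize: zero-copy (pinned host) memory, in MB
):
    """Return the argument list for launching a FlexFlow Python script."""
    return [
        launcher,
        script,
        "-ll:py", str(python_procs),
        "-ll:gpu", str(gpus),
        "-ll:fsize", str(fb_mem_mb),
        "-ll:zsize", str(zc_mem_mb),
    ]


if __name__ == "__main__":
    # Prints the same command line shown above.
    print(shlex.join(flexflow_command("examples/python/pytorch/mt5/mt5_ff.py")))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues.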