100% GPU memory usage

#2
by Sibgat-Ul - opened

Hello,

First of all, thank you for making this repo and for all of your hard work.
I was using this model to train with a Seq2SeqTrainer and hitting the memory limit (RTX 3060, 12 GB), which was not the case for t5 and mt5. I have been using the same training args in all cases, with batch_size = 64/48/32, but for banglat5 I had to drop to batch_size = 16. For reference, my setup looks roughly like the sketch below.
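A minimal sketch of the setup (the model id "csebuetnlp/banglat5", the output directory, and the dataset are placeholders standing in for my actual configuration):

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments,
)

model_id = "csebuetnlp/banglat5"  # assumed repo id; same pattern as my t5/mt5 runs
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Same arguments as the t5/mt5 runs, except the batch size had to drop to 16
training_args = Seq2SeqTrainingArguments(
    output_dir="banglat5-finetune",
    per_device_train_batch_size=16,   # 64/48/32 fit for t5/mt5 on the same 12 GB card
    per_device_eval_batch_size=16,
    predict_with_generate=True,
)

# The model and args are then passed to Seq2SeqTrainer together with
# my pre-tokenized train/eval datasets (omitted here).
```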

Is there any way to optimize the GPU memory usage?

Thank you,

BUET CSE NLP Group org

Possibly a tokenization issue, since banglat5 has the exact same architecture as t5. In fact, banglat5 should have lower memory requirements, because the banglat5 tokenizer produces fewer tokens than mt5 for the same Bangla text.

Check that you're using the right tokenizer (the one in this repo).

It may also be worthwhile to explicitly set the max_length, truncation, and padding arguments when calling the tokenizer.
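For example, something along these lines (a sketch only; the model id "csebuetnlp/banglat5" and the max_length of 512 are assumptions to adjust to your setup):

```python
from transformers import AutoTokenizer

# Load the tokenizer that ships with this repo, not the t5/mt5 one.
# (Model id assumed to be "csebuetnlp/banglat5"; adjust if yours differs.)
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglat5")

texts = ["এটি একটি উদাহরণ বাক্য।"]  # example Bangla input

enc = tokenizer(
    texts,
    max_length=512,       # explicit cap; pick what your task actually needs
    truncation=True,      # cut sequences at max_length instead of keeping them whole
    padding="longest",    # pad per batch; padding="max_length" pads everything to 512
    return_tensors="pt",
)
print(enc["input_ids"].shape)
```

Capping max_length and letting padding follow the longest sequence in each batch keeps a few unusually long examples from inflating every batch.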

Good Luck.
