I was fine-tuning a Llama2 model with LoRA using Hugging Face's Trainer. As soon as I set the compute_metrics argument, evaluation ran out of GPU memory (CUDA Out of Memory).
trainer = transformers.Trainer(
    model=model,
    args=train_args,
    train_dataset=train_data,
    eval_dataset=test_data,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
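For context, a compute_metrics along these lines is enough to trigger the problem, because the Trainer must keep every model output around to build the EvalPrediction it passes in. This is only a minimal sketch of a token-level accuracy metric, not the exact function from the original run; the -100 padding convention and the metric itself are assumptions:

import numpy as np

def compute_metrics(eval_pred):
    # eval_pred.predictions is the concatenation of the model outputs (logits)
    # over the whole eval set: for a causal LM this is roughly
    # [num_eval_samples, seq_len, vocab_size], which is what blows up memory.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # Ignore padded positions marked with -100 by the data collator.
    # (A real next-token accuracy would also shift predictions/labels by one;
    # omitted here for brevity.)
    mask = labels != -100
    accuracy = (predictions[mask] == labels[mask]).mean()
    return {"accuracy": float(accuracy)}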
When compute_metrics is set, the Hugging Face Trainer concatenates the model outputs on the entire eval set (e.g. the logits) into one tensor, and this accumulation happens on the GPU; only at the very end are these enormous tensors moved to the CPU. In many cases the GPU runs out of memory before the transfer to the CPU ever happens. There are two ways to fix this:
(1) Set eval_accumulation_steps in TrainingArguments. It controls how often the accumulated output tensors are moved from the GPU to the CPU (see the sketch after the quote below). The official documentation describes it as follows:
eval_accumulation_steps (int, optional) — Number of predictions steps to accumulate the output tensors for, before moving the results to the CPU. If left unset, the whole predictions are accumulated on GPU/NPU/TPU before being moved to the CPU (faster but requires more memory).
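A minimal sketch of how this looks in the TrainingArguments; the other argument values here are placeholders, not the ones from the original run:

from transformers import TrainingArguments

train_args = TrainingArguments(
    output_dir="./llama2-lora-output",   # placeholder path
    per_device_eval_batch_size=4,        # placeholder value
    eval_accumulation_steps=10,          # move accumulated eval tensors to the CPU every 10 steps
)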
(2) Pass a preprocess_logits_for_metrics function to the Trainer. It defines how the output tensors are processed after each eval step. If you do not need the full logits (for example, you only care about which class/token each prediction falls into), you can shrink them in this function, which greatly reduces the GPU memory needed when they are concatenated (see the sketch after the quote below). The official documentation describes it as follows:
preprocess_logits_for_metrics (Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional) — A function that preprocess the logits right before caching them at each evaluation step. Must take two tensors, the logits and the labels, and return the logits once processed as desired. The modifications made by this function will be reflected in the predictions received by compute_metrics.
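For example, if all your metric needs is the predicted token/class id, something like the following sketch (assuming an argmax is enough for your metric) shrinks the cached tensor from [batch, seq_len, vocab_size] down to [batch, seq_len]:

import torch

def preprocess_logits_for_metrics(logits, labels):
    # Some models return a tuple (logits, past_key_values, ...); keep only the logits.
    if isinstance(logits, tuple):
        logits = logits[0]
    # Keep only the predicted token ids instead of the full vocabulary-sized logits,
    # reducing what gets accumulated across eval steps by a factor of vocab_size.
    return torch.argmax(logits, dim=-1)

trainer = transformers.Trainer(
    model=model,
    args=train_args,
    train_dataset=train_data,
    eval_dataset=test_data,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
)

With this in place, eval_pred.predictions inside compute_metrics already contains the argmaxed token ids rather than the raw logits, so compute_metrics should not call argmax again.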
This post draws on https://discuss.huggingface.co/t/cuda-out-of-memory-when-using-trainer-with-compute-metrics/2941