Better Initialization: use pre-trained EVA weights to initialize the image encoder of EVA-CLIP.
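A minimal sketch of this kind of warm-start loading. The model name and checkpoint path are placeholders, not the exact EVA-CLIP configuration; `strict=False` tolerates keys (e.g. heads/projections) that differ between the MIM-pretrained EVA checkpoint and the CLIP image tower.

```python
import timm
import torch

# Placeholder architecture and path, for illustration only.
image_encoder = timm.create_model("vit_base_patch16_224", num_classes=0)
state_dict = torch.load("eva_pretrained.pt", map_location="cpu")

# strict=False skips keys absent from either side instead of raising.
missing, unexpected = image_encoder.load_state_dict(state_dict, strict=False)
print(f"missing: {len(missing)}, unexpected: {len(unexpected)}")
```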
Optimizer: LAMB, an optimizer designed for large-batch training. Its adaptive element-wise updates and layer-wise learning rates improve training efficiency and accelerate convergence.
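A simplified PyTorch sketch of a single LAMB step, to make the two ingredients concrete: an Adam-style element-wise update, then a per-layer trust ratio that rescales the step. Bias correction is omitted for brevity, so this is not a drop-in replacement for a production LAMB implementation.

```python
import torch

def lamb_update(param: torch.Tensor, grad: torch.Tensor,
                m: torch.Tensor, v: torch.Tensor,
                lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
                eps: float = 1e-6, weight_decay: float = 0.01) -> None:
    """One simplified LAMB step for a single layer's weight tensor."""
    # Adaptive element-wise update: Adam-style first/second moments.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    update = m / (v.sqrt() + eps) + weight_decay * param
    # Layer-wise learning rate: scale the step by the trust ratio
    # ||w|| / ||update||, so every layer takes a comparably sized step.
    w_norm, u_norm = param.norm(), update.norm()
    trust_ratio = (w_norm / u_norm) if w_norm > 0 and u_norm > 0 else 1.0
    param.add_(update, alpha=-lr * float(trust_ratio))

# Toy usage: one update on a random weight tensor.
p = torch.randn(64, 64)
g = torch.randn_like(p)
m, v = torch.zeros_like(p), torch.zeros_like(p)
lamb_update(p, g, m, v)
```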
FLIP: we randomly mask 50% of the image tokens during training, roughly halving the time complexity of the image encoder.
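A sketch of this style of random token dropping, using the MAE-style shuffle trick (per-token random scores sorted to pick which tokens to keep). The function name and interface are illustrative, not the paper's code; with `mask_ratio=0.5` the encoder sees only half the patch tokens.

```python
import torch

def random_mask_tokens(tokens: torch.Tensor, mask_ratio: float = 0.5):
    """Randomly drop a fraction of patch tokens; returns kept tokens and indices."""
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)   # random score per token
    keep_idx = noise.argsort(dim=1)[:, :n_keep]      # lowest scores are kept
    kept = tokens.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return kept, keep_idx

# Toy usage: a batch of 4 images, 196 patch tokens each, 768-dim features.
x = torch.randn(4, 196, 768)
visible, idx = random_mask_tokens(x, mask_ratio=0.5)
print(visible.shape)  # torch.Size([4, 98, 768])
```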