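On the local machine, install the Remote-SSH and Python extensions for VS Code.
On the remote server, install the Python extension as well, here from a downloaded .vsix file: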
code --install-extension ms-python.python-2022.9.11681004.vsix
That's all the setup needed. Next, choose Run -> Add Configuration... to create a launch.json.
My launch.json looks like this:
{
"version": "0.2",
"configurations": [
{
"name": "Python: Launch",
"type": "python",
"request": "launch",
"program": "${workspaceFolder}/CLIP4Clip/main_task_retrieval.py",
"args": [
"--do_train",
"--num_thread_reader=0",
"--epochs=5",
"--batch_size=128",
"--n_display=50",
"--train_csv",
"${env:DATA_PATH}/MSRVTT_train.9k.csv",
"--val_csv",
"${env:DATA_PATH}/MSRVTT_JSFUSION_test.csv",
"--data_path",
"${env:DATA_PATH}/MSRVTT_data.json",
"--features_path",
"${env:DATA_PATH}/MSRVTT_Videos",
"--output_dir",
"ckpts/ckpt_msrvtt_retrieval_looseType",
"--lr",
"1e-4",
"--max_words",
"32",
"--max_frames",
"12",
"--batch_size_val",
"16",
"--datatype",
"msrvtt",
"--expand_msrvtt_sentences",
"--feature_framerate",
"1",
"--coef_lr",
"1e-3",
"--freeze_layer_num",
"0",
"--slice_framepos",
"2",
"--loose_type",
"--linear_patch",
"2d",
"--sim_header",
"meanP",
"--pretrained_clip_name",
"ViT-B/32"
],
"env": {
"DATA_PATH": "/mnt/cloud_disk/wf/msrvtt_data"
},
"console": "integratedTerminal"
}
]
}
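One problem came up next: the ${env:DATA_PATH} references expanded to empty strings on the command line, so this approach doesn't work, and I had to fall back on the clumsy method of pasting the full path into every argument by hand. The most likely reason is that ${env:NAME} is resolved from the environment of the VS Code process itself, whereas the "env" block only sets variables for the program being debugged.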
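Since ${env:...} didn't expand, a small script can do the copy-pasting instead. The sketch below uses a hypothetical helper, command_to_args (not part of VS Code or CLIP4Clip): it splits a shell command the way a POSIX shell would, expands $VAR and ${VAR} from a dictionary you pass in, and prints the arguments after the script path as a JSON list ready to paste into "args". It assumes the command has the form python [...] script.py ARGS.

```python
import json
import shlex
from string import Template


def command_to_args(command, env):
    """Turn a shell command into a launch.json "args" list.

    Hypothetical helper: $VAR / ${VAR} are expanded from `env` here, so the
    resulting list holds only literal paths and needs no ${env:...} lookups.
    Unknown variables are left untouched.
    """
    # Join backslash-continued lines, then split like a POSIX shell.
    tokens = shlex.split(command.replace("\\\n", " "))
    expanded = [Template(tok).safe_substitute(env) for tok in tokens]
    # Assumes "python [...] script.py ARGS": keep what follows the first *.py.
    script_index = next(i for i, tok in enumerate(expanded) if tok.endswith(".py"))
    return expanded[script_index + 1:]


if __name__ == "__main__":
    cmd = """python main_task_retrieval.py --do_train --epochs=5 \\
    --train_csv ${DATA_PATH}/MSRVTT_train.9k.csv \\
    --data_path $DATA_PATH/MSRVTT_data.json --pretrained_clip_name ViT-B/32"""
    args = command_to_args(cmd, {"DATA_PATH": "/mnt/cloud_disk/wf/msrvtt_data"})
    # Paste the printed list into the "args" array of launch.json.
    print(json.dumps(args, indent=4))
```

Expanding the paths once, up front, keeps launch.json free of any dependence on which environment VS Code happened to start in.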
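In addition, the python -m invocation has to go into the "module" field. Since "module" and "program" conflict (only one of them may be set), the script path moves to the front of "args" instead: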
{
"version": "0.2",
"configurations": [
{
"name": "Python: Launch",
"type": "python",
"request": "launch",
"module": "torch.distributed.launch",
"args": [
"${workspaceFolder}/CLIP4Clip/main_task_retrieval.py",
"--do_train",
"--num_thread_reader=0",
"--epochs=5",
"--batch_size=128",
"--n_display=50",
"--train_csv",
"/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_train.9k.csv",
"--val_csv",
"/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_JSFUSION_test.csv",
"--data_path",
"/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_data.json",
"--features_path",
"/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_Videos",
"--output_dir",
"ckpts/ckpt_msrvtt_retrieval_looseType",
"--lr",
"1e-4",
"--max_words",
"32",
"--max_frames",
"12",
"--batch_size_val",
"16",
"--datatype",
"msrvtt",
"--expand_msrvtt_sentences",
"--feature_framerate",
"1",
"--coef_lr",
"1e-3",
"--freeze_layer_num",
"0",
"--slice_framepos",
"2",
"--loose_type",
"--linear_patch",
"2d",
"--sim_header",
"meanP",
"--pretrained_clip_name",
"ViT-B/32"
],
"console": "integratedTerminal"
}
]
}
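To sanity-check the "module" configuration, it helps to think of it as the command it roughly corresponds to. The sketch below is a simplified model, not what the Python debugger literally executes (it wraps the run in its own launcher): "module" becomes python -m MODULE, "program" becomes python PROGRAM, the "args" follow, and setting both or neither is rejected, matching the conflict between "module" and "program" noted above. equivalent_command is a hypothetical helper.

```python
import shlex


def equivalent_command(config, python="python"):
    """Approximate shell command for a launch.json entry (simplified model).

    "module" maps to `python -m MODULE`, "program" to `python PROGRAM`;
    the two are mutually exclusive, so having both (or neither) is an error.
    """
    has_module = "module" in config
    has_program = "program" in config
    if has_module == has_program:
        raise ValueError('set exactly one of "module" or "program"')
    head = ["-m", config["module"]] if has_module else [config["program"]]
    # shlex.join quotes any argument that would need quoting in a shell.
    return shlex.join([python, *head, *config.get("args", [])])


if __name__ == "__main__":
    cfg = {
        "module": "torch.distributed.launch",
        "args": ["CLIP4Clip/main_task_retrieval.py", "--do_train", "--lr", "1e-4"],
    }
    print(equivalent_command(cfg))
    # python -m torch.distributed.launch CLIP4Clip/main_task_retrieval.py --do_train --lr 1e-4
```

Running the printed command in a terminal on the server is a quick way to confirm the argument list is right before debugging it.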
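After that, I set breakpoints and debugged, which turned up the actual problem:
Stepping through statement by statement showed that the error occurred at a point where a model is loaded, because the model file had been placed in the wrong location. Remote debugging is genuinely useful here: the call stack shows clearly how execution reached each point.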