conda create -n mask2former python=3.8
conda activate mask2former
On the PyTorch website, find the PyTorch build that matches your CUDA version:
# CUDA 11.3
conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=11.3 -c pytorch -c conda-forge
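The PyTorch installed with this command caused problems later on (described below), so I switched to a different install command, which fixed them. I recommend installing with the command below directly:
# CUDA 11.3
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113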
pip install -U opencv-python
If these commands fail to download the packages, download them directly from the URLs instead.
# under your working directory
git clone git@github.com:facebookresearch/detectron2.git
cd detectron2
pip install -e .
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git
cd ..
git clone git@github.com:facebookresearch/Mask2Former.git
cd Mask2Former
pip install -r requirements.txt
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
The dataset folder is laid out as follows:
ADEChallengeData2016/
  images/
  annotations/
  objectInfo150.txt
  # 1. download the instance annotations
  annotations_instance/
  # 2. generated by prepare_ade20k_sem_seg.py
  annotations_detectron2/
  # 3. generated by prepare_ade20k_pan_seg.py
  ade20k_panoptic_{train,val}.json
  ade20k_panoptic_{train,val}/
  # 4. generated by prepare_ade20k_ins_seg.py
  ade20k_instance_{train,val}.json
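Generate the files the dataset needs by following the steps above in order. Because I keep the dataset outside the project folder, I also had to change the paths in the various .py files.
The instance annotations can be downloaded from the MIT Scene Parsing Benchmark, or with this command: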
wget http://sceneparsing.csail.mit.edu/data/ChallengeData2017/annotations_instance.tar
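Then run the following to combine the semantic and instance annotations into panoptic annotations:
python datasets/prepare_ade20k_pan_seg.py
And run the following to extract the instance annotations: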
python datasets/prepare_ade20k_ins_seg.py
Multi-GPU training:
python train_net.py --num-gpus 2 --config-file configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml
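Taking the ADE20K dataset as an example:
The dataset paths are set in the corresponding files under /home/dell/liyan/Mask2Former-main/mask2former/data/datasets/; the last two lines of each file are where the dataset path can be set.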
1. Running sh make.sh produced this error:
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.3'
Traceback (most recent call last):
  File "setup.py", line 76, in <module>
    ext_modules=get_extensions(),
  File "setup.py", line 54, in get_extensions
    raise NotImplementedError('No CUDA runtime is found. Please set FORCE_CUDA=1 or test it by running torch.cuda.is_available().')
Add the following to the .bashrc file:
export FORCE_CUDA="1"
Running sh make.sh again then produced:
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.3'
running build
running build_py
running build_ext
building 'MultiScaleDeformableAttention' extension
Traceback (most recent call last):
  File "setup.py", line 69, in <module>
    setup(
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 131, in run
    self.run_command(cmd_name)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 88, in run
    _build_ext.run(self)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 709, in build_extensions
    build_ext.build_extensions(self)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
    super(build_ext, self).build_extension(ext)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 525, in unix_wrap_ninja_compile
    cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 424, in unix_cuda_flags
    cflags + _get_cuda_arch_flags(cflags))
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1562, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
IndexError: list index out of range
Check whether CUDA is usable: run the following Python commands in a terminal to test whether CUDA is available on your system:
import torch
print(torch.cuda.is_available())
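This uses PyTorch to check whether CUDA is available. If it returns True, CUDA is installed and usable in your Python environment. If it returns False, CUDA may not be installed correctly.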
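A slightly fuller version of this check can be sketched as below. The helper name cuda_report is made up for illustration; it only relies on torch.__version__, torch.version.cuda and torch.cuda.is_available(), and it assumes nothing about why CUDA might be missing.

```python
import importlib.util


def cuda_report():
    """Return a small dict describing the PyTorch/CUDA state of this environment.

    cuda_report is a hypothetical helper name, not part of PyTorch.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this torch build was compiled against (None for CPU-only builds)
        "built_with_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda is None, the installed torch is a CPU-only build, which would also make torch.cuda.is_available() return False regardless of the driver.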
I switched to a different install channel:
# CUDA 11.3
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
The installation succeeded, and torch.cuda.is_available() now returned True.
2. At runtime the PIL library turned out to be broken
conda install pillow
That fixed it.
3. Error during training
AttributeError: module 'numpy' has no attribute 'typeDict'
Fix attempt: after downgrading numpy to 1.21, a new error appeared:
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
Fix attempt: after upgrading numpy to 1.22, another error appeared:
ImportError: numpy.core.multiarray failed to import
conda install numpy==1.23
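The sequence above ended on numpy 1.23. A quick way to confirm which version an environment actually has is sketched below; the helper names are hypothetical, and the only fact taken from this post is that the 1.23 series is the one that worked here.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.23.5' into (1, 23); only the major and minor parts are kept."""
    parts = []
    for piece in text.split(".")[:2]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def matches_series(installed, wanted="1.23"):
    """True if `installed` belongs to the same major.minor series as `wanted`."""
    return parse_version(installed) == parse_version(wanted)


def installed_version(package="numpy"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("numpy")
    if version is None:
        print("numpy is not installed in this environment")
    else:
        print(f"numpy {version}, in the 1.23 series: {matches_series(version)}")
```

Running it inside the activated mask2former environment shows whether pip and conda have left a different numpy in place than the one just installed.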
4. After the previous problem was solved
ImportError: /home/abc/liyan/detectron2-main/detectron2/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7reshapeEN3c108ArrayRefIlEE
Fix: open a terminal in the detectron2-main folder, activate the virtual environment, delete the build folder, and reinstall:
rm -r build
pip install -e .
5. Error during training
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ADEChallengeData2016/ade20k_instance_train.json'
The cause is a wrong dataset path. Editing the paths in the .py files under Mask2Former-main/mask2former/data/datasets and changing them to absolute paths solved the problem.
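Before starting a long training run, it can help to confirm that every entry from the folder listing is actually where the code will look. The sketch below is one way to do that; missing_ade20k_entries is a hypothetical helper. It also assumes that the registration files in your copy read the dataset root from the DETECTRON2_DATASETS environment variable and fall back to datasets, as the upstream repository does; check the last two lines of those files to confirm.

```python
import os
from pathlib import Path

# Entries taken from the ADEChallengeData2016 folder listing in this post.
EXPECTED = [
    "images",
    "annotations",
    "objectInfo150.txt",
    "annotations_instance",
    "annotations_detectron2",
    "ade20k_panoptic_train.json",
    "ade20k_panoptic_val.json",
    "ade20k_panoptic_train",
    "ade20k_panoptic_val",
    "ade20k_instance_train.json",
    "ade20k_instance_val.json",
]


def missing_ade20k_entries(datasets_root):
    """Return the expected files/folders that do not exist under the root."""
    base = Path(datasets_root) / "ADEChallengeData2016"
    return [name for name in EXPECTED if not (base / name).exists()]


if __name__ == "__main__":
    # Assumption: the root comes from DETECTRON2_DATASETS, defaulting to "datasets".
    root = os.getenv("DETECTRON2_DATASETS", "datasets")
    missing = missing_ade20k_entries(root)
    if missing:
        print(f"Missing under {root}/ADEChallengeData2016:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected ADE20K entries are present.")
```

If the environment variable is honoured by your copy, exporting it to the folder that contains ADEChallengeData2016 may be an alternative to editing absolute paths into the .py files.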
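6. After the previous problem was solved, a new one appeared
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 240, in __init__
    assert prefetch_factor > 0
TypeError: '>' not supported between instances of 'NoneType' and 'int'
Cause: None and an int cannot be compared. Printing prefetch_factor showed that its value was None. Some people attribute this to a mismatch between the installed detectron2 and the torch version. Someone raised the same issue on the detectron2 GitHub, and the fix given there was to install pytorch 2.1.0, but my CUDA version is too old for a PyTorch that new. I then searched the detectron2-main folder for prefetch_factor and found that build.py under /detectron2-main/detectron2/data sets prefetch_factor to None. I changed the value to 2 and trained again, and the error went away. The error disappearing does not mean the problem is truly solved, though; whether it is remains to be verified.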
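The failure itself is easy to reproduce without PyTorch, and a less hard-coded variant of the workaround is to pass prefetch_factor only when a value was actually chosen, so that DataLoader falls back to its own default otherwise. The sketch below shows both ideas; loader_kwargs and comparison_fails are hypothetical helpers, not detectron2 or PyTorch functions, and this is not the code in build.py.

```python
def comparison_fails(value):
    """Mimic the `prefetch_factor > 0` check and report whether it raises TypeError."""
    try:
        value > 0
    except TypeError:
        return True
    return False


def loader_kwargs(num_workers, prefetch_factor=None):
    """Build DataLoader keyword arguments, leaving prefetch_factor out when unset.

    Omitting the key lets torch.utils.data.DataLoader use its own default
    instead of receiving an explicit None.
    """
    kwargs = {"num_workers": num_workers}
    if num_workers > 0 and prefetch_factor is not None:
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs


if __name__ == "__main__":
    print(comparison_fails(None))   # the situation in the traceback above
    print(loader_kwargs(4))         # no prefetch_factor key at all
    print(loader_kwargs(4, 2))      # explicit value, as in the manual edit
```

The resulting dict would be passed as DataLoader(dataset, **loader_kwargs(...)), so older and newer torch versions each see an argument list they accept.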
7. After the previous problem was solved, a new one appeared:
dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)
Could not load library libcudnn_cnn_train.so.8. Error: /home/abc/anaconda3/envs/mask2former/bin/../lib/libcudnn_ops_train.so.8: undefined symbol: _Z20traceback_iretf_implPKcRKN5cudnn16InternalStatus_tEb, version libcudnn_ops_infer.so.8
Please make sure libcudnn_cnn_train.so.8 is in your library path!
Aborted (core dumped)
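Re-create the symbolic links
Searching the filesystem for libcudnn_cnn_train.so.8 turned up two locations: one in the anaconda virtual environment and one under /usr/. The link in anaconda pointed to 8.9.1, while the one under /usr pointed to 8.2.0. The cuDNN version on this machine is 8.2.0, so I figured the link in the anaconda environment should point to 8.2.0 as well. After changing these two links the error no longer appeared. No error does not mean nothing is wrong; I will deal with any follow-up problems as they come up.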
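To see at a glance which real file each libcudnn link points to, something like the sketch below can be run against both locations. resolve_cudnn_links is a hypothetical helper, and the directories to scan are assumptions: the conda environment's lib folder is taken from the CONDA_PREFIX variable, and any system directory has to be supplied on the command line because this post only says the second copy lives somewhere under /usr/.

```python
import os
import sys
from pathlib import Path


def resolve_cudnn_links(directories, pattern="libcudnn*.so*"):
    """Map every matching library path to the real file it resolves to."""
    results = {}
    for directory in directories:
        folder = Path(directory)
        if not folder.is_dir():
            continue  # skip locations that do not exist on this machine
        for path in sorted(folder.glob(pattern)):
            results[str(path)] = os.path.realpath(path)
    return results


if __name__ == "__main__":
    # Assumption: extra directories (e.g. wherever cuDNN sits under /usr) are
    # given as command-line arguments.
    dirs = list(sys.argv[1:])
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        dirs.append(os.path.join(conda_prefix, "lib"))
    for link, target in resolve_cudnn_links(dirs).items():
        print(f"{link} -> {target}")
```

Mixed version suffixes in the output (for example 8.9.1 next to 8.2.0) would point at the same kind of mismatch described above.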
8.
ERROR [11/03 14:48:57 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/abc/.local/lib/python3.8/site-packages/tensorboard/compat/__init__.py", line 47, in tf
    from tensorboard.compat import notf  # pylint: disable=g-import-not-at-top
ImportError: cannot import name 'notf' from 'tensorboard.compat' (/home/abc/.local/lib/python3.8/site-packages/tensorboard/compat/__init__.py)
This error was followed by another one, which looked like a missing library. Once that library was installed, this error disappeared too, so I cannot give the exact fix.
9.
  File "/home/abc/.local/lib/python3.8/site-packages/scipy/optimize/_hungarian.py", line 93, in linear_sum_assignment
    raise ValueError("matrix contains invalid numeric entries")
ValueError: matrix contains invalid numeric entries
To be updated.
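As a starting point for narrowing this one down: scipy's linear_sum_assignment raises this error when the cost matrix contains entries it cannot work with, NaN being the usual suspect. The sketch below is only a diagnostic aid, not a fix. find_non_finite is a hypothetical helper that takes a plain nested list, so a tensor or array would first need converting, for example with .tolist().

```python
import math


def find_non_finite(matrix):
    """Return (row, col, value) for every NaN or infinite entry in a 2-D list."""
    bad = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if not math.isfinite(value):
                bad.append((i, j, value))
    return bad


if __name__ == "__main__":
    # Toy cost matrix with one NaN, standing in for a real matcher cost matrix.
    cost = [[0.5, 1.0], [float("nan"), 0.2]]
    for i, j, value in find_non_finite(cost):
        print(f"non-finite entry at row {i}, column {j}: {value}")
```

Calling it on the cost matrix just before the assignment step shows whether bad values are present and where they sit, which is the first thing to establish before looking for their source.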