Habitat Challenge Rearrangement: Code Reproduction Details and Pitfalls

Published: January 16, 2024

Embodied AI, mobile manipulation

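Habitat-Challenge is one of the embodied AI challenges launched by Meta in 2022; its main task is object rearrangement. The details are in the following two papers:
1. Habitat 2.0: Training Home Assistants to Rearrange their Habitat. This paper defines the task and provides the baseline methods MonolithicRL and TP-SRL. MonolithicRL is an end-to-end RL approach; TP-SRL is hierarchical, with a high-level task planner on top of low-level skills. The code is in the corresponding GitHub repository.
2. Multi-skill Mobile Manipulation for Object Rearrangement. At the time of writing this is the method with the highest success rate; it is referred to below as M3. The code is in the corresponding GitHub repository.
See the papers for implementation details. The rest of this post only covers the pitfalls I ran into while reproducing the code, which may save later readers some time.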

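Environment setup:

1. Installing habitat-sim:

Running the command from the official site directly, conda install habitat-sim withbullet -c conda-forge -c aihabitat, is likely to fail because of network problems.
There are two alternatives:
Option 1: download the matching package directly from the Habitat-sim Conda page.
Option 2: download the Habitat-sim source and install it with the following commands: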

cd habitat-sim
pip install -r requirements.txt
python setup.py install --bullet --headless 
cd ..

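When choosing a Habitat-sim build, first make sure it matches the Habitat version. You generally want the withbullet build; whether to use headless depends on whether you need a display, so on a machine without a monitor install the headless build. It is best to follow the README on the GitHub page: for example, if you need withbullet and headless, download the conda package built with both.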

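2. Installing Habitat-lab

Note that habitat-lab is not installed as a regular library but from a source checkout, so each conda environment is effectively tied to one habitat-lab checkout. Clone the repository and install from it: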

git clone --branch stable https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
pip install -e habitat-lab  # install habitat_lab
or
python -m pip install -e .

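3. Result of a successful installation:

The two environments end up with different versions:
hab-mm is my conda environment for M3, where habitat and habitat-sim are both 0.2.1;
habitat is the environment for the official habitat-challenge code, where habitat and habitat-sim are both 0.2.2.
The Habitat simulator is strict about its environment, so mismatched versions can cause unexpected errors.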
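Since a version mismatch is easy to miss, a quick check inside each conda environment can help. The following is a minimal sketch, assuming both packages expose a `__version__` attribute; the helper names `release`, `versions_match` and `installed_version` are made up for this example.

```python
import importlib


def release(version):
    """Return the leading 'major.minor.patch' part of a version string."""
    return ".".join(version.split("+")[0].split(".")[:3])


def versions_match(lab_version, sim_version):
    """True when both packages report the same release number."""
    return release(lab_version) == release(sim_version)


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported
    or does not define that attribute (assumption: habitat and habitat_sim do)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    lab = installed_version("habitat")
    sim = installed_version("habitat_sim")
    print("habitat:", lab, "habitat_sim:", sim)
    if lab and sim and not versions_match(lab, sim):
        print("Version mismatch: reinstall one of the two packages.")
```

Run it once in the hab-mm environment and once in the habitat environment; each should report a matching pair.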

habitat-challenge simulation pitfalls

A minor problem that may appear after setting up the environment:
OSError: /home/lu/.conda/envs/habitat/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11
Add the following line to ~/.bashrc:

export LD_LIBRARY_PATH=/home/lu/.conda/envs/habitat/lib/python3.7/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH

Command 1: running MonolithicRL:

Command:

#!/bin/bash

export MAGNUM_LOG=quiet
export HABITAT_SIM_LOG=quiet

set -x
python habitat-lab/habitat_baselines/run.py \
    --exp-config configs/methods/ddppo_monolithic.yaml \
    --run-type train \
    BASE_TASK_CONFIG_PATH configs/tasks/rearrange.local.rgbd.yaml \
    TASK_CONFIG.DATASET.SPLIT 'train' \
    TASK_CONFIG.TASK.TASK_SPEC_BASE_PATH configs/pddl/ \
    TENSORBOARD_DIR tb \
    CHECKPOINT_FOLDER checkpoints \
    LOG_FILE train.log

Problem 1: "Not a gzipped file" error:

Check whether the dataset path is correct, because the error comes from this code in pointnav_dataset.py:

datasetfile_path = config.DATA_PATH.format(split=config.SPLIT)
with gzip.open(datasetfile_path, "rt") as f:
    self.from_json(f.read(), scenes_dir=config.SCENES_DIR)
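Before launching training, the resolved dataset path can be checked by hand. The sketch below is not part of habitat-lab; `check_dataset_file` is a hypothetical helper that only tests whether the file exists and starts with the two gzip magic bytes.

```python
import os

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def check_dataset_file(path):
    """Return 'missing', 'not gzip' or 'ok' for a dataset path."""
    if not os.path.isfile(path):
        return "missing"
    with open(path, "rb") as f:
        if f.read(2) != GZIP_MAGIC:
            return "not gzip"
    return "ok"
```

Call it with the same string the loader builds, that is the DATA_PATH value from the task config with {split} replaced by the split name. A result of "not gzip" often points to a failed or partial download, or to a path that resolves to a different file than intended.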

Problem 2: training keeps failing with EOFError:

Traceback (most recent call last):
  File "habitat-lab/habitat_baselines/run.py", line 81, in <module>
    main()
  File "habitat-lab/habitat_baselines/run.py", line 40, in main
    run_exp(**vars(args))
  File "habitat-lab/habitat_baselines/run.py", line 77, in run_exp
    execute_exp(config, run_type)
  File "habitat-lab/habitat_baselines/run.py", line 60, in execute_exp
    trainer.train()
  File "/home/lu/.conda/envs/habitat/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 715, in train
    self._init_train()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 254, in _init_train
    self._init_envs()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 204, in _init_envs
    workers_ignore_signals=is_slurm_batch_job(),
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/common/construct_vector_env.py", line 97, in construct_envs
    workers_ignore_signals=workers_ignore_signals,
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 200, in __init__
    read_fn() for read_fn in self._connection_read_fns
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 200, in <listcomp>
    read_fn() for read_fn in self._connection_read_fns
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 103, in __call__
    res = self.read_fn()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 68, in recv
    buf = self.recv_bytes()
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Exception ignored in: <function VectorEnv.__del__ at 0x7fafedb180e0>
Traceback (most recent call last):
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 584, in __del__
    self.close()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 452, in close
    read_fn()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 103, in __call__
    res = self.read_fn()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 68, in recv
    buf = self.recv_bytes()
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError:

According to a GitHub issue, the cause may be that the GPU cannot handle the training. The workaround is to edit this file:
habitat-challenge/habitat-lab/habitat_baselines/common/construct_vector_env.py
Around line 74 there is a check:

    if int(os.environ.get("HABITAT_ENV_DEBUG", 0)):
        logger.warn(
            "Using the debug Vector environment interface. Expect slower performance."
        )
        vector_env_cls = ThreadedVectorEnv
    else:
        vector_env_cls = VectorEnv
    envs = vector_env_cls(
        make_env_fn=make_gym_from_config,
        env_fn_args=tuple((c,) for c in configs),
        workers_ignore_signals=workers_ignore_signals,
    )

Apparently not every GPU setup can handle VectorEnv, so simply force the class to ThreadedVectorEnv. It is slower, but it avoids the crash:

    envs = ThreadedVectorEnv(
        make_env_fn=make_gym_from_config,
        env_fn_args=tuple((c,) for c in configs),
        workers_ignore_signals=workers_ignore_signals,
    )

The official documentation explains the reason:

Debugging an environment issue

Our vectorized environments are very fast, but they are not very verbose. When using VectorEnv some errors may be silenced, resulting in process hanging or multiprocessing errors that are hard to interpret. We recommend setting the environment variable HABITAT_ENV_DEBUG to 1 when debugging (export HABITAT_ENV_DEBUG=1) as this will use the slower, but more verbose ThreadedVectorEnv class. Do not forget to reset HABITAT_ENV_DEBUG (unset HABITAT_ENV_DEBUG) when you are done debugging since VectorEnv is much faster than ThreadedVectorEnv.
See also habitat.core.vector_env for the difference between the two environment classes.
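The quoted documentation suggests a less invasive route than editing the source: set HABITAT_ENV_DEBUG. The sketch below only mirrors the check shown above from construct_vector_env.py; `use_threaded_vector_env` is a stand-in name for this example, not a habitat-lab function.

```python
import os


def use_threaded_vector_env(environ=os.environ):
    """Mirror the HABITAT_ENV_DEBUG check from construct_vector_env.py:
    any non-zero integer value selects the threaded (debug) class."""
    return bool(int(environ.get("HABITAT_ENV_DEBUG", 0)))


# Set the flag before the trainer builds its environments,
# for example at the top of a launcher script.
os.environ["HABITAT_ENV_DEBUG"] = "1"
print(use_threaded_vector_env())
```

Exporting the variable in the shell before running run.py should have the same effect, and it is undone with unset HABITAT_ENV_DEBUG instead of a code change.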

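Command 2: hierarchical RL code (TP-SRL):

Problem 1: path not found

The following command must be run from the habitat-lab folder; otherwise the corresponding .yaml files need to be modified: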

python habitat_baselines/run.py \
    --exp-config habitat-lab/habitat_baselines/config/rearrange/ddppo_open_cab.yaml \
    --run-type train \
    TENSORBOARD_DIR ../pick_tb/ \
    CHECKPOINT_FOLDER ../pick_checkpoints/ \
    LOG_FILE ../pick_train.log

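This is because the configs it ships all use relative paths.
For example, to run habitat-lab/habitat_baselines/config/rearrange/ddppo_open_cab.yaml from the habitat-challenge directory, I have to change its BASE_TASK_CONFIG_PATH entry to a path that is valid when running from habitat-challenge. The same applies to the other yaml files.
Even when running from inside the habitat-lab folder, you need to create a symlink to the data directory, because the code looks for data directly under the current directory: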

ln -s ../data data

Problem 2: AssertionError: Object attributes not uniquely matched to shortened handle.

This problem is caused by the version of objects/ycb:

Traceback (most recent call last):
  File "habitat_baselines/run.py", line 81, in <module>
Process ForkServerProcess-26:
Traceback (most recent call last):
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 262, in _worker_env
    observations = env.reset()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/gym_env_episode_count_wrapper.py", line 50, in reset
    return self.env.reset(**kwargs)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/gym_env_obs_dict_wrapper.py", line 32, in reset
    return self.env.reset(**kwargs)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/utils/gym_adapter.py", line 287, in reset
    obs = self._env.reset()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/environments.py", line 47, in reset
    observations = super().reset()
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 402, in reset
    return self._env.reset()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 250, in reset
    self.reconfigure(self._config)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 336, in reconfigure
    self._sim.reconfigure(self._config.SIMULATOR)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/tasks/rearrange/rearrange_sim.py", line 223, in reconfigure
    self._add_objs(ep_info, should_add_objects)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/tasks/rearrange/rearrange_sim.py", line 409, in _add_objs
    ), f"Object attributes not uniquely matched to shortened handle. '{obj_handle}' matched to {matching_templates}. TODO: relative paths as handles should fix some duplicates. For now, try renaming objects to avoid collision."
AssertionError: Object attributes not uniquely matched to shortened handle. '005_tomato_soup_can.object_config.json' matched to {}. TODO: relative paths as handles should fix some duplicates. For now, try renaming objects to avoid collision.

In pick.yaml:

ADDITIONAL_OBJECT_PATHS:
- "data/objects/ycb/configs/"

However, the data/versioned_data folder contains two ycb versions, ycb_1.1 and ycb_1.2. ycb_1.1 has no configs folder, while ycb_1.2 does.
So fixing this error only requires linking the correct ycb version into the objects directory:

cd objects
ln -s ../versioned_data/ycb_1.2 ycb
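To confirm which downloaded ycb version actually ships a configs folder before linking, a small check like the following can be used. It is a sketch that assumes the layout described above (ycb_* directories directly under data/versioned_data); the function name is made up for this example.

```python
from pathlib import Path


def ycb_versions_with_configs(versioned_data_dir):
    """List the ycb_* directory names that contain a configs/ subfolder."""
    root = Path(versioned_data_dir)
    return sorted(
        d.name for d in root.glob("ycb_*") if (d / "configs").is_dir()
    )
```

With the layout from this post, ycb_versions_with_configs("data/versioned_data") would be expected to list only ycb_1.2.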

Problem 3: CUDA out of memory

This one is simply the GPU running out of memory:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 7.77 GiB total capacity; 5.21 GiB already allocated; 191.38 MiB free; 5.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

A parameter change may help: lower NUM_ENVIRONMENTS in habitat_baselines/config/rearrange/ddppo_pick.yaml. Changing it from the original 32 to 16 may be enough to train.
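If the value has to be changed in several config files, the edit can be scripted. This is a rough sketch that works on the YAML text with a regular expression instead of a YAML parser; it assumes NUM_ENVIRONMENTS is a top-level key on its own line, and `set_num_environments` is a hypothetical helper.

```python
import re


def set_num_environments(yaml_text, value):
    """Replace the integer value of a top-level NUM_ENVIRONMENTS key."""
    pattern = re.compile(r"^(NUM_ENVIRONMENTS:\s*)\d+", flags=re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(value), yaml_text
    )
    if count != 1:
        raise ValueError(
            f"expected exactly one NUM_ENVIRONMENTS line, found {count}"
        )
    return new_text
```

Read the file, pass its contents through set_num_environments(text, 16), and write the result back. Halving the value again is the obvious next step if 16 still runs out of memory.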

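M3 simulation pitfalls

M3 has comparatively few problems and mostly works right after installation.

Problem 1: EOFError

The cause is the same as in habitat-challenge; only the place to change in the code differs.
Edit mobile_manipulation/utils/env_utils.py:
comment out the original line and replace it with vec_env_cls = ThreadedVectorEnv, which forces the environment class to ThreadedVectorEnv.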

    # vec_env_cls = ThreadedVectorEnv if debug else VectorEnv
    vec_env_cls = ThreadedVectorEnv
    envs = vec_env_cls(
        make_env_fn=make_env_fn,
        env_fn_args=tuple(zip(configs, env_classes, [wrappers] * num_envs)),
        workers_ignore_signals=workers_ignore_signals,
        auto_reset_done=auto_reset_done,
    )

Problem 2: ycb version

Exception in thread Thread-26:
Traceback (most recent call last):
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 262, in _worker_env
    observations = env.reset()
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/site-packages/gym/core.py", line 337, in reset
    return self.env.reset(**kwargs)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/env.py", line 34, in reset
    observations = super().reset()
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 405, in reset
    return self._env.reset()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 253, in reset
    self.reconfigure(self._config)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 339, in reconfigure
    self._sim.reconfigure(self._config.SIMULATOR)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/sim.py", line 165, in reconfigure
    self._add_rigid_objects()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/sim.py", line 190, in _add_rigid_objects
    obj.transformation = mn_utils.orthogonalize(T)
AttributeError: 'NoneType' object has no attribute 'transformation'

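Note that M3 uses ycb 1.1, not the 1.2 used by habitat-challenge, so be sure to link version 1.1 when running M3. Otherwise the object data cannot be found and the error above appears.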

cd objects
rm ycb
ln -s ../versioned_data/ycb_1.1 ycb
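Because the two projects need different ycb versions, the link has to be switched whenever you move between them. The helper below is a sketch of the same rm and ln -s steps in Python, assuming the objects and versioned_data directories are siblings as in the commands above; `switch_ycb` is a made-up name.

```python
import os


def switch_ycb(objects_dir, version):
    """Point objects_dir/ycb at ../versioned_data/ycb_<version>.

    An existing symlink is replaced; a real file or directory named
    ycb is left alone and reported as an error."""
    link = os.path.join(objects_dir, "ycb")
    target = os.path.join("..", "versioned_data", f"ycb_{version}")
    if os.path.islink(link):
        os.remove(link)
    elif os.path.exists(link):
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(target, link)
    return os.readlink(link)
```

For example, switch_ycb("data/objects", "1.1") before running M3 and switch_ycb("data/objects", "1.2") before running the habitat-challenge code.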

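Problem 3: downloading the dataset

Download the benchmark data. The datasets_download.py file lists the link and version for each dataset.

python -m habitat_sim.utils.datasets_download --uids hab2_bench_assets --data-path <path to download folder>

Running play.py afterwards suddenly failed with this error:
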
(hab-mm) lu@lu:~/Desktop/embodied_ai/hab-mobile-manipulation$ python habitat_extensions/tasks/rearrange/play.py
pybullet build time: Sep 22 2020 00:55:20
Loaded /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/play.yaml
Merging /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/base.yaml into /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/play.yaml
Loaded /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/base.yaml
Merging /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/__base__.py into /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/base.yaml
Loaded /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/__base__.py
2023-09-20 17:46:41,099 Initializing dataset RearrangeDataset-v0
2023-09-20 17:46:41,917 initializing sim RearrangeSim-v0
Traceback (most recent call last):
  File "habitat_extensions/tasks/rearrange/play.py", line 271, in <module>
    main()
  File "habitat_extensions/tasks/rearrange/play.py", line 221, in main
    env: RearrangeRLEnv = env_cls(config)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/env.py", line 31, in __init__
    super().__init__(self._core_env_config, dataset=dataset)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 374, in __init__
    self._env = Env(config, dataset)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 105, in __init__
    id_sim=self._config.SIMULATOR.TYPE, config=self._config.SIMULATOR
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/sims/registration.py", line 19, in make_sim
    return _sim(**kwargs)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/sim.py", line 63, in __init__
    super().__init__(config)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py", line 282, in __init__
    for path in self.habitat_config.ADDITIONAL_OBJECT_PATHS:
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/site-packages/yacs/config.py", line 141, in __getattr__
    raise AttributeError(name)
AttributeError: ADDITIONAL_OBJECT_PATHS

This is a version problem: M3 only works with the habitat-lab version bundled with it, not the one from habitat-challenge.
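With two checkouts on one machine it is easy to import the wrong habitat-lab. One way to confirm which copy Python picked up is to compare the module's file location with the repository root. This is a sketch; `imported_from` is a hypothetical helper, and the repository path in the usage note is the one from the tracebacks in this post.

```python
from pathlib import Path


def imported_from(module_file, repo_root):
    """True when module_file lives somewhere under repo_root."""
    module_path = Path(module_file).resolve()
    root = Path(repo_root).resolve()
    return root == module_path or root in module_path.parents
```

In the hab-mm environment, after import habitat, the call imported_from(habitat.__file__, "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation") should return True if the bundled copy is the one in use.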

If you run into other problems, feel free to get in touch and compare notes!

Source: https://blog.csdn.net/qq_43650421/article/details/135636444