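Habitat-Challenge is one of the embodied AI challenges launched by Meta in 2022; its main focus is the object rearrangement task. The details are covered in the following two papers:
1. Habitat 2.0: Training Home Assistants to Rearrange their Habitat. This paper defines the task and introduces the baseline methods MonolithicRL and TP-SRL. MonolithicRL is an end-to-end RL approach, while TP-SRL is hierarchical: a high-level task planner sequences low-level skills.
Corresponding GitHub repository
2. Multi-skill mobile manipulation for object rearrangement. At the time of writing, this paper reports the highest success rate; it is referred to as M3 below.
Corresponding GitHub repository
For implementation details, see the papers. The rest of this post only covers the pitfalls I ran into while reproducing the code, which may save later readers some time.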
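If you use the command from the official site directly, conda install habitat-sim withbullet -c conda-forge -c aihabitat, the setup is quite likely to fail because of network problems.
There are two alternative ways to install:
Option 1: download the matching package directly from the Habitat-sim Conda page.
Option 2: download the Habitat-sim source and install it with the following commands: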
cd habitat-sim
pip install -r requirements.txt
python setup.py install --bullet --headless
cd ..
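When choosing a Habitat-sim build, first make sure it matches the Habitat version. You generally want the withbullet build, and the headless option depends on whether you need a display: without a monitor, install the headless build. It is best to follow the README on the GitHub page; for example, for withbullet plus headless, download the conda package built with both options.
One thing to note in particular: habitat-lab is installed from a source checkout rather than as a standalone library, so each conda environment effectively corresponds to one habitat-lab checkout. Install it directly from the downloaded source: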
git clone --branch stable https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
pip install -e habitat-lab # install habitat_lab
or
python -m pip install -e .
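The two environments end up with different versions:
Here hab-mm is my conda environment for M3, where habitat and habitat-sim are both 0.2.1;
habitat is the environment for the official habitat-challenge code, where habitat and habitat-sim are both 0.2.2.
The Habitat simulator is strict about its environment, so mismatched versions can cause unexpected errors.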
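A quick way to confirm the match inside each conda environment is to print both versions side by side. This is only a rough sketch: it assumes both packages expose a __version__ attribute, and it falls back to None when the attribute or the package is missing. The helper names are my own.

```python
import importlib


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # A broken install can raise more than ImportError (e.g. OSError from a shared library).
        return None
    # Assumption: the package exposes __version__; None is returned otherwise.
    return getattr(module, "__version__", None)


def versions_agree(lab_version, sim_version):
    """True only when both versions are known and share the same major.minor.patch."""
    if lab_version is None or sim_version is None:
        return False
    return lab_version.split(".")[:3] == sim_version.split(".")[:3]


lab = installed_version("habitat")
sim = installed_version("habitat_sim")
print(f"habitat={lab} habitat_sim={sim} match={versions_agree(lab, sim)}")
```

Running it once in hab-mm and once in habitat makes it obvious which environment is currently active before starting a long training job.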
Minor problems that may come up while setting up the environment:
OSError: /home/lu/.conda/envs/habitat/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11
Add the following line to ~/.bashrc:
export LD_LIBRARY_PATH=/home/lu/.conda/envs/habitat/lib/python3.7/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH
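The path in that line is specific to my machine. The sketch below derives the equivalent line for whichever environment is active; it assumes the cuBLAS libraries were installed by pip under site-packages/nvidia/cublas/lib, as in the error message, and the function names are made up for illustration.

```python
import os
import sysconfig


def cublas_lib_dir(site_packages):
    """Path where pip-installed cuBLAS libraries would live under site-packages."""
    return os.path.join(site_packages, "nvidia", "cublas", "lib")


def export_line(lib_dir):
    """Shell line that prepends lib_dir to LD_LIBRARY_PATH."""
    return f"export LD_LIBRARY_PATH={lib_dir}/:$LD_LIBRARY_PATH"


site_packages = sysconfig.get_paths()["purelib"]
lib_dir = cublas_lib_dir(site_packages)
if os.path.isdir(lib_dir):
    # Copy this line into ~/.bashrc.
    print(export_line(lib_dir))
else:
    print(f"no bundled cuBLAS found at {lib_dir}")
```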
Run command:
#!/bin/bash
export MAGNUM_LOG=quiet
export HABITAT_SIM_LOG=quiet
set -x
python habitat-lab/habitat_baselines/run.py \
--exp-config configs/methods/ddppo_monolithic.yaml \
--run-type train \
BASE_TASK_CONFIG_PATH configs/tasks/rearrange.local.rgbd.yaml \
TASK_CONFIG.DATASET.SPLIT 'train' \
TASK_CONFIG.TASK.TASK_SPEC_BASE_PATH configs/pddl/ \
TENSORBOARD_DIR tb \
CHECKPOINT_FOLDER checkpoints \
LOG_FILE train.log
Check whether the paths are correct:
The dataset path matters because pointnav_dataset.py loads the file like this:
datasetfile_path = config.DATA_PATH.format(split=config.SPLIT)
with gzip.open(datasetfile_path, "rt") as f:
    self.from_json(f.read(), scenes_dir=config.SCENES_DIR)
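Before launching training, the same lookup can be reproduced by hand to see whether the file exists and opens. This is a standalone sketch that only mimics the format-then-gzip.open steps above; check_dataset_file is a hypothetical helper, not part of habitat-lab.

```python
import gzip
import json
import os


def check_dataset_file(data_path_template, split):
    """Resolve the {split} placeholder and confirm the file is readable gzip-compressed JSON.

    Returns (resolved_path, problem); problem is None when everything looks fine.
    """
    path = data_path_template.format(split=split)
    if not os.path.isfile(path):
        return path, f"missing file (cwd is {os.getcwd()})"
    try:
        with gzip.open(path, "rt") as f:
            json.loads(f.read())
    except (OSError, EOFError, ValueError) as exc:
        return path, f"unreadable: {exc}"
    return path, None
```

Call it with the DATA_PATH value from your task yaml and the split you train on, from the same directory you launch run.py in, since the path is usually relative.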
Traceback (most recent call last):
File "habitat-lab/habitat_baselines/run.py", line 81, in <module>
main()
File "habitat-lab/habitat_baselines/run.py", line 40, in main
run_exp(**vars(args))
File "habitat-lab/habitat_baselines/run.py", line 77, in run_exp
execute_exp(config, run_type)
File "habitat-lab/habitat_baselines/run.py", line 60, in execute_exp
trainer.train()
File "/home/lu/.conda/envs/habitat/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 715, in train
self._init_train()
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 254, in _init_train
self._init_envs()
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 204, in _init_envs
workers_ignore_signals=is_slurm_batch_job(),
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/common/construct_vector_env.py", line 97, in construct_envs
workers_ignore_signals=workers_ignore_signals,
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 200, in __init__
read_fn() for read_fn in self._connection_read_fns
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 200, in <listcomp>
read_fn() for read_fn in self._connection_read_fns
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 103, in __call__
res = self.read_fn()
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 68, in recv
buf = self.recv_bytes()
File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Exception ignored in: <function VectorEnv.__del__ at 0x7fafedb180e0>
Traceback (most recent call last):
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 584, in __del__
self.close()
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 452, in close
read_fn()
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 103, in __call__
res = self.read_fn()
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 68, in recv
buf = self.recv_bytes()
File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError:
I found this on GitHub:
The cause may be that training fails to start on the GPU. You can modify
habitat-challenge/habitat-lab/habitat_baselines/common/construct_vector_env.py.
Around line 74 of that file there is a check:
if int(os.environ.get("HABITAT_ENV_DEBUG", 0)):
    logger.warn(
        "Using the debug Vector environment interface. Expect slower performance."
    )
    vector_env_cls = ThreadedVectorEnv
else:
    vector_env_cls = VectorEnv
envs = vector_env_cls(
    make_env_fn=make_gym_from_config,
    env_fn_args=tuple((c,) for c in configs),
    workers_ignore_signals=workers_ignore_signals,
)
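Not every GPU setup can run VectorEnv, so simply force vector_env_cls to ThreadedVectorEnv. Setting HABITAT_ENV_DEBUG=1 has the same effect without editing the file, since the check above then selects ThreadedVectorEnv.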
envs = ThreadedVectorEnv(
    make_env_fn=make_gym_from_config,
    env_fn_args=tuple((c,) for c in configs),
    workers_ignore_signals=workers_ignore_signals,
)
The official documentation explains the reason:
Our vectorized environments are very fast, but they are not very verbose. When using VectorEnv some errors may be silenced, resulting in process hanging or multiprocessing errors that are hard to interpret. We recommend setting the environment variable HABITAT_ENV_DEBUG to 1 when debugging (export HABITAT_ENV_DEBUG=1) as this will use the slower, but more verbose ThreadedVectorEnv class. Do not forget to reset HABITAT_ENV_DEBUG (unset HABITAT_ENV_DEBUG) when you are done debugging since VectorEnv is much faster than ThreadedVectorEnv.
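To see which class a given shell would end up with, the HABITAT_ENV_DEBUG check can be mirrored in a few lines. This is only an illustration of the logic quoted from construct_vector_env.py; vector_env_class_name is a made-up helper that returns the class name as a string instead of importing habitat.

```python
import os


def vector_env_class_name(environ):
    """Mirror the HABITAT_ENV_DEBUG check and return the name of the class it would pick."""
    if int(environ.get("HABITAT_ENV_DEBUG", 0)):
        return "ThreadedVectorEnv"
    return "VectorEnv"


# What the current shell would select.
print(vector_env_class_name(os.environ))

# An environment with debug mode switched on, e.g. to pass to a child process.
debug_environ = dict(os.environ, HABITAT_ENV_DEBUG="1")
print(vector_env_class_name(debug_environ))
```

Using the environment variable keeps the source untouched, which makes it easier to switch back to the faster VectorEnv once the real error has been found.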
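See also the habitat.core.vector_env module.
Run command. This command has to be run from inside the habitat-lab folder; otherwise the corresponding .yaml files need to be modified: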
python habitat_baselines/run.py \
--exp-config habitat-lab/habitat_baselines/config/rearrange/ddppo_open_cab.yaml \
--run-type train \
TENSORBOARD_DIR ../pick_tb/ \
CHECKPOINT_FOLDER ../pick_checkpoints/ \
LOG_FILE ../pick_train.log
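This is because the configs it ships all use relative paths.
For example, to run habitat-lab/habitat_baselines/config/rearrange/ddppo_open_cab.yaml as above from the habitat-challenge folder, I have to change its BASE_TASK_CONFIG_PATH entry to a path that is valid from habitat-challenge. The same applies to the other yaml files.
Running directly inside the habitat-lab folder also needs care: create a symlink pointing to the data, because the code looks for data in the current directory: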
ln -s ../data data
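The same link can be created from Python in a way that does nothing if a data entry is already there. A minimal sketch, assuming the data folder sits one level above habitat-lab as in the ln command; ensure_data_link is a hypothetical helper.

```python
import os


def ensure_data_link(workdir, data_target):
    """Create workdir/data as a symlink to data_target unless something already exists there.

    Returns True when a new link was created, False when the path already existed.
    """
    link_path = os.path.join(workdir, "data")
    if os.path.lexists(link_path):
        return False
    # A relative target is resolved relative to the directory containing the link.
    os.symlink(data_target, link_path)
    return True


# Example (run from the habitat-lab folder):
# ensure_data_link(".", "../data")
```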
This problem is caused by the version of objects/ycb:
Traceback (most recent call last):
File "habitat_baselines/run.py", line 81, in <module>
Process ForkServerProcess-26:
Traceback (most recent call last):
File "/home/lu/.conda/envs/hab-mm/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/lu/.conda/envs/hab-mm/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 262, in _worker_env
observations = env.reset()
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/gym_env_episode_count_wrapper.py", line 50, in reset
return self.env.reset(**kwargs)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/gym_env_obs_dict_wrapper.py", line 32, in reset
return self.env.reset(**kwargs)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/utils/gym_adapter.py", line 287, in reset
obs = self._env.reset()
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/environments.py", line 47, in reset
observations = super().reset()
File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 402, in reset
return self._env.reset()
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 250, in reset
self.reconfigure(self._config)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 336, in reconfigure
self._sim.reconfigure(self._config.SIMULATOR)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/tasks/rearrange/rearrange_sim.py", line 223, in reconfigure
self._add_objs(ep_info, should_add_objects)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/tasks/rearrange/rearrange_sim.py", line 409, in _add_objs
), f"Object attributes not uniquely matched to shortened handle. '{obj_handle}' matched to {matching_templates}. TODO: relative paths as handles should fix some duplicates. For now, try renaming objects to avoid collision."
AssertionError: Object attributes not uniquely matched to shortened handle. '005_tomato_soup_can.object_config.json' matched to {}. TODO: relative paths as handles should fix some duplicates. For now, try renaming objects to avoid collision.
In pick.yaml:
ADDITIONAL_OBJECT_PATHS:
- "data/objects/ycb/configs/"
There are two ycb versions, ycb_1.1 and ycb_1.2; ycb_1.1 has no configs folder, while ycb_1.2 does. Both versions sit under data/versioned_data.
So the fix is to link the correct ycb into the objects directory:
cd objects
ln -s ../versioned_data/ycb_1.2 ycb
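To check which downloaded ycb version actually has the configs folder before linking, something like the following can help. It is a sketch under the assumption that the folders are named ycb_<version> under data/versioned_data, as on my machine; the function name is my own.

```python
import os


def ycb_versions_with_configs(versioned_data_dir):
    """List ycb_* folders under versioned_data_dir that contain a configs/ subfolder."""
    found = []
    for name in sorted(os.listdir(versioned_data_dir)):
        configs_dir = os.path.join(versioned_data_dir, name, "configs")
        if name.startswith("ycb_") and os.path.isdir(configs_dir):
            found.append(name)
    return found


# Example:
# print(ycb_versions_with_configs("data/versioned_data"))
```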
This one is simply the GPU running out of memory:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 7.77 GiB total capacity; 5.21 GiB already allocated; 191.38 MiB free; 5.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Try lowering this parameter:
In habitat_baselines/config/rearrange/ddppo_pick.yaml, change NUM_ENVIRONMENTS; it was originally 32, and reducing it to 16 may let training run.
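Instead of editing the yaml each time, the value can probably be passed as a trailing KEY VALUE override, the same way TENSORBOARD_DIR and CHECKPOINT_FOLDER are passed in the commands earlier in this post. That NUM_ENVIRONMENTS accepts this kind of override is my assumption, so treat the sketch below as untested; both helper names are made up.

```python
def train_command(exp_config, num_environments):
    """Build the run.py argument list with NUM_ENVIRONMENTS overridden on the command line."""
    return [
        "python", "habitat_baselines/run.py",
        "--exp-config", exp_config,
        "--run-type", "train",
        # Assumption: NUM_ENVIRONMENTS can be overridden like the other top-level keys.
        "NUM_ENVIRONMENTS", str(num_environments),
    ]


def halving_schedule(start, minimum=1):
    """Environment counts to try in order: start, start // 2, ... down to minimum."""
    minimum = max(minimum, 1)  # guard against a non-terminating loop
    counts = []
    n = start
    while n >= minimum:
        counts.append(n)
        n //= 2
    return counts


for n in halving_schedule(32, minimum=4):
    print(" ".join(train_command("habitat_baselines/config/rearrange/ddppo_pick.yaml", n)))
```

The idea is to try the printed commands from top to bottom and keep the first one that no longer hits the out-of-memory error.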
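M3 has comparatively few problems; it basically works once installed.
This problem has the same cause as the one in Habitat-challenge; only the place to change in the code differs.
The file to modify is mobile_manipulation/utils/env_utils.py:
Comment out the original line and replace it with vec_env_cls = ThreadedVectorEnv, forcing the environment to ThreadedVectorEnv.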
# vec_env_cls = ThreadedVectorEnv if debug else VectorEnv
vec_env_cls = ThreadedVectorEnv
envs = vec_env_cls(
    make_env_fn=make_env_fn,
    env_fn_args=tuple(zip(configs, env_classes, [wrappers] * num_envs)),
    workers_ignore_signals=workers_ignore_signals,
    auto_reset_done=auto_reset_done,
)
Exception in thread Thread-26:
Traceback (most recent call last):
File "/home/lu/.conda/envs/hab-mm/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/home/lu/.conda/envs/hab-mm/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 262, in _worker_env
observations = env.reset()
File "/home/lu/.conda/envs/hab-mm/lib/python3.7/site-packages/gym/core.py", line 337, in reset
return self.env.reset(**kwargs)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/env.py", line 34, in reset
observations = super().reset()
File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 405, in reset
return self._env.reset()
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 253, in reset
self.reconfigure(self._config)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 339, in reconfigure
self._sim.reconfigure(self._config.SIMULATOR)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/sim.py", line 165, in reconfigure
self._add_rigid_objects()
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/sim.py", line 190, in _add_rigid_objects
obj.transformation = mn_utils.orthogonalize(T)
AttributeError: 'NoneType' object has no attribute 'transformation'
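Note in particular that M3 uses ycb 1.1 rather than the 1.2 used by habitat-challenge, so be sure to use version 1.1 when running M3. Otherwise you get errors about missing data.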
cd objects
rm ycb
ln -s ../versioned_data/ycb_1.1 ycb
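Since I keep switching between the two ycb versions, a small helper that repoints the link and refuses to delete a real folder is safer than a bare rm. This is just a sketch with a hypothetical function name; the paths in the example follow the layout used above.

```python
import os


def repoint_symlink(link_path, new_target):
    """Replace link_path with a symlink to new_target; refuse to delete a real file or folder."""
    if os.path.lexists(link_path):
        if not os.path.islink(link_path):
            raise RuntimeError(f"{link_path} exists and is not a symlink")
        os.remove(link_path)
    os.symlink(new_target, link_path)
    return os.readlink(link_path)


# Example (run from data/objects):
# repoint_symlink("ycb", "../versioned_data/ycb_1.1")   # before running M3
# repoint_symlink("ycb", "../versioned_data/ycb_1.2")   # before running habitat-challenge
```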
Download the benchmark data.
See datasets_download.py, which lists the link and version of each dataset.
python -m habitat_sim.utils.datasets_download --uids hab2_bench_assets --data-path <path to download folder>
An error suddenly appeared:
(hab-mm) lu@lu:~/Desktop/embodied_ai/hab-mobile-manipulation$ python habitat_extensions/tasks/rearrange/play.py
pybullet build time: Sep 22 2020 00:55:20
Loaded /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/play.yaml
Merging /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/base.yaml into /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/play.yaml
Loaded /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/base.yaml
Merging /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/__base__.py into /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/base.yaml
Loaded /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/__base__.py
2023-09-20 17:46:41,099 Initializing dataset RearrangeDataset-v0
2023-09-20 17:46:41,917 initializing sim RearrangeSim-v0
Traceback (most recent call last):
File "habitat_extensions/tasks/rearrange/play.py", line 271, in <module>
main()
File "habitat_extensions/tasks/rearrange/play.py", line 221, in main
env: RearrangeRLEnv = env_cls(config)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/env.py", line 31, in __init__
super().__init__(self._core_env_config, dataset=dataset)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 374, in __init__
self._env = Env(config, dataset)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 105, in __init__
id_sim=self._config.SIMULATOR.TYPE, config=self._config.SIMULATOR
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/sims/registration.py", line 19, in make_sim
return _sim(**kwargs)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/sim.py", line 63, in __init__
super().__init__(config)
File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py", line 282, in __init__
for path in self.habitat_config.ADDITIONAL_OBJECT_PATHS:
File "/home/lu/.conda/envs/hab-mm/lib/python3.7/site-packages/yacs/config.py", line 141, in __getattr__
raise AttributeError(name)
AttributeError: ADDITIONAL_OBJECT_PATHS
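This is a version problem: M3 only works with the habitat-lab version it bundles, not the one from habitat-challenge.
If you run into other problems, feel free to get in touch so we can learn together!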
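One way to confirm which habitat-lab copy Python is actually importing is to compare the module path with the M3 repository root. A rough sketch; is_under is a made-up helper, and the example path is the one from my traceback.

```python
import os


def is_under(path, root):
    """True when path lies inside the directory tree rooted at root."""
    path = os.path.realpath(path)
    root = os.path.realpath(root)
    return os.path.commonpath([path, root]) == root


# Example (inside the hab-mm environment):
# import habitat
# repo = "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation"
# print(habitat.__file__, is_under(habitat.__file__, repo))
```

If the check prints False, the environment is picking up a habitat-lab installed from somewhere else, such as the habitat-challenge checkout.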