When calling the Qwen/Qwen-1_8B-Chat model through LangChain, the following error appeared during a conversation:
ERROR: object of type 'NoneType' has no len()
Traceback (most recent call last):
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain/chains/base.py", line 385, in acall
    raise e
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain/chains/base.py", line 379, in acall
    await self._acall(inputs, run_manager=run_manager)
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain/chains/llm.py", line 275, in _acall
    response = await self.agenerate([inputs], run_manager=run_manager)
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain/chains/llm.py", line 142, in agenerate
    return await self.llm.agenerate_prompt(
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 506, in agenerate_prompt
    return await self.agenerate(
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 466, in agenerate
    raise exceptions[0]
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 569, in _agenerate_with_cache
    return await self._agenerate(
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_community/chat_models/openai.py", line 519, in _agenerate
    return await agenerate_from_stream(stream_iter)
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 85, in agenerate_from_stream
    async for chunk in stream:
  File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_community/chat_models/openai.py", line 490, in _astream
    if len(chunk["choices"]) == 0:
TypeError: object of type 'NoneType' has no len()
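For context, here is a minimal sketch of the kind of call that goes down this failing code path. The endpoint, port, API key, and model name are illustrative assumptions; in a Langchain-Chatchat deployment the model sits behind a local OpenAI-compatible API:

from langchain_community.chat_models import ChatOpenAI

# Assumed local OpenAI-compatible endpoint; adjust to wherever your
# model server actually listens.
llm = ChatOpenAI(
    model_name="Qwen-1_8B-Chat",
    openai_api_base="http://127.0.0.1:20000/v1",
    openai_api_key="EMPTY",  # placeholder; a local server ignores it
    streaming=True,          # the failing frames above are the streaming path (_astream)
)
print(llm.invoke("你好").content)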
Puzzled: every other LLM model runs fine, only Qwen fails.
I searched around a lot; opinions varied and nothing solved it.
So I read the traceback carefully. The last frame says the problem is at File "/root/anaconda3/envs/chatchat/lib/python3.10/site-packages/langchain_community/chat_models/openai.py", line 490, so let's open the file around line 490 and look at the source:
if not isinstance(chunk, dict):
    chunk = chunk.dict()
if len(chunk["choices"]) == 0:
    continue
choice = chunk["choices"][0]
So the error is most likely this chunk arriving without choices: chunk["choices"] is None, and calling len(None) raises the TypeError.
Let's print the chunk to see what it actually contains, by modifying the code in this file to:
if not isinstance(chunk, dict):
    chunk = chunk.dict()
print(f'chunk:{chunk}')
if len(chunk["choices"]) == 0:
    continue
choice = chunk["choices"][0]
Run again, and the chunk output is:
chunk:{'id': None, 'choices': None, 'created': None, 'model': None, 'object': None, 'system_fingerprint': None, 'text': '**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**\n\n(FlashAttention only supports Ampere GPUs or newer.)', 'error_code': 50001}
At last, the real error message: NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE: FlashAttention only supports Ampere GPUs or newer.
So it looks like the real problem lies in flash-attention.
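Incidentally, rather than leaving a print inside library code, the same spot could be patched to surface the server's error payload directly. A sketch, with the text and error_code field names taken from the chunk printed above (judging by the chatchat env name, the backend is likely a fastchat-style worker, whose error payloads look exactly like this):

if not isinstance(chunk, dict):
    chunk = chunk.dict()
# An error payload arrives with choices=None plus text/error_code fields;
# raise a readable error instead of crashing on len(None).
if chunk.get("choices") is None:
    raise ValueError(
        f"model worker returned an error: {chunk.get('text')} "
        f"(error_code={chunk.get('error_code')})"
    )
if len(chunk["choices"]) == 0:
    continue
choice = chunk["choices"][0]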
Checking the Tongyi Qianwen (Qwen) installation notes on Hugging Face:
Dependency
To run Qwen-1.8B-Chat, make sure the requirements above are met, then run the following pip command to install the dependencies:
pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed
In addition, installing the flash-attention library (flash attention 2 is now supported) is recommended for higher efficiency and lower GPU memory usage:
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
# The installs below are optional and may be quite slow.
# pip install csrc/layer_norm
# pip install csrc/rotary
Following the docs, flash-attention had been installed correctly, so the problem shouldn't be the installation itself.
An issue on QwenLM suggests uninstalling flash-attn: https://github.com/QwenLM/Qwen/issues/438
Then I found an explanation of this problem in the Hugging Face community (https://huggingface.co/Qwen/Qwen-7B-Chat/discussions/37):
flash attention is an optional component that accelerates model training and inference, and it only applies to NVIDIA GPUs with the Turing, Ampere, Ada, or Hopper architectures (e.g. H100, A100, RTX 3090, T4, RTX 2080). You can run inference with the model normally without installing flash attention.
Checking my own GPU against this list made everything clear: my GPU simply isn't supported by flash attention!
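To check this on your own machine, the compute capability reported by PyTorch tells you the architecture; a quick sketch (Ampere and newer report a major version >= 8, which is what the "Ampere GPUs or newer" error is really testing):

import torch

# CUDA compute capability: Ampere A100 -> (8, 0), RTX 3090 -> (8, 6);
# Turing cards such as the T4 / RTX 2080 report (7, 5).
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), (major, minor))
if major < 8:
    print("This GPU predates Ampere; flash-attention 2 will refuse to run.")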
So the solution is:
pip uninstall flash-attn
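Alternatively, if flash-attn needs to stay installed for other models, Qwen's trust_remote_code model files expose a use_flash_attn option that can be disabled per model. A sketch; treat the keyword as an assumption and verify it against your local modeling_qwen.py:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B-Chat",
    trust_remote_code=True,
    use_flash_attn=False,  # assumed switch: skip FlashAttention on unsupported GPUs
).eval()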