Ascend CANN image build fails with "ImportError: libascend_hal.so"

vLLM failed to run because the CANN version did not match, so I had to reinstall CANN from scratch.

When the build reached the deepspeed install step, it failed with "ImportError: libascend_hal.so: cannot open shared object file: No such file or directory".

#30 [26/36] RUN source  ~/.bashrc && pip install  deepspeed==0.16.7
#30 0.150 /root/custom.bashrc
#30 1.862 Now using node v12.18.3 (npm v6.14.6)
#30 6.706 Looking in indexes: /
#30 6.811 Collecting deepspeed==0.16.7
#30 6.841   Downloading .16.7.tar.gz (1.5 MB)
#30 6.933      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 17.4 MB/s eta 0:00:00
#30 7.824   Preparing metadata (setup.py): started
#30 11.00   Preparing metadata (setup.py): finished with status 'error'
#30 11.01   error: subprocess-exited-with-error
#30 11.01   
#30 11.01   × python setup.py egg_info did not run successfully.
#30 11.01   │ exit code: 1
#30 11.01   ╰─> [42 lines of output]
#30 11.01       Traceback (most recent call last):
#30 11.01         File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/__init__.py", line 39, in <module>
#30 11.01           import torch_npu.npu
#30 11.01         File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/npu/__init__.py", line 122, in <module>
#30 11.01           from torch_npu.utils import _should_print_warning
#30 11.01         File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/utils/__init__.py", line 1, in <module>
#30 11.01           from torch_npu import _C
#30 11.01       ImportError: libascend_hal.so: cannot open shared object file: No such file or directory
#30 11.01       
#30 11.01       During handling of the above exception, another exception occurred:
#30 11.01       
#30 11.01       Traceback (most recent call last):
#30 11.01         File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch/__init__.py", line 2637, in _import_device_backends
#30 11.01           entrypoint = backend_extension.load()
#30 11.01         File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
#30 11.01           module = import_module(match.group('module'))
#30 11.01         File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/importlib/__init__.py", line 126, in import_module
#30 11.01           return _bootstrap._gcd_import(name[level:], package, level)
#30 11.01         File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
#30 11.01         File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
#30 11.01         File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
#30 11.01         File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
#30 11.01         File "<frozen importlib._bootstrap_external>", line 883, in exec_module
#30 11.01         File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
#30 11.01         File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/__init__.py", line 41, in <module>
#30 11.01           from torch_npu.utils._error_code import ErrCode, pta_error
#30 11.01         File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/utils/__init__.py", line 1, in <module>
#30 11.01           from torch_npu import _C
#30 11.01       ImportError: libascend_hal.so: cannot open shared object file: No such file or directory
#30 11.01       
#30 11.01       The above exception was the direct cause of the following exception:
#30 11.01       
#30 11.01       Traceback (most recent call last):
#30 11.01         File "<string>", line 2, in <module>
#30 11.01         File "<pip-setuptools-caller>", line 34, in <module>
#30 11.01         File "/tmp/pip-install-fenwbomu/deepspeed_faf6a1693b3242eda4916c80bde05ad1/setup.py", line 34, in <module>
#30 11.01           import torch
#30 11.01         File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch/__init__.py", line 2665, in <module>
#30 11.01           _import_device_backends()
#30 11.01         File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch/__init__.py", line 2641, in _import_device_backends
#30 11.01           raise RuntimeError(
#30 11.01       RuntimeError: Failed to load the backend extension: torch_npu. You can disable extension auto-loading with TORCH_DEVICE_BACKEND_AUTOLOAD=0.
#30 11.01       [end of output]
#30 11.01   
#30 11.01   note: This error originates from a subprocess, and is likely not a problem with pip.
#30 11.07 error: metadata-generation-failed
#30 11.07 
#30 11.07 × Encountered error while generating package metadata.
#30 11.07 ╰─> See above for output.
#30 11.07 
#30 11.07 note: This is an issue with the package mentioned above, not pip.
#30 11.07 hint: See above for details.
#30 ERROR: process "/bin/sh -c source  ~/.bashrc && pip install  deepspeed==0.16.7" did not complete successfully: exit code: 1

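This .so lives in /usr/local/Ascend/driver/lib64/driver. I had already pointed to that directory via ENV, but it made no difference. Only when RUN ls /usr/local/Ascend failed as well did I realize that I was building the image on a CPU-only machine, which does not have the driver at all. The driver is mounted into the container only when the image is run.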
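A quick way to confirm this diagnosis is to check for the library directly before blaming the search path. The following is a minimal sketch: the helper name has_ascend_hal and the variable ASCEND_DRIVER_DIR_DEFAULT are made up for illustration, and the default directory is simply the path quoted above.

```shell
#!/bin/sh
# Default location taken from the text above; pass another directory as $1 to override.
ASCEND_DRIVER_DIR_DEFAULT=/usr/local/Ascend/driver/lib64/driver

# has_ascend_hal DIR (hypothetical helper): succeeds only if libascend_hal.so exists in DIR.
has_ascend_hal() {
    dir="${1:-$ASCEND_DRIVER_DIR_DEFAULT}"
    [ -e "$dir/libascend_hal.so" ]
}

if has_ascend_hal; then
    echo "libascend_hal.so found: the driver library is present"
else
    echo "libascend_hal.so missing: probably a build host without the Ascend driver"
fi
```

If the file is missing during the build, no ENV setting for the library path can help, because there is nothing to find.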

So, following the hint in the error message, set TORCH_DEVICE_BACKEND_AUTOLOAD to 0 before the install steps, then set it back to 1 afterwards:

ENV TORCH_DEVICE_BACKEND_AUTOLOAD=0

# ... package installation steps, etc.

ENV TORCH_DEVICE_BACKEND_AUTOLOAD=1

This resolves the problem.
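An alternative to toggling the variable with two ENV lines is to scope it to the single command that needs it, using a shell variable prefix. The sketch below only demonstrates the scoping; the sh -c calls are stand-ins for the real install command, not part of the original build.

```shell
#!/bin/sh
# Start from a clean state so the demonstration is deterministic.
unset TORCH_DEVICE_BACKEND_AUTOLOAD

# The prefix assignment is visible only to the command it precedes.
inside=$(TORCH_DEVICE_BACKEND_AUTOLOAD=0 sh -c 'echo "${TORCH_DEVICE_BACKEND_AUTOLOAD:-unset}"')

# A later command does not see it.
after=$(sh -c 'echo "${TORCH_DEVICE_BACKEND_AUTOLOAD:-unset}"')

echo "during install step: $inside"
echo "after install step:  $after"

# Applied to the failing step from the log, this would look like (untested assumption):
#   RUN source ~/.bashrc && TORCH_DEVICE_BACKEND_AUTOLOAD=0 pip install deepspeed==0.16.7
# The prefix has to sit directly in front of pip; placed before "source" it
# would apply only to the source command.
```

This prints 0 for the first line and unset for the second, so nothing has to be restored afterwards and the setting does not end up in the final image.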
