Ascend CANN image build fails with "ImportError: libascend_hal.so"
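Because the CANN version did not match, vllm failed to run, so CANN had to be reinstalled from scratch.
When the build reached the deepspeed install step, it failed with "ImportError: libascend_hal.so: cannot open shared object file: No such file or directory".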
#30 [26/36] RUN source ~/.bashrc && pip install deepspeed==0.16.7
#30 0.150 /root/custom.bashrc
#30 1.862 Now using node v12.18.3 (npm v6.14.6)
#30 6.706 Looking in indexes: /
#30 6.811 Collecting deepspeed==0.16.7
#30 6.841 Downloading .16.7.tar.gz (1.5 MB)
#30 6.933 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 17.4 MB/s eta 0:00:00
#30 7.824 Preparing metadata (setup.py): started
#30 11.00 Preparing metadata (setup.py): finished with status 'error'
#30 11.01 error: subprocess-exited-with-error
#30 11.01
#30 11.01 × python setup.py egg_info did not run successfully.
#30 11.01 │ exit code: 1
#30 11.01 ╰─> [42 lines of output]
#30 11.01 Traceback (most recent call last):
#30 11.01 File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/__init__.py", line 39, in <module>
#30 11.01 import torch_npu.npu
#30 11.01 File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/npu/__init__.py", line 122, in <module>
#30 11.01 from torch_npu.utils import _should_print_warning
#30 11.01 File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/utils/__init__.py", line 1, in <module>
#30 11.01 from torch_npu import _C
#30 11.01 ImportError: libascend_hal.so: cannot open shared object file: No such file or directory
#30 11.01
#30 11.01 During handling of the above exception, another exception occurred:
#30 11.01
#30 11.01 Traceback (most recent call last):
#30 11.01 File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch/__init__.py", line 2637, in _import_device_backends
#30 11.01 entrypoint = backend_extension.load()
#30 11.01 File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
#30 11.01 module = import_module(match.group('module'))
#30 11.01 File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/importlib/__init__.py", line 126, in import_module
#30 11.01 return _bootstrap._gcd_import(name[level:], package, level)
#30 11.01 File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
#30 11.01 File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
#30 11.01 File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
#30 11.01 File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
#30 11.01 File "<frozen importlib._bootstrap_external>", line 883, in exec_module
#30 11.01 File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
#30 11.01 File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/__init__.py", line 41, in <module>
#30 11.01 from torch_npu.utils._error_code import ErrCode, pta_error
#30 11.01 File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/utils/__init__.py", line 1, in <module>
#30 11.01 from torch_npu import _C
#30 11.01 ImportError: libascend_hal.so: cannot open shared object file: No such file or directory
#30 11.01
#30 11.01 The above exception was the direct cause of the following exception:
#30 11.01
#30 11.01 Traceback (most recent call last):
#30 11.01 File "<string>", line 2, in <module>
#30 11.01 File "<pip-setuptools-caller>", line 34, in <module>
#30 11.01 File "/tmp/pip-install-fenwbomu/deepspeed_faf6a1693b3242eda4916c80bde05ad1/setup.py", line 34, in <module>
#30 11.01 import torch
#30 11.01 File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch/__init__.py", line 2665, in <module>
#30 11.01 _import_device_backends()
#30 11.01 File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch/__init__.py", line 2641, in _import_device_backends
#30 11.01 raise RuntimeError(
#30 11.01 RuntimeError: Failed to load the backend extension: torch_npu. You can disable extension auto-loading with TORCH_DEVICE_BACKEND_AUTOLOAD=0.
#30 11.01 [end of output]
#30 11.01
#30 11.01 note: This error originates from a subprocess, and is likely not a problem with pip.
#30 11.07 error: metadata-generation-failed
#30 11.07
#30 11.07 × Encountered error while generating package metadata.
#30 11.07 ╰─> See above for output.
#30 11.07
#30 11.07 note: This is an issue with the package mentioned above, not pip.
#30 11.07 hint: See above for details.
#30 ERROR: process "/bin/sh -c source ~/.bashrc && pip install deepspeed==0.16.7" did not complete successfully: exit code: 1
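This .so lives in /usr/local/Ascend/driver/lib64/driver. I had already set that path via ENV, but it still had no effect. Only when RUN ls /usr/local/Ascend also failed did I realize that I was building the image on a CPU-only machine, which has no such driver. The driver is mounted into the container when the image is run, so it does not exist at build time.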
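The diagnosis above can be checked directly by testing for the library file in a build step. The following is a minimal sketch, not part of the original build: the function name has_ascend_hal is hypothetical, and the default directory is simply the one mentioned in this post.

```shell
#!/bin/sh
# Hypothetical helper: succeed if libascend_hal.so exists in the given
# directory. The default is the driver directory named in this post.
has_ascend_hal() {
    dir="${1:-/usr/local/Ascend/driver/lib64/driver}"
    # -e follows symlinks, so a symlinked library also counts as present.
    [ -e "$dir/libascend_hal.so" ]
}

if has_ascend_hal; then
    echo "libascend_hal.so found"
else
    echo "libascend_hal.so missing (expected on a CPU-only build machine)"
fi
```

If the check reports the file as missing during the build, no library search path setting can help, because there is nothing to find until the driver is mounted at run time.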
So, following the hint in the error message, set TORCH_DEVICE_BACKEND_AUTOLOAD to 0 before the install steps, and set it back to 1 afterwards:
ENV TORCH_DEVICE_BACKEND_AUTOLOAD=0
... package installation steps, etc.
ENV TORCH_DEVICE_BACKEND_AUTOLOAD=1
This resolves the problem.
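An alternative to toggling the variable with two ENV instructions is to set it only for the one command that needs it, so there is nothing to restore afterwards. The sketch below assumes a POSIX shell with the standard env utility; run_without_autoload is a hypothetical helper name, and a harmless echo stands in for the real pip install.

```shell
#!/bin/sh
# Hypothetical wrapper: run a single command with torch backend
# auto-loading disabled. `env` sets the variable only in the environment
# of that command, so the calling shell is left unchanged.
run_without_autoload() {
    env TORCH_DEVICE_BACKEND_AUTOLOAD=0 "$@"
}

# Demonstration with a harmless command instead of a real install.
run_without_autoload sh -c 'echo "inside: $TORCH_DEVICE_BACKEND_AUTOLOAD"'
echo "outside: ${TORCH_DEVICE_BACKEND_AUTOLOAD:-unset}"
```

In the build this would be used as run_without_autoload pip install deepspeed==0.16.7. The equivalent inline form in a Dockerfile step would be RUN source ~/.bashrc && TORCH_DEVICE_BACKEND_AUTOLOAD=0 pip install deepspeed==0.16.7, which keeps the setting out of the final image environment.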