Jetson Orin troubleshooting notes:
Ubuntu 20.04, JetPack 5.1.1
nvidia@nvidia-desktop:/ssd/home/nvidia/jetson-containers$ ./run.sh $(./autotag local_llm:r35.4.1) python3 -m local_llm --api=mlc --model=liuhaotian/llava-v1.5-13b --prompt '/data/images/fruit.jpg' --prompt 'what kind of fruits do you see?' --prompt 'reset' --prompt '/data/images/dogs.jpg' --prompt 'what breed of dogs are in the image?' --prompt 'reset' --prompt '/data/images/path.
Namespace(disable=[''], output='/tmp/autotag', packages=['local_llm:r35.4.1', 'share=Ture'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- L4T_VERSION=35.3.1 JETPACK_VERSION=5.1.1 CUDA_VERSION=11.4.315
-- Finding compatible container image for ['local_llm:r35.4.1', 'share=Ture']
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/ssd/home/nvidia/jetson-containers/jetson_containers/tag.py", line 55, in <module>
image = find_container(args.packages[0], prefer_sources=args.prefer, disable_sources=args.disable, user=args.user, quiet=args.quiet, verbose=args.verbose)
File "/ssd/home/nvidia/jetson-containers/jetson_containers/container.py", line 481, in find_container
local_images = find_local_containers(package, **kwargs)
File "/ssd/home/nvidia/jetson-containers/jetson_containers/container.py", line 384, in find_local_containers
local_images = get_local_containers()
File "/ssd/home/nvidia/jetson-containers/jetson_containers/container.py", line 341, in get_local_containers
status = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['docker', 'images', '--format', "'{{json . }}'"]' returned non-zero exit status 1.
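I could not figure out why it worked fine yesterday but failed today after I disconnected and ran the download again. That is probably just how it goes for a beginner, so I searched Baidu:
In Python, after importing the subprocess module, you can use subprocess.check_output(command) to capture a command's output. If it raises "subprocess.CalledProcessError: Command 'XXX' returned non-zero exit status 1.", the command was found but failed when it ran in the system cmd or terminal. It does not mean the command could not be found.
When the command cannot be found, the error is different: FileNotFoundError: [WinError 2] The system cannot find the file specified. (On Linux the same exception reads: FileNotFoundError: [Errno 2] No such file or directory.)
If the command is where xxx, that is equivalent to running where xxx in cmd, which looks up the path of a target on PATH. If nothing is found, it also raises the CalledProcessError above.
After adding the target to PATH, you need to restart the Python environment so that it picks up the updated PATH and can find the target.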
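The difference between the two exceptions can be reproduced with a small sketch. The helper name run_and_explain and both demo commands are made up for illustration. The point is that capturing stderr shows why a command exited with a non-zero status, which the traceback above does not print.

```python
import subprocess
import sys


def run_and_explain(cmd):
    """Run cmd and return (status, detail) so the two failure modes are easy to tell apart."""
    try:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    except FileNotFoundError:
        # The executable itself does not exist on PATH.
        return "not found", ""
    except subprocess.CalledProcessError as err:
        # The executable ran but reported failure; stderr usually says why.
        return f"exit status {err.returncode}", err.stderr.strip()
    return "ok", result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical commands: one that runs and fails, one that does not exist.
    failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]
    print(run_and_explain(failing))
    print(run_and_explain(["definitely-not-a-real-command-xyz"]))
```

Calling the same helper with ['docker', 'images'] on the Jetson would presumably have returned the daemon's "Cannot connect to the Docker daemon" message as the detail, instead of only the exit status.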
At this point I still did not understand the cause, so I checked whether the images I had downloaded were actually there:
nvidia@nvidia-desktop:/ssd/home/nvidia/jetson-containers$ docker images
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
nvidia@nvidia-desktop:/ssd/home/nvidia/jetson-containers$ systemctl docker start
Unknown operation docker.
nvidia@nvidia-desktop:/ssd/home/nvidia/jetson-containers$ systemctl start docker
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to start 'docker.service'.
Authenticating as: nvidia,,, (nvidia)
Password:
==== AUTHENTICATION COMPLETE ===
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
nvidia@nvidia-desktop:/ssd/home/nvidia/jetson-containers$ status docker.service
-bash: status: command not found
nvidia@nvidia-desktop:/ssd/home/nvidia/jetson-containers$ systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2024-02-19 13:25:28 CST; 19s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Process: 5794 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
Main PID: 5794 (code=exited, status=1/FAILURE)
2月 19 13:25:28 nvidia-desktop systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
2月 19 13:25:28 nvidia-desktop systemd[1]: Stopped Docker Application Container Engine.
2月 19 13:25:28 nvidia-desktop systemd[1]: docker.service: Start request repeated too quickly.
2月 19 13:25:28 nvidia-desktop systemd[1]: docker.service: Failed with result 'exit-code'.
2月 19 13:25:28 nvidia-desktop systemd[1]: Failed to start Docker Application Container Engine.
systemctl start docker
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to start 'docker.service'.
Authenticating as: nvidia,,, (nvidia)
Password:
==== AUTHENTICATION COMPLETE ===
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
nvidia@nvidia-desktop:~$ systemctl restart docker
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to restart 'docker.service'.
Authenticating as: nvidia,,, (nvidia)
Password:
==== AUTHENTICATION COMPLETE ===
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
nvidia@nvidia-desktop:~$ sudo rm /etc/docker/daemon.json
[sudo] password for nvidia:
nvidia@nvidia-desktop:~$ systemctl restart docker
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to restart 'docker.service'.
Authenticating as: nvidia,,, (nvidia)
Password:
==== AUTHENTICATION COMPLETE ===
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
nvidia@nvidia-desktop:~$ systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2024-02-19 13:29:50 CST; 10s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Process: 6361 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
Main PID: 6361 (code=exited, status=1/FAILURE)
2月 19 13:29:50 nvidia-desktop systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
2月 19 13:29:50 nvidia-desktop systemd[1]: Stopped Docker Application Container Engine.
2月 19 13:29:50 nvidia-desktop systemd[1]: docker.service: Start request repeated too quickly.
2月 19 13:29:50 nvidia-desktop systemd[1]: docker.service: Failed with result 'exit-code'.
2月 19 13:29:50 nvidia-desktop systemd[1]: Failed to start Docker Application Container Engine.
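The status output above only shows systemd's restart bookkeeping ("Start request repeated too quickly"), not the message dockerd printed before it exited. The hint in the log points to the journal. A minimal sketch of pulling the recent docker.service entries follows. The helper names are made up for illustration, and the actual dockerd error was never recorded in these notes, so this only shows where to look.

```python
import shutil
import subprocess


def docker_journal_command(lines=50):
    """Build the journalctl call that prints the last `lines` entries of docker.service."""
    return ["journalctl", "-u", "docker.service", "-n", str(lines), "--no-pager"]


def show_docker_journal(lines=50):
    """Return the recent docker.service journal text, or None if journalctl is missing."""
    cmd = docker_journal_command(lines)
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(show_docker_journal() or "journalctl not found on this system")
```

Reading the full journal may require sudo or membership in a group such as adm or systemd-journal.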
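I deleted /etc/docker/daemon.json, following what the poster in the forum thread below did. You generally should not delete this file, because Docker then no longer knows where your data is and the images appear to be gone. I simply did not know enough and deleted it because someone else had, haha.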
Docker stopped working after jetpack update + ssd mount - Jetson Xavier NX - NVIDIA Developer Forums
In that thread, the fix was to specify the storage driver the daemon should use:
dockerd -s overlay2
Then I started flailing around: I remembered that I had installed a firewall yesterday while setting up a server.
nvidia@nvidia-desktop:~$ sudo apt-get install ip6tables-restore
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package ip6tables-restore
nvidia@nvidia-desktop:~$ sudo apt-get install ip6tables
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package ip6tables
nvidia@nvidia-desktop:~$ systemctl enable firewalld
Synchronizing state of firewalld.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable firewalld
==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ===
Authentication is required to reload the systemd state.
Authenticating as: nvidia,,, (nvidia)
Password:
==== AUTHENTICATION COMPLETE ===
==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ===
Authentication is required to reload the systemd state.
Authenticating as: nvidia,,, (nvidia)
Password:
==== AUTHENTICATION COMPLETE ===
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-unit-files ===
Authentication is required to manage system service or unit files.
Authenticating as: nvidia,,, (nvidia)
Password:
==== AUTHENTICATION COMPLETE ===
==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ===
Authentication is required to reload the systemd state.
Authenticating as: nvidia,,, (nvidia)
Password:
==== AUTHENTICATION COMPLETE ===
Turning it off again required sudo and an uninstall. I dug this hole myself, so I get to fill it in.
nvidia@nvidia-desktop:~$ systemctl disable firewalld
Synchronizing state of firewalld.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable firewalld
==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ===
Authentication is required to reload the systemd state.
Authenticating as: nvidia,,, (nvidia)
Password:
==== AUTHENTICATION COMPLETE ===
update-rc.d: error: Permission denied
nvidia@nvidia-desktop:~$ sudo apt-get uninstall firewalld
E: Invalid operation uninstall
nvidia@nvidia-desktop:~$ sudo apt-get unstall firewalld
E: Invalid operation unstall
nvidia@nvidia-desktop:~$ sudo apt-get remove firewalld
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
ipset libipset13 libnftables1 python3-firewall python3-nftables python3-selinux python3-slip python3-slip-dbus
Use 'sudo apt autoremove' to remove them.
The following packages will be REMOVED:
firewalld
0 upgraded, 0 newly installed, 1 to remove and 296 not upgraded.
After this operation, 2,362 kB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 178321 files and directories currently installed.)
Removing firewalld (0.8.2-1) ...
update-alternatives: using /usr/share/polkit-1/actions/org.fedoraproject.FirewallD1.desktop.policy.choice to provide /usr/share/polkit-1/actions/org.fedoraproject.FirewallD1.policy (org.fedoraproject.FirewallD1.policy) in auto mode
Processing triggers for dbus (1.12.16-2ubuntu2.3) ...
Processing triggers for man-db (2.9.1-1) ...
If you get a permission error, add sudo:
update-rc.d: error: Permission denied
nvidia@nvidia-desktop:~$ sudo systemctl disable firewalld
Synchronizing state of firewalld.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable firewalld
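Once Docker works again, remember to restore the original /etc/docker/daemon.json (sudo vi /etc/docker/daemon.json). Otherwise Docker cannot find the images you downloaded.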
nvidia@nvidia-desktop:~$ sudo du -sh /ssd/var/lib/docker/
36G /ssd/var/lib/docker/
nvidia@nvidia-desktop:~$ sudo du -sh /ssd/docker/
296K /ssd/docker/
nvidia@nvidia-desktop:~$ sudo vi /etc/docker/daemon.json
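These notes do not show what was typed into daemon.json, so the following is only a sketch of what a restored file might look like. It assumes that "data-root" pointed at the 36G directory measured above, and that the NVIDIA runtime entries match a typical JetPack setup. Both are assumptions, not something this log confirms. Generating the text with json.dumps guarantees valid JSON, which matters because dockerd refuses to start on a malformed daemon.json.

```python
import json


def render_daemon_json(data_root):
    """Return daemon.json text with the image store moved to data_root."""
    config = {
        # Assumed NVIDIA runtime entries (typical on JetPack, not shown in this log).
        "runtimes": {
            "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
        },
        "default-runtime": "nvidia",
        # Assumed location: the directory that held 36G in the du output above.
        "data-root": data_root,
    }
    return json.dumps(config, indent=4) + "\n"


if __name__ == "__main__":
    print(render_daemon_json("/ssd/var/lib/docker"), end="")
```

The printed text would go into /etc/docker/daemon.json, followed by sudo systemctl restart docker.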
nvidia@nvidia-desktop:/ssd/home/nvidia/jetson-containers$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
dustynv/local_llm r35.4.1 80aee66ddf10 3 days ago 20.2GB
dustynv/llamaspeak r35.4.1 91f8d1868c34 2 months ago 9.93GB
dustynv/text-generation-webui 1.7-r35.4.1 0a3ae6d644e6 2 months ago 14.5GB
nvcr.io/nvidia/riva/riva-speech 2.12.1-l4t-aarch64 4c7658e18d85 7 months ago 12.4GB
The images are back. Time to get back to the grind.