Jetson Nano
Jetson Nano System Installation
For the OS installation, refer to the guide "Jetson nano (4GB B01) 系统安装".
Basic Configuration
First, expand the SD card partition. In general only cards of 32 GB or more can hold the Nano system image; open the Disks utility, choose Resize, and extend the partition to take up the remaining space.
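If you prefer the command line, a rough equivalent looks like the following. This is only a sketch: growpart comes from the cloud-guest-utils package, and /dev/mmcblk0p1 is assumed to be the root partition, so check with lsblk first.
lsblk
sudo apt install -y cloud-guest-utils
sudo growpart /dev/mmcblk0 1
sudo resize2fs /dev/mmcblk0p1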
Install the basic dependencies:
sudo apt install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python-openssl git
Input Method
The Nano ships without a Chinese input method, which makes searching the Chinese web a real obstacle, so install the fcitx-googlepinyin input method:
sudo apt-get install fcitx fcitx-tools fcitx-config* fcitx-frontend* fcitx-module* fcitx-ui-* presage fcitx-googlepinyin
In Language Support, set "Keyboard input method system" to fcitx.
Then reboot with the following command:
sudo reboot
After rebooting, open the keyboard settings in the top-right corner and choose Configure Fcitx.
In the input method configuration dialog, click the plus sign in the bottom-left corner and add Google Pinyin.
In the Global Config tab you can also set the shortcut for switching between Chinese and English; the default is Ctrl+Space.
Basic Software
The Jetson Nano is essentially a small computer, so to make development easier it is worth installing a few useful applications.
Remote Desktop
One advantage of the Nano board is its portability, but you cannot carry a Nano plus a monitor everywhere, so installing a remote desktop tool makes it easy to control the Nano.
Option 1: ToDesk
ToDesk works across platforms: as long as the Nano is connected to any network, you can control its desktop remotely, with responsiveness depending on the network. Note, however, that ToDesk only supports the GNOME desktop, and on some systems no display output is produced when no monitor is attached, so ToDesk shows a black screen. The usual fixes are changing the system configuration or buying an HDMI dummy plug.
Simply open a browser on the Jetson, search for ToDesk, and download it:
Choose the arm64 build, open a terminal in the download directory, and run the installation commands shown on the page.
Log in to your account; then install ToDesk on any other device and you can remote-control all of your machines:
Option 2: NoMachine or VNC
Both VNC and NoMachine connect through the IP address of the network the Jetson Nano is on, so the controlling device and the Jetson Nano must be on the same network and reachable by IP.
Taking NoMachine as an example:
Search for NoMachine in a web browser, go to the official site, and pick the Linux arm64 package: NoMachine - NoMachine for Arm
After installation, NoMachine Services appears in the top-right corner; it registers a service that starts on boot to run NoMachine, and the service window shows the IP address. You can also get the address from a terminal:
ifconfig
which prints output like the following:
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:14:7c:d1:0b txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 48:b0:2d:c1:27:3c txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 151 base 0x7000
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1 (Local Loopback)
RX packets 8548 bytes 664893 (664.8 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8548 bytes 664893 (664.8 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
rndis0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 4a:4c:3a:fd:ea:e9 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
usb0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether ##:##:##:##:##:## txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet ###.###.###.### netmask 255.255.255.0 broadcast ###.###.###.###
inet6 fe80::### prefixlen ## scopeid 0x##<link>
ether ##:##:##:c0:5e:e7 txqueuelen 1000 (Ethernet)
RX packets 111615 bytes 40187332 (40.1 MB)
RX errors 0 dropped 1584 overruns 0 frame 0
TX packets 260908 bytes 198071760 (198.0 MB)
TX errors 0 dropped 6 overruns 0 carrier 0 collisions 0
Pick the interface that matches how your Nano is connected. For example, if it is on Wi-Fi, look for the IP under wlan0 or wlan1; the address follows "inet". If you are using USB network sharing or the RJ45 port, look under usb or eth instead. See: 使用 ifconfig 查看本机 ip_终端怎么查看本地ip
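For example, to print only the wlan0 address, the same one-liner that the OLED script below reuses works here:
ifconfig wlan0 | grep 'inet ' | awk '{print $2}'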
Install NoMachine on another device to view the remote desktop: click Add, give the device any name, and enter the corresponding IP address in the Host field.
Programming Environment
VS Code or VSCode-OSS
VS Code supports many programming languages and can be extended with plugins, which makes it very convenient; installing it on the Nano greatly improves development efficiency.
Because older Ubuntu releases may not support the latest VS Code, install an older VS Code release or VSCode-OSS instead.
Go to the "Visual Studio Code September 2023" release page and click the deb link after "Linux:" to download the package.
Installation (note that the Jetson needs the arm64 build; the package below is amd64 because I downloaded it on an x86-64 machine, so pick the right architecture when downloading and replace amd with arm in the command below):
sudo dpkg -i code_1.83.1-1696982868_amd64.deb
After installation, search for "code" in the applications menu to find VS Code; you can pin it to the launcher on the left.
Extensions can be added under Extensions; the essential ones are Python, C/C++, and so on.
Taking Python as an example, click the Python entry in the bottom-right corner of the VS Code window to select the interpreter for the Python version you want.
File Downloads
For applications such as WPS and Xunlei (Thunder), go to the page "【免费】WPS、迅雷等软件的arm64linux版本deb" and download the arm64 Linux deb packages for all of them directly.
Installation:
sudo dpkg -i #name_of_the_downloaded_package#
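If dpkg complains about missing dependencies, the usual fix is:
sudo apt-get install -f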
Workspace
Run the following commands to create a ROS workspace:
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src
catkin_init_workspace
cd ~/catkin_ws
catkin_make
source devel/setup.bash
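To avoid sourcing the workspace by hand in every new terminal, you can optionally append it to ~/.bashrc:
echo "source ~/catkin_ws/devel/setup.bash" >> ~/.bashrc
source ~/.bashrc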
Peripherals
PWM Fan
The Jetson Nano draws a fair amount of power, so to keep it cool connect a PWM fan to the PWM fan header on the board, then run the following in a terminal:
sudo apt install lm-sensors
sudo -H pip3 install -U jetson-stats
sudo sh -c 'echo 100 > /sys/devices/pwm-fan/target_pwm'
This brings the fan up to a normal speed. To make the speed track the temperature, write the following script:
import os
import time
def get_cpu_temp():
    temp = os.popen("cat /sys/devices/virtual/thermal/thermal_zone0/temp").readline()
    return int(temp) / 1000
def set_fan_speed(pwm_value):
    pwm_value = max(0, min(255, pwm_value))
    os.system(f"sudo sh -c 'echo {pwm_value} > /sys/devices/pwm-fan/target_pwm'")
def adjust_fan_speed():
    temp = get_cpu_temp()
    if temp < 40:
        pwm_value = 0
    else:
        pwm_value = int((temp - 40) * 6.375)
    set_fan_speed(pwm_value)
    print(f"Current CPU Temp: {temp}°C, Fan Speed PWM: {pwm_value}")
if __name__ == "__main__":
    while True:
        adjust_fan_speed()
        time.sleep(10)
The code above implements temperature-controlled PWM fan speed: get_cpu_temp reads the CPU temperature and converts it to a number, adjust_fan_speed maps the temperature onto the PWM range 0-255 (below 40 °C the PWM value is 0; above 40 °C it follows the linear expression pwm_value = int((temp - 40) * 6.375)), and set_fan_speed finally writes that PWM value to drive the fan.
Note that some PWM fans have inverted logic: here 0 means stopped, but on some fans 255 means stopped. In that case just add one line at the beginning of set_fan_speed, as shown in the sketch below:
pwm_value = 255 - pwm_value
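With that line added, set_fan_speed would look like this:
def set_fan_speed(pwm_value):
    pwm_value = 255 - pwm_value  # invert for fans whose PWM logic is reversed
    pwm_value = max(0, min(255, pwm_value))
    os.system(f"sudo sh -c 'echo {pwm_value} > /sys/devices/pwm-fan/target_pwm'")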
Save the script above as fanctrl.py and then run:
crontab -e
Choose nano or vim to open the file, then append the following line at the end, where "/home/jetson/Workspace/Startup/" stands for the directory where your fanctrl.py lives; make sure the path is correct:
@reboot python3 /home/jetson/Workspace/Startup/fanctrl.py
The script will then start on every boot and control the fan automatically. Alternatively, a systemd service provided by the OS can be used for autostart; see: Jetson Nano开机自动启动Python程序
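As a rough sketch of the systemd route (assuming the same script path as above), save the following as /etc/systemd/system/fanctrl.service and run sudo systemctl enable --now fanctrl.service:
[Unit]
Description=PWM fan control
After=multi-user.target

[Service]
ExecStart=/usr/bin/python3 /home/jetson/Workspace/Startup/fanctrl.py
Restart=on-failure

[Install]
WantedBy=multi-user.target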
OLED
OLED displays are supported by many embedded boards and come in SPI and I2C variants. Here an I2C OLED is used as an example to show some of the Nano's status information.
First, the preparation: run the following commands in a terminal to install the required libraries:
sudo apt-get install python3-tk
sudo apt-get install python3-pip
sudo apt-get install python3-dev
sudo pip3 install psutil smbus requests datetime Jetson.GPIO eyed3 adafruit-circuitpython-ssd1306 board adafruit-blinka adafruit-circuitpython-typing
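Before running the script you can verify that the display is visible on the I2C bus. This is a sketch assuming the OLED is wired to I2C bus 1 (pins 3/5 of the 40-pin header); SSD1306 modules usually respond at address 0x3c.
sudo apt-get install -y i2c-tools
sudo i2cdetect -y -r 1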
Import the libraries and set the OLED resolution; OLEDs usually come in 128x64 or 128x32:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import time
import smbus
import board
import busio
import psutil
import subprocess
from adafruit_ssd1306 import SSD1306_I2C
from datetime import datetime
from PIL import Image, ImageDraw, ImageFont
i2c = busio.I2C(board.SCL, board.SDA)
disp = SSD1306_I2C(128, 32, i2c)
width = disp.width
height = disp.height
image = Image.new("1", (width, height))
draw = ImageDraw.Draw(image)
font = ImageFont.load_default()
padding = -2
top = padding
bottom = height - padding
x = 0
The following function displays seven kinds of information: IP address, disk usage, CPU load, network RX/TX speed (wlan), CPU temperature, time, and memory. Since the 128x32 resolution was chosen, a display_mode variable switches between two screens:
def mainrun():
    count = 0
    start_time = time.time()
    display_mode = 0
    switch_interval = 10
    while True:
        draw.rectangle((0, 0, width, height), outline=0, fill=0)
        if count % 60 == 0:
            cmd = "ifconfig wlan0 | grep 'inet ' | awk '{print $2}'"
            wlan0_IP = subprocess.check_output(cmd, shell=True).decode("utf-8").strip()
        if display_mode == 0:
            cpu_usage_percentage = psutil.cpu_percent(1)
            CPU_usage = "CPU: {:.2f}%".format(cpu_usage_percentage)
            cmd = "free -m | awk 'NR==2{printf \"Mem: %s/%sMB %.0f%%\", $3,$2,$3*100/$2 }'"
            MemUsage = subprocess.check_output(cmd, shell=True).decode("utf-8").strip()
            cmd = "cat /sys/class/thermal/thermal_zone0/temp"
            temp = subprocess.check_output(cmd, shell=True).decode("utf-8").strip()
            CPU_temp = str(round(float(temp) / 1000, 2)) + "°C"
            draw.text((x, top + 0), "IP: " + wlan0_IP, font=font, fill=255)
            draw.text((x, top + 8), CPU_usage, font=font, fill=255)
            draw.text((x, top + 16), MemUsage, font=font, fill=255)
            draw.text((x, top + 24), "CPU Temp: " + CPU_temp, font=font, fill=255)
        elif display_mode == 1:
            cmd = 'df -h | awk \'$NF=="/"{printf "Disk: %d/%d GB %s", $3,$2,$5}\''
            Disk = subprocess.check_output(cmd, shell=True).decode("utf-8").strip()
            cmd = "cat /sys/class/net/wlan0/statistics/rx_bytes"
            rx_bytes_before = int(subprocess.check_output(cmd, shell=True).decode("utf-8").strip())
            cmd = "cat /sys/class/net/wlan0/statistics/tx_bytes"
            tx_bytes_before = int(subprocess.check_output(cmd, shell=True).decode("utf-8").strip())
            time.sleep(1)
            cmd = "cat /sys/class/net/wlan0/statistics/rx_bytes"
            rx_bytes_after = int(subprocess.check_output(cmd, shell=True).decode("utf-8").strip())
            cmd = "cat /sys/class/net/wlan0/statistics/tx_bytes"
            tx_bytes_after = int(subprocess.check_output(cmd, shell=True).decode("utf-8").strip())
            rx_speed = (rx_bytes_after - rx_bytes_before) / 1024  # KB/s
            tx_speed = (tx_bytes_after - tx_bytes_before) / 1024  # KB/s
            rUNITS = 'KB'
            tUNITS = 'KB'
            if rx_speed > 1024:
                rx_speed = rx_speed / 1024
                rUNITS = 'MB'
            if tx_speed > 1024:
                tx_speed = tx_speed / 1024
                tUNITS = 'MB'
            NetSpeed = "RT: {:.1f}{} {:.1f}{}".format(rx_speed, rUNITS, tx_speed, tUNITS)
            current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            draw.text((x, top + 0), "IP: " + wlan0_IP, font=font, fill=255)
            draw.text((x, top + 8), Disk, font=font, fill=255)
            draw.text((x, top + 16), NetSpeed, font=font, fill=255)
            draw.text((x, top + 24), current_time, font=font, fill=255)
        disp.image(image)
        disp.show()
        if time.time() - start_time >= switch_interval:
            display_mode = (display_mode + 1) % 2
            start_time = time.time()
        time.sleep(1)
        count += 1
if __name__ == "__main__":
    mainrun()  # start the display loop
Likewise, save the code above as ssd1306startup.py and add the following line (again via crontab -e) so it starts on boot:
@reboot python3 /home/jetson/Workspace/Startup/ssd1306startup.py
If you hit "AttributeError: module 'board' has no attribute 'SCL'", the fix is to replace board.py in the python3 installation directory with the board.py from the linked page, and then, following the subsequent errors, adjust the upper/lower case of the paths inside the .py files they point at.
Programming and Development
System used in these examples: Ubuntu 18.04
Python 3.6
Ubuntu 18.04 ships with Python 3.6 as its default python3; the following two examples are developed with Python 3.6.
face_recognition
face_recognition is a Python face-recognition library built on dlib's state-of-the-art deep-learning face recognition. It offers a rich API and wraps many of the complicated steps, so face-recognition tasks can be done simply and quickly.
Install face_recognition:
pip3 install boost cmake build
pip3 install dlib
pip3 install face_recognition
Place a face image named starlight.jpg in the ./datasets/known folder; to add more faces keep adding images, but each face image must correspond to a name.
Once the images are in place, the following code lets face_recognition load the starlight image and, using the library function that generates a 128-dimensional feature vector for a face in an image, return a face encoding. These encodings can be used for face recognition or to compare whether two faces are similar.
starlight_image = face_recognition.load_image_file("datasets/known/starlight.jpg")
starlight_face_encoding = face_recognition.face_encodings(starlight_image)[0]
Use OpenCV to read the camera and compare the face encoding of every frame against the known encodings; if it matches one of them, return the corresponding name, otherwise return unknown:
face_locations = face_recognition.face_locations(frame)
face_encodings = face_recognition.face_encodings(frame, face_locations)
for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
    # Matching faces
    matches_starlight = face_recognition.compare_faces([starlight_face_encoding], face_encoding)
    matches_alice = face_recognition.compare_faces([alice_face_encoding], face_encoding)
    name = "unknown"
    if True in matches_starlight:
        name = "starlight"
    elif True in matches_alice:
        name = "XW"
The complete code is as follows:
import face_recognition
import cv2
import tkinter as tk
from PIL import Image, ImageTk
print("Loading faces data")
try:
    starlight_image = face_recognition.load_image_file("datasets/known/starlight.jpg")
    starlight_face_encoding = face_recognition.face_encodings(starlight_image)[0]
    alice_image = face_recognition.load_image_file("datasets/known/XW.jpg")
    alice_face_encoding = face_recognition.face_encodings(alice_image)[0]
except:
    rootpath = "/home/jetson/Workspace/Py36Script/FaceRecognition-master/"
    starlight_image = face_recognition.load_image_file(rootpath + "datasets/known/starlight.jpg")
    starlight_face_encoding = face_recognition.face_encodings(starlight_image)[0]
    alice_image = face_recognition.load_image_file(rootpath + "datasets/known/XW.jpg")
    alice_face_encoding = face_recognition.face_encodings(alice_image)[0]
print("Warming Camera")
root = tk.Tk()
root.title("Cam")
lmain = tk.Label(root)
lmain.pack()
cap = cv2.VideoCapture(0)
def show_frame():
    _, frame = cap.read()
    frame = cv2.resize(frame, (280, 320))
    # Searching faces
    face_locations = face_recognition.face_locations(frame)
    face_encodings = face_recognition.face_encodings(frame, face_locations)
    for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
        # Matching faces
        matches_starlight = face_recognition.compare_faces([starlight_face_encoding], face_encoding)
        matches_alice = face_recognition.compare_faces([alice_face_encoding], face_encoding)
        name = "unknown"
        if True in matches_starlight:
            name = "starlight"
        elif True in matches_alice:
            name = "Zhao"
        # Show name
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        font = cv2.FONT_HERSHEY_DUPLEX
        cv2.putText(frame, name, (left + 6, bottom - 6), font, 0.5, (0, 255, 0), 2)
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img = Image.fromarray(frame)
    imgtk = ImageTk.PhotoImage(image=img)
    lmain.imgtk = imgtk
    lmain.configure(image=imgtk)
    lmain.after(200, show_frame)
show_frame()
root.mainloop()
An example result: this test loaded the starlight face image, so pointing the camera at another picture of starlight is recognized, while other faces are labeled unknown.
mediapipe
MediaPipe is an open-source, cross-platform framework developed by Google for building efficient, real-time machine learning and computer vision pipelines. It provides a set of pre-built models and tools that let developers process video and images on-device in real time, and it runs on Android, iOS, the web, and desktop systems. Its high performance and cross-platform nature have made it widely used for gesture recognition, pose estimation, facial expression recognition, object detection, and more.
The following simple example uses MediaPipe to detect human pose.
First install mediapipe:
pip3 install mediapipe
Write the code:
mp_pose.Pose() initializes the pose-detection model with a minimum detection confidence of 0.5 (i.e. a pose is only considered valid at 50% confidence or more). pose.process() runs detection on the input RGB image; the returned results contains the detected pose landmark data. If a pose was detected, results.pose_landmarks holds the landmark positions; otherwise a "no pose detected" message is printed.
with mp_pose.Pose(static_image_mode=True, min_detection_confidence=0.5) as pose:
    results = pose.process(image_rgb)
    if results.pose_landmarks:
        print("Detected pose landmarks.")
        mp_drawing.draw_landmarks(
            image,
            results.pose_landmarks,
            mp_pose.POSE_CONNECTIONS,
            mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2, circle_radius=2),
            mp_drawing.DrawingSpec(color=(0, 0, 255), thickness=2, circle_radius=2)
        )
    else:
        print("No pose detected.")
The complete code is as follows:
import cv2
import mediapipe as mp
mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils
image_path = "input_image.png"
image = cv2.imread(image_path)
cv2.imshow("original",image)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
with mp_pose.Pose(static_image_mode=True, min_detection_confidence=0.5) as pose:
    results = pose.process(image_rgb)
    if results.pose_landmarks:
        print("Detected pose landmarks.")
        mp_drawing.draw_landmarks(
            image,
            results.pose_landmarks,
            mp_pose.POSE_CONNECTIONS,
            mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2, circle_radius=2),
            mp_drawing.DrawingSpec(color=(0, 0, 255), thickness=2, circle_radius=2)
        )
    else:
        print("No pose detected.")
cv2.imshow("Pose Detection", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
The result is shown below:
Python 3.10
The projects below run into version-support problems under Python 3.6. To install Python 3.10, refer to the CSDN post "机器人Rviz+matlab仿真详细步骤,从solidworks到urdf以及机器人编程控制,以ER50-C20为例". The following examples show the kinds of development tasks Python can handle.
ultralytics
YOLOv8, developed by Ultralytics, is the latest generation of the YOLO (You Only Look Once) object-detection models. It continues the YOLO family's strong performance on computer-vision tasks while further improving accuracy and inference speed. YOLOv8 can be used for several vision tasks, including object detection, instance segmentation, and image classification; it is the newest member of the YOLO family, combining the strengths of YOLOv5, YOLOv6, and YOLOv7 and introducing new features and improvements.
Install ultralytics:
~/python3.10/bin/python3.10 -m pip install ultralytics
Note that the command above installs the latest ultralytics together with the latest torch and torchvision by default. Install the CPU or GPU builds of torch, torchvision, and torchaudio that match your Nano's CUDA version beforehand; if you later get errors like "core dumped", they are almost always caused by incompatible versions of torch, numpy, or similar libraries.
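A quick way to check which versions ended up installed and whether CUDA is visible (assuming the same python3.10 path as above):
~/python3.10/bin/python3.10 -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"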
Assuming installation has finished, if you then see "AttributeError: module 'collections' has no attribute ###", uninstall and reinstall:
~/python3.10/bin/python3.10 -m pip uninstall ultralytics -y
~/python3.10/bin/python3.10 -m pip install ultralytics
You can also install a specific version of ultralytics; if there are compatibility problems, drop to a somewhat lower version.
Write the core part of the code:
import cv2
from ultralytics import YOLO
model = YOLO(model="./weights/detect/yolov8n.pt")
def process_frame(self, frame):
    res = model(frame)
    annotated_img = res[0].plot()
    results_frames = cv2.cvtColor(annotated_img, cv2.COLOR_BGR2RGB)
    return results_frames
Here yolov8n.pt is loaded from the model zoo; the weights can be downloaded from Hugging Face or GitHub:
Ultralytics/YOLOv8 at main (huggingface.co): https://huggingface.co/Ultralytics/YOLOv8/tree/main
Releases · ultralytics/assets (github.com): https://github.com/ultralytics/assets/releases
The process_frame function runs inference on the input frame with the loaded YOLO model. The model(frame) call returns the detection results, which typically include the detected objects' locations (bounding boxes), classes, and confidences; the res variable receives these results, as the sketch below shows.
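As a small sketch of what res contains, the detections can be read out through the standard ultralytics Results fields:
res = model(frame)
for box in res[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # bounding box corners
    cls_name = model.names[int(box.cls)]     # class label
    conf = float(box.conf)                   # confidence score
    print(cls_name, conf, (x1, y1, x2, y2))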
A Tk window displays each processed frame; the complete code is as follows:
import cv2
from tkinter import *
from PIL import Image, ImageTk
from ultralytics import YOLO
model = YOLO(model="./weights/detect/yolov8n.pt")
#model = YOLO(model="./custom_PT/detect/trees.pt")
class App:
    def __init__(self, window, window_title, video_source):
        self.window = window
        self.window.title(window_title)
        self.video_source = video_source
        self.vid = cv2.VideoCapture(video_source)
        if not self.vid.isOpened():
            raise ValueError("Unable to open video source", video_source)
        self.width = self.vid.get(cv2.CAP_PROP_FRAME_WIDTH)
        self.height = self.vid.get(cv2.CAP_PROP_FRAME_HEIGHT)
        self.canvas = Canvas(window, width=self.vid.get(cv2.CAP_PROP_FRAME_WIDTH), height=self.vid.get(cv2.CAP_PROP_FRAME_HEIGHT))
        #self.canvas = Canvas(window, width=self.width, height=self.height)
        self.canvas.pack()
        self.delay = 1
        self.update()
        self.window.mainloop()
    def update(self):
        ret, frame = self.vid.read()
        if ret:
            processed_frame = self.process_frame(frame)
            self.photo = ImageTk.PhotoImage(image=Image.fromarray(processed_frame))
            self.canvas.create_image(0, 0, image=self.photo, anchor=NW)
        self.window.after(self.delay, self.update)
    def process_frame(self, frame):
        res = model(frame)
        annotated_img = res[0].plot()
        results_frames = cv2.cvtColor(annotated_img, cv2.COLOR_BGR2RGB)
        return results_frames
    def __del__(self):
        if self.vid.isOpened():
            self.vid.release()
App(Tk(), "Tkinter and OpenCV", video_source=0)
The result is as follows:
YOLOv8 also supports classification, pose detection, face detection, semantic segmentation, and other image-processing tasks; you only need to download the corresponding model file, as in the sketch below.
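For example, switching tasks is just a matter of swapping the weight file (these are the official ultralytics weight names for the n-size models):
from ultralytics import YOLO
seg_model = YOLO(model="yolov8n-seg.pt")    # instance segmentation
pose_model = YOLO(model="yolov8n-pose.pt")  # pose estimation
cls_model = YOLO(model="yolov8n-cls.pt")    # image classification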
Speech-recognition
SpeechRecognition is a popular library in the Python ecosystem designed to help developers handle speech-recognition tasks programmatically. It supports multiple speech-recognition engines and APIs, and can recognize speech from audio files or a live microphone and convert it to text. The library is very easy to use and suits a wide range of speech-recognition tasks such as voice command control and automatic caption generation.
Installation:
~/python3.10/bin/python3.10 -m pip install speechrecognition
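The recognize_sphinx call used below performs offline recognition and additionally requires the pocketsphinx package:
~/python3.10/bin/python3.10 -m pip install pocketsphinx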
The following code builds a Tk GUI for choosing the target language and converting the speech in a WAV file into text in that language:
import tkinter as tk
from tkinter import filedialog, ttk, messagebox
import speech_recognition as sr
import os
def open_results_folder():
    current_file_directory = os.path.dirname(os.path.abspath(__file__))
    try:
        abspath = os.path.join(current_file_directory, "results")
        os.system(f"xdg-open '{abspath}'")
    except Exception as e:
        messagebox.showerror("Error", f"Could not open folder: {e}")
def show_help():
    help_window = tk.Toplevel(app)
    help_window.title("Help")
    help_window.geometry("300x150")
    help_text = tk.Text(help_window, height=8, width=40, bg="white", fg="black")
    help_text.pack()
    help_message = "Here is how to use this application:\n1. Select a WAV file.\n2. Choose a language.\n3. Click 'Convert to Text'.\n4. View the results or open the results folder."
    help_text.insert(tk.END, help_message)
def on_right_click(event):
    try:
        popup_menu.tk_popup(event.x_root, event.y_root)
    finally:
        popup_menu.grab_release()
def convert_to_text():
    try:
        language = lang_var.get()
        audio_file = file_var.get()
        r = sr.Recognizer()
        with sr.AudioFile(audio_file) as source:
            audio = r.record(source)
        text = r.recognize_sphinx(audio, language=language)
        text_box.delete("1.0", tk.END)
        text_box.insert(tk.END, text)
        with open(f'./results/{os.path.basename(audio_file).replace(".wav", ".txt")}', 'w') as f:
            f.write(text)
    except Exception as e:
        messagebox.showerror("Error", f"Could not convert audio: {e}")
def select_file():
    filename = filedialog.askopenfilename(initialdir='./datasets', title="Select a file", filetypes=[("WAV files", "*.wav")])
    file_var.set(filename)
app = tk.Tk()
app.title("STT_v1")
app.geometry("305x400")
#app.iconbitmap('speech2txt.ico')
app.configure(background="white")
file_var = tk.StringVar()
lang_var = tk.StringVar()
lang_var.set('zh-CN')
style = ttk.Style()
style.configure("TButton", background="white", foreground="black")
open_file_btn = ttk.Button(app, text="Select WAV File", command=select_file)
open_file_btn.grid(row=0, column=0, padx=10, pady=10, sticky="ew")
language_options = ['zh-CN', 'en-US', 'fr-FR', 'ru-RU', 'de-DE']
lang_menu = ttk.Combobox(app, textvariable=lang_var, values=language_options)
lang_menu.grid(row=1, column=0, padx=10, pady=10, sticky="ew")
confirm_btn = ttk.Button(app, text="Convert to Text", command=convert_to_text)
confirm_btn.grid(row=2, column=0, padx=10, pady=10, sticky="ew")
open_results_btn = ttk.Button(app, text="Open Results Folder", command=open_results_folder)
open_results_btn.grid(row=3, column=0, padx=10, pady=10, sticky="ew")
text_box = tk.Text(app, height=15, width=40, bg="white", fg="black")
text_box.grid(row=4, column=0, padx=10, pady=10, sticky="ew")
popup_menu = tk.Menu(app, tearoff=0)
popup_menu.add_command(label="Help", command=show_help)
app.bind("<Button-3>", on_right_click)
app.mainloop()
Because recognition defaults to English, you need to download additional language models into pocketsphinx's model directory; they are available at: CMU Sphinx - Browse /Acoustic and Language Models at SourceForge.net
Depth-anything
Depth-Anything is an open-source depth-estimation project that infers scene depth from a single image. Depth estimation is used in areas such as autonomous driving, augmented reality (AR), and robot vision to help a system understand distances and spatial structure in a scene.
Clone the project from GitHub:
git clone https://github.com/LiheYoung/Depth-Anything.git
Open a terminal in the downloaded project directory and install the dependencies:
~/python3.10/bin/python3.10 -m pip install -r requirements.txt
Downgrade huggingface_hub:
~/python3.10/bin/python3.10 -m pip install huggingface_hub==0.12.1
Write the core code and run inference on the CPU. self.depth_anything loads the pretrained model from the given model path and switches it to evaluation mode (eval()); self.model_path is the path to the depth-estimation model, which comes in s, m, and l variants (small, medium, large) and can be downloaded from Hugging Face: LiheYoung/Depth-Anything at main (huggingface.co)
import tkinter as tk
from PIL import Image, ImageTk
import cv2
import numpy as np
import torch
import torch.nn.functional as F
from torchvision.transforms import Compose
from depth_anything.dpt import DepthAnything
from depth_anything.util.transform import Resize, NormalizeImage, PrepareForNet
self.model_path = './models/depth_anything_vits14'
self.DEVICE = 'cpu'
self.depth_anything = DepthAnything.from_pretrained(self.model_path, local_files_only=True).to(self.DEVICE)
self.depth_anything.eval()
Write the complete code in the project directory:
import tkinter as tk
from PIL import Image, ImageTk
import cv2
import numpy as np
import torch
import torch.nn.functional as F
from torchvision.transforms import Compose
from depth_anything.dpt import DepthAnything
from depth_anything.util.transform import Resize, NormalizeImage, PrepareForNet
class DepthEstimationApp:
    def __init__(self, root, image_path):
        self.root = root
        self.root.title("Depth Estimation")
        self.left_frame = tk.Frame(self.root)
        self.right_frame = tk.Frame(self.root)
        self.left_frame.pack(side=tk.LEFT)
        self.right_frame.pack(side=tk.RIGHT)
        self.original_label = tk.Label(self.left_frame)
        self.processed_label = tk.Label(self.right_frame)
        self.original_label.pack()
        self.processed_label.pack()
        self.model_path = './models/depth_anything_vits14'
        self.DEVICE = 'cpu'
        self.depth_anything = DepthAnything.from_pretrained(self.model_path, local_files_only=True).to(self.DEVICE)
        self.depth_anything.eval()
        self.transform = Compose([
            Resize(width=256, height=256, resize_target=False, keep_aspect_ratio=True, ensure_multiple_of=14,
                   resize_method='lower_bound', image_interpolation_method=cv2.INTER_LINEAR),
            NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            PrepareForNet(),
        ])
        self.image_path = image_path
        self.show_image()
    def show_image(self):
        frame = cv2.imread(self.image_path)
        if frame is not None:
            original_height, original_width = frame.shape[:2]
            original_img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            original_img = Image.fromarray(original_img)
            original_imgtk = ImageTk.PhotoImage(image=original_img)
            self.original_label.imgtk = original_imgtk
            self.original_label.configure(image=original_imgtk)
            depth_image = self.process_frame(frame)
            depth_image = cv2.resize(depth_image, (original_width, original_height))
            depth_image = cv2.cvtColor(depth_image, cv2.COLOR_BGR2RGB)
            processed_img = Image.fromarray(depth_image)
            processed_imgtk = ImageTk.PhotoImage(image=processed_img)
            self.processed_label.imgtk = processed_imgtk
            self.processed_label.configure(image=processed_imgtk)
    def process_frame(self, frame):
        frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)
        raw_image = frame
        image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB) / 255.0
        h, w = image.shape[:2]
        image = self.transform({'image': image})['image']
        image = torch.from_numpy(image).unsqueeze(0).to(self.DEVICE)
        with torch.no_grad():
            depth = self.depth_anything(image)
        depth = F.interpolate(depth[None], (h, w), mode='bilinear', align_corners=False)[0, 0]
        depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
        depth = depth.cpu().numpy().astype(np.uint8)
        depth_color = cv2.applyColorMap(depth, cv2.COLORMAP_INFERNO)
        return depth_color
    def close(self):
        pass
if __name__ == '__main__':
    image_path = "/home/jetson/Workspace/Py310Script/DepthAnything-master/sitgirl.png"
    root = tk.Tk()
    app = DepthEstimationApp(root, image_path)
    root.mainloop()
The resulting output:
Stable-Diffusion
The following commands attempt to deploy stable-diffusion. The Jetson Nano's performance is generally not enough for it, so deploying on a Jetson NX is recommended instead.
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd ./stable-diffusion-webui/
~/Workspace/py310scripts/bin/python3.10 -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt
pip install --upgrade pip
./webui.sh --skip-torch-cuda-test
export HF_ENDPOINT=https://hf-mirror.com
./webui.sh --skip-torch-cuda-test
Summary
This article walks through setting up a Jetson Nano from scratch with a set of basic libraries and tools, so that when you develop a specific project these tools on the Jetson help you get twice the result with half the effort.