cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Orin"
CUDA Driver Version / Runtime Version 11.4 / 11.4
CUDA Capability Major/Minor version number: 8.7
Total amount of global memory: 30593 MBytes (32079273984 bytes)
(014) Multiprocessors, (128) CUDA Cores/MP: 1792 CUDA Cores
GPU Max Clock rate: 930 MHz (0.93 GHz)
Memory Clock rate: 930 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 167936 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS
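The derived figures in the deviceQuery output can be recomputed from the raw values as a quick sanity check. The snippet below is only an illustrative sketch: the variable names are made up for this example, and the numbers are copied by hand from the output above.

```shell
# Raw values copied from the deviceQuery output above (names are illustrative).
sm_count=14                  # "(014) Multiprocessors"
cores_per_sm=128             # "(128) CUDA Cores/MP"
global_mem_bytes=32079273984 # "Total amount of global memory"
l2_bytes=4194304             # "L2 Cache Size"
smem_per_sm_bytes=167936     # "Total shared memory per multiprocessor"

# Integer arithmetic is enough here; deviceQuery truncates MBytes the same way.
total_cores=$(( sm_count * cores_per_sm ))
global_mem_mib=$(( global_mem_bytes / 1048576 ))
l2_mib=$(( l2_bytes / 1048576 ))
smem_per_sm_kib=$(( smem_per_sm_bytes / 1024 ))

echo "CUDA cores:            $total_cores"
echo "Global memory (MiB):   $global_mem_mib"
echo "L2 cache (MiB):        $l2_mib"
echo "Shared mem / SM (KiB): $smem_per_sm_kib"
```

The results (1792 cores, 30593 MiB) match the values reported by deviceQuery, which also shows that its "MBytes" are counted in units of 1048576 bytes.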
sudo apt-get install -y sysbench
sysbench cpu --threads=1 run
CPU speed:
events per second: 2615.41
General statistics:
total time: 10.0001s
total number of events: 26158
Multi-core performance test
# Install Geekbench
wget https://cdn.geekbench.com/Geekbench-5.4.1-LinuxARMPreview.tar.gz
tar xvf Geekbench-5.4.1-LinuxARMPreview.tar.gz
cd Geekbench-5.4.1-LinuxARMPreview
# Run the test (results are uploaded to the official website automatically)
./geekbench5
# Result
https://browser.geekbench.com/v5/cpu/23371424
Memory bandwidth test
t@t-desktop:/usr/local/cuda/samples/1_Utilities/bandwidthTest$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: Orin
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 18.1
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 26.2
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 114.5
Result = PASS
Disk performance test
# Sequential write test (creates a 1 GiB test file)
dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.73304 s, 187 MB/s
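The throughput dd reports is simply bytes copied divided by elapsed time, in decimal megabytes (1 MB = 1000000 bytes). As a minimal sketch, the figure can be reproduced with awk; the variable names are illustrative and the inputs are copied from the dd output above.

```shell
# Inputs copied from the dd summary line above (names are illustrative).
bytes=1073741824
seconds=5.73304

# dd reports decimal MB/s; the binary MiB/s value is shown for comparison.
mb_per_s=$(awk -v b="$bytes" -v s="$seconds" 'BEGIN { printf "%.2f", b / s / 1000000 }')
mib_per_s=$(awk -v b="$bytes" -v s="$seconds" 'BEGIN { printf "%.2f", b / s / 1048576 }')

echo "Sequential write: $mb_per_s MB/s ($mib_per_s MiB/s)"
```

This gives about 187.29 MB/s, which dd rounds to 187 MB/s. The MiB/s value is the one to compare against sysbench, which reports in MiB/s.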
# Random read/write test (requires sysbench)
sudo apt-get install -y sysbench
sysbench fileio --file-total-size=2G prepare
2147483648 bytes written in 17.01 seconds (120.38 MiB/sec).
sysbench fileio --file-test-mode=rndrw run
sudo sysbench fileio --file-test-mode=rndrw run
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Extra file open flags: (none)
128 files, 16MiB each
2GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...
Threads started!
File operations:
reads/s: 1211.30
writes/s: 807.53
fsyncs/s: 2592.01
Throughput:
read, MiB/s: 18.93
written, MiB/s: 12.62
General statistics:
total time: 10.0047s
total number of events: 46007
Latency (ms):
min: 0.00
avg: 0.22
max: 76.12
95th percentile: 0.35
sum: 9979.21
Threads fairness:
events (avg/stddev): 46007.0000/0.00
execution time (avg/stddev): 9.9792/0.00
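The throughput lines in the sysbench report follow from the operation rates and the 16 KiB block size. The sketch below recomputes them with awk; the variable names are illustrative and the inputs are copied from the report above.

```shell
# Inputs copied from the sysbench fileio report above (names are illustrative).
reads_per_s=1211.30
writes_per_s=807.53
block_kib=16

# Throughput in MiB/s = operations per second * block size in KiB / 1024.
read_mib=$(awk -v r="$reads_per_s" -v k="$block_kib" 'BEGIN { printf "%.2f", r * k / 1024 }')
write_mib=$(awk -v w="$writes_per_s" -v k="$block_kib" 'BEGIN { printf "%.2f", w * k / 1024 }')

# The measured read/write mix should match the configured ratio of 1.50.
rw_ratio=$(awk -v r="$reads_per_s" -v w="$writes_per_s" 'BEGIN { printf "%.2f", r / w }')

echo "read:  $read_mib MiB/s"
echo "write: $write_mib MiB/s"
echo "read/write ratio: $rw_ratio"
```

The recomputed values (18.93 MiB/s read, 12.62 MiB/s written, ratio 1.50) agree with the report. Random 16 KiB I/O with periodic fsync is therefore roughly an order of magnitude slower than the sequential write measured with dd.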