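Orange Pi 5 Ultra (1): Getting Started Notes

0 Hardware overview

The Orange Pi 5 Ultra is built around the Rockchip RK3588, an eight-core 64-bit ARM processor with four Cortex-A76 cores and four Cortex-A55 cores, fabricated on Samsung's 8nm LP process. The big cores run at up to 2.4GHz. The SoC integrates an ARM Mali-G610 MP4 GPU with embedded 3D and 2D graphics acceleration, an NPU rated at up to 6 TOPS, and display processing for up to 8K.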

[Figure: board interface layout]

[Figure: GPIO pinout]

All four mounting holes are 2.7mm in diameter.

1 Flashing the system image

This is the usual procedure.

Flashing tool: balenaEtcher, https://www.balena.io/etcher/

The TF (microSD) card must be 16GB or larger and rated Class 10 or faster.

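2 First boot

Insert the TF card with the flashed image into the TF card slot on the board.

The board has an HDMI port, so an ordinary HDMI-to-HDMI cable connects it to a TV or HDMI monitor. Plug in a USB mouse and keyboard to control the board.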

3 Installing cooling and an M.2 SSD

  • The board has an M.2 slot wired for PCIe Gen3 x4, which accepts an M.2 SSD using the NVMe/SATA protocol.


3.1 Mounting the M.2 SSD automatically at boot
# List block devices
lsblk
# NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
# mtdblock0    31:0    0   16M  0 disk
# mmcblk1     179:0    0 29.7G  0 disk
# ├─mmcblk1p1 179:1    0    1G  0 part /boot
# └─mmcblk1p2 179:2    0 28.4G  0 part /var/log.hdd
#                                      /
# zram0       254:0    0  7.8G  0 disk [SWAP]
# zram1       254:1    0  200M  0 disk /var/log
# nvme0n1     259:0    0  1.9T  0 disk

The 1.9T nvme0n1 device is the newly installed SSD.

If the SSD needs to be partitioned, use fdisk:

sudo fdisk /dev/nvme0n1

Follow the prompts to create the partitions you need. Here I created a single partition:

lsblk

NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
mtdblock0    31:0    0   16M  0 disk
mmcblk1     179:0    0 29.7G  0 disk
├─mmcblk1p1 179:1    0    1G  0 part /boot
└─mmcblk1p2 179:2    0 28.4G  0 part /var/log.hdd
                                     /
zram0       254:0    0  7.8G  0 disk [SWAP]
zram1       254:1    0  200M  0 disk /var/log
nvme0n1     259:0    0  1.9T  0 disk
└─nvme0n1p1 259:1    0  1.9T  0 part

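If you do not need partitions, skip that step and format the whole device (/dev/nvme0n1) instead of the partition used below. Type mkfs and press Tab twice to see which filesystem formats are supported: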

mkfs

mkfs         mkfs.cramfs  mkfs.ext2    mkfs.ext4    mkfs.minix   mkfs.ntfs
mkfs.bfs     mkfs.exfat   mkfs.ext3    mkfs.fat     mkfs.msdos   mkfs.vfat

Several formats are available; I chose ext4:

sudo mkfs.ext4 /dev/nvme0n1p1

mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 500099414 4k blocks and 125026304 inodes
Filesystem UUID: 8b60d807-81c4-4061-b947-bfea65909117
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done

Formatting is now complete.

Create a mount point:
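As a quick sanity check, the block count printed by mke2fs can be converted into a size and compared with what lsblk reports. The sketch below is only illustrative: `fs_size_tib` is a hypothetical helper name, and it assumes the "4k blocks" in the output are 4096 bytes each and that lsblk uses binary units (powers of 1024).

```shell
#!/bin/sh
# Hypothetical helper: convert a block count and block size into TiB.
# Assumes binary units (1 TiB = 1024^4 bytes).
fs_size_tib() {
    blocks="$1"
    block_size="$2"
    LC_ALL=C awk -v b="$blocks" -v s="$block_size" \
        'BEGIN { printf "%.2f\n", b * s / (1024 * 1024 * 1024 * 1024) }'
}

# Values taken from the mke2fs output: 500099414 blocks of 4096 bytes.
fs_size_tib 500099414 4096
```

The result is about 1.86 TiB, which is consistent with the 1.9T that lsblk shows for the partition.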

sudo mkdir -p /mnt/nvme0

Mount it:

# One-time mount (does not persist across reboots)
sudo mount /dev/nvme0n1p1 /mnt/nvme0

Use df -h to check that the mount succeeded.

3.1.1 Enabling automatic mounting at boot

First look up the UUID:

blkid

/dev/nvme0n1p1: UUID="8b60d807-81c4-4061-b947-bfea65909117" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="a5bd4a72-01"

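Copy the UUID of /dev/nvme0n1p1 and append an entry for it to /etc/fstab. The redirection has to run as root, so pipe through sudo tee rather than using sudo echo:

echo "UUID=8b60d807-81c4-4061-b947-bfea65909117 /mnt/nvme0 ext4 defaults 0 0" | sudo tee -a /etc/fstab

Mount every filesystem defined in /etc/fstab:

sudo mount -a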
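Appending to /etc/fstab twice leaves duplicate entries, so it can help to check first. The following is a minimal sketch of that idea, not a vetted tool: `fstab_line` and `has_fstab_uuid` are hypothetical helper names, and the mount options simply mirror the entry used in this post.

```shell
#!/bin/sh
# Hypothetical helpers for building an fstab entry and avoiding duplicates.

# Print an fstab line for the given UUID, mount point and filesystem type.
fstab_line() {
    printf 'UUID=%s %s %s defaults 0 0\n' "$1" "$2" "$3"
}

# Succeed if the given fstab-style file already has an entry for the UUID.
has_fstab_uuid() {
    grep -q "^UUID=$2[[:space:]]" "$1" 2>/dev/null
}

# Example usage (commented out because it modifies /etc/fstab):
#   uuid=$(sudo blkid -s UUID -o value /dev/nvme0n1p1)
#   has_fstab_uuid /etc/fstab "$uuid" ||
#       fstab_line "$uuid" /mnt/nvme0 ext4 | sudo tee -a /etc/fstab
```

Reading the UUID with `blkid -s UUID -o value` avoids copying it by hand, and the check makes the step safe to repeat.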

Change the permissions of the mount directory:

sudo chmod 777 -R /mnt/nvme0/

Finally, reboot to confirm that the mount persists.

3.1.2 Disk read/write test

Change into the directory where the disk is mounted:
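After the reboot, the check can be scripted instead of read off df -h by eye. This is a small sketch under the assumption that the mount point is /mnt/nvme0 as above; `is_mounted_at` is a hypothetical helper that looks for the directory in the second field of /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR appears as a mount point in a
# mounts table (defaults to /proc/mounts; field 2 is the mount point).
is_mounted_at() {
    dir="$1"
    mounts="${2:-/proc/mounts}"
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"
}

if is_mounted_at /mnt/nvme0; then
    echo "/mnt/nvme0 is mounted"
else
    echo "/mnt/nvme0 is NOT mounted, check /etc/fstab"
fi
```

The optional second argument exists only so the function can be tried against a sample file rather than the live system.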

cd /mnt/nvme0

Drop the page cache:

sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"

Write test:

sudo dd if=/dev/zero of=./test_write count=2000 bs=1024k

# 2000+0 records in
# 2000+0 records out
# 2097152000 bytes (2.1 GB, 2.0 GiB) copied, 1.44551 s, 1.5 GB/s

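The write speed here is 1.5GB/s. Because the command does not use conv=fdatasync or oflag=direct, part of this figure may reflect the page cache rather than the SSD itself.

Read test:

# Drop the page cache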
sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"
sudo dd if=./test_write of=/dev/null count=2000 bs=1024k

2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 1.18762 s, 1.8 GB/s

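The read speed here is 1.8GB/s. Be sure to drop the cache first; otherwise the file is served from the page cache and the reported read speed is unrealistically high.

  • The board has a PH1.25 fan connector, so an active cooler can be installed.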
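The rates dd prints can be reproduced from the byte count and elapsed time in its summary line. The sketch below assumes decimal units (1 GB = 10^9 bytes), which is what dd uses for its "GB/s" figure; `dd_rate_gbs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: bytes and seconds in, GB/s (decimal) out.
dd_rate_gbs() {
    bytes="$1"
    seconds="$2"
    LC_ALL=C awk -v b="$bytes" -v t="$seconds" \
        'BEGIN { printf "%.1f\n", b / t / 1000000000 }'
}

dd_rate_gbs 2097152000 1.44551   # write test above, prints 1.5
dd_rate_gbs 2097152000 1.18762   # read test above, prints 1.8
```

This is handy when comparing several runs, since only the raw byte count and time need to be recorded.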


3.2 Configuring the fan cooling policy
# Show the system temperature sensors
sensors
# npu_thermal-virtual-0
# Adapter: Virtual device
# temp1:        +44.4°C  (crit = +115.0°C)

# center_thermal-virtual-0
# Adapter: Virtual device
# temp1:        +44.4°C  (crit = +115.0°C)

# bigcore1_thermal-virtual-0
# Adapter: Virtual device
# temp1:        +44.4°C  (crit = +115.0°C)

# soc_thermal-virtual-0
# Adapter: Virtual device
# temp1:        +44.4°C  (crit = +115.0°C)

# gpu_thermal-virtual-0
# Adapter: Virtual device
# temp1:        +43.5°C  (crit = +115.0°C)

# littlecore_thermal-virtual-0
# Adapter: Virtual device
# temp1:        +44.4°C  (crit = +115.0°C)

# bigcore0_thermal-virtual-0
# Adapter: Virtual device
# temp1:        +44.4°C  (crit = +115.0°C)

# Show the current temperature of the NVMe SSD
sudo smartctl -a /dev/nvme0 | grep "Temperature:"
# Temperature:                        36 Celsius
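If the sensors command is not installed, the same readings are usually available through the kernel's sysfs thermal interface. The sketch below assumes the standard /sys/class/thermal/thermal_zone*/ layout, where temp is reported in millidegrees Celsius; `millideg_to_c` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: convert millidegrees Celsius to degrees, one decimal.
millideg_to_c() {
    LC_ALL=C awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# Print every readable thermal zone as "<type>: <temperature> C".
for zone in /sys/class/thermal/thermal_zone*; do
    [ -r "$zone/temp" ] || continue
    printf '%s: %s C\n' "$(cat "$zone/type")" "$(millideg_to_c "$(cat "$zone/temp")")"
done
```

On this board the zone types should correspond to the names listed by sensors (for example soc-thermal or gpu-thermal), though the exact spelling depends on the kernel.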

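It is normal for the fan to stay off right after boot: the CPU temperature is usually below 50°C at that point, and by default the fan only starts once the CPU reaches 50°C.

The following command loads every CPU core, after which the fan should start spinning: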

for i in $(seq 0 $(( $(nproc --all) - 1)) ); do (taskset -c $i yes > /dev/null &); done

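The fan's speed and on/off state are controlled through PWM, using the PWM3_IR_M1 pin. By default Linux uses the pwm-fan driver to control the fan, with the device tree (DTS) configuration shown below:

Device tree file: kernel/arch/arm64/boot/dts/rockchip/rk3588-orangepi-5-plus.dts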

fan: pwm-fan {
    compatible = "pwm-fan";
    #cooling-cells = <2>;
    pwms = <&pwm3 0 50000 0>;
    cooling-levels = <0 50 100 150 200 255>;
    rockchip,temp-trips = <
        42000 1
        48000 2
        55000 3
        62000 4
        68000 5
    >;

    status = "okay";
};
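To get a feel for this table, the sketch below maps a temperature to a cooling level and PWM value. It rests on an assumption about the vendor property: that each `rockchip,temp-trips` pair means "at or above this temperature (in millidegrees Celsius), use this index into cooling-levels", with no hysteresis. `fan_level_for` and `fan_pwm_for` are hypothetical names, and the real driver behavior, as well as the device tree the board actually loads, may differ.

```shell
#!/bin/sh
# Hypothetical model of the trip table quoted above (assumed semantics:
# the highest trip whose threshold has been reached selects the level).

# Print the cooling level (0-5) for a temperature in millidegrees Celsius.
fan_level_for() {
    temp="$1"
    level=0
    for trip in 42000:1 48000:2 55000:3 62000:4 68000:5; do
        if [ "$temp" -ge "${trip%%:*}" ]; then
            level="${trip##*:}"
        fi
    done
    echo "$level"
}

# Print the PWM value from cooling-levels = <0 50 100 150 200 255>.
fan_pwm_for() {
    level=$(fan_level_for "$1")
    set -- 0 50 100 150 200 255
    shift "$level"      # drop the first $level entries
    echo "$1"           # the entry at index $level
}

fan_pwm_for 50000   # level 2 under the assumed semantics, prints 100
```

Editing the trip temperatures or PWM values in this model is a cheap way to reason about a quieter or more aggressive fan curve before touching the DTS.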

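4 Basic tests

4.1 Power consumption
  • Idle with the SSD installed: 6W
  • Big cores at full load: 6W
  • Little cores at full load: 1W
  • NVMe at full load: 7W
  • Single network adapter at full load: 0.25W (AX210)
  • Cooling fan: 0.5W
4.2 Selected test records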

Basic information:

neofetch

            .-/+oossssoo+/-.               orangepi@orangepi5ultra
        `:+ssssssssssssssssss+:`           -----------------------
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 22.04.5 LTS aarch64
    .ossssssssssssssssssdMMMNysssso.       Host: RK3588 OPi 5 Ultra
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 6.1.43-rockchip-rk3588
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 14 hours, 27 mins
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Packages: 1746 (dpkg)
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Shell: bash 5.1.16
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Resolution: 3520x2557
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   Theme: Adwaita [GTK3]
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   Icons: Adwaita [GTK3]
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Terminal: /dev/pts/0
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   CPU: (8) @ 1.800GHz
 /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/    Memory: 796MiB / 15964MiB
  +sssssssssdmydMMMMMMMMddddyssssssss+
   /ssssssssssshdmNNNNmyNMMMMhssssss/
    .ossssssssssssssssssdMMMNysssso.
      -+sssssssssssssssssyyyssss+-
        `:+ssssssssssssssssss+:`
            .-/+oossssoo+/-.
4.2.1 sbc-bench
git clone https://github.com/ThomasKaiser/sbc-bench.git
cd sbc-bench
sudo ./sbc-bench.sh

Main results:

# Current system state
Status of performance related governors found below /sys (w/o cpufreq):
dmc: dmc_ondemand / 2400 MHz (rknpu_ondemand dmc_ondemand userspace powersave performance simple_ondemand / 534 1320 1968 2400)
fb000000.gpu: simple_ondemand / 300 MHz (rknpu_ondemand dmc_ondemand userspace powersave performance simple_ondemand / 300 400 500 600 700 800 900 1000)
fdab0000.npu: rknpu_ondemand / 1000 MHz (rknpu_ondemand dmc_ondemand userspace powersave performance simple_ondemand / 300 400 500 600 700 800 900 1000)

sbc-bench v0.9.71


# Results validation
Results validation:

  * Measured clockspeed not lower than advertised max CPU clockspeed
  * No swapping
  * Background activity (%system) OK
  * Too much other background activity: 0% avg, 4% max -> https://tinyurl.com/mr2wy5uv
  * No throttling


# Memory performance, measured separately on each of the three CPU clusters
Memory performance (all 3 CPU clusters measured individually):
memcpy: 6599.2 MB/s (Cortex-A55)
memset: 21712.5 MB/s (Cortex-A55)
memcpy: 12925.7 MB/s (Cortex-A76)
memset: 27424.8 MB/s (Cortex-A76)
memcpy: 12844.3 MB/s (Cortex-A76)
memset: 27290.9 MB/s (Cortex-A76)


# 7-zip scores
7-zip total scores (3 consecutive runs): 15639,15694,15626, single-threaded: 2888


# OpenSSL scores
OpenSSL results (all 3 CPU clusters measured individually):
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     157907.60k   466907.84k   913189.29k  1211602.94k  1335563.61k  1345901.91k (Cortex-A55)
aes-128-cbc     637695.14k  1269587.78k  1627369.30k  1741073.41k  1785804.12k  1790377.98k (Cortex-A76)
aes-128-cbc     636491.22k  1270735.13k  1630370.90k  1745556.48k  1790883.16k  1795375.10k (Cortex-A76)
aes-192-cbc     150707.76k   417121.37k   746944.17k   933191.00k  1005458.77k  1011433.47k (Cortex-A55)
aes-192-cbc     593616.55k  1116556.37k  1378340.10k  1448504.66k  1489854.46k  1492833.62k (Cortex-A76)
aes-192-cbc     595237.65k  1113902.61k  1381619.88k  1452472.66k  1493748.39k  1496771.24k (Cortex-A76)
aes-256-cbc     146073.90k   382410.11k   645469.95k   780004.01k   829835.95k   833869.14k (Cortex-A55)
aes-256-cbc     575753.51k   986583.40k  1192946.60k  1254034.43k  1277599.74k  1279923.54k (Cortex-A76)
aes-256-cbc     574328.99k   989858.52k  1196424.45k  1257626.28k  1281086.81k  1283407.87k (Cortex-A76)


# System information
Unable to upload full test results. Please copy&paste the below stuff to pastebin.com and
provide the URL. Check the output for throttling and swapping please.


sbc-bench v0.9.71 RK3588 OPi 5 Ultra (Fri, 07 Mar 2025 23:43:38 +0800)

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.5 LTS
Release:        22.04
Codename:       jammy
Build system:   https://github.com/orangepi-xunlong/orangepi-build, 1.0.0, Orange Pi 5 Ultra, rockchip-rk3588, rockchip-rk3588

/usr/bin/gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Uptime: 23:43:38 up 14:24,  6 users,  load average: 1.38, 1.21, 1.10,  42.5°C,  105341910

Linux 6.1.43-rockchip-rk3588 (orangepi5ultra)   03/07/25        _aarch64_       (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.22    0.00    0.43    0.01    0.00   99.34

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
mmcblk1           0.35        12.28         4.81         0.00     637150     249217          0
nvme0n1           0.04         0.19         2.42         0.00       9717     125508          0
zram0             0.01         0.04         0.00         0.00       2248          4          0
zram1             0.07         0.02         0.55         0.00       1124      28508          0

               total        used        free      shared  buff/cache   available
Mem:            15Gi       709Mi        14Gi        13Mi       341Mi        14Gi
Swap:          7.8Gi          0B       7.8Gi



# Detailed results

# CPU frequency test
##########################################################################

Checking cpufreq OPP for cpu0-cpu3 (Cortex-A55):

Cpufreq OPP: 1800    Measured: 1787 (1787.703/1787.681/1787.614)
Cpufreq OPP: 1608    Measured: 1600 (1600.550/1600.490/1600.370)
Cpufreq OPP: 1416    Measured: 1402 (1402.854/1401.707/1401.690)
Cpufreq OPP: 1200    Measured: 1190 (1190.608/1190.608/1190.489)
Cpufreq OPP: 1008    Measured:  944    (944.281/944.210/944.175)     (-6.3%)
Cpufreq OPP:  816    Measured:  755    (755.260/755.260/755.063)     (-7.5%)
Cpufreq OPP:  600    Measured:  591    (591.393/591.319/591.282)     (-1.5%)
Cpufreq OPP:  408    Measured:  393    (393.473/393.414/393.272)     (-3.7%)

Checking cpufreq OPP for cpu4-cpu5 (Cortex-A76):

Cpufreq OPP: 2256    Measured: 2254 (2254.173/2254.117/2254.060)
Cpufreq OPP: 2208    Measured: 2230 (2230.562/2230.507/2230.423)
Cpufreq OPP: 2016    Measured: 2021 (2021.858/2021.808/2021.757)
Cpufreq OPP: 1800    Measured: 1853 (1853.985/1853.985/1853.870)     (+2.9%)
Cpufreq OPP: 1608    Measured: 1623 (1623.488/1623.406/1623.305)
Cpufreq OPP: 1416    Measured: 1408 (1408.440/1408.440/1408.282)
Cpufreq OPP: 1200    Measured: 1101 (1101.257/1101.175/1101.037)     (-8.3%)
Cpufreq OPP: 1008    Measured:  925    (925.898/925.851/925.840)     (-8.2%)
Cpufreq OPP:  816    Measured:  742    (742.202/742.174/742.156)     (-9.1%)
Cpufreq OPP:  600    Measured:  593    (593.208/593.208/593.186)     (-1.2%)
Cpufreq OPP:  408    Measured:  395    (395.238/395.238/395.199)     (-3.2%)

Checking cpufreq OPP for cpu6-cpu7 (Cortex-A76):

Cpufreq OPP: 2256    Measured: 2261 (2261.224/2261.196/2261.167)
Cpufreq OPP: 2208    Measured: 2237 (2237.741/2237.685/2237.685)     (+1.3%)
Cpufreq OPP: 2016    Measured: 2001 (2001.445/2001.370/2001.270)
Cpufreq OPP: 1800    Measured: 1830 (1830.725/1830.564/1830.519)     (+1.7%)
Cpufreq OPP: 1608    Measured: 1632 (1632.412/1632.249/1632.228)     (+1.5%)
Cpufreq OPP: 1416    Measured: 1418 (1418.126/1418.108/1418.055)
Cpufreq OPP: 1200    Measured: 1109 (1110.045/1109.975/1109.892)     (-7.6%)
Cpufreq OPP: 1008    Measured:  933    (933.992/933.992/933.969)     (-7.4%)
Cpufreq OPP:  816    Measured:  749    (749.262/749.177/749.140)     (-8.2%)
Cpufreq OPP:  600    Measured:  593    (593.140/593.140/593.103)     (-1.2%)
Cpufreq OPP:  408    Measured:  395    (395.189/395.189/395.179)     (-3.2%)


# benchmark
##########################################################################

Executing benchmark on cpu0 (Cortex-A55):

tinymembench v0.4.9-nuumio (simple benchmark for memory throughput and latency)

CFLAGS:
bandwidth test min repeats (-b): 2
bandwidth test max repeats (-B): 3
bandwidth test mem realloc (-M): no      (-m for realloc)
      latency test repeats (-l): 3
        latency test count (-c): 1000000

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Test result is the best of repeated runs. Number of repeats  ==
==         is shown in brackets                                         ==
== Note 3: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 4: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 5: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                 :   2720.7 MB/s (3, 5.5%)
 C copy backwards (32 byte blocks)                :   2692.4 MB/s (3, 0.3%)
 C copy backwards (64 byte blocks)                :   2723.4 MB/s (3, 0.4%)
 C copy                                           :   6053.8 MB/s (2)
 C copy prefetched (32 bytes step)                :   2242.4 MB/s (2)
 C copy prefetched (64 bytes step)                :   6268.1 MB/s (2)
 C 2-pass copy                                    :   2645.6 MB/s (3, 0.3%)
 C 2-pass copy prefetched (32 bytes step)         :   1440.8 MB/s (3, 0.3%)
 C 2-pass copy prefetched (64 bytes step)         :   2910.4 MB/s (2)
 C scan 8                                         :    441.8 MB/s (2)
 C scan 16                                        :    876.5 MB/s (2)
 C scan 32                                        :   1735.8 MB/s (2)
 C scan 64                                        :   3411.4 MB/s (2)
 C fill                                           :  12336.9 MB/s (2)
 C fill (shuffle within 16 byte blocks)           :  12340.4 MB/s (2)
 C fill (shuffle within 32 byte blocks)           :  12339.9 MB/s (2)
 C fill (shuffle within 64 byte blocks)           :  12046.4 MB/s (2)
 ---
 libc memcpy copy                                 :   6599.2 MB/s (3, 0.4%)
 libc memchr scan                                 :   2734.5 MB/s (2)
 libc memset fill                                 :  21712.5 MB/s (2)
 ---
 NEON LDP/STP copy                                :   5677.2 MB/s (2)
 NEON LDP/STP copy pldl2strm (32 bytes step)      :   1731.4 MB/s (2)
 NEON LDP/STP copy pldl2strm (64 bytes step)      :   3573.7 MB/s (3, 0.1%)
 NEON LDP/STP copy pldl1keep (32 bytes step)      :   2586.6 MB/s (3)
 NEON LDP/STP copy pldl1keep (64 bytes step)      :   5437.1 MB/s (2)
 NEON LD1/ST1 copy                                :   5454.8 MB/s (2)
 NEON LDP load                                    :   6844.2 MB/s (3, 0.2%)
 NEON LDNP load                                   :   7050.1 MB/s (2)
 NEON STP fill                                    :  21630.1 MB/s (3, 0.2%)
 NEON STNP fill                                   :  14517.3 MB/s (3, 2.2%)
 ARM LDP/STP copy                                 :   5456.9 MB/s (3, 3.2%)
 ARM LDP load                                     :   6610.1 MB/s (3, 1.4%)
 ARM LDNP load                                    :   6570.1 MB/s (3, 2.6%)
 ARM STP fill                                     :  21617.3 MB/s (3, 0.7%)
 ARM STNP fill                                    :  15243.5 MB/s (3, 1.2%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.1 ns          /     0.0 ns
     32768 :    0.5 ns          /     0.9 ns
     65536 :    1.5 ns          /     2.6 ns
    131072 :    4.0 ns          /     6.3 ns
    262144 :    9.0 ns          /    12.0 ns
    524288 :   13.2 ns          /    15.3 ns
   1048576 :   16.0 ns          /    16.6 ns
   2097152 :   25.7 ns          /    25.5 ns
   4194304 :   55.5 ns          /    78.5 ns
   8388608 :  102.6 ns          /   141.1 ns
  16777216 :  130.5 ns          /   162.7 ns
  33554432 :  144.9 ns          /   171.7 ns
  67108864 :  154.3 ns          /   179.4 ns

Executing benchmark on cpu4 (Cortex-A76):

tinymembench v0.4.9-nuumio (simple benchmark for memory throughput and latency)

CFLAGS:
bandwidth test min repeats (-b): 2
bandwidth test max repeats (-B): 3
bandwidth test mem realloc (-M): no      (-m for realloc)
      latency test repeats (-l): 3
        latency test count (-c): 1000000

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Test result is the best of repeated runs. Number of repeats  ==
==         is shown in brackets                                         ==
== Note 3: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 4: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 5: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                 :  11985.9 MB/s (3, 0.2%)
 C copy backwards (32 byte blocks)                :  11998.8 MB/s (3, 0.1%)
 C copy backwards (64 byte blocks)                :  11978.0 MB/s (2)
 C copy                                           :  12503.9 MB/s (2)
 C copy prefetched (32 bytes step)                :  12969.3 MB/s (3, 0.1%)
 C copy prefetched (64 bytes step)                :  12996.2 MB/s (3, 0.2%)
 C 2-pass copy                                    :   4821.6 MB/s (3, 0.5%)
 C 2-pass copy prefetched (32 bytes step)         :   7304.9 MB/s (2)
 C 2-pass copy prefetched (64 bytes step)         :   6461.2 MB/s (2)
 C scan 8                                         :   1116.4 MB/s (2)
 C scan 16                                        :   2233.6 MB/s (2)
 C scan 32                                        :   4468.5 MB/s (2)
 C scan 64                                        :   8917.2 MB/s (2)
 C fill                                           :  27435.4 MB/s (3, 0.8%)
 C fill (shuffle within 16 byte blocks)           :  27462.1 MB/s (3, 0.3%)
 C fill (shuffle within 32 byte blocks)           :  27503.2 MB/s (3, 0.4%)
 C fill (shuffle within 64 byte blocks)           :  27267.6 MB/s (3, 0.2%)
 ---
 libc memcpy copy                                 :  12925.7 MB/s (3, 0.1%)
 libc memchr scan                                 :  14988.3 MB/s (2)
 libc memset fill                                 :  27424.8 MB/s (3, 0.4%)
 ---
 NEON LDP/STP copy                                :  12900.6 MB/s (3, 0.1%)
 NEON LDP/STP copy pldl2strm (32 bytes step)      :  13142.8 MB/s (3)
 NEON LDP/STP copy pldl2strm (64 bytes step)      :  13114.4 MB/s (2)
 NEON LDP/STP copy pldl1keep (32 bytes step)      :  12900.2 MB/s (3, 0.1%)
 NEON LDP/STP copy pldl1keep (64 bytes step)      :  12893.7 MB/s (3, 0.1%)
 NEON LD1/ST1 copy                                :  12771.5 MB/s (3, 0.2%)
 NEON LDP load                                    :  16842.6 MB/s (2)
 NEON LDNP load                                   :  15980.6 MB/s (2)
 NEON STP fill                                    :  27327.4 MB/s (3, 0.3%)
 NEON STNP fill                                   :  27432.6 MB/s (3, 0.8%)
 ARM LDP/STP copy                                 :  12847.1 MB/s (2)
 ARM LDP load                                     :  16339.8 MB/s (2)
 ARM LDNP load                                    :  15212.7 MB/s (3)
 ARM STP fill                                     :  27405.7 MB/s (3, 1.0%)
 ARM STNP fill                                    :  27433.5 MB/s (3, 0.3%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    0.1 ns          /     0.0 ns
    131072 :    1.3 ns          /     1.6 ns
    262144 :    2.8 ns          /     3.0 ns
    524288 :    5.7 ns          /     6.5 ns
   1048576 :   12.1 ns          /    13.4 ns
   2097152 :   25.3 ns          /    18.7 ns
   4194304 :   51.2 ns          /    67.6 ns
   8388608 :  100.9 ns          /   131.4 ns
  16777216 :  126.4 ns          /   152.5 ns
  33554432 :  139.4 ns          /   164.2 ns
  67108864 :  147.8 ns          /   165.2 ns

Executing benchmark on cpu6 (Cortex-A76):

tinymembench v0.4.9-nuumio (simple benchmark for memory throughput and latency)

CFLAGS:
bandwidth test min repeats (-b): 2
bandwidth test max repeats (-B): 3
bandwidth test mem realloc (-M): no      (-m for realloc)
      latency test repeats (-l): 3
        latency test count (-c): 1000000

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Test result is the best of repeated runs. Number of repeats  ==
==         is shown in brackets                                         ==
== Note 3: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 4: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 5: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                 :  11985.9 MB/s (3, 0.2%)
 C copy backwards (32 byte blocks)                :  11929.1 MB/s (2)
 C copy backwards (64 byte blocks)                :  11923.8 MB/s (2)
 C copy                                           :  12425.0 MB/s (3, 0.1%)
 C copy prefetched (32 bytes step)                :  12865.5 MB/s (3, 0.1%)
 C copy prefetched (64 bytes step)                :  12859.6 MB/s (3, 0.1%)
 C 2-pass copy                                    :   4426.1 MB/s (3, 1.0%)
 C 2-pass copy prefetched (32 bytes step)         :   6922.1 MB/s (3, 0.3%)
 C 2-pass copy prefetched (64 bytes step)         :   6330.9 MB/s (2)
 C scan 8                                         :   1116.8 MB/s (2)
 C scan 16                                        :   2233.0 MB/s (2)
 C scan 32                                        :   4469.8 MB/s (2)
 C scan 64                                        :   8922.5 MB/s (2)
 C fill                                           :  27412.3 MB/s (3, 0.7%)
 C fill (shuffle within 16 byte blocks)           :  27450.6 MB/s (3, 0.4%)
 C fill (shuffle within 32 byte blocks)           :  27455.9 MB/s (3, 0.4%)
 C fill (shuffle within 64 byte blocks)           :  27316.2 MB/s (3, 0.2%)
 ---
 libc memcpy copy                                 :  12844.3 MB/s (2)
 libc memchr scan                                 :  14974.8 MB/s (3)
 libc memset fill                                 :  27290.9 MB/s (3, 0.3%)
 ---
 NEON LDP/STP copy                                :  12847.3 MB/s (2)
 NEON LDP/STP copy pldl2strm (32 bytes step)      :  13136.4 MB/s (3, 0.1%)
 NEON LDP/STP copy pldl2strm (64 bytes step)      :  13131.1 MB/s (3)
 NEON LDP/STP copy pldl1keep (32 bytes step)      :  12904.4 MB/s (3, 0.1%)
 NEON LDP/STP copy pldl1keep (64 bytes step)      :  12898.5 MB/s (3, 0.4%)
 NEON LD1/ST1 copy                                :  12755.7 MB/s (2)
 NEON LDP load                                    :  16794.4 MB/s (3)
 NEON LDNP load                                   :  15786.2 MB/s (2)
 NEON STP fill                                    :  27268.1 MB/s (3, 0.3%)
 NEON STNP fill                                   :  27426.0 MB/s (3, 0.3%)
 ARM LDP/STP copy                                 :  12894.1 MB/s (3, 0.2%)
 ARM LDP load                                     :  16378.3 MB/s (2)
 ARM LDNP load                                    :  15103.7 MB/s (2)
 ARM STP fill                                     :  27357.4 MB/s (3, 0.5%)
 ARM STNP fill                                    :  27449.6 MB/s (3, 0.3%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    0.1 ns          /     0.0 ns
    131072 :    1.3 ns          /     1.6 ns
    262144 :    2.8 ns          /     2.9 ns
    524288 :    5.8 ns          /     6.6 ns
   1048576 :   12.1 ns          /    13.4 ns
   2097152 :   21.1 ns          /    16.5 ns
   4194304 :   50.6 ns          /    69.8 ns
   8388608 :   96.2 ns          /   132.4 ns
  16777216 :  125.9 ns          /   152.7 ns
  33554432 :  143.6 ns          /   160.2 ns
  67108864 :  147.2 ns          /   168.5 ns

##########################################################################

Executing ramlat on cpu0 (Cortex-A55), results in ns:

       size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
         4k: 1.700 1.692 1.689 1.688 1.125 1.687 2.285 4.604
         8k: 1.687 1.687 1.687 1.687 1.124 1.687 2.285 4.605
        16k: 1.695 1.687 1.698 1.687 1.131 1.687 2.285 4.604
        32k: 1.716 1.690 1.713 1.689 1.143 1.690 2.290 4.611
        64k: 10.53 11.65 10.53 11.65 10.67 11.65 16.32 29.59
       128k: 13.38 14.83 13.38 14.82 14.04 14.83 21.84 40.83
       256k: 16.17 16.57 16.18 16.54 15.65 16.76 25.85 50.49
       512k: 17.06 17.24 17.07 17.19 16.34 17.40 27.11 54.07
      1024k: 17.17 17.33 17.17 17.36 16.59 17.56 28.43 54.18
      2048k: 19.53 19.23 19.58 29.47 20.56 22.66 37.45 76.78
      4096k: 152.7 151.5 150.7 143.2 143.0 117.3 153.6 280.1
      8192k: 130.5 137.8 123.5 134.2 132.2 138.5 208.6 366.3
     16384k: 145.2 146.6 141.6 145.8 142.4 147.5 228.3 399.2
     32768k: 153.1 154.9 149.7 151.1 154.0 155.5 227.7 408.5
     65536k: 159.3 163.1 160.2 159.4 160.8 159.6 232.7 417.2
    131072k: 167.7 166.1 164.1 167.5 165.6 167.2 239.2 418.6

Executing ramlat on cpu4 (Cortex-A76), results in ns:

       size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
         4k: 1.783 1.783 1.783 1.783 1.783 1.783 1.783 3.396
         8k: 1.783 1.783 1.783 1.783 1.783 1.783 1.783 3.474
        16k: 1.783 1.783 1.782 1.783 1.782 1.783 1.783 3.474
        32k: 1.783 1.783 1.782 1.783 1.782 1.783 1.783 3.476
        64k: 1.784 1.783 1.784 1.783 1.783 1.784 1.784 3.477
       128k: 5.348 5.351 5.348 5.351 5.348 6.128 7.566 13.51
       256k: 6.341 6.412 6.364 6.424 6.354 6.363 7.844 13.52
       512k: 9.567 9.254 9.597 9.246 9.532 9.793 11.27 17.56
      1024k: 18.50 18.84 18.41 18.82 18.47 18.64 20.55 31.15
      2048k: 19.57 20.11 19.86 20.10 19.48 20.33 22.54 32.88
      4096k: 70.00 55.27 67.82 55.24 80.24 54.03 54.66 68.09
      8192k: 136.0 111.1 121.7 106.9 131.8 100.7 107.1 109.1
     16384k: 146.2 137.2 148.5 137.7 145.6 131.8 132.5 131.5
     32768k: 153.5 153.7 157.3 153.8 152.9 149.7 144.7 137.1
     65536k: 160.8 156.6 158.3 154.8 160.5 152.8 150.0 147.6
    131072k: 160.7 155.0 158.3 154.6 159.7 156.7 151.5 157.4

Executing ramlat on cpu6 (Cortex-A76), results in ns:

       size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
         4k: 1.778 1.777 1.777 1.777 1.777 1.777 1.777 3.383
         8k: 1.777 1.777 1.777 1.777 1.777 1.777 1.777 3.463
        16k: 1.777 1.777 1.777 1.777 1.777 1.777 1.777 3.462
        32k: 1.777 1.777 1.777 1.777 1.777 1.777 1.778 3.465
        64k: 1.778 1.777 1.778 1.777 1.778 1.778 1.778 3.466
       128k: 5.330 5.332 5.330 5.332 5.330 6.108 7.550 13.47
       256k: 6.800 6.810 6.847 6.804 6.843 6.783 8.103 14.24
       512k: 10.52 10.15 10.31 10.15 10.40 10.69 12.32 18.76
      1024k: 17.99 17.97 17.98 17.97 17.98 18.10 20.05 29.84
      2048k: 22.03 20.17 20.96 20.30 20.97 20.47 22.68 34.07
      4096k: 81.57 60.44 66.90 52.04 78.90 52.95 53.36 69.04
      8192k: 131.7 110.8 121.7 109.4 128.1 102.9 100.6 106.5
     16384k: 147.4 137.0 143.6 133.5 148.3 130.9 125.4 128.1
     32768k: 152.3 150.2 153.2 152.8 150.2 146.5 145.2 135.2
     65536k: 159.6 156.9 158.3 154.8 159.7 147.9 147.3 146.7
    131072k: 160.6 155.9 158.5 148.5 158.6 147.3 153.0 154.7

##########################################################################

Executing benchmark on each cluster individually

OpenSSL 3.0.2, built on 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022)
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     157907.60k   466907.84k   913189.29k  1211602.94k  1335563.61k  1345901.91k (Cortex-A55)
aes-128-cbc     637695.14k  1269587.78k  1627369.30k  1741073.41k  1785804.12k  1790377.98k (Cortex-A76)
aes-128-cbc     636491.22k  1270735.13k  1630370.90k  1745556.48k  1790883.16k  1795375.10k (Cortex-A76)
aes-192-cbc     150707.76k   417121.37k   746944.17k   933191.00k  1005458.77k  1011433.47k (Cortex-A55)
aes-192-cbc     593616.55k  1116556.37k  1378340.10k  1448504.66k  1489854.46k  1492833.62k (Cortex-A76)
aes-192-cbc     595237.65k  1113902.61k  1381619.88k  1452472.66k  1493748.39k  1496771.24k (Cortex-A76)
aes-256-cbc     146073.90k   382410.11k   645469.95k   780004.01k   829835.95k   833869.14k (Cortex-A55)
aes-256-cbc     575753.51k   986583.40k  1192946.60k  1254034.43k  1277599.74k  1279923.54k (Cortex-A76)
aes-256-cbc     574328.99k   989858.52k  1196424.45k  1257626.28k  1281086.81k  1283407.87k (Cortex-A76)

##########################################################################

Executing benchmark single-threaded on cpu0 (Cortex-A55)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: - - - - 128000000 256000000 - - -

RAM size:   15964 MB,  # CPU hardware threads:   8
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1262   100   1228   1228  |      21180   100   1808   1808
23:       1185   100   1208   1208  |      21006   100   1818   1818
24:       1162   100   1250   1250  |      20577   100   1807   1807
25:       1116   100   1274   1274  |      20023   100   1782   1782
----------------------------------  | ------------------------------
Avr:             100   1240   1240  |              100   1804   1804
Tot:             100   1522   1522

Executing benchmark single-threaded on cpu4 (Cortex-A76)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: - - - - - - - - -

RAM size:   15964 MB,  # CPU hardware threads:   8
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       2773   100   2698   2698  |      36511   100   3117   3117
23:       2589   100   2638   2638  |      36036   100   3119   3119
24:       2459   100   2644   2644  |      35162   100   3087   3087
25:       2359   100   2694   2694  |      34188   100   3043   3043
----------------------------------  | ------------------------------
Avr:             100   2669   2669  |              100   3092   3092
Tot:             100   2880   2880

Executing benchmark single-threaded on cpu6 (Cortex-A76)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: - - - - - - - - 2048000000

RAM size:   15964 MB,  # CPU hardware threads:   8
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       2705   100   2633   2632  |      36740   100   3137   3137
23:       2604   100   2654   2653  |      36133   100   3128   3128
24:       2495   100   2683   2683  |      35325   100   3101   3101
25:       2375   100   2712   2712  |      34377   100   3060   3060
----------------------------------  | ------------------------------
Avr:             100   2670   2670  |              100   3107   3106
Tot:             100   2888   2888

##########################################################################

Executing benchmark 3 times multi-threaded on CPUs 0-7

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: - 64000000 - - - - - - -

RAM size:   15964 MB,  # CPU hardware threads:   8
RAM usage:   1765 MB,  # Benchmark threads:      8

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:      15096   751   1955  14686  |     198449   676   2505  16927
23:      14164   728   1983  14431  |     193292   675   2479  16727
24:      13587   748   1953  14609  |     188499   676   2449  16544
25:      13055   783   1904  14907  |     182935   674   2414  16281
----------------------------------  | ------------------------------
Avr:             753   1949  14658  |              675   2462  16620
Tot:             714   2205  15639

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: - - - - - - - - -

RAM size:   15964 MB,  # CPU hardware threads:   8
RAM usage:   1765 MB,  # Benchmark threads:      8

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:      15367   766   1953  14950  |     197978   675   2503  16887
23:      14125   714   2016  14392  |     193111   674   2478  16711
24:      13829   770   1931  14870  |     188669   676   2449  16559
25:      13023   772   1927  14870  |     183297   676   2413  16313
----------------------------------  | ------------------------------
Avr:             755   1957  14770  |              675   2461  16617
Tot:             715   2209  15694

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs LE)

LE
CPU Freq: - - - - - - - - -

RAM size:   15964 MB,  # CPU hardware threads:   8
RAM usage:   1765 MB,  # Benchmark threads:      8

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:      15088   745   1969  14678  |     198129   677   2495  16900
23:      14147   731   1971  14415  |     192542   673   2475  16662
24:      13825   767   1937  14865  |     188253   675   2448  16523
25:      12849   760   1932  14672  |     183105   676   2410  16296
----------------------------------  | ------------------------------
Avr:             751   1952  14657  |              675   2457  16595
Tot:             713   2205  15626

Compression: 14658,14770,14657
Decompression: 16620,16617,16595
Total: 15639,15694,15626
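
The three multi-threaded runs can be condensed into one figure plus a run-to-run spread, which indicates how repeatable the result is. The following is a minimal sketch, not part of sbc-bench; `mean_spread` is a hypothetical helper name and the input is the `Total:` line copied from the log above.

```shell
# mean_spread "a,b,c" -> prints the mean and the (max - min) / mean spread in percent
mean_spread() {
  echo "$1" | awk -F, '{
    min = max = $1
    for (i = 1; i <= NF; i++) {
      sum += $i
      if ($i < min) min = $i
      if ($i > max) max = $i
    }
    mean = sum / NF
    printf "mean=%.0f spread=%.2f%%\n", mean, (max - min) / mean * 100
  }'
}

# Total scores of the three 7-zip multi-threaded runs
mean_spread 15639,15694,15626
```

For the totals above this gives a mean of about 15653 with a spread below half a percent, so the three runs agree closely.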


# Measuring the actual clock speeds
##########################################################################

Testing maximum cpufreq again, still under full load. System health now:

Time       cpu0/cpu4/cpu6    load %cpu %sys %usr %nice %io %irq   Temp
23:59:32: 1800/2256/2256MHz  8.45  88%   1%  87%   0%   0%   0%  61.0°C

Checking cpufreq OPP for cpu0-cpu3 (Cortex-A55):

Cpufreq OPP: 1800    Measured: 1777 (1777.984/1777.940/1776.963)     (-1.3%)

Checking cpufreq OPP for cpu4-cpu5 (Cortex-A76):

Cpufreq OPP: 2256    Measured: 2240 (2240.683/2240.683/2240.375)

Checking cpufreq OPP for cpu6-cpu7 (Cortex-A76):

Cpufreq OPP: 2256    Measured: 2247 (2247.757/2247.673/2247.616)
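
The `(-1.3%)` on the Cortex-A55 line is the relative deviation of the measured clock from the OPP value; the two Cortex-A76 lines carry no such annotation. A small sketch to work out the same figure for all three clusters, using the numbers from the log above (`freq_dev` is a hypothetical helper name, not an sbc-bench function):

```shell
# freq_dev OPP_MHZ MEASURED_MHZ -> deviation of the measured clock from the OPP value, in percent
freq_dev() {
  awk -v opp="$1" -v meas="$2" 'BEGIN { printf "%.1f\n", (meas - opp) / opp * 100 }'
}

freq_dev 1800 1777   # cpu0-cpu3 (Cortex-A55)
freq_dev 2256 2240   # cpu4-cpu5 (Cortex-A76)
freq_dev 2256 2247   # cpu6-cpu7 (Cortex-A76)
```

With these values the deviations come out at roughly -1.3%, -0.7% and -0.4%, so all clusters run slightly below their nominal OPP under full load.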



# DRAM frequency test
##########################################################################

DRAM clock transitions since last boot (52833430 ms ago):

/sys/devices/platform/dmc/devfreq/dmc:

     From  :   To
           :   534000000  1320000000  1968000000  2400000000   time(ms)
  534000000:           0           0           0      796212    7785276
 1320000000:         183           0           0        1352      11516
 1968000000:           5          59           0         302       2696
*2400000000:      796024        1476         366           0   45032793
Total transition : 1595979
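
The last column of the transition table is the time spent at each DRAM clock. As a rough sketch, it can be turned into percentages with a small awk filter; `dmc_time_share` is a hypothetical helper, and the `trans_stat` path in the usage comment is assumed from the devfreq directory named in the log above.

```shell
# Read a devfreq trans_stat table on stdin and print the share of time spent in each state.
dmc_time_share() {
  awk '
    # Data rows start with the frequency followed by ":"; the current state is prefixed with "*".
    /^[ *]*[0-9]+:/ {
      freq = $1
      gsub(/[*:]/, "", freq)
      order[++n] = freq
      t[freq] = $NF          # last field is time(ms)
      total += $NF
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%d MHz %.1f%%\n", order[i] / 1000000, t[order[i]] / total * 100
    }'
}

# Usage on the board (path assumed from the log above):
#   dmc_time_share < /sys/devices/platform/dmc/devfreq/dmc/trans_stat
```

Applied to the table above, this shows the memory controller spending roughly 85% of the time at 2400000000 and roughly 15% at 534000000, with the two intermediate states used only in passing.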


# Temperatures stay reasonable
##########################################################################

Thermal source: /sys/devices/virtual/thermal/thermal_zone0/ (soc-thermal)

System health while running tinymembench:

Time       cpu0/cpu4/cpu6    load %cpu %sys %usr %nice %io %irq   Temp
23:45:55: 1800/2256/2256MHz  1.94   0%   0%   0%   0%   0%   0%  44.4°C
23:46:25: 1800/2256/2256MHz  1.96  12%   0%  12%   0%   0%   0%  47.2°C
23:46:55: 1800/2256/2256MHz  2.28  16%   0%  13%   1%   0%   0%  49.0°C
23:47:25: 1800/2256/2256MHz  2.17  13%   0%  12%   0%   0%   0%  50.8°C
23:47:55: 1800/2256/2256MHz  2.34  13%   0%  12%   0%   0%   0%  54.5°C
23:48:26: 1800/2256/2256MHz  2.21  13%   0%  12%   0%   0%   0%  54.5°C
23:48:56: 1800/2256/2256MHz  2.12  13%   0%  12%   0%   0%   0%  57.3°C

System health while running ramlat:

Time       cpu0/cpu4/cpu6    load %cpu %sys %usr %nice %io %irq   Temp
23:49:17: 1800/2256/2256MHz  2.09   0%   0%   0%   0%   0%   0%  54.5°C
23:49:26: 1800/2256/2256MHz  2.07  12%   0%  12%   0%   0%   0%  51.8°C
23:49:35: 1800/2256/2256MHz  2.14  16%   0%  12%   3%   0%   0%  52.7°C
23:49:44: 1800/2256/2256MHz  2.12  13%   0%  12%   0%   0%   0%  51.8°C
23:49:53: 1800/2256/2256MHz  2.11  12%   0%  12%   0%   0%   0%  50.8°C
23:50:02: 1800/2256/2256MHz  2.09  13%   0%  12%   0%   0%   0%  50.8°C
23:50:11: 1800/2256/2256MHz  2.08  13%   0%  12%   0%   0%   0%  50.8°C
23:50:20: 1800/2256/2256MHz  2.07  13%   0%  12%   0%   0%   0%  51.8°C
23:50:29: 1800/2256/2256MHz  2.06  13%   0%  12%   0%   0%   0%  50.8°C
23:50:38: 1800/2256/2256MHz  2.05  13%   0%  12%   0%   0%   0%  51.8°C
23:50:47: 1800/2256/2256MHz  2.19  13%   0%  12%   0%   0%   0%  50.8°C

System health while running OpenSSL benchmark:

Time       cpu0/cpu4/cpu6    load %cpu %sys %usr %nice %io %irq   Temp
23:50:53: 1800/2256/2256MHz  2.17   0%   0%   0%   0%   0%   0%  51.8°C
23:51:09: 1800/2256/2256MHz  2.12  12%   0%  12%   0%   0%   0%  49.9°C
23:51:25: 1800/2256/2256MHz  2.10  12%   0%  12%   0%   0%   0%  50.8°C
23:51:41: 1800/2256/2256MHz  2.07  13%   0%  12%   0%   0%   0%  50.8°C
23:51:57: 1800/2256/2256MHz  2.06  12%   0%  12%   0%   0%   0%  49.9°C
23:52:13: 1800/2256/2256MHz  2.04  12%   0%  12%   0%   0%   0%  50.8°C
23:52:29: 1800/2256/2256MHz  2.03  12%   0%  12%   0%   0%   0%  50.8°C
23:52:45: 1800/2256/2256MHz  2.02  12%   0%  12%   0%   0%   0%  49.9°C
23:53:01: 1800/2256/2256MHz  2.02  12%   0%  12%   0%   0%   0%  50.8°C
23:53:17: 1800/2256/2256MHz  2.01  12%   0%  12%   0%   0%   0%  50.8°C
23:53:33: 1800/2256/2256MHz  2.01  12%   0%  12%   0%   0%   0%  49.9°C

System health while running 7-zip single core benchmark:

Time       cpu0/cpu4/cpu6    load %cpu %sys %usr %nice %io %irq   Temp
23:53:35: 1800/2256/2256MHz  2.01   0%   0%   0%   0%   0%   0%  51.8°C
23:53:45: 1800/2256/2256MHz  2.01  13%   0%  12%   0%   0%   0%  49.9°C
23:53:55: 1800/2256/2256MHz  2.00  13%   0%  12%   0%   0%   0%  49.9°C
23:54:05: 1800/2256/2256MHz  2.00  13%   0%  12%   0%   0%   0%  49.9°C
23:54:15: 1800/2256/2256MHz  2.00  13%   0%  12%   0%   0%   0%  49.0°C
23:54:25: 1800/2256/2256MHz  2.00  13%   0%  12%   0%   0%   0%  49.0°C
23:54:35: 1800/2256/2256MHz  2.00  13%   0%  12%   0%   0%   0%  49.0°C
23:54:45: 1800/2256/2256MHz  2.00  13%   0%  12%   0%   0%   0%  49.0°C
23:54:55: 1800/2256/2256MHz  2.00  13%   0%  12%   0%   0%   0%  49.0°C
23:55:05: 1800/2256/2256MHz  2.00  12%   0%  12%   0%   0%   0%  49.0°C
23:55:15: 1800/2256/2256MHz  2.00  13%   0%  12%   0%   0%   0%  49.0°C
23:55:25: 1800/2256/2256MHz  2.00  12%   0%  12%   0%   0%   0%  50.8°C
23:55:35: 1800/2256/2256MHz  2.00  12%   0%  12%   0%   0%   0%  50.8°C
23:55:45: 1800/2256/2256MHz  2.00  13%   0%  12%   0%   0%   0%  50.8°C
23:55:55: 1800/2256/2256MHz  2.00  12%   0%  12%   0%   0%   0%  50.8°C
23:56:05: 1800/2256/2256MHz  2.00  12%   0%  12%   0%   0%   0%  50.8°C
23:56:15: 1800/2256/2256MHz  2.00  12%   0%  12%   0%   0%   0%  50.8°C
23:56:26: 1800/2256/2256MHz  2.00  12%   0%  12%   0%   0%   0%  50.8°C
23:56:36: 1800/2256/2256MHz  2.00  13%   0%  12%   0%   0%   0%  50.8°C
23:56:46: 1800/2256/2256MHz  2.00  13%   0%  12%   0%   0%   0%  50.8°C

System health while running 7-zip multi core benchmark:

Time       cpu0/cpu4/cpu6    load %cpu %sys %usr %nice %io %irq   Temp
23:56:54: 1800/2256/2256MHz  2.00   0%   0%   0%   0%   0%   0%  56.4°C
23:57:04: 1800/2256/2256MHz  3.08  87%   0%  87%   0%   0%   0%  57.3°C
23:57:14: 1800/2256/2256MHz  3.55  87%   0%  86%   0%   0%   0%  59.2°C
23:57:25: 1800/2256/2256MHz  4.48  89%   1%  88%   0%   0%   0%  62.8°C
23:57:35: 1800/2256/2256MHz  5.56  79%   1%  78%   0%   0%   0%  61.9°C
23:57:45: 1800/2256/2256MHz  5.85  90%   1%  89%   0%   0%   0%  60.1°C
23:57:55: 1800/2256/2256MHz  6.26  92%   0%  91%   0%   0%   0%  60.1°C
23:58:05: 1800/2256/2256MHz  6.83  86%   0%  85%   0%   0%   0%  63.8°C
23:58:19: 1800/2256/2256MHz  7.24  83%   1%  82%   0%   0%   0%  63.8°C
23:58:29: 1800/2256/2256MHz  7.59  82%   1%  80%   0%   0%   0%  61.9°C
23:58:39: 1800/2256/2256MHz  7.96  89%   0%  88%   0%   0%   0%  61.0°C
23:58:49: 1800/2256/2256MHz  7.80  92%   0%  91%   0%   0%   0%  61.9°C
23:58:59: 1800/2256/2256MHz  7.98  86%   0%  85%   0%   0%   0%  63.8°C
23:59:12: 1800/2256/2256MHz  8.29  83%   1%  82%   0%   0%   0%  63.8°C
23:59:22: 1800/2256/2256MHz  8.27  82%   1%  80%   0%   0%   0%  62.8°C
23:59:32: 1800/2256/2256MHz  8.45  88%   1%  87%   0%   0%   0%  61.0°C



##########################################################################
SoC guess: Rockchip RK3588 (35881000)
  DMC gov: dmc_ondemand (upthreshold: 40, downdifferential: 20)
DT compat: rockchip,rk3588-orangepi-5-ultra
           rockchip,rk3588
 Compiler: /usr/bin/gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 / aarch64-linux-gnu
 Userland: arm64
   Kernel: 6.1.43-rockchip-rk3588/aarch64
           CONFIG_HZ=300
           CONFIG_HZ_300=y
           CONFIG_PREEMPT_NOTIFIERS=y
           CONFIG_PREEMPT_VOLUNTARY=y
           CONFIG_PREEMPT_VOLUNTARY_BUILD=y
           rockchip-vop2 fdd90000.vop: leakage=23
           rockchip-vop2 fdd90000.vop: leakage-volt-sel=0
           cpu cpu0: leakage=8
           cpu cpu0: pvtm=1372
           cpu cpu0: pvtm-volt-sel=0
           cpu cpu4: leakage=7
           cpu cpu4: pvtm=1586
           cpu cpu4: pvtm-volt-sel=0
           cpu cpu6: leakage=7
           cpu cpu6: pvtm=1600
           cpu cpu6: pvtm-volt-sel=1
           mpp_rkvenc2 fdbd0000.rkvenc-core: leakage=8
           mpp_rkvenc2 fdbd0000.rkvenc-core: leakage-volt-sel=0
           mpp_rkvenc2 fdbe0000.rkvenc-core: leakage=8
           mpp_rkvenc2 fdbe0000.rkvenc-core: leakage-volt-sel=0
           mali fb000000.gpu: leakage=11
           rockchip-dmc dmc: leakage=23
           rockchip-dmc dmc: leakage-volt-sel=0
           RKNPU fdab0000.npu: leakage=5

##########################################################################

   vdd_cpu_big0_s0: 1000 mV (1050 mV max)
   vdd_cpu_big1_s0: 1000 mV (1050 mV max)
   vdd_npu_s0: 700 mV (950 mV max)

   cluster0-opp-table:
       408 MHz    675.0 mV (00f9 ffff)
       408 MHz    750.0 mV (0006 ffff)
       600 MHz    675.0 mV (00f9 ffff)
       600 MHz    750.0 mV (0006 ffff)
       816 MHz    675.0 mV (00f9 ffff)
       816 MHz    750.0 mV (0006 ffff)
      1008 MHz    675.0 mV (00f9 ffff)
      1008 MHz    750.0 mV (0006 ffff)
      1200 MHz    712.5 mV (00f9 ffff)
      1200 MHz    750.0 mV (0006 ffff)
      1296 MHz    750.0 mV (0004 ffff)
      1416 MHz    750.0 mV (0006 ffff)
      1416 MHz    762.5 mV (00f9 ffff)
      1608 MHz    850.0 mV (00f9 ffff)
      1608 MHz    887.5 mV (0006 ffff)
      1704 MHz    937.5 mV (0006 ffff)
      1800 MHz    950.0 mV (00f9 ffff)

   cluster1-opp-table:
       408 MHz    675.0 mV (00f9 ffff)
       408 MHz    750.0 mV (0006 ffff)
       600 MHz    675.0 mV (00f9 ffff)
       600 MHz    750.0 mV (0006 ffff)
       816 MHz    675.0 mV (00f9 ffff)
       816 MHz    750.0 mV (0006 ffff)
      1008 MHz    675.0 mV (00f9 ffff)
      1008 MHz    750.0 mV (0006 ffff)
      1200 MHz    675.0 mV (00f9 ffff)
      1200 MHz    750.0 mV (0006 ffff)
      1416 MHz    725.0 mV (00f9 ffff)
      1416 MHz    750.0 mV (0006 ffff)
      1608 MHz    762.5 mV (00f9 ffff)
      1608 MHz    787.5 mV (0006 ffff)
      1800 MHz    850.0 mV (00f9 ffff)
      1800 MHz    875.0 mV (0006 ffff)
      2016 MHz    925.0 mV (00f9 ffff)
      2016 MHz    950.0 mV (0006 ffff)
      2208 MHz    987.5 mV (00f9 ffff)
      2256 MHz   1000.0 mV (00f9 0013)
      2304 MHz   1000.0 mV (00f9 0024)
      2352 MHz   1000.0 mV (00f9 0048)
      2400 MHz   1000.0 mV (00f9 0080)

   cluster2-opp-table:
       408 MHz    675.0 mV (00f9 ffff)
       408 MHz    750.0 mV (0006 ffff)
       600 MHz    675.0 mV (00f9 ffff)
       600 MHz    750.0 mV (0006 ffff)
       816 MHz    675.0 mV (00f9 ffff)
       816 MHz    750.0 mV (0006 ffff)
      1008 MHz    675.0 mV (00f9 ffff)
      1008 MHz    750.0 mV (0006 ffff)
      1200 MHz    675.0 mV (00f9 ffff)
      1200 MHz    750.0 mV (0006 ffff)
      1416 MHz    725.0 mV (00f9 ffff)
      1416 MHz    750.0 mV (0006 ffff)
      1608 MHz    762.5 mV (00f9 ffff)
      1608 MHz    787.5 mV (0006 ffff)
      1800 MHz    850.0 mV (00f9 ffff)
      1800 MHz    875.0 mV (0006 ffff)
      2016 MHz    925.0 mV (00f9 ffff)
      2016 MHz    950.0 mV (0006 ffff)
      2208 MHz    987.5 mV (00f9 ffff)
      2256 MHz   1000.0 mV (00f9 0013)
      2304 MHz   1000.0 mV (00f9 0024)
      2352 MHz   1000.0 mV (00f9 0048)
      2400 MHz   1000.0 mV (00f9 0080)

   dmc-opp-table:
       528 MHz    675.0 mV (00f9 ffff)
       528 MHz    750.0 mV (0006 ffff)
      1068 MHz    725.0 mV (00f9 ffff)
      1068 MHz    750.0 mV (0006 ffff)
      1560 MHz    800.0 mV (0006 ffff)
      1560 MHz    800.0 mV (00f9 ffff)
      2750 MHz    875.0 mV (0006 ffff)
      2750 MHz    875.0 mV (00f9 ffff)

   gpu-opp-table:
       300 MHz    675.0 mV (00f9 ffff)
       300 MHz    750.0 mV (0006 ffff)
       400 MHz    675.0 mV (00f9 ffff)
       400 MHz    750.0 mV (0006 ffff)
       500 MHz    675.0 mV (00f9 ffff)
       500 MHz    750.0 mV (0006 ffff)
       600 MHz    675.0 mV (00f9 ffff)
       600 MHz    750.0 mV (0006 ffff)
       700 MHz    700.0 mV (00f9 ffff)
       700 MHz    750.0 mV (0006 ffff)
       800 MHz    750.0 mV (0002 ffff)
       800 MHz    750.0 mV (00f9 ffff)
       850 MHz    787.5 mV (0004 ffff)
       900 MHz    800.0 mV (0002 ffff)
       900 MHz    800.0 mV (00f9 ffff)
      1000 MHz    850.0 mV (0002 ffff)
      1000 MHz    850.0 mV (00f9 ffff)

   npu-opp-table:
       300 MHz    700.0 mV (00f9 ffff)
       300 MHz    750.0 mV (0006 ffff)
       400 MHz    700.0 mV (00f9 ffff)
       400 MHz    750.0 mV (0006 ffff)
       500 MHz    700.0 mV (00f9 ffff)
       500 MHz    750.0 mV (0006 ffff)
       600 MHz    700.0 mV (00f9 ffff)
       600 MHz    750.0 mV (0006 ffff)
       700 MHz    700.0 mV (00f9 ffff)
       700 MHz    750.0 mV (0006 ffff)
       800 MHz    750.0 mV (0006 ffff)
       800 MHz    750.0 mV (00f9 ffff)
       900 MHz    800.0 mV (00f9 ffff)
       950 MHz    837.5 mV (0006 ffff)
      1000 MHz    850.0 mV (00f9 ffff)

   venc-opp-table:
       800 MHz    750.0 mV

   vop-opp-table:
       500 MHz    725.0 mV
       750 MHz    725.0 mV
       850 MHz    800.0 mV
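
4.2.2 PCIe devices

lspci lists very few PCIe devices, which is a pity. It looks like ESXi could only pass through the SSD, the two onboard NICs, and the PCIe 2.0 x1 link at the Wi-Fi card slot; the RK3588's powerful GPU and NPU, along with the rest of its I/O, would be completely wasted.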

lspci -nnk

0000:00:00.0 PCI bridge [0604]: Rockchip Electronics Co., Ltd Device [1d87:3588] (rev 01)
        Kernel driver in use: pcieport
0000:01:00.0 Non-Volatile memory controller [0108]: Shenzhen Longsys Electronics Co., Ltd. Device [1d97:5236] (rev 01)
        Subsystem: Shenzhen Longsys Electronics Co., Ltd. Device [1d97:5236]
        Kernel driver in use: nvme
0003:30:00.0 PCI bridge [0604]: Rockchip Electronics Co., Ltd Device [1d87:3588] (rev 01)
        Kernel driver in use: pcieport
0003:31:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
        Subsystem: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125]
        Kernel driver in use: r8169
        Kernel modules: r8169
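
To see what link each of these devices actually negotiated (for example, whether the NVMe drive runs at the full width of the M.2 slot), the PCI sysfs attributes can be read directly. This is a minimal sketch; `pcie_links` is a hypothetical helper, and it assumes the kernel exposes the standard `current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width` files for the device.

```shell
# Print negotiated vs. maximum link speed and width for every PCI device that exposes them.
# The optional argument overrides the sysfs directory to scan.
pcie_links() {
  base="${1:-/sys/bus/pci/devices}"
  for dev in "$base"/*; do
    # Skip devices (or an unmatched glob) without link attributes
    [ -r "$dev/current_link_speed" ] || continue
    printf '%s: %s x%s (max %s x%s)\n' "${dev##*/}" \
      "$(cat "$dev/current_link_speed")" "$(cat "$dev/current_link_width")" \
      "$(cat "$dev/max_link_speed")" "$(cat "$dev/max_link_width")"
  done
}

pcie_links
```

A device whose negotiated values are lower than its maximum is running on a reduced link, which would show up as lower SSD or network throughput.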
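
4.2.3 Sysbench test

Sysbench is an open-source, modular, cross-platform, multi-threaded benchmarking tool that can test CPU, memory, disk I/O, thread, and database performance.

Install it with: sudo apt-get install sysbench
Run the CPU test with: sysbench cpu run

In the output, "CPU speed: events per second" is the figure that measures CPU speed.

Single-threaded result: 2512.46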

sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  2512.46

General statistics:
    total time:                          10.0004s
    total number of events:              25131

Latency (ms):
         min:                                    0.39
         avg:                                    0.40
         max:                                    2.43
         95th percentile:                        0.42
         sum:                                 9995.00

Threads fairness:
    events (avg/stddev):           25131.0000/0.00
    execution time (avg/stddev):   9.9950/0.00
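
The run above uses one thread and does not say which core it landed on. Since the RK3588 mixes Cortex-A55 and Cortex-A76 cores, a hedged way to make the number reproducible is to pin the run with `taskset` and pull out only the headline figure. `sysbench_eps` is a hypothetical helper; the core numbers in the usage comments follow the sbc-bench log earlier (cpu0 is a Cortex-A55, cpu4 a Cortex-A76), and no results are claimed here for those runs.

```shell
# Extract the "events per second" value from sysbench cpu output read on stdin.
sysbench_eps() {
  awk -F: '/events per second/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage on the board (assumes sysbench and taskset are installed):
#   taskset -c 0 sysbench cpu run | sysbench_eps           # one Cortex-A55 core
#   taskset -c 4 sysbench cpu run | sysbench_eps           # one Cortex-A76 core
#   sysbench cpu --threads="$(nproc)" run | sysbench_eps   # all cores
```

Comparing the pinned single-core figures with the all-core figure gives a quick view of how well the workload scales across the two core types.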