TensorRT Optimization and Performance Tuning on the Jetson TX2


1. How TensorRT Optimizes Inference

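TensorRT's ability to accelerate deep-learning inference comes from its optimizer and its runtime. The optimizations fall into four areas:

  • Layer & tensor fusion: convolution, bias, and ReLU layers are fused and executed by a single kernel, which cuts kernel launch overhead. TensorRT also eliminates layers whose outputs are never used and aggregates operations that have similar parameters and the same source tensor.
  • Mixed precision: lower-precision arithmetic shrinks the data and reduces the amount of computation, for example by running layers in FP16 or INT8 rather than FP32.
  • Kernel auto-tuning: based on the target hardware platform and the build parameters (such as workspace size and segment size), TensorRT selects the best algorithm for each layer, for example among different convolution algorithms. It also decides, according to the hardware's capabilities, whether to use regular GPU kernels or Tensor Cores.
  • Dynamic tensor memory: TensorRT allocates a block of memory at run time and reuses it as much as possible, which keeps computation efficient.

To get the most out of TensorRT, see the "TensorRT Performance Optimization Guide" for further tuning advice.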

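1) Layer fusions currently supported by TensorRT

The layer fusions TensorRT currently supports are listed below. When designing a network, prefer these layer combinations where possible so that TensorRT can fuse them.

Convolution and ReLU Activation
The convolution layer can be of any type, and there are no restrictions on its values. The activation layer must be of type ReLU.

FullyConnected and ReLU Activation
The FullyConnected layer has no restrictions. The activation layer must be of type ReLU.

Scale and Activation
A Scale layer followed by an Activation layer can be fused into a single Activation layer.

Convolution And ElementWise Sum
A convolution layer followed by an ElementWise layer that computes a sum can be fused: the sum is folded into the convolution layer.

Shuffle and Reduce
A Shuffle layer without a reshape, followed by a Reduce layer, can be fused into a single Reduce layer. The Shuffle layer may perform permutations but no reshape operation, and the Reduce layer must have keepDimensions set.

Shuffle and Shuffle
Each Shuffle layer consists of a transpose, a reshape, and a second transpose. A Shuffle layer followed by another Shuffle layer can be replaced by a single Shuffle layer. If both Shuffle layers perform a reshape, fusion is allowed only when the second transpose of the first shuffle is the inverse of the first transpose of the second shuffle.

Scale
A Scale layer that adds 0 and multiplies by 1 can be removed.

Convolution and Scale
A convolution layer followed by a Scale layer of mode kUNIFORM or kCHANNEL can be fused into a single convolution by adjusting the convolution weights.

Reduce
A Reduce layer that performs average pooling is replaced by a Pooling layer. The Reduce layer must have keepDimensions set.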

[Figure: a network after TensorRT layer fusion]

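2) Using batching and mixed precision

1) On a GPU, a larger batch is almost always more efficient, because batching exposes as much parallel work as possible.
For example, take a FullyConnected layer with V inputs and K outputs. For a batch of one instance, it can be implemented as a 1xV input matrix multiplied by a VxK weight matrix. For a batch of N instances, it becomes an NxV matrix multiplied by the VxK matrix. This turns a vector-matrix multiplication into a matrix-matrix multiplication, which is more efficient.
In addition, when the network contains MatrixMultiply or FullyConnected layers and the hardware supports Tensor Cores, a batch size that is a multiple of 32 tends to give the best performance for FP16 and INT8 inference.
2) Use mixed precision to shrink the data and reduce computation, for example by running in FP16 or INT8 rather than FP32. Converting from FP32 to FP16 roughly halves the data size.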
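
The multiple-of-32 guideline for batch size is easy to apply mechanically. Below is a minimal sketch; round_up_batch is a hypothetical helper name, not part of TensorRT, and it only does the arithmetic of rounding a requested batch size up to the next multiple of 32.

```shell
#!/bin/bash
# Hypothetical helper (not a TensorRT API): round a batch size up to the
# next multiple of 32, the granularity suggested for FP16/INT8 inference
# on Tensor Core hardware. The multiple can be overridden as $2.
round_up_batch() {
    local n="$1" multiple="${2:-32}"
    echo $(( (n + multiple - 1) / multiple * multiple ))
}
```

For instance, `round_up_batch 50` yields 64, while a value that is already a multiple of 32 is returned unchanged.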

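3) Kernel auto-tuning

During optimization, TensorRT auto-tunes based on the GPU's compute capability, GPU cache, number of SMs, other hardware platform information, and the parameters set when building the engine (workspace size, segment size, and so on). This includes using the workspace to try out candidate algorithms for individual layers, such as picking the fastest convolution algorithm. It also decides, according to the hardware's capabilities, whether to use regular GPU kernels or Tensor Cores.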


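2. Factors That Affect TensorRT Optimization

Beyond the four optimizations described in the previous section, several factors influence how well TensorRT can optimize a model:

  1. The TensorRT version
  2. The CUDA version
  3. The builder parameters set when building the engine:
    • workspaceSize
    • BatchSize
    • data precision
    • segment size
  4. The performance of the system at build time:
    • maximum GPU graphics clock rate
    • maximum GPU memory clock rate
    • GPU memory bus width
    • total GPU memory
    • GPU L2 cache size
    • number of SMs
    • asynchronous engine count

The first three are the basic ingredients of building a TensorRT engine and are not covered here. The focus is the fourth, which is easy to overlook.
The performance state of the system while the engine is being built affects the inference performance of the generated engine.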

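To verify this, I ran the following test on a TX2:
1. Run sudo nvpmodel -m 3 to select the default power mode, then build the engine and measure inference time.
2. Run sudo nvpmodel -m 0 to select the MAXN power mode, then build the engine and measure inference time.
3. Run the jetson_clocks.sh script to lock the GPU and CPU at their maximum frequencies, then build the engine and measure inference time.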

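Results:
1. Default power mode: the RetinaNet engine built in this mode had a latency of 30 ms.
2. MAXN mode: MAX FREQ is set to the highest value the GPU can reach, and the GPU still scales its frequency automatically with load. The RetinaNet engine built in this mode had a latency of 26 ms.
3. CPU, GPU, and EMC clocks locked at their maximum values: the RetinaNet engine built in this state had a latency of 20 ms.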

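Analysis:
While TensorRT builds an engine, kernel auto-tuning repeatedly tries candidate algorithms and kernels for each layer, guided by the hardware information and the performance it observes on the system at that moment. If the system is running below its peak, the tuning results suffer, and so does the inference speed of the generated engine. In this test, the system's performance state at build time changed engine performance by roughly 20% to 30%, which is too much to ignore.
Conclusion:
Maximizing system performance before building the engine makes inference faster, and the gain can reach roughly 20% to 30%.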
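
To make the size of the effect concrete, the relative latency reduction can be computed directly from the three measurements reported above (30 ms, 26 ms, 20 ms). This is a small sketch; latency_reduction is a hypothetical helper name, and it assumes awk is available.

```shell
#!/bin/bash
# Percentage by which latency dropped when going from `before` ms to
# `after` ms. Hypothetical helper; uses awk for floating-point math.
latency_reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

latency_reduction 30 26   # default mode  -> MAXN
latency_reduction 26 20   # MAXN          -> locked clocks
latency_reduction 30 20   # default mode  -> locked clocks
```

The three calls print 13.3, 23.1, and 33.3. Locking the clocks on top of MAXN accounts for about 23%, and the full step from the default mode to locked clocks for about 33%.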


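3. Maximizing System Performance (TX2)

For Jetson boards, see the "Clock Frequency and Power Management" section of the Jetson developer guide.

1) NVIDIA provides a power-mode tool for Jetson boards: nvpmodel.
On the TX2, an nvpmodel mode defines the number of online CPU cores and their clock frequencies, the GPU frequency, and the external memory controller (EMC) frequency. The EMC governs how fast the external LPDDR4 memory is accessed. The TX2 offers five modes, all defined in /etc/nvpmodel.conf.
The five modes correspond to different performance levels, and the specific frequency values can be edited in /etc/nvpmodel.conf.
[Figure: table of the five nvpmodel modes]

Let's walk through /etc/nvpmodel.conf: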

# Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.
#
# FORMAT:
# < PARAM TYPE=PARAM_TYPE NAME=PARAM_NAME >
# ARG1_NAME ARG1_PATH_VAL
# ARG2_NAME ARG2_PATH_VAL
# ...
# This starts a section of PARAM definitions, in which each line
# has the syntax below:
# ARG_NAME ARG_PATH_VAL
# ARG_NAME is a macro name for argument value ARG_PATH_VAL.
# PARAM_TYPE can be FILE, or CLOCK.
#
# < POWER_MODEL ID=id_num NAME=mode_name >
# PARAM1_NAME ARG11_NAME ARG11_VAL
# PARAM1_NAME ARG12_NAME ARG12_VAL
# PARAM2_NAME ARG21_NAME ARG21_VAL
# ...
# This starts a section of POWER_MODEL configurations, followed by
# lines with parameter settings as the format below:
# PARAM_NAME ARG_NAME ARG_VAL
# PARAM_NAME and ARG_NAME are defined in PARAM definition sections.
# ARG_VAL is an integer for PARAM_TYPE of CLOCK, and -1 is taken
# as INT_MAX. ARG_VAL is a string for PARAM_TYPE of FILE.
# This file must contain at least one POWER_MODEL section.
#
# < PM_CONFIG DEFAULT=default_mode >
# This is a mandatory section to specify one of the defined power
# model as the default.
#
#add KNEXT path node to support current kernel and next kernel simultaneously
#since some node may change for different kernel version

###########################
#                         #
# PARAM DEFINITIONS       #
#                         #
###########################

< PARAM TYPE=FILE NAME=CPU_ONLINE >
CORE_0 /sys/devices/system/cpu/cpu0/online
CORE_1 /sys/devices/system/cpu/cpu1/online
CORE_2 /sys/devices/system/cpu/cpu2/online
CORE_3 /sys/devices/system/cpu/cpu3/online
CORE_4 /sys/devices/system/cpu/cpu4/online
CORE_5 /sys/devices/system/cpu/cpu5/online

< PARAM TYPE=CLOCK NAME=CPU_A57 >
FREQ_TABLE /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
MAX_FREQ /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
MIN_FREQ /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
FREQ_TABLE_KNEXT /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
MAX_FREQ_KNEXT /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
MIN_FREQ_KNEXT /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq

< PARAM TYPE=CLOCK NAME=CPU_DENVER >
FREQ_TABLE /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_frequencies
MAX_FREQ /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
MIN_FREQ /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
FREQ_TABLE_KNEXT /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_frequencies
MAX_FREQ_KNEXT /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
MIN_FREQ_KNEXT /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq

< PARAM TYPE=CLOCK NAME=GPU >
FREQ_TABLE /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/available_frequencies
MAX_FREQ /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq
MIN_FREQ /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq
FREQ_TABLE_KNEXT /sys/devices/17000000.gp10b/devfreq/devfreq0/available_frequencies
MAX_FREQ_KNEXT /sys/devices/17000000.gp10b/devfreq/devfreq0/max_freq
MIN_FREQ_KNEXT /sys/devices/17000000.gp10b/devfreq/devfreq0/min_freq



< PARAM TYPE=CLOCK NAME=EMC >
MAX_FREQ /sys/kernel/nvpmodel_emc_cap/emc_iso_cap
MAX_FREQ_KNEXT /sys/kernel/nvpmodel_emc_cap/emc_iso_cap

###########################
#                         #
# POWER_MODEL DEFINITIONS #
#                         #
###########################

# MAXN is the NONE power model to release all constraints
< POWER_MODEL ID=0 NAME=MAXN >
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 1
CPU_ONLINE CORE_5 1
CPU_A57 MIN_FREQ 0
CPU_A57 MAX_FREQ -1
CPU_DENVER MIN_FREQ 0
CPU_DENVER MAX_FREQ -1
GPU MIN_FREQ 0
GPU MAX_FREQ -1
EMC MAX_FREQ 0

< POWER_MODEL ID=1 NAME=MAXQ >
CPU_ONLINE CORE_1 0
CPU_ONLINE CORE_2 0
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 1
CPU_ONLINE CORE_5 1
CPU_A57 MIN_FREQ 0
CPU_A57 MAX_FREQ 1200000
GPU MIN_FREQ 0
GPU MAX_FREQ 850000000
EMC MAX_FREQ 1331200000

< POWER_MODEL ID=2 NAME=MAXP_CORE_ALL >
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 1
CPU_ONLINE CORE_5 1
CPU_A57 MIN_FREQ 0
CPU_A57 MAX_FREQ 1400000
CPU_DENVER MIN_FREQ 0
CPU_DENVER MAX_FREQ 1400000
GPU MIN_FREQ 0
GPU MAX_FREQ 1120000000
EMC MAX_FREQ 1600000000

< POWER_MODEL ID=3 NAME=MAXP_CORE_ARM >
CPU_ONLINE CORE_1 0
CPU_ONLINE CORE_2 0
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 1
CPU_ONLINE CORE_5 1
CPU_A57 MIN_FREQ 0
CPU_A57 MAX_FREQ 2000000
GPU MIN_FREQ 0
GPU MAX_FREQ 1120000000
EMC MAX_FREQ 1600000000

< POWER_MODEL ID=4 NAME=MAXP_CORE_DENVER >
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 0
CPU_ONLINE CORE_3 0
CPU_ONLINE CORE_4 0
CPU_ONLINE CORE_5 0
CPU_A57 MIN_FREQ 0
CPU_A57 MAX_FREQ 345600
CPU_DENVER MIN_FREQ 0
CPU_DENVER MAX_FREQ 2035200
GPU MAX_FREQ 1120000000
EMC MAX_FREQ 1600000000

# mandatory section to configure the default mode
< PM_CONFIG DEFAULT=3 >

This conf file adjusts the minimum and maximum GPU and CPU frequencies, and whether individual cores are disabled, by writing system parameters.
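
The section headers in this file are regular enough to be read with awk. As a rough sketch, the two functions below list the defined power models and the default mode. The function names are hypothetical, and the parsing assumes the header layout shown in the listing above (`< POWER_MODEL ID=n NAME=x >` and `< PM_CONFIG DEFAULT=n >`); other nvpmodel.conf versions may differ.

```shell
#!/bin/bash
# Print "ID NAME" for every POWER_MODEL section in an nvpmodel.conf file.
# Commented-out template lines start with "#", so they are skipped.
list_power_models() {
    local conf="${1:-/etc/nvpmodel.conf}"
    awk '$1 == "<" && $2 == "POWER_MODEL" {
        id = $3; name = $4
        sub(/^ID=/, "", id)
        sub(/^NAME=/, "", name)
        print id, name
    }' "$conf"
}

# Print the ID named by the mandatory PM_CONFIG DEFAULT= section.
default_power_model() {
    local conf="${1:-/etc/nvpmodel.conf}"
    awk '$1 == "<" && $2 == "PM_CONFIG" {
        id = $3
        sub(/^DEFAULT=/, "", id)
        print id
    }' "$conf"
}
```

Run against the file above, `list_power_models` would report the five modes (0 MAXN through 4 MAXP_CORE_DENVER) and `default_power_model` would report 3.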

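On Linux, system performance is commonly tuned through kernel parameters such as these. For example:

The directory /sys/devices/system/cpu/cpu1/cpufreq/ contains the following files, which control the CPU's run-time performance: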

/sys/devices/system/cpu/cpu1/cpufreq/scaling_available_frequencies
/sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq
/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
/sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu1/cpufreq/scaling_driver
/sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed
/sys/devices/system/cpu/cpu1/cpufreq/affected_cpus
/sys/devices/system/cpu/cpu1/cpufreq/related_cpus
/sys/devices/system/cpu/cpu1/cpufreq/stats
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_cur_freq
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_min_freq
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_transition_latency

List the frequencies available to the CPU:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
345600 499200 652800 806400 960000 1113600 1267200 1420800 1574400 1728000 1881600 2035200

Set the CPU's minimum and maximum frequencies. scaling_cur_freq is read-only, so the frequency is pinned by setting the minimum and maximum to the same value (maximum first, so the new minimum never exceeds it):

# echo 2035200 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
# echo 2035200 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
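
Rather than hard-coding 2035200, the target value can be taken from scaling_available_frequencies. The following is a minimal sketch under that assumption; max_available_freq and lock_cpu_max are hypothetical helper names, and the cpufreq directory is passed as an argument so the logic can be tried on a scratch directory before touching sysfs (which requires root).

```shell
#!/bin/bash
# Print the highest value in a whitespace-separated frequency list file.
max_available_freq() {
    awk '{ for (i = 1; i <= NF; i++) if ($i + 0 > max) max = $i + 0 }
         END { print max }' "$1"
}

# Pin a cpufreq policy directory to its highest available frequency.
# $1 is a directory such as /sys/devices/system/cpu/cpu1/cpufreq.
lock_cpu_max() {
    local dir="$1" freq
    freq="$(max_available_freq "$dir/scaling_available_frequencies")" || return 1
    # Raise the ceiling first so the new floor never exceeds it.
    echo "$freq" > "$dir/scaling_max_freq" &&
    echo "$freq" > "$dir/scaling_min_freq"
}
```

On a TX2 this would be invoked as root, for example `lock_cpu_max /sys/devices/system/cpu/cpu1/cpufreq`.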

List the governors available to the CPU:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
interactive conservative ondemand userspace powersave performance schedutil

Set the CPU's governor:

# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
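
Writing a governor name the kernel does not offer fails, so it can help to check scaling_available_governors first. A small sketch, with set_governor as a hypothetical helper name; it writes to scaling_governor, the file name jetson_clocks.sh itself uses, and takes the cpufreq directory as an argument.

```shell
#!/bin/bash
# Set a cpufreq governor only if it appears in the available list.
# $1: cpufreq directory, e.g. /sys/devices/system/cpu/cpu0/cpufreq
# $2: governor name, e.g. performance
set_governor() {
    local dir="$1" gov="$2" g
    for g in $(cat "$dir/scaling_available_governors"); do
        if [ "$g" = "$gov" ]; then
            echo "$gov" > "$dir/scaling_governor"
            return
        fi
    done
    echo "governor '$gov' is not available in $dir" >&2
    return 1
}
```

Example (as root): `set_governor /sys/devices/system/cpu/cpu0/cpufreq performance`.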

GPU parameters

The directory /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/ contains the following files, which control the GPU's run-time performance:

available_frequencies
cur_freq
governor
min_freq
power/
target_freq
uevent
available_governors
device/
max_freq
polling_interval
subsystem/
trans_stat

List the frequencies available to the GPU:

# cat /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/available_frequencies
114750000 216750000 318750000 420750000 522750000 624750000 726750000 828750000 930750000 1032750000 1134750000 1236750000 1300500000

Set the GPU's minimum and maximum frequencies. cur_freq is read-only, so the clock is pinned by setting the minimum and maximum to the same value:

# echo 1300500000 > /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq
# echo 1300500000 > /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq

List the governors available to the GPU:

# cat /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/available_governors
wmark_active wmark_simple nvhost_podgov userspace performance simple_ondemand

Set the GPU's governor:

# echo nvhost_podgov > /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/governor

2) Besides nvpmodel, the jetson_clocks.sh script can be used to maximize performance.

# /usr/bin/jetson_clocks --help
Maximize jetson performance by setting static max frequency to CPU, GPU and EMC clocks.
Usage:
jetson_clocks.sh [options]
  options,
  --show             display current settings
  --store [file]     store current settings to a file (default: ${HOME}/l4t_dfs.conf)
  --restore [file]   restore saved settings from a file (default: ${HOME}/l4t_dfs.conf)
  run jetson_clocks.sh without any option to set static max frequency to CPU, GPU and EMC clocks.

The --show option prints the current CPU, GPU, EMC, and fan settings.

# jetson_clocks --show
SOC family:tegra186  Machine:quill
Online CPUs: 0-5
CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0
cpu1: Online=1 Governor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c6=0 c7=0
cpu2: Online=1 Governor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c6=0 c7=0
cpu3: Online=1 Governor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0
cpu4: Online=1 Governor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0
cpu5: Online=1 Governor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0
GPU MinFreq=114750000 MaxFreq=1300500000 CurrentFreq=114750000
EMC MinFreq=40800000 MaxFreq=1866000000 CurrentFreq=1866000000 FreqOverride=1
Fan: speed=255
NV Power Mode: MAXN

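As the output above shows, even after nvpmodel -m 0 selects MAXN mode, the system is still not running at full performance. The GPU remains under the dynamic voltage and frequency scaling (DVFS) governor, so its frequency is dynamic rather than pinned at the maximum.

jetson_clocks.sh sets the best performance available within the current nvpmodel mode. It raises the clocks to their maximum values, disables dynamic voltage and frequency scaling (DVFS), and sets the fan for best performance. Run without arguments, the script sets the CPU, GPU, and EMC clocks to the highest frequencies the hardware currently supports.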
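
Because the script also offers --store and --restore, one way to apply it only for the duration of an engine build is to save the current settings, lock the clocks, run the build, and restore afterwards. The wrapper below is a sketch of that idea, not something shipped by NVIDIA: with_max_clocks and the JETSON_CLOCKS override variable are hypothetical names, and the command to run (the engine build) is whatever is passed as arguments.

```shell
#!/bin/bash
# Run a command with clocks locked at maximum, then restore the previous
# settings. Must be run as root, since jetson_clocks requires it.
# JETSON_CLOCKS (hypothetical) lets the jetson_clocks command be replaced.
with_max_clocks() {
    local jc="${JETSON_CLOCKS:-jetson_clocks}" saved rc
    # --store asks before overwriting an existing file, so make sure the
    # target path does not exist yet.
    saved="${TMPDIR:-/tmp}/l4t_dfs.$$.conf"
    rm -f "$saved"
    "$jc" --store "$saved" || return 1
    "$jc" || { "$jc" --restore "$saved"; return 1; }
    "$@"                      # e.g. the engine build command
    rc=$?
    "$jc" --restore "$saved"
    rm -f "$saved"
    return "$rc"
}
```

Usage would look like `with_max_clocks ./build_engine.sh`, where build_engine.sh stands in for whatever builds the TensorRT engine. The wrapper returns the exit status of the wrapped command.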

Run jetson_clocks, then run jetson_clocks --show again: GPU CurrentFreq now equals GPU MaxFreq.

# jetson_clocks
# jetson_clocks --show
SOC family:tegra186  Machine:quill
Online CPUs: 0-5
CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=performance MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0
cpu1: Online=1 Governor=performance MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c6=0 c7=0
cpu2: Online=1 Governor=performance MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c6=0 c7=0
cpu3: Online=1 Governor=performance MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0
cpu4: Online=1 Governor=performance MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0
cpu5: Online=1 Governor=performance MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0
GPU MinFreq=1300500000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=40800000 MaxFreq=1866000000 CurrentFreq=1866000000 FreqOverride=1
Fan: speed=255
NV Power Mode: MAXN
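
The check "GPU CurrentFreq equals GPU MaxFreq" can be automated before an engine build. The sketch below reads jetson_clocks --show output on standard input and succeeds only if the GPU line shows MinFreq, MaxFreq, and CurrentFreq all equal. gpu_is_pinned is a hypothetical helper name, and the parsing assumes the `GPU MinFreq=... MaxFreq=... CurrentFreq=...` line format shown in the listings above.

```shell
#!/bin/bash
# Exit status 0 if the "GPU ..." line on stdin reports MinFreq, MaxFreq
# and CurrentFreq as the same value; non-zero otherwise (including when
# no GPU line is present).
gpu_is_pinned() {
    awk '$1 == "GPU" {
        for (i = 2; i <= NF; i++) { split($i, kv, "="); v[kv[1]] = kv[2] }
        found = 1
        exit
    }
    END {
        exit !(found && v["MinFreq"] == v["MaxFreq"] && v["CurrentFreq"] == v["MaxFreq"])
    }'
}
```

Typical use: `jetson_clocks --show | gpu_is_pinned || echo "GPU clock is not pinned"`.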


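Finally, here is the source of jetson_clocks.sh. When tuning other systems in the future, the parameters this script sets are a useful reference for maximizing performance.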

#!/bin/bash

CONF_FILE=${HOME}/l4t_dfs.conf
RED='\e[0;31m'
GREEN='\e[0;32m'
BLUE='\e[0;34m'
BRED='\e[1;31m'
BGREEN='\e[1;32m'
BBLUE='\e[1;34m'
NC='\e[0m' # No Color

usage()
{
        if [ "$1" != "" ]; then
                echo -e ${RED}"$1"${NC}
        fi

                cat >& 2 <<EOF
Maximize jetson performance by setting static max frequency to CPU, GPU and EMC clocks.
Usage:
jetson_clocks.sh [options]
  options,
  --show             display current settings
  --store [file]     store current settings to a file (default: \${HOME}/l4t_dfs.conf)
  --restore [file]   restore saved settings from a file (default: \${HOME}/l4t_dfs.conf)
  run jetson_clocks.sh without any option to set static max frequency to CPU, GPU and EMC clocks.
EOF

        exit 0
}

restore()
{
        for conf in `cat "${CONF_FILE}"`; do
                file=`echo $conf | cut -f1 -d :`
                data=`echo $conf | cut -f2 -d :`
                case "${file}" in
                        /sys/devices/system/cpu/cpu*/online |\
                        /sys/kernel/debug/clk/override*/state)
                                if [ `cat $file` -ne $data ]; then
                                        echo "${data}" > "${file}"
                                fi
                                ;;
                        /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq)
                                echo "${data}" > "${file}" 2>/dev/null
                                ;;
                        *)
                                echo "${data}" > "${file}"
                                ret=$?
                                if [ ${ret} -ne 0 ]; then
                                        echo "Error: Failed to restore $file"
                                fi
                                ;;
                esac
        done
}

store()
{
        for file in $@; do
                if [ -e "${file}" ]; then
                        echo "${file}:`cat ${file}`" >> "${CONF_FILE}"
                fi
        done
}

do_nvpmodel()
{
        case "${ACTION}" in
                show)
                        NVPMODEL_BIN="/usr/sbin/nvpmodel"
                        NVPMODEL_CONF="/etc/nvpmodel.conf"
                        if [ -e "${NVPMODEL_BIN}" ]; then
                                if [ -e "${NVPMODEL_CONF}" ]; then
                                        POWER_MODE="`nvpmodel -q | grep "NV Power Mode"`"
                                        echo "${POWER_MODE}"
                                fi
                        fi
                        ;;
                esac
}

do_fan()
{
        TARGET_PWM="/sys/devices/pwm-fan/target_pwm"
        TEMP_CONTROL="/sys/devices/pwm-fan/temp_control"
        FAN_SPEED=255

        # Jetson-TK1 CPU fan is always ON.
        if [ "${machine}" = "jetson-tk1" ] ; then
                        return
        fi

        if [ ! -w "${TARGET_PWM}" ]; then
                echo "Can't access Fan!"
                return
        fi

        case "${ACTION}" in
                show)
                        echo "Fan: speed=`cat ${TARGET_PWM}`"
                        ;;
                store)
                        store "${TARGET_PWM}"
                        store "${TEMP_CONTROL}"
                        ;;
                *)
                        if [ -w "${TEMP_CONTROL}" ]; then
                                echo "0" > "${TEMP_CONTROL}"
                        fi
                        echo "${FAN_SPEED}" > "${TARGET_PWM}"
                        ;;
        esac
}

do_clusterswitch()
{
        case "${ACTION}" in
                show)
                        if [ -d "/sys/kernel/cluster" ]; then
                                ACTIVE_CLUSTER=`cat /sys/kernel/cluster/active`
                                echo "CPU Cluster Switching: Active Cluster ${ACTIVE_CLUSTER}"
                        else
                                echo "CPU Cluster Switching: Disabled"
                        fi
                        ;;
                store)
                        if [ -d "/sys/kernel/cluster" ]; then
                                store "/sys/kernel/cluster/immediate"
                                store "/sys/kernel/cluster/force"
                                store "/sys/kernel/cluster/active"
                        fi
                        ;;
                *)
                        if [ -d "/sys/kernel/cluster" ]; then
                                echo 1 > /sys/kernel/cluster/immediate
                                echo 0 > /sys/kernel/cluster/force
                                echo G > /sys/kernel/cluster/active
                        fi
                        ;;
        esac
}

do_hotplug()
{
        case "${ACTION}" in
                show)
                        echo "Online CPUs: `cat /sys/devices/system/cpu/online`"
                        ;;
                store)
                        for file in /sys/devices/system/cpu/cpu[0-9]/online; do
                                store "${file}"
                        done
                        ;;
                *)
                        if [ "${SOCFAMILY}" != "tegra186" -a "${SOCFAMILY}" != "tegra194" ]; then
                                for file in /sys/devices/system/cpu/cpu*/online; do
                                        if [ `cat $file` -eq 0 ]; then
                                                echo 1 > "${file}"
                                        fi
                                done
                        fi
        esac
}

do_cpu()
{
        FREQ_GOVERNOR="cpufreq/scaling_governor"
        CPU_MIN_FREQ="cpufreq/scaling_min_freq"
        CPU_MAX_FREQ="cpufreq/scaling_max_freq"
        CPU_CUR_FREQ="cpufreq/scaling_cur_freq"
        CPU_SET_SPEED="cpufreq/scaling_setspeed"
        INTERACTIVE_SETTINGS="/sys/devices/system/cpu/cpufreq/interactive"
        SCHEDUTIL_SETTINGS="/sys/devices/system/cpu/cpufreq/schedutil"

        case "${ACTION}" in
                show)
                        for folder in /sys/devices/system/cpu/cpu[0-9]; do
                                CPU=`basename ${folder}`
                                idle_states=""
                                for idle in ${folder}/cpuidle/state[0-9]; do
                                        idle_states+="`cat ${idle}/name`";
                                        idle_disable="`cat ${idle}/disable`"
                                        idle_states+="=$((idle_disable==0)) ";
                                done
                                if [ -e "${folder}/${FREQ_GOVERNOR}" ]; then
                                        echo "$CPU: Online=`cat ${folder}/online`" \
                                                "Governor=`cat ${folder}/${FREQ_GOVERNOR}`" \
                                                "MinFreq=`cat ${folder}/${CPU_MIN_FREQ}`" \
                                                "MaxFreq=`cat ${folder}/${CPU_MAX_FREQ}`" \
                                                "CurrentFreq=`cat ${folder}/${CPU_CUR_FREQ}`"\
                                                "IdleStates: $idle_states";
                                fi
                        done
                        ;;
                store)
                        for file in \
                                /sys/devices/system/cpu/cpu[0-9]/cpufreq/scaling_min_freq; do
                                store "${file}"
                        done

                        for file in \
                                /sys/devices/system/cpu/cpu[0-9]/cpuidle/state[0-9]/disable; do
                                store "${file}"
                        done
                        ;;
                *)
                        for folder in /sys/devices/system/cpu/cpu[0-9]; do
                                cat "${folder}/${CPU_MAX_FREQ}" > "${folder}/${CPU_MIN_FREQ}" 2>/dev/null
                        done

                        for file in \
                                /sys/devices/system/cpu/cpu[0-9]/cpuidle/state[0-9]/disable; do
                                echo 1 > "${file}"
                        done
                        ;;
        esac
}

do_gpu()
{
        case "${SOCFAMILY}" in
                tegra194)
                        GPU_MIN_FREQ="/sys/devices/17000000.gv11b/devfreq/17000000.gv11b/min_freq"
                        GPU_MAX_FREQ="/sys/devices/17000000.gv11b/devfreq/17000000.gv11b/max_freq"
                        GPU_CUR_FREQ="/sys/devices/17000000.gv11b/devfreq/17000000.gv11b/cur_freq"
                        GPU_RAIL_GATE="/sys/devices/17000000.gv11b/railgate_enable"
                        ;;
                tegra186)
                        GPU_MIN_FREQ="/sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq"
                        GPU_MAX_FREQ="/sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq"
                        GPU_CUR_FREQ="/sys/devices/17000000.gp10b/devfreq/17000000.gp10b/cur_freq"
                        GPU_RAIL_GATE="/sys/devices/17000000.gp10b/railgate_enable"
                        ;;
                tegra210)
                        GPU_MIN_FREQ="/sys/devices/57000000.gpu/devfreq/57000000.gpu/min_freq"
                        GPU_MAX_FREQ="/sys/devices/57000000.gpu/devfreq/57000000.gpu/max_freq"
                        GPU_CUR_FREQ="/sys/devices/57000000.gpu/devfreq/57000000.gpu/cur_freq"
                        GPU_RAIL_GATE="/sys/devices/57000000.gpu/railgate_enable"
                        ;;
                *)
                        echo "Error! unsupported SOC ${SOCFAMILY}"
                        exit 1;
                        ;;
        esac

        case "${ACTION}" in
                show)
                        echo "GPU MinFreq=`cat ${GPU_MIN_FREQ}`" \
                                "MaxFreq=`cat ${GPU_MAX_FREQ}`" \
                                "CurrentFreq=`cat ${GPU_CUR_FREQ}`"
                        ;;
                store)
                        store "${GPU_MIN_FREQ}"
                        store "${GPU_RAIL_GATE}"
                        ;;
                *)
                        echo 0 > "${GPU_RAIL_GATE}"
                        cat "${GPU_MAX_FREQ}" > "${GPU_MIN_FREQ}"
                        ret=$?
                        if [ ${ret} -ne 0 ]; then
                                echo "Error: Failed to max GPU frequency!"
                        fi
                        ;;
        esac
}

do_emc()
{
        case "${SOCFAMILY}" in
                tegra186 | tegra194)
                        EMC_ISO_CAP="/sys/kernel/nvpmodel_emc_cap/emc_iso_cap"
                        EMC_MIN_FREQ="/sys/kernel/debug/bpmp/debug/clk/emc/min_rate"
                        EMC_MAX_FREQ="/sys/kernel/debug/bpmp/debug/clk/emc/max_rate"
                        EMC_CUR_FREQ="/sys/kernel/debug/clk/emc/clk_rate"
                        EMC_UPDATE_FREQ="/sys/kernel/debug/bpmp/debug/clk/emc/rate"
                        EMC_FREQ_OVERRIDE="/sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked"
                        ;;
                tegra210)
                        EMC_MIN_FREQ="/sys/kernel/debug/tegra_bwmgr/emc_min_rate"
                        EMC_MAX_FREQ="/sys/kernel/debug/tegra_bwmgr/emc_max_rate"
                        EMC_CUR_FREQ="/sys/kernel/debug/clk/override.emc/clk_rate"
                        EMC_UPDATE_FREQ="/sys/kernel/debug/clk/override.emc/clk_update_rate"
                        EMC_FREQ_OVERRIDE="/sys/kernel/debug/clk/override.emc/clk_state"
                        ;;
                *)
                        echo "Error! unsupported SOC ${SOCFAMILY}"
                        exit 1;
                        ;;

        esac

        if [ "${SOCFAMILY}" = "tegra186" -o "${SOCFAMILY}" = "tegra194" ]; then
                emc_cap=`cat "${EMC_ISO_CAP}"`
                emc_fmax=`cat "${EMC_MAX_FREQ}"`
                if [ "$emc_cap" -gt 0 ] && [ "$emc_cap" -lt  "$emc_fmax" ]; then
                        EMC_MAX_FREQ="${EMC_ISO_CAP}"
                fi
        fi

        case "${ACTION}" in
                show)
                        echo "EMC MinFreq=`cat ${EMC_MIN_FREQ}`" \
                                "MaxFreq=`cat ${EMC_MAX_FREQ}`" \
                                "CurrentFreq=`cat ${EMC_CUR_FREQ}`" \
                                "FreqOverride=`cat ${EMC_FREQ_OVERRIDE}`"
                        ;;
                store)
                        store "${EMC_FREQ_OVERRIDE}"
                        ;;
                *)
                        cat "${EMC_MAX_FREQ}" > "${EMC_UPDATE_FREQ}"
                        echo 1 > "${EMC_FREQ_OVERRIDE}"
                        ;;
        esac
}

main ()
{
        while [ -n "$1" ]; do
                case "$1" in
                        --show)
                                echo "SOC family:${SOCFAMILY}  Machine:${machine}"
                                ACTION=show
                                ;;
                        --store)
                                [ -n "$2" ] && CONF_FILE=$2
                                ACTION=store
                                shift 1
                                ;;
                        --restore)
                                [ -n "$2" ] && CONF_FILE=$2
                                ACTION=restore
                                shift 1
                                ;;
                        -h|--help)
                                usage
                                exit 0
                                ;;
                        *)
                                usage "Unknown option: $1"
                                exit 1
                                ;;
                esac
                shift 1
        done

        [ `whoami` != root ] && \
                echo Error: Run this script\($0\) as a root user && exit 1

        case $ACTION in
                store)
                        if [ -e "${CONF_FILE}" ]; then
                                echo "File $CONF_FILE already exists. Can I overwrite it? Y/N:"
                                read answer
                                case $answer in
                                        y|Y)
                                                rm -f $CONF_FILE
                                                ;;
                                        *)
                                                echo "Error: file $CONF_FILE already exists!"
                                                exit 1
                                                ;;
                                esac
                        fi
                        ;;
                restore)
                        if [ ! -e "${CONF_FILE}" ]; then
                                echo "Error: $CONF_FILE file not found !"
                                exit 1
                        fi
                        restore
                        exit 0
                        ;;
        esac

        do_hotplug
        do_clusterswitch
        do_cpu
        do_gpu
        do_emc
        do_fan
        do_nvpmodel
}

if [ -e "/sys/devices/soc0/family" ]; then
        CHIP="`cat /sys/devices/soc0/family`"
        if [[ "${CHIP}" =~ "Tegra21" ]]; then
                SOCFAMILY="tegra210"
        fi

        if [ -e "/sys/devices/soc0/machine" ]; then
                machine="`cat /sys/devices/soc0/machine`"
        fi
elif [ -e "/proc/device-tree/compatible" ]; then
        if [ -e "/proc/device-tree/model" ]; then
                machine="$(tr -d '\0' < /proc/device-tree/model)"
        fi
        CHIP="$(tr -d '\0' < /proc/device-tree/compatible)"
        if [[ "${CHIP}" =~ "tegra186" ]]; then
                SOCFAMILY="tegra186"
        elif [[ "${CHIP}" =~ "tegra210" ]]; then
                SOCFAMILY="tegra210"
        elif [[ "${CHIP}" =~ "tegra194" ]]; then
                SOCFAMILY="tegra194"
        fi
fi

main $@
exit 0