Using the Python API (AutoTVM)

Article link: https://blog.csdn.net/2401_84141293/article/details/137812250


    that hold properties of read-write workspace pools that could be
    used by the inference.

constant_memory_pools : Optional[ConstantMemoryPools]
    The object that contains an Array of ConstantPoolInfo objects
    that hold properties of read-only pools that could be
    used by the inference.

params : dict of str to NDArray
    Input parameters to the graph that do not change
    during inference time. Used for constant folding.

mod_name: Optional[str]
    The module name we will build

Returns
-------
factory_module : tvm.relay.backend.executor_factory.ExecutorFactoryModule
    The runtime factory for the TVM graph executor.
"""


The `tvm.device()` function constructs a TVM device from a given device type and id:

def device(dev_type, dev_id=0):
    """Construct a TVM device with given device type and id.

    Parameters
    ----------
    dev_type: int or str
        The device type mask or name of the device.

    dev_id : int, optional
        The integer device id

    Returns
    -------
    dev: tvm.runtime.Device
        The corresponding device.

    Examples
    --------
    Device can be used to create reflection of device by
    string representation of the device type.

    .. code-block:: python

        assert tvm.device("cpu", 1) == tvm.cpu(1)
        assert tvm.device("cuda", 0) == tvm.cuda(0)
    """


The `tvm.contrib.graph_executor.GraphModule` class is a thin wrapper around `tvm.runtime.Module`. Through it you can directly call the underlying module's `set_input()`, `run()`, and `get_output()` functions:

class GraphModule(object):
    """Wrapper runtime module.

    This is a thin wrapper of the underlying TVM module.
    you can also directly call set_input, run, and get_output
    of underlying module functions

    Parameters
    ----------
    module : tvm.runtime.Module
        The internal tvm module that holds the actual graph functions.

    Attributes
    ----------
    module : tvm.runtime.Module
        The internal tvm module that holds the actual graph functions.

    Examples
    --------

    .. code-block:: python

        import tvm
        from tvm import relay
        from tvm.contrib import graph_executor

        # build the library using graph executor
        lib = relay.build(...)
        lib.export_library("compiled_lib.so")
        # load it back as a runtime
        lib: tvm.runtime.Module = tvm.runtime.load_module("compiled_lib.so")
        # Call the library factory function for default and create
        # a new runtime.Module, wrap with graph module.
        gmod = graph_executor.GraphModule(lib["default"](dev))
        # use the graph module.
        gmod.set_input("x", data)
        gmod.run()
    """


## Step 3: Run


### Preprocessing


preprocess.py

from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np

def read_img():
    img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
    img_path = download_testdata(img_url, "imagenet_cat.png", module="data")

    # Resize to 224x224
    resized_image = Image.open(img_path).resize((224, 224))
    img_data = np.asarray(resized_image).astype("float32")

    # ONNX expects NCHW input, so transpose the array from HWC to CHW
    img_data = np.transpose(img_data, (2, 0, 1))

    # Normalize each channel with the ImageNet mean and standard deviation
    imagenet_mean = np.array([0.485, 0.456, 0.406])
    imagenet_stddev = np.array([0.229, 0.224, 0.225])
    norm_img_data = np.zeros(img_data.shape).astype("float32")
    for i in range(img_data.shape[0]):
        norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]

    # Add the batch dimension
    img_data = np.expand_dims(norm_img_data, axis=0)

    return img_data
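
The per-channel loop above can also be written with NumPy broadcasting. The following is a minimal sketch, not part of the original script: `normalize_hwc` is a hypothetical helper name, and a random array stands in for the downloaded kitten image so that the snippet runs without network access or PIL.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype="float32")
IMAGENET_STDDEV = np.array([0.229, 0.224, 0.225], dtype="float32")


def normalize_hwc(image_hwc):
    """Turn an HxWx3 image into a normalized 1x3xHxW float32 batch (hypothetical helper)."""
    # HWC -> CHW, matching the NCHW layout used in read_img()
    chw = np.transpose(image_hwc.astype("float32"), (2, 0, 1))
    # Reshape the statistics to (3, 1, 1) so they broadcast over height and width
    mean = IMAGENET_MEAN.reshape(3, 1, 1)
    stddev = IMAGENET_STDDEV.reshape(3, 1, 1)
    normalized = (chw / 255 - mean) / stddev
    # Add the batch dimension
    return np.expand_dims(normalized, axis=0)


# Synthetic stand-in for the resized image (assumption: 224x224 RGB, uint8)
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
batch = normalize_hwc(fake_image)
print(batch.shape, batch.dtype)
```

Broadcasting removes the explicit loop and the preallocated `norm_img_data` buffer. The results should agree with the loop version up to float32 rounding.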


### Run


Call the `set_input()`, `run()`, and `get_output()` functions of `tvm.contrib.graph_executor.GraphModule` directly:

class GraphModule(object):
    def set_input(self, key=None, value=None, **params):
        """Set inputs to the module via kwargs

        Parameters
        ----------
        key : int or str
            The input key

        value : the input value.
            The input value

        params : dict of str to NDArray
            Additional arguments
        """

    def run(self, **input_dict):
        """Run forward execution of the graph

        Parameters
        ----------
        input_dict: dict of str to NDArray
            List of input values to be feed to
        """

    def get_output(self, index, out=None):
        """Get index-th output to out

        Parameters
        ----------
        index : int
            The output index

        out : NDArray
            The output array container
        """


### Postprocessing


postprocess.py

import numpy as np
from scipy.special import softmax
from tvm.contrib.download import download_testdata

def print_labels(data):
    # Download the list of labels
    labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
    labels_path = download_testdata(labels_url, "synset.txt", module="data")

    with open(labels_path, "r") as f:
        labels = [l.rstrip() for l in f]

    # Convert the raw output to probabilities and sort them in descending order
    scores = softmax(data)
    scores = np.squeeze(scores)
    ranks = np.argsort(scores)[::-1]

    # Print the five most likely classes
    for rank in ranks[0:5]:
        print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
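
To see what the softmax and ranking steps do without downloading `synset.txt`, here is a small sketch under stated assumptions: `top_k` is a hypothetical helper, the four labels and logits are made up, and the softmax is written out in NumPy (with the usual max-subtraction for numerical stability) instead of calling `scipy.special.softmax`.

```python
import numpy as np


def top_k(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs (hypothetical helper)."""
    flat = np.squeeze(np.asarray(logits, dtype="float64"))
    # Subtracting the maximum leaves the softmax unchanged but avoids overflow in exp()
    shifted = flat - flat.max()
    scores = np.exp(shifted) / np.exp(shifted).sum()
    # argsort is ascending, so reverse it to get the highest scores first
    ranks = np.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in ranks]


# Made-up model output with a batch dimension of 1
demo_labels = ["cat", "dog", "fox", "owl"]
demo_logits = np.array([[2.0, 1.0, 0.1, -1.0]])
results = top_k(demo_logits, demo_labels, k=2)
for label, prob in results:
    print("class='%s' with probability=%f" % (label, prob))
```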


### Performance measurement


performance.py

import timeit
import numpy as np

def test_perf(module):
    timing_number = 10
    timing_repeat = 10
    # repeat() returns the total time in seconds for each batch of `timing_number` runs;
    # convert it to milliseconds per single run
    timing_results_list = (
        np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
        * 1000
        / timing_number
    )
    timing_statistics = {
        "mean": np.mean(timing_results_list),
        "median": np.median(timing_results_list),
        "std": np.std(timing_results_list),
    }

    return timing_statistics
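
The same timing pattern can be tried on any callable, without a compiled TVM module. The sketch below uses only the standard library; `time_callable` is a hypothetical name and the summed range is just a placeholder workload standing in for `module.run()`.

```python
import statistics
import timeit


def time_callable(fn, number=10, repeat=10):
    """Time fn and report per-call statistics in milliseconds (hypothetical helper)."""
    # Each entry is the total wall time in seconds for `number` consecutive calls
    totals = timeit.Timer(fn).repeat(repeat=repeat, number=number)
    per_call_ms = [total * 1000 / number for total in totals]
    return {
        "mean": statistics.mean(per_call_ms),
        "median": statistics.median(per_call_ms),
        # pstdev is the population standard deviation, like np.std's default
        "std": statistics.pstdev(per_call_ms),
    }


# Placeholder workload standing in for module.run()
stats = time_callable(lambda: sum(range(1000)), number=5, repeat=3)
print(stats)
```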


## Step 4: Tune


`tvm.autotvm.measure_option()` sets the measurement options. To measure a config, it has to be built and then run, so you set the `Builder` and `Runner` parameters:

def measure_option(builder, runner):
    """
    Set options for measure. To measure a config, we will build it and run it.
    So we have to set options for these two steps.
    They have their own options on timeout, parallel, etc.

    Parameters
    ----------
    builder: Builder
        Specify how to build programs
    runner: Runner
        Specify how to run programs

    Examples
    --------
    # example setting for using local devices
    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(),  # use all local cpu cores for compilation
        runner=autotvm.LocalRunner(  # measure them sequentially
            number=10,
            timeout=5)
    )

    Note
    ----
    To make measurement results accurate, you should pick the correct value for the argument
    `number` and `repeat` in Runner(). Some devices need a certain minimum running time to
    "warm up," such as GPUs that need time to reach a performance power state.
    Using `min_repeat_ms` can dynamically adjust `number`, so it is recommended.
    The typical value for NVIDIA GPU is 150 ms.
    """


The `tvm.autotvm.LocalBuilder` class compiles code on the local machine:

class LocalBuilder(Builder):
    """Run compilation on local machine

    Parameters
    ----------
    timeout: float
        The timeout of a compilation
    n_parallel: int
        The number of tasks run in parallel. "None" will use all cpu cores
    build_kwargs: dict
        If supplied, additional kwargs passed to build_func. Overrides any build_kwargs supplied
        by the Runner.
    build_func: callable or str
        If is 'default', use default build function
        If is 'ndk', use function for android ndk
        If is 'stackvm', use function for stackvm
        If is callable, use it as custom build function, expect lib_format field.
    do_fork: bool
        If False, do not fork when building. Requires n_parallel=1.
    runtime: Optional[Runtime]
        Specify the runtime to generate artifacts for
    """


The `tvm.autotvm.LocalRunner` class runs the generated code on local devices:

class LocalRunner(RPCRunner):
    """Run generated code on local devices.

    Parameters
    ----------
    timeout: float
        The timeout of a compilation
    number: int
        The number of times to run the generated code for taking average.
        We call these runs as one `repeat` of measurement.
    repeat : int, optional
        The number of times to repeat the measurement.
        In total, the generated code will be run (1 + number x repeat) times,
        where the first one is warm up and will be discarded.
        The returned result contains `repeat` costs,
        each of which is an average of `number` costs.
    min_repeat_ms: int, optional
        The minimum duration of one `repeat` in milliseconds.
        By default, one `repeat` contains `number` runs. If this parameter is set,
        the parameters `number` will be dynamically adjusted to meet the
        minimum duration requirement of one `repeat`.
        i.e., When the run time of one `repeat` falls below this time, the `number` parameter
        will be automatically increased.
    cooldown_interval: float, optional
        The cool down interval between two measurements.
    enable_cpu_cache_flush: bool
        Whether to flush cache on CPU between repeated measurements.
        Flushing cache can make the measured latency of one operator closer to
        its actual latency during end-to-end inference.
        To make this option effective, the argument `number` should also be set to 1.
        This only has effect on CPU task.

    Note
    ----
    This is a "fake" local mode. We start a silent rpc tracker and rpc server
    for the user. In this way we reuse timeout/isolation mechanism in RPC infrastructure.
    """
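
The relationship between `number` and `repeat` described in the docstring can be checked with a little arithmetic. This sketch only mirrors that bookkeeping on made-up timings. It is not how TVM measures code, and `total_runs` and `repeat_costs` are hypothetical names.

```python
def total_runs(number, repeat):
    """Total executions: one warm-up run plus number x repeat measured runs."""
    return 1 + number * repeat


def repeat_costs(run_times, number, repeat):
    """Average every group of `number` runs, after discarding the warm-up run."""
    if len(run_times) != total_runs(number, repeat):
        raise ValueError("expected 1 + number * repeat run times")
    measured = run_times[1:]  # the first run is warm-up and is discarded
    return [
        sum(measured[i * number:(i + 1) * number]) / number
        for i in range(repeat)
    ]


# Made-up timings: one slow warm-up run followed by 2 repeats of 2 runs each
simulated = [9.0, 1.0, 3.0, 2.0, 4.0]
costs = repeat_costs(simulated, number=2, repeat=2)
print(total_runs(2, 2), costs)
```

With `number=10` and `repeat=3`, for example, the code would run 31 times and the result would hold 3 averaged costs.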


The `tvm.autotvm.task.extract_from_program()` function extracts tuning tasks from the `tvm.IRModule` or `relay.function.Function` to be tuned:

def extract_from_program(mod, params, target, target_host=None, ops=None):
    """Extract tuning tasks from a relay program.

    This function is the single program version of extract_from_multiple_program.

    Parameters
    ----------
    mod: tvm.IRModule or relay.function.Function
        The module or function to tune
    params: dict of str to numpy array
        The associated parameters of the program
    target: tvm.target.Target
        The compilation target
    target_host: tvm.target.Target
        The host compilation target
    ops: List[tvm.ir.Op] or None
        List of relay ops to be tuned. If not specified, all tunable ops will be extracted.

    Returns
    -------
    task: Array of autotvm.task.Task
        collected tasks
    """


The `tvm.autotvm.tuner.XGBTuner` class is a tuner that uses xgboost as its cost model. The call `tuner_obj.tune()` invokes `XGBTuner.tune()` to do the tuning:

class XGBTuner(ModelBasedTuner):
    """Tuner that uses xgboost as cost model

    Parameters
    ----------
    task: Task
        The tuning task
    plan_size: int
        The size of a plan. After `plan_size` trials, the tuner will refit a new cost model
        and do planing for the next `plan_size` trials.
    feature_type: str, optional
        If is 'itervar', use features extracted from IterVar (loop variable).
        If is 'knob', use flatten ConfigEntity directly.
        If is 'curve', use sampled curve feature (relation feature).

        Note on choosing feature type:
        For single task tuning, 'itervar' and 'knob' are good.
        'itervar' is more accurate but 'knob' is much faster.
        There are some constraints on 'itervar', if you meet
        problems with feature extraction when using 'itervar',
        you can switch to 'knob'.

        For cross-shape tuning (e.g. many convolutions with different shapes),
        'itervar' and 'curve' has better transferability,
        'knob' is faster.

        For cross-device or cross-operator tuning, you can use 'curve' only.
    loss_type: str
        If is 'reg', use regression loss to train cost model.
        The cost model predicts the normalized flops.
        If is 'rank', use pairwise rank loss to train cost model.
        The cost model predicts relative rank score.
        If is 'rank-binary', use pairwise rank loss with binarized labels to train cost model.
        The cost model predicts relative rank score.

    num_threads: int, optional
        The number of threads.

    optimizer: str or ModelOptimizer, optional
        If is 'sa', use a default simulated annealing optimizer.
        Otherwise it should be a ModelOptimizer object.

    diversity_filter_ratio: int or float, optional
        If is not None, the tuner will first select
        top-(plan_size * diversity_filter_ratio) candidates according to the cost model
        and then pick batch_size of them according to the diversity metric.

    log_interval: int = 50
        The verbose level.
        If is 0, output nothing.
        Otherwise, output debug information every `verbose` iterations.
    """

    def tune(self, *args, **kwargs):  # pylint: disable=arguments-differ
        super(XGBTuner, self).tune(*args, **kwargs)

        # manually close pool to avoid multiprocessing issues
        self.cost_model._close_pool()


Internally, `XGBTuner.tune()` calls the `Tuner.tune()` function of `tvm.autotvm.tuner.Tuner`, an ancestor class of `XGBTuner`:

class Tuner(object):
    """Base class for tuners

    Parameters
    ----------
    task: autotvm.task.Task
        Tuning Task
    """

    def tune(self, n_trial, measure_option, early_stopping=None, callbacks=(), si_prefix="G"):
        """Begin tuning

        Parameters
        ----------
        n_trial: int
            Maximum number of configs to try (measure on real hardware)
        measure_option: dict
            The options for how to measure generated code.
            You should use the return value of autotvm.measure_option for this argument.
        early_stopping: int, optional
            Early stop the tuning when not finding better configs in this number of trials
        callbacks: List of callable
            A list of callback functions. The signature of callback function is
            (Tuner, List of MeasureInput, List of MeasureResult)
            with no return value. These callback functions will be called on
            every measurement pair. See autotvm/tuner/callback.py for some examples.
        si_prefix: str
            One of tvm.autotvm.utils.SI_PREFIXES. The SI prefix to use when reporting FLOPS.
        """
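
To make the roles of `n_trial` and `early_stopping` concrete, here is a toy loop in plain Python. It is only an illustration of the semantics described in the docstring, not TVM's implementation: `toy_tune`, the integer "configs", and the `measure` function are all hypothetical, and the callback receives `None` in place of a real tuner object.

```python
def toy_tune(candidates, measure, n_trial, early_stopping=None, callbacks=()):
    """Try at most n_trial configs; stop early if no improvement for a while."""
    best_config, best_cost, best_trial = None, float("inf"), 0
    trials = 0
    for config in candidates[:n_trial]:
        cost = measure(config)
        trials += 1
        if cost < best_cost:
            best_config, best_cost, best_trial = config, cost, trials
        # Callbacks follow the (tuner, inputs, results) shape from the docstring
        for callback in callbacks:
            callback(None, [config], [cost])
        # Stop once `early_stopping` trials have passed without a better config
        if early_stopping is not None and trials - best_trial >= early_stopping:
            break
    return best_config, best_cost, trials


seen = []
best, cost, used = toy_tune(
    candidates=[8, 3, 5, 6, 7, 1],
    measure=lambda config: float(config),  # pretend the config value is its cost
    n_trial=6,
    early_stopping=3,
    callbacks=[lambda tuner, inputs, results: seen.append(results[0])],
)
print(best, cost, used)
```

In this run the loop stops after five trials, so the last candidate (which would have been the best) is never measured. That is the trade-off `early_stopping` makes: less tuning time in exchange for possibly missing a later improvement.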


Two callbacks are passed when calling `XGBTuner.tune()`: `tvm.autotvm.callback.progress_bar()` displays a progress bar for the tuning, and `tvm.autotvm.callback.log_to_file()` saves the tuning records to a file. The rows of the log are stored in the format of `autotvm.record.encode`:

def progress_bar(total, prefix="", si_prefix="G"):
    """Display progress bar for tuning

    Parameters
    ----------
    total: int
        The total number of trials
    prefix: str
        The prefix of output message
    si_prefix: str
        SI prefix for flops
    """

def log_to_file(file_out, protocol="json"):
    """Log the tuning records into file.
    The rows of the log are stored in the format of autotvm.record.encode.

    Parameters
    ----------
    file_out : File or str
        The file to log to.
    protocol: str, optional
        The log protocol. Can be 'json' or 'pickle'

    Returns
    -------
    callback : callable
        Callback function to do the logging.
    """
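
Both functions above are factories: they take some settings and return the actual callback. A custom callback can follow the same pattern. The sketch below is an assumption-laden illustration rather than TVM code: `log_best_to_stream` is a hypothetical name, the inputs are plain strings and the results plain floats instead of `MeasureInput` and `MeasureResult` objects, and the JSON layout is invented and unrelated to the `autotvm.record.encode` format.

```python
import io
import json


def log_best_to_stream(stream):
    """Return a callback that writes one JSON line per measurement (hypothetical)."""
    state = {"best": float("inf")}

    def _callback(tuner, inputs, results):
        # Called with parallel lists of measured inputs and their results
        for inp, res in zip(inputs, results):
            state["best"] = min(state["best"], res)
            record = {"input": inp, "cost": res, "best": state["best"]}
            stream.write(json.dumps(record) + "\n")

    return _callback


# Drive the callback by hand with made-up measurements
stream = io.StringIO()
callback = log_best_to_stream(stream)
callback(None, ["cfg-a", "cfg-b"], [5.0, 2.0])
callback(None, ["cfg-c"], [4.0])
print(stream.getvalue())
```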
