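[Learning Deep Learning Compilers from Scratch] Part 4: Analyzing TVM Operators
[GiantPandaCV introduction] This article uses the Relay convolution operator and the TOPI Resize operator to trace how operators flow through TVM and where they are actually implemented. Along the way we also see that TVM provides different schedules for different backends to speed up models. I hope readers get something out of it. I have not yet fully grasped some of the finer details of this flow, so discussion is welcome. (Also, I have only just started working with TVM and am no expert, so please stop messaging me on Zhihu about taking on TVM-related projects. Thank you.)
0x0. Review
The previous article walked through TVM's ONNX frontend in detail: how TVM loads an ONNX model and converts ONNX operators into Relay expressions. This article takes the convolution operator as an example and looks at how a Relay expression is turned into a TOPI operator and, combined with a TVM schedule, run on a backend.
0x1. The flow of the Relay convolution operator
We start again in the ONNX frontend. Here is the class that converts the ONNX convolution op into a TVM op: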
class Conv(OnnxOpConverter):
    """Operator converter for Conv."""
    @classmethod
    def _impl_v1(cls, inputs, attr, params):
        # Use shape of input to determine convolution type.
        data = inputs[0]
        input_shape = infer_shape(data)
        ndim = len(input_shape)
        # Reconcile the differences between the ONNX Conv attributes and the Relay conv op attributes
        kernel_type = infer_type(inputs[1])
        kernel_shapes = [get_const_tuple(kernel_type.checked_type.shape)]
        if "kernel_shape" not in attr:
            attr["kernel_shape"] = kernel_shapes[0][2:]
        if "auto_pad" in attr:
            attr["auto_pad"] = attr["auto_pad"].decode("utf-8")
            if attr["auto_pad"] in ("SAME_UPPER", "SAME_LOWER"):
                # Warning: Convolution does not yet support dynamic shapes,
                # one will need to run dynamic_to_static on this model after import
                data = autopad(
                    data,
                    attr.get("strides", [1] * (ndim - 2)),
                    attr["kernel_shape"],
                    attr.get("dilations", [1] * (ndim - 2)),
                    ndim,
                )
            elif attr["auto_pad"] == "VALID":
                attr["pads"] = tuple([0 for i in range(ndim - 2)])
            elif attr["auto_pad"] == "NOTSET":
                pass
            else:
                msg = 'Value {} in attribute "auto_pad" of operator Conv is invalid.'
                raise tvm.error.OpAttributeInvalid(msg.format(attr["auto_pad"]))
            attr.pop("auto_pad")
        # Convert the attributes and the op itself
        out = AttrCvt(
            op_name=dimension_picker("conv"),
            transforms={
                "kernel_shape": "kernel_size",
                "dilations": ("dilation", 1),
                "pads": ("padding", 0),
                "group": ("groups", 1),
            },
            custom_check=dimension_constraint(),
        )([data, inputs[1]], attr, params)
        use_bias = len(inputs) == 3
        if use_bias:
            out = _op.nn.bias_add(out, inputs[2])
        return out
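To make the role of the transforms table in the converter above more concrete, here is a minimal pure-Python sketch of the renaming step. It does not use TVM; the helper name convert_attrs and its exact behaviour are assumptions made for illustration, not the real AttrCvt implementation.

```python
def convert_attrs(attrs, transforms):
    """Rename ONNX-style attributes following a transforms table.

    Hypothetical helper: each entry maps an old key to either a new key
    (str) or a (new_key, default) tuple. The default is used when the old
    key is absent. Keys without an entry are passed through unchanged.
    """
    new_attrs = {}
    for old_key, rule in transforms.items():
        if isinstance(rule, tuple):
            new_key, default = rule
            new_attrs[new_key] = attrs.get(old_key, default)
        elif old_key in attrs:
            new_attrs[rule] = attrs[old_key]
    for key, value in attrs.items():
        if key not in transforms:
            new_attrs[key] = value
    return new_attrs


# Same table as in the Conv converter.
transforms = {
    "kernel_shape": "kernel_size",
    "dilations": ("dilation", 1),
    "pads": ("padding", 0),
    "group": ("groups", 1),
}
onnx_attrs = {"kernel_shape": (3, 3), "strides": (1, 1), "pads": (1, 1, 1, 1)}
relay_attrs = convert_attrs(onnx_attrs, transforms)
print(relay_attrs)
```

In this sketch the missing ONNX attributes dilations and group fall back to the defaults given in the table, while strides needs no renaming and is copied as-is.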
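As you can see, the core of this class is the call to AttrCvt, which converts the ONNX convolution operator into a Relay convolution operator. The conversion covers the attributes, rearranges the weights, inputs and outputs according to the layout, and returns a Relay convolution operator. AttrCvt is defined in the file python/tvm/relay/frontend/common.py. According to its comments, this class implements the operator mapping, that is, it maps the given op_name to a Relay operator. Concretely, it first checks the attrs passed in and raises an error if any attribute is illegal; attributes that have a conversion rule (the transforms in the code above) are converted directly; finally it calls get_relay_op to return a TVM Relay convolution operator. get_relay_op is implemented as follows: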
def get_relay_op(op_name):
    """Get the callable function from Relay based on the op name.
    Parameters
    ----------
    op_name : str
        The name of the Relay op.
    """
    if "." in op_name:
        # explicit hierarchical modules
        op = _op
        try:
            for opn in op_name.split("."):
                op = getattr(op, opn)
        except AttributeError:
            op = None
    else:
        # try search op in various modules
        for candidate in (_op, _op.nn, _op.image, _op.vision, _op.contrib):
            op = getattr(candidate, op_name, None)
            if op is not None:
                break
    if not op:
        raise tvm.error.OpNotImplemented("Unable to map op_name {} to relay".format(op_name))
    return op
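The two lookup styles handled by get_relay_op (a dotted name such as nn.conv2d and a bare name such as conv2d) can be tried out without TVM. The sketch below replaces the Relay op modules with hypothetical SimpleNamespace stand-ins; the names root and lookup_op are assumptions made for this example.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the relay op package and two of its sub-modules.
nn = SimpleNamespace(conv2d=lambda: "nn.conv2d")
image = SimpleNamespace(resize=lambda: "image.resize")
root = SimpleNamespace(nn=nn, image=image, add=lambda: "add")


def lookup_op(op_name):
    """Resolve op_name either as 'module.name' or as a bare 'name'."""
    if "." in op_name:
        # Explicit path: walk the attribute chain from the root package.
        op = root
        try:
            for part in op_name.split("."):
                op = getattr(op, part)
        except AttributeError:
            op = None
    else:
        # Bare name: try each candidate module in order.
        op = None
        for candidate in (root, root.nn, root.image):
            op = getattr(candidate, op_name, None)
            if op is not None:
                break
    if op is None:
        raise KeyError("Unable to map op_name {} to an op".format(op_name))
    return op


print(lookup_op("nn.conv2d")())
print(lookup_op("conv2d")())
print(lookup_op("resize")())
```

Both spellings of the convolution name resolve to the same callable, which is the point of having the two branches.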
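All ops live in the python/tvm/relay/op package, and conv is defined in op/nn. The for loops in the code above search under python/tvm/relay/op for a Relay operator whose name is op_name and return it as soon as it is found. The two branches exist so that the caller can write either module.xxx or just xxx, where module can be any sub-package of python/tvm/relay/op, such as nn. nn.py contains the call chain conv2d -> _make.conv2d(). _make.py in turn exposes the functions registered in C++ to Python, which takes a single line of code: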
tvm._ffi._init_api("relay.op.nn._make", __name__)
_init_api is in turn implemented in tvm/python/tvm/_ffi/registry.py:
def _init_api(namespace, target_module_name=None):
    """Initialize api for a given module name
    namespace : str
       The namespace of the source registry
    target_module_name : str
       The target module name if different from namespace
    """
    target_module_name = target_module_name if target_module_name else namespace
    if namespace.startswith("tvm."):
        _init_api_prefix(target_module_name, namespace[4:])
    else:
        _init_api_prefix(target_module_name, namespace)
def _init_api_prefix(module_name, prefix):
    module = sys.modules[module_name]
    for name in list_global_func_names():
        if not name.startswith(prefix):
            continue
        fname = name[len(prefix) + 1 :]
        target_module = module
        if fname.find(".") != -1:
            continue
        f = get_global_func(name)
        ff = _get_api(f)
        ff.__name__ = fname
        ff.__doc__ = "TVM PackedFunc %s. " % fname
        setattr(target_module, ff.__name__, ff)
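The prefix-matching idea in _init_api_prefix can be illustrated with a plain dictionary standing in for the C++ global function table. This is only a sketch under that assumption: GLOBAL_FUNCS, init_api_prefix and the fake module are hypothetical names, and no TVM library is loaded.

```python
import types

# Hypothetical stand-in for the functions registered on the C++ side.
GLOBAL_FUNCS = {
    "relay.op.nn._make.conv2d": lambda *args: ("conv2d", args),
    "relay.op.nn._make.dense": lambda *args: ("dense", args),
    "relay.op.image._make.resize": lambda *args: ("resize", args),
}


def init_api_prefix(module, prefix):
    """Attach every registered function under `prefix` to `module`."""
    for name, func in GLOBAL_FUNCS.items():
        # This sketch matches on prefix + "." to keep the example strict.
        if not name.startswith(prefix + "."):
            continue
        fname = name[len(prefix) + 1:]
        if "." in fname:
            # Functions in deeper namespaces belong to other modules.
            continue
        setattr(module, fname, func)


# A throwaway module object plays the role of _make.py.
make_module = types.ModuleType("fake_make")
init_api_prefix(make_module, "relay.op.nn._make")
print(sorted(n for n in dir(make_module) if not n.startswith("__")))
```

After the call, the fake module exposes conv2d and dense as ordinary attributes, while resize stays out because it is registered under a different prefix.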
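As you can see, this function fetches the functions registered in C++ by name: get_global_func looks the name up among the functions registered in the compiled TVM shared library. Once the C++ function has been obtained, it is set as an attribute of the _make.py module, which is equivalent to defining the conv2d function inside _make.py. The registration code for the conv2d operator is in tvm/src/relay/op/nn/convolution.cc: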
// relay.nn.conv2d
TVM_REGISTER_NODE_TYPE(Conv2DAttrs);
TVM_REGISTER_GLOBAL("relay.op.nn._make.conv2d")
.set_body_typed([](Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
Array<IndexExpr> dilation, int groups, IndexExpr channels,
Array<IndexExpr> kernel_size, String data_layout, String kernel_layout,
String out_layout, DataType out_dtype) {
return MakeConv<Conv2DAttrs>(data, weight, strides, padding, dilation, groups, channels,
kernel_size, data_layout, kernel_layout, out_layout, out_dtype,
"nn.conv2d");
});
RELAY_REGISTER_OP("nn.conv2d")
.describe(R"code(2D convolution layer (e.g. spatial convolution over images).
This layer creates a convolution kernel that is convolved
with the layer input to produce a tensor of outputs.
- **data**: This depends on the `layout` parameter. Input is 4D array of shape
(batch_size, in_channels, height, width) if `layout` is `NCHW`.
- **weight**: (channels, in_channels, kernel_size[0], kernel_size[1])
- **out**: This depends on the `layout` parameter. Output is 4D array of shape
(batch_size, channels, out_height, out_width) if `layout` is `NCHW`.
)code" TVM_ADD_FILELINE)
.set_attrs_type<Conv2DAttrs>()
.set_num_inputs(2)
.add_argument("data", "Tensor", "The input tensor.")
.add_argument("weight", "Tensor", "The weight tensor.")
.set_support_level(2)
.add_type_rel("Conv2D", Conv2DRel<Conv2DAttrs>)
.set_attr<FInferCorrectLayout>("FInferCorrectLayout", ConvInferCorrectLayout<Conv2DAttrs>);
Next we follow the convolution operator into C++. The TVM_REGISTER_GLOBAL macro registers the function in a global object. Here is the macro definition:
#define TVM_REGISTER_GLOBAL(OpName) \
TVM_STR_CONCAT(TVM_FUNC_REG_VAR_DEF, __COUNTER__) = ::tvm::runtime::Registry::Register(OpName)
The registration is implemented in the Registry class. Its Register member function adds the entry through a global Manager:
Registry& Registry::Register(const std::string& name, bool can_override) { // NOLINT(*)
Manager* m = Manager::Global();
std::lock_guard<std::mutex> lock(m->mutex);
if (m->fmap.count(name)) {
ICHECK(can_override) << "Global PackedFunc " << name << " is already registered";
}
Registry* r = new Registry();
r->name_ = name;
m->fmap[name] = r;
return *r;
}
In the conv2d registration above, set_body_typed is implemented as follows:
/*!
* \param f The function to forward to.
* \tparam FLambda The signature of the function.
*/
template <typename FLambda>
Registry& set_body_typed(FLambda f) {
using FType = typename detail::function_signature<FLambda>::FType;
return set_body(TypedPackedFunc<FType>(std::move(f), name_).packed());
}
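Here set_body stores the lambda, which builds a conv operator call through MakeConv, in the registry. MakeConv first builds the attrs object from the conv parameters passed in (strides, kernel, layout and so on), then uses the op name to look up the already registered conv operator, and finally packs the attrs and the op together into a Call. This is the following code in tvm/src/relay/op/nn/convolution_make.h: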
template <typename T>
inline Expr MakeConv(Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
Array<IndexExpr> dilation, int groups, IndexExpr channels,
Array<IndexExpr> kernel_size, std::string data_layout,
std::string kernel_layout, std::string out_layout, DataType out_dtype,
std::string op_name) {
auto attrs = make_object<T>();
attrs->strides = std::move(strides);
attrs->padding = std::move(padding);
attrs->dilation = std::move(dilation);
attrs->groups = groups;
attrs->channels = std::move(channels);
attrs->kernel_size = std::move(kernel_size);
attrs->data_layout = std::move(data_layout);
attrs->kernel_layout = std::move(kernel_layout);
attrs->out_layout = std::move(out_layout);
attrs->out_dtype = std::move(out_dtype);
const Op& op = Op::Get(op_name);
return Call(op, {data, weight}, Attrs(attrs), {});
}
The Call class inherits from Expr:
class Call : public Expr {
public:
/*!
* \brief The destructor
*/
~Call();
/*!
* \brief The constructor
* \param op The operator will be invoked.
* \param args The arguments of the call.
* \param attrs The attributes of the call node.
* \param type_args The type arguments passed to a polymorphic function.
* \param span The source span of the expression.
*/
TVM_DLL Call(Expr op, Array<Expr> args, Attrs attrs = Attrs(),
Array<Type> type_args = Array<Type>(), Span span = Span());
TVM_DEFINE_OBJECT_REF_METHODS(Call, RelayExpr, CallNode);
};
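Ops are registered into a common AttrRegistry through RELAY_REGISTER_OP. An Op does not actually contain the computation of the operator; it only records information about its inputs, outputs and attributes.
Note in particular that a Relay op contains no concrete computation! The steps above only produce the IR of the Relay convolution op together with its inputs and attributes. So where is the computation of this op carried out? In TVM's TOPI, which is introduced next.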
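The point that a Relay call only packs an op, its arguments and its attributes, with no computation attached, can be mirrored in a few lines of plain Python. The classes below are hypothetical stand-ins written for this illustration; they are not TVM's Op or Call.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Op:
    """Hypothetical op record: a name and an input count, nothing more."""
    name: str
    num_inputs: int


@dataclass
class Call:
    """Hypothetical call node: an op plus its arguments and attributes."""
    op: Op
    args: List[Any]
    attrs: Dict[str, Any] = field(default_factory=dict)


# Stand-in for the table filled by the op registration.
OP_REGISTRY = {"nn.conv2d": Op("nn.conv2d", 2)}


def make_conv(data, weight, **attrs):
    """Mimic the shape of MakeConv: look up the op, then build a Call."""
    op = OP_REGISTRY["nn.conv2d"]
    return Call(op, [data, weight], attrs)


call = make_conv("data", "weight", strides=(1, 1), padding=(0, 0), groups=1)
print(call)
```

Nothing in this structure knows how to convolve anything; it is purely a description that a later stage has to give meaning to.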
0x2. The flow of the TOPI Resize operator
TOPI is TVM's own operator library. Its operators are expressed with te; see the official documentation: http://tvm.apache.org/docs/tutorials/language/schedule_primitives.html#sphx-glr-tutorials-language-schedule-primitives-py. Here we take the ONNX Resize operator as an example to show how a TOPI operator flows through TVM. We again start from the Resize class in tvm/python/tvm/relay/frontend/onnx.py:
class Resize(OnnxOpConverter):
    """Operator converter for Resize"""
    @classmethod
    def _impl_v10(cls, inputs, attr, params):
        mode = attr.get("mode").decode("ascii")
        if mode == "nearest":
            method = "nearest_neighbor"
        elif mode == "linear":
            method = "bilinear"
        elif mode == "cubic":
            method = "bicubic"
        else:
            raise tvm.error.OpAttributeInvalid(
                'Value {} in attribute "mode" of operator Resize is not valid.'.format(mode)
            )
        scale = inputs[1]
        size = _op.cast(shape_of(inputs[0]), infer_type(scale).checked_type.dtype) * scale
        layout = "NCHW"  # ONNX assumes NCHW layout
        out_size = fold_constant(_op.strided_slice(size, [2], [4]))
        return _op.image.resize(inputs[0], out_size, layout, method, "asymmetric")
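The output-size arithmetic in the converter above (multiply the input shape by the per-axis scales, then keep axes 2 and 3) can be checked with a small worked example. This is a plain-Python sketch; the function name resize_out_size is hypothetical, and truncation to int is an assumption that matches the int32 cast applied to constant sizes later on.

```python
def resize_out_size(input_shape, scales):
    """Compute (out_height, out_width) for an NCHW input.

    input_shape: (n, c, h, w); scales: one scale factor per axis.
    """
    # shape * scale, computed in floating point like the cast in the converter
    size = [float(dim) * s for dim, s in zip(input_shape, scales)]
    # strided_slice(size, [2], [4]) keeps only the spatial axes
    return [int(v) for v in size[2:4]]


out_size = resize_out_size((1, 3, 32, 48), (1.0, 1.0, 2.0, 1.5))
print(out_size)
```

For a 1x3x32x48 input with scales (1, 1, 2, 1.5), the spatial output size is 64 by 72.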
As you can see, the Resize class ends by calling the resize function in tvm/python/tvm/relay/op/image/image.py:
def resize(
    data,
    size,
    layout="NCHW",
    method="bilinear",
    coordinate_transformation_mode="half_pixel",
    rounding_method="",
    bicubic_alpha=-0.5,
    bicubic_exclude=0,
    out_dtype=None,
):
    """Image resize operator.
    This operator takes data as input and does 2D scaling to the given scale factor.
    In the default case, where the data_layout is `NCHW`
    with data of shape (n, c, h, w)
    out will have a shape (n, c, size[0], size[1])
    method indicates the algorithm to be used while calculating the out value
    and method can be one of ("bilinear", "nearest_neighbor", "bicubic")
    Parameters
    ----------
    data : relay.Expr
        The input data to the operator.
    size: Tuple of Int or Expr
        The out size to which the image will be resized.
    layout : str, optional
        Layout of the input.
    method : str, optional
        Scale method to used [nearest_neighbor, bilinear, bicubic].
    coordinate_transformation_mode : string, optional
        Describes how to transform the coordinate in the resized tensor
        to the coordinate in the original tensor.
        Refer to the ONNX Resize operator specification for details.
        [half_pixel, align_corners, asymmetric]
    rounding_method: string, optional
        indicates how to find the "nearest" pixel in nearest_neighbor method
        [round, floor, ceil]
    bicubic_alpha: float
        Spline Coefficient for Bicubic Interpolation
    bicubic_exclude: int
        Flag to exclude exterior of the image during bicubic interpolation
    out_dtype : str, optional
        Type to return. If left None returns the same type as input.
    Returns
    -------
    result: relay.Expr
        The resized result.
    """
    if isinstance(size, Constant):
        size = list(size.data.asnumpy().astype("int32"))
    if isinstance(size, Expr):
        return _dyn_make.resize(
            data,
            size,
            layout,
            method,
            coordinate_transformation_mode,
            rounding_method,
            bicubic_alpha,
            bicubic_exclude,
            out_dtype,
        )
    return _make.resize(
        data,
        size,
        layout,
        method,
        coordinate_transformation_mode,
        rounding_method,
        bicubic_alpha,
        bicubic_exclude,
        out_dtype,
    )
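Once again the call goes through _make.resize. From the previous section on the Relay convolution operator we already know that _make.py exposes the functions registered in C++ to Python, so this call corresponds to the C++ interface of the Resize operator, namely the Resize op registration code in tvm/src/relay/op/image/resize.cc: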
TVM_REGISTER_GLOBAL("relay.op.image._make.resize").set_body_typed(MakeResize);
RELAY_REGISTER_OP("image.resize")
.describe(R"code(Perform resize to input array with nearest neighbour or bilinear interpolation.
- **data**: data is 4D array of shape
(batch_size, channels, in_height, in_width) for NCHW
(batch_size, in_height, in_width, channels) for NHWC
- **out**: Output is 4D array of shape
for layout NCHW
(batch_size, channels, size[0], size[1])
for layout NHWC
(batch_size, size[0], size[1], channels)
)code" TVM_ADD_FILELINE)
.set_attrs_type<ResizeAttrs>()
.set_num_inputs(1)
.add_argument("data", "Tensor", "The input tensor.")
.set_support_level(5)
.add_type_rel("Resize", ResizeRel)
.set_attr<TOpPattern>("TOpPattern", kInjective);
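The TVM Codebase Walkthrough by Example describes it as follows.
The most important sentence is: Operators corresponding to each node are registered in src/relay/op. Implementations of operators are in topi, and they are coded in either C++ or Python.
From this we know that the concrete implementations of TVM operators live in TOPI; the Python side is under tvm/python/tvm/topi.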
Back to the Resize operator: the following code in tvm/python/tvm/relay/op/image/_image.py connects the op to the TOPI operator:
# resize
@reg.register_compute("image.resize")
def compute_resize(attrs, inputs, out_type):
    """ compute definition for resize op """
    size = attrs.size
    layout = attrs.layout
    method = attrs.method
    coord_trans = attrs.coordinate_transformation_mode
    rounding_method = attrs.rounding_method
    bicubic_alpha = attrs.bicubic_alpha
    bicubic_exclude = attrs.bicubic_exclude
    out_dtype = attrs.out_dtype
    return [
        topi.image.resize(
            inputs[0],
            size,
            layout,
            method,
            coord_trans,
            rounding_method,
            bicubic_alpha,
            bicubic_exclude,
            out_dtype,
        )
    ]
reg.register_injective_schedule("image.resize")
The Relay nn operators from the previous section are connected in the same way; see the places in tvm/python/tvm/relay/op/nn/_nn.py where the nn.conv2d key appears.
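The connection between an op name and its compute function is essentially a decorator that fills a lookup table. The sketch below shows that pattern in plain Python; COMPUTE_TABLE, register_compute and lower_call are hypothetical names for this illustration and do not reproduce TVM's reg module.

```python
# Hypothetical table mapping op names to their compute functions.
COMPUTE_TABLE = {}


def register_compute(op_name):
    """Decorator that records `func` as the compute rule for `op_name`."""
    def decorator(func):
        COMPUTE_TABLE[op_name] = func
        return func
    return decorator


@register_compute("image.resize")
def compute_resize(attrs, inputs):
    # A real compute rule would call into the operator library here;
    # this placeholder only records what it was asked to do.
    return [("resize", inputs[0], attrs["size"], attrs["method"])]


def lower_call(op_name, attrs, inputs):
    """Find the compute rule registered for op_name and invoke it."""
    return COMPUTE_TABLE[op_name](attrs, inputs)


result = lower_call("image.resize", {"size": (64, 64), "method": "bilinear"}, ["x"])
print(result)
```

The op description and the computation stay in separate places, and only the string key ties them together.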
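Now let us look at the actual implementation of the TOPI Resize operator, the resize function in tvm/python/tvm/topi/image/resize.py. The end of the function uses the method passed in above to choose which interpolation to use for the resize: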
    # Determine which interpolation method to use then run it.
    if method == "nearest_neighbor":
        compute_func = _nearest_neighbor
    elif method == "bilinear":
        compute_func = _bilinear
    elif method == "bicubic":
        compute_func = _bicubic
    else:
        raise ValueError("%s method is not supported." % method)
    return te.compute(output_shape, compute_func, name="resize", tag=tag.INJECTIVE)
The implementation of each of these functions is not covered in detail here; interested readers can find the source in tvm/python/tvm/topi/image/resize.py.
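For readers who want a feel for what a nearest-neighbor compute function does, here is a simplified pure-Python version for a single 2D image. It is a sketch, not the TOPI code: it assumes the asymmetric coordinate transformation (source = destination * in_size / out_size) with floor rounding, and the function name resize_nearest is hypothetical.

```python
import math


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of a 2D list of numbers."""
    in_h, in_w = len(image), len(image[0])
    scale_y, scale_x = in_h / out_h, in_w / out_w
    out = []
    for y in range(out_h):
        # Map the output row back to an input row and clamp to the border.
        src_y = min(int(math.floor(y * scale_y)), in_h - 1)
        row = []
        for x in range(out_w):
            src_x = min(int(math.floor(x * scale_x)), in_w - 1)
            row.append(image[src_y][src_x])
        out.append(row)
    return out


small = [[1, 2],
         [3, 4]]
big = resize_nearest(small, 4, 4)
for row in big:
    print(row)
```

Every output element depends on exactly one input element, which is why the operator can be tagged as injective.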
0x3. Scheduling
When introducing the TOPI operator above, we showed the code in tvm/python/tvm/relay/op/image/_image.py that connects the op to the TOPI operator. Its last line is:
reg.register_injective_schedule("image.resize")
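This single line takes care of scheduling in TVM. As covered in the second article of this series, [Learning Deep Learning Compilers from Scratch] Part 2: The scheduler in TVM, scheduling in TVM is done through schedules.
From the screenshot below we can clearly see that TVM provides corresponding schedules for many kinds of hardware devices.

For arm_cpu, convolution, depthwise separable convolution and similar operators have dedicated schedules, whereas the injective schedule registered above is a generic one. Looking at the implementation, it essentially only splits the innermost axis and vectorizes it, namely s[x].vectorize(ii) below: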
def schedule_injective(outs):
    """ARM CPU schedule for injective op.
    Parameters
    ----------
    outs: Array of Tensor
          The computation graph description of injective in the format
          of an array of tensors.
    Returns
    -------
    sch: Schedule
        The computation schedule for the op.
    """
    outs = [outs] if isinstance(outs, te.tensor.Tensor) else outs
    s = te.create_schedule([x.op for x in outs])
    x = outs[0]
    if list(s[x].op.axis):
        # do not vectorize for broadcast
        (io, ii) = s[x].split(list(s[x].op.axis)[-1], 4)
        s[x].vectorize(ii)
    tvm.te.schedule.AutoInlineInjective(s)
    if not is_empty_shape(x.shape):
        schedule_injective_from_existing(s, x)
    return s
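The split by a factor of 4 in the schedule above turns one loop into an outer loop and an inner loop of length 4, and the inner loop is what gets vectorized. The following plain-Python sketch only shows which indices each group of lanes would cover; split_indices is a hypothetical helper, and the tail guard is an assumption about how an extent that is not a multiple of the factor is handled.

```python
def split_indices(extent, factor):
    """Group the indices 0..extent-1 the way a split by `factor` would."""
    outer_extent = (extent + factor - 1) // factor  # ceiling division
    groups = []
    for outer in range(outer_extent):
        lanes = []
        for inner in range(factor):
            index = outer * factor + inner
            if index < extent:  # guard the tail when extent % factor != 0
                lanes.append(index)
        groups.append(lanes)
    return groups


groups = split_indices(10, 4)
print(groups)
```

Each inner list corresponds to one iteration of the outer loop; on hardware with SIMD units the elements of one list can be processed by a single vector instruction.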
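0x4. Summary
This article used the Relay convolution operator and the TOPI Resize operator to trace how operators flow through TVM and where they are implemented. Along the way we also saw that TVM provides different schedules for different backends to speed up models. I hope readers got something out of it. I may still be missing some of the finer details of this flow, so discussion is welcome.
0x5. References
https://zhuanlan.zhihu.com/p/351403985
https://zhuanlan.zhihu.com/p/149386093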