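[GiantPandaCV editor's note] This article uses the Relay convolution operator and the TOPI Resize operator to trace how an operator flows through TVM and where it is actually implemented. Along the way we also see that TVM provides different schedules for different backends to help speed up models. I hope readers get something out of it. I have not fully grasped some of the finer details of this process myself, so discussion is welcome. (Also: I am new to TVM and not an expert, so please stop sending me private messages on Zhihu asking me to take on TVM projects. Thanks.)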

0x0. Recap

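The previous article walked through TVM's ONNX frontend in detail, so we know how TVM loads an ONNX model and converts ONNX operators into Relay expressions. This article takes the convolution operator as an example and looks at how a Relay expression is mapped to a TOPI operator and, together with a TVM schedule, run on a backend.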

0x1. The flow of the Relay convolution operator

We start again in the ONNX frontend. Here is the class that converts the ONNX convolution op into a TVM op:

class Conv(OnnxOpConverter):
    """Operator converter for Conv."""

    @classmethod
    def _impl_v1(cls, inputs, attr, params):
        # Use shape of input to determine convolution type.
        data = inputs[0]
        input_shape = infer_shape(data)
        ndim = len(input_shape)
        # Reconcile the ONNX Conv attributes with the attributes of the Relay conv op
        kernel_type = infer_type(inputs[1])
        kernel_shapes = [get_const_tuple(kernel_type.checked_type.shape)]
        if "kernel_shape" not in attr:
            attr["kernel_shape"] = kernel_shapes[0][2:]
  
        if "auto_pad" in attr:
            attr["auto_pad"] = attr["auto_pad"].decode("utf-8")
            if attr["auto_pad"] in ("SAME_UPPER", "SAME_LOWER"):
                # Warning: Convolution does not yet support dynamic shapes,
                # one will need to run dynamic_to_static on this model after import
                data = autopad(
                    data,
                    attr.get("strides", [1] * (ndim - 2)),
                    attr["kernel_shape"],
                    attr.get("dilations", [1] * (ndim - 2)),
                    ndim,
                )
            elif attr["auto_pad"] == "VALID":
                attr["pads"] = tuple([0 for i in range(ndim - 2)])
            elif attr["auto_pad"] == "NOTSET":
                pass
            else:
                msg = 'Value {} in attribute "auto_pad" of operator Conv is invalid.'
                raise tvm.error.OpAttributeInvalid(msg.format(attr["auto_pad"]))
            attr.pop("auto_pad")
        # Convert the attributes and build the Relay op
        out = AttrCvt(
            op_name=dimension_picker("conv"),
            transforms={
                "kernel_shape": "kernel_size",
                "dilations": ("dilation", 1),
                "pads": ("padding", 0),
                "group": ("groups", 1),
            },
            custom_check=dimension_constraint(),
        )([data, inputs[1]], attr, params)

        use_bias = len(inputs) == 3
        if use_bias:
            out = _op.nn.bias_add(out, inputs[2])
        return out

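The core of this class is the call to AttrCvt, which converts the ONNX convolution operator into a Relay convolution operator: it translates the attributes and returns a Relay convolution call. AttrCvt is defined in the file python/tvm/relay/frontend/common.py. According to its comments, the class implements the operator hand-off, that is, it maps the given op_name to a Relay operator. Concretely, it first checks the incoming attrs and raises an error for any illegal attribute; attributes that have a conversion rule (the transforms argument in the code above) are converted accordingly; finally it calls get_relay_op to obtain the TVM Relay convolution operator. get_relay_op is implemented as follows: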

def get_relay_op(op_name):
    """Get the callable function from Relay based on the operator name.

    Parameters
    ----------
    op_name : str
        The name of the Relay operator.
    """
    if "." in op_name:
        # explicit hierarchical modules
        op = _op
        try:
            for opn in op_name.split("."):
                op = getattr(op, opn)
        except AttributeError:
            op = None
    else:
        # try search op in various modules
        for candidate in (_op, _op.nn, _op.image, _op.vision, _op.contrib):
            op = getattr(candidate, op_name, None)
            if op is not None:
                break
    if not op:
        raise tvm.error.OpNotImplemented("Unable to map op_name {} to relay".format(op_name))
    return op
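
The two steps described here, renaming attributes and then resolving an operator by name, can be illustrated without TVM. The following is a minimal sketch under stated assumptions: `convert_attrs`, `lookup_op`, `op_root` and the toy `conv2d` are hypothetical stand-ins written for this illustration, and the attribute handling is a simplification rather than AttrCvt's actual code.

```python
from types import SimpleNamespace


def conv2d(data, weight, **attrs):
    """Hypothetical stand-in for a Relay op constructor."""
    return ("nn.conv2d", data, weight, attrs)


# Hypothetical namespaces playing the role of _op and _op.nn.
nn = SimpleNamespace(conv2d=conv2d)
op_root = SimpleNamespace(nn=nn)


def convert_attrs(attrs, transforms):
    """Rename attributes; a (new_name, default) rule uses the default when the value is None."""
    new_attrs = {}
    for key, value in attrs.items():
        rule = transforms.get(key, key)  # attributes without a rule are copied as-is
        if isinstance(rule, tuple):
            new_name, default = rule
            new_attrs[new_name] = default if value is None else value
        else:
            new_attrs[rule] = value
    return new_attrs


def lookup_op(op_name):
    """Resolve 'nn.conv2d' by walking attributes, or 'conv2d' by searching namespaces."""
    if "." in op_name:
        op = op_root
        try:
            for part in op_name.split("."):
                op = getattr(op, part)
        except AttributeError:
            op = None
    else:
        op = None
        for candidate in (op_root, op_root.nn):
            op = getattr(candidate, op_name, None)
            if op is not None:
                break
    if op is None:
        raise KeyError("Unable to map op_name {} to relay".format(op_name))
    return op


onnx_attrs = {"kernel_shape": (3, 3), "strides": (1, 1), "pads": (1, 1, 1, 1), "group": None}
transforms = {
    "kernel_shape": "kernel_size",
    "dilations": ("dilation", 1),
    "pads": ("padding", 0),
    "group": ("groups", 1),
}
relay_attrs = convert_attrs(onnx_attrs, transforms)
call = lookup_op("conv2d")("data", "weight", **relay_attrs)
print(relay_attrs)
```

Both `lookup_op("conv2d")` and `lookup_op("nn.conv2d")` resolve to the same callable, which is the point of supporting the two spellings.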

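All ops live in the python/tvm/relay/op package, and conv is defined in op/nn. The for loops in the code above search python/tvm/relay/op for a Relay operator whose name is op_name and return it as soon as it is found. There are two branches so that users can write either module.xxx or just xxx, where module can be any sub-module of the python/tvm/relay/op package, such as nn. nn.py contains the call chain conv2d -> _make.conv2d(). _make.py, in turn, exposes the functions registered in C++ to Python with a single line of code: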

tvm._ffi._init_api("relay.op.nn._make", __name__)

The _init_api function is implemented in tvm/python/tvm/_ffi/registry.py:

def _init_api(namespace, target_module_name=None):
    """Initialize api for a given module name

    namespace : str
       The namespace of the source registry

    target_module_name : str
       The target module name if different from namespace
    """
    target_module_name = target_module_name if target_module_name else namespace
    if namespace.startswith("tvm."):
        _init_api_prefix(target_module_name, namespace[4:])
    else:
        _init_api_prefix(target_module_name, namespace)


def _init_api_prefix(module_name, prefix):
    module = sys.modules[module_name]

    for name in list_global_func_names():
        if not name.startswith(prefix):
            continue

        fname = name[len(prefix) + 1 :]
        target_module = module

        if fname.find(".") != -1:
            continue
        f = get_global_func(name)
        ff = _get_api(f)
        ff.__name__ = fname
        ff.__doc__ = "TVM PackedFunc %s. " % fname
        setattr(target_module, ff.__name__, ff)
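
The prefix matching in _init_api_prefix can be imitated in plain Python. The sketch below is an illustration only: `GLOBAL_FUNCS` is a hypothetical dictionary standing in for the table of functions registered from C++, and `init_api_prefix` is a simplified re-implementation, not the TVM function.

```python
import types

# Hypothetical stand-in for the global table of functions registered from C++.
GLOBAL_FUNCS = {
    "relay.op.nn._make.conv2d": lambda *args: ("conv2d", args),
    "relay.op.nn._make.dense": lambda *args: ("dense", args),
    "relay.op.image._make.resize": lambda *args: ("resize", args),
}


def init_api_prefix(module, prefix):
    """Attach every global function named '<prefix>.<fname>' to `module` as `fname`."""
    for name, func in GLOBAL_FUNCS.items():
        if not name.startswith(prefix):
            continue
        fname = name[len(prefix) + 1:]
        if "." in fname:
            continue  # the function belongs to a nested namespace
        setattr(module, fname, func)


make_module = types.ModuleType("_make")
init_api_prefix(make_module, "relay.op.nn._make")
print(sorted(n for n in vars(make_module) if not n.startswith("__")))
```

After the call, `make_module.conv2d` exists even though no Python source ever defined it, which is why searching _make.py for a conv2d definition finds nothing.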

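As the code shows, this function fetches C++-registered functions by name: list_global_func_names enumerates the function names registered in the compiled TVM shared library, the names with the matching prefix are selected, and get_global_func returns the corresponding function. Each function is then set as an attribute of the _make module, which is equivalent to defining a conv2d function in _make.py. The registration code for the conv2d operator is in tvm/src/relay/op/nn/convolution.cc: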

// relay.nn.conv2d
TVM_REGISTER_NODE_TYPE(Conv2DAttrs);

TVM_REGISTER_GLOBAL("relay.op.nn._make.conv2d")
    .set_body_typed([](Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                       Array<IndexExpr> dilation, int groups, IndexExpr channels,
                       Array<IndexExpr> kernel_size, String data_layout, String kernel_layout,
                       String out_layout, DataType out_dtype) {
      return MakeConv<Conv2DAttrs>(data, weight, strides, padding, dilation, groups, channels,
                                   kernel_size, data_layout, kernel_layout, out_layout, out_dtype,
                                   "nn.conv2d");
    });

RELAY_REGISTER_OP("nn.conv2d")
    .describe(R"code(2D convolution layer (e.g. spatial convolution over images).

This layer creates a convolution kernel that is convolved
with the layer input to produce a tensor of outputs.

- **data**: This depends on the `layout` parameter. Input is 4D array of shape
            (batch_size, in_channels, height, width) if `layout` is `NCHW`.
- **weight**: (channels, in_channels, kernel_size[0], kernel_size[1])
- **out**:  This depends on the `layout` parameter. Output is 4D array of shape
            (batch_size, channels, out_height, out_width) if `layout` is `NCHW`.

)code" TVM_ADD_FILELINE)
    .set_attrs_type<Conv2DAttrs>()
    .set_num_inputs(2)
    .add_argument("data", "Tensor", "The input tensor.")
    .add_argument("weight", "Tensor", "The weight tensor.")
    .set_support_level(2)
    .add_type_rel("Conv2D", Conv2DRel<Conv2DAttrs>)
    .set_attr<FInferCorrectLayout>("FInferCorrectLayout", ConvInferCorrectLayout<Conv2DAttrs>);

Next, let's follow the C++ side of the convolution operator. The TVM_REGISTER_GLOBAL macro registers the function in a global object. Here is the macro definition:

#define TVM_REGISTER_GLOBAL(OpName) \
  TVM_STR_CONCAT(TVM_FUNC_REG_VAR_DEF, __COUNTER__) = ::tvm::runtime::Registry::Register(OpName)

Registration is implemented in the Registry class. Its Register member function adds the entry through a global Manager:

Registry& Registry::Register(const std::string& name, bool can_override) {  // NOLINT(*)
  Manager* m = Manager::Global();
  std::lock_guard<std::mutex> lock(m->mutex);
  if (m->fmap.count(name)) {
    ICHECK(can_override) << "Global PackedFunc " << name << " is already registered";
  }

  Registry* r = new Registry();
  r->name_ = name;
  m->fmap[name] = r;
  return *r;
}
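
For readers more at home in Python, the Register/set_body pattern can be sketched as a small class. This is a toy model under stated assumptions: the `Registry` class below and the name `demo.add` are hypothetical, and only the locking, the duplicate check and the chained set_body call are modelled.

```python
import threading


class Registry:
    """Toy global function table that loosely mirrors the Register/set_body pattern."""

    _table = {}
    _lock = threading.Lock()

    def __init__(self, name):
        self.name = name
        self.func = None

    @classmethod
    def register(cls, name, can_override=False):
        # Hold the lock while checking for duplicates and inserting the entry.
        with cls._lock:
            if name in cls._table and not can_override:
                raise KeyError("Global function {} is already registered".format(name))
            entry = cls(name)
            cls._table[name] = entry
            return entry

    def set_body(self, func):
        # Returning self allows chained calls such as register(...).set_body(...).
        self.func = func
        return self

    @classmethod
    def get(cls, name):
        entry = cls._table.get(name)
        return entry.func if entry else None


Registry.register("demo.add").set_body(lambda a, b: a + b)
add = Registry.get("demo.add")
print(add(2, 3))
```

Registering the same name twice raises an error unless `can_override=True` is passed, which matches the role of the ICHECK in the C++ listing.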

The set_body_typed function used in the conv2d registration above is implemented as follows:

  /*!
   * \param f The function to forward to.
   * \tparam FLambda The signature of the function.
   */
  template <typename FLambda>
  Registry& set_body_typed(FLambda f) {
    using FType = typename detail::function_signature<FLambda>::FType;
    return set_body(TypedPackedFunc<FType>(std::move(f), name_).packed());
  }

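Here set_body stores the lambda in the registry; when that lambda is invoked, it builds a conv call through MakeConv. MakeConv first builds an attrs object from the conv arguments that are passed in (strides, kernel size, layouts and so on), then looks up the already registered conv operator by its name, and finally packs attrs and op together into a Call. The code is in tvm/src/relay/op/nn/convolution_make.h: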

template <typename T>
inline Expr MakeConv(Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                     Array<IndexExpr> dilation, int groups, IndexExpr channels,
                     Array<IndexExpr> kernel_size, std::string data_layout,
                     std::string kernel_layout, std::string out_layout, DataType out_dtype,
                     std::string op_name) {
  auto attrs = make_object<T>();
  attrs->strides = std::move(strides);
  attrs->padding = std::move(padding);
  attrs->dilation = std::move(dilation);
  attrs->groups = groups;
  attrs->channels = std::move(channels);
  attrs->kernel_size = std::move(kernel_size);
  attrs->data_layout = std::move(data_layout);
  attrs->kernel_layout = std::move(kernel_layout);
  attrs->out_layout = std::move(out_layout);
  attrs->out_dtype = std::move(out_dtype);
  const Op& op = Op::Get(op_name);
  return Call(op, {data, weight}, Attrs(attrs), {});
}

The Call class is a subclass of Expr:

class Call : public Expr {
 public:
  /*!
   * \brief The destructor
   */
  ~Call();

  /*!
   * \brief The constructor
   * \param op The operator will be invoked.
   * \param args The arguments of the call.
   * \param attrs The attributes of the call node.
   * \param type_args The type arguments passed to a polymorphic function.
   * \param span The source span of the expression.
   */
  TVM_DLL Call(Expr op, Array<Expr> args, Attrs attrs = Attrs(),
               Array<Type> type_args = Array<Type>(), Span span = Span());

  TVM_DEFINE_OBJECT_REF_METHODS(Call, RelayExpr, CallNode);
};

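Ops are registered through RELAY_REGISTER_OP into a shared AttrRegistry. An Op does not contain the operator's computation; it only records information about the operator's inputs, outputs and attributes.

Note in particular that a Relay op contains no concrete computation! Everything above only produces the IR of the Relay convolution op together with its inputs and attributes. So where is the computation of this op defined? In TVM's TOPI, which is introduced next.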
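
To make the "metadata only" point concrete, here is a small sketch of what MakeConv produces, written with plain dataclasses. The `Op`, `Call`, `OP_TABLE` and `make_conv` names are hypothetical stand-ins for this illustration and carry far fewer fields than the real Relay classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Op:
    """Operator metadata only: a name and an input count, no compute function."""
    name: str
    num_inputs: int


@dataclass(frozen=True)
class Call:
    """An IR node that records which op is applied to which arguments."""
    op: Op
    args: Tuple[Any, ...]
    attrs: Dict[str, Any] = field(default_factory=dict)


# Hypothetical registry filled at "registration time".
OP_TABLE = {"nn.conv2d": Op("nn.conv2d", num_inputs=2)}


def make_conv(data, weight, strides, padding, op_name="nn.conv2d"):
    # Pack the attributes, look the op up by name, and wrap both in a Call node.
    attrs = {"strides": strides, "padding": padding}
    return Call(OP_TABLE[op_name], (data, weight), attrs)


call = make_conv("data", "weight", strides=(1, 1), padding=(0, 0))
print(call.op.name, call.attrs)
```

Nothing in `call` can be executed; it only describes a convolution, so the numerical work has to come from somewhere else.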

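0x2. The flow of the TOPI Resize operator

TOPI is TVM's own operator library. Its operators are expressed with te; see the official documentation: http://tvm.apache.org/docs/tutorials/language/schedule_primitives.html#sphx-glr-tutorials-language-schedule-primitives-py. Here we use the ONNX Resize operator as an example to show how a TOPI operator flows through TVM. Again we start in tvm/python/tvm/relay/frontend/onnx.py, this time with the Resize class: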

class Resize(OnnxOpConverter):
    """Operator converter for Resize"""

    @classmethod
    def _impl_v10(cls, inputs, attr, params):
        mode = attr.get("mode").decode("ascii")
        if mode == "nearest":
            method = "nearest_neighbor"
        elif mode == "linear":
            method = "bilinear"
        elif mode == "cubic":
            method = "bicubic"
        else:
            raise tvm.error.OpAttributeInvalid(
                'Value {} in attribute "mode" of operator Resize is not valid.'.format(mode)
            )

        scale = inputs[1]
        size = _op.cast(shape_of(inputs[0]), infer_type(scale).checked_type.dtype) * scale
        layout = "NCHW"  # ONNX assumes NCHW layout
        out_size = fold_constant(_op.strided_slice(size, [2], [4]))
        return _op.image.resize(inputs[0], out_size, layout, method, "asymmetric")
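
The arithmetic in this converter is easy to check by hand. The sketch below reproduces it with plain Python lists instead of Relay expressions; `resize_params` is a hypothetical helper, and flooring the scaled size is an assumption made for this example.

```python
import math

# ONNX "mode" values and the Relay method names the converter maps them to.
MODE_TO_METHOD = {"nearest": "nearest_neighbor", "linear": "bilinear", "cubic": "bicubic"}


def resize_params(input_shape, scales, mode):
    """Return (method, [out_h, out_w]) for an NCHW input and per-axis scales."""
    if mode not in MODE_TO_METHOD:
        raise ValueError(
            'Value {} in attribute "mode" of operator Resize is not valid.'.format(mode)
        )
    # Multiply every dimension by its scale, then keep only H and W (axes 2 and 3).
    size = [dim * scale for dim, scale in zip(input_shape, scales)]
    out_size = [math.floor(v) for v in size[2:4]]
    return MODE_TO_METHOD[mode], out_size


method, out_size = resize_params((1, 3, 32, 32), (1.0, 1.0, 2.0, 2.0), "linear")
print(method, out_size)
```

The slice `[2:4]` plays the role of `strided_slice(size, [2], [4])` in the converter: only the spatial dimensions are passed on to resize.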

The Resize class ends by calling the resize function in tvm/python/tvm/relay/op/image/image.py:

def resize(
    data,
    size,
    layout="NCHW",
    method="bilinear",
    coordinate_transformation_mode="half_pixel",
    rounding_method="",
    bicubic_alpha=-0.5,
    bicubic_exclude=0,
    out_dtype=None,
):
    """Image resize operator.

    This operator takes data as input and does 2D scaling to the given scale factor.
    In the default case, where the data_layout is `NCHW`
    with data of shape (n, c, h, w)
    out will have a shape (n, c, size[0], size[1])

    method indicates the algorithm to be used while calculating the out value
    and method can be one of ("bilinear", "nearest_neighbor", "bicubic")

    Parameters
    ----------
    data : relay.Expr
        The input data to the operator.

    size: Tuple of Int or Expr
        The out size to which the image will be resized.

    layout : str, optional
        Layout of the input.

    method : str, optional
        Scale method to used [nearest_neighbor, bilinear, bicubic].

    coordinate_transformation_mode : string, optional
        Describes how to transform the coordinate in the resized tensor
        to the coordinate in the original tensor.
        Refer to the ONNX Resize operator specification for details.
        [half_pixel, align_corners, asymmetric]

    rounding_method: string, optional
        indicates how to find the "nearest" pixel in nearest_neighbor method
        [round, floor, ceil]

    bicubic_alpha: float
        Spline Coefficient for Bicubic Interpolation

    bicubic_exclude: int
            Flag to exclude exterior of the image during bicubic interpolation

    out_dtype : str, optional
        Type to return. If left None returns the same type as input.

    Returns
    -------
    result: relay.Expr
        The resized result.
    """
    if isinstance(size, Constant):
        size = list(size.data.asnumpy().astype("int32"))
    if isinstance(size, Expr):
        return _dyn_make.resize(
            data,
            size,
            layout,
            method,
            coordinate_transformation_mode,
            rounding_method,
            bicubic_alpha,
            bicubic_exclude,
            out_dtype,
        )
    return _make.resize(
        data,
        size,
        layout,
        method,
        coordinate_transformation_mode,
        rounding_method,
        bicubic_alpha,
        bicubic_exclude,
        out_dtype,
    )

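Once again the call goes through _make.resize. From the previous section on the Relay convolution operator we know that _make.py exposes the functions registered in C++ to Python, so this call corresponds to the C++ interface of TVM's Resize operator, namely the Resize op registration code in tvm/src/relay/op/image/resize.cc: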

TVM_REGISTER_GLOBAL("relay.op.image._make.resize").set_body_typed(MakeResize);

RELAY_REGISTER_OP("image.resize")
    .describe(R"code(Perform resize to input array with nearest neighbour or bilinear interpolation.

- **data**: data is 4D array of shape
            (batch_size, channels, in_height, in_width) for NCHW
            (batch_size, in_height, in_width, channels) for NHWC

- **out**: Output is 4D array of shape
           for layout NCHW
           (batch_size, channels, size[0], size[1])

           for layout NHWC
           (batch_size, size[0], size[1], channels)
)code" TVM_ADD_FILELINE)
    .set_attrs_type<ResizeAttrs>()
    .set_num_inputs(1)
    .add_argument("data", "Tensor", "The input tensor.")
    .set_support_level(5)
    .add_type_rel("Resize", ResizeRel)
    .set_attr<TOpPattern>("TOpPattern", kInjective);

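TVM's codebase walkthrough describes the split like this:

The key sentence is: Operators corresponding to each node are registered in src/relay/op. Implementations of operators are in topi, and they are coded in either C++ or Python.

From this we know that the concrete operator implementations live in TOPI; the Python ones are under tvm/python/tvm/topi.

Back to the Resize operator: the following code in tvm/python/tvm/relay/op/image/_image.py connects the op to its TOPI implementation: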

# resize
@reg.register_compute("image.resize")
def compute_resize(attrs, inputs, out_type):
    """ compute definition for resize op """
    size = attrs.size
    layout = attrs.layout
    method = attrs.method
    coord_trans = attrs.coordinate_transformation_mode
    rounding_method = attrs.rounding_method
    bicubic_alpha = attrs.bicubic_alpha
    bicubic_exclude = attrs.bicubic_exclude
    out_dtype = attrs.out_dtype
    return [
        topi.image.resize(
            inputs[0],
            size,
            layout,
            method,
            coord_trans,
            rounding_method,
            bicubic_alpha,
            bicubic_exclude,
            out_dtype,
        )
    ]


reg.register_injective_schedule("image.resize")

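The Relay nn operators from the previous section are connected in the same way; look for the nn.conv2d keyword in tvm/python/tvm/relay/op/nn/_nn.py.

Now let's look at the implementation of the TOPI Resize operator, the resize function in tvm/python/tvm/topi/image/resize.py. Its last few lines choose the interpolation function according to the method passed in above: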
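
The connection made by reg.register_compute is essentially a decorator that files a function under the op's name. The following sketch shows the idea in plain Python; `COMPUTE_TABLE`, `register_compute` and `lower_call` are hypothetical names for this illustration, and the returned tuple merely stands in for a TOPI tensor expression.

```python
# Hypothetical table mapping an op name to its compute function.
COMPUTE_TABLE = {}


def register_compute(op_name):
    """Decorator that records `func` as the compute definition of `op_name`."""
    def decorator(func):
        COMPUTE_TABLE[op_name] = func
        return func
    return decorator


@register_compute("image.resize")
def compute_resize(attrs, inputs, out_type):
    # Unpack the attributes carried by the call node and forward them to the kernel library.
    return [("topi.image.resize", inputs[0], attrs["size"], attrs["method"])]


def lower_call(op_name, attrs, inputs, out_type=None):
    """Find the compute function registered for an op and invoke it."""
    return COMPUTE_TABLE[op_name](attrs, inputs, out_type)


result = lower_call("image.resize", {"size": (64, 64), "method": "bilinear"}, ["x"])
print(result)
```

The op itself stays free of computation; the compiler looks up the compute function by op name only when it needs to lower the call.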

    # Determine which interpolation method to use then run it.
    if method == "nearest_neighbor":
        compute_func = _nearest_neighbor
    elif method == "bilinear":
        compute_func = _bilinear
    elif method == "bicubic":
        compute_func = _bicubic
    else:
        raise ValueError("%s method is not supported." % method)

    return te.compute(output_shape, compute_func, name="resize", tag=tag.INJECTIVE)

I won't go through each of these functions in detail; interested readers can find the source in tvm/python/tvm/topi/image/resize.py.
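
To give a feel for what such a compute function calculates per output element, here is a nearest-neighbor resize on a nested list. It is a simplified sketch, not the TOPI code: it handles a single 2D channel, uses the "asymmetric" coordinate transform (source = destination * in / out), and assumes the nearest pixel is picked by flooring.

```python
import math


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of a 2D list with the asymmetric coordinate transform."""
    in_h, in_w = len(image), len(image[0])
    scale_y, scale_x = in_h / out_h, in_w / out_w
    out = []
    for y in range(out_h):
        # Map the output row back to an input row and clamp it to the valid range.
        src_y = min(math.floor(y * scale_y), in_h - 1)
        row = []
        for x in range(out_w):
            src_x = min(math.floor(x * scale_x), in_w - 1)
            row.append(image[src_y][src_x])
        out.append(row)
    return out


resized = resize_nearest([[1, 2], [3, 4]], 4, 4)
for row in resized:
    print(row)
```

Each output element depends on exactly one input element and on no other output, which is why the op can be tagged as injective.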

0x3. Scheduling

When introducing the TOPI operator above, we showed the code in tvm/python/tvm/relay/op/image/_image.py that connects the op to its TOPI implementation. Its last line is:

reg.register_injective_schedule("image.resize")

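This single line is what attaches a schedule to the op. As covered in part two of this series, "Learning Deep Learning Compilers from Scratch, Part 2: Schedules in TVM", scheduling in TVM is done through schedules.

TVM provides dedicated schedules for many kinds of hardware devices.

For arm_cpu, convolution, depthwise separable convolution and other operators have their own dedicated schedules, whereas the injective schedule registered above is a generic one. Its code shows that the main optimization is vectorization: the innermost axis is split by a factor of 4 and the inner part is vectorized, which is the s[x].vectorize(ii) call below.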

def schedule_injective(outs):
    """ARM CPU schedule for injective op.

    Parameters
    ----------
    outs: Array of Tensor
          The computation graph description of injective in the format
          of an array of tensors.

    Returns
    -------
    sch: Schedule
        The computation schedule for the op.
    """
    outs = [outs] if isinstance(outs, te.tensor.Tensor) else outs
    s = te.create_schedule([x.op for x in outs])
    x = outs[0]

    if list(s[x].op.axis):
        # do not vectorize for broadcast
        (io, ii) = s[x].split(list(s[x].op.axis)[-1], 4)
        s[x].vectorize(ii)
    tvm.te.schedule.AutoInlineInjective(s)

    if not is_empty_shape(x.shape):
        schedule_injective_from_existing(s, x)
    return s
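
The effect of the split by 4 can be pictured as a loop transformation. The sketch below is plain Python, not TVM's scheduling API: `add_one_split` is a hypothetical function that applies an elementwise "+1" with the loop already split into an outer index io and an inner index ii.

```python
def add_one_split(values, factor=4):
    """Elementwise +1 written as a loop split into (io, ii) with the given factor."""
    n = len(values)
    out = [0] * n
    outer_extent = (n + factor - 1) // factor  # ceil(n / factor)
    for io in range(outer_extent):
        # After vectorize(ii), this inner loop would become one SIMD operation on `factor` lanes.
        for ii in range(factor):
            i = io * factor + ii
            if i < n:  # guard for lengths that are not a multiple of the factor
                out[i] = values[i] + 1
    return out


print(add_one_split(list(range(10))))
```

The split does not change the result; it only restructures the iteration so that the inner loop has a fixed, small extent that maps well onto vector instructions.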

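0x4. Summary

This article used the Relay convolution operator and the TOPI Resize operator to trace how an operator flows through TVM and where it is implemented. Along the way we also saw that TVM provides different schedules for different backends to help speed up models. I hope readers got something out of it. I may still be missing some of the finer details of this process, so discussion is welcome.

0x5. References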

https://zhuanlan.zhihu.com/p/351403985
https://zhuanlan.zhihu.com/p/149386093


Learning Deep Learning Compilers from Scratch, Part 4: Dissecting TVM Operators