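[Learning Deep Learning Compilers from Scratch] Part 12: MLIR Toy Tutorials Study Notes (1)
These notes summarize what I learned from the MLIR Tutorials. Corrections and criticism are welcome.
Chapter 1: The Toy Language and AST
MLIR ships a Toy language to illustrate how an MLIR-based compiler is defined and how a program flows through it. Toy is a tensor-based language: we can use it to define functions, perform some math computations, and print results. In the examples below, tensors are limited to rank <= 2, and the only data type in Toy is a 64-bit float, which corresponds to "double" in C. In addition, values are immutable: every operation returns a newly allocated value, and deallocation is managed automatically. Let's go straight to an example: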
def main() {
  # Define a variable `a` with shape <2, 3>, initialized with the literal value.
  # The shape is inferred from the supplied literal.
  var a = [[1, 2, 3], [4, 5, 6]];
  # b is identical to a, the literal tensor is implicitly reshaped: defining new
  # variables is the way to reshape tensors (element count must match).
  var b<2, 3> = [1, 2, 3, 4, 5, 6];
  # transpose() and print() are the only builtin, the following will transpose
  # a and b and perform an element-wise multiplication before printing the result.
  print(transpose(a) * transpose(b));
}
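The two declarations above rely on two rules: the shape of `a` is inferred from the nesting of the literal, and the flat literal assigned to `b<2, 3>` is accepted only because the element counts match. The following is a minimal Python sketch of those two rules, not the Toy front end itself; the helper names `infer_shape` and `can_reshape` are hypothetical and exist only for illustration.

```python
from math import prod


def infer_shape(literal):
    """Infer the shape of a nested-list literal, e.g. [[1, 2, 3], [4, 5, 6]] -> (2, 3)."""
    if not isinstance(literal, list):
        return ()  # a scalar contributes no dimension
    if not literal:
        raise ValueError("empty literal")
    inner = [infer_shape(item) for item in literal]
    if any(shape != inner[0] for shape in inner):
        raise ValueError("ragged literal: rows have different shapes")
    return (len(literal),) + inner[0]


def can_reshape(literal_shape, declared_shape):
    """A literal may initialize a declared shape only if the element counts match."""
    return prod(literal_shape) == prod(declared_shape)


# `var a = [[1, 2, 3], [4, 5, 6]];` : shape comes from the literal.
a_shape = infer_shape([[1, 2, 3], [4, 5, 6]])
# `var b<2, 3> = [1, 2, 3, 4, 5, 6];` : flat literal, declared shape <2, 3>.
b_literal_shape = infer_shape([1, 2, 3, 4, 5, 6])
print(a_shape, b_literal_shape, can_reshape(b_literal_shape, (2, 3)))
```

A declaration such as `<2, 2>` for the same six-element literal would be rejected by `can_reshape`, which is the "element count must match" condition mentioned in the comment.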
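Type checking is performed statically through type inference. Toy only requires tensor shape declarations where they are necessary. The next example defines a multiply_transpose function. Note that the shapes of its parameters a and b are not known in advance; they only become known at each call site, so pay attention to how the shapes change in the example below.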
# User defined generic function that operates on unknown shaped arguments.
def multiply_transpose(a, b) {
  return transpose(a) * transpose(b);
}
def main() {
  # Define a variable `a` with shape <2, 3>, initialized with the literal value.
  var a = [[1, 2, 3], [4, 5, 6]];
  var b<2, 3> = [1, 2, 3, 4, 5, 6];
  # This call will specialize `multiply_transpose` with <2, 3> for both
  # arguments and deduce a return type of <3, 2> in initialization of `c`.
  var c = multiply_transpose(a, b);
  # A second call to `multiply_transpose` with <2, 3> for both arguments will
  # reuse the previously specialized and inferred version and return <3, 2>.
  var d = multiply_transpose(b, a);
  # A new call with <3, 2> (instead of <2, 3>) for both dimensions will
  # trigger another specialization of `multiply_transpose`.
  var e = multiply_transpose(b, c);
  # Finally, calling into `multiply_transpose` with incompatible shape will
  # trigger a shape inference error.
  var f = multiply_transpose(transpose(a), c);
}
We can then generate the AST of this Toy program with the following commands:
cd llvm-project/build/bin
./toyc-ch1 ../../mlir/test/Examples/Toy/Ch1/ast.toy --emit=ast
This assumes that llvm-project has already been built. The build follows the instructions at https://mlir.llvm.org/getting_started/ ; the complete steps are listed again here:
$ git clone https://github.com/llvm/llvm-project.git
$ mkdir llvm-project/build
$ cd llvm-project/build
$ cmake -G "Unix Makefiles" ../llvm \
     -DLLVM_ENABLE_PROJECTS=mlir \
     -DLLVM_BUILD_EXAMPLES=ON \
     -DLLVM_TARGETS_TO_BUILD="host" \
     -DCMAKE_BUILD_TYPE=Release \
     -DLLVM_ENABLE_ASSERTIONS=ON
$ cmake --build . --target check-mlir
The AST generated for the Toy program above looks like this:
Module:
    Function
      Proto 'multiply_transpose' @../../mlir/test/Examples/Toy/Ch1/ast.toy:4:1
      Params: [a, b]
      Block {
        Return
          BinOp: * @../../mlir/test/Examples/Toy/Ch1/ast.toy:5:25
            Call 'transpose' [ @../../mlir/test/Examples/Toy/Ch1/ast.toy:5:10
              var: a @../../mlir/test/Examples/Toy/Ch1/ast.toy:5:20
            ]
            Call 'transpose' [ @../../mlir/test/Examples/Toy/Ch1/ast.toy:5:25
              var: b @../../mlir/test/Examples/Toy/Ch1/ast.toy:5:35
            ]
      } // Block
    Function
      Proto 'main' @../../mlir/test/Examples/Toy/Ch1/ast.toy:8:1
      Params: []
      Block {
        VarDecl a<> @../../mlir/test/Examples/Toy/Ch1/ast.toy:11:3
          Literal: <2, 3>[ <3>[ 1.000000e+00, 2.000000e+00, 3.000000e+00], <3>[ 4.000000e+00, 5.000000e+00, 6.000000e+00]] @../../mlir/test/Examples/Toy/Ch1/ast.toy:11:11
        VarDecl b<2, 3> @../../mlir/test/Examples/Toy/Ch1/ast.toy:15:3
          Literal: <6>[ 1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00] @../../mlir/test/Examples/Toy/Ch1/ast.toy:15:17
        VarDecl c<> @../../mlir/test/Examples/Toy/Ch1/ast.toy:19:3
          Call 'multiply_transpose' [ @../../mlir/test/Examples/Toy/Ch1/ast.toy:19:11
            var: a @../../mlir/test/Examples/Toy/Ch1/ast.toy:19:30
            var: b @../../mlir/test/Examples/Toy/Ch1/ast.toy:19:33
          ]
        VarDecl d<> @../../mlir/test/Examples/Toy/Ch1/ast.toy:22:3
          Call 'multiply_transpose' [ @../../mlir/test/Examples/Toy/Ch1/ast.toy:22:11
            var: b @../../mlir/test/Examples/Toy/Ch1/ast.toy:22:30
            var: a @../../mlir/test/Examples/Toy/Ch1/ast.toy:22:33
          ]
        VarDecl e<> @../../mlir/test/Examples/Toy/Ch1/ast.toy:25:3
          Call 'multiply_transpose' [ @../../mlir/test/Examples/Toy/Ch1/ast.toy:25:11
            var: b @../../mlir/test/Examples/Toy/Ch1/ast.toy:25:30
            var: c @../../mlir/test/Examples/Toy/Ch1/ast.toy:25:33
          ]
        VarDecl f<> @../../mlir/test/Examples/Toy/Ch1/ast.toy:28:3
          Call 'multiply_transpose' [ @../../mlir/test/Examples/Toy/Ch1/ast.toy:28:11
            Call 'transpose' [ @../../mlir/test/Examples/Toy/Ch1/ast.toy:28:30
              var: a @../../mlir/test/Examples/Toy/Ch1/ast.toy:28:40
            ]
            var: c @../../mlir/test/Examples/Toy/Ch1/ast.toy:28:44
          ]
      } // Block
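The AST parsing is implemented in mlir/examples/toy/Ch1/include/toy/Parser.h and mlir/examples/toy/Ch1/include/toy/Lexer.h; interested readers can take a look. I am not familiar with this part, so I will not dig into it for now, but the AST itself is fairly intuitive. There are two Function nodes, corresponding to multiply_transpose and main in the Toy program. Params lists the input parameters of the function. Proto gives the function name together with its line and column in ast.toy. BinOp says that the * in transpose(a) * transpose(b) is a binary operation, and its two children are the left-hand and right-hand operands. The remaining nodes can be read in the same way.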
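To make the tree structure concrete, here is a small Python sketch that builds the AST for the body of multiply_transpose by hand and dumps it with indentation, in the spirit of the output above. The node classes and the `dump` function are hypothetical stand-ins, not the classes in Parser.h, and source locations are left out.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Var:
    name: str


@dataclass
class Call:
    callee: str
    args: List["Expr"]


@dataclass
class BinOp:
    op: str
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, Call, BinOp]


def dump(node, indent=0):
    """Return the indented dump of an expression tree as a list of lines."""
    pad = " " * indent
    if isinstance(node, Var):
        return [f"{pad}var: {node.name}"]
    if isinstance(node, Call):
        lines = [f"{pad}Call '{node.callee}' ["]
        for arg in node.args:
            lines += dump(arg, indent + 2)
        lines.append(f"{pad}]")
        return lines
    if isinstance(node, BinOp):
        lines = [f"{pad}BinOp: {node.op}"]
        lines += dump(node.lhs, indent + 2)  # left-hand operand
        lines += dump(node.rhs, indent + 2)  # right-hand operand
        return lines
    raise TypeError(f"unknown node: {node!r}")


# transpose(a) * transpose(b)
body = BinOp("*", Call("transpose", [Var("a")]), Call("transpose", [Var("b")]))
print("\n".join(dump(body)))
```

The dump is a plain pre-order traversal: each node prints itself first and then its children one level deeper, which is why the nesting of the printed tree mirrors the nesting of the expression.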
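Chapter 1 simply introduces a few characteristics of the Toy language and shows what the AST of a sample Toy program looks like. If you are interested in how the AST is parsed, have a look at the code.
Chapter 2: Emitting Basic MLIR
MLIR is designed to be a completely extensible infrastructure: there is no closed set of attributes, operations, or types. MLIR supports this extensibility through the concept of Dialects (https://mlir.llvm.org/docs/LangRef/#dialects). A Dialect provides a grouping mechanism for abstractions under a unique namespace.
In MLIR, the Operation is the core unit of abstraction and computation, similar in many ways to an LLVM instruction. Operations can have application-specific semantics and can be used to represent all of the core IR structures in LLVM: instructions, globals (such as functions), and modules. Here is a transpose Operation as it could be emitted for the Toy language: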
%t_tensor = "toy.transpose"(%tensor) {inplace = true} : (tensor<2x3xf64>) -> tensor<3x2xf64> loc("example/file/path":12:1)
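Breaking down the structure:
%t_tensor: the name of the result defined by this Operation. The % prefix avoids name collisions; see https://mlir.llvm.org/docs/LangRef/#identifiers-and-keywords . An Operation may define zero or more results (in Toy, only single-result Operations are used), and they are SSA values. The name is used during parsing but is not persistent (for example, it is not tracked in the in-memory representation of the SSA value).
"toy.transpose": the name of the Operation. It should be a unique string, with the Dialect namespace placed before the ".". This one can be read as the transpose Operation in the Toy Dialect.
(%tensor): a list of zero or more input operands (or arguments), which are SSA values defined by other operations or referring to block arguments.
{ inplace = true }: a dictionary of zero or more attributes, which are special operands that are always constant. Here we define a boolean attribute named "inplace" whose constant value is true.
(tensor<2x3xf64>) -> tensor<3x2xf64>: the type of the operation in functional form, with the input types in parentheses and the result type after the arrow. Inside the angle brackets, 2x3 is the shape of the tensor and f64 is the element type, joined by x.
loc("example/file/path":12:1): the location in the source code from which this operation originated.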
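The pieces listed above can be put back together mechanically. The following Python sketch models an operation in this generic form as a plain record and renders it as text; the `GenericOp` class and `tensor_type` helper are hypothetical illustrations and have nothing to do with MLIR's real C++ classes or printer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def tensor_type(shape, element="f64"):
    """Spell a ranked tensor type, e.g. (2, 3) -> tensor<2x3xf64>."""
    return "tensor<" + "x".join(str(dim) for dim in shape) + "x" + element + ">"


@dataclass
class GenericOp:
    name: str                  # dialect-qualified name, e.g. "toy.transpose"
    results: List[str]         # SSA result names
    operands: List[str]        # SSA operand names
    operand_types: List[str]
    result_types: List[str]
    attributes: Dict[str, str] = field(default_factory=dict)
    location: str = ""         # text placed inside loc(...)

    def render(self) -> str:
        operands = ", ".join(self.operands)
        operand_types = ", ".join(self.operand_types)
        if len(self.result_types) == 1:
            result_types = self.result_types[0]
        else:
            result_types = "(" + ", ".join(self.result_types) + ")"
        text = ""
        if self.results:
            text += ", ".join(self.results) + " = "
        text += '"' + self.name + '"(' + operands + ")"
        if self.attributes:
            attrs = ", ".join(f"{key} = {value}" for key, value in self.attributes.items())
            text += " {" + attrs + "}"
        text += " : (" + operand_types + ") -> " + result_types
        if self.location:
            text += " loc(" + self.location + ")"
        return text


op = GenericOp(
    name="toy.transpose",
    results=["%t_tensor"],
    operands=["%tensor"],
    operand_types=[tensor_type((2, 3))],
    result_types=[tensor_type((3, 2))],
    attributes={"inplace": "true"},
    location='"example/file/path":12:1',
)
print(op.render())
```

Rendering `op` reproduces the transpose line shown earlier, which is a quick way to check that each field of the record corresponds to one of the components in the breakdown.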
Now that we know the basic structure of an MLIR operation, let's turn to what Chapter 2 actually does: emitting basic MLIR. We run the following command to generate MLIR for codegen.toy, one of the Chapter 2 test examples.
./toyc-ch2 ../../mlir/test/Examples/Toy/Ch2/codegen.toy -emit=mlir -mlir-print-debuginfo
The content of codegen.toy is:
def multiply_transpose(a, b) {
  return transpose(a) * transpose(b);
}
def main() {
  var a<2, 3> = [[1, 2, 3], [4, 5, 6]];
  var b<2, 3> = [1, 2, 3, 4, 5, 6];
  var c = multiply_transpose(a, b);
  var d = multiply_transpose(b, a);
  print(d);
}
The generated MLIR is:
module  {
  func @multiply_transpose(%arg0: tensor<*xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":4:1), %arg1: tensor<*xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":4:1)) -> tensor<*xf64> {
    %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":5:10)
    %1 = toy.transpose(%arg1 : tensor<*xf64>) to tensor<*xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":5:25)
    %2 = toy.mul %0, %1 : tensor<*xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":5:25)
    toy.return %2 : tensor<*xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":5:3)
  } loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":4:1)
  func @main() {
    %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":9:17)
    %1 = toy.reshape(%0 : tensor<2x3xf64>) to tensor<2x3xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":9:3)
    %2 = toy.constant dense<[1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00]> : tensor<6xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":10:17)
    %3 = toy.reshape(%2 : tensor<6xf64>) to tensor<2x3xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":10:3)
    %4 = toy.generic_call @multiply_transpose(%1, %3) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":11:11)
    %5 = toy.generic_call @multiply_transpose(%3, %1) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":12:11)
    toy.print %5 : tensor<*xf64> loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":13:3)
    toy.return loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":8:1)
  } loc("../../mlir/test/Examples/Toy/Ch2/codegen.toy":8:1)
} loc(unknown)
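We need to understand how this MLIR is generated from codegen.toy, that is, the step from the AST to MLIR expressions (including the Dialect) in the figure below.

Figure: the MLIR generation flow

First, MLIRGen is responsible for traversing the AST. It is implemented in mlir/examples/toy/Ch2/mlir/MLIRGen.cpp, which contains an mlirGen function implemented as follows: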
  /// Dispatch codegen for the right expression subclass using RTTI.
  mlir::Value mlirGen(ExprAST &expr) {
    switch (expr.getKind()) {
    case toy::ExprAST::Expr_BinOp:
      return mlirGen(cast<BinaryExprAST>(expr));
    case toy::ExprAST::Expr_Var:
      return mlirGen(cast<VariableExprAST>(expr));
    case toy::ExprAST::Expr_Literal:
      return mlirGen(cast<LiteralExprAST>(expr));
    case toy::ExprAST::Expr_Call:
      return mlirGen(cast<CallExprAST>(expr));
    case toy::ExprAST::Expr_Num:
      return mlirGen(cast<NumberExprAST>(expr));
    default:
      emitError(loc(expr.loc()))
          << "MLIR codegen encountered an unhandled expr kind '"
          << Twine(expr.getKind()) << "'";
      return nullptr;
    }
  }
Depending on the kind of AST node, this function recursively calls one of the other mlirGen overloads, and each overload does the actual work of emitting MLIR. Taking the transpose(a) call in codegen.toy above as an example, the corresponding mlirGen overload is:
  /// Emit a call expression. It emits specific operations for the `transpose`
  /// builtin. Other identifiers are assumed to be user-defined functions.
  mlir::Value mlirGen(CallExprAST &call) {
    llvm::StringRef callee = call.getCallee();
    auto location = loc(call.loc());
    // Codegen the operands first.
    SmallVector<mlir::Value, 4> operands;
    for (auto &expr : call.getArgs()) {
      auto arg = mlirGen(*expr);
      if (!arg)
        return nullptr;
      operands.push_back(arg);
    }
    // Builtin calls have their custom operation, meaning this is a
    // straightforward emission.
    if (callee == "transpose") {
      if (call.getArgs().size() != 1) {
        emitError(location, "MLIR codegen encountered an error: toy.transpose "
                            "does not accept multiple arguments");
        return nullptr;
      }
      return builder.create<TransposeOp>(location, operands[0]);
    }
    // Otherwise this is a call to a user-defined function. Calls to
    // user-defined functions are mapped to a custom call that takes the callee
    // name as an attribute.
    return builder.create<GenericCallOp>(location, callee, operands);
  }
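The control flow of the two functions above (dispatch on the node kind, generate the operands first, then emit either the builtin operation or a generic call) can be modeled in a few lines of Python. This is only a sketch under simplifying assumptions: AST nodes are plain tuples, the `Builder` class is a hypothetical stand-in that just records operations and hands out SSA names, and errors are signalled by returning None instead of calling emitError.

```python
class Builder:
    """Hypothetical stand-in for an IR builder: records ops, returns SSA names."""

    def __init__(self):
        self.ops = []

    def create(self, op_name, operands, **attrs):
        result = f"%{len(self.ops)}"
        self.ops.append((result, op_name, tuple(operands), attrs))
        return result


def gen_expr(builder, symbols, expr):
    """Dispatch on the node kind, like the switch over expr.getKind()."""
    kind = expr[0]
    if kind == "var":
        return symbols.get(expr[1])
    if kind == "binop":
        _, op, lhs, rhs = expr
        lhs_value = gen_expr(builder, symbols, lhs)
        rhs_value = gen_expr(builder, symbols, rhs)
        op_name = {"*": "toy.mul", "+": "toy.add"}.get(op)
        if lhs_value is None or rhs_value is None or op_name is None:
            return None
        return builder.create(op_name, [lhs_value, rhs_value])
    if kind == "call":
        return gen_call(builder, symbols, expr)
    return None  # unhandled expression kind


def gen_call(builder, symbols, expr):
    """Emit a call: `transpose` is a builtin, anything else is a generic call."""
    _, callee, args = expr
    operands = []
    for arg in args:  # codegen the operands first
        value = gen_expr(builder, symbols, arg)
        if value is None:
            return None
        operands.append(value)
    if callee == "transpose":
        if len(args) != 1:
            return None  # transpose accepts exactly one argument
        return builder.create("toy.transpose", operands)
    return builder.create("toy.generic_call", operands, callee=callee)


builder = Builder()
symbols = {"a": "%arg0", "b": "%arg1"}
body = ("binop", "*",
        ("call", "transpose", [("var", "a")]),
        ("call", "transpose", [("var", "b")]))
result = gen_expr(builder, symbols, body)
for op in builder.ops:
    print(op)
```

Because operands are generated before the operation that uses them, the recorded order is two `toy.transpose` operations followed by one `toy.mul`, the same order as in the body of @multiply_transpose in the generated MLIR.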
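We can see that if (callee == "transpose") checks the name of the callee. If it is transpose, a new MLIR node of type TransposeOp has to be created, which is what builder.create<TransposeOp> does. This line involves MLIR's Dialect and TableGen, so let's look at them in more detail.
As mentioned in [Learning Deep Learning Compilers from Scratch] Part 11: A First Look at MLIR, MLIR uses Dialects to unify IRs at different levels: a Dialect is responsible for defining and parsing its Operations, and the mechanism is extensible. The Toy language also defines its own Dialect, and this Dialect is written in the TableGen format in mlir/examples/toy/Ch2/include/toy/Ops.td.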
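In MLIR, the definitions of Dialects and Operations (you can also think of them as operators) are built on TableGen, a declarative language. In the source tree these definitions live in .td files, and at build time the corresponding C++ files are generated automatically, producing the Dialect as defined. The benefit of TableGen is that, being declarative, it makes adding new Dialects and Operations simple and keeps them easy to modify and maintain. My explanation may not be very intuitive, but it can be understood directly from the Chapter 2 code in mlir/examples/toy/Ch2/include/toy/Ops.td. Later we will see what the .td file of the Toy example consists of and how TableGen turns the .td file into C++ code.
The first step in the td file is to define the Toy Dialect and link it to the Dialect base class. It ties together all the Operations that are later defined in the Toy Dialect namespace: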
// Provide a definition of the 'toy' dialect in the ODS framework so that we
// can define our operations.
def Toy_Dialect : Dialect {
  let name = "toy";
  let cppNamespace = "::mlir::toy";
}
Next, a Toy_Op class is declared as the base class of all Operations in the Toy Dialect. Every Operation added later has to inherit from this class.
// Base class for toy dialect operations. This operation inherits from the base
// `Op` class in OpBase.td, and provides:
//   * The parent dialect of the operation.
//   * The mnemonic for the operation, or the name without the dialect prefix.
//   * A list of traits for the operation.
class Toy_Op<string mnemonic, list<OpTrait> traits = []> :
    Op<Toy_Dialect, mnemonic, traits>;
Here is the definition of the transpose Operation, to give a feel for it:
def TransposeOp : Toy_Op<"transpose"> {
  let summary = "transpose operation";
  let arguments = (ins F64Tensor:$input);
  let results = (outs F64Tensor);
  let assemblyFormat = [{
    `(` $input `:` type($input) `)` attr-dict `to` type(results)
  }];
  // Allow building a TransposeOp from the input operand.
  let builders = [
    OpBuilder<(ins "Value":$input)>
  ];
  // Invoke a static verify method to verify this transpose operation.
  let verifier = [{ return ::verify(*this); }];
}
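Besides inheriting from Toy_Op, the definition uses TableGen syntax to specify the summary, the arguments, the results, the assembly format, the builders, and the verifier.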
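Two of these fields are easy to illustrate outside of TableGen. The assemblyFormat describes the textual form `toy.transpose(<operand> : <type>) to <type>`, and the verifier checks the operation after it is built. The Python sketch below mimics both on strings. It is not the code that mlir-tblgen generates; the function names are hypothetical, and the verifier rule (skip unranked tensors, otherwise require the result shape to be the reversed input shape) is an assumption about what a transpose check would reasonably do.

```python
import re


def print_transpose(operand, input_type, result_type):
    """Text form described by the assemblyFormat (attr-dict left empty)."""
    return f"toy.transpose({operand} : {input_type}) to {result_type}"


_TRANSPOSE = re.compile(
    r"toy\.transpose\((%\w+) : (tensor<[^>]+>)\) to (tensor<[^>]+>)"
)


def parse_transpose(text):
    """Inverse of print_transpose: returns (operand, input_type, result_type)."""
    match = _TRANSPOSE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a toy.transpose: {text!r}")
    return match.groups()


def shape_of(tensor_type):
    """tensor<2x3xf64> -> (2, 3); unranked tensor<*xf64> -> None."""
    body = tensor_type[len("tensor<"):-1]
    dims = body.split("x")[:-1]  # drop the element type
    if dims == ["*"]:
        return None
    return tuple(int(dim) for dim in dims)


def verify_transpose(input_type, result_type):
    """Assumed rule: if both types are ranked, the result shape is the input reversed."""
    in_shape, out_shape = shape_of(input_type), shape_of(result_type)
    if in_shape is None or out_shape is None:
        return True  # nothing to check until shapes are known
    return out_shape == tuple(reversed(in_shape))


text = print_transpose("%arg0", "tensor<*xf64>", "tensor<*xf64>")
print(text)
print(parse_transpose(text))
print(verify_transpose("tensor<2x3xf64>", "tensor<3x2xf64>"))
```

Keeping the printer and the parser as inverses of each other is the point of a declarative format: one description yields both directions, so they cannot drift apart.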
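Once the td file is written, the mlir-tblgen tool can generate the C++ code. First, generate the C++ declarations of the Dialect with the following command:
./mlir-tblgen -gen-dialect-decls llvm-project/mlir/examples/toy/Ch2/include/toy/Ops.td -I ../../mlir/include/

Changing the command to the following generates the C++ code of the Operations. Interested readers can inspect the output themselves.
./mlir-tblgen -gen-op-defs llvm-project/mlir/examples/toy/Ch2/include/toy/Ops.td -I ../../mlir/include/
To see how this connects to the toyc-ch2 tool, look at the CMakeLists.txt file (by default located in llvm-project/mlir/examples/toy/Ch2/include/toy):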
set(LLVM_TARGET_DEFINITIONS Ops.td)
mlir_tablegen(Ops.h.inc -gen-op-decls)
mlir_tablegen(Ops.cpp.inc -gen-op-defs)
mlir_tablegen(Dialect.h.inc -gen-dialect-decls)
mlir_tablegen(Dialect.cpp.inc -gen-dialect-defs)
add_public_tablegen_target(ToyCh2OpsIncGen)
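mlir-tblgen is run with the -gen-op-decls and -gen-op-defs options to generate the declarations in Ops.h.inc and the definitions in Ops.cpp.inc, and both files become source dependencies for building the toyc-ch2 tool.
To sum up, Chapter 2 introduces MLIRGen, Dialect, Operation, and TableGen, which are core building blocks of MLIR, and shows how they work together. Their relationship is described well by a slide from Zhang Hongbin of the Chinese Academy of Sciences:

Summary
These are my notes on Chapters 1 and 2 of the MLIR Toy Tutorials. Please point out any mistakes or anything that seems unreasonable.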
