Code Review —— Myia Compile - leozp/Myia-Issues GitHub Wiki

8. Compile and NNVM Flow

8.1 Compile and Execution Flow Analysis

[Figure: static diagram]

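The overall flow consists of the following 5 steps:
1. step_wrap_primitives: wraps the constant nodes in the Graph that do not contribute to the result
1. step_compile: splits the Graph, initializes the compilation resources and environment, and compiles the Graph into an instruction list, leaving a slot for optimization
1. step_link: replaces each push_graph placeholder in the instruction list with the corresponding push instruction, completing the linking
1. step_export: builds a callable VM environment and initializes its configuration
1. step_wrap: calls the VM's eval method, runs the computation through NNVMRunner, and wraps the output in the required type before returning it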
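
The five steps above behave like a pipeline in which each step consumes named values produced by earlier steps and contributes new ones. The following is a minimal sketch of that pattern only; `run_pipeline` and the `toy_*` steps are hypothetical stand-ins and are not Myia's PipelineDefinition API.

```python
import inspect


def run_pipeline(steps, **state):
    """Run steps in order. Each step receives the state entries named by
    its parameters and returns a dict that is merged back into the state."""
    for step in steps.values():
        wanted = inspect.signature(step).parameters
        state.update(step(**{name: state[name] for name in wanted}))
    return state


# Toy steps (hypothetical): they only mimic the data flow
# graph -> (mapping, uinstrs) -> instrs -> output.
def toy_compile(graph):
    return {'mapping': {graph: 0}, 'uinstrs': [('push_graph', graph)]}


def toy_link(mapping, uinstrs):
    return {'instrs': [('push', mapping[g]) for _, g in uinstrs]}


def toy_export(instrs):
    return {'output': lambda: list(instrs)}


steps = dict(compile=toy_compile, link=toy_link, export=toy_export)
state = run_pipeline(steps, graph='g0')
```

Passing values by name keeps each step independent of the others: a step only declares what it needs, which matches the Inputs/Outputs docstrings shown for each step in this section.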

step_wrap_primitives

    step_wrap_primitives = WrapPrimitives.partial()
    
    Inputs:
        graph: A graph

    Outputs:
        graph: The transformed graph

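This step is implemented by the standalone WrapPrimitives module in "/compile/transform.py".
Function: the processing resembles pruning; constant nodes in the Graph that are unrelated to the result computation are wrapped so that they cannot be called.
Flow: traverse all nodes; a node that is itself a constant while its input is not a constant is converted into a wrapped constant node, and the original node is replaced via set_edge.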

step_compile

    step_compile = CompileGraphs.partial(linear_impl='nnvm', target='cpu', dev_id=0)

    Inputs:
        graph: A graph

    Outputs:
        mapping: map each graph to its starting position in the code list.
        uinstrs: list of unlinked instructions for all the graphs in
                 the cluster, starting with the passed-in graph.

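Function: converts the Graph into an instruction list
Flow: splits the Graph, initializes the compilation resources and environment, and compiles the Graph into an instruction list, leaving a slot for optimization
Compilation consists of the following 3 steps:
1. SplitGraph splits the graph into linear portions and control flow
1. CompileGraph converts the split graph into a linear instruction sequence
1. OptimizeInstrs performs instruction optimization; it is not implemented yet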

The compilation pipeline is configured as follows:

    graph_transform = PipelineDefinition(
        resources=dict(
            lin_convert=nnvm_convert,
            target='cpu',
            dev_id=0,
        ),
        steps=dict(
            split=SplitGraph.partial(),
            compile=CompileGraph.partial(),
            optimize=OptimizeInstrs.partial(),
        )
    )

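Two modes are supported: debug and nnvm
In nnvm mode, either CPU or GPU can be selected
The current flow is configured for nnvm on CPU, and only this flow is analyzed in detail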

SplitGraph (graph splitting)

class SplitGraph(PipelineStep):
    """Pipeline step to cut the graph into linear portions and control flow.

    Inputs:
        graph: A graph

    Outputs:
        splits: list of graph portions

    """

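Processing flow:
1. Only apply nodes are processed
1. An apply node is a cut point when its input[0] is not a constant, or when input[0] is one of return, partial, switch, make_tuple
1. The resulting portions are saved to the splits list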
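
The cutting rule above can be sketched as follows. This is a simplified illustration under stated assumptions: `Node`, `is_cut`, and `split_graph` are hypothetical names, the node list is assumed to be topologically sorted already, and primitives are represented by plain strings rather than Myia's Primitive objects.

```python
from dataclasses import dataclass, field

CUT_PRIMITIVES = {'return', 'partial', 'switch', 'make_tuple'}


@dataclass
class Node:
    """Hypothetical stand-in for a graph node (not Myia's node classes)."""
    kind: str                      # 'apply', 'constant' or 'parameter'
    value: object = None           # payload of a constant node
    inputs: list = field(default_factory=list)


def is_cut(node):
    """An apply node is a cut point if its callee (input[0]) is not a
    constant, or if it is one of the control-flow primitives."""
    if node.kind != 'apply':
        return False
    fn = node.inputs[0]
    return fn.kind != 'constant' or fn.value in CUT_PRIMITIVES


def split_graph(nodes):
    """Group sorted nodes into linear runs (lists) and single cut nodes."""
    splits, run = [], []
    for node in nodes:
        if is_cut(node):
            if run:
                splits.append(run)
                run = []
            splits.append(node)
        elif node.kind == 'apply':
            run.append(node)
    if run:
        splits.append(run)
    return splits


x = Node('parameter')
add = Node('constant', 'scalar_add')
mul = Node('constant', 'scalar_mul')
ret = Node('constant', 'return')
a = Node('apply', inputs=[add, x, x])
b = Node('apply', inputs=[mul, a, x])
r = Node('apply', inputs=[ret, b])
splits = split_graph([x, add, a, mul, b, ret, r])
```

Here the two arithmetic apply nodes end up in one linear portion, and the return node becomes its own entry in splits.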

CompileGraph (graph compilation)

class CompileGraph(PipelineStep):
    """Step to convert splits into linear instruction flow.

    Inputs:
        graph: A graph
        splits: list of graph portions

    Outputs:
        uinstrs: list of instructions for the graph (unlinked)

    """

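The instructions that the compiler can emit are as follows.
Instruction list:

Function	Condition	Instruction
push_graph	Get the stack index of a constant graph	add_instr('push_graph', node.value)
push	Get the stack index of a node's value	add_instr('push', node.value)
dup	Ensure the node's value is on top of the stack	add_instr('dup', self.ref(node))
external	The split is a list of Apply nodes	add_instr('external', run, args)
return_	Node is an Apply and input[0] is a constant	add_instr('return', self.ref(split.inputs[1]), self.height)
partial	Node is an Apply and input[0] is a constant	add_instr('partial', self.ref(split.inputs[1]), *tuple(split.inputs[2:]))
switch	Node is an Apply and input[0] is a constant	add_instr('switch', self.ref(split.inputs[1], inputs[2], inputs[3]))
make_tuple	Node is an Apply and input[0] is a constant	add_instr('tuple', *[self.ref(i) for i in split.inputs[1:]])
tailcall	Node is an Apply and is the graph output; input[0] is not a constant	add_instr('tailcall', self.ref(fn), self.height,len(split.inputs[1:]))
call	Node is an Apply and input[0] is not a constant	add_instr('call', self.ref(fn))
pad_stack	More stack space is needed; inserted as the first instruction	instrs.insert(0, ('pad_stack', need_stack))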

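Processing flow:
1. Reverse the Graph's parameters and push them onto the stack
1. For each split (which contains only apply nodes) that is a list of apply nodes, set up the runtime resources and environment and obtain run, input, and output
- For input, build the ref indices to produce args
- For run, emit add_instr('external', run, args)
- For output, push onto the stack
1. For each split that is a single apply node
- If input[0] is a constant, emit the corresponding return, partial, switch, or tuple instruction according to the node value
- If input[0] is not a constant, emit a tailcall instruction when the node is the Graph output, and a call instruction otherwise
1. If more stack space is needed, emit a pad_stack instruction and insert it at the front
1. Finish the conversion and return the resulting instructions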

OptimizeInstrs (instruction optimization)

class OptimizeInstrs(PipelineStep):
    """Run peephole optimizations.

    Inputs:         
        uinstrs: List of unlinked instructions

    Outputs:
        uinstrs: List of unlinked instructions
    """

The instruction optimization step is not implemented yet

step_link

step_link = LinkInstrs.partial()

class LinkInstrs(PipelineStep):
    """Link unlinked instructions.

    Inputs:
        mapping: graph map
        uinstrs: unlinked instructions

    Outputs:
        instrs: linked instructions

    """

    def step(self, mapping, uinstrs):
        """Link instructions."""
        for i in range(len(uinstrs)):
            instr = uinstrs[i]
            if instr[0] == 'push_graph':
                uinstrs[i] = ('push', mapping[instr[1]])

        return {'instrs': uinstrs}

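Link processing flow:
1. Iterate over the uinstrs instruction sequence in order, replacing each push_graph instruction with a push instruction whose argument is the corresponding entry in mapping
1. This is implemented in the LinkInstrs module; it amounts to resolving each constant-graph symbol to the start position of its instructions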
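
As a worked example of the linking rule, the sketch below applies the same replacement to a small hand-written instruction list. The `link` helper, the graph names, and the positions in `mapping` are hypothetical illustration data; unlike the step method shown above, this version returns a new list instead of modifying uinstrs in place.

```python
def link(mapping, uinstrs):
    """Return a new list in which every ('push_graph', g) placeholder
    becomes ('push', mapping[g]); other instructions are left unchanged."""
    return [('push', mapping[instr[1]]) if instr[0] == 'push_graph' else instr
            for instr in uinstrs]


# Hypothetical data: 'g_main' starts at position 0, 'g_helper' at position 3.
mapping = {'g_main': 0, 'g_helper': 3}
uinstrs = [
    ('push_graph', 'g_helper'),
    ('call', -1),
    ('return', -1, 2),
]
instrs = link(mapping, uinstrs)
```

After linking, the first instruction no longer refers to the graph itself but to the position where that graph's code begins in the shared instruction list.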

step_export

step_export = VMExporter.partial()

class VMExporter(PipelineStep):
    """Make a callable out of instructions.

    Inputs:
        instrs: instruction list

    Outputs:
        output: callable
    """

    def step(self, instrs):
        """Make a callable."""
        return {'output': FinalVM(instrs)}

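Export function: builds a callable runtime environment for the instruction list and initializes its configuration
Flow:
1. This is implemented in the VMExporter module
1. VMExporter instantiates FinalVM with the instruction list, which initializes the configuration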

step_wrap

def step_wrap(self,
              graph,
              output,
              argspec,
              outspec,
              orig_argspec=None,
              orig_outspec=None,
              erase_class=False,
              erase_tuple=False):
    """Convert args to vm format, and output from vm format."""

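Wrap function: calls the VM's eval method, runs the computation through NNVMRunner, and wraps the output in the type required by the source environment before returning it
Flow:
1. Call convert_arg to convert each input arg to a Myia type, and pack the results into a tuple of arguments
1. The call res = fn(*args) triggers FinalVM's eval method
1. Call convert_result to convert the result to the type required by the source environment, then return it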
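
The convert, call, convert-back pattern in the flow above can be sketched as a plain wrapper. This is an illustration only: `wrap`, `to_vm`, and `from_vm` are hypothetical names with simplified one-argument signatures, and they do not reproduce the signatures of Myia's convert_arg and convert_result.

```python
def wrap(fn, to_vm, from_vm):
    """Return a callable that converts arguments to the VM format,
    calls fn, and converts the result back for the caller."""
    def wrapped(*args):
        vm_args = tuple(to_vm(a) for a in args)   # step 1: convert and pack
        res = fn(*vm_args)                        # step 2: run on the VM
        return from_vm(res)                       # step 3: convert back
    return wrapped


# Toy stand-ins: the "VM" works on floats and returns a one-element list.
def toy_vm(x, y):
    return [x + y]


f = wrap(toy_vm, to_vm=float, from_vm=lambda res: res[0])
value = f(1, 2)
```

Keeping the conversions at the boundary means the VM only ever sees its own value format, regardless of what the caller passes in.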

8.2 NNVM Call Flow Analysis

This section analyzes the NNVM call flow and the role of each module, with reference to the code

Code example

def f1(x, y):
    def f(xs, ys):
        return array_map(scalar_add, xs, ys)
    # return asscalar(array_reduce(scalar_add, f(x[:], y[:]), ()))
    return asscalar(array_reduce(scalar_add, f(x, y), ()))

@myia
def main(x, y):
    dfdx = grad(f1)(x, y)
    return dfdx

Graph before compilation:

_parameter8 (4852717832) = {NoneType} None
_apply9 (4853043944) = {NoneType} None
_apply10 (4853045568) = {NoneType} None
_constant5 (4852372088) = {NoneType} None
_apply11 (4852558424) = {NoneType} None
_constant:1.0 (4852253752) = {NoneType} None
_constant:scalar_to_array (4852254032) = {NoneType} None
_constant:distribute (4852372872) = {NoneType} None
_constant:return (4852373880) = {NoneType} None
_parameter12 (4853046296) = {NoneType} None
__len__ = {int} 10

Instructions after compilation:

{'uinstrs': [
    ('pad_stack', 1), 
    ('external', <myia.compile.nnvm.NNVMRunner object at 0x1207689b0>, []), 
    ('return', -1, 3)
]}

NNVM call relationships

[Figure: static diagram]

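There are two main functional modules: NNVMConvertor and NNVMRunner
Steps involved:
step_compile: during compilation, the actual execution backend is registered; for nnvm this is NNVMConvertor
step_export: during export, FinalVM is initialized (the NNVMRunner instance is already created by NNVMConvertor during step_compile)
step_wrap: at run time, FinalVM's eval function is called, which in turn invokes NNVMRunner's __call__
In debug mode, debug_convert is registered instead and the VM module runs the computation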

NNVMConvertor module

Function: maps Myia Apply operators to their nnvm implementations
Flow:
1. During initialization, register the operators in nnvm's simple_map and complex_map
1. In the main convert routine, set up the input and output parameters: input_names, input_types, output_specs
1. Create the nnvm graph: nnvm.graph.create(sym.Group(list(self.eqv[o] for o in outputs)))
1. Build the compiled artifacts: dg, lib, params = nnvm.compiler.build
1. Create the execution environment: module = graph_runtime.create(dg, lib, context)
1. Set the inputs: module.set_input(n, p)
1. Attach the runner module: (NNVMRunner(module, self.input_names, input_types, output_specs, context), self.inputs, outputs)

NNVMRunner module

Initialization parameters: input_names, input_types, output_specs, out
Run process:
1. Set the inputs: self.mod.set_input(**nnvm_args)
1. Execute: self.mod.run()
1. Fetch the outputs: self.mod.get_output(i, out)
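
The set-input, run, get-output sequence can be tried without NNVM installed by substituting a fake module. In the sketch below, `FakeModule` and `Runner` are hypothetical stand-ins: the fake module simply adds its two inputs, and its `get_output` takes only an index, which is a simplification of the runtime call used by NNVMRunner.

```python
class FakeModule:
    """Hypothetical stand-in for a compiled runtime module.
    It adds the inputs named 'i0' and 'i1'."""

    def __init__(self):
        self._inputs = {}
        self._result = None

    def set_input(self, **kwargs):
        self._inputs.update(kwargs)

    def run(self):
        self._result = self._inputs['i0'] + self._inputs['i1']

    def get_output(self, index):
        assert index == 0          # this fake module has a single output
        return self._result


class Runner:
    """Adapter that makes the module callable with positional arguments."""

    def __init__(self, mod, input_names, n_outputs):
        self.mod = mod
        self.input_names = input_names
        self.n_outputs = n_outputs

    def __call__(self, *args):
        assert len(args) == len(self.input_names)
        self.mod.set_input(**dict(zip(self.input_names, args)))  # set inputs
        self.mod.run()                                           # execute
        return [self.mod.get_output(i) for i in range(self.n_outputs)]


runner = Runner(FakeModule(), ['i0', 'i1'], n_outputs=1)
outs = runner(2.0, 3.0)
```

Returning a list of outputs, even for a single result, matches how the external instruction pushes every output of the called function onto the stack.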

NNVM call interaction flow analysis

[Figure: dynamic diagram]

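1. Pipeline compile phase: through the step_compile interface (1), the CompileGraph module is called
1. While running, CompileGraph calls the convert method (2) of the registered NNVMConvertor module
1. After NNVMConvertor finishes setting up resources and the environment, it attaches and initializes NNVMRunner (3), completing instruction compilation (4)
1. Pipeline link phase: through the step_link interface (5), constant subgraphs are linked to their instructions
1. Pipeline export phase: through the step_export interface (6), the VMExporter module is called and initializes the FinalVM module (7), completing instruction export (8)
1. Pipeline run phase: through the step_wrap interface (9), the eval method of the FinalVM module is called (10) to run the instructions
1. While FinalVM's eval runs the instructions, it calls NNVMRunner's __call__ method (11) to run each operator, and returns the output (12), completing the whole flow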

Key processing

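1. Convert the functional graph into computation-graph form
- Process constants and parameters to produce the operator inputs
- Convert the function of each apply node, one at a time, into the mapped nnvm operator
- Produce the operator outputs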

# inputs
def ref(self, n):
    """Resolve a reference to a node."""

    if n.is_constant() and not n.is_constant_graph():
        name = f"cst{next(self.c)}"
        self.constants[name] = np.array([n.value], 
                                        dtype=type_to_np_dtype(n.type), 
                                        copy=False, ndmin=1)
        setn(name, n)
    elif n not in self.eqv:
        name = f"i{next(self.c)}"
        self.inputs.append(n)
        self.input_names.append(name)
        setn(name, n)
    return self.eqv[n]

# mapping nnvm op
for n in lst:
    assert n.is_apply()
    assert n.inputs[0].is_constant(Primitive)
    fn = n.inputs[0].value
    conv = self.mapping.get(fn, None)
    if conv is not None:
        self.eqv[n] = conv(self, *n.inputs[1:])
    else:
        raise NotImplementedError(fn)

# outputs
outputs = get_outputs(lst, lst[0].graph.manager.uses, set(self.eqv.keys()))

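1. Run the computation through NNVM
- To run the instructions, the eval method of the FinalVM module is called; the inst_external instruction triggers the actual execution
- inst_external calls NNVMRunner's __call__ method to run the operators
- inst_external is implemented as follows: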

def inst_external(self, fn, args):
    """Call external function.

    This will call the provided function with the specified values
    and push any outputs that function has (may be more than one).

    Arguments:
       fn: Callable external function.
       args: sequence of stack references.

    """
    outs = fn(*(self._ref(a) for a in args))
    for o in outs:
        self._push(o)

class NNVMRunner:
    """Adapter to run an NNVM module."""

    def __call__(self, *args):
        """Run the module on the arguments."""
        assert len(args) == len(self.input_names)
        nnvm_args = dict()
        for n, tp, v in zip(self.input_names, self.input_types, args):
            nnvm_args[n] = np.array(v, dtype=tp, copy=False, ndmin=1)
        self.mod.set_input(**nnvm_args)
        self.mod.run()
        for i, out in enumerate(self._outs):
            out = self.mod.get_output(i, out)
        return [o.asnumpy() for o in self._outs]

8.3 Compile and NNVM Core Operations

Operations in detail

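1. Before compilation, the apply nodes in the Graph generated by Myia include control-flow operations such as return, call, and switch
1. During compilation, the Graph's branches and control flow are split and converted into an instruction list
1. During compilation, the functional Apply nodes are converted into a computation graph, and each list of computation-graph operators becomes an external instruction
1. At run time, FinalVM implements the control instructions; the relevant instructions are
- call, tailcall, return, partial, switch, tuple, pad_stack, external
1. At run time, the external instruction calls NNVMRunner to run the operators.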
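
The division of labor described above, with the VM handling stack and control instructions and an external callable handling the numeric work, can be illustrated with a toy stack machine. This is a sketch with deliberately simplified semantics, not Myia's FinalVM: `ToyVM` and `add_backend` are hypothetical, only push, tuple, external, and return are modeled, and return takes a single stack reference here.

```python
class ToyVM:
    """Tiny stack machine (simplified; not Myia's FinalVM).
    Stack references are Python list indices, so -1 is the top of the stack."""

    def __init__(self, instrs):
        self.instrs = instrs

    def eval(self, args):
        stack = list(args)                  # arguments start on the stack
        for op, *operands in self.instrs:
            if op == 'push':
                stack.append(operands[0])
            elif op == 'tuple':
                stack.append(tuple(stack[r] for r in operands))
            elif op == 'external':
                fn, refs = operands
                # Resolve the references, call out, push every output.
                outs = fn(*[stack[r] for r in refs])
                stack.extend(outs)
            elif op == 'return':
                return stack[operands[0]]
            else:
                raise NotImplementedError(op)
        raise RuntimeError('program ended without a return instruction')


def add_backend(x, y):
    """Stand-in for an NNVMRunner-like callable; returns a list of outputs."""
    return [x + y]


program = [
    ('external', add_backend, [-2, -1]),    # add the two arguments
    ('push', 10),
    ('tuple', -2, -1),                      # pack (sum, 10)
    ('return', -1),
]
result = ToyVM(program).eval((2, 3))
```

The VM never inspects what the external callable computes; it only resolves stack references before the call and pushes the outputs afterwards, which is why the same instruction can front any backend that follows this calling convention.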

Requirements for connecting Myia to a backend

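1. The graph output by Myia includes control-flow operations in its computation, which the backend operators need to support
1. Consider designing a Myia conversion module that converts the Myia graph into the corresponding computation graph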
