IR Research: Relay
Design motivation
Relay is being designed as a purely functional, statically typed language with the goal of balancing efficient compilation, expressiveness, and portability. The work proposes Relay, a new high-level intermediate representation (IR) and language that combines insights from the static-graph and dynamic-graph approaches under the aegis of a functional programming language.
- Graph-level challenges such as control flow and sub-graphs have become necessary features to natively support and optimize.
- The tight coupling between runtime representation and compile-time representation has limited flexibility and frustrated developers; Relay will decouple the representations.
- Finally, we believe the high-level IR must be designed in tandem with the low-level IR, allowing the two layers to communicate during compilation to achieve optimal performance.
TensorFlow: problems with static graphs
TensorFlow employs a dataflow graph of primitive operators extended with restricted control edges to represent differentiable programs.
- Pros
  - The graph can be optimized before execution.
  - The same graph can run across platforms.
- Cons
  - Because the topology is fixed before execution, TensorFlow does not lend itself well to certain applications. As an example, unmodified TensorFlow does not support building models where the shape of the computation graph is dependent on the input.
However, since the computation graph has a different shape and size for every input, such networks do not directly support batched training or inference. They are also difficult to implement in popular deep learning libraries, which are based on static data-flow graphs. (https://arxiv.org/pdf/1702.02181.pdf)
While there does exist a library to mitigate this particular problem (see [24]), this pattern suggests that should new dependencies become of interest in the future, similar libraries would also have to be written to address each one, entailing considerable engineering effort.
(https://ai.googleblog.com/2017/02/announcing-tensorflow-fold-deep.html) We distinguish between individual operations appearing as nodes in the underlying data-flow graph, such as addition or matrix-multiply, and small sub-graphs that conceptually act as functions over tensors, such as a feed-forward layer or LSTM cell. We refer to the former as “ops”, and to the latter as “operations.” Operations, (i.e. sub-graphs), form the building-blocks from which neural networks with DCGs are composed; dynamic batching schedules operations, not ops. Our algorithm requires that all operations which might be used be specified in advance, and it enumerates them for scheduling purposes. For example, a binary TreeRNN for NLP parse trees has two operations: embedding table lookups for words at the leaves of the tree, and RNN cells for the non-terminals.
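The contrast is easiest to see with a tree-structured model. Below is a purely illustrative Python/NumPy sketch (all names such as `tree_rnn`, `embed`, and `W` are made up here, not from the paper) of a binary TreeRNN whose recursion, and therefore whose computation graph, follows the shape of each input parse tree:

```python
import numpy as np

# Illustrative only: a toy binary TreeRNN. The recursion below follows the
# structure of the input tree, so the set and shape of tensor operations
# differs for every input -- exactly the pattern a fixed, ahead-of-time
# dataflow graph cannot express without tools like dynamic batching / TF Fold.
D = 8
rng = np.random.default_rng(0)
embed = rng.normal(size=(100, D))   # leaf "operation": embedding table lookup
W = rng.normal(size=(2 * D, D))     # non-terminal "operation": RNN cell weights

def tree_rnn(tree):
    if isinstance(tree, int):       # leaf: word id -> embedding lookup
        return embed[tree]
    left, right = tree              # non-terminal: combine the two children
    h = np.concatenate([tree_rnn(left), tree_rnn(right)])
    return np.tanh(h @ W)

# Two inputs with different tree shapes trigger different computations.
print(tree_rnn((1, 2)).shape)                   # tree with 2 leaves
print(tree_rnn(((3, 4), (5, (6, 7)))).shape)    # deeper tree with 5 leaves
```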
Dynamic graphs
Control flow is executed in the Python interpreter, and the dataflow is executed by the framework code, which is implemented as a Python extension. However, when using dynamic frameworks, information about control flow is lost, reducing the ability to optimize them. Additionally, dynamic frameworks need to re-optimize any time the graph topology changes, costing CPU cycles and the overhead of moving data between the host and accelerators. This can be solved by transforming the Python code, but that is effectively the same as a static framework where Python is the input IR.
Note: TF 2.0 adds support for dynamic graphs (eager execution by default).
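A hedged illustration of the point above (plain Python/NumPy standing in for an eager framework; the function name is invented): the `while` loop runs in the host interpreter, so the framework only ever sees the particular sequence of tensor ops issued for this one input, and the loop structure itself is invisible to any graph-level optimizer.

```python
import numpy as np

def normalize_until_small(x, threshold=1.0):
    # Control flow lives in the Python interpreter; only the individual
    # tensor ops reach the framework/accelerator. The trip count depends on
    # the runtime value of x, so the "graph" must be re-derived per input.
    steps = 0
    while np.linalg.norm(x) > threshold:
        x = 0.5 * x
        steps += 1
    return x, steps

x = np.array([4.0, 3.0])
print(normalize_until_small(x))   # number of halvings depends on the data
```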
AST vs Graph
Since these graphs are essentially a modified form of an abstract syntax tree (AST), we consider the transformations and analyses that have been performed on computation graphs as program transforms and program analyses. While other DL frameworks also adopt this perspective, their graph-based approaches have made it difficult to bring the full arsenal of traditional compiler and programming languages techniques to bear.
Most traditional source code transformation methods use a stack (tape) to store intermediate values at runtime. This introduces a mutable runtime structure into the program, which complicates type inference and optimization. Higher-order gradients are complicated by the fact that the gradient transform must explicitly handle read and write operations on this tape. If, on the other hand, the transform produces a side-effect-free program that is valid in the original formulation, then higher-order gradients can be obtained simply by applying the transform repeatedly. Since we do not introduce explicit runtime data structures, all regular optimization methods remain valid and effective.
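A toy sketch of that argument (ordinary Python, not Myia or Relay code; `Dual` and `grad` are names invented here): because the gradient transform below returns an ordinary side-effect-free function rather than manipulating a tape, it can be applied to its own output, giving higher-order derivatives simply by repeated application.

```python
# Minimal forward-mode AD as a purely functional program transform.
class Dual:
    """Dual number a + b*eps; works when a, b are numbers or nested Duals."""
    def __init__(self, primal, tangent):
        self.primal, self.tangent = primal, tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0)
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)
    __rmul__ = __mul__

def grad(f):
    """Return a new pure function computing df/dx; no mutable tape involved."""
    def df(x):
        out = f(Dual(x, 1.0))
        return out.tangent if isinstance(out, Dual) else 0.0
    return df

f = lambda x: x * x * x        # f(x) = x^3
print(grad(f)(2.0))            # 12.0, i.e. 3x^2 at x = 2
print(grad(grad(f))(2.0))      # 12.0, i.e. 6x at x = 2 (transform applied twice)
```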
TVM has already demonstrated the ability to compile models with fixed shapes; with the Relay IR, the representation is no longer just a dataflow graph but a full programming language. Static typing enables direct compilation of models into embedded hardware and accelerators, which has been demonstrated in prior work done in the TVM stack [9]. Having an IR like Relay enables the deployment of richer dynamic models for applications such as natural language processing. By taking this point of view, we can leverage decades of programming language research to help us express and understand these deep learning models not as a restricted data flow language, but as a full programming language.
Relay’s IR has two main design contributions over computation graphs: the addition of functions and a rich type system that can capture the relationship of tensor operations.
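A minimal sketch of those two contributions using the `tvm.relay` Python bindings (assuming a reasonably recent TVM; names such as `tvm.IRModule.from_expr` and `relay.transform.InferType` may differ slightly across versions): a Relay program is a typed function, and the printed form shows the tensor shapes captured by the type system.

```python
import tvm
from tvm import relay

# Build a small Relay function: f(x, y) = x + y over 2x2 float32 tensors.
x = relay.var("x", shape=(2, 2), dtype="float32")
y = relay.var("y", shape=(2, 2), dtype="float32")
func = relay.Function([x, y], relay.add(x, y))

# Wrap it in a module and run type inference; the printed IR shows the
# inferred tensor types on every sub-expression.
mod = tvm.IRModule.from_expr(func)
mod = relay.transform.InferType()(mod)
print(mod)
```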
Architecture design
- A Python frontend, which translates Python code into Relay’s C++ data structures.
- A module for automatic differentiation of Relay programs.
- A shape-dependent tensor type system.
- A simple evaluator for prototyping and debugging.
- A type-specialized operator compiler built on TVM.
- An efficient runtime system, which is still in progress.
Type system
Makes it easier to write optimization passes, for example passes that change tensor dimensions or data layout (see the sketch below).
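For example (a hedged sketch against the `tvm.relay` Python API; exact names may vary by TVM version), after type inference every expression carries concrete shape information in its type, so a dimension- or layout-changing pass can read and check shapes directly instead of re-deriving them:

```python
import tvm
from tvm import relay

# conv2d in NCHW layout; the conv2d type relation computes the output shape.
data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="float32")
out = relay.nn.conv2d(data, weight, padding=(1, 1),
                      channels=16, kernel_size=(3, 3))

mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))
mod = relay.transform.InferType()(mod)

# A pass can now inspect shapes through the type system, e.g. the result type:
print(mod["main"].ret_type)   # expect Tensor[(1, 16, 224, 224), float32]
```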
Auto diff
- Relies on the functional IR, which makes higher-order differentiation possible.
- Programs can be kept in both DAG and ANF-style forms, which makes it easier to implement more optimization passes (see the sketch below).
- Supports text printing of the IR, which makes writing passes friendlier.
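A small sketch of the DAG vs ANF point (again against the `tvm.relay` Python API, version permitting): the same program can be printed in its graph-like form and then converted with the built-in `ToANormalForm` pass, after which every intermediate value gets an explicit `let` binding.

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(4,), dtype="float32")
shared = relay.add(x, x)                       # a value used twice (a DAG node)
func = relay.Function([x], relay.multiply(shared, shared))

mod = tvm.IRModule.from_expr(func)
mod = relay.transform.InferType()(mod)
print(mod)                                     # graph/DAG-style printing

mod = relay.transform.ToANormalForm()(mod)
print(mod)                                     # ANF: explicit let-bindings
```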
Runtime
Moves more of the work forward to compile time for optimization, improving runtime efficiency. With static graphs, optimization happens ahead of time, but the runtime still has to schedule the DAG; with dynamic graphs, everything depends on runtime scheduling.
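To make the compile-time vs. run-time split concrete, here is a hedged sketch of the usual TVM flow (API names such as `relay.build` and `tvm.contrib.graph_executor` follow recent TVM releases; older releases used `graph_runtime`): the Relay module is compiled ahead of time for a target, and the runtime only executes the already-scheduled artifact.

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

x = relay.var("x", shape=(2, 2), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(x, x)))

# Ahead of time: optimization, lowering, and code generation happen here.
lib = relay.build(mod, target="llvm")

# Run time: the graph executor just runs the precompiled, prescheduled module.
dev = tvm.cpu()
m = graph_executor.GraphModule(lib["default"](dev))
m.set_input("x", tvm.nd.array(np.ones((2, 2), dtype="float32")))
m.run()
print(m.get_output(0))
```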
Introduction to Relay
https://github.com/dmlc/tvm/pull/2324/files
Dynamic fusion
- Uses TVM as the compilation backend.
- A feedback-driven heuristic fusion algorithm.
- Relay optimization passes (see the sketch below).
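A hedged sketch of applying Relay's built-in fusion pass (this snippet only shows the generic `relay.transform.FuseOps` pass, per recent TVM versions, not the specific algorithm in the PR above): after fusion, chains of elementwise ops are grouped into inner functions marked as primitive, which TVM then compiles as single fused kernels.

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 16), dtype="float32")
y = relay.nn.relu(relay.add(x, relay.const(1.0)))    # add -> relu chain

mod = tvm.IRModule.from_expr(relay.Function([x], y))
seq = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.FuseOps(fuse_opt_level=2),       # group ops into fused primitives
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
print(mod)   # fused groups appear as inner functions marked primitive
```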