[RFC]: Relay, a new high-level IR for TVM
Relay is a new high-level intermediate representation (IR) intended to act as v2.0 of NNVM.
Motivation
Computation graphs are a powerful program representation as demonstrated by the first generation of DL frameworks. Most popular frameworks have employed computation graphs as their input, intermediate representation, and execution data structure.
However, as workloads continue to evolve, the design of our high-level IRs needs to evolve to better support the needs of developers and users.
Graph-level challenges such as control flow and sub-graphs have become necessary features to natively support and optimize.
The tight coupling between runtime representation and compile-time representation has limited flexibility and frustrated developers; Relay will decouple the representations.
Finally, we believe the high-level IR must be designed in tandem with the low-level IR, allowing the two layers to communicate during compilation to achieve optimal performance.
Design
The first version of NNVM set out to solve some of these challenges, and we view Relay as a second-generation IR designed specifically for integration into the TVM stack as the input layer. Our goal is to focus on TVM as our primary backend, easing development and maintenance for both TVM developers and current NNVM users, as well as enabling new features.
To address the challenges presented above, we designed Relay to build on the things computation graphs are good at (pure, dataflow, compositional) and to improve on the things they struggle with (control flow, sub-graphs, the runtime/compilation distinction).
Core IR
Relay is a typed pure functional IR, with a few basic features such as functions, if-then-else control flow, recursion, operator and function calls, and variable binding.
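As a rough illustration of the constructs listed above, the following Python sketch models a small expression language with the same features and runs one simple compile-time analysis over it. All class names, field names, and the `free_vars` helper are hypothetical and chosen for this example; the actual node definitions are the C++ ones in include/tvm/relay/expr.h and may look quite different.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical node classes for illustration only; they are not the
# Relay node definitions.

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: object

@dataclass(frozen=True)
class Op:
    name: str  # a primitive operator such as "add"

@dataclass(frozen=True)
class Call:
    fn: "Expr"  # an Op or a Function
    args: Tuple["Expr", ...]

@dataclass(frozen=True)
class Function:
    params: Tuple[Var, ...]
    body: "Expr"

@dataclass(frozen=True)
class Let:
    var: Var
    value: "Expr"
    body: "Expr"

@dataclass(frozen=True)
class If:
    cond: "Expr"
    true_branch: "Expr"
    false_branch: "Expr"

Expr = Union[Var, Const, Op, Call, Function, Let, If]

def free_vars(e, bound=frozenset()):
    """Return the names of variables used in `e` but not bound inside it."""
    if isinstance(e, Var):
        return set() if e in bound else {e.name}
    if isinstance(e, (Const, Op)):
        return set()
    if isinstance(e, Call):
        out = free_vars(e.fn, bound)
        for arg in e.args:
            out |= free_vars(arg, bound)
        return out
    if isinstance(e, Function):
        return free_vars(e.body, bound | set(e.params))
    if isinstance(e, Let):
        # Simplification: the bound variable is visible only in the body,
        # so this sketch does not model recursive bindings.
        return free_vars(e.value, bound) | free_vars(e.body, bound | {e.var})
    if isinstance(e, If):
        return (free_vars(e.cond, bound)
                | free_vars(e.true_branch, bound)
                | free_vars(e.false_branch, bound))
    raise TypeError(f"unknown node: {e!r}")

x, y = Var("x"), Var("y")
add_y = Function((x,), Call(Op("add"), (x, y)))        # y is free here
program = Let(y, Const(1.0), Call(add_y, (Const(2.0),)))
print(free_vars(add_y), free_vars(program))
```

Because every node is an immutable value, an analysis like `free_vars` is a plain recursive function over the tree, which is the style of manipulation a pure functional IR is meant to make easy.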
We have iterated on Relay's design over the past 8 months. This version represents the culmination of our experiments. This PR does not contain all the pieces of the previous version; instead, we focus on introducing the core IR, its associated data structures, and a few integral passes.
The core IR is defined in just a few files:
include/tvm/relay/base.h (the base classes and common data)
include/tvm/relay/type.h (the type system and all relevant nodes)
include/tvm/relay/expr.h (the expression language)
Typing
All Relay programs are typed, similar to more conventional languages such as C++.
A type system allows us to statically (i.e., at compile time) distinguish between different sorts of values. This means we know whether an expression will evaluate to a tensor, a function (e.g., (float32, float32) -> float32), or a tuple (e.g., (float32, int32)). Furthermore, our type system can be shape generic (i.e., it supports polymorphism over shapes, much like templating).
Type inference and checking take the place of shape inference in traditional computation-graph-style IRs.
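To make the link between type checking and shape inference concrete, here is a minimal Python sketch, assuming a tensor type that carries a shape and a dtype, and a function type whose shapes may mention shape variables. The names `TensorType`, `FuncType`, `bind`, and `check_call` are hypothetical and do not describe the code in this PR. Checking a call binds each shape variable to a concrete extent, and the result type then carries the inferred output shape.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical types for illustration only; not the Relay type nodes.
Dim = Union[int, str]  # int: concrete extent, str: shape variable

@dataclass(frozen=True)
class TensorType:
    shape: Tuple[Dim, ...]
    dtype: str

@dataclass(frozen=True)
class FuncType:
    arg_types: Tuple[TensorType, ...]
    ret_type: TensorType

def bind(param, arg, subst):
    """Match a parameter type against a concrete argument type,
    recording shape-variable bindings in `subst`."""
    if param.dtype != arg.dtype:
        raise TypeError(f"dtype mismatch: {param.dtype} vs {arg.dtype}")
    if len(param.shape) != len(arg.shape):
        raise TypeError("rank mismatch")
    for p, a in zip(param.shape, arg.shape):
        p = subst.get(p, p)          # resolve an already-bound variable
        if isinstance(p, str):
            subst[p] = a             # first use of this shape variable
        elif p != a:
            raise TypeError(f"shape mismatch: {p} vs {a}")

def check_call(fn_type, arg_types):
    """Type check a call; the returned type includes the output shape."""
    if len(fn_type.arg_types) != len(arg_types):
        raise TypeError("wrong number of arguments")
    subst = {}
    for param, arg in zip(fn_type.arg_types, arg_types):
        bind(param, arg, subst)
    ret = fn_type.ret_type
    return TensorType(tuple(subst.get(d, d) for d in ret.shape), ret.dtype)

# A shape-generic signature for a dense layer: (n, k) x (m, k) -> (n, m).
dense = FuncType(
    (TensorType(("n", "k"), "float32"), TensorType(("m", "k"), "float32")),
    TensorType(("n", "m"), "float32"),
)
data = TensorType((8, 16), "float32")
weight = TensorType((4, 16), "float32")
print(check_call(dense, (data, weight)))
```

A weight of shape (4, 32) would be rejected in this sketch because `k` is already bound to 16, which is the kind of error a separate shape inference pass would otherwise have to report.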
This PR implements type inference and checking for Relay. The code can be found in src/tvm/relay/pass/type_infer.cc, with relevant helper utilities in src/tvm/relay/pass.
Control Flow
Relay adds a notion of control flow to the IR, in the form of a simple if (cond) { true_branch } else { false_branch }. Relay requires that the condition evaluate to a single boolean value, which controls which branch is taken. if is an expression in Relay, meaning the result of the entire expression is the result of the branch taken.
We introduce this to add a formal way to distinguish between data flow and control flow without having to conflate the two in the representation. Because we separate the control signal, we can easily batch a program without affecting control flow.
The definition of control flow can be found in include/tvm/relay/expr.h.
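The semantics described in this section can be sketched with a toy evaluator in Python. This is only meant to show what "if is an expression" and "a single boolean condition" mean; the node classes and `evaluate` function are hypothetical, and Relay itself is a compile-time representation rather than something intended to be interpreted this way.

```python
from dataclasses import dataclass

# Hypothetical node classes for illustration only.

@dataclass(frozen=True)
class Const:
    value: object

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class If:
    cond: object
    true_branch: object
    false_branch: object

def evaluate(expr, env):
    """Evaluate `expr` with variables looked up in the dict `env`."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, If):
        cond = evaluate(expr.cond, env)
        # The condition must be one boolean, not a tensor of booleans.
        if not isinstance(cond, bool):
            raise TypeError("if condition must be a single boolean value")
        # Only the selected branch is evaluated, and its value is the
        # value of the whole `if` expression.
        branch = expr.true_branch if cond else expr.false_branch
        return evaluate(branch, env)
    raise TypeError(f"unknown node: {expr!r}")

# Because `if` yields a value, it can be nested inside another expression.
nested = If(Var("c"), If(Var("d"), Const(1.0), Const(2.0)), Const(3.0))
print(evaluate(nested, {"c": True, "d": False}))
```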
Abstraction
Relay supports the definition of functions, which can be used to represent "sub-graphs" (i.e., chunks of reusable computation).
Relay functions are like traditional functions: they consist of a set of parameters (i.e., placeholders) and a body, which is a chunk of computation involving the parameters (i.e., a sub-graph). We can build a full network/model by composing functions together.
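The idea of building a model by composing functions can be sketched as follows. This is a toy Python model, not the Relay API: `Var`, `Call`, `Function`, `compose`, and `evaluate` are hypothetical names, Python callables stand in for primitive operators, and functions are assumed to be closed (they do not capture outer variables).

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical node classes for illustration only.

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Call:
    fn: object            # a Function, or a Python callable standing in for an operator
    args: Tuple[object, ...]

@dataclass(frozen=True)
class Function:
    params: Tuple[Var, ...]   # the placeholders
    body: object              # the sub-graph computed from them

def compose(*fns):
    """Build a one-parameter Function that applies `fns` left to right."""
    x = Var("x")
    body = x
    for f in fns:
        body = Call(f, (body,))
    return Function((x,), body)

def evaluate(expr, env):
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Call):
        args = [evaluate(a, env) for a in expr.args]
        if isinstance(expr.fn, Function):
            # Closed functions: the body only sees its own parameters.
            local = {p.name: v for p, v in zip(expr.fn.params, args)}
            return evaluate(expr.fn.body, local)
        return expr.fn(*args)
    raise TypeError(f"unknown node: {expr!r}")

a = Var("a")
double = Function((a,), Call(lambda v: 2.0 * v, (a,)))
relu = Function((a,), Call(lambda v: max(v, 0.0), (a,)))

# A two-layer "network" assembled from reusable functions.
net = compose(double, relu)
print(evaluate(Call(net, (Var("inp"),)), {"inp": 1.5}))
```

Each function here is an ordinary value in the program, so the same `double` or `relu` can be reused in several places without duplicating its body.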
Compilation
The Relay IR is designed as a compile-time representation of models. The new features are exposed only in Relay's abstract syntax tree and are used for compile-time program manipulation. We do not intend to use Relay's IR as a data structure for serious interpretation or execution.
Runtime
These new features increase the expressivity of the current computation model, and one may ask how to execute programs using these features with the existing runtime. Our goal in this PR is to introduce Relay as the compiler representation and to reuse the existing runtime, maintaining compatibility on both the frontend and backend. We anticipate that a future version of the runtime will have native support for Relay's new constructs.
TVM Co-design
We made an effort to model Relay's implementation after TVM and reuse much of the existing infrastructure in order to provide better compatibility between TOPI operators and Relay programs. One big design decision is reusing the TVM node system to expose the Relay language to Python in the style of TVM. Users who are familiar with TVM's expression language should feel comfortable working with the Relay AST's definition in C++ and Python. We also share representations for many data structures. For example, tensor containers (i.e., tvm::runtime::NDArray) and generic attributes are shared between Relay and TVM.
Transitioning from NNVM
We plan on adding a guide for transitioning programs from NNVM to Relay. This is one of the remaining work items before releasing the Relay Alpha. The goal is for users to be able to construct Relay programs with the Relay operators and builder API, and we will follow up with a compatibility layer to make transitioning from NNVM smooth.
For an implementation, see #1672.
status: RFC