Issue #104 proposed implementing static instruction sets with an enum. While this proposal sounded great in theory, it introduced a lot of extra complexity in practice. Migrating to an enum requires every place where the thread program references a `next_instruction` to use a match statement and implement conditional logic depending on whether the thread has a "dynamic" or a "static" instruction set. Given how often this happens, the solution began to feel very fragile, and the code became increasingly difficult to understand.
The enum-based interface also calls into question the use of the `next_instruction` property in the `ThreadResponse`. Can one use it with static instruction sets? Does it only work for dynamic instruction sets? Does returning a dynamic instruction break a static thread? Would the dynamic instruction just be ignored? These questions and their answers are confusing, which suggests that an enum is probably not the ideal interface for introducing static instruction sets.
This PR introduces an alternative interface. In short, the fundamental schema change is to replace `kickoff_instruction: InstructionData` with `instructions: Vec<InstructionData>`. It's fairly straightforward to see how this change adds support for static instruction sets. One can simply initialize a thread with a list of static instructions, and upon kickoff, the thread will execute the list of instructions sequentially. What is less obvious are the subtle implications this change has for flow control with dynamic instructions.
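To make the schema change concrete, here is a minimal sketch of the before and after shapes. The names `ThreadV1` and `ThreadV2` and the fields of `InstructionData` are hypothetical stand-ins for illustration; the real account structs in the thread program carry more fields than shown here.

```rust
/// Simplified stand-in for the thread program's instruction payload.
/// The fields here are assumptions made for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct InstructionData {
    pub program_id: String,
    pub data: Vec<u8>,
}

/// Before: a thread is seeded with exactly one kickoff instruction.
#[derive(Clone, Debug)]
pub struct ThreadV1 {
    pub kickoff_instruction: InstructionData,
}

/// After: a thread is seeded with an ordered list of static instructions.
#[derive(Clone, Debug)]
pub struct ThreadV2 {
    pub instructions: Vec<InstructionData>,
}

impl ThreadV2 {
    /// Hypothetical constructor: the list order is the execution order.
    pub fn new(instructions: Vec<InstructionData>) -> Self {
        Self { instructions }
    }
}

/// Hypothetical helper that builds an instruction with an empty payload.
pub fn ix(program_id: &str) -> InstructionData {
    InstructionData {
        program_id: program_id.to_string(),
        data: Vec::new(),
    }
}
```

A thread created with `ThreadV2::new(vec![ix("a"), ix("b")])` would, under this sketch, run `a` and then `b` on kickoff.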
There are many potential ways of handling dynamic instructions here, but personally I think the best approach is to introduce as little change as possible and leave the current interface guarantees unchanged. That is, if any instruction returns a `ThreadResponse` with a `next_instruction` value, that instruction will be executed immediately after the current one. The noteworthy consequence is that dynamic instructions can "interrupt" the execution of a static instruction set. Because dynamic instructions are executed immediately after they are returned, they can be inserted between two instructions of the static set.
Confused? One way to visualize how a thread will process instructions (the flow control) is with the grid below. In the left-most column, going vertically from top to bottom, are the static instructions our example thread was initialized with (1, 2, 3, and 4). Going horizontally, each row shows the dynamic instructions that were returned by the invoked program (1a, 1b, 1c, 2a, 4a, 4b, etc.). The diagram can be read like a book: instructions are executed line by line, left-to-right, top-to-bottom. If a dynamic instruction is not returned, the thread simply proceeds to the next instruction in the static set, and we use a newline to represent this.
1 → 1a → 1b → 1c
2 → 2a
3
4 → 4a → 4b → 4c → 4d
This is a more complex method of flow control than the single "linked list" that we support today. One could describe this new flow control as a "list of linked lists". This approach has many benefits. It achieves the desired goal of supporting both static and dynamic instruction sets. It also allows for a much simpler implementation than the enum-based approach. And perhaps most interestingly, it allows threads to combine dynamic instructions with static instructions to process complex workflows.
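The "list of linked lists" ordering can be sketched as a small simulation. This is only a model of the execution order, not the thread program's actual implementation: instructions are reduced to string labels, `ThreadResponse` is cut down to the one field that matters here, and `run_thread`, `invoke`, and `example_trace` are hypothetical names.

```rust
use std::collections::HashMap;

/// Simplified response: only the optional follow-up instruction is modeled.
#[derive(Clone, Debug, Default)]
pub struct ThreadResponse {
    pub next_instruction: Option<String>,
}

/// Walks the static list in order. After each static instruction, follows
/// the chain of returned dynamic instructions until one returns nothing,
/// then moves on to the next static instruction.
/// `invoke` stands in for invoking the target program.
pub fn run_thread<F>(instructions: &[String], mut invoke: F) -> Vec<String>
where
    F: FnMut(&str) -> ThreadResponse,
{
    let mut trace = Vec::new();
    for static_ix in instructions {
        let mut current = static_ix.clone();
        loop {
            trace.push(current.clone());
            match invoke(&current).next_instruction {
                Some(next) => current = next, // dynamic instruction runs next
                None => break,                // fall through to the static set
            }
        }
    }
    trace
}

/// Reproduces the example grid: static instructions 1 through 4, where
/// instruction 3 returns no dynamic instruction.
pub fn example_trace() -> Vec<String> {
    let chain: HashMap<&str, &str> = HashMap::from([
        ("1", "1a"), ("1a", "1b"), ("1b", "1c"),
        ("2", "2a"),
        ("4", "4a"), ("4a", "4b"), ("4b", "4c"), ("4c", "4d"),
    ]);
    let statics: Vec<String> = ["1", "2", "3", "4"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    run_thread(&statics, |ix| ThreadResponse {
        next_instruction: chain.get(ix).map(|s| s.to_string()),
    })
}
```

Each pass of the outer loop corresponds to one row of the grid, and the inner loop walks that row from left to right.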
Is this added complexity necessary? Fortunately, most use cases will never need to touch it. The vast majority of threads today use only one or two instructions. But this new model can be very useful where the workflows themselves are complex. Take the network program, for example. It uses a thread to drive epoch transitions. On each epoch transition, the thread is responsible for executing a series of "jobs". Depending on the state of the network, each job may have zero, one, a few, or many "subtasks". At a high level, the jobs of an epoch transition can be broken down into the following checklist:
- Lock the registry
- Distribute fees to workers
- Process unstake requests
- Delegate stake to workers
- Create a snapshot
- Cutover to the new snapshot
- Delete the old snapshot
With the thread interfaces currently available in v1.X, the network program is forced to "flatten" all of these jobs into a single long linked list of instructions. This flattening requires us to implement complex branching logic at the end of each instruction to dynamically jump to the correct next instruction depending on whether the current job is done or not. This branching logic is verbose, easy to screw up, and difficult to reason about. With the new thread interfaces proposed in this PR, a lot of this can be reduced and simplified.
One could use a thread's static instruction set as a series of kickoff instructions for sequential "jobs". If a job has some work to do, it can return dynamic instructions to do that work. If a job has no work to do, it can return a null response to simply proceed to the next job. By taking this approach, we can remove a lot of the branching logic in the network program and organize the code in a much more coherent way. Where we currently have a giant `instructions/` folder with all the automated and manual instructions commingled, we can refactor the automated instructions into a `jobs/` folder and create subfolders for each particular job (e.g. `jobs/take_snapshot/`, etc.). When we create the thread, we now only need to initialize it with the set of kickoff instructions for each job.
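As a rough illustration of the pattern, a job's kickoff instruction might look like the sketch below. Everything here is hypothetical: `NetworkState`, `pending_unstakes`, `process_unstakes_job`, and the `"unstake_process"` label are invented for the example, and `InstructionData` and `ThreadResponse` are reduced to the minimum needed to show the two outcomes.

```rust
/// Simplified stand-in for an instruction; only a label is modeled.
#[derive(Clone, Debug, PartialEq)]
pub struct InstructionData {
    pub name: String,
}

/// Simplified response: `None` is the "null response" that lets the thread
/// proceed to the next job in the static set.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ThreadResponse {
    pub next_instruction: Option<InstructionData>,
}

/// Hypothetical slice of network state consulted by the job.
pub struct NetworkState {
    pub pending_unstakes: u32,
}

/// Kickoff instruction for a hypothetical "process unstake requests" job.
/// If there is work, it returns a dynamic instruction to start that work.
/// If there is none, it returns a null response.
pub fn process_unstakes_job(state: &NetworkState) -> ThreadResponse {
    if state.pending_unstakes > 0 {
        ThreadResponse {
            next_instruction: Some(InstructionData {
                name: "unstake_process".to_string(),
            }),
        }
    } else {
        ThreadResponse::default()
    }
}
```

The only decision left in the kickoff is whether the job has work to do. It no longer needs to know which job comes next, because the static instruction set already encodes that order.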
Note that "jobs" here are not a new abstraction that's being introduced or required by the thread program. They're simply an abstraction that is "allowed" by the flow control of the new interfaces. Developers can choose to use this abstraction if they find it useful, or they can choose to follow the programming patterns we currently recommend (put everything in a single giant linked list). The flows allowed by the new interface are a pure superset of the instruction flows supported today. This means existing threads can easily migrate to the new model without a significant rewrite (on thread creation, users simply need to wrap their existing kickoff instruction in a
As I worked through it, this PR grew and became a bit sprawling. At a high level, it contains the following changes:
- It updates the thread program for the new interfaces.
- It updates the network program to use a "jobs" abstraction.
- It also removes the `kickoff_instruction` value from `ThreadResponse`. (This value doesn't make as much sense with the new schema, and I've never actually seen anyone use it correctly. I've only ever seen it cause confusion and be used incorrectly.)