There is this old saying that there is nothing new under the sun. While I thought I invented something new, it came to my attention the Transport Triggered Architecture (TTA) which I did not know before. I found some papers as early as 1994, by Henk Corporaal.
Both MISC and TTA specify an instruction set architecture where each ALU has associated input and output ports, and writing to the input ports trigger a computation.
MISC falls short describing how the instruction set would actually be implemented in hardware. It assumes that there is a hardware scheduler that can detect when the output port is ready and advances the pipeline. The compiler only needs to worry about transforming the program into a set of pipelines. This abstracts out the timing details such as gate length out of the instruction set architecture.
TTA programs consist of cycle-stepped move instructions that has to take gate length into account, due to the lack of a global control logic. If an operation requires two inputs—one is called the operand and the other is called a tigger—the operand must be stored before or during the same cycle as the trigger. If the program moves a value out of the result too early, it would read out a corrupt value. This means that the compiler has to take gate length into account while scheduling instructions. If a function unit takes too long (e.g. cache misses) to complete an operation, it must lock the data transport bus so additional move would not take place.
In contrast, MISC exposes no cycle-sensitive timing in the pipeline. The monadic design of the functional units allows pipelines to synchronize by data dependency graph, not by timing. Hence there is no need to globally lock the pipeline. I mentioned one instruction to wait for all moves in the instruction queue to finish, but I only added there for the purpose of context switching. I still need to elaborate more on how I plan to support preemptive multitasking.
TTA already has a hardware implementation. It is not clear if the idealized scheduler for MISC is implementable in hardware, or if it turns out to be the performance bottleneck.