Friday, November 9, 2012

Microcosmic Instruction Set Computer

When it comes to instruction set architecture, there are many philosophies, ranging from CISC to RISC down to extremes like OISC or ZISC. The dominant one is still Intel x86 (and x86-64), which is CISC, but ARM, which is RISC, is becoming popular too. I've not seen any commercial product based on OISC or ZISC, so they are probably not practical.

Having had some experience in planetary-scale Internet distributed computing, where computer services make remote procedure calls to one another, I came back to instruction set design with a revelation: even a single-core microprocessor is itself a distributed system. Rather than a distributed system of disk storage, memcache, and web servers, the microprocessor is a distributed system of arithmetic logic units, memory controllers, and I/O buses. This gives rise to the idea of a microcosmic instruction set computer, where the instruction set takes care of the basic register file and control flow but offloads all computation and memory I/O to the microcosm of distributed services on the processor die.

The instruction set features:

  • Some (yet-to-be-specified) instructions for unconditional and conditional branching, i.e. to control instruction fetching.
  • A small I/O address space (~16K ports?) in which each address specifies a word-sized port, a buffer that can be written to and read from. Each port also has a “ready” bit that signals the availability of data, for the purpose of instruction scheduling.
  • Lower-tier I/O ports (0-15) are simply a buffered general-purpose register file. Middle-tier ports (16-255) are for multiple ALUs, memory controllers, and external I/O controllers. Upper-tier ports (256-?) are laid out in groups of 256, mirroring the layout of the lower and middle tiers (0-255), to enable additional instruction-level parallelism.
  • An instruction specifies a move from a source port to a destination port. The instruction only executes once data at the source port is ready, but multiple instructions can be queued by the instruction scheduler. The move instruction can be seen as “connecting” a pair of ports.
  • An instruction to write a small constant value directly to a port.
  • One instruction to wait for all moves in the instruction queue to finish.
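
As a rough illustration of the port layout and the non-branching instructions above, here is a minimal Python sketch. The 16K port count, the 256-port group size, and the names `decode_port`, `Move`, `Const`, and `Wait` are assumptions for illustration only; branch instructions are left out because they are not yet specified.

```python
from dataclasses import dataclass
from typing import Union

# Assumptions (not fixed by the design): 16K ports, 256 ports per group.
NUM_PORTS = 16 * 1024
GROUP_SIZE = 256
NUM_REGISTER_PORTS = 16  # offsets 0-15 of each group act as registers


def decode_port(port: int) -> tuple[int, str, int]:
    """Split a port number into (group, tier, offset within the group)."""
    if not 0 <= port < NUM_PORTS:
        raise ValueError(f"port {port} out of range")
    group, offset = divmod(port, GROUP_SIZE)
    # Every group repeats the 0-255 layout: registers first, then units.
    tier = "register" if offset < NUM_REGISTER_PORTS else "unit"
    return group, tier, offset


@dataclass(frozen=True)
class Move:
    src: int  # source port; the move fires once this port is ready
    dst: int  # destination port


@dataclass(frozen=True)
class Const:
    value: int  # small immediate value
    dst: int    # port that receives it


@dataclass(frozen=True)
class Wait:
    pass  # wait for all queued moves to finish


# Hypothetical instruction type (branches omitted).
Instruction = Union[Move, Const, Wait]

print(decode_port(300))  # (1, 'unit', 44)
```

Under these assumptions, port 300 falls in the second group at offset 44, which is a unit port rather than a register.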

All ports can be used like registers (i.e. they are buffered), but some ports serve as a unit's inputs and others as its outputs. For example, an adder computing \( i + j = k \) would occupy three ports, \( p_i \), \( p_j \), and \( p_k \). The adder begins working whenever the ready bits of \( p_i \) and \( p_j \) are both set, and the ready bit of \( p_k \) is asserted only once the result is available. The adder can be powered off when it is not doing work.
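
To make the ready-bit semantics concrete, the following is a toy Python simulation of the adder example. It assumes (none of this is stated in the design) that a move copies data without clearing the source's ready bit, that the adder consumes its inputs by clearing their ready bits, and that the adder sits at hypothetical ports 16, 17, and 18 with registers at ports 0-2.

```python
from collections import deque


class Machine:
    """Toy model of ports with ready bits, a move queue, and adder units."""

    def __init__(self, num_ports: int = 512):
        self.value = [0] * num_ports
        self.ready = [False] * num_ports
        self.queue = deque()  # pending (src, dst) moves
        self.adders = []      # (p_i, p_j, p_k) port triples

    def add_adder(self, p_i: int, p_j: int, p_k: int) -> None:
        self.adders.append((p_i, p_j, p_k))

    def write(self, port: int, value: int) -> None:
        self.value[port] = value
        self.ready[port] = True

    def const(self, value: int, dst: int) -> None:
        # Write a small constant directly to a port.
        self.write(dst, value)

    def move(self, src: int, dst: int) -> None:
        # Queue a move; it executes once the source port is ready.
        self.queue.append((src, dst))

    def step(self) -> bool:
        """Run one round; return True if any adder fired or any move executed."""
        progress = False
        for p_i, p_j, p_k in self.adders:
            # An adder only works (and draws power) when both inputs are ready.
            if self.ready[p_i] and self.ready[p_j]:
                self.ready[p_i] = self.ready[p_j] = False  # consume inputs
                self.write(p_k, self.value[p_i] + self.value[p_j])
                progress = True
        pending = deque()
        while self.queue:
            src, dst = self.queue.popleft()
            if self.ready[src]:
                self.write(dst, self.value[src])
                progress = True
            else:
                pending.append((src, dst))
        self.queue = pending
        return progress

    def wait(self) -> None:
        """The wait instruction: run until every queued move has executed."""
        while self.queue:
            if not self.step():
                raise RuntimeError("queued moves can never become ready")


m = Machine()
m.add_adder(16, 17, 18)  # hypothetical adder ports p_i, p_j, p_k
m.const(2, 0)            # register port 0 <- 2
m.const(3, 1)            # register port 1 <- 3
m.move(0, 16)            # connect r0 -> p_i
m.move(1, 17)            # connect r1 -> p_j
m.move(18, 2)            # connect p_k -> r2
m.wait()
print(m.value[2])  # 5
```

Note that the move out of \( p_k \) is queued before the sum exists; it simply stays pending until the adder asserts the ready bit.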

The instruction scheduler could dynamically remap port numbers in groups of 256 if it wishes to turn off additional die area and reduce power use even further.
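
One possible reading of this remapping, sketched below in Python under loudly stated assumptions: the scheduler keeps a table from logical 256-port groups to physical groups and packs the groups a program actually uses onto the lowest-numbered physical groups, so that the remaining physical groups can stay powered off. The `GroupMapper` class and its allocation policy are hypothetical.

```python
GROUP_SIZE = 256


class GroupMapper:
    """Hypothetical remapping of logical 256-port groups onto physical groups."""

    def __init__(self, num_physical_groups: int):
        self.num_physical = num_physical_groups
        self.table: dict[int, int] = {}  # logical group -> physical group

    def translate(self, port: int) -> int:
        logical_group, offset = divmod(port, GROUP_SIZE)
        if logical_group not in self.table:
            if len(self.table) == self.num_physical:
                raise RuntimeError("no free physical group")
            # Allocate the next unused physical group (groups are never
            # freed in this sketch, so this is the lowest unused one).
            self.table[logical_group] = len(self.table)
        return self.table[logical_group] * GROUP_SIZE + offset

    def powered_groups(self) -> set[int]:
        """Physical groups in use; all others could be powered off."""
        return set(self.table.values())


mapper = GroupMapper(num_physical_groups=4)
print(mapper.translate(3 * 256 + 20))   # 20: logical group 3 -> physical group 0
print(mapper.translate(5))              # 261: logical group 0 -> physical group 1
print(sorted(mapper.powered_groups()))  # [0, 1]; groups 2 and 3 can stay off
```

A real scheduler would also need to free groups and migrate buffered port contents; this sketch only shows the address translation.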

Even within a single group, ports with the same function may in fact be a queue in front of a smaller number of physical units. For example, the port assignment might give \( 8 \times 3 = 24 \) ports to adders (eight logical adders with three ports each), but a particular chip might have only 2 physical adders doing the work of the 8 logical ones. It is particularly useful to have multiple logical memory controller units so that memory I/O can be queued.
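
The effect of sharing a few physical units among many logical ones can be sketched in Python as a FIFO of requests served by a fixed number of adders. The function name `run_shared_adders` and the assumption that each physical adder finishes one request per cycle are illustrative only; the same queueing idea would apply to logical memory controller units.

```python
from collections import deque


def run_shared_adders(requests, num_physical):
    """Serve add requests from logical adders on fewer physical adders.

    requests: list of (logical_unit, a, b) tuples, queued in FIFO order.
    Assumes each physical adder completes one request per cycle.
    Returns (results keyed by logical unit, cycles used).
    """
    queue = deque(requests)
    results = {}  # each logical unit's output port keeps its latest result
    cycles = 0
    while queue:
        cycles += 1
        # Every physical adder takes at most one queued request this cycle.
        for _ in range(min(num_physical, len(queue))):
            unit, a, b = queue.popleft()
            results[unit] = a + b
    return results, cycles


requests = [(unit, unit, 10) for unit in range(8)]  # 8 logical adders
results, cycles = run_shared_adders(requests, num_physical=2)
print(cycles)      # 4
print(results[7])  # 17
```

With 2 physical adders the 8 logical additions take 4 cycles instead of 1, trading throughput for die area and power while the program still sees 8 independent adders.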

To be continued...
