## Saturday, November 10, 2012

### MISC: Computation Modules

This is a continuation of Microcosmic Instruction Set Computer, defining the computation modules before I decide on the port assignments. The notation is $$(I_1, I_2, ..., I_m) \to (O_1, O_2, ..., O_n)$$ for input ports $$I_1 ... I_m$$ and output ports $$O_1 ... O_n$$. The total number of input and output ports for each operator are kept as power of 2 to allow less fragmentation in port assignment later.

#### Arithmetic operators

• add: $$(a, b) \to (r, c)$$ where $$r = a + b$$ and $$c$$ is the carry bit.
• sub: $$(a, b) \to (r, c)$$ where $$r = a - b$$ and $$c$$ is the carry bit.
• neg: $$a \to b$$ where $$b = -a$$.
• mul: $$(a, b) \to (r, s)$$ where $$r s = a \times b$$, i.e. $$r$$ is the upper word of the result, and $$s$$ is the lower word.
• div.mod: $$(a, b) \to (q, r)$$ where $$q \times b = (a - r)$$, i.e. $$q$$ is the quotient and $$r$$ is the remainder.
• smul, sdiv.mod: same as mul and div.mod but for signed integers.
• and: $$(a, b) \to (c, c')$$ where $$c = a \wedge b$$ and $$c' = a \wedge \neg b$$, both bitwise.
• or.xor: $$(a, b) \to (c, c')$$ where $$c = a \vee b$$ and $$c' = a \oplus b$$, both bitwise.
• rot.left: $$(a, c) \to (r, s)$$ where $$r$$ is the result of shifting $$a$$ to the left by $$c$$ bits, and $$s$$ is the overflow part of $$a$$ adjacent to the least significant bit.
• rot.right: $$(a, c) \to (r, s)$$ where $$r$$ is the result of shifting $$a$$ to the right by $$c$$ bits, and $$s$$ is the underflow part of $$a$$ adjacent to the most significant bit.
• not: $$a \to b$$ where $$b = \neg a$$, bitwise.
Signed integers use two's complement representation, so they don't need to distinguish addition and subtraction. Note that the conditional branch instruction should be able to use the condition from the least significant bit which is the carry bit, as well as the most significant bit which is the signed bit. This allows comparing both signed and unsigned integers.

#### Convenience operators

• lea: $$(b, a, x) \to r$$ computes $$r = 2^a \cdot x + b$$.
• add3: $$(a, b, c) \to r$$ computes $$r = a + b + c$$.
• or3: $$(a, b, c) \to r$$ computes $$r = a \vee b \vee c$$.
• xor3: $$(a, b, c) \to r$$ computes $$r = a \oplus b \oplus c$$.
Any of these operators—especially “or” or “xor”—could be used for the purpose of joining the completion of all computations. These are not to be confused with non-deterministic join. A non-deterministic join is not necessary; just have two pipes connecting from some two different output ports to the same input port.

#### Conditional operators

• cond: $$(c, a, b) \to r$$ where $$r = c \mathop{?} a : b$$, i.e. returns $$a$$ if $$c$$ is nonzero, otherwise $$b$$.
• cx: $$x \to c$$ where $$c$$ extends the carry bit of $$x$$ to fill the whole word.
• sx: $$x \to s$$ where $$s$$ extends the signed bit of $$x$$ to fill the whole word.

#### Memory controller

• read: $$a \to r$$ where $$r$$ is the value of memory word at address $$a$$ which must be word-aligned.
• write: $$(a, x, z) \to z'$$ to write $$x$$ to the memory word at address $$a$$ which must be word-aligned, whenever $$z$$ is also ready. The value of $$z$$ is not used. When the write finishes, $$z'$$ becomes ready with a zero value.
• cswap: $$(a, new, old) \to old'$$ writes $$new$$ to the memory word at address $$a$$ only when $$a$$ still contains the expected $$old$$ value. The actual value of $$a$$ before the swap is returned as $$old'$$.

#### Byte packing

• pack: $$(a, b, p) \to r$$ where $$r$$ is the result of permuting the concatenation of $$a b$$ according to mapping $$p$$.
• unpack: $$(a, p) \to (r, s)$$ where $$r$$ and $$s$$ are the two words selected from $$a$$ by mapping $$p$$ and they can be optionally sign extended.

#### Coprocessor I/O

• exec: $$(k, a_1, a_2) \to r$$ sends a two-argument command to co-processor $$k$$ and returns its result in $$r$$.
• exec2: $$(k, a_1, a_2, a_3, a_4, a_5) \to (r_1, r_2)$$ sends a five-argument command to co-processor $$k$$ and returns a pair of results in $$r_1$$ and $$r_2$$.
This can be used to drive the floating point coprocessor as well. The floating point coprocessor, for example, would occupy a range of k in the coprocessor namespace, one for each operation.

#### Some concluding remark

The fact that all computation modules above have non-empty inputs and outputs means that this forms a monadic set of operators, where synchronization is accomplished by port pipelining. I guess you can call this Monadic Instruction Set Computer also.

Because of the fine granularity of some of the computation modules, it makes sense to base the cycle on the finest granule which is the gate distance, e.g. the bitwise operators are all single gate distance, hence single cycle. This means that while some modules are single cycle, others can be hundreds or thousands of cycles.