Instruction Set Architecture

We take a quick detour to see other instruction set architecture (ISA). Some of the topics here may be similar to the topic introduced when we talk about MIPS but may be generalised further to include other architecture.

To illustrate, we will first tell you about one of the most famous battles in the design philosophies for ISA. There are two camps here.

	CISC	RISC
Name	Complex Instruction Set Computer	Reduced Instruction Set Computer
Example	x86-32, IA32	MIPS, ARM
Advantages	Smaller program size, especially if memory is premium	Small and simple instruction set, easier to build/optimise hardware
Disadvantages	Complex processor implementation, difficult hardware optimisation	Burden on software to compile efficiently

We will not be comparing those two. Instead, we will discuss the 5 design concepts in ISA design.

Data Storage
Memory Addressing Modes
Operations in the Instruction Set
Instruction Formats
Encoding the Instruction Set

Objective

Concept #1: Data Storage

Modern computers employe the von Neumann architecture for which data and program are stored in memory. However, in the case of processors, storage architecture concerns more with:

Where do we store the operands so that the computation can be performed?
Where do we store the computation result afterwards?
How do we specify the operands?

Some of the major storage architecture includes:

Stack Architecture: Operands are implicitly on top of the stack.
- Examples: Python bytecode assumes stack architecture.
Accumulator Architecture: One operand is implicitly in the accumulator (a special register).
- Examples: IBM 701, DEC PDP-8.
General-Purpose Register Architecture (GPR): Only explicit operands. There are two major sub-architecture:
1. Register-Memory Architecture: One operand in memory.
  - Examples: Motorola 68000, Intel 80386.
2. Register-Register Architecture: All operands in register. Also called load-store architecture.
  - Examples: MIPS, DEC Alpha.
Memory-Memory Architecture: All operands in memory.
- Examples: DEC VAX.

C = A + B

We will illustrate the difference between the storage architecture by using the example of C = A + B and how each storage architecture perform this computation assuming that all the values are currently in a variable which is mapped to a memory location.

Do NOT Sort

For obvious reason, do not sort. If you do so, the order of the operations will not be in the correct order. You should reload the page to get the original order.

Stack	Accumulator	Register (Load-Store)	Memory-Memory
`Push A`	`Load A`	`Load R1, A`	`Add C, A, B`
`Push B`	`Add B`	`Load R2, B`
`Add`	`Store C`	`Add R3, R2, R1`
`Pop C`		`Store R3, C`

Visually, we can represent them as follows:

Storage Architecture

The most common storage design for modern processor is the general-purpose register (GPR). This includes many of the RISC processors (e.g., MIPS and ARM processors) which typically use register-register (i.e., load/store) architecture. On the other hand, CISC processors (e.g., IA32) use a mixture of register-register and register-memory architecture.

Concept #2: Memory Addressing Mode

We are concerned with memory locations and addresses, the addressing modes as well as the memory content.

Memory Address

Given k-bit address, the address space is of size 2^k. Each memory transfer consists of one word of n bits.

Memory Addressing Mode

Memory Content: Endianness

Endianness is concerned with the relative ordering of the bytes in a multiple-byte word stored in memory. This is another major battle in the design philosophies.

The name comes from the root-word "end"¹. In particular, which end should be in front. In other words, whether the most significant byte (i.e., big-end) should be at the lowest address or should it be the least significant byte (i.e., little-end). Note that here we order the byte and not the bit. There are two major camps here:

	Big-Endian	Little-Endian
Order	Most significant byte stored in lowest address (big-end)	Least significant byte stored in lowest address (little-end)
Processor	IBM 360/370, Motorola 6800, MIPS, SPARC	Intel 80x86, DEC VAX, DEC Alpha
Example	0xDE AD BE EF is stored as	0xDE AD BE EF is stored as

Addressing Modes

Addressing mode is the ways to specify an operand in an assembly language. In MIPS, there are only 3 addressing modes:

Register: Operand is in a register (e.g., add $t1, $t2, $t3).
Immediate: Operand is specified in the instruction directly (e.g., addi $t1, $t2, 98).
Displacement: Operand is in memory with address calculated as Base + Offset (e.g., lw $t1, 20($t2)).

There are, however, other addressing modes summarised below:

Addressing Mode	Example	Meaning
Register	`Add R4, R3`	`R4 ↤ R4 + R3`
Immediate	`Add R4, #3`	`R4 ↤ R4 + 3`
Displacement	`Add R4, 100(R1)`	`R4 ↤ R4 + M[100+R1]`
Register Indirect	`Add R4, (R1)`	`R4 ↤ R4 + M[R1]`
Indexed / Base	`Add R3, (R1 + R2)`	`R3 ↤ R3 + M[R1 + R2]`
Direct or Absolute	`Add R1, (1001)`	`R1 ↤ R1 + M[1001]`
Memory Indirect	`Add R1, @R3`	`R1 ↤ R1 + M[M[R3]]`
Auto-Increment	`Add R1, (R2)+`	`R1 ↤ R1+M[R2]; R2 ↤ R2+d`
Auto-Decrement	`Add R1, -(R2)`	`R2 ↤ R2-d; R1 ↤ R1+M[R2]`
Scaled	`Add R1, 100(R2)[R3]`	`R1 ↤ R1 + M[100+R2+R3×d]`

Concept #3: Operations in Insturctions Set

There are some standard operations that are common to most modern processors:

Operations	Examples
Data Movement	load (from memory) store (to memory) memory-to-memory move register-to-register move input (from I/O device) output (to I/O device) push (to stack) pop (_from stack)
Arithmetic	the operations typically applicable to both integer and floating point add subtract multiply divide
Shift	shift left shift right rotate left rotate right
Logical	not and or set clear
Control Flow	jump (unconditional) branch (conditional)
Subroutine Linkage	call (to procedure) return (from procedure)
Interrupt	trap return
Synchronisation	test & set (atomic r-m-w)
String	search move compare
Graphics	pixel and vertex operations compression/decompression

Additionally, we can look at the frequently used instructions. The idea is that since these instructions are frequently used, it makes sense to make these instructions fast!

Amdahl's Law

Make the common case fast!

Rank	Integer Instructions	Average %
1	Load	22%
2	Conditional Branch	20%
3	Compare	16%
4	Store	12%
5	Add	8%
6	Bitwise AND	6%
7	Sub	5%
8	Move Register to Register	4%
9	Procedure Call	1%
10	Return	1%
	*Total*	96%

Concept #4: Instruction Formats

We are concerned with instruction length as well as instruction fields. In particular, for instruction fields, we are interested in the type and size of operands.

Instruction Length

Variable-Length Instructions
- Examples:
  1. Intel 80x86: Instructions vary from 1 to 17 bytes long.
  2. Digital VAX: Instructions vary from 1 to 54 bytes long.
- Require multi-step fetch and decode.
- Allow for more flexible (but complex) and compact instruction set.
Fixed-Length Instructions
- Examples:
  1. MIPS, PowerPC: Instructions are 4 bytes long.
- Allow for easy fetch and decode.
- Simplify pipelining and parallelism.
- Instruction bits are scarce.
Hybrid Instructions: A mix of variable- and fixed-length instructions.

Instruction length is closely related to the encoding.

Encoding Length

Instruction Fields

An instruction consists of:

Opcode: Unique code to specify the desired operation.
Operands: Zero or more additional information needed for the operation.

The operation designates the type and size of the operands. Typical type and size are as follows²:

Characters: 8 bits.
Half-Word: 16 bits.
Word: 32 bits.
Single-Precision Floating Point: 1 word.
Double-Precision Floating Point: 2 words.

Given the typical size above, any new 32-bit architectures are expected to support 8-, 16- and 32-bit integer as well as 32- and 64-bit floating point operations. Additionally, a 64-bit architecture would need to support 64-bit integers as well.

Concept #5: Encoding the Instruction Set

Here, we are interested in how the instructions are represented in binary format for execution by the processor. This is the assembly process.

There are certain issues that we need to think about when we are designing the instruction set. Some of the common issues include the code size, speed (or performance) and design complexity. We further need to decide the following:

Number of registers.
Number of addressing modes.
Number of operands in an instruction.

We generally have 3 encoding choices that are closely related to the instruction length.

Encoding

The fixed-length instruction presents a much more interesting challenge. We will need to fit multiple sets of instruction types into the same limited number of bits. To do that, we will need to work with the most constrained instruction types first. There is also the expanding opcode scheme that allows the opcode to have variable length for different instructions. We have encountered this before with the use of funct field. This is a good way to maximize the instruction bits.

Although there are some formula that can be used, the formula depends largely on the constraints that we have. As such, we will illustrate the encoding with examples instead.

Maximum Number of Instructions

Consider a 16-bit fixed length instructions with 2 types of instructions.

Type-A: 2 operands, each operand is 5-bit.
Type-B: 1 operand, the operand is still 5-bit.

We want to answer the following question:

What is the maximum number of unique instructions that can be encoded assuming that each type must have at least 1 instruction?

Fixed Length OpcodeExpanding Opcode

We will follow the design principles by keeping the common part common. The actual ordering does not actually matter, but we only need to line up the different fields. One possible alignment is shown below:

Types	6 Bits	5 Bits	5 Bits
A	`opcode`	`Operand`	`Operand`
B	`opcode`	`Operand`	unused

We can see from this design that the maximum total number of instructions is 2⁶ of 64 instructions. This is because the opcode is 6-bit long. This is because we can only use the opcode to differentiate the type of instructions.

This leads to the most obvious problem. 5-bits in type-B instructions are wasted because it is not used. A solution to this is to use expanding opcode scheme by using the unused bits as additional opcodes. This is what we are going to discuss in the next example.

The basic idea is to extend th opcode for type-B instructions to 11 bits by using the previously unused bits. There are many possible alignments. We will use the following alignment which differ slightly from the lecture notes but it highlights the similarity to the previous scheme.

Types	6 Bits	5 Bits	5 Bits
A	`opcode`	`Operand`	`Operand`
B	`opcode`	`Operand`	`opcode2`

We use the name opcode2 to show that this is also an opcode but the expanded version. To determine the maximum number of different instructions that we can have, we need to first be able to distinguish between type-A and type-B instructions.

There are two possibilities:

Minimise Type-A Instruction
- In this case, we can simply assign 1 opcode for type-A (e.g., 000000).
- Then, we can use the opcode from 000001 to 111111 for type-B.
  - But type-B as opcode2 which can range from 00000 to 11111.
- This leads to a total of:
  - Type-A:
    - 1 instruction with opcode 000000.
  - Type-B:
    - For each opcode in the range of 000001 to 111111 (i.e., 2⁶-1 instructions).
    - We have opcode2 in the range of 00000 to 11111 (i.e., 2⁵ instructions).
    - Total is 2⁶-1 × 2⁵ = 2016
  - Grand total = 2016 + 1 = 2017
Maximise Type-A Instruction
- In this case, we assign 2⁶-1 instructions to type-A with the range from 000001 to 111111.
- We need 1 opcode (e.g., 000000) for type-B but we still have 2⁵ instructions from opcode2 in the range of 00000 to 11111.
- This leads to a total of:
  - Type-A:
    - 2⁶-1 instructions with opcode from 000001 to 111111.
  - Type-B:
    - 2⁵ instructions with opcode2 from 00000 to 11111.
  - Grand total = 63 + 32 = 95

Obviously the first possibility gives more possible instructions.

Minimum Number of Instruction

If the question is about the minimum number of instruction such that the encoding space is completely utilised (i.e., no more instructions can be accommodated), then the second possibility gives us this value.

So, the maximum number of instruction is 2017 using the expanding opcode scheme.

Expanding Opcode Design

Consider a 36-bit fixed length instructions with 3 types of instructions:

Type-A: 3 operands (2 addresses + 1 register number)
Type-B: 2 operands (1 address + 1 register number)
Type-C: 0 operand

The operands take the following number of bits:

Address: 15 bits.
Register Number: 3 bits.

Additionally, you are asked that each type of instructions must have at least the following number of instructions:

Type-A: 7 instructions
Type-B: 500 instructions
Type-C: 50 instructions

AnswerSteps

One possible answer is as follows:

Types	3 Bits	15 Bits	15 Bits	3 Bits
A	`opcode`	`Address`	`Address`	`Register`
B	`opcode`	`opcode2`	`Address`	`Register`
C	`opcode`	`opcode2`	unused	unused

More specifically, we can specify the following opcode and opcode2:

Types	3 Bits	15 Bits
A	`000` - `110`
B	`111`	`000000 000000000` - `000000 111111111` (2⁹ = 512)
C	`111`	`000001 000000000` - `111111 000000000` (2⁶-1 = 63)

First, we will need to identify how many bits are available for opcode and possibly for expanded opcode (e.g., opcode2). To find this number, we simply look at the most restrictive instruction. In this case, it is type-A because it requires 3 operands. Since type-A instruction requires 2 addresses and 1 register number, we can calculate the remaining bits.

Opcode Bits
= opcode-bits + (2 × address-bits) + reg-number
= opcode-bits + (2 × 15          ) + 3
= opcode-bits + 33
= opcode-bits

Secondly, we may want to find a layout the bits for type-A instruction and match the layout for the common parts of type-B and type-C instructions. Since technically the layout can be anything we want, this step can actually be abstracted away. But it is still a very useful step. A possible layout is shown below:

Types	3 Bits	15 Bits	15 Bits	3 Bits
A	`opcode`	`Address`	`Address`	`Register`
B	`opcode`	`opcode2`	`Address`	`Register`
C	`opcode`	`opcode2`	`opcode3`	`opcode4`

Note that since type-C has no operand, we can have 4 kinds of opcodes (i.e., opcode, opcode2, opcode3 and opcode4).

Lastly, we find the possible opcodes. A suggestion is to start from the most restrictive instructions with the fewest number of bits for the opcode. Since we require 7 type-A instructions, we can assign any 7 opcode. Let us simply choose 000 - 110.

This means that we only have 1 remaining opcode for both type-B and type-C. Luckily, that is enough since we have opcode2, opcode3 and opcode4. At this point, we can easily give 1 value from opcode2 for type-C since there are still opcode3 and opcode4 that make up to 2¹⁸ possible instructions. This would still give us 2¹⁵-1 (the -1 is from giving 1 value from opcode2 for type-C) for type-B instruction.

The answer we gave in the "Answer" tab is a simple optimisation.

Past Midterm/Exam Questions

QuestionsAnswers

A certain machine has 12-bit instructions and 4-bit addresses. Some instructions have one address and others have two. Both types of instructions exist in the machine

What is the maximum number of instructions with one address?

(a) 15
(b) 16
(c) 240
(d) 256
None of the above
What is the minimum total number of instructions, assuming the encoding space is completely utilised (that is, no more instructions can be accommodated)?

(a) 31
(b) 32
(c) 48
(d) 256
None of the above

c

Steps

Bit layout (total is 12 bits so opcode is 4 bits):

Types 4 Bits 4 Bits 4 Bits

A opcode Address Address

B opcode opcode2 Address

Maximise type-B by minimising type-A. Give only 1 opcode for type-A (e.g., 0000). So, we can have the following opcode for type-B: 0001 - 1111 (i.e., 2⁴-1 = 15). For each of those, we can have the following opcode2 for type-B: 0000 - 1111 (i.e., 2⁴ = 16). This gives a total for type-B to be 15 × 16 = 240.
a

Steps

Bit layout (total is 12 bits so opcode is 4 bits):

Types 4 Bits 4 Bits 4 Bits

A opcode Address Address

B opcode opcode2 Address

Minimise total by maximising type-A. Give only 1 opcode for type-B (e.g., 0000). So, we can have the following opcode for type-A: 0001 - 1111 (i.e., 2⁴-1 = 15). We can still have the following opcode2 for type-B: 0000 - 1111 (i.e., 2⁴ = 16). This gives a total of 15 + 16 = 31.

The name was introduced by Danny Cohen in 1980 in Internet Engineering Note #137. It was borrowed from the novel Gulliver's Travels (1726). In the novel, the term describes two opposing factions in the nation of Lilliput. The big-endian broke their boiled eggs at the big end. They rebelled against the King who demanded his subject to break their eggs at the little end. ↩
The actual size may depend on the the processor themselves. The values given here are common for 32-bit processors. A word is typically defined to be the same length as processor line, in this case it's 32 bits. ↩