From C to Execution

Summary

High-LevelAssemblyMachine CodeDatapath

High-Level

We first write a program in high-level language (e.g., C).

High-Level
if(x != 0) {
  a[0] = a[1] + x;
}

Assembly

Compiler translates to assembly language (e.g., MIPS).

Assembly
      beq $16, $0, Else
      lw  $8 , 4($17)
      add $8 , $8, $16
      sw  $8 , 0($17)
Else:

Binaries

Assembler translates to machine language (i.e., binaries).

Assembly
0010 0000 0000 0000 0000 0000 0011 # 0x12000003
1110 0010 1000 0000 0000 0000 0100 # 0x8E280004
0001 0001 0000 0100 0000 0010 0000 # 0x01104020
1110 0010 1000 0000 0000 0000 0000 # 0xAE280000

Immediate Values?

Assembly
beq $16, $0, 3
lw  $8 , 4($17)
add $8 , $8, $16
sw  $8 , 0($17)

Datapath

Processor executes the machine code (i.e., binaries).

Processor

We will play the role of programmer, compiler, assembler and processor.

Programmer: Show the workflow of compiling, assembling and executing C program.
Compiler: Show how the program is compiled into MIPS (tools | slides).
Assembler: Show how the MIPS is translated into binaries (tools | slides).
Processor: Show how the datapath is activated in the processor (tools | slides).

Our program is the same as above, reproduced below.

sample.c
if(x != 0) {
  a[0] = a[1] + x;
}

Programmer

As a programmer, we typically concern only about the writing programs using high-level language. Later on in the semester, you will learn how certain access pattern may actually improve (or reduce) the performance of your program drastically. The main workflow that we are interested in as a programmer is the Edit, Compile, Execute workflow.

Workflow

Steps

EditCompileExecute

Create a new C program file (with the extension .c) or edit an existing one.
- We would recommend the use of vim (or vi improved) as it is available in all UNIX system like sunfire.
Save the file (e.g., sample.c).

Compile your C program file from "Edit" step using GCC (e.g., gcc sample.c).
You will either:
1. Produce an executable code (e.g., a.out) then you continue to "Execute" step.
2. Fail to compile (i.e., compile error) then you repeat "Edit" step to repair the program and remove the compile error.

Execute your executable (e.g., a.out or ./a.out).
You will either:
1. Get the correct program output and you are satisfied with your work and finish.
2. Get the incorrect program output then you repeat "Edit" and "Compile" step to repair the program until you get the correct program output.

Compiler

Now that we have our program, we can start compiling this program into MIPS. We will highlight key ideas in compilation. Although we will use the specific example of sample.c, we hope that the key ideas are applicable to other programs.

Key Ideas

Key Idea #1

Compilation is a structured process. In particular, each structure can be compiled independently.

In the case of the program in sample.c, we can identify two structures.

Inner Structure
    :
    :
  a[0] = a[1] + x;
    :
    :

Outer Structure
if(x != 0) {
    :
    :
    :
}

If you combine structures, you will then get the original program:

Combined StructureInner ContributionOuter Contribution

Full Structure
if(x != 0) {
    :
  a[0] = a[1] + x;
    :
}

Full Structure
if(x != 0) {
    :
  a[0] = a[1] + x;
    :
}

Full Structure
if(x != 0) {
    :
  a[0] = a[1] + x;
    :
}

We will highlight certain common structures in high-level programming languages. You will notice that they match nicely to the syntax of the program. You may want to try finding the structures for other programming languages. The structures will be marked with <structure name>.

Common Structures

If-ElseWhile LoopDo-While LoopFor Loop

If-Else
if (<condition>) {
  <then_branch>
} else {
  <else_branch>
}

While Loop
while (<condition>) {
  <loop_body>
}

While Loop
do {
  <loop_body>
} while (<condition>);

While Loop
for (<init> ; <condition> ; <update>) {
  <loop_body>
}

Key Idea #2

Variables are mapped to register.

Consider our sample program again. We can identify two variables: a and x. We then have to find a mapping for these two variables. Remember that not all registers are available.

Additionally, there are "conventional usage" but that is simply a convention. Your assembly code will still work, but might be less intuitive to read. Using the convention, "program variables" should use the registers in the range of $s0 - $s7. We can consider the following mapping:

Variable	Register	Number
`x`	`$s0`	`$16`
`&a[0]`	`$s1`	`$17`

Common Techniques

Common Technique #1

Invert the condition for shorter code.

This common technique works for if-statement but may not provide benefit for if-else-statement. Additionally, this is a common technique for while-loop and for-loop but may not be useful for do-while-loop.

Why?

In the case of if-else-statemenet, we have both branches available. So the technique provide no benefit. As for do-while-loop, we are repeating by branching back to the top when the condition is still satisfied. This match nicely with the semantics of do-while-loop without inversion.

Consider the inner structure again. We can compile this using the inversion technique as follows:

Outer Structure
if(x != 0) { // invert to ==
    :
    :
    :
}

Outer MIPS
      beq $16, $0, Else
        :
        :
        :
Else: 

Common Technique #2

Break complex operations and use temporary registers to store the intermediate result.

This is where using convention is useful. If we have kept registers $t0 - $t9 free, we can use these registers during our compilation without worrying whether it will break other parts of the program or not¹.

Inner Structure
    :
    :
  a[0] = a[1] + x;
    :
    :

Simplified Inner
    :
  $t1  = a[1];
  $t1  = $t1 + x;
  a[0] = $t1;
    :

Common Technique #3

Array access is lw, array update is sw and the offset depends on the type.

Simplified Inner
    :
  $t1  = a[1];
  $t1  = $t1 + x;
  a[0] = $t1;
    :

Inner MIPS
        :
      lw  $8 , 4($17)
      add $8 , $8, $16
      sw  $8 , 0($17)
        :

Common Error

Note that common technique #3 mentions that the offset depends on the type. The most common error is to assume that the address of the next word can be found by incrementing the address in a register by 1. This is wrong because we have to increment it by the word size in bytes. The principle used here is word-alignment.

Now the last step is to combine the inner MIPS with the outer MIPS to produce the full MIPS.

Inner MIPS
        :
      lw  $8 , 4($17)
      add $8 , $8, $16
      sw  $8 , 0($17)
        :

Outer MIPS
      beq $16, $0, Else
        :
        :
        :
Else: 

This combined structure yield the compiled version of sample.c.

Combined StructureInner ContributionOuter Contribution

Full MIPS
      beq $16, $0, Else
      lw  $8 , 4($17)
      add $8 , $8, $16
      sw  $8 , 0($17)
Else: 

Full MIPS
      beq $16, $0, Else
      lw  $8 , 4($17)
      add $8 , $8, $16
      sw  $8 , 0($17)
Else: 

Full MIPS
      beq $16, $0, Else
      lw  $8 , 4($17)
      add $8 , $8, $16
      sw  $8 , 0($17)
Else: 

Assembler

There are two main steps to perform the assembly.

Label Removal
1. Convert label to immediate constant.
Binary Conversion
1. Find the instruction formats: R-format, I-format or J-format.
2. For each field in the format, fill in the value in decimal/hexadecimal.
  - Since the labels are gone, we can fill in the immediate values.
3. Convert to binary.
  - You may combine this step with the previous one if you already prepared all values in binaries.

Label Removal

With Label
      beq $16, $0, Else
      lw  $8 , 4($17)
      add $8 , $8, $16
      sw  $8 , 0($17)
Else: 

Without Label
beq $16, $0, 3
lw  $8 , 4($17)
add $8 , $8, $16
sw  $8 , 0($17)
# no more label for Else

How do we remove the label? There are two ways, the first is the equational method (i.e., formulaic method by solving a certain equation) and the second is the diagrammatic method. Both are illustrated below. Note that the equational method abstracted the current address as you will see later that it will nicely cancel out due to the formula.

Immediate Conversion

EquationalDiagrammatic

The current $PC is X. The target address $PC' is X+16. We need to solve the equation $PC' = ($PC + 4) + (immediate × 4) given that we know the value of $PC and $PC'.

Computation of Immediate
   $PC' = ($PC + 4) + (immediate × 4)
=> X+16 = (X+0 + 4) + (immediate × 4)
=> 16   = 4         + (immediate × 4)
=> 16-4 =             (immediate × 4)
=> 12   =             (immediate × 4)
=> 3    =              immediate

So, the value of immediate field should be 3.

Draw a line below the current instruction (this represents the $PC + 4).
Draw a line above the target instruction (this represents the $PC').
Complete the box.
Count the number of instructions within the box (this represents the immediate × 4).
- If the target label is after, keep the value as positive.
- If the target label is before, convert the value to negative.

Similar to the previous method, we get the answer for the immediate field as 3.

Binary Conversion

So now we can start from the intermediate assembly code without label.

Intermediate MIPS
beq $16, $0, 3
lw  $8 , 4($17)
add $8 , $8, $16
sw  $8 , 0($17)

The steps are laid down below.

Line 1Line 2Line 3Line 4BinaryHexadecimal

Intermediate MIPS
beq $16, $0, 3
lw  $8 , 4($17)
add $8 , $8, $16
sw  $8 , 0($17)

- Format: I-Format
- Fields:
  
  Fields Decimal Value Binaries
  
  opcode 4 000100
  
  $rs 16 10000
  
  $rt 0 00000
  
  immediate 3 0000000000000011
Code: beq $16, $0, 3
- Binary: 00010010000000000000000000000011

Intermediate MIPS
0001 0010 0000 0000 0000 0000 0000 0011
lw  $8 , 4($17)
add $8 , $8, $16
sw  $8 , 0($17)

- Format: I-Format
- Fields:
  
  Fields Decimal Value Binaries
  
  opcode 35 100011
  
  $rs 17 10001
  
  $rt 8 01000
  
  immediate 4 0000000000000100
Code: lw $8 , 4($17)
- Binary: 10001110001010000000000000000100

Intermediate MIPS
0001 0010 0000 0000 0000 0000 0000 0011
1000 1110 0010 1000 0000 0000 0000 0100
add $8 , $8, $16
sw  $8 , 0($17)

- Format: R-Format
- Fields:
  
  Fields Decimal Value Binaries
  
  opcode 0 000000
  
  $rs 8 01000
  
  $rt 16 10000
  
  $rd 8 01000
  
  shamt 0 00000
  
  funct 32 100000
Code: add $8 , $8, $16
- Binary: 00000001000100000100000000100000

Intermediate MIPS
0001 0010 0000 0000 0000 0000 0000 0011
1000 1110 0010 1000 0000 0000 0000 0100
0000 0001 0001 0000 0100 0000 0010 0000
sw  $8 , 0($17)

- Format: R-Format
- Fields:
  
  Fields Decimal Value Binaries
  
  opcode 43 101011
  
  $rs 17 10001
  
  $rt 8 01000
  
  immediate 0 0000000000000000
Code: sw $8 , 0($17)
- Binary: 10101110001010000000000000000000

Binary
0010 0000 0000 0000 0000 0000 0011
1110 0010 1000 0000 0000 0000 0100
0001 0001 0000 0100 0000 0010 0000
1110 0010 1000 0000 0000 0000 0000

Hexadecimal
0x12000003
0x8E280004
0x01104020
0xAE280000

Processor

Once we have the machine code, we can let the processor to run the code. Typically the machine code is written in hexadecimal for human to read, but it is given to processor in binary to execute. Note that when we say it is in binary, it is truly in binary and not textual representation of binary.

So let us try to put ourselves as a processor and look at the execution of the program looking mainly at the datapath for the 5 stages. We put the update of $PC in the writeback stage instead of ALU stage. However, we will still be simulating the control signal to show you which inputs are selected by multiplexers, which operations are performed by the ALU, etc. Lastly, note that there are two possible execution of the program:

When the content of register $16 is equal to the content of register $0.
When the content of register $16 is not equal to the content of register $0.

The language MipC is designed in such a way as to mimic the simplified inner structure shown below. ↩

Fields	Decimal Value	Binaries
`opcode`	4	`000100`
`$rs`	16	`10000`
`$rt`	0	`00000`
`immediate`	3	`0000000000000011`

Fields	Decimal Value	Binaries
`opcode`	35	`100011`
`$rs`	17	`10001`
`$rt`	8	`01000`
`immediate`	4	`0000000000000100`

Fields	Decimal Value	Binaries
`opcode`	0	`000000`
`$rs`	8	`01000`
`$rt`	16	`10000`
`$rd`	8	`01000`
`shamt`	0	`00000`
`funct`	32	`100000`

Fields	Decimal Value	Binaries
`opcode`	43	`101011`
`$rs`	17	`10001`
`$rt`	8	`01000`
`immediate`	0	`0000000000000000`