Assembly Instructions

We've now learned what assembly is theoretically and what registers are, but how do we use them? Each CPU exposes an ISA (Instruction Set Architecture): a set of instructions with which to modify and interact with its registers and with the RAM. There are over 1000 instructions in the x64 ISA. There are even instructions for efficiently encrypting data. Find out more about them by enrolling in the Hardware Assisted Security track.

Before we dive into the instructions themselves, it's useful to first look at their generic syntax:

instruction_name destination, source

Most Assembly instructions take 2 operands: a destination and a source. For some operations, such as arithmetic, the destination is also an input operand. When an instruction produces a result, that result is stored in the destination.

Below we'll list some fundamental instructions. We will be using the Intel Assembly syntax.

mov

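mov is the most basic instruction in Assembly. It copies data from the source to the destination; despite the name, the source is left unchanged. Also note that comments in Assembly are preceded by ; and that instruction and register names are case-insensitive (labels, on the other hand, are case-sensitive in assemblers such as NASM).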

mov eax, 3              ; eax = 3

mov rbx, "SSS Rulz"     ; place each byte of the string "SSS Rulz" in rbx

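mov r8b, bl             ; r8b = bl
; The sizes of the operands must be equal (1 byte each in this case).
; bh cannot be used here: ah, bh, ch and dh cannot be encoded in the same
; instruction as the registers added in 64-bit mode (r8b, sil, etc.).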
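
To make the string mov above more concrete, here is a minimal Python sketch of the value that ends up in rbx. It assumes NASM's convention for character constants, where the first character lands in the least significant byte; the helper name pack_string is made up for this illustration.

```python
def pack_string(text: str) -> int:
    """Return the 64-bit value a NASM-style `mov rbx, "<text>"` would load.

    Assumption: the assembler packs the characters little-endian, so the
    first character ends up in the lowest byte of the register.
    """
    data = text.encode("ascii")
    if len(data) > 8:
        raise ValueError("a 64-bit register holds at most 8 bytes")
    return int.from_bytes(data, "little")


rbx = pack_string("SSS Rulz")
print(hex(rbx))         # 0x7a6c755220535353
print(hex(rbx & 0xFF))  # 0x53, the ASCII code of the first 'S' (what bl holds)
```

Reading the hex value from right to left gives the bytes of the string in order, which is why storing rbx to memory would reproduce "SSS Rulz".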

Data Manipulation

Now that we've learnt how to place data in registers, we need to learn how to do math with it. As you've seen so far, Assembly instructions are really simple. Below is a table with the most common and useful arithmetic instructions. Try to figure out what each example does. Use the fact that the general anatomy of an instruction is usually instruction destination, source. The result is stored in the destination.

| Instruction | Description | Examples |
| --- | --- | --- |
| `add <dest>, <src>` | `dest += src` | `add rbx, 5`<br>`add r11, 0x99` |
| `sub <dest>, <src>` | `dest -= src` | `sub ecx, 'a'`<br>`sub r9, r8` |
| `shl <dest>, <bits>` | `dest <<= bits` | `shl rax, 3`<br>`shl rdi, cl` |
| `shr <dest>, <bits>` | `dest >>= bits` | `shr r15, 5`<br>`shr rsi, cl` |
| `and <dest>, <src>` | `dest &= src` | `and al, ah`<br>`and bx, 13` |
| `or <dest>, <src>` | `dest \|= src` | `or r10b, cl`<br>`or r14, 0x2000` |
| `xor <dest>, <src>` | `dest ^= src` | `xor ebx, edx`<br>`xor rcx, 1` |
| `inc <dest>` | `dest++` | `inc rsi` |
| `dec <dest>` | `dest--` | `dec r10w` |
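
One thing the C-like descriptions above hide is that registers have a fixed width, so results wrap around. The following Python sketch models a few of the instructions for 64-bit registers only; the function names simply mirror the mnemonics and are not part of any library.

```python
MASK64 = (1 << 64) - 1  # a 64-bit register keeps only the low 64 bits


def add(dest: int, src: int) -> int:
    return (dest + src) & MASK64


def sub(dest: int, src: int) -> int:
    return (dest - src) & MASK64


def shl(dest: int, bits: int) -> int:
    # For 64-bit operands the CPU only looks at the low 6 bits of the count.
    return (dest << (bits & 63)) & MASK64


def shr(dest: int, bits: int) -> int:
    return (dest & MASK64) >> (bits & 63)


rbx = add(10, 5)          # add rbx, 5 with rbx = 10
rax = shl(1, 3)           # shl rax, 3 with rax = 1
r9 = sub(0, 1)            # sub r9, r8 with r9 = 0, r8 = 1 wraps around
print(rbx, rax, hex(r9))  # 15 8 0xffffffffffffffff
```

The sketch ignores the flags that these instructions also update; it only tracks the value left in the destination.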

Control Flow

Now we know how to do maths and move bits around. This is all good, but we still can't write full programs. We need a mechanism similar to ifs from Python and also loops in order to make the code run based on conditions.

jmp

The simplest instruction for control flow is the jmp instruction. It simply loads an address into the rip register. But when Assembly code is generated or written either by the compiler or by us, instructions don't have addresses yet. These addresses are assigned during the linking or loading phase, as you know from the Application Lifetime session.

For this reason, we use labels as some sort of anchors. We jmp to them and then the assembler will replace them with relative addresses which are then replaced with full addresses during linking. The way in which jmp and labels function is very simple. Remember that in the absence of jmps, Assembly code is executed linearly just like a script.

    jmp skip_next_section

    ; Whatever code is here is never executed.

skip_next_section:
    ; Only the code below this label is executed.

Warning: Do not confuse labels with functions. A label does not stop the execution of code when it's reached. A label is only a name for an address: it matters to instructions that jump to it, and execution otherwise falls straight through it.

For example, in the following code, both instructions are executed in the absence of jmps:

    mov rax, 2
some_label:
    mov rbx, 3
    ; rax = 2; rbx = 3
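
The behaviour of jmp and labels can be mimicked with a toy interpreter. The Python sketch below is only a model under simplifying assumptions: the program is a list of tuples invented for this example, the index rip stands in for the real rip register, and only mov, jmp and labels are supported.

```python
def run(program):
    """Execute a list of ("mov", reg, value), ("jmp", label) or ("label", name)."""
    # First pass: a label is just a name for a position in the program.
    labels = {op[1]: i for i, op in enumerate(program) if op[0] == "label"}
    regs = {}
    rip = 0  # index of the next instruction, standing in for the rip register
    while rip < len(program):
        op = program[rip]
        rip += 1                 # by default, continue with the next line
        if op[0] == "mov":
            regs[op[1]] = op[2]
        elif op[0] == "jmp":
            rip = labels[op[1]]  # jmp simply overwrites rip
        # a "label" entry does nothing: execution falls through it
    return regs


fall_through = [
    ("mov", "rax", 2),
    ("label", "some_label"),
    ("mov", "rbx", 3),
]
skipping = [
    ("jmp", "skip_next_section"),
    ("mov", "rax", 2),           # never executed
    ("label", "skip_next_section"),
    ("mov", "rbx", 3),
]
print(run(fall_through))  # {'rax': 2, 'rbx': 3}
print(run(skipping))      # {'rbx': 3}
```

In the first program the label changes nothing; in the second, the jmp moves rip past the first mov, so rax is never written.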

eflags

Many instructions, including the arithmetic and logic ones above (but not mov), also change the inner state of the CPU. In other words, several aspects regarding the result of the instruction are stored in a special register that we cannot access directly, called eflags (its 64-bit extension is called rflags). There are instructions that can set or clear some flags in eflags, but we cannot write something like mov eflags, 2.

As its name implies, each bit in eflags is a flag that is activated (i.e. set to 1) if a certain condition is true about the result of the last executed instruction. We won't be using these flags per se with one exception: ZF - the zero flag. When active, it means that the result of the last instruction was... 0, duh! This is useful for testing if numbers are equal for example. We'll talk about this in the next section.

Conditional jumps

Now we know that there is an internal state of the CPU which is modified by arithmetic and logic instructions, but not by mov. We still need a way to leverage this state. We can do this via conditional jumps.

They are like jmp instructions, but the jump is made only when certain conditions are met. Otherwise, code execution continues from the next instruction. The general syntax of a conditional jump is

j[n]<cond> label

where the letter n is optional and means the jump will be made if the condition is not met.

cmp and test

We can use the regular arithmetic instructions that we've learned so far to modify eflags. But this has the drawback of also modifying our data. It would be great if we had a means to modify eflags without changing the data that we evaluate. We can do this using cmp and test.

cmp dest, src modifies eflags as if you were subtracting src from dest, but without modifying dest. This is great for testing if 2 things are equal, or for testing which one is greater or lower.

test dest, src is similar to cmp, but modifies eflags as if you were applying the and instruction to dest and src, again without modifying dest. This comes in handy when we want to check if a register is 0.

test rax, rax
jz rax_is_zero

is equivalent to

cmp rax, 0
jz rax_is_zero
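
The equivalence of the two snippets above can be checked with a small Python model of the Zero Flag. This is a sketch that tracks ZF only and assumes 64-bit operands; the helper names zf_after_cmp and zf_after_test are invented for the example.

```python
MASK64 = (1 << 64) - 1


def zf_after_cmp(dest: int, src: int) -> bool:
    # cmp computes dest - src, keeps only the flags and discards the result.
    return ((dest - src) & MASK64) == 0


def zf_after_test(dest: int, src: int) -> bool:
    # test computes dest AND src, again keeping only the flags.
    return (dest & src & MASK64) == 0


for rax in (0, 1, 42, MASK64):
    # test rax, rax and cmp rax, 0 agree on ZF for every value of rax
    assert zf_after_test(rax, rax) == zf_after_cmp(rax, 0)

print(zf_after_cmp(7, 7))   # True: equal operands set ZF, so je/jz jumps
print(zf_after_test(8, 7))  # True: 0b1000 and 0b0111 share no bits
```

The last line shows that test with two different operands checks for common bits, not for equality.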

Now let's have a look at some conditional jumps:

| Conditional jump | Meaning |
| --- | --- |
| `jz` / `je` | Jump if the Zero Flag is active |
| `jnz` / `jne` | Jump if the Zero Flag is not active |
| `cmp rax, rbx`<br>`j[n]g` | Jump if rax is (not) greater than rbx (signed) |
| `cmp rax, rbx`<br>`j[n]a` | Jump if rax is (not) greater than rbx (unsigned) |
| `cmp rax, rbx`<br>`j[n]ge` | Jump if rax is (not) greater than or equal to rbx (signed) |
| `cmp rax, rbx`<br>`j[n]ae` | Jump if rax is (not) greater than or equal to rbx (unsigned) |
| `cmp rax, rbx`<br>`j[n]l` | Jump if rax is (not) lower than rbx (signed) |
| `cmp rax, rbx`<br>`j[n]b` | Jump if rax is (not) lower than rbx (unsigned) |
| `cmp rax, rbx`<br>`j[n]le` | Jump if rax is (not) lower than or equal to rbx (signed) |
| `cmp rax, rbx`<br>`j[n]be` | Jump if rax is (not) lower than or equal to rbx (unsigned) |
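
The signed and unsigned variants differ only in how the same 64 bits are interpreted. The Python sketch below models that difference for ja and jg; it compares the values directly instead of deriving the outcome from individual flags, and the helper names are made up for the illustration.

```python
MASK64 = (1 << 64) - 1


def as_signed(value: int) -> int:
    """Interpret a 64-bit pattern as a two's complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def ja_taken(rax: int, rbx: int) -> bool:
    return (rax & MASK64) > (rbx & MASK64)  # unsigned "above"


def jg_taken(rax: int, rbx: int) -> bool:
    return as_signed(rax) > as_signed(rbx)  # signed "greater"


rax, rbx = MASK64, 1       # rax holds all ones: 2**64 - 1 unsigned, -1 signed
print(ja_taken(rax, rbx))  # True
print(jg_taken(rax, rbx))  # False
```

With rax set to all ones, ja would jump after cmp rax, rbx while jg would not, which is why picking the right variant matters whenever values may have their top bit set.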

Loops

We can create loops simply by combining labels and conditional jumps. For example, for i in range(0, 10) from Python is equivalent to:

    xor rcx, rcx        ; i = rcx; same as mov rcx, 0
for_loop:
    cmp rcx, 10
    je done_loop        ; leave the loop once i == 10

    ; The body of the for loop.

    inc rcx             ; rcx++
    jmp for_loop        ; re-evaluate the condition

done_loop:

Alternatively, we can verify rcx < 10 at the end of the loop. Note that this variant always runs the body at least once:

    xor rcx, rcx
for_loop:
    ; The body of the for loop.

    inc rcx             ; rcx++
    cmp rcx, 10
    jb for_loop         ; repeat while i < 10

    ; The code here is executed only after the loop ends.
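
To compare the two loop shapes, here is a rough Python rendering of each. The functions check_first and check_last are names invented for this sketch, and the limit parameter replaces the constant 10 so that the edge case of an empty range can be tried.

```python
def check_first(limit: int) -> int:
    """First variant: compare before running the body."""
    iterations = 0
    rcx = 0
    while rcx != limit:      # cmp rcx, limit / je done_loop
        iterations += 1      # the body of the loop
        rcx += 1             # inc rcx
    return iterations


def check_last(limit: int) -> int:
    """Second variant: run the body, then compare."""
    iterations = 0
    rcx = 0
    while True:
        iterations += 1      # the body of the loop
        rcx += 1             # inc rcx
        if not rcx < limit:  # cmp rcx, limit / jb for_loop
            break
    return iterations


print(check_first(10), check_last(10))  # 10 10
print(check_first(0), check_last(0))    # 0 1
```

For a limit of 10 the two forms behave the same, but with a limit of 0 the second form still executes its body once, like a do-while loop.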