Getting Started with Assembly
A while ago I got curious about how computers actually work. The answer was assembly. Instead of reading a 500-page assembly manual, I wrote a small collection of simple programs, each of which taught me a different concept. This collection became BergerAPI/asm, and this post introduces you to assembly.
Prerequisites
Before you start writing assembly, you need (some of) these tools installed. This post focuses on NASM (Netwide Assembler) in an x86-64 Linux environment, and the install commands assume a distribution that uses apt-get.
- NASM (Netwide Assembler)
  sudo apt-get install nasm
- GNU Binutils (includes the linker ld and objdump)
  sudo apt-get install binutils
- GNU Make (already on most Linux systems)
  sudo apt-get install build-essential
- GDB (optional but highly recommended)
  sudo apt-get install gdb
- gcc (optional, for mixing assembly with C later)
  sudo apt-get install gcc
Key Concepts
Program Structure
The code is structured using sections:
- section .data for initialized global and static data (like char *msg = "hello";)
- section .text for the executable code instructions
- section .bss for uninitialized global and static data
You can also use labels, which give a name to a specific memory address, to structure your code. A label can mark the program entry point, a function, or a target to jump to or call.
_start:
; instructions
loop_start:
; instructions
Instructions tell the CPU what to do, and generally look like this (dummy code):
mov rax, 1
push rbx
ret
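To see labels and instructions working together, here is a minimal sketch of a counting loop. The label names (count_loop, finish) and the file name in the comments are my own picks for illustration, not taken from BergerAPI/asm.

```nasm
; Sketch: sum 5+4+3+2+1 and return the result as the exit status.
; Build and run, assuming the file is saved as labels.asm (hypothetical name):
;   nasm -f elf64 labels.asm -o labels.o
;   ld labels.o -o labels
;   ./labels; echo $?
section .text
global _start

_start:
    mov rcx, 5          ; loop counter
    xor rax, rax        ; rax = 0, running total

count_loop:
    add rax, rcx        ; total += counter
    dec rcx             ; counter -= 1, sets the zero flag when it hits 0
    jnz count_loop      ; jump back to the label while counter != 0

finish:                 ; execution simply falls through into this label
    mov rdi, rax        ; exit status = running total (15)
    mov rax, 60         ; syscall exit
    syscall
```

The shell's echo $? should print 15, since the exit status is the only output this program produces.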
Registers
Registers are small, fast storage locations inside the CPU. Instructions like mov
and add write values to them, read values from them, and modify them in place.
There are many registers; the most important ones are listed here:
| Register | Purpose |
|---|---|
| rax | return value, syscall number, accumulator |
| rcx | 4th argument of a function; loop counter (overwritten by syscall) |
| rdx | 3rd argument of a function |
| rsi | 2nd argument of a function |
| rdi | 1st argument of a function |
| rsp | stack pointer (top of the stack) |
| r8-r15 | general purpose (r8 and r9 also carry the 5th and 6th function arguments) |
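As a rough illustration of the table, the following sketch writes to two registers, combines them with add, and hands the result to the exit syscall through rdi. The values 40 and 2 are arbitrary.

```nasm
; Sketch: basic register use with mov and add.
section .text
global _start

_start:
    mov rax, 40         ; write 40 into rax
    mov rbx, 2          ; write 2 into rbx
    add rax, rbx        ; rax = rax + rbx = 42

    mov rdi, rax        ; rdi = 1st argument (here: the exit code)
    mov rax, 60         ; rax = syscall number (60 = exit)
    syscall             ; process ends with exit status 42
```

Note how rax plays two roles from the table: first as an accumulator for the addition, then as the syscall number.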
Stack
The stack is a region of memory for storing temporary data, like function parameters and local variables. The stack grows downwards in memory, meaning that as you push data onto the stack, the stack pointer (RSP) decreases.
Using push and pop we can manipulate the stack.
- push rax adds an element to the top of the stack and decreases the stack pointer by the size of the data being pushed (8 bytes for a 64-bit register)
- pop rbx removes the element at the top of the stack and moves it into the specified register, then increases the stack pointer by the size of the data being popped
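A small sketch of push and pop in action: because the stack is last-in, first-out, popping in the same order as pushing swaps the two values. The starting values are arbitrary.

```nasm
; Sketch: swap rax and rbx through the stack.
section .text
global _start

_start:
    mov rax, 1
    mov rbx, 2

    push rax            ; rsp -= 8, stack top = 1
    push rbx            ; rsp -= 8, stack top = 2

    pop rax             ; rax = 2 (last value pushed), rsp += 8
    pop rbx             ; rbx = 1, rsp += 8 (rsp is back where it started)

    mov rdi, rax        ; exit status = 2, showing the swap happened
    mov rax, 60         ; syscall exit
    syscall
```

Every push here is matched by a pop, so the stack pointer ends up at its original value, which is a good habit to keep from the start.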
The data section
Use section .data to place initialized, writable global data in your binary.
Caution: Put large uninitialized buffers in .bss using resb/resq. This keeps the
binary smaller, because .bss only records how much space to reserve.
section .data
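For comparison with .data, here is a minimal sketch of a .bss buffer reserved with resb, filled at run time, and printed. The buffer name buf and its size of 64 bytes are assumptions for illustration.

```nasm
; Sketch: reserve an uninitialized buffer in .bss and fill it at run time.
section .bss
buf:    resb 64             ; reserve 64 bytes, nothing is stored in the file

section .text
global _start

_start:
    mov byte [buf], 'A'     ; first byte of the buffer
    mov byte [buf + 1], 0x0a ; second byte: newline

    mov rax, 1              ; syscall write
    mov rdi, 1              ; file descriptor (stdout)
    mov rsi, buf            ; pointer to the buffer
    mov rdx, 2              ; write only the 2 bytes we filled
    syscall

    mov rax, 60             ; syscall exit
    xor rdi, rdi            ; exit code 0
    syscall
```

This should print the letter A followed by a newline.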
Basic directives
- db defines byte(s) (useful for strings and raw bytes)
- dw defines 2-byte words
- dd defines 4-byte doublewords
- dq defines 8-byte quadwords (useful for 64-bit integers/pointers)
- $ is the current assembly address
- $ - label computes the number of bytes from the specified label to the current assembly address
- times n db 0 repeats a value n times
Strings and lengths:
msg: db "Hello, world!", 0x0a ; newline-terminated string
len: equ $ - msg ; compute length at assemble time
Numeric values (note little-endian layout):
val32: dd 0x11223344 ; bytes will be 44 33 22 11 in memory
val64: dq 0x1122334455667788
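One way to check the little-endian layout yourself is to load only the first byte of val32 and return it as the exit status. This is a sketch under the assumption that you build it as a standalone program.

```nasm
; Sketch: observe little-endian byte order via the exit status.
section .data
val32:  dd 0x11223344       ; stored in memory as 44 33 22 11

section .text
global _start

_start:
    movzx rdi, byte [val32] ; load the byte at the lowest address, zero-extended
    mov rax, 60             ; syscall exit
    syscall                 ; exit status = 0x44 = 68
```

Running it and then echo $? should print 68, which is 0x44, the least significant byte of the value.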
Arrays:
arr: dq 1, 2, 3, 4 ; array of 64-bit integers
zeros: times 8 db 0 ; 8 zero bytes
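To show how such an array can be read back, here is a sketch that sums a dq array using scaled indexing. The array contents, the count constant, and the label sum_loop are illustrative choices of mine.

```nasm
; Sketch: sum an array of 64-bit integers.
section .data
arr:    dq 10, 20, 30, 40   ; four 8-byte elements
count:  equ ($ - arr) / 8   ; byte length divided by element size = 4

section .text
global _start

_start:
    xor rax, rax            ; running sum = 0
    xor rcx, rcx            ; index = 0

sum_loop:
    add rax, [arr + rcx*8]  ; add element at index rcx (8 bytes per element)
    inc rcx                 ; next index
    cmp rcx, count
    jl sum_loop             ; repeat while index < count

    mov rdi, rax            ; exit status = 10+20+30+40 = 100
    mov rax, 60             ; syscall exit
    syscall
```

The same $ - label trick used for string lengths gives the element count here once you divide by the element size.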
First Steps with Hello World
This example prints "Hello, World!" to the console and exits with code 0, which stands for success.
section .data
msg db "Hello, World!", 0x0a ; defines a byte array containing the text "Hello, World!" plus a newline (0x0a)
len equ $ - msg ; resolves to 14 (the 13 characters of "Hello, World!" plus the newline)
section .text
global _start
_start:
; write syscall: rax=1, rdi=fd, rsi=buf, rdx=count
mov rax, 1 ; syscall write
mov rdi, 1 ; file descriptor (stdout)
mov rsi, msg ; pointer to message
mov rdx, len ; message length
syscall ; write(stdout, message, message_length)
; exit syscall: rax=60, rdi=code
mov rax, 60 ; syscall exit
mov rdi, 0 ; exit code 0 (success)
syscall