Processor Architecture
February 27, 2020
Architecture Overview
The general architecture of a Central Processing Unit (CPU) is shown in the diagram below:

- The program is stored as binary machine instructions in the Program Memory, which is usually attached externally to the integrated circuit containing the CPU.
- The program flow is controlled by the Instruction Counter, which in normal mode proceeds linearly through the program memory, i.e. with increasing instruction addresses.
- Each binary instruction addressed in the program memory by the instruction counter is split by the Decoder into its components: the Op Code itself, like add or mov, and optional Source and Destination storage locations, which can be either Registers for fast access or addresses of data words that first have to be fetched from the external Data Memory.
- Data memory can be further split into the Stack, which temporarily stores the Local Variables of functions, and the Heap, which provides Dynamic Memory requested e.g. via the malloc() library function.
- Instructions that perform computations on their input data make use of the Arithmetic Logic Unit (ALU), which performs arithmetic and bitwise operations on binary integers in hardware. Floating Point operations are usually done by a special Floating Point Unit (FPU).
- Instructions executing a jump operation, either unconditionally or based on a preceding cmp or test comparison operation, overwrite the instruction counter with the jump destination address. The call and ret instructions used to enter and leave Functions also change the instruction counter.
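The fetch, decode and execute steps listed above can be illustrated with a toy interpreter in C. This is only a sketch under loudly stated assumptions: the five op codes, the fixed 32-bit encoding and all names (OP_MOVI, ENC, run, sum_program) are hypothetical and have nothing to do with real x86-64 machine code. The array plays the role of the Program Memory, ic is the Instruction Counter, the bit-field extraction is the Decoder and the add/subtract cases stand in for the ALU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set (NOT x86-64). */
enum { OP_HALT = 0, OP_MOVI, OP_ADD, OP_SUB, OP_JNZ };

/* Encoding: op in bits 24..31, dst in 16..23, src in 8..15, imm in 0..7 */
#define ENC(op, dst, src, imm) \
    ((uint32_t)(op) << 24 | (uint32_t)(dst) << 16 | \
     (uint32_t)(src) << 8 | (uint32_t)(imm))

#define NREGS 4

/* Program memory: computes r0 = 3 + 2 + 1 with a loop. */
static const uint32_t sum_program[] = {
    ENC(OP_MOVI, 0, 0, 0),   /* r0 = 0 (accumulator)  */
    ENC(OP_MOVI, 1, 0, 3),   /* r1 = 3 (loop counter) */
    ENC(OP_MOVI, 2, 0, 1),   /* r2 = 1                */
    ENC(OP_ADD,  0, 1, 0),   /* r0 += r1              */
    ENC(OP_SUB,  1, 2, 0),   /* r1 -= r2              */
    ENC(OP_JNZ,  1, 0, 3),   /* if r1 != 0 goto 3     */
    ENC(OP_HALT, 0, 0, 0),
};

/* Runs a program until OP_HALT and returns register r0. */
int64_t run(const uint32_t *program, size_t len)
{
    int64_t reg[NREGS] = {0};
    size_t ic = 0;                          /* instruction counter */

    while (ic < len) {
        uint32_t insn = program[ic++];      /* fetch, advance linearly */

        /* decode: split the binary instruction into its fields */
        unsigned op  = insn >> 24;
        unsigned dst = (insn >> 16) & 0xff;
        unsigned src = (insn >> 8) & 0xff;
        unsigned imm = insn & 0xff;
        assert(dst < NREGS && src < NREGS);

        /* execute */
        switch (op) {
        case OP_MOVI: reg[dst] = imm;       break;
        case OP_ADD:  reg[dst] += reg[src]; break;   /* ALU */
        case OP_SUB:  reg[dst] -= reg[src]; break;   /* ALU */
        case OP_JNZ:  if (reg[dst] != 0) ic = imm;   /* overwrite ic */
                      break;
        case OP_HALT: return reg[0];
        default:      assert(!"illegal op code");
        }
    }
    return reg[0];
}
```

The instruction counter normally just advances to the next word; only OP_JNZ overwrites it, which is how the loop starting at index 3 runs three times before the program halts with r0 = 3 + 2 + 1 = 6.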
The x86-64 Registers
The x86-64 processor architecture has sixteen 64-bit general-purpose registers whose lower 32, 16 or 8 bits may also be accessed as registers of their own. The register names and their roles in function calls under the System V calling convention used on Linux are as follows:
| 64 bit (bytes 0..7) | 32 bit (bytes 0..3) | 16 bit (bytes 0..1) | 8 bit (byte 0) | role in function calls |
|---|---|---|---|---|
| %rax | %eax | %ax | %al | return value |
| %rcx | %ecx | %cx | %cl | parameter 4 |
| %rdx | %edx | %dx | %dl | parameter 3 |
| %rbx | %ebx | %bx | %bl | callee-saved |
| %rsi | %esi | %si | %sil | parameter 2 |
| %rdi | %edi | %di | %dil | parameter 1 |
| %rsp | %esp | %sp | %spl | stack pointer |
| %rbp | %ebp | %bp | %bpl | callee-saved* |
| %r8 | %r8d | %r8w | %r8b | parameter 5 |
| %r9 | %r9d | %r9w | %r9b | parameter 6 |
| %r10 | %r10d | %r10w | %r10b | |
| %r11 | %r11d | %r11w | %r11b | |
| %r12 | %r12d | %r12w | %r12b | callee-saved |
| %r13 | %r13d | %r13w | %r13b | callee-saved |
| %r14 | %r14d | %r14w | %r14b | callee-saved |
| %r15 | %r15d | %r15w | %r15b | callee-saved |
* %rbp is optionally used as base or frame pointer (disabled by -fomit-frame-pointer gcc option)
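The narrower register names in the table are views of the low-order bytes of the same 64-bit register. One architectural detail is worth knowing: writing a 32-bit register such as %eax zero-extends the result into the full %rax, whereas writing %ax or %al leaves the remaining upper bytes unchanged. The following C sketch models this for a single register; the type reg64 and the get/set helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one x86-64 general-purpose register, e.g. %rax. */
typedef struct { uint64_t r; } reg64;

/* Reading the narrower views returns the low-order bytes. */
static uint32_t get32(const reg64 *x) { return (uint32_t)x->r; }  /* %eax */
static uint16_t get16(const reg64 *x) { return (uint16_t)x->r; }  /* %ax  */
static uint8_t  get8 (const reg64 *x) { return (uint8_t)x->r; }   /* %al  */

/* A 32-bit write zero-extends into the whole 64-bit register ... */
static void set32(reg64 *x, uint32_t v) { x->r = v; }

/* ... while 16- and 8-bit writes only replace the low bytes. */
static void set16(reg64 *x, uint16_t v) { x->r = (x->r & ~(uint64_t)0xffff) | v; }
static void set8 (reg64 *x, uint8_t v)  { x->r = (x->r & ~(uint64_t)0xff) | v; }
```

Starting from 0x0011223344556677, set16(&a, 0xccdd) gives 0x001122334455ccdd, but set32(&a, 0x8899aabb) gives 0x000000008899aabb. This zero-extension rule is why a compiler can clear all 64 bits of %rax with the shorter instruction xor %eax,%eax.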
The Stack
The next C program example is used to show the allocation of local variables on the processor stack.
C 1: The C file incr.c implements a function incr() which increments its unsigned integer arguments v64, v32, v16 and v8 by one and stores the results in host order, packed into the byte array buf:
1 #include "incr.h"
2
3 void incr(uint64_t v64, uint32_t v32, uint16_t v16, uint8_t v8, uint8_t buf[])
4 {
5 /* increment inputs by one */
6 v64 += 1;
7 v32 += 1;
8 v16 += 1;
9 v8 += 1;
10
11 /* cast results into buffer */
12 *((uint64_t*)&buf[ 0]) = v64;
13 *((uint32_t*)&buf[ 8]) = v32;
14 *((uint16_t*)&buf[12]) = v16;
15 *((uint8_t*) &buf[14]) = v8;
16 }
The function incr() is called by the main program ctypes.c:
1 #include <stdlib.h>
2 #include <stdio.h>
3
4 #include "incr.h"
5
6 int main(int argc, char** argv)
7 {
8 uint64_t v64 = 0x0011223344556677;
9 uint32_t v32 = 0x8899aabb;
10 uint16_t v16 = 0xccdd;
11 uint8_t v8 = 0xee;
12 uint8_t buf[16];
13 const int len = sizeof(uint64_t) + sizeof(uint32_t) +
14 sizeof(uint16_t) + sizeof(uint8_t);
15 int i;
16
17 incr(v64, v32, v16, v8, buf);
18
19 printf("buf = 0x");
20 for (i = 0; i < len; i++)
21 {
22 printf("%02x", buf[i]);
23 }
24 printf("\n");
25 exit(0);
26 }
In order for the main program to be able to call the external function incr(), the function interface must be known by ctypes.c. This is done by including the header file incr.h.
1 #ifndef INCR_H_
2 #define INCR_H_
3
4 #include <stdint.h>
5
6 /**
7 * Increments each of the inputs v64, v32, v16, and v8 by one and returns
8 * the results in host order packed into buf
9 */
10 void incr(uint64_t v64, uint32_t v32, uint16_t v16, uint8_t v8, uint8_t buf[]);
11
12 #endif /** INCR_H_ */
The standard unsigned integer types are defined by the included header file stdint.h, and the #ifndef preprocessor directive enclosing the header file (an include guard) prevents incr.h from being included multiple times.
gcc compiles incr.c and ctypes.c and links the two object files into the executable program ctypes:
> gcc -ggdb -fomit-frame-pointer -fstack-protector-strong -o ctypes ctypes.c incr.c
If we execute ctypes on the command line we get the results in host order i.e. as little-endian unsigned integer values packed into a byte array buf:
> ./ctypes
buf = 0x7866554433221100bcaa9988deccef
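The pointer casts in incr.c store each value in the host's native byte order, which on x86-64 is little-endian; this is why the bytes of every integer appear reversed in the output above. The sketch below is a hedged alternative, not the original code: the hypothetical function incr_memcpy() copies the bytes with memcpy(), which yields the same host-order layout while avoiding the misaligned, type-punned stores that casting &buf[8] to uint32_t* can involve on other platforms. The helper host_is_little_endian() is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same semantics as incr() in incr.c, but the values are copied
 * byte-wise with memcpy() instead of being stored through casted
 * pointers. The results are laid out in host byte order. */
void incr_memcpy(uint64_t v64, uint32_t v32, uint16_t v16, uint8_t v8,
                 uint8_t buf[15])
{
    assert(buf != NULL);

    v64 += 1;
    v32 += 1;
    v16 += 1;
    v8  += 1;

    memcpy(&buf[0],  &v64, sizeof v64);
    memcpy(&buf[8],  &v32, sizeof v32);
    memcpy(&buf[12], &v16, sizeof v16);
    buf[14] = v8;
}

/* Returns 1 on a little-endian host such as x86-64. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On an x86-64 host, incr_memcpy(0x0011223344556677, 0x8899aabb, 0xccdd, 0xee, buf) fills the first 15 bytes of buf with 78 66 55 44 33 22 11 00 bc aa 99 88 de cc ef, matching the ctypes output.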
ASM 1: We load the ctypes binary into the debugger, set breakpoints on lines 17 and 19 of main, i.e. just before and after incr() is called and then run the program
> gdb ctypes
(gdb) break 17
Breakpoint 1 at 0x771: file ctypes.c, line 17.
(gdb) break 19
Breakpoint 2 at 0x744: file ctypes.c, line 19.
(gdb) run
Starting program: /home/andi/cyber/Computing_Systems/ctypes
Breakpoint 1, main (argc=1, argv=0x7fffffffddc8) at ctypes.c:17
17 incr(v64, v32, v16, v8, buf);
Now we disassemble main and show the instructions up to the return from the incr() function call
(gdb) disassemble main
Dump of assembler code for function main:
0x00005555555546da <+0>: sub $0x58,%rsp ; lower %rsp by 0x58
0x00005555555546de <+4>: mov %edi,0xc(%rsp) ; copy argc to stack
0x00005555555546e2 <+8>: mov %rsi,(%rsp) ; copy argv to stack
0x00005555555546e6 <+12>: mov %fs:0x28,%rax ; init canary
0x00005555555546ef <+21>: mov %rax,0x48(%rsp) ; copy canary to stack
0x00005555555546f4 <+26>: xor %eax,%eax ; zero %eax
0x00005555555546f6 <+28>: movabs $0x11223344556677,%rax ; init v64 in %rax
0x0000555555554700 <+38>: mov %rax,0x28(%rsp) ; copy v64 to stack
0x0000555555554705 <+43>: movl $0x8899aabb,0x20(%rsp) ; init v32 on stack
0x000055555555470d <+51>: movw $0xccdd,0x1a(%rsp) ; init v16 on stack
0x0000555555554714 <+58>: movb $0xee,0x19(%rsp) ; init v8 on stack
0x0000555555554719 <+63>: movl $0xf,0x24(%rsp) ; init len=15 on stack
=> 0x0000555555554721 <+71>: movzbl 0x19(%rsp),%ecx ; copy v8 to %ecx (arg4)
0x0000555555554726 <+76>: movzwl 0x1a(%rsp),%edx ; copy v16 to %edx (arg3)
0x000055555555472b <+81>: lea 0x30(%rsp),%rdi ; copy buf addr to %rdi
0x0000555555554730 <+86>: mov 0x20(%rsp),%esi ; copy v32 to %esi (arg2)
0x0000555555554734 <+90>: mov 0x28(%rsp),%rax ; copy v64 to %rax
0x0000555555554739 <+95>: mov %rdi,%r8 ; copy buf addr to %r8 (arg5)
0x000055555555473c <+98>: mov %rax,%rdi ; copy v64 to %rdi (arg1)
0x000055555555473f <+101>: callq 0x5555555547a3 <incr> ; call incr
0x0000555555554744 <+106>: lea 0x159(%rip),%rdi # 0x5555555548a4
...
End of assembler dump.
Since we are handling 8-, 16-, 32- and 64-bit unsigned integers but the registers have an intrinsic size of 64 bits, the AT&T assembler syntax used by gdb appends the following suffixes to most instructions to specify the size of the operands:
| Suffix | Type | Operand Size | Example |
|---|---|---|---|
| b | byte | 8-bit integer | movb |
| w | word | 16-bit integer | movw |
| l | doubleword | 32-bit integer | movl |
| q | quadword | 64-bit integer | movq |
Some instructions, like movz (move with zero-extension), take two suffixes, e.g. movzbl or movzwl, where the first suffix gives the size of the source and the second the size of the destination.
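In C terms, movzbl and movzwl perform a widening conversion with zero-extension: the unsigned source byte or word is copied into the low bits of the 32-bit destination and the upper bits are cleared. This is what the compiler needs when it loads the uint8_t v8 and the uint16_t v16 into %ecx and %edx in the main disassembly above. A minimal sketch; the helper names simply mirror the mnemonics and are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* movzbl: zero-extend an 8-bit source to a 32-bit destination */
static uint32_t movzbl(uint8_t src)  { return (uint32_t)src; }

/* movzwl: zero-extend a 16-bit source to a 32-bit destination */
static uint32_t movzwl(uint16_t src) { return (uint32_t)src; }

/* Operand sizes selected by the suffixes b, w, l and q */
static_assert(sizeof(uint8_t)  == 1, "b: byte");
static_assert(sizeof(uint16_t) == 2, "w: word");
static_assert(sizeof(uint32_t) == 4, "l: doubleword");
static_assert(sizeof(uint64_t) == 8, "q: quadword");
```

For example, movzbl(0xee) yields 0x000000ee and movzwl(0xccdd) yields 0x0000ccdd, which is exactly what %ecx and %edx contain before incr() is called.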
Next we check the current value of the stack pointer %rsp within main and determine the storage location of the local variables on the stack:
(gdb) print/x $rsp
$1 = 0x7fffffffdc90
(gdb) print/x &argv
$2 = 0x7fffffffdc90
(gdb) print/x &argc
$3 = 0x7fffffffdc9c
(gdb) print/x &v8
$4 = 0x7fffffffdca9
(gdb) print/x &v16
$5 = 0x7fffffffdcaa
(gdb) print/x &v32
$6 = 0x7fffffffdcb0
(gdb) print/x &v64
$7 = 0x7fffffffdcb8
(gdb) print/x &buf
$8 = 0x7fffffffdcc0
(gdb) print/x argv
$9 = 0x7fffffffddc8
The last query above returns the content of the argv variable, i.e. a pointer to the argv[] array, which the kernel placed near the top of the process stack when the shell started the program. All this collected address information will later be used to construct a linear map of the stack.
Having analyzed the main program, we now set a breakpoint at the entry of the incr function:
(gdb) break incr
Breakpoint 3 at 0x5555555547a3: file incr.c, line 4.
(gdb) continue
Continuing.
Breakpoint 3, incr (v64=9, v32=0, v16=25927, v8=0 '\000', buf=0x0) at incr.c:4
4 {
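The argument values displayed by gdb are garbage, because the breakpoint is hit at the very first instruction of incr, before the arguments have been copied from the registers into the stack locations where the debug information expects them. The info frame command gives a summary of the current stack and instruction pointer context: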
(gdb) info frame
Stack level 0, frame at 0x7fffffffdc90:
rip = 0x5555555547a3 in incr (incr.c:4); saved rip = 0x555555554744
called by frame at 0x7fffffffdcf0
source language c.
Arglist at 0x7fffffffdc80, args: v64=9, v32=0, v16=25927, v8=0 '\000', buf=0x0
Locals at 0x7fffffffdc80, Previous frame's sp is 0x7fffffffdc90
Saved registers:
rip at 0x7fffffffdc88
We see that the instruction
0x000055555555473f <+101>: callq 0x5555555547a3 <incr> ; call incr
caused a jump from 0x55555555473f in main to 0x5555555547a3 in incr.
The info frame output shows that the callq instruction has pushed the return address 0x555555554744, i.e. the address of the instruction following the call in main, onto the stack at 0x7fffffffdc88, so that the program flow can resume in main after the return from incr:
(gdb) x/1xg 0x7fffffffdc88
0x7fffffffdc88: 0x0000555555554744
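The effect of callq and retq on %rsp and %rip can be modelled in a few lines of C. In the sketch below the struct cpu, the window of STACK_WORDS stack slots and the functions sim_call() and sim_ret() are hypothetical; only the addresses are taken from the gdb session above. A call pushes the address of the next instruction and jumps, a return pops that address back into the instruction pointer.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP   UINT64_C(0x7fffffffdc90)  /* %rsp of main before the call */
#define STACK_WORDS 16                          /* size of the modelled window  */

/* Hypothetical minimal CPU state: instruction pointer, stack pointer and
 * a window of 8-byte stack slots just below STACK_TOP. */
typedef struct {
    uint64_t rip;
    uint64_t rsp;
    uint64_t mem[STACK_WORDS];
} cpu;

/* Map an 8-byte aligned stack address to its slot in the window. */
static uint64_t *stack_slot(cpu *c, uint64_t addr)
{
    assert(addr % 8 == 0);
    assert(addr < STACK_TOP && addr >= STACK_TOP - 8 * STACK_WORDS);
    return &c->mem[(STACK_TOP - addr) / 8 - 1];
}

/* callq: push the address of the next instruction, then jump */
static void sim_call(cpu *c, uint64_t target, unsigned insn_len)
{
    c->rsp -= 8;
    *stack_slot(c, c->rsp) = c->rip + insn_len;
    c->rip = target;
}

/* retq: pop the return address back into %rip */
static void sim_ret(cpu *c)
{
    c->rip = *stack_slot(c, c->rsp);
    c->rsp += 8;
}
```

Starting with rip = 0x55555555473f and calling sim_call(&c, 0x5555555547a3, 5), where 5 is the length of the callq instruction, leaves %rsp at 0x7fffffffdc88 holding 0x555555554744, the same values gdb reported; sim_ret() then restores both registers.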
The assembly code of the function incr looks as follows:
(gdb) disassemble incr
Dump of assembler code for function incr:
=> 0x00005555555547a3 <+0>: mov %rdi,-0x8(%rsp) ; copy v64 to stack
0x00005555555547a8 <+5>: mov %esi,-0xc(%rsp) ; copy v32 to stack
0x00005555555547ac <+9>: mov %ecx,%eax ; copy v8 to %eax
0x00005555555547ae <+11>: mov %r8,-0x20(%rsp) ; copy buf addr to stack
0x00005555555547b3 <+16>: mov %dx,-0x10(%rsp) ; copy v16 to stack
0x00005555555547b8 <+21>: mov %al,-0x14(%rsp) ; copy v8 to stack
0x00005555555547bc <+25>: addq $0x1,-0x8(%rsp) ; add 1 to v64
0x00005555555547c2 <+31>: addl $0x1,-0xc(%rsp) ; add 1 to v32
0x00005555555547c7 <+36>: addw $0x1,-0x10(%rsp) ; add 1 to v16
0x00005555555547cd <+42>: addb $0x1,-0x14(%rsp) ; add 1 to v8
0x00005555555547d2 <+47>: mov -0x20(%rsp),%rax ; copy buf addr to %rax
0x00005555555547d7 <+52>: mov -0x8(%rsp),%rdx ; copy v64 to %rdx
0x00005555555547dc <+57>: mov %rdx,(%rax) ; copy v64 to &buf[0]
0x00005555555547df <+60>: mov -0x20(%rsp),%rax ; copy buf addr to %rax
0x00005555555547e4 <+65>: lea 0x8(%rax),%rdx ; load buf addr+8 to %rdx
0x00005555555547e8 <+69>: mov -0xc(%rsp),%eax ; copy v32 to %eax
0x00005555555547ec <+73>: mov %eax,(%rdx) ; copy v32 to &buf[8]
0x00005555555547ee <+75>: mov -0x20(%rsp),%rax ; copy buf addr to %rax
0x00005555555547f3 <+80>: lea 0xc(%rax),%rdx ; load buf addr+12 to %rdx
0x00005555555547f7 <+84>: movzwl -0x10(%rsp),%eax ; copy v16 to %eax
0x00005555555547fc <+89>: mov %ax,(%rdx) ; copy v16 to &buf[12]
0x00005555555547ff <+92>: mov -0x20(%rsp),%rax ; copy buf addr to %rax
0x0000555555554804 <+97>: lea 0xe(%rax),%rdx ; load buf addr+14 to %rdx
0x0000555555554808 <+101>: movzbl -0x14(%rsp),%eax ; copy v8 to %eax
0x000055555555480d <+106>: mov %al,(%rdx) ; copy v8 to &buf[14]
0x000055555555480f <+108>: nop ; no operation
0x0000555555554810 <+109>: retq ; return
End of assembler dump.
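Within the incr function we determine the value of the stack pointer %rsp and the storage locations of all the local variables on the stack. Note that incr never lowers %rsp but stores the copies of its arguments below the stack pointer: being a leaf function that calls no other function, it may use the 128-byte red zone below %rsp that the x86-64 System V ABI reserves for this purpose: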
(gdb) print/x $rsp
$10 = 0x7fffffffdc88
(gdb) print/x &v64
$11 = 0x7fffffffdc80
(gdb) print/x &v32
$12 = 0x7fffffffdc7c
(gdb) print/x &v16
$13 = 0x7fffffffdc78
(gdb) print/x &v8
$14 = 0x7fffffffdc74
(gdb) print/x &buf
$15 = 0x7fffffffdc68
From all the collected address information we can now construct a map of the processor stack:
| stack address | %rsp offset | variable | owner |
|---|---|---|---|
| 0x7fffffffddc8 | +0x138 | argv[0] | set up at process start |
| 0x7fffffffdce8 | +0x58 | saved %rip | ← %rsp at entry of main |
| 0x7fffffffdcd8 | +0x48 | canary | main |
| 0x7fffffffdcc0 | +0x30 | buf | main |
| 0x7fffffffdcb8 | +0x28 | v64 | main |
| 0x7fffffffdcb4 | +0x24 | len | main |
| 0x7fffffffdcb0 | +0x20 | v32 | main |
| 0x7fffffffdcaa | +0x1a | v16 | main |
| 0x7fffffffdca9 | +0x19 | v8 | main |
| 0x7fffffffdc9c | +0x0c | argc | main |
| 0x7fffffffdc90 | +0x00 | argv | ← %rsp of main |
| 0x7fffffffdc88 | -0x00 | saved %rip | ← %rsp of incr |
| 0x7fffffffdc80 | -0x08 | v64 | incr |
| 0x7fffffffdc7c | -0x0c | v32 | incr |
| 0x7fffffffdc78 | -0x10 | v16 | incr |
| 0x7fffffffdc74 | -0x14 | v8 | incr |
| 0x7fffffffdc68 | -0x20 | buf addr | incr |
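The %rsp offset column is simply the difference between an address and the stack pointer of the respective function: 0x7fffffffdc90 inside main and 0x7fffffffdc88 on entry of incr. The following C sketch recomputes a few rows from the gdb addresses and checks that incr's lowest slot lies within the 128-byte red zone that the x86-64 System V ABI grants to leaf functions. The macro and function names are hypothetical, and the addresses are those of the session above; they will differ on other systems.

```c
#include <assert.h>
#include <stdint.h>

#define RSP_MAIN UINT64_C(0x7fffffffdc90)  /* %rsp inside main (from gdb)  */
#define RSP_INCR UINT64_C(0x7fffffffdc88)  /* %rsp on entry of incr (gdb)  */
#define RED_ZONE 128                        /* bytes below %rsp, SysV ABI   */

/* Signed distance of a stack address from a stack pointer value. */
static int64_t rsp_offset(uint64_t addr, uint64_t rsp)
{
    return addr >= rsp ? (int64_t)(addr - rsp) : -(int64_t)(rsp - addr);
}

/* Non-zero if addr lies in the red zone below rsp. */
static int in_red_zone(uint64_t addr, uint64_t rsp)
{
    return addr < rsp && rsp - addr <= RED_ZONE;
}

/* Recompute a few rows of the stack map from the gdb addresses. */
void check_stack_map(void)
{
    assert(rsp_offset(UINT64_C(0x7fffffffdcc0), RSP_MAIN) == 0x30);  /* buf    */
    assert(rsp_offset(UINT64_C(0x7fffffffdcd8), RSP_MAIN) == 0x48);  /* canary */
    /* saved %rip of main: main lowered %rsp by 0x58 on entry */
    assert(rsp_offset(UINT64_C(0x7fffffffdce8), RSP_MAIN) == 0x58);
    assert(rsp_offset(UINT64_C(0x7fffffffdc68), RSP_INCR) == -0x20); /* buf addr */
    assert(in_red_zone(UINT64_C(0x7fffffffdc68), RSP_INCR));
}
```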
We now proceed to the next breakpoint, which is located just after the return from the function incr():
(gdb) continue
Continuing.
Breakpoint 2, main (argc=1, argv=0x7fffffffddc8) at ctypes.c:19
19 printf("buf = 0x");
(gdb) x/i $rip
=> 0x555555554744 <main+106>: lea 0x159(%rip),%rdi # 0x5555555548a4
We see that the retq instruction has popped the saved return address from the stack into the instruction pointer, so that execution resumes in main at 0x555555554744.
Before leaving the main program we dump the contents of the byte array buf containing the final result:
(gdb) x/15xb 0x7fffffffdcc0
0x7fffffffdcc0: 0x78 0x66 0x55 0x44 0x33 0x22 0x11 0x00
0x7fffffffdcc8: 0xbc 0xaa 0x99 0x88 0xde 0xcc 0xef
(gdb) continue
Continuing.
buf = 0x7866554433221100bcaa9988deccef
[Inferior 1 (process 26884) exited normally]
Code Optimization
Now we look at how the object code is improved by compiler optimization, enabled with the -O2 gcc option:
> gcc -ggdb -fomit-frame-pointer -fstack-protector-strong -O2 -o ctypes ctypes.c incr.c
ASM 2: Let's examine the optimized assembly code of main and incr
(gdb) disassemble main
Dump of assembler code for function main:
0x00005555555545f0 <+0>: push %r12 ; save %r12 on stack
0x00005555555545f2 <+2>: push %rbp ; save %rbp on stack
=> 0x00005555555545f3 <+3>: mov $0xee,%ecx ; init v8 in %ecx (arg4)
0x00005555555545f8 <+8>: push %rbx ; save %rbx on stack
0x00005555555545f9 <+9>: mov $0xccdd,%edx ; init v16 in %edx (arg3)
0x00005555555545fe <+14>: mov $0x8899aabb,%esi ; init v32 in %esi (arg2)
0x0000555555554603 <+19>: movabs $0x11223344556677,%rdi ; init v64 in %rdi (arg1)
0x000055555555460d <+29>: lea 0x229(%rip),%rbp # 0x55555555483d
0x0000555555554614 <+36>: sub $0x20,%rsp ; lower %rsp by 0x20
0x0000555555554618 <+40>: mov %rsp,%rbx ; copy buf addr to %rbx
0x000055555555461b <+43>: mov %rbx,%r8 ; copy buf addr to %r8 (arg5)
0x000055555555461e <+46>: lea 0xf(%rbx),%r12 ; copy addr above buf to %r12
0x0000555555554622 <+50>: mov %fs:0x28,%rax ; init canary
0x000055555555462b <+59>: mov %rax,0x18(%rsp) ; copy canary to stack
0x0000555555554630 <+64>: xor %eax,%eax ; zero %eax
0x0000555555554632 <+66>: callq 0x555555554790 <incr> ; call incr
0x0000555555554637 <+71>: lea 0x1f6(%rip),%rsi # 0x555555554834
...
End of assembler dump
(gdb) disassemble incr
Dump of assembler code for function incr:
=> 0x0000555555554790 <+0>: add $0x1,%rdi ; add 1 to v64
0x0000555555554794 <+4>: add $0x1,%esi ; add 1 to v32
0x0000555555554797 <+7>: add $0x1,%edx ; add 1 to v16
0x000055555555479a <+10>: add $0x1,%ecx ; add 1 to v8
0x000055555555479d <+13>: mov %rdi,(%r8) ; copy v64 to &buf[ 0]
0x00005555555547a0 <+16>: mov %esi,0x8(%r8) ; copy v32 to &buf[ 8]
0x00005555555547a4 <+20>: mov %dx,0xc(%r8) ; copy v16 to &buf[12]
0x00005555555547a9 <+25>: mov %cl,0xe(%r8) ; copy v8 to &buf[14]
0x00005555555547ad <+29>: retq ; return
End of assembler dump
With the exception of the byte array buf, whose address is passed to incr() and which therefore must reside in memory, all variables are kept in fast processor registers. This is also visible in the stack memory map below:
| stack address | %rsp offset | variable | owner |
|---|---|---|---|
| 0x7fffffffdd18 | +0x38 | saved %rip | ← %rsp at entry of main |
| 0x7fffffffdd10 | +0x30 | saved %r12 | main |
| 0x7fffffffdd08 | +0x28 | saved %rbp | main |
| 0x7fffffffdd00 | +0x20 | saved %rbx | main |
| 0x7fffffffdcf8 | +0x18 | canary | main |
| 0x7fffffffdce0 | +0x00 | buf | ← %rsp of main |
| 0x7fffffffdcd8 | -0x00 | saved %rip | ← %rsp of incr |
Actually, the reason for moving the incr() function into the separate file incr.c was that the optimizing compiler would otherwise have eliminated the function call and implemented its functionality directly in main as inline code.
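If one prefers to keep incr() in the same source file, GCC and Clang offer the noinline function attribute, a compiler extension rather than standard C, which keeps the call in place even at -O2. A minimal single-file sketch under that assumption; the body packs the results in host order like incr.c, but uses memcpy() instead of casted pointers to avoid misaligned stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension: never inline this function into its callers */
__attribute__((noinline))
void incr(uint64_t v64, uint32_t v32, uint16_t v16, uint8_t v8, uint8_t buf[])
{
    assert(buf != NULL);

    /* increment inputs by one */
    v64 += 1;
    v32 += 1;
    v16 += 1;
    v8  += 1;

    /* copy results in host order into the buffer */
    memcpy(&buf[0],  &v64, sizeof v64);
    memcpy(&buf[8],  &v32, sizeof v32);
    memcpy(&buf[12], &v16, sizeof v16);
    buf[14] = v8;
}
```

Note that noinline only prevents inlining; GCC may still specialize the function through other interprocedural optimizations, which its GCC-specific noipa attribute disables as well. The resulting disassembly has not been checked against the listings above.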
The Heap
We slightly modify the ctypes program in order to show the use of dynamically allocated data objects that are stored on the heap.
C 3: The C file dyn_incr.c implements a function incr() which increments its unsigned integer arguments v64, v32, v16 and v8 by an offset n and returns the results in network order in a dynamically allocated byte array:
1 #include <stdlib.h>
2 #include <arpa/inet.h>
3
4 #include "dyn_incr.h"
5
6 uint8_t* incr(uint64_t v64, uint32_t v32, uint16_t v16, uint8_t v8, uint8_t n)
7 {
8 uint8_t *buf = (uint8_t*)malloc(16);
9
10 /* increment inputs by n */
11 v64 += n;
12 v32 += n;
13 v16 += n;
14 v8 += n;
15
16 /* cast results in network order into buffer */
17 *((uint32_t*)&buf[0]) = htonl(v64 >> 32);
18 *((uint32_t*)&buf[4]) = htonl(v64 & 0x00000000ffffffff);
19 *((uint32_t*)&buf[8]) = htonl(v32);
20 *((uint16_t*)&buf[12]) = htons(v16);
21 *((uint8_t*) &buf[14]) = v8;
22
23 return buf;
24 }
This change requires an update of the incr() interface definition in the dyn_incr.h header file
1 #ifndef DYN_INCR_H_
2 #define DYN_INCR_H_
3
4 #include <stdint.h>
5
6 /**
7 * Increments each of the inputs v64, v32, v16, and v8 by n and returns
8 * the results in a network order packed byte array
9 */
10 uint8_t* incr(uint64_t v64, uint32_t v32, uint16_t v16, uint8_t v8, uint8_t n);
11
12 #endif /** DYN_INCR_H_ */
In the main program dyn_ctypes.c we call the function incr() twice with different increments and print the resulting outputs.
1 #include <stdlib.h>
2 #include <stdio.h>
3
4 #include "dyn_incr.h"
5
6 /* global uninitialized variables */
7 uint8_t buf[16];
8
9 /* global initialized variables */
10 uint64_t v64 = 0x0011223344556677;
11 uint32_t v32 = 0x8899aabb;
12
13 /* global constants */
14 const uint16_t v16 = 0xccdd;
15 const uint8_t v8 = 0xee;
16
17 int main(int argc, char** argv)
18 {
19 uint8_t *buf1, *buf2;
20 const int len = sizeof(uint64_t) + sizeof(uint32_t) +
21 sizeof(uint16_t) + sizeof(uint8_t);
22 int i;
23
24 buf1 = incr(v64, v32, v16, v8, 1);
25 buf2 = incr(v64, v32, v16, v8, 2);
26
27 printf("buf1 = 0x");
28 for (i = 0; i < len; i++)
29 {
30 printf("%02x", buf1[i]);
31 }
32 printf("\n");
33
34 printf("buf2 = 0x");
35 for (i = 0; i < len; i++)
36 {
37 printf("%02x", buf2[i]);
38 }
39 printf("\n");
40
41 free(buf1);
42 free(buf2);
43 exit(0);
44 }
We compile the program with optimization level -O2 and execute dyn_ctypes
> gcc -ggdb -fomit-frame-pointer -fstack-protector-strong -O2 -o dyn_ctypes dyn_ctypes.c dyn_incr.c
> ./dyn_ctypes
buf1 = 0x00112233445566788899aabcccdeef
buf2 = 0x00112233445566798899aabdccdff0
As you can clearly see, the integer values packed into the buffer are now in network order, i.e. big-endian with the most significant byte stored first.
ASM 3: We analyze dyn_ctypes by setting a breakpoint at line 23 of dyn_incr.c, i.e. just before returning to the main program.
> gdb dyn_ctypes
(gdb) break dyn_incr.c:23
Breakpoint 1 at 0x8f2: file dyn_incr.c, line 23.
(gdb) run
Starting program: /home/andi/cyber/Computing_Systems/dyn_ctypes
Breakpoint 1, incr (v64=4822678189205112, v32=<optimized out>, v16=52446, v8=<optimized out>, n=<optimized out>) at dyn_incr.c:24
24 }
The optimized machine code keeps the arguments in callee-saved registers so that their values survive the call to malloc(); the previous contents of these registers must therefore first be saved on the stack. We also notice with surprise that the call of the htonl() library function has been replaced by the bswap instruction, and htons() by a ror $0x8 instruction, which rotates a 16-bit word by 8 bits, thus swapping the order of its two bytes:
Dump of assembler code for function incr:
0x0000555555554890 <+0>: push %r15 ; save %r15 on stack
0x0000555555554892 <+2>: push %r14 ; save %r14 on stack
0x0000555555554894 <+4>: mov %r8d,%r15d ; copy n to %r15d
0x0000555555554897 <+7>: push %r13 ; save %r13 on stack
0x0000555555554899 <+9>: push %r12 ; save %r12 on stack
0x000055555555489b <+11>: mov %rdi,%r14 ; copy v64 to %r14
0x000055555555489e <+14>: push %rbp ; save %rbp on stack
0x000055555555489f <+15>: push %rbx ; save %rbx on stack
0x00005555555548a0 <+16>: mov $0x10,%edi ; init %edi (arg1) with 16 (malloc)
0x00005555555548a5 <+21>: mov %edx,%r12d ; copy v16 to %r12d
0x00005555555548a8 <+24>: mov %esi,%r13d ; copy v32 to %r13d
0x00005555555548ab <+27>: mov %ecx,%ebp ; copy v8 to %ebp
0x00005555555548ad <+29>: sub $0x8,%rsp ; lower %rsp by 0x08 (16-byte alignment)
0x00005555555548b1 <+33>: add %r15d,%ebp ; add n to v8
0x00005555555548b4 <+36>: callq 0x555555554640 <malloc@plt> ; alloc buf (%rax)
0x00005555555548b9 <+41>: movzbl %r15b,%edi ; copy n to %edi
0x00005555555548bd <+45>: movzbl %r15b,%esi ; copy n to %esi
0x00005555555548c1 <+49>: mov %bpl,0xe(%rax) ; copy v8 to &buf[14]
0x00005555555548c5 <+53>: add %r14,%rdi ; add n to v64
0x00005555555548c8 <+56>: add %esi,%r13d ; add n to v32
0x00005555555548cb <+59>: movzbl %r15b,%esi ; copy n to %esi
0x00005555555548cf <+63>: mov %rdi,%rdx ; copy v64 to %rdx
0x00005555555548d2 <+66>: bswap %r13d ; swap byte order of v32
0x00005555555548d5 <+69>: bswap %edi ; swap byte order of v64-low
0x00005555555548d7 <+71>: shr $0x20,%rdx ; v64 >> 32
0x00005555555548db <+75>: mov %edi,0x4(%rax) ; copy v64-low to &buf[4]
0x00005555555548de <+78>: mov %r13d,0x8(%rax) ; copy v32 to &buf[8]
0x00005555555548e2 <+82>: bswap %edx ; swap byte order of v64-hi
0x00005555555548e4 <+84>: mov %edx,(%rax) ; copy v64-hi to &buf[0]
0x00005555555548e6 <+86>: lea (%rsi,%r12,1),%edx ; add n to v16
0x00005555555548ea <+90>: ror $0x8,%dx ; rotate v16 by 8 bits
0x00005555555548ee <+94>: mov %dx,0xc(%rax) ; copy v16 to &buf[12]
=> 0x00005555555548f2 <+98>: add $0x8,%rsp ; increase %rsp by 0x08
0x00005555555548f6 <+102>: pop %rbx ; restore %rbx
0x00005555555548f7 <+103>: pop %rbp ; restore %rbp
0x00005555555548f8 <+104>: pop %r12 ; restore %r12
0x00005555555548fa <+106>: pop %r13 ; restore %r13
0x00005555555548fc <+108>: pop %r14 ; restore %r14
0x00005555555548fe <+110>: pop %r15 ; restore %r15
0x0000555555554900 <+112>: retq ; return
End of assembler dump.
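On a little-endian host, htonl() amounts to reversing the four bytes of a 32-bit word and htons() to swapping the two bytes of a 16-bit word, which is exactly what bswap and a 16-bit rotate by 8 bits accomplish. The sketch below writes both operations in plain C; the helper names swap_bytes32(), rotate16_by8() and host_is_little_endian() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl(), htons() (POSIX) */

/* Reverse the four bytes of a 32-bit word, like the bswap instruction. */
static uint32_t swap_bytes32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Rotate a 16-bit word by 8 bits, like ror $0x8,%dx. */
static uint16_t rotate16_by8(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Returns 1 on a little-endian host such as x86-64. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

On x86-64, swap_bytes32(0x8899aabc) yields 0xbcaa9988, the same value as htonl(0x8899aabc), and rotate16_by8(0xccde) yields 0xdecc, the same as htons(0xccde). GCC typically recognizes the shift-and-mask pattern and compiles swap_bytes32 into a single bswap instruction.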
The info frame shows that besides the %rip instruction pointer all six callee-saved registers have been pushed on the stack.
(gdb) info frame
Stack level 0, frame at 0x7fffffffdcd0:
rip = 0x5555555548f2 in incr (dyn_incr.c:24); saved rip = 0x5555555546b1
called by frame at 0x7fffffffdd00
source language c.
Arglist at 0x7fffffffdc88, args: v64=4822678189205112, v32=<optimized out>, v16=52446, v8=<optimized out>, n=<optimized out>
Locals at 0x7fffffffdc88, Previous frame's sp is 0x7fffffffdcd0
Saved registers:
rbx at 0x7fffffffdc98, rbp at 0x7fffffffdca0, r12 at 0x7fffffffdca8,
r13 at 0x7fffffffdcb0, r14 at 0x7fffffffdcb8, r15 at 0x7fffffffdcc0,
rip at 0x7fffffffdcc8
We check the heap address of the byte buffer returned by incr()
(gdb) print/z &buf
Address requested for identifier "buf" which is in register $rax
(gdb) print/z buf
$1 = 0x0000555555756260
and then proceed to the second call of incr()
(gdb) continue
Continuing.
Breakpoint 1, incr (v64=4822678189205113, v32=<optimized out>, v16=52447, v8=<optimized out>, n=<optimized out>) at dyn_incr.c:24
24 }
(gdb) print/z &buf
Address requested for identifier "buf" which is in register $rax
(gdb) print/z buf
$2 = 0x0000555555756280
Then we return to the main program
(gdb) break dyn_ctypes.c:27
Breakpoint 2 at 0x55555555470e: file dyn_ctypes.c, line 27.
(gdb) continue
Continuing.
Breakpoint 2, main (argc=<optimized out>, argv=<optimized out>) at dyn_ctypes.c:28
28 for (i = 0; i < len; i++)
Again we notice that the optimized code needs a lot of registers
(gdb) disassemble main
Dump of assembler code for function main:
0x0000555555554680 <+0>: mov 0x20098a(%rip),%esi # 0x555555755010 <v32> (arg2)
0x0000555555554686 <+6>: mov 0x20098b(%rip),%rdi # 0x555555755018 <v64> (arg1)
0x000055555555468d <+13>: mov $0x1,%r8d ; set n to 1 in %r8d (arg5)
0x0000555555554693 <+19>: push %r14 ; save %r14
0x0000555555554695 <+21>: push %r13 ; save %r13
0x0000555555554697 <+23>: mov $0xee,%ecx ; init v8 in %ecx (arg4)
0x000055555555469c <+28>: push %r12 ; save %r12
0x000055555555469e <+30>: push %rbp ; save %rbp
0x000055555555469f <+31>: mov $0xccdd,%edx ; init v16 in %edx (arg3)
0x00005555555546a4 <+36>: push %rbx ; save %rbx
0x00005555555546a5 <+37>: lea 0x2f2(%rip),%r13 # 0x55555555499e
0x00005555555546ac <+44>: callq 0x555555554890 <incr> ; call incr
0x00005555555546b1 <+49>: mov 0x200959(%rip),%esi # 0x555555755010 <v32> (arg2)
0x00005555555546b7 <+55>: mov 0x20095a(%rip),%rdi # 0x555555755018 <v64> (arg1)
0x00005555555546be <+62>: mov %rax,%r12 ; copy buf1 addr to %r12
0x00005555555546c1 <+65>: mov $0x2,%r8d ; set n to 2 in %r8d (arg5)
0x00005555555546c7 <+71>: mov $0xee,%ecx ; init v8 in %ecx (arg4)
0x00005555555546cc <+76>: mov $0xccdd,%edx ; init v16 in %edx (arg3)
0x00005555555546d1 <+81>: lea 0xf(%r12),%r14 ; copy addr above buf1 to %r14
0x00005555555546d6 <+86>: mov %r12,%rbx ; copy buf1 addr to %rbx
0x00005555555546d9 <+89>: callq 0x555555554890 <incr> ; call incr
0x00005555555546de <+94>: lea 0x2af(%rip),%rsi # 0x555555554994
0x00005555555546e5 <+101>: mov %rax,%rbp ; copy buf2 addr to %rbp
...
End of assembler dump.
The stack frame of main holds the return address (saved %rip) pushed by its caller and the five callee-saved registers pushed by main itself:
(gdb) info frame
Stack level 0, frame at 0x7fffffffdd00:
rip = 0x55555555470e in main (dyn_ctypes.c:28); saved rip = 0x7ffff7a05b97
source language c.
Arglist at 0x7fffffffdcc8, args: argc=<optimized out>, argv=<optimized out>
Locals at 0x7fffffffdcc8, Previous frame's sp is 0x7fffffffdd00
Saved registers:
rbx at 0x7fffffffdcd0, rbp at 0x7fffffffdcd8, r12 at 0x7fffffffdce0,
r13 at 0x7fffffffdce8, r14 at 0x7fffffffdcf0, rip at 0x7fffffffdcf8
As an additional source code change, we have defined v64 and v32 as global initialized variables and v16 and v8 as global constants. We also added an unused global uninitialized byte array buf[16], which is automatically set to all zeros when the C program starts up. As you can see, these global (or static) variables and constants do not reside on the stack but are located above the program code:
(gdb) print/x &v64
$3 = 0x555555755018
(gdb) print/x &v32
$4 = 0x555555755010
(gdb) print/x &v16
$5 = 0x5555555549b0
(gdb) print/x &v8
$6 = 0x5555555549ae
(gdb) print/x &buf
$7 = 0x555555755040
(gdb) x/16b buf
0x555555755040 <buf>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x555555755048 <buf+8>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
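The zero bytes shown for buf are guaranteed by the C standard: objects with static storage duration that have no explicit initializer are initialized to zero before main() runs, which toolchains typically implement by placing them in the zero-filled .bss section. The following sketch prints one address per storage class, mirroring the gdb queries above; the names g_buf, g_v64, g_v16 and show_storage() are hypothetical, and the section names in the comments describe the usual placement, not a guarantee. Since the addresses and their order depend on the platform, the linker and address space layout randomization, the code only checks the zero initialization.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

uint8_t  g_buf[16];                    /* uninitialized global, usually .bss  */
uint64_t g_v64 = 0x0011223344556677;   /* initialized global, usually .data   */
const uint16_t g_v16 = 0xccdd;         /* global constant, usually .rodata    */

/* Prints one address per storage class; returns 1 if g_buf is all zero. */
int show_storage(void)
{
    uint8_t local = 0;                 /* automatic variable: stack    */
    uint8_t *heap = malloc(16);        /* dynamic allocation: heap     */
    int all_zero = 1;

    printf("stack   %p\n", (void *)&local);
    printf("heap    %p\n", (void *)heap);
    printf(".bss    %p\n", (void *)g_buf);
    printf(".data   %p\n", (void *)&g_v64);
    printf(".rodata %p\n", (const void *)&g_v16);

    for (size_t i = 0; i < sizeof g_buf; i++)
        if (g_buf[i] != 0)
            all_zero = 0;

    free(heap);
    return all_zero;
}
```

Run outside the debugger, the stack, heap and (for position-independent executables) data addresses typically change from run to run because of ASLR; gdb disables this randomization by default, which is why the session above shows addresses starting with 0x5555555 and 0x7ffff.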
We can also verify that the addresses of the dynamically allocated byte arrays buf1 and buf2 have been made available in the main program:
(gdb) print/x &buf1
Address requested for identifier "buf1" which is in register $r12
(gdb) print/x buf1
$8 = 0x555555756260
(gdb) x/15b 0x555555756260
0x555555756260: 0x00 0x11 0x22 0x33 0x44 0x55 0x66 0x78
0x555555756268: 0x88 0x99 0xaa 0xbc 0xcc 0xde 0xef
(gdb) print/x &buf2
Address requested for identifier "buf2" which is in register $rbp
(gdb) print/x buf2
$9 = 0x555555756280
(gdb) x/15xb 0x555555756280
0x555555756280: 0x00 0x11 0x22 0x33 0x44 0x55 0x66 0x79
0x555555756288: 0x88 0x99 0xaa 0xbd 0xcc 0xdf 0xf0
And as the last component in our memory puzzle the command info sharedlibrary shows the location of the shared libraries needed to run the dyn_ctypes program:
(gdb) info sharedlibrary
From To Shared Object Library
0x00007ffff7dd5f10 0x00007ffff7df4b20 /lib64/ld-linux-x86-64.so.2
0x00007ffff7a052d0 0x00007ffff7b7dc3c /lib/x86_64-linux-gnu/libc.so.6
Upon leaving the main program these two buffers are printed
(gdb) continue
Continuing.
buf1 = 0x00112233445566788899aabcccdeef
buf2 = 0x00112233445566798899aabdccdff0
[Inferior 1 (process 20469) exited normally]
Virtual Memory Map
Based on all the collected address information we can now draw the following virtual memory map of an x86-64 process:
| memory address | content | type |
|---|---|---|
| 0x7fffffffdcf8 | saved %rip | stack main |
| 0x7fffffffdcf0 | saved %r14 | stack main |
| 0x7fffffffdce8 | saved %r13 | stack main |
| 0x7fffffffdce0 | saved %r12 | stack main |
| 0x7fffffffdcd8 | saved %rbp | stack main |
| 0x7fffffffdcd0 | saved %rbx | stack main, ← %rsp of main |
| 0x7fffffffdcc8 | saved %rip | stack incr |
| 0x7fffffffdcc0 | saved %r15 | stack incr |
| 0x7fffffffdcb8 | saved %r14 | stack incr |
| 0x7fffffffdcb0 | saved %r13 | stack incr |
| 0x7fffffffdca8 | saved %r12 | stack incr |
| 0x7fffffffdca0 | saved %rbp | stack incr |
| 0x7fffffffdc98 | saved %rbx | stack incr |
| 0x7fffffffdc90 | alignment padding (sub $0x8,%rsp) | stack incr, ← %rsp of incr |
| ↓ | stack grows downwards | |
| 0x7ffff7df4b20 | end of /lib64/ld-linux-x86-64.so.2 | shared libraries |
| 0x7ffff7dd5f10 | /lib64/ld-linux-x86-64.so.2 | shared libraries |
| 0x7ffff7b7dc3c | end of /lib/x86_64-linux-gnu/libc.so.6 | shared libraries |
| 0x7ffff7a052d0 | /lib/x86_64-linux-gnu/libc.so.6 | shared libraries |
| ↑ | heap grows upwards | |
| 0x555555756280 | buf2 | heap |
| 0x555555756260 | buf1 | heap |
| 0x555555755040 | buf | global/static uninitialized variables |
| 0x555555755018 | v64 | global/static initialized variables |
| 0x555555755010 | v32 | global/static initialized variables |
| 0x5555555549b0 | v16 | global/static constants |
| 0x5555555549ae | v8 | global/static constants |
| 0x555555554900 | end of incr | text |
| 0x555555554890 | incr | text |
| 0x555555554777 | end of main | text |
| 0x555555554680 | main | text |
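On Linux the same layout can be inspected from within a running process by reading the pseudo-file /proc/self/maps, which lists every mapped region with its address range, permissions and backing file, including the [stack] region and, once the heap has been used, typically a [heap] region. A minimal, Linux-specific sketch; the function name print_maps() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prints all lines of /proc/self/maps and returns how many of them
 * contain the given tag, e.g. "[stack]" or "[heap]"; -1 on error. */
int print_maps(const char *tag)
{
    char line[512];
    int hits = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        fputs(line, stdout);
        if (strstr(line, tag) != NULL)
            hits++;
    }
    fclose(f);
    return hits;
}
```

Calling print_maps("[stack]") from main prints the text, data, heap, shared library and stack mappings in ascending address order, i.e. the table above read from bottom to top.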
Author: Andreas Steffen CC BY 4.0