A short note about macros

Good day folks.
Take a look at the following simple C source code:

So, how do the macros defined by the preprocessor directives expand in the main function?
In the Unix world there is a tool called cpp (see man cpp) which can help answer this question :)

As you can see, the NEW(foo, bar) macro has been expanded to Object_new(sizeof(foo), fooProto, bar). The ## token-pasting operator in T##Proto says "concatenate Proto to the end of T", so NEW(foo, bar) produces fooProto there.
The _(N) macro expands to tmp.blah when we invoke it as _(blah), that's it. Good luck.

emacs gud library

To see what's going on inside the entrails the compiler generated, I use the GUD module, which is distributed with emacs by default.
First of all I open the source file with C-x C-f, then I split the frame into several windows with C-x 2 and C-x 3.
The C-x o keystroke switches between windows.
The GUD module has registers and disassembly buffers, which let you see the state of the registers and the disassembled code at runtime. You can open these buffers with M-x gdb-display-registers-buffer and M-x gdb-display-disassembly-buffer.

In the picture below you can see the gdb prompt in the upper-left window and the assembler source code in the upper-right window. The registers and disassembly buffers are in the lower-left and lower-right windows, respectively.

reverse a string

One of the tasks in my company's technical interview for junior C developers is to reverse a string. There are two ways to do it: using an array of characters or using pointers. The first way is boring, so I would like to examine the pointer approach. Let's see how the compiler translates the C source code to assembly.

Here is the objdump output for main(), with a few comments of mine:

One guy on the internet once asked what a pointer is. So I would like to draw attention to the part of the source code where the pointers are defined.

These two lines are translated to assembly as:

The lea instruction (at offset 0x4004f9) just calculates the address [rbp-0x30] and loads it into the rax register. This is where our array of chars starts (the 0th element). The address is then stored at [rbp-0x10], which is where we defined the *first pointer. The sizeof operator yields the size of the array in bytes. For a char array, that equals the number of elements, including the terminating null char (\0), which is why we subtract 1 from it.

In the second part of the assembly code, we just fetch from memory the address that the lea instruction stored at [rbp-0x10] and load it into the rax register again (offset 0x400501). To get to the end of the array, we add 0xb to the start address. I hard-coded the "Hello world" string, which has 11 characters plus \0, so sizeof gives 12, and the compiler folds the -1 into the constant 11 (0xb). Strictly speaking, start + 11 is the address of the terminating \0 itself; the last visible character, 'd', sits at offset 10. All of this is done by the add $0xb,%rax instruction. After these manipulations, the mov %rax,-0x18(%rbp) instruction just stores the computed address at [rbp-0x18].

I won't explain the rest of the code: it's simple enough, and I examined the while loop and the register setup before a printf() call in a previous post. If you have any questions, just ask :)

Bitwise operators

Good day folks. Today I would like to talk about bitwise operators in C. According to K&R (page 45, section 2.9 Bitwise Operators), there are six operators for bit manipulation:
— & AND;
— | OR;
— ^ XOR;
— << left shift;
— >> right shift;
— ~ one's complement.
They may be applied only to integral operands, that is char, short, int and long, whether signed or unsigned. That section also gives a short example for each operator, and I want to disassemble every example to see how these operators are translated to assembly.

Before discussing bitwise logic, I have to say a word about bit numbering. By convention, bits are numbered starting from 0 at the least-significant bit of a byte, word, dword or qword (8, 16, 32 and 64 bits, respectively). The least-significant bit is the one with the smallest value in the binary number system. It is also the rightmost bit when you write the value down as a binary number in the conventional manner. I've drawn this below for a byte:

Actually, as I said above, it doesn't matter how many bits you are dealing with: bytes, words, dwords or more. Bit 0 is always at the right-hand end, and the bit numbers increase toward the left. That's all.

In Boolean logic we manipulate the abstract values True (bit 1) and False (bit 0). The expression condition_1 AND condition_2 is True if both condition_1 and condition_2 are True; if either condition is False, the result is False. That is how the AND operator works. I've summarized all combinations for the AND operator in a so-called truth table below:

Read the 0 bit as False and the 1 bit as True.

So, to see how the AND operator in C is represented in assembly language, I took the source from K&R and disassembled it.

It is translated to assembly as:

If you recall from my previous post, the first two instructions at offsets 0x40048c and 0x40048d set up the stack frame. The mov instruction at offset 0x400490 just assigns decimal 251 to the 'n' variable (a char is 1 byte in size). Then at offset 0x40097 you may admire the AND instruction :)
The K&R book explains this as 'sets to zero all but the low-order 7 bits of n'.
I think some explanation is needed here. 0177 is the octal representation of hexadecimal 0x7f, i.e. decimal 127. In binary it has 1s from bit 0 through bit 6 (counting from 0): 01111111. After the AND instruction, decimal 123 is assigned to the 'n' variable. Confused? There is no magic here :) Decimal 251 is 11111011 in binary, and the two numbers are combined bit by bit according to the AND truth table: 11111011 AND 01111111 = 01111011, which is decimal 123.

Structurally, the OR instruction works just like AND, but its truth table is different: the result bit is 1 if at least one operand bit is 1.

The result of the XOR (exclusive or) instruction is 1 if the two operand bits differ: 1 and 0, or 0 and 1. Here is the truth table for XOR:

There is a little trick with XOR for beginners :) If you XOR a register with itself, that register is cleared to 0; for example, xor %eax,%eax sets eax to zero.

To investigate left and right shift operation let me introduce a very simple C source code:

And here is the result of executing this program:

I defined a one-byte variable a with the decimal value 5, which is 00000101 in binary. After the left shift a << 1, it became decimal 10, which is binary 00001010. Keep in mind that each digit in a binary number is one bit. A 0 has been inserted at the right-hand end, and the whole shebang has been bumped one digit toward the left. At the assembly level, the last bit shifted out of the left end lands in a temporary bucket for bits called the Carry flag, generally abbreviated as CF. The right shift works exactly like the left shift but moves the bits in the opposite direction, to the right :)

And the last bitwise operator is one's complement. The K&R book states: "The unary operator ~ yields the one's complement of an integer; that is, it converts each 1-bit into a 0-bit and vice versa". In other words, it is the NOT instruction in assembly language. Take a look at the following truth table:

And to show you how it looks "under" the compiler, I just ran a simple C program in gdb:

Here is the gdb output:

We are working with a char variable, which is why it is 8 bits long. As you can see from the gdb output, after the NOT instruction the bits 00000011 are inverted to 11111100. That's it.

Intel vs AT&T syntax

While learning assembly language, I used Intel syntax because the author of the book I read generally used it. From my point of view, it is easier for newcomers to the assembly world to read. But if you look around internet forums, IRC channels, mailing lists and so on, you will find that the true maniacs use AT&T syntax. So if you're going to become a natural samurai of assembly language, you should know the difference between Intel and AT&T assembly syntax. A friend of mine, cottidianus, always tells me: "Use AT&T syntax only, there's no other way!" :)
After all, in the Unix world (I don't care what's going on in the <censored> world), gcc translates C source to assembly using AT&T mnemonics. Disassemble some C sources and look through the output :)

Here is a short list of the differences:
- register names and AT&T mnemonics are written in lowercase by convention, while in Intel syntax the names of registers and instructions are case-insensitive;
- register names are always preceded by the % symbol in AT&T (%eax, %rax);
- an AT&T instruction that has operands normally carries a single-character suffix indicating the operand size: b means 'byte', w 'word', l 'long' (32 bits) and q 'quad' (64 bits). So Intel's MOV BX,AX looks like movw %ax,%bx in AT&T;
- the source and destination operands are in the opposite order in AT&T compared to Intel syntax. Where you would write MOV EBX,EAX in Intel syntax, you have to get accustomed to movl %eax,%ebx. In other words, in AT&T syntax the source operand comes first, followed by the destination;
- immediate operands are always preceded by the dollar sign ($): instead of PUSH 32 in Intel syntax, you write pushl $32;
- displacements in memory references are signed quantities placed outside the parentheses that contain the BASE, INDEX and SCALE values. For example, Intel's [rbp+rax*4-0x40] is written -0x40(%rbp,%rax,4) in AT&T.

From now on I will use AT&T syntax in my posts. Of course, you can tell gdb or objdump to use Intel syntax, but as I said above, you will face tons of assembly code on the internet, mostly in AT&T syntax. Have fun :)

You can find more about AT&T syntax here: http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax

Hello world under the hood

Zhivago | zloy: you are an idiot.
Zhivago | zloy: Assembly cannot be used to explain how C works.
© from #c channel on irc.freenode.net


I decided to learn C with the following approach: disassemble the most interesting listings (from K&R, of course :]) and investigate what is going on under the hood. I use emacs as editor, compiler, shell, debugger and... coffee machine :), the GNU C compiler (gcc), and gdb+gud.el (the emacs gdb UI) for debugging. BTW, I would appreciate any remarks and criticism in the comments.

Ok, let's begin.
Today I want to examine the guts of a simple program from section 1.6 of the K&R book. It counts digits, white space characters (blanks, tabs, newlines) and all other characters in its input, and prints the statistics when the input ends (Ctrl-D on a terminal). From my point of view it is one of the most interesting listings in chapter 1 of the book. It is not too hard, and it shows various things I want to analyze, like defining variables, arrays and loops. I think it is good enough for my purpose. So, here is the source code:

I compile this source with the -ggdb3 flag to include debugging symbols in the resulting binary (gcc -ggdb3 -o count count.c).
Let's look at the objdump output (I'm going to use Intel syntax):

objdump -S -M intel -d count

Before I start investigating I have to say a few words about the stack, stack frames and related things.
The stack is a region of memory where we can tuck away any number of doublewords (32-bit) or quadwords (64-bit) for the time being and access them later by their addresses. In effect, it is a way to manage data in memory. The amd64 call stack is upside down: it grows from the top (the highest address) toward the bottom (the lowest address).

A stack frame is a chunk of memory taken from the stack. Every time a function is called, it sets aside enough stack space for its locals (and for any arguments that are not passed in registers). The function places its locals below the TOS (top of stack), and this region of memory is called a stack frame. I used to think that a function obtains its own frame from the system. Actually, it doesn't allocate anything: the stack region (typically limited to 8 MB on Linux) is set up for the program when it starts, and a call just moves the stack pointer within it. The frame is bounded at the bottom by the rsp (stack pointer) register, which is also the TOS, and at the top by rbp (base pointer). At the start of the function (the prologue), the caller's rbp is pushed and the current rsp is copied into rbp (mov rbp,rsp). This way, the previous state of the stack can be restored when the function finishes. Therefore, if a function pushes more values onto the stack, it is effectively growing its frame.

The main() function is called by __libc_start_main(), which in turn is called by the _start routine.

Now, take a look at the output of objdump program:

'push rbp' saves the current value of rbp on the stack, and then 'mov rbp,rsp' copies the value of rsp into rbp. This marks the start of the memory the function can use. The third instruction, 'sub rsp,0x40' (the stack grows down, remember?), moves the stack pointer 64 bytes down. Together, these three instructions set up the stack frame for the main() function.

In our program, the first thing we do is declare the variables and set nwhite and nother to 0. Here is the output from objdump:

Some interesting juggling goes on here. Judging by the rest of the listing, the compiler placed the int locals (4 bytes each) at rbp-0x4 ('i'), rbp-0x8 ('nwhite'), rbp-0xc ('nother') and rbp-0x10 ('c'). The 'mov DWORD PTR [rbp-0xc],0x0' instruction stores 0 in 'nother'. Space for 'i' is reserved by the same 'sub rsp,0x40', but 'i' is first touched in the for loop below.
Then 'nwhite' is set to 0 via the 32-bit eax register. The statement 'nwhite = nother = 0;' is compiled as 'nother = 0; nwhite = nother;'. That's why the value of nother (rbp-0xc) is loaded into eax and then stored at address rbp-0x8 of the stack frame.

To make sure that no element of ndigit contains garbage from memory, we have to set each element to 0 using a 'for' loop. Here is how the compiler translated it to assembly:

The 'mov DWORD PTR [rbp-0x4],0x0' instruction at 0x400569 initializes 'i' (rbp-0x4) to 0. The instruction at 0x400570 makes the CPU jump to 0x400583, 'cmp DWORD PTR [rbp-0x4],0x9', which checks whether the value at rbp-0x4 exceeds 9. The jle at address 0x400587 means Jump if Less or Equal. On the first iteration (when [rbp-0x4] = 0), it jumps to 'mov eax,DWORD PTR [rbp-0x4]' at 0x400572, where the value of 'i' is loaded into the eax register.
The 'cdqe' instruction simply sign-extends the doubleword in eax into the quadword rax, copying the sign bit (bit 31) into the upper 32 bits.
The next instruction, 'mov DWORD PTR [rbp+rax*4-0x40],0x0', uses a form of address displacement that I recommend reading as [(rbp - 0x40) + (rax * 4)],

where rbp-0x40 is the 'ndigit' array itself in the stack frame, and rax*4 is the byte offset inside 'ndigit', since every element of the array is an int (4 bytes). For example, to store 5 in ndigit[3] (the fourth element, because indexing starts at 0), the offset is 3 * 4 = 12, so our 5 occupies bytes 12 through 15 of the array. With 'add DWORD PTR [rbp-0x4],0x1' we just increment the 'i' counter, and then we compare the value of i with 9 again. If i <= 9, we jump back into the body of our 'for' loop; this repeats until i > 9. When jle doesn't jump, the next instruction is executed, and from now on we are at the doorstep of the 'while' loop.

At the start of the 'while' loop (offset 0x4005d9), we immediately jump to offset 0x40061b ('jmp 40061b'), where the getchar() function is called. 'man 3 getchar' says: "fgetc() reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error. getc() is equivalent to fgetc() except that it may be implemented as a macro which evaluates stream more than once. getchar() is equivalent to getc(stdin)." The returned value is placed in the eax register and then assigned to the 'c' variable by 'mov DWORD PTR [rbp-0x10],eax' at 0x400620. Then this value is compared with -1. That is the value of EOF in glibc, which getchar() returns at the end of input, for example when you press Ctrl-D in a terminal:

'cmp DWORD PTR [rbp-0x10],0xffffffff': if 'c' isn't equal to -1, we jump to address 0x4005db ('jne 4005db'). At this address we enter the if condition:

Here '0x4005db: cmp DWORD PTR [rbp-0x10],0x2f' together with 'jle' checks whether the value at rbp-0x10 is less than or equal to 0x2f ('/', the character just before '0' in the ASCII table). If so, it jumps to offset 0x4005ff. The next pair, 'cmp DWORD PTR [rbp-0x10],0x39' and 'jg 4005ff', checks whether the value at rbp-0x10 is greater than 0x39 ('9' in the ASCII table), and if so also jumps to 0x4005ff. Together, these checks determine whether 'c' contains a digit. If it does, the value of 'c' is loaded into the eax register (offset 0x4005e7). The point is that 'c' holds the ASCII code of the digit character (e.g. the character '6' is 0x36). That's why 0x30 is subtracted from eax (offset 0x4005ea) to obtain the digit's numeric value. Then, at offset 0x4005ed, the 32-bit value in eax is sign-extended to 64 bits, because the address calculation that follows uses 64-bit registers. At offset 0x4005f0 we fetch the current count from the memory address (rbp-0x40) + (rdx*4). Here rbp-0x40 is the start of our 'ndigit' array, and rdx*4 is the byte offset of element rdx (each element is 4 bytes long). The count is then incremented by the '0x4005f4: add edx,0x1' instruction.
You should recall what 'cdqe' does from the discussion of the 'for' loop above; if you don't, please refer to that part of the post. That leaves two instructions in this chunk of code to discuss. First, '4005f9: mov DWORD PTR [rbp+rax*4-0x40],edx' stores the value from the edx register back to memory; as usual, rbp-0x40 is the location of our 'ndigit' array and rax*4 is the offset of the element. Second, the final 'jmp 40061b' makes the CPU jump to address 0x40061b, where getchar() fetches the next character of the input.

Next, at offset 0x4005ff there is another if condition, 'else if (c == ' ' || c == '\n' || c == '\t')'. Here we increment the 'nwhite' or the 'nother' counter, depending on which character was read, and of course it contains a sequence of conditional jumps:

By now you should be familiar with cmp and the various conditional jumps, I hope :) The first instruction compares the value of the 'c' variable (rbp-0x10) with 0x20 (ASCII space). If the received character is a space, it jumps (je, Jump if Equal) to address 0x400611. Next, we check whether 'c' (rbp-0x10) contains 0xa (ASCII newline, \n) and, if so, jump to the same address 0x400611. The last compare/jump pair checks the value of 'c' at rbp-0x10 against 0x9 (ASCII tab). If it isn't equal (jne, Jump if Not Equal), we jump to address 0x400617, where the 'nother' variable is incremented by 'add DWORD PTR [rbp-0xc],0x1' before getchar() reads the next character. But if 'c' holds the tab code, the CPU continues at address 0x400611, where the 'nwhite' variable is incremented by the 'add DWORD PTR [rbp-0x8],0x1' instruction. That is the whole body of the 'while' loop.

The last part of the code prints the statistics collected above.

First of all, we call printf() to write the "digits =" string to the output. From my point of view, the assembly for this looks interesting and needs some explanation:

In the first instruction, we store the address of the "digits =" string in the eax register. As you might know, eax is the lower half of the rax register, and writing to eax also zeroes the upper 32 bits of rax. So the whole value in rax can then be copied into rdi, the register for the first argument. For variadic functions such as printf(), al tells the callee how many vector (SSE) registers are used to pass arguments, so 'mov eax,0x0' says that no floating-point arguments are passed. The last instruction speaks for itself, and I hope there is no need to explain it in detail :)
Now I'd like to show another example with the printf() function. Suppose we have a simple C program that prints the decimal number 23.

According to the System V AMD64 calling convention, the first two integer or pointer arguments are passed in the rdi and rsi registers. So the instruction at offset 0x4004f0 loads a pointer to the "%d\n" string into rdi, and the one at 0x4004ee loads the value of i into rsi. If you look at memory address 0x4005b4, you can see the 4 bytes 0x000a6425 stored there. Since x86-64 is little-endian, we have to read them starting from the lowest byte. According to the ASCII table, 0x25 is "%", 0x64 is "d", 0x0a is the newline "\n" and 0x00 is the null char.

OK, let's return to our example from K&R book.
To print each element of the 'ndigit' array, we use a 'for' loop. The C code is compiled to:

I've already explained the 'for' loop and the call to printf() above, so I won't describe every instruction in detail.
Briefly, we set 'i' to 0x0, then jump to address 0x400665, where 'i' is compared with 0x9. Then we jump back to address 0x400644 and perform these steps:
- load the value of 'i' into the eax register;
- sign-extend eax to rax, because the address calculation uses 64-bit registers;
- load the i-th element of the 'ndigit' array into edx;
- set up the registers for printf() and call printf();
- increment 'i'.
The loop repeats these steps until 'i' exceeds 9.

If you recall, rbp-0x8 and rbp-0xc hold the 'nwhite' and 'nother' variables, respectively. In the last fragment of assembly, we simply set up more registers than in the previous printf example (this call has more arguments) and then call printf().

The last two instructions I haven't mentioned yet are 'leave' and 'ret'. At the beginning of this post, I explained what a stack frame is and how the compiler sets up registers to reserve a region of memory for our main() function. From the C runtime's point of view, main() is just a function like any other, so it has to restore the caller's frame when it returns. The leave instruction is equivalent to 'mov rsp,rbp' followed by 'pop rbp'; in other words, we restore rbp and rsp to what they were before we entered main(). The ret instruction pops the return address off the stack and jumps to it. That's all.

I want to thank cottidianus for his help and patience while I wrote this post :)
The sources which I used to create this post:
Intel x86 Function-call Conventions - Assembly View
All About EBP
Intel® 64 and IA-32 Architectures Software Developer's Manual
Combined Volumes 3A, 3B, and 3C: System Programming Guide, Parts 1 and 2