Intro to C

June 11, 2007


Editor’s Note: We’re proud to be able to bring you the first article in this great, new column from Craig Heffner. This column is aimed squarely at those in the InfoSec field who are tired of hearing that you truly can’t be a security professional without knowing how to code.

Why even learn to program at all?

Not everyone will have a need to learn programming. I’m sure there are many people who are quite accomplished in the field of computer security and have never written a program. Personally, I constantly find myself modifying programs to add or change their functionality, or just writing my own. And needless to say, if you are going to be doing any type of exploit discovery, you will need some programming knowledge.

Without raising the "to code or not to code" argument, here is the way I look at it: hacking is about controlling a computer and making it do what you want – often when it is not designed to do so. A computer by itself is nothing but a bunch of silicon, wires, and metal. Software controls the computer, and, if you can control software, well…there ya go. :)



Introduction

In this tutorial we will cover the basics of C programming and a little of its history, and then create a semi-useful program. I have chosen to begin with C because:

1) It is one of the most popular programming languages, if not the most popular.

2) Unix and Linux are written primarily in C.

3) Its abstraction from assembly code makes it much easier to learn than ASM.

4) Despite the above, it still provides sufficient low-level access to the computer.

All code examples here are intended for use on *nix systems, but they should work on Windows machines as well.

All About C (sort of)

The C programming language was developed at Bell Labs in the early 1970s by Dennis Ritchie, who also co-designed the Unix operating system with Ken Thompson. Brian Kernighan, also at Bell Labs, later co-wrote the classic book The C Programming Language with Ritchie. C was based on the B programming language, which Thompson created while building Unix; B was in turn based on the earlier BCPL language.

C is a compiled language: you write human-readable source code to a file, and a C compiler translates that code into binary machine instructions that the computer understands. These instructions are saved to an executable file, and running that file executes them.

C is also a portable language, in that the code you write can be compiled and run on any operating system for which a C compiler exists. In practice, many applications are not portable because they rely on OS-specific function calls and APIs, but the language itself is. The code presented in this article, for example, compiles on both Linux and Windows machines.

What Do I Need?

All that is needed is a compiler and a text editor. That’s it. If you are running *nix/Mac/BSD, you most likely already have these (on some systems the compiler is an optional package you may need to install). If you are running Windows, you will need to get a C compiler. I suggest Dev-C++ as it is free and easy to use.

Basic Program Structure

OK, let’s jump right into programming, shall we? Here is a very basic C program that prints the name of the program:

#include <stdio.h>

int main(int argc, char *argv[]){
     printf("The name of this program is: %s\n", argv[0]);
     return 0;
}

Let’s take a look at each line of the program and explain what is going on.

The #include <stdio.h> line tells the compiler (strictly, the preprocessor) to include the standard input/output header file, stdio.h, in the program. Header files declare commonly used library functions that make life much easier; in the case of our program, the printf() function is declared in stdio.h. stdio.h declares the common functions for reading from stdin and writing to stdout (standard input and output, i.e., the keyboard and the terminal screen). Some older compilers will still compile a call to printf() without this line, usually with a warning, but header files are never included automatically, and C99 and later no longer permit calling a function that has not been declared, so always include the headers you need.

The line int main(int argc, char *argv[]){ ... } defines the main function. Every C program must have a main function; it is where execution begins when the program is run. All code contained within the main function’s curly braces ({}) is part of the main function. The int at the beginning of the function declaration indicates that the main function returns an integer value (more on this later).

The main function can accept two arguments, argc and argv (function arguments are always separated by commas). The argc argument is declared as an integer variable and contains the number of arguments passed to the program, including the program name itself. So, when the program is run without any arguments, argc will equal 1. The argv argument is an array of pointers to character strings, one for each argument passed to the program from the command line. Like argc, argv’s contents include the name of the program: the first element of the argv array (argv[0]) normally contains the name used to run the program. We will learn more about arrays and pointers later.

The printf("The name of this program is: %s\n", argv[0]); line is a call to the printf() function, which is declared in stdio.h. The printf() function prints data to stdout. In this case, printf() takes two arguments: the string to be printed and the variable argv[0]. Recall that argv[0] contains the name of the program; the %s in the string passed to printf() is a placeholder that tells printf() to insert a string at that position in the sentence. We tell printf() which variable to use (argv[0]) by passing it as the second argument. The \n at the end of the string is the escape sequence for a newline.

To exit the function and specify a return value, we use return 0. This causes the main function to return the numerical value of 0 to the code that called it (recall that main() was defined as returning an integer value).

Something else to notice in this example is the use of the semicolon. Nearly every statement in a C program must end with a semicolon. The main exceptions are the lines that begin functions, conditional statements, and loops; these end with an opening brace ({) instead.

If we save the above code as test.c and compile it with an output file name of test, we can run the program and see the fruits of our labor:

$gcc test.c -o test
$./test
The name of this program is: ./test

Variables

Any program that does anything worthwhile needs to be able to define variables. Variables are simply names for memory locations where we can read, write, and store data. So, instead of having to remember that we stored the number 25 at memory address 0x00401234, we can reference that memory address using an easy-to-remember variable name such as num or i. Two of the most basic types of variables are those that store characters and those that store integers.

Integer variables can be declared by preceding the variable name with ‘int’. The following defines an integer variable called i and sets it equal to 45:

int i = 45;

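Two other numerical types are short and float. On most modern systems, an int variable is 4 bytes long, which means it can store numbers between -2,147,483,648 and 2,147,483,647; an unsigned int of the same size stores 0 to 4,294,967,295. If you want a smaller integer variable, you can use short, which is usually 2 bytes long (-32,768 to 32,767, or 0 to 65,535 for unsigned short). The exact sizes are implementation-defined; the C standard only guarantees minimum ranges. Float variables contain floating-point numbers (5.6, 2.3, etc.).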

Similarly, character variables are defined with ‘char’. Note that a single character is written in single quotes; double quotes create a string:

char a = 'a';

A string variable is really an array of characters. If you store the word "dog" in a character array called animal, then element 0 of animal will be 'd', element 1 will be 'o', and element 2 will be 'g':

animal[0] = 'd'
animal[1] = 'o'
animal[2] = 'g'
animal[3] = '\0' (the null byte, 0x00)

Note that the character array ends with a null byte ('\0'). C strings must end with a null byte; that is how functions such as printf() know where the string stops. When you initialize a character array with a string in double quotes, the compiler appends this null character automatically, making "dog" a total of 4 bytes long. This must be taken into account when defining fixed-length character arrays.

There are several ways to define character arrays:

char animal[4] = "dog";

This defines a character array with four elements and sets its value equal to "dog". Note that we gave it four elements to account for the extra null byte at the end.

char animal[] = "dog";

This does the same as the above, but the number of elements in the animal character array is automatically set.

char *animal = "dog";

This creates a pointer to the string "dog". A pointer is a special variable that stores the memory address where data is located. In this example, the memory address that the animal variable represents now holds another memory address where the string "dog" is actually located. In other words, the animal variable is pointing to another memory location that contains the word "dog".

Exploring I/O

Now that we can define variables, let’s take a closer look at the printf() function and its close cousin, scanf(). We saw earlier that printf() can print out pre-defined strings and allows us to specify placeholders within the string for each variable we want to print. Example:

int i = 12;
printf("The value of i is: %d",i);

This would print "The value of i is: 12". The %d is a placeholder that indicates a decimal integer should be inserted at that location in the printed string. Note that printf() is not intelligent: it fills in the placeholders with the remaining arguments in the order they are given. For example:

int i = 12;
char a = 'a';
printf("The value of i is: %d, and the value of a is: %c",i,a);

Here we tell printf() that we have two variables we want printed: one is an integer, and the other is a character. This would print "The value of i is: 12, and the value of a is: a". However, if we inadvertently switched the i and a variables when passing them to the printf() function, then printf() would print the character variable as an integer (on an ASCII system, 'a' comes out as 97) and the integer variable as a character:

printf("The value of i is: %d, and the value of a is: %c",a,i);

Note that it is possible to call the printf() function like this:

char animal[] = "dog";
printf(animal);

This will properly print the word "dog". However, if animal is user-controlled, i.e., some value is taken from the user and stored in the animal variable, then the program would be vulnerable to a format string attack: any placeholders the user types (such as %x or %n) would be interpreted by printf(). For this reason, always pass printf() a format string of your own and supply the data as a separate argument:

char animal[] = "dog";
printf("%s",animal);

The scanf() function works much like printf() in reverse: it reads data from stdin (i.e., the keyboard) into a variable. So if we asked a user to input their age and wanted to read it into a variable, we could write the following code:

int i;
scanf("%d", &i);

We tell scanf() to read data from the keyboard and store it as a decimal value into i. Note the ampersand when specifying the destination variable. The ampersand is called the address operator, so this is telling scanf() to save the data in the memory address of variable i.

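Be very careful when using scanf() to read strings: with a plain %s placeholder, it does not check the size of the destination buffer, so input that is too long will overflow it (i.e., a buffer overflow!). Always give %s a maximum field width that is one less than the buffer size, for example %19s for a 20-byte buffer, to leave room for the null byte.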

Conditional Statements

Conditional statements are just that: conditional. They evaluate an expression, and if the expression is true, then a specified action is performed. If not, then the action is not performed, or an alternate course of action is taken. The most common conditional statement is the if..else statement:

if(i == 12){
     i++;
} else if(i > 13){
     i--;
} else {
     i = 0;
}

The first line, if(i == 12), checks to see if the variable i is equal to 12. Note that when comparing values to see if they are equal, the double equal sign (==) is used; a single = would assign 12 to i instead. If i is equal to 12, then i is incremented by one (the ‘++’ operator increases the preceding variable by one; ‘--’ decreases it by one), and the remainder of the conditional statement is skipped.

If i does not equal 12, then the else if(i > 13) statement checks to see if i is greater than 13. If so, then i is decremented by one and the remainder of the conditional statement is skipped.

If none of the previous conditions were met, then i is set equal to 0. This is called a default condition as its only condition is that none of the other conditions have been met.

If you are going to check a single variable against many different values, chains of if..else statements quickly become unwieldy to type over and over again. This situation is ideal for a switch..case statement:

switch (i) {
     case 12:
             i++;
             break;
     case 14:
             i--;
             break;
     default:
             i = 0;
}

The switch (i) line specifies the variable whose value is compared against each case.

Each case defines a value to test against the value of i; for example, case 12: tests if i is equal to 12.

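The break; statement exits the entire switch..case statement. Without it, execution "falls through" and runs the code of the following cases as well. Note also that the default condition, which runs when no case matches, is defined with the line default:.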

Here is a list of valid comparison (relational) operators:

<      less than
<=     less than or equal to
==     equal to
!=     not equal to
>=     greater than or equal to
>      greater than

And here are valid boolean operators:

&&             and
||              or
!               not

The boolean operators can be used to specify multiple conditions in one conditional statement. For example, if we wanted to test if i was greater than 0 and less than 5:

if(i>0 && i<5)

If we wanted to test if i was not equal to 0 or was greater than 100:

if(i != 0 || i > 100)

Loops

Computers are great for repetitive tasks, and that is exactly what a loop allows you to do: perform a task or tasks repeatedly until a condition is met. The while loop is the simplest of loops:

int a = 0;
while(a < 10){
     printf("This is line #%d\n",a);
     a++;
}

This code says: while a is less than 10, execute all the code within the brackets; once a is greater than or equal to 10, exit the loop. Note that we had to specifically increase a by one each time the loop executes. If we had not, then a would have remained equal to 0 and the loop would have repeated infinitely.

The for loop is another common type of loop:

for(int i=100; i > 10; i--){
     printf("This is line %d\n",i);
}

For loops allow us to define three things right off the bat:

1) int i=100;

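What variable we will use, and what its initial value will be. If i has already been declared, we can leave out the int and simply write i=100. Declaring a variable inside the for statement like this requires C99 or later, and the variable then exists only inside the loop. Note that declaring the same variable twice in the same scope is an error: the compiler will complain and refuse to compile the program.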

2) i > 10;

The condition that must be met in order to continue the loop. In this case, we loop as long as i is greater than 10.

3) i--

An action to perform on the variable each time the loop iterates. Here, we are decreasing the value of i by one each time.

Arrays

We already introduced arrays when we were explaining character variables. An array is an ordered collection of elements of the same type, where each element can be accessed independently of the others. For example, in a character array (e.g., char name[] = "jim";), you can refer to the entire array by its variable name (name), or access individual characters by their position, counting from 0 (name[2] is 'm'). However, arrays aren’t always made up of characters. Often you will need an array of integers or even pointers.

One special array that appears in many C programs is argv, which we briefly introduced earlier. argv is an array of pointers to character strings that main() uses to accept the command line arguments passed to a program at run time, including the name of the program. For example, when you run the following command:

$./some_app arg1 arg2 arg3

Then the argv array of some_app’s main function contains:

argv[0] -> "./some_app"
argv[1] -> "arg1"
argv[2] -> "arg2"
argv[3] -> "arg3"

Each element of the argv array holds the memory address of (that is, it points to) one of the argument strings passed to the program. The element after the last argument, argv[argc], is always a null pointer.

Functions

We’ve already seen the main() and printf() functions in the above examples, but many programs contain other programmer-defined functions (also called subroutines) as well. Functions can take arguments and return data, but they don’t have to accept arguments and don’t have to return anything. Take a look at the following example:

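#include <stdio.h>

int check_num(int c) {

     if(c == 1){
          return 1;
     } else {
          return 0;
     }

}

int main(int argc, char *argv[]) {

     int a = 1;
     int b;

     b = check_num(a);

     if(b == a){
          printf("Congratulations! 'a' = %d!\n", a);
     }

     return 0;
}

Here we have the main() and printf() functions again, but also a check_num() function which has been created by the programmer. A function must be declared before it is called: the simplest way to do that is to define it above main(), as we did here, but you can also put a prototype such as int check_num(int c); above main() and define the function later in the file.

The main() function is declared as returning an int. You will sometimes see void main() in older examples, but the C standard only guarantees int main(void) and int main(int argc, char *argv[]), so main() should return an integer (here, 0 to signal success).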

If you look at the check_num() function, you see that it accepts one argument, an integer value, which is stored inside a variable named c. It is also declared as an int function meaning that it returns an integer value. When check_num() is called from inside the main function, variable a is passed to it. If the number passed to check_num() is 1, then it returns 1, else, it returns 0. Back in main(), the return value is stored in b. If a equals b, we get a "congratulations" message.

Global vs Local

In relation to a function, a variable can either be local or global. Thus far we have used only local variables, i.e., variables that have been defined inside of a function. Local variables only exist inside the function in which they were declared; if you wish for another function to process the data stored in a local variable, then you must pass that variable’s data between functions as an argument; this was done in the preceding example where main() passed the a variable to check_num(). Additionally, different functions can have local variables with the same name. For instance, main() could have a variable named animal, while check_num() could have its own variable named animal. These two variables are in no way related. One final note about local variables is that once the function in which they are defined returns, then those variables are gone. If the function is called again then the variable will be re-created, but it will not retain any of its previous data. If you wish to preserve the data in a local variable beyond the life of its function, then you must pass that data to another function (either as an argument or as a return value), or store it in a global variable.

Global variables are global to the entire program: any function can access and modify them. They are defined outside of any function. A local variable may have the same name as a global variable, but inside that function the local variable hides (shadows) the global one, so it is best to avoid reusing names. Example:

#include <stdio.h>
int global_var;

int print_var(){
     printf("The global_var is: %d\n",global_var);
     return 0;
}

int main() {
     global_var = 1;
     print_var();
     return 0;
}

The above program will print "The global_var is: 1".

Structures

A structure groups several variables, called members, together under one name. Unlike an array, whose elements all have the same type, a structure can mix different types of variables:

typedef struct {
     char a;
     int b;
     char buffer[256];
} mystructure;

Here we define a structure type (typedef struct), and name it mystructure. Inside the structure, we define three variables: one character, one integer, and one character array; these are each members of the structure. We can now define structures that will contain these members:

mystructure struct1;

Now, let’s assign values to each of the structure’s members:

struct1.a = 'a';
struct1.b = 4;
strncpy(struct1.buffer,"this is a string!",sizeof(struct1.buffer));   /* strncpy() needs #include <string.h> */

Like arrays, structures can be referenced as a whole, or each member can be referenced individually. Because they can hold different data types, structures are useful for laying out contiguous segments of data, such as custom packets; keep in mind, though, that the compiler may insert padding bytes between members to align them.

Useful C Functions

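In the last example, we saw two new things introduced: the strncpy() function and the sizeof operator. strncpy(), which is declared in string.h, takes three arguments: a destination buffer, a source string, and the maximum number of bytes to copy. It copies at most that many bytes. If the source string is shorter, the rest of the destination is padded with null bytes; but if the source is as long as the limit or longer, strncpy() does not add a terminating null byte, so you must add one yourself. The following call copies the 16 characters of "some long string" plus its null byte into buff2, then fills the rest of the first 200 bytes of buff2 with null bytes: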

char buff1[] = "some long string";
char buff2[256];
strncpy(buff2,buff1,200);

The sizeof operator (it looks like a function, but it is part of the language) gives the size in bytes of its operand. For example, this sets buff_size equal to 200:

char buff[200];
int buff_size = sizeof(buff);

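Here are some other useful functions (the file functions are declared in stdio.h; memcpy() and strncmp() are declared in string.h):

  • FILE *fp = fopen("test.txt","r+"); – This opens the file test.txt for reading and writing (the file must already exist; "w+" creates or truncates it instead). It returns a file pointer, which subsequent functions use to read and write the file, or NULL if the file could not be opened.
  • fgets(buffer_pointer,256,fp); – This reads at most 255 characters from the file pointed to by fp into buffer_pointer, stopping early after a newline or at the end of the file, and adds a terminating null byte.
  • fscanf(fp,"%255s",buffer_pointer); – This works like scanf(), but reads from the file pointed to by fp; here it reads one whitespace-delimited word of at most 255 characters into buffer_pointer.
  • fprintf(fp,"%s",buffer_pointer); – This works like printf(), but writes to the file pointed to by fp; here it writes the string in buffer_pointer. As with printf(), never pass the data itself as the format string.
  • fclose(fp); – This closes the file pointed to by the fp file pointer.
  • memcpy(buff1,buff2,256); – This copies exactly 256 bytes from buff2 into buff1. It ignores null bytes, and the two buffers must not overlap (use memmove() if they might).
  • strncmp(buff1,buff2,25); – Compares at most the first 25 characters of buff1 and buff2, stopping early at a null byte. It returns 0 if they are equal, and a negative or positive value otherwise.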

A Real Program

The first four bytes of a Linux executable binary contain the value 0x7F (the hex value of the DEL character) followed by the string "ELF"; this four-byte signature marks the ELF file format, which Linux also uses for shared libraries and object files. This program takes a file name as an argument, reads the first four bytes of the file, and determines whether the file is an executable by comparing the last three of those bytes to the string "ELF":

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
     /*Declare our variables. Data will be read into buff.
     Because we want to ignore the first byte that is read (0x7F), we set buff_ptr to point one byte
     beyond the beginning of the buff character array.*/
     char buff[5];
     char *buff_ptr = buff+1;

     //Check usage
     if(argc != 2){
          printf("\nUsage:\n\t%s <file name>\n\n", argv[0]);
          return 1;
     }

     //Open the specified file ("rb" = read raw bytes; the b matters on Windows)
     FILE *fp = fopen(argv[1],"rb");

     //Check to make sure that fopen was successful
     if(!fp){
          printf("\nUnable to open %s. Ensure that it exists and we have permissions.\n\n",argv[1]);
          return 1;
     }

     //Read up to the first four bytes (fgets also stops early after a newline).
     //If fgets fails, print a message.
     if(!fgets(buff,sizeof(buff),fp)){
          printf("\nUnable to read from %s!\n\n",argv[1]);
          fclose(fp);
          return 1;
     }
     fclose(fp);

     //Compare bytes 2-4 with "ELF". strncmp returns 0 if they are equal.
     if(!strncmp(buff_ptr,"ELF",3)){
          printf("\n%s is an executable!\n\n",argv[1]);
     } else {
          printf("\n%s is NOT an executable!\n\n",argv[1]);
     }

     //Exit!
     return 0;

}

And now to take it for a test run:

$gcc isexe.c -o isexe
$./isexe isexe

isexe is an executable!

$./isexe test.txt

test.txt is NOT an executable!

$

Conclusion

Hopefully this has gotten you up and running with C. While not as simple to use as some of the more popular scripting languages (Perl, Python, Ruby, etc.), when you absolutely-gotta-get-some-low-level-stuff-done, you just can’t beat it. Even if you don’t program often in C, understanding C programming will serve you well during reverse engineering and program exploitation. This article is of course just the tip of the iceberg. For more on C, check out the following references/suggested reading.

References/Suggested Reading

Bell Labs Computing Sciences Research Center

How C Programming Works

The C Library Reference Guide

The C Programming Language

UNIX Network Programming Vol.1

TCP/IP Illustrated Vol.2
