
Need help understanding packet sniffing code


bahr

Newbie

Posts: 14

Joined: Tue Aug 16, 2011 2:30 am

Location: Denmark

Post Fri May 23, 2014 7:04 am

Need help understanding packet sniffing code

Hi everyone.

I'm learning Python and I'm trying to implement a packet sniffer based on the article found here:

http://www.binarytides.com/python-packe ... ode-linux/

I've got it working, but there are parts of the example that I really don't understand, and I haven't been able to find a satisfying explanation on the net. I would be very thankful if someone could give me a more detailed explanation.

My questions are specifically about these two parts:

IHL field of IPv4 Header

The example does this to unpack the ihl header field:

  Code:
    iph = unpack('!BBHHHBBH4s4s' , ip_header)
     
    version_ihl = iph[0]
    version = version_ihl >> 4
    ihl = version_ihl & 0xF
     
    iph_length = ihl * 4


As far as I understand, the following line takes the binary representation of the version_ihl variable and shifts it 4 bits to the right, corresponding to dividing version_ihl by 2^4. Is that correct? And if so, why are we doing this?

  Code:
    version = version_ihl >> 4


And here, as I understand it, we take 0xF (15 in hex) and do a bitwise AND of the binary representation of 15 with the binary representation of what is in version_ihl. I don't understand this part at all really :-[
  Code:
    ihl = version_ihl & 0xF


TCP Header Unpacking

So the example unpacks the TCP header by the following format string:

  Code:
 tcph = unpack('!HHLLBBHHH' , tcp_header)


and then when referring to the TCP header documentation on Wikipedia I see that the data offset header field is 4 bits in size, so according to my understanding that must correspond to the first 'B' in the "!HHLLBBHHH" format string.

So what I get out of this is that:

- We unpack source and destination port into an unsigned short type in python, corresponding to the first 2 HH
- We unpack sequence number into an unsigned long type in Python, and the same for acknowledgement number
- Then we come to the 4-bit Data Offset header, which seems to be unpacked into a 1-byte integer Python type by using the B format specifier, which corresponds to an unsigned char C type?

This is where I get lost again completely :-[ According to the TCP header documentation, the data offset is only 4 bits in size. Why and how do I know I should unpack this into a Python integer type that is 1 byte in size? Also, don't I then waste 4 bits of unused space, as the header field is only 4 bits in size, but I'm storing it in a datatype that is 1 byte in size? And why exactly do we use the B format character, which corresponds to an unsigned char C type, to unpack the data offset field? How do I know that I should use this format character instead of, let's say, the c or b format characters?

Then this is followed later by the following piece of code:

  Code:
  doff_reserved = tcph[4]
  tcph_length = doff_reserved >> 4


Why do we again do a bitwise shift 4 times to the right here? What is the purpose of this?

I hope my questions made sense, and I'm sorry if they are too stupid. I just feel like I'm missing something obvious here, and can't seem to figure out what it is, so I would appreciate any inputs on this :)

Thanks in advance!

dynamik

Recruiters

Posts: 1134

Joined: Sun Nov 09, 2008 11:00 am

Location: Mile High City

Post Wed May 28, 2014 8:59 pm

Re: Need help understanding packet sniffing code

While it's cool that you're digging into the bytes, you'll probably find Scapy to be much more useful for what you're trying to do: http://www.secdev.org/projects/scapy/

bahr wrote:IHL field of IPv4 Header
  Code:
    version = version_ihl >> 4

As far as I understand, the following line takes the binary representation of the version_ihl variable and shifts it 4 bits to the right, corresponding to dividing version_ihl by 2^4. Is that correct? And if so, why are we doing this?

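Arithmetically, yes: each one-bit shift left/right multiplies/divides an integer by two (and this can get tricky if you're dealing with signed integers), so shifting right by four bits is an integer division by 2^4. But division isn't really the point of the shift here.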

However, you're not dealing with a single integer value, but rather two nibbles (four bits each), which together make up the first byte of the IP header. The first nibble is the IP Version, and the next is the IP Header Length.

69 is a common integer value for this field, so we'll start with that (I'm using the interactive Python shell, and format() is simply used to show eight binary digits that are zero-padded):

  Code:
>>> x=69
>>> format(x, '08b')
'01000101'
>>> y=x>>4
>>> format(y, '08b')
'00000100'

So what this effectively does is move the IP Version bits (0100) down into the low nibble, where the IP Header Length bits (0101) used to be, and discard the IP Header Length bits. The bits that are shifted in (from the left) are zeros, which makes the new value 00000100. Also, the result is assigned to a new variable, so the original value isn't modified.

You can see we still have the original value, which is the first byte we pulled from the header, and the new variable has a value of four, which is indeed the IP Version of the packet:

  Code:
>>> x
69
>>> y
4
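
On the arithmetic side of the question: for non-negative integers, a right shift by four bits gives the same result as floor division by 2^4. A minimal sketch of that equivalence, assuming Python 3 (the variable names are only illustrative):

```python
first_byte = 69  # 0b01000101, the example value used above

# For non-negative integers, shifting right by 4 equals floor division by 2**4.
version = first_byte >> 4
assert version == first_byte // 2**4

print(version)                 # 4
print(format(version, '04b'))  # 0100
```

The shift is used here because the goal is to isolate the high nibble, not because a division is wanted.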

bahr wrote:And here, as I understand it, we take 0xF (15 in hex) and do a bitwise AND of the binary representation of 15 with the binary representation of what is in version_ihl. I don't understand this part at all really :-[
  Code:
    ihl = version_ihl & 0xF


To get the header length, the first nibble must be converted to zeros. This can be done by ANDing the original byte with 00001111 (0xf). The following code section shows the original byte's value and the binary representation of 0xf.

  Code:
>>> format(x, '08b')
'01000101'
>>> format(0xf, '08b')
'00001111'

As expected, ANDing these two values removes the value of the IP Version and gives us the value of the IP Header Length (5).
  Code:
>>> z = x&0xf
>>> format(z, '08b')
'00000101'
>>> z
5

The value of the IP Header Length represents the number of words (32 bits/4 bytes) that make up the IP Header, so multiplying this number by four will give us the actual number of bytes (5*4=20, which is the standard size if options are not specified).
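
Putting the shift, the mask and the multiplication together, here is a rough sketch, assuming Python 3. The sample header is hand-built for illustration only (it is not taken from a real capture, and its checksum is a dummy value); the format string is the one from the article's example.

```python
import struct

# Hypothetical 20-byte IPv4 header, built by hand for illustration only
# (the checksum is left as 0 and is not valid).
sample_ip_header = struct.pack(
    '!BBHHHBBH4s4s',
    (4 << 4) | 5,             # version 4 in the high nibble, IHL 5 in the low nibble
    0,                        # type of service
    40,                       # total length
    0,                        # identification
    0,                        # flags + fragment offset
    64,                       # TTL
    6,                        # protocol (6 = TCP)
    0,                        # header checksum (dummy)
    bytes([192, 168, 0, 1]),  # source address
    bytes([192, 168, 0, 2]),  # destination address
)

iph = struct.unpack('!BBHHHBBH4s4s', sample_ip_header)
version_ihl = iph[0]
version = version_ihl >> 4   # high nibble
ihl = version_ihl & 0xF      # low nibble, counted in 32-bit words
iph_length = ihl * 4         # header length in bytes

print(version_ihl, version, ihl, iph_length)  # 69 4 5 20
```

If the header carried options, the IHL would be larger than 5 and iph_length would grow in steps of four bytes.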
bahr wrote:TCP Header Unpacking

So the example unpacks the TCP header by the following format string:

  Code:
 tcph = unpack('!HHLLBBHHH' , tcp_header)


and then when referring to the TCP header documentation on Wikipedia I see that the data offset header field is 4 bits in size, so according to my understanding that must correspond to the first 'B' in the "!HHLLBBHHH" format string.

So what I get out of this is that:

- We unpack source and destination port into an unsigned short type in python, corresponding to the first 2 HH

Correct
bahr wrote:- We unpack sequence number into an unsigned long type in Python, and the same for acknowledgement number

Correct
bahr wrote:- Then we come to the 4-bit Data Offset header, which seems to be unpacked into a 1-byte integer Python type by using the B format specifier, which corresponds to an unsigned char C type?

Not exactly. There are eight bits in a byte, so again, you're getting the first nibble, which is the Offset, along with the next nibble, which is Reserved. Also, chars are the same as one-byte integers: https://docs.python.org/2/library/struc ... characters
bahr wrote:This is where I get lost again completely :-[ According to the TCP header documentation, the data offset is only 4 bits in size. Why and how do I know I should unpack this into a python integer type that is 1 byte of size?

It's very difficult to work with memory that isn't a multiple of one byte. With Python's struct module, one byte is the smallest unit you can unpack.
bahr wrote:Also don't I then waste 4 bits of unused space, as the header field is only 4 bits in size, but I'm storing it in a datatype that is 1 byte of size?

The waste is negligible, and as I said earlier, you can't really work with values that aren't multiples of one byte.
bahr wrote:And why exactly do we use the B format character, which corresponds to an unsigned char C type, to unpack the data offset field? How do I know that I should use this format character instead of, let's say, the c or b format characters?

Because you want an unsigned (non-negative) integer. The c format gives a one-byte string in Python, not an integer, and b will give you values from -128 to 127, and you're never going to have a negative Offset.

b would hypothetically work as long as you never have a value greater than 127. However, remember that the value you want is only the first nibble. Therefore, the value may only be 15 (1111), but as the first nibble, this would make the value of the byte you've retrieved 11110000 (assuming the Reserved bits are all zeros). This value is 240, which is obviously greater than 127, and it would cause your code to return invalid values, as shown below:

  Code:
>>> 0b11110000
240
>>> hex(0b11110000)
'0xf0'
>>> import struct
>>> struct.unpack('b', '\xf0')
(-16,)
>>> struct.unpack('B', '\xf0')
(240,)
>>> struct.unpack('c', '\xf0')
('\xf0',)
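
The session above appears to be Python 2, since a plain string literal is passed to unpack. As a sketch of the same experiment under Python 3 (an assumption on my part, not something from the original article), the input has to be a bytes object, and the c format returns a bytes object of length 1:

```python
import struct

raw = b'\xf0'  # Data Offset 15 (1111) followed by four zero Reserved bits

signed, = struct.unpack('b', raw)    # signed char: -128..127
unsigned, = struct.unpack('B', raw)  # unsigned char: 0..255
char, = struct.unpack('c', raw)      # bytes object of length 1

print(signed, unsigned, char)      # -16 240 b'\xf0'
print(signed >> 4, unsigned >> 4)  # -1 15
```

The second line of output shows why b is the wrong choice here: shifting the signed value gives -1 instead of the expected offset of 15.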

bahr wrote:Then this is followed later by the following piece of code:

  Code:
  doff_reserved = tcph[4]
  tcph_length = doff_reserved >> 4


Why do we again do a bitwise shift 4 times to the right here? What is the purpose of this?


This is the same thing we saw earlier. I'll use 80, which is a common value for this field:
  Code:
>>> x = 80
>>> format(x, '08b')
'01010000'
>>> y = x>>4
>>> format(y, '08b')
'00000101'
>>> y
5

After the shift, we get 5, which is the actual value (and much more realistic than 80). As with the IP Header Length, this counts 32-bit words, so 5 means a 20-byte TCP header (5*4=20).
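
For completeness, a rough end-to-end sketch for the TCP side, assuming Python 3. The sample header is hand-built for illustration only (hypothetical ports and sequence number, dummy checksum), and the names data_offset and tcph_length are my own choice: here tcph_length holds bytes, whereas in the quoted snippet it holds the word count straight after the shift.

```python
import struct

# Hypothetical 20-byte TCP header, built by hand for illustration only.
sample_tcp_header = struct.pack(
    '!HHLLBBHHH',
    12345,    # source port
    80,       # destination port
    1000,     # sequence number
    0,        # acknowledgement number
    5 << 4,   # Data Offset 5 in the high nibble, Reserved bits zero -> 80
    0x02,     # flags (SYN)
    8192,     # window size
    0,        # checksum (dummy)
    0,        # urgent pointer
)

tcph = struct.unpack('!HHLLBBHHH', sample_tcp_header)
doff_reserved = tcph[4]
data_offset = doff_reserved >> 4   # header length in 32-bit words
tcph_length = data_offset * 4      # header length in bytes

print(doff_reserved, data_offset, tcph_length)  # 80 5 20
```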
The day you stop learning is the day you start becoming obsolete.

bahr

Newbie

Posts: 14

Joined: Tue Aug 16, 2011 2:30 am

Location: Denmark

Post Thu May 29, 2014 9:40 am

Re: Need help understanding packet sniffing code

Thank you so much!

Those were excellent explanations, and now I really understand what I'm doing and why.

This feels like a serious relief. I've been wondering about this for weeks.

You are the man 8)
