After a long love affair with Ruby, I was excited to get back into more Python in the new year. One of my main goals was to build additional skills with Python while continuing to build up skills in defense and response. When “Python Forensics: A workbench for inventing and sharing digital forensic technology” by Chet Hosmer came out, I was excited about all of the possibilities. There are a number of books about using Python for attacking, but a strong book on building forensics tools is a nice change of pace.
The target audience for Python Forensics is “anyone who has a desire to learn how to leverage the Python language to forensic and digital investigation problems.” Hosmer serves that audience well, with introductory sections that go over some Python basics as well as a number of cookbook-style chapters containing programs that perform different forensic functions. Let’s take a closer look at this Syngress Publishing title.
Detailed Look at Python Forensics
Chapter 1 starts off with why the reader should use Python for forensics. This is a bit of an interesting choice, as anyone who has purchased the book has already bought into using Python. One piece of this chapter that I found to be very valuable was the section about the Daubert evidence standard, the rules of evidence required in court. This is the highlight of the first chapter and provides important information for forensics examiners who are thinking about using open source or self-developed tools for gathering evidence.
Before any code can be written, we need a development environment. Chapter 2 focuses on how to set that environment up. This chapter also gives a brief introduction to some of the capabilities of Python, including some of the third-party modules that can be used and the additional functionality they provide. In addition, the author briefly discusses which IDE is best for use with Python, although there is little information about popular forensics distros that already have the files and environment needed. The chapter is also heavily Windows focused, but there are brief mentions of how to get Python on everything from Linux to mobile devices.
Now that the environment is set up, it’s time to build an application. Chapter 3 walks us through building a basic tool that walks the filesystem and generates hashes of the files it finds. This is a common task during a forensics investigation, so it’s a good first choice for an application. During Chapter 3, we are introduced to some of the more formal design considerations for these applications, with tables laying out why certain choices are being made and what the requirements are for the application.
Chapter 3 also lays out how application logic can be split up, describes the core functionality of a number of different modules, and includes a good code walk-through for the program. The walk-through covers each chunk of code and explains in words what it does. We also see how to use the script and some sample output. Finally, we are given the full code for the application, so that we can copy it out of the book.
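For readers who have not seen this kind of tool, here is a minimal sketch of filesystem hashing using only the standard library. It is not the book's code: the function names `hash_file` and `hash_tree`, the default of SHA-256, and the choice to record errors inline are all my own assumptions for illustration.

```python
import hashlib
import os
from pathlib import Path


def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of one file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root, algorithm="sha256"):
    """Walk root and map each regular file's path to its digest."""
    results = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = Path(directory) / name
            # Skip symlinks and special files; hash regular files only.
            if path.is_symlink() or not path.is_file():
                continue
            try:
                results[str(path)] = hash_file(path, algorithm)
            except OSError as error:
                # Record unreadable files rather than aborting the walk.
                results[str(path)] = f"ERROR: {error}"
    return results
```

Calling `hash_tree("/some/evidence/mount")` returns a dictionary of path to digest that can be written to a report. A real tool would also need to log timestamps and the examiner's actions, which is where the book's design tables come in.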
One of the things we frequently have to do in forensics is find files that relate to certain topics. This can be anything from finding a web shell to finding emails regarding something illegal. Chapter 4 presents a program that walks through files and searches their contents for keywords. This chapter includes a lot of explanation and some more outlines of why certain design and code choices were made. There are a number of interesting code techniques in here, but I feel like the example is wasted. Something along the lines of grep is going to produce similar results, and, because it’s not interpreted, it’s going to do it faster. The script presents a few things that may be considered improvements over grep, but not enough to convince me to use it on a case.
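To make the comparison with grep concrete, a keyword search of this kind can be sketched in a few lines. This is my own illustration, not the program from Chapter 4; the function name `search_keywords` and the returned tuple layout are assumptions.

```python
import os
import re


def search_keywords(root, keywords):
    """Return (path, keyword, byte offset) for each keyword hit under root."""
    if not keywords:
        return []
    # One alternation pattern over the raw bytes, so binary files work too.
    pattern = re.compile(
        b"|".join(re.escape(word.lower().encode("utf-8")) for word in keywords)
    )
    hits = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            try:
                with open(path, "rb") as handle:
                    # Reads the whole file at once: fine for a sketch,
                    # not for multi-gigabyte images. bytes.lower() only
                    # folds ASCII letters.
                    data = handle.read().lower()
            except OSError:
                continue  # unreadable file, skip it
            for match in pattern.finditer(data):
                hits.append((path, match.group().decode("utf-8"), match.start()))
    return hits
```

This does roughly what a recursive, case-insensitive grep does, only slower, which is the point of the criticism above.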
We frequently deal with images in forensics. Whether the question is where an image was taken, who took it, or what it was taken with, EXIF data is frequently useful. Chapter 5 uses the PIL library to help parse EXIF data and gather image data. This is one of my favorite chapters of the book. There is good code information in here about a variety of data structures, as well as a tool that is truly useful at the end. This chapter includes design considerations as well as functional requirements. In the end, it shows you how to take geolocation information from images and apply it to a Google map, which is very handy for explaining evidence or tracking large amounts of information.
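One step in that workflow is worth spelling out. EXIF stores each GPS coordinate as degrees, minutes, and seconds plus a hemisphere reference letter, while mapping tools generally want signed decimal degrees. The conversion below is a minimal sketch that assumes the values have already been pulled out of the image as plain numbers; it does not use PIL, and the function name `dms_to_decimal` is hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert an EXIF-style coordinate to signed decimal degrees.

    ref is the hemisphere letter: "N" or "S" for latitude,
    "E" or "W" for longitude.
    """
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected hemisphere reference: {ref!r}")
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in decimal-degree notation.
    return -decimal if ref in ("S", "W") else decimal


latitude = dms_to_decimal(40, 26, 46, "N")
longitude = dms_to_decimal(79, 58, 56, "W")
```

The resulting pair of floats is what gets plotted on the map.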
Chapter 6 is where the book moves from being a language learning book to a cookbook. This chapter covers how to work with time in Python, and specifically how to use NTP. I have not found this to be a huge issue in forensics, as the boxes that I have used have had NTP built in, and other systems that I have used to analyze data have had NTP libraries built in. Doing this in Python doesn’t seem like a big win, and there wasn’t a lot of additional educational content. This is the first chapter that drops the functional and design criteria that accompanied the earlier scripts, so you lose some of the reasoning behind why certain actions are taken.
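For context on what handling NTP time involves, the one detail that tends to trip people up is the epoch: NTP counts seconds from 1900-01-01, not 1970-01-01, and packs each timestamp as 32 bits of seconds plus 32 bits of fraction. The sketch below converts such a timestamp to a UTC datetime. It is my own example rather than the book's, it makes no network request, and the name `ntp_to_datetime` is hypothetical.

```python
import struct
from datetime import datetime, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def ntp_to_datetime(raw):
    """Convert an 8-byte NTP timestamp to a timezone-aware UTC datetime."""
    # Network byte order: 32-bit whole seconds, then 32-bit binary fraction.
    seconds, fraction = struct.unpack("!II", raw)
    unix_time = seconds - NTP_UNIX_DELTA + fraction / 2**32
    return datetime.fromtimestamp(unix_time, tz=timezone.utc)
```

Keeping everything in UTC, as this does, avoids the time zone confusion that makes timeline analysis painful.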
In Chapter 7, we look at Natural Language Processing. This seemed like a neat idea, so I was excited about this chapter. Along the way I realized that I didn’t care about this chapter as much as I had hoped. I think the reason was that much of this chapter might be useful for e-discovery, but for filesystem forensics, there isn’t as much application. The chapter focuses on how to process the language in documents to find words that are related to other words and words that are likely found in groupings.
While this is somewhat interesting from a “languages are neat” perspective, I didn’t really see how this related to forensics. The biggest relation that I can see is something along the lines of mobile forensics, but that wasn’t the example that was used. Instead, the OJ Simpson trial was used as the example, leaving it feeling a little flat.
Python is something that I have used a lot for network tasks. Chapters 8 and 9 cover using Python for network forensics. The only problem is, there wasn’t a lot of forensics happening in these chapters. There is coverage for creating basic clients and servers, some basic GUI creation, a port scanner, and a packet sniffer. There are lots of tools out there that do these things very well, and they aren’t really used as part of these chapters. For instance, scapy has been around for a while and is great at packet parsing, but it was only mentioned in the final chapter in the book as a side bullet.
The GUI information was a good learning experience, but the rest of the networking tasks don’t really add anything to the toolkit, so I felt like this was a huge letdown. These chapters could have covered anything from filtering captured PCAP files to parsing custom data. Instead, we’re left with tools that teach some additional functionality but aren’t great things to add to our toolkit.
Chapter 10 brought back some sanity and dealt with multiprocessing. This is great for a number of tasks, and it even covered using multiple threads for processing hashes on the filesystem. There are a number of different threading models covered as well as where they are best used. This is one of the better and more educational sections of the book for those who have had some Python experience.
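As an illustration of why hashing parallelizes well, here is a minimal sketch using a thread pool from the standard library. It is not the book's code and is not one of the specific models the chapter covers; the names `sha256_of` and `hash_many` and the default of four workers are my assumptions. Threads are a reasonable choice here because `hashlib` releases the GIL while hashing larger buffers and the work is largely I/O bound.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_many(paths, workers=4):
    """Hash several files concurrently and map each path to its digest."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs paths correctly.
        return dict(zip(paths, pool.map(sha256_of, paths)))
```

Whether this beats a single-threaded loop depends on the storage: on a spinning disk, concurrent reads can make things slower rather than faster.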
In Chapter 11, the author lost me again. This chapter focuses on using clouds to optimize tasks. The task that the author chose was creating rainbow tables in the cloud for hashes with Python. This is kind of nuts, as there are great tools for doing this with optimized, actual rainbow tables, and the tables that the author shows us how to create aren’t truly rainbow tables. Nonetheless, the introduction to the concept of Python clouds is interesting, and some good examples of providers are here. That was my big takeaway from this chapter.
The final chapter, Chapter 12, is titled “Looking Ahead”. This chapter doesn’t so much cover where things are headed as try to give the reader some ideas for additional things to work on, along with some tips for things to think about when creating new tools. This was a little thought-provoking, although none of the ideas were particularly groundbreaking. It was a good summary of things that could help make forensics as a practice better.
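To show the distinction I am drawing, here is what a plain precomputed lookup table looks like. This is my own sketch and I am assuming, not quoting, the general shape of the book's approach; the function name, the lowercase alphabet, the three-character limit, and the use of SHA-256 are all hypothetical choices.

```python
import hashlib
import itertools
import string


def build_lookup_table(alphabet=string.ascii_lowercase, max_length=3):
    """Map the SHA-256 hex digest of every candidate string to that string."""
    table = {}
    for length in range(1, max_length + 1):
        for letters in itertools.product(alphabet, repeat=length):
            candidate = "".join(letters)
            digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
            # Every hash/plaintext pair is stored in full.
            table[digest] = candidate
    return table


table = build_lookup_table()
recovered = table.get(hashlib.sha256(b"abc").hexdigest())
```

A table like this stores one entry per candidate, so it grows as fast as the keyspace does. A true rainbow table instead stores only the endpoints of hash chains built with reduction functions, trading lookup time for a much smaller table.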
My excitement for the book started high and dropped as I went along. For a person looking to get into Python, the first 5 chapters are a good start with good information. For people who are hoping to get into forensics, this book doesn’t deliver as much, but when combined with some other forensic materials, it could likely help someone get the foundation they need to build other tools.
Many of the cookbook formulas missed their target in my opinion, as I think some simpler techniques could have been used that would be more directly applicable for the new forensicator. There are a handful of examples that are worth it, and there are a few places with interesting code that I may leverage in other tools.
Overall, I’m glad I read Python Forensics, but I wish that better examples had been picked. I feel like this isn’t really a forensics book, but instead a Python primer that has a few examples that relate to forensics. The networking and hashing sections will be useful to a broad audience for learning about network communications and hashing, but won’t likely map directly into a forensics toolkit. If you are someone who is interested in getting into Python and wants some quality examples and explanations, this is your book. If you want to learn more about forensics along the way, you’re likely to be disappointed.
Review by Ryan Linn, columnist for EH-Net.