Learning Python for Forensics

Getting started

Before we get started, you need to install Python on your machine. At the time of writing this book, there are two supported versions of Python: Python 2 and Python 3. We will use both to develop our solutions. Historically, many of the useful third-party forensic libraries were developed for Python 2. At this point, most libraries are compatible with Python 3, which offers a number of improvements, including superior Unicode handling, a major headache in Python 2. All of the code in this book has been tested with the latest appropriate versions of Python 2 (v. 2.7.15) or 3 (v. 3.7.1). Some of our code is compatible with both Python 2 and 3, while some works with only one of the two. Each chapter will describe what version of Python is required to run the code.
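Because each chapter targets a particular interpreter, it can be useful for a script to report which version it is running under. The following is a minimal sketch rather than code from this book: the helper names describe_interpreter and is_python3 are hypothetical, and the only thing assumed is the standard sys module, which is available in both Python 2 and 3.

```python
import sys


def describe_interpreter():
    """Return the running interpreter's version as 'major.minor.micro'."""
    info = sys.version_info
    return "{}.{}.{}".format(info[0], info[1], info[2])


def is_python3():
    """Return True when the script is running under Python 3 or later."""
    return sys.version_info[0] >= 3


if __name__ == "__main__":
    # Report the interpreter so a version mismatch is obvious at a glance.
    print("Running Python " + describe_interpreter())
    print("Python 3 or later: " + str(is_python3()))
```

A check like this could sit at the top of a script that is known to work with only one of the two versions.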

Additionally, we recommend using an integrated development environment, or IDE, such as JetBrains' PyCharm. An IDE will highlight errors and offer suggestions that help streamline the development process and promote best practices when writing code. If installing an IDE is not an option, a simple text editor will work. We recommend an application such as Notepad++, Sublime Text, or Visual Studio Code. For those who are command-line oriented, an editor such as vim or nano will work as well.

With Python installed, let's open the interactive prompt by typing python into your Command Prompt or Terminal. We will begin by introducing some built-in functions for use in troubleshooting. The first line of defense when confused by any object or function discussed in this book, or found in the wild, are the type(), dir(), and help() built-in functions. We realize we have not yet introduced common data types and so the following code might appear confusing.

However, that is exactly the point of this exercise. During development, you will encounter data types you are unfamiliar with or be unsure what methods exist to interact with the object. These three functions help solve those issues. We will introduce the fundamental data types later in this chapter.

The type() function, when supplied with an object, will return the object's type (its class), providing type-identifying information about the object. The dir() function, when supplied with an object, will return a list of strings naming its attributes, showing the functions and properties that belong to the object. The help() function can be used to display the specifics of these methods through their docstrings. Docstrings are nothing more than descriptions of a function that detail the inputs, outputs, and how to use the function.
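To make the idea of a docstring concrete, here is a small sketch. The function count_words is a hypothetical example written for illustration and is not part of Python or of this book's code. It shows a docstring being written and then reached through the same three built-in functions.

```python
def count_words(text):
    """Return the number of whitespace-separated words in text.

    Input: text, a string.
    Output: an integer word count.
    """
    return len(text.split())


# type() reports the class an object belongs to.
print(type(count_words).__name__)            # function
# dir() returns a list of attribute names as strings.
print('__doc__' in dir(count_words))         # True
# help() builds its display from the docstring stored in __doc__.
print(count_words.__doc__.splitlines()[0])
```

Calling help(count_words) in the interactive prompt would display the same docstring text.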

Let's look at the str, or string, object as an example of these three functions. In the following example, passing a series of characters surrounded by single quotes to the type() function results in a type of str, or string.

When an example shows typed input following the >>> symbol, you should type these statements in the Python interactive prompt. The Python interactive prompt can be accessed by typing python in the Command Prompt or Terminal.

These basic functions behave similarly in both Python 2 and 3: their purposes largely remain the same, and their outputs are similar between versions. Unless otherwise stated, these function calls and their output are executed with Python 3.7.1.

Here is an example:

>>> type('what am I?')
<class 'str'>

If we pass in an object to the dir() function, such as str, we can see its methods and attributes. Let's say that we want to know what one of these functions, title(), does. We can use the help() function specifying the object and its function as the input.

The output of the help() function tells us that no input is required, that the output is a string object, and that the function capitalizes the first character of every word. Let's use the title() method on the what am I? string:

>>> dir(str) 
['__add__', '__class__', '__contains__', '__delattr__',
'__doc__', '__eq__',
...
'swapcase', 'title', 'translate', 'upper', 'zfill']

>>> help(str.title)
Help on method_descriptor:

title(...)
    S.title() -> str

    Return a titlecased version of S, i.e. words start with title case characters, all remaining cased characters have lower case.

>>> 'what am I?'.title()
'What Am I?'
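Because dir() returns an ordinary list of strings, its output can be filtered like any other list. As a small sketch (the helper name public_names is hypothetical and not part of Python or this book), the following hides every name that begins with an underscore so that only the everyday methods remain:

```python
def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]


names = public_names(str)
print('title' in names)      # True
print('__add__' in names)    # False
print(names[-3:])            # ['translate', 'upper', 'zfill']
```

This can make a long dir() listing easier to scan when you only want to know which methods are meant to be called directly.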

Next, type number = 5. Now we have created a variable, called number, that has the numerical value of 5. Using type() on that object indicates that 5 is an int, or integer. Going through the same procedure as before, we can see a series of available attributes and functions for the integer object. With the help() function, we can check what the __add__() function does for our number object. From the following output, we can see that this function is equivalent to using the + symbol on two values:

>>> number = 5
>>> type(number)
<class 'int'>

>>> dir(number)
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__',
...
'denominator', 'imag', 'numerator', 'real']

>>> help(number.__add__)
__add__(...)
    x.__add__(y) <==> x+y

Let's compare the __add__() function and the + symbol to verify our assumption. Using both methods to add 3 to our number object results in a returned value of 8, as expected. Unfortunately, we've also broken a best practice rule in illustrating this example:

>>> number.__add__(3)
8
>>> number + 3
8

Notice how some methods, such as __add__(), have double leading and trailing underscores. These are referred to as magic methods: methods that the Python interpreter calls on your behalf and that the programmer should not normally call directly. Instead, they are invoked indirectly. For example, the integer __add__() magic method is called when using the + symbol between two numbers. Following the previous example, you should run number + 3 rather than number.__add__(3).

There are a few exceptions to this rule, which we will cover throughout this book. Unless the documentation recommends using a magic method, however, it is best to avoid calling one directly.
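To see the interpreter calling a magic method on our behalf, consider the following sketch. The Evidence class is a hypothetical example invented for illustration, and it uses class syntax that has not yet been introduced. The point is only that writing + is enough to trigger __add__().

```python
class Evidence(object):
    """Hypothetical container holding a count of recovered items."""

    def __init__(self, count):
        self.count = count

    def __add__(self, other):
        # Python calls this method when it evaluates `self + other`.
        return Evidence(self.count + other.count)


# The + symbol triggers __add__() indirectly; we never call it by name.
total = Evidence(2) + Evidence(3)
print(total.count)   # 5
```

Defining a magic method in your own class is expected. What the rule discourages is calling one by name where an operator or built-in function already does the job.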

Python, like any other programming language, has a specific syntax. Compared to other common programming languages, Python is rather English-like and can be read fairly easily in scripts. This feature has attracted many, including the forensics community, to use this language. Even though Python is easy to read, it is not to be underestimated: it is powerful and supports common programming paradigms.

Most programmers start with a simple Hello World script, a test that proves they are able to execute code and print the famous message into the console window. With Python, the code to print this statement is a single line, as seen here, written on the first line of a file:

001 print("Hello World!")

Please note that when discussing the code in a script, as opposed to code in the interactive prompt, line numbers, starting at 001, are shown for reference purposes only. Please do not include these line numbers in your script. The code for this script and all scripts can be downloaded at https://packtpub.com/books/content/support.

Save this line of code in a file called hello.py. To run this script, we call Python followed by the name of the script, for example python hello.py. If you are using Python 3, the message Hello World! should be displayed in your Terminal.

Let's discuss why this simple script will not execute successfully in some versions of Python 2.