Learning Python for Forensics
上QQ阅读APP看书,第一时间看更新

Files

We will often create file objects to read or write data from a file. File objects can be created using the built-in open() method. The open() function takes two arguments, the name of the file and the mode. These modes dictate how we can interact with the file object. The mode argument is optional, and if left blank defaults to read-only. The following table illustrates the different file modes available for use:

 

Most often, we will use read and write in standard or binary mode. Let's take a look at a few examples and some of the common functions we might use. For this section, we will create a text file called file.txt with the following content:

This is a simple test for file manipulation.
We will often find ourselves interacting with file objects.
It pays to get comfortable with these objects.

In the following example, we open a file object that exists, file.txt, and assign it to a variable, in_file. Since we do not supply a file mode, it is opened in read-only mode by default. We can use the read() method to read all lines as a continuous string. The readline() method can be used to read individual lines as a string. Alternatively, the readlines() method creates a string for each line and stores it in a list. These functions take an optional argument, specifying the size of bytes to read.

The readline() and readlines() functions use the \n or \r newline characters to segment the lines of a file. This is good for most files, though may not always work based on your input data. As an example, CSV files with multiple lines in a single cell would not display properly with this type of file-reading interface.

Python keeps track of where we currently are in the file. To illustrate the examples we've described, we need to use the seek() operation to bring us back to the start of the file before we run our next example. The seek() operation accepts a number and will navigate to that decimal character offset within the file. For example, if we tried to use the read() method before seeking back to the start, our next print function (showcasing the readline() method) would not return anything. This is because the cursor would be at the end of the file as a result of the read() function:

>>> in_file = open('file.txt')
>>> print(in_file.read())
This is a simple test for file manipulation.
We will often find ourselves interacting with file objects.
It pays to get comfortable with these objects.
>>> in_file.seek(0)
>>> print(in_file.readline())
This is a simple test for file manipulation.
>>> in_file.seek(0)
>>> print(in_file.readlines())
['This is a simple test for file manipulation.\n', 'We will often find ourselves interacting with file objects.\n', 'It pays to get comfortable with these objects.']

In a similar fashion, we can create, or open and overwrite, an existing file using the w file mode. We can use the write() function to write an individual string or the writelines() method to write any iterable object to the file. The writelines() function essentially calls the write() method for each element of the iterable object.

For example, this is tantamount to calling write() on each element of a list:

>>> out_file = open('output.txt', 'w')
>>> out_file.write('Hello output!')
>>> data = ['falken', 124, 'joshua']
>>> out_file.writelines(data)

Python does a great job of closing connections to a file object automatically. However, best practice dictates that we should use the flush() and close() methods after we finish writing data to a file. The flush() method writes any data remaining in a buffer to the file, and the close() function closes our connection to the file object:

>>> out_file.flush()
>>> out_file.close()