Learning Python for Forensics

Understanding the main() function

Let's start by examining the main() function, which is called on line 90, as seen in the previous code block. This function, defined on line 42, requires the vid and pid information supplied by the user's arguments for resolution in the usb.ids database. On lines 43 through 46, we create our initial variables. The url variable stores the URL containing the USB data source. We use the urlopen() function from the urllib module to open our online source as a file-like object that we can read one line at a time. We will use a lot of string operations, such as startswith(), isalnum(), islower(), and count(), to parse the usb.ids file structure and store the parsed data in the usbs dictionary. The curr_id variable, defined as an empty string on line 46, will be used to keep track of which vendor we are currently processing in our script:

042 def main(vid, pid):
043     url = 'http://www.linux-usb.org/usb.ids'
044     usbs = {}
045     usb_file = urlopen(url)
046     curr_id = ''
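As a rough illustration of what the loop will receive, the sketch below shows one possible Python 2 and 3 compatible import of urlopen() and uses an in-memory io.BytesIO object as a stand-in for the network response, so it runs without a connection. The import fallback and the sample content are assumptions made for illustration, not the script's actual import block or real usb.ids data.

```python
import io

# One possible compatibility import (an assumption, not the book's code)
try:
    from urllib.request import urlopen  # Python 3 location
except ImportError:
    from urllib2 import urlopen  # Python 2 location

# Stand-in for the object returned by urlopen(url); hypothetical content
fake_response = io.BytesIO(b'# comment\n1234  Example Vendor Inc.\n')

# Iterating a file-like object yields one line at a time, newline included
lines = [line for line in fake_response]
```

Each item arrives as bytes rather than text, which is why decoding matters before any string parsing takes place.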

An important concept in Python string manipulation is encoding. This is one of the most common issues when writing Python 2 and Python 3 compatible code. The for loop on line 48 iterates over each line in the file, providing the line for review. For Python 3 support, we have to check whether the line variable is an instance of bytes, a raw data type that (in this case) is holding encoded string data. If this is the case, we must decode it using the decode() method and provide the proper encoding, latin-1 in this instance, as seen on line 50. In Python 2, bytes is simply another name for str, so lines read there pass the same check and are decoded into unicode strings; either way, we end up with text and can move forward with parsing the line:

048     for line in usb_file:
049         if isinstance(line, bytes):
050             line = line.decode('latin-1')
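The decoding check can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper named to_text() and made-up sample values; it is not part of the script itself.

```python
def to_text(line, encoding='latin-1'):
    """Return line as text, decoding it first if it arrived as bytes."""
    if isinstance(line, bytes):
        return line.decode(encoding)
    return line

# Hypothetical sample values for illustration
decoded = to_text(b'1234  Example Vendor Inc.\n')
accented = to_text(b'caf\xe9\n')           # byte 0xE9 is 'é' in latin-1
already_text = to_text(u'# comment line\n')  # text passes through unchanged
```

Because latin-1 maps every possible byte value to a character, decoding with it does not raise an error on unexpected bytes, which makes it a forgiving choice for a file of unknown cleanliness.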

Our next conditional skips lines we do not need to parse: blank lines (containing only a newline or tab character) and comment lines, which begin with a pound character. To check for comment lines, we can use the startswith() string method, which reports whether the line begins with the provided string of one or more characters. To simplify our code, we also leverage the in operator against a tuple, which gives us an or-like equality comparison of the line against each listed value. This is a handy shortcut you will see in a variety of scripts. If either of these conditions is true, we use the continue statement, as seen on line 52, to skip to the next loop iteration:

051         if line.startswith('#') or line in ('\n', '\t'):
052             continue
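To see how this filter behaves, here is a small sketch that wraps the same condition in a hypothetical should_skip() function and runs it over made-up lines written in the style of usb.ids.

```python
def should_skip(line):
    """True for comment lines and for lines that are only '\\n' or '\\t'."""
    # Membership in a tuple is an equality test against each element,
    # equivalent to: line == '\n' or line == '\t'
    return line.startswith('#') or line in ('\n', '\t')

# Hypothetical sample lines for illustration
samples = [
    '# List of USB IDs\n',        # comment
    '\n',                         # blank line
    '\t',                         # lone tab
    '1234  Example Vendor Inc.\n',
    '\t5678  Example Widget\n',
]
skipped = [should_skip(s) for s in samples]
# skipped -> [True, True, True, False, False]
```

Note that testing a string against a tuple checks for an exact match, whereas '\t' in line would check for a tab anywhere inside the line, which is a different question.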

The second half of our conditional handles additional validation of the line format. We want to confirm that the line we are inspecting matches the format of a vendor line, so we can include our vendor-related parsing code within it. To do this, we check to make sure the line does not start with a tab character and the first character is alphanumeric with the isalnum() call:

053         else:
054             if not(line.startswith('\t')) and line[0].isalnum():

Knowing that the line passed our check for confirming it is a vendor informational line, we can start extracting the needed values and fill out our data structure. On line 55, we extract our two values from the line, uid and name, by stripping the line and using the split() method. The split() method takes two arguments here: the string to split on and the maximum number of splits to perform. In this case, we split on a space character and only at the first space found, so the rest of the line stays in one piece.

This is useful, as our vendor name may contain a space in it and we want to keep those details together. Since we anticipate two values returning, we can use the assignment seen on line 55 to simultaneously populate the uid and name variables with the correct values, though this can lead to errors if the split() method only returns one object. In this instance, we know our data source and have validated that this should always return two values, though this is a great spot to add a try-except block in your version of the code to handle any errors that may arise.

We then assign the uid value to the curr_id variable on line 56 for use while parsing PID details. Finally, on line 57, we add this information to our data structure, usbs. Since the usbs structure is a dictionary, we use the VID's uid value as the key and set its value to a list with the VID common name as the first element and an empty dictionary for product details as the second. Also on line 57, we ensure that the vendor name does not carry any unwanted whitespace characters by calling the strip() method on the string:

055                 uid, name = line.strip().split(' ', 1)
056                 curr_id = uid
057                 usbs[uid] = [name.strip(), {}]
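The try-except suggestion mentioned earlier could look something like the sketch below. The parse_id_line() helper and the sample lines are hypothetical, and the sample assumes the ID and name are separated by two spaces, which is why the extra strip() on the name is useful.

```python
def parse_id_line(line):
    """Split an 'ID  name' line into (uid, name); return None if malformed."""
    try:
        # Split only at the first space so names containing spaces stay whole
        uid, name = line.strip().split(' ', 1)
    except ValueError:
        # split() produced fewer than two items, so unpacking failed
        return None
    return uid, name.strip()

# Hypothetical sample lines for illustration
good = parse_id_line('1234  Example Vendor Inc.\n')
bad = parse_id_line('1234\n')
```

Returning None for a malformed line is one design choice among several; logging the line or raising a more descriptive error would be equally reasonable depending on how strict the script should be.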

Now that we have processed the vendor data pattern, let's turn our attention to the product data pattern. First, we will use an elif conditional to check that the line does start with a tab character and, using the count() method, ensure that it is the only tab character in the line. On line 59, we make a familiar call to strip and split the line into our required values. On line 60, we then add the product information to our data structure. As a quick refresher, usbs is a dictionary, where the keys are VIDs. Within a VID's value is a list where element zero is the vendor name and element one is the dictionary to store PID details. As expected, we will use the uid value as the key for the product details and assign the product name to the PID key. Notice how we use the curr_id value from the prior vendor line to ensure we are correlating the VIDs and PIDs properly:

058             elif line.startswith('\t') and line.count('\t') == 1:
059                 uid, name = line.strip().split(' ', 1)
060                 usbs[curr_id][1][uid] = name.strip()
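To make the resulting structure concrete, the following sketch applies the same vendor and product logic to a short in-memory list. The build_usbs() wrapper and every ID and name in SAMPLE are invented for illustration; the real script reads its lines from the downloaded file instead.

```python
# Hypothetical excerpt written in the style of usb.ids
SAMPLE = [
    '# hypothetical excerpt\n',
    '\n',
    '1234  Example Vendor Inc.\n',
    '\t5678  Example Widget\n',
    '\t\t01  Example Interface\n',  # two tabs: matched by neither branch
    '\t9abc  Example Gadget\n',
    'abcd  Another Vendor\n',
    '\tef01  Another Device\n',
]

def build_usbs(lines):
    """Build {vid: [vendor_name, {pid: product_name}]} from text lines."""
    usbs = {}
    curr_id = ''
    for line in lines:
        if line.startswith('#') or line in ('\n', '\t'):
            continue
        if not line.startswith('\t') and line[0].isalnum():
            # Vendor line: remember its ID for the product lines that follow
            uid, name = line.strip().split(' ', 1)
            curr_id = uid
            usbs[uid] = [name.strip(), {}]
        elif line.startswith('\t') and line.count('\t') == 1:
            # Product line: file it under the most recent vendor
            uid, name = line.strip().split(' ', 1)
            usbs[curr_id][1][uid] = name.strip()
    return usbs

usbs = build_usbs(SAMPLE)
vendor_name, products = usbs['1234']
```

Here vendor_name holds the vendor's common name and products maps each PID to its product name, so a lookup such as usbs['1234'][1]['5678'] resolves a VID and PID pair in two dictionary accesses.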

The previous lines then repeat in a for loop until the end of the file is reached, parsing out the vendor and product details and adding them into the usbs dictionary.

We are almost there. The last part of our main() function is a call to the search_key() function, which takes the user-supplied vid and pid information, along with our newly built usbs dictionary for lookup. Notice how this call is indented with four spaces, placing it outside of the for loop and allowing us to call this function only once, after the usbs lookup dictionary is complete:

062     search_key(vid, pid, usbs)

This takes care of the logic in the main() function. Now, let's take a look at the search_key() function to determine how we will look up our VID and PID values.