Developing our first forensic script – usb_lookup.py
Now that we've gotten our feet wet writing our first Python script, let's write our first forensic script. During forensic investigations, it is not uncommon to see external devices referenced by their vendor identifier (VID) and product identifier (PID) values; each value is represented by four hexadecimal characters. In cases where the vendor and product names are not identified, the examiner must look up this information. One source for this information is the following web page: http://linux-usb.org/usb.ids. For example, on this web page, we can see that a Kingston DataTraveler G3 has a VID of 0951 and a PID of 1643. We will use this data source to resolve vendor and product names from their identifiers.
First, let's look at the data source we're going to be parsing. A hypothetical sample illustrating its structure is shown below. There are USB vendors and, for each vendor, a set of USB products. Each vendor or product has a four-character hexadecimal identifier and a name. Vendor and product lines are distinguished by tabs: each product line is tabbed over once under its parent vendor. As a forensic developer, you will come to love patterns and data structures, as it is a happy day when data follows a strict set of rules. Because of this, we will be able to preserve the relationship between the vendor and its products in a simple manner. Here is the aforementioned hypothetical sample:
0001 Vendor Name
	0001 Product Name 1
	0002 Product Name 2
	...
	000N Product Name N
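To make the tab rule concrete, here is a minimal sketch of how a single line could be classified as a vendor or a product. The function name classify_line() is hypothetical and is not part of usb_lookup.py. The sketch assumes that comment lines begin with # and that lines indented by two or more tabs are not products, and it skips both.

```python
import string


def classify_line(line):
    """Return (kind, identifier, name) for a vendor or product line.

    kind is 'vendor' or 'product'. Returns None for blank lines,
    comments, deeper-nested lines, and anything without a 4-hex-char ID.
    """
    if not line.strip() or line.startswith('#'):
        return None  # blank line or comment
    if line.startswith('\t\t'):
        return None  # nested deeper than a product line
    # A single leading tab marks a product; no tab marks a vendor.
    kind = 'product' if line.startswith('\t') else 'vendor'
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        return None
    ident, name = parts
    # Identifiers are exactly four hexadecimal characters.
    if len(ident) != 4 or not all(c in string.hexdigits for c in ident):
        return None
    return kind, ident, name


sample = "0001 Vendor Name\n\t0001 Product Name 1\n\t0002 Product Name 2\n"
records = [classify_line(line) for line in sample.splitlines()]
```

Checking the identifier as well as the indentation means that a line which merely looks like a vendor or product line, but does not start with a four-character hexadecimal value, is ignored rather than misread.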
This script, named usb_lookup.py, takes a VID and PID supplied by the user and returns the appropriate vendor and product names. Our program uses the urlopen() function to download the usb.ids database to memory and create a dictionary of VIDs and their products. This function lives in urllib2 in Python 2 and in urllib.request in Python 3. Since this is one of the libraries that changed between versions 2 and 3 of Python, we have introduced some logic in a try and except block to ensure we are able to call urlopen() without issue, as shown in the following code. We also import the argparse module to allow us to accept VID and PID information from the user:
001 """Script to lookup USB vendor and product values."""
002 from __future__ import print_function
003 try:
004     from urllib2 import urlopen
005 except ImportError:
006     from urllib.request import urlopen
007 import argparse
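As a rough illustration of how the imported urlopen() might be used to pull the database into memory, consider the following sketch. The names USB_IDS_URL and fetch_lines(), the opener parameter, and the choice of UTF-8 decoding with replacement of undecodable bytes are all assumptions made for this example and may differ from the actual script.

```python
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3

# Hypothetical constant; the URL itself is the one named in the text.
USB_IDS_URL = 'http://linux-usb.org/usb.ids'


def fetch_lines(url=USB_IDS_URL, opener=urlopen):
    """Download url and return its contents as a list of text lines.

    opener is injectable so the function can be exercised offline.
    """
    response = opener(url)
    try:
        raw = response.read()
    finally:
        response.close()
    # urlopen() yields bytes on Python 3; decode defensively (assumed
    # encoding) so that a stray byte does not abort the whole lookup.
    if isinstance(raw, bytes):
        raw = raw.decode('utf-8', 'replace')
    return raw.splitlines()
```

Calling fetch_lines() with no arguments would perform the real download. Passing a stand-in opener makes it possible to test the parsing logic without network access.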
If a vendor and product combination is not found, error handling will inform the user of any partial results and exit the program gracefully.
The main() function contains the logic to download the usb.ids file, store it in memory, and create the USB dictionary. The structure of the USB dictionary is somewhat complex and involves mapping a VID to a list, containing the name of the vendor as the first element, and a product dictionary as the second element. This product dictionary maps PIDs to their names. The following is an example of the USB dictionary containing two vendors, VendorId_1 and VendorId_2, each mapped to a list containing the vendor name, and a dictionary for any product ID and name pairs:
usbs = {
    VendorId_1: [
        VendorName_1,
        {ProductId_1: ProductName_1,
         ProductId_2: ProductName_2,
         ProductId_N: ProductName_N}
    ], VendorId_2: [
        VendorName_2,
        {ProductId_1: ProductName_1}
    ], ...
}
It may be tempting to just search for VID and PID in the lines and return the names rather than creating this dictionary that links vendors to their products. However, products can share the same ID across different vendors, which could result in mistakenly returning a product from a different vendor. With our previous data structure, we can be sure that the product belongs to the associated vendor.
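The following sketch shows one way such a dictionary could be built from the lines of the data source. It is not the script's actual main() logic. The function name build_usb_dict() is hypothetical, and the sketch assumes that any unindented line without a four-character hexadecimal identifier begins some other section whose tabbed entries should be ignored.

```python
import string


def build_usb_dict(lines):
    """Map each VID to [vendor_name, {PID: product_name}]."""
    usbs = {}
    current_vid = None  # vendor that subsequent product lines belong to
    for line in lines:
        # Skip blanks, comments, and lines nested deeper than products.
        if (not line.strip() or line.startswith('#')
                or line.startswith('\t\t')):
            continue
        parts = line.strip().split(None, 1)
        is_id = (len(parts) == 2 and len(parts[0]) == 4
                 and all(c in string.hexdigits for c in parts[0]))
        if line.startswith('\t'):
            # Product line: attach it to the most recent vendor.
            if is_id and current_vid is not None:
                usbs[current_vid][1][parts[0]] = parts[1]
        elif is_id:
            # Vendor line: start a new [name, products] entry.
            current_vid = parts[0]
            usbs[current_vid] = [parts[1], {}]
        else:
            # Some other top-level line; stop attributing products.
            current_vid = None
    return usbs


lines = [
    "0001 Vendor A",
    "\t0001 Widget",
    "0002 Vendor B",
    "\t0001 Gadget",
]
usbs = build_usb_dict(lines)
```

In this example, both vendors have a product with the PID 0001, yet each product remains under its own vendor, which is exactly the ambiguity that a flat search of the file could not resolve.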
Once the USB dictionary has been created, the search_key() function is responsible for querying the dictionary for a match. It first assigns the two user-supplied arguments, VID and PID, before continuing with the execution of the script. Next, it searches for a VID match in the outermost dictionary. If the VID is found, the innermost dictionary is searched for the matching PID. If both are found, the resolved names are printed to the console. Lastly, starting at line 81, we define our arguments for the user to provide the VID and PID values before calling the main() function:
042 def main():
...
065 def search_key():
...
080 if __name__ == '__main__':
081     parser = argparse.ArgumentParser(
082         description=__description__,
083         epilog='Built by {}. Version {}'.format(
084             ", ".join(__authors__), __date__),
085         formatter_class=argparse.ArgumentDefaultsHelpFormatter
086     )
087     parser.add_argument('vid', help="VID value")
088     parser.add_argument('pid', help="PID value")
089     args = parser.parse_args()
090     main(args.vid, args.pid)
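The two-step lookup described above, including the reporting of partial results, can be sketched as follows. This is an illustration and not the body of search_key(). The names find_names() and describe(), the message wording, and the lowercasing of the user's input (on the assumption that the dictionary keys are lowercase) are all hypothetical choices.

```python
def find_names(vid, pid, usbs):
    """Return (vendor_name, product_name); either may be None."""
    # Assumption: dictionary keys are lowercase hexadecimal strings.
    vid, pid = vid.lower(), pid.lower()
    entry = usbs.get(vid)            # search the outermost dictionary
    if entry is None:
        return None, None
    vendor_name, products = entry
    return vendor_name, products.get(pid)  # search the inner dictionary


def describe(vid, pid, usbs):
    """Build a message covering full, partial, and failed lookups."""
    vendor, product = find_names(vid, pid, usbs)
    if vendor is None:
        return 'Vendor ID {} not found'.format(vid)
    if product is None:
        # Partial result: the vendor resolved but the product did not.
        return 'Vendor: {}\nProduct ID {} not found'.format(vendor, pid)
    return 'Vendor: {}\nProduct: {}'.format(vendor, product)
```

Returning the names instead of printing them directly keeps the lookup easy to test. A caller can then print the message and decide on an appropriate exit status.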
For larger scripts, such as this one, it is helpful to view a diagram that illustrates how the functions are connected. Fortunately, a library named code2flow, available on GitHub (https://github.com/scottrogowski/code2flow.git), exists to automate this process for us. The following schematic illustrates the flow from the main() function to the search_key() function. There are other libraries that can create similar flowcharts. However, this library does a great job of creating a simple, easy-to-understand flowchart: