Learning Python for Forensics

Parsing Text Files

Text files, usually sourced from application or service logs, are a common source of artifacts in digital investigations. Log files can be quite large or contain data that makes human review difficult. A manual examination can devolve into a series of grep searches, which may or may not be fruitful; additionally, prebuilt tools may not support a specific log file format. In these instances, we need to develop our own solution to properly parse and extract the relevant information. In this chapter, we will analyze the setupapi.dev.log file, which records device information on Windows machines. This log file is commonly examined because it records the first connection time of USB devices on the system.

We will step through several iterations of the same code throughout this chapter. Though this is redundant, we encourage you to write out and test each iteration for yourself so that you can see how the output and the code's behavior change. By rewriting the code, we will progress through the material together, arrive at a more fitting solution, learn how to handle bugs, and implement efficiency measures.

In this chapter, we will be covering the following topics:

  • Identifying repetitive patterns in the setupapi.dev.log file for USB device entries
  • Extracting and processing artifacts from text files
  • Iteratively improving our script design and features
  • Enhancing the presentation of data in a deduplicated and readable manner

The code for this chapter was developed and tested using Python 2.7.15 and Python 3.7.1.
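
To make these goals concrete, here is a minimal sketch of the kind of extraction the chapter works toward. It is not the script developed in the following sections. It assumes that a device entry begins with a header line of the form `>>>  [Device Install (Hardware initiated) - <device ID>]` followed by a `>>>  Section start <timestamp>` line; the layout can differ between Windows versions. The sample lines, the device serial number, and the function name `first_usb_connections` are hypothetical and used only for illustration.

```python
from __future__ import print_function

from datetime import datetime

# Hypothetical excerpt for illustration only; not taken from a real system.
SAMPLE_LINES = [
    r">>>  [Device Install (Hardware initiated) - USB\VID_0781&PID_5530\0123456789ABCDEF]",
    ">>>  Section start 2015/11/04 19:07:06.621",
    "<<<  Section end 2015/11/04 19:07:07.104",
    r">>>  [Device Install (Hardware initiated) - USB\VID_0781&PID_5530\0123456789ABCDEF]",
    ">>>  Section start 2016/01/02 08:15:00.000",
]


def first_usb_connections(lines):
    """Map each USB device ID to its earliest 'Section start' timestamp."""
    first_seen = {}
    device_id = None
    for raw_line in lines:
        line = raw_line.strip()
        if line.startswith(">>>") and "[" in line:
            # Any new section header discards the previously tracked device.
            device_id = None
            if "Device Install" in line and " - " in line:
                # The device ID follows ' - ' and ends at the closing bracket.
                device_id = line.split(" - ", 1)[1].rstrip("]")
        elif device_id and line.startswith(">>>") and "Section start" in line:
            text = line.split("Section start", 1)[1].strip()
            stamp = datetime.strptime(text, "%Y/%m/%d %H:%M:%S.%f")
            # Keep only USB entries, deduplicated to the earliest timestamp.
            if device_id.upper().startswith("USB\\"):
                if device_id not in first_seen or stamp < first_seen[device_id]:
                    first_seen[device_id] = stamp
            device_id = None
    return first_seen


result = first_usb_connections(SAMPLE_LINES)
for usb_id, stamp in sorted(result.items()):
    print(usb_id, stamp.isoformat())
```

With a real log, the list of sample lines could be replaced by an open file object, since the function only iterates over lines. Keeping the earliest timestamp per device ID is one simple way to deduplicate repeated entries for the same device.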