How it works…
To begin, we import the required libraries: argparse for argument handling, datetime for interpretation of timestamps, and os to access the stat() method. The sys module is used to identify the platform (operating system) the script is running on. Next, we create our command-line handler, which accepts one argument, FILE_PATH, a string representing the path to the file we will extract metadata from. We assign this input to a local variable before continuing execution of the script:
from __future__ import print_function
import argparse
from datetime import datetime as dt
import os
import sys
__authors__ = ["Chapin Bryce", "Preston Miller"]
__date__ = 20170815
__description__ = "Gather filesystem metadata of provided file"
parser = argparse.ArgumentParser(
    description=__description__,
    epilog="Developed by {} on {}".format(", ".join(__authors__), __date__)
)
parser.add_argument("FILE_PATH",
                    help="Path to file to gather metadata for")
args = parser.parse_args()
file_path = args.FILE_PATH
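As a quick way to experiment with the command-line handler without invoking the script from a shell, the sketch below passes an explicit argument list to parse_args(). The build_parser() helper and the sample path are hypothetical and used only for illustration:

```python
import argparse


def build_parser():
    """Build a parser that accepts a single positional FILE_PATH argument."""
    parser = argparse.ArgumentParser(
        description="Gather filesystem metadata of provided file")
    parser.add_argument("FILE_PATH",
                        help="Path to file to gather metadata for")
    return parser


# Supplying a list makes parse_args() read from it instead of sys.argv.
# The path below is a made-up example and does not need to exist.
args = build_parser().parse_args(["/tmp/example.txt"])
file_path = args.FILE_PATH
print(file_path)
```

Because the positional argument is named FILE_PATH, argparse exposes its value as the args.FILE_PATH attribute.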
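Timestamps are among the most commonly collected file metadata attributes. We can access the creation or change, modification, and access timestamps using the os.stat() function. Note that st_ctime holds the creation time on Windows but the metadata change time on Linux and macOS, which is why the script checks sys.platform before labeling it. Each timestamp is returned as a float representing the number of seconds since 1970-01-01 00:00:00 UTC. Using the datetime.fromtimestamp() method, we convert this value into a readable format in the system's local time zone: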
stat_info = os.stat(file_path)
if "linux" in sys.platform or "darwin" in sys.platform:
    print("Change time: ", dt.fromtimestamp(stat_info.st_ctime))
elif "win" in sys.platform:
    print("Creation time: ", dt.fromtimestamp(stat_info.st_ctime))
else:
    print("[-] Unsupported platform {} detected. Cannot interpret "
          "creation/change timestamp.".format(sys.platform)
          )
print("Modification time: ", dt.fromtimestamp(stat_info.st_mtime))
print("Access time: ", dt.fromtimestamp(stat_info.st_atime))
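Since fromtimestamp() without a time zone argument reports local time, the output depends on the examiner's machine. One possible alternative, shown here as a minimal Python 3 sketch, is to request a timezone-aware UTC value instead. The to_utc_string() helper is a hypothetical name and is not part of the recipe:

```python
from datetime import datetime, timezone


def to_utc_string(timestamp):
    """Convert seconds since the Unix epoch to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


# A timestamp of zero is the epoch itself.
print(to_utc_string(0))  # 1970-01-01T00:00:00+00:00
```

The same helper could be applied to stat_info.st_mtime or stat_info.st_atime to produce output that does not vary with the local time zone.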
We continue printing file metadata following the timestamps. The st_mode and st_ino properties return the file mode and inode number as integers; the mode encodes both the file type and its permission bits. The device ID refers to the device the file resides on. We can split this integer into major and minor device identifiers using the os.major() and os.minor() functions, which are available only on Unix-like systems:
print("File mode: ", stat_info.st_mode)
print("File inode: ", stat_info.st_ino)
major = os.major(stat_info.st_dev)
minor = os.minor(stat_info.st_dev)
print("Device ID: ", stat_info.st_dev)
print("\tMajor: ", major)
print("\tMinor: ", minor)
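The raw st_mode integer is hard to read on its own. As an illustrative sketch, the standard library's stat module can decode it into a file type and permission string. The describe_mode() helper and the sample mode value are assumptions made for this example:

```python
import stat


def describe_mode(mode):
    """Decode an st_mode integer into readable type and permission details."""
    return {
        "permissions": stat.filemode(mode),      # ls-style string
        "octal": oct(stat.S_IMODE(mode)),        # permission bits only
        "is_regular_file": stat.S_ISREG(mode),
        "is_directory": stat.S_ISDIR(mode),
    }


# 0o100644 is a sample value: a regular file with rw-r--r-- permissions.
details = describe_mode(0o100644)
print(details["permissions"])  # -rw-r--r--
print(details["octal"])        # 0o644
```

In the script, the same helper could be called with stat_info.st_mode in place of the sample value.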
The st_nlink property returns a count of the number of hard links to the file. We can print the owner and group information using the st_uid and st_gid properties, respectively. Lastly, we can gather file size using st_size, which returns an integer representing the file's size in bytes.
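Be aware that os.stat() follows symbolic links, so for a symlink the values above, including st_size, describe the target file. If the link itself is examined instead, for example with os.lstat(), the st_size property reflects the length of the path to the target file rather than the target file’s size.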
print("Number of hard links: ", stat_info.st_nlink)
print("Owner User ID: ", stat_info.st_uid)
print("Group ID: ", stat_info.st_gid)
print("File Size: ", stat_info.st_size)
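Whether st_size describes a symbolic link or its target depends on the call that produced the stat result. The sketch below, which assumes a Unix-like system where creating symlinks is permitted, compares os.stat() with os.lstat() on a temporary link. The file names and the compare_link_sizes() helper are hypothetical:

```python
import os
import tempfile


def compare_link_sizes(link_path):
    """Return (size via os.stat, size via os.lstat) for the given path."""
    followed = os.stat(link_path)      # follows the link to its target
    link_itself = os.lstat(link_path)  # describes the link entry itself
    return followed.st_size, link_itself.st_size


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "target.txt")
    link = os.path.join(tmp_dir, "link.txt")
    with open(target, "wb") as handle:
        handle.write(b"x" * 1000)      # 1000-byte target file
    os.symlink(target, link)           # may require privileges on Windows
    target_size, link_size = compare_link_sizes(link)

print("Size via os.stat():  ", target_size)
print("Size via os.lstat(): ", link_size)
```

Here os.stat() reports the 1000 bytes of the target, while os.lstat() reports a much smaller value tied to the stored link path.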
But wait, that’s not all! We can use the os.path module to extract a few more pieces of metadata. For example, we can use it to determine whether a file is a symbolic link, as shown below with the os.path.islink() function. With this, we could alert the user if the st_size attribute is not equivalent to the target file's size. The os.path module can also gather the absolute path and check whether the path exists. We can gather the parent directory using the os.path.dirname() function or by accessing the first element of the tuple returned by os.path.split(). The split() function is more commonly used to acquire the filename from a path:
# Gather other properties
print("Is a symlink: ", os.path.islink(file_path))
print("Absolute Path: ", os.path.abspath(file_path))
print("File exists: ", os.path.exists(file_path))
print("Parent directory: ", os.path.dirname(file_path))
print("Parent directory: {} | File name: {}".format(
    *os.path.split(file_path)))
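To make the behavior of dirname() and split() concrete, here is a small sketch using posixpath, the POSIX flavor of os.path, so the results are the same on any platform. The sample paths are made up for illustration:

```python
import posixpath

# Hypothetical path used only to illustrate the return values.
sample_path = "/home/user/evidence/report.docx"

# split() returns a (parent directory, file name) tuple.
parent, name = posixpath.split(sample_path)
print(parent)  # /home/user/evidence
print(name)    # report.docx

# dirname() matches the first element of the split() result.
print(posixpath.dirname(sample_path) == parent)  # True

# For a bare file name there is no directory component to return.
bare_parent = posixpath.dirname("report.docx")
print(repr(bare_parent))  # ''
```

Because a bare file name yields an empty parent directory, it may be worth passing the path through os.path.abspath() first when the parent directory matters.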
By running the script, we can view relevant metadata about the file. Notice how the format() method allows us to print values without concern for their data types. Normally, we would have to convert integers and other data types to strings first if we were to concatenate them with strings rather than use string formatting: