The Examiner’s Toolbox is a series of posts designed to explore technical aspects of digital investigations and the challenges faced in today’s ever-evolving technical world. Each post will focus on a different forensic artifact and explain how to programmatically extract vital information. In this blog post, we illustrate how to extract data from Prefetch files with Python.
Computer applications like email, word processors, web browsers and other enterprise software programs are fundamental to the day-to-day operations of most companies. However, these same business-enabling tools can also be used to cause your company harm. For example, a rogue employee might make an unauthorized copy of confidential business information, or an employee may unintentionally download a malicious file that infects your environment with malware. In these scenarios and many others, a forensic investigation seeking indicators, or artifacts, pointing to when the applications of interest have been used can provide key evidence in recreating a user’s actions.
A Prefetch file, first introduced with Windows XP, is a well-documented forensic artifact and one of many that can shed light on recent application usage. Prefetch is designed to decrease the amount of time it takes to open frequently used applications. This feature is enabled by default on Windows operating systems unless the operating system is installed on a solid state drive. The EnablePrefetcher registry value determines if and at what level Prefetch is enabled. This value can have a number ranging from zero to three, where zero indicates that Prefetch is disabled and any other number indicates Prefetch is enabled in some capacity.
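The meaning of each EnablePrefetcher value can be captured in a small lookup table. The sketch below is illustrative only: the helper names are hypothetical, and the per-value descriptions follow the commonly documented meanings (1 for application launch, 2 for boot, 3 for both) rather than anything stated in this post.

```python
# Commonly documented meanings of the EnablePrefetcher registry value.
# The names below are hypothetical helpers written for this illustration.
ENABLE_PREFETCHER_LEVELS = {
    0: "Prefetch disabled",
    1: "Application launch prefetching enabled",
    2: "Boot prefetching enabled",
    3: "Application launch and boot prefetching enabled",
}


def describe_enable_prefetcher(value):
    """Return a human-readable description of an EnablePrefetcher value."""
    return ENABLE_PREFETCHER_LEVELS.get(value, "Unknown value: {}".format(value))


def prefetch_enabled(value):
    """Any value from one to three means Prefetch is enabled in some capacity."""
    return value in (1, 2, 3)


print(describe_enable_prefetcher(3))
```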
Prefetch files are located in the C:\Windows\Prefetch folder and created when an application runs from a location on the file system for the first time. On Windows XP through Windows 7 systems, this folder is limited to the most recent 128 Prefetch files. This limit was increased to 1024 with the introduction of Windows 8, providing additional application history. Prefetch files have a strict but simple naming convention where the filename is made up of the name of the executable, a dash, a hash representing a fingerprint of the location where the executable resides, and a “.pf” extension.
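The naming convention lends itself to a simple pattern match. As a minimal sketch, assuming the hash is always written as eight hexadecimal digits (it is stored as four bytes inside the file), a hypothetical split_prefetch_name() helper could separate the two parts of the filename:

```python
import re

# Executable name, a dash, eight hexadecimal digits, and the ".pf" extension.
PREFETCH_NAME = re.compile(
    r"^(?P<executable>.+)-(?P<hash>[0-9A-F]{8})\.pf$", re.IGNORECASE
)


def split_prefetch_name(filename):
    """Split a Prefetch filename into (executable name, location hash)."""
    match = PREFETCH_NAME.match(filename)
    if match is None:
        raise ValueError("Not a Prefetch filename: {}".format(filename))
    return match.group("executable"), match.group("hash").upper()


print(split_prefetch_name("SKYDRIVE-3C7833DC.pf"))
```

Because the first group is greedy, an executable name that itself contains a dash is still split at the final dash before the hash.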
Prefetch files contain a wealth of information, including the executable name and path, last execution time, number of executions, files referenced by the executable, and more. This information is embedded within the Prefetch file and must be extracted for review. Performing this extraction by hand would be tedious and woefully inefficient in today’s fast-paced investigations. Instead, we will illustrate how to use Python, a programming language, to automate this process. With Python, extracting and interpreting meaningful information from raw binary data is straightforward.
Prefetch files have a defined structure that varies slightly with different versions of Windows. By defined structure, we refer to offsets within the file at which we can locate and extract specific data elements. The table below contains relevant offsets specific to Windows 7 Prefetch files.
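Windows 7 Prefetch File Offsets

|Name|Byte Offset|Length (Bytes)|
|---|---|---|
|Prefetch Version|0|4|
|Executable File Name|16|60|
|Prefetch Hash|76|4|
|Last Execution FILETIME|128|8|
|Execution Count|152|4|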
All digital data is stored as a sequence of 1s and 0s, which is commonly displayed as hexadecimal (hex) values. Our code consists of the following steps:
- Read a sequence of bytes as hex from an open file;
- Interpret that hex as a specified data type; and
- Present it in a human-readable format for review.
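To make the relationship between bits, hex, and interpreted values concrete, the short sketch below shows the same four bytes in all three forms. The sample bytes are made up for illustration, and the little-endian byte order is an assumption that matches how Windows stores integers.

```python
# Four sample bytes, written here with hex escapes (illustrative data only).
raw = b"\x17\x00\x00\x00"

# The same data shown as bits, as hex, and as a little-endian integer.
as_bits = " ".join("{:08b}".format(byte) for byte in raw)
as_hex = raw.hex()
as_int = int.from_bytes(raw, byteorder="little")

print("Bits:   ", as_bits)
print("Hex:    ", as_hex)
print("Integer:", as_int)
```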
One strength of Python is its extensive collection of libraries, which extend the language's native capabilities. We will use Python’s struct library to interpret our raw hex into human-readable values.
Before we can do anything, we must open our Prefetch file, SKYDRIVE-3C7833DC.pf, for reading. As seen in the code block below, the “rb” argument opens the specified file in “read binary” mode allowing us to only read, and not write, data to the file. (Remember to follow best practices and work from a copy of the evidence and/or employ write-blocking techniques.) The handle to this open file is stored in the pf_file variable. We will use this variable to read data from the open file.
import binascii
import datetime
import struct

# Open the file
pf_file = open("SKYDRIVE-3C7833DC.pf", 'rb')
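A common alternative is to open the file in a with block so that it is closed automatically once the data has been read. The sketch below wraps that pattern in a hypothetical read_bytes() helper; it is not part of the walkthrough, which continues to use the pf_file handle.

```python
def read_bytes(path, offset, length):
    """Read `length` bytes starting at `offset`, then close the file."""
    # "rb" opens the file read-only in binary mode.
    with open(path, "rb") as handle:
        handle.seek(offset, 0)
        return handle.read(length)
```

For example, read_bytes("SKYDRIVE-3C7833DC.pf", 0, 4) would return the four raw bytes that hold the Prefetch version.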
With the Prefetch file open, we can begin extracting relevant data. The first four bytes of the file, when interpreted as an integer, give the version of the Prefetch file. We store the raw data in the raw_version variable before interpreting it as an integer using the struct library.
# Read the Prefetch Version
## Offset 0, Length 4
raw_version = pf_file.read(4)
Struct uses format characters to specify how to interpret binary data. The three format characters we use are “i” to interpret 4 bytes as a 32-bit integer, “q” to interpret 8 bytes as a 64-bit integer, and “s” to interpret 1 byte as a string. The struct.unpack() function requires two inputs: the format character and the raw data to interpret. We supply the “i” format character to interpret the raw_version variable. After doing so, we print the results to the console. The version number differs across versions of Windows. For Windows 7 Prefetch files, this version number is the decimal integer 23, represented by the hexadecimal value 0x17.
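Once the version has been interpreted as an integer, it can be mapped to the Windows releases that produce it. Only the Windows 7 value of 23 comes from this post; the other entries below are taken from public descriptions of the Prefetch format and should be treated as assumptions, as should the hypothetical describe_version() helper.

```python
# Version numbers as listed in public descriptions of the Prefetch format.
# Only the value 23 (Windows 7) is confirmed by this post.
PREFETCH_VERSIONS = {
    17: "Windows XP / Server 2003",
    23: "Windows Vista / 7",
    26: "Windows 8 / 8.1",
    30: "Windows 10",
}


def describe_version(version):
    """Return the Windows release(s) associated with a Prefetch version."""
    return PREFETCH_VERSIONS.get(version, "Unknown version: {}".format(version))


print(describe_version(0x17))
```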
Struct is often used to interpret an entire byte stream with multiple data types. Therefore, results are returned in a tuple no matter how many values are actually interpreted. With any sequence, such as a tuple, we can retrieve elements by their index. In Python, elements are referenced by incrementing indices starting at zero. Therefore, we specify the zero index, “[0]”, of the tuple returned by the struct.unpack() function and store it in the pf_version variable.
pf_version = struct.unpack('i', raw_version)[0]
print("Prefetch Version:", pf_version)
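The behaviour described above can be tried on in-memory bytes without a Prefetch file. In this sketch the sample bytes are made up, and the "<" prefix is an addition that makes the little-endian byte order explicit instead of relying on the byte order of the machine running the code.

```python
import struct

# Made-up sample: the integer 23 followed by the integer 13, little-endian.
sample = b"\x17\x00\x00\x00\x0d\x00\x00\x00"

# unpack() always returns a tuple, even for a single value.
as_tuple = struct.unpack("<i", sample[:4])
first_value = as_tuple[0]

# Several values can be interpreted in one call.
both_values = struct.unpack("<ii", sample)

# calcsize() reports how many bytes a format string consumes.
sizes = (struct.calcsize("<i"), struct.calcsize("<q"), struct.calcsize("<60s"))

print(as_tuple, first_value, both_values, sizes)
```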
The next value we access is the file name, found at offset 16. To skip the intervening space between the beginning of the file and offset 16, we must move the cursor to that position before reading data. To do so, we use the file seek() function as demonstrated below. This function requires an offset to seek to and, optionally, the relative point to seek from. In this case, 16 is the offset we want to seek to and 0 specifies that we want to seek 16 bytes from the beginning of the file. If we replaced 0 with 1, we would instead seek 16 bytes from our current position and end up at offset 20. After our seeking operation, we read 60 bytes into the raw_file_name variable and interpret them with struct, storing the result in the processed_file_name variable. While we could type the “s” format character 60 times, struct provides a shortcut: an integer prefix specifies how many bytes to read and combines them into a single value.
# Read File Name
## Offset 16, Length 60
pf_file.seek(16, 0)
raw_file_name = pf_file.read(60)
processed_file_name = struct.unpack('60s', raw_file_name)[0]
Because our executable name is shorter than the 60-byte field, the remaining bytes are padding with the value 0x00, which we need to remove. While this padding can be removed in several ways, we use the string strip() function. The strip() function takes a string of characters and removes every occurrence of those characters from the beginning and end of the string. To support non-English characters such as Japanese, Russian, and even emoji, many programs store text in a Unicode encoding known as UTF-16. To present this information in an easy-to-read format, we use the decode() method to convert the UTF-16 bytes into readable text before we remove the extra padding. Once cleaned up, we print the extracted filename, SKYDRIVE.EXE, to the console.
# Decode the UTF-16 name, then remove the 0x00 padding that fills the field up to byte 0x4C
pf_filename = processed_file_name.decode("utf-16-le").strip("\x00")
print("Executable File Name:", pf_filename)
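Both the seek behaviour and the name clean-up can be checked on an in-memory buffer. The sketch below builds a made-up 76-byte stand-in for the start of a Prefetch file using io.BytesIO. It cuts the name at the first null character instead of calling strip(), a slightly more defensive variant in case anything other than padding follows the terminator.

```python
import io

# Made-up stand-in: 16 filler bytes, then a 60-byte UTF-16 name field.
name_field = "SKYDRIVE.EXE".encode("utf-16-le").ljust(60, b"\x00")
buffer = io.BytesIO(b"\x00" * 16 + name_field)

buffer.read(4)                      # cursor is now at offset 4
buffer.seek(16, 1)                  # relative seek: 4 + 16
relative_position = buffer.tell()

buffer.seek(16, 0)                  # absolute seek from the start of the file
absolute_position = buffer.tell()

raw_name = buffer.read(60)
decoded = raw_name.decode("utf-16-le")

# Keep only the text before the first null character.
name = decoded.split("\x00", 1)[0]

print(relative_position, absolute_position, name)
```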
Our next data point follows the now familiar process of seeking, reading, and then interpreting binary data. After moving our cursor to offset 76 and reading 4 bytes, we perform multiple conversions on the extracted information to interpret the hash. This process is required to put the bytes in the appropriate order. Beginning with the innermost struct.unpack() call, we extract the raw information as a 32-bit integer. The angle brackets allow us to specify the byte order, and by using the right-pointing bracket we interpret the data as a big-endian integer. We then use the inverse function, struct.pack(), to transform the big-endian interpreted data back into raw bytes, reversing the order with the left-pointing bracket. Finally, we convert these bytes into a string of hexadecimal characters, using the binascii library, and convert all letters to uppercase. When we print the hash, it displays “3C7833DC” instead of “0xdc33783c”, which matches the hash in the Prefetch filename.
# Read Prefetch Hash
## Offset 76, Length 4
pf_file.seek(76, 0)
raw_hash = pf_file.read(4)
# Reverse the byte order and format as an uppercase hex string
pf_hash = binascii.hexlify(struct.pack('<i', *struct.unpack('>i', raw_hash))).decode("UTF-8").upper()
print("Prefetch Hash:", pf_hash)
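The byte swap above can be cross-checked against two shorter formulations. This sketch uses made-up input matching the hash in the filename and compares the pack/unpack approach with reading the bytes directly as an unsigned little-endian integer and with simply reversing the bytes; the alternatives are offered as equivalents, not as the method used in the walkthrough.

```python
import binascii
import struct

# The four hash bytes in the order they are stored on disk (sample data).
raw_hash = bytes.fromhex("DC33783C")

# Approach used in the walkthrough: unpack big-endian, repack little-endian.
swapped = binascii.hexlify(
    struct.pack("<i", *struct.unpack(">i", raw_hash))
).decode("ascii").upper()

# Alternative 1: read as an unsigned little-endian integer and format as hex.
direct = "{:08X}".format(struct.unpack("<I", raw_hash)[0])

# Alternative 2: reverse the bytes and hex-encode them.
reversed_bytes = raw_hash[::-1].hex().upper()

print(swapped, direct, reversed_bytes)
```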
The last execution time is perhaps one of the most useful pieces of information embedded within the Prefetch file. To interpret it, we must seek to the offset, 128, and read 8 bytes of data. We will interpret this data as a 64-bit integer using struct.unpack() and the “q” formatting string.
# Read Last Execution Time
## Offset 128, Length 8
pf_file.seek(128, 0)
raw_timestamp = pf_file.read(8)
processed_timestamp = struct.unpack('q', raw_timestamp)[0]
Microsoft commonly stores dates as a count of 100-nanosecond intervals since January 1, 1601. This type of timestamp is referred to as FILETIME. While this format is convenient for machines, it is significantly more difficult for humans to read. We convert the integer using the built-in datetime library: we divide the stored value by ten to obtain a rounded number of microseconds and add that interval to January 1, 1601. Afterwards, we print the converted date value (2016-07-04 02:56:26.969814) to the console. The file system dates can also provide some insight into application usage. Generally, the creation time reflects the first execution time and the modified time reflects the most recent. Since there are conditions under which these timestamps can be modified, it is best to refer to the internal value where possible.
# Convert date from FILETIME to Python Datetime value
pf_timestamp = datetime.datetime(1601, 1, 1) + \
    datetime.timedelta(microseconds=processed_timestamp / 10.0)
print("Last Execution Time:", pf_timestamp)
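The conversion can be wrapped in a small reusable function and sanity-checked against a well-known value: the FILETIME that corresponds to the Unix epoch. The filetime_to_datetime() name is hypothetical, and the sketch uses integer division instead of the floating-point division above, which truncates to whole microseconds. FILETIME values are in UTC, so the result is a UTC time without timezone information attached.

```python
import datetime

FILETIME_EPOCH = datetime.datetime(1601, 1, 1)


def filetime_to_datetime(filetime):
    """Convert a count of 100-nanosecond intervals since 1601 to a datetime."""
    # Ten 100-nanosecond intervals make one microsecond.
    return FILETIME_EPOCH + datetime.timedelta(microseconds=filetime // 10)


# 116444736000000000 is the FILETIME value for 1970-01-01 00:00:00 UTC.
unix_epoch = filetime_to_datetime(116444736000000000)
print(unix_epoch)
```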
The last piece of information we will gather in this post is the execution count. This will give us a sense of the number of application executions before, and including, the last executed time. In comparison to the prior two data points, this integer is easier to access and interpret. We move the cursor to offset 152 and read 4 bytes from that position. This is then interpreted as a 32-bit integer using struct and printed to the console.
# Read Execution Count
## Offset 152, Length 4
pf_file.seek(152, 0)
raw_count = pf_file.read(4)
pf_count = struct.unpack('i', raw_count)[0]
print("Number of Executions:", pf_count)
In a few lines of code we have extracted valuable information from a Prefetch file. We have learned that an executable named “SKYDRIVE.EXE” was last run on 2016-07-04 02:56:26 AM and executed a total of 13 times. More information is available within this file, including files referenced by the executable and the full path of the executable, and can be extracted using methods similar to those we demonstrated.
Prefetch Version: 23
Executable File Name: SKYDRIVE.EXE
Prefetch Hash: 3C7833DC
Last Execution Time: 2016-07-04 02:56:26 AM
Number of Executions: 13
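The individual steps can also be gathered into a single function driven by a table of offsets. The sketch below is one possible arrangement, not the code behind the output above: the parse_prefetch() name and FIELDS table are hypothetical, it uses struct.unpack_from() on the full file contents instead of seek() and read(), and it assumes the Windows 7 offsets used throughout this post.

```python
import datetime
import struct

# Field name -> (byte offset, struct format), using the Windows 7 offsets.
FIELDS = {
    "version": (0, "<i"),
    "file_name": (16, "<60s"),
    "hash": (76, "<I"),
    "last_run": (128, "<q"),
    "run_count": (152, "<i"),
}


def parse_prefetch(data):
    """Extract the fields covered in this post from raw Prefetch bytes."""
    raw = {
        name: struct.unpack_from(fmt, data, offset)[0]
        for name, (offset, fmt) in FIELDS.items()
    }
    if raw["version"] != 23:
        raise ValueError("Offsets only apply to version 23 Prefetch files")
    return {
        "version": raw["version"],
        # Decode UTF-16 and keep the text before the first null character.
        "file_name": raw["file_name"].decode("utf-16-le").split("\x00", 1)[0],
        "hash": "{:08X}".format(raw["hash"]),
        # FILETIME: 100-nanosecond intervals since January 1, 1601.
        "last_run": datetime.datetime(1601, 1, 1)
        + datetime.timedelta(microseconds=raw["last_run"] // 10),
        "run_count": raw["run_count"],
    }
```

A caller would read a working copy of the file in binary mode and pass the resulting bytes to parse_prefetch().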
In a few lines of code, with some formatting to present data nicely, computer-friendly hexadecimal can be converted to human-friendly information. This constitutes one step an investigator could take to ascertain information about a given application within a relevant time frame. Analyzing Prefetch is one of many methods employed in digital forensics to answer investigative questions by performing this level of analysis on a wide variety of devices and digital media.
 Boot, but not application, prefetching is enabled by default.
 The EnablePrefetcher value is located under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters.
 This folder only exists if Prefetch was enabled at one point in time.
 All code in this blog post was written with Python 3.5.
 Binary data is a representation of stored information. Different file types represent data in their own formats; examples of text representation formats include ASCII and Unicode.
 Prefetch technical specification: www.forensicswiki.org/wiki/Windows_Prefetch_File_Format
 Struct documentation: https://docs.python.org/3/library/struct.html
 The creation time does not reflect the first execution for infrequently used applications, which may have had their Prefetch file deleted and then recreated at a later date.