
The Examiner’s Toolbox: Uncovering Recent File Activity with Python


The Examiner’s Toolbox is a series of posts designed to explore technical aspects of digital investigations and the challenges faced in today’s ever-evolving technical world. Each post will focus on a different forensic artifact and explain how to programmatically extract vital information. In this blog post, we discuss how to use Python to extract data from the recently-used.xbel files present on Linux systems.

Knowing what a computer was used to access can be a powerful piece of information in a forensic investigation. Knowing the number of times a particular file was accessed is even more powerful. Suppose, for example, that an employee is suspected of inappropriately accessing a confidential file. Discovering evidence that proves the file was accessed on their system is important but isn’t always enough to confirm deliberate wrongdoing. Digging deeper to uncover how many times the confidential file had been accessed allows an investigator to discern between inadvertent, one-time access and calculated actions, providing a much clearer view into the employee’s intentions and the company’s resulting risk exposure. The artifact we will look at in this post, recently-used.xbel, can tell us just that and then some.

The recently-used.xbel file is a byproduct of Nautilus, the default file manager for GNOME, present on many different Linux distributions. It is reminiscent of Prefetch (covered previously) and Lnk file artifacts on Windows in terms of the information it can provide. Namely, recently-used.xbel is a user-specific XML file that keeps track of recently used files, relevant timestamps, the applications that were used with them, and a count of the number of executions per application. On the latest release of Ubuntu at the time of writing (16.04.2 LTS), this file is located in the following directory: /home/[User]/.local/share.

An example of the data source we will be parsing can be seen in the screenshot below. Each bookmark tag references a single file. Within this tag, there are a number of metadata fields we can extract related to the specific file. In addition to these, the bookmark:applications tag may contain one or more applications used to access the file including a count of such occurrences and a timestamp. In this script, we will create a CSV with one or more rows for each file to account for every application that accessed it.

[Screenshot: sample contents of a recently-used.xbel file]

Python[1] has a built-in library, xml, which can be used to process XML-structured data. Specifically, we will import the xml package’s ElementTree module to interact with and navigate the XML data. We will use the built-in datetime and urllib modules to normalize dates in the data and convert URL-encoded strings, respectively. The built-in csv library will be used to output the processed XML data to a CSV file for review.


import csv
import datetime
import urllib.parse
import xml.etree.ElementTree as ET

Oftentimes, we access the contents of a file by using the open() function. With XML files, we instead use the ElementTree.parse() function and the getroot() method to access the root XML element. Each element, such as root, can have a tag, attributes, and child nodes.


tree = ET.parse('recently-used.xbel')
root = tree.getroot()
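
As a quick illustration of tags, attributes, and child nodes, the sketch below parses a small inline document with ET.fromstring() rather than reading a file from disk. The sample XML is hypothetical: it is a trimmed-down stand-in for a real recently-used.xbel file, and the path in it is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample; a real file carries more metadata per bookmark
SAMPLE_XBEL = """
<xbel version="1.0">
  <bookmark href="file:///home/user/notes.txt"
            added="2017-04-05T11:10:48Z"
            modified="2017-04-05T11:10:48Z"
            visited="2017-04-05T11:10:48Z"/>
</xbel>
"""

root = ET.fromstring(SAMPLE_XBEL)

print(root.tag)      # name of the root element
print(root.attrib)   # attributes of the root element, as a dict
for child in root:   # iterating over an element yields its direct children
    print(child.tag, sorted(child.attrib))
```

ET.fromstring() returns the root element directly, so there is no separate getroot() call in this variant.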

Using the findall() method, in conjunction with a for loop, we iterate over all child nodes with a “bookmark” tag. From each bookmark node, we store any extracted data into a list and append it to the recent_results list. This effectively creates a list of lists and organizes the processed data into logical units. Each bookmark contains file path and timestamp attributes. Using the get() method and specifying the attribute name, we can extract their values, which we will store in the recent_results list once all other data associated with the bookmark has been processed.


recent_results = []
for element in root.findall('bookmark'):

    # Grab attributes in bookmark tags -- get() returns None if an attribute is not present
    visited = dateConverter(element.get('visited'))
    href = urllib.parse.unquote(element.get('href', ''))
    added = dateConverter(element.get('added'))
    modified = dateConverter(element.get('modified'))
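
To make the two calls above concrete, here is a minimal sketch of what get() and urllib.parse.unquote() return. The bookmark element and its file path are hypothetical and exist only for illustration.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Hypothetical bookmark element with a URL-encoded space in its href
element = ET.fromstring(
    '<bookmark href="file:///home/user/Quarterly%20Report.docx" '
    'added="2017-04-05T11:10:48Z"/>'
)

# unquote() turns %20 back into a space
href = urllib.parse.unquote(element.get('href'))
print(href)

# Attributes that are absent come back as None unless a default is supplied
print(element.get('visited'))
print(repr(element.get('visited', '')))
```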

Notice the use of the dateConverter() helper function in the immediately preceding code block. Timestamps in this file are of a format that Excel will not immediately interpret as a date, e.g. 2017-04-05T11:10:48Z. The dateConverter() function, shown in the code block below, converts this timestamp to a format Excel readily recognizes, e.g. 04/05/2017 11:10:48. This is accomplished with the strptime() and strftime() methods from the datetime module.

The strptime() function returns a datetime object from a string interpreted with provided datetime directives[2]. Each datetime directive begins with a “%” symbol to distinguish it from normal characters (like the “T” and “Z” characters in our string). However, because we do want a string as the final output, we use the datetime object just created, date, and the strftime() method to return a formatted date string. This function also requires directives to specify how the date within the string should be formatted.


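def dateConverter(dt):
    # element.get() returns None for a missing attribute; pass that through as an empty string
    if dt is None:
        return ''
    date = datetime.datetime.strptime(dt, '%Y-%m-%dT%H:%M:%SZ')
    return date.strftime('%m/%d/%Y %H:%M:%S')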
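
The conversion can be tried in isolation with the example timestamp from above. The sketch below is a slightly more tolerant variant, not the function used in this post: the name convert_timestamp is hypothetical, and the second format string rests on the assumption that some files may store fractional seconds.

```python
from datetime import datetime

# Formats to try, in order. The second one (fractional seconds) is an
# assumption about what some files may contain, not something from this post.
KNOWN_FORMATS = ('%Y-%m-%dT%H:%M:%SZ', '%Y-%m-%dT%H:%M:%S.%fZ')

def convert_timestamp(value):
    """Return an Excel-friendly date string, or '' if value cannot be parsed."""
    if not value:
        return ''
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next known format
        return parsed.strftime('%m/%d/%Y %H:%M:%S')
    return ''

print(convert_timestamp('2017-04-05T11:10:48Z'))         # 04/05/2017 11:10:48
print(convert_timestamp('2017-04-05T11:10:48.250000Z'))  # 04/05/2017 11:10:48
print(repr(convert_timestamp(None)))                     # ''
```

Returning an empty string for unparseable input keeps a single odd timestamp from aborting the whole run, at the cost of hiding the problem; raising an error instead may be preferable when completeness matters.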

Next, we extract the MIME type of the file from the mime:mime-type child node. We use the find() method to navigate to the child node. This particular node uses the mime namespace, which we define in the namespaces dictionary shown in the code block below. This was the namespace defined in the particular XML file we were processing. Should it change in the future, the corresponding entry in the namespaces dictionary would need to reflect that change.

If the mime-type node cannot be found, find() returns None, and calling get() on that result raises an AttributeError. To handle these cases, we catch the error using a try and except block and instead set the mime_type variable to an empty string. This prevents further errors when we refer to the variable later, as it would otherwise not exist when the mime:mime-type node is missing.


namespaces = {'bookmark': 'http://www.freedesktop.org/standards/desktop-bookmarks',
              'mime': 'http://www.freedesktop.org/standards/shared-mime-info'}

    # Try to process the mime type with error handling for cases where mime-type is not found
    try:
        mime_type = element.find('info/metadata/mime:mime-type', namespaces).get('type')
    except AttributeError:
        mime_type = ''
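
An alternative to the try and except block is to test the result of find() against None before calling get(). The sketch below shows that variant on a hypothetical two-bookmark sample, where only the first bookmark has a mime-type node; the file paths are made up.

```python
import xml.etree.ElementTree as ET

namespaces = {'bookmark': 'http://www.freedesktop.org/standards/desktop-bookmarks',
              'mime': 'http://www.freedesktop.org/standards/shared-mime-info'}

# Hypothetical sample: the first bookmark has a mime-type node, the second does not
SAMPLE = """
<xbel version="1.0"
      xmlns:bookmark="http://www.freedesktop.org/standards/desktop-bookmarks"
      xmlns:mime="http://www.freedesktop.org/standards/shared-mime-info">
  <bookmark href="file:///home/user/notes.txt">
    <info><metadata><mime:mime-type type="text/plain"/></metadata></info>
  </bookmark>
  <bookmark href="file:///home/user/unknown.bin"/>
</xbel>
"""

mime_types = []
for element in ET.fromstring(SAMPLE).findall('bookmark'):
    node = element.find('info/metadata/mime:mime-type', namespaces)
    # find() returns None when the path does not match anything
    mime_types.append(node.get('type', '') if node is not None else '')

print(mime_types)   # ['text/plain', '']
```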

As alluded to earlier, each bookmark can have more than one application associated with it. Therefore, we need another for loop to iterate through each application. After accessing the bookmark:applications node, using the bookmark namespace from the namespaces dictionary, we use the iter() method to iterate through it. Because iter() yields the node itself as well as its descendants, an if statement checks whether each node has any attributes. If it does not, as is the case for the bookmark:applications node itself, we continue on to the next node in the for loop.

We append a separate list for each corresponding application in the else statement. This list contains the data we processed previously and the attributes we can extract from the bookmark:application node.


    # Iterate through bookmark:applications to identify associated apps and relative counts
    bookmark_app = element.find('info/metadata/bookmark:applications', namespaces)
    if bookmark_app is None:
        # Skip bookmarks that carry no application data
        continue

    for app in bookmark_app.iter():
        if not app.attrib:
            continue
        else:
            recent_results.append([href, added, modified, visited, mime_type, app.get('name'), app.get('exec'), app.get('count'), dateConverter(app.get('modified'))])
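
The need for the attribute check comes from how iter() behaves: it yields the starting node first and then its descendants. The sketch below demonstrates this on a hypothetical applications node and shows findall() as one way to visit only the application children; the application names and counts are invented for illustration.

```python
import xml.etree.ElementTree as ET

namespaces = {'bookmark': 'http://www.freedesktop.org/standards/desktop-bookmarks'}

# Hypothetical applications node with two application entries
apps_node = ET.fromstring("""
<bookmark:applications
    xmlns:bookmark="http://www.freedesktop.org/standards/desktop-bookmarks">
  <bookmark:application name="gedit" exec="'gedit %u'"
                        modified="2017-04-05T11:10:48Z" count="3"/>
  <bookmark:application name="Document Viewer" exec="'evince %u'"
                        modified="2017-04-06T09:00:00Z" count="1"/>
</bookmark:applications>
""")

# iter() starts with the node itself, which carries no attributes
print(len(list(apps_node.iter())))   # 3

# findall() with the namespace prefix returns only the application children
apps = apps_node.findall('bookmark:application', namespaces)
counts = {app.get('name'): int(app.get('count')) for app in apps}
print(counts)   # {'gedit': 3, 'Document Viewer': 1}
```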

Once we have processed all entries in the recently-used.xbel file, we pass the recent_results list to the writeOutput() function. Python makes writing data to a CSV file very simple. Often the hardest part of scripting these artifacts is processing them into Python data structures. Once that is accomplished, manipulating the data and outputting it is generally straightforward.

We first need to open the output file we will write to using the open() function. Note the optional keyword argument newline set equal to an empty string. Omitting this option can result in your CSV file containing blank rows between the rows written by writerow() or writerows() on platforms that use \r\n line endings, such as Windows[3].


def writeOutput(recents, output_csv):
    with open(output_csv, 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow(['Filename', 'Added Date/Time', 'Modified Date/Time', 'Visited Date/Time', 'Mime-type', 'App Name', 'Command Exec', 'Run Count', 'App Modified Date/Time'])
        csv_writer.writerows(recents)

After we open the csvfile, we need to pass the file handle to the csv.writer() function. With this complete, we can now begin to write data to the CSV file. Before doing anything else, we use the writerow() method to write the headers of the spreadsheet. Lastly, we use the writerows() method and pass it the list of lists to quickly and simply write each list to the output file.
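
The difference between writerow() and writerows() can be checked without touching the disk. The sketch below writes to an in-memory io.StringIO buffer as a stand-in for the output file and reads the result back; the single data row is hypothetical.

```python
import csv
import io

# Hypothetical row in the same column order used in this post
rows = [
    ['file:///home/user/notes.txt', '04/05/2017 11:10:48', '04/05/2017 11:10:48',
     '04/05/2017 11:10:48', 'text/plain', 'gedit', "'gedit %u'", '3',
     '04/05/2017 11:10:48'],
]

buffer = io.StringIO(newline='')   # in-memory stand-in for the output file
csv_writer = csv.writer(buffer)
# writerow() takes one list and writes a single row
csv_writer.writerow(['Filename', 'Added Date/Time', 'Modified Date/Time',
                     'Visited Date/Time', 'Mime-type', 'App Name',
                     'Command Exec', 'Run Count', 'App Modified Date/Time'])
# writerows() takes a list of lists and writes one row per inner list
csv_writer.writerows(rows)

# Read the text back to confirm one header row plus one data row
buffer.seek(0)
parsed = list(csv.reader(buffer))
print(len(parsed), parsed[1][5])   # 2 gedit
```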

A sample spreadsheet created by this script can be seen below. With the data normalized into a spreadsheet, we can now review it in a much more efficient manner by sorting on specific columns or applying filters.
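
The same kind of sorting can also be done in Python before the data is written out. This is a minimal sketch on hypothetical, shortened rows (file path, application name, run count); it is not part of the script in this post.

```python
# Hypothetical rows: [href, app name, run count as read from the XML (a string)]
rows = [
    ['file:///home/user/notes.txt', 'gedit', '3'],
    ['file:///home/user/confidential.pdf', 'Document Viewer', '12'],
    ['file:///home/user/todo.txt', 'gedit', '1'],
]

# Convert the count to int so that 12 sorts above 3 (string sorting would not)
by_count = sorted(rows, key=lambda row: int(row[2]), reverse=True)
for href, app_name, count in by_count:
    print(count, app_name, href)
```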

[Screenshot: sample spreadsheet created by the script]

Any structured data file can be converted using techniques like those described in this post. Oftentimes, knowing which library or which Python data type to use can be the sticking point. With practice, Python can become a powerful tool in the hands of an investigator. Being able to manipulate data into a format conducive to analysis is not only time saving but can make all the difference in finding that proverbial needle in the haystack.

[1] All code in this blog post was written and tested with Python 3.6.0

[2] For a full list of datetime directives, please review https://docs.python.org/3/library/datetime.html

[3] Python 3 csv documentation can be viewed at https://docs.python.org/3/library/csv.html

Legal

Our lawyers don’t want to miss out on the fun and would like you to know that all of the posts are the opinions of the individual authors and don’t necessarily reflect the opinions or positions of Stroz Friedberg. The ideas and strategies discussed herein may not be appropriate for any one reader’s situation and are not meant to be construed as advice.