The Bulk Extractor Viewer (BEviewer) is a graphical interface for bulk_extractor, "a C++ program that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures. The results can be easily inspected, parsed, or processed with automated tools. bulk_extractor also creates a histogram of features that it finds, as features that are more common tend to be more important."
While originally intended for law enforcement, bulk_extractor can be used by digital archivists to quickly and thoroughly examine a disk image for a wide variety of information. The most common use for such analysis is locating personally identifiable information (PII) that a donor may want redacted before his or her materials are made publicly available, but bulk_extractor can locate other types of potentially sensitive information as well.
The instructions below take you through the process of running bulk_extractor via the Bulk Extractor Viewer utility, a graphical interface for running bulk_extractor and viewing the results. Archivists can view the results through the GUI and also further process them using the digital forensics tools in the BitCurator environment.
Clicking on the Generate a Report icon will open the "Run bulk_extractor" window. From this window you can specify the image you would like to analyze, the location of the output directory, and the types of data objects bulk_extractor will search for.
To begin your analysis, first select the type of media you want to scan. In our case this is a disk image, so we'll select the "image file" radio button (see Figure 2).
In the Output Feature Directory field, enter the name of the directory you wish to create for holding the bulk_extractor output. For example, if you wanted to save the output to a directory named "be_reports" in your working directory, you would enter "/home/bcadmin/[name of working directory]/be_reports". If the directory be_reports doesn't already exist, Bulk Extractor Viewer will create it for you.
Optional: You can refine your analysis of the disk image by choosing which "scanners" bulk_extractor employs. To do so, check or uncheck the scanner options in the list on the right (see Figure 3). Scanners are modular sets of rules that allow bulk_extractor to find specific types of information on a disk image. For example, if you have the "accts" scanner selected, bulk_extractor will find objects (which it calls "features") such as credit card numbers, social security numbers, and phone numbers. A chart defining the various scanners, how they may be useful to you, and where they output their results can be found here.
Figure 3: Scanner options in the Bulk Extractor Viewer.
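For readers who later want to script the same choices, the scanner selection can also be expressed as a bulk_extractor command line. The sketch below only builds the argument list and does not run anything. It assumes bulk_extractor's `-o` (output directory), `-e` (enable a scanner), and `-x` (disable a scanner) options; the helper name, the image file name, and the choice of scanners to enable or disable are illustrative assumptions, not settings taken from this guide.

```python
from typing import Iterable, List


def build_bulk_extractor_command(
    image_path: str,
    output_dir: str,
    enable: Iterable[str] = (),
    disable: Iterable[str] = (),
) -> List[str]:
    """Return an argument list for a bulk_extractor run (hypothetical helper)."""
    # Assumed option: -o names the new output directory.
    command = ["bulk_extractor", "-o", output_dir]
    # Assumed option: -e switches a scanner on (like checking its box).
    for scanner in enable:
        command += ["-e", scanner]
    # Assumed option: -x switches a scanner off (like unchecking its box).
    for scanner in disable:
        command += ["-x", scanner]
    # The image to scan comes last.
    command.append(image_path)
    return command


if __name__ == "__main__":
    # Illustrative paths only; substitute your own image and output directory.
    cmd = build_bulk_extractor_command(
        "/home/bcadmin/test_data/example.img",
        "/home/bcadmin/test_data/be_data",
        enable=["accts"],
    )
    print(" ".join(cmd))
```

Keeping the command as a list rather than a single string means it can be handed to `subprocess.run` without shell quoting problems, should you choose to execute it.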
Note: Because bulk_extractor generates a large number of files, it requires a new directory for the output, which it creates itself. If you navigate to the output directory instead of typing it in, a window will open asking you for the path to the new directory along with its name. In our example, the output directory would be "/home/bcadmin/test_data/" and the new directory to be created by bulk_extractor would be called "be_data" (see Figure 4).
Figure 4: The "Name the output folder for bulk_extractor" window.
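The new-directory requirement described in the note can be checked before starting a scan. The following is a minimal sketch, not part of Bulk Extractor Viewer: the helper name is hypothetical, and the additional check that the parent directory already exists is an assumption made for illustration.

```python
from pathlib import Path


def check_output_directory(output_dir: str) -> Path:
    """Validate a proposed output path (hypothetical helper, not part of BEViewer)."""
    target = Path(output_dir)
    # The note above says the output must go into a new directory.
    if target.exists():
        raise FileExistsError(f"{target} already exists; choose a new directory name")
    # Assumption for this sketch: the location that will hold it must already exist.
    if not target.parent.is_dir():
        raise FileNotFoundError(f"parent directory {target.parent} does not exist")
    return target
```

With the example from the note, `check_output_directory("/home/bcadmin/test_data/be_data")` would return the path if `/home/bcadmin/test_data/` exists and `be_data` does not yet exist, and raise an error otherwise.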
Finally, click the "Start bulk_extractor" button at the bottom of the screen to begin the scan.
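Once the scan has finished, the results can also be processed outside the viewer. The sketch below tallies how often each feature appears, in the spirit of the histograms bulk_extractor produces. It assumes that a feature file is plain text with one feature per line, tab-separated as offset, feature, and context, and that lines beginning with `#` are comments. The function names and the sample lines are made up for illustration, so check an actual output file before relying on this layout.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_feature_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (offset, feature, context) tuples from feature-file lines."""
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Pad short lines so that a missing context does not raise an error.
        fields += [""] * (3 - len(fields))
        yield fields[0], fields[1], fields[2]


def count_features(lines: Iterable[str]) -> Counter:
    """Count how many times each feature value occurs."""
    return Counter(feature for _, feature, _ in parse_feature_lines(lines))


if __name__ == "__main__":
    # Made-up sample lines standing in for a real feature file.
    sample = [
        "# sample comment line\n",
        "1024\talice@example.com\tsome surrounding text\n",
        "2048\tbob@example.com\tmore surrounding text\n",
        "4096\talice@example.com\tother surrounding text\n",
    ]
    for feature, count in count_features(sample).most_common():
        print(count, feature)
```

To run this against real output, the `sample` list could be replaced with an open file handle for one of the text files in your output directory.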
<div style="border: solid blue; font-weight: bold; text-align: center;"> If you would like to provide feedback for this page, please follow this <a href="https://docs.google.com/forms/d/e/1FAIpQLSelmRx1VmgDEg3dU5_8cXZy9MZ5v8_sAl-Ur2nPFLAi6Lvu2w/viewform?usp=sf_link">link to the BitCurator Wiki Google Form</a> for the BitCurator All Step-by-Step Guides section.