
PERFORMANCE EVALUATION OF OPEN-SOURCE DISK IMAGING TOOLS FOR COLLECTING DIGITAL EVIDENCE

Saufi Bukhari, Ibrahim Yusof & Mohd Fikri Azli Abdullah

Faculty of Information, Science and Technology, Multimedia University (MMU), Malaysia
saufi.bukhari@gmail.com, ibrahim@ibrahimyusof.com
School of Electronics and Computer Engineering, Chonnam National University, South Korea
mfikriazli@gmail.com

Abstract

Disk imaging is the key process in conducting a digital forensic investigation, whereby digital evidence is acquired using proper disk imaging tools. Disk imaging tools come in commercial and open-source packages, each offering different features to satisfy technological needs. Open-source tools are often preferred in order to reduce the cost of conducting an investigation. However, not all open-source disk imaging tools satisfy digital forensic standards, because most of these tools were originally designed to image hard drives for data backup, data recovery, boot images, and other non-forensic purposes. A few mandatory and optional requirements of digital forensic standards must be satisfied by a disk imaging tool to ensure that the investigation produces accurate and reliable output without neglecting performance. This research evaluates the performance of several open-source disk imaging tools: dd, dd_rescue, GNU ddrescue, dcfldd, and aimage. These tools are tested on a Linux system in which the environment and configuration are properly controlled, including the applicable digital forensic settings of each tool, and the results are analyzed and simplified for better understanding.

Keywords: Digital Forensics, Digital Evidence, Open-Source Disk Imaging Tools.

1. Introduction

Digital forensics is a branch of forensic science which involves obtaining digital evidence from digital sources such as computers, PDAs, iPods, memory cards or sticks, and other digital devices.
There are five basic phases in the digital forensic process: preparation, collection, examination, analysis, and reporting. The main objective of performing digital forensics is to explain the current state of a digital artifact, which includes computer system registries, storage media, digital media or documents, and information from packets transmitted over the network. Digital forensics also serves other purposes, such as analyzing the data or information inside a computer system, recovering lost, deleted, or corrupted data, analyzing how a computer system was compromised by attackers, gathering digital evidence, and understanding how a computer system works for the purposes of debugging, reverse engineering, and performance optimization. Forensic disk imaging is the key process in conducting a digital forensic investigation, whereby the storage medium is duplicated sector by sector rather than file by file, preserving both its content and its structure. Unlike other imaging applications, forensic imaging aims to create an image of all data, including ambient data, which refers to data stored in the operating system's swap file, unallocated space, and file slack (Saudi, 2001). These locations are the most likely sources of evidence during forensic investigations. Digital
Proceedings of Regional Conference on Knowledge Integration in ICT 2010 353

evidence can be in many forms, such as e-mails, images, logs, documents, history files, temporary files, audio or video files, and computer memory. These forms of evidence need to be acquired using digital forensic or disk imaging tools, and are normally stored as raw images (.dd or .img) for further analysis. Many disk imaging tools are available, both commercial and open-source. AccessData Forensic Toolkit (FTK) and Guidance Software EnCase are the most popular and widely used tools in formal forensic investigations because they integrate many digital forensic functions, such as disk imaging, evidence analysis, case management, and investigation documentation. However, neither FTK nor EnCase is cheap, and small organizations cannot easily afford them. This is why open-source tools have been developed, normally distributed separately as independent software. Examples of open-source disk imaging tools are dd, aimage, dcfldd, dd_rescue, guymager, ewfacquire, and FTK Imager. However, not all open-source tools can be used for formal digital forensic investigations, because some of them do not meet forensic standards. The advantage of open-source tools is that they are free and can be redistributed or extended without infringing copyright, so the cost of digital forensic investigations can be reduced greatly while maintaining standards and effectiveness. This paper compares the results of testing open-source disk imaging tools in a real environment. The tests were conducted on a personal computer on which a digital evidence collection scenario was replicated. The results are analyzed and simplified into graphs and tables for better understanding. Section 2 describes digital forensic technology and section 3 describes disk imaging technology.
The preparations made before conducting the actual test are described in section 4, and the analysis and results are documented in section 5.

2. Digital Forensic Technology

Digital forensics is a process of answering questions about the initial and current state of digital data, involving two major techniques: dead analysis and live analysis (Carrier, 2006). Live analysis is the process of performing immediate analysis on a system as soon as an attack is reported or detected by an Intrusion Detection System (IDS). Live analysis is normally used when the attacked system, such as a web server or database server, needs to stay running at all times. Dead analysis, however, is the most widely used method, whereby the system's hard drive is duplicated and analyzed using proper forensic tools. The purpose of both techniques is identical, namely to search for potential evidence; the difference is that live analysis relies on applications that could have been compromised to produce inaccurate data.

2.1. Digital Forensic Process

A standard digital forensic process must include identification, preservation, analysis, and presentation (Li & Seberry, 2003). In this paper, identification means identifying the evidence in order to decide on appropriate methodologies; preservation means preserving the integrity of the evidence and maintaining the chain of custody; analysis refers to reviewing and examining the evidence; and presentation means presenting the evidence in a legally acceptable and understandable manner. By following the standard process, the possibility of discovering useful evidence is increased and the reliability of the evidence is preserved, so that it remains valid in legal proceedings. As with other types of forensics, digital forensics must be
performed by experts with knowledge and experience, because this will be questioned during prosecution.

2.2. Digital Evidence

Digital forensics is an application of forensic science intended to preserve, collect, validate, identify, analyze, interpret, document, and present digital evidence obtained from digital sources such as computers and PDAs, in order to facilitate the reconstruction of suspected criminal events or to help anticipate malicious activities (Palmer, 2001). To achieve these objectives, digital evidence must be obtained properly, without damaging or altering the original source of evidence, and must at the same time be as accurate as possible. Digital evidence is mainly extracted from system storage such as hard disks and memory cards, and can take many forms, such as event logs, caches, readable documents, cookies, and temporary files. Since the integrity and confidentiality of digital evidence are crucial, it needs to be stored in a secure and trusted medium. There are five rules of forensic evidence applicable to digital evidence: admissibility (usable in court or elsewhere), authenticity (a relevant relationship between the evidence and the incident), completeness (consideration and evaluation of all available evidence), reliability (no doubt about the evidence's authenticity and veracity), and believability (clear presentation, ease of understanding, and credibility) (Li & Seberry, 2003). For digital evidence, however, integrity also needs to be addressed in the basic rules of evidence, because digital evidence is not presented physically and therefore needs to be verified digitally using a hash or checksum and presented in court using appropriate tools or computer applications.

3. Disk Imaging Technology

Disk imaging operates below the file-system layer, where it does not modify the file-system accounting times such as modification time, access time, and creation time (Tan, 2001).
The reason for not disturbing the file-system accounting is to preserve the integrity of the evidence. Unlike a normal backup, the imaging process also captures deleted data that still resides on the disk. This is one of the key characteristics of disk imaging, because potential evidence might have been deleted. However, a weakness of disk imaging is that the image size is exactly the same as the original. Although compression can be applied to the image, the actual size of the image remains the same after extraction. For example, a 250GB disk produces a 250GB image file. If compression is applied, the size might be reduced to 100GB, provided that not all space on the original disk is used. But once extracted, the compressed image is completely restored to its 250GB size, the same as before compression. Disk image compression is in high demand nowadays, as hard disk capacities are increasing rapidly and it is troublesome to prepare large storage to hold uncompressed images. The advantage of compression is disk space optimization, at the cost of more CPU resources than normal imaging. A compressed image needs to be decompressed, and if a single byte is lost during distribution, the decompression process stalls until the byte is successfully acquired (Hibler et al., 2003). To reduce this risk, an image is normally split into several archives that are compressed independently, ensuring more robust distribution and more efficient installation at the expense of sub-optimal compression. Using this method, archives can be stored in separate places for convenience, because it is not practical to store a 250GB disk image as a
file. In fact, a forensic investigation is normally performed by several investigators; therefore, it is recommended to assign each of them a different disk archive for efficiency.

3.1. Disk Imaging Performance

In order to handle large hard disk capacities, the imaging process should be faster, without disregarding the integrity and accuracy of the image. In file-by-file imaging, the entire hard disk is read once to analyze the allocated space on the disk, and imaging is then performed on allocated space only in order to speed up the process (Saudi, 2001). This is not a good approach, because unallocated space sometimes contains remnant data, normally left behind by application programs or the file system. Remnant data refers to leftover data that was originally part of something else. This data can contain useful information that qualifies as valid evidence. Therefore, a good forensic disk imager will not ignore unallocated space, in order to maintain the evidence's reliability and usefulness.

3.2. Disk Image Integrity

One of the base specifications of a disk imaging tool is the ability to verify the integrity of a disk image file (Saudi, 2001). To do this, a hash algorithm such as MD5 or SHA-1, or a checksum such as CRC16 or CRC32, is calculated before the duplication process and again after it completes, producing two hash or checksum values. The two values are compared with each other to check for differing bits. These algorithms depend on the bit content of the file: if even a single bit is altered, the hash value changes completely. This is useful to prove that the image file is an exact duplicate of the original hard disk, which makes it valid for forensic purposes. However, there is the possibility that the hard disk data will be tampered with by authorized parties before the imaging process is done.
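The before-and-after comparison described here can be sketched with standard coreutils. This is an illustrative sketch, not a tool from the paper: the file `source.bin` stands in for the suspect drive, which in practice would be a device node such as /dev/sdb.

```shell
# Stand-in for the original hard disk (in practice: a device such as /dev/sdb)
head -c 1048576 /dev/urandom > source.bin

# Hash the source before duplication
before=$(sha1sum source.bin | awk '{print $1}')

# Duplicate it (a forensic disk imager would run here)
dd if=source.bin of=image.dd bs=4096 2>/dev/null

# Hash the image after duplication; matching digests prove an exact duplicate
after=$(sha1sum image.dd | awk '{print $1}')
[ "$before" = "$after" ] && echo "exact duplicate"
```

Because the digest depends on every bit of the data, a single flipped bit in the image would make the two values differ completely.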
To reduce this possibility, the disk imaging process should be supervised and documented by authorized parties. Another method is to use a bootable operating system CD/DVD to perform the disk imaging, because this prevents data on the hard disk from being modified or deleted.

4. Test Preparations

Several things were prepared before conducting the actual test. First, the test system was prepared: a personal computer (AMD Athlon XP 1800+ 1.53GHz, 1.00GB of RAM, SUSE Linux Enterprise Server 10 Service Pack 2). Second, the tools were installed and configured properly to ensure they would run as smoothly as possible. Third, the source was prepared: a 20GB hard disk partition (IDE/NTFS). Fourth, the destination was prepared: a 120GB external hard disk (USB to SATA/NTFS). Lastly, a shell script was created for each tool to allow automated testing, with several other commands timed to run simultaneously during the imaging process. For accuracy, all caches are cleared when the shell script loads. This ensures that no data from a previous test is left in the cache, which could compromise the test results.

5. Results Analysis

Two main variables are measured in this research: the imaging rate and the completion time. Each tool is executed twice: in default mode, where no additional options are used, and with the applicable forensic options enabled. The comparison is made between the two modes of each tool. Imaging performance depends mostly on the type of data transfer medium (such as IDE or SATA) and the speed of the hard disk (rpm). In this case, imaging performance is limited by the USB data transfer medium, regardless of how fast the input and output rates are.
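A timed per-tool run of the kind described above can be sketched as follows. This is a minimal illustration rather than the authors' actual script: `src.bin` stands in for the 20GB partition, and a real run would first drop the kernel caches as root (`sync; echo 3 > /proc/sys/vm/drop_caches`) before timing the imaging command.

```shell
# Stand-in source file (a real run images the 20GB IDE partition instead)
head -c 8388608 /dev/zero > src.bin

# Flush dirty pages; as root one would also drop the page cache via
# /proc/sys/vm/drop_caches so no data from a previous run is cached
sync

# Time one imaging run; completion time and image size give the average rate
start=$(date +%s%N)
dd if=src.bin of=out.img bs=4096 2>/dev/null
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))
echo "completion time: ${elapsed_ms} ms"
```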

5.1. dd imager

dd is a native imaging tool originally built for UNIX/Linux systems. It performs imaging of a hard drive or partition mounted in the system, and it is recognized as the ancestor of most disk imaging tools created since. The advantage of this tool is that no installation is required, as it is pre-installed in most Linux distributions; the disadvantage is that it only performs plain imaging, without preserving the confidentiality and integrity of the image file for digital forensic purposes.

Table 1: Comparison between default and speed mode for dd imager

                          Default Mode      Speed Mode
  Source Size             20GB              20GB
  Block Size              512 Bytes         4096 Bytes
  Total Blocks            40965687          5120710
  Completion Time         4482 seconds      942 seconds
  Average Imaging Rate    4.7 MB/s          22.3 MB/s

As shown in Table 1, the completion time is reduced by approximately 80% and the average imaging rate increases greatly when the block size is increased. This is because on every cycle the number of blocks processed is fixed, regardless of how many bytes each block holds. The block size is set to 4KB because testing showed it to be the minimum block size needed to reach the optimum average imaging rate: increasing the block size further, for example to 32KB, does not increase the average imaging rate any more.

5.2. dcfldd

dcfldd is an enhanced version of dd for Linux systems, developed by the U.S. Department of Defense Computer Forensics Laboratory. It works like dd, with a few additional features added to satisfy digital forensic standards. Significant additions include on-the-fly hashing, whereby hashing is applied during the data transfer; a new user interface with a progress bar; flexible disk wipes, with which disk contents can be cleared quickly; image verification; simultaneous output; splitting of output into multiple files; and piping of logs and data into external applications (Laverdière et al., 2009).
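The on-the-fly hashing idea can be emulated with plain coreutils, as a sketch of what dcfldd does internally: the data stream is hashed while it is being written, instead of re-reading the image afterwards. With dcfldd itself this would look roughly like `dcfldd if=/dev/sda of=image.dd bs=32768 hash=sha1 hashlog=hash.log`; check the dcfldd manual for the exact option names, and note that the device path here is illustrative.

```shell
# Stand-in source (a real acquisition reads a device node instead)
head -c 1048576 /dev/zero > src.bin

# Hash the stream while writing it: tee forks the data to the image file,
# sha1sum digests the same bytes in the same pass
dd if=src.bin bs=32768 2>/dev/null | tee image.dd | sha1sum | awk '{print $1}' > hash.log

cat hash.log   # the digest recorded during the transfer
```

A single pass over the source is enough to produce both the image and its hash log, which is why on-the-fly hashing costs less than a separate verification read.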
The limitation of this program is that it can only produce a raw image file, without metadata (details of a file or folder) attached to it.

Table 2: Comparison between default and forensic mode for dcfldd

                          Default Mode      Forensic Mode
  Source Size             20GB              20GB
  Block Size              32768 Bytes       32768 Bytes
  Total Blocks            640000            640000
  Completion Time         954 seconds       1575 seconds
  Average Imaging Rate    22.0 MB/s         13.3 MB/s
  Hashing                 No                SHA1 and MD5
  Logs                    No                Yes

Table 2 shows the difference between default and forensic mode for dcfldd. Forensic mode runs at roughly 60% of the default-mode rate, because on-the-fly hashing requires each block to be hashed by both algorithms (SHA1 and MD5) during the imaging process. This consumes imaging time and thus reduces the imaging rate. Logs such as error logs are
generated at the end of the imaging process, indicating the status of the whole run and any errors; hash logs are also generated to store the calculated hash values.

5.3. dd_rescue

dd_rescue is another enhanced version of dd: it does what dd does, with the difference that it can continue the imaging process even when read errors occur, unless a maximum error count is specified. Other notable differences are that the imaging process can be started forward or backward, and that the user interface is enhanced. In addition, two block sizes are used: a large (soft) block size and a small (hard) block size. When an error occurs, the tool switches to the small block size to try to read the failing block, then switches back to the large block size automatically.

Table 3: Comparison between default and forensic mode for dd_rescue

                          Default Mode      Forensic Mode
  Source Size             20GB              20GB
  Block Size (large)      65536 Bytes       65536 Bytes
  Block Size (small)      512 Bytes         512 Bytes
  Completion Time         1004 seconds      1051 seconds
  Average Imaging Rate    20.4 MB/s         19.5 MB/s
  Hashing                 No                No
  Logs                    No                Yes

Table 3 shows a slight decrease in performance between default mode and forensic mode. This is because, in order to generate logs, the transferred blocks need to be checked for errors, which consumes a little more time than default mode. Since this tool has no hashing, its imaging time and rate are close to the default-mode results of the other tools.

5.4. GNU ddrescue

GNU ddrescue is an enhanced version of dd_rescue which copies data from one source to another, just like other disk imaging tools. Its most significant feature is the ability to pause and resume the imaging process without changing the structure of the image file (Rankin, 2009). Another feature is the ability to rescue as much as possible of the data that cannot be read normally (read errors).
Normally, read errors are caused by physical defects on the disk; if the data can be retrieved and stored somewhere else, the error will most likely be eliminated. GNU ddrescue normally does not truncate the output file; it tries to fill in the gaps when run again on the same output file. Because it reads and writes data at random positions, it only works on random-access input and output files, and it can use direct access to read the input file. GNU ddrescue is also an open-source tool developed under the GNU General Public License (GPL).

Table 4: Comparison between default and forensic mode for GNU ddrescue

                          Default Mode      Forensic Mode
  Source Size             20GB              20GB
  Block Size              512 Bytes         512 Bytes
  Copy Block Size         128 blocks        128 blocks
  Total Block Size        65536 Bytes       65536 Bytes
  Completion Time         1034 seconds      1036 seconds
  Average Imaging Rate    20.3 MB/s         20.2 MB/s
  Hashing                 No                No
  Logs                    No                Yes

Table 4 shows no reduction in performance between the two modes, because the only GNU ddrescue option applicable to forensic mode is process logging, and generating logs consumes very little resource and barely affects the imaging process.

5.5. aimage

aimage (advanced imager) is a disk imaging tool which is part of the Advanced Forensic Format (AFF). AFF is a file format, broadly similar to the EnCase file format, that stores the disk image as a series of pages or segments. It optionally allows the image to be compressed, and allows metadata to be stored within the disk image or separately. In addition, AFF can image any file size into a single disk image or into split disk images (Garfinkel et al., 2006). A corrupted disk image can also be recovered using a purpose-built internal consistency checker. The AFF system has its own library, AFFLIB, which implements a simple abstraction that makes an AFF image file appear both as a persistent name/value database and as a standard file that can be opened, read, and seeked with library calls. The AFF system is distributed with a set of tools consisting of the disk imaging program aimage, an AFF-metadata-to-XML converter (afxml), and a converter between raw images and AFF images (afconvert) (Garfinkel, 2006).

Table 5: Comparison between default and forensic mode for aimage

                          Default Mode      Forensic Mode
  Source Size             20GB              20GB
  Block Size              32768 Bytes       32768 Bytes
  Completion Time         1291 seconds      2214 seconds
  Average Imaging Rate    16.6 MB/s         9.5 MB/s
  Hashing                 No                SHA1, SHA256 and MD5
  Logs                    No                Yes
  Verification            No                Yes
  Metadata Preservation   Yes               Yes
  Compression             No                Yes

Table 5 shows that forensic mode runs at roughly 58% of the default-mode rate.
This is due to the implementation of hashing algorithms during the imaging process, the verification pass after all blocks are imaged, and the automatic compression mechanism, which compresses the image file on the fly to increase the write process speed.

5.6. Overall Comparison

Table 6: Comparison between forensic modes for all tools

                             dd        dcfldd        dd_rescue   GNU ddrescue   aimage
  Source Size (GB)           20        20            20          20             20
  Block Size (Bytes)         4096      32768         65536       65536          32768
  Completion Time (seconds)  942       1575          1051        1036           2214
  Avg. Imaging Rate (MB/s)   22.3      13.3          19.5        20.2           9.5
  Hashing                    No        SHA1, MD5*    No          No             SHA1, SHA256, MD5
  Logs                       No        Yes           Yes         Yes            Yes
  Metadata Preservation      No        No            No          No             Yes
  Compression                No        No            No          No             Yes
  Verification               No        Yes           No          No             Yes

  * dcfldd also offers SHA256, SHA384, and SHA512.
Table 6 summarizes the comparison between the forensic modes of all tools. dd has the lowest completion time and highest imaging rate, while aimage has the highest completion time and lowest imaging rate. However, for digital forensic use, aimage is the most suitable tool, since it offers additional capabilities such as hashing, log generation, metadata preservation, compression, and verification. dcfldd is also suitable, though not the best, for digital forensic use, as it provides hashing, log generation, and verification.

5.7. Summary of Imaging Speed of All Tools
[Figure 1: Column chart of overall imaging speed for all tools; y-axis: Speed (MB/s), 0-25; series: Default Mode and Forensic/Speed Mode]

As shown in Figure 1, the fastest imaging tools are dd and dcfldd (which is almost as fast). The reason for this speed is that no additional computation such as hashing, compression, metadata preservation, or log generation is performed during or after the imaging process. Both tools are suitable for normal imaging, as they provide accuracy and speed. For digital forensic purposes, however, speed is less important than the integrity and confidentiality of the image file. In order to preserve integrity, computations such as hashing, verification, and

log generation are required. This is why aimage in forensic mode has the lowest speed of all the tools. dd in default mode is excluded from this comparison, because it implements no digital forensic features and is therefore not suitable for comparison with aimage in forensic mode.

6. Conclusion

All test results may vary depending on the specifications of the particular system used. However, the analysis results should be broadly similar in terms of the performance reduction between default mode and forensic mode. Speed matters, but in digital forensics the integrity and confidentiality of the acquired evidence matter most. Based on this analysis, aimage is the best open-source disk imaging tool for collecting digital evidence while preserving integrity and confidentiality. Its weakness, shared with the other open-source tools, is that it does not provide a graphical user interface (GUI) to ease the imaging process for the user or investigator. Another weakness is that not all analysis tools support its file format (.aff), because the format is quite new and its architecture differs from the .dd and .raw file formats.

References

Carrier, B. D. (2006). Risks of Live Digital Forensic Analysis. Communications of the ACM, 49(2), 56-61.

Garfinkel, S. L. (2006). AFF: A New Format for Storing Hard Drive Images. Communications of the ACM, 49(2), 85-87.

Garfinkel, S. L. et al. (2006). Advanced Forensic Format: An Open, Extensible Format for Disk Imaging. Advances in Digital Forensics II: IFIP International Conference on Digital Forensics, National Center for Forensic Science, Orlando, FL.

Hibler, M. et al. (2003). Fast, Scalable Disk Imaging with Frisbee. USENIX Annual Technical Conference, San Antonio, TX.

Laverdière, M. et al. (2009). Ftklipse - Design and Implementation of an Extendable Computer Forensics Environment: Specification Design Document. arXiv preprint arXiv:0906.2447.

Li, X., and Seberry, J. (2003). Forensic Computing. INDOCRYPT 2003: 4th International Conference on Cryptology, New Delhi, India.

Palmer, A. T. N. (2004). Computer Forensics: The Six Steps. Retrieved February 15, 2010 from http://www.krollontrack.co.uk/publications/UK%20EE%20Newsletter%20I1%20V3%20AP%20CF.pdf

Rankin, K. (2009). Hack and /: When Disaster Strikes: Hard Drive Crashes. Linux Journal, 2009(179), 12.

Saudi, M. M. (2001). An Overview of Disk Imaging Tool in Computer Forensics. SANS Institute. Retrieved March 10, 2010 from http://www.sans.org/reading_room/whitepapers/incident/an_overview_of_disk_imaging_tool_in_computer_forensics_643

Tan, J. (2001). Forensic Readiness. CanSecWest Computer Security Conference. Retrieved March 15, 2010 from http://www.arcert.gov.ar/webs/textos/forensic_readiness.pdf

