
Why Anti-virus Products Slow Down Your Machine?

Wei Yan                              Nirwan Ansari
Trend Micro, Inc.                    New Jersey Institute of Technology
U.S.A.                               U.S.A.
wei_yan@trendmicro.com               nirwan.ansari@njit.edu

Abstract— Customers often complain that anti-virus software bogs down their computers by consuming much of a PC's memory and resources. With the popularity and variety of zero-day threats over the Internet, security companies have to keep inserting new virus signatures into their databases. However, is the increasing size of the signature file the sole reason computers slow to a crawl during a virus scan? This paper outlines three other reasons for the slowdown of software-protected computers, none of which is directly related to the signature file. First, the rising time cost of de-obfuscating binary payloads through emulation forces anti-virus software to take more time to scan a packed file than an unpacked one. Second, the New Technology File System (NTFS) causes self-similarity in file index searching and data block accessing; even if file sizes fit a log-normal distribution, there are still many "spikes" of high virus-scanning latency which cannot be ignored. Last but not least, temporal changes in file size, file type, and storage capacity in modern operating systems are slowing down virus scans. The paper also discusses a cloud-based security infrastructure for deploying light-weight and fast anti-virus products.

I. INTRODUCTION

It is important to understand that the current threat landscape is changing: security vendors capture a large volume of new malware each day. Why is this happening? Online malware generators enable script kiddies to easily create new viruses and rootkits, challenging anti-virus (AV) pattern update schemes. For example, Panda Security (www.pandasecurity.com), a security company, detected more samples in 2008 than in the previous 17 years combined. These threats came from software, appliances, and web services. This surge in malware infringement on Internet security creates urgent demand for security products.

Generally speaking, an AV scanner is a software application that checks whether a computer has been infected by spyware, rootkits, or other malware. To search an executable file for viruses, a scanner typically examines segments at certain offsets for known signatures. It also automatically checks for threats in email attachments and in any file operation. The signature file encodes prior knowledge, and the scanner detects computer viruses via a scan engine. Moreover, automatic updates immunize users against new virus outbreaks.

Increasingly, the first thing computer users do after reinstalling an operating system is to install security software. They may then notice that their machines slow down after the installation; this is one of the top complaints about security software. Other discontents include, for example, long scan times and false positives. Fortunately, industry companies have accepted these complaints and improved their security applications. Symantec (www.symantec.com) successfully overhauled its system to make Norton products run faster in 2006.

A. Is AV dead?

Traditional emergency response involves malware collection, signature generation, and signature database updating. However, owing to the flood of malware, security companies usually receive thousands of suspicious samples daily from honeypots and customer submissions. It is very time-consuming and resource-intensive for them to analyze these samples manually and generate signatures.

Is signature-based virus detection technology dead? There are concerns that this approach cannot keep up with the flood of new viruses, given that security vendors usually update virus signatures every hour, or even every twenty minutes. However, most customers are not willing to remove security software from their machines because they still consider these applications worthwhile and must-have. Signature-based virus recognition has been used for more than two decades, and it is one of the most cost-effective and mature methodologies for detecting viruses while keeping a low false-positive rate. The debate still goes on.

One alternative is whitelisting technology. Can the whitelisting paradigm replace blacklisting? Blacklisting stores hash values or fingerprints of malicious programs, whereas whitelisting lists benign applications and system files. Almost all AV products use the blacklisting method, and the blacklist is actually the signature file. On the contrary, whitelisting-based tools only allow operating systems to access benign files and websites, and always block non-listed names. At the time of writing, there are millions of malware samples listed in blacklists, and tens of millions of entries in whitelists. If security companies are already working around the clock to cope with new blacklist samples, whitelisting protection might not be workable, since even more benign files appear each day.

B. Why does my machine slow down?

The signature file can be considered a malicious-fingerprint database which is updated frequently to cover the latest threats. It works with the scan engine to detect threats.
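To make the scan-engine mechanics above concrete, here is a minimal sketch, assuming a toy in-memory signature list, of how a scanner might test one file for known byte patterns at fixed offsets. The Signature structure, the detection name, and the fixed-offset matching rule are illustrative assumptions, not any vendor's actual engine format.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical signature record: a byte pattern expected at a fixed
 * offset in the file (real engines use far richer rules: wildcards,
 * entry-point-relative offsets, checksums, and so on). */
typedef struct {
    const char          *name;    /* detection name                 */
    long                 offset;  /* file offset where pattern lives */
    const unsigned char *pattern;
    size_t               len;
} Signature;

/* Return 1 if the file matches the signature, 0 otherwise. */
static int matches(FILE *fp, const Signature *sig) {
    unsigned char buf[64];
    if (sig->len > sizeof(buf)) return 0;
    if (fseek(fp, sig->offset, SEEK_SET) != 0) return 0;
    if (fread(buf, 1, sig->len, fp) != sig->len) return 0;
    return memcmp(buf, sig->pattern, sig->len) == 0;
}

int main(int argc, char **argv) {
    /* Toy one-entry "database": flag files whose first two bytes are
     * the DOS header magic 'M''Z', purely as a demonstration. */
    static const unsigned char mz[] = { 'M', 'Z' };
    static const Signature db[] = { { "Demo.PEFile", 0, mz, 2 } };
    size_t i, n = sizeof(db) / sizeof(db[0]);

    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    for (i = 0; i < n; i++)      /* scanning = walking the signature list */
        if (matches(fp, &db[i]))
            printf("match: %s\n", db[i].name);

    fclose(fp);
    return 0;
}
```

Even in this toy form, scan time grows with both the number of signatures and the per-file I/O, which foreshadows the paper's argument that the signature file alone does not explain scan latency.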



As malware becomes more complicated, the signature file grows larger and must support various types of protection, including detection, cleaning, and recovery. Besides, these signatures are loaded into memory. Normally, the scan engine takes milliseconds to scan a file by traversing the signature file, so it is not surprising that a big signature file can drag down a computer tremendously. However, users tend to exaggerate the downside of the signature file. They take for granted that a large signature file is the main reason why AV products bog down their computers. As a result, they often blame the security industry for hesitating to adopt new technology to shrink the signature size. However, The PC Spy (http://www.thepcspy.com/read/what_slows_windows_down/) conducted an interesting test showing how popular software applications slow down Windows. Besides anti-virus software, fonts, Yahoo's and AOL's chat programs, .NET, Visual Studio, and VMware all slowed down computers considerably. The test even showed that 1000 fonts had a bigger negative effect on the Windows load time than most AV products.

If the size of the signature file could be reduced to what it was a few years ago, would the PC run almost as fast as it did before? In this paper, we outline three other reasons for slow virus scans, which are actually not directly related to the size of the signature file.

1) To evade detection, modern malware is able to obscure its fingerprint and make itself undetectable. Portable Executable (PE) packers have become the favorite binary tools for malware authors to instigate code obfuscation. Thus, it is essential for AV scanners to support emulation, which can safely analyze obfuscated malware and then unpack its payload. Yan et al. [1] discussed three approaches to cope with packers. However, malware emulation is slow and expensive because it lets an executable file run within a virtual environment implemented in software instead of hardware.

2) By hiding themselves deep inside operating systems using rootkit technology, modern malware can completely bypass personal firewalls and anti-virus scanners [2]. In this paper, we demonstrate how low-level file operations can propagate self-similarity. This burstiness is caused by Microsoft's New Technology File System (NTFS) data-accessing algorithm, and it gives rise to large scanning latencies.

3) The study in [3] showed that file size, file number, and storage capacity have increased over the past years. Accordingly, security products that scan data proportional to the number and size of files will take much longer.

The rest of the paper is organized as follows. Section 2 describes code obfuscation, unpacking, and emulation. Section 3 discusses the rootkit hiding problem in the NTFS file system, followed by the low-level file scanning work flow. Temporal changes in file size, file type, and storage capacity in modern operating systems are discussed in Section 4. Section 5 presents the concluding remarks.

II. UNPACKING AND EMULATION

A. Code obfuscation

Security researchers face a great challenge in overcoming the complexity of malware. Microsoft Windows is without doubt the most heavily attacked platform nowadays, and malware is most commonly written for it as compared to Linux and Unix. A Portable Executable (PE) file is an executable format used mostly by Microsoft Windows; Reference [4] provides more information about the PE format. A PE file comprises various sections and headers which describe the section data, import table, export table, resources, etc. It starts with the DOS header and the PE header. The PE header contains general file properties, such as the number of sections, the machine type, and the time stamp. Another important header is the optional header, which includes a set of important information segments. The optional header is followed by the section table header, which summarizes each section's raw size, virtual size, section name, etc. Finally, at the end of the PE file is the section data, which contains the file's Original Entry Point (OEP), the address where file execution begins.

Conventional virus scanners search executable files for pre-defined fingerprints from the signature database. Unfortunately, this method can be easily defeated by packed or obfuscated viruses. For example, hackers can use packers, software that compresses and encrypts the original payload in advance and restores it when loaded into memory, to hide the malicious signatures from detection. This paradigm is referred to as code obfuscation. Code obfuscation has evolved from simple compression and encryption to polymorphism and metamorphism. Currently, packers have become the favorite toolkits for bypassing security applications. For example, Armadillo (www.siliconrealms.com), Themida (www.oreans.com), and Obsidium (www.obsidium.de) are all commonly used packers.

Therefore, it is vital for security products to be able to unpack and inspect the original payloads hidden inside packed programs. Unpacking is the process of stripping packer layers and restoring the original contents. Normally, a piece of software called an emulator or sandbox is developed to construct a virtual environment, where the emulator can "execute" packed programs until they are fully decrypted or unpacked.

B. Unpacking Obsidium

Reverse Engineering (RE) has become an important approach to analyzing a program's logic flow and structure, such as its system-call functions. However, RE is a time-consuming process of discovering the specifications of a system or program by analyzing its outputs and internal logic.

Obsidium is a Windows-based packer which encrypts PE files with advanced protection mechanisms. Its unpacking process involves four consecutive steps: anti-debugging checking, memory-page encryption, import table rebuilding, and jumping to the OEP.
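Before walking through these steps, it helps to see how little work is needed to reach the headers described in Section II-A. The following sketch, a simplified reader that assumes a well-formed PE file and abbreviates error handling, recovers the section count, timestamp, and entry-point RVA using only the format's fixed offsets (no windows.h required):

```c
#include <stdio.h>
#include <stdint.h>

/* Read a little-endian value of 1-4 bytes at an absolute file offset. */
static uint32_t rd(FILE *fp, long off, int bytes) {
    uint32_t v = 0;
    int i, c;
    fseek(fp, off, SEEK_SET);
    for (i = 0; i < bytes; i++) {
        c = fgetc(fp);
        if (c == EOF) return 0;
        v |= (uint32_t)c << (8 * i);
    }
    return v;
}

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s file.exe\n", argv[0]); return 1; }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    /* DOS header: 'MZ' magic at offset 0; e_lfanew at offset 0x3C
     * points to the PE header, which starts with "PE\0\0". */
    if (rd(fp, 0, 2) != 0x5A4D) { fprintf(stderr, "not MZ\n"); return 1; }
    long pe = (long)rd(fp, 0x3C, 4);
    if (rd(fp, pe, 4) != 0x00004550) { fprintf(stderr, "not PE\n"); return 1; }

    /* IMAGE_FILE_HEADER: Machine(2), NumberOfSections(2), TimeDateStamp(4). */
    uint32_t nsections = rd(fp, pe + 4 + 2, 2);
    uint32_t timestamp = rd(fp, pe + 4 + 4, 4);

    /* The optional header starts 24 bytes after "PE\0\0"; its Magic tells
     * PE32 (0x10B) from PE32+ (0x20B), and AddressOfEntryPoint sits at
     * offset 16 in both variants. A packer typically rewrites this RVA. */
    long opt = pe + 24;
    uint32_t magic = rd(fp, opt, 2);
    uint32_t entry = rd(fp, opt + 16, 4);

    printf("sections: %u  timestamp: 0x%08X  %s  entry RVA: 0x%08X\n",
           nsections, timestamp, magic == 0x20B ? "PE32+" : "PE32", entry);
    fclose(fp);
    return 0;
}
```

The AddressOfEntryPoint printed here is where a packer's stub begins; the genuine OEP appears only after the stub runs, which is exactly what the emulator discussed in Section II-C must chase.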
Obsidium calls quite a few functions to detect debuggers, such as CheckRemoteDebuggerPresent(), CreateToolhelp32Snapshot(), FindWindowA(), IsDebuggerPresent(), and UnhandledExceptionFilter(). Obsidium uses runtime decryption as its encryption engine; specifically, it performs decryption at the memory-page level. After decrypting a memory page and executing the corresponding assembly instructions, Obsidium wipes the page out right away and decrypts the next one. It is therefore very hard to dump the entire original code without debugging step by step. The import-table rebuilding and jump-to-OEP stages are similar to those of other complicated packers. For the import table rebuilding, Obsidium inserts a large amount of junk code to defend against RE. It also applies six different types of protection methods to hide import table data. Moreover, it uses a fake-OEP trick, stealing a segment of code around the original OEP and storing it elsewhere. Hence, the scan engine has to discover the stolen code first and patch it back to rebuild the original OEP.

C. Emulation speed

Despite its power and potential, emulation cannot be used heavily by AV products, mainly because of the complexity of implementing a fully virtual environment, and also because of its speed tradeoff. The emulator used by the scan engine is software that simulates the CPU hardware without affecting the actual computer environment, so that the computer will not be infected with viruses.

However, the core problem is that emulation is very slow because the emulator has to interpret assembly instructions one by one. Unfortunately, as more and more new malware is packed or polymorphic, samples mutate as they spread so that no two copies share the same code. To perform de-obfuscation, an emulator first needs to parse the PE internal structures to locate the OEP. Then, it goes through the decompressing or decrypting routines to dump the original instructions in memory and execute them. Compared with the milliseconds spent scanning an unpacked malware sample, the emulator sometimes needs up to minutes to emulate a packed file; this is not tolerable for on-the-fly protection. If the scanner emulated an obfuscated sample for only milliseconds, it might not collect enough information to determine whether the sample is malicious. On the other hand, if a suspicious sample is given seconds or even minutes for a "wild run", desktop machines slow down dramatically. Therefore, in this respect, even if the size of the signature file remained the same as before, scanning would not be as fast as before.

III. VIRUS SCANNING IN NTFS FILE SYSTEMS

Current popular file systems include the New Technology File System (NTFS) for Windows, the Third Extended Filesystem (ext3) for Linux, and the Hierarchical File System Plus (HFS+) for Mac OS. Since Microsoft Windows is the dominant and most heavily attacked operating system, the scope of this paper is limited to NTFS.

A. Rootkit

Malware authors usually prevent the AV engine from detecting their malicious code by hiding their files in the infected system. A rootkit is a technique that manipulates the file system and system calls so that certain files become invisible or inaccessible to regular users and AV scanners. To achieve data hiding, rootkits use Application Programming Interface (API) hooking at both the user level and the kernel level. By intercepting system calls, replacing them with faked ones, and altering execution paths, a rootkit is able to hide files [2]. The presence of a rootkit compromises the reliability and security of the operating system because attackers can modify system environment variables and hide malicious code in hidden files and processes.

Since a rootkit works by intercepting API calls, a high-level view obtained through Windows APIs will differ from the low-level view (accessing disk data without calling those APIs) if a rootkit resides in the system. The mechanism of rootkit detection is therefore to list the file discrepancies by comparing the results of high-level API scanning with those of low-level scanning. Understanding how NTFS fetches disk data at low level is thus critical for developing such a rootkit scanner and integrating it into security software.

B. NTFS data accessing

The use of RE to uncover NTFS structures and principles has been addressed by several researchers. For example, the Linux-NTFS project [5] was developed to create a new Linux kernel driver for NTFS, user-space utilities, and a function library. Nagar [6] presented the details of writing kernel-mode Windows NT file-system drivers. Files play a key role in Windows systems and constitute the largest percentage of hidden objects in NTFS. In this section, the NTFS file accessing mechanism and the low-level file scanning work flow are introduced.

Everything on an NTFS volume exists as a file record. NTFS uses a B-tree to index file record data, which allows efficient insertion, retrieval, and removal of those records. For example, NTFS can quickly list the sizes, modification dates, and types of all files under a certain directory, in ordinal order, without accessing their real data. When an NTFS volume is formatted, metadata files are created, including the Master File Table ($MFT), $BITMAP, $BOOT, etc. For example, $MFT contains the descriptions of the metadata and the attributes of all files and directories. Every file record in $MFT stands for a file or directory, and if a file is small enough, its actual data is stored directly in the record itself; otherwise, a file index is saved instead. A file's attributes, both resident and non-resident, can be accessed by traversing the MFT. The most important attributes include the file name, data, index root, and index allocation attributes. The file name attribute contains both the file's long name and its MS-DOS short name. NTFS allows multiple data attributes in one file record, which makes the data attribute the most suitable place for a hacker to hide malicious files. Finally, the index root and index allocation attributes are used to implement folders and other indices.
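Here is a minimal sketch of the cross-view comparison described in Section III-A: given one file-name list gathered through high-level Windows APIs and one gathered by walking the disk at low level, any name present only in the low-level list is a candidate hidden file. The list-loading is stubbed out with stand-in data, and both enumeration paths are assumed to produce sorted arrays, as NTFS index entries are kept in order.

```c
#include <stdio.h>
#include <string.h>

/* Cross-view detection: report names visible to the low-level disk
 * walk but missing from the high-level API enumeration. Both input
 * arrays are assumed to be sorted. */
static void cross_view_diff(const char **api, int na,
                            const char **low, int nl) {
    int i = 0, j = 0;
    while (j < nl) {
        int cmp = (i < na) ? strcmp(api[i], low[j]) : 1;
        if (cmp == 0)     { i++; j++; }   /* both views agree          */
        else if (cmp < 0) { i++; }        /* API-only entry: unusual   */
        else {                            /* low-level only: suspicious */
            printf("possible hidden file: %s\n", low[j]);
            j++;
        }
    }
}

int main(void) {
    /* Stand-in data: in a real scanner the first list would come from
     * FindFirstFile()/FindNextFile() and the second from parsing $MFT
     * and the index attributes directly off the raw volume. */
    const char *api_view[] = { "boot.ini", "notepad.exe" };
    const char *low_view[] = { "boot.ini", "evil.sys", "notepad.exe" };
    cross_view_diff(api_view, 2, low_view, 3);
    return 0;
}
```

Running this on the stand-in data reports "evil.sys" as a possible hidden file, which is exactly the discrepancy a rootkit scanner looks for.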
Since NTFS uses a B-tree to access files, directories are indexed for quick searching by their index entries.

Fig. 1. Low-level file scanning. (Figure: starting from the boot sector, the scanner walks the $ROOT index B+ tree to reach file records, lists files, scans file data into a new file, and compares the results against Windows API calls to expose discrepancies.)

Fig. 1 shows the work flow of an NTFS low-level file scanning tool. The scanner first reads the NTFS volume's boot sector, which stores the start address of the $MFT table. The $ROOT file record in the $MFT table contains the root directory information. From there, the scanner reads the index root (the root node of the B+ tree) or the index allocation attribute, which is the basic component of an index. In NTFS, a directory is a sequence of index entries; thus, a specific file record can be accessed from its index. Finally, the file's content can be accessed and copied to a new file, which is scanned by an AV scanner.
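To ground the first step of this workflow, the sketch below decodes the handful of boot-sector fields needed to locate $MFT. The field offsets follow the NTFS boot-sector layout documented by the Linux-NTFS project [5], and the code reads a saved boot-sector image rather than a live volume, which would require raw-device access rights.

```c
#include <stdio.h>
#include <stdint.h>

/* Little-endian field extraction from a raw 512-byte boot sector. */
static uint64_t le(const unsigned char *p, int bytes) {
    uint64_t v = 0;
    int i;
    for (i = 0; i < bytes; i++) v |= (uint64_t)p[i] << (8 * i);
    return v;
}

int main(int argc, char **argv) {
    unsigned char bs[512];
    if (argc < 2) { fprintf(stderr, "usage: %s bootsector.img\n", argv[0]); return 1; }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp || fread(bs, 1, sizeof(bs), fp) != sizeof(bs)) {
        fprintf(stderr, "cannot read boot sector image\n");
        return 1;
    }
    fclose(fp);

    /* NTFS boot sector fields: 0x0B bytes per sector, 0x0D sectors per
     * cluster, 0x30 logical cluster number of $MFT. */
    uint64_t bytes_per_sector    = le(bs + 0x0B, 2);
    uint64_t sectors_per_cluster = le(bs + 0x0D, 1);
    uint64_t mft_lcn             = le(bs + 0x30, 8);

    uint64_t cluster = bytes_per_sector * sectors_per_cluster;
    printf("$MFT starts at byte offset %llu (cluster %llu, %llu B/cluster)\n",
           (unsigned long long)(mft_lcn * cluster),
           (unsigned long long)mft_lcn,
           (unsigned long long)cluster);
    return 0;
}
```

Everything after this point in the workflow, from $ROOT to individual file records, is reached by seeking relative to the $MFT offset computed here.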
C. Burstiness and latency in NTFS file operations

Given a stationary time series $X(t)$, $t \in \mathbb{N}$, where $X(t)$ is interpreted as the traffic at time instance $t$, the aggregate $X^{(m)}$ of $X(t)$ at aggregation level $m$ is defined [7] as

    X^{(m)}(k) = \frac{1}{m} \sum_{t=km-(m-1)}^{km} X(t)    (1)

That is, $X(t)$ is partitioned into non-overlapping blocks of size $m$; their values are averaged, and $k$ indexes these blocks. Denote $r^{(m)}(k)$ as the auto-covariance function of $X^{(m)}(k)$. $X(t)$ is called self-similar with Hurst parameter $H$ ($0.5 < H < 1$) if, for all $k$ and $m \ge 1$,

    \mathrm{Var}(X^{(m)}) \propto m^{-\beta}    (2)

and

    r^{(m)}(k) \to r(k) \text{ as } m \to \infty    (3)

The variance-time plot and the R/S plot are two of the most commonly used methods to estimate the Hurst parameter $H$. The variance-time plot is based on the slowly decaying variance of a self-similar trace. From Equation (2):

    \log(\mathrm{Var}(X^{(m)})) = c - \beta \log(m)    (4)

This plot is called the variance-time plot, with $H = 1 - \beta/2$. Given a series of observations $X(t)$, $t \in \mathbb{N}$, with mean $\bar{X}(n)$ and sample variance $S^2(n)$,

    \frac{R(n)}{S(n)} = \frac{1}{S(n)} \left[ \max(0, W_1, W_2, \ldots, W_n) - \min(0, W_1, W_2, \ldots, W_n) \right]    (5)

where

    W_k = (X_1 + X_2 + \cdots + X_k) - k\bar{X}(n), \quad k \ge 1    (6)

Self-similar traces satisfy

    E\!\left[\frac{R(n)}{S(n)}\right] \sim n^H, \quad 0 < H < 1    (7)

In this section, NTFS file systems were scanned within a small-scale network. The data were collected from four hosts. Most file size values range from 256 B to 512 kB. Our results show that their frequencies fit the log-normal distribution and only the distribution tail presents self-similar behavior, at a low bursty degree, which is similar to the work described in [8]. Table I shows the input directory, the file number, and the measured Hurst parameters of the input traces. For example, for the input trace of the "system32" directory with 5895 files, the variance-time measured H is 0.612107. For the input trace of the "F:" drive, the variance-time measured H is 0.679351.

TABLE I
INPUT TRACES FOR SIMULATIONS

Input traces   Input directory   File number   Variance-time H   R/S H
Trace 1        system32          5895          0.612107          0.632398
Trace 2        F:                101781        0.679351          0.595343
Trace 3        system32          4940          0.608519          0.630199
Trace 4        E:                6055          0.667326          0.631928

Three file operation events are defined: listing, scanning, and content comparing. First, starting from the index B+ tree root node, all the file names in a directory, or even on a whole raw disk, can be listed in alphabetical order one by one. By comparing with the query results of high-level API calls, file name discrepancies can be found. Second, based on the index entry and file record, the corresponding file's raw content can be accessed. Finally, to detect malware at the deepest level, the raw file content is compared with the results of API calls again, for any content discrepancies.

TABLE II
INPUT TRACES FOR LOW-LEVEL FILE PROCESSING

Trace   List v-t H   List R/S H   Scan v-t H   Scan R/S H   Compare v-t H   Compare R/S H
1       0.764        0.739        0.840        0.824        0.852           0.797
2       0.682        0.742        0.823        0.752        0.847           0.869
3       0.736        0.732        0.823        0.826        0.827           0.775
4       0.692        0.738        0.746        0.701        0.850           0.883

For the low-level file processing, the searching time depends on both the file's location in the B-tree and the file content size. We have shown that the file listing, scanning, and comparing time distributions are not log-normal. In [5], the B-tree searching mechanism is expounded in detail. Here, the inter-arrival file events were demonstrated to present high self-similarity in those event traces, whose distribution burstiness cannot be smoothed by averaging over a large time scale. For the four traces shown in Table II, we have listed the measured Hurst parameters of the listing, scanning, and comparing event traces. It is clear that they have much higher bursty degrees.
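As a worked illustration of Equations (1), (2), and (4), the sketch below estimates H with the variance-time method: it aggregates a series at several levels m, regresses log Var(X^(m)) on log m to obtain the slope -beta, and reports H = 1 - beta/2. The trace here is synthetic i.i.d. noise, so the estimate should come out near 0.5; the choice of aggregation levels is an arbitrary assumption. Compile with -lm.

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Variance of the m-aggregated series X^(m), per Equation (1). */
static double agg_variance(const double *x, int n, int m) {
    int k, t, nb = n / m;
    double mean = 0.0, var = 0.0, *blk = malloc(nb * sizeof(double));
    for (k = 0; k < nb; k++) {               /* block averages */
        double s = 0.0;
        for (t = 0; t < m; t++) s += x[k * m + t];
        blk[k] = s / m;
        mean += blk[k];
    }
    mean /= nb;
    for (k = 0; k < nb; k++) var += (blk[k] - mean) * (blk[k] - mean);
    free(blk);
    return var / nb;
}

int main(void) {
    enum { N = 1 << 14 };
    static double x[N];
    int i, levels[] = { 1, 2, 4, 8, 16, 32, 64, 128 };
    int L = sizeof(levels) / sizeof(levels[0]);
    double sx = 0, sy = 0, sxx = 0, sxy = 0;

    srand(7);                                /* synthetic i.i.d. trace */
    for (i = 0; i < N; i++) x[i] = (double)rand() / RAND_MAX;

    /* Least-squares fit of log Var(X^(m)) = c - beta*log(m), Eq. (4). */
    for (i = 0; i < L; i++) {
        double lx = log((double)levels[i]);
        double ly = log(agg_variance(x, N, levels[i]));
        sx += lx; sy += ly; sxx += lx * lx; sxy += lx * ly;
    }
    double beta = -((L * sxy - sx * sy) / (L * sxx - sx * sx));
    printf("beta = %.3f, H = 1 - beta/2 = %.3f\n", beta, 1.0 - beta / 2.0);
    return 0;
}
```

For uncorrelated noise Var(X^(m)) decays as 1/m, so beta is near 1 and H near 0.5; a bursty trace like the ones in Table II decays more slowly, pushing H toward the 0.7-0.9 range reported above.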
To the best of our knowledge, our simulation [9] was the first to provide evidence of the bursty feature in high-level file system input and output events, which is caused by the Pareto-distributed NTFS file index searching and data block accessing times. This conclusion explains the delay discrepancies of file scanning well. AV scan engines usually enumerate files by calling Windows APIs, such as FindFirstFile() and FindNextFile(), which in turn enumerate disk blocks using NTFS's low-level mechanisms. Therefore, during virus scanning in NTFS file systems, even if the trace of the file sizes fits the log-normal distribution, there are still many "spikes" of high virus-scanning latency which cannot be ignored. Furthermore, this kind of scan delay has nothing to do with the size of the signature file; it is related only to how Microsoft designs and implements the NTFS file accessing algorithms.

IV. WINDOWS: THE SYSTEM THAT SLOWS DOWN WINDOWS

Windows system metadata has changed in recent years. Does this trend have any effect on virus scanning? Metadata describes a set of characteristics of the files and directories existing in the file system, including file size, file number, timestamps, attributes, etc. The authors of [3] collected annual snapshots of file-system metadata from over 60,000 Windows PC file systems. Their results showed how NTFS file system metadata changed from 2000 to 2004. Table III summarizes their observations of a few important properties.

TABLE III
CHANGES OF FILE SYSTEM METADATA

                    2000    2004    Effect on AV products
File number         30k     90k     on-demand scan
File size           108k    189k    on-access scan
Directory number    2400    8900    on-demand scan
Storage capacity    8G      46G     on-demand scan

In AV products, the on-demand scan is one of the main scan types; it is a full search and scan of the file system. The on-demand scan works at the file level, and it scans all files on the hard disk. Whenever virus signatures are updated, users are recommended to start an on-demand scan to make sure that all files are checked with the latest signatures. As shown in Table III, the mean number of files in an NTFS file system has grown from 30k to 90k, implying that on-demand scans will take much more time. In addition, the number of directories and the total storage capacity of the whole file system have also increased steadily; this drags machines down further.

The on-access scan is another mainstream scan type implemented inside the virus scanner. It continually monitors PC memory and any on-access file operation. The speed of the on-access scan depends largely on the size of the accessed file. It was observed that the average file size increased from 108k to 189k over those four years. As a result, we expect that users have to wait longer for on-access scans owing to the grown average file size.

On the other hand, the findings in [3] also showed that files deeper in the index tree tend to be smaller, whereas more and more large files reside at shallow levels. Since Trojans and Internet zero-day malware are generally much smaller in size than other types of viruses, their corresponding index searching and data block accessing times tend to be a little longer.

V. DISCUSSION AND CONCLUSION

A countermeasure to speed up the virus scan is to move AV functionality from the user desktop into the cloud. The AV In-the-Cloud service is becoming the next-generation security infrastructure designed to defend against virus threats. It provides a reliable protection service delivered through data centers worldwide, which are built on virtualization technologies.

Fig. 2. Anti-virus In-the-Cloud infrastructure. (Figure: the user's AV desktop agent forwards requests through an anonymous system, via entrance and exit nodes, to AV cloud servers backed by a signature database.)

The AV In-the-Cloud service has been advocated as the next-generation model for virus detection by Trend Micro (http://www.trendmicro.com) and other AV vendors since June 2008. It is a software distribution model in which security services are hosted by vendors and made available to customers over the Internet. This approach employs a cloud server pool which analyzes and correlates new attacks and generates vaccinations online. The cloud infrastructure sharply reduces the computation burden on clients and enhances security products in mitigating new malware. Furthermore, customers only need to maintain a small, light-weight version of the virus signature file instead of a full copy. Benefits include easy deployment, low operating costs, and fast virus detection.

Fig. 2 shows the architecture of the AV In-the-Cloud service. The agent is an on-access scanner deployed at the desktop. It places itself between the applications and the operating system. The agent automatically examines the local machine's memory and file system whenever these resources are accessed by an application. For any suspicious file, the agent generates the hash value or a specific signature of the file and sends it to the remote cloud server for security verification. A low-latency anonymous communication network is used to forward these requests from the desktop to the remote cloud.
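The sketch below shows the agent-side half of this exchange under stated assumptions: the suspicious file is hashed with SHA-256 via OpenSSL's SHA256_* functions, and the digest is handed to a stubbed query_cloud() standing in for the network path; the real protocol, endpoints, and signature format are not described in the paper and are not modeled here. Link with -lcrypto.

```c
#include <stdio.h>
#include <openssl/sha.h>   /* OpenSSL; link with -lcrypto */

/* Stub for the network hop: a real agent would forward the digest
 * through the low-latency anonymous network to the AV cloud servers.
 * Here every hash simply comes back "unknown". */
static int query_cloud(const unsigned char digest[SHA256_DIGEST_LENGTH]) {
    (void)digest;
    return 0;  /* 0 = unknown, 1 = known-bad, 2 = known-good */
}

int main(int argc, char **argv) {
    unsigned char buf[4096], digest[SHA256_DIGEST_LENGTH];
    SHA256_CTX ctx;
    size_t n;
    int i;

    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    /* Hash the file incrementally; only the 32-byte digest ever
     * leaves the desktop, not the file contents. */
    SHA256_Init(&ctx);
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        SHA256_Update(&ctx, buf, n);
    SHA256_Final(digest, &ctx);
    fclose(fp);

    for (i = 0; i < SHA256_DIGEST_LENGTH; i++) printf("%02x", digest[i]);
    printf("  verdict=%d\n", query_cloud(digest));
    return 0;
}
```

Because only a small digest crosses the network, the client-side cost stays constant per file regardless of how large the cloud-side signature database grows, which is the core of the light-weight-client argument above.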
Our work is motivated by the need to explain why AV software drags down users' computers. In this paper, we have shown that a large signature file is not the only reason for the slowdown. The virtual emulation widely used in security products requires the AV scan engine to spend far more time de-obfuscating polymorphic viruses than scanning unpacked ones. In addition, low-level NTFS file operations and the recent changes in file system metadata also delay both on-demand and on-access scan times.
REFERENCES
[1] W. Yan, Z. Zhang, and N. Ansari, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65-69, Sept./Oct. 2008.
[2] C. Kruegel, W. Robertson, and G. Vigna, "Detecting kernel-level rootkits through binary analysis," Proceedings of the 20th Annual Computer Security Applications Conference, pp. 91-100, Tucson, AZ, December 2004.
[3] N. Agrawal, W. Bolosky, J. Douceur, and J. Lorch, "A five-year study of file-system metadata," Proceedings of the 5th USENIX Conference on File and Storage Technologies, pp. 3-3, San Jose, CA, February 2007.
[4] http://msdn.microsoft.com/msdnmag/issues/02/02/PE/
[5] Linux-NTFS Project, NTFS Documentation, http://www.linux-ntfs.org
[6] R. Nagar, Windows NT File System Internals, O'Reilly, 1997.
[7] W. Leland, M. Taqqu, W. Willinger, and D. Wilson, "On the self-similar nature of Ethernet traffic," IEEE/ACM Transactions on Networking, vol. 2, no. 1, pp. 1-15, 1994.
[8] J. R. Douceur and W. J. Bolosky, "A large-scale study of file-system contents," Proceedings of the 1999 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pp. 59-70, Atlanta, GA, June 1999.
[9] W. Yan, "Revealing self-similarity in NTFS file operations," poster paper, Proceedings of the 7th USENIX Conference on File and Storage Technologies, San Francisco, CA, February 2009.
