Você está na página 1de 19

1

On The Establishment of Defender’s Reputation


Against Insider Attacks
Nan Zhang, Wei Yu, Xinwen Fu, Sajal K. Das

Abstract—We address issues related to establishing a de- of inside attackers [6], in particular the fear of detection. We
fender’s reputation in anomaly detection against two types of focus on this approach in the paper.
attackers: (i) smart insiders, who learn from historic attacks Significant research has been done on the game-theoretic
and adapt their strategies to avoid detection/punishment, and (ii)
naı̈ve attackers, who blindly launch their attacks without knowl- modeling of intrusion detection and network security systems
edge of the history. In this paper, we propose two novel algorithms [6]–[12]. It has been frequently observed that an attacker’s
for reputation establishment - one for systems solely consisting strategy depends on not only its own objective but also the
of smart insiders, and the other for systems in which both smart “toughness” of the defender (i.e., the defender’s willingness
insiders and naı̈ve attackers are present. Theoretical analysis and to enforce security even in the expense of false positives).
performance evaluation show that our reputation-establishment
algorithms can significantly improve the performance of anomaly However, to the best of our knowledge, meager attention has
detection against insider attacks in terms of the tradeoff between been paid on the establishment and maintenance of defender’s
detection and false positives. reputation of toughness in order to force the attackers to
abandon their attacks. Traditionally, the reputation concept
was introduced to model the trustworthiness of a suspicious
I. I NTRODUCTION party (from the perspective of an honest party), and has been
Insider attacks are launched by malicious users who are used in a game-theoretic framework to enforce cooperation
entrusted with authorized (i.e., insider) access of a system. in distributed systems [13]. In this paper, we investigate the
From stealing corporate data to propagating malware, it is establishment of a defender’s reputation of toughness (from
known that insider attacks form 29% of all reported electronic the perspective of the attackers) and its impact on the intrusion
crimes [1] and lead to massive damage [2]. detection against insider attacks.
A lot of research has been conducted on anomaly detection We consider the interactions between the defender and
against insider attacks [1], [3], where the objective is to detect (numerous) attackers as a repeated game. Our basic idea is for
deviation from a pre-determined model of normal system the defender to carefully choose its strategy in the beginning
behavior. However, since an insider has authorized access and to establish a desired reputation of toughness (i.e., willingness
extensive knowledge of the victim system, it can be more dif- to detect and punish attackers even with high cost of false
ficult to separate normal system behavior from insider attacks alarms), which may force the upcoming attackers to drop their
than the external ones. Because of this, experimental studies attacks, and lead to lower cost in the long run. A typical real-
show that many generic approaches for anomaly detection world example of this strategy is the police surge against
suffer from a high false positive rate in order to effectively criminal activities aimed at intimidating potential criminals
detect insider attacks [3]. and reducing crimes further. The theoretical foundations of this
There exist two possible approaches to tackle this challenge. strategy, especially the role of reputation in repeated games,
The first approach is to focus on the behavior of inside attack- have been studied in the literature of game theory [14], [15].
ers and design new anomaly detection techniques. Most of the In this paper, we investigate the application of reputation
existing research belongs to this category. Unfortunately, the establishment to defend against insider attackers. To suc-
state-of-the-art cannot yet provide a satisfiable solution [3]– cessfully establish a reputation of toughness for thwarting
[5]. The other approach is not to revise the existing anomaly insider attacks, the defender faces the following two critical
detection techniques, but to build upon them novel game- challenges:
theoretic techniques to exploit the weakness of the objective • First, the defender does not have perfect knowledge
about the outcome of its interactions with the attackers,
Nan Zhang is with Department of Computer Science, George Washington or even the existence of such interaction at all. For
University, Washington, DC, USA. Email: nzhang10@gwu.edu. Wei Yu is example, unlike the real-world criminal defense scenario
with Cisco Systems, Richardson, TX, USA. Email: weyu@cisco.com. Xinwen
Fu is with Department of Computer Science, University of Massachusetts where most criminal activities (e.g., murder, robbery)
Lowers, Lowers, MA. Email: xinwenfu@cs.uml.edu. Sajal K. Das is with are immediately observable, in many computer security
Department of Computer Science and Engineering, University of Texas at applications the defender might not be aware of an attack
Arlington, Arlington, TX, USA. Email: das@cse.uta.edu.
This work was supported in part by AFOSR grant FA9550-08-1-0260, that has successfully evaded detection. Without such
NSF grants IIS-0326505, CNS-0721766, CNS-0721951, CNS-0722856, IIS- knowledge, the defender cannot determine when a round
0845644, CNS-0852673, CCF-0852674, and Texas ARP program under of interaction starts or ends, and thus cannot use many
grant 14-748779. Any opinion, finding, conclusion, and/or recommendation
expressed in this material, either expressed or implied, are those of the authors existing reputation-establishment strategies [16].
and do not necessarily reflect the views of the sponsors listed above. • Second, not all attackers are rational players as assumed
2

in most game-theoretic studies. Instead, a number of followed by the final remarks in Section IX.
them may be irrational or naı̈ve attackers who simply do
not care about being detected. A major threat posed by II. S YSTEM M ODELS
such “naı̈ve” attackers is that they can easily destroy any
This section introduces our system models from the per-
reputation established by the defender. Consider an inside
spective of game theory. We first give the notions and define
attacker which restrains its attack due to the defender’s
the participating parties, and then describe their strategies and
reputation of toughness. If a “naı̈ve” attacker attacks
objectives.
and evades detection (and the insider learns about the
outcome), then the insider may infer that the defender
is actually “softer” than its reputation, and thus restart A. Table of Notions
launching its attacks.
TABLE I
To address the above two challenges, in this paper, we TABLE OF N OTIONS
consider the real scenario where the defender can only observe
D Defender
detections but has no knowledge of the undetected attacks. The Ai Attacker
attackers may be a combination of “smart” ones, who adapt γ Tradeoff parameter set by D
their attacking strategies based on knowledge of the defensive lA (i) Loss of D due to attack from Ai
lF (γ) Loss of D from false positives given γ
mechanism to avoid detection, and also naı̈ve ones, who do 0 (·)
lF First-degree derivative function of lF (·)
not care about being detected. These two types of attackers uD Long-term utility function of D
may also collude to share their observations of the system. To uA (i) Utility function of every attacker Ai
δ Discount factor for D
this end, our contributions can be summarized as follows: βD Defender preference parameter for D
• When the system consists solely of smart insiders, βA Insider preference parameter for Attacker Ai
βR A random variable in place of βD for systems with naı̈ve attackers
we propose a new reputation-establishment algorithm β0 1/(|lF0 (β /(1 + β ))| + 1)
A A
(Algorithm A in Section IV) which adapts the defensive rD (i) Defender’s reputation of toughness from the view of Ai
strategy to significantly improve the tradeoff between p Probability of D being tough
γS βA p/(1 − p + βA p)
detection and false positives. In particular, our algorithm λI Arrival rate of smart insiders
establishes a reputation of toughness for the defender. λN Arrival rate of naı̈ve attackers

• When naı̈ve attackers are present in the system, we prove Table I shows a list of common notions which will be
a negative result that no defensive mechanism without defined and used in the later part of the paper.
the knowledge of undetected attacks can establish and
maintain a reputation of toughness for a substantial
B. Parties
amount of time. Nevertheless, we show that, even in
the presence of naı̈ve attackers, the defender could In a system with both the defense and attacks, there are two
still improve the tradeoff between detection and false types of parties: a defender D and m attackers A1 , . . . , Am .
positives by introducing uncertainty to its real objective, Without loss of generality, we consider each attack to be
and then establish a reputation for the uncertainty in a launched by a different attacker. Clearly, the defender is un-
periodic fashion. We present another novel reputation- trustworthy to the attackers, and vice versa. As we mentioned,
establishment algorithm (Algorithm B in Section V) the attackers may be classified into two categories: Smart
which instantiates this idea. attackers are afraid of being detected, and therefore make
the optimal attacking decisions based on their knowledge of
• We theoretically and quantitatively analyze the impact of the defensive mechanism. Naı̈ve attackers, on the other hand,
the above algorithms on improving detections and reduc- blindly launch attacks without fear of being detected. Many
ing false positives. Our evaluation results demonstrate the naı̈ve attackers are also naı̈ve technically, in that they only use
effectiveness of the proposed algorithms. existing attacking tools detectable by signature-based intrusion
detection techniques [17]. However, there may also be a small
The rest of the paper is organized as follows. Section II number of naı̈ve attackers which are technically sophisticated,
introduces our system models. In Section III, we first intro- and thus must be addressed by anomaly detection [18].
duce a baseline game-theoretic formulation of the repeated
interactions between the defender and the inside attackers and
then introduce the defender’s reputation. Section IV considers C. Different Parties’ Strategies and Outcomes of the Game
a system solely consisting of smart insiders but no naı̈ve at- 1) Defender’s Strategy: The defender D relies on anomaly
tackers and presents a new reputation-establishment algorithm detection to detect the incoming attacks. Although the specific
along with the theoretical analysis. In Section V, we deal with defensive strategy depends on the anomaly detection technique
a system with naı̈ve attackers and then propose and analyze used, in general the defender has to make a proper tradeoff
a new reputation-establishment algorithm. Performance eval- between the detection rate and the false positive rate. Recall
uation of our proposed algorithms is shown in Section VI. In that generally the detection rate increases as the false positive
Section VII, we briefly review the related work. In Section rate increases. Let us define γ ∈ [0, 1] to be a tradeoff
VIII, we discuss other issues related to attacks and defenses, parameter such that the higher the value of γ, the smaller
3

is the false positive rate and hence the smaller is the detection observe from historic interactions is only the outcomes but
rate. Without loss of generality, we normalize γ such that the not the strategies of the defender, i.e., the tradeoff parameter.
probability for detection of an attack is (1 − γ). Consequently, Nevertheless, the attacker may infer certain information about
when γ = 0, all attacks will be detected while a large number the strategy from the outcomes, as we shall further discuss in
of false positives will be issued. When γ = 1, no attack will the later part of this paper.
be detected while no false positive will be generated.
Note that although γ appears to be a threshold that controls
D. Parties’ Objectives
the detection rate, it is not restricted to modeling simple
threshold-based anomaly detection techniques. For profile- 1) Defender: The defender D has two objectives: i) to
based anomaly detection techniques such as the ones that use detect as many attacks as possible, and ii) to reduce the number
data mining or machine learning techniques [5], [19], γ can be of false positives. For each attacker Ai (1 ≤ i ≤ m), let
considered as a measure for the output of such algorithms (e.g., lA (i) ∈ {0, 1} be the loss of the defender associated with Ai :
a minimum threshold on the predicted probability generated 
 1, if Ai launches an attack that is undetected;
by Bayesian detection algorithms [19]). lA (i) = b, if Ai launches an attack that is detected;
A user-controllable tradeoff parameter like γ is featured in 
0, if Ai abstains.
many real-world statistical-based anomaly detection systems.
For example, the known SPADE (Statistical Packet Anomaly We briefly elaborate on the value of b when an attack is
Detection Engine) plug-in of Snort [20], [21] allows system detected. We assume b ≥ 0 to capture the potential cost for
administrators to specify a threshold based on the anomaly the defender to repair damages caused by the detected attack.
score derived from the source IP, source port, destination IP, We also have b ≤ 1, because an undetected attack leads to
and destination port of a monitored packet. Such a parameter even greater damage (otherwise the defender could simply
is important for providing the flexibility for a system admin- abandon any detection effort). According to the definition
istrator to make a proper tradeoff between detection and false of the tradeoff parameter γ (given in Section II-C), if Ai
positive rates. chooses to attack, then the expected loss of the defender is
2) Attackers’ Strategy: The attackers’ strategy also varies E(lA (i)) = γ +(1−γ)·b. If Ai chooses to abstain, lA (i) = 0.
by the specific attack(s) launched. To form a generic model, we Let us now consider the loss of the defender from false
consider a simple attacker strategy which determines whether alarms viewpoint. Notice that the labor-consuming process of
or not to launch an attack. An attacker may opt for a mixed manually handling false alarms is one of the most important
strategy [22] which randomly chooses between the two choices challenges for intrusion detection systems. The more false
according to a probability distribution. On the other hand, a positives the anomaly detection module generates, the more
naı̈ve attacker always launches the attack because it has no is the resource (e.g., manpower) the defender has to spend in
concern over detection. order to distinguish between false positives and real attacks,
3) Outcome of the Game: The strategy combination of thus punishing only real attackers. As such, false positives
the defender and an attacker determines the outcome of their also lead to losses of the defender. In particular, with a given
interaction. In this paper, we consider the following four anomaly detection algorithm, such losses are independent of
possible outcomes: whether there is an attack, but solely depends on the value of
• An attack is launched and not detected; γ. Let lF (γ) ∈ [0, 1] be the loss from false positives in one
• An attack is launched and detected; time slot with the tradeoff parameter γ. We have lF (0) = 1,
• The attacker chooses not to launch the attack, no alarm lF (1) = 0, and lF (·) monotonically decreasing with γ.
is issued, and In the worst-case scenario, lF (·) may be a concave function
• No attack is launched but a false alarm is issued. which leads to an unsatisfiable tradeoff between detection and
In practice, other possibilities may exist. For example, the false positive rates. Thus, we focus on this scenario in our
defender might detect an attack but misidentify the type or discussions, but ensure our algorithms proposed in the paper
0
target of the attack. For the purpose of this paper, we assume to be generic to all possible functions lF (·). We use lF (·) to
that the defender makes a Boolean decision of whether an denote the first-degree derivative of lF (·).
attack has been launched or not. Therefore, there are only four For a given time period [0, tD ], the long-term utility function
of the defender is:
possible outcomes: launched/detected, launched/not detected,
tD n
not launched/detected (false alarm), and not launched/not X X
uD = −βD · (δ t · lF (γ(t))) − (1 − βD ) · (δ ti · lA (i)), (1)
detected. Only the first outcome can be observed by the
t=1 i=1
defender. An attacker, on the other hand, is able to observe all
possible outcomes based on its interaction with the defender. where γ(t) is the value of γ at time t; ti is the time when
As in many security studies, we consider the worst-case Ai interacts with the defender; and n is the total number of
scenario where an attacker may also learn about the outcomes attackers. There are two additional parameters: βD and δ:
of (historic) interactions between the defender and other • βD ∈ [0, 1], the preference parameter, measures the
attackers. This is particularly likely to happen for inside defender’s preference between detection and false pos-
attackers because they have the opportunity to observe the itives. The greater βD is, the more are the concerns the
status of the defense systems. We would like to note that, defender has on false positives. In particular, a defender
for the purpose of this paper, what an attacker may directly with βD = 1 does not care about the loss from undetected
4

attacks while a defender with βD = 0 does not care long-time player (i.e., the defender) and numerous short-term
about the loss from false positives. Intuitively, a lower players (i.e., the insiders). Since the attacker cannot observe
βD implies that the defender appears tougher from an the defender’s tradeoff parameter γ while the defender is
inside attacker’s perspective. unaware of an attack until/unless it is detected, the defender
• δ ∈ [0, 1], the discount factor [22], captures the fact that and the attacker can be considered as moving simultaneously
an entity generally values the present payoffs more highly in each interaction.
than future. In particular, the payoff is discounted by a To understand the impact of the defender’s preference
factor of δ after each time slot. In the extreme case where parameter βD to an insider Ai ’s strategy, we first consider
δ = 1, the defender treats present and future payoffs the case where Ai knows the exact value of βD . Recall that
0
equally. lF (·) is the first-degree derivative of the false positive function
Notice that in many real-world systems, the preference lF (·). For the simplicity of notation, define
parameter and the discount factor are inherent to the nature 1
of the defender and does not change over time. Also notice β0 = 0 ( βA )| + 1
, (3)
|lF 1+βA
that uD ≤ 0. For a given tD , the objective of the defender is
to choose the strategy that maximizes its expected long-term where βA is the insider’s preference parameter. The following
utility E(uD (tD )) (i.e., minimizes its expected loss). theorem depicts the unique Nash equilibrium of the game if
2) Attacker: In an insider attack scenario, once a defender the defender has no loss from detected attacks (i.e., b = 0).
detects an attack, it may easily identify the inside attacker We shall generalize the theorem to the cases where b > 0 in
and punish the attacker in various ways (e.g., fire, arrest, and a corollary.
prosecute the attacker). Thus, a smart insider aims to launch
an attack that will not be detected. In particular, we define its Theorem 1. When b = 0, if an insider Ai knows the exact
utility function as: value of βD , the unique Nash equilibrium of the game is
 formed by a mixed attacking strategy of
 1, if launches an undetected attack;
0 βA
!
uA = −βA , if launches a detected attack; (2) βD · |lF ( 1+βA
)|
Pr{Ai attacks} = min 1, , (4)
0, if abstains; 1 − βD

where βA is a predetermined insider preference parameter. and a defensive strategy of


Since an insider is afraid of being detected, we have βA > 0.
if βD ≥ β0 ;

The objective of a smart insider is to choose the strategy that 1,
γ= βA (5)
maximizes its expected utility. Note that no utility function is 1+βA , otherwise.
defined for naı̈ve attackers because they always launch their
Please refer to Appendix A for the proof of this theorem.
attacks and can be considered as being irrational (i.e.,they do
When b > 0, it is extremely difficult to derive an analytical
not choose the strategy that maximizes their utility).
expression of the Nash equilibrium for a generic false positive
III. D EFENDER ’ S R EPUTATION AS I NSIDERS ’ E STIMATION function lF (as in Theorem 1). Instead, we have the following
OF D EFENDER ’ S P REFERENCE PARAMETER
corollary on a special case where lF (γ) = 1 − γ.
Having defined the strategies and objectives of the defender Corollary 1. When b ∈ [0, 1], if an insider Ai knows the
and the attackers, we now consider the role of defender’s exact value of βD , the unique Nash equilibrium of the game
reputation in their interactions. Although an insider may learn is formed by a mixed attacking strategy of
the outcomes of historic interactions, it cannot directly observe 
βD

the defender’s preference parameter βD . The main objective Pr{Ai attacks} = min 1, , (6)
(1 − βD )(1 − b)
of this section is to study the impact of such uncertainty
(of βD ) on the strategy chosen by the insiders. To do so, and a defensive strategy of
we shall consider two cases: a baseline system where the (
1, if βD ≥ 1−b
2−b ;
exact value of βD is known to all attackers, and a more γ= βA (7)
1+βA , otherwise.
practical system where βD is hidden. The comparison between
these two systems will motivate us to define the defender’s The proof is analogous to that of Theorem 1.
reputation of toughness as the insiders’ estimation of βD . Note that since the Nash equilibrium in the theorem and
The concepts defined in this section will form the basis for the corollary represents a state where neither party can benefit
the reputation-establishment algorithms we shall propose in by deviating from its own strategy unilaterally [22], both Ai
Sections IV and V for different types of anomaly detection and the defender will choose their corresponding strategies
systems. specified in the Nash equilibrium. We can make the following
two observations from the Nash equilibrium.
A. Game-Theoretic Formulation First, a smart insider chooses its strategy based on the value
In order to formalize the defender’s reputation, we first of βD . In particular, when βD is “tougher” than β0 , the insiders
propose a game-theoretic formulation that models the inter- will be forced to drop some attacks due to fear of detection.
actions between the defender and the smart insiders as non- Second, unfortunately, the detection is not intimidating
cooperative, non-zero-sum [22], repeated games between one enough to force the insiders to completely abstain. In fact,
5

unless the defender is willing to accept the maximum tolerable be forced to abstain. This observation motivates our reputation-
level of false positives (i.e., γ = 0, βD = 0), the insiders will establishment algorithms proposed in the next section.
always launch attack with probability greater than 0.
These observations stand in contrast to what we are about C. Sequential Equilibrium
to analyze in the next subsection, for cases where the attackers To further illustrate the role of reputation in the repeated
do not know the exact value of βD . games between the defender and the attackers, we derive a
sequential equilibrium of the game when there is no naı̈ve
B. Defender’s Reputation attacker in the system. Similar to Corollary 1, we consider
Note that in a real system, an attacker cannot make a perfect lF (γ) = 1−γ and b ∈ [0, 1]. To simplify the expression of the
estimation of βD as in Theorem 1. This provides the defender equilibrium, we consider the case where a defender is either
an opportunity to delude the insiders to believe in a false value tough with βD = 0, or soft with βD = βs ∈ (0, 1]. The attacks
of βD , i.e., to establish a reputation desired by the defender. have prior belief of Pr{βD = 0} = p. Both the defender
We consider a generic definition of the defender’s repu- and all attackers have perfect knowledge of the outcomes of
tation. Intuitively, the defender should aim at establishing a interactions that have taken place. Let λ be the number of
tough reputation (i.e., βD < β0 ) in order to force the attackers attacks that have been launched.
to abstain. Suppose that a smart insider Ai is capable of Theorem 3. The following strategies constitute a sequential
observing the historic interactions between the defender and equilibrium of the game:
other (insider or naı̈ve) attackers A1 , . . . , Ai−1 . We define the • A tough defender sets γ = 0.
defender’s reputation as follows. • If no attack has escaped detection and p(βA +1)
λ+1
< 1,
Definition 1. The defender’s reputation of toughness from the a soft defender sets
view of an insider Ai is defined as βA
γ= . (10)
rD (i) = Pr{βD < β0 |outcomes for A1 , . . . , Ai−1 }. (8) βA + 1 − (βA + 1)λ+1 p
Otherwise, a soft defender sets γ = 1.
The greater rD (i) is, the tougher does the defender appear
• If there has been at least one attack that escaped detec-
from the perspective of Ai ; and the more likely it is that Ai
tion, an attacker always attacks. Otherwise, if p(βA +
will choose not to attack. We now analyze the probability of Ai
1)λ+1 < 1, an attacker attacks with probability
withholding its attack because of rD (i). In particular, consider  
a simple case where the defender is either tough enough to βs
Pr{Ai attacks} = min 1, . (11)
accept the maximum tolerable level of false positives (i.e., (1 − βs )(1 − b)
βD = 0), or soft enough to be unwilling to generate any of If p(βA + 1)λ+1 ≥ 1, an attacker abstains.
them (i.e., βD = 1). We prove the following theorem.
Please refer to Appendix C for the proof of the theorem.
Theorem 2. If βD is either 0 or 1, a smart insider will not Theorem 3 indicates that a soft defender may establish its
launch attack if and only if reputation of toughness by behaving tougher in the initial
1 interactions. Figure 1 depicts the change of γ and rD when λ
rD (i) ≥ . (9)
βA + 1 ranges from 1 to 10. The other parameters are set as p = 0.01
In this case, the unique Nash equilibrium of the game consists and βA = 0.5. As we can see, according to (10), the value
of of γ increases with λ, the number of launched attacks, until
γ reaches 1. Correspondingly, the reputation of toughness of
• The tough defender (with βD = 0) sets γ = 0.
the defender increases. When rD reaches 1/(βA + 1) (i.e.,
• The soft defender (with βD = 1) sets γ = 1.
p(βA + 1)λ+1 ≥ 1), due to Theorem 2, the attackers abstain.
• The smart insider chooses not to launch attack.

Please refer to Appendix B for the proof of the theorem. 0.8


γ
We can make two observations from Theorem 2. First, a smart 0.7 r
D
insiders’ uncertainty of βD provides the defender an edge. In 0.6
fact, the insider is forced to deterministically drop its attack 0.5
when the defender’s reputation satisfies rD (i) ≥ 1/(βA + 1).
γ or rD

0.4
The second observation is a consequence of the first one, 0.3
and is related to the change of rD (i) in repeated interactions.
0.2
Consider a system consisting of only smart insiders but
0.1
not naı̈ve attackers. As the game progresses, the value of
0
rD (i) may change. Nonetheless, once the reputation satisfies 2 4 6 8 10
λ
rD (i) ≥ 1/(βA + 1) and forces Ai to abstain, the defender’s
reputation from the view of Ai+1 will remain the same as Fig. 1. Change of γ and rD with λ
rD (i) - because no attack has taken place to change it. Thus,
in a system without naı̈ve attackers, as long as the defender can In comparison with the Nash equilibrium derived in Corol-
achieve rD (i) ≥ 1/(βA + 1) once, all subsequent insiders will lary 1, the main benefit a soft defender receives from the
6

1 reputation becomes tougher (i.e., rD (i) increases) when there


βA = 0.6

Probability of Reputation Establishment


βA = 0.7
is a detection; and becomes softer when there is a successful
0.8
βA = 0.8 attack. With our algorithm, the strategy of a soft defender is
βA = 0.9 to lower its tradeoff parameter γ in the early stage to detect a
0.6
large percentage of attacks, and thereby establish a reputation
of toughness.
0.4
Clearly, the reputation-building step should end as soon
as the defender’s reputation of toughness rD exceeds a pre-
0.2
determined threshold. However, note that, due to Bayes’
0
theorem [23], such reputation is determined by not only the
0.1 0.2 0.3 0.4 0.5
p number of detections but also the number of missed attacks,
which is unknown to the defender. Therefore, the soft defender
Fig. 2. Probability of Reputation Establishment vs p has to monitor the inter-arrival time of detections until it has
a high confidence that the reputation of toughness has been
properly established to the degree that prevents all subsequent
equilibrium in Theorem 3 is in the cases where the soft
insider attacks from happening (e.g., when rD (i) ≥ 1/(βA +1)
defender has γ = 1 while the attacker abstains. In this
in Theorem 2). At this moment, the reputation-building step
case, the reputation of toughness has been established for the
is concluded and the reputation-established step begins.
defender, and neither attack nor false positive would occur
In the reputation-established step, a soft defender capitalizes
in the system. One can see from Figure 1 that even if the
on the established reputation. In particular, it increases the
attackers’ prior belief of the defender being tough is extremely
tradeoff parameter γ to the maximum value such that, facing
low (e.g., p = 0.01 in the figure), the reputation effect may
the established reputation, an insider will choose to abstain
still predominate fairly quickly. Figure 2 shows the probability
(e.g., γ = 1 in Theorem 2). Clearly, by doing so, we can reduce
for the reputation to be established when βA ranges from 0.6
the number of false positives without sacrificing the detection
to 0.9 and p from 0.1 to 0.5. As we can see, the probability
of attacks. Since the reputation will not change without an
becomes higher when the attacker are penalized more from
attack, this value of γ can be maintained in a system without
detection (i.e., a larger βA ) or the prior belief of the defender
naı̈ve attackers. As such, a soft defender achieves much better
being tough is higher (i.e., a larger p).
tradeoff between detection and false positives in the long run.
While the sequential equilibrium derived in Theorem 3 theo-
retically verifies the effectiveness of reputation establishment,
it cannot be directly used in a practical anomaly detection B. Reputation-Establishment Algorithm for Systems Without
system. The reason is that the defensive strategy requires a naı̈ve Attackers
soft defender to know the number of launched attacks λ, which Based on the above two steps, we now present a detailed
is unavailable in practice due to reasons discussed in Section algorithm for reputation establishment.
1. In the following two sections, we develop our main ideas Note that in order to determine rD (i), we need to assume
of two practical reputation-establishment algorithms with the a prior distribution of βD . Instead of considering a specific
spirit behind the sequential equilibrium. type of distribution which may not hold in practice, we again
consider only a single parameter p ∈ [0, 1] such that the prior
IV. S YSTEMS W ITHOUT NA ÏVE ATTACKERS distribution of βD satisfies
In this section, we introduce a reputation-establishment Pr{βD < β0 } = p. (12)
algorithm for systems solely consisting of smart insiders (i.e.,
without naı̈ve attackers). The insiders may have knowledge of Both the defender and the insiders know p. If p > 1/(βA +1),
the reputation-establishment algorithm, but do not know (and no smart insider will launch its attack because the defender’s
have to learn) the defender’s preference parameter βD . We initial reputation rD (1) ≥ 1/(βA + 1). As such, we only need
first describe our basic ideas, and then present the detailed to consider the cases where p < 1/(βA + 1).
algorithm and the theoretical analysis. Algorithm A depicts our reputation-establishment algorithm
for systems without naı̈ve attackers. The reputation-building
A. Basic Ideas and reputation-established steps are corresponding to flags
STATUS = NULL and ESTABLISHED, respectively.
For the simplicity of discussions, in this subsection, let us For the simplicity of notation, let us define
call a defender soft if βD ≥ β0 , and tough if βD < β0 ,
where β0 = 1/(|lF 0
(βA /(1 + βA ))| + 1) is defined in (3). βA p
γS = . (13)
Our reputation-establishment algorithm for systems without 1 − p + βA p
naı̈ve attackers consists of two steps: reputation-building and In the reputation-building step, a soft defender assigns 1−γS to
reputation-established. Since the main objective is to establish the tradeoff parameter γ as (Line 6 in Algorithm A). A tough
the reputation of toughness for a soft defender, we first defender chooses γ = γS (Line 8 in Algorithm A). These
describe these two steps from a soft defender’s perspective: values of γ hold throughout the reputation-building step.
Note that, in general, a tough defender chooses a lower In Algorithm A, we consider the reputation of toughness
tradeoff parameter γ than a soft defender. Thus, the defender’s to be established when rD ≥ 1/(1 + βA ). The reason why
7

Algorithm A: Reputation-Establishment for Systems without have stopped attacking. In particular, to model the attack
naı̈ve Attackers frequencies, we follow a common assumption that the attacks
1: STATUS ← NULL. from insiders form a Poisson process with an arrival rate (i.e.,
2: repeat the number of attacks per time slot) of λI . We consider l0
3: if STATUS = ESTABLISHED then as a pre-determined threshold on the elapsed time. Since the
4: γ ← 1 if βD ≥ β0 ; γ ← 0 if βD < β0 . inter-arrival time of a Poisson process satisfies exponential
5: else if βD ≥ β0 then distribution, for any c ∈ (0, 1), the defender has confidence of
6: γ ← (1 − p)/(1 − p + βA p). c that the reputation has been properly established if
7: else if βD < β0 then
− ln(1 − c)
8: γ ← βA p/(1 − p + βA p). l0 = . (17)
9: end if λ I γS
10: Set γ as the tradeoff parameter. For example, the defender has confidence of c = 99% when
11: Wait until t > tD or an attack is detected. l0 = ln(100)/(λI γS ). When l0 is satisfied, the defender sets
12: if t > tD then STATUS = ESTABLISHED, and transfers from the reputation-
13: STATUS ← ESTABLISHED. building step to the reputation-established step.
14: else In the reputation-established step, a soft defender will set
15: tD ← t + l0 . γ = 0 while a tough one will set γ = 1 (Line 4 in Algorithm
16: end if A). Clearly, when rD ≥ 1/(1 + βA ), no insider will choose
17: until t > t0 AND STATUS = NULL to launch an attack to the system due to Theorem 2.
The time and space complexity of Algorithm A are both
O(1). We now show the transparency of Algorithm A to the
such a threshold can force all insiders to drop their attacks underlying anomaly detection algorithm. One can see from
will be clear after we discuss the reputation-established step Algorithm A that the only requirement for our reputation-
in a moment. Before that, we first briefly describe the possible establishment algorithm on the underlying anomaly detection
interactions in the reputation-building step: Initially, an insider technique is the existence of a parameter γ (Lines 10-11 in
will choose to attack because its expected payoff satisfies Algorithm A) which controls the tradeoff between the false
alarm rate and the detection rate. Other than that, our algorithm
E(uA (1)) = p(γS − βA (1 − γS ))+ considers the anomaly detection algorithm as a black box.
(1 − p)((1 − γS ) − βA γS ) > 0 (14) Thus, Algorithm A is transparent to the underlying anomaly
detection technique, and can be readily integrated with existing
according to p < 1/(βA + 1) and βA < 1. However, if the
anomaly detection systems against insider attacks.
very first attack happens to be detected, there is
p · (1 − γS ) 1
rD = = , (15) C. Theoretical Analysis
p · (1 − γS ) + (1 − p) · γS 1 + βA
We now present quantitative analysis on the performance of
and thus the reputation is established. If the first attack goes
Algorithm A in three theorems. First, Theorem 4 shows that
undetected, rD will decrease, and the insiders will continue
in a system without naı̈ve attackers, no insider will choose
launching attacks. However, (as we shall prove in the next
to attack if rD (i) ≥ 1/(βA + 1), which holds as long as the
subsection) whenever the number of detections exceeds the
number of detections exceeds the number of missed attacks.
number of successful attacks, there will be
Second, Theorem 5 shows a lower bound on the probability for
p · (1 − γS ) 1 the reputation to be established within time t. Third, Theorem
rD ≥ = , (16)
p · (1 − γS ) + (1 − p) · γS 1 + βA 6 shows that, with Algorithm A, the expected utility of a
and the reputation is established. Clearly, the probability of defender is significantly higher than that with the (single-
an established reputation increases with time. Nevertheless, interaction) optimal strategy derived in the Nash equilibrium
this also means that there is a possibility that the reputation in Theorem 1.
cannot be established after a long period of time, say a pre- Theorem 4. In a system without naı̈ve attackers, when the
determined timeout threshold t0 . In this case, the defender number of detections exceeds the number of undetected at-
has to abandon Algorithm A (Line 17) because after a large tacks, we have
number of interactions, the attackers may already have an
1
accurate estimation of the true value of βD . Fortunately, as rD (i) ≥ , (18)
shown by the quantitative analysis in the next subsection, this 1 + βA
case happens with a fairly small probability. and no insider will choose to launch its attack.
Next we consider how the defender makes the transition
Please refer to Appendix D for the proof of the theorem. Let
from the reputation-building step to the reputation-established
us now compute the probability of reputation being established
step. With Algorithm A, the defender determines whether the
at time t. Let erf(·) be the Gaussian error function. Recall that
reputation has been established by monitoring the time elapsed
λI is the arrival rate of smart insiders. We derive the following
since the last detection. Clearly, a prolonged period of time
theorem.
without detection is extremely unlikely unless the insiders
8

Theorem 5. With Algorithm A, the probability that rD ≥ Please refer to Appendix F for the proof of the theorem.
1/(1 + βA ) at time t < t0 is given by Note that for a soft defender,
uD (t)
tλI (1 − 2p)2
 
1 1 − 2γS limt→∞ tλI 1 − p − pβA
f (t) ≥ − · erf . (19) ≈ , (23)
2(1 − γS ) 2(1 − γS ) 8p(1 − p) S)
u0 (βD 1−p
In particular, which could be very small in practice. For example, in
Figure 3, when p = 0.5 and βA = 0.9, we have (1 − p −
lim f (t) ≥ pβA /(1 − p). (20) pβA )/(1 − p) = 0.1. After using Algorithm A, the loss of
t→∞
a soft defender is reduced to only 10% of the original loss
according to the (single-interaction) optimal strategy.
0.9

0.8 D. Remarks
Pr{STATUS = ESTABLISHED}

0.7 Based on the above discussions, we make the following


0.6
two key observations on a system which consists solely of
(informed) inside attackers:
0.5
• In such a system, the defender has a high probability
0.4
to successfully prevent all attacks from happening
0.3
β = 0.6
A
(by establishing a reputation of toughness) after a
0.2 β = 0.7
A few rounds of interactions. In particular, Algorithm A
βA = 0.8
0.1 has probability of at least pβA /(1 − p) to stop all attacks.
βA = 0.9
0
0 10 20 30 40 50
Time (t) • Once an inside attacker withdraws its attack, no subse-
quent insider will resume attacking the defender.
Fig. 3. Probability of Established Reputation with Time t
V. S YSTEMS W ITH NA ÏVE ATTACKERS
Please refer to Appendix E for the proof of the theorem.
Figure 3 depicts the numerical values of f (t) when λI = 1, This section deals with systems with naı̈ve attackers. Note
t varies from 0 to 50, p = 0.5, and βA = 0.6, 0.7, 0.8, and that naı̈ve attackers may bring significant damage to a system
0.9, respectively. As observed from the figure, the probability using Algorithm A. To see this, recall that without naı̈ve
of status being established increases fairly quickly with t. attackers, no insider will launch an attack in the reputation-
Thus, a soft defender can most likely establish its reputation of established step. However, a naı̈ve attacker will always launch
toughness after only a few attacks have been launched to the an attack regardless of the reputation; and, for a soft defender
system. For example, when βA = 0.9, there is 72% probability which has set γ = 1, the attack will always succeed. As such,
that no more than 10 attacks (corresponding to t = 10 in the a smart insider which colludes with (or observes) the naı̈ve
figure due to λI = 1) will be launched, and 84% probability attacker will realize that the defender is actually soft, and then
that no more than 50 attacks will be launched. We can also (re-)start attacking the system. Clearly, a naı̈ve attacker may
observe from the figure, the value of f (t) grows faster for destroy the reputation and enable future attacks from insiders.
higher value of βA . Intuitively, this means that an insider While it is intriguing to improve Algorithm A to address
which receives higher penalty from detection (i.e., has higher naı̈ve attackers, in this section, we shall provide a negative
preference parameter βA ) is more likely to abandon attack result which shows that no algorithm can hide the defender’s
while facing Algorithm A. preference parameter βD without significantly reducing the
Let u0 (βD ) be the expected utility of the defender in the utility of soft and/or tough defenders. Thus, no reputation
Nash equilibrium derived in Theorem 1. Based on Theorems 4 about βD could be maintained for a prolonged period of time.
and 5, we have the following theorem on the expected utility However, this does not mean that the reputation-
of the defender. establishment idea is useless for systems with naı̈ve attackers.
To show this, we will introduce a new reputation-establishment
Theorem 6. With Algorithm A in place, when t0 is sufficiently algorithm. This time the insiders have knowledge of not only
large, the expected payoff of a soft defender with preference the presence of the algorithm, but also the exact value of βD .
S
parameter βD satisfies Our algorithm introduces additional uncertainty (in particular,
uD (t) 1 − p − pβA a randomly generated parameter βR ) to the defender’s strat-
S
lim ≥ · u0 (βD ) − (1 − βD ) · e−l0 λI γS . egy, and then establishes a reputation on the new uncertain
t→∞ tλI 1−p
(21) parameter. We shall again provide quantitative analysis which
demonstrates the effectiveness of the new algorithm.
The expected payoff of a tough defender with preference
T
parameter βD satisfies
A. Negative Result
uD (t) T Before presenting the negative result, we first illustrate the
lim = u0 (βD ). (22)
t→∞ tλI rationale behind it. Note that since the defender can observe
9

nothing but detections, its strategy can be represented as a 1) Basic Ideas: Since the defender’s preference parameter
sequence of tradeoff parameters hγ1 , . . . , γn i, where γi is βD cannot be hidden for a prolonged period of time when
the expected value of γ between the (i − 1)-th and the i- there exist naı̈ve attackers in the system, we directly consider
th detections. Consider a tough defender with sequence γiT the worst-case scenario where βD is already known by all
and a soft one with sequence γiS . If the defender successfully attackers. Clearly, if we would still like to force the insiders
maintains a reputation on βD , then an insider should not be to drop their attacks, the defensive strategy (i.e., the tradeoff
able to distinguish between γiT and γiS . parameter γ) should not be uniquely determined by βD .
However, the insider can also make observations. In par- Instead, the defender must introduce additional uncertainty to
ticular, it may learn a sequence hd1 , . . . , dn i, where di is the its decision-making process, and then build a reputation of
number of successful attacks between the (i − 1)-th and i-th toughness around that uncertainty.
detections. Clearly, di can be considered as a random variable Our basic idea is for the defender to first “simulate” its
drawn from a distribution determined by γi . Due to the central preference parameter βD by a random variable βR , and then
limit theorem, given sufficient amounts of observations (i.e., determine its defensive strategy based on (the toughness of)
with a sufficiently large n), unless the distributions for γiT and βR instead of βD . In particular, we consider a mixed strategy
γiS are the same, the insider can always use di to distinguish of βR = 0 or 1. From the defender’s perspective, βR with
between tough and soft defenders. In particular, the more a proper distribution (e.g., βR = 1 with probability of βD )
disparate the values of γiT and γiS , the shorter amount of time can be considered as equivalent to βD . From an insider’s per-
the insider needs to make the distinction. spective, however, βR introduces additional uncertainty which
If we set γiT = γiS , then either a tough or soft defender must the insider has to guess based on the historic interactions. In
have a significantly lower utility than its single-interaction particular, a defender with βR = 0 is tougher than one with
optimal strategy. The reason is that these two types of defend- βR = 1. This uncertainty provides the defender an opportunity
ers have different preferences (i.e., βD ) between the detection to establish a reputation of toughness around βR .
and false positive rates. As such, the design of the defensive Formally, we redefine the reputation of a defender as
mechanism faces a dilemma: Either make γiT and γiS close follows.
to each other, at the expense of damaging the utility of the
defender, or make γiT and γiS divergent, but allow the insider Definition 2. In a system with naı̈ve attackers, the defender’s
to quickly learn the real value of βD . Suppose that the attacks reputation of toughness from the view of an insider Ai is
from naı̈ve attackers also form a Poisson process with an defined as
arrival rate (i.e., number of attacks per time slot) of λN . The rR (i) = Pr{βR < β0 |outcomes for A1 , . . . , Ai−1 }. (26)
following theorem formalizes such dilemma:
Based on this definition, the basic steps of our reputation-
Theorem 7. A smart insider can always learn βD with con-
establishment algorithm for systems with naı̈ve attackers can
fidence c within time t unless either the tough or the soft
be described as follows. Similar to Algorithm A, there are
defender must have an average utility per inside attacker
two steps: reputation-building and reputation-established. The
satisfies
q reputation-building step is essentially the same as Algorithm
uD (t) 1 2 1 A, with the exception of replacing βD by βR . A defender
≤ | − βD | · 1 − (1 − c) tλN − (24)
tλI 2 2 with βR = 1 behaves tougher in the initial stage of anomaly
Please refer to Appendix G for the proof of the theorem. detection in order to build its reputation of toughness (i.e.,
The theorem yields rR ≥ 1/(1 + βA )) and then capitalize on it in the long run.
The defender monitors the time elapsed since the last detection
uD (t)
< max(−βD , βD − 1). (25) to determine whether the reputation has been established.
tλI
However, the reputation-established step is different from
The right side of this inequality is the utility that defender can Algorithm A. In Algorithm A, once the reputation is estab-
receive by simply choosing γ = 0 if tough, and γ = 1 if soft. lished, a soft defender makes no effort for detection (by setting
Thus, either the tough or soft defender must sacrifice its utility γ = 1) because it is assured that no insider will launch attacks.
in order to hide βD . In particular, the lower c or longer t the However, in a system with naı̈ve attackers, such guarantee no
defender requires, the greater must such sacrifice be. longer holds because the reputation established on βR could
still be destroyed by an incoming naı̈ve attacker. Thus, instead
B. Defense Against naı̈ve Attackers of setting γ = 1, we set γ to a value close to but not exactly
In this section we extend the reputation-establishment idea equal to 1. This way, once the reputation is destroyed and
to systems with naı̈ve attackers. Clearly, the previous definition the insiders resume their attacks, the defender can detect such
of reputation (i.e., rD ) is no longer applicable due to the failure of reputation.
negative result which shows the impossibility of hiding the Once the reputation is destroyed, the defender must recover
defender’s preference parameter for a prolonged period of from it and reestablish its reputation. Somewhat surprisingly,
time. Thus, we first introduce a new reputation of toughness this is an extremely easy task. The only thing the defender
for the defender, and then describe the basic ideas of our needs to do is to regenerate a new random value for βR
algorithm. After that, we present the detailed steps and the according to the original distribution. Clearly, by doing so, all
quantitative analysis. historic interactions become useless for an insider to estimate
10

the new βR , and the system rolls back to the reputation- Algorithm B: Reputation-Establishment For Systems With
building step. naı̈ve Attackers

Note that the utility of a defender depends on the inter- 1: Randomly choose βR as 1 with probability of βD and
arrival time of naı̈ve attackers. This is because the defender 0 otherwise.
can only gain on the reputation in the reputation-established 2: STATUS ← NULL. i ← 0. d ← 0.
step, which ends when an attack from naı̈ve attackers is not 3: repeat
detected. Thus, the length of the reputation-established step is 4: if STATUS = ESTABLISHED then
proportional to the inter-arrival time of naı̈ve attackers. Clearly, 5: γ ← 1 − p0 if βR = 1, γ ← p0 if βR = 0.
if the inter-arrival time is too short, the algorithm will keep 6: else if βR = 1 then
restarting the reputation-building step, and thus not be able 7: γ ← (1 − p)/(1 − p + βA p).
to generate profit for the defender. Fortunately, according to 8: else if βR = 0 then
Section II, only a small number of naı̈ve attackers need to be 9: γ ← βA p/(1 − p + βA p).
addressed by anomaly detection. Thus, the inter-arrival time of 10: end if
naı̈ve attackers can be much longer than that of the (potential) 11: Wait until t > tD or an attack is detected.
insiders which could launch attack at every time slot. 12: if t > tD then
Additionally, when naı̈ve attackers are present, the utility of 13: STATUS ← ESTABLISHED.
a soft defender after reputation establishment will not be as 14: else if STATUS = ESTABLISHED then
good as its utility in systems without naı̈ve attackers. This is 15: Goto 1;
because, in order to establish a reputation of toughness, a soft 16: else
defender has to simulate a tough defender by generating βR = 17: tD ← t + l0 .
0 with positive probability. We shall show in the quantitative 18: end if
analysis that, when the inter-arrival time of naı̈ve attackers 19: until t > t0 AND STATUS = NULL
is sufficiently long, the expected payoff of a soft defender is 20: Goto 1;
still significantly higher than the (single-interaction) optimal
strategy derived in Theorem 1.
Theorem 8. With Algorithm B, an insider Ai will abstain if
C. Reputation-Establishment Algorithm 1
rR (i) > , (28)
Based on the basic ideas introduced above, a detailed βA + 1
algorithm is presented for reputation establishment in systems which holds as long as the number of detections exceeds the
with naı̈ve attackers, as depicted in Algorithm B. In particular, number of undetected attacks
the defender randomly generates βR based on the following
distribution (Line 1 in Algorithm B) Please refer to Appendix H for the proof of the theorem.
 √ Theorem 9. When t is sufficiently large, with Algorithm B,
0, with probability of p = 1 − βD ;
βR = (27) the probability that no insider would launch attack at time t
1, otherwise.
satisfies
As we shall show in the theoretical analysis, this distribution of Z 1/λN
βR maximizes the expected (long-term) payoff of the defender. λN e−λN t
pE ≥ (1 − (1 − 2γS )·
The rest of the algorithm, including the meanings of t0 and l0 , 0 2(1 − γS )
are the same as Algorithm A with only two exceptions: First, tλI (1 − 2p)2
 
we determine the defensive strategy γ based on the randomly erf dt. (29)
8p(1 − p)
generated βR instead of the real βD . Second, as we mentioned
above, in the reputation-established step, we set γ = 1 − p0 In particular, when λN  λI , we have pE ≥ βA /(6 − 6p).
and p0 where 0 < p0  1 instead of 1 and 0 in Algorithm A, Please refer to Appendix I for the proof of the theorem.
respectively.
Theorem 10. When Algorithm B is used, if λN  λI , the
expected payoff of the defender satisfies
D. Theoretical Analysis √
uD βA (1 − βD ) 3/2
In analogy to the analysis of Algorithm A, we analyze lim ≥ u0 (βD ) + √ (1 − 2βD + βD ),
n→∞ n 6 βD
Algorithm B in three steps. First, as shown in Theorem 8, we
(30)
prove that no insider will choose to attack the system when
rR (i) > 1/(1 + βA ); and there is always rR (i) > 1/(1 + βA ) which is maximized over all possible values of p.
if the number of detections exceeds the number of successful
Please refer to Appendix J for the proof of the theorem.
attacks. Second, as shown in Theorem 9, we show that the
probability of STATUS = ESTABLISHED increases fairly
quickly with t. Finally, as shown in Theorem 10, we derive E. Summary of Findings and Implications in Practice
the expected payoff of a defender with Algorithm B and show Based on the above discussions, we make the following two
that it is substantially higher than that in the Nash equilibrium key observations on systems with naı̈ve attackers which always
in Theorem 1. launch an attack blindly:
11

• The defender can no longer obscure its real degree of toughness (and hence establish a "tougher" reputation) by manipulating its strategy. In particular, for any given p ∈ [0, 1], we derived an upper bound on the number of interactions required for an insider to learn the real degree of toughness of the defender with confidence c.

• Nonetheless, the idea of establishing a defender's reputation can still reduce the probability of insiders launching attacks even in a system with naïve attackers. The key idea for the proposed reputation-establishment algorithm is that the defender should no longer establish a reputation of toughness for its utility function, because the real value of βD will be learned by an inside attacker within a short period of time. Instead, the defender should introduce additional uncertainty through a new "behavior toughness" parameter βR, and then establish a reputation for this new parameter. The premise here is that, unlike the real toughness degree, which is an inherent property of the defender and cannot be changed, the behavior toughness degree is simply a man-made parameter that can be revised (randomly) once learned by the adversary. We adopted this idea in Algorithm B and derived a lower bound on the probability that an inside attacker withdraws its attack due to the established reputation of toughness.

We now discuss the application scenarios for the two algorithms (Algorithms A and B) proposed in this paper. These two algorithms differ in whether naïve attackers may be present in the system, i.e., whether there exist adversaries which are technically sophisticated enough to launch insider attacks while not suffering significant loss from being detected and identified. If no such adversary exists, Algorithm A should be used. If such an adversary may exist, or a determination cannot be made, the system administrator should choose Algorithm B, because this algorithm can be used in both cases (but cannot achieve as good performance as Algorithm A in systems with no naïve adversary).

VI. PERFORMANCE EVALUATION

In this section, we present the performance evaluation of our reputation-establishment algorithms.

A. Reference Algorithms

Although our reputation-establishment approach is transparent to the underlying anomaly detection techniques, for the purpose of conducting experimental studies, we need to choose concrete anomaly detection algorithms for comparison purposes. In this paper, we consider two anomaly detection algorithms from [5] as the reference (baseline) algorithms and use them as black boxes in our implementation.

We now briefly describe the baseline algorithms and explain our reasons for choosing them. The algorithms in [5] were designed to monitor system calls for detecting insider attacks. Both of them first measure the normal profile of system call sequences, and then detect deviations from it in the monitored calls as anomalies. One algorithm, namely the Histogram (HT) algorithm, measures the frequency count of system call names (e.g., open or fork), and issues an alert if the absolute difference between the normal and monitored frequencies exceeds a user-determined threshold.

The other algorithm, namely the Parameter-driven (PD) algorithm, measures not only system call names but also their input/output parameters. An anomaly alert is issued if the categorical Hamming distance between the normal and monitored profiles exceeds a user-determined threshold. One can see that both algorithms are statistical anomaly detection algorithms, each featuring a user-controllable threshold which makes a tradeoff between the detection and false positive rates.

There are three main reasons why we selected the HT/PD algorithms as the reference algorithms in our experiments.

• The HT/PD algorithms are among the few anomaly detection algorithms which were designed specifically for insider attackers. As such, the application domains of the HT/PD algorithms fit well within the scope of this paper.

• The design of the HT/PD algorithms features an attractive idea of utilizing well-tested anomaly detection algorithms for external attackers (e.g., based on monitored system calls) to defend against insider attacks. In addition, the approach proposed in this paper is capable of further improving their performance to a satisfactory level. The prospect of utilizing external-attack-detection algorithms against insider attacks is a strong motivation for our selection of the HT/PD algorithms in this paper.

• The experiments reported in [5] contain a comprehensive set of Receiver Operating Characteristic (ROC) curves for different parameter settings of the HT/PD algorithms. This enables us to thoroughly evaluate the impact of incorporating our algorithms with theirs.

B. Evaluation Methodology

We conducted extensive experiments using the same settings as the anomaly detection algorithms in [5]. In particular, the HT/PD algorithms were used to detect illegal users who attempt to download and install malware through web browsers. Please refer to [5] for a detailed description of the normal profile of system calls, the synthetic generator for insider activities, and the baseline performance of the HT/PD algorithms.

We evaluated the performance of our proposed Algorithms A and B based on the HT/PD reference algorithms. We consider a total time length of 10,000 seconds and an insider arrival rate of λI = 0.1. Recall from Section II that, in real-world systems, most naïve attackers are also naïve technically and therefore can be detected by signature-based intrusion detection techniques. Thus, the arrival of naïve attacks which need to be addressed by anomaly detection is usually less frequent than that of the insiders, who might launch their attacks at any time. As such, we assume a lower arrival rate λN for naïve attackers. In particular, the default value of λN is set to 0.01, and varied between 0.001 and 0.011 in an experiment to test the change of the performance of our algorithms when the insider and naïve attacker arrival rates have different ratios.

Recall that l0 is the length of the waiting period until the defender believes in the establishment of its reputation. We compute the value of l0 based on 99% confidence. Also recall
that the expiration time t0 is the timeout threshold for the defender to abandon its attempt to establish a reputation. In our experiments, we set t0 = 1,000. Note that the performance of our algorithm is insensitive to t0, as demonstrated in Figure 3. In Algorithm B, we set p0 = 0.1.

The performance of our two algorithms is compared with the reference algorithms based on the Receiver Operating Characteristic (ROC) curve, which is widely used to measure the tradeoff between the detection/prevention rate and the false positive rate. In this case, we have six variants: reference HT, reference PD, Algorithm A HT, Algorithm A PD, Algorithm B HT, and Algorithm B PD. We define the detection/prevention rate as the percentage of attackers that either choose not to attack or launch attacks that are detected. The false positive rate is the normalized number of generated false positives.

We apply numerical results generated by Monte Carlo simulation based on the Nash equilibrium derived in the paper. Recall that the Nash equilibrium represents a state where neither player can benefit by unilaterally changing its strategy. The numerical results shown in the following subsection thus represent the performance of our algorithms in this state, and can be used to demonstrate the real performance of anomaly detection systems adopting our algorithms.

C. Performance Evaluation for Algorithms A and B

The experiments in this subsection demonstrate the performance improvement (in terms of the tradeoff between the detection and false positive rates) gained by applying our reputation-establishment algorithms to the baseline HT/PD algorithms. In the next subsection, the experimental data will further explain the reasons for such improvement by demonstrating the change of the defender's strategy and reputation during the execution of our algorithms.

Figure 4 shows the comparison between the ROC curves of Algorithm A and the two reference algorithms when βA = 0.5, p = 0.5, and βD = 0.5. In general, the larger the area under an ROC curve, the better the tradeoff between the detection and false positive rates. As seen from the figure, by establishing the defender's reputation, Algorithms A HT/PD can achieve a significantly better tradeoff than the reference algorithms HT/PD. In particular, our scheme can detect almost all attacks with a false positive rate of only 0.2.

We also tested Algorithm A against various insider preference parameters βA from 0.1 to 0.9. Figure 5 shows the false positive rate required to achieve a detection rate of 90% when βD = 0.5. We observe that the greater βA is, the fewer false positives the defender has to generate. By comparing with Figure 4, we can see that even when βA is quite small (e.g., 0.1), Algorithm A still significantly outperforms the baseline algorithms.

Figure 6 depicts the comparison between the ROC curves of Algorithm B (Algorithms B HT/PD) and the two reference algorithms (Algorithms HT/PD) when βA = 0.5 and βD = 0.5. This figure shows that even for systems with naïve attackers, our algorithm can still achieve a substantially better tradeoff than the reference algorithms when the required detection rate is high. However, as predicted in Section V-B, such performance improvement is not as significant as the one for Algorithm A (shown in Figure 4) due to the adversarial knowledge of βD.

We also tested the detection/prevention rate of Algorithm B by varying the arrival rate of naïve attackers between 0.001 and 0.011. We set βA = 0.5 and βD = 0.5. Figure 7 shows the detection/prevention rate achieved while maintaining a false positive rate of at most 0.35. One can see from the figure that the smaller the arrival rate, the higher the detection/prevention rate the defender can achieve. This is consistent with our intuition discussed in Section V.

D. Execution Process of Algorithms A and B

We now demonstrate the execution process of Algorithms A and B by showing the change of the defender's reputation and strategy over time. Note that this process is independent of whether the HT or PD algorithm is used underneath.

Figure 8 depicts the change of the defender's reputation over time when naïve attackers exist or do not exist in the system, respectively. We set βA = βD = 0.5, and the insider arrival rate to λI = 0.1. To intuitively demonstrate the impact of naïve attackers on the reputation, we set a naïve attacker to arrive at the 25th second after the initialization of the system. One can see from Figure 8 that when no naïve attacker exists in the system, the defender's reputation will not be broken once established. When naïve attackers exist in the system, however, the defender will have to reestablish its reputation periodically after the reputation is destroyed by an incoming naïve attacker.

We also validate the change of the expected value of γ (a soft defender's tradeoff parameter) over time when naïve attackers exist or do not exist in the system. Figure 9 depicts the results using the same settings as Figure 8. When no naïve attacker exists in the system, a soft defender can maintain a high value of γ (and a low false positive rate) after the establishment of its reputation. When naïve attackers exist in the system, the value of γ has to be increased periodically for the purpose of establishing the reputation again.

VII. RELATED WORK

There exists a significant body of research on anomaly detection against insider attacks [4], [5], [24]–[27]. Chinchani et al. [24] proposed a graph-based model for assessing the threats from insider attacks and revealing possible attacking strategies. Spitzner [26] discussed the possibility of deploying honeypots to detect, identify, and gather information on inside attackers. Ning and Sun [28] addressed the defense against insider attacks in mobile ad-hoc networks. Liu et al. [5] evaluated the performance of several existing anomaly detection techniques for detecting insider attacks and demonstrated the limitations of those techniques. In the document control domain, Thompson [25] proposed a content-based approach to detect insider attacks using Hidden Markov models, while Pramanik et al. [27] proposed a security policy that is tailored to insider attacks. Whyte et al. [29] addressed the detection of intra-enterprise worms (which may be propagated by inside attackers) based on address resolution. Nonetheless, our work
Fig. 4. ROC curves of Algorithm A and the reference algorithms (βA = 0.5, p = 0.5); x-axis: false alarm rate, y-axis: detection/prevention rate.
Fig. 5. False positive rates of Algorithm A with varying insider preference parameter βA (when p = 0.5); x-axis: βA, y-axis: false alarm rate.
Fig. 6. ROC curves of Algorithm B and the reference algorithms (βA = 0.5, βD = 0.5); x-axis: false alarm rate, y-axis: detection/prevention rate.
Fig. 7. Performance of Algorithm B with varying naïve attacker arrival rate; x-axis: λN, y-axis: detection/prevention rate.
Fig. 8. Changes of the defender's reputation in Algorithms A and B; x-axis: time, y-axis: Pr{STATUS = ESTABLISHED}.
Fig. 9. Change of a soft defender's tradeoff parameter in Algorithms A and B; x-axis: time, y-axis: E(γ) for a soft defender.

is the first to exploit the inside attackers' weakness of being afraid of detection in order to improve the tradeoff between detection and false positives for anomaly detection.

Besides the detection of anomalies against insider attacks, there have also been studies on further visualizing the generated anomalies [4], isolating anomalies on the network [30], reasoning about attack scenarios and attacker profiles [31], [32], and securing data (content) transmission against insider attacks using software [33], [34] and hardware/software integrated techniques [35]. Although our work does not directly address these issues, the existing solutions can nevertheless be integrated with our reputation-aware scheme to improve upon the performance of anomaly detection.

Additionally, the application of game theory has been extensively studied in distributed systems and network security [6]–[12]. Related to anomaly detection, Alpcan et al. [11] conducted a game-theoretic analysis for anomaly detection in access control systems, while Liu et al. [12] applied game-theoretic results to anomaly detection in wireless networks. To model the interactions between the defender and attackers in anomaly detection, researchers have proposed various game-theoretic models, including static [14], stochastic [15], [36], and repeated games [37]. Based on these game-theoretic models, most existing work focused on the modeling of attackers' intent, objectives, and strategies, as well as the corresponding defensive countermeasures. In particular, Buike [37] studied a model of repeated games in information warfare with incomplete information, which is similar to the repeated game model investigated in this paper. Nonetheless, none of the previous work has addressed the importance of the defender's reputation in repeated games, and how to properly establish a reputation of toughness to defend against insider attacks.

Closely related, but orthogonal to our work, is the extensive literature on using (potential attackers') reputation to measure the trustworthiness of parties in distributed systems such as peer-to-peer networks [38], [39] and sensor networks [40]. In these studies, the reputation of a participating party is evaluated based on its previous activities, and determines the degree of trust the party receives in the future. Compared with these studies, our work concerns the reputation of toughness of the defending party, instead of the reputation of trustworthiness of potential adversaries. Note that the reputation-aware scheme proposed in this paper is independent of the trustworthiness evaluation of potential attackers. Thus, the existing trustworthy-reputation schemes can co-exist with our reputation-aware scheme in a system as independent components.

Also forming the basis of this paper are theoretical studies on reputation models and repeated game-theoretic models. A particularly interesting result is that in a chain-store paradox [16], [41] with one incumbent facing multiple entrants, the long-term incumbent can (almost) always benefit from maintaining a reputation of toughness, as long as there exists a small probability for it to be tough [16]. In this paper, we apply game theory to formulate the iterative insider attacks in anomaly detection systems as repeated games, and propose reputation-aware schemes that incorporate game-theoretic results to improve the performance of anomaly detection.
VIII. DISCUSSIONS

A. Inside Attackers with Short-Term Memory

In practice, some inside attackers may have the capability of observing historic interactions but can only memorize such interactions for a limited amount of time, say tM. We now discuss the impact of such "short-term-memory" attackers on the performance of our algorithms. In particular, we consider two cases: (i) a system consisting only of short-term-memory insiders, and (ii) a system consisting of the previously discussed (long-term-memory) inside attackers, short-term-memory insiders, as well as naïve attackers.

In the first case, when Algorithm A is used, the defender's established reputation may be "forgotten" by the attackers. Fortunately, according to Algorithm A, the defender's reputation is established as soon as the number of detections exceeds the number of missed attacks. Since the first-coming interactions are forgotten first, even with the loss of memory, the number of detections remains greater than the number of missed attacks until tM units of time after the establishment of the reputation. As such, an established reputation will not be lost until after tM units of time. A simple solution to address this problem is to execute Algorithm A again once the reputation has been established for tM amount of time. Since no attacker has long-term memory, the attackers cannot infer whether the defender is tough or soft by associating two different rounds of executions of Algorithm A.

In the second case, when Algorithm B is used, the defender's established reputation may be destroyed by either a naïve attacker, as discussed previously in Section V, or a short-term-memory insider which forgets the established reputation. Again, the short-term-memory insider will start to attack after tM units of time have passed since the establishment of the defender's reputation. At that time, a short-term-memory insider can be considered equivalent to a naïve attacker by the defender. Thus, a simple solution is to revise Algorithm B by adding the arrival rate of short-term-memory attackers to that of the naïve attackers at tM units of time after the establishment of reputation.

B. Time-Variant Preference Parameters

In practice, the preference parameters of the defender and the insiders may change over time. Although such change is usually rather infrequent, it may have a significant impact on the execution of Algorithm A. In particular, the insiders may launch their attacks again due to the possibility that (i) the defender may become softer with the change of its preference parameter, or (ii) the attackers become tougher (i.e., βA becomes smaller).

On the other hand, notice that the periodic execution in Algorithm B trivializes the problem of time-variant preference parameters, because the defender can simply start the next round of execution with the new parameters. Thus, when the preference parameters of the defender and/or the attackers might change over time, Algorithm B should be used (without change) regardless of whether naïve attackers exist in the system.

C. Impact of Detection Latency

The detection of an attack may not be real-time in practice. We now discuss the impact of such detection latency on the execution of Algorithms A and B. There are two important properties of these two algorithms that are related to the detection latency: (i) according to Algorithms A and B, the defender's strategy remains the same regardless of the number of detections until that number becomes greater than the number of missed attacks (i.e., until the reputation is established); (ii) since the defender cannot directly observe the number of missed attacks, both Algorithms A and B feature a "waiting-period" parameter l0, which is an upper-bound threshold on the amount of time without detection before the defender claims an established reputation. Given these properties, one can see that the only change required for tolerating detection latency in Algorithms A and B is to prolong the waiting period l0 by an upper bound on the detection latency. Then, the defender can switch to reputation-established strategies once the new l0 expires without the detection of any attacks.

IX. FINAL REMARKS

In this paper, we addressed how to establish the defender's reputation in anomaly detection against insider attacks. We considered smart insiders who learn from the outcomes of historic attacks to avoid detection, and naïve attackers who blindly launch their attacks and do not care about being detected. Based on a game-theoretic formulation, we proposed two generic reputation-establishment algorithms for systems consisting of only smart insiders and also those with both smart insiders and naïve attackers. Theoretical analysis and performance evaluation indicate that our algorithms can significantly improve the tradeoff between the detection rate and the false positive rate. As part of our future work, we are investigating the establishment of the defender's reputation in a broader range of applications such as anonymous communication and distributed data sharing systems.

REFERENCES
[1] D. Cappelli, A. Moore, and T. Shimcall. Common sense guide to prevention and detection of insider threats. Technical report, United States Computer Emergency Readiness Team, http://www.us-cert.gov/, 2005.
[2] VeriSign. Managed Security Services - Securing Your Critical Networks. White Paper, 2003.
[3] Alexander Liu, Cheryl Martin, Tom Hetherington, and Sara Matzner. AI lessons learned from experiments in insider threat detection. In AAAI Spring Symposium: What Went Wrong and Why: Lessons from AI Research and Applications, 2006.
[4] Jeffrey B. Colombe and Gregory Stephens. Statistical profiling and visualization for detection of malicious insider attacks on computer networks. In Proceedings of the 2004 ACM Workshop on Visualization and Data Mining for Computer Security, 2004.
[5] A. Liu, C. Martin, T. Hetherington, and S. Matzner. A comparison of system call feature representations for insider threat detection. In Proceedings of the 2005 IEEE Workshop on Information Assurance, 2005.
[6] P. Liu, W. Zang, and M. Yu. Incentive-based modeling and inference of attacker intent, objectives, and strategies. ACM Transactions on Information and System Security, 8(1):78–118, 2005.
[7] J. Xu and W. Lee. Sustaining availability of web services under distributed denial of service attacks. ACM Transaction on Computer, 52(4):195–208, 2003.
[8] L. A. Gordon and M. P. Loeb. Using information security as a response to competitor analysis systems. Communications of the ACM, 44(9):70–75, 2001.
[9] L. Buttyan and J. P. Hubaux. Security and Cooperation in Wireless Networks. Cambridge University Press, 2007.
[10] W. Yu and K. J. R. Liu. Game theoretic analysis of cooperation stimulation and security in autonomous mobile ad hoc networks. IEEE Transactions on Mobile Computing, 6(5):459–473, 2007.
[11] T. Alpcan and T. Basar. A game theoretic analysis of intrusion detection in access control systems. In Proceedings of the 43rd IEEE Conference on Decision and Control, 2004.
[12] Y. Liu, C. Comaniciu, and H. Man. A Bayesian game approach for intrusion detection in wireless ad hoc networks. In Proceedings of the 2006 Workshop on Game Theory for Communications and Networks, 2006.
[13] Pietro Michiardi and Refik Molva. Core: A collaborative reputation mechanism to enforce node cooperation in mobile ad hoc networks. In Proceedings of the IFIP TC6/TC11 Sixth Joint Working Conference on Communications and Multimedia Security, 2002.
[14] N. Zhang and W. Zhao. Distributed privacy preserving information sharing. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), 2005.
[15] T. Alpcan and T. Basar. An intrusion detection game with limited observations. In Proceedings of the 12th International Symposium on Dynamic Games and Applications, 2006.
[16] Drew Fudenberg and David K. Levine. Reputation and equilibrium selection in games with a patient player. Econometrica, 57(4):759–778, 1989.
[17] H. Kim and B. Karp. Autograph: Toward automated, distributed worm signature detection. In Proceedings of the 13th USENIX Security Symposium (SECURITY), 2004.
[18] A. Lakhina, M. Crovella, and C. Diot. Diagnosing network-wide traffic anomalies. In Proceedings of ACM SIGCOMM, 2004.
[19] S. Axelsson. The base-rate fallacy and its implications for the difficulty of intrusion detection. In Proceedings of the 6th ACM Computer and Communications Security Conference (CCS), 1999.
[20] Snort. http://www.snort.org.
[21] SecurityFocus. SPADE (Statistical Packet Anomaly Detection Engine). www.securityfocus.com/tools/1767.
[22] D. Fudenberg and J. Tirole. Game Theory. MIT Press, 1991.
[23] Ajit C. Tamhane and Dorothy D. Dunlop. Statistics and Data Analysis: From Elementary to Intermediate. Prentice-Hall, New Jersey, 2000.
[24] R. Chinchani, A. Iyer, H. Q. Ngo, and S. Upadhyaya. Towards a theory of insider threat assessment. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), 2005.
[25] P. Thompson. Weak models for insider threat detection. In Proceedings of the SPIE, 2004.
[26] L. Spitzner. Honeypots: Catching the insider threat. In Proceedings of the Annual Computer Security Applications Conference (ACSAC), 2003.
[27] Suranjan Pramanik, Vidyaraman Sankaranarayanan, and Shambhu Upadhyaya. Security policies to mitigate insider threat in the document control domain. In Proceedings of the 20th Annual Computer Security Applications Conference (ACSAC), 2004.
[28] Peng Ning and Kun Sun. How to misuse AODV: A case study of insider attacks against mobile ad-hoc routing protocols. Ad Hoc Networks, 3(6):795–819, 2005.
[29] David Whyte, Paul C. Van Oorschot, and Evangelos Kranakis. Detecting intra-enterprise scanning worms based on address resolution. In Proceedings of the 21st Annual Computer Security Applications Conference (ACSAC), 2005.
[30] C. Hood and C. Ji. Proactive network fault detection. In Proceedings of the 16th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 1997.
[31] P. Ning and D. B. Xu. Building attack scenarios through integration of complementary alert correlation methods. In Proceedings of the 11th Annual Network and Distributed System Security Symposium, 2004.
[32] Jim Yuill, Shyhtsun Felix Wu, Fengmin Gong, and Ming-Yuh Huang. Intrusion detection for an on-going attack. In Proceedings of Recent Advances in Intrusion Detection (RAID), 1999.
[33] Yang Yu and Tzi-cker Chiueh. Display-only file server: A solution against information theft due to insider attack. In Proceedings of the 4th ACM Workshop on Digital Rights Management (DRM), 2004.
[34] Jonathan Katz and Ji Sun Shin. Modeling insider attacks on group key-exchange protocols. In Proceedings of the 12th ACM Conference on Computer and Communications Security (CCS), 2005.
[35] S. Jiang, S. Smith, and K. Minami. Securing web servers against insider attack. In Proceedings of the 17th Annual Computer Security Applications Conference, 2001.
[36] K. Lye and J. M. Wing. Game strategies in network security. In Proceedings of the 15th IEEE Computer Security Foundations Workshop (CSFW), 2002.
[37] D. Buike. Towards a game theory model of information warfare. Master's thesis, Air Force Institute of Technology, 1999.
[38] L. S. Sherwood and R. Bobby Bhattacharjee. Cooperative peer groups in NICE. In Proceedings of the 22nd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2003.
[39] L. Xiong and L. Liu. PeerTrust: Supporting reputation-based trust in peer-to-peer communities. IEEE Transactions on Knowledge and Data Engineering, 16(7):443–459, 2004.
[40] S. Ganeriwal and M. B. Srivastava. Reputation-based framework for high integrity sensor networks. In Proceedings of the ACM Workshop on Security in Ad-hoc and Sensor Networks (SASN), 2004.
[41] D. Fudenberg and D. M. Kreps. Reputation in the simultaneous play of multiple opponents. The Review of Economic Studies, 54(4):541–568, 1987.
APPENDIX A
PROOF OF THEOREM 1

Theorem 1. When b = 0, if an insider Ai knows the exact value of βD, the unique Nash equilibrium of the game is formed by a mixed attacking strategy of

Pr{Ai attacks} = min( 1, βD · |l′F(βA/(1 + βA))| / (1 − βD) ),   (31)

and a defensive strategy of

γ = 1 if βD ≥ β0, and γ = βA/(1 + βA) otherwise.   (32)

Proof: In order to prove the theorem, we show that no player can benefit by unilaterally changing from the strategies specified in the Nash equilibrium. Let us first consider the attacker. With the attacking strategy in (31), we have Pr{Ai attacks} = 1 if

βD ≥ 1 / ( |l′F(βA/(1 + βA))| + 1 ).   (33)

Note that the attacker's expected payoff is uA = 1 if (33) holds, and 0 otherwise. Since the maximum possible payoff for the attacker is 1, the attacker cannot benefit by changing its strategy unilaterally when (33) holds. When (33) does not hold, since the defensive strategy will be γ = βA/(1 + βA), the attacker's payoff uA(i) ≡ 0 no matter what strategy the attacker chooses. Thus, the attacker cannot benefit by changing its strategy unilaterally.

We now consider the defender's payoff. When (33) holds, the attacker will always attack. Thus, the expected payoff of the defender is

E(uD(i)) = −βD · lF(γ) − (1 − βD) · γ.   (34)

Recall from our assumption in Section II that lF(·) is concave. Thus, the value of γ that maximizes E(uD(i)) is γ = 1, which is the strategy specified in (32). Thus, when (33) holds, the defender cannot benefit by changing its strategy unilaterally. If (33) does not hold, the defender's expected payoff is

E(uD(i)) = −βD · lF(γ) − βD · |l′F(βA/(1 + βA))| · γ.   (35)

The first-order condition is l′F(γ) = l′F(βA/(1 + βA)). Thus, the value of γ that maximizes E(uD(i)) is γ = βA/(1 + βA), which is exactly what is specified in (32). Thus, the defender cannot benefit by changing its strategy unilaterally. The set of strategies specified in (31) and (32) therefore leads to a Nash equilibrium. It is easy to verify that it is the unique Nash equilibrium of the game.

APPENDIX B
PROOF OF THEOREM 2

Theorem 2. If βD is either 0 or 1, a smart insider will not launch an attack if and only if

rD(i) ≥ 1/(βA + 1).   (36)

In this case, the unique Nash equilibrium of the game consists of
• The tough defender (with βD = 0) sets γ = 0.
• The soft defender (with βD = 1) sets γ = 1.
• The smart insider chooses not to launch an attack.

Proof: We prove this theorem by showing that when rD(i) = 1/(βA + 1), a Nash equilibrium of the game is formed by a defensive strategy of

γ = 1 if βD ≥ β0, and γ = 0 otherwise,   (37)

and an attacking strategy that sets

Pr{Ai attacks} = 0.   (38)

The cases where rD(i) > 1/(βA + 1) can be proved in analogy. Note that since the defender publishes the function of γ with βD to the attackers, the game now follows the Stackelberg leadership model [22] with the defender being the leader and the attacker being the follower. The reason is that the defender knows ex ante that the attackers can observe the published function. We consider the attacker first. The expected payoff of the attacker is

E(uA(i)) = Pr{Ai attacks} · (1 − rD(i) · (1 + βA)),   (39)

which satisfies E(uA(i)) ≤ 0 when (36) holds. Thus, the attacker cannot benefit by changing its strategy unilaterally.

Let us now consider the defender. Note that a defender with βD ≥ β0 already achieves the maximum possible payoff 0, and thus can never benefit by changing its strategy unilaterally. A defender with βD < β0 has an expected payoff of E(uD(i)) = −βD. Suppose that it changes the defensive strategy to γ > 0. Since the game follows the Stackelberg leadership model, the attacker will change its (follower's) strategy to Pr{Ai attacks} = 1. The defender's expected payoff will then be E(uD(i)) = −βD · lF(γ) − (1 − βD) · γ, which is less than −βD due to the concavity of lF(·). Thus, the defender cannot benefit by changing its strategy unilaterally. As such, the attacker will not launch its attack if (36) is satisfied.

APPENDIX C
PROOF OF THEOREM 3

Theorem 3. The following strategies constitute a sequential equilibrium of the game:
• A tough defender sets γ = 0.
• If no attack has escaped detection and p(βA + 1)^{λ+1} < 1, a soft defender sets

γ = βA / ( βA + 1 − (βA + 1)^{λ+1} p ).   (40)

Otherwise, a soft defender sets γ = 1.
• If there has been at least one attack that escaped detection, an attacker always attacks. Otherwise, if p(βA + 1)^{λ+1} < 1, an attacker attacks with probability

Pr{Ai attacks} = min( 1, βs / ((1 − βs)(1 − b)) ).   (41)
If p(βA + 1)^{λ+1} ≥ 1, an attacker abstains.

Proof: It is straightforward to verify that the above strategy for a tough defender is optimal: since such a defender has βD = 0, setting γ = 0 is always the optimal strategy.

For a soft defender, we consider the cases where p(βA + 1)^{λ+1} ≥ 1 or < 1, respectively. When p(βA + 1)^{λ+1} ≥ 1, every attacker abstains and a soft defender sets γ = 1, which is clearly the optimal strategy. When p(βA + 1)^{λ+1} < 1, an attacker attacks with probability βs/((1 − βs)(1 − b)). For any γ, the expected payoff of the defender during its interaction with the attacker is

E(uD(i)) = −(1 − βs) · ( βs / ((1 − βs)(1 − b)) ) · (γ + (1 − γ) · b) − βs · (1 − γ)   (42)
= −βs · b/(1 − b) − βs · γ − βs · (1 − γ)   (43)
= −βs/(1 − b).   (44)

As can be seen, the expected payoff is independent of γ. Thus, a soft defender cannot benefit by deviating from the strategy unilaterally.

We now consider the attacker's strategy. If there has been at least one attack that escaped detection, then the attacker knows that the defender is soft (i.e., βD = βs) because a tough defender always has γ = 0. Since a soft defender in this case sets γ = 1, the optimal strategy for the attacker is to launch its attack.

Now consider the case where all attacks so far have been detected. Let rD(λ) be the reputation of the defender after λ launched attacks. Let γ(i) be the value of γ in (40) when i attacks have been launched. We have

rD(λ) = Pr{βD = 0 | all attacks are detected}   (45)
= p / ( p + (1 − p) · Π_{i=1}^{λ} (1 − γ(i)) )   (46)
= (βA + 1)^λ · p.   (47)

When the attacker launches an attack, its expected payoff is

E(uA) = −rD(λ) · βA − (1 − rD(λ)) · (γ − (1 − γ) · βA).   (48)

With (40) and (47), we have E(uA) = 0. Note that the expected payoff of the attacker from abstaining is also 0. Thus, the attacker cannot benefit by changing its strategy.

APPENDIX D
PROOF OF THEOREM 4

Theorem 4. In a system without naïve attackers, when the number of detections exceeds the number of undetected attacks, we have

rD(i) ≥ 1/(1 + βA),   (49)

and no insider will choose to launch its attack.

Proof: Let d and i be the number of detections and the total number of launched attacks, respectively. Applying Bayes' theorem, we have

rD(i) = Pr{historic outcomes | βD < β0} · Pr{βD < β0} / Pr{historic outcomes}   (50)
= p γS^{i−d} (1 − γS)^d / ( p γS^{i−d} (1 − γS)^d + (1 − p)(1 − γS)^{i−d} γS^d ).   (51)

Since d > i/2, we have d ≥ i − d + 1. As γS < 1/2,

rD(i) ≥ p(1 − γS) / ( p(1 − γS) + (1 − p)γS ) = 1/(1 + βA).   (52)

Applying Theorem 2, no insider will choose to attack the system.

APPENDIX E
PROOF OF THEOREM 5

Theorem 5. With Algorithm A, the probability that rD ≥ 1/(1 + βA) at time t < t0 is given by

f(t) ≥ 1/(2(1 − γS)) − ( (1 − 2γS)/(2(1 − γS)) ) · erf( tλI(1 − 2p)² / (8p(1 − p)) ).   (53)

In particular,

lim_{t→∞} f(t) ≥ pβA/(1 − p).   (54)

Proof: Let R(j) be the number of detections for the first j attacks. According to Algorithm A, STATUS = NULL if and only if ∀ j < i, R(j) ≤ j/2. We now derive f(i) by transforming the problem to the monotonic path counting problem in combinatorics. Figure 10 depicts a grid of n × n square cells. We start with the lower left corner at Round 1. If an attack is detected, we move one step right along an edge of the grid. If an attack is not detected, we move one step up.

Fig. 10. n × n grid (axes: detected / undetected moves; the diagonal y = x separates the region where R(i) > i/2).

As we can see, if R(j) ≤ j/2 holds for all j < i, then the path never crosses the diagonal of the grid. We count the number of paths that satisfy this condition as follows. Without loss of generality, we consider the case where i is even. Note that when i is odd, f(i) = f(i + 1). When i is even, after i attacks, the finishing point of the path can be one of the following points: (i, 0), (i − 1, 1), . . ., (i/2, i/2). Note that
for x, y ≥ 1 and x ≥ y, the number of monotonic paths from (0, 0) to (x, y) that never cross the diagonal is

g(x, y) = C(x + y, y) − C(x + y, y − 1),   (55)

where C(·, ·) denotes the binomial coefficient. For example, g(i, 0) = 1 because there is only one monotonic path from (0, 0) to (i, 0).

Thus, the probability that STATUS = ESTABLISHED after i attacks have been launched to the system is

f(i) = 1 − Σ_{y=0}^{i/2} γS^y (1 − γS)^{i−y} g(i − y, y)   (56)
= 1 − Σ_{y=0}^{i/2} γS^y (1 − γS)^{i−y} C(i, y) + Σ_{y=1}^{i/2} γS^y (1 − γS)^{i−y} C(i, y − 1).   (57)

Note that Σ_{y=0}^{i/2} γS^y (1 − γS)^{i−y} C(i, y) in (57) is the cumulative probability from y = 0 to y = i/2 for a binomial distribution with mean i·γS and variance i·γS·(1 − γS). When i is large, we can approximate such a binomial distribution by a normal distribution with the same mean and variance. Thus, we have

f(i) = 1/(2(1 − γS)) − ( (1 − 2γS)/(2(1 − γS)) ) · erf( i(1 − 2p)² / (8p(1 − p)) ),   (58)

where erf(·) is the Gaussian error function, and hence lim_{i→∞} f(i) ≥ pβA/(1 − p). Since erf(i(1 − 2p)²/(8p(1 − p))) → 1 when i → ∞, we have

lim_{i→∞} f(i) = pβA/(1 − p).   (59)

APPENDIX F
PROOF OF THEOREM 6

Theorem 6. With Algorithm A in place, when t0 is sufficiently large, the expected payoff of a soft defender with preference parameter βD^S satisfies

lim_{t→∞} uD(t)/(tλI) ≥ ( (1 − p − pβA)/(1 − p) ) · u0(βD^S) − (1 − βD) · e^{−l0 λI γS}.   (60)

The expected payoff of a tough defender with preference parameter βD^T satisfies

lim_{t→∞} uD(t)/(tλI) = u0(βD^T).   (61)

Proof: Applying Theorem 4, when n0 → ∞, the probability that STATUS = ESTABLISHED after n0 attacks is at most pβA/(1 − p). When STATUS = ESTABLISHED, the expected payoff of a soft defender is 0 because no attacks will be launched while no false positive will be triggered. Otherwise, the expected payoff of a soft defender with preference parameter βD^S is u0(βD^S). Thus, the expected payoff of the soft defender satisfies

uD/n ≥ ( (1 − p − pβA)/(1 − p) ) · u0(βD^S)   (62)

when n0 is sufficiently large. Irrespective of whether STATUS = ESTABLISHED or NULL, the expected payoff of a tough defender with preference parameter βD^T is always u0(βD^T).

APPENDIX G
PROOF OF THEOREM 7

Theorem 7. A smart insider can always learn βD with confidence c within time t unless either the tough or the soft defender has an average utility per inside attacker satisfying

uD(t)/(tλI) ≤ |1/2 − βD| · √( 1 − (1 − c)^{2/(tλN)} ) − 1/2.   (63)

Proof: (Sketch) Consider γ^T, the average value of γ chosen by a tough defender. Note that (63) is satisfied as long as a tough defender chooses γ^T > 1/2 or a soft defender chooses γ^S < 1/2. Thus, we only need to consider the case where γ^T ≤ 1/2 and γ^S ≥ 1/2. Without loss of generality, consider 1/2 − γ^T ≤ γ^S − 1/2. Let sH be the number of successful attacks in the history, which can be observed by a smart insider. When the defender is tough, for a given time t, sH can be considered as following a binomial distribution with p = γ^T and the number of trials N = t·λN. The confidence interval of sH for confidence c is given by

[ p − Z_{c/2} · √(p(1 − p)/N),  p + Z_{1−c/2} · √(p(1 − p)/N) ],   (64)

where Z_i is the i-th percentile of a standard normal distribution. Clearly, a smart insider is able to learn βD with confidence c if the confidence intervals of sH for tough and soft defenders have no overlap. Thus, in order to prevent a smart insider from having confidence c for learning βD, the parameter γ^T must satisfy

2 · Z_{1−c/2} · √( γ^T(1 − γ^T)/N ) ≥ 1 − γ^T.   (65)

After some mathematical manipulation, we obtain

γ^T ≤ 1/2 − √( 1 − (1 − c)^{2/(tλN)} ).   (66)

Thus, the average utility per inside attacker satisfies

uD(t)/(tλI) ≤ |1/2 − βD| · √( 1 − (1 − c)^{2/(tλN)} ) − 1/2.   (67)
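To make the confidence-interval argument above concrete, the following minimal Python sketch checks when the normal-approximation intervals of (64) for a tough defender and a soft defender stop overlapping after N = t·λN observed interactions; once they separate, the insider can tell the two types apart with confidence c. It assumes, as in the proof sketch, that the observed frequency follows a binomial distribution with parameter γ^T (tough) or γ^S (soft); all parameter values are illustrative only and are not taken from the paper.

from math import sqrt
from statistics import NormalDist

def proportion_interval(p, n, c):
    # two-sided normal-approximation confidence interval for a binomial
    # success probability p observed over n trials (cf. Eq. (64))
    z = NormalDist().inv_cdf(1 - (1 - c) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def insider_can_learn(gamma_T, gamma_S, t, lam_N, c=0.99):
    # the insider separates a tough defender (parameter gamma_T) from a
    # soft one (gamma_S) once their confidence intervals no longer overlap
    n = t * lam_N
    lo_t, hi_t = proportion_interval(gamma_T, n, c)
    lo_s, hi_s = proportion_interval(gamma_S, n, c)
    return hi_t < lo_s or hi_s < lo_t

# illustrative run: gamma_T = 0.3, gamma_S = 0.7, lambda_N = 0.01
for t in (100, 1000, 10000):
    print(t, insider_can_learn(0.3, 0.7, t, 0.01))

With these illustrative values the intervals only separate for the largest horizon, which matches the intuition that a smart insider needs a sufficiently long observation window before the defender's behavior is learnable.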
19

APPENDIX H
PROOF OF THEOREM 8

Theorem 8. With Algorithm B, an insider Ai will abstain if

rR(i) > 1/(βA + 1),   (68)

which holds as long as the number of detections exceeds the number of undetected attacks.
Proof: This theorem can be proved analogously to the proof of Theorem 2. The expected payoff of the attacker is

E(uA(i)) = Pr{Ai attacks} · (1 − rR(i) · (1 + βA)),   (69)
which satisfies E(uA (i)) ≤ 0 when rR (i) ≥ 1/(βA + 1).
Thus, an insider cannot benefit by launching its attack.
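As a quick numeric illustration of the abstain condition in (68)-(69) as reconstructed above (the parameter values below are illustrative, not from the paper), a short Python sketch:

def attack_payoff(r_R, beta_A, p_attack=1.0):
    # Eq. (69): E(u_A(i)) = Pr{attack} * (1 - r_R(i) * (1 + beta_A))
    return p_attack * (1 - r_R * (1 + beta_A))

def insider_abstains(r_R, beta_A):
    # Eq. (68): abstain once the reputation exceeds 1/(beta_A + 1)
    return r_R > 1 / (beta_A + 1)

beta_A = 0.5
for r_R in (0.5, 2 / 3, 0.8):
    print(r_R, attack_payoff(r_R, beta_A), insider_abstains(r_R, beta_A))

For βA = 0.5 the threshold is 1/(βA + 1) = 2/3: below it the attack payoff is positive, at it the payoff is zero, and above it the payoff is negative, so a rational insider abstains.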

A PPENDIX I
P ROOF OF T HEOREM 9
Theorem 9. When t is sufficiently large, with Algorithm B, the probability that no insider would launch an attack at time t satisfies

pE ≥ ∫_0^{1/λN} ( λN e^{−λN t} / (2(1 − γS)) ) · ( 1 − (1 − 2γS) · erf( tλI(1 − 2p)² / (8p(1 − p)) ) ) dt.   (70)

In particular, when λN ≪ λI, we have pE ≥ βA/(6 − 6p).
Proof: Algorithms A and B share an important common
property that STATUS = NULL if and only if ∀j < i, R(j) ≤
j/2, where R(j) is the number of detections for the first j
attacks. Thus, Theorem 9 can be proved by following the
exact same steps as in the proof of Theorem 5.
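Both proofs rest on the same ballot-style path-counting probability. The short Python sketch below cross-checks the closed form of (56)-(57) against a direct simulation of the establishment rule (reputation established once detections exceed misses); here q denotes the per-attack detection probability, playing the role of γS in the reconstruction above, and the values are illustrative only.

from math import comb
import random

def g(x, y):
    # ballot-style count (Eq. (55)): monotonic paths from (0,0) to (x,y),
    # x >= y, whose prefixes never have more up-moves than right-moves
    return comb(x + y, y) - (comb(x + y, y - 1) if y >= 1 else 0)

def f_closed(i, q):
    # Pr{reputation established within i attacks}: one minus the probability
    # that detections never exceed misses (Eqs. (56)-(57))
    null_prob = sum(g(i - d, d) * q**d * (1 - q)**(i - d) for d in range(i // 2 + 1))
    return 1 - null_prob

def f_simulated(i, q, trials=100000):
    established = 0
    for _ in range(trials):
        detections = misses = 0
        for _ in range(i):
            if random.random() < q:
                detections += 1
            else:
                misses += 1
            if detections > misses:   # establishment rule of Algorithm A
                established += 1
                break
    return established / trials

print(f_closed(20, 0.3), f_simulated(20, 0.3))

The two printed values should agree up to Monte Carlo noise, which is a quick sanity check on the path-counting step shared by the proofs of Theorems 5 and 9.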

APPENDIX J
PROOF OF THEOREM 10

Theorem 10. When Algorithm B is used, if λN ≪ λI, the expected payoff of the defender satisfies

lim_{n→∞} uD/n ≥ u0(βD) + ( βA(1 − βD) / (6√βD) ) · (1 − 2βD + βD^{3/2}),   (71)

which is maximized over all possible values of p.
Proof: When STATUS = ESTABLISHED, the expected
utility of the defender is
E(uD(i)) = −(1 − βD) · √βD.   (72)
Otherwise, the expected utility is −(1 − βD ). Since the
probability that STATUS = ESTABLISHED after i attacks is
at least

lim_{i→∞} f(i) = pE · βA/(1 − p) = βA(1 − βD)/(6√βD),   (73)
the expected payoff of the defender satisfies the following
inequality:

lim_{n→∞} uD/n ≥ u0(βD) + ( βA(1 − βD) / (6√βD) ) · (1 − 2βD + βD^{3/2}).   (74)
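To get a rough sense of scale for the bound just derived, the sketch below tabulates the improvement term βA(1 − βD)/(6√βD) · (1 − 2βD + βD^{3/2}) from (74) for a few illustrative parameter values; the baseline payoff u0(βD), defined earlier in the paper, is not evaluated here.

from math import sqrt

def improvement_term(beta_A, beta_D):
    # the additive term in the lower bound (74), on top of u0(beta_D)
    return beta_A * (1 - beta_D) / (6 * sqrt(beta_D)) * (1 - 2 * beta_D + beta_D ** 1.5)

for beta_D in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(beta_D, round(improvement_term(0.5, beta_D), 4))

For βA = 0.5 the term is largest when the defender's false-positive cost βD is small and shrinks as βD approaches 1, consistent with the intuition that a defender who barely fears false positives has the most to gain from an established reputation.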
