

Performance Enhancement of Intrusion Detection
Systems using Advances in Sensor Fusion
A THESIS
SUBMITTED FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
IN THE FACULTY OF ENGINEERING
by
Ciza Thomas
Supercomputer Education and Research Centre
Indian Institute of Science
BANGALORE 560 012
April 2009
© Ciza Thomas
April 2009
All rights reserved
DEDICATED WITH EXTREME AFFECTION AND GRATITUDE TO
my parents Mr. M.C. Thomas and Mrs. Accamma Thomas
my husband Dr. T. John Tharakan
my kids Alka and Alin
and
my research supervisor Prof. N. Balakrishnan
Acknowledgements
Endless thanks go to the Lord Almighty for all the blessings He has showered
on me, which have enabled me to write this last note in my research work.
During the period of my research, as in the rest of my life, I have been blessed
by the Almighty with some extraordinary people who have spun a web of support
around me. Words can never be enough to express how grateful I am to those
incredible people in my life who made this thesis possible. I would like to
thank them for making my time during my research in the Institute a period I
will treasure.
I am deeply indebted to my research supervisor, Professor N. Balakrishnan, for
presenting me with such an interesting thesis topic. Each meeting with him added
invaluable aspects to the implementation and broadened my perspective. He has
guided me with his invaluable suggestions, lighted up the way in my darkest
times, and encouraged me greatly in my academic life. From him I have learned to
think critically, to select problems, to solve them, and to present their solutions.
I would like to thank him for furthering my education in many subjects such as
probability theory, network security, pattern recognition, and machine learning.
He has given me the best training within the country and even abroad by sending
me to CyLab and the CERT of CMU, Pittsburgh, USA. It was more than I had ever
hoped for in my research life. His drive for scientific excellence has pushed me
to aspire to the same (though I could never achieve it). It was a great pleasure
for me to have the chance of working with him. He was the best choice I could
have made for an advisor. Sometimes we are just (or incredibly) lucky!
I consider it a great honor to have been part of the MMSL lab in the SERC
department of the Indian Institute of Science, and I salute the efforts of my
Professor in support of the nation in so many ways. I would also like to express
my deep gratitude to Prof. R. Govindarajan, Chairman, SERC, for all the
support provided to me while I was a student in the department.
I would be failing in my duty if I did not acknowledge some of my friends on the
campus with whom I have shared my research experiences, for it was a joy
and an enlightenment to me. I am fortunate to have a friend like Sharmili Roy,
who has opened her heart and her problems to me, in turn motivating me many
a time with her extraordinary brilliance and analytical perception. Ms. Neeta
Trivedi has helped me through the totally alien landscape of writing any
document correctly. Suneesh S.S. has been a caring friend who has helped me in
times of trouble on the campus.
I would like to extend special thanks to the unknown reviewers of my thesis
for agreeing to read and review it. I wish to thank the authors, developers,
and maintainers of the open-source software used in this work. I appreciate
all the researchers whose work I have used, initially in understanding my
field of research and later for updates. I would like to thank the many
people who have taught me, starting with my school teachers, my undergraduate
teachers, and my graduate teachers and examiners, especially Prof. Joy Kuri,
Prof. Vital Rao, Prof. Veni Madhavan, Prof. Anurag Kumar, and Prof. Mathew
Jacob.
It is with sincere gratitude that I wish to thank Prof. K.R. Ramakrishnan,
Prof. S.M. Rao, Prof. S.K. Sen, and Prof. K. Gopakumar for the care they have
provided. I consider it a great privilege to have been associated with some
great Professors in my field of research, namely Prof. Raj Reddy, Dr. B.V.
Dasarathy, Prof. Dorothy Denning, Prof. P.K. Chan, Prof. Mathew Mahoney, the
CERT CyLab group, Dr. Athithan, and Mr. Philip, and I appreciate the help
rendered by them. Prof. Jyothi Balakrishnan, Mr. Murali, and Ms. Reshmi need
a special mention in this acknowledgement for being particularly supportive
during times of need. I feel obliged to say thank you one more time. I would
also like to express my gratitude to Dr. Latha Christy, whose thoughtful advice
when I was away from my kids during the days of my research gave me a sense of
direction during my PhD studies. I wish to thank Mr. Vishwas Sharma, Dr. G.
Ravindra, Dr. J. Sujatha, Ms. K. Nagarthna, Ms. Swarna, Mr. Ravikumar, Mr.
Sasikumar, Mr. Sekhar, the SERC security staff, and a few others who have in
some way or the other helped me at various stages of my research life. And
then there are all the other people who are not mentioned here but who have
helped in making IISc a very special place over all these years. I would also
like to thank my employers, the Director of Technical Education, Govt. of
Kerala, and my Principal and Head of the Department for the support and
encouragement extended to me during my period of research in the Institute.
I express my deep sense of gratitude for the affection and support shown to me
by my parents-in-law. My father-in-law could not see me reach this stage of my
research, and I acknowledge him before his memory. I take this opportunity to
dedicate this work to my parents, who have made me what I am; to my husband and
children, who have given me consistent support throughout my research; and to
my guide, who had a vision for my research work. I learnt to aspire to a career
in research from my parents in my childhood, and later from my husband. My
parents have passed on to me a wonderful humanitarian lineage, whose value
cannot be measured by any worldly yardstick. The warmest of thanks to my
husband, Dr. T. John Tharakan, for his understanding and patience while I was
far away from home during the period of my research in the Institute. He has
supported me in each and every way, believed in me unfailingly, and inspired
me in all dimensions of life. I am blessed with two wonderful kids, Alka and
Alin, who knew only to encourage and never complained about anything, even
when they had to suffer a lot in my absence over these years. I owe everything
to them; without their everlasting love, this thesis would never have been
completed.
To you all, I dedicate this work.
All of you made it possible for me to reach this last stage of my endeavor.
Thank You from my heart-of-hearts.
Ciza Thomas
Publications based on this Thesis
International Journal Publications
1. Ciza Thomas and N. Balakrishnan, Improvement in Intrusion Detection
with Advances in Sensor Fusion, To appear in the IEEE Transactions on
Information Forensics and Security.
2. Ciza Thomas and N. Balakrishnan, Performance Enhancement in Attack
Detection with Skewness in Network Traffic, International Journal on In-
formation Fusion (under review).
3. Ciza Thomas and N. Balakrishnan, Data-dependent Decision Fusion of In-
trusion Detection Systems using Modified Evidence Theory, IEEE Trans-
actions on Information Forensics and Security (under review).
4. Ciza Thomas and N. Balakrishnan, Modeling the Attack-Detection Sce-
nario with Network Intrusion Detection Systems, International Journal of
Security and Networks (under review).
5. Ciza Thomas and N. Balakrishnan, Improvement in Intrusion Detection
with Advances in Sensor Fusion, International Journal of Security and Net-
works (under review).
6. Ciza Thomas and N. Balakrishnan, Sensor Fusion for Performance En-
hancement of Intrusion Detection Systems, IEEE Transactions on Depend-
able and Secure Computing (to be communicated).
7. Ciza Thomas and N. Balakrishnan, Intrusion Detection Systems: A survey,
ACM Computing Surveys (to be communicated).
International Conference Publications
1. Ciza Thomas and N. Balakrishnan, Selection of Intrusion Detection Thresh-
old for Effective Sensor Fusion, International Symposium on Defense and
Security, Proceedings of SPIE, 6570, 5, 2007.
2. Ciza Thomas, Vishwas Sharma and N. Balakrishnan, Usefulness of DARPA
Dataset for Intrusion Detection System Evaluation, International Sympo-
sium on Defense and Security, Proceedings of SPIE, 6973, 15, 2008.
3. Ciza Thomas and N. Balakrishnan, Improvement in Minority Attack De-
tection with Skewness in Network Traffic, International Symposium on
Defense and Security, Proceedings of SPIE, 6973, 23, 2008.
4. Ciza Thomas and N. Balakrishnan, Advanced Sensor Fusion Technique for
Enhanced Intrusion Detection, Proceedings of the IEEE International Con-
ference on Intelligence and Security Informatics, 1-4244-2415, pp. 173-
178, 2008, available online in IEEEXplore.
5. Ciza Thomas and N. Balakrishnan, Performance Enhancement of Intrusion
Detection Systems using Advances in Sensor Fusion, Proceedings of the
International Conference on Information Fusion, 4883, 2, pp. 1671-1677,
2008.
6. Ciza Thomas and N. Balakrishnan, Modified Evidence Theory for Perfor-
mance Enhancement of Intrusion Detection Systems, Proceedings of the
International Conference on Information Fusion, 4883, 2, pp. 1751-1758,
2008.
7. Ciza Thomas and N. Balakrishnan, Mathematical Analysis of Sensor Fu-
sion for Intrusion Detection Systems, Proceedings of the International Con-
ference on Communications and Networking, 97, 2009, available online in
IEEEXplore.
Research Symposium Publication
1. Ciza Thomas and N. Balakrishnan, Performance Enhancement of Intrusion
Detection Systems using Advances in Sensor Fusion, Techvista, Microsoft
Research Symposium, 2007.
Abstract
The technique of sensor fusion addresses the issues relating to the optimality
of decision-making in the multiple-sensor framework. Advances in sensor fusion
make it possible to perform intrusion detection for both rare and new attacks.
This thesis discusses this assertion in detail, and describes the theoretical
and experimental work done to show its validity.
The attack-detector relationship is initially modeled and validated to understand
the detection scenario. The different metrics available for the evaluation of in-
trusion detection systems are also introduced. The usefulness of the data set
used for experimental evaluation has been demonstrated. The issues connected
with intrusion detection systems are analyzed and the need for incorporating
multiple detectors and their fusion is established in this work. Sensor fusion
provides advantages with respect to reliability and completeness, in addition to
intuitive and meaningful results. The goal of this work is to investigate how
to combine data from diverse intrusion detection systems in order to improve
the detection rate and reduce the false-alarm rate. The primary objective of the
proposed thesis work is to develop a theoretical and practical basis for enhanc-
ing the performance of intrusion detection systems using advances in sensor
fusion with easily available intrusion detection systems. This thesis introduces
the mathematical basis for sensor fusion in order to provide enough support for
the acceptability of sensor fusion in the performance enhancement of intrusion
detection systems. The thesis also shows the practical feasibility of
performance enhancement using advances in sensor fusion and discusses various
sensor fusion algorithms, their characteristics, and the related design and
implementation issues. We show that it is possible to enhance the performance
of intrusion detection systems by setting proper threshold bounds and also by
rule-based fusion. We introduce an architecture called data-dependent decision fusion as
a framework for building intrusion detection systems using sensor fusion based
on data-dependency. Furthermore, we provide information about the types of
data, the data skewness problems and the most effective algorithm in detecting
different types of attacks. This thesis also proposes and incorporates a
modified evidence theory for the fusion unit, which performs very well for the
intrusion detection application. Future improvements in individual IDSs can
also be easily incorporated into this technique in order to obtain better
detection capabilities. Experimental evaluation shows that the proposed
methods are capable of detecting a significant percentage of rare and new
attacks. The improved performance of the IDS using the algorithms developed
in this thesis, if deployed fully, would contribute to an enormous reduction
in successful attacks over a period of time. This has been demonstrated in the
thesis and is a step in the right direction towards making cyberspace safer.
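The fusion discussed in this thesis operates at the decision level: each IDS reports a score or alert for a traffic record, and a fusion unit combines them. As a minimal sketch of such a combiner (the weighted-vote rule, scores, weights, and threshold below are illustrative placeholders, not the configurations evaluated in the thesis):

```python
# Minimal decision-level fusion sketch: each IDS reports a score in
# [0, 1] for one traffic record; the fusion unit takes a weighted
# average and compares it against a threshold T. All values here are
# illustrative only.

def fuse_decisions(scores, weights, threshold=0.5):
    """Return 1 (attack) if the weighted mean of IDS scores exceeds
    the fusion threshold, otherwise 0 (normal)."""
    if len(scores) != len(weights):
        raise ValueError("one weight per sensor is required")
    weighted_mean = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return 1 if weighted_mean > threshold else 0

# Three IDSs examine the same record; two of them flag an attack.
print(fuse_decisions([0.9, 0.8, 0.1], weights=[1.0, 1.0, 1.0]))  # prints 1
```

Chapters 5 through 8 replace this naive weighted vote with, respectively, threshold-bound selection, rule-based fusion, the data-dependent decision fusion architecture with a neural-network learner, and the modified evidence-theoretic combiner.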
Keywords
Intrusion Detection Systems, Sensor Fusion, Negative Binomial Distribution,
Chebyshev Inequality, Data-dependent Decision Fusion, Neural Network, Base-
rate Fallacy, Accuracy Paradox, Dempster-Shafer Evidence Theory, Context-
dependent Operator
Notation and Abbreviations

Notation      Details
β             Assigns weight to precision over recall
μ_0           Mean value of the normal traffic profile
μ_1           Mean value of the attack traffic profile
F             Set of all focal points
σ_0           Standard deviation of normal traffic
σ_1           Standard deviation of attack traffic
σ_i           Standard deviation of Sensor indexed i
σ_av          Average standard deviation
σ_fusion      Standard deviation of the fused Sensor
α_0           False alarm rate fixed for acceptable detection
α             False alarm rate at the fusion center
Θ             Frame of Discernment
ρ             Correlation coefficient
ρ_n           Correlation coefficient of n sensors
ρ^j_{i,k}     Correlation coefficient between the i-th and k-th detectors
π_i(s)        Bahadur-Lazarsfeld polynomial
A_t           Number of attacks at any time t
A_{t+1}       Number of attacks at any time (t + 1)
Bel(A)        Belief of hypothesis A
C_j           Class labels
D             Detection rate
D_i           Detection rate of Sensor indexed i
D_t           Number of detectors at any time t
D_{t+1}       Number of detectors at any time (t + 1)
E(s)          Expectation of s
F             False positive rate
F(X)          Cumulative distribution function
F_i           False positive rate of Sensor indexed i
FP_i          False Positives of Sensor indexed i
F_j           Fusion function for any input indexed j
G_x           Average occurrence of X
L             Likelihood function
N             Total number of experiments
N_f           Number of experiments where all the Sensors fail to detect
N_t           Number of experiments where all the Sensors detect correctly
N_e           Number of encounters between the detector and the attack
P             Precision
P(s)          Probability density function of s
P_e           Probability of error
Pl(A)         Plausibility of hypothesis A
P(Θ)          Power set of the FoD
P_0           Prior probability of normal traffic
P_1           Prior probability of attack traffic
R             Recall
T             Threshold
TP_i          True Positives of Sensor indexed i
Var(s)        Variance of s
a_i           Ambiguity
a             Attack increase rate
c             Detector learning parameter
d             Detector efficiency
e             Error rate
e_f(x)        Feature extractor
e_1, ..., e_m Unknown feature list
g             Clumping factor
n             Number of Sensors in the fusion unit
m             Detector correlation
m(A)          Basic Probability Assignment
t             Time in years
p_i           Probability of detection of Sensor indexed i
p(A)          Probability of hypothesis A
(x_n, y_n)    Training data set
s             Fusion output
s^j_i         Output of Sensor indexed i corresponding to an input x_j
s*_i          Set of parameters associated with Sensor indexed i
x             Network traffic as an input vector
v_r           Variance reduction factor
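The quantities P, R, and the weight β in the table above are tied together by the standard F-score formula (revisited among the evaluation metrics of Appendix D). A small sketch computing them from raw alarm counts; the counts are invented for illustration and are not results from this thesis:

```python
# Precision P, recall R, and the F-score with weight beta, computed
# from raw alarm counts. The counts below are invented for the
# example only.

def f_score(tp, fp, fn, beta=1.0):
    """F-score = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    precision = tp / (tp + fp)   # P: fraction of alarms that are real attacks
    recall = tp / (tp + fn)      # R: fraction of attacks that raise an alarm
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# 80 attacks detected, 20 false alarms, 40 attacks missed:
# P = 0.8, R = 2/3, and the balanced F1-score works out to 8/11.
print(round(f_score(80, 20, 40), 3))  # prints 0.727
```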
Abbreviation Details
AFRL Air Force Research Laboratory
ALAD Application Layer Anomaly Detector
ANN Artificial Neural Network
ARPANET Advanced Research Projects Agency NETwork
AUC Area Under Curve
BPA Basic Probability Assignment
CD Context Dependent
CERT Computer Emergency Response Team
CERT/CC Computer Emergency Response Team / Coordination Center
CRB Cramer Rao Bound
CTF Capture The Flag
CV Coefficient of Variation
DARPA Defense Advanced Research Projects Agency
DCost Damage Cost
DD Data-dependent Decision
DDoS Distributed Denial of Service
DS Dempster-Shafer
DOD Department of Defense
DoS Denial of Service
DRDoS Distributed Reflector DoS
D-Tree Decision Tree
EMERALD Event Monitoring Enabling Responses to Anomalous Live Disturbances
FBI Federal Bureau of Investigation
FCM Fuzzy Cognitive Maps
F-score Figure of merit score (or F-measure)
FN False Negative
FoD Frame of Discernment
FP False Positive
FTP File Transfer Protocol
IC3 Internet Crime Complaint Center
ICMP Internet Control Message Protocol
IDS Intrusion Detection System
IIDS Intelligent Intrusion Detection System
IP Internet Protocol
KDD Knowledge Discovery in Databases
LR Likelihood Ratio
MAP Maximum A Posteriori
MIT Massachusetts Institute of Technology
NB Naive Bayes
PHAD Packet Header Anomaly Detector
P-test A significance test
RBF Radial Basis Function
R2L Remote to Local
RB Rule-based
ROC Receiver Operating Characteristics
RCost Response Cost
SSH Secure SHell
SVM Support Vector Machine
TBM Transferable Belief Model
TCP Transmission Control Protocol
TN True Negative
TP True Positive
U2R User to Root
UDP User Datagram Protocol
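Several abbreviations above (BPA, DS, FoD, TBM, CD) belong to Dempster-Shafer evidence theory, whose combination rule Chapter 7 modifies for the fusion unit. As a minimal sketch of the classical, unmodified Dempster rule over the two-hypothesis frame of discernment {Attack, Normal}, with mass values that are illustrative only:

```python
# Classical Dempster-Shafer combination over the two-hypothesis frame
# of discernment {A (attack), N (normal)}; 'AN' denotes the whole
# frame, i.e. ignorance. Mass values are illustrative only.

def ds_combine(m1, m2):
    """Combine two basic probability assignments (BPAs) with
    Dempster's rule, normalizing out the conflicting mass."""
    def intersect(x, y):
        if x == "AN":
            return y
        if y == "AN":
            return x
        return x if x == y else None  # None marks the empty set (conflict)

    combined = {"A": 0.0, "N": 0.0, "AN": 0.0}
    conflict = 0.0
    for x, mx in m1.items():
        for y, my in m2.items():
            z = intersect(x, y)
            if z is None:
                conflict += mx * my
            else:
                combined[z] += mx * my
    # Dempster normalization: redistribute the conflicting mass.
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

m1 = {"A": 0.6, "N": 0.1, "AN": 0.3}  # BPA from sensor 1
m2 = {"A": 0.5, "N": 0.2, "AN": 0.3}  # BPA from sensor 2
print(ds_combine(m1, m2))  # combined belief masses, summing to 1
```

The normalization by (1 − conflict) is known to behave counter-intuitively when sensors conflict strongly, which is what motivates the disjunctive combination and the context-dependent operator developed in Chapter 7.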
Contents
Acknowledgements ii
Publications based on this Thesis v
Abstract vii
Keywords ix
Notation and Abbreviations x
1 Introduction 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Intrusion Detection Systems: Background . . . . . . . . . . . . 1
1.2.1 Growth of the Internet . . . . . . . . . . . . . . . . . . 1
1.2.2 Growth of Internet attacks . . . . . . . . . . . . . . . . 2
1.2.3 Cyber crimes in India . . . . . . . . . . . . . . . . . . . 3
1.2.4 Financial risks in corporate networks . . . . . . . . . . 4
1.2.5 Need for Intrusion Detection Systems . . . . . . . . . . 5
1.2.6 Current status, challenges and limitations of IDS . . . . 7
1.2.7 Open issues . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Major contributions of this thesis . . . . . . . . . . . . . . . . . 13
1.5.1 Theoretical formulation . . . . . . . . . . . . . . . . . 13
1.5.2 Experimental validation . . . . . . . . . . . . . . . . . 14
1.6 Research goal . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.7 Organization of the thesis . . . . . . . . . . . . . . . . . . . . . 14
2 Issues Connected with Single IDSs and the Attack-Detection Scenario 17
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Attackers' influence on the detection environment . . . . . . . 20
2.3 Data skewness in network traffic . . . . . . . . . . . . . . . . 20
2.3.1 Classification of attacks . . . . . . . . . . . . . . . . . 21
2.3.2 Identification of real-world network traffic problems . . 24
2.3.3 Non-uniform misclassification cost . . . . . . . . . . . 28
2.3.4 Inability of IDS in optimum decision making due to
data skewness . . . . . . . . . . . . . . . . . . . . . . . 30
2.4 Attack-Detection Scenario in a Secured Environment . . . . . . 31
2.4.1 Internet attacks and the countermeasure for detection . . 32
2.4.2 Testing the performance of Intrusion Detection Systems 33
2.5 Modeling the attack-detector relationship . . . . . . . . . . . . 38
2.5.1 Detectors learning from the detected attacks . . . . . . . 45
2.5.2 Detector correlation . . . . . . . . . . . . . . . . . . . 46
2.6 Validation of the model using real-world data . . . . . . . . . . 48
2.6.1 Discussion on the modeling . . . . . . . . . . . . . . . 49
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3 Evaluation and Test-bed of Intrusion Detection Systems 53
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2 Data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3 Usefulness of DARPA data set for IDS evaluation . . . . . . . . 57
3.3.1 Criticisms against the DARPA IDS evaluation data set . 57
3.3.2 Facts in support of the DARPA IDS evaluation data set . 58
3.3.3 Results and discussion . . . . . . . . . . . . . . . . . . 59
3.4 Choice and the performance improvement of individual IDSs . . 64
3.4.1 Snort: Improvements by adding new rules . . . . . . . . 65
3.4.2 PHAD/ALAD . . . . . . . . . . . . . . . . . . . . . . 66
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4 Mathematical Basis for Sensor Fusion 68
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2 Sensor fusion algorithms . . . . . . . . . . . . . . . . . . . . . 69
4.2.1 Machine Learning for intrusion detection . . . . . . . . 70
4.2.2 Evidence Theory . . . . . . . . . . . . . . . . . . . . . 72
4.2.3 Kalman filter . . . . . . . . . . . . . . . . . . . . . . . 73
4.2.4 Bayesian network . . . . . . . . . . . . . . . . . . . . . 73
4.3 Related work in sensor fusion . . . . . . . . . . . . . . . . . . . 73
4.4 Related work using sensor fusion in intrusion detection application . . . 76
4.5 Theoretical formulation . . . . . . . . . . . . . . . . . . . . . . 78
4.6 Solution approaches . . . . . . . . . . . . . . . . . . . . . . . . 86
4.6.1 Dempster-Shafer combination method . . . . . . . . . . 88
4.6.2 Analysis of detection error assuming trafc distribution . 93
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5 Selection of Threshold Bounds for Effective Sensor Fusion 104
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2 Modeling the fusion IDS by defining proper threshold bounds . 105
5.3 Results and discussion . . . . . . . . . . . . . . . . . . . . . . 109
5.3.1 Experimental evaluation . . . . . . . . . . . . . . . . . 109
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6 Performance Enhancement of IDS using Rule-based Fusion and Data-dependent Decision Fusion 114
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2 Rule-based fusion . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.3 Data-dependent decision fusion . . . . . . . . . . . . . . . . . . 117
6.3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . 118
6.3.2 Data-dependent decision fusion architecture . . . . . . . 118
6.3.3 Detection of rarer attacks . . . . . . . . . . . . . . . . . 121
6.4 Results and discussion . . . . . . . . . . . . . . . . . . . . . . 122
6.4.1 Test setup . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.4.2 Data set . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.4.3 Data-dependent decision fusion algorithm . . . . . . . . 123
6.4.4 Experimental evaluation . . . . . . . . . . . . . . . . . 126
6.4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . 131
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7 Modified Dempster-Shafer Theory for Intrusion Detection Systems 134
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.2 Dempster-Shafer combination method . . . . . . . . . . . . . 136
7.2.1 Motivation for choosing the Dempster-Shafer combination method . . . 137
7.2.2 Limitations of the Dempster-Shafer combination . . . . 138
7.3 Disjunctive combination of evidence . . . . . . . . . . . . . . . 141
7.4 Context-dependent operator . . . . . . . . . . . . . . . . . . . . 142
7.4.1 Performance of the proposed combination operator . . . 145
7.4.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . 149
7.5 Experimental evaluation . . . . . . . . . . . . . . . . . . . . . 151
7.5.1 Impact of this work . . . . . . . . . . . . . . . . . . . . 153
7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
8 Modeling of Intrusion Detection Systems and Sensor Fusion 156
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
8.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.3 Modeling of data-dependent decision fusion system . . . . . . . 157
8.3.1 Modeling of Intrusion Detection Systems . . . . . . . . 158
8.3.2 Modeling the fusion IDS . . . . . . . . . . . . . . . . . 159
8.3.3 Statement of the problem . . . . . . . . . . . . . . . . . 161
8.3.4 The effect of setting threshold . . . . . . . . . . . . . . 162
8.3.5 Modeling of neural network learner unit . . . . . . . . . 164
8.3.6 Dependence on the data and the individual IDSs . . . . 165
8.3.7 Threshold optimization . . . . . . . . . . . . . . . . . . 166
8.4 Results and discussion . . . . . . . . . . . . . . . . . . . . . . 167
8.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
9 Conclusions 169
9.1 Results and discussion . . . . . . . . . . . . . . . . . . . . . . 170
9.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
A Attacks on the Internet: A study 174
A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
A.2 History of Internet attacks . . . . . . . . . . . . . . . . . . . . 175
A.3 Attack motivation and objectives . . . . . . . . . . . . . . . . . 176
A.4 Attack taxonomy . . . . . . . . . . . . . . . . . . . . . . . . . 177
A.4.1 Viruses . . . . . . . . . . . . . . . . . . . . . . . . . . 177
A.4.2 Worms . . . . . . . . . . . . . . . . . . . . . . . . . . 179
A.4.3 Trojans . . . . . . . . . . . . . . . . . . . . . . . . . . 179
A.4.4 Buffer overflows . . . . . . . . . . . . . . . . . . . . . 180
A.4.5 Denial of Service attacks . . . . . . . . . . . . . . . . . 181
A.4.6 Network-based attacks . . . . . . . . . . . . . . . . . . 185
A.4.7 Password attacks . . . . . . . . . . . . . . . . . . . . . 186
A.4.8 Information gathering attacks . . . . . . . . . . . . . . 186
A.4.9 Blended attacks . . . . . . . . . . . . . . . . . . . . . . 188
A.5 Top ten cyber security menaces for 2008 . . . . . . . . . . . . . 188
A.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
B Intrusion Detection Systems: A survey 192
B.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
B.2 History of Intrusion Detection Systems . . . . . . . . . . . . . . 193
B.2.1 The emergence of intrusion detection systems . . . . . . 194
B.3 Taxonomy of Intrusion Detection System . . . . . . . . . . . . 210
B.3.1 Intrusion detection methods . . . . . . . . . . . . . . . 210
B.3.2 Deployment techniques . . . . . . . . . . . . . . . . . 213
B.3.3 Information source . . . . . . . . . . . . . . . . . . . . 218
B.3.4 Architecture . . . . . . . . . . . . . . . . . . . . . . . . 219
B.3.5 Analysis frequency . . . . . . . . . . . . . . . . . . . . 221
B.3.6 Response . . . . . . . . . . . . . . . . . . . . . . . . . 221
B.4 Latest Intrusion Detection software . . . . . . . . . . . . . . 222
B.5 Review of the data processing techniques used in IDS . . . . . . 228
B.6 Current Intrusion Detection research . . . . . . . . . . . . . . . 231
B.6.1 Intrusion Prevention System . . . . . . . . . . . . . . . 231
B.7 Intrusion detection using multi-sensor fusion . . . . . . . . . . . 233
B.7.1 Existing fusion IDSs . . . . . . . . . . . . . . . . . . . 234
B.7.2 Current status of applying sensor fusion in IDS . . . . . 236
B.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
C Modeling of the Internet Attacks and the Countermeasure for Detection 237
C.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
C.2 Nicholson-Bailey model . . . . . . . . . . . . . . . . . . . . . 239
C.2.1 Attack/Detection as they stand alone . . . . . . . . . . . 242
C.2.2 Attack carrying capacity . . . . . . . . . . . . . . . . . 243
C.2.3 Stability in attack-detector model . . . . . . . . . . . . 245
C.2.4 Inclusion of stealthy attacks . . . . . . . . . . . . . . . 246
C.2.5 Modeling of non-random attacks and detection . . . . . 247
C.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
D Methodology for Evaluation of Intrusion Detection Systems 249
D.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
D.2 Metrics for performance evaluation . . . . . . . . . . . . . . . . 250
D.2.1 Detection rate and false alarm rate . . . . . . . . . . . . 250
D.2.2 Receiver Operating Characteristic (ROC) Curve . . . . . 251
D.2.3 The Area Under ROC Curve (AUC) . . . . . . . . . . . 251
D.2.4 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . 251
D.2.5 Precision . . . . . . . . . . . . . . . . . . . . . . . . . 252
D.2.6 Recall . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
D.2.7 F-score . . . . . . . . . . . . . . . . . . . . . . . . . . 252
D.2.8 P-test . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
D.3 Test setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
D.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
References 256
List of Tables
2.1 Details of attack types present in DARPA 1999 data set [27] . . 22
2.2 Performance dependence of Snort on base-rate . . . . . . . . . . 26
2.3 Damage cost and response cost of different attack types [24] . . 29
2.4 Cost matrix [52] . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1 Attacks present in DARPA 1999 data set . . . . . . . . . . . . . 56
3.2 Attacks detected by Snort from the DARPA 1999 data set . . . . 60
3.3 Attacks detected by PHAD from the DARPA 1999 data set . . . 60
3.4 Attacks detected by ALAD from the DARPA 1999 data set . . . 61
3.5 Attacks detected by Cisco IDS from the DARPA 1999 data set . 61
5.1 Types of attacks detected by PHAD at 0.00002 FP rate (100 FPs) 110
5.2 Types of attacks detected by ALAD at a 0.00002 FP rate (100
FPs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.3 Types of attacks detected by the combination of ALAD and
PHAD at 0.00004 FP rate (200 FPs) . . . . . . . . . . . . . . . 110
5.4 F-score of PHAD for different choices of false positives . . . . 110
5.5 F-score of ALAD for different choices of false positives . . . . 110
5.6 F-score of fused IDS for different choices of false positives . . 111
5.7 Comparison of the evaluated IDSs using the real-world network
trafc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.1 Types of attacks detected by the rule-based combination of ALAD
and PHAD at a FP rate of 0.000025 (125 FPs) . . . . . . . . . . 117
6.2 Types of attacks detected by PHAD at a false positive rate of
0.00002 (100 FPs) . . . . . . . . . . . . . . . . . . . . . . . . 126
6.3 Types of attacks detected by ALAD at a false positive rate of
0.00002 (100 FPs) . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.4 Types of attacks detected by Snort at a false positive rate of
0.0002 (1000 FPs) . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.5 Types of attacks detected by DD fusion IDS at a false positive
rate of 0.00002 (100 FPs) . . . . . . . . . . . . . . . . . . . . . 127
6.6 Comparison of the evaluated IDSs with various evaluation metrics 127
6.7 Detection of different attack types by single IDSs and data-
dependent decision fusion IDS . . . . . . . . . . . . . . . . . . 127
6.8 Comparison of the evaluated IDSs using the real-world data set . 130
6.9 Performance comparison of single IDSs and DD fusion IDS . . 130
7.1 Evidence with total conict . . . . . . . . . . . . . . . . . . . . 139
7.2 Evidence with conict . . . . . . . . . . . . . . . . . . . . . . 139
7.3 Evidence from four sensors with one unreliable using the DS
method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.4 Evidence from four sensors with one unreliable using the context-
dependent operator . . . . . . . . . . . . . . . . . . . . . . . 146
7.5 Belief of each of the IDSs for a R2L attack . . . . . . . . . . . . 152
7.6 Belief of each of the IDSs for a stealthy probe . . . . . . . . . . 152
7.7 Type of attacks detected by PHAD at 100 false alarms . . . . . . 152
7.8 Type of attacks detected by ALAD at 100 false alarms . . . . . 152
7.9 Type of attacks detected by Snort at 1000 false alarms . . . . . . 152
7.10 Type of attacks detected by context-dependent fusion at 100
false alarms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.11 Comparison of the evaluated IDSs using the real-world data set . 154
8.1 Average probability of error with DD fusion algorithms . . . . . 168
List of Figures
1.1 Growth of Internet in terms of the host count over the years [2] . 2
1.2 A typical security scenario in any network . . . . . . . . . . . . 6
2.1 Probability of attack vs Bayesian attack detection rate for fixed values of FP and TP . . . 25
2.2 Probability of attack vs F-score for Snort . . . . . . . . . . . . . 26
2.3 Receiver Operator Characteristic graph . . . . . . . . . . . . . . 34
2.4 Trade-off between recall and precision . . . . . . . . . . . . . . 34
2.5 Precision and recall of IDS over the years . . . . . . . . . . . . 36
2.6 Plot of F-score over the years . . . . . . . . . . . . . . . . . . . 36
2.7 Growth in the incidents reported to CERT . . . . . . . . . . . . 36
2.8 Growth in the vulnerabilities reported by CERT . . . . . . . . . 36
2.9 Incidents reported over a period of time . . . . . . . . . . . . . 38
2.10 D(1)=60, d=0.7 . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.11 D(1)=40, d=0.7 . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.12 D(1)=80, d=0.7 . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.13 D(1)=60, d=0.5 . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.14 D(1)=60, d=1.0 . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.15 Effect of detector efficiency on the attack growth rate . . . . . . 48
4.1 Fusion architecture with decisions from n IDSs . . . . . . . . . 81
5.1 Parametric curve showing the choice of threshold T . . . . . . . 106
5.2 Detection rate vs Threshold . . . . . . . . . . . . . . . . . . . . 111
5.3 Precision vs Threshold . . . . . . . . . . . . . . . . . . . . . . 112
5.4 F-score vs Threshold . . . . . . . . . . . . . . . . . . . . . . . 112
5.5 False Negative Rate vs Threshold . . . . . . . . . . . . . . . . . 112
6.1 Data-dependent Decision fusion architecture . . . . . . . . . . . 119
6.2 Performance of evaluated systems . . . . . . . . . . . . . . . . 128
6.3 Semilog ROC curve of single and DD fusion IDSs . . . . . . . . 128
6.4 Comparison of evaluated systems . . . . . . . . . . . . . . . . . 129
6.5 Detection of Attack Types . . . . . . . . . . . . . . . . . . . . 129
7.1 Detection of Attack Types . . . . . . . . . . . . . . . . . . . . 153
8.1 Parallel Decision Fusion Network . . . . . . . . . . . . . . . . 160
8.2 Average probability of error . . . . . . . . . . . . . . . . . . . 168
A.1 Plot of Attack sophistication vs Intruder Knowledge over the
years . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
A.2 Distributed Denial of Service . . . . . . . . . . . . . . . . . . . 183
A.3 Distributed Reflector DoS (DRDoS) . . . . . . . . . . . . . . . 184
A.4 Taxonomy of Distributed DoS . . . . . . . . . . . . . . . . . . 185
B.1 Taxonomy of Intrusion Detection Systems . . . . . . . . . . . . 211
C.1 Attack-Detector relationship using the Nicholson-Bailey model . 241
C.2 Attack-Detector relationship with attack carrying capacity . . . 245
Chapter 1
Introduction
Life was simpler before World War II. After that, we had systems.
Admiral Grace Hopper
1.1 Introduction
Security attacks through the Internet have proliferated in recent years. Hence, information security is an issue of very serious global concern at the present time. This chapter discusses the growth of the Internet and the tremendous security threat posed by its increased complexity, accessibility and openness. The importance of protecting the corporate network is introduced, and the need for network security, and in particular for Intrusion Detection Systems (IDSs), is brought out. This chapter also includes the motivation for the work reported in this thesis, its scope and objectives, its major contributions, and a synopsis of all the chapters of this thesis.
1.2 Intrusion Detection Systems: Background
1.2.1 Growth of the Internet
The Internet has made real what, in the 1970s, the communications visionary Marshall McLuhan called the "Global Village" [1]. In a matter of very few years, the Internet has consolidated itself as a very powerful platform that has changed the way we do business and the way we communicate. The Internet, as no other medium, has given an international, or rather a globalized, dimension to the world. It is a universal source of information.
The Internet came into existence in the late seventies as an outgrowth of the ARPANET, a DOD project. The incredibly fast evolution of the Internet from 1995 till the present is evident from Figure 1.1, which plots statistics over this period. At the end of 1995, there were about 16 million Internet users, about 0.4% of the total population. By the end of 2005, in a decade, this had increased to 1,018 million Internet users, about 15.7% of the total population. As of June 10, 2008, 1.596 billion people use the Internet, according to Internet World Stats.
Figure 1.1: Growth of Internet in terms of the host count over the years [2]
1.2.2 Growth of Internet attacks
The growth of the Internet has brought great benefits to modern society; meanwhile, the rapidly increasing connectivity and accessibility to the Internet has posed a tremendous security threat. The growth of attacks has roughly paralleled the growth of the Internet [3]. Malicious usage, attacks and sabotage have been on the rise as more and more computers are put into use. Attacks on the Internet have become both more prolific and easier to implement because of the ubiquity of the Internet and the pervasiveness of easy-to-use operating systems and development environments.
There are multiple penetration points for intrusions to take place in a network system. For example, at the network level, carefully crafted malicious IP packets can crash a victim host; at the host level, vulnerabilities in system software can be exploited to yield an illegal root shell. Security threats have exploited all kinds of networks, ranging from traditional computers to point-to-point and distributed networks. These threats have also exploited vulnerable protocols and operating systems, extending attacks to various kinds of applications, such as database and web servers. The most popular operating systems regularly publish updates, but the combination of poorly administered machines, uninformed users, a vast number of targets, and ever-present software bugs has allowed exploits to remain ahead of patches. Appendix A includes a detailed study of the attacks on the Internet for further reference.
1.2.3 Cyber crimes in India
The general and US-focused trends in cyber attacks have been highlighted at the beginning of this section, but the trend in India is also important to look into. In the Indian scenario, with e-commerce becoming popular in the last few years, cyber crime is a term used to broadly describe criminal activity in which computers or computer networks are a tool, a target, or a place of criminal activity, and includes everything from electronic cracking to denial of service attacks [4]. It is also used to include traditional crimes in which computers or networks are used to enable the illicit activity. A key finding of the Economic Crime Survey 2006 was that a typical perpetrator of economic crime in India was male (almost 100%), a graduate or undergraduate, and 31-50 years of age. Further, one third of the frauds were from insiders, and over 37% of them were in senior managerial positions.
Regarding how widespread the crime is, in India and the world over, experts feel that only a tiny proportion of cyber crime incidents are actually reported. In India, registered cyber crime cases are fewer compared to the US, Europe, etc. The Internet Crime Complaint Center (IC3) 2006 report ranks the US (60.9%) first among the nations hosting perpetrators, followed by the UK (15.9%). Many countries, including India, have established Computer Emergency Response Teams (CERTs) with the objective of coordinating and responding during major security incidents/events. These organizations identify and address existing and potential threats and vulnerabilities in the system and coordinate with stakeholders to address these threats.
1.2.4 Financial risks in corporate networks
The threats on the Internet can translate to substantial losses resulting from business disruption, loss of time and money, and damage to reputation. The financial impact of application downtime and lost productivity, caused by the increasing number of application-level vulnerabilities and the frequency of attacks, is substantial.
According to the census conducted in the US in 2007, the volume of business-to-business commerce increased from US $38 billion in 1997 to US $990 billion in 2006 (an increase of 6.3%). The total e-commerce revenue of US $20 billion in 1999 increased to US $990 billion in 2006. It was predicted that by 2008, online retail sales would account for 10 percent of total US retail sales. There may have been a boom in the usage of the Internet and online businesses, but one main issue is the security of the online environment, which affects both users and businesses alike.
According to a US report, cyber crime robs US businesses of about US $67.2 billion a year. Over the past two years, US consumers have lost US $8 billion to online fraud schemes. The online fraudsters are not only cheating online businesses, they are also increasing the perception of fear among consumers. The 2005 annual computer crime and security survey [5], jointly conducted by the Computer Security Institute and the FBI, indicated that the financial losses incurred by the respondent companies due to network attacks were US $130 million.
Another survey, commissioned by VanDyke Software in 2003, found that firewalls alone are not sufficient to provide adequate protection. Moreover, according to recent studies, an average of twenty to forty new vulnerabilities are discovered every month in commonly used networking and computer products. Such widespread vulnerabilities in software add to today's insecure computing/networking environment.
1.2.5 Need for Intrusion Detection Systems
Intrusions refer to network attacks against vulnerable services, data-driven attacks on applications, host-based attacks like privilege escalation, unauthorized logins and access to sensitive files, or malware like viruses, worms and trojan horses. These actions attempt to compromise the integrity, confidentiality or availability of a resource. Intrusions result in services being denied, systems failing to respond, or data being stolen or lost. Intrusion detection means detecting unauthorized use of, or attacks on, a system or network. Intrusion Detection Systems are implemented in software or hardware in order to detect these activities.
An Intrusion Detection System (IDS) typically operates behind the firewall, as shown in Figure 1.2, looking for patterns in network traffic that might indicate malicious activity. Thus, IDSs are used as the second and final level of defense in any protected network against attacks that breach other defences. The need for this second layer of protection is often questioned: "Do we need an IDS once we have a firewall?" To briefly answer this question, it is necessary to understand what a firewall does and does not do, and what an IDS does and does not do. This will help in realizing the need for both an IDS and a firewall in securing a network.
The existing network security solutions, including firewalls, were not designed to handle network and application layer attacks such as Denial of Service and Distributed Denial of Service attacks, worms, viruses, and Trojans. Along with the drastic growth of the Internet, the high prevalence of threats over the Internet has given security personnel reason to think of IDSs.
Figure 1.2: A typical security scenario in any network
Unauthorized activities on the Internet are carried out not only by external attackers but also by internal sources, such as fraudulent employees or people abusing their privileges for personal gain or revenge. These internal activities cannot be prevented by a firewall, which usually stops external traffic from entering the internal network. Firewalls are made to stop unnecessary network traffic into or out of a network. Packet filtering firewalls typically scan a packet for layer 3 and layer 4 protocol information. Most firewalls have few dynamic defensive abilities: traffic approaching the firewall either matches an applied rule and is allowed through, or it is stopped and the firewall logs the blocked traffic. As a result, IDSs, as originally introduced by Anderson [6] in 1980 and later formalized by Denning [7] in 1987, have received increasing attention in recent years. IDSs, along with firewalls, form the fundamental technologies for network security.
IDSs can be categorized into two classes: anomaly-based IDSs and misuse-based IDSs. (Appendix B provides a detailed survey of various IDSs.) Anomaly-based IDSs look for deviations from normal usage to identify abnormal behavior. Misuse-based IDSs, on the other hand, recognize patterns of attack. Anomaly detection techniques rely on models of the normal behavior of a computer system. These models may focus on the users, the applications, or the network. Behavior profiles are built by performing statistical analysis on historical data [8, 9], or by using rule-based approaches to specify behavior patterns [10, 11, 12]. A basic assumption of anomaly detection is that attacks differ from normal behavior in type and amount. By defining what is normal, any violation can be identified, whether it is part of the threat model or not. However, the advantage of detecting previously unknown attacks is paid for in terms of the high false-positive rates of anomaly detection systems. It is also difficult to train an anomaly detection system in highly dynamic environments. Anomaly detection systems are intrinsically complex, and there is also some difficulty in determining which specific event triggered the alarms.
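As a minimal sketch of the statistical-profile idea described above (the feature, sample values and threshold here are hypothetical, not taken from any of the systems cited), an anomaly detector can model a numeric feature of normal traffic and flag large deviations:

```python
import statistics

def build_profile(normal_samples):
    """Build a simple statistical profile (mean and standard deviation)
    of a numeric feature, e.g. bytes per connection, from attack-free data."""
    return statistics.mean(normal_samples), statistics.stdev(normal_samples)

def is_anomalous(value, profile, k=3.0):
    """Flag a value deviating from the profile by more than k standard
    deviations. A larger k lowers false positives but may miss subtle
    attacks, illustrating the trade-off discussed above."""
    mean, std = profile
    return abs(value - mean) > k * std

# Hypothetical example: bytes per connection in normal traffic.
profile = build_profile([480, 510, 495, 530, 505, 490, 520, 500])
print(is_anomalous(505, profile))    # typical value -> False
print(is_anomalous(5000, profile))   # large deviation -> True
```

Note that the profile is built only from (assumed) attack-free data; if the training window is polluted or the environment drifts, the detector's notion of "normal" degrades, which is the training difficulty mentioned above.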
On the other hand, misuse detection systems essentially contain attack descriptions or signatures and match them against the audit data stream, looking for evidence of known attacks [5, 13]. The main advantage of misuse detection systems is that they focus analysis on the audit data and typically produce few false positives. Their main disadvantage is that they can detect only known attacks for which they have a defined signature. As new attacks are discovered, developers must model them and add them to the signature database. In addition, signature-based IDSs are more vulnerable to attacks aimed at triggering a high volume of detection alerts by injecting traffic that has been specifically crafted to match the signatures used in the analysis process. This type of attack can be used to exhaust the resources of the IDS computing platform and to hide attacks within the large number of alerts produced.
In contrast to firewalls, a misuse-based IDS scans all packets at layers 3 and 4 as well as the application-level protocols, looking for backdoor Trojans, Denial of Service attacks, worms, buffer overflow attacks, scans against the network, etc. An IDS provides much greater visibility to detect signs of attacks and compromised hosts. There is still the need for a firewall to block traffic before it enters the network; but an IDS is also needed to make sure that the traffic that gets past the firewall is monitored.
1.2.6 Current status, challenges and limitations of IDS
Current cyber security capabilities have evolved largely as trivial patches and add-ons to the Internet, which was designed on the principles of open communication and implicit mutual trust. It is now recognized that it is no longer sufficient to follow such evolutionary paths, and that security must be considered a sophisticated research and design part of the information infrastructure. With all the progress that IDSs have made over the last few years, they still face some major challenges. The analysis is always slow and often computationally intensive. Hence, intrusion detection programs tend to detect intrusions only after they have occurred; there is little hope of catching an attack in progress.
Attackers continue to find ingenious ways to compromise remote hosts and frequently make their tools publicly available. Also, the increasing size and complexity of the Internet, along with those of the end-host operating systems, make it more prone to vulnerabilities. Additionally, there is only very little broad understanding of intrusion activity, due to many privacy issues. Because of these challenges, current best practices for Internet security rely on reports of new intrusions and security holes from organizations like CERT. Another well-known fact is that false positives are one of the biggest problems when working with IDSs. A large number of false alerts also seriously affects the acceptability of IDSs when the incidence of attacks is considerably low in comparison to the normal traffic.
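The effect of this traffic skewness on alert acceptability can be made concrete with Bayes' rule; the detector figures below are illustrative only, not measurements from any evaluated system:

```python
def bayesian_detection_rate(p_attack, tp_rate, fp_rate):
    """P(attack | alarm) via Bayes' rule: even a good detector produces
    mostly false alarms when attacks are rare in the monitored traffic."""
    p_alarm = tp_rate * p_attack + fp_rate * (1.0 - p_attack)
    return (tp_rate * p_attack) / p_alarm

# Hypothetical IDS: 90% detection rate, 1% false positive rate,
# with only 1 in 10,000 monitored events being an attack.
print(round(bayesian_detection_rate(1e-4, 0.90, 0.01), 4))  # 0.0089
```

Under these assumed numbers, fewer than 1% of raised alarms correspond to real attacks, which is exactly why a low false positive rate matters so much in skewed traffic.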
It is very difficult to integrate router logs, system logs, firewall logs, and host-based IDS alerts with alerts from a network-based IDS. The last main challenge is the need for skilled IDS analysts. In order to monitor and evaluate the alerts, the analyst is forced to stay on top of all the newest attacks, worms, viruses, different operating systems and network changes on the internal network, in order to keep the rule list accurate.
In the last two decades, a range of commercial and public-domain intrusion detection systems have been developed. These systems use various approaches to detect intrusions. As a result, they show distinct preferences in detecting certain classes of attacks with improved accuracy while performing moderately for the other classes. The analysis of these IDSs has given us some insight into the problems that still have to be solved before we can have intrusion detection systems that are useful and reliable for detecting a wide range of intrusions. This has created an opportunity for us to enhance the performance of IDSs by various advanced techniques.
1.2.7 Open issues
Although intrusion detection has evolved rapidly in the past few years, many important issues remain. First, detection systems must be more effective, detecting a wider range of attacks with fewer false positives. Second, intrusion detection must keep pace with modern networks' increased size, speed and dynamics. Intrusion detection must keep up with the input-event stream generated by high-speed networks and high-performance network nodes. Additionally, there is the need for analysis techniques that support the identification of attacks against whole networks. The issues connected with single IDSs are established in greater detail in Chapter 2. The challenge for increased system effectiveness is to develop a system that detects close to 100 percent of attacks with minimal false positives. We are still far from achieving this goal.
1.3 Motivation
The motivation behind the present work was the realization that, with increasing traffic and increasingly complex attacks, none of the present-day stand-alone intrusion detection systems can meet the high demand for a very high detection rate together with an extremely low false alarm rate. Also, most of the IDSs available in the literature show a distinct preference for detecting a certain class of attacks with improved accuracy while performing moderately for the other classes of attacks. The inability of single IDSs to achieve acceptable attack detection is discussed in Chapters 2 and 3. In view of the enormous computing power available with present-day processors, combining multiple IDSs to obtain best-of-breed solutions has been attempted earlier. The following works have also motivated us to choose this area of research.
Lee et al. comment in their work [14] that analyzing the data from multiple sensors should increase the accuracy of the IDS. Kumar [15] observes that correlation of information from different sources has allowed additional information to be inferred that may be difficult to obtain directly. Such correlation is also useful in assessing the severity of threats, whether it is an attacker making a concerted effort to break into a particular host, or the source of the activity being a worm with the potential to infect a large number of hosts in a short span of time.
Lane, in his work [16], comments that it is well known in the machine learning literature that an appropriate combination of a number of weak classifiers can yield a highly accurate global classifier. Likewise, Neri [17] notes the belief that combining classifiers learned by different learning methods, such as hill-climbing and genetic evolution, can produce higher classification performance because of the different knowledge captured by complementary search methods. The use of numerous data mining methods is commonly known as an ensemble approach, and the process of learning the correlation between these ensemble techniques is known by names such as multistrategy learning or meta-learning.
Chan and Stolfo [18] note that the use of meta-learning techniques can be easily parallelized for efficiency. Additional efficiencies can be gained by pruning less accurate classifiers [19]. Carbone [20] notes that these multistrategy learning techniques have been growing in popularity due to the varying performance of different data mining techniques. She describes multistrategy learning as a high-level controller choosing which outputs to accept from lower-level learners, given the data, which lower-level learners are employed, and what the current goals are.
The generalizations made concerning ensemble techniques are particularly apt in intrusion detection. As Axelsson [21] notes, "In reality there are many different types of intrusions, and different detectors are needed to detect them." As such, the same argument that Lee et al. make in their work [22, 23] for the use of multiple sensors applies to the use of multiple methods as well: if one method or technique fails to detect an attack, then another should detect it. They note in the work of Lee and Stolfo [24] that combining evidence from multiple base classifiers is likely to improve the effectiveness in detecting intrusions. They went on to find that by combining signature and anomaly detection models, one can improve the overall detection rate of the system without compromising the benefits of either detection method [14], and that a well designed/updated misuse detection module should be used to detect the majority of the attacks, while anomaly detection is the only hope of fighting innovative and stealthy attacks [43]. Mahoney and Chan [25] suggest that, because their IDSs use a technique that has significant non-overlap with other IDSs, combining their technique with others should increase detection coverage. In performing a manual post hoc analysis of the results of the 1998 DARPA Intrusion Detection Challenge, the challenge coordinators found that the best combination of 1998 evaluation systems provides more than two orders of magnitude of reduction in false alarm rate with greatly improved detection accuracy [27].
Despite the positive accolades and research results that sensor fusion approaches for intrusion detection have received, the only IDS that has been specifically designed for the use of multiple detection methods is EMERALD [30], a research IDS. The lack of published research into applying multiple and heterogeneous IDSs seems to be a significant oversight in the intrusion detection community. We believe that fusion-based IDSs are the only foreseeable way of achieving high accuracy with an acceptably low false positive rate. In spite of all the earlier works on enhancing the performance of IDSs, the overall performance of the IDS leaves room for improvement. Multi-sensor fusion meets the requirement of a better-than-the-best detection by a refinement of the combined response of different IDSs with largely varying accuracy. The motivation for applying sensor fusion to enhance the performance of intrusion detection systems is that a better analysis of the existing data gathered by various individual IDSs can detect many attacks that currently go undetected.
Throughout this thesis we use the term sensor to denote a component that monitors the network traffic or the audit logs for indications of suspicious activity in a network or on a system, according to a detection algorithm, and produces alerts as a result. A simple IDS in most cases constitutes a single sensor. We therefore use the terms sensor and IDS interchangeably in this thesis.
1.4 Problem statement
The review of the state-of-the-art IDSs has posed the following problems to be considered for further research.
Problem No.1: What are the issues connected with single IDSs? Are there any IDSs available at present that have complete detection coverage? Is it possible to improve the detection performance of the available IDSs without affecting the false alarm rate?
Problem No.2: How can the attack-detector relationship be modeled so that the detector performance is better understood?
Problem No.3: What are the different metrics available for the effective evaluation of IDSs?
Problem No.4: Is the technique of sensor fusion acceptable for the performance improvement of IDSs? What are the sensor fusion algorithms, and which algorithm best suits the intrusion detection application? As such, the primary thesis question is "Why and how does sensor fusion succeed?"
Problem No.5: How should threshold bounds be selected for effective sensor fusion?
Problem No.6: How can an architecture better than the threshold-based or the rule-based ones be proposed, taking into consideration the large data set and also the dynamic nature of the network environment? How can the problems associated with the skewness in the network traffic be effectively handled?
Problem No.7: What are the limitations of the Dempster-Shafer evidence theory when used for sensor fusion? How can it be modified to suit a generic fusion environment?
Problem No.8: How can IDSs and sensor fusion be modeled?
This thesis tries to find solutions to the problems raised above; the proposed solutions appear in Chapters 2 to 8, with the related work in each problem area introduced in the respective chapters. In essence, the problem can be stated as: given a set of heterogeneous IDSs monitoring a network, how should their individual decisions be assimilated in order to enhance the detection performance of the combination?
A more complete statement of this investigation:
- Can we effectively detect network intrusions by applying advances in sensor fusion techniques to intrusion detection?
- Is it possible to combine a number of currently known intrusion detection systems with a simple fusion technique based on the data dependency and performance of the individual IDSs?
- Can a single computational model be used to represent and monitor exploitations in all the attack categories using advanced sensor fusion techniques?
The strategy for answering these questions is to engage in a study of the literature concerned with similar studies, and then to proceed with a theoretical and empirical analysis.
1.5 Major contributions of this thesis
This thesis contributes various sensor fusion algorithms for the effective enhancement of intrusion detection performance. The thesis also incorporates a theoretical basis for the improvement in performance of IDSs using sensor fusion techniques.
1.5.1 Theoretical formulation
1. Issues associated with data skewness in attack detection have been identified.
2. The attack-detector relationship has been modeled.
3. The improvement in performance of the fusion IDS in comparison with any of the constituent IDSs has been proved for both dependent as well as independent IDSs.
4. The Chebyshev inequality principle is used to derive appropriate threshold bounds for the fusion unit.
5. The Dempster-Shafer evidence theory is modified for the intrusion detection application.
1.5.2 Experimental validation
1. The attack-detector model has been validated.
2. The DARPA dataset for IDS evaluation has been validated against some of the existing IDSs.
3. IDS fusion using threshold bounds has been validated.
4. Rule-based fusion as well as data-dependent decision fusion have been validated for their improved performance.
5. The improved detection rate for the rarer attacks using the data-dependent decision fusion method has been validated.
6. The modified evidence theory for the fusion unit, aiding enhanced detection, has been validated.
1.6 Research goal
In order to figure out how sensor fusion can be applied to the performance enhancement of IDSs, we define sensor fusion as the process of collecting information from multiple and possibly heterogeneous sources and combining it in order to provide a more descriptive, intuitive and meaningful result. Given the additional computational requirements of sensor fusion for network intrusion detection and the unique benefits to be gained, we believe that a unique processing module using the proposed data-dependent decision fusion architecture with the modified evidence theory, which offers better-than-the-best protection, will become a standard part of future defense-in-depth security architectures.
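Under this definition, the simplest decision-level fusion is a weighted vote over the binary alerts of the individual IDSs. The sketch below is only a generic illustration of that baseline (the weights and threshold are hypothetical), not the data-dependent architecture developed later in the thesis:

```python
def fuse_decisions(alerts, weights, threshold=0.5):
    """Combine binary alerts (1 = attack, 0 = normal) from n IDSs into a
    single decision by comparing the normalized weighted vote to a
    threshold. The weights encode how much each sensor is trusted."""
    score = sum(w * a for w, a in zip(weights, alerts)) / sum(weights)
    return 1 if score >= threshold else 0

# Three hypothetical IDSs; the second is trusted twice as much.
print(fuse_decisions([1, 1, 0], weights=[1.0, 2.0, 1.0]))  # -> 1
print(fuse_decisions([1, 0, 0], weights=[1.0, 2.0, 1.0]))  # -> 0
```

The choice of the fusion threshold is itself critical, which is precisely why Chapter 5 devotes attention to deriving principled bounds for it.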
1.7 Organization of the thesis
A brief synopsis of each of the chapters of this thesis is included below:
Chapter 1 presents the motivation, goal, and contributions of this thesis work in detail. It also includes a discussion on the growth of the Internet, the growth of attacks, the importance of protecting the corporate network, and the need for network security, in particular the need for intrusion detection systems. Chapter 2 discusses in detail the data skewness in the monitored traffic as well as other issues in intrusion detection. This chapter also models the attack-detection scenario for IDSs. The model includes attack carrying capacity, performance improvement of detectors, and detector correlation, assuming non-random interaction between attacks and detectors. Chapter 3 discusses the evaluation and testbed of IDSs. This chapter includes a detailed discussion of the DARPA data set and its usefulness in IDS evaluation. Modifications of the available IDSs used in this work have been attempted. This chapter also brings out the inability of single IDSs to provide complete detection coverage of the attack domain. Chapter 4 provides a survey of sensor fusion, after the issues and limitations of single IDSs are identified in Chapters 2 and 3 respectively. The related work in sensor fusion, and in particular the related work using sensor fusion in the intrusion detection application, is discussed. The mathematical basis as well as the theoretical analysis of sensor fusion in IDSs are also incorporated in this chapter. Chapter 5 covers the selection of intrusion detection system threshold bounds, an important parameter in sensor fusion. The bounds are deduced by means of the Chebyshev inequality, using the detection rate and the false positive rate, for effective sensor fusion. The theoretical proof is supplemented with empirical evaluation, and the results are compared.
Chapter 6 discusses the performance enhancement of intrusion detection using rule-based fusion. A new data-dependent decision fusion architecture is also proposed in this chapter. The experimental evaluation given in this chapter uses the data-dependent decision fusion architecture, specifically accounting for the data skewness in the monitoring traffic. Chapter 7 presents a new modified evidence theory, which is an extension and improvement of the classical Dempster-Shafer theory. The context-dependent operator proposed in this chapter is demonstrated to be feasible for sensor fusion. Chapter 8 provides theoretical models for the intrusion detection systems, the neural network learner, and the sensor fusion system. The chapter also includes a discussion on the effect of the threshold on detection and on threshold optimization. Chapter 9 provides the results and discussion of the main findings of this investigation. The
conclusions detailing the overall implications of the methodologies introduced in this thesis are drawn, and recommendations for future work are also presented in this chapter. A comprehensive study of the attacks on the Internet is presented in Appendix A. Appendix B provides a detailed survey of various IDSs. Appendix C introduces dynamic models for the attack-detector interactions with the simple Nicholson-Bailey precursor. The metrics used for IDS evaluation in chapters 5-8 are discussed in Appendix D.
Chapter 2
Issues Connected with Single IDSs and the
Attack-Detection Scenario
Problems worthy of attack prove their worth by fighting back.
Paul Erdos
2.1 Introduction
The probability of intrusion detection in a corporate environment protected by an IDS is low because of various issues. Network IDSs have to operate on encrypted traffic, where analysis of the packets is complicated. The high false alarm rate is generally cited as the main drawback of IDSs. For IDSs that use a machine learning technique for attack detection, the entire scope of the behavior of an information system may not be covered during the learning phase. Additionally, the behavior can change over time, introducing the need for periodic online retraining of the behavior profile. The information system can undergo attacks at the same time the intrusion detection system is learning the behavior. As a result, the behavior profile contains intrusive behavior, which is then not detected as anomalous. In the case of signature-based IDSs, one of the biggest problems is maintaining state information for signatures in which the intrusive activity encompasses multiple discrete events (i.e., the complete attack signature occurs in multiple packets on the network). Another drawback is that a misuse detection system must have a signature defined for every possible attack that an attacker may launch against the network. This leads to the necessity for frequent signature updates to keep the signature database of the
misuse detection system up-to-date.
Many of the IDS technologies are complementary to each other, since for different kinds of environments some approaches perform better than others. The processes followed by IDS operations for detecting intrusions are mainly (1) monitoring and analyzing the network activities, (2) finding vulnerable parts in a network, and (3) integrity testing of sensitive and important data. If an IDS is to monitor all these activities, the complexity of the IDS becomes unacceptably large. If we look at present-day information system security, a network intrusion detection system would be considered the best choice to protect the machines from Denial of Service (DoS) attacks. At the same time, a host intrusion detection system would be the right choice to protect the systems from internal users. In order to protect against trojans on systems, a file integrity checker might be more appropriate. To protect the servers from attackers, an intrusion prevention system could be the best bet. This shows that the sensors available in the literature show a distinct preference for detecting certain attacks with improved accuracy, and that none of them shows a good detection rate for all types of attacks or complete intrusion detection coverage. Since an information system has to be protected from all types of attacks, it is most likely that a combination of all these methods or sensors will actually be needed. This argument is substantiated in this chapter by examining the limitations of single IDSs and also the rather slow growth in the performance of the IDSs reported in the literature over the years.
In this chapter, we examine the situation of data skewness in real-world data. We also consider the realistic problems that data skewness causes in the detection of rare attacks. Even highly accurate intrusion detection systems lack acceptability and usability as attack detectors if the incidence of attack is rare in the general traffic. A large number of false alarms severely affects the acceptability of IDSs. The underlying principle here is called the base-rate fallacy. Yet another problem encountered in intrusion detection is that most IDSs generate a trivial model by almost always predicting the majority class, resulting in a higher overall accuracy. This results in a higher error rate for the minority class than for the majority class. Within the attack class, the minority attack types cause
more damage than the majority attack types. Thus high accuracy is not necessarily an indicator of high model quality, and therein lies the accuracy paradox of predictive analytics. This chapter gives supporting evidence for the need to give a higher weighting to the minority attack types, namely Remote to Local (R2L) and User to Root (U2R), compared to the majority attack types like probe and Denial of Service (DoS) [27]. Also, the cost of missing an attack is higher than the cost of a false positive.
The class distribution affects the learning of the IDSs. The problem of designing IDSs to work effectively and yield higher accuracies for minority attacks even in the presence of data skewness has been receiving serious attention in recent times. In most of the available literature [31, 32, 33], this is overcome by resampling the training distribution. The resampling is done either by oversampling the minority class or by undersampling the majority class. The other commonly used approaches for overcoming data imbalance are cost-sensitive learning [34, 35], the two-phase rule induction method [36], and rule-based classification algorithms like RIPPER [37] and C4.5 rules [38]. In spite of all such attempts, the performance of the IDSs in detecting minority and rarer attacks leaves room for improvement. It is necessary for a detection system to perform much better than those reported so far for the minority attacks while preserving the performance for the majority attacks.
This chapter also models the attack-detection scenario for intrusion detection systems. The model includes the attack carrying capacity, the detector's performance improvement from the attacks it has detected, and the detector correlation, assuming non-random interaction between the attack and the detector. This modeling shows that as the intrusion detection performance improves with time, the slope of the F-score is positive and becomes steeper, which causes the effect of attacks to disappear. However, a study of the IDSs developed over the last ten years shows that such a growth rate cannot be achieved with a single system, and hence the effect of attacks continues to be felt in information systems. This establishes the need for enhancing the performance of IDSs.
This chapter is organized as follows. In section 2.2, the attacker's influence on the detection environment is examined. In section 2.3, the data skewness problem is exemplified, with reference to the DARPA data set as well as observations of typical university traffic. In section 2.4, the attack-detection scenario on a secured network is discussed, taking into account the various possibilities of interaction between the two populations. Section 2.5 provides the modeling of the non-random attack-detection relationship. The summary of the chapter is given in section 2.6.
2.2 Attacker's influence on the detection environment
The base rate must be considered along with the false alarm rate and the detection rate in the analysis of IDS evaluation. These quantities can be controlled by the intruder to a certain extent. The base rate can be modified by controlling the frequency of attacks. Additionally, slow and spread-out attacks are difficult for many of the existing IDSs to identify. The perceived false alarm rate can be increased if the intruder finds a flaw in any of the signatures of an IDS that allows the intruder to send maliciously crafted packets that trigger alarms at the IDS but look benign to the IDS operator. This overloads the system administrator or the security analyst, so that the true attacks may get missed in between. Finally, the detection rate can be modified by the intruder through the creation of new attacks whose signatures do not match those known to the IDS, or of novel attacks uncorrelated with those found in the training data set, or simply by evading the detection scheme, for example through a mimicry attack. Now, considering the base rate alone, if the base rate is intentionally reduced by an adversary, then the precision of detection by an IDS falls and the true detections get embedded in a large number of false positives. Hence, we can say that skewness is present naturally in the network traffic, and it is made even worse by an adversary in order to succeed in his/her plans.
2.3 Data skewness in network traffic
The goal of an IDS is to collect information from a variety of systems and
network sources, and then analyze the information for signs of intrusion and
misuse. The network-based IDS captures the network traffic and analyzes packets or connections for attack traffic. The host-based IDS monitors the activities of the host on which it is installed.
The network traffic is made up of attack (anomalous) traffic and normal traffic. Real-world traffic is predominantly made up of normal traffic rather than attack traffic. The fact that normal traffic abounds is supported by analyzing a university network's traffic using the Snort packet logger. Also, a study of the characteristics of the network traffic during phases dominated by DDoS attacks and worm propagation has been done. One statistic on the traffic volume generated by a DDoS attack is given in the work of Lan et al. [39]. One typical situation had 28 attackers and generated 11 million packets amounting to 8.6 Gb of attack traffic in 192 seconds, directed at a single host. The volume of the attack traffic was about three times that of the normal traffic during this interval; just after the 192 seconds, the attack activity came down to normal. With the exception of such very short and very rare intervals, the usual rate of attack traffic is extremely low. In addition, Bay [40] mentions anomalous activity to be extremely rare and unusual. Fawcett [41] has commented that positive activity is inherently rare.
2.3.1 Classification of attacks
A general classification of the various attacks found in network traffic is introduced in Appendix A. The thesis work of Kendall [27] provides a detailed attack taxonomy with respect to the DARPA Intrusion Detection Evaluation data set [42]. The same is discussed here in brief. The various attacks found in the DARPA 1999 data set are given in Table 2.1. The probe or scan attacks automatically scan a network of computers or a DNS server to find valid IP addresses (ipsweep, lsdomain, mscan), active ports (portsweep, mscan), host operating system types (queso, mscan) and known vulnerabilities (satan). The DoS attacks are designed to disrupt a host or network service. These include crashing the Solaris operating system (selfping), actively terminating all TCP connections to a specific host (tcpreset), corrupting ARP cache entries for a victim not in other hosts' caches (arppoison), crashing the Microsoft Windows NT web server (crashiis) and crashing Windows NT (dosnuke). In R2L attacks, an attacker
Table 2.1: Details of attack types present in DARPA 1999 data set [27]
Attack type Solaris SunOS WinNT Linux All
Probe portsweep portsweep ntinfoscan lsdomain, mscan illegal-sniffer
queso queso portsweep queso, portsweep ipsweep
satan portsweep
DoS neptune pod arppoison arppoison apache2, back
processtable land, pod crashiis arppoison
smurf neptune dosnuke mailbomb
syslogd mailbomb smurf neptune, pod
tcpreset processtable tcpreset processtable
warezclient smurf, tcpreset
teardrop, udpstorm
R2L dict, ftpwrite dict dict dict, imap, named snmpget
guest xsnoop framespoof ncftp, phf
httptunnel netbus sendmail
xlock netcat sshtrojan
xsnoop ppmacro xlock, xsnoop
U2R eject, ps loadmodule casesen perl, xterm
fdformat ntfsdos sqlattack
ffbconfig nukepw
sechole, yaga
Data secret ntfsdos secret sqlattack
ppmacro
who does not have an account on a victim machine gains local access to the machine (e.g., guest, dict), exfiltrates files from the machine (e.g., ppmacro) or modifies data in transit to the machine (e.g., framespoof). New R2L attacks include an NT PowerPoint macro attack (ppmacro), a man-in-the-middle web browser attack (framespoof), an NT trojan-installed remote-administration tool (netbus), a Linux trojan SSH server (sshtrojan) and a version of a Linux FTP file access utility with a bug that allows remote commands to run on a local machine (ncftp).
In U2R attacks, a local user on a machine is able to obtain privileges normally reserved for the UNIX super-user or the Windows NT administrator. The Data attack exfiltrates special files which, according to the security policy, should remain on the victim hosts. These include secret attacks, where a user with permission to access the special files exfiltrates them via common applications such as mail or FTP, and other attacks where the privilege to access the special files is obtained using a U2R attack (ntfsdos, sqlattack). An attack could be labeled as both U2R and Data if one of the U2R attacks was used to obtain access to the special files. The Data category thus specifies the goal of an attack rather than the attack mechanism [27].
Attack behavior [43]
An analysis of the various attack types within the network traffic has resulted in certain inferences, which are listed below.
Probing attacks are expected to show limited variance, as they involve making connections to a large number of hosts or ports in a given time frame. Likewise, the outcome of all U2R attacks is that a root shell is obtained without legitimate means, e.g., login as root, su to root, etc. Thus, for these two categories of attacks, given some representative instances in the training data, any learning algorithm was able to learn the general behavior of these attacks. As a result, the IDSs detect a high percentage of old and new Probing and U2R attacks. On the other hand, DoS and R2L attacks show a wide variety of behavior because they exploit the weaknesses of a large number of different network or system services. The features constructed from the available attack instances are very specialized to the known attack types. Hence most of the trivial IDS models missed a large number of new DoS and R2L attacks. It is understood that misuse detection models fail in the case of novel attacks. Even the anomaly detection models do not work well when there is large variance in user behavior, since the algorithm tries to model the normal behavior while the attack behavior shows a large variance, many times overlapping with the normal behavior. Hence it turns out to be difficult to guard against new and diversified attacks. The situation is similar for network traffic: while there are relatively few intrusion-only patterns, normal network traffic can show a large number of variations.
It is observed from most of the previous studies that there was no attempt to use the correlation information in the input network traffic to improve the detection effectiveness. The network traffic can be characterized in terms of sequences of discrete data with temporal dependency [44, 45, 46]. It is observed in [47] that different network intrusions have different correlation statistics, which can be directly utilized in the covariance feature space to distinguish
multiple and various network intrusions effectively. By constructing a covariance feature space, a detection approach can thus utilize the correlation differences of sequential samples to identify multiple network attacks. It is also pointed out that covariance-based detection will succeed in distinguishing multiple classes with near or equal means, while any traditional mean-based classification approach will fail. Even the best intrusion detection systems in the DARPA evaluation [48] detected less than 10% of new R2L intrusion attempts. Hence, the number of new attacks detected is the more significant factor in determining the quality of IDSs.
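The covariance idea can be illustrated with a small NumPy sketch (the two-feature traffic model, window length, and Frobenius-distance classifier below are illustrative assumptions, not the exact method of [47]): two classes with equal means but different feature correlations are indistinguishable to a mean-based rule, yet separate cleanly in the covariance feature space.

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.zeros(2)                                 # both classes share this mean
cov_normal = np.array([[1.0, 0.8], [0.8, 1.0]])    # positively correlated features
cov_attack = np.array([[1.0, -0.8], [-0.8, 1.0]])  # negatively correlated features

def window_covs(cov, n_win=200, win_len=50):
    """Sample covariance of each window of win_len traffic records."""
    return [np.cov(rng.multivariate_normal(mean, cov, win_len), rowvar=False)
            for _ in range(n_win)]

# Reference covariance per class, estimated from training windows.
ref_normal = np.mean(window_covs(cov_normal), axis=0)
ref_attack = np.mean(window_covs(cov_attack), axis=0)

def classify(c):
    """Assign the class whose reference covariance is nearest (Frobenius norm)."""
    return "attack" if (np.linalg.norm(c - ref_attack)
                        < np.linalg.norm(c - ref_normal)) else "normal"

test_windows = window_covs(cov_attack, n_win=100)
acc = sum(classify(c) == "attack" for c in test_windows) / len(test_windows)
print(f"accuracy on equal-mean classes: {acc:.2f}")
```

Since the class means coincide, any mean-based detector is reduced to guessing here, while the covariance features recover nearly all attack windows.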
2.3.2 Identification of real-world network traffic problems
Attack traffic in real-world traffic is mostly rare. In addition, the distribution of attack types within the attack class itself is skewed, with probe and DoS attacks abounding whereas R2L and U2R attacks are rare. This data skewness poses some very serious issues for the performance of IDSs, mainly in two ways:
1. Applying conditional probability using Bayes' theorem, the detection of an attack can be shown to be difficult unless both the percentage of attacks in the entire traffic and the accuracy rate of their identification are far higher than they are at present. The Bayesian rate of attack detection Pr(I|A) [49] is given by:

Pr(I|A) = Pr(A|I) Pr(I) / [Pr(A|I) Pr(I) + Pr(A|NI) Pr(NI)]    (2.1)
where I denotes intrusion, NI denotes no intrusion, and A denotes an alert. The false alarm rate is the limiting factor for the performance of most IDSs. This is due to the base-rate fallacy phenomenon, which says that in order to achieve a substantial value for the Bayesian detection rate, it is necessary to achieve an unattainably low false positive rate [49]. The data set commonly used to apply this reasoning in the evaluation of IDSs is the DARPA 1999 evaluation data set, where the ratio of the number of attacks to the number of normal traffic records is roughly of the order of 1:26,000. The DARPA data supposedly models a realistic situation, having been synthesized based on the traffic observed on a large US Air
Figure 2.1: Probability of attack vs Bayesian attack detection rate for fixed values of FP and TP (TP = 0.99, FP = 0.01). [Plot: Pr(INTRUSION | ALERT) rises from near 0 to 1 as the probability of attack grows from 10^-5 to 10^0 on a log scale.]
Force base. With an IDS which is 99% accurate and has a false positive rate of 0.01, the Bayesian rate of attack detection Pr(I|A) is obtained as 0.00379. Hence the false positives rise to roughly 262 for each real attack detected. This clearly shows the inability of the IDS to carry out its proposed task of attack detection, where the actual attacks get embedded in the large volume of false positives. Even though the detection is 99% certain, the chance of an alert corresponding to an attack is only 1/263, due to the fact that the normal traffic is much larger than the attack traffic. Thus it is difficult to interpret what a small false alarm rate is when the base rate is also small. From equation 2.1, the Bayesian attack detection rate can be approximated as:

Pr(I|A) ≈ Pr(I) / Pr(A|NI)    (2.2)

Equation 2.2 clearly shows that the probability of an alert being an intrusion will be almost 1 only if the false alarm rate is of the same order as the prior probability. This adverse effect of data skewness is also illustrated in Figure 2.1, which shows that the naturally occurring class distribution often does not produce the best performing IDS. The optimal distribution generally contains more than 50% minority class examples.
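The 0.00379 figure quoted above follows directly from equation 2.1; a minimal sketch (the 1:26,000 ratio is the DARPA base rate mentioned in the text):

```python
# Bayesian detection rate Pr(I|A) for a 99%-accurate IDS with 1% false positives.
p_i = 1 / 26001           # Pr(I): one attack record per 26,000 normal records
tp, fp = 0.99, 0.01       # Pr(A|I) and Pr(A|NI)

p_ia = (tp * p_i) / (tp * p_i + fp * (1 - p_i))   # equation 2.1
print(f"Pr(I|A) = {p_ia:.5f}")   # → Pr(I|A) = 0.00379
```

Only about one alert in 263 corresponds to a real attack, even though the detector itself is 99% accurate.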
Table 2.2: Performance dependence of Snort on base-rate
Attacks Normal FP TP Precision Recall F-score
190 5000000 1000 115 0.1 0.61 0.17
190 2000000 500 115 0.19 0.61 0.29
190 1000000 260 115 0.31 0.61 0.41
190 100000 53 115 0.68 0.61 0.64
190 10000 5 115 0.96 0.61 0.75
190 1000 1 115 0.99 0.61 0.76
190 190 0 115 1 0.61 0.76
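The precision, recall, and F-score entries in Table 2.2 follow from the standard definitions; a minimal sketch checking the row with 1,000,000 normal records:

```python
def f_score(tp, fp, attacks):
    """Precision, recall, and balanced F-score for one row of Table 2.2."""
    precision = tp / (tp + fp)
    recall = tp / attacks
    return precision, recall, 2 * precision * recall / (precision + recall)

p, r, f = f_score(tp=115, fp=260, attacks=190)
print(f"precision={p:.2f} recall={r:.2f} F-score={f:.2f}")
# → precision=0.31 recall=0.61 F-score=0.41
```

The recall stays fixed at 0.61 across all rows because Snort's detections do not change; only the false positives, and hence the precision and F-score, shrink as the base rate rises.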
Figure 2.2: Probability of attack vs F-score for Snort. [Plot: the F-score rises from about 0.1 to about 0.76 as the base rate grows from 10^-4 to 10^0 on a log scale.]
Yet another attempt is made to demonstrate that data skewness, which is normally found in real traffic and hence reproduced in simulated traffic like the DARPA data set, is a reality and a problem for the performance of any IDS. The commonly used signature-based IDS, Snort, is used to show the improvement in the performance score (F-score) with the proportion of attack to normal traffic. Table 2.2 and Figure 2.2 show the adverse effect of the base rate on the performance of an IDS. Beyond the stage where the base rate equals the false positive rate, the IDS performs in an optimum way.
2. The standard base rate implies that one could label all the traffic as normal and still obtain an accuracy of 99.99%. The performance of an IDS is usually evaluated using the accuracy measure, but this measure fails in the case of imbalanced data and also when the costs of different errors vary markedly. Most IDSs generate a trivial model by almost always predicting the majority classes, since predicting the minority classes has a much higher error rate, which in turn degrades the performance of the IDS. This is the accuracy paradox, which says that in predictive analytics high accuracy is not necessarily an indicator of high model quality. This is explained with the DARPA test data set of 5 million test records containing 190 attacks. Consider an IDS which detects 100 of the attacks at a false positive rate of 0.01%. The accuracy of this detector is 99.994%. However, the accuracy paradox lies in the fact that the accuracy can easily be made 99.996% by always predicting normal. The second model, even though it has a higher accuracy, is useless since it does not detect attacks. Hence, most IDSs do not detect the minority class types sufficiently well, since they aim to minimize the overall error rate rather than paying attention to the minority class, which is obviously not the desired detection result. Thus the present-day stand-alone IDSs are not effective in detecting attacks, especially the rare attack types.
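The paradox can be verified numerically with the trivial model that labels every one of the 5 million records as normal (a minimal sketch of the argument above):

```python
total, attacks = 5_000_000, 190

# Trivial model: predict "normal" for every record.
# Its only errors are the 190 missed attacks, so accuracy is very high ...
trivial_acc = (total - attacks) / total
# ... yet no attack is ever detected, so recall (and the F-score) is zero.
trivial_recall = 0 / attacks

print(f"trivial accuracy = {trivial_acc:.3%}, recall = {trivial_recall:.2f}")
# → trivial accuracy = 99.996%, recall = 0.00
```

An IDS that actually finds 100 of the 190 attacks scores slightly lower on accuracy once its false positives are counted, yet it is clearly the more useful model; that gap is the accuracy paradox.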
Probes aim at discovering whether a particular IP address exists and, if so, the details of the services and the operating system running on it. Probing may be normal activity or may be the pre-phase of an attack. In the latter case, it may not be possible to confirm the intent of the prober and thus recognize an attack, but proper preventive measures, like the installation of patches, deployment of a firewall, addition of firewall rules, removal of unused services, etc., can be taken so that the vulnerability does not get exploited. DoS mainly disrupts the services on a network or on a host. Hence DoS causes service denial, while probes are for reconnaissance/surveillance.
The R2L attack, on the other hand, gains an account on a remote machine, exfiltrates files, modifies data, installs trojans for back-door entry, etc. The U2R attack uses a buffer overflow to acquire a root shell and gain full control of the system. Data attacks are the special case where an attacker gets the privilege to access special files. In a real-world environment, these minority attacks are more dangerous than the majority attacks. Hence, it is essential to improve the detection performance for the minority attacks, while maintaining a reasonable overall detection rate.
2.3.3 Non-uniform misclassification cost
It is important to understand that the cost of misclassifying an attack as normal (type I error or FN) is often more than the cost of misclassifying normal traffic as an attack (type II error or FP). The issues involved in the measurement of the cost factors have been studied by the computer risk analysis and security assessment communities. Denning, in her book [50], remarks on cost analysis and risk assessment in general that it is not an exact science, because precise measurement of the relevant factors is often impossible. Damage cost (DCost) characterizes the amount of damage to a target resource by an attack when intrusion detection is unavailable or ineffective. Response cost (RCost) is the cost of acting upon an alarm or log entry that indicates a potential intrusion [51]. Lee et al. [24] have come up with an attack taxonomy, illustrated in Table 2.3, which categorizes the intrusions that occur in the DARPA Intrusion Detection Evaluation data set.
When looking at the skewness within the minority class, it is observed that there is again a still higher misclassification cost for the minority attack types. This has also been highlighted in the cost matrix of Table 2.4, published for the KDD IDS evaluation [52]. Hence, it is important to have IDSs that minimize the overall misclassification cost by performing better on the minority classes and, within them, on the minority attack types. Thus, in line with the KDD evaluations of IDSs, the goal was not only the improvement in accuracy but also the reduction in misclassification cost. The misclassification cost penalty was the highest for one of the most infrequent attack types, and that too for the type I error [52]. The total misclassification cost can be reduced if the type I errors and the type II errors can be reduced. The advantage with most rare events is that their signatures are unique and can be learned from the given data. In some of the commonly encountered attacks, like casesen and sechole, the attacker uploads malicious code onto the target machine and then logs in to crash the machine to get root access. Also, there are attacks that
Table 2.3: Damage cost and response cost of different attack types [24]

Attack type (by results)  Sub-category (by techniques)  Description                                           Cost
U2R                       local                         legitimate user trying to acquire higher privileges   DCost=100, RCost=40
U2R                       remote                        acquiring root privileges from a remote machine       DCost=100, RCost=60
R2L                       single                        an illegal user access obtained with a single event   DCost=50, RCost=20
R2L                       multiple                      an illegal user access obtained with multiple events  DCost=50, RCost=40
DoS                       crashing                      crashing a system by certain framed packets           DCost=30, RCost=10
DoS                       consumption                   exhausting bandwidth or system resources              DCost=30, RCost=15
Probe                     simple                        fast scan                                             DCost=02, RCost=05
Probe                     stealth                       slow scan                                             DCost=02, RCost=07
Table 2.4: Cost matrix [52]

Actual \ Predicted  Normal  Probe  DoS  U2R  R2L
Normal                 0      1     2    2    2
Probe                  1      0     2    2    2
DoS                    2      1     0    2    2
U2R                    3      2     2    0    2
R2L                    4      2     2    2    0
upload/download illegal copies of software through FTP, like warezmaster and warezclient. Port 80 attacks are malformed HTTP requests, very different from normal requests. Usually, connection-based detection gives better results than packet-based models for ports 21 and 80.
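Given the cost matrix of Table 2.4, a cost-sensitive score can replace raw accuracy by weighting each cell of a confusion matrix by its misclassification cost (a sketch in the spirit of the KDD evaluation; the confusion counts below are invented purely for illustration):

```python
classes = ["Normal", "Probe", "DoS", "U2R", "R2L"]

# Cost matrix of Table 2.4: cost[actual][predicted].
cost = [
    [0, 1, 2, 2, 2],   # actual Normal
    [1, 0, 2, 2, 2],   # actual Probe
    [2, 1, 0, 2, 2],   # actual DoS
    [3, 2, 2, 0, 2],   # actual U2R
    [4, 2, 2, 2, 0],   # actual R2L
]

# Hypothetical confusion matrix: confusion[actual][predicted] counts.
confusion = [
    [9900, 50, 40, 5, 5],
    [10, 80, 5, 3, 2],
    [20, 5, 70, 3, 2],
    [8, 1, 1, 5, 0],
    [12, 2, 1, 0, 10],
]

records = sum(map(sum, confusion))
avg_cost = sum(confusion[a][p] * cost[a][p]
               for a in range(5) for p in range(5)) / records
print(f"average misclassification cost per record: {avg_cost:.4f}")  # → 0.0310
```

Under this score, missing a U2R or R2L attack (cost 3 or 4) is penalized far more than raising a false alarm (cost 1 or 2), which is exactly the weighting the text argues for.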
2.3.4 Inability of IDS in optimum decision making due to data skewness
An IDS is said to make an optimum decision only when the expected cost of that particular prediction is the lower one. For an IDS to make the decision "attack", it is expected that:

Pr(attack) C11 + Pr(normal) C10 ≤ Pr(attack) C01 + Pr(normal) C00    (2.3)

where:
C11 = cost of predicting an attack as an attack,
C10 = cost of predicting normal as an attack,
C01 = cost of predicting an attack as normal,
C00 = cost of predicting normal as normal.
Assuming the costs of TP and TN to be zero, since in both these cases the correct decision has been made, equation 2.3 can be written as:

Pr(normal) C10 ≤ Pr(attack) C01    (2.4)
If the probability of attack is given by p, then (1 − p) C10 ≤ p C01. The optimum threshold is decided by the data, with the minimum a priori probability given by:

p_opt = C10 / (C01 + C10)    (2.5)

For the DARPA 1999 data set, the cost matrix in Table 2.3 shows that C01 > C10, and on substituting the values in equation 2.5, p_opt is obtained as 0.41. Hence, an optimal decision from any IDS cannot be expected with data skewness where the prior probability of attack is less than 0.41. This proves that data skewness results in inefficient decision making.
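Equation 2.5 is straightforward to evaluate for any cost pair (a minimal sketch; the example costs are illustrative, not the exact values behind the 0.41 figure quoted above):

```python
def optimal_prior(c10, c01):
    """Minimum prior attack probability, p_opt = C10 / (C01 + C10) (equation 2.5)."""
    return c10 / (c01 + c10)

# Example: missing an attack (C01) costs three times a false alarm (C10).
print(optimal_prior(c10=1, c01=3))   # → 0.25
```

For priors below p_opt, inequality 2.4 fails and always deciding "normal" has the lower expected cost, which is why heavy data skewness prevents optimal decisions.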
2.4 Attack-Detection Scenario in a Secured Environment
This section introduces a new formal technique for modeling the attacks that occur in network traffic and the countermeasures for their detection. With the modeling of the attack-detector relationship, the significance of the distributed dynamics for the persistence of attack-detector interactions is discussed. The modeling is based on deduction rules that model the capabilities of the attacker and the detector. The attacker makes use of vulnerabilities in the applications, in the systems, or on the network and generates attacks that exploit those vulnerabilities. The security experts react by attempting to detect the attacks and thereby protect the network.
This section introduces the non-random attack-detection interactions that are normally expected in a secured network environment. The attack-detector modeling helps to enrich the understanding and to further the design and research of IDSs. Also, the level of severity of an alert can be understood with this modeling. This knowledge could then potentially be used by a security analyst to understand and respond more effectively to future intrusions. The modeling shows that as the intrusion detection performance improves with time, the slope of the F-score is positive and becomes steeper, which causes the effect of attacks to disappear. However, it is not possible to achieve that type of growth rate with a single IDS. In order that the effect of attacks is not felt in the information systems, it is necessary for the performance of the IDS to rise steeply and approach a Figure of Merit metric value of 1. Since none of the IDSs available in the literature can achieve this, it is necessary to make use of multiple IDSs, benefiting from the advantages of each one of them. The modeling is realistic in a network environment with multiple IDSs for protection, looking at the system as a whole instead of at the individual responses to an attack. Thus, the modeling of the attack-detection scenario also partially establishes the limitations of a single IDS in attack detection.
2.4.1 Internet attacks and the countermeasure for detection
The IDS analyzes the network data and looks for patterns of attacks. Patterns can be as simple as an attempt to access a specific port or as complex as sequences of operations directed at multiple hosts over an arbitrary period of time. In any case, the threats of these attacks are quite real and cannot be overemphasized. Hence, IDSs can be an extremely valuable tool if implemented properly. Understanding the practical limitations as well as the capabilities of the technology, and modeling the attack-detection scenario, will enable one to achieve the best results. IDSs in general detect trivial attacks and cause only highly transient reductions in attack density. Fusion of highly sophisticated IDSs, each of which is usually developed for a particular attack, may deplete a large fraction of attacks with an appreciable impact on the total trend in attacks. Hence, there are a lot of conflicting issues that need to be dealt with in the case of an IDS. The analysis and final decision making with an IDS in the case of highly sophisticated attacks bring to the fore the need for a theoretically sound basis for modeling the attack-detector interactions.
The attackers, security researchers and intrusion detection developers have con-
tinually played a game of point-counterpoint when it comes to IDS technology.
The basic problems in the field of intrusion detection are extremely challenging even with the continuous emergence of methods and technology for securing networks. The attackers continue to find ingenious ways to compromise remote hosts and frequently make their tools publicly available. The increasing size and complexity of the Internet, along with those of the end-host operating systems, make it more and more prone to vulnerabilities. The lack of in-depth understanding
of the intrusion activities due to many privacy issues is yet another problem.
Because of these challenges, current best practices for Internet security rely on reports of new intrusions and security holes, mainly from organizations like CERT (Computer Emergency Response Team). The IDS developers continually counteract the attacks with patches and new releases. Due to the inherent complexities involved in capturing, analyzing and understanding the network traffic, there are several common techniques that can be used to exploit inherent weaknesses in IDSs. Hence, considering the three layers (attackers, security researchers and intrusion detection developers) in the attack-detector scenario, it
is the IDS developers who take the middle layer and keep the ecosystem stable
by maintaining at least a minimum number of undetected attacks and detectors
at all times.
2.4.2 Testing the performance of Intrusion Detection Systems
The evaluation of IDS was initiated by the US Defense Advanced Research
Projects Agency (DARPA) in 1998 and has been the most comprehensive scientific study known for comparing the performance of different IDSs [48].
The MIT Lincoln Laboratory synthesized the network trafc with its data sets
DARPA 1998 and DARPA 1999 [42]. The performance of IDS can be evaluated
by choosing these publicly available data sets.
IDSs can be configured and tuned in a variety of ways in order to reduce the
false positive rate and to maximize the detection rate. However, there is a trade-
off between these two metrics for any system and hence these measurements
are used to form the Receiver Operating Characteristic (ROC) curves. An ROC
curve plots the detection rate against the false alarm rate. If the IDS raises
alarms very often on every suspicious packet, the false alarm rate as well as the
detection rate will increase. On the other hand, if the IDS raises alarms only after sufficient evidence is available, i.e., with lower false alarms, the detection rate will suffer, but with an increased alarm confidence. An IDS can be operated at any
given point on the ROC curve. The optimal operating point for an IDS, given
a particular network, is determined by factors like the cost of a false alarm, the value of a correct detection and the prior probabilities of normal and attack traffic. The ROC curve conveys information of importance when analyzing and
comparing IDSs. Figure 2.3 is an ROC graph plotted with each point identify-
ing the status of a particular IDS, developed from 1995 to 2004, in terms of the
detection rate and the false alarm rate. The crowded region to the top left as seen in the graph can be identified as that due to the recent systems. Thus the environment in which most of the IDSs of recent times operate requires very low false alarm rates (much lower than the 0.1% designated by DARPA) for useful
detection. The overall accuracy is not a good metric since the class of interest
is extremely rare. In cases with low base rate, the IDS has to be evaluated based
on its performance in terms of both recall as well as precision. Figure 2.4 shows
Figure 2.3: Receiver Operating Characteristic graph (detection rate vs. false alarm rate for individual IDSs)
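As a hypothetical illustration of choosing such an operating point by expected cost, the sketch below scores a few invented ROC points; the cost figures, prior probabilities and ROC points are assumptions made up for the example, not values from this study.

```python
# Hypothetical ROC operating points: (false alarm rate, detection rate).
roc_points = [(0.01, 0.60), (0.05, 0.80), (0.10, 0.90), (0.20, 0.95)]

def expected_cost(fpr, tpr, c_fa=1.0, c_miss=10.0, p_attack=0.01):
    """Expected per-event cost: false alarms raised on normal traffic
    plus missed detections on (rare) attack traffic."""
    p_normal = 1.0 - p_attack
    return c_fa * fpr * p_normal + c_miss * (1.0 - tpr) * p_attack

# With attacks this rare, the lowest false alarm rate wins even though its
# detection rate is the worst, illustrating the base-rate effect noted below.
best = min(roc_points, key=lambda pt: expected_cost(*pt))
```

Here `best` comes out as the (0.01, 0.60) point; raising the assumed attack prior shifts the optimum towards the higher-detection points.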
Figure 2.4: Trade-off between recall and precision (precision-recall curve for the combination sensor)
the tradeoff between the two metrics precision and recall. The precision of an
IDS refers to the fraction of intrusions detected from the total alerts generated.
Similarly the recall refers to the IDS completeness; the more complete an IDS is
the fewer are the intrusions that remain undetected. The plots of precision and
completeness of the IDS over the years of IDS development can be understood
from Figure 2.5 with both the plots on a single graph, where the top plot refers
to the precision and the bottom plot refers to the recall.
Precision (P) = (number of correctly detected intrusions) / (number of alerts) = True Positives / (True Positives + False Positives)

Recall (R) = (number of correctly detected intrusions) / (number of actual intrusions) = True Positives / (True Positives + False Negatives)
The behavior of the IDS can be generalized in terms of F-score, which scores a
balance between precision and recall as:
F-score = 1 / (α·(1/P) + (1−α)·(1/R)),

where α takes a value between 0 and 1 and corresponds to the relative importance of precision over recall. Thus the F-score is expected to take values between 0 and 1 depending on the relative importance of precision over recall. Considering equal importance of precision and recall, α is assigned a value of 0.5 and the F-score takes a value which is the harmonic mean of precision and recall.
Higher value of F-score indicates that the IDS is performing better on recall as
well as precision. The plot of F-score over a period of time, as shown in Figure
2.6, gives an idea of the effectiveness of the IDSs developed over that period.
Hence a study was undertaken to highlight the performance of the various IDSs
over a period of time in terms of the F-score.
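As a small sketch of these definitions (the alert counts below are invented for illustration, not taken from the study):

```python
def precision_recall_fscore(tp, fp, fn, alpha=0.5):
    """Weighted harmonic mean of precision and recall:
    F = 1 / (alpha/P + (1 - alpha)/R); alpha = 0.5 gives the plain
    harmonic mean used in the text."""
    p = tp / (tp + fp)                       # fraction of alerts that are real
    r = tp / (tp + fn)                       # fraction of intrusions caught
    f = 1.0 / (alpha / p + (1.0 - alpha) / r)
    return p, r, f

# Hypothetical counts: 80 intrusions detected, 20 false alarms, 20 missed.
p, r, f = precision_recall_fscore(tp=80, fp=20, fn=20)
```

With equal precision and recall the F-score equals both (0.8 here); an IDS that floods alerts drives precision, and hence the F-score, down even at perfect recall.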
Usually it is expected that both technology as well as the performance of any
system improve with time. However, in the case of IDSs, this need not be the
case, since the attackers also gain a lot of expertise with time and the false
alarms can be increased so as to confuse the security analyst regarding the cor-
rect picture of the attack. Analysis of the Figure 2.6 can be carried out in terms
of the study of the growth in Internet insecurity from the incidents reported to
Computer Emergency Response Team/Coordination Center (CERT/CC).
Figure 2.5: Precision and recall of IDS over the years (top plot: precision; bottom plot: recall)
Figure 2.6: Plot of F-score over the years
Figure 2.7: Growth in the incidents reported to CERT (1988 to 2004)
Figure 2.8: Growth in the vulnerabilities reported by CERT (1995 to 2004)
Figure 2.7 and Figure 2.8 show the growth rate of incidents reported to CERT
as well as the vulnerabilities reported by CERT, respectively. It looks as if the attackers' understanding of security weaknesses has also increased over time, as is the case with the improvements in their attack tools.
The statistics released by CERT [53] show disturbing trends in incident and vulnerability reporting: the growth has become exponential in recent times, whereas it was closer to linear in the early years. In order to bring down this exponential growth of attacks, a sufficient number of IDSs has to be deployed, along with better and more advanced IDS techniques for better detection. The plot of F-score over the years is a clear
indication that the IDS techniques are not improving in a regular and steady
manner, but instead following a prey/predator relationship.
Just after the introduction of IDSs, detection became more and more competitive in identifying the existing intrusions caused by attackers. This can be seen in the initial steep rise of the F-score. With the advancement in IDS technology, the attackers also acquire more expertise and try to launch more unidentified and confusing attacks, which causes the F-score to come down drastically. As a result, it becomes necessary for the security experts to overcome these attacks: both the IDS researchers and implementors strive for new techniques, modify the available IDSs and find patches for the known vulnerabilities. This effort again brings up the F-score, and this process of the attacker or the detector gaining more and more expertise with time continues, as seen in Figure 2.6. This trend is seen in the work of Shimeall and Williams [54] and Browne et al. [55], where the reported incidents behave in an oscillatory manner. The practical data available in their work explains this type of system behavior. The plot, reproduced in Figure 2.9 from the data observed by Shimeall and Williams, shows the number of incidents with serious effects on a reporting site as reported to the CERT/CC each week between June 24, 2000 and Feb 17, 2001. The incident counts exclude simple port-scanning, unexploited vulnerabilities, false alarms, failed frauds and hoaxes.
Shimeall and Williams [54] have found a temporal, spatial and associative trend
in the relationship of attacks on the Internet. One type of such temporal trends
relates to the timing between an event that may trigger an incident (say, a new
product announcement or the release of a new intrusion tool) and the corre-
sponding incidents. Hacking conventions like Defcon, Black Hat Briefings, etc. appeared at or close to local peaks in the incident reporting. Taking into account the exploits published on websites, the peaks in the exploit publication
rate were weakly correlated with the peaks in the incident reporting rate one to
three weeks later. The exploitation of vulnerabilities in reported security inci-
dents is very common. The intruders have the habit of modifying their tactics
too quickly. It is also true that while substantive periods of time may elapse between the discovery of a vulnerability and its widespread exploitation, there
is a trend toward more rapid exploitation.

Figure 2.9: Incidents reported over a period of time (number of incidents vs. number of weeks since 24-Jun-2000)
Browne et al. [55] have collected evidence to show that the availability of
patches will not reduce the severity of incidents after a time delay as normally
expected. They found that the incidents accumulate regardless of the existence
of corrections for exploited vulnerabilities. The incidents, however, accumulate
in a linear fashion which allow the statistical modeling of the incident accumu-
lation rate. This modeling helps any organization in deciding on its IT invest-
ments, in knowing the severity of continuing incidents, and in testing vendor
supplied patches prior to deployment.
2.5 Modeling the attack-detector relationship
The detection of an attack scenario depends on the exact correspondence of an attacker's actions with those anticipated in the scenario. Since the total number of possible actions is huge, the specification by an expert, a priori, of the set of all the scenarios becomes illusory. It implies that all the scenarios to detect must be known a priori, but without any guarantee that those scenarios can be generated [56].
Most IDSs are designed to learn from attacks and feed into the patches for
the OS, virus signatures, firewalls and hence thwart future attacks. There is a
delay between the time attacks are detected and steps are taken to immunize
the systems to such attacks. After a time delay the attackers develop new types
of attack. This process continues and hence the variation of F-score with time
shows an oscillatory behavior in the case of IDSs.
On further analyzing the F-score plot, one of the possible models for estimating
the performance of IDS with time is a prey/predator relationship identied in
the attack/detector relationship. The attacks to a host/network increase, causing the detection rate to go up; as the detection becomes more and more competitive, the attacks will naturally come down since many of the attacks will no longer be successful. As the attacks reduce significantly, the research and development taking place in the field of countermeasures like IDSs decreases. This
is because the intrusion detection analysts may be inclined to try new methods
only when they see their old methods to be inadequate, particularly when the
new methods require considerable knowledge and skill to be used effectively.
This again causes the attacks to increase considerably.
Considering the performance of a single IDS over the years, it will be seen that the performance deteriorates significantly with time. It is difficult for an IDS to keep up with the new attacks generated by the sophisticated attackers. This implies that a constant update will be required at all times to maintain a steady performance. This occurs only if the security researchers keep up with the attackers steadily in their capabilities. However, this rarely happens with full efficiency. Hence the performance of a single IDS deteriorates with time.
The performance of a single IDS at no stage has caused the effect of attacks to
totally disappear. Similar is the case with attacks also. A successful attack of
one time may not be successful at a later stage. This is mainly because of the
patches and the advanced security measures that the security developers intro-
duce from time-to-time. Hence the attacks also show a decreasing performance
over a period of time.
To model this scenario, certain assumptions are made in the initial phase while working with the model trying to define the attack-detector population dynamics, and they are given as follows:
• Attacks increase at a rate directly proportional to the attack density in the absence of detectors. The knowledge of a successful attack in one domain may motivate the attacker to generate another attack for a different target or domain so far undisturbed by attacks. The new attacks get introduced with an attack increase rate, in the absence of detectors that are effective over them. (As discussed later, it is not reasonable to assume that the attacks grow indefinitely in the absence of detectors.)
• Detectors become ineffective at a rate proportional to the detector density. (More detectors exist when there is scope for more, due to many of them already becoming ineffective. Normally, the motivation for a new detector development, as seen in most of the literature, is the limitation of the existing ones. If a 100% efficient IDS existed, there would be no need for more IDSs, since that itself would be sufficient. Hence the introduction of each new IDS is to overcome the ineffectiveness (or limitations) of the earlier IDSs against newer attacks.)
• Detectors increase in number proportional to the rate at which they detect attacks. (The detectors learn from the attacks detected and will either be replaced with an improved version, or new detectors come up for the changing attack scenario. To explain in detail, the gateway/firewall logs constitute a good source for traces of novel attack/probe techniques that are not yet publicly known. Analysis of large amounts of such logs could lead to the synthesis of signatures that could be incorporated into IDSs. Alternatively, this log data can enhance the detection capacity of probabilistic systems. When a highly sophisticated anomaly-based IDS detects an attack, the attack signature gets known and the signature database of the misuse-based IDSs gets updated. Thus new versions of misuse-based IDSs are developed. The attack appears as a labeled attack in the training data set of any pattern recognition IDS, which again improves the performance of the IDS.)
• Detectors non-randomly detect attacks and cause the attack to be ineffective or unsuccessful, at a rate proportional to the detector density. (In Appendix C we consider an initial assumption with no prior knowledge of the probable attacks that happen on the Internet. Hence, it is reasonable there to consider that the detectors search randomly for attacks, and the more the number of detectors, the more is the chance of attacks becoming ineffective or unsuccessful.)
• Attack density cannot exceed some carrying capacity, which is imposed by an ultimate resource limit on the attacks due to technical bottlenecks on the system and network resources. The attacker knowledge also restricts the sophistication in the attacks beyond a certain level.
The assumptions made for the pursuit-evasion processes are given as follows:
• The attackers classified as script kiddies try to identify the security measures in terms of the density of the detectors in a particular domain and try to attack sparsely monitored or secured networks. Thus the trend is that the majority of the attacks move away from a domain which is intensely watched and protected.
• In the case of professional hackers, it is expected that they concentrate on well protected and highly gainful domains. In such cases, sophisticated attacks are seen to successfully concentrate more on critical domains intensely watched and protected. Their interest is to attack such domains for various goals.
• The trivial and average detectors find application by identifying the density of attacks in a particular domain and then try to monitor sparsely monitored resources and networks to fight against the existing attack scenario.
• Highly sophisticated IDSs concentrate on intensely secured resources and networks to further improve the security measures.
In this section, an attempt has been made to model the dynamic relationship
existing between the detectors and the attacks. This knowledge can be used to
enrich the design and development of IDSs. For each combination of a and d
there is an unstable attack-detector equilibrium, with the slightest disturbance
leading to expanding population oscillations. Depending on the initial state, the
system can evolve towards a simple steady state or a limit cycle, in which the
attack-detector populations oscillate periodically in time. The attack-detector
relationship may thus exhibit coupled oscillations.
Usually detectors respond to attack distributions. A constant searching efficiency is more difficult to accept. Searching efficiency depends on the speed of the traffic, on the attack density on a priori grounds, and also on the detector density.
When a network intrusion happens, the sequence of attacks does not take place
in a totally random order. Intruders come with a set of tools trying to achieve
a specic goal. The selection of the cracking tools and the order of applica-
tion depends heavily on the situation as well as the responses from the targeted
system. Typically there are multiple ways to invade a system. Nevertheless, it
usually requires several actions/tools to be applied in a particular logical order
to launch a sequence of effective attacks to achieve a particular goal. It is this
logical partial order that reveals the short and long-term goals of the invasion.
Real attacks are likely to be distributed in a patchwork of high and low densities, and the detectors can be expected to respond to the attacks by orienting towards high-density patches. Detectors usually search for certain traffic features for signs of attack; the normal traffic is uncorrelated whereas the attack traffic is correlated, which provides a strong selective advantage for detectors that results in a more focused search. Since detector aggregation has the effect of giving attacks a partial refuge at low densities, it is a potentially important stabilizing factor in the dynamic interactions.
The detections show a negative binomial pattern. The negative binomial distribution does not assume randomness but only proneness, i.e., certain traffic feature(s) have a higher chance of disclosing attacks than the other features. If the variance is larger than the mean, the level of proneness of the population is high. If a Poisson distribution is used to model such data, the model mean and variance are equal; the observations are then overdispersed with respect to the Poisson model. Data are examined that relate to two-population interaction models based on the negative binomial parameter g. The negative binomial pattern is specified by its mean μ and a clumping parameter g. The expected proportion of attacks not getting detected at all is given by

P(0) = (1 + μ/g)^(−g).
The mathematical meaning of the parameter g may be appreciated by noting that, for a negative binomial with mean μ, the coefficient of variation (CV) is given by:

CV² = variance/(mean)² = 1/μ + 1/g.
In the limit, as g tends to infinity, the random or Poisson distribution is recovered, with the variance equal to the mean. Thus with large g values, say g > 8, Poisson randomness can be assumed. If g < 1, the CV gets larger and the detectors are strongly aggregated in patches of high attack density.
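These relations can be checked numerically; the values of μ and g below are illustrative only, not estimates from the thesis data:

```python
import math

# Negative binomial detection pattern with illustrative mean and clumping.
mu, g = 2.0, 0.7

# Expected proportion of attacks escaping detection entirely.
p0 = (1 + mu / g) ** (-g)

# CV^2 = 1/mu + 1/g, consistent with the negative binomial
# variance mu + mu**2/g.
cv2 = 1.0 / mu + 1.0 / g
variance = mu + mu ** 2 / g
assert abs(variance / mu ** 2 - cv2) < 1e-12

# Poisson limit: as g grows large, P(0) -> exp(-mu) and CV^2 -> 1/mu.
g_big = 1e8
assert abs((1 + mu / g_big) ** (-g_big) - math.exp(-mu)) < 1e-6
```

With g = 0.7 < 1 the 1/g term dominates CV², reflecting the strong aggregation discussed above.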
Let A_t and D_t denote the number of attacks and detectors at any time t. Let d denote the detector efficiency and a denote the attack increase rate ignoring detection. The attack-detector model giving the attacks and the detectors in the successive generations, t+1 and t, can be expressed as:

A_{t+1} = a·A_t·(1 + d·D_t/g)^(−g)

and

D_{t+1} = c·A_t·[1 − (1 + d·D_t/g)^(−g)],

with g being the negative binomial dispersion parameter, which can be interpreted as a coefficient of variation of detector density among patches, and d·D_t being the mean detector density. The dynamics of the model show diverging oscillations if g > 1; for g < 1, the system is always stable, at first showing damped oscillations and then approaching the equilibrium state. Besides the densities of the attacks and the detectors, namely A_t and D_t, the parameters of the system are non-negative values. Figure 2.10 shows the attack and detector distributions using the negative binomial distribution model, with typical values and initial conditions for the coexistence of the attacks and the detectors as: a = 0.25 (from Figure 2.7), A_1 = 20000, d = 0.9 (from Figure 2.6), D_1 = 60, g = 0.7 and t varying from 2000 to 2005. The attacks as well as the detectors are seen to increase with time, and the clumping of attacks was significantly reduced upon detection. The
Figure 2.10: Attack-detector relationship, D(1)=60, d=0.7
model is seen to be in total agreement with the attack incident reports published
by CERT for the same period of time. In order to bring down the attacks, it is necessary to deploy more detectors and/or improve the detection efficiency of
the detectors. With the detector efficiency maintained steady, the number of detectors deployed is first decreased and then increased to demonstrate the effect.

Figure 2.11: Attack-detector relationship, D(1)=40, d=0.7

Figure 2.12: Attack-detector relationship, D(1)=80, d=0.7
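The effect of the initial detector deployment can be sketched by iterating the model directly. In the snippet below the growth rate a and the conversion factor c are illustrative guesses, not the fitted values used for the figures:

```python
def simulate(A1, D1, a, d, c, g, steps):
    """Iterate the negative binomial attack-detector model:
    A_{t+1} = a*A_t*(1 + d*D_t/g)**(-g)
    D_{t+1} = c*A_t*(1 - (1 + d*D_t/g)**(-g))"""
    A, D = float(A1), float(D1)
    for _ in range(steps):
        q = (1.0 + d * D / g) ** (-g)   # fraction of attacks escaping detection
        A, D = a * A * q, c * A * (1.0 - q)
    return A, D

# More initial detectors suppress the attacks more strongly (d fixed at 0.7).
a_low, _ = simulate(A1=20000, D1=40, a=1.2, d=0.7, c=0.005, g=0.7, steps=5)
a_high, _ = simulate(A1=20000, D1=80, a=1.2, d=0.7, c=0.005, g=0.7, steps=5)
```

Here `a_high < a_low`: starting with 80 detectors leaves fewer surviving attacks after five generations than starting with 40, mirroring the comparison between Figures 2.11 and 2.12.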
Figures 2.11 and 2.12 show the attack-detector interactions with a smaller and a larger initial deployment of detectors, respectively. Also, with the deployed detectors remaining unchanged, Figures 2.13 and 2.14 show the effect of decreasing and increasing the detector efficiency, respectively.
Figure 2.13: Attack-detector relationship, D(1)=60, d=0.5
Figure 2.14: Attack-detector relationship, D(1)=60, d=1.0
2.5.1 Detectors learning from the detected attacks
The detectors usually learn from the attacks and feed into the patches for the OS, virus signatures, firewalls and hence thwart future attacks. When a new attack is detected by an IDS, the IDS analyzes the attack and finds out the possible variations of this attack. This contributes to new research, and either further modifications to the existing IDSs or totally new IDSs come up to encounter the variations of the attack detected. It is a reasonable outcome of the analysis that happens when an IDS detects an attack. The modified attack-detector model is given as:
A_{t+1} = a·A_t·(1 + d·D_t/g)^(−g)   (2.6)

and

D_{t+1} = c·A_t·[1 − (1 + d·D_t/g)^(−g)]   (2.7)
where c is the factor that incorporates the effect of detectors increasing in num-
ber depending on the number of attacks it detects. A special case of non-random
search is where some attacks are completely free from detection within a certain
time span (time refuge). In such a case, the detectors may aggregate in patches
of high attack density. This tendency of detectors aggregating in patches of high
attack density gives the attacks a refuge at low densities; a powerful stabilizing
force in the interaction. This aggregation is most extreme at low values of g.
2.5.2 Detector correlation
The attack-detector model can be modified to incorporate detector density effects, through the correlation coefficient. As the detector density increases, the density-dependent effect could stabilize the attack-detector model. The detector
searching efficiency d is the probability of detecting a particular attack in the lifetime of a detector, and it depends on the density of the detectors. As the detector density increases, the detector searching efficiency will reduce, and hence d = Q·D_t^(−m), or log(d) = log(Q) − m·log(D_t). Here Q = d if D_t = 1, and m is the correlation coefficient. Q is a factor that contributes to the determination of level, but has no effect on stability. The individual searching efficiency d reduces when there are more detectors. Again, when D_t increases, the correlation coefficient increases and hence d reduces again. The detector searching efficiency versus the detector density, if plotted on a log-log scale, will be linearly decreasing with the detector correlation m as the slope of the plot. Thus
to introduce the effect of the correlation coefficient m, the attack-detector model is modified as:

A_{t+1} = a·A_t·(1 + Q·D_t^(1−m)/g)^(−g)   (2.8)

and

D_{t+1} = c·A_t·[1 − (1 + Q·D_t^(1−m)/g)^(−g)]   (2.9)
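A minimal sketch of this density-dependent searching efficiency and the modified recurrence, with Q, m and the other parameters chosen purely for illustration:

```python
import math

Q, m = 0.9, 0.3   # illustrative level factor and correlation coefficient

def efficiency(D):
    """Density-dependent searching efficiency d = Q * D**(-m)."""
    return Q * D ** (-m)

# On a log-log scale the efficiency falls linearly with slope -m.
D = 50.0
assert abs(math.log(efficiency(D)) - (math.log(Q) - m * math.log(D))) < 1e-12

def step(A, D, a=1.2, c=0.005, g=0.7):
    """One generation of the correlation-modified model (2.8)-(2.9):
    the term efficiency(D)*D equals Q*D**(1-m)."""
    q = (1.0 + efficiency(D) * D / g) ** (-g)
    return a * A * q, c * A * (1.0 - q)

A2, D2 = step(20000.0, 60.0)
```

The efficiency falls as the detector density rises, which is exactly the interference effect the correlation coefficient is meant to capture.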
This modification of the attack-detector model can completely alter its outcome. Instead of always being unstable, the new model is stable over a wide range of conditions depending on the attack growth rate (a) and the amount of correlation (m). With a being very large, even small values of m (say m = 0.3) will contribute markedly to the stability and may even give complete stability. Apart from contributing to the stability of the attack-detector interactions, detector correlation can also account for the frequent coexistence of several detector varieties on one attack. The value of m increases as the detector density increases. The detector searching efficiency tends to become independent of the detector density as the change in correlation becomes very small. The detector curve as shown in Figure 2.11 is seen to come down with a decrease in D(1), d and g. The detector curve picks up with a decrease in the value of the correlation coefficient m.
Thus, the introduction of the interference factor into the model contributes to the stability of the model. As m increases, the stability increases; this is due to the detectors being distributed in a more aggregative manner to counter a sophisticated attack, an effect termed pseudo-interference. The heterogeneous attack distributions, coupled with detector aggregation at high attack density, must be the main stabilizing mechanism. The clumping of detections can be formally described as pseudo-interference, the same as the stabilizing dynamical effects of mutual interference among detectors.
The correlation coefficient and the aggregative behavior are, in reality, closely related. The common effect of the correlation coefficient is to reduce the required searching time in direct proportion to the frequency of encounters. Searching efficiency d declines as detector density increases. The modification has a marked effect on stability: instead of always being unstable, there can now be a stable equilibrium, given suitable values of the interference constant m and the attack growth rate a. These notions can be applied to the present model to get a pseudo-interference coefficient of magnitude

m* = g·(1 − a^(−1/g)) / ln(a).
That is, the overdispersion of detector attacks has much the same dynamical consequences as would be produced by pure mutual interference among detectors in a homogeneous world. The pseudo-interference coefficient m* corresponds to a stable equilibrium if, and only if, g < 1. As noted earlier, in the special case of the time refuge the detectors aggregate in patches of high attack density; this tendency is a powerful stabilizing force in the interaction and is most extreme at low values of g. The density-dependent attack growth is also a stabilizing factor: since this behavior gives the attacks a refuge at low densities, it is a potentially important factor in stabilizing the attack-detector interaction. In short, there are both empirical and theoretical
reasons for fastening on the negative binomial to approximate the distribution of detectors in a patchy environment.

Figure 2.15: Effect of detector efficiency on the attack growth rate (attacks over years 1999 to 2004 for d = 0.25, 0.5 and 0.75)
2.6 Validation of the model using real-world data
Figure 2.15 shows the validation of the attack-detector relationship using real-world data. The middle plot, for d = 0.5, exactly matches the increase in attacks as reported by CERT and given in Figure 2.7. The detector efficiency is chosen to be 0.5 as it approximately reflects the performance of the IDSs
available in the commercial market. In order to identify the effect of detectors
on the attack, we have used a higher average detection efciency as well as a
lower average detection efciency. Figure 2.15 shows the attack and detector
distributions for three different values of d (d = 0.25, 0.5 and 0.75), with typical
values and initial conditions for the coexistence of the attacks and the detectors
as: a = 0.25 (from Figure 2.7), A_1 = 10000 in year 1999, D_1 = 60, g = 0.7
and t varying from year 1999 to 2004 for which the data is available in CERT
site. The number of detectors has been chosen to be a fraction of the servers existing at a particular time. The attack increase rate is observed to be mainly dependent on the efficiency of the detectors deployed for the protection of the
servers, which are the main target for the attackers. The model is seen to be
in total agreement with the attack incident reports published by CERT for the
same period of time. It is understood from this modeling that in order to bring
Chapter 2 49
down the attacks, it is necessary to deploy detectors of higher detection. Hence
enhancing the performance of the IDSs is a step towards making the cyber space
safer.
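The recurrence itself is developed earlier in the chapter; the sketch below only illustrates how such an attack-detector interaction can be simulated, using the classical negative-binomial (May-type) host-parasitoid form, which matches the patch, refuge and aggregation discussion above. The detector growth rule and everything other than the parameter values quoted in the text are assumptions.

```python
def simulate_attacks(d, a=0.25, g=0.7, A0=10000, D0=60, years=6):
    """Sketch of an attack-detector interaction (assumed May-type form).

    d  : average detector efficiency per encounter
    a  : intrinsic attack growth rate (from Figure 2.7)
    g  : negative-binomial aggregation (clumping) parameter
    A0 : attacks in the starting year (1999); D0: detectors deployed
    """
    A, D = float(A0), float(D0)
    history = [A]
    for _ in range(years - 1):
        # fraction of attacks escaping detection under negative-binomial search
        escape = (1.0 + d * D / g) ** (-g)
        A = A * (1.0 + a) * escape
        D = D * 1.1  # assumed slow, attack-independent detector growth
        history.append(A)
    return history
```

As with the curves of Figure 2.15, a higher detector efficiency d yields a uniformly lower attack trajectory in this sketch.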
2.6.1 Discussion on the modeling
The increased frequency, sophistication and strength of Internet attacks have led
to the proposal of numerous detectors. However, the problem is hardly tackled,
let alone solved. There are many factors that hinder the advance of defense
research.
- It is necessary to thoroughly understand the attacks in order to design imaginative solutions for them. It is generally believed that publicly reporting attacks damages the reputation of the victim network. Attacks are therefore reported only to government organizations under obligation to keep the details secret. There are efforts from researchers in the right direction, but they still reveal only a small part of the total picture.
- There is currently no benchmark suite of attack scenarios or established evaluation methodology that would enable comparison between detectors.
- In addition to known threats, there are attacks seen rarely in the wild and mostly stealthy, and some novel attack methods. As the usual suspects get handled by detectors, these alien attacks will gain popularity. Understanding these threats, implementing them in a test-bed environment, and using them to test detectors will help researchers keep one step ahead of the attackers.
This chapter does not propose or advocate any specific defense mechanism. Even though some sections might point out the different possibilities in the field of security, our purpose is not to criticize, but to draw attention to these defense problems so that they might be solved.
2.7 Summary
Intrusion detection systems are becoming an indispensable and integral component of any comprehensive enterprise security system, the reason being that the IDS has the potential to alleviate many of the problems facing current network security.
A review of the issues connected with single IDSs has offered a critical analysis to understand the need for further work in this field of research. This chapter is integral to the whole thesis, supporting the correctness of the track and reinforcing that there is a contribution to make in this field. It demonstrates that the existing work in the field of intrusion detection has been understood critically, along with the most important issues and their relevance to this work, its controversies, and its omissions.
In this chapter, issues connected with single IDSs are discussed. The problems associated with data skewness are exemplified. The need for improving the performance of individual IDSs using advanced techniques is established. This chapter makes certain inferences about the intrusion detection environment. The normal traffic in any environment comprises a majority of non-attacks and a minority of attacks. The cost of missing an attack is higher than the cost of false positives. Within the attack traffic, some attacks are even rarer, and these rare attacks may also cause significant damage. IDSs are normally characterized by their overall accuracy, and the imbalance in the data degrades the prediction accuracy. Though an IDS can give very high overall accuracy, its performance on the class of rarer attacks has been found to be less than acceptable. Hence, it is not appropriate to evaluate IDSs using predictive accuracy when the data is imbalanced and/or the costs of different errors vary markedly. The data skewness in the network traffic demands an extremely low false positive rate, of the order of the prior probability of attack, for an acceptable value of the Bayesian attack detection rate.
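The base-rate argument above can be made concrete with Bayes' rule. The sketch below computes the Bayesian detection rate P(intrusion | alarm); the numeric rates in the comment are illustrative assumptions, not measurements from this thesis.

```python
def bayesian_detection_rate(p_attack, tpr, fpr):
    """P(intrusion | alarm) by Bayes' rule, given the prior probability of
    attack, the true positive rate and the false positive rate."""
    p_alarm = p_attack * tpr + (1.0 - p_attack) * fpr
    return (p_attack * tpr) / p_alarm

# With a prior of 2e-5, even a perfect detector with a "low" FPR of 1e-3
# yields a Bayesian detection rate of only about 2%; the FPR must fall to
# the order of the prior before alarms become trustworthy.
```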
The trends of F-score and precision/recall for IDSs over a period of 10 years are analyzed, and a model is proposed to characterize the attack-detector behaviors and formalize the attack-detector interactions. The modeling is based on deduction rules that are used to model the capabilities of the attacker and the detector. The proposed model is validated with the empirical values. This modeling helps enrich the understanding and further the design and research in IDSs.
Also, the level of severity in a network environment due to the exponentially growing Internet attacks is understood. This knowledge could then potentially be used by a security analyst to understand and respond more effectively to future intrusions. Deploying more detectors and improving detector performance are both seen to bring down the attacks, as understood from this modeling.
The modeling also shows that as the intrusion detection performance improves with time, the slope of the F-score is positive and becomes steeper, which causes the effect of attacks to disappear. However, it is not possible to get that type of growth rate with a single IDS. In order that the effect of attacks is not felt in the information systems, it is necessary for the IDS performance to rise steeply and approach an F-score of 1. Since none of the IDSs available in the literature can achieve this, it is felt necessary to make use of multiple IDSs, benefiting from the advantages of each one of them. The modeling is realistic in an environment of a network with multiple IDSs for protection, looking at the system as a whole instead of the individual responses to an attack. Thus, the modeling of the attack-detection scenario also partially establishes the limitations of a single IDS in attack detection.
The attack field contains a multitude of attack and detection mechanisms, which obscures a global view of the attack problem. This model is an attempt to cut through the obscurity and structure the knowledge in this field. The model is intended to help the security research community think about the threats we face and the possible countermeasures. One benefit we foresee from this study is that of fostering easier cooperation among researchers. Attackers cooperate to exchange attack code and information about vulnerable machines, and to organize their agents into coordinated networks of immense power and survivability. The Internet community must be equally cooperative within itself to counter the threat. It should look at how different mechanisms are likely to work in concert, and identify areas of remaining weakness that require additional work. There is a pressing need for the research community to develop common metrics and benchmarks for detector evaluation. It is clear that under the pressures of a highly competitive global research environment, the field of IDS will re-mould rapidly and overcome many of the existing limitations and hurdles. As the field grows, the attack-detection scenario will also be refined.
For more proactive defense, it is essential to understand the network defensive and offensive strategies. With the attack-detector scenario better understood, the future evolution of attacks can be estimated to some extent, thereby aiding better attack detection and, in turn, reducing false negatives. This knowledge helps the security community to become proactive rather than reactive with respect to incident response.
Chapter 3
Evaluation and Test-bed of Intrusion
Detection Systems
The strongest arguments prove nothing so long as the conclusions are not verified by experience. Experimental Science is the queen of sciences and the goal of all speculation.
Roger Bacon
3.1 Introduction
The poor understanding of the performance of the IDSs available in the literature may be caused in part by the shortage of an effective, unbiased evaluation and testing methodology that is both scientifically rigorous and technically feasible. The choice of IDSs for a particular environment is a general problem, more concisely stated as the intrusion detection evaluation problem, and its solution usually depends on several factors. The most basic of these factors are the false alarm rate and the detection rate, and their tradeoff can be intuitively analyzed with the help of the Receiver Operating Characteristic (ROC) curve [43], [57], [12], [58], [59]. However, as pointed out by earlier investigators [49], [60], [61], the information provided by the detection rate and the false alarm rate alone might not be enough to provide a good evaluation of the performance of an IDS. Hence, the evaluation metrics need to consider the environment the IDS is going to operate in, such as the maintenance costs and the hostility of the operating environment (the likelihood of an attack). In an effort to provide such an evaluation method, several performance metrics, such as the Bayesian detection rate
[49], expected cost [60], sensitivity [62] and intrusion detection capability [63], have been proposed in the literature. These metrics usually assume knowledge of some uncertain parameters, like the likelihood of an attack or the costs of false alarms and missed detections. Yet despite the fact that each of these performance metrics makes its own contribution to the analysis of IDSs, they are rarely applied in the literature when a new IDS is proposed.
In Appendix D, we review the method of evaluation and also describe the evaluation methodology used in this thesis. Appendix D introduces some new metrics for IDS evaluation. Classification accuracy in IDSs deals with such fundamental problems as how to compare two or more IDSs, how to evaluate the performance of an IDS, and how to determine the best configuration of an IDS. In an effort to analyze and solve these related problems, evaluation metrics such as the Area Under the ROC Curve, precision, recall, and F-score have been introduced. Additionally, we introduce the P-test [36], which is a more intuitive way of comparing two IDSs and also more relevant to the intrusion detection evaluation problem. We also introduce a formal framework for reasoning about the performance of an IDS and the proposed metrics against adaptive adversaries.
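For reference, the count-based metrics named above reduce to a few lines. This is the standard formulation; the TP/FP/FN counts passed in are hypothetical inputs, not results from this evaluation.

```python
def precision_recall_fscore(tp, fp, fn):
    """Precision, recall (detection rate) and F-score from alert counts:
    tp = attacks correctly alerted, fp = false alarms, fn = missed attacks."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```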
We provide simulations and experimental results with these metrics using real-world traffic data as well as the DARPA 1999 data set, in order to illustrate the benefits of the algorithms proposed in chapters five to eight of this thesis. The main reason for using the DARPA data set is that we need relevant data that can easily be shared with other researchers, allowing them to duplicate and improve our results. The common practice in intrusion detection of claiming good performance on real-time traffic makes it difficult to verify and improve previous research results, as the traffic is never quantified, nor released owing to privacy concerns. We use both the DARPA data sets and the real-world traffic data. Doing so, and being able to compare and contrast the results, should help alleviate most of the criticism against work based solely on the DARPA data, and still allow work to be directly compared. Being the only comprehensive data set that can be shared for IDS evaluation, it becomes reasonable to analyze its shortcomings as well as its importance and strengths for such a critical evaluation. Since this data set was made publicly available nine years back, the IDSs
that were developed after this time were taken for analyzing whether the data set has become obsolete. The analysis shows that the inability of the IDSs far outweighs the limitations of the data set. This section should give enough support to IDS researchers using the DARPA data set in their evaluations. This chapter also highlights the inability of single IDSs to provide complete coverage of the entire attack domain. This clearly establishes the need for multiple and heterogeneous IDSs for a wide coverage of present-day attacks.
3.2 Data set
The MIT Lincoln Laboratory, under DARPA and AFRL sponsorship, has collected and distributed the first standard corpora for the evaluation of computer network intrusion detection systems [48]. This DARPA evaluation data set [42] is used for the purpose of training as well as testing intrusion detectors. These evaluations contributed significantly to intrusion detection research by providing direction for research efforts and an objective calibration of the technical state-of-the-art. They are of interest to all researchers working on the general problem of workstation and network intrusion detection [60].
In the DARPA IDS evaluation data set, all the network traffic, including the entire payload of each packet, was recorded in tcpdump format and provided for evaluation. Taking the DARPA 1999 data set for further discussion, the data set consists of weeks one, two and three of training data and weeks four and five of test data. In the training data, weeks one and three consist of normal traffic and week two consists of labeled attacks.
The DARPA 1999 test data consisted of 190 instances of 57 attacks, which included 37 Probes, 63 DoS attacks, 53 R2L attacks and 37 U2R/Data attacks, with details on attack types given in Table 3.1. Even with its serious drawbacks, as can be seen in [64] and [65], and the potential questions about the adequacy of the data for its intended purpose, there is still no data set better than the DARPA data set for IDS evaluation. The DARPA data has certainly been useful in the development of the system proposed in this thesis. The details of the
Table 3.1: Attacks present in DARPA 1999 data set
Attack Class Attack Type
Probe portsweep, ipsweep, lsdomain, ntinfoscan,
mscan, illegal-sniffer, queso, satan
DoS apache2, smurf, neptune, dosnuke, land,
pod, back, teardrop, tcpreset, syslogd,
crashiis, arppoison, mailbomb, selfping,
processtable, udpstorm, warezclient
R2L dict, netcat, sendmail,imap, ncftp,
xlock, xsnoop, sshtrojan, framespoof,
ppmacro, guest, netbus, snmpget,
ftpwrite, httptunnel, phf, named
U2R sechole, xterm, eject, ntfsdos, nukepw,
secret, perl, ps, yaga, fdformat, ppmacro,
ffbconfig, casesen, loadmodule, sqlattack
usefulness of the DARPA data set is included in Section 3.3. Some of the publicly available data sets [66] have been investigated, but they are not entirely suitable for the analysis, mainly due to the absence of the application payload.
Four IDSs are considered in this study: two anomaly detectors, PHAD [67] and ALAD [68], which give an extremely low false alarm rate of the order of 0.00002; the popularly used open-source IDS Snort [69]; and the commercially accepted Cisco IDS 4215 [70].
To improve the performance of the IDSs PHAD and ALAD, more data has been incorporated in their training. Normal data was collected from a secured university internal network and randomly divided into two parts. PHAD is trained on week three of the data set and one portion of the internal network traffic data, and ALAD is trained on week one of the data set and the other portion of the internal network traffic data. Hence, the two anomaly-based IDSs PHAD and ALAD are trained on disjoint sets of training data. The correlation among the classifiers is lowered by the use of more training data, the disjointness of that data, and the longer training time.
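The random disjoint split described above can be sketched in a few lines (the function name and record format are ours, not part of PHAD or ALAD):

```python
import random

def disjoint_halves(normal_records, seed=0):
    """Shuffle collected normal-traffic records and split them into two
    disjoint halves, one to be mixed into PHAD's training data and the
    other into ALAD's."""
    rng = random.Random(seed)
    shuffled = list(normal_records)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]
```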
3.3 Usefulness of DARPA data set for IDS evaluation
With the increase in network traffic and the introduction of new applications and attacks over time, continuous improvement is required to keep the IDS evaluation data set valuable for researchers. User behavior also shows great unpredictability and changes over time. Modeling network traffic is an immensely challenging undertaking because of the complexity and intricacy of human behaviors. The DARPA data set models the synthetic traffic at a session level. Evaluating the proposed IDS with the DARPA 1999 data set may not be representative of the performance with more recent attacks, or with other attacks against different types of machines, routers, firewalls or other network infrastructure. All these reasons have caused a lot of criticism of this IDS evaluation data set.
A paper that discusses similar lines as presented in this section is by Brugger [71]. He analyzed the DARPA 1998 data set using Snort and concluded that any sufficiently advanced IDS should be able to achieve good false-positive performance on the DARPA IDS evaluation data set.
3.3.1 Criticisms against the DARPA IDS evaluation data set
The main criticism against the DARPA IDS evaluation data set is that the test-bed traffic generation software is not publicly available, and hence it is not possible to determine how accurate the background traffic inserted into the evaluation is. Also, the evaluation criteria do not account for the system resources used, ease of use, or the type of system [72].
The other popular critiques of the DARPA IDS evaluation data set are by McHugh [64] and by Mahoney and Chan [65]. McHugh [64] criticizes the procedures used in building the data set and in performing the evaluation. In his critique of the DARPA evaluation, McHugh questioned a number of their results, starting from the usage of synthetic simulated data for the background and the use of attacks implemented via scripts and programs collected from a variety of sources. In addition, the background data does not contain background noise like packet storms, strange packets, etc. Hence, the models used to generate
background traffic were too simple in the DARPA data set, and if real background traffic were used, the false positive rate would be much higher. Mahoney and Chan [65] comment on the irregularities in the data, like the obvious difference in the TTL value for the attack packets as compared with the normal packets, which lets even a trivial detector show an appreciable detection rate. They conducted an evaluation of anomaly-based network IDSs with an enhanced version of the DARPA data set, created by injecting benign traffic from a single host.
All the above criticisms have been well-researched comments, and these works have made it clear that several issues remain unsolved in the design and modeling of the resultant data set. However, we cannot agree with the comment made by Pickering [72] that benchmarking, testing and evaluating with the DARPA data set is useless unless serious breakthroughs are made in machine learning.
The DARPA data set has the drawback that it was not recorded on a network connected to the Internet. Internet traffic usually contains a fairly large amount of anomalous traffic that is not caused by any malicious behavior [73]. Hence, the DARPA data set, being recorded on a network isolated from the Internet, might not include these types of anomalies. The unsolved problems clearly remain. However, in the absence of better benchmarks, a vast amount of the research is based on experiments performed on the DARPA data set. The observation that, even with all the criticisms, the DARPA data set is still rigorously used by the research community for the evaluation of IDSs brings to the fore the motivation for this section.
3.3.2 Facts in support of the DARPA IDS evaluation data set
A data set other than the DARPA data set that is seen to be used for IDS evaluation is the Defcon Capture The Flag (CTF) data set. Defcon is a yearly hacker competition and convention. However, this data set has several properties that make it very different from real-world network traffic. The differences include an extremely high volume of attack traffic, the absence of background traffic, and the availability of only a very small number of IP addresses. The non-availability of any other data set that includes the complete network traffic was probably the initial reason for a researcher in IDS to make use of the DARPA data set for evaluation. Also, the experience while trying to work with real data
traffic was not good, the main reason being the lack of information regarding the status of the traffic. Even with intense analysis, the prediction can never be 100 percent accurate because of the stealthiness and sophistication of the attacks and the unpredictability of the non-malicious user. It involves high cost to properly label the network connections in raw data. Hence, most of the research work that used real network data was not able to report the detection rate or other evaluation metrics for comparison purposes.
Mahoney and Chan [65] comment that if an advanced IDS could not perform well on the DARPA data set, it could also not perform acceptably on realistic data. Hence, before thinking of junking the DARPA data set, it is wise to see whether the state-of-the-art IDSs perform well, in the sense that they detect all the attacks of the DARPA data set.
With the general impression that the data set used was old and hence not appropriate for IDS evaluation, the poor performance of some of the evaluated IDSs was expected and hence acceptable. Assuming that the data set is not generalized, and counting that as a drawback of the data set, fine-tuning of the IDSs to the data set was considered. Snort has a main configuration file that allows one to add and remove preprocessor requirements as well as the included rules files. The limit of fragmentation to be taken into account and the requirement of packet reconstruction are typically specified in this file. Snort can be customized to perform better in certain situations with the DARPA data set by improving the Snort rule set. Thus, we tried to manipulate the benchmark system.
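As an illustration of this kind of rule-set tuning (a sketch only; the SID and message text are our own, though `sameip` is a standard Snort rule option), a local rule flagging land-style packets with identical source and destination addresses could be added to Snort's local rules file:

```
# local.rules -- hypothetical tuning rule; SIDs >= 1000000 are reserved for local use
alert tcp any any -> $HOME_NET any (msg:"LOCAL land attack (same src/dst IP)"; sameip; sid:1000001; rev:1;)
```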
3.3.3 Results and discussion
Test setup
The test setup for the experimental evaluation consisted of three Pentium machines running the Linux operating system. The experiments were conducted with the simulated IDSs Snort version 2.3.4, PHAD, and ALAD, and also the Cisco IDS 4215, distributed across a single subnet observing the same domain. This collection of heterogeneous IDSs was used to examine how the different IDSs perform in detecting the attacks of the DARPA 1999 data set.
Experimental evaluation
The IDS Snort was evaluated with the DARPA 1999 data set and the results are shown in Table 3.2. It can be noted in Table 3.2 that some of the attacks of a certain attack type may get detected whereas other attacks of the same attack type may not. Hence, some of the attack types appear in both rows of Table 3.2. The performance of PHAD and ALAD on the same
Table 3.2: Attacks detected by Snort from the DARPA 1999 data set
Attacks detected teardrop, dosnuke, portsweep, sshtrojan, sechole
by Snort ftpwrite, yaga, phf, netcat, land, satan,
nc-setup, imap, nc-breakin, ncftp, guessftp,
tcpreset, secret, selfping, dosnuke, crashiis,
sqlattack, ntinfoscan, neptune, httptunnel, udpstorm,
ls, xlock, xsnoop, named, loadmodule, ppmacro
Attacks not detected ps, portsweep, crashiis, sendmail, netcat,
by Snort nfsdos, sshtrojan, ftpwrite, back,
guesspop, xsnoop, pod, snmpget, eject, dict,
guesstelnet, syslogd, guestftp, netbus, crashiis,
secret, smurf, httptunnel, loadmod, ps, ntfsdos,
arppoison, sqlattack, sechole, mailbomb, secret,
queso, processtable, sqlattack, fdformat, apache2, warez,
arppoison, ffbconfig, named, casesen, land, xterm1
data set are given in Tables 3.3 and 3.4 respectively. The duplication in both the
Table 3.3: Attacks detected by PHAD from the DARPA 1999 data set
Attacks detected fdformat, teardrop, dosnuke, portsweep, phf,
by PHAD land, satan, neptune
Attacks not detected loadmodule, anypw, casesen, ffbconfig, eject,
by PHAD ntfsdos, perl, ps, sechole, sqlattack, sendmail,
nfsdos, sshtrojan, xlock, guesspop, xsnoop, snmpget,
guesstelnet, guestftp, netbus, crashiis,
secret, smurf, httptunnel, loadmod, arppoison, land,
mailbomb, processtable, ppmacro, fdformat,
warez, arppoison, named
rows, as appeared in Table 3.2, is avoided in the rest of the tables to the maximum extent possible by making the entry depend on the majority of detections or misses for a certain attack type. The attacks detected by the Cisco 4215 IDS are given in Table 3.5.
Table 3.4: Attacks detected by ALAD from the DARPA 1999 data set
Attacks detected casesen, eject, fdformat, ffbconfig, sechole, xterm,
by ALAD yaga, phf, ncftp, guessftp, crashiis, ps
Attacks not detected loadmodule, anypw, nfsdos, perl, sqlattack, sendmail,
by ALAD sshtrojan, xlock, guesspop, xsnoop, snmpget,
netbus, secret, smurf, httptunnel, loadmodule,
arppoison, sqlattack, sechole, land, mailbomb,
processtable, sqlattack, ppmacro, warez, arppoison, named
Table 3.5: Attacks detected by Cisco IDS from the DARPA 1999 data set
Attacks detected portsweep, land, crashiis, ppmacro, mailbomb, netbus,
by Cisco IDS sechole, sshtrojan, imap, phf
Attacks not detected teardrop, dosnuke, ps, ftpwrite, yaga, sendmail,
by Cisco IDS nfsdos, xlock, guesspop, xsnoop, snmpget,
guesstelnet, guestftp, secret, smurf, httptunnel,
loadmod, ps, ntfsdos, arppoison, sqlattack,
processtable, sqlattack, fdformat, warez,
arppoison, named, satan, nc-setup, nc-breakin, ncftp
Discussion
The experimental evaluation gave rise to certain questions:
- Since the DARPA attack signatures were known, this being the most popular data set for IDSs at the time of their design and development, why is a 100% detection rate impossible?
- Why is it not possible to have zero false alarms with a signature-based IDS like Snort?
- The anomaly detectors are also inferior in attack detection and high in false alarms, even when thoroughly trained on the normal data set. Why is it that none of the learning algorithms that learn from the normal traffic behavior learn successfully, when there is no shortage of normal traffic data from the data set or even otherwise?
Snort is designed as a network IDS, extremely good at detecting distributed port scans and also fragmented attacks which hide malicious packets by fragmentation. The preprocessor of Snort is highly capable of defragmenting the packets. Matching the alert produced by Snort with the packets in the data set by means of the timestamp might sometimes cause misses, mainly because of the time gap of up to 10 seconds between the two. However, Table 3.2 shows that the DARPA 1999 data set does in fact model attacks that Snort has trouble detecting, or that Snort's signature database is still not updated with those signatures. Isn't it reasonable to think that the attacks for which the signatures are not available in an IDS like Snort, which has its rule set regularly updated, are the ones that still exist undetected? The attackers are also vigilant of the detection trend; can't we then think that some of the latest attacks are variants of those undetected attacks, since those attacks were successful in terms of detection avoidance? Or can't we say that if an IDS is capable of detecting those attacks in addition to the ones detected by Snort, it is a better-performing IDS than Snort? Or is it reasonable to think of changing the test bed when the IDS is suboptimal in performance on that test bed?
In a study made by Sommers et al. [74], comparing the two IDSs Snort and Bro, they comment that Snort's drop rates seem to degrade less intensely with volume for the DARPA data set. They also concluded in the paper that Snort's signature set has been tuned to detect DARPA attacks. Even then, if we cannot detect all the attacks of this nine-year-old data set, it clearly shows the inability to reproduce the signatures of all the available attacks in the data set in a signature-based IDS. This shows the inability of the IDSs rather than the deficiency of the data set.
Preprocessing of the DARPA data set is required before applying it to any machine learning algorithm. With the anomaly-based IDSs PHAD and ALAD, we tried to train them by mixing the normal data from an isolated network along with week 1 and week 3, respectively, of the training data set. Even then, the algorithms produce less than 50% detection and around 100 false alarms for the entire DARPA test data set. Again, there are enough reasons to attribute this to the failure on the part of the learning algorithms.
The usual reasoning for the poor performance of the anomaly detectors is that the training and the test data are not correlated; but that happens in real-world network traffic as well. The normal user behavior changes drastically from what the algorithm has been trained with, and hence we expect the machine learning algorithms to be extremely sophisticated and learn the changing behavior. Hence, the uncorrelated test bed is good for evaluating the performance of learning algorithms. Then again, it is the failure on the part of the learning algorithms rather than the data set if the anomaly detectors are performing poorly. Hence, it can be concluded that the DARPA data set, even though old, still carries a lot of novelty and sophistication in its attacks.
The Cisco IDS is a network-based intrusion detection system that uses a signature database to trigger intrusion alarms. Like any other network IDS, the Cisco IDS has only a local view. This feature gap is pointed out indirectly in [75]: "...does not operate properly in an asymmetrically routed environment."
Thus, the main reasons for the poor performance of the IDSs with the DARPA 1999 IDS evaluation data set are the following:
- The training and test data sets are not correlated for R2L and U2R attacks, and hence most of the pattern recognition and machine learning algorithms, except for the anomaly detectors that learn only from the normal data, will perform badly while detecting the R2L and U2R attacks.
- The normal traffic in real networks and in the data set are not correlated, and hence the trainable algorithms are expected to generate a lot of false alarms.
- None of the network-based systems did very well against host-based U2R attacks [76].
- The DoS and R2L attacks have a very low variance, and are hence difficult to detect with a unique signature by a signature-based IDS or to observe as an anomaly by an anomaly detector [14].
- Several of the surveillance attacks probe the network and retrieve significant information, and they go undetected by limiting the speed and scope of the probes [76].
- The data set provides a large sample of computer attacks embedded in normal background traffic: several realistic intrusion scenarios conducted in the midst of normal background data.
- Many threats, and thereby the exploits that are available on the computer systems and networks, are undefined and open-ended.
The above limitations have to be overcome by sophisticated detection techniques for an improved and acceptable IDS performance. We have also seen that Snort performs exceptionally well in detecting the U2R attacks and DoS attacks, PHAD performs well in detecting the probes, and ALAD performs well in detecting the R2L attacks. This clearly shows that each IDS is designed to focus on a limited region of the attack domain rather than the entire attack domain. Hence, IDSs are limited in their performance at the design stage itself.
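The complementarity argument can be seen directly by taking the union of per-IDS detection sets. The attack names below are a small illustrative subset drawn from Tables 3.2 to 3.4, not the complete results.

```python
# Small illustrative subsets of the attacks each IDS detected (Tables 3.2-3.4).
snort_hits = {"teardrop", "dosnuke", "sechole", "phf", "land"}
phad_hits  = {"teardrop", "portsweep", "satan", "neptune"}
alad_hits  = {"sechole", "ncftp", "guessftp", "crashiis"}

combined = snort_hits | phad_hits | alad_hits  # union of the detection sets
# The union covers more of the attack domain than any single IDS alone.
```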
On analyzing certain IDS alerts, the doubt arises as to whether it is justifiable to say that the IDS detects a particular attack. Consider, for instance, an attacker executing the command $./exploit. In the real data set, especially for a per-packet model, it may get translated into many packets, with the first packet containing '$', the second packet containing '.', the third packet containing '/', and the fourth packet containing 'e'. Is it justifiable to say that the IDS detects the particular attack when the IDS detects the fourth packet as anomalous? It depends on the implementation of the IDS. Some IDSs buffer the data before matching it against the stored patterns; in that case, the IDS is able to see the whole string $./exploit and hence detects the anomaly. An IDS that analyzes on a per-packet basis may find some anomalous pattern in one packet before the connection is terminated, and then flags it as an anomalous connection. If the aim is to find intrusive connections, then any packet corresponding to an intrusive connection, detected as malicious, should be good enough.
3.4 Choice and the performance improvement of individual
IDSs
The acceptable false alarm rate has been established to be extremely low, almost
as low as the prior probability. Hence two IDSs, namely PHAD [67] and ALAD
[68], which give an extremely low false alarm rate of the order of 0.00002, are
chosen for this work. The third IDS considered in this work is Snort [69], a
popular open-source IDS. With the first two IDSs, the Bayesian detection rate
is of the order of 35% and 38% respectively. Thus one of the
primary reasons for choosing the IDSs PHAD and ALAD was the requirement
of acceptability in terms of the number of false alerts, which should not overload
a system analyst. The other reason for the choice of PHAD and ALAD was
that most existing IDS algorithms neglect the minority attack types, R2L
and U2R, in comparison to the majority attack types, probes and DoS. ALAD
is highly successful in detecting these rare attack types. Also, Snort detects
the U2R/Data attacks exceptionally well. All the above IDSs are only average in
terms of detection performance. Hence an attempt was made to improve the
performance of the individual IDSs.
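The base-rate effect behind these Bayesian detection rates can be reproduced in a few lines. The sketch below uses illustrative numbers (an assumed prior of 2 intrusions per 100,000 events and the 0.00002 false alarm rate quoted above); it is not a measurement from the evaluation.

```python
# P(intrusion | alarm) via Bayes' rule, illustrating the base-rate effect.
def bayesian_detection_rate(prior, tpr, fpr):
    """prior: P(intrusion); tpr: detection rate; fpr: false alarm rate."""
    p_alarm = tpr * prior + fpr * (1 - prior)
    return tpr * prior / p_alarm

# With a false alarm rate comparable to the prior, even a 50% detector
# achieves a usable Bayesian detection rate.
rate = bayesian_detection_rate(prior=2e-5, tpr=0.5, fpr=2e-5)
print(round(rate, 3))  # → 0.333

# A detector with a 1% false alarm rate is overwhelmed by false alerts.
print(round(bayesian_detection_rate(prior=2e-5, tpr=0.5, fpr=0.01), 5))
```

This is why the acceptable false alarm rate must be of the same order as the prior probability of intrusion.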
3.4.1 Snort: Improvements by adding new rules
Snort has been identified to have a lot of rules that are named differently from
those in the DARPA 99 data set. For example, the land attack, which comes
under the DoS attack class, is found in the bad-traffic rules folder of Snort and
not in the DoS rules. The attack warezclient, which downloads illegal copies
of software, has been identified by Snort with a rule that looks for executable
code on the FTP port. Also, many of the rules are very generic and hence
the chances of false positives were very high. However, it was identified that it
requires tremendous effort to modify those generic rules, and we have succeeded
only to a very small extent. We seek a higher recall objective in the first
phase, and the fusion is expected to reduce the false alarms to some extent. Snort
rules were modified for DoS attacks like land, dosnuke and selfping. When
incorporating the rules, care has been taken not to overfit or to make them very
generic. This avoids FNs and FPs to the maximum extent possible. For example,
when the signature is connection type = ftp, misclassification should not
happen, because the connection can also be due to a DoS flooding attempt. Hence,
the R2L rule has to be refined for the absence of DoS attacks [110]. The rules may thus
incorporate more conditions for refinement, thereby avoiding misclassification and
hence the misclassification cost as well. This has increased the Snort detection of
DoS attacks.
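As an illustration of the kind of rule refinement described above, a land-style detection can key on a TCP packet whose source and destination addresses are identical. The rule below is only a sketch in Snort's rule syntax; the message text and the sid (chosen from the local-rule range) are our own, and any such rule should be checked against the Snort rules documentation before deployment.

```
alert tcp any any -> any any (msg:"DOS land attempt (same src and dst IP)"; \
    flow:stateless; sameip; classtype:attempted-dos; sid:1000269; rev:1;)
```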
3.4.2 PHAD/ALAD
PHAD was highly reliable in detecting all the probes except for the stealthy
slow scans which have been included in the DARPA 99 data set. The stealthy
probes which PHAD missed are ipsweep, lsdomain, portsweep and resetscan.
However, Snort was effective in identifying those stealthy ones by waiting for
longer than one minute between the successive network transmissions. PHAD
has the disadvantage that it classifies attacks based on a single packet. We have
improved PHAD by examining a session and detecting the anomalies in the
connection rather than only at the packet level. A connection (record) is a
sequence of TCP packets starting and ending at some well-defined times, between
which data flows from the source IP address to the target IP address under some
well-defined protocol.
The detection performance of the anomaly detectors PHAD and ALAD can
be improved further by training them on additional normal traffic other than the
traffic of weeks one and three of the DARPA 1999 data set. To improve the
performance of the IDSs PHAD and ALAD, more data has been incorporated
in their training. Normal data was collected from a university internal network
and randomly divided into two parts. PHAD was trained on
week three of the data set and one portion of the internal network traffic data,
and ALAD was trained on week one of the data set and the other portion of the
internal network traffic data. Hence, the two anomaly-based IDSs PHAD and
ALAD are trained on disjoint sets of the training data. The correlation among
the classifiers is lowered by incorporating more training data, and disjoint data
at that. The disjoint data sets given to PHAD and ALAD for training
have also helped to an extent in feature selection and in reducing the correlation
between the two IDSs. Both PHAD and ALAD look into almost disjoint
features of the traffic. PHAD detects anomalies based on the intrinsic features
of the TCP, UDP, IP, ICMP and Ethernet headers. ALAD detects anomalies
based on almost disjoint features of the traffic by looking at the inbound TCP
stream connections to well-known server ports.
There are a number of DoS as well as R2L attacks that are difficult to detect
since they exploit a large number of different network or system services.
There is no regular pattern for such attacks to be detected by misuse detection
systems. The anomaly detection systems are also unable to detect them since
they may look like normal traffic, because the attacker evades detection by
compromising some trusted hosts and using them for the attack. These attacks
are highly sophisticated and need a thorough analysis by a specialized detector.
In addition, there is an observable imbalance in the intrusion results due to DoS
having more connections than any other attack. Most IDSs will try to minimize
the overall error rate, but this leads to an increase in the error rate of the rare
classes. Hence, more effort should be made to improve the detection rate of the
rare classes.
This section has highlighted that even with an effort to improve the available
IDSs PHAD, ALAD and Snort, these IDSs still remain suboptimal with detection
rates of less than 50%.
3.5 Summary
The whole world has a growing interest in network security. DARPA's
sponsorship, AFRL's evaluation and the MIT Lincoln Laboratory's support
in security tools have resulted in a world-class IDS evaluation setup that can
be considered groundbreaking intrusion detection research. The DARPA
evaluation data set has the required potential in modeling the attacks that are
commonly found in network traffic. Hence we conclude by commenting
that it can be used to evaluate IDSs in the present scenario, even though any
effort to make the data set more real, and therefore fairer for IDS evaluation, is
to be welcomed. If a system is evaluated on the DARPA data set, it cannot
claim anything more in terms of its performance on real network traffic.
Hence this data set can be considered as the baseline of any research.
In an effort to analyze and solve the IDS evaluation problems, evaluation
metrics such as the Area Under the ROC Curve, precision, recall, and F-score have
been introduced in Appendix D. Additionally, the P-test, which is a more
intuitive way of comparing two IDSs and also more relevant to the intrusion
detection evaluation problem, has been included in Appendix D. The metrics used
for IDS evaluation, like the F-score and the P-test, are highly effective for a
rigorous comparison of IDSs.
Chapter 4
Mathematical Basis for Sensor Fusion
Mathematics possesses not only truth, but supreme beauty - a beauty cold and
austere like that of sculpture, and capable of stern perfection, such as only great
art can show.
Bertrand Russell
4.1 Introduction
Chapters two and three established the issues and the limitations of a
single IDS, respectively. Sensor fusion was identified as a viable solution for
enhancing the performance of IDSs. The primary objective of the proposed thesis
is to develop a theoretical and practical basis for enhancing the performance of
intrusion detection systems using advances in sensor fusion with easily available
IDSs. This chapter introduces the mathematical basis for sensor fusion in
order to provide enough support for the acceptability of sensor fusion in intrusion
detection applications. Clearly, sensor fusion for the performance enhancement
of IDSs requires very complex observations, combinations of decisions
and inferences via scenarios and models. The basic problem involves selecting
IDSs and choosing the appropriate sensor fusion algorithms that provide sufficient
enhancement in the performance of the fused IDS. Although fusion in the
context of enhancing the intrusion detection performance has been discussed
earlier in the literature, there is still a lack of theoretical analysis and understanding,
particularly with respect to the correlation of detector decisions. In this chapter,
we formulate the problem of fusion of multiple heterogeneous IDSs and examine
whether an improvement in performance can be achieved through sensor
fusion. This chapter describes the central concept underlying the work and a
theme that ties together all the arguments in this work. It provides an answer to
the questions posed in the introduction at a conceptual level.
With a precise understanding as to why, when, and how particular sensor fusion
methods can be applied successfully, progress can be made towards a powerful
new tool for intrusion detection: the ability to automatically exploit the
strengths and weaknesses of different IDSs. The theoretical modeling is undertaken
initially without any knowledge of the available detectors or the monitoring
data. The empirical evaluation that augments the mathematical analysis is
illustrated in chapters five to eight using two data sets: 1) the real-world
network traffic and 2) the DARPA 1999 data set. The results in those chapters
confirm the analytical findings in this chapter.
This chapter is organized as follows: Section 4.2 discusses the sensor fusion
algorithms. Section 4.3 and section 4.4 survey the related work in sensor fusion
and the related work of sensor fusion in intrusion detection applications,
respectively. Section 4.5 includes the theoretical formulation and section 4.6
the solution approaches in intrusion detection applications using sensor fusion.
The chapter is summarized in section 4.7.
4.2 Sensor fusion algorithms
In this section, we provide a state-of-the-art review in the area of intrusion
detection based on sensor fusion approaches. This section aims to help choose the
appropriate sensor fusion algorithm for any given data set by making it easy to
compare the utility of different sensor fusion algorithms on the specific data set
of interest. Several approaches have been proposed for sensor fusion, such as the
weighted average, fuzzy logic, neural networks, Bayesian inference and probability
techniques, Dempster-Shafer evidence theory and Kalman filters. Intrusion
detection using machine learning algorithms has the advantage of identifying
new or unknown data or signals that the machine learning system was not aware
of during training. We also investigate and compare the performance achieved
by different machine learning algorithms for sensor fusion, namely the statistical
approaches, Artificial Neural Networks (ANN), Radial Basis Functions (RBF),
Support Vector Machines (SVM) and Naive Bayes (NB) trees. The conditions
under which each of these techniques operates efficiently are identified and the
detection effectiveness of these strategies is compared.
4.2.1 Machine Learning for intrusion detection
Machine learning in intrusion detection is a problem that has been researched for
the last 12 years. The most prominent works on data mining for intrusion detection have
been conducted at the University of New Mexico (S. Forrest and S. A. Hofmeyr),
Purdue University (T. Lane and C. E. Brodley), Reliable Software Technologies
(A. K. Ghosh, A. Schwartzbard, and M. Schatz), the University of Minnesota (V.
Kumar, P. Dokas, L. Ertoz, and A. Lazarevic), Columbia University (S. Stolfo
and E. Eskin), North Carolina State University (W. Lee), the Florida Institute of
Technology (P. Chan and M. Mahoney), George Mason University (S. Jajodia,
D. Barbara, and N. Wu), and Arizona State University (Nong Ye). Of course, this
list is not exhaustive.
It is quite intriguing that both Bagging and Boosting worked quite badly with
the DARPA data set. We are using the classification method to solve an artificial
classification problem to which we have reduced the original outlier detection
problem. This reduced classification problem tends to be highly noisy because
the artificial examples are in the background of the real ones. As is known
in the literature, Boosting tends to work poorly in the presence of high noise
because it puts too much weight on the incorrectly labeled examples.
Statistical approaches
Statistical approaches are mostly based on modeling the data based on its
statistical properties and using this information to estimate whether a test sample
comes from the same distribution or not. The simplest approach can be based
on constructing a density function for data of a known class, and then, assuming
that the data is normal, computing the probability of a test sample belonging
to that class. The probability estimate can be thresholded to signal an intrusion.
Two main approaches exist for the estimation of the probability density function:
parametric and non-parametric methods. The parametric approach assumes that
the data comes from a family of known distributions, such as the normal
distribution, and certain parameters are calculated to fit this distribution. However, in
most real-world situations the underlying distribution of the data is not known,
and hence such techniques have little practical importance. In non-parametric
methods the overall form of the density function is derived from the data, as well
as the parameters of the model. As a result, non-parametric methods give greater
flexibility in general systems.
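A minimal parametric sketch of this idea, assuming a single numeric traffic feature and a Gaussian model of the normal class (all values below are made up for illustration):

```python
import math
import statistics

def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Fit the parametric model on values of a feature observed in normal traffic.
train = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9]
mu, sigma = statistics.mean(train), statistics.stdev(train)

# Threshold the density: anything less likely than a 3-sigma point is flagged.
threshold = gaussian_pdf(mu + 3 * sigma, mu, sigma)

def is_anomalous(x):
    return gaussian_pdf(x, mu, sigma) < threshold

print(is_anomalous(10.1), is_anomalous(14.0))  # → False True
```

The same thresholding idea carries over to non-parametric density estimates; only the way the density is obtained changes.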
Neural networks
Some issues for sensor fusion, such as the ability to generalize, the computational
expense during training and the further expense when retraining is needed, are
critical to neural networks in comparison to statistical methods. A subjective
view supports the use of neural networks for sensor fusion in order to achieve
novelty detection in intrusion detection applications. The neural networks gain
experience by training the system to correctly identify the preselected examples
of the problem. The back-propagation algorithm can be used in the learning
phase to adapt the weights of the neural network. The computational complexity
of neural networks has always been an important consideration for practical
applications. One important consideration with neural networks is that they
cannot be as easily retrained as statistical models. Retraining is done when new
class data is to be added to the training set or when the training data no longer
reflects the environmental conditions.
Support Vector Machines
The support vector machine (SVM) is a supervised classification system that
minimizes an upper bound on its expected error. It attempts to find the hyperplane
separating two classes of data that will generalize best to future data. Such
a hyperplane is the so-called maximum margin hyperplane, which maximizes
the distance to the closest points from each class. Generally, SVMs work well
when the number of features is orders of magnitude higher than the number of
available training samples. They also avoid two problems of dimensionality:
they generalize well to unseen data, and they are efficient as they avoid the
explicit use of higher-order dimensional spaces.
Bayesian classifiers
A Bayes estimator is an estimator or decision rule that maximizes the posterior
expected value of a utility function or, equivalently, minimizes the posterior
expected value of a loss function. The estimator which minimizes the posterior
expected loss also minimizes the Bayes risk, such as the mean squared error, and
is therefore a Bayes estimator.
Decision tree and Naive Bayes
Decision trees (D-trees) dominate SVMs, which in turn dominate NB, in both
precision and recall values. However, D-trees show a much larger fluctuation
in accuracy in the initial stages. This is to be expected because decision trees
are known to be unstable classifiers. SVMs are better in the initial stages of
active learning, when the training data is small, but they lose out later. SVMs are
known to excel in accuracy, but the uncertainty value, measured as the distance
from the SVM separator, is perhaps not too meaningful. D-trees turn out to be
better in the combined metric.
An intuitive method for measuring uncertainty for separator-based classifiers
like SVMs is to make it inversely proportional to the distance of the instance
from the separator. Similarly, for Bayesian classifiers, the posterior probabilities
of the classes can be used as an estimate of certainty. For decision trees,
uncertainty is typically derived from the error of the leaf into which the instance falls.
NB tree
The complementary behavior of NB and the D-trees has given rise to their hybrid,
which outperforms most of the earlier methods for the intrusion detection
application.
4.2.2 Evidence Theory
The Dempster-Shafer (DS) method is a powerful tool that can deal with subjective
hypotheses for evidence as well as statistical data combination. The DS
method does not have the requirement, as the Bayesian approach does, that the
sensor set be predefined and the sensors' joint observation probability distribution
be known beforehand. The DS rule corresponds to a conjunction operator: it
builds the belief induced by accepting two pieces of evidence, i.e., by accepting
their conjunction. Shafer developed the DS theory of evidence based on the
model that all the hypotheses in the frame of discernment (FoD) are exclusive
and that the frame is exhaustive, which is true of the decisions of multiple IDSs
that are to be fused.
The DS method infers the true state of the system without having an explicit
model of the system. It is based only on some observations that can be considered
as hints (with some uncertainty) towards some system states. DS theory makes
the distinction between uncertainty and ignorance, so it is a very useful way to
reason with uncertainty based on incomplete and possibly contradictory
information extracted from a stochastic environment.
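Dempster's rule of combination can be sketched for two IDS opinions over the frame {attack, normal}; the mass assigned to the whole frame ('theta' below) represents ignorance. The mass values are assumed for illustration.

```python
# Dempster's rule of combination over the frame of discernment {attack, normal}.
def combine(m1, m2):
    def meet(a, b):
        # Intersection of focal elements; 'theta' is the whole frame.
        if a == 'theta':
            return b
        if b == 'theta':
            return a
        return a if a == b else None  # None: empty intersection (conflict)

    combined = {'attack': 0.0, 'normal': 0.0, 'theta': 0.0}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = meet(a, b)
            if inter is None:
                conflict += wa * wb
            else:
                combined[inter] += wa * wb
    # Normalize by the non-conflicting mass (Dempster's normalization).
    return {h: v / (1 - conflict) for h, v in combined.items()}

ids1 = {'attack': 0.6, 'normal': 0.1, 'theta': 0.3}  # assumed beliefs of IDS 1
ids2 = {'attack': 0.5, 'normal': 0.2, 'theta': 0.3}  # assumed beliefs of IDS 2
fused = combine(ids1, ids2)
print(round(fused['attack'], 3))  # → 0.759
```

Two moderately confident, agreeing sensors yield a fused belief in 'attack' higher than either alone, which is exactly the reinforcement the conjunction operator provides.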
4.2.3 Kalman filter
The Kalman filter is an efficient recursive filter that estimates the state of a
dynamic system from a series of noisy measurements. It is a linear system in which
the mean squared error between the desired output and the actual output is
minimized when the input is a random signal generated by white noise.
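The recursion can be illustrated in one dimension, estimating a constant state from noisy measurements; the measurement values and variances below are assumptions for illustration.

```python
# One-dimensional Kalman filter for a constant (static) state.
def kalman_1d(measurements, meas_var, init_est=0.0, init_var=1.0):
    est, var = init_est, init_var
    for z in measurements:
        gain = var / (var + meas_var)   # Kalman gain
        est = est + gain * (z - est)    # correct the estimate with the innovation
        var = (1 - gain) * var          # the estimate variance shrinks each step
    return est, var

est, var = kalman_1d([5.1, 4.9, 5.2, 4.8], meas_var=0.25)
print(round(est, 2))  # → 4.71
```

Each measurement pulls the estimate toward the observations while the variance of the estimate decreases, which is the recursive noise-reduction property referred to above.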
4.2.4 Bayesian network
A Bayesian network, or belief network, is a probabilistic graphical model that
represents a set of variables and their probabilistic independencies. Formally,
Bayesian networks are directed acyclic graphs whose nodes represent variables,
and whose missing edges encode conditional independencies between the
variables. Nodes can represent any kind of variable. Efficient algorithms exist that
perform inference and learning in Bayesian networks.
4.3 Related work in sensor fusion
Blum [81] suggests that analytical studies of fusion performance can augment
existing experimental studies by addressing some aspects that are difficult to
study using experimental methods. In his work, an estimation theory approach
is employed, using a mathematical model based on the observation that each
different sensor can provide a different quality when viewing a given object in the
scene.
Dasarathy [82] considers a generalized input-output (I/O) descriptor-pair-based
characterization of the sensor fusion process. The fusion system design
philosophy expounded in his work is that an exhaustive exploitation of the sensor
fusion potential should explore fusion under all of the different I/O-based fusion
modes conceivable under such a characterization. Fusion system architectures
designed to permit such exploitation offer the requisite flexibility for developing
the most effective fusion system designs for a given application.
Cohen et al. [83] present a method for evaluating sensor fusion algorithms
based on a quantitative comparison which is independent of the data acquired
and the sensors used. The sensor fusion performance measures and performance
analysis procedure provide a basis for modeling, analyzing, experimenting with,
and comparing different sensor fusion algorithms. The statistical analysis provides
a systematic method for comparing sensor fusion algorithms. Quantitative
procedures are developed to ensure that specific environmental conditions do not
influence the evaluation.
Iyengar and Brooks in their book [77] comment that understanding multi-sensor
fusion helps in achieving the most sophisticated way to deliver accurate
real-world data to computer systems. Li et al. [84] compare three different fusion
rules for an arbitrary number of sensors, with complete, incomplete or no prior
information about the estimate. Krogh and Vedelsby [85] prove that at a single
data point the quadratic error of the ensemble estimator is guaranteed to be less
than or equal to the average quadratic error of the component estimators. The
qualitative benefits of sensor fusion, like the increased confidence of detection,
improved system reliability and reduced ambiguity of inferences, result in
operational advantages. The uncertainty of the estimated detection on sensor fusion
is also expected to be smaller than the uncertainty of the individual detection
alone. Thus the qualitative benefits in turn give a complete and detailed description
of the improved performance with sensor fusion.
Hall and McMullen [86] state that if the tactical rules of detection require that
a particular certainty threshold must be exceeded for attack detection, then the
fused decision result provides an added detection up to 25% greater than the
detection at which any individual IDS alone exceeds the threshold. This added
detection equates to increased tactical options and to an improved probability
of true negatives [86]. Another attempt to illustrate the quantitative benefit of
sensor fusion is provided by Nahin and Pokoski [87]. Their work demonstrates
the benefits of multisensor fusion and their results also provide some conceptual
rules of thumb.
Chair and Varshney [35] present an optimal data fusion structure for a distributed
sensor network, which minimizes the cumulative average risk. The structure
weights the individual decisions depending on the reliability of the sensor. The
weights are functions of the probability of false alarm and the probability of
detection. The maximum a posteriori (MAP) test or the Likelihood Ratio (LR)
test requires either exact knowledge of the a priori probabilities of the tested
hypotheses or the assumption that all the hypotheses are equally likely. This
limitation is overcome in the work of Thomopoulos et al. [88], who use the
Neyman-Pearson test to derive an optimal decision fusion. Baek
and Bommareddy [89] present optimal decision rules for problems involving n
distributed sensors and m target classes.
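The flavor of the Chair-Varshney structure can be sketched as a weighted sum of binary local decisions, with weights derived from each sensor's detection and false alarm probabilities; the probabilities below are assumed values, and equal priors are used.

```python
import math

# Chair-Varshney style fusion of binary local decisions u_i in {0, 1}.
def fuse(decisions, pd, pf, prior_attack=0.5):
    s = math.log(prior_attack / (1 - prior_attack))
    for u, d, f in zip(decisions, pd, pf):
        # Reliable sensors (high Pd, low Pf) get large weights.
        s += math.log(d / f) if u == 1 else math.log((1 - d) / (1 - f))
    return 1 if s > 0 else 0

# Two reliable sensors alert while a weak one stays silent: fused alarm.
print(fuse([1, 1, 0], pd=[0.9, 0.8, 0.5], pf=[0.01, 0.05, 0.2]))  # → 1
```

The reliable sensors' agreement outweighs the weak sensor's silence, which is precisely the reliability weighting described above.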
Aalo and Viswanathan [90] perform numerical simulations of the correlation
problems to study the effect of error correlation on the performance of
distributed detection systems. The system performance is shown to deteriorate
when the correlation between the sensor errors is positive and increasing, while
the performance improves considerably when the correlation is negative and
increasing. Drakopoulos and Lee [91] derive an optimum fusion rule for the
Neyman-Pearson criterion, and use simulation to study its performance for a
specific type of correlation matrix. Kam et al. [92] consider the case in which
the class-conditioned sensor-to-sensor correlation coefficients are known, and
express the result in compact form. Their approach is a generalization of the
method adopted by Chair and Varshney [35] for solving the data fusion problem
for fixed binary local detectors with statistically independent decisions. Kam et
al. [92] use the Bahadur-Lazarsfeld expansion of the probability density functions.
Blum and Kassam [93] study the problem of locally most powerful detection
for correlated local decisions. The approach for optimal data fusion for
individual decisions that are correlated is in terms of the conditional correlation
coefficients of all orders.
4.4 Related work using sensor fusion in intrusion detection
application
Siaterlis and Maglaris [79] present the use of data fusion in the field of DoS
anomaly detection. The Dempster-Shafer theory of evidence is used as the
mathematical foundation for the development of a novel DoS detection engine.
The detection engine is evaluated using real network traffic. Tim Bass [94]
presents a framework to improve the performance of intrusion detection systems
based on data fusion. A few first steps towards developing the engineering
requirements, using the art and science of multi-sensor data fusion as an underlying
model, are provided in his work. Giacinto et al. [95] propose an approach
to intrusion detection based on the fusion of multiple classifiers. With each member
of the classifier ensemble trained on a distinct feature representation of patterns,
the individual results are combined using a number of fixed and trainable fusion
rules.
Didaci et al. [28] attempt the formulation of the intrusion detection problem
as a pattern recognition task using a data fusion approach based on multiple
classifiers. Their work confirms that the combination reduces the overall error rate,
but may also reduce the generalization capabilities. Wang et al. [96] in their
work bring out the superiority of data fusion technology applied to intrusion
detection systems. The method uses information collected from the network
and host agents and the application of the Dempster-Shafer theory of evidence.
Another work incorporating the Dempster-Shafer theory of evidence is by Hu et
al. [97]. The Dempster-Shafer theory of evidence in data fusion is observed to
solve the problem of how to analyze the uncertainty in a quantitative way. In the
evaluation, the ingoing and outgoing traffic ratio and the service rate are selected as
the detection metrics, and prior knowledge in the DDoS domain is used
to assign probability to evidence.
Siraj et al. [98] discuss a Decision Engine for an Intelligent Intrusion Detection
System (IIDS) that fuses information from different intrusion detection sensors
using an artificial intelligence technique. The Decision Engine uses Fuzzy
Cognitive Maps (FCMs) and fuzzy rule-bases for causal knowledge acquisition and
to support the causal knowledge reasoning process. Thomopoulos, in one of his
works [88], concludes that, with the individual sensors being independent, the
optimal decision scheme that maximizes the probability of detection at the fusion
for a fixed false alarm probability consists of a Neyman-Pearson test at the
fusion unit and likelihood ratio tests at the sensors. In the work of Lee et al.
[51] they note that the best way to make intrusion detection models adaptive is
by combining existing models with new models trained on new intrusion data
or new normal data. In that work, they combined the rule sets that were inductively
generated on separate days to produce a more accurate composite rule set.
Other somewhat, albeit distantly, related works are the alarm clustering method
by Perdisci et al. [99], the aggregation of alerts by Valdes et al. [100], the
combination of alerts into scenarios by Dain et al. [101], the alert correlation by
Cuppens et al. [102], the correlation of intrusion symptoms with an application of
chronicles by Morin et al. [103], and the aggregation and correlation of intrusion-
detection alerts by Debar et al. [104]. The correlation of alerts mainly
groups alerts that are part of the same attack trend and hence completely
avoids duplicate alerts. The aggregation of alerts is based on certain criteria
to aggregate severity levels, reveal trends, and clarify attackers' intentions. The
work of Valeur et al. [105] presents a general correlation model that includes a
comprehensive set of components and a framework based on this model. These
works address the issue of efficiently managing the large number of alerts by
providing a unified description of the alerts from individual IDSs.
Considering the literature on various sensor fusion techniques used for intru-
sion detection applications, it is seen that many machine learning algorithms do
not handle skewed data sets well. To counter the effect of data skewness, either
downsampling the number of normal events or upsampling the number of attack
events is normally done. Sampling the normal set might reduce the information
content and only present a subset of all available normal events, in turn leading
to false positives being reported by the system. This again establishes that
sensor fusion is the only promising approach for the performance enhancement of
IDSs. The mathematical basis of sensor fusion is attempted in the remaining
sections of this chapter.
4.5 Theoretical formulation
Sensor fusion can be defined as the process of collecting information from
multiple and possibly heterogeneous sources and combining it to obtain a
more descriptive, intuitive and meaningful result [79]. The choice of when to
perform the fusion depends on the types of sensor data available and the types
of preprocessing performed by the sensors. The fusion can occur at various
levels:
1. raw data level, prior to feature extraction;
2. feature vector level, prior to identity declaration;
3. decision level, after each sensor has made an independent declaration of
identity.
In data-level fusion, data from individual sensors are fused directly, with
subsequent feature extraction and identity declaration from the fused data. To
perform data-level fusion, the sensors must either be identical or commensurate.
Association is performed on the raw data to ensure that the data being fused
relate to the same object. The identification process proceeds identically to the
process for a single sensor. In feature-level fusion, each sensor observes an
object and feature extraction is performed. The result is a separate feature vector
representing the object from each sensor. An association process must then
be used to sort the feature vectors into meaningful groups. These feature vectors
are fused and an identity declaration is made based on the joint feature vector.
In decision-level fusion, each sensor performs feature extraction to obtain an
independent declaration of identity. Association is then performed to partition
the identity declarations into groups representing observations belonging to the
same observed entity. The associated declarations of identity from each sensor
are subsequently fused.
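A toy sketch of decision-level fusion: each IDS emits an independent binary declaration for the same associated event, and the fusion unit combines the declarations (here with a simple majority vote; any of the fusion rules discussed in this chapter could take its place).

```python
# Decision-level fusion of independent binary IDS declarations by majority vote.
def fuse_decisions(alerts):
    return sum(alerts) > len(alerts) / 2

# Hypothetical declarations from three IDSs for one observed connection.
print(fuse_decisions([1, 0, 1]))  # → True
print(fuse_decisions([0, 0, 1]))  # → False
```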
Sensor fusion is expected to result in both qualitative and quantitative benefits
for the intrusion detection application. The primary aim of sensor fusion
is to detect the intrusion and to make reliable inferences, which may not be
possible with a single sensor alone. The particular quantitative improvement in
estimation that results from using multiple IDSs depends on the performance
of the specific IDSs involved. Thus the fused estimate takes advantage of the
relative strengths of each IDS, resulting in an improved estimate of the intrusion
detection. The error analysis techniques also provide a means for determining
the specific quantitative benefits of sensor fusion in the case of intrusion
detection. The quantitative benefits reveal the phenomena that are likely, rather
than mere chance occurrences.
Consider a single detector decision with multiple error sources. The overall
error estimate is given as:
e_est = ( Σ (component error)^2 )^(1/2)    (4.1)
The overall error estimate of a single detector as given in equation 4.1 is (reasonably) larger than the largest single error source, and it is often dominated by the largest one. On the contrary, when multiple detector decisions are made of the same observation with different detectors, their individual contributions are weighted by the reciprocals of the squares of their individual error estimates, so that the overall error estimate is given as:

$e_{est} = \left( \sum \text{(component error)}^{-2} \right)^{-1/2}$    (4.2)
The overall error estimate of the fused sensor as given in equation 4.2 is (reasonably) less than the smallest individual error estimate, and it is often dominated by the smallest one. The reduction in the overall error estimate clarifies to an extent the need for sensor fusion. While none of these observations is by itself a compelling argument for multi-sensor fusion, they do illustrate that there are both qualitative and quantitative benefits to be derived from sensor fusion.
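The contrast between equations 4.1 and 4.2 can be checked numerically. The sketch below (plain Python, with made-up error values) confirms that the quadrature sum of equation 4.1 exceeds the largest component, while the reciprocal-square combination of equation 4.2 falls below the smallest one:

```python
import math

def single_detector_error(component_errors):
    # Eq. 4.1: error sources of one detector add in quadrature, so the
    # overall estimate exceeds the largest single source.
    return math.sqrt(sum(e ** 2 for e in component_errors))

def fused_error(detector_errors):
    # Eq. 4.2: fused decisions are weighted by reciprocal squared errors,
    # so the overall estimate falls below the smallest individual one.
    return sum(e ** -2 for e in detector_errors) ** -0.5

errors = [0.5, 0.2, 0.1]                  # hypothetical error estimates
print(single_detector_error(errors))      # about 0.548, above max = 0.5
print(fused_error(errors))                # about 0.088, below min = 0.1
```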
In this section, an attempt is made to study the performance of the theoretically best fusion approach using mathematical analysis. The motivation is that, before any empirical evaluation is attempted, it is necessary to establish the acceptability of sensor fusion for the performance enhancement of IDSs. An analytical evaluation is extremely useful when it addresses the problem completely, with sound mathematical and logical concepts. The mathematical analysis of decision fusion thus develops a rational basis that is independent of the particular techniques used. This is later augmented with empirical evaluation.
A system of n sensors $IDS_1, IDS_2, \ldots, IDS_n$ is considered, corresponding to an observation with parameter $x$, $x \in \Re^m$. The sensor $IDS_i$ yields an output $s_i$, $s_i \in \Re^m$, according to an unknown probability distribution $p_i$. The decision of each individual IDS that takes part in the fusion is expected to depend on the input, and hence the output of $IDS_i$ in response to the input $x^j$ can be written more specifically as $s_i^j$. A successful operation of a multiple sensor system critically depends on the methods that combine the outputs of the sensors, where the errors introduced by the various individual sensors are unknown and not controllable. With such a fusion system available, the fusion rule for the system has to be obtained. The problem is to estimate a fusion rule $f : \Re^{nm} \rightarrow \Re^m$, independent of the sample or of the individual detectors that take part in the fusion, such that the expected square error is minimized over a family of fusion rules.
To perform the theoretical analysis, it is necessary to model the process under consideration. Consider a simple fusion architecture as given in Fig. 4.1, with n individual IDSs combined by means of a fusion unit. To start with, consider a two-class problem with the detectors responding in a binary manner. Each local detector collects an observation $x^j \in \Re^m$ and transforms it to a local decision $s_i^j \in \{0, 1\}$, $i = 1, 2, \ldots, n$, where the decision is 0 when the traffic is detected as normal and 1 otherwise. Thus $s_i^j$ is the response of the $i$th detector to a network connection belonging to class $j \in \{0, 1\}$, where the classes correspond to normal traffic and attack traffic respectively. These local decisions $s_i^j$ are fed to the fusion unit to produce a unanimous decision $s^j$, which is supposed to
minimize the overall cost of misclassification and improve the overall detection rate.

[Figure 4.1: Fusion architecture with decisions from n IDSs. The input x is observed by IDS1, IDS2, ..., IDSn, whose local decisions s1, s2, ..., sn are combined by the fusion unit into the output y.]

The fundamental problem of network intrusion detection can be viewed
as a detection task to decide whether a network connection x is a normal one or an attack. Assume a set of unknown features $e = \{e_1, e_2, \ldots, e_m\}$ that are used to characterize the network traffic. The feature extractor is given by $e_f(x) \subseteq e$. It is assumed that this observed variable has a deterministic component and a random component, and that their relation is additive. The deterministic component is due to the fact that the class is discrete in nature, i.e., during detection, it is known that the connection is either normal or an attack. The imprecise component is due to some random processes, which in turn affect the quality of the extracted features. Indeed, it has a distribution governed by the extracted feature set, often in a nonlinear way. By ignoring the source of distortion in the extracted network features $e_f(x)$, it is assumed that the noise component is random (while in fact it may not be, if it were possible to systematically incorporate all possible variations into the base-expert model).
In a statistical framework, the probability that x is identified as normal or as attack after a detector $s_{\theta}$ observes the network connection can be written as:

$s_i = s_{\theta_i}(e_f(x))$    (4.3)

where x is the sniffed network traffic, $e_f$ is a feature extractor, and $\theta_i$ is a set of parameters associated with the sensor indexed i. There exist several types of intrusion detectors, all of which can be represented by equation 4.3. Sensor fusion results in the combination of data from sensors competent on partially overlapping frames. The output of a fusion system is characterized by a variable s, which is a function of the uncertain variables $s_1, \ldots, s_n$, being the outputs of the
individual IDSs, and given as:

$s = f(s_1, \ldots, s_n)$    (4.4)
where f(.) corresponds to the fusion function. The independent variables (i.e., information about any group of variables does not change the belief about the others) $s_1, \ldots, s_n$ are imprecise and dependent on the class of the observation, and hence given as:

$s^j = f(s_1^j, \ldots, s_n^j)$    (4.5)

where j refers to the class of the observation.
The variance of the IDSs determines the average quality when each IDS acts individually; lower variance corresponds to better performance. The covariance among detectors measures their dependence: the greater the dependence, the smaller the gain obtained from fusion. Let us consider two cases here. In the first case, for each access, n responses are available and are used independently of each other. The average of the variances of $s_i^j$ over all $i = 1, 2, \ldots, n$, denoted as $(\sigma_{av}^j)^2$, is given as:

$(\sigma_{av}^j)^2 = \frac{1}{n} \sum_{i=1}^{n} (\sigma_i^j)^2$    (4.6)
In the second case, all n responses are used together and are combined using the mean operator; the variance over many accesses, denoted as $(\sigma_{fusion}^j)^2$, is called the variance of the average and can be calculated as follows:

$(\sigma_{fusion}^j)^2 = \frac{1}{n^2} \sum_{i=1}^{n} (\sigma_i^j)^2 + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{k=1, k \neq i}^{n} \rho_{i,k}^j \, \sigma_i^j \sigma_k^j = \frac{1}{n} (\sigma_{av}^j)^2 + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{k=1, k \neq i}^{n} \rho_{i,k}^j \, \sigma_i^j \sigma_k^j$    (4.7)

where $\rho_{i,k}^j$ is the correlation coefficient between the ith and kth detectors, with j taking the different class values. The first term is the average variance of the base-experts, while the second term contains the covariance between the ith and kth
detectors for $i \neq k$; the term $\rho_{i,k}^j \, \sigma_i^j \sigma_k^j$ is by definition the covariance. On analysis, it is seen that:

$(\sigma_{fusion}^j)^2 \leq (\sigma_{av}^j)^2$    (4.8)
It can be observed that the resultant variance of the final score will be reduced with respect to the average variance of the original scores when detector scores are merged by a simple mean operator. Since $0 \leq \rho_{i,k}^j \leq 1$,

$\frac{1}{n} (\sigma_{av}^j)^2 \leq (\sigma_{fusion}^j)^2$    (4.9)
Equations 4.8 and 4.9 give the upper and lower bounds on $(\sigma_{fusion}^j)^2$, attained with complete correlation and with no correlation, respectively. Any positive correlation results in a variance between these bounds. Hence, by combining responses using the mean operator, the resultant variance is assured to be smaller than the average (not the minimum) variance. Fusion of the scores reduces the variance, which in turn results in a reduction of error (with respect to the case where scores are used separately). To measure explicitly the factor of reduction in variance:

$\frac{1}{n} (\sigma_{av}^j)^2 \leq (\sigma_{fusion}^j)^2 \leq (\sigma_{av}^j)^2$    (4.10)

The factor of reduction in variance is $v_r = (\sigma_{av}^j)^2 / (\sigma_{fusion}^j)^2$, with $1 \leq v_r \leq n$.
This clearly indicates that the reduction in variance grows as more detectors are used: the larger n is, the better the combined system will be, even if the hypotheses of the underlying IDSs are correlated. This comes at a cost of increased computation, proportional to the value of n. The reduction in variance of the individual classes results in less overlap between the class distributions. Thus the chance of error reduces, which in turn results in improved detection. This forms the argument in this work for why fusion using multiple detectors works for the intrusion detection application. Experimental results provide strong evidence to support this claim.
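The bounds of equations 4.8 through 4.10 can be illustrated with a small numeric sketch (plain Python; the standard deviations and the assumption of a common correlation coefficient between all detector pairs are hypothetical). It evaluates the fused variance of equation 4.7 and checks that it lies between $(\sigma_{av})^2/n$ and $(\sigma_{av})^2$:

```python
def fused_variance(sigmas, rho):
    # Eq. 4.7 with a common correlation coefficient rho between detectors:
    # variance of the mean = (average variance)/n + pairwise covariances/n^2.
    n = len(sigmas)
    avg_var = sum(s ** 2 for s in sigmas) / n
    cross = sum(rho * sigmas[i] * sigmas[k]
                for i in range(n) for k in range(n) if i != k)
    return avg_var / n + cross / n ** 2

sigmas = [1.0, 1.2, 0.8]                       # hypothetical per-IDS std devs
avg_var = sum(s ** 2 for s in sigmas) / len(sigmas)
for rho in (0.0, 0.3, 1.0):
    v = fused_variance(sigmas, rho)
    # Eq. 4.10: avg_var/n <= fused variance <= avg_var.
    assert avg_var / len(sigmas) <= v <= avg_var + 1e-12
```

At rho = 0 the fused variance sits on the lower bound, and as rho rises toward 1 it climbs toward the average variance, mirroring the claim that correlation erodes the gain from fusion.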
The following common possibilities encountered when combining two detectors are analyzed:
1. combining two uncorrelated experts with very different performances;
2. combining two highly correlated experts with very different performances;
3. combining two uncorrelated experts with very similar performances;
4. combining two highly correlated experts with very similar performances.
Fusing IDSs of similar and of different performances is encountered in almost all practical fusion problems. Considering the first case, without loss of generality it can be assumed that system 1 is better than system 2, i.e., $\sigma_1 < \sigma_2$ and $\rho = 0$. Hence, for the combination to be better than the best system, i.e., system 1, it is required that:

$(\sigma_{fusion}^j)^2 < (\sigma_1^j)^2$

$\frac{(\sigma_1^j)^2 + (\sigma_2^j)^2 + 2 \rho \, \sigma_1^j \sigma_2^j}{4} < (\sigma_1^j)^2$

$(\sigma_2^j)^2 < 3 (\sigma_1^j)^2 - 2 \rho \, \sigma_1^j \sigma_2^j$
The covariance is zero in general for cases 1 and 3. Hence, the combined system will benefit from the fusion when the variance of one detector, $(\sigma_2^j)^2$, is less than 3 times the variance of the other, $(\sigma_1^j)^2$, since $\rho = 0$. Furthermore, correlation [or equivalently covariance; one is proportional to the other] between the two systems penalizes this margin of $3(\sigma_1^j)^2$. This is particularly true for the second case, since $\rho > 0$. Also, it should be noted that $\rho < 0$ (which implies negative correlation) could allow for a larger $(\sigma_2^j)^2$. As a result, adding another system that is negatively correlated, but with large variance (hence large error), will improve the fusion ($(\sigma_{fusion}^j)^2 < \frac{1}{n} (\sigma_{av}^j)^2$). Unfortunately, with intrusion detection systems, two systems are either positively correlated or uncorrelated, unless these systems are jointly trained together by algorithms such as negative correlation learning [80]. For a given input, the decisions $s_i$, $i = 1, \ldots, n$, will tend to agree with each other (hence positive correlation) more often than to disagree with each other (hence negative correlation). By fusing scores obtained from IDSs that are trained independently, one can almost be certain that $0 \leq \rho_{i,k} \leq 1$. For
the third and fourth cases, we have $(\sigma_1^j)^2 \approx (\sigma_2^j)^2$. Hence, the condition $(\sigma_2^j)^2 < 3(\sigma_1^j)^2 - 2\rho\,\sigma_1^j \sigma_2^j$ reduces to $\rho < 1$. Note that for the third case, with $\rho \approx 0$, the above constraint is satisfied. Therefore,
fusion will definitely lead to better performance. On the other hand, for the fourth case, where $\rho \approx 1$, fusion may not necessarily lead to better performance.
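The four cases can be exercised with a short sketch (plain Python, hypothetical standard deviations): mean-fusion of two detectors beats the better one exactly when the variance of the weaker one stays under the margin derived above.

```python
def fusion_beats_best(sigma1, sigma2, rho):
    # Variance of the mean of two detectors (sigma1 <= sigma2): fusion beats
    # detector 1 iff the fused variance is below sigma1^2, i.e. iff
    # sigma2^2 < 3*sigma1^2 - 2*rho*sigma1*sigma2.
    fused_var = (sigma1 ** 2 + sigma2 ** 2 + 2 * rho * sigma1 * sigma2) / 4
    return fused_var < sigma1 ** 2

print(fusion_beats_best(1.0, 1.5, 0.0))  # True: case 1, margin respected
print(fusion_beats_best(1.0, 2.0, 0.0))  # False: weaker IDS too noisy
print(fusion_beats_best(1.0, 1.0, 0.0))  # True: case 3, uncorrelated twins
print(fusion_beats_best(1.0, 1.0, 1.0))  # False: case 4, fully correlated
```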
From the above analysis using the mean operator as the fusion rule, the conclusions drawn are the following. The analysis shows that fusing two systems of different performances is not always beneficial: if the weaker IDS has (class-dependent) variance three times larger than the variance of the best IDS, the gain due to fusion breaks down. This is even more true for correlated base-experts, as correlation penalizes this limit further. It is also seen that fusing two uncorrelated IDSs of similar performance always results in improved performance. Finally, fusing two correlated IDSs of similar performance will be beneficial only when the covariance of the two IDSs is less than their variance. It is necessary to show that a lower bound on accuracy results in the case of sensor fusion. This can be proved as below:
Given the fused output as $s = \sum_i w_i s_i$, the quadratic error of a sensor indexed i, $e_i$, and of the fused sensor, $e_{fusion}$, are given by:

$e_i = (s_i - c)^2$    (4.11)

and

$e_{fusion} = (s_{fusion} - c)^2$    (4.12)

respectively, where $w_i$ is the weighting on the ith detector and c is the target. The ambiguity of the sensor is defined as:

$a_i = (s_i - s)^2$    (4.13)
The squared error of the fused sensor is seen to equal the weighted average squared error of the individuals, minus a term which measures the average ambiguity among them. This allows for non-uniform weights (with the constraint $\sum_i w_i = 1$), so the general form of the ensemble output is $s = \sum_i w_i s_i$. The ambiguity of the fused sensor is given as:
$a_{fusion} = \sum_i w_i a_i = \sum_i w_i (s_i - s)^2$
$= \sum_i w_i (s_i - c + c - s)^2$
$= \sum_i w_i \left( (s_i - c) - (s - c) \right)^2$
$= \sum_i w_i \left( (s_i - c)^2 - 2 (s_i - c)(s - c) + (s - c)^2 \right)$
$= \sum_i w_i e_i - 2 (s - c) \sum_i w_i (s_i - c) + (s - c)^2$
$= \sum_i w_i e_i - (s - c)^2$    (4.14)

where the last step uses $\sum_i w_i (s_i - c) = s - c$.
On solving equation 4.14, the error due to the combination of several detectors is obtained as the difference between the weighted average error of the individual detectors and the ambiguity among the fusion member decisions:

$e_{fusion} = \sum_i w_i (s_i - c)^2 - \sum_i w_i (s_i - s)^2$    (4.15)

The ambiguity among the fusion member decisions is always non-negative, and hence the combination of several detectors is expected to be better than the average over the individual detectors. This result turns out to be very important for the focus of this work.
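Equation 4.15 is an exact identity for any weights summing to one; the sketch below (plain Python, with hypothetical detector scores and target) verifies it numerically:

```python
def ambiguity_decomposition(weights, scores, c):
    # Eq. 4.15: fused squared error equals the weighted average individual
    # error minus the weighted ambiguity of the fusion members.
    s = sum(w * si for w, si in zip(weights, scores))            # fused output
    e_fusion = (s - c) ** 2                                      # eq. 4.12
    avg_error = sum(w * (si - c) ** 2 for w, si in zip(weights, scores))
    ambiguity = sum(w * (si - s) ** 2 for w, si in zip(weights, scores))
    return e_fusion, avg_error - ambiguity

e, rhs = ambiguity_decomposition([0.5, 0.3, 0.2], [0.9, 0.4, 0.7], 1.0)
assert abs(e - rhs) < 1e-12   # the identity holds exactly
# Since the ambiguity is non-negative, the fused error never exceeds the
# weighted average error of the individual detectors.
assert e <= sum(w * (si - 1.0) ** 2
                for w, si in zip([0.5, 0.3, 0.2], [0.9, 0.4, 0.7]))
```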
4.6 Solution approaches
In the case of the fusion problem, the solution approaches depend on whether there is any knowledge regarding the traffic and the intrusion detectors. This section initially considers no knowledge of the IDSs and the intrusion detection data, and later considers the case where the available IDSs and the evaluation data set are known.
There is an arsenal of different theories of uncertainty and methods based on
these theories for making decisions under uncertainty. There is no consensus as
to which method is most suitable for problems with epistemic uncertainty, when
information is scarce and imprecise. The choice of heterogeneous detectors is
expected to result in decisions that conflict or that are in consensus, completely or partially. The detectors can be categorized by their output $s_i$: a probability (within the range [0, 1]), a Basic Probability Assignment (BPA) m (within the range [0, 1]), a membership function (within the range [0, 1]), a distance metric (greater than or equal to zero), or a log-likelihood ratio (a real number).
Consider a body of evidence (F, m), where F represents the set of all focal elements and m their corresponding basic probability assignments. The above analysis, which attempts to establish the acceptability of sensor fusion for improving intrusion detection performance without any knowledge about the system or the data, is unlimited in scope. With such an analysis favoring the use of sensor fusion in enhancing the performance of IDSs, the Dempster-Shafer fusion operator is used, since it is well suited to intrusion detection applications. Dempster-Shafer theory considers two types of uncertainty: 1) that due to imprecision and 2) that due to conflict in the evidence. Non-specificity and strife measure the uncertainty due to imprecision and conflict, respectively. The larger the focal elements of a body of evidence, the more imprecise the evidence and, consequently, the higher the non-specificity. When the evidence is precise (all the focal elements consist of a single member), non-specificity is zero. In the challenge problems, the broader the interval of the experts, the higher the non-specificity. Strife measures the degree to which pieces of evidence contradict each other. Consonant (nested) focal elements imply little or no conflict. Disjoint elements imply high conflict in the evidence. For example, if the experts' intervals are disjoint, the experts contradict each other and strife is large. For finite sets, when the evidence is precise, strife reduces to Shannon's entropy, which measures conflict in probability theory. Non-specificity measures the epistemic (reducible) uncertainty, the uncertainty associated with the sizes (cardinalities) of the relevant sets of alternatives.

It is required to model the uncertainty in the independent variables, to derive a model of the uncertainty in the performance variable s, and to assess the performance enhancement of the fusion system. This is attempted in the next section. The importance of Dempster-Shafer theory in intrusion detection
is that, in order to track statistics, it is necessary to model the distribution of decisions. If these decisions are probabilistic assignments over the set of labels, then the distribution function will be too complicated to retain precisely. The Dempster-Shafer theory of evidence solves this problem by simplifying the opinions to Boolean decisions, so that each detector decision lies in a space having $2^{|\Theta|}$ elements, where $\Theta$ defines the working space or Frame of Discernment (FoD). In this way, the full set of statistics can be specified using $2^{|\Theta|}$ values.
4.6.1 Dempster-Shafer combination method
Dempster-Shafer (DS) theory is required to model the situation in which a classification algorithm cannot classify a target or cannot exhaustively list all of the classes to which it could belong. This is most applicable in the case of unknown or novel attacks, or the case of zero a priori knowledge of the data distribution. DS theory does not attempt to formalize the emergence of novelties, but it is a suitable framework for reconstructing the formation of beliefs when novelties appear. An application of decision making in the field of intrusion detection illustrates the potential of DS theory, as well as its shortcomings.

The DS rule corresponds to a conjunction operator, since it builds the belief induced by accepting two pieces of evidence, i.e., by accepting their conjunction. Shafer developed the DS theory of evidence based on the model that all the hypotheses in the FoD are exclusive and the frame is exhaustive. The purpose is to combine/aggregate several independent and equally reliable sources of evidence expressing their belief on the set. The Dempster-Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information (evidence) to calculate the probability of an event. Fusion should result in a reasoned judgment over the decisions, and not merely a response that aggregates the decisions. The aim of using the DS theory of fusion is that, with any set of decisions from heterogeneous detectors, sensor fusion can be modeled as utility maximization.
DS theory of combination conceives novel categories that classify empirical evidence in a novel way and, possibly, are better able to discriminate the relevant aspects of emergent phenomena. Novel categories detect novel empirical evidence, which may be fragmentary, irrelevant, contradictory, or supportive of particular hypotheses. The DS theory approach for quantifying the uncertainty in the performance of a detector and assessing the improvement in system performance consists of three steps:

1. Model uncertainty by considering each variable separately; then derive a model that considers all variables together.
2. Propagate uncertainty through the system, which results in a model of uncertainty in the performance of the system.
3. Assess the system performance enhancement.

In the case of Dempster-Shafer theory, the FoD is expected to contain all propositions of which the information sources (IDSs) can provide evidence. When a proposition corresponds to a subset of a frame of discernment, it is said that the frame discerns that proposition. The elements of the frame of discernment are assumed to be exclusive propositions. This is a constraint which is always satisfied in the intrusion detection application, because of the discrete nature of the detector decision. The belief in the likelihood of the traffic being in an anomalous state is expressed by the various IDSs by assigning mass to the subsets of the FoD.
The DS theory is a generalization of the classical probability theory, with its additivity axiom excluded or modified. The probability mass function (p) is a mapping which indicates how the probability mass is assigned to the elements. The Basic Probability Assignment (BPA) function (m), on the other hand, is the set mapping, and the two can be related for all $A \subseteq \Theta$ as $m(A) = \sum_{B \subseteq A} p(B)$; hence m(A) relates to a belief structure. The mass m is very near to the probabilistic mass p, except that it is shared not only by the single hypotheses but also by the unions of the hypotheses. In DS theory, rather than knowing exactly how the probability is distributed to each element $B \in \Theta$, we just know by the BPA function m that a certain quantity of probability mass is somehow divided among the focal elements. Because of this less specific knowledge about the allocation of the probability mass, it is difficult to assign exactly the probability associated with the subsets of the FoD; instead, two measures are assigned, (1) the belief (Bel) and (2) the plausibility (Pl), which correspond to the lower and upper bounds on the probability:

$Bel(A) \leq p(A) \leq Pl(A)$

where the belief function, Bel(A), measures the minimum uncertainty value about proposition A, and the plausibility, Pl(A), reflects the maximum uncertainty value about proposition A.
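As a minimal sketch (plain Python, with a hypothetical BPA over a two-element FoD of normal and attack), belief sums the mass of focal elements contained in A, while plausibility sums the mass of focal elements intersecting A:

```python
def belief(m, A):
    # Bel(A): total mass committed to subsets of A.
    return sum(v for B, v in m.items() if B <= A)

def plausibility(m, A):
    # Pl(A): total mass not contradicting A (focal elements meeting A).
    return sum(v for B, v in m.items() if B & A)

# Hypothetical BPA: 0.2 is uncommitted mass on the whole FoD, m(Theta).
m = {frozenset({"attack"}): 0.5,
     frozenset({"normal"}): 0.3,
     frozenset({"attack", "normal"}): 0.2}
A = frozenset({"attack"})
print(belief(m, A), plausibility(m, A))  # 0.5 0.7, bracketing p(A)
```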
The following are the key assumptions made with the fusion of intrusion detectors:

- If some of the detectors are imprecise, the uncertainty about an event can be quantified by the maximum and minimum probabilities of that event. The maximum (minimum) probability of an event is the maximum (minimum) of all probabilities that are consistent with the available evidence.
- The process of asking an IDS about an uncertain variable is a random experiment whose outcome can be precise or imprecise. There is randomness because every time a different IDS observes the variable, a different decision can be expected. The IDS can be precise and provide a single value, or imprecise and provide an interval. Therefore, if the information about uncertainty consists of intervals from multiple IDSs, then there is uncertainty due to both imprecision and randomness.
If all IDSs are precise, they give pieces of evidence pointing precisely to specific values. In this case, a probability distribution of the variable can be built. However, if the IDSs provide intervals, such a probability distribution cannot be built, because it is not known which specific values of the random variables each piece of evidence supports.
In the case of DS theory, the additivity axiom of probability theory, $p(A) + p(\bar{A}) = 1$, is modified as $m(A) + m(\bar{A}) + m(\Theta) = 1$, with uncertainty introduced by the term $m(\Theta)$. Here m(A) is the mass assigned to A, $m(\bar{A})$ is the mass assigned to all other propositions in the FoD that are not A, and $m(\Theta)$ is the mass assigned to the union of all hypotheses when the detector is ignorant. This clearly explains the advantage of evidence theory in handling uncertainty: the detectors' joint probability distribution is not required.

The equation $Bel(A) + Bel(\bar{A}) = 1$, which is equivalent to $Bel(A) = Pl(A)$, holds for all subsets A of the FoD if and only if Bel's focal points are all singletons. In this case, Bel is an additive probability distribution. Whether normalized or not, the DS method satisfies the two axioms of combination: $0 \leq m(A) \leq 1$ and $\sum_{A \subseteq \Theta} m(A) = 1$. The third axiom, $m(\emptyset) = 0$, is not satisfied by the unnormalized DS method. Also, independence of the evidence is yet another requirement for the DS combination method. The problem is formalized as follows: considering the network traffic, assume a traffic space $\Theta$, which is the union of the different classes, namely, attack and normal. The attack class contains different types of attacks, and the classes are assumed to be mutually exclusive. Each IDS assigns to any traffic sample x the detection of a class, which is an element of the FoD $\Theta$. With n IDSs used for the combination, the decision of each one of the IDSs is considered for the final decision of the fusion IDS.
This chapter presents a method to detect unknown traffic attacks with an increased degree of confidence by making use of a fusion system composed of detectors. Each detector observes the same traffic on the network and detects the attack traffic with an uncertainty index. The frame of discernment consists of singletons that are exclusive ($A_i \cap A_j = \emptyset$, $i \neq j$) and exhaustive, since the FoD consists of all the expected attacks which the individual IDS detects, or else the detector fails to detect by recognizing the traffic as normal. All the constituent IDSs that take part in the fusion are assumed to have a global point of view about the system, rather than being separate detectors introduced to give a specialized opinion about a single hypothesis.
The DS combination rule gives the combined mass m(A) of the two pieces of evidence $m_1$ and $m_2$ on any subset A of the FoD as:

$m(A) = \dfrac{\sum_{X \cap Y = A} m_1(X)\, m_2(Y)}{1 - \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y)}$    (4.16)
The numerator of the Dempster-Shafer combination equation 4.16 represents the influence of the aspects of the second evidence that confirm the first one. The denominator represents the influence of the aspects of the second evidence that contradict the first one. The denominator of equation 4.16 is $1 - k$, where k is the conflict between the two pieces of evidence. This denominator is for normalization, which spreads the resultant uncertainty of any evidence, with a weight factor, over all focal elements and results in an intuitive decision; i.e., the effect of normalization consists of eliminating the conflicting pieces of information between the two sources to combine, consistently with the intersection operator. The Dempster-Shafer rule does not apply if the two pieces of evidence are completely contradictory; it only makes sense if $k < 1$. If the two pieces of evidence are completely contradictory, they can be handled as one single evidence over alternative possibilities, whose BPA must be re-scaled in order to comply with equation 4.16. The Dempster-Shafer rule says that compatible evidence on a possibility must be evaluated as a fraction of the total compatible evidence. The meaning of the rule can be illustrated in the simple case of two pieces of evidence on an observation A. Suppose that one evidence is $m_1(A) = p$, $m_1(\Theta) = 1 - p$ and that the other evidence is $m_2(A) = q$, $m_2(\Theta) = 1 - q$. The total evidence in favor of A is then $1 - (1 - p)(1 - q)$, of which the fraction supported by both bodies of evidence is $pq / \left( 1 - (1 - p)(1 - q) \right)$.
Specifically, if a particular detector indexed i taking part in the fusion has probability of detection $m_i(A)$ for a particular winning class A, fusion is expected to result in a probability m(A) for that class which is greater than $m_i(A)$ for all i and all A. Thus the confidence in detecting a particular winning class is improved, which is the key aim of sensor fusion. Dempster-Shafer theory for sensor fusion thus aids in attaining an increased value of confidence in detection by means of an increased probability of detection of the individual classes. Note that the Dempster-Shafer rule is independent of the order in which the evidence is combined.

The above analysis is simple, since it considers only one class at a time. The variances of the two classes can be merged, the resultant variance being the sum of the normalized variances of the individual classes. Hence, the class label can be dropped.
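A minimal sketch of the combination rule of equation 4.16 (plain Python; the mass functions, class labels, and the 0.7/0.6 values are hypothetical) shows how the fused mass on the winning class exceeds either detector's individual mass:

```python
def ds_combine(m1, m2):
    # Dempster's rule (eq. 4.16): conjunctive combination of two BPAs,
    # normalized by 1 - k, where k is the mass assigned to conflict.
    combined, k = {}, 0.0
    for X, v1 in m1.items():
        for Y, v2 in m2.items():
            inter = X & Y
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                k += v1 * v2            # conflicting mass: X and Y disjoint
    if k >= 1.0:
        raise ValueError("completely contradictory evidence; rule undefined")
    return {A: v / (1.0 - k) for A, v in combined.items()}

FoD = frozenset({"attack", "normal"})
m1 = {frozenset({"attack"}): 0.7, FoD: 0.3}   # hypothetical IDS 1
m2 = {frozenset({"attack"}): 0.6, FoD: 0.4}   # hypothetical IDS 2
fused = ds_combine(m1, m2)[frozenset({"attack"})]
# 1 - (1-0.7)(1-0.6) = 0.88: above both individual masses of 0.7 and 0.6.
assert abs(fused - 0.88) < 1e-9
```

With no conflicting focal elements in this pair of BPAs, k = 0 and the result matches the two-evidence illustration $1 - (1-p)(1-q)$.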
4.6.2 Analysis of detection error assuming traffic distribution
The previous sections analyzed the system without any knowledge of the underlying traffic or detectors. In this section, a Gaussian distribution is assumed for both the normal and the attack traffic, owing to its acceptability in practice. Often, the data available in databases is only an approximation of the true data. When information about the goodness of the approximation is recorded, the results obtained from the database can be interpreted more reliably. Any database value is associated with a degree of accuracy, which is denoted by a probability density function whose mean is the value itself. Formally, each database value is indeed a random variable: the mean of this variable becomes the stored value, interpreted as an approximation of the true value, and the standard deviation of this variable is a measure of the level of accuracy of the stored value.
Assume the attack connection and normal connection scores to have the mean values $y_i^{j=1} = \mu_1$ and $y_i^{j=0} = \mu_0$ respectively, with $\mu_1 > \mu_0$ without loss of generality. Let $\sigma_1$ and $\sigma_0$ be the standard deviations of the attack connection and normal connection scores. The two types of errors committed by IDSs are often measured by the False Positive Rate ($FP_{rate}$) and the False Negative Rate ($FN_{rate}$). $FP_{rate}$ is calculated by integrating the normal score distribution from a given threshold T in the score space to $\infty$, while $FN_{rate}$ is calculated by integrating the attack score distribution from $-\infty$ to the given threshold T:

$FP_{rate} = \int_{T}^{\infty} p^{k=0}(y)\, dy$    (4.17)

$FN_{rate} = \int_{-\infty}^{T} p^{k=1}(y)\, dy$    (4.18)
The threshold T is a unique point where the error is minimized, i.e., where the difference between $FP_{rate}$ and $FN_{rate}$ is minimized, by the following criterion:

$T = \arg\min \left| FP_{rate} - FN_{rate} \right|$    (4.19)
At this threshold, the resultant error due to $FP_{rate}$ and $FN_{rate}$ is a minimum. This is because $FN_{rate}$ is an increasing function (a cumulative distribution function, cdf) and $FP_{rate}$ is a decreasing function ($1 - $cdf); T is the point where these two functions intersect. Decreasing the error introduced by the $FP_{rate}$ and the $FN_{rate}$ implies an improvement in the performance of the system. The fusion algorithm accepts decisions from many IDSs, where a minority of the decisions are false positives or false negatives. A good sensor fusion system is expected to give a result that accurately represents the decisions from the correctly performing individual sensors, while minimizing the influence of the decisions from erroneous IDSs. Approximate agreement emphasizes precision, even when this conflicts with system accuracy. However, sensor fusion is concerned solely with the accuracy of the readings, which is appropriate for sensor applications. This is true despite the fact that increased precision within known accuracy bounds would be beneficial in most cases. Hence the following strategy is adopted:
- The false alarm rate $FP_{rate}$ can be fixed at an acceptable value $\alpha_0$, and the detection rate can then be maximized. Based on the above criteria, a lower bound on accuracy can be derived.
- The detection rate is always higher than the false alarm rate for every IDS, an assumption that is trivially satisfied by any reasonably functional sensor.
- Determine whether the accuracy of the IDS after fusion is indeed better than the accuracy of the individual IDSs, in order to support the performance enhancement of the fusion IDS.
- Discover the weights on the individual IDSs that give the best fusion.
Given the desired acceptable false alarm rate, $FP_{rate} = \alpha_0$, the threshold (T) maximizes the $TP_{rate}$ and thus minimizes the $FN_{rate}$:

$TP_{rate} = \Pr[\text{alert} \mid \text{attack}] = \Pr\left[ \sum_{i=1}^{n} w_i s_i \geq T \mid \text{attack} \right]$    (4.20)

$FP_{rate} = \Pr[\text{alert} \mid \text{normal}] = \Pr\left[ \sum_{i=1}^{n} w_i s_i \geq T \mid \text{normal} \right] = \alpha_0$    (4.21)
The fusion of IDSs becomes meaningful only when $FP \leq FP_i \;\forall i$ and $TP \geq TP_i \;\forall i$. In order to satisfy these conditions, an adaptive or dynamic weighting of the IDSs is the only possible alternative. The model of the fusion output is given as:

$s = \sum_{i=1}^{n} w_i s_i$, with $TP_i = \Pr[s_i = 1 \mid \text{attack}]$ and $FP_i = \Pr[s_i = 1 \mid \text{normal}]$

where $TP_i$ and $FP_i$ are the detection rate and the false positive rate of the individual IDS indexed i. It is required to give a low weight to any individual sensor that is unreliable, hence meeting the constraint on the false alarm rate given in equation 4.21. Similarly, the fusion improves the $TP_{rate}$ as the detectors
get appropriately weighted according to their performance. One justication for
this evaluation metric is that when searching large databases like the network
trafc, it is more reasonable to have the results to be most relevant (precision),
without caring whether all the relevant examples are seen (recall) or not. We
chose the number
0
depending on the proportion of attacks in the normal traf-
c (base-rate). This threshold is of course adjustable and one may vary the scale
of the measured performance numbers by adjusting it. It also happens that these
Chapter 4 96
features precision-at-
0
scores are quite distinct from one another, facilitating
meaningful comparison.
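As a concrete illustration of the weighted fusion model behind equations 4.20 and 4.21, the sketch below simulates three hypothetical IDSs and estimates the fused TP_{rate} and FP_{rate} empirically. The rates, weights and threshold are illustrative assumptions, not values from this thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-IDS detection and false-positive rates (assumed values).
tp_rates = np.array([0.70, 0.55, 0.60])
fp_rates = np.array([0.02, 0.05, 0.03])
weights = np.array([0.5, 0.2, 0.3])  # assumed weights w_i, summing to 1
T = 0.4                              # assumed fusion threshold

def fused_rates(n_trials=200_000):
    """Estimate TP_rate = Pr[sum_i w_i s_i >= T | attack] and the
    corresponding FP_rate under normal traffic by simulation."""
    attack = rng.random((n_trials, 3)) < tp_rates   # decisions s_i under attack
    normal = rng.random((n_trials, 3)) < fp_rates   # decisions s_i under normal traffic
    tp_rate = np.mean(attack @ weights >= T)
    fp_rate = np.mean(normal @ weights >= T)
    return tp_rate, fp_rate

tp_rate, fp_rate = fused_rates()
```

With these particular weights the fused detector fires either when the first IDS alerts or when the other two agree, so its detection rate exceeds that of any single IDS while the false alarm rate stays close to the best individual one.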
Fusion of the decisions from various IDSs is expected to produce a single decision that is more informative and accurate than any of the decisions from the individual IDSs. The question then arises as to whether it is optimal. Towards that end, a lower bound on variance for the fusion problem of independent sensors, together with an upper bound on the false positive rate and a lower bound on the detection rate for the fusion problem of dependent sensors, is presented in this work.

Fusion of Independent Sensors

The decisions from various IDSs are assumed to be statistically independent for the sake of simplicity, so that the combination of IDSs will not diffuse the detection. In sensor fusion, improvements in performance are related to the degree of error diversity among the individual IDSs.
Variance and Mean Square Error of the estimate of fused output: The successful operation of a multiple sensor system critically depends on the methods that combine the outputs of the sensors. A suitable rule can be inferred using the training examples, where the errors introduced by the various individual sensors are unknown and not controllable. The choice of the sensors has been made and the system is available, and the fusion rule for the system has to be obtained. A system of n sensors IDS_1, IDS_2, ..., IDS_n is considered; corresponding to an observation with parameter x, x \in \Re^m, sensor IDS_i yields output s_i, s_i \in \Re^m, according to an unknown probability distribution p_i. A training l-sample (x_1, y_1), (x_2, y_2), ..., (x_l, y_l) is given, where s^j = (s_1^j, s_2^j, ..., s_n^j) and s_i^j is the output of IDS_i in response to the input x_j. The problem is to estimate a fusion rule f : \Re^{nm} \to \Re^m, based on the sample, such that the expected square error is minimized over a family of fusion rules based on the given l-sample.
Consider n independent IDSs, with the decision of each being a random variable with a Gaussian distribution of zero mean vector and diagonal covariance matrix (\sigma_1^2, \sigma_2^2, ..., \sigma_n^2). Assume s to be the expected fusion output, which is the unknown deterministic scalar quantity to be estimated, and \hat{s} to be the estimate of the fusion output. In most cases the estimate is a deterministic function of the data. Then the mean square error (MSE) associated with the estimate \hat{s} for a particular test data set is given as E[(s - \hat{s})^2]. For a given value of s, there are two basic kinds of errors:

• Random error, which is also called precision or estimation variance.

• Systematic error, which is also called accuracy or estimation bias.

Both kinds of errors can be quantified by the conditional distribution of the estimates, pr(\hat{s} \mid s). The MSE of a detector is the expected value of the error and is due to the randomness, or due to the estimator not taking into account the information that could produce a more accurate result:

MSE = E[(s - \hat{s})^2] = Var(\hat{s}) + (Bias(\hat{s}, s))^2

The MSE is the absolute error used to assess the quality of the sensor in terms of its variance and unbiasedness. For an unbiased sensor, the MSE is the variance of the estimator, and the root mean squared error (RMSE) is the standard deviation. The standard deviation measures the accuracy of a set of probability assessments. The lower the value of the RMSE, the better the estimator in terms of both precision and accuracy. Thus, reduced variance can be considered as an index of improved accuracy and precision of any detector. Hence, this section proves the reduction in variance of the fusion IDS to show its improved performance. The Cramer-Rao inequality can be used for deriving the lower bound on the variance of a sensor.
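The decomposition MSE = Var + Bias^2 is an exact identity, which a quick numerical check makes tangible. The estimator below is an arbitrary biased one, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

s_true = 1.0  # the unknown deterministic quantity s being estimated
# An arbitrary biased, noisy estimator s_hat of s (illustrative only).
s_hat = 0.9 * rng.normal(s_true, 0.5, size=1_000_000) + 0.05

mse = np.mean((s_hat - s_true) ** 2)
var = np.var(s_hat)                  # random error (precision)
bias = np.mean(s_hat) - s_true       # systematic error (accuracy)

# MSE = Var(s_hat) + Bias(s_hat, s)^2 holds exactly for sample statistics.
decomposed = var + bias ** 2
```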
Cramer-Rao Bound (CRB) for fused output: The Cramer-Rao lower bound is used to get the best achievable estimation performance. Any sensor fusion approach which achieves this performance is optimum in this regard. The CR inequality states that the reciprocal of the Fisher information is an asymptotic lower bound on the variance of any unbiased estimator \hat{s}. Fisher information is a method for summarizing the influence of the parameters of a generative model on a collection of samples from that model. In this case, the parameters considered are the means of the Gaussians. Fisher information is the variance of the score (the partial derivative of the logarithm of the likelihood function of the network traffic with respect to \sigma^2):

score = \frac{\partial}{\partial \sigma^2} \ln L(\sigma^2; s)

Basically, the score tells us how sensitive the log-likelihood is to changes in the parameters. It is a function of the variance \sigma^2 and the detection s, and this score is a sufficient statistic for the variance. The expected value of this score is zero, and hence the Fisher information is given as:

E\left[ \left( \frac{\partial}{\partial \sigma^2} \ln L(\sigma^2; s) \right)^2 \Big| \; \sigma^2 \right]

Fisher information is thus the expectation of the squared score. A random variable carrying high Fisher information implies that the absolute value of the score is often high. The Cramer-Rao inequality expresses a lower bound on the variance of an unbiased statistical estimator, based on the Fisher information:

Var(\hat{s}) \ge \frac{1}{\text{Fisher information}} = \frac{1}{E\left[ \left( \frac{\partial}{\partial \sigma^2} \ln L(\sigma^2; X) \right)^2 \Big| \; \sigma^2 \right]}
If the prior probabilities of detection of the various IDSs are known, the weights w_i, i = 1, ..., n, can be assigned to the individual IDSs. The idea is to estimate the local accuracy of the IDSs. The decision of the IDS with the highest local accuracy estimate will have the highest weighting on aggregation. The best fusion algorithm is supposed to choose the correct class if any of the individual IDSs did so. This is a theoretical upper bound for all fusion algorithms. Of course, the best individual IDS is a lower bound for any meaningful fusion algorithm. Depending on the data, the fusion may sometimes be no better than Bayes. In such cases, the upper and lower performance bounds are identical and there is no point in using a fusion algorithm. A further insight into the CRB can be gained by understanding how each IDS affects it. With the architecture shown in Fig. 4.1, the model is given by s = \sum_{i=1}^{n} w_i s_i. The bound is calculated from the effective variance of each one of the IDSs, \tilde{\sigma}_i^2 = \sigma_i^2 / w_i^2, and then combining them to obtain the CRB as 1 / \sum_{i=1}^{n} (1/\tilde{\sigma}_i^2).
The weight assigned to an IDS is inversely proportional to its variance. This is due to the fact that, if the variance is small, the IDS is expected to be more dependable. The bound on the smallest variance of an estimate \hat{s} is given as:

\sigma^2 = E[(\hat{s} - s)^2] \ge \frac{1}{\sum_{i=1}^{n} w_i^2 / \sigma_i^2}   (4.23)

It can be observed from equation 4.23 that any IDS decision that is not reliable will have a very limited impact on the bound. This is because the non-reliable IDS will have a much larger variance than the other IDSs in the group; \sigma_n^2 \gg \sigma_1^2, ..., \sigma_{n-1}^2 and hence 1/\sigma_n^2 \ll 1/\sigma_1^2, ..., 1/\sigma_{n-1}^2. The bound can then be approximated as 1 / \sum_{i=1}^{n-1} (1/\tilde{\sigma}_i^2).
Also, it can be observed from equation 4.23 that the bound shows asymptotically optimum behavior of minimum variance. With \sigma_i^2 > 0 and \sigma_{min}^2 = \min[\sigma_1^2, ..., \sigma_n^2],

CRB = \frac{1}{\sum_{i=1}^{n} 1/\sigma_i^2} < \sigma_{min}^2 \le \sigma_i^2   (4.24)

From equation 4.24, it can be shown that perfect performance is apparently possible with enough IDSs. The bound tends to zero as more and more individual IDSs are added to the fusion unit:

CRB_n = \lim_{n \to \infty} \frac{1}{1/\sigma_1^2 + \cdots + 1/\sigma_n^2}   (4.25)

For simplicity, assume homogeneous IDSs with variance \sigma^2:

CRB_n = \lim_{n \to \infty} \frac{1}{n/\sigma^2} = \lim_{n \to \infty} \frac{\sigma^2}{n} = 0   (4.26)

From equations 4.25 and 4.26, it can be easily interpreted that increasing the number of IDSs to a sufficiently large number will drive the performance bound towards perfect estimates. Also, due to the monotone decreasing nature of the bound, the IDSs can be chosen to make the performance as close to perfect as desired.
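The behavior of the bound in equations 4.23 through 4.26 is easy to check numerically; the helper below (variances and weights are illustrative assumptions) combines the effective variances and exhibits the \sigma^2/n decay for homogeneous IDSs:

```python
import numpy as np

def crb(variances, weights=None):
    """Cramer-Rao bound of the fused estimate, 1 / sum_i (w_i^2 / sigma_i^2),
    i.e. the combination of effective variances sigma_i^2 / w_i^2 (eq. 4.23).
    Weights default to 1."""
    variances = np.asarray(variances, dtype=float)
    w = np.ones_like(variances) if weights is None else np.asarray(weights, dtype=float)
    return 1.0 / np.sum(w ** 2 / variances)

# Homogeneous IDSs with variance sigma^2: the bound is sigma^2 / n (eq. 4.26).
sigma2 = 0.5
bounds = [crb([sigma2] * n) for n in (1, 2, 4, 8, 16)]
```

Adding an unreliable IDS with a huge variance leaves the bound essentially unchanged, matching the approximation argument below equation 4.23.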
Fusion of Dependent Sensors

In most sensor fusion problems, individual sensor errors are assumed to be uncorrelated so that the sensor decisions are independent. While independence of sensors is a convenient assumption, it is often unrealistic in the normal case. Considering the general case of statistically dependent decisions, the Bahadur-Lazarsfeld expansion of probability density functions can be used for the analysis.
Bahadur-Lazarsfeld polynomials: Consider s = [s_1, ..., s_n] to be a vector of the correlated decisions from the individual sensors and P(s) the probability density function of s. With the prior probabilities of normal traffic and attack traffic being P_0 and P_1 respectively, the conditional pdf P(s \mid attack) is introduced through the normalized random variables

r_i^1 = \frac{s_i - p_i}{\sqrt{p_i q_i}}, \quad p_i = P(s_i = 1 \mid attack), \; q_i = 1 - p_i, \; i = 1, 2, ..., n,

and the pdf P(s \mid normal) through the normalized random variables

r_i^0 = \frac{s_i - p_i}{\sqrt{p_i q_i}}, \quad p_i = P(s_i = 1 \mid normal), \; q_i = 1 - p_i, \; i = 1, 2, ..., n.

The normalized random variables r_i^1 and r_i^0 have zero mean and unit variance. The Bahadur-Lazarsfeld polynomials are defined as \phi_i(s) = [1, r_1, r_2, ..., r_n, r_1 r_2, r_1 r_3, ..., r_1 r_2 \cdots r_n] for the respective values of i = [0, 1, 2, ..., n, n+1, n+2, ..., 2^n - 1]. Recalling that each Bahadur-Lazarsfeld polynomial is a product of the normalized variables r_i, the correlation coefficients of \{r_i\}_{i=1}^{n} are defined by order as \gamma_{ij} = \sum_{s} r_i r_j P(s) (second-order correlation coefficient) up to \gamma_{ij...n} = \sum_{s} r_i r_j \cdots r_n P(s) (the n-th order correlation coefficient).
Using the decisions of the local sensors as its input, the fusion unit performs a likelihood ratio test in order to make a global decision. The optimal fusion rule of the fusion unit is given by the likelihood ratio as below:

L(s) = \frac{P(s \mid attack)}{P(s \mid normal)} = \prod_{i=1}^{n} \left( \frac{1 - FN}{FP} \right)^{s_i} \left( \frac{FN}{1 - FP} \right)^{1 - s_i} \times \frac{1 + \sum_{i<j} \gamma^1_{ij} r_i^1 r_j^1 + \sum_{i<j<k} \gamma^1_{ijk} r_i^1 r_j^1 r_k^1 + \cdots + \gamma^1_{12...n} r_1^1 r_2^1 \cdots r_n^1}{1 + \sum_{i<j} \gamma^0_{ij} r_i^0 r_j^0 + \sum_{i<j<k} \gamma^0_{ijk} r_i^0 r_j^0 r_k^0 + \cdots + \gamma^0_{12...n} r_1^0 r_2^0 \cdots r_n^0}
The log-likelihood ratio for the problem of deciding between the hypotheses, attack or normal, is given by:

\log L(s) = \sum_{i=1}^{n} s_i \log \frac{(1 - FN)(1 - FP)}{FN \cdot FP} + \sum_{i=1}^{n} \log \frac{FN}{1 - FP} + \log \frac{1 + \sum_{i<j} \gamma^1_{ij} r_i^1 r_j^1 + \sum_{i<j<k} \gamma^1_{ijk} r_i^1 r_j^1 r_k^1 + \cdots + \gamma^1_{12...n} r_1^1 r_2^1 \cdots r_n^1}{1 + \sum_{i<j} \gamma^0_{ij} r_i^0 r_j^0 + \sum_{i<j<k} \gamma^0_{ijk} r_i^0 r_j^0 r_k^0 + \cdots + \gamma^0_{12...n} r_1^0 r_2^0 \cdots r_n^0}

where the global decisions are given in terms of r_1, ..., r_n. The log-likelihood ratio gives the data fusion rule for a distributed detection system with correlated local decisions. If the conditional correlation coefficients above a certain order can be neglected, as in many practical applications, the computational burden can be reduced. If most correlation coefficients of the local decisions are zero, the computation simplifies to the optimal data fusion rule developed by Chair and Varshney [35] for independent local decisions.
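For that special case of vanishing correlation coefficients, the Chair-Varshney rule reduces to a weighted sum of per-decision log-likelihood terms. A minimal sketch, assuming per-IDS detection rates TP_i = 1 - FN_i and false-positive rates FP_i (the numbers are illustrative):

```python
import math

def fusion_llr(s, tp, fp):
    """Log-likelihood ratio log P(s|attack)/P(s|normal) for independent
    local decisions s_i, given each IDS's detection rate tp[i] and
    false-positive rate fp[i] (the Chair-Varshney weights)."""
    llr = 0.0
    for s_i, tp_i, fp_i in zip(s, tp, fp):
        if s_i:                                  # IDS i raised an alert
            llr += math.log(tp_i / fp_i)
        else:                                    # IDS i stayed silent
            llr += math.log((1.0 - tp_i) / (1.0 - fp_i))
    return llr

tp = [0.70, 0.60, 0.80]   # assumed detection rates
fp = [0.05, 0.10, 0.02]   # assumed false-positive rates
# Declare an attack when the LLR exceeds log(P0/P1); threshold 0 assumes equal priors.
```

Note how an alert from a reliable sensor (high TP_i, low FP_i) carries a larger positive weight than one from a noisy sensor.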
Setting bounds on false positives and true positives: As an illustration, let us consider a system with three individual IDSs, with a joint density at the IDSs having a covariance matrix of the form:

\Sigma = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{21} & 1 & \rho_{23} \\ \rho_{31} & \rho_{32} & 1 \end{pmatrix}
With fusion doing an aggregation of the individual decisions, the false alarm rate (\alpha) at the fusion center can be written as:

\alpha_{max} = 1 - Pr(s_1 = 0, s_2 = 0, s_3 = 0 \mid normal) = 1 - \int_{-\infty}^{T} \int_{-\infty}^{T} \int_{-\infty}^{T} P_s(s \mid normal) \, ds   (4.29)

where P_s(s \mid normal) is the density of the sensor observations under the hypothesis normal and is a function of the correlation coefficient \rho. Assuming a single threshold T for all the sensors, and the same correlation coefficient \rho between different sensors, a function F_n(T \mid \rho) = Pr(s_1 = 0, s_2 = 0, s_3 = 0) can be defined.
F_n(T \mid \rho) = \int_{-\infty}^{\infty} F_n\left( \frac{T - \sqrt{\rho}\, y}{\sqrt{1 - \rho}} \right) g(y) \, dy   (4.30)

where g(\cdot) and F(\cdot) are the standard normal density and cumulative distribution function respectively, with

F_n(X) = [F(X)]^n   (4.31)

Equation 4.29 can be written, depending on whether \rho > -\frac{1}{n-1} or not, as:

\alpha_{max} = 1 - \int_{-\infty}^{\infty} F_3\left( \frac{T - \sqrt{\rho}\, y}{\sqrt{1 - \rho}} \right) f(y) \, dy, \quad 0 \le \rho < 1   (4.32)

and

\alpha_{max} = 1 - F_3(T \mid \rho), \quad -0.5 < \rho < 0   (4.33)

With this threshold T, the probability of detection at the fusion unit can be computed as:

TP_{min} = 1 - \int_{-\infty}^{\infty} F_3\left( \frac{T - S - \sqrt{\rho}\, y}{\sqrt{1 - \rho}} \right) f(y) \, dy, \quad 0 \le \rho < 1   (4.34)

and

TP_{min} = 1 - F_3(T - S \mid \rho), \quad -0.5 < \rho < 0   (4.35)
The above equations 4.32, 4.33, 4.34 and 4.35 show the performance improvement of sensor fusion, where the upper bound on the false positive rate and the lower bound on the detection rate are fixed. The system performance was shown to deteriorate when the correlation between the sensor errors is positive and increasing, while the performance improves considerably when the correlation is negative and increasing.
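The effect of a common correlation coefficient \rho on an OR-rule fusion center can also be checked by simulation; the sketch below (threshold, signal level and \rho values are illustrative assumptions) estimates the false alarm and detection rates for three equicorrelated Gaussian sensors:

```python
import numpy as np

rng = np.random.default_rng(2)

def or_rule_rates(rho, T=1.5, S=2.0, n=3, trials=200_000):
    """Monte-Carlo estimate of the OR-rule false alarm rate
    alpha = 1 - Pr(all s_i < T | normal) and the detection rate
    for n unit-variance sensors with common correlation rho."""
    cov = np.full((n, n), rho) + (1.0 - rho) * np.eye(n)
    normal = rng.multivariate_normal(np.zeros(n), cov, size=trials)
    attack = normal + S                       # signal S added under attack
    alpha = np.mean((normal >= T).any(axis=1))
    tp = np.mean((attack >= T).any(axis=1))
    return alpha, tp

alpha_0, tp_0 = or_rule_rates(rho=0.0)
alpha_8, tp_8 = or_rule_rates(rho=0.8)
```

At a fixed threshold, growing positive correlation pulls both rates down together: the sensors fire jointly rather than covering each other's misses, which is the shrinking fusion gain noted above.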
4.7 Summary

One of the common reasons for the avoidance of IDSs as the second and last stage of defense is their less than satisfactory performance. Consequently, improving IDS performance is a significant research challenge. In this chapter, we prove that it is possible to improve the performance with multiple IDSs using advances in sensor fusion. The chapter includes the mathematical basis for sensor fusion in IDS, with the theoretical formulation and analysis of the acceptability of sensor fusion in intrusion detection. The sensor fusion system is characterized and modeled with no knowledge of the IDSs and the intrusion detection data. The need for sensor fusion in IDS is established. The evidence theory is the method most suited for the fusion of IDSs, as seen in this chapter. Having chosen the sensor fusion method, we address the issues related to sensor fusion, namely choosing the threshold bounds, rule-based fusion, Data-dependent Decision fusion and the modified evidence theory, in chapters 5, 6, and 7 respectively.

The study undertaken in this chapter contributes to the fusion field in several aspects. It is expected that positive correlation improves the reliability of fusion, while negative correlation improves fusion by means of improved coverage. In this theoretical study, independent as well as dependent detectors were considered, and the study clarifies the intuition that independence of detectors is crucial in determining the success of the fusion operation. In the case when they are dependent, fusion will lead to improved results but the gain will be smaller. This is explained by the variance reduction due to the combination. The latter half of the chapter takes into account the analysis of the sensor fusion system with a knowledge of the network traffic distribution. This analysis also resulted in the acceptance of sensor fusion for enhancing the performance of intrusion detection. These results are further supported by empirical evidence in the later chapters.
Chapter 5

Selection of Threshold Bounds for Effective Sensor Fusion

I have not failed. I have just found 10,000 ways that won't work.
Thomas Alva Edison

5.1 Introduction

In this chapter, we prove the distinct advantages of sensor fusion over individual IDSs. Fusion threshold bounds are derived using the principle of the Chebyshev inequality at the fusion center, using the false positive rates and detection rates of the IDSs. The goal is to achieve the best fusion performance with the least amount of model knowledge, in a computationally inexpensive way. The anomaly-based IDSs detect anomalies beyond a set threshold level in the features they monitor. Threshold bounds instead of a single threshold give more freedom in steering system properties. Any threshold within the bounds can be chosen depending on the preferred level of trade-off between detection and false alarms.

All the related work in the field of sensor fusion has been carried out mainly with one of the methods like probability theory, evidence theory, voting fusion theory, fuzzy logic theory or neural networks in order to aggregate information. The Bayesian theory is the classical method for statistical inference problems. The fusion rule is expressed for a system of independent learners, with the distribution of hypotheses known a priori. The Dempster-Shafer decision theory is considered a generalized Bayesian theory. It does not require a priori knowledge or a probability distribution on the possible system states like the Bayesian approach, and it is mostly useful when modeling of the system is difficult or impossible [106]. An attempt to prove the distinct advantages of sensor fusion over individual IDSs is made in the next section using the Chebyshev inequality, as an extension to the work done by Zhu et al. [107].
5.2 Modeling the fusion IDS by defining proper threshold bounds

Every IDS participating in the fusion has its own detection rate D_i and false positive rate F_i, due to the preferred heterogeneity of the sensors in the fusion process. Each IDS indexed i gives an alert or no-alert, indicated by s_i taking a value of one or zero respectively. The fusion center collects these local decisions and forms a binomial distribution s given by s = \sum_{i=1}^{n} s_i, where n is the total number of IDSs taking part in the fusion.
Let D and F denote the unanimous detection rate and the false positive rate respectively. The mean and variance of s in the case of attack and no-attack are given by the following equations:

E[s \mid attack] = \sum_{i=1}^{n} D_i, \quad Var[s \mid attack] = \sum_{i=1}^{n} D_i (1 - D_i); \quad \text{in case of attack}

E[s \mid no\text{-}attack] = \sum_{i=1}^{n} F_i, \quad Var[s \mid no\text{-}attack] = \sum_{i=1}^{n} F_i (1 - F_i); \quad \text{in case of no-attack}
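The mean and variance expressions above follow from s being a sum of independent Bernoulli decisions; a quick simulation (with hypothetical detection rates) confirms them:

```python
import numpy as np

rng = np.random.default_rng(3)

D = [0.90, 0.80, 0.85, 0.95]   # hypothetical detection rates D_i
trials = 500_000

# s = sum of the n local alert bits under attack traffic.
s = (rng.random((trials, len(D))) < D).sum(axis=1)

mean_expected = sum(D)                        # sum_i D_i
var_expected = sum(d * (1 - d) for d in D)    # sum_i D_i (1 - D_i)
```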
The fusion IDS is required to give a high detection rate and a low false positive rate. Hence the threshold T has to be chosen well above the mean of the false alerts and well below the mean of the true alerts. Figure 5.1 shows a typical case where the threshold T is chosen at the point of overlap of the two parametric curves for normal and attack traffic. Consequently, the threshold bounds are given as:

Figure 5.1: Parametric curve showing the choice of threshold T
\sum_{i=1}^{n} F_i < T < \sum_{i=1}^{n} D_i
The detection rate and the false positive rate of the fusion IDS are desired to surpass the corresponding weighted averages, and hence:

D > \frac{\sum_{i=1}^{n} D_i^2}{\sum_{i=1}^{n} D_i}   (5.1)

and

F < \frac{\sum_{i=1}^{n} (1 - F_i) F_i}{\sum_{i=1}^{n} (1 - F_i)}   (5.2)
Now, using simple range comparison,

D = Pr\{s \ge T \mid attack\} = Pr\left[ s - \sum_{i=1}^{n} D_i \ge -\left( \sum_{i=1}^{n} D_i - T \right) \Big| \; attack \right]

Using the Chebyshev inequality on the random variable s, with mean E[s] = \sum_{i=1}^{n} D_i and variance Var[s] = \sum_{i=1}^{n} D_i (1 - D_i),

Pr\{ |s - E(s)| \ge k \} \le \frac{Var(s)}{k^2}
With the assumption that the threshold T is greater than the mean of normal activity,

Pr\left[ s - \sum_{i=1}^{n} D_i \ge -\left( \sum_{i=1}^{n} D_i - T \right) \Big| \; attack \right] \ge 1 - \frac{\sum_{i=1}^{n} D_i (1 - D_i)}{\left( \sum_{i=1}^{n} D_i - T \right)^2}

From equation 5.1 it follows that

1 - \frac{\sum_{i=1}^{n} D_i (1 - D_i)}{\left( \sum_{i=1}^{n} D_i - T \right)^2} \ge \frac{\sum_{i=1}^{n} D_i^2}{\sum_{i=1}^{n} D_i}

The upper bound of T is derived from the above equation as:

T \le \sum_{i=1}^{n} D_i - \sqrt{\sum_{i=1}^{n} D_i}
Similarly, for the false positive rate, F = Pr\{s \ge T \mid no\text{-}attack\}; in order to derive the lower bound of T, from equation 5.2 it follows that

\frac{\sum_{i=1}^{n} F_i (1 - F_i)}{\left( T - \sum_{i=1}^{n} F_i \right)^2} \le \frac{\sum_{i=1}^{n} F_i (1 - F_i)}{\sum_{i=1}^{n} (1 - F_i)}

The lower bound of T is derived from the above equation as:

T \ge \sum_{i=1}^{n} F_i + \sqrt{\sum_{i=1}^{n} (1 - F_i)}
The threshold bounds for the fusion IDS are therefore:

\left[ \sum_{i=1}^{n} F_i + \sqrt{\sum_{i=1}^{n} (1 - F_i)}, \;\; \sum_{i=1}^{n} D_i - \sqrt{\sum_{i=1}^{n} D_i} \right]
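The two bounds can be computed directly from the individual rates; the sketch below uses hypothetical D_i and F_i values (ten fairly strong IDSs) purely for illustration:

```python
import math

def threshold_bounds(D, F):
    """Chebyshev-based fusion threshold bounds:
    sum(F_i) + sqrt(sum(1 - F_i))  <=  T  <=  sum(D_i) - sqrt(sum(D_i))."""
    lower = sum(F) + math.sqrt(sum(1.0 - f for f in F))
    upper = sum(D) - math.sqrt(sum(D))
    return lower, upper

# Hypothetical rates: the interval is non-empty only when the individual
# IDSs are strong enough and numerous enough.
D = [0.95] * 10   # assumed detection rates
F = [0.02] * 10   # assumed false-positive rates
lower, upper = threshold_bounds(D, F)
```

Here the admissible interval is roughly T in [3.33, 6.42]; with only a few weak IDSs the lower bound can exceed the upper bound, signalling that no single threshold satisfies both targets.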
Since the threshold T is assumed to be greater than the mean of normal activity, the upper bound of the false positive rate F can be obtained from the Chebyshev inequality as:

F \le \frac{Var[s]}{(T - E[s])^2}   (5.3)

In a statistical intrusion detection system, a false positive is caused by the variance of network traffic during normal operations. Hence, to reduce the false positive rate, it is important to reduce the variance of the normal traffic. In the ideal case, with normal traffic the variance is zero. Equation 5.3 shows that as the variance of the normal traffic approaches zero, the false positive rate also approaches zero. Also, since the threshold T is assumed to be less than the mean of the intrusive activity, the lower bound of the detection rate D can be obtained from the Chebyshev inequality as:

D \ge 1 - \frac{Var[s]}{(E[s] - T)^2}   (5.4)
For intrusive traffic, the factor D_i (1 - D_i) remains almost steady, and hence the variance, given as Var[s] = \sum_{i=1}^{n} D_i (1 - D_i), is an appreciable value. Since the variance of the attack traffic is above a certain detectable minimum, it is seen from equation 5.4 that the correct detection rate can approach an appreciably high value. Similarly, the true negatives will also approach a high value, since the false positive rate is reduced with IDS fusion.

It has been shown above that with IDS fusion the variance of the normal traffic clearly drops towards zero while the variance of the intrusive traffic stays above a detectable minimum. This additionally supports the proof that the fusion IDS gives a better detection rate and a very low false positive rate.
5.3 Results and discussion

5.3.1 Experimental evaluation

The fusion IDS and all the IDSs that form part of the fusion IDS were separately evaluated with the same two data sets, namely 1) the real-world network traffic and 2) the DARPA 1999 data set. The real traffic within a protected University campus network was collected during the working hours of a day. This traffic of around two million packets was divided into two halves, one for training the anomaly IDSs, and the other for testing. The test data was injected with 45 HTTP attack packets using the HTTP attack traffic generator tool called libwhisker [108]. The test data set thus had a base rate of 0.0000225, which is relatively realistic. The test data of the DARPA data set consisted of 190 instances of 57 attacks, which included 37 probes, 63 DoS attacks, 53 R2L attacks, and 37 U2R/Data attacks, with details on the attack types given in Table 4.1. The large observational data set was analyzed to find unsuspected relationships and was summarized in novel ways that were both understandable and useful for the detector evaluation. There are many types of attacks in the test set, many of them not present in the training set. Hence, the selected data also challenged the ability to detect unknown intrusions. When a discrete IDS is applied to a test set, it yields a single confusion matrix. Thus, a discrete IDS produces only a single point in the ROC space, whereas scoring IDSs can be used with a threshold to produce different points in the ROC space.

The fusion IDS was initially evaluated with the DARPA 1999 data set. The individual IDSs chosen in this work are PHAD and ALAD, two research IDSs that are anomaly-based and have an extremely low false alarm rate of the order of 0.00002. The other reason for the choice of PHAD and ALAD was that the two are almost complementary in attack detection. This helps in achieving the best results from the fusion process. The analysis of PHAD and ALAD has resulted in a clear understanding of which individual IDS is expected to succeed or fail under a particular attack. On combining the two sensor alerts and removing the duplicates, an improved rate of detection is achieved, as shown in Table 5.3.
The performance in terms of F-score of PHAD, ALAD and the combination of PHAD and ALAD is shown in Tables 5.4, 5.5 and 5.6 respectively
Table 5.1: Types of attacks detected by PHAD at 0.00002 FP rate (100 FPs)
Attack type Total attacks Attacks detected % detection
Probe 37 22 60%
DOS 63 24 38%
R2L 53 6 11%
U2R/Data 37 2 5%
Total 190 54 28%
Table 5.2: Types of attacks detected by ALAD at 0.00002 FP rate (100 FPs)
Attack type Total attacks Attacks detected % detection
Probe 37 6 16%
DOS 63 19 30%
R2L 53 25 47%
U2R/Data 37 10 27%
Total 190 60 32%
Table 5.3: Types of attacks detected by the combination of ALAD and PHAD at 0.00004 FP
rate (200 FPs)
Attack type Total attacks Attacks detected % detection
Probe 37 24 65%
DOS 63 39 62%
R2L 53 26 49%
U2R/Data 37 10 27%
Total 190 99 52%
Table 5.4: F-score of PHAD for different choice of false positives
FP TP Precision Recall Overall Accuracy F-score
50 33 0.39 0.17 0.99 0.24
100 54 0.35 0.28 0.99 0.31
200 56 0.22 0.29 0.99 0.25
500 56 0.10 0.29 0.99 0.15
Table 5.5: F-score of ALAD for different choice of false positives
FP TP Precision Recall Overall Accuracy F-score
50 42 0.45 0.21 0.99 0.29
100 60 0.37 0.31 0.99 0.34
200 66 0.25 0.34 0.99 0.29
500 72 0.12 0.38 0.99 0.18
Table 5.6: F-score of fused IDS for different choice of false positives
FP TP Precision Recall Overall Accuracy F-score
50 44 0.46 0.23 0.99 0.31
100 73 0.42 0.38 0.99 0.40
200 99 0.33 0.52 0.99 0.40
500 108 0.18 0.57 0.99 0.27
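The precision, recall and F-score columns in Tables 5.4 through 5.6 follow directly from TP, FP and the 190 attack instances in the test set; a small helper reproduces a row:

```python
def scores(tp, fp, total_attacks=190):
    """Precision, recall and F-score as used in Tables 5.4-5.6."""
    precision = tp / (tp + fp)
    recall = tp / total_attacks
    f_score = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f_score, 2)

# Fused IDS at 200 false positives (Table 5.6): TP = 99.
row = scores(99, 200)   # gives (0.33, 0.52, 0.40)
```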
Figure 5.2: Detection rate vs Threshold
for various values of false positives, by setting the threshold appropriately. The improved performance of the combination of the alarms from each system can be observed in Table 5.6, corresponding to false positives between 100 and 200, by fixing the threshold bounds appropriately. Thus the combination works best above a false positive count of 100 and much below a false positive count of 200. In each of the individual IDSs, the number of detections was observed at false positives of 50, 100, 200 and 500, when trained on inside week 3 and tested on weeks 4 and 5. Figures 5.2, 5.3, 5.4 and 5.5 show the selected thresholds for the false positives of 50, 100, 200 and 500. The fusion IDS shows improved performance over the single IDSs for all the threshold values. The performance is seen to be optimized within the bounds of 100 to 200 false positives.

The improved performance of the fusion IDS over some of the fusion alternatives using the real-world network traffic is shown in Table 5.7.
Figure 5.3: Precision vs Threshold
Figure 5.4: F-score vs Threshold
Figure 5.5: False Negative Rate vs Threshold
Detector/Fusion Type  Total Attacks  TP  FP  Precision  Recall  F-score
PHAD        45 10 45 0.18 0.22 0.20
ALAD        45 18 45 0.29 0.40 0.34
OR          45 22 77 0.22 0.49 0.30
AND         45  9 29 0.24 0.20 0.22
SVM         45 19 49 0.30 0.42 0.35
ANN         45 19 68 0.22 0.42 0.29
Fusion IDS  45 20 37 0.35 0.44 0.39
Table 5.7: Comparison of the evaluated IDSs using the real-world network traffic
5.4 Summary

A simple theoretical model was initially illustrated in this chapter for the purpose of showing the improved performance of the fusion IDS. The detection rate and the false positive rate quantify the performance benefit obtained through the fixing of threshold bounds. Also, the more independent and distinct the attack spaces of the individual IDSs are, the better the fusion IDS performs. The theoretical proof was supplemented with experimental evaluation, and the detection rates, false positive rates, and F-scores were measured. In order to understand the importance of thresholding, the anomaly-based IDSs PHAD and ALAD have been individually analyzed. Preliminary experimental results confirm the correctness of the theoretical proof. The chapter demonstrates that our technique is more flexible and also outperforms other existing fusion techniques such as OR, AND, SVM, and ANN using the real-world network traffic embedded with attacks. The experimental comparison using the real-world traffic has thus confirmed the usefulness and significance of the method. The unconditional combination of alarms avoiding duplicates, as shown in Table 5.3, results in a detection rate of 52% at 200 false positives and an F-score of 0.4. The combination of the highest scoring alarms, as shown in Table 5.6 using the DARPA 1999 data set, results in a detection rate of 38% with the threshold fixed at 100 false positives, and an F-score of 0.4.
Chapter 6

Performance Enhancement of IDS using Rule-based Fusion and Data-dependent Decision Fusion

There is no greatness where there is no simplicity, goodness and truth.
Leo Tolstoy

6.1 Introduction

In the previous chapter the utility of sensor fusion for improved sensitivity and reduced false alarm rate was illustrated. In this chapter we further explore the general problem of the poorly detected attacks. The poorly detected attacks reveal the fact that they are characterized by features that do not discriminate them much. This chapter discusses the improved performance of multiple IDSs using rule-based fusion and Data-dependent Decision fusion (or DD fusion for the purposes of this document). The DD fusion approach gathers an in-depth understanding of the input traffic and also of the behavior of the individual IDSs by means of a neural network learner unit. This information is used to fine-tune the fusion unit, since the fusion depends on the input feature vector. Thus fusion implements a function that is local to each region in the feature space. It is well known that the effectiveness of sensor fusion improves when the individual IDSs are uncorrelated. The training methodology adopted in this work takes note of this fact. The performance of Snort has been improved by enhancing its rule base. The overall performance of the fused IDSs using rule-based fusion shows an overall enhancement with respect to the performance of the individual IDSs. For illustrative purposes two different data sets, namely the DARPA 1999 data set as well as the real-world network traffic embedded with attacks, have been used. The DD fusion shows a significantly better performance with respect to the performance of the individual IDSs.
The related work on sensor fusion in the intrusion detection application is discussed in chapter 4. The problem of designing IDSs to work effectively and yield higher accuracies for minority attacks like R2L and U2R, even in the presence of data skewness, has been receiving serious attention in recent times. Other than the related work discussed in chapter 4, predictive classifier models for rare events are given in [33] and [109]. However, none of these attempts have shown any significant contribution to overcoming the data skewness problems. Hence, in spite of all the earlier attempts, there is still room for a significant improvement in the detection of rare attacks.

The chapter is organized as follows. Section 6.2 discusses the rule-based fusion of IDSs. Section 6.3 explains the proposed Data-dependent Decision (DD) fusion architecture. Section 6.4 describes the algorithm of the proposed data-dependent decision fusion architecture. This chapter also illustrates and discusses the results of the proposed architecture. In section 6.5 the conclusion of the chapter is drawn.
6.2 Rule-based fusion

In sensor fusion, if the sources are overlapping in their decisions and are independently maintained, then the chances of inconsistent decisions are high. Confronted with an inconsistent set of decisions, there are many solutions, which have already been researched and seen in the previous chapter. The different fusion schemes are: (i) combination of all the alarms from each system while avoiding the duplicates, (ii) taking the alarms from each system by fixing threshold bounds on the fusion unit, and (iii) rule-based fusion with a prior knowledge of the individual sensor performance.

The architecture followed in all of these methods is given in Figure 4.1. The appropriate adjustment of the fusion threshold optimizes the performance of the resultant IDS in terms of a high detection rate and a low false alarm rate. The upper bound of the fusion threshold is obtained by the Chebyshev inequality, by assuming the fusion threshold to be greater than the mean of the normal activity. Similarly, the lower bound of the threshold is obtained by assuming the threshold to be less than the mean of the intrusive activity. In the rule-based method, an enhancement of the performance of the combined detector using simple rule-based fusion is used, with the fusion making use of the objective certainty of a hypothesis to occur given a particular sensor as the component in fusion. The rules were introduced with the knowledge of the optimal IDS under different attack conditions.
An observation of the poorly detected attacks reveals that they are char-
acterized by features that do not discriminate them much. This is claimed by
investigating the relevance of each feature in the 1999 DARPA IDS evaluation
data set. The data set is consolidated into network connections with 41 features
identified per connection. The probes in the traffic are identified by host-based
traffic features, whereas DoS attacks are identified by the basic features derived
from the packet header as well as the time-based traffic features. Thus, both
the anomaly-based detectors PHAD and ALAD perform with sufficiently high
detection rates for DoS attacks, and PHAD outperforms ALAD in detecting
the probes. The attacks R2L and U2R are characterized by the basic features
derived from the packet headers and also the content-based features. Hence,
ALAD performs better than PHAD in detecting the R2L and U2R attacks. The
rule-based combination makes use of the fact that, given a sensor, the objective
certainty of a hypothesis to occur can be used for improving the performance
of the combined sensor. The rules are as follows:
If any category of Probe except for ntinfoscan, then PHAD
If Probe is ntinfoscan, then ALAD
If any category of R2L except for xlock, then ALAD
If R2L is xlock, then PHAD
If any category of U2R, then ALAD
If DoS is due to (fragmentation || checksum error || URG-FIN flag set ||
small packet size), then PHAD
If DoS is due to malicious payload, then ALAD
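The selection rules above can be sketched as a simple lookup. This is an illustrative sketch, not the thesis's actual implementation; the category and subtype strings used below are assumptions made for the example.

```python
# Sketch: choosing which detector's verdict to trust for a given attack
# category, following the rules listed above. Subtype strings such as
# "fragmentation" are illustrative labels, not fields of a real alert format.

def select_detector(category: str, subtype: str = "") -> str:
    """Return 'PHAD' or 'ALAD' according to the rule-based fusion rules."""
    if category == "Probe":
        return "ALAD" if subtype == "ntinfoscan" else "PHAD"
    if category == "R2L":
        return "PHAD" if subtype == "xlock" else "ALAD"
    if category == "U2R":
        return "ALAD"
    if category == "DoS":
        header_level = {"fragmentation", "checksum error",
                        "URG-FIN flag set", "small packet size"}
        return "PHAD" if subtype in header_level else "ALAD"
    raise ValueError(f"unknown attack category: {category}")

print(select_detector("Probe"))                     # PHAD
print(select_detector("R2L", "xlock"))              # PHAD
print(select_detector("DoS", "malicious payload"))  # ALAD
```

Such a table-driven rule set is easy to audit, but, as discussed next, it cannot generalize beyond the attack types it explicitly names.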
Table 6.1: Types of attacks detected by the rule-based combination of ALAD and PHAD at a
FP rate of 0.000025 (125 FPs)
Attack type Total attacks Attacks detected % detection
Probe 37 24 65%
DOS 63 39 62%
R2L 53 26 49%
U2R/Data 37 10 27%
Total 190 99 52%
Experimental results show that the rule-based fusion performs better, with a de-
tection rate of 52% at 125 false positives and an F-score of 0.48, than the other
two combinations in the previous chapter. The rule-based fusion works signifi-
cantly well compared to the threshold-based fusion in the case of detection of
known attacks. However, the threshold-based approach offers an advantage in
identifying unknown attacks. The rule-based fusion IDS can detect some of
the well-known intrusions with a high detection rate, but it is difficult to detect
novel intrusions, and its rule set has to be updated manually and frequently [7].
Thus, while the results were encouraging, it was realized that rule-based fusion
has no possibility of generalizing from previously observed behavior. As a re-
sult, the research was pursued further to generalize rule-based fusion and also
to overcome its other disadvantages.
6.3 Data-dependent decision fusion
An IDS is expected to work with very large input data. The rule-based fu-
sion works with only small input data, and there is a need for a machine learning
algorithm to handle the type of data appearing in the network traffic. The rule-
based fusion also has the disadvantage of being dependent on the individual
IDSs that are used for sensor fusion. It is necessary to incorporate an architec-
ture that considers a method for improving the detection rate by gathering an
in-depth understanding of the input traffic and also of the behavior of the indi-
vidual IDSs. This helps in automatically learning the individual weights for the
combination when the IDSs are heterogeneous and show differences in perfor-
mance. The architecture should thus be data-dependent, and hence the rule set
has to be developed dynamically.
A new data-dependent architecture underpinning sensor fusion to signifi-
cantly enhance the IDS performance was introduced and implemented in this
work. To this end, the decisions of various IDSs were combined with weights
derived using a machine learning approach. This architecture is different from
conventional fusion architectures; it guarantees improved performance in terms
of detection rate and false alarm rate, works well even for large data sets, is ca-
pable of identifying novel attacks since the rules are dynamically updated, and
has improved scalability.
6.3.1 Motivation
After the 1998 DARPA IDS evaluation, the MIT Lincoln Laboratory re-
ported that if the best performing systems against each one of the different cate-
gories of attacks were combined into a single system, then roughly between 60
and 70 percent of the attacks would have been detected with a false positive
rate of lower than 0.01%, i.e., fewer than 10 false positives a day. However,
none of the previous work on sensor fusion in IDS has reached the Lincoln
Laboratory prediction. None of those approaches can avoid the effect due to
systematic errors of the individual IDSs. They are also prone to mistakes owing to
the unrealistic confidences of certain IDSs. The availability of a large volume of ex-
perimental data has motivated us to use machine learning concepts to fuse
the data. The individual weights of the IDSs can be obtained by learning the
behavior of the various IDSs for different attack classes, and these weighted deci-
sions can be combined in efficient ways.
6.3.2 Data-dependent decision fusion architecture
This section introduces a better architecture which explicitly incorporates data
dependence in the fusion technique. The disadvantage of the commonly used
fusion techniques, which are either implicitly data-dependent or data-indepen-
dent, is due to the unrealistic confidence of certain IDSs. The idea in the pro-
posed architecture is to properly analyze the data and understand when the in-
dividual IDSs fail. The fusion unit should incorporate this learning from the input
as well as from the output of the detectors to make an appropriate decision. We
Figure 6.1: Data-dependent Decision fusion architecture
[Figure: the input x is fed to the individual detectors IDS1, IDS2, ..., IDSn and to the neural network learner; the learner receives x together with the detector outputs s1, s2, ..., sn and produces the weights w1, w2, ..., wn used by the fusion unit to combine the detector outputs into the final output y]
proposed a three-stage architecture, with optimization of the individual IDSs as
the first stage, the neural network learner determining the weights of the in-
dividual IDSs as the second stage, and the fusion unit doing the weighted ag-
gregation as the final stage. The neural network learner can be considered a
pre-processing stage to the fusion unit. The neural network is most appropriate
for weight determination, since it becomes difficult to define the rules clearly,
mainly as more IDSs are added to the fusion unit. When a record is
correctly classified by one or more detectors, the neural network will accumu-
late this knowledge as a weight, and with more iterations the weight
gets stabilized. The architecture is independent of the data set and the struc-
tures employed, and can be used with any real-valued data set, which is not the
case with rule-based aggregation. Thus, it is reasonable to make use of the neural
network learner unit to understand the performance and to assign weights to the
various individual IDSs in the case of a large data set. The neural network has
the capability to generalize from past observed behavior to identify novel attack
inputs and hence to give the proper weighting to the individual IDSs.
The weight assigned to any IDS not only depends on the output of that IDS
as in the case of the probability theory or the Dempster-Shafer theory, but also
on the input traffic which causes this output. A neural network unit is fed with
the output of the IDSs along with the respective input for an in-depth under-
standing of the reliability of the IDSs. The alarms produced by the
different IDSs when they are presented with a certain attack clearly tell which
sensor generated the more precise result. The output of the neural network unit cor-
responds to the weights which are assigned to each one of the individual IDSs.
The IDSs can be fused to produce an improved resultant output with these im-
proved weight factors. Thus, the proposed architecture refers to a collection
of diverse IDSs that respond to an input traffic stream and the weighted combination
of their predictions. The weights are learned by looking at the response of the
individual sensors for every input traffic connection. The fusion output can be
represented as:
y = F_j( w_i^j(x, s_i^j), s_i^j ),

where the weights w_i^j depend both on the input x and on the individual IDS
outputs s_i; the subscript i refers to the IDS index and the superscript j refers to
the class label. The fusion unit gives a value of one or zero depending on whether
the weighted aggregation of the decisions of the individual IDSs is above or
below the set threshold.
In the case of the intrusion detectors ALAD and PHAD, the training is done by
considering more of the data and at the same time optimally, which is likely to
decrease the bias of the individual detectors. So the individual IDSs chosen are
of low bias, comparatively high variance (which gets reduced on fusion), and a
low error correlation (i.e., they make different errors, or have a high variance component
of error). Hence the proposed data-dependent architecture allows the IDSs to
develop diversity while being trained. What is required of the final fusion unit
is that it generalizes well after training (reduced bias) on an unexpected traf-
fic stream and additionally avoids over-fitting, which ensures variance reduction.
Thus the proposed architecture exploits the experimental observation made in the
work of Giacinto et al. [95] that training is done in different feature subspaces.
The test has been conducted on the entire test set and then the evidence is com-
bined to produce the final decision. The neural network learner was introduced
to process the entire available feature set to extract more effective signatures
than the ones hand-coded by the rule-based fusion. The algorithm of the pro-
posed data-dependent architecture is given in section 6.4.3.
6.3.3 Detection of rarer attacks
In most of the available literature the imbalance in the network data is over-
come by resampling the training distribution. Resampling is commented upon in
general, and in particular with respect to the experiment conducted in this
thesis, in the following manner: there is no point in reducing the normal data
in the training data set, since this data set is an expected replication of the real-
world data and the available data set has a distribution more or less like the
naturally occurring class distribution. Additionally, changing the data distribu-
tion complicates the analysis of an IDS because it will result in the IDSs behaving
in an unexpected or unpredictable manner. Varying the size of the training set af-
fects the accuracy of the IDS in predicting the class of test samples that belong
to each of these classes. Hence, it is a good idea to learn from the data in the
same form as it is available. In the case of anomaly detectors learning from the
normal data, the more normal data there is, the more efficient they are in detecting
the attacks. Hence, while using anomaly detectors that learn from the normal
traffic, the normal samples in the data should not be reduced. Also, the base-rate
fallacy is not a factor that can be avoided. The only counter-measure is to set
the acceptable false alarm rate to be extremely low, almost as low as the prior
probability, as established in chapter 2.
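The base-rate argument can be checked numerically. The sketch below applies Bayes' rule with illustrative values; the prior of 10^-4 is an assumption made for the example, not a figure from the evaluation.

```python
# Sketch: why setting the false-alarm rate near the attack prior keeps the
# Bayesian detection rate (precision) usable. All numbers are illustrative.

def precision(prior: float, tpr: float, fpr: float) -> float:
    """P(attack | alarm) by Bayes' rule, for attack base rate `prior`."""
    return (tpr * prior) / (tpr * prior + fpr * (1.0 - prior))

p = 1e-4                        # assumed prior probability of attack
print(precision(p, 1.0, p))     # FPR equal to the prior -> precision ~ 0.5
print(precision(p, 1.0, 1e-2))  # FPR of 1% -> precision collapses to ~ 0.01
```

With a perfect detection rate, an FPR equal to the prior yields a precision of 1/(2 - prior), i.e. just above 0.5, which is the approximation used in the text.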
Several of the detection algorithms present results with a high detection
rate and a low false positive rate without considering the real impact of the huge
number of false alerts generated due to the skewness in the network traffic. Since one
of the main goals of this work is to prevent the misinterpretation of the metrics
used, a good estimate for a low false alarm rate is to set it almost equal to the
prior probability of attack. In that case the precision approximates 0.5, assuming
the detection rate to be very high. The new evaluation criterion is defined as:

max TP_rate  s.t.: precision ≥ p_min, (FP_rate, TP_rate) ∈ ROC space

The ROC space is defined with FP_rate and TP_rate as the x and y axes respectively,
and depicts the relative trade-offs between the TPs and the FPs.
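The evaluation criterion above can be sketched as a search over candidate operating points on the ROC curve. The points, the prior, and the precision floor used below are made-up illustrative values.

```python
# Sketch: among candidate ROC operating points, pick the one that maximizes
# the TP rate subject to a minimum-precision constraint implied by the prior.

def best_operating_point(points, prior, p_min):
    """points: iterable of (fpr, tpr) pairs. Returns the feasible point with
    the highest tpr whose implied precision is at least p_min, else None."""
    def prec(fpr, tpr):
        return tpr * prior / (tpr * prior + fpr * (1 - prior))
    feasible = [pt for pt in points if pt[1] > 0 and prec(*pt) >= p_min]
    return max(feasible, key=lambda pt: pt[1]) if feasible else None

roc = [(1e-5, 0.4), (2e-5, 0.7), (1e-3, 0.9)]
print(best_operating_point(roc, prior=2e-5, p_min=0.4))  # (2e-05, 0.7)
```

The third point, despite its high TP rate, is rejected because its false-alarm volume drives the precision far below the floor.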
6.4 Results and discussion
6.4.1 Test setup
The weight analysis of the IDS data coming from the three IDSs, PHAD, ALAD,
and Snort was carried out by the neural network supervised learner before it
was fed to the fusion element. The detectors PHAD and ALAD produce the
IP address along with an anomaly score, whereas Snort produces the IP address
along with a severity score of the alert. The alerts produced by these IDSs are
converted to a standard binary form. The neural network learner inputs these
decisions along with the particular traffic input which was monitored by the
IDSs. The traffic inputs to the neural network were the connection-
based features, namely source IP, destination IP, source port, destination port,
transport layer protocol, session duration, bytes exchanged, and the throughput
of the session.
The neural network learner was designed as a feed-forward back-propagation
network with a single hidden layer of 25 sigmoidal hidden units. Experimental
evidence is available for the best performance of the neural
network with the number of hidden units being log(T), where T is the number
of training samples in the dataset [111]. In order to train the neural network, it
is necessary to expose it to both normal and anomalous data. Hence, during
the training, the network was exposed to weeks 1, 2, and 3 of the training data
and the weights were adjusted using the back propagation algorithm. An epoch
of training consisted of one pass over the training data. The training proceeded
until the total error made during each epoch stopped decreasing or 1000 epochs
had been reached.
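The learner just described can be sketched as follows. This is a minimal NumPy illustration of the stated design (one hidden layer of 25 sigmoidal units, full-batch backpropagation, training until the per-epoch error stops decreasing or 1000 epochs are reached), not the thesis's MATLAB Neural Network toolbox implementation; the toy data at the end is an assumption for demonstration only.

```python
import numpy as np

# Sketch of the learner: a single-hidden-layer sigmoidal network trained by
# backpropagation, stopping when the epoch error stops decreasing or after
# 1000 epochs, mirroring the stopping rule described above.
rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train(X, Y, hidden=25, lr=0.5, max_epochs=1000):
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_in, hidden))
    W2 = rng.normal(0.0, 0.5, (hidden, n_out))
    prev_err = np.inf
    for _ in range(max_epochs):
        H = sigmoid(X @ W1)             # hidden activations
        O = sigmoid(H @ W2)             # outputs (the learned weight factors)
        err = np.mean((Y - O) ** 2)
        if err >= prev_err:             # total epoch error stopped decreasing
            break
        prev_err = err
        dO = (O - Y) * O * (1 - O)      # backprop through the output sigmoid
        dH = (dO @ W2.T) * H * (1 - H)  # ... and through the hidden sigmoid
        W2 -= lr * (H.T @ dO)
        W1 -= lr * (X.T @ dH)
    return W1, W2

# Toy usage with made-up 2-feature records and binary targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train(X, Y)
```

In the thesis's setting the inputs would be the three IDS decisions plus the connection-record vector, and the outputs the per-IDS weight factors.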
The fusion unit performed the weighted aggregation of the IDS outputs for
the purpose of identifying the attacks in the test data set. It used binary fusion,
giving an output value of 1 or 0 depending on the value of the weighted
aggregation of the decisions from the IDSs. The packets were identified by their
timestamp on aggregation. A value of 1 at the output of the fusion unit indicated
the record to be under attack and a 0 indicated the absence of an attack.
6.4.2 Data set
All the intrusion detection systems that form part of the fusion IDS were sepa-
rately evaluated with the same two data sets, namely 1) the real-world network
traffic and 2) the DARPA 1999 data set, as discussed in chapter 5. The empiri-
cal evaluation of the data-dependent decision fusion method was also observed
with the same two data sets.
With the majority of the IDSs it was observed that probes and DoS attacks have
high detection rates whereas attacks like R2L and U2R have lower detection
rates. The reason was again evident on observing the training data of week
two, which had an appreciably high proportion of probes and DoS attacks. Also, the
R2L and the U2R attacks in the training and testing data sets represent dissimilar
target hypotheses. ALAD and PHAD being anomaly detectors, the evaluations
try to avoid this biasing by training on the attack-free training data
of week one and week three respectively. Since the third IDS, Snort, per-
forms misuse detection, it too was unaffected by this disproportionate
R2L and U2R traffic in the training and test data. It is important to mention at
this point that the proposed architecture can be generalized beyond the data set
or the IDSs that were used. The proposed method is independent of the input
traffic or the individual IDSs that take part in fusion.
6.4.3 Data-dependent decision fusion algorithm
Training of IDSs
Input:
- The DARPA 1999 training dataset (x_n, y_n), where n refers to the number of the record in the dataset.
- The two anomaly IDSs are trained: ALAD on week one and PHAD on week three.
Testing of IDSs
Input:
- The DARPA 1999 test dataset (x_j), where j refers to the number of the record in the dataset.
- The testing of the three IDSs is done on the test data of weeks four and five.
Output:
- The IDS outputs s_i, where i corresponds to the IDS index.
Training of the Neural Network learner
Input:
- The IDS outputs s_i, where i corresponds to the IDS index.
- The DARPA training data set (x_n, y_n), where n refers to the number of the record in the data set.
The IDS outputs as well as the training class labels are such that s_i, y_i ∈ C_k,
where the C_k are the 58 class labels and k varies from 1 to 58. With the IDSs used
in this experiment, this was simplified to a binary detector with class labels of either
zero or one, depending on the anomaly score of the anomaly detectors or the
severity of the Snort alert.
Training:
- The MATLAB Neural Network toolbox is used.
- Algorithm: feed-forward back propagation
- Four input neurons
- One hidden layer with 25 sigmoidal units
- Output layer of three neurons
Three of the inputs correspond to the outputs of the three constituent IDSs,
and the fourth input neuron is a vector corresponding to a single record
of the DARPA data set, over which all values of the vector are run.
Testing of the Neural Network learner
Input:
- The IDS outputs s_i, where i corresponds to the IDS index.
- The DARPA test dataset (x_j), where j refers to the number of the record in the dataset.
The IDS outputs are such that s_i ∈ C_k, where the C_k are the 58 class labels and k
varies from 1 to 58. With the IDSs used in this experiment, this was simplified to
a binary detector with class labels of either zero or one, depending on the anomaly
score of the anomaly detectors or the severity of the Snort alert.
Testing:
- The MATLAB Neural Network toolbox was used.
- Algorithm: feed-forward network
- Four input neurons
- One hidden layer with 25 sigmoidal units
- Output layer of three neurons
Output:
- {w_i^j}, the output of the NN learner, which is expected to give a measure
of the reliability of each IDS i depending on the observed data class type j.
Fusion Unit
Input:
- The IDS outputs s_i, where i corresponds to the IDS index.
Table 6.2: Types of attacks detected by PHAD at a false positive rate of 0.00002 (100 FPs)
Attack Type Total attacks Attacks detected % detection
Probe 37 26 70%
DoS 63 27 43%
R2L 53 6 11%
U2R/Data 37 4 11%
Total 190 63 33%
- The weight factors for the fusion process, {w_i^j}, which are the output of the
NN learner and give a measure of the reliability of each IDS i depending on
the observed data class type j.
Output:
- The binary fusion output is one if the weighted linear aggregation of the outputs
from all the IDSs is greater than zero, and zero otherwise:

y = 1 if Σ_i w_i^j s_i > 0, and y = 0 otherwise.
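The fusion step reduces to a thresholded weighted sum, sketched below. The weight values are illustrative stand-ins for the neural network learner's output, not values from the experiment.

```python
# Sketch of the fusion unit: output 1 when the weighted linear aggregation
# of the detector decisions is positive, 0 otherwise.

def fuse(weights, decisions):
    """weights: learned w_i per IDS; decisions: binary s_i per IDS."""
    agg = sum(w * s for w, s in zip(weights, decisions))
    return 1 if agg > 0 else 0

# Three detectors (e.g. PHAD, ALAD, Snort); assume the learner has
# down-weighted the third for this traffic class.
print(fuse([0.7, 0.5, -0.4], [1, 0, 1]))  # 1: aggregation 0.3 > 0
print(fuse([0.7, 0.5, -0.4], [0, 0, 1]))  # 0: aggregation -0.4
```

A negative learned weight lets the fusion unit discount a detector known to raise false alarms on this class of traffic.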
6.4.4 Experimental evaluation
All the IDSs that take part in fusion were modified and separately evaluated
with the same data set, and then the empirical evaluation of the proposed data-
dependent decision fusion method was also presented. It can be observed from
Tables 6.2, 6.3 and 6.4 that the attacks detected by the different IDSs were not
necessarily the same, and also that no single IDS was able to provide acceptable
values of all the performance measures. A quantitative analysis provides the
correlation coefficients among the different sensors as follows:
Correlation coefficient of PHAD and ALAD: -0.36
Correlation coefficient of PHAD and Snort: -0.42
Correlation coefficient of ALAD and Snort: 0.59
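Such coefficients can be computed directly from the binary decision streams of a detector pair. The decision vectors below are made-up for illustration; the thesis reports -0.36, -0.42 and 0.59 for its three pairs.

```python
# Sketch: Pearson correlation between two detectors' binary decision streams.
# Low or negative correlation indicates diverse detectors, which is what
# makes their fusion worthwhile.

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Illustrative decision streams: the detectors mostly fire on different records.
phad = [1, 0, 0, 1, 0, 1, 0, 0]
alad = [0, 1, 0, 0, 1, 0, 1, 0]
print(round(pearson(phad, alad), 2))  # -0.6
```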
The results, as seen from Table 6.5 and Figure 6.2, support the validity of the
proposed approach compared to the various existing fusion methods of IDSs.
The results evaluated in Table 6.6 show that the accuracy and AUC were heav-
ily biased to favor the majority class. The ROC semilog curves of the individual
Table 6.3: Types of attacks detected by ALAD at a false positive rate of 0.00002 (100 FPs)
Attack Type Total attacks Attacks detected % detection
Probe 37 9 24%
DoS 63 23 37%
R2L 53 31 59%
U2R/Data 37 15 31%
Total 190 78 41%
Table 6.4: Types of attacks detected by Snort at a false positive rate of 0.0002 (1000 FPs)
Attack Type Total attacks Attacks detected % detection
Probe 37 15 41%
DoS 63 35 56%
R2L 53 30 57%
U2R/Data 37 34 92%
Total 190 115 61%
Table 6.5: Types of attacks detected by DD fusion IDS at a false positive rate of 0.00002 (100
FPs)
Attack Type Total attacks Attacks detected % detection
Probe 37 28 76%
DoS 63 40 64%
R2L 53 34 64%
U2R/Data 37 34 92%
Total 190 136 70%
Table 6.6: Comparison of the evaluated IDSs with various evaluation metrics
Detector                Precision  Recall  F-score  Accuracy  AUC
PHAD                    0.39       0.33    0.36     0.99      0.66
ALAD                    0.44       0.41    0.42     0.99      0.71
Snort                   0.10       0.61    0.17     0.99      0.80
Data-dependent fusion   0.42       0.7     0.53     0.99      0.85
Table 6.7: Detection of different attack types by single IDSs and data-dependent decision fusion IDS
Attack Type       PHAD    ALAD    Snort   Data-dependent decision fusion
Probe             70%     24%     41%     76%
DoS               43%     37%     56%     64%
R2L               11%     59%     57%     64%
U2R/Data          11%     31%     92%     92%
False Positive %  0.002%  0.002%  0.02%   0.002%
IDSs and the DD fusion IDS are given in Figure 6.3. The log scale was used
for the x-axis to identify the points which would otherwise be crowded on the
y-axis. The results presented in Table 6.7 and Figure 6.4 indicate that the DD
fusion method performs significantly better, with high recall as well as high pre-
cision, as against achieving high accuracy alone using the DARPA data set.
In the case of an IDS, there are both security requirements and accept-
ability requirements. The security requirement is determined by the TP rate and
the acceptability requirement is decided by the number of FPs, because of the
low base rate in the case of network traffic. The hypothesis that the proposed
model is suitable for the detection of rare classes of attacks is empirically eval-
uated in the next section using the DARPA 1999 data set. It may be noted that
the false positive rates differ in the case of Snort, as it was extremely difficult to
attempt a fair comparison with equal false positive rates for all the IDSs because
of the unacceptable ranges of the detection rate under such circumstances.
Figure 6.2: Performance of evaluated systems
Figure 6.3: Semilog ROC curve of single and DD fusion IDSs
[Figure: semilog ROC curves for PHAD, ALAD, Snort and the DD fusion IDS; x-axis: false positive rate on a log scale from 10^-6 to 10^0, y-axis: true positive rate from 0 to 1]
Figure 6.4: Comparison of evaluated systems
Figure 6.5: Detection of Attack Types
Table 6.8: Comparison of the evaluated IDSs using the real-world data set
Detector/Fusion Type            Total Attacks  TP  FP   Precision  Recall  F-score
PHAD                            45             11  45   0.2        0.24    0.22
ALAD                            45             20  45   0.31       0.44    0.36
Snort                           45             13  400  0.03       0.29    0.05
OR                              45             34  470  0.07       0.76    0.13
AND                             45             9   29   0.22       0.2     0.22
SVM                             45             23  44   0.24       0.51    0.33
ANN                             45             25  94   0.21       0.56    0.31
Data-dependent Decision Fusion  45             27  42   0.39       0.6     0.47
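The precision, recall and F-score columns of Table 6.8 follow directly from the TP and FP counts against the 45 total attacks, as this small check illustrates:

```python
# Sketch: deriving Table 6.8's precision, recall and F-score columns from
# the raw TP/FP counts and the 45 total attacks in the real-world data set.

def prf(tp, fp, total_attacks):
    p = tp / (tp + fp)            # precision: TP / (TP + FP)
    r = tp / total_attacks        # recall: TP / (TP + FN)
    f = 2 * p * r / (p + r)       # harmonic mean of precision and recall
    return round(p, 2), round(r, 2), round(f, 2)

print(prf(11, 45, 45))  # PHAD row      -> (0.2, 0.24, 0.22)
print(prf(27, 42, 45))  # DD fusion row -> (0.39, 0.6, 0.47)
```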
Table 6.9: Performance comparison of single IDSs and DD fusion IDS
Z-number  DD fusion and PHAD  DD fusion and ALAD  DD fusion and Snort
Z_R       7.2                 5.7                 1.9
Z_P       0.3                 -0.2                4.1
Table 6.8 demonstrates that the DD fusion method outperforms other existing
fusion techniques such as OR, AND, SVM, and ANN using the real-world net-
work traffic.
The comparison of IDSs with the metric F-score has the limitation that tests
of significance cannot be directly applied to it in order to determine the confidence
level of the comparison. The primary goal of this work is to achieve improve-
ment in recall as well as precision for the rare classes. Hence an improved IDS
comparison test called the P-test [110] was included, which takes into account the
improvement in both recall and precision. The result in Table 6.9 shows that the
DD fusion performs significantly better than any of the individual IDSs. It performs
better than PHAD and ALAD in terms of recall and is comparable to PHAD and
ALAD in terms of precision. The DD fusion works exceptionally better than Snort
in terms of both precision and recall. Hence the proposed approach outperforms the
existing state-of-the-art techniques of its class, for optimum performance in terms
of both recall and precision.
In a real-world network environment, rare attacks like U2R and R2L are more
dangerous than probe and DoS attacks. Hence, it is essential to improve the
detection performance of these rare classes of attacks while maintaining a rea-
sonable overall detection rate. The results presented in Table 6.7 and Figure
6.4 indicate that the proposed method performs significantly better for rare at-
tack types, with high recall as well as high precision, as against achieving
high accuracy alone. The claim that the proposed method performs bet-
ter is supported by a statement from Kubat et al. [1998], which states that
"a classifier that labels all regions as majority class will achieve an accuracy of
96%...a system achieving 94% on the minority class and 94% on the majority
class will have worse accuracy yet be deemed highly successful." With the pro-
posed method, an intrusion detection rate of 70% with a false positive rate as low as
0.002% has been achieved. The F-score has been improved to 0.53.
6.4.5 Discussion
Most of the U2R attacks like loadmodule, perl and sqlattack are made stealthy
by running the attack over multiple sessions. These attacks are detected by Snort,
but at the expense of a higher false positive rate. Snort was comparably better
in the detection of all the new attacks like queso, arppoison, dosnuke, self-
ping, tcpreset, ncftp, netbus, netcat, sshtrojan, ntfsdos, and sechole, for which
Snort had the signatures available. Snort identifies the attack warezclient, which
downloads illegal copies of software, by the addition of a rule looking for
executable code on the FTP port. Thus each rule in the rule set uses the most
discriminating feature values for classifying a data item into one of the class
types.
Although the research discussed in this thesis has thus far focused on the
three IDSs, namely PHAD, ALAD and Snort, the algorithm works well with
any IDS. The proposed system provides great benefit to a security analyst.
The computational complexity introduced by
the proposed method can be justified by the possible gains which are illustrated.
The result of the data-dependent decision fusion method is better than what
was predicted by the Lincoln Laboratory after the DARPA IDS evaluation.
With the fusion architecture proposed in this chapter, an improved intrusion de-
tection rate of 70%, with a false positive rate as low as 0.002% and an F-score of
0.53, was achieved.
6.5 Summary
We have adapted and extended notions from the field of multisensor data fusion
for the rule-based fusion and the data-dependent decision fusion. An enhance-
ment in the performance of the combined detector using simple rule-based fu-
sion is demonstrated in the initial part of this chapter, with the fusion making use
of the objective certainty of a hypothesis to occur given a particular sensor as
the component in fusion. The extensions are principally in the area of gener-
alizing feature similarity functions to comprehend observations in the intrusion
detection domain. The approach has the ability to fuse decisions from multiple,
heterogeneous and sub-optimal IDSs.
In the proposed data-dependent decision fusion architecture, a neural net-
work unit was used to generate weight factors depending on the input as well
as the IDS outputs. The method assigns appropriate weights to the various indi-
vidual IDSs that take part in fusion. This results in more accurate and precise
detection for a wider class of attacks. If the individual sensors are complemen-
tary and look at different regions of the attack domain, then this DD fusion
enriches the analysis of the incoming traffic to detect attacks with appreciably
few false alarms. The individual IDSs that were components of this architecture
in this particular work were PHAD, ALAD and Snort, with detection rates of 0.33,
0.41 and 0.61 respectively after modifications to these IDSs. The false positive
rates of PHAD and ALAD were acceptable, whereas that of Snort was excep-
tionally high. The results obtained by the proposed architecture illustrate that
the DD approach improved upon the existing fusion approaches, with the best
performance in terms of improved accuracy. The marginal increase in the com-
putational requirement introduced by the data-dependency can be justified by
the acceptable range of false alarms and an overall detection rate of 0.7, which
resulted with an exceptionally large data set and suboptimal constituent IDSs.
It is also shown that our technique is more flexible and also outperforms other
existing fusion techniques such as OR, AND, SVM, and ANN. The experi-
mental comparison of the DD fusion method using the real-world traffic has
confirmed its usefulness and significance.
The data skewness in the network traffic demands an extremely low false
positive rate, of the order of the prior probability of attack, for an acceptable value
of the Bayesian attack detection rate. The research and development efforts in
the field of IDS, and the state-of-the-art IDSs, all still have marginal detec-
tion rates and high false positive rates, especially in the case of stealthy, novel
and R2L attacks. In the environment in which an IDS is expected to operate, the
attacks are the minority, requiring very low false positive rates for acceptable
detection. Basic domain knowledge about network intrusions makes us un-
derstand that U2R and R2L attacks are intrinsically rare. The poor performance
of the detectors has been improved by discriminative training of the anomaly de-
tectors and by incorporating additional rules into the misuse detector. This chap-
ter proposes a new machine learning approach where the corresponding
learning problem is characterized by a number of features, skewness in the data,
the class of interest being the minority class and the minority attack type, and
a non-uniform misclassification cost. The proposed method has successfully
demonstrated that the neural network learner encapsulates expert knowledge for
the weighted fusion of the individual detector decisions. This creates an adaptable
algorithm that can substantially outperform state-of-the-art methods for minor-
ity class type detection in both coverage and precision. The evaluations show
the strength and ability of the proposed approach to perform very well, with
64% detection for R2L attacks and 92% detection for U2R attacks at an over-
all false positive rate of 0.002%. The experimental comparison of this method
has confirmed its usefulness and significance.
Chapter 7
Modified Dempster-Shafer Theory for
Intrusion Detection Systems
"A consensus means everyone agrees to say collectively what no one else be-
lieves individually."
Abba Eban
7.1 Introduction
Sensor fusion using heterogeneous IDSs is employed to aggregate different
views of the same event. This helps in achieving improved detection through
detector reinforcement or complementarity. There is a factor of uncertainty in
the results of most of the IDSs available in the literature. The main reasons for un-
certainty are vagueness and imprecision. One of the techniques of sensor fusion
is the Dempster-Shafer (DS) evidence theory [112, 113, 114], which can be used to
characterize and model various forms of uncertainty. In DS theory, evidence
can be associated with multiple possible events, e.g., sets of events. As a result,
evidence in DS theory can be meaningful at a higher level of abstraction without
having to resort to assumptions about the events within the evidential set.
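As a small illustration of how DS theory attaches evidence to sets of events, the sketch below implements Dempster's rule of combination for two basic probability assignments over a two-event frame of discernment; the mass values are illustrative, not taken from any evaluation in this thesis.

```python
from itertools import product

# Sketch: Dempster's rule of combination. Hypotheses are frozensets so that
# mass can be assigned to sets of events (including total ignorance, i.e.
# the whole frame), as described above. The conflict mass K is removed by
# the normalization factor 1/(1 - K).

def combine(m1, m2):
    fused, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass assigned to the empty set
    return {h: w / (1.0 - conflict) for h, w in fused.items()}

A, N = frozenset({"attack"}), frozenset({"normal"})
theta = A | N                            # the frame: total ignorance
m1 = {A: 0.6, theta: 0.4}                # sensor 1: fairly sure of an attack
m2 = {A: 0.5, N: 0.2, theta: 0.3}        # sensor 2: less committed
fused = combine(m1, m2)
print(round(fused[A], 3))  # 0.773
```

Note that the combined belief in "attack" exceeds either sensor's individual mass, which is the reinforcement effect mentioned above.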
The use of data fusion in the field of DoS anomaly detection is presented by Siaterlis and Maglaris [79], where the Dempster-Shafer theory of evidence is used as the mathematical foundation for the development of a novel DoS detection engine, evaluated on real network traffic. The superiority of data fusion technology applied to intrusion detection systems is presented in the work of Wang et al. [96]; their method collects information from network and host agents and applies the Dempster-Shafer theory of evidence. Another work incorporating the Dempster-Shafer theory of evidence is by Hu et al. [97]. The Dempster-Shafer theory of evidence in data fusion is observed to solve the problem of analyzing uncertainty in a quantitative way. In their evaluation, the ratio of ingoing to outgoing traffic and the service rate are selected as the detection metrics, and prior knowledge of the DDoS domain is used to assign probability to evidence.
The most prominent of the alternative combination rules to the Dempster-Shafer method is a class of unbiased operators developed by Ron Yager [115]. Yager points out that an important feature of combination rules is the ability to update an already combined structure when new information becomes available. This is frequently referred to as updating, and the algebraic property that facilitates it is associativity. Dempster's rule is an example of an associative combination operation: the order of the information does not affect the resulting fused structure. Yager [116] points out that in many cases a non-associative operator is necessary for combination. A familiar example is the arithmetic average, which is not itself associative, i.e., one cannot update the information by averaging an average of a given body of data with a new data point and obtain a meaningful result. However, the arithmetic average can be updated by adding the new data point to the sum of the pre-existing data points and dividing by the total number of data points. This is the concept of a quasi-associative operator that Yager introduced in his work [116]: quasi-associativity means that the operator can be broken down into associative sub-operations. Through the notion of the quasi-associative operator, Yager develops a general framework for combination rules in which associative operators are a proper subset.
The Transferable Belief Model (TBM) is an elaboration on the Dempster-Shafer theory of evidence developed by Smets [117], based on the intuition that in the case of conflict, the result should allocate most of the belief weight to the empty set. Technically, this is done by using the TBM conjunction rule for non-interactive sources of information, which is the same as Dempster's rule of combination without renormalization. While most other theories adhere to the axiom that the probability (or belief mass) of the empty set is always zero, Smets has an intuitive reason to drop this axiom: the open-world assumption. It applies when the frame of reference is not exhaustive, i.e., when there are reasons to believe that an event not described in the frame of reference will occur.
Murphy [118] presents another problem of the classical Dempster's combination rule: the failure to balance multiple bodies of evidence. Averaging best solves the normalization problems and has many attractive features, such as identifying combination problems, showing the distribution of the beliefs, and preserving a record of ignorance. However, averaging does not offer convergence toward certainty. The fusion technique proposed in this chapter is expected to combine the outputs of various IDSs with subjective judgements. The feasibility of this idea has been demonstrated via an analysis case study with several IDSs distributed over a LAN and using the replayed DARPA data set. The aim is a sensor fusion architecture that faces the new challenges in sensor fusion; it can later be modified into a generalizable solution beyond any specific application.
This chapter is organized as follows: In section 7.2, we briefly recall the Dempster-Shafer theory of evidence, its weighted extension, and its disadvantages. Section 7.3 illustrates the disjunctive combination of evidence, which helps in evidence aggregation. Section 7.4 discusses the modified evidence approach with a more detailed observation of the performance of this approach; a general discussion of the proposed approach for the particular application of sensor fusion in intrusion detection is also included. Section 7.5 presents the experimental evaluation with a brief discussion of the impact of this work. Section 7.6 summarizes the chapter.
7.2 Dempster-Shafer combination method
The Dempster-Shafer theory is explained in detail in chapter 4. In this chapter we formalize the problem as follows: Considering the DARPA data set, assume a traffic space Θ = {DoS, Portsweep, R2L, U2R, Normal} of five mutually exclusive classes. For any traffic sample x, each IDS assigns a detection that denotes the traffic sample to come from one of the classes, i.e., an element of the FoD Θ. With n IDSs used for the combination, the decision of each one of the IDSs is considered for the final decision of the fusion IDS.
7.2.1 Motivation for choosing the Dempster-Shafer combination method
Even though the research work discussed in chapter five has given encouraging results, we realized that there was no possibility of detecting novel attacks, because of the difficulty in generalizing from any previously observed behavior. As a result, we pursued further by using a neural network learner that understands the reliability of each one of the IDSs corresponding to the data, accordingly assigns a weight to every IDS decision, and then makes use of an appropriate fusion operator. Specifically, we are interested in the capability of the neural network to learn the confidence to be assigned to every IDS, and then in a fusion unit that optimally fuses the IDSs.
This thesis presents a method to detect traffic attacks with an increased degree of confidence using a fusion system composed of different detectors. Each IDS observes the same network traffic from various points on the network and detects the attack traffic with an uncertainty index. The frame of discernment consists of singletons that are exclusive (A_i ∩ A_j = ∅, i ≠ j) and exhaustive, since the FoD consists of all the expected attacks which the individual IDS detects, or else the detector fails to detect the attack and recognizes it as normal traffic. The DS rule corresponds to a conjunction operator: it builds the belief induced by accepting two pieces of evidence, i.e., by accepting their conjunction. Shafer developed the DS theory of evidence based on the model that all the hypotheses in the FoD are exclusive and that the frame is exhaustive. The purpose is to combine/aggregate several independent and equi-reliable sources of evidence expressing their belief on the set. The DS combination rule gives the combined mass of the two pieces of evidence, m1 and m2, on any subset A of the FoD:

m(A) = [ Σ_{X ∩ Y = A} m1(X) m2(Y) ] / [ 1 − Σ_{X ∩ Y = ∅} m1(X) m2(Y) ]        (7.1)
The denominator of equation 7.1 is of the form 1 − k, where k is the conflict between the two pieces of evidence. This denominator is for normalization, which spreads the resultant uncertainty of any evidence, with a weight factor, over all focal elements and results in an intuitive decision. Thus, the effect of normalization consists of eliminating the conflicting pieces of information between the two sources to combine, consistently with the intersection operator. Whether normalized or not, the DS method satisfies the two axioms of combination: 0 ≤ m(A) ≤ 1 and Σ_A m(A) = 1. The third axiom, m(∅) = 0, is not satisfied by the unnormalized DS method. Also, independence of evidence is yet another requirement for the DS combination method. The classical DS theory treats all sensors democratically, but this is not reality, since some are more precise and accurate than others. Hence we have the weighted Dempster-Shafer method. In simplified situations this weight factor matches the prior probability in the classical Bayesian inference method. The weighted and extended DS can be used to:

- Realize a differential trust scheme on sensors
- Mitigate conflicts that cause counter-intuitive results with the classical DS evidence combination rule.
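For the singleton frame used in this chapter, equation 7.1 reduces to a product of the individual masses per class, followed by normalization by 1 − k. The following sketch illustrates this; the function name and the dictionary representation of a mass function are illustrative assumptions, not part of the thesis implementation.

```python
# A minimal sketch of Dempster's rule of combination (equation 7.1) for the
# singleton hypotheses used in this chapter. Masses are dictionaries mapping
# each class in the frame of discernment to its basic probability assignment.

def ds_combine(m1, m2):
    """Combine two mass functions over singleton hypotheses with Dempster's rule."""
    classes = set(m1) | set(m2)
    # For singletons, X ∩ Y = A only when X = Y = A, so the joint mass of
    # each class is the product of the individual masses.
    joint = {a: m1.get(a, 0.0) * m2.get(a, 0.0) for a in classes}
    k = 1.0 - sum(joint.values())  # conflict: total mass of empty intersections
    if k >= 1.0:
        raise ValueError("total conflict (k = 1): Dempster's rule is undefined")
    return {a: v / (1.0 - k) for a, v in joint.items()}  # normalize by 1 - k

if __name__ == "__main__":
    m1 = {"Normal": 0.4, "Probe": 0.6}
    m2 = {"Normal": 0.2, "Probe": 0.8}
    print(ds_combine(m1, m2))
```

Note how agreement reinforces belief: two sensors each leaning toward Probe yield a fused Probe mass larger than either individual mass.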
7.2.2 Limitations of the Dempster-Shafer combination
In the case of full contradiction between the bodies of evidence, k = 1; such a case occurs when there exists A such that Bel_1(A) = 1 and Bel_2(Ā) = 1, as in Table 7.1.

        A    B
m1      1    0
m2      0    1

Table 7.1: Evidence with total conflict

The combined evidence is computed for the DS method and its alternative methods: Yager's method [115, 116], Smets' TBM [117], and Murphy's averaging [118].

DS method: m(A) = 0, m(B) = 0
Yager's method: m(A) = 0, m(B) = 0, m(Θ) = 1
Smets' TBM: m(A) = 0, m(B) = 0, m(∅) = 1
Murphy's averaging: m(A) = 0.5, m(B) = 0.5
One more case of contradiction between the bodies of evidence is shown with a different example in Table 7.2.

        A     B     C
m1      0.9   0.1   0
m2      0     0.1   0.9

Table 7.2: Evidence with conflict

DS method: m(A) = 0, m(B) = 1, m(C) = 0
Yager's method: m(A) = 0, m(B) = 0.01, m(C) = 0, m(Θ) = 0.99
Smets' TBM: m(A) = 0, m(B) = 0.01, m(C) = 0, m(∅) = 0.99
Murphy's averaging: m(A) = 0.45, m(B) = 0.1, m(C) = 0.45
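The divergence of the rules on the Table 7.2 masses can be reproduced numerically. The sketch below is restricted to the two-source singleton case discussed here; the key THETA, standing for the whole frame to which Yager's rule transfers the conflicting mass, is an assumed naming convention.

```python
# Sketch comparing DS, Yager's rule, and simple averaging on the Table 7.2
# masses (two sources, singleton hypotheses).

def combine_all(m1, m2, classes):
    joint = {a: m1[a] * m2[a] for a in classes}      # X = Y = A for singletons
    k = 1.0 - sum(joint.values())                    # conflicting mass
    ds = {a: v / (1.0 - k) for a, v in joint.items()} if k < 1.0 else None
    yager = dict(joint, THETA=k)                     # conflict -> ignorance
    avg = {a: (m1[a] + m2[a]) / 2.0 for a in classes}
    return ds, yager, avg

m1 = {"A": 0.9, "B": 0.1, "C": 0.0}
m2 = {"A": 0.0, "B": 0.1, "C": 0.9}
ds, yager, avg = combine_all(m1, m2, ["A", "B", "C"])
print(ds)     # B carries all the belief despite being weakly supported
print(yager)  # almost all mass moves to the frame THETA (ignorance)
print(avg)    # A and C stay equally plausible
```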
The conflict in evidence either gave non-intuitive results, as with DS, or the conflict was ported over to uncertainty, as in Yager's method, or to the null set, as in Smets' TBM, or averaged out, as in Murphy's. However, none of these seems intuitive or reasonable from the point of view of improving the belief. We conclude that conflicting evidence cannot be expected to yield a clear conclusion in one step; the final decision has to be made after the collection of additional evidence. Hence we should not forcefully converge the conflicting evidence until we get more evidence. It is better to aggregate evidence with the union operator, without suppressing any of the available evidence as happens in the case of DS.
            Normal   Probe   DoS   U2R   R2L
PHAD (m1)    0.4      0.6     0     0     0
ALAD (m2)    0.1      0.1     0     0     0.8
Snort (m3)   0        0.3     0     0     0.7

Table 7.3: Evidence from three sensors, with one unreliable, using the DS method
Another major drawback of DS and its alternatives, except Murphy's, is that since they use conjunctive combination, if any one or more sensors fail to give evidence on a particular class, the evidence from the other sensors on that class has no effect: the intersection becomes the null set. A sensor might fail to give evidence when it is not tuned for that particular class of attack, due to the shortcomings of the technology used or for some other reason. This disadvantage is overcome by Murphy's averaging, but that result also looks counter-intuitive, since if one piece of evidence fails, the belief of that hypothesis gets weakened. The DS combination with one of the sensors in the fusion being totally unreliable gives rise to counter-intuitive results, as illustrated with the example in Table 7.3.
DS method: m(Normal) = 0, m(Probe) = 1, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0
Yager's method: m(Normal) = 0, m(Probe) = 0.018, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0, m(Θ) = 0.982
Smets' TBM: m(Normal) = 0, m(Probe) = 0.018, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0, m(∅) = 0.982
Murphy's averaging: m(Normal) = 0.17, m(Probe) = 0.33, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0.5
In this particular case, with the evidence m1 unreliable, the result turns out to be counter-intuitive when an equal weight factor is given to all the evidence. If a proper weight factor is given to each piece of evidence depending on its reliability, Murphy's method gives acceptable results, as in the modified averaging method.
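The annihilation effect described above can be checked directly on the Table 7.3 masses by folding Dempster's rule pairwise over the three sensors; the sensor names and numbers are taken from the table, while the code structure is an illustrative assumption.

```python
# Sketch of the zero-evidence drawback: with conjunctive (DS) combination,
# a single sensor assigning zero mass to a class removes that class from the
# fused result, however strongly the other sensors support it (Table 7.3).
from functools import reduce

def ds_combine(m1, m2):
    classes = set(m1) | set(m2)
    joint = {a: m1.get(a, 0.0) * m2.get(a, 0.0) for a in classes}
    k = 1.0 - sum(joint.values())
    return {a: v / (1.0 - k) for a, v in joint.items()}

phad  = {"Normal": 0.4, "Probe": 0.6, "DoS": 0, "U2R": 0, "R2L": 0.0}
alad  = {"Normal": 0.1, "Probe": 0.1, "DoS": 0, "U2R": 0, "R2L": 0.8}
snort = {"Normal": 0.0, "Probe": 0.3, "DoS": 0, "U2R": 0, "R2L": 0.7}

fused = reduce(ds_combine, [phad, alad, snort])
print(fused)  # Probe gets mass 1; R2L vanishes because PHAD gave it zero
```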
We propose to use the union operator for aggregating the evidence in all the above cases where DS fails. This is because, in case of conflict, further evidence aids convergence. Also, in case of zero evidence from any one or more of the sensors, the union operator works the same as the averaging operator, which is the best that can be done. Thus, if the intersection of the evidence is not empty, the sources overlap and the combination rule can be intersection. Otherwise at least one of the sources is necessarily wrong, and a more natural combination rule is union, which assumes that not all the sources are wrong.
7.3 Disjunctive combination of evidence
The union approach can be considered in situations where different IDSs are specialized in different types of attack detection and hence may not respond to a certain attack. The union approach thus focuses on the best-case behavior of each IDS, and the combination of IDSs is done so as to utilize the strength of each IDS. Since each hypothesis corresponds to an IDS in a binary state, labeling the traffic as a particular attack or as normal, all the hypotheses are singletons. Hence this is simpler than the generalized case with the hypotheses taking any possible subset of the power set of the FoD. The mass of the singletons is given by:
m(A) = [ Σ_{X = Y = A} (m1(X) + m2(Y)) ] / [ Σ_{X = Y} (m1(X) + m2(Y)) ]        (7.2)
The numerator of the above equation relates to the disjunctive combination, and the final mass is calculated by normalization with respect to the entire power set 2^Θ, which is closed under union, intersection and complement and hence is a sigma algebra. This normalization allows the disjunctive combination equation to satisfy all the axioms of the evidence theory.
Additionally, conflicts can be thought of as due to uncertainty, whereby the IDS cannot take the decision correctly and the collective information will also be ambiguous. Hence, only if the reliability of the sensors is known can a conclusion be drawn by suppressing some evidence over the others; otherwise it is better to aggregate all the evidence and combine it conjunctively with other evidence agreeable to the aggregated evidence. Normalization is not done until the conjunctive combination is performed to converge the results.
The properties of associativity and commutativity are satisfied by a disjunctive combination if normalization is done only at the final combination stage. In the intrusion detection application, with singletons as the expected hypotheses, Bel(A ∪ B) = Bel(A) + Bel(B), which is the same as in the Bayesian method, since for singletons DS simplifies to Bayesian. However, the advantage of evidence combination is that more evidence can be combined in a single step, without knowledge of the associated probability distribution.
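For singletons, the disjunctive rule of equation 7.2 amounts to summing the masses per class and normalizing only once, at the end. A minimal sketch, with the dictionary layout and function name as assumptions:

```python
# Sketch of the disjunctive (union) combination of equation 7.2 for singleton
# hypotheses: masses are summed per class and normalized only at the final
# stage, so no piece of evidence is suppressed along the way.

def disjunctive_combine(masses):
    """Aggregate a list of mass functions by summation, then normalize."""
    classes = set().union(*masses)
    total = {a: sum(m.get(a, 0.0) for m in masses) for a in classes}
    s = sum(total.values())                       # final-stage normalization
    return {a: v / s for a, v in total.items()}

phad = {"Normal": 0.4, "Probe": 0.6}
alad = {"Normal": 0.1, "Probe": 0.1, "R2L": 0.8}
print(disjunctive_combine([phad, alad]))
```

Unlike the conjunctive rule, a class supported by only one sensor (here R2L) retains mass in the fused result.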
7.4 Context-dependent operator
The DS operator is the most acceptable except for the two disadvantages highlighted in section 7.2.2. Hence we require a context-dependent operator for the decision fusion, which is supposed to utilize all available information before making a final decision. This operator is expected to be:

- Conjunctive, if the sources have very low conflict and all masses are non-zero. The fusion should then behave as a severe operator, where the common or redundant part gets chosen and the mass of the less certain information gets reduced.
- Disjunctive, if the sources conflict and also when any one or more beliefs happen to be zero.
- Compromise or average, in case of partial conflict.
The context-dependent operator is expected to have an adaptive feature of combining the information related to one class in one way, and the information related to another class in another way.
The proposed hybrid operator works the same way as the DS operator except in the case of conflict, when any belief mass happens to be zero, and when varying reliability needs to be introduced on the different sensors. The combined operator has a mass l, referred to as the modified mass since it can exceed one at intermittent stages due to para-consistency, and is given by:

l = ( Σ_{i=1..n} w_i l_i )^k · ( Π_{i=1..n} w_i l_i )^(1−k),   if k · l_i ≠ 0 for all i
  = ( Σ_{i=1..n} w_i l_i )^(1−k),                              if k = 0 or any l_i = 0        (7.3)

where w_i is the weight associated with each sensor, k is the conflict between the combining sensors, and l_i is the mass associated with each sensor. The conditions and requirements of using this operator are the following:
- The proportionate sensor weighting factor is used since the intrusion detection systems used for the combination are binary in nature. The axiom of combination, Σ_A m(A) = 1, is satisfied only when exponential weighting factors are made use of.
- The weights assigned to the sensors should add to one, i.e., Σ_{i=1..n} w_i = 1.
- The value of k lies between 0 and 1 and is the parameter that controls the degree of compensation between the intersection and union parts. The value of the conflict factor k between any two sensors can be calculated as the Euclidean distance between the two sensors; k takes a value of zero if the sensors are in consensus and a non-zero value in case of conflict.
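The behavior listed above can be sketched as a simple two-source operator: conjunctive when the sources agree and all masses are non-zero, disjunctive (para-consistent summation) otherwise, with the conflict k taken as the Euclidean distance between the mass vectors. This follows the prose description rather than the exact compensatory form of equation 7.3, and all names, weights, and thresholds are assumptions.

```python
# A behavioral sketch of the context-dependent operator: DS-like conjunctive
# combination in consensus with no missing evidence, disjunctive aggregation
# in case of conflict or zero masses. Not the thesis implementation.
import math

def conflict(m1, m2, classes):
    """Euclidean distance between two mass vectors, used as the conflict k."""
    return math.sqrt(sum((m1.get(a, 0.0) - m2.get(a, 0.0)) ** 2 for a in classes))

def context_combine(m1, m2, w1=0.5, w2=0.5, tol=1e-9):
    classes = set(m1) | set(m2)
    k = conflict(m1, m2, classes)
    zeros = any(m1.get(a, 0.0) == 0.0 or m2.get(a, 0.0) == 0.0 for a in classes)
    if k < tol and not zeros:
        # consensus and no missing evidence: severe, conjunctive behavior
        combined = {a: (w1 * m1[a]) * (w2 * m2[a]) for a in classes}
    else:
        # conflict or zero masses: para-consistent union (aggregation)
        combined = {a: w1 * m1.get(a, 0.0) + w2 * m2.get(a, 0.0) for a in classes}
    s = sum(combined.values())
    return {a: v / s for a, v in combined.items()}   # normalize at the end

print(context_combine({"Probe": 0.6, "Normal": 0.4}, {"Probe": 0.6, "Normal": 0.4}))
```

In consensus the stronger hypothesis is reinforced (non-idempotence), while fully conflicting sources are averaged rather than forced to a verdict.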
The method proposed with this operator gives the most intuitive result and works as follows:

- Disjunctive combination is done on diverse pairs of IDSs (it averages out without suppressing any evidence). Pair-wise disjunctive combination is done on all the IDSs which are not redundant (since in case of redundancy it is more intuitive to expect stronger evidence, rather than averaging out, which gives no additional support even though both IDSs support it).
- The results are then conjunctively combined, if not totally contradicting after the pair-wise aggregation, since at this stage suppression of evidence will not happen to a large extent, and suppression of certain evidence helps in faster convergence.
- In the case of redundancy, if we use disjunctive combination, it is required to work without normalization in all the intermittent combinations, for the sake of making the strong evidence still stronger.
- In order to satisfy all the axioms of evidence theory, normalization must be done at the final stage so that all the masses of a particular piece of evidence sum to one.
- It can be concluded that the proposed method combines IDSs reasonably well under all conditions by disjunctively combining diverse or contradicting pairs and finally suppressing the weak hypotheses by a pair-wise conjunctive combination.
- In the very specific case of binary evidence on singletons, it can be observed that no additional support results from the addition of redundant evidence. Hence it is better to use disjunctive combination in all the intermittent combinations until the final step, where the conjunctive combination helps in a faster convergence.
7.4.1 Performance of the proposed combination operator

The proposed operator is:

- The same as the DS operator, which is a consensus operator, in the general case. This consensus operator cannot provide information from a set of measures among which one or more are zeros.
- The union operator in case of conflict and also when one or more masses are zero. The union operator is a para-consistent combination operator, and hence the combined mass can exceed one.

The operator satisfies commutativity, continuity, monotonicity (after normalization), and quasi-associativity (if normalization is done only at the final stage of combination), but not idempotence (due to the para-consistency of the union operator).
The reason for using the context-dependent operator is that the union operator is totally acceptable in case of diversity or conflict because of uncertainty. However, disjunction gives averaging with redundant observations. Even though it is reasonable to give an average value where the additional observation by one more IDS does not increase the belief, intuition suggests that some method which increases the belief of the strong hypothesis is required. The context-dependent operator subsumes the celebrated DS operator except for the cases of conflict and zero evidence, and hence all the axioms of the DS theory hold for the context-dependent operator as well: Σ_A m(A) = 1, m(∅) = 0 and 0 ≤ m(A) ≤ 1. Also, independence of the sources to be combined is another assumption for the applicability of the combination. Most of the commercial IDSs are signature-based and fail to identify zero-day attacks while working with real-time traffic. In such cases, misclassification or a False Negative (which is again a misclassification, since the FoD contains the hypothesis normal, which is an expected output of the IDS) is expected. Hence the combination operator can assume a closed-world assumption, as in the case of the DS method.
The same example which was used to illustrate the performance of the DS and the other operators is taken again to illustrate the performance of the context-dependent operator (Table 7.4).

            Normal   Probe   DoS   U2R   R2L
PHAD (m1)    0.4      0.6     0     0     0
ALAD (m2)    0.1      0.1     0     0     0.8
Snort (m3)   0        0.3     0     0     0.7

Table 7.4: Evidence from three sensors, with one unreliable, using the context-dependent operator

Applying the context-dependent method, with aggregation of information from the conflicting evidence and subsequent convergence by the consensus operator, (m1 ∪ m2) ⊕ m3, i.e., disjunctive combination of m1 and m2 followed by conjunctive combination with m3, gives m(Normal) = 0, m(Probe) = 0.27, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0.73.

The operator gives an additional advantage when a lower weighting factor is assigned to the first piece of evidence, which is observed to conflict with the other two. However, the complexity of choosing a weight factor is very high. Hence, even though the operator equation incorporates the weight factor, we obtained good results even without applying it, and the method thereby gets simplified appreciably.
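The worked result above can be reproduced step by step: unnormalized disjunctive aggregation of the conflicting PHAD and ALAD evidence, conjunctive combination with Snort, and a single final normalization. Equal sensor weights are assumed, as in the text; the layout is illustrative.

```python
# Sketch reproducing the Table 7.4 fusion: (m1 disjunctive m2), then
# conjunctive combination with m3, normalizing only at the final stage.

CLASSES = ["Normal", "Probe", "DoS", "U2R", "R2L"]
phad  = {"Normal": 0.4, "Probe": 0.6, "DoS": 0, "U2R": 0, "R2L": 0.0}
alad  = {"Normal": 0.1, "Probe": 0.1, "DoS": 0, "U2R": 0, "R2L": 0.8}
snort = {"Normal": 0.0, "Probe": 0.3, "DoS": 0, "U2R": 0, "R2L": 0.7}

# Step 1: unnormalized disjunctive aggregation of the conflicting pair.
agg = {a: phad[a] + alad[a] for a in CLASSES}
# Step 2: conjunctive combination with the third sensor.
conj = {a: agg[a] * snort[a] for a in CLASSES}
# Step 3: final normalization so the masses sum to one.
s = sum(conj.values())
fused = {a: round(v / s, 2) for a, v in conj.items()}
print(fused)  # {'Normal': 0.0, 'Probe': 0.27, 'DoS': 0.0, 'U2R': 0.0, 'R2L': 0.73}
```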
Weighted disjunctive combination

- If an IDS is known to be more dependable than the rest of the IDSs in detecting a particular attack, then that IDS is weighted high for that class. Even in the worst case, with all other detectors unable to identify the particular attack and all of them recognizing it as something else, the correct detector optimized for that attack has a high weight factor, which makes the correct detection possible.
- Even with the substitution of weight factors, it is necessary to delay the normalization till the final stage of disjunctive combination in order to satisfy the axioms of evidence theory. Then all the advantages of disjunctive combination, like the associativity property, also hold till the final stage of combination.
Consider an experiment of observing a slow scan in network traffic with three sensors: PHAD, ALAD and Snort. Snort responded with an alert, whereas the other two sensors could not detect the slow scan. We have made use of a data-dependent decision learner using a neural network to obtain the weight for each one of the IDSs under various input traffic. The weights assigned by the neural network learner are fed to the fusion unit along with the outputs from the individual IDSs. The fusion is done by the proposed context-dependent operator, and the resultant assignment values corresponding to each of the hypotheses are as follows:

m(Probe) = 1, m(DoS) = 0, m(R2L) = 0, m(U2R) = 0, m(Normal) = 0

whereas the DS method of combination would have resulted in:

m(Probe) = 0, m(DoS) = 0, m(R2L) = 0, m(U2R) = 0, m(Normal) = 0.
Advantages of the proposed operator

- This operator can combine evidence from two IDSs with different FoDs; the combined FoD is then the union of the FoDs.
- The closure property is satisfied, so as to stay within a given mathematical framework.
- This operator works under all conditions and states of the individual IDSs.
- This operator has been developed quite intuitively, and hence the result is most intuitive. The conjunctive operator is acceptable when all sources happen to be reliable and similar, whereas the union operator corresponds to data aggregation from sources of weaker reliability. Thus, conjunctive combination makes sense when the mass distributions significantly overlap; if not, at least one of the sources is wrong and it is better to choose disjunctive combination. Also, our intuition was that a certain diversity among classifiers assures versatility, whereas a certain redundancy assures reliability.
- The combination operation is simple and easy.
- The combination takes care that no information is unnecessarily suppressed, but at the same time convergence is assured.
- The non-idempotence property counts as an advantage in sensor fusion, since the same observation from two sensors should improve the belief in that observation: Bel ⊕ Bel ≠ Bel, even though Bel ⊕ Bel will favour the same subsets as Bel, but with, as it were, twice the weight of evidence. If each source supports a hypothesis for independent reasons, it is natural to conclude that the hypothesis is strongly supported, since we have different reasons for considering it as such. Also, adopting idempotence is a matter of context; it is acceptable when the sensors are homogeneous.
- Other properties like commutativity, continuity, and distributivity are satisfied.
- Associativity is not absolutely required; our combination algorithm is not associative, since it considers an ordering of sources. A weaker property such as quasi-associativity is often sufficient, if we delay the normalization till the end of the combination.
- This operator subsumes the celebrated DS method of evidence combination, and hence all the axioms of the DS theory of evidence are incorporated as they are.
- This operator has the property of dipolarity, which means that the more a proposition is supported by all the evidence, the more belief mass it can obtain after combination. This property is satisfied by the DS operator as well.
- This operator is relatively tolerant of inaccurate, incomplete or inconsistent evidence.
- This operator aggregates the conflicting evidence and then, at the next stage, comprehends the aggregated pair-wise results for an improved sensitivity and for false alarm suppression.
Disadvantages of the proposed operator

- The choice of the operator function depends on the context and hence has to be made carefully.
- Sources need to be independent for the combination. Dempster's idea on the combination of independent sources of information can be stated as follows: suppose there are n pieces of evidence which are given in the form of n probability spaces (Θ_i, A_i, m_i), where A_i is a subset of P(Θ), the power set of the FoD, each of which has a mapping relation with the same space S through a multivalued mapping. These n sources are independent, and the explanation by Dempster is as follows: opinions of different sensors based on overlapping experiences could not be regarded as independent sources; Dempster assumes statistical independence of sources, as different measurements by different observers on different equipment would often be regarded as independent.
- For a parallel combination of any model, the basic requirement is that the combination should be associative.
7.4.2 Discussion

1. The selection of the IDSs has been done by choosing sensors with minimum correlation among them. The correlation coefficient ρ_n of the available sensors is given by the formula:

ρ_n = n(N_f + N_t) / (N − N_f − N_t + n(N_f + N_t)),

where n is the number of sensors, N is the number of experiments, N_f is the number of experiments where all classifiers fail to detect, and N_t is the number of experiments where all IDSs detect correctly. We refer to redundancy of classifiers when the correlation coefficient is one, similarity when the correlation coefficient is greater than 0.5, diversity of classifiers when the correlation coefficient is less than or equal to 0.5, and total contradiction when the correlation coefficient is zero.
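The correlation coefficient and the classification thresholds of point 1 can be sketched as follows; the reading of the denominator as N − N_f − N_t + n(N_f + N_t) is an assumption based on the reconstruction of the formula, and the function names are illustrative.

```python
# Sketch of the sensor-selection correlation coefficient and the labels
# (redundant / similar / diverse / contradicting) used in the discussion.

def correlation(n, N, Nf, Nt):
    """rho_n = n(Nf + Nt) / (N - Nf - Nt + n(Nf + Nt))."""
    return n * (Nf + Nt) / (N - Nf - Nt + n * (Nf + Nt))

def relationship(rho):
    if rho == 1.0:
        return "redundant"
    if rho > 0.5:
        return "similar"
    if rho > 0.0:
        return "diverse"
    return "contradicting"

# All experiments end in joint success or joint failure -> fully redundant.
print(relationship(correlation(n=3, N=100, Nf=40, Nt=60)))  # redundant

# No experiment has all sensors agreeing -> totally contradicting.
print(relationship(correlation(n=3, N=100, Nf=0, Nt=0)))    # contradicting
```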
2. It is quite intuitive to think that the fusion method should work with a minimum number of IDSs and get the advantages of fusion, whatever fusion technique is used. Every best IDS, when merged, should improve the confidence of the existing evidence and thereby converge faster; i.e., the fusion method makes strong evidence stronger so that confusion (uncertainty) is eliminated.
3. There are inherent advantages in using the best IDSs in the fusion scheme. This also makes sense intuitively in the evidence theory of fusion: if one sensor gives its evidence and the second gives similar evidence, the belief is reinforced more strongly, while a contradiction in the evidence reduces the belief.
4. The DS method of combination implicitly makes a closed-world assumption, i.e., the set of possible hypotheses is perfectly known. We assume that Θ represents a set of states that are mutually exclusive and exhaustive. In intrusion detection, when dealing with real-time traffic, Θ need not necessarily be exhaustive, since the traffic may contain many novel attacks not included in Θ. However, in such cases the intrusion detection systems may be unable to detect the attack, and it then appears as normal, which is also a hypothesis included in the FoD. Hence an additional label denoting none of the above is not needed, because a none-of-the-above attack either gets included in the hypothesis normal by the evidence or gets misclassified as some other attack.
5. The sources are assumed to be independent with DS (m(A) = m1(A) · m2(A)). Even though we may use IDSs that are trained from the same training set, the two are independent of each other and in turn depend on the training data set. Decisions acquired from multiple IDSs are more likely to be independent when they look at entirely different features of the traffic. If instead the two IDSs make use of the same features for detection, the detectors may give a consensus in their decisions, which is different from dependence between the sensors. When fusing by means of mathematical decision rules, it is necessary to have independent detectors, because this simplifies construction of the rule and enhances its efficiency.
6. It is important to note that when we apply the union operator (m(A) = m1(A) + m2(A)), we do not assume mutual exclusiveness. The event that one IDS alerts with DoS and the event that another IDS alerts with DoS are not mutually exclusive: the elements of the FoD are mutually exclusive, whereas the two sample spaces of IDS1 and IDS2 are not. Since they are not mutually exclusive, the result of the union operator will be para-consistent. The union of the two events, IDS1 alerting DoS and IDS2 alerting DoS, denoted IDS1 ∪ IDS2, is the event containing all elements belonging to IDS1 alerting DoS, or to IDS2 alerting DoS, or to both. The idea is the aggregation of the support for the events so that the belief improves. At the same time, we have used disjoint training sets for the two anomaly-based IDSs, PHAD and ALAD, by training them on week one and week three of the DARPA99 data set respectively.
7. Shafer [112] describes the requirement that the different IDSs be independent and non-interacting, which is just that all their interaction should be in terms of the issues discerned by the FoD. That clearly says that the FoD should discern the interaction of the evidence (hence, if we have the singletons A, B and C, then the FoD consists of the eight possibilities which give the total interaction within the FoD).
8. We concede that for highly conflicting cases, this method is the same as an averaging operator. However, we argue that the method still has considerable applicability due to the weight factors and data-dependency, which result in highly intuitive results.
7.5 Experimental evaluation

All the Intrusion Detection Systems that form part of the fusion IDS were separately evaluated with the same two data sets, namely 1) the real-time traffic and 2) the DARPA 1999 data set. The fused belief assignments for an R2L attack and for a stealthy probe (Tables 7.5 and 7.6) are:

m(Normal) = 0; m(Probe) = 0; m(DoS) = 0; m(R2L) = 1; m(U2R) = 0
m(Normal) = 0; m(Probe) = 1; m(DoS) = 0; m(R2L) = 0; m(U2R) = 0
Normal Probe DoS R2L U2R/ Data
PHAD 1 0 0 0 0
ALAD 0 0 0 1 0
Snort 0 0 0 1 0
Fusion output 0 0 0 1 0
Table 7.5: Belief of each of the IDSs for a R2L attack
Normal Probe DoS R2L U2R/ Data
PHAD 0 1 0 0 0
ALAD 1 0 0 0 0
Snort 1 0 0 0 0
Fusion output 0 1 0 0 0
Table 7.6: Belief of each of the IDSs for a stealthy probe
Attack type Total attacks Attacks detected % detection
Probe 37 26 70%
DoS 63 27 43%
R2L 53 6 11%
U2R/ Data 37 4 11%
Total 190 63 33%
Table 7.7: Type of attacks detected by PHAD at 100 false alarms
Attack type Total attacks Attacks detected % detection
Probe 37 9 24%
DoS 63 23 37%
R2L 53 31 59%
U2R/ Data 37 15 31%
Total 190 78 41%
Table 7.8: Type of attacks detected by ALAD at 100 false alarms
Attack type Total attacks Attacks detected % detection
Probe 37 15 41%
DoS 63 35 56%
R2L 53 30 57%
U2R/ Data 37 34 92%
Total 190 115 61%
Table 7.9: Type of attacks detected by Snort at 1000 false alarms
Attack type Total attacks Attacks detected % detection
Probe 37 31 84%
DoS 63 44 70%
R2L 53 34 64%
U2R/ Data 37 34 92%
Total 190 143 75%
Table 7.10: Type of attacks detected by context-dependent fusion at 100 false alarms
Figure 7.1: Detection of Attack Types
7.5.1 Impact of this work
The fusion technique adopted in this work combines IDS outputs with subjective judgements. This concept is well suited to intrusion detection, where the concern usually involves the activity and intention of human subjects. The solution is therefore to use subjective sensors freely, i.e., the sensor outputs can depend not only on the observation of a statistical process, but also on rational human reasoning. Since there are multiple sensors, we need to coordinate them and combine their results. The combination operator proposed in this chapter for sensor fusion is used mainly because it is difficult to represent the information supplied by the sensors by means of single probability distributions, due to imprecision and/or lack of statistical evidence. The context-dependent operator, which functions either as a conjunctive or as a disjunctive operator depending on the context, is particularly suitable when the sources are heterogeneous.
Detector/ Total TP FP Precision Recall F-score
Fusion Type Attacks
PHAD 45 11 45 0.20 0.24 0.22
ALAD 45 20 45 0.31 0.44 0.36
Snort 45 13 400 0.03 0.29 0.05
OR 45 34 470 0.07 0.76 0.13
AND 45 9 29 0.24 0.2 0.22
Data-dependent 45 31 39 0.44 0.69 0.54
Decision Fusion
Table 7.11: Comparison of the evaluated IDSs using the real-world data set
The feasibility of this idea was demonstrated via a case study with several IDSs distributed over a LAN and using the replayed DARPA data set. The technique gives a performance better than that of any of the individual intrusion detection systems that were fused. Even though it was validated for a particular application, it should be a generalizable solution beyond this specific application case.
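The precision, recall and F-score columns of Table 7.11 follow from the standard definitions; a quick sketch, using the data-dependent decision fusion row as the worked example:

```python
# Precision, recall and F-score as used in Table 7.11.
def scores(tp: int, fp: int, total_attacks: int):
    precision = tp / (tp + fp)            # fraction of alerts that are real
    recall = tp / total_attacks           # fraction of attacks detected
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Data-dependent decision fusion row: 45 attacks, 31 TPs, 39 FPs.
p, r, f = scores(tp=31, fp=39, total_attacks=45)
# p ≈ 0.44, r ≈ 0.69, f ≈ 0.54, matching the table.
```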
7.6 Summary
Different IDSs have different detection rates and false alarm rates, and these may be complementary, competitive or cooperative. What sensor fusion is all about is how to combine multiple sensor outputs to reveal the best truth regarding the objects of interest in terms of practical utility. The context-dependent operator proposed in this chapter was demonstrated to be feasible for sensor fusion. The research in this thesis has improved over the existing DS alternatives in that it can better handle uncertainty and ambiguity in sensed context. The individual IDSs that were components of this architecture in this particular work were PHAD, ALAD and Snort, with detection rates of 0.33, 0.41 and 0.61 respectively. The false alarm rates for PHAD and ALAD were acceptable, whereas that of Snort was exceptionally high. Our algorithm has resulted in acceptable ranges of false alarms and a significant improvement in detection rate for all types of attacks, with an overall detection rate of 0.75, achieved with an exceptionally large data set and suboptimal constituent intrusion detection systems. The detection rate for the real-world network traffic has improved to 0.69. The F-score has improved to 0.66 and 0.54 for the DARPA data set and the real-world traffic respectively. The evaluations show the strength and ability of the data-dependent decision fusion approach using the modified evidence theory to perform very well for the real-world network traffic as well as for the DARPA data set. It is also shown that our technique is more flexible and outperforms other existing fusion techniques such as OR and AND. The experimental comparison of this method using the real-world traffic has confirmed its usefulness and significance.
The experiments in this work used only three IDSs. It is likely that the use of more sensors will lead to a higher performance improvement of the fusion IDS. Also, the context-dependent operator can provide a generalizable solution for a wide range of applications. This supports the claim that the synergistic interaction between sensor fusion and intrusion detection facilitates improved detection.
Chapter 8
Modeling of Intrusion Detection Systems
and Sensor Fusion
I find that the harder I work, the more luck I seem to have.
Thomas Jefferson
8.1 Introduction
This chapter addresses the problem of optimizing the performance of intrusion detection systems using sensor fusion. Considering the utility of sensor fusion for improved sensitivity and false alarm reduction, as demonstrated in the earlier chapters, we have explored the general problem of deciding the threshold for differentiating the malicious traffic from the normal traffic, and also of modeling the individual components of the proposed sensor fusion architecture. In the proposed method, the performance optimization of the individual IDSs is addressed first. The neural network supervised learner has been designed to determine the weights of the individual IDSs, which incorporates data-dependency in the architecture. A sensor fusion unit doing the weighted aggregation in order to make an appropriate decision forms the final stage of the data-dependent decision fusion architecture. This chapter theoretically models the fusion of intrusion detection systems for the purpose of demonstrating the improvement in performance, in order to supplement the empirical evaluation in the previous two chapters.
The remainder of this chapter is organized as follows. In section 8.2, the motivation for this chapter is discussed. In section 8.3, the model of the proposed DD fusion architecture is presented by modeling its constituent parts. Algorithms for optimizing the local detectors, along with a data-dependent decision fusion architecture for optimizing the fusion criterion, are also presented in section 8.3. Finally, the concluding comments are presented in section 8.4.
8.2 Motivation
This chapter attempts to show that there exist more effective means of analyzing the information provided by existing IDSs using sensor fusion, resulting in an effective data refinement for knowledge recovery. The improved performance of the DD fusion architecture is shown experimentally with an approach adopted for optimizing both the local sensors and the fusion unit with respect to the error rate. The optimal performance, along with the complexity of the task, brings to the fore the need for a theoretically sound basis for the sensor fusion techniques in IDSs. The theoretical analysis of the improved performance of the architecture has been done in chapter 4.
The motivation for the present work was the fact that the empirical evaluation seen in the previous chapter was extremely promising with the DD fusion. The modeling can be extremely useful for a complete treatment of the problem with sound mathematical and logical concepts, as introduced in chapter 4. Thus the present work employs modeling to augment the effective mathematical analysis of the improved performance of sensor fusion and to develop a rational basis which is independent of the various techniques used.
8.3 Modeling of data-dependent decision fusion system
The data-dependent decision fusion approach proposed in chapter six was extended to include modeling, with optimization done at every single stage, thereby arriving at an optimum architecture showing better performance than what has been reported so far in the literature. The architecture has three stages: optimizing the individual IDSs as the first stage, the neural network learner determining the weights of the individual IDSs as the second stage, and the fusion unit doing the weighted aggregation as the final stage.
8.3.1 Modeling of Intrusion Detection Systems
Consider an IDS that monitors either the network traffic connections on the network or the audit trails on the host. The network traffic connection or the audit trail monitored is given as x ∈ X, where X is the entire domain of network traffic features or of audit trails respectively. The model is based on the hypothesis that security violations can be detected by monitoring the network for traffic connections of malicious intent in the case of a network-based IDS, and a system's audit records for abnormal patterns of system usage in the case of a host-based IDS. The model is independent of any particular operating system, application, system vulnerability or type of intrusion, thereby providing a framework for a general-purpose IDS.
When making an attack detection, a connection pattern is given by x ∈ X^{jk}, where j is the number of features from k consecutive samples used as input to an IDS. As seen in the DARPA data set, for many of the features the distributions are difficult to describe parametrically, as they may be multi-modal or very heavy-tailed. These highly non-Gaussian distributions led to the investigation of non-parametric statistical tests as a method of intrusion detection in the initial phase of IDS development. The detection of an attack in the event x is observed as an alert. In the case of a network-based IDS, the elements of x can be the fields of the network traffic like the raw IP packets, pre-processed basic attributes like the duration of a connection, the protocol type, service etc., or specific attributes selected with domain knowledge such as the number of failed logins or whether a superuser command was attempted. In a host-based IDS, x can be the sequence of system calls, the sequence of user commands, connection attempts to the local host, the proportion of accesses in terms of TCP or UDP packets to a given port of a machine over a fixed period of time, etc. Thus an IDS can be defined as a function that maps the data input into a normal or an attack event, either by the absence of an alert (0) or by the presence of an alert (1) respectively, and is given by:
IDS : X → {0, 1}
To detect attacks in the incoming traffic, the IDSs are typically parameterized by a threshold T. The IDS uses a theoretical basis for deciding the thresholds for analyzing the network traffic to detect intrusions. Changing this threshold allows a change in the performance of the IDS. If the threshold is very low, then the IDS tends to be very aggressive in checking the traffic for intrusions. However, there is a potentially greater chance of the detections being irrelevant, which results in a large number of false alarms. A large value of the threshold, on the other hand, has the opposite effect, being a bit conservative in detecting attacks; however, some potential attacks may get missed this way. Using a 3σ-based statistical analysis, the higher threshold T_h is set at +3σ and the lower threshold T_l is set at −3σ. This is with the assumption that the traffic signals are normally distributed. In general, the traffic detection, with s being the sensor output, is given by:
Sensor Detection =
    attack,  if T_l < s < T_h
    normal,  if s ≤ T_l or s ≥ T_h
A signature-based IDS functions by looking at the event feature x and checking whether it matches any of the records in the signature database D_b:
Signature-based IDS : X → 1 if x ∈ D_b;  X → 0 if x ∉ D_b
An anomaly-based IDS generates an alarm when the input traffic deviates from the established models or profiles P_f:
Anomaly-based IDS : X → 1 if x ∉ P_f;  X → 0 if x ∈ P_f
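The two mappings above can be written down directly. The sketch below is illustrative, with a toy signature database and profile set standing in for D_b and P_f.

```python
# Sketch of the two detector types as functions X -> {0, 1}.
# signature_db stands in for D_b and profiles for P_f; both are
# hypothetical placeholders, not real rule sets.

def signature_ids(x: str, signature_db: set) -> int:
    # Alert (1) exactly when the event matches a known signature.
    return 1 if x in signature_db else 0

def anomaly_ids(x: str, profiles: set) -> int:
    # Alert (1) exactly when the event deviates from established profiles.
    return 0 if x in profiles else 1

db = {"GET /cgi-bin/phf", "rpc.statd overflow"}
normal_profile = {"GET /index.html", "SSH login"}

alerts = (signature_ids("GET /cgi-bin/phf", db),
          anomaly_ids("GET /index.html", normal_profile))
# -> (1, 0): the signature IDS alerts on a known attack,
#    the anomaly IDS stays silent on profiled traffic.
```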
8.3.2 Modeling the fusion IDS
Consider the case where n IDSs monitor a network for attack detection, each IDS makes a local decision s_i, and these decisions are aggregated in the fusion unit f. This architecture is often referred to as the parallel decision fusion network and is given in Figure 8.1. The fusion unit makes a global decision, s, about the true state of the hypothesis based on the collection of the local decisions gathered from all the sensors.
Figure 8.1: Parallel Decision Fusion Network
The problem is cast as a binary detection problem with the hypotheses Attack and Normal. Every IDS participating in the fusion has its own detection rate D_i and false positive rate F_i, due to the preferred heterogeneity of the sensors in the fusion process. Each IDS indexed i gives an alert or no-alert, indicated by s_i taking a value of one or zero respectively, depending on the observation x:
s_i = 0, if normal is declared to have been detected; 1, if attack is declared to have been detected
The fusion center collects these local decisions s_i and forms the binomially distributed sum s = Σ_{i=1}^{n} s_i, where n is the total number of IDSs that take part in the fusion.
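In its simplest form, the parallel network of Figure 8.1 just sums the local decisions. The sketch below uses a k-out-of-n rule as one illustrative choice of the fusion function f, not the data-dependent rule developed later in this chapter.

```python
# Parallel decision fusion: n local binary decisions s_i are collected
# and the fusion unit decides from their sum s = sum(s_i).
# The k-out-of-n rule used here is one simple choice of fusion function f.

def fuse(decisions: list, k: int) -> int:
    s = sum(decisions)          # binomially distributed under either hypothesis
    return 1 if s >= k else 0   # global decision: 1 = Attack, 0 = Normal

local = [1, 0, 1]             # e.g. two of three detectors alert
majority = fuse(local, k=2)   # -> 1, attack declared by majority vote
```

Setting k = 1 recovers the OR rule and k = n the AND rule, the two baselines compared against in Table 7.11.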
Theorem 1
The output of a binary fusion unit is decided by a function f given by:
f : s_1 × s_2 × ... × s_n × x → {0, 1},
where the decisions of the individual detectors given by s_i are deterministic and the data x is a random parameter.
Lemma 1
The decision rule used by each of the individual detectors is deterministic and can be expressed as a function f_i mapping the observation to s_i ∈ {0, 1}, defined as:
f_i(x^j) = 0, if p(s_i^j = 0 | x^j) = 1; 1, otherwise
where j corresponds to the class of the network traffic on which the fusion rule as well as the respective sensor outputs depend. Since the fusion center makes the final decision, the assumption is made that the output of the fusion rule is binary, i.e., either Normal or Attack. It is the same with all the individual IDSs: each IDS classifies the incoming traffic as Normal or Attack.
8.3.3 Statement of the problem
The problem statement is defined in the following steps:
- The random variable x represents the observation to be made. This observation belongs to either of the two groups of the hypothesis, Normal or Attack, with probabilities p and q = 1 − p, respectively.
- A set of n IDSs monitors the random variable x and detects the presence of attacks in the traffic. The set of detections by the n sensors is given by s_1, s_2, ..., s_n, where s_i is the output of the IDS indexed i. Each s_i is a function of the input x, i.e., s_i = f_i(x).
- The problem of optimum detection with n IDSs selecting either of the two possible hypotheses is considered from the decision theory point of view. The loss function is defined in terms of the decisions made by each IDS along with the observation, and is given by:
λ : {0, 1} × {0, 1} × ... × {0, 1} × {Normal, Attack} → R.    (8.1)
- The average of the loss is then minimized. The objective of the decision strategy is to minimize the expected penalty (loss function) incurred:
min E[λ(s_1, s_2, ..., s_n, H)], where H is the hypothesis and the minimization is over the decision rules of each detector.
With λ(s_1, s_2, ..., s_n, H) = k being the cost incurred for IDS_1 deciding s_1, IDS_2 deciding s_2, and so on, the minimum value of this cost function occurs when all the sensors make the correct decisions, i.e., λ(0, 0, ..., 0, Normal) = λ(1, 1, ..., 1, Attack) = 0, and the cost increases to 1 if exactly one IDS is incorrect, and so on. Thus the cost function takes the maximum value of n when all the IDSs are unable to make the correct decision. This is a trivial case where every combination of k errors is penalized by the same amount, and the function can be reduced by using affine transformations. From the cost matrix of the KDD IDS evaluations [52], it is clear that λ(0, s_2, ..., s_n, Attack) > λ(1, s_2, ..., s_n, Normal), i.e., it is more costly for any detector to miss an attack than to raise a false alarm, regardless of the detections of the other detectors. The minimization of the loss leads to sets of coupled inequalities in terms of the likelihood ratio of each IDS and the decisions made at the other sensors.
As k decreases from 2 to 1, the thresholds change in a way which increases the probability of error, as double errors are discounted to single ones. As k increases from 2, double errors become prohibitively expensive, so it is to be expected that some mechanism will emerge to reduce their likelihood. Thus, for k varying from 1 to n, there are n solutions minimizing equation 8.1, one of which is the global minimum and thus gives the optimal threshold pair.
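In the trivial case described above, where any combination of k wrong decisions is penalized by k, the loss function of equation 8.1 can be sketched as:

```python
# Loss function of equation 8.1 in its trivial form: the penalty equals
# the number of IDSs that decided incorrectly, so lambda(0,...,0, Normal) =
# lambda(1,...,1, Attack) = 0 and the maximum value is n.

def loss(decisions: list, hypothesis: str) -> int:
    correct = 0 if hypothesis == "Normal" else 1
    return sum(1 for s in decisions if s != correct)

assert loss([0, 0, 0], "Normal") == 0    # all correct: zero penalty
assert loss([1, 1, 1], "Attack") == 0
n_errors = loss([0, 1, 0], "Attack")     # two of three IDSs miss the attack
```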
8.3.4 The effect of setting threshold
To detect attacks in the incoming traffic, the IDSs are typically parameterized with a threshold T. Changing this threshold allows a change in the performance of the IDS. If the threshold is very large, some potentially dangerous attacks get missed. A small threshold, on the other hand, results in more detections, with a potentially greater chance that they are not relevant. The final step in the approach towards solving the fusion problem is taken by noticing that the decision function f_i(.) is characterized by the threshold T_i and the likelihood ratio (if independence is assumed). Thus the necessary condition for an optimal fusion decision is that the thresholds (T_1, T_2, ..., T_n) are chosen optimally. However, this does not satisfy the sufficient condition: there remain many local minima, each of which needs to be checked to assure the global minimum.
The counterintuitive results at the individual sensors with the proper choice of thresholds will be advantageous in getting an optimum value for the fusion result. They are excellent paradigms for studying distributed decision architectures, for understanding the impact of the limitations, and even for suggesting empirical experiments for IDS decisions.
The structure of the fusion rule plays a crucial role in the overall performance of the IDS, since the fusion unit makes the final decision about the state of the environment. While a few inferior IDSs might not greatly impact the overall performance, a badly designed fusion rule can lead to poor performance even if the local IDSs are well designed. The fusion IDS can be optimized by searching the space of fusion rules and optimizing the local thresholds for each candidate rule. Other than for some simple cases, the complexity of such an approach is prohibitive due to the exponential growth of the set of possible fusion rules with respect to the number of IDSs. Searching for the fusion rule that leads to the minimum probability of error is the main bottleneck, due to the discrete nature of this optimization process and the exponentially large number of fusion rules. In our experiment we maximize the true positive rate by fixing the false positive rate at α_0; α_0 determines the threshold T by trial and error, and we have noticed that, in our case, this converges within two or three trials. This is done with the training data and hence it is done off line.
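The offline threshold search just described, fixing the false positive rate at α_0 on training data and taking the lowest admissible threshold, can be sketched as follows; the scores, labels and candidate grid are illustrative.

```python
# Offline threshold search on training data: pick the lowest threshold T
# whose empirical false positive rate stays at or below alpha_0, which in
# turn keeps the most detections. All values below are illustrative.

def pick_threshold(scores, labels, alpha_0, candidates):
    negatives = sum(1 for y in labels if y == 0)
    for t in sorted(candidates):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / negatives <= alpha_0:
            return t        # lowest admissible T maximizes detections
    return None             # no candidate meets the false alarm constraint

scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
labels = [0,   0,   1,    1,   1,   0  ]   # 1 = attack, 0 = normal
T = pick_threshold(scores, labels, alpha_0=0.0, candidates=[0.3, 0.5, 0.7])
```

On this toy set the search rejects T = 0.3 (one false alarm) and settles on T = 0.5, mirroring the two-or-three-trial convergence observed above.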
The computation of thresholds couples the choice of the local decision rules so that the system-wide performance is optimized, rather than the performance of the individual detectors. This requirement is taken care of by the DD fusion architecture proposed and discussed in the previous chapter. The weights assigned to the individual sensors are determined by the neural network learner, which can be considered as a pre-processing stage to the fusion unit. A neural network is the most appropriate choice for weight determination because it is difficult to define the rules clearly, especially as more IDSs are added to the fusion unit. When a record is correctly classified by one or more detectors, the neural network accumulates this knowledge as a weight, and with more iterations the weight gets stabilized. This information is used to fine-tune the fusion unit, since the fusion depends on the input feature vector. The fusion output is represented as:
y = F^j(w_i^j(x^j, s_i^j), s_i^j),
where the weights w_i^j depend on both the input x^j and the individual IDS outputs s_i^j, the superscript j referring to the class label and the subscript i to the IDS index. The fusion unit gives a value of 1 or 0 depending on whether the weighted aggregation of the IDS decisions is above or below the set threshold. The fusion unit is optimized using this setup, with the proper weighting given to each one of its inputs. The individual IDSs are optimized by the proper choice of the threshold, which is decided by the detection-false alarm trade-off. ROC curves are used to evaluate IDS performance over a range of trade-offs between the detection rate and the false positive rate. Each IDS has an operating point on the ROC curve, and the optimum operating point is located towards the top-left. The optimal decision fusion detection rule is obtained by forming the output of the fusion unit as:
y = s = Σ_{i=1}^{n} w_i^j s_i^j.
The architecture is independent of the data set and the structures employed, and can be used with any real-valued data set.
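The weighted aggregation thresholded at the fusion unit can be sketched directly; the weights below stand in for the data-dependent values the neural network learner would supply for one traffic class j.

```python
# Final fusion stage: weighted aggregation of the IDS decisions,
# thresholded to give the binary fusion output y. The weights are
# hypothetical stand-ins for the learner's data-dependent values.

def fusion_output(weights, decisions, threshold):
    agg = sum(w * s for w, s in zip(weights, decisions))
    return 1 if agg >= threshold else 0

w = [0.2, 0.5, 0.3]      # hypothetical weights for three detectors
s = [0, 1, 1]            # the second and third detectors alert
y = fusion_output(w, s, threshold=0.6)   # 0.8 >= 0.6 -> 1 (Attack)
```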
8.3.5 Modeling of neural network learner unit
The neural network unit in the data-dependent architecture is a supervised learning system which learns from a training data set. The training of the neural network unit by back propagation involves three stages: the feed-forward of the outputs of all the IDSs along with the input training pattern, which collectively form the training pattern for the neural network learner unit; the calculation and back propagation of the associated error; and the adjustment of the weights. After the training, the neural network is used for the computations of the feed-forward phase. Learning can be defined over an input space X, an output space Y and a loss function λ. The training data can be specified as (x_i, y_i), where x_i ∈ X and y_i ∈ Y. The output is a hypothesis function f_w : X → Y; f_w is chosen from a hypothesis space T to minimize the prediction error given by the loss function. The hypothesis function is that of the neural network, and it represents the non-linear mapping from the input space X to the output space Y. It is simple to assume stationarity, i.e., that the distribution of data points encountered in the future is the same as the distribution of the training set. For simplicity, the DARPA data set is assumed to represent the real-time traffic pattern distribution. Stationarity allows us to reduce the predictive learning
problem to a minimization of the sum of the loss over the training set.
f*_w = argmin Σ λ(f_w(x_i), y_i),  s.t. f_w ∈ T and (x_i, y_i) ∈ s    (8.2)
Loss functions are typically defined to be non-negative over all inputs and zero when f_w(x_i) = y_i.
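The minimization in equation 8.2 is an empirical risk minimization over the training set; the sketch below uses a linear hypothesis and squared loss as illustrative stand-ins for the neural network and its loss function, with the minimizer found by a simple grid search.

```python
# Empirical risk minimization as in equation 8.2: choose the parameter w
# that minimizes the summed loss over the training set s. A linear
# hypothesis f_w(x) = w*x and squared loss are illustrative stand-ins
# for the neural network and its loss function.

def empirical_risk(w, data):
    return sum((w * x - y) ** 2 for x, y in data)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # toy set where y = 2x
best_w = min((w / 100 for w in range(0, 401)),
             key=lambda w: empirical_risk(w, data))
# The grid search recovers w = 2.0, where the summed loss is zero.
```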
8.3.6 Dependence on the data and the individual IDSs
Often, the data in the databases is only an approximation of the true data. When information about the goodness of the approximation is recorded, the results obtained from the database can be interpreted more reliably. Any database is associated with a degree of accuracy, which is denoted by a probability density function whose mean is the value itself.
In order to maximize the detection rate, it is necessary to fix the false alarm rate at an acceptable value, taking into account the trade-off between the detection rate and the false alarm rate. The threshold T that maximizes the TP_rate, and thus minimizes the FN_rate, is given by:
FP_rate = P[alert | normal] = P[ Σ_{i=1}^{n} w_i s_i ≥ T | normal ] = α_0    (8.3)
TP_rate = P[alert | attack] = P[ Σ_{i=1}^{n} w_i s_i ≥ T | attack ]    (8.4)
The fusion of IDSs becomes meaningful only when FP ≤ FP_i ∀i and TP ≥ TP_i ∀i, where FP and TP correspond to the false positives and true positives of the fused IDS, and FP_i and TP_i correspond to the false positives and true positives of the individual IDS indexed i. It is necessary to give a low weight to any individual sensor that is unreliable, hence meeting the constraint on false alarms as given in equation 8.3. Similarly, the fusion improves the TP_rate as the detectors get weighted according to their performance.
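Equations 8.3 and 8.4 can be estimated empirically from labeled traffic; a sketch, with illustrative weights, decisions and labels:

```python
# Empirical estimates of equations 8.3 and 8.4: the frequency with which
# the weighted sum of local decisions crosses T, conditioned on normal
# and on attack traffic respectively. All values below are illustrative.

def rates(samples, weights, T):
    # samples: list of (local_decisions, label) with label 1 = attack
    fused = [(sum(w * s for w, s in zip(weights, dec)) >= T, y)
             for dec, y in samples]
    fp = sum(1 for alert, y in fused if alert and y == 0)
    tp = sum(1 for alert, y in fused if alert and y == 1)
    n_normal = sum(1 for _, y in samples if y == 0)
    n_attack = sum(1 for _, y in samples if y == 1)
    return fp / n_normal, tp / n_attack

samples = [([0, 0, 0], 0), ([0, 1, 0], 0), ([1, 1, 0], 1), ([1, 1, 1], 1)]
fp_rate, tp_rate = rates(samples, weights=[0.3, 0.3, 0.4], T=0.5)
```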
8.3.7 Threshold optimization
Tenney and Sandell in their work [119] establish the optimum strategy that minimizes a global cost in the case where the a priori probabilities of the hypotheses, the distribution functions of the local observations, the cost functions, and the fusion rule are given. They concluded that each local detector is optimally a likelihood ratio detector, but that the computation of the optimum thresholds for these local detectors is complicated due to cross coupling. The global optimization criterion for a distributed detection system would encompass the local decision statistics, the local decision thresholds, the fusion center decision statistic, and the fusion center decision threshold.
For each input traffic observation x, the set of n local thresholds should be optimized with respect to the probability of error. With a fusion rule given by a function f, the average probability of error at the fusion unit is given by the weighted sum of the false positive and false negative errors:
P_e(T, f) = p · P(s = 1 | Normal) + q · P(s = 0 | Attack)
where p and q are the respective weights of the false positive and false negative errors.
Assuming independence between the local detectors, the likelihood ratio is given by:
P(s | Attack) / P(s | Normal) = P(s_1, s_2, ..., s_n | Attack) / P(s_1, s_2, ..., s_n | Normal) = Π_{i=1}^{n} [ P(s_i | Attack) / P(s_i | Normal) ]
The optimum decision rule for the fusion unit follows:
f(s) = log [ P(s | Attack) / P(s | Normal) ]
Depending on whether f(s) is greater than or equal to the decision threshold T, or less than T, the decision is made for the hypothesis Attack or Normal respectively. Thus the decisions from the n detectors are coupled through a cost function. It can be shown that the optimal decision is characterized by thresholds, as in the decoupled case. As far as the optimum criterion is concerned, the first step is to minimize the loss function of equation 8.1. This leads to sets of simultaneous inequalities in terms of the generalized likelihood ratios at each detector, the solutions of which determine the regions of optimum detection.
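Under the independence assumption, the fusion statistic f(s) decomposes into a sum of per-detector log-likelihood ratios built from each detector's detection rate D_i and false positive rate F_i. A sketch follows; the detection rates are those reported for the three IDSs in this work, while the false positive rates are hypothetical values chosen for illustration.

```python
import math

# Log-likelihood-ratio fusion under the independence assumption:
# f(s) = sum_i log P(s_i | Attack) / P(s_i | Normal), where each term is
# built from the detector's detection rate D_i and false positive rate F_i.

def fusion_statistic(decisions, D, F):
    f = 0.0
    for s_i, d_i, fp_i in zip(decisions, D, F):
        if s_i == 1:
            f += math.log(d_i / fp_i)            # alert: evidence for Attack
        else:
            f += math.log((1 - d_i) / (1 - fp_i))  # silence: evidence for Normal
    return f

D = [0.33, 0.41, 0.61]   # detection rates reported for the three IDSs
F = [0.01, 0.01, 0.05]   # hypothetical false positive rates
f_s = fusion_statistic([1, 1, 0], D, F)
attack = f_s >= 0.0      # compare against a decision threshold T = 0
```

Two alerts from low-false-alarm detectors dominate one silence here, so the fused decision is Attack even though no single detector is reliable on its own.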
8.4 Results and discussion
The model is validated for the data-dependent decision fusion algorithms that have been developed in this work. The false positive rate α_0 is initially set at an acceptable value of 0.00002 in all the cases, and the maximization of the true positive rate is then achieved. Table 8.1 shows the enhanced performance of the different models obtained by appropriate threshold optimization.
8.5 Summary
The sensor fusion technique works effectively by gathering complementary information that can improve the overall detection rate without adversely affecting the false alarm rate. A simple theoretical model was illustrated in this chapter for the purpose of showing the improved performance of an IDS using sensor fusion. The detection rate and the false positive rate quantify the performance benefit obtained through the optimization. The theoretical model is also validated.
Fusion method Detection rate Average probability of error
PHAD 0.33 0.39
ALAD 0.41 0.35
Snort 0.61 0.23
DD Fusion 0.72 0.17
DD Fusion with modified evidence theory 0.75 0.15
Table 8.1: Average probability of error with DD fusion algorithms
Figure 8.2: Average probability of error
Chapter 9
Conclusions
Whether you think you can, or that you can't, you are usually right.
Henry Ford
Effective intrusion detection is a critical component of cyber infrastructure, as it is in the forefront of the battle against cyber-terrorism. The ingress traffic to a network can be used for network intrusion detection as well as prevention, whereas the egress traffic can reveal malware on a corporate network. The individual approaches to intrusion detection for the sake of network security, as presented in the literature, provide a partial solution to the overall issue of identifying all the possible attacks on a network. Each intrusion detection system is tailored to provide detection of a particular attack class. However, none of the available IDSs can offer the full protection of the sensor fusion approach, which draws on the strengths of all the individual systems that take part in fusion to surmount their respective weaknesses in a symbiotic manner.
In this thesis, it is brought out that relatively high probabilities of intrusion detection, at acceptable false alarm levels, can potentially be achieved in inclement monitoring, heavy traffic, and sophisticated attack environments. Decisions acquired from multiple sensor systems are more likely to be independent when they look at entirely different parameters of the traffic. These independent decisions from multiple sensors expand the total information that can be gathered about a particular connection and aid in effective sensor fusion.
In this thesis, we presented a framework for the performance enhancement of intrusion detection systems using advances in sensor fusion. The data-dependent decision fusion architecture has been implemented, and the results of the implementation have been observed to be better than those of the individual detectors that take part in fusion, thus validating the approach. We have also demonstrated the importance of detecting the rarer and the most significant attacks. The fusion algorithm used the modified evidence theory, which aids in providing better than the best protection. This chapter presents the conclusions drawn from this thesis work and discusses the directions for future research.
9.1 Results and discussion
A fair performance comparison of the proposed architecture is almost impossible due to the fact that each detection scheme is constructed for the detection of a specific class of attack. As an illustration of this point, we observe the three IDSs, namely PHAD, ALAD and Snort, as given in chapter 3. We claim that each IDS is designed and developed for a specific class of attack. PHAD exhibits superior performance in detecting the probes and the DoS attacks. On the other hand, it exhibits sub-optimal performance in detecting attacks belonging to the R2L and U2R classes. Similarly, the other IDSs also have their own preferences for attack detection. But the proposed algorithm claims better than the best protection, since, given the best IDSs, the fusion results in a detection better than that of the best detector if the detectors are uncorrelated. Otherwise, in the worst case, the fusion at least matches the performance of the best IDS.
The results presented in chapters ve to eight show that the data-dependent
decision fusion using the modied evidence theory was successful in generat-
ing an accurate empirical behavioral model from training data and then could
apply this empirical knowledge to data never seen before. Starting with three
single IDSs, the performance of attack detection was enhanced through various
sensor algorithms developed in this work. The nal model developed has high
overall accuracy level, which showed both a high detection rate of 0.75 and
an extremely low false positive rate of 0.00002. The F-score obtained is 0.66.
From these results, it was concluded that data-dependent decision fusion using
Chapter 9 171
modified evidence theory has evolved in this thesis work as a viable method for
empirical model generation for intrusion detection. The improved performance
of the IDS demonstrated here, if deployed fully, would contribute to a
53% reduction of successful attacks in two years and a 66% reduction of
successful attacks in four years. This is a step in the right direction towards
making cyberspace safer over a period of time, with the proper deployment of
highly efficient and sophisticated detectors.
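The decision fusion discussed above builds on Dempster-Shafer evidence combination. As a rough illustration only, using the classic, unmodified Dempster rule rather than the thesis's modified variant, and with made-up bpa values, two IDS opinions over the frame {attack, normal} can be fused as follows:

```python
# Classic Dempster's rule of combination over the frame {attack, normal}.
# Each IDS reports a basic probability assignment (bpa) over the subsets
# {attack}, {normal}, and the whole frame Theta (unassigned/ignorance mass).

def combine(m1, m2):
    """Fuse two bpas given as dicts with keys 'attack', 'normal', 'theta'."""
    # Conflict mass: one source commits to attack while the other commits
    # to normal (the two focal elements have empty intersection).
    k = m1["attack"] * m2["normal"] + m1["normal"] * m2["attack"]
    if k >= 1.0:
        raise ValueError("total conflict; sources cannot be combined")
    fused = {}
    # Intersections yielding {attack}: attack*attack, attack*theta, theta*attack.
    fused["attack"] = (m1["attack"] * m2["attack"]
                       + m1["attack"] * m2["theta"]
                       + m1["theta"] * m2["attack"]) / (1.0 - k)
    # Intersections yielding {normal}, symmetrically.
    fused["normal"] = (m1["normal"] * m2["normal"]
                       + m1["normal"] * m2["theta"]
                       + m1["theta"] * m2["normal"]) / (1.0 - k)
    # Only theta*theta leaves the full frame.
    fused["theta"] = (m1["theta"] * m2["theta"]) / (1.0 - k)
    return fused

# Two moderately confident detectors that agree on "attack":
ids1 = {"attack": 0.6, "normal": 0.1, "theta": 0.3}
ids2 = {"attack": 0.7, "normal": 0.2, "theta": 0.1}
fused = combine(ids1, ids2)
```

When uncorrelated detectors agree, the fused belief in attack exceeds either detector's individual belief, which is the "better than the best" effect in miniature.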
9.2 Future work
This thesis has explored the feasibility of extracting information about the
behavior of a network system that is more complete and reliable than any data
that had been available before, in the form of the decisions of various intrusion
detection systems. This availability opens multiple possibilities for future
exploration and research, and may lead to the design and development of more
efficient, reliable and effective intrusion detection systems. Some of these
possibilities are listed below:
In this thesis the fusion architecture has been developed in order to offer
better than the best protection, which is a requirement for future security
solutions. Future improvements in individual IDSs can also be easily
incorporated in this technique. The approach developed in this thesis is
expected to find applications in defense-in-depth security architectures.
The main reason for using the DARPA 1999 data sets in the majority of our
evaluations was the need for relevant data that can easily be shared with
other researchers, allowing them to duplicate and improve our results. The
common practice in intrusion detection of claiming good performance with
real-world network traffic makes it difficult to verify and improve previous
research results, as the traffic is never quantified or released owing to
privacy concerns.
As future work, the fusion IDS can be made more efficient by incorporating
more individual IDSs, since it can be easily shown that the more individual
components make up the fusion IDS, the better the fusion IDS performs.
Different classes of intrusion detection
systems, such as signature-based, anomaly-based, flow-based and packet-based,
should all be included for the purpose of an enhanced fusion output. The
architecture developed in this thesis can be easily expanded by adding any
number of new IDSs to the fusion unit.
Multiple threshold levels were set for each of the IDSs that are components
of the fusion process illustrated in this work. We approximated the output
of the IDSs to binary values: zero for deciding on normal and one for
deciding on attack. This simplification can be avoided in future work: the
severity or anomaly score can be normalized, multiplied by the respective
weights and used as the basic probability assignments (bpa).
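The normalization suggested above can be sketched as follows; the weight, the score range and the example score are illustrative placeholders, not values taken from the thesis, and in the data-dependent scheme the weight would itself depend on the traffic features of each record:

```python
# Turning a raw anomaly score into a bpa instead of a hard 0/1 decision.
# The score is min-max normalised to a severity in [0, 1], then weighted;
# whatever mass the weighted evidence does not commit stays on the frame.

def score_to_bpa(score, smin, smax, weight):
    """Normalise a raw anomaly score and weight it into a bpa dict."""
    s = (score - smin) / (smax - smin)      # normalised severity in [0, 1]
    s = min(max(s, 0.0), 1.0)               # clamp out-of-range scores
    belief_attack = weight * s
    belief_normal = weight * (1.0 - s)
    return {"attack": belief_attack,
            "normal": belief_normal,
            "theta": 1.0 - belief_attack - belief_normal}

# A fairly severe score of 8.2 on an assumed 0-10 scale, weight 0.9:
bpa = score_to_bpa(score=8.2, smin=0.0, smax=10.0, weight=0.9)
```

The residual mass on theta equals one minus the weight, so a lightly weighted IDS contributes mostly ignorance rather than a forced vote.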
One of the intriguing properties of sensor fusion algorithms is their
ability to invent new features that are not explicit in the input to the unit.
In particular, they learn to represent intermediate features that are useful
for learning the target function and that are only implicit in the input to
the unit. With the increasing incidence of cyber attacks, building effective
intrusion detection models with good accuracy and real-time performance
is essential. More data mining techniques should hence be investigated
for more effective feature extraction.
9.3 Summary
This thesis discussed the assertion that it is possible to perform intrusion
detection for both rare and new attacks using advances in sensor fusion. The
previous chapters described the theoretical and experimental work done to show
its validity, and the results of the experiments provide evidence in support of
this thesis. The experiments emphasize proof of concept, demonstrating the
viability of the technique and also its efficiency in comparison with existing
methods. The proposed approaches are shown to significantly improve the
detection rate and reduce the false alarm rate, and hence result in an
acceptable and usable intrusion detection system.
While experimenting with a simple treatment for enhanced intrusion
detection, it was found that the data-dependent decision fusion using modified
evidence theory was highly successful. Hence, it is worthwhile to devise
investigations of other applications; it is often useful to cast the net a bit
wider to give the argument presented in this thesis further support, or a
comparative focus.
Appendix A
Attacks on the Internet: A study
In all science, error precedes the truth, and it is better it should go first
than last.
Hugh Walpole
A.1 Introduction
An attack is the realization of a threat: a harmful action aiming to find and
exploit a system vulnerability. Computer attacks may involve destroying or
accessing data, subverting the computer or degrading its performance.
Traditionally, attacks on computers have included methods such as viruses,
worms, buffer-overflow exploits and denial of service attacks. Network attacks,
on the other hand, are mostly attacks on computers that use a network in some
way. A network could be used to send the attack (such as a worm), or it could
be the means of attack (such as a Distributed Denial of Service attack). In
general, network attacks are a subset of computer attacks. However, there are
several types of network attacks that do not attack computers, but rather the
network they are attached to. Flooding a network with packets does not attack
an individual computer, but clogs up the network. Although a computer may be
used to initiate the attack, both the target and the means of attacking the
target are network related. Many computer system attack classifications and
taxonomies are available in the literature [120, 15, 122, 123, 125, 126, 127, 128].
Howard [120] classified attacks according to Attackers, Tools, Access, Results
Appendix A 175
and Objectives. In his Ph.D. thesis, Kumar [15] introduced a classification
based on the attack signatures used within the IDS IDIOT; this classification
is based on the type of observation required to detect a given attack. Lindqvist
and Jonsson [122] presented an attack taxonomy using two dimensions of an
attack. Probably one of the best known taxonomies is that of the Defense
Advanced Research Projects Agency (DARPA). This taxonomy was developed in
1998 for classifying attacks in order to simplify the process of evaluating IDSs
[123]. Work done by Chris Rodgers [124] covers many computer and network
attacks with regard to TCP/IP networking. His research was carried out in
2001 and provides a good overview of the threats and attacks that face TCP/IP
networking, as well as attacks such as viruses, worms, trojans and denial of
service attacks.
In the most widely used open source network intrusion prevention and detection
system, namely Snort, attack classification is based on the impact on the
computer system. The attacks whose effect is most critical have the highest
priority. The priority levels are divided into high, medium and low.
High-priority attacks include the attempted administrator privilege gain, the
network Trojan, and the web application attack. Medium-priority attacks include
Denial of Service (DoS) attacks, a nonstandard protocol or event, potentially
bad traffic, an attempted log-in using a suspicious username, etc. Low-priority
attacks include the ICMP event, the network scan, the generic protocol command,
etc. [130].
A.2 History of Internet attacks
Computer and network attacks have evolved greatly over the last few decades.
The attacks are increasing in number and also improving in strength and
sophistication. Figure A.1 is the well-known plot by Julia Allen, which
shows this trend along with some of the trends in the history of attacks. A few
of the developments in the history of computer and network attacks are discussed
below.
Figure A.1: Plot of Attack sophistication vs Intruder Knowledge over the years
In 1978, the concept of a worm [131] was invented by researchers at the Xerox
Palo Alto Research Center; the Morris Worm [132] of 1988 later brought worms to
wide attention. The first viruses were
released in 1981, among them Apple Viruses 1, 2 and 3, which targeted the
Apple II operating system. In 1983, Fred Cohen was the first person to formally
introduce the term computer virus in his thesis [133], which was published in
1985. More recently, new attacks such as denial of service (DoS) (mid 1990s),
distributed DoS (DDoS) attacks (in 1999), botnets and storm botnets have been
developed. Two major recent developments in computer and network attacks
are blended attacks and information warfare. Blended attacks first appeared
in 2001 with the release of Code Red [134], followed by Nimda [135],
Slammer [136] and Blaster [137]. Blended attacks combine two or more attacks
to produce a more potent attack.
A.3 Attack motivation and objectives
Attack motivation can be understood by identifying what attackers do and
how they can be classified. Icove et al. [141] present a simple classification of
attackers as hackers, criminals (spies, terrorists, corporate raiders, professional
criminals) and vandals. The main motivation of a hacker is to gain access to a
system or data; the main motivation of the criminal is financial or political
gain; and the main motivation of the vandal is to cause damage. The thesis work
of Howard [142] highlights the problem with classifying attackers into these
three categories, since all three categories describe criminal behavior.
The incidents of cyber attacks that were serious and harmful in nature can be
seen to be motivated by political and social reasons, as pointed out by
Denning [143]. The threat of cyberterrorism is becoming unavoidable because
critical infrastructures are potentially vulnerable, and studies show that the
vulnerabilities have been steadily increasing while the costs of attack have
been decreasing. Statistics of attacks in recent years appear on the web site
for Web Server Intrusion Statistics [144].
A.4 Attack taxonomy
There are various classifications of Internet attacks, namely:
by the goal of the attacker
by the effect on the system
by the operating system on the target host
by the attacked service
Some of the very commonly encountered attacks are listed below:
A.4.1 Viruses
Viruses are self-replicating programs that infect and propagate through files.
Usually they will attach themselves to a file, which will cause them to be run
when the file is opened. There are several main types of viruses, as identified
in the thesis of Rodgers [124], which are examined below.
File infectors
File infector viruses infect files on the victim's computer by inserting
themselves into a file. Usually the file is an executable file, such as a .EXE
or .COM in Windows. When the infected file is run, the virus executes as well.
System and boot record infectors
System and boot record infectors were the most common type of virus until the
mid 1990s. These types of viruses infect system areas of a computer such as the
Master Boot Record (MBR) on hard disks and the DOS boot record on floppy
disks. By installing itself into boot records, the virus can run itself every
time the computer is booted up.
Macro viruses
Macro viruses are simply malicious macros for popular programs, such as
Microsoft Word. For example, they may delete information from a document
or insert phrases into it. Propagation is usually through infected files. If a
user opens a document that is infected, the virus may install itself so that any
subsequent documents are also infected. Often the macro virus will be attached
as an apparently benign file to fool the user into infecting themselves. The
Melissa virus [145] is the best known macro virus. The virus worked by emailing
a victim with an email that appeared to come from an acquaintance. The
email contained a Microsoft Word document as an attachment that, if opened,
would infect Microsoft Word; and if the victim used the Microsoft Outlook 97
or 98 email client, the virus would be forwarded to the first 50 contacts in the
victim's address book. Melissa caused a significant amount of damage, as the
email sent by the virus flooded email servers.
Virus properties
Viruses often have additional properties, beyond being an infector or macro
virus. A virus may also be multi-partite, stealth, encrypted or polymorphic.
Multi-partite viruses are hybrid viruses that infect both files and system
and/or boot records. This means multi-partite viruses have the potential to be more
damaging, and more resistant. A stealth virus is one that attempts to hide its
presence. This may involve attaching itself to files that are not usually seen
by the user. Viruses can use encryption to hide their payload; a virus using
encryption will know how to decrypt itself to run. As the bulk of the virus is
encrypted, it is harder to detect and analyze. Some viruses have the ability to
change themselves either as time goes by or when they replicate themselves.
Such viruses are called polymorphic viruses. Polymorphic viruses can usually
avoid eradication longer than other types of viruses, as their signature changes.
A.4.2 Worms
A worm is a self-replicating program that propagates over a network in some
way. Unlike viruses, worms do not require an infected file to propagate. There
are two main types of worms: mass-mailing worms and network-aware worms.
Mass-mailing Worms
A mass-mailing worm is a worm that spreads through email. Once the email
has reached its target it may have a payload in the form of a virus or trojan.
Network-aware Worms
Network-aware worms generally follow a four-stage propagation model. The
first step is target selection: the compromised host targets a new host. The
compromised host then attempts to gain access to the target host by exploitation.
Once the worm has access to the target host, it can infect it. Infection may
include loading trojans onto the target host, creating back doors or modifying
files. Once infection is complete, the target host is now compromised and can be
used by the worm to continue propagation. Examples are Blaster, SQL Slammer,
etc.
A.4.3 Trojans
Trojans appear to be benign programs to the user, but actually have some
malicious purpose. Trojans usually carry a payload such as remote access
methods, viruses and data destruction. Trojans provide a back door for
the malicious attacker and give them the following abilities: session logging,
keystroke logging, file transfer, program installation, remote rebooting,
registry editing, and process management.
Logic bombs
Logic bombs are a special form of trojans that only release their payload once a
certain condition is met. If the condition is not met, the logic bomb behaves as
the program it is attempting to simulate.
A.4.4 Buffer overflows
Buffer overflows are probably the most widely used means of attacking a
computer or network. They are rarely launched on their own, and are usually part
of a blended attack. Buffer overflows exploit flawed programming, in which
buffers are allowed to be overfilled. If a buffer is filled beyond its capacity,
the data filling it can overflow into the adjacent memory, where it can either
corrupt data or be used to change the execution of the program. There are two
main types of buffer overflows, described below.
Stack buffer overflow
A stack is an area of memory that a process uses to store data such as local
variables, method parameters and return addresses. Often buffers are declared
at the start of a program and so are stored on the stack. Each process has its
own stack and its own heap. Overflowing a stack buffer was one of the first
types of buffer overflows and is one that is commonly used to gain control of
a process. In this type of buffer overflow, a buffer is declared with a certain
size. If the process controlling the buffer does not make adequate checks, an
attacker can attempt to put in data that is larger than the size of the buffer.
An attacker may place malicious code in the buffer. Part of the adjacent memory
will often contain the pointer to the next line of code to execute. Thus, the
buffer overflow can overwrite the pointer to point to the beginning of the
buffer, and hence to the beginning of the malicious code. In this way, a stack
buffer overflow can give control of a process to an attacker.
Heap overflows
Heap overflows are similar to stack overflows but are generally more difficult
to create. The heap is similar to the stack, but stores dynamically allocated
data. The heap does not usually contain return addresses like the stack, so it
is harder to gain control over a process than when the stack is used. However,
the heap contains pointers to data and to functions. A successful buffer
overflow will allow the attacker to manipulate the process's execution. An
example would be to overflow a string buffer containing a filename, so that the
filename is now an important system file. The attacker could then use the
process to overwrite the system file (if the process has the correct privileges).
A.4.5 Denial of Service attacks
Denial of Service (DoS) attacks [146], sometimes known as nuke attacks, are
designed to deny legitimate users of a system from accessing or using the system
in a satisfactory manner. DoS attacks usually disrupt the service of a network
or a computer, so that it is either impossible to use, or its performance is
seriously degraded. There are three main types of DoS attacks: host based,
network based and distributed.
Host-based DoS
Host-based DoS attacks aim at attacking computers. Either a vulnerability in
the operating system, in application software, or in the configuration of the
host is targeted. Resource hogging is one way of mounting a DoS on a host;
resources such as CPU time and memory are the most common targets. Crashers are
a form of host-based DoS simply designed to crash the host system, so that it
must be restarted. Crashers usually target a vulnerability in the host's
operating system. Many crashers work by exploiting the implementation of
network protocols by various operating systems: some operating systems cannot
handle certain packets, which on receipt cause the operating system to hang or
crash.
Network-based DoS
Network-based DoS attacks target network resources in an attempt to disrupt
legitimate use. Network-based DoS attacks usually flood the network and the
target with packets. To succeed in flooding, more packets than the target can
handle must be sent; or, if the attacker is attacking the network, enough
packets must be flooded so that the bandwidth left for legitimate users is
severely reduced. Three main methods of flooding have been identified:
TCP Floods: TCP packets are streamed to the target.
ICMP Echo Request/Reply: ICMP packets are streamed to the target.
UDP Floods: UDP packets are streamed to the target.
In addition to a high volume of packets, the packets often have certain flags
set to make them more difficult to process. If the target is the network, the
broadcast address of the network is often targeted. One simple way of consuming
network bandwidth is through a ping flood. Ping floods can be created by sending
ICMP request packets of a large size to a large number of addresses (perhaps
through the broadcast address) at a fast rate.
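From a detection standpoint, the flooding behaviours above can be caught with a simple per-source rate check. The sketch below is a toy illustration rather than a component of the thesis architecture, and the threshold and window values are arbitrary:

```python
# A minimal rate-based flood detector: count packets per source inside a
# sliding one-second window and flag any source exceeding a threshold.

from collections import defaultdict, deque

class FloodDetector:
    def __init__(self, threshold=100, window=1.0):
        self.threshold = threshold          # packets allowed per window
        self.window = window                # window length in seconds
        self.arrivals = defaultdict(deque)  # source IP -> arrival times

    def packet(self, src, t):
        """Record a packet from `src` at time `t`; return True if flooding."""
        q = self.arrivals[src]
        q.append(t)
        # Evict arrivals that have slid out of the window.
        while q and t - q[0] > self.window:
            q.popleft()
        return len(q) > self.threshold

det = FloodDetector(threshold=100)
# 150 packets from one source within half a second trips the detector:
flagged = [det.packet("10.0.0.1", i * 0.5 / 150) for i in range(150)]
```

A real IDS would of course track many more features than raw packet rate, but the same sliding-window bookkeeping underlies most flood heuristics.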
Distributed DoS
Distributed DoS (DDoS) attacks are a recent development in computer and network
attack methodologies. The DDoS attack methodology was first seen in 1999 with
the introduction of attack tools such as The DoS Project's Trinoo [36, 21], The
Tribe Flood Network [1, 21] and Stacheldraht [37]. DDoS attacks work by using a
large number of attack hosts to direct a simultaneous attack on a target or
targets. Figure A.2 shows the process of a DDoS attack. Firstly, the attacker
commands the master nodes to launch the attack. The master nodes then order all
daemon nodes under them to launch the attack. Finally, the daemon nodes attack
the target simultaneously, causing a denial of service. With enough daemon
nodes, even a simple web page request will stop the target from serving
legitimate user requests.
Figure A.2: Distributed Denial of Service
The DDoS attack takes place when many compromised machines infected by
malicious code act simultaneously and are coordinated under the control of
a single attacker in order to break into the victim's system, exhaust its
resources, and force it to deny service to its customers. There are mainly two
kinds of
DDoS attacks[ 10]: typical DDoS attacks and distributed reector DoS (DR-
DoS) attacks. In a typical DDoS attack, the army of the attacker consists of
master zombies and slave zombies. The hosts of both categories are compro-
mised machines that have arisen during the scanning process and are infected by
malicious code. The attacker coordinates and orders master zombies and they,
in turn, coordinate and trigger slave zombies. More specically, the attacker
sends an attack command to master zombies and activates all attack processes
on those machines, which are in hibernation, waiting for the appropriate com-
mand to wake up and start attacking. Then, master zombies, through those
processes, send attack commands to slave zombies, ordering them to mount a
DDoS attack against the victim. In that way, the agent machines (slave zom-
bies) begin to send a large volume of packets to the victim, ooding its system
with useless load and exhausting its resources.
Figure A.3: Distributed Reflector DoS (DRDoS)
Unlike typical DDoS attacks, in DRDoS attacks the army of the attacker
consists of master zombies, slave zombies, and reflectors [11]. The scenario of
this type of attack is the same as that of typical DDoS attacks up to a
specific stage.
The attacker has control over master zombies, which, in turn, have control
over slave zombies. The difference in this type of attack is that slave zombies
are led by master zombies to send a stream of packets with the victim's
IP address as the source IP address to other uninfected machines (known as
reflectors), exhorting these machines to connect with the victim. The
reflectors then send the victim a greater volume of traffic, as a reply to its
request for the opening of a new connection, because they believe that the
victim was the host that asked for it. Therefore, in DRDoS attacks, the attack
is mounted by non-compromised machines, which mount the attack without being
aware of the action. Comparing the two scenarios of DDoS attacks, we should
note that a DRDoS attack is more detrimental than a typical DDoS attack. This
is because a DRDoS attack has more machines to share the attack, and hence the
attack is more distributed; moreover, a DRDoS attack creates a greater volume
of traffic because of its more distributed nature. Figure A.3 graphically
depicts a DRDoS attack. The general taxonomy of distributed DoS is shown in
figure A.4.
Figure A.4: Taxonomy of Distributed DoS
A.4.6 Network-based attacks
This section describes several kinds of attacks that operate on networks and
the protocols that run them. Network spoofing is the process by which an
attacker passes themselves off as someone else. There are several ways of
spoofing in the standard TCP/IP network protocol stack, including MAC address
spoofing at the data-link layer and IP spoofing at the network layer. By
spoofing who they are, an attacker can pretend to be a legitimate user or can
manipulate existing communications from the victim host.
Session hijacking
Session hijacking is the process by which an attacker takes over a session
taking place between two victim hosts. The attacker essentially cuts in and
takes over the place of one of the hosts. Session hijacking usually takes place
at the TCP layer, and is used to take over sessions of applications such as
Telnet and FTP. TCP session hijacking involves the use of IP spoofing, as
mentioned above, and TCP sequence number guessing. To carry out a successful
TCP session hijacking, the attacker will attempt to predict the TCP sequence
number that the session being hijacked is up to. Once the sequence number has
been identified, the attacker can spoof their IP address to match the host they
are cutting out and send a TCP packet with the correct sequence number. The
other
host will accept the TCP packet, as the sequence number is correct, and will
start sending packets to the attacker. The cut-out host will be ignored by the
other host, as it will no longer have the correct sequence number. Sequence
number prediction is most easily done if the attacker has access to the IP
packets passing between the two victim hosts: the attacker simply needs to
capture packets and analyze them to determine the sequence number. If the
attacker does not have access to the IP packets, then the attacker must guess
the sequence number.
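The prediction step can be illustrated with a toy model in which initial sequence numbers grow by a constant increment, as they did in some early TCP/IP stacks; modern stacks randomise ISNs precisely to defeat this kind of guessing:

```python
# Toy sequence-number prediction: if captured initial sequence numbers
# (ISNs) grow by one constant increment, the next ISN is trivially
# predictable. All arithmetic is modulo 2**32, the TCP sequence space.

def predict_next(observed):
    """Predict the next 32-bit sequence number from constant-step samples."""
    steps = {(b - a) % 2**32 for a, b in zip(observed, observed[1:])}
    if len(steps) != 1:
        return None                 # no single constant increment: give up
    return (observed[-1] + steps.pop()) % 2**32

isns = [64000, 128000, 192000, 256000]   # constant step of 64000
guess = predict_next(isns)               # predicts the fifth ISN
```

Against a properly randomised ISN generator the `steps` set contains many distinct values and the prediction correctly fails.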
A.4.7 Password attacks
An attacker wishing to gain control of a computer, or of a user's account, will
often use a password attack to obtain the needed password. Many tools exist to
help the attacker uncover passwords.
Password guessing/dictionary attacks
Password guessing is the simplest of password attacks. It simply involves the
attacker attempting to guess the password. Often the attacker will use a form
of social engineering to gain clues as to what the password is. A dictionary
attack is similar, but more automated: the attacker uses a dictionary of words
containing possible passwords and uses a tool to see if any of them is the
required password. Brute force attacks work by generating every possible
combination that could make up a password and testing each one to see if it is
the correct password.
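A dictionary attack amounts to hashing each candidate and comparing it with the stolen hash. The wordlist and target below are contrived for illustration; real tools add word-mangling rules and handle salted password hashes:

```python
# A minimal dictionary attack against an (unsalted) SHA-256 password hash:
# hash every candidate word and compare it with the stolen digest.

import hashlib

def dictionary_attack(target_hash, wordlist):
    for word in wordlist:
        if hashlib.sha256(word.encode()).hexdigest() == target_hash:
            return word              # password recovered
    return None                      # password not in the dictionary

words = ["letmein", "password", "secret123"]          # toy wordlist
target = hashlib.sha256(b"secret123").hexdigest()     # stolen hash
found = dictionary_attack(target, words)
```

The same loop over `itertools.product` of an alphabet instead of a wordlist turns this into the brute force attack described above.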
A.4.8 Information gathering attacks
The attack process usually involves information gathering. Information
gathering is the process by which the attacker gains valuable information about
potential targets, or gains unauthorized access to some data without launching
an attack. Information gathering is passive in the sense that no attacks are
explicitly launched. Instead, networks and computers are sniffed, scanned and
probed for information.
Sniffing
Packet sniffers are a simple but invaluable tool for anyone wishing to gather
information about a network or computer. For the attacker, packet sniffers
provide a way to glean information about the host or person they wish to attack,
and even to gain access to unauthorized information. Traditional packet
sniffers work by putting the attacker's Ethernet card into promiscuous mode. An
Ethernet card in promiscuous mode accepts all traffic from the network, even
when a packet is not addressed to it. This means the attacker can gain access
to any packet traversing the network they are on. By gathering enough of the
right packets, the attacker can gain information such as login names and
passwords. Other information can also be gathered, such as MAC and IP addresses
and what services and operating systems are being run on specific hosts. This
form of attack is very passive: the attacker is not sending any packets out;
they are only listening to packets on the network.
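Once frames have been captured, the sniffer must parse them to recover addresses and protocols. The sketch below parses a hand-crafted Ethernet frame; the capture step itself, which needs a promiscuous-mode interface and raw-socket privileges, is omitted:

```python
# Parsing the 14-byte Ethernet header of a captured frame to recover the
# MAC addresses and the EtherType. The frame bytes are hand-crafted here.

import struct

def parse_ethernet(frame):
    # Network byte order: 6-byte dst MAC, 6-byte src MAC, 2-byte EtherType.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    fmt = lambda mac: ":".join(f"{b:02x}" for b in mac)
    return {"dst": fmt(dst), "src": fmt(src),
            "type": hex(ethertype), "payload": frame[14:]}

frame = (bytes.fromhex("aabbccddeeff")    # destination MAC
         + bytes.fromhex("112233445566")  # source MAC
         + b"\x08\x00"                    # EtherType 0x0800 = IPv4
         + b"IP payload...")
info = parse_ethernet(frame)
```

Chaining parsers for the IP and TCP headers in the payload is how a sniffer extracts the login names and passwords mentioned above from unencrypted sessions.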
Mapping
Mapping is used to gather information about hosts on a network. Information
such as which hosts are online, what services are running and what operating
system a host is using can all be gathered via mapping. Thus potential targets,
and the layout of the network, are identified. Host detection is achieved
through a variety of methods. Simple ICMP queries can be used to determine if a
host is on-line. TCP SYN messages can be used to determine whether or not a
port on a host is open and thus whether or not the host is on-line. After
detecting that a host is on-line, mapping tools can be used to determine what
operating system and what services are running on the host. Running services
are usually identified by attempting to connect to a host's ports. Port
scanners are programs that an attacker can use to automate this process. Basic
port scanners work by connecting to every TCP port on a host and reporting back
which ports were open. Either the attacker has to choose an attack using the
information gathered, or more information needs to be gathered through security
scanning, discussed below.
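The connect-based probing used by basic port scanners can be sketched in a few lines. Here the probe is pointed at a listener created on the loopback interface for the demonstration, since probing third-party hosts is intrusive and usually unlawful:

```python
# A basic connect() port probe, as used by simple port scanners: attempt a
# TCP connection to a port and report whether it accepted.

import socket

def port_open(host, port, timeout=0.5):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0   # 0 means connected

# Probe a listener we set up ourselves on the loopback interface:
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))                  # OS picks a free port
listener.listen(1)
open_port = listener.getsockname()[1]
result = port_open("127.0.0.1", open_port)       # expect: port accepts
listener.close()
```

A full scanner simply loops `port_open` over a port range; SYN ("half-open") scanning replaces the full connect with a raw SYN probe to be stealthier.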
Security scanning
Security scanning is similar to mapping, but is more active and gathers more
information. Security scanning involves testing a host for known
vulnerabilities or weaknesses that could be exploited by the attacker. For
example, a security scanning tool may be able to tell the attacker that port 80
of the target is running an HTTP server with a specific vulnerability.
A.4.9 Blended attacks
While blended attacks are not a new development, they have recently become
popular with attacks such as Code Red and Nimda. Blended attacks are attacks
that contain multiple threats, for example multiple means of propagation
or multiple attack payloads. Many of the attacks mentioned previously in this
appendix can be considered blended. The first instance of a blended attack
occurred in 1988 with the first Internet worm, the Morris Worm. The Internet is
especially susceptible to blended threats, as was shown by the recent SQL
Slammer attack, during which the Internet suffered a significant loss of
performance.
A.5 Top ten cyber security menaces for 2008
A list of the attacks most likely to cause substantial damage during 2008,
compiled by experts [138] in ranked order, is provided below:
1. Increasingly sophisticated web site attacks that exploit browser
vulnerabilities, especially on trusted web sites
Web site attacks on browsers increasingly target components, such as Flash
and QuickTime, that are not automatically patched when the browser is
patched. Placing better attack tools on trusted sites is giving attackers a
huge advantage over the unwary public.
2. Increasing sophistication and effectiveness in botnets
The Storm worm started spreading in January 2007 with an email saying
"230 dead as storm batters Europe", and was followed by subsequent variants.
Within a week it accounted for one out of every twelve infections on the
Internet, installing rootkits and making each infected system a member of a
new type of botnet. Previous botnets used centralized command and control;
the Storm worm uses peer-to-peer control.
3. Cyber espionage efforts by well-resourced organizations looking to extract
large amounts of data, particularly using targeted phishing
Economic espionage will be increasingly common as nation-states use cyber
theft of data to gain economic advantage in multinational deals. The attack
of choice involves targeted spear phishing with attachments, using
well-researched social engineering methods to make the victim believe that
an attachment comes from a trusted source, and using newly discovered
Microsoft Office vulnerabilities and hiding techniques to circumvent virus
checking.
4. Mobile phone threats, especially against iPhones and Android-based phones,
plus VoIP
Mobile phones are general purpose computers, so worms, viruses, and other
malware will increasingly target them. A truly open mobile platform will
usher in completely unforeseen security nightmares; the developer toolkits
provide easy access for hackers.
Attacks on VoIP systems are on the horizon and may surge in the coming
years. VoIP phones and IP PBXs have had numerous published vulnerabilities.
Attack tools exploiting these vulnerabilities have been written and are
available on the Internet.
5. Insider attacks
Insider attacks are initiated by rogue employees, consultants and/or
contractors of an organization. Insider-related risk has long been
exacerbated by the fact that insiders usually have been granted some degree
of physical and logical access to the systems, databases, and networks that
they attack, giving them a significant head start in the attacks that they
launch. More recently, however, security perimeters have broken down, which
allows insiders to attack both from inside and from outside an
organization's network boundaries.
6. Advanced identity theft from persistent bots
A new generation of identity theft is being powered by bots that stay
on machines for three to five months collecting passwords, bank account
information, surfing history, frequently used email addresses, and more.
They gather enough data to enable extortion attempts and advanced identity
theft attempts where criminals have enough data to pass basic security
checks.
7. Increasingly malicious Spyware
Spyware tools will increasingly target and dodge anti-virus, anti-spyware,
and anti-rootkit tools in order to preserve the attacker's control of a
victim machine for as long as possible, and such tools will become more common.
8. Web application security exploits
Large percentages of web sites have cross-site scripting, SQL injection, and
other vulnerabilities resulting from programming errors. Web 2.0 applications
are vulnerable because user-supplied data cannot be trusted; your
script running in the user's browser still constitutes user-supplied data.
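As an aside, the untrusted-input problem behind SQL injection can be illustrated with a minimal sketch. The table, user name and malicious string below are hypothetical, invented purely for illustration:

```python
import sqlite3

# Hypothetical users table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

malicious = "nobody' OR '1'='1"

# Vulnerable: user input concatenated directly into the SQL string,
# so the quote in the input rewrites the query's logic.
unsafe = "SELECT name FROM users WHERE name = '%s'" % malicious
rows_unsafe = conn.execute(unsafe).fetchall()   # matches every row

# Safer: a parameterized query treats the input as a literal value.
safe = "SELECT name FROM users WHERE name = ?"
rows_safe = conn.execute(safe, (malicious,)).fetchall()  # matches nothing
```

The point of the sketch is that the same string yields every row when concatenated but no rows when bound as a parameter, which is why user-supplied data must never be spliced into queries.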
9. Increasingly sophisticated social engineering, including blended phishing
with VoIP and event phishing
Blended approaches will amplify the impact of many more common attacks.
For example, the success of phishing is being radically increased
by first stealing IDs of users of other technologies. Tax filing scams and
scams based on the U.S. Presidential elections were widely used this year,
and many of them have succeeded. A second area of blended phishing
combines email and VoIP. An inbound email, apparently sent by a
credit card company, asks recipients to re-authorize their credit cards by
calling a 1-800 number. The number leads them (via VoIP) to an automated
system in a foreign country that, quite convincingly, asks that they
key in their credit card number, CVV, and expiration date.
10. Supply chain attacks infecting consumer devices (USB Thumb Drives,
GPS Systems, Photo Frames, etc.) distributed by trusted organizations
Retail outlets are increasingly becoming unwitting distributors of malware.
Devices with USB connections and the CDs packaged with those devices
sometimes contain malware that infects victims' computers and connects
them into botnets.
A.6 Conclusion
Even though a lot of attacks have been listed above, many of the categories
are not mutually exclusive. For example, a virus may contain a logic bomb, so
that the categories overlap. Also, any successful attack may get classified into
multiple categories, since attackers use multiple methods. This makes the
classification ambiguous and difficult to repeat.
We conclude by quoting Cohen [133]: "...a complete list of the things that can go
wrong with information systems is impossible to create. People have tried
to make comprehensive lists, and in some cases have produced encyclopedic
volumes on the subject, but there are a potentially infinite number of different
problems that can be encountered, so any list can only serve a limited purpose."
Appendix B
Intrusion Detection Systems: A survey
If I have been able to see farther than others, it was because I stood on the
shoulders of giants.
Sir Isaac Newton
B.1 Introduction
Intrusion detection is a rapidly evolving and changing technology. Even though
the field bloomed in the early 1980s, all the early intrusion detection work
was done as research projects for US government and military organizations.
The major work in intrusion detection happened in the mid and late 1990s,
along with the explosion of the Internet. The early research work in the field
of intrusion detection often focused on host-based solutions, but the drastic
growth of networking shifted later efforts toward network-based systems.
The tools discussed here reflect a core of active research over the last two
decades. Several surveys have indeed been published in the past
[147, 148, 149, 150, 152, 155], but the growth of IDSs has been such that many
IDSs have appeared in the meantime. This survey hence tries to present an
updated view by starting with the historical developments in the field of
intrusion detection from the perspective of the people who did the initial
research and development and their projects, providing a better insight into
the motivation behind them.
B.2 History of Intrusion Detection Systems
James P. Anderson is acknowledged as the first person to document the need for
automated audit trail review to support security goals, for the US Department of
Defense in 1978. He published the Reference Monitor concept in "Computer
Security Technology Planning Study 2", a planning study for the US Air Force,
and this report is considered to be the seminal work on intrusion detection.
Anderson also published the paper "Computer Security Threat Monitoring and
Surveillance" [154] in 1980, which is widely considered to be the first real work
in the area of intrusion detection. The paper proposes a taxonomy for
classifying internal and external threats to computer systems. He points out
that when a violation occurs in which the attacker attains the highest level of
privilege, such as root or super user in UNIX, there is no reliable remedy. He
also comments on the problems associated with masqueraders, for which he
suggests that some sort of statistical analysis of user behavior, capable of
determining unusual patterns of system use, might represent a way of detecting
masqueraders. This suggestion was tested in the next milestone in intrusion
detection, the IDES project.
The US Navy's Space and Naval Warfare Systems Command (SPAWARS) in
1984 funded a project to research and develop a model for a real-time intrusion
detection system, and Dorothy Denning and Peter Neumann came up in 1988
with the Intrusion Detection Expert System (IDES) model. The rare or unusual
traces of traffic were referred to as anomalous, and the assumptions made in
this project served as the basis for much of the intrusion detection research and
many system prototypes of the late 1980s. The IDES model is based on the use
of statistical metrics and models to describe the behavior of benign users. The
IDES prototype used a hybrid architecture, comprising an anomaly detector and
an expert system. The anomaly detector used statistical techniques to
characterize abnormal behavior. The expert system used a rule-based approach
to detect known security violations. The expert system was included to
mitigate the risk that a patient intruder might gradually change his behavior
over a period of time to defeat the anomaly detector. This situation was
possible because the anomaly detector adapted to gradual changes in behavior
to minimize false alarms.
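The statistical side of such a design can be sketched in miniature: build a per-user profile of some activity metric from observed benign behavior, and flag observations that deviate too far from it. The metric, training numbers and threshold below are illustrative inventions, not taken from the IDES design:

```python
import statistics

def build_profile(observations):
    """Summarize benign behavior as (mean, standard deviation)."""
    return statistics.mean(observations), statistics.stdev(observations)

def is_anomalous(value, profile, k=3.0):
    """Flag values more than k standard deviations from the profile mean."""
    mean, std = profile
    return abs(value - mean) > k * std

# Illustrative metric: a user's logins per day over a training window.
training = [10, 12, 11, 9, 10, 13, 11, 10]
profile = build_profile(training)

is_anomalous(11, profile)    # a typical day -> False
is_anomalous(300, profile)   # a sudden burst of activity -> True
```

Note that this also exhibits the weakness mentioned above: if the profile is periodically re-trained on recent data, a patient intruder can drift the mean upward slowly enough to stay under the threshold.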
Denning's paper "An Intrusion Detection Model" [7] in 1986 illustrates the
model of a real-time intrusion-detection expert system capable of detecting
break-ins, penetrations, and other forms of computer abuse. The model is based
on the hypothesis that security violations can be detected by monitoring a
system's audit records for abnormal patterns of system usage. The model
includes profiles for representing the behavior of subjects with respect to
objects in terms of metrics and statistical models, and rules for acquiring
knowledge about this behavior from audit records and for detecting anomalous
behavior. The model is independent of any particular system, application
environment, system vulnerability, or type of intrusion, thereby providing a
framework for a general-purpose intrusion-detection expert system. This paper
is considered to be the stepping-stone for all further work in this field. In the
following years, an ever-increasing number of research prototypes were
explored. Several of these efforts are looked at briefly below, and more details
are available in [182].
B.2.1 The emergence of intrusion detection systems
In 1984, the US Navy's SPAWARS funded a research project, Audit Analysis, at
Sytek, and the prototyped system utilized data collected at the shell level of a
UNIX machine running in a research environment. The data was then analyzed
using database tools. This research helped in distinguishing normal system
usage from abnormal system usage. The researchers were Lawrence Halme,
Teresa Lunt and John Van Horne.
In 1985, an internal research and development project named Discovery started
at TRW; it monitored TRW's online credit database application, rather than the
operating system, for intrusions and misuse. Discovery used a statistical engine
to locate patterns in the input data and an expert system for detecting and
deterring problems in TRW's online credit database. The principal investigator
was William Tener. Haystack [158] was developed for the US Air Force in 1988
to help security officers detect insider abuse of Air Force Standard Base Level
Computers. Haystack was implemented on an Oracle database management
system and performed anomaly detection in batch mode. Haystack
characterized the information from system audit trails as sets of features like
session duration, number of files opened, number of pages printed, number of
CPU resources consumed in the session and the number of sub-processes
created in the session. It used a two-stage statistical analysis to detect
anomalies in system activities. The first stage checked each session for unusual
activity, and the second stage used a statistical test to detect trends in sessions.
The combination of the two techniques was designed to allow detection of both
out-of-bounds activities as well as activities that gradually deviated from
normal over a period of time. The principal investigator was Stephen Smaha.
At almost the same time, the Multics Intrusion Detection and Alerting System
(MIDAS) was developed by the National Computer Security Center to monitor
NCSC's Dockmaster system, which runs a highly secure operating system.
MIDAS was designed to take data from Dockmaster's answering system audit
log and used a hybrid analysis strategy, combining statistical anomaly detection
with expert system rule-based approaches. In 1989, Wisdom and Sense from
Los Alamos National Laboratory and the Information Security Officer's
Assistant (ISOA) from Planning Research Corporation were developed.
In 1990, Susan Kerr reported on all the experimental as well as actually
implemented IDSs in the Datamation report titled "Using AI to improve
security". In the same year, an audit trail analysis tool, Computer Watch, was
developed by AT&T; it was designed to consume the operating system audit
trails generated by the UNIX system. An expert system was used to summarize
system security-relevant events, and a statistical analyzer and query mechanism
allowed statistical characterization of system-wide events. The Network
Security Monitor (NSM) was developed at the University of California at Davis
in 1990, to run on a Sun UNIX workstation. NSM was the first system to
monitor network traffic and use that traffic as the primary data source. NSM
was a significant milestone in intrusion detection research because it was the
first attempt to extend intrusion detection to heterogeneous network
environments. The principal researchers were Levitt, Heberlein and Mukherjee.
The Network Audit Director and Intrusion Reporter (NADIR) was developed in
1991 by the Computer Division of Los Alamos National Laboratory to monitor
user activities on the Integrated Computing Network (ICN) at Los Alamos.
NADIR performs a combination of expert rule-based analysis and statistical
profiling. NADIR, being a successful intrusion detection system, has been
extended to monitor systems beyond the ICN at Los Alamos. Shieh et al. in
1991 presented a paper, "A pattern-oriented ID model and its applications",
with an entirely new approach, which shows that a pattern-oriented ID model
can analyze object privileges and data flows in secure computer systems to
detect operational security problems. This model addresses context-dependent
intrusion and complements the then popular statistical approaches to ID. In the
same year, Steven Snapp in the paper "A system for distributed ID" [164]
presented a proposed architecture consisting of the following components: a
host manager with a collection of processes running in the background, a LAN
manager for monitoring each LAN in the system, and a central manager that
receives reports from the various host and LAN managers, processes these
reports, correlates them and detects intrusions.
The US Air Force in 1992 funded research into the Distributed Intrusion
Detection System (DIDS) [169], a major initiative to integrate host- and
network-based monitoring approaches. Until 1990, intrusion detection systems
were mostly host-based; then in 1990 NSM extended intrusion detection to the
network environment. Integrating host- and network-based IDSs combines the
advantages of both approaches: it resolves many of the problems associated
with promiscuous network monitoring, while maintaining the ability to observe
the entire communication between victim and attacker. The principal
investigator for DIDS was Steve Smaha.
USTAT, a real-time IDS for Unix [196], was introduced in 1993 by Koral Ilgun.
USTAT is a state-transition analysis tool for Unix. It is a Unix-specific
implementation of a generic design, STAT (state-transition analysis tool). In
STAT, a penetration is identified as a sequence of state changes that take the
computer system from some initial state to a target compromised state. This
approach differs from other rule-based penetration identification tools that
pattern-match sequences of audit records. Paul Helman in 1993 came up with a
paper on the statistical foundations of audit trail analysis for the detection of
computer misuse, in which computer transactions are modeled as being
generated by two stationary stochastic processes, the normal process and the
misuse process. Misuse detection is the identification of the transactions most
likely to have been generated by the misuse process.
In 1994, Crosbie and Spafford suggested the use of autonomous agents in order
to improve the scalability, maintainability, efficiency and fault tolerance of an
intrusion detection system [165]. The Next-generation Intrusion Detection
Expert System (NIDES) [157], which was developed in 1995, is the successor
to the IDES project. It has a strong anomaly detection foundation using
innovative statistical algorithms, complemented with a signature-based expert
system component that encodes known intrusion scenarios. NIDES is highly
modularized and is designed to operate in real time to detect intrusions as they
occur.
Christoph and Gray in 1995 expanded NADIR to include processing of audit
and activity records for the Cray UNICOS operating system, calling the result
UNICORN: misuse detection for UNICOS [197]. An approach to address the
scalability deficiencies in most contemporary intrusion detection systems was
proposed with the design and implementation of GrIDS. The Graph-based
Intrusion Detection System for large networks (GrIDS) [166] was developed in
1996, with graphs typically codifying hosts on the network as nodes and
connections between hosts as edges between these nodes. The choice of traffic
taken to represent activity in the form of edges is made on the basis of
user-supplied rule sets. The graphs and the edges have global and local
attributes, including time of connection etc., that are computed by user-supplied
rule sets. These graphs present network events in a graphic fashion that enables
the viewer to determine if suspicious network activity is taking place.
Kosoresow and Hofmeyr in 1997 published a paper on intrusion detection via
system call traces [204]. A computer user leaves trails of activity that can
reveal signatures of misuse as well as of legitimate activity. Depending on the
audit method used, one can record a user's keystrokes, the system resources
used, or the system calls made by some collection of processes. Event
Monitoring Enabling Responses to Anomalous Live Disturbances (EMERALD)
[30] is a framework for scalable, distributed, inter-operable computer and
network intrusion detection. It was developed in 1997 and targets both external
and internal threat agents that attempt to misuse system or network resources.
It is an advanced, highly software-engineered environment that combines
signature-based and statistical analysis components with a resolver that
interprets the analysis results, all of which can be used iteratively and
hierarchically.
In 1998, Anderson and Khattak offered an innovative approach to intrusion
detection by incorporating information retrieval techniques into intrusion
detection tools. Bonifacio et al. in 1998 were the first to introduce the
application of neural networks in IDSs [199]. The system works by capturing
packets and uses a neural network to identify intrusive behavior within the
analyzed data stream. The identification is based on previously well-known
intrusion profiles. The system is adaptive, since new profiles can be added to
the database and the neural network re-trained to consider them. The paper
presents the proposed model, the results achieved and the analysis of an
implemented prototype. In 1998, a stand-alone system named Bro [200] for
detecting network intruders in real time by passively monitoring a network link
over which the intruder's traffic transits was introduced by Vern Paxson. Bro
makes high-speed, large-volume monitoring of network traffic possible without
dropping packets, and it also provides real-time notification of ongoing or
attempted attacks. The system is extensible, since it is easy to add knowledge
of new types of attack. Bro contains mechanisms to withstand attacks against
the monitor itself, and hence was the first to incorporate that theory into
practice.
Ming-Yuh Huang in 1999 introduced a large-scale distributed ID architecture
based on IDS agents and collaborative attack strategy analysis, which creates an
opportunity for IDS agents to pro-actively look ahead for the data most
pertinent to current case development. This look-ahead adaptive behavior
focuses limited system resources on collecting and auditing those events which
are most likely to reveal intrusions.
In 2000, Ning et al. presented a paper on modeling requests among cooperating
IDSs [201]. IDSs have to share information in order to discover attacks
involving multiple sites, and this paper proposes a formal framework for
modeling requests among cooperating IDSs. John Dickerson had a different
idea, which he presented in a paper on fuzzy network profiling for intrusion
detection in 2000. The Fuzzy Intrusion Recognition Engine (FIRE) is an
anomaly-based IDS that uses fuzzy logic to assess whether malicious activity is
taking place on the network. It uses simple data mining techniques to process
the network input data and helps expose metrics that are particularly significant
to anomaly detection. These metrics are then evaluated as fuzzy sets. Stephen
Kent presented the paper "On the trail of intrusions into information systems"
during the same period.
Jianxiong Luo in 2000 published a paper on mining fuzzy association rules and
fuzzy frequency episodes for intrusion detection [203]. Lee, Stolfo and Mok
had previously reported the use of association rules and frequency episodes for
mining audit data to gain knowledge for intrusion detection. Experimental
results show the utility of fuzzy association rules and fuzzy frequency episodes
for intrusion detection. Luo published another paper, on fuzzy frequent
episodes for real-time intrusion detection, in 2001. Data mining methods
including association rule mining and frequent episode mining have thus been
applied to the intrusion detection problem.
In 2001, Balajinath in his paper "Intrusion detection through learning behavior
model" observes that users normally exhibit regularities in their usage of the
commands of a system, as they tend to pursue the same or similar objectives.
Hence it is well known that command sequences can be used to characterize
user behavior. Deviations from the characteristic behavior pattern of a user can
be used to detect potential intrusions.
In 2002, Srinivas Mukkamala suggested the use of neural networks and support
vector machines in intrusion detection. The paper "Intrusion detection using
neural networks and support vector machines" describes these approaches to
intrusion detection and also compares the two methods. Peter Lichodzijewski
described host-based intrusion detection using self-organizing maps in 2002, in
which hierarchical SOMs are applied to the problem of host-based intrusion
detection on computer networks. Unlike systems based on operating system
audit trails, the approach operates on real-time data without extensive off-line
training and with minimal expert knowledge.
Forrest et al. [188] presented one of the first papers analyzing the sequences of
system calls issued by a process for intrusion detection. In 2002, Dasgupta and
Gonzalez in their paper "An immunity-based technique to characterize
intrusions in computer networks" [192] present a technique inspired by the
negative selection mechanism of the immune system that can detect foreign
patterns in the complement (non-self) space. In particular, the novel pattern
detectors (in the complement space) are evolved using a genetic search, which
could differentiate varying degrees of abnormality in network traffic. The
paper demonstrates the usefulness of such a technique for detecting a wide
variety of intrusive activities on networked computers. A positive
characterization method based on nearest-neighbor classification is also used.
Experiments are performed using intrusion detection data sets and tested for
validation. Alexander Seleznyov presented the paper "Learning temporal
patterns for anomaly intrusion detection" in 2002; by being able to accurately
recognize its legitimate users, a system may effectively detect masqueraders.
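The system-call analysis pioneered by Forrest et al. can be approximated by a short sketch: record the fixed-length windows of calls observed in normal traces, then score a new trace by the fraction of its windows that were never seen in training. The traces, call names and window length below are illustrative only, not data from their experiments:

```python
def train_ngrams(trace, n=3):
    """Collect the set of length-n system-call windows seen in normal traces."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def mismatch_rate(trace, normal, n=3):
    """Fraction of windows in a new trace that never occurred in training."""
    windows = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
    misses = sum(1 for w in windows if w not in normal)
    return misses / len(windows)

# Illustrative trace of system-call names for a hypothetical process.
normal_trace = ["open", "read", "mmap", "read", "close",
                "open", "read", "mmap", "read", "close"]
normal = train_ngrams(normal_trace)

mismatch_rate(["open", "read", "mmap", "read", "close"], normal)    # 0.0
mismatch_rate(["open", "execve", "socket", "read", "close"], normal)  # 1.0
```

A trace of previously seen windows scores zero, while an exploit that forces the process down an unfamiliar path (here, an unexpected `execve`/`socket` pair) produces windows absent from the normal set and a high mismatch rate.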
In 2002, Christopher Krugel presented a paper on service-specific anomaly
detection for network intrusion detection. The paper presents an approach that
utilizes application-specific knowledge of the network services that should be
protected. This information helps to extend current, simple network traffic
models to form an application model that allows the detection of malicious
content hidden in single network packets. The features of the proposed model
are described, and experimental data that underlines the efficiency of the
system is also presented. Bo Gao introduced an anomaly intrusion detection
method based on HMMs (hidden Markov models), where the key idea is to use
HMMs to learn the (normal and abnormal) patterns of Unix processes.
In 2002, Sung-Bae Cho incorporated soft computing techniques into a
probabilistic intrusion detection system. There are a lot of industrial
applications that can be solved competitively by hard computing, while still
requiring the tolerance for imprecision and uncertainty that can be exploited by
soft computing. This paper presents a novel intrusion detection system (IDS)
that models normal behaviors with hidden Markov models and attempts to
detect intrusions by noting significant deviations from the models. At almost
the same time, Nasser Abouzakhar came up with "An intelligent approach to
prevent distributed systems attacks". This paper proposes an innovative way to
counteract distributed protocol attacks, such as distributed denial of service
(DDoS) attacks, using intelligent fuzzy agents. Adriano Cansian in the paper
"An attack signature model to computer security intrusion detection" mentions
that internal and external computer network attacks or security threats occur
according to standards and follow a set of subsequent steps, allowing profiles
or patterns to be established. This well-known behavior is the basis of
signature-analysis intrusion detection systems. This work presents a new attack
signature model to be applied in the engines of network-based intrusion
detection systems.
Jun-Zhong Zhao in the paper "An intrusion detection system based on data
mining and immune principles" describes a framework for an immune-based
intrusion detection system (IDS), in which data mining techniques are used to
discover frequently occurring patterns. Ming-Guang Ouyang presented "A
fuzzy comprehensive evaluation based distributed intrusion detection". The
Fuzzy Decision Engine (FDE), a component of the detection agent in a
distributed intrusion detection system, can consider various factors based on
fuzzy comprehensive evaluation when an intrusion behavior is judged. Parimal
Kumar in a paper on the detection of port-scans and OS fingerprinting using
clustering explains how port-scanning and OS fingerprinting exploit
vulnerabilities of TCP/IP for intrusion into a computer network.
In 2002, Sekar in his paper "Specification-based anomaly detection: a new
approach for detecting network intrusions" introduced a different idea.
Specification-based techniques have been shown to produce a low rate of false
alarms, but are not as effective as anomaly detection in detecting novel attacks,
especially when it comes to network probing and denial-of-service attacks.
This paper presents a new approach that combines specification-based and
anomaly-based intrusion detection, mitigating the weaknesses of the two
approaches while magnifying their strengths. The approach begins with
state-machine specifications of network protocols, and augments these state
machines with information about the statistics that need to be maintained to
detect anomalies.
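A toy illustration of the specification-based idea: encode the allowed protocol transitions as a state machine and flag any observed transition the specification does not permit. The states and events below are a deliberately simplified, hypothetical TCP-like handshake, not Sekar's actual specifications:

```python
# Hypothetical, heavily simplified handshake specification:
# only these (state, event) -> next_state transitions are allowed.
SPEC = {
    ("CLOSED", "SYN"): "SYN_RCVD",
    ("SYN_RCVD", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "FIN"): "CLOSED",
}

def check_session(events):
    """Replay observed events against the spec;
    report the first transition the specification disallows."""
    state = "CLOSED"
    for ev in events:
        nxt = SPEC.get((state, ev))
        if nxt is None:
            return ("anomaly", state, ev)
        state = nxt
    return ("ok", state, None)

check_session(["SYN", "ACK", "FIN"])   # ('ok', 'CLOSED', None)
check_session(["ACK"])                 # anomaly: ACK without a prior SYN
```

Because only disallowed transitions are flagged, such specifications produce few false alarms; the anomaly-statistics layer described above would then be attached to the states and transitions of this machine.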
Hajime Inoue in the paper "Anomaly intrusion detection in dynamic execution
environments" describes an anomaly intrusion-detection system for platforms
that incorporate dynamic compilation and profiling. This approach, called
dynamic sandboxing, gathers information about the behavior of applications
that is usually unavailable to other anomaly intrusion detection systems, and is
able to detect anomalies at the application layer. The implementation is shown
to be both effective and efficient at stopping a backdoor and a virus, and has a
low false positive rate. Carol Taylor in 2002 presented a paper on an empirical
analysis of NATE (Network Analysis of Anomalous Traffic Events), a
lightweight, anomaly-based intrusion detection tool.
Mahoney and Chan have done credible work on detecting novel attacks, and
presented a paper on learning nonstationary models of normal network traffic
for detecting novel attacks in 2002. The paper proposes a learning algorithm
that constructs models of normal behavior from attack-free network traffic.
Behavior that deviates from the learned normal model signals possible novel
attacks. This IDS is unique in two respects. First, it is nonstationary, modeling
probabilities based on the time since the last event rather than on the average
rate; this prevents alarm floods. Second, the IDS learns protocol vocabularies
(at the data link through application layers) in order to detect unknown attacks
that attempt to exploit implementation errors in poorly tested features of the
target software.
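The nonstationary idea of scoring by the time since a value was last seen, rather than by its average rate, can be sketched as follows. This is a simplified illustration in the spirit of Mahoney and Chan's approach, not a reproduction of their model; the event stream is invented:

```python
def score_events(events):
    """Score each (time, value) event by the time since that value last
    appeared. Never-seen values score infinitely high, but a repeated
    anomaly quickly scores low, so one attack does not flood the alarms."""
    last_seen = {}
    scores = []
    for t, value in events:
        if value in last_seen:
            scores.append(t - last_seen[value])
        else:
            scores.append(float("inf"))   # first occurrence ever
        last_seen[value] = t
    return scores

# Illustrative stream: a burst of identical anomalous values raises one
# strong alarm, then scores low, instead of one alarm per packet.
events = [(0, "GET"), (1, "GET"), (2, "GET"), (3, "EXPLOIT"), (4, "EXPLOIT")]
score_events(events)   # [inf, 1, 1, inf, 1]
```

A rate-based model would instead keep scoring every "EXPLOIT" packet highly for as long as the burst lasts, which is exactly the alarm-flood behavior the nonstationary formulation avoids.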
Richard Kemmerer in 2003 presented a paper on Internet security and intrusion
detection which highlights the principal attack techniques used on the Internet
today and possible countermeasures. In particular, intrusion detection
techniques are analyzed in detail. This paper mixes a practical character with a
discussion of the current research in the field. Hanping Feng came up with a
paper on anomaly detection using call stack information in 2003. The call
stack of a program execution can be a very good information source for
intrusion detection, yet there was no prior work on dynamically extracting
information from the call stack and effectively using it to detect exploits. In
this paper, a new method is proposed for anomaly detection using call stack
information. The basic idea is to extract return addresses from the call stack
and generate an abstract execution path between two program execution points.
Experiments show that this method can detect some attacks that cannot be
detected by other approaches, while its convergence and false positive
performance is comparable to or better than that of the other approaches.
In 2003, Ling and Jun in the paper "Novel immune system model and its
application to network intrusion detection" analyze the techniques and
architecture of existing network intrusion detection systems and probe into the
fundamentals of the Immune System (IS); a novel immune model is presented
and applied to network IDS, which is helpful in designing an effective IDS.
Besides, this paper suggests a scheme to represent the self profile of a network,
and an automated self-profile extraction algorithm is provided to extract the
self profile from packets. At almost the same time, Juan Tapiador in the paper
"NSDF: a computer network system description framework and its application
to network security" describes a general framework, termed NSDF, for
describing network systems. Entities and relationships are the basis underlying
the concept of system state. The dynamics of a network system can be
conceived of as a trajectory in the state space. The term action is used to
describe every event which can produce a transition from one state to another.
These concepts (entity, relationship, state, and action) are enough to construct a
model of the system. Evolution and dynamism are easily captured, and it is
possible to monitor the behavior of the system.
In 2003, Xiang et al. in their paper "Generating IDS attack pattern
automatically based on attack tree" illustrate the automatic generation of attack
patterns based on attack trees. An extended definition of the attack tree is
proposed and an algorithm for generating the attack tree is presented. The
method of generating attack patterns automatically based on the attack tree is
shown, and is tested on concrete attack instances. The results show that the
algorithm is effective and efficient: the efficiency of generating attack patterns
is improved and the attack trees can be reused. In 2003, Meimei Gao worked
on a paper, "Fuzzy intrusion detection based on fuzzy reasoning Petri nets".
The fuzzy rule-based technique, combining fuzzy logic and expert system
methodology, is not only capable of dealing with uncertainty in intrusion
detection but also allows the most flexible reasoning about the widest variety of
information possible. It can be used in both anomaly and misuse detection.
This paper presents a method for detecting intrusions based on the fuzzy
rule-based technique. The Fuzzy Reasoning Petri Net (FRPN) model is used to
represent the fuzzy rule base and to derive the final detection decision as an
inference engine. FRPNs have a parallel reasoning ability and are readily
usable in real-time detection.
In 2003, Sunita Sarawagi in a paper on sequence data mining techniques and
applications comments that many interesting real-life mining applications rely
on modeling data as sequences of discrete multi-attribute records; mining
models for network intrusion detection view data as sequences of TCP/IP
packets. Robert Erbacher in 2003 presented a paper on the analysis and
application of node layout algorithms for intrusion detection. The proposed
monitoring environment aids system administrators in keeping track of the
activities on such systems with much lower time requirements than perusing
typical log files. With many systems connected to the network, the task
becomes significantly more difficult: if an attack is identified on one system,
then all systems have likely been attacked. The ability to correlate activity
among multiple machines is critical for complete analysis and monitoring of
the environment. This paper discusses the layout techniques experimented with
and their effectiveness.
Shao-Chun Zhong presented a paper on A safe mobile agent system for distributed intrusion detection, in which some applications of mobile agent (MA) technology in intrusion detection systems are developed. MA technology can give an IDS flexibility and enhanced distributed detection ability. The MA-IDS architecture and detailed methods of local and distributed intrusion detection are presented. Maheshkumar Sabhnani, in the work Application of Machine Learning Algorithms to KDD Intrusion Detection Dataset within Misuse Detection Context in 2003, comments that a small subset of machine learning algorithms, mostly inductive-learning based, applied to the KDD 1999 Cup intrusion detection dataset resulted in dismal performance for the user-to-root and remote-to-local attack categories, as reported in the recent literature. This paper evaluates the performance of a comprehensive set of pattern recognition and machine learning algorithms on the four attack categories found in the KDD 1999 Cup intrusion detection dataset. Results of a simulation study implemented to that effect indicated that certain classification algorithms perform better for certain attack categories: a specific algorithm can be specialized for a given attack category. Consequently, a multi-classifier model, where each attack category is associated with the detection algorithm that is most promising for it, was built. Empirical results obtained through simulation indicate that a noticeable performance improvement was achieved for the probing, denial-of-service, and user-to-root attack categories.
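The multi-classifier idea can be sketched as follows. This is an illustrative sketch, not Sabhnani's actual implementation: the feature names, thresholds, and detectors are all assumptions chosen only to show the structure of pairing each attack category with its own specialized detector.

```python
def make_threshold_detector(feature, threshold):
    """Return a detector that flags a connection record when one
    hypothetical feature exceeds a per-category threshold."""
    def detect(record):
        return record.get(feature, 0.0) > threshold
    return detect

# Hypothetical features and thresholds: one specialized detector
# per KDD attack category, as in a multi-classifier model.
DETECTORS = {
    "probe": make_threshold_detector("distinct_ports", 20),
    "dos":   make_threshold_detector("syn_rate", 100.0),
    "u2r":   make_threshold_detector("root_shell_attempts", 1),
    "r2l":   make_threshold_detector("failed_logins", 5),
}

def classify(record):
    """Return the list of attack categories whose detector fires."""
    return [cat for cat, detect in DETECTORS.items() if detect(record)]

record = {"distinct_ports": 35, "syn_rate": 2.0, "failed_logins": 0}
print(classify(record))  # only the probe detector fires for this record
```

In a real system each detector would be the trained classifier that performed best for its category in the evaluation, rather than a fixed threshold.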
Sabhnani continued this work and, in another paper, Formulation of a heuristic rule for misuse and anomaly detection for U2R attacks in Solaris operating system environment, proposes a heuristic rule for the detection of user-to-root (U2R) attacks against the Solaris operating system. Relevant features for developing heuristic rules were manually mined using Solaris Basic Security Module audit data. Results show that all user-to-root attacks exploiting the suid program were detected with 100% probability and with zero false alarms. The rule can detect both successful and unsuccessful U2R attempts against the Solaris operating system, and it is general enough to detect any U2R attack that leverages the buffer overflow technique. Empirical results indicate that the rule also detected novel user-to-root attacks in the DARPA 1998 intrusion detection dataset.
Young-Jun Heo presented a paper on Defeating DoS attacks using wavelet analysis, which proposes a new approach to the detection of DoS and DDoS attacks. An LRU cache filter and wavelet analysis are used to characterize network traffic anomalies: changes in the wavelet variance are treated as a potential DoS attack, and the wavelet variance is compared with a flow profile to validate the attack. Sandra de Amo in 2003 presented the paper Mining generalized sequential patterns using genetic programming. They propose a new kind of sequential pattern, called the Generalized Sequential Pattern, and introduce the problem of mining generalized sequential patterns over temporal databases.
Nong Ye in 2004 had a paper on Robustness of the Markov-chain model for cyber-attack detection. This paper presents a cyber-attack detection technique based on anomaly detection, and discusses the robustness of the modeling technique employed. In this technique, a Markov-chain model represents a profile of computer-event transitions in the normal/usual operating condition of a computer and network system. The Markov-chain model of the norm profile is generated from historic data of the system's normal activities. The observed activities of the system are analyzed to infer the probability that the Markov-chain model of the norm profile supports them: the lower the probability the observed activities receive from the model of the norm profile, the more likely they are anomalies resulting from cyber-attacks, and vice versa.
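The Markov-chain norm-profile idea can be sketched in a few lines. This is a minimal illustration of the principle, not Ye's implementation: the event names, the smoothing constant, and the training traces are assumptions made for the example.

```python
from collections import defaultdict

def train_markov(sequences, smoothing=1e-6):
    """Learn transition probabilities from normal event sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model = {}
    for a, succ in counts.items():
        total = sum(succ.values())
        model[a] = {b: c / total for b, c in succ.items()}
    return model, smoothing

def sequence_probability(profile, seq):
    """Probability the norm profile assigns to an observed sequence;
    unseen transitions receive only the small smoothing value."""
    model, smoothing = profile
    p = 1.0
    for a, b in zip(seq, seq[1:]):
        p *= model.get(a, {}).get(b, smoothing)
    return p

normal = [["open", "read", "close"]] * 50   # hypothetical normal traces
profile = train_markov(normal)
print(sequence_probability(profile, ["open", "read", "close"]))   # high
print(sequence_probability(profile, ["open", "exec", "socket"]))  # near zero
```

Sequences whose probability falls below a chosen threshold would be reported as anomalous, matching the decision rule described above.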
Ming Xu, in the paper Anomaly detection based on system call classification, aims to create a new rule-based anomaly detection model. A detailed classification of the Linux system calls according to their function and level of threat is presented. The detection model targets only critical calls (i.e., the threat-level-1 calls). In the learning process, the detection model dynamically processes every critical call rather than using data mining or statistics from static data, so incremental learning can be implemented. Based on some simple predefined rules and refining, the number of rules in the rule database can be reduced, so that the rule-match time during detection processing is reduced effectively. The experimental results demonstrate that the detection model can detect R2L and U2R attacks. The detected anomaly is limited to the corresponding requests, not the entire trace. The detection model is fit for privileged processes, especially those based on request-responses.
In 2004, Hongyu Yang introduced a different idea, with a decision support module introduced into intrusion detection. The paper, An application of decision support to network intrusion detection, briefly describes a network intrusion detection system and the design of a decision support module (DSM) for it, which can provide active detection and automated response support during intrusions. The primary function of the decision support module is to provide recommended actions and alternatives, and the implications of each recommended action. In the decision support module, a GA (genetic algorithm) was run over a subset of the data, called the training data, and then tested over the entire data set to assess real-world performance. Lian-Hua Zhang, in the paper Intrusion detection using rough set classification in 2004, comments that machine-learning-based intrusion detection approaches have recently been researched extensively because they can detect both misuse and anomalies. In this paper, rough set classification (RSC), a modern learning algorithm, is used to rank the features extracted for detecting intrusions and to generate intrusion detection models.
Kosuke Imamura in 2004 presented a paper on Potential application of training based computation to intrusion detection, which comments that without detection of a network intrusion, a system is not capable of properly defending itself. Therefore, the first step in preserving system integrity is to detect whether or not the system is under attack. Packet analysis approaches are effective at detecting known attacks, but fail at unknown attack detection. In order to protect the system from unknown attacks, a classifier system which is independent of the signatures found in network packets is developed. One promising way to perform this classification is to profile kernel-level activities. A probabilistically optimal classifier ensemble method is used to monitor kernel activity, and ultimately to predict whether or not the system is under attack.
Yan-Hui Du, in the paper Formalized description of distributed denial of service attack, tries to analyze, check and judge DDoS. Based on a careful study of the attack principles and characteristics, an object-oriented formalized description is presented, which contains a three-level framework and offers full specifications of all kinds of DDoS modes, their features, and the relations between one another. Its greatest merit lies in the fact that it contributes to analyzing, checking and judging DDoS. Shaohua Teng, in the paper Scan attack detection model by combining feature and statistic analysis in 2004, remarks that attackers often locate hosts to attack on the Internet by scanning, so many attacks can be prevented if such scan attacks are detected. Presently, there are mainly two kinds of methods for detecting scan attacks, statistics-based detection and feature-based detection, but their high false-negative and false-positive rates make them not very effective. In this study, a new method for detecting scan attacks is presented by combining feature analysis with statistical analysis. It can efficiently detect scan attacks with lower false-positive and false-negative rates.
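The combination of a statistical test with a feature test can be sketched as below. The packet format, flag set, and threshold are assumptions for illustration, not Teng's actual model: a source is flagged only when it both touches unusually many distinct ports in a window (the statistic) and sends packets with scan-like flags (the feature), which is what reduces the false positives of either test alone.

```python
from collections import defaultdict

SCAN_FLAGS = {"SYN", "FIN", "NULL"}   # hypothetical scan-like flag features

def detect_scans(packets, port_threshold=15):
    """packets: (src_ip, dst_port, tcp_flag) tuples from one time window."""
    ports = defaultdict(set)     # statistical evidence: distinct ports hit
    flagged = defaultdict(bool)  # feature evidence: scan-like flags seen
    for src, port, flag in packets:
        ports[src].add(port)
        if flag in SCAN_FLAGS:
            flagged[src] = True
    # Report a source only when BOTH kinds of evidence are present.
    return [src for src in ports
            if len(ports[src]) > port_threshold and flagged[src]]

packets = [("10.0.0.9", p, "SYN") for p in range(40)]   # a port sweep
packets += [("10.0.0.5", 80, "ACK")] * 40               # normal traffic
print(detect_scans(packets))  # only the sweeping source is reported
```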
Teng, in a different paper, Case reasoning and state transition analysis for intrusion detection in 2004, remarks that when a new intrusion scenario is developed, many intrusion variants can be derived by exchanging the command sequences or replacing commands with functionally similar ones, which makes detection of the developed intrusion very difficult. To overcome this problem, Case Reasoning And State Transition Analysis (CRASTA) is proposed in this paper. For an intrusion case, all the possible derived intrusions are generated as an intrusion base, and based on this intrusion base an efficient algorithm to detect such intrusions using finite automata is presented. A derived intrusion can be seen as an unknown intrusion; in this sense, the technique presented can detect some unknown intrusions. Yuming Zhao, in the paper Study of anomaly detection based on system call and data mining technology in 2004, introduces the categories of intrusion detection and the data mining methods applied in anomaly detection. It also describes the design and implementation of an anomaly IDS based on system calls and data mining algorithms.
Abraham [171] in 2004 investigated the suitability of the linear genetic programming (LGP) technique to model fast and efficient intrusion detection systems. The performance and accuracy of LGP were compared to results obtained by ANN and regression tree methods. Experiments performed over the popular DARPA IDS data set showed that LGP outperformed decision trees and support vector machines in terms of detection accuracies (except for one class). Decision trees were considered the second best, especially for the detection of U2R attacks.
Ming Xu, in the paper Two-layer Markov chain anomaly detection model in 2005, proposes, on the basis of the current single-layer Markov-chain anomaly detection model, a new two-layer model. Two distinctly different processes, the different requests and the system-call sequence within the same request section, are classified as two layers and dealt with by different Markov chains respectively. The two-layer frame can depict the dynamic activity of the protected process more exactly than the single-layer frame, so the two-layer detection model can raise the detection rate and lower the false alarm rate. Furthermore, the detected anomaly is limited to the corresponding request sections where the anomaly happens. The new detection model is suitable for privileged processes, especially those based on request-response.
Zhao et al. [172] in 2005 proposed a misuse detection system and an anomaly detection system that encode an expert's knowledge of known patterns of attack and system vulnerabilities as if-then rules. Normal and intruded connections are divided into different clusters in order to distinguish them, and a GA is then integrated to detect intrusive actions; the system thus combines two stages (a clustering stage and a genetic optimizing stage) into the process. The GA was successfully applied and trained on a real-world test case. At almost the same time, Gong et al. [173] chose a GA approach to network misuse detection because it is robust to noise, requires no gradient information to find a global optimal or suboptimal solution, and is self-learning. Kim et al. in 2005 [174] proposed a Genetic Algorithm to improve a Support Vector Machine based IDS. They fused the GA and SVM in order to improve the overall performance of the IDS, and an optimal detection model for the SVM classifier was determined. As a result of the fusion, the SVM-based IDS not only selected optimal parameters for the SVM but also an optimal feature set from among the whole feature set.
Abraham and Grosan [175] evaluated the performance of two Genetic Programming techniques for IDS, Linear Genetic Programming (LGP) and Multi-Expression Programming (MEP), and provided a comprehensive comparison of the results obtained with selected non-evolutionary machine learning techniques such as Support Vector Machines (SVM) and Decision Trees (DT). Based on numerical experiments and comparisons, they showed that the Genetic Programming techniques outperformed the reference machine learning methods. In detail, MEP outperformed LGP for three of the considered classes and LGP outperformed MEP for two of the classes. MEP classification accuracy was greater than 95% for all considered classes and, for three of them, greater than 99.75%. Moreover, they suggested that for real-time intrusion detection systems MEP and LGP would be the ideal candidates because of their simple implementation.
B.3 Taxonomy of Intrusion Detection Systems
We have made use of a large number of concepts to classify IDSs. The classification is presented in Figure B.1, with a detailed discussion in this section.
B.3.1 Intrusion detection methods
The two basic, complementary approaches to detecting intrusions are anomaly detection (behavior-based) approaches and knowledge-based (misuse detection) approaches. Both methods have their distinct advantages and disadvantages, as well as suitable application areas within intrusion detection.
Anomaly detection methods
Anomaly detection, also called behavior-based or heuristic detection, uses information about the repetitive and usual behavior of the systems it monitors, and identifies events that deviate from expected usage patterns as malicious. Most anomaly detection approaches attempt to build some kind of model over the normal data and then check how well new data fits that model. In other words, anything that does not correspond to previously learned behavior is considered intrusive. Such an intrusion detection system might therefore not miss any attacks, but its accuracy is a difficult issue, since it can generate a lot of false alarms. Examples of anomaly detection systems are IDES, NIDES, EMERALD and Wisdom and Sense. Anomaly detection can use either unsupervised or supervised learning techniques.

Figure B.1: Taxonomy of Intrusion Detection Systems
1. Unsupervised learning systems: Unsupervised or self-learning systems learn the normal behavior of the traffic by observing it for an extended period of time and building some model of the underlying process. Examples include techniques such as the Hidden Markov Model (HMM) and Artificial Neural Networks (ANN). More details are available in the work of Sundaram [176].

2. Supervised systems: In programmed systems, or the supervised learning method, the system has to be taught to detect certain anomalous events. Supervised anomaly detection approaches build predictive models provided that labeled training data (normal or abnormal user or application behavior) are available. Thus the user of the system forms an opinion on what is considered abnormal enough for the system to signal a security violation.
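The core anomaly-detection principle described above, building a model of normal behavior and flagging deviations from it, can be sketched minimally. The feature (a packet rate), the training values, and the three-sigma threshold are illustrative assumptions, not a specific system's design.

```python
import statistics

def build_profile(normal_values):
    """Model 'normal' as the mean and standard deviation of one feature."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_anomalous(profile, value, threshold=3.0):
    """Flag observations that deviate too far from the learned profile."""
    mean, stdev = profile
    return abs(value - mean) > threshold * stdev

normal_rates = [98, 101, 100, 99, 102, 100, 97, 103]  # packets/sec, normal
profile = build_profile(normal_rates)
print(is_anomalous(profile, 100))   # within the learned profile
print(is_anomalous(profile, 500))   # flagged as a deviation
```

Real anomaly detectors replace this single-feature statistic with richer models (HMMs, neural networks), but the decision structure, learn normal and then score deviation, is the same.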
Advantages of behavior-based approaches

- They detect new and unforeseen vulnerabilities.
- They are less dependent on operating-system-specific mechanisms.
- They detect abuse-of-privilege types of attacks that do not actually involve exploiting any security vulnerability.

Disadvantages of behavior-based approaches

The high false alarm rate is generally cited as the main drawback of behavior-based techniques because:

- The entire scope of the behavior of an information system may not be covered during the learning phase.
- Behavior can change over time, introducing the need for periodic online retraining of the behavior profile.
- The information system can undergo attacks at the same time the intrusion detection system is learning the behavior. As a result, the behavior profile contains intrusive behavior, which is not detected as anomalous.

It must be noted that very few commercial tools today implement such an approach, leaving anomaly detection to research systems, even though the founding paper by Denning [7] recognizes this as a requirement for IDS systems.
Knowledge-based detection methods
Knowledge-based detection, also called misuse detection or signature detection, uses information about the known security policy, known vulnerabilities, and known attacks on the systems it monitors. This approach compares network activity or system audit data to a database of known attack signatures or other misuse indicators, and pattern matches produce alarms of various sorts. All commercial systems use some form of knowledge-based approach. Thus, the effectiveness of current commercial IDS depends largely on the validity and expressiveness of their database of known attacks and misuse, and on the efficiency of the matching engine used. It requires frequent updates to keep up with the stream of newly discovered vulnerabilities, a situation aggravated by the requirement to represent all possible facets of the attacks as signatures. This leads to an attack being represented by a number of signatures, at least one for each operating system to which the intrusion detection system has been ported. Examples of product prototypes are Discovery, IDES, Haystack and Bro. The work of Gordeev [180] discusses these methods in detail.
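The signature-matching core of knowledge-based detection can be sketched in a few lines. The signatures below are toy byte patterns invented for the example, not a real rule set; production engines such as Snort use far richer rule languages and optimized multi-pattern matchers.

```python
# A toy signature database: name -> byte pattern known to indicate misuse.
SIGNATURES = {
    "php-cgi probe": b"/cgi-bin/php",
    "directory traversal": b"../../",
    "shellcode NOP sled": b"\x90\x90\x90\x90",
}

def match_signatures(payload):
    """Return the names of all signatures found in a packet payload."""
    return [name for name, pattern in SIGNATURES.items()
            if pattern in payload]

print(match_signatures(b"GET /cgi-bin/php?x=../../etc/passwd"))
print(match_signatures(b"GET /index.html"))  # no alarm for benign traffic
```

The approach's dependence on the signature database is visible here: any attack without a matching pattern passes silently, which is why frequent updates are essential.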
B.3.2 Deployment techniques
The effectiveness of an intrusion detection system depends on its internal design and, even more importantly, on its position within the corporate architecture. Generally, IDSs can be classified into different categories depending on their deployment.
Host-based monitoring
A host-based IDS is deployed on devices that have other primary functions, such as Web servers, database servers and other host devices. Host logs, comprising a combination of audit, system and application logs, offer an easily accessible and non-intrusive source of information on the behavior of a system. In addition, logs generated by high-level entities can often summarize many lower-level events, such as a single HTTP application log entry covering many system calls, in a context-aware fashion. A host-based IDS provides information such as user authentication, file modifications/deletions and other host-based information, and is thus regarded as secondary protection for devices on the network. Examples of HIDS products are EMERALD, NFR etc.

Advantages of Host-based Intrusion Detection Systems

Although host-based IDS is overall not as robust as network-based IDS, it does offer several advantages:

- More detailed logging:- HIDS can collect much more detailed information regarding exactly what occurs during the course of an attack.
- Increased recovery:- Because of the increased granularity of tracking events in the monitored system, recovery from a successful incident is usually more complete.
- Detects unknown attacks:- Since the attack affects the monitored host, HIDS detects unknown attacks better than network-based IDS.
- Fewer false positives:- The way HIDS works produces fewer false alerts than network-based IDS.
Disadvantages of Host-Based IDS

- Indecipherable information:- Because of network heterogeneity and the profusion of operating systems, no single host-based IDS can translate all operating systems, network applications, and file systems. In addition, in the absence of something like a corporate key, no IDS can decipher encrypted information.
- Indirect information:- Rather than monitoring activity directly (as network-based IDS do), host-based IDS usually rely heavily or completely on an audit record of activity created by a system or application. This audit record varies widely in quality and quantity between different systems and applications, dramatically affecting IDS effectiveness.
- Complete coverage:- Host-based IDS are installed on the systems being monitored. On very large networks this can comprise many thousands of workstations; providing IDS on this scale is both very expensive and difficult to manage.
- Outsiders:- A host-based IDS can potentially detect an outside intruder only after the intruder has reached the monitored host system, not before, as network-based IDS can. To reach a host system, the intruder must already have bypassed network security measures.
- Host interference:- Host-based IDS can place such a load on the host CPU as to interfere with normal host operations. On some systems, just invoking an audit record sufficient for the IDS can result in unacceptable loading.
Network-based monitoring
The sole function of a network-based IDS is to monitor the traffic of its network. This ensures that the IDS can observe all communication between a network attacker and the victim system, resolving many of the problems associated with log monitoring. Typical network-based IDSs are Microsoft Network Monitor, Cisco Secure IDS (formerly NetRanger), Snort etc.
Advantages of network-based intrusion detection

- Ease of deployment:- Their passive nature means few performance or compatibility issues in the monitored environment.
- Cost:- Strategically placed sensors can be used to monitor a large organizational environment, whereas a host-based IDS requires software on each monitored host.
- Range of detection:- The variety of malicious activities detectable through the analysis of network traffic is wider than the variety detectable by host-based IDS.
- Forensics integrity:- Since the network-based IDS sensors run on a host separate from the target, they are more impervious to tampering.
- Detects all attempts, even failed ones:- Host-based IDS detect only successful attacks, because unsuccessful attacks do not affect the monitored host directly.
Disadvantages of Network-based IDS

- Direct attack susceptibility:- A recently released study by Secure Networks, Inc. of leading network-based IDS products found that network-based IDS are susceptible to:
  i. Packet spoofing, which tricks the IDS into thinking packets have come from an incorrect location.
  ii. Packet fragmentation attacks that retransmit sequence numbers so that the IDS sees only what a hacker wants it to see.
- Indecipherable packets:- Because of network heterogeneity and the relative profusion of protocols, network-based IDSs often cannot decipher the packets they capture. In addition, in the absence of something like a corporate key, no IDS can decipher encrypted information.
- Failure when loaded:- A recent evaluation of leading network-based commercial products found that products that successfully detect all tested attacks on an empty or moderately utilized network start missing at least some attacks when the monitored network is heavily loaded.
- Failure at wire speed:- While network-based IDS can process packets on low-speed networks (10 Mbps), few claim to be able to keep up and miss no information at 100 Mbps or higher.
- Complete coverage:- Most sensors are designed to be installed on shared-access segments, and can monitor only the traffic running through those segments. To provide coverage, the IDS user must select key shared-access segments for IDS sensors. Most frequently, sensors are placed in the demilitarized zone and, in some cases, in front of port and server farms. To monitor distributed ports, internal attack points, distributed Ethernet connections, and desktops, many sensors must be installed. Even then, elastic or unauthorized connections such as desktop dial-ins and modems will not be monitored.
- Switched networks:- To make matters worse, switching has replaced shared/routed networks as the architecture of choice. Switching effectively hides traffic from shared-access network-based IDS products. Switched networks fragment communication and divide a network into myriad micro-segments, which makes deploying shared-access IDS prohibitively expensive, since very many sensors must be deployed to provide coverage. Alternatives include attaching hubs to switches wherever switched traffic must be monitored, or mirroring selected information, such as that moving to specific critical devices, to a sensor for processing. None of these is an easy or ideal solution.
- Insiders:- Network-based IDS focus on detecting attacks from outside, rather than attempting to detect insider abuse and violations of local security policy.
Host network monitoring
Host network monitoring, also called network-node or hybrid intrusion detection and used in personal firewalls and some IDS probe designs, combines network monitoring with host-based probes. By observing data at all levels of the host's network protocol stack, the ambiguities of platform-specific traffic handling and the problems associated with cryptographic protocols can be resolved. The data and event streams observed by the probe are those observed by the system itself. This approach offers advantages and disadvantages similar to both alternatives listed above. It resolves many of the problems associated with promiscuous network monitoring, while maintaining the ability to observe the entire communication between victim and attacker. Like all host-based approaches, however, it implies a performance impact on every monitored system, requires additional support to correlate events on multiple hosts, and is subject to subversion when the host is compromised. Sometimes this hybrid intrusion detection system is considered a subtype of network-based intrusion detection, because it relies primarily on network traffic analysis for detection. An example of a hybrid IDS is Prelude.
Target-based monitoring
An attempt to resolve the ambiguities inherent in protecting multiple platforms lies in combining network knowledge with traffic reconstruction. These target-based ID systems typically use scanning techniques to form an image of what systems exist in the protected network, including such details as host operating system, active services, and possible vulnerabilities. Using this knowledge, a probe can reconstruct network traffic in the same fashion as the receiver system would, preventing attackers from injecting or obscuring attacks. In addition, this approach allows an IDS to automatically differentiate attacks that are a threat to the targeted system from those that target vulnerabilities not present, thus refining the generated alerts. Whether attacks that cannot succeed should be reported is something of a contentious issue, offering a trade-off between generating fewer low-value alerts and the possibility of recognizing novel attacks when combined with known sequences. In addition, the need to maintain an accurate map of the protected network, including valid points of vulnerability, may reduce the ability of this class of system to recognize novel attacks.
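The alert-refinement step at the heart of target-based monitoring can be sketched as follows. The network map and alert fields are assumptions invented for the example: an alert is reported only when the attacked port actually corresponds to a service the target runs.

```python
# Hypothetical map of the protected network, as built by scanning.
NETWORK_MAP = {
    "192.168.1.10": {"os": "Linux", "services": {22, 80}},
    "192.168.1.20": {"os": "Windows", "services": {445}},
}

def is_relevant(alert):
    """Suppress alerts against ports the target does not actually serve."""
    host = NETWORK_MAP.get(alert["dst_ip"])
    return host is not None and alert["dst_port"] in host["services"]

alerts = [
    {"sig": "IIS exploit", "dst_ip": "192.168.1.10", "dst_port": 445},
    {"sig": "SSH brute force", "dst_ip": "192.168.1.10", "dst_port": 22},
]
print([a["sig"] for a in alerts if is_relevant(a)])  # only the SSH alert
```

The trade-off noted above is visible in the code: the Windows-only exploit against a Linux host is silently dropped, which reduces noise but also discards evidence of hostile activity.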
B.3.3 Information source
The information that an IDS product can access is determined by where it is deployed. Network-based IDS always capture and analyze network packets, while host-based IDS products potentially have many information sources on the hosts where they are installed. The IDS classification based on the data source is listed below:

Network packets

The IDS includes a network-based sensor designed to capture and process network packets and decipher at least one network protocol (e.g. TCP/IP).

Audit trail

The IDS includes a host-based agent designed to process the audit record of at least one specific operating system (e.g., Solaris, Ultrix, Unicos).
B.3.4 Architecture
The IDS should provide a distributed capability, since this component of scalability is vital for effective deployment of IDS in the vast majority of corporate networks. A distributed capability means that the IDS functionality is provided by both a central manager (or managers) and local collection/processing agents placed as needed throughout the monitored network. However, some products are available in both local and distributed versions.
Monolithic systems
The simplest model of IDS is a single application containing probe, monitor, resolver and controller all in one, called the monolithic or centralised system. It focuses on a specific host or system, with no correlation of actions that cross system boundaries. Such systems are conceptually simple and relatively easy to implement. Their major weakness lies in the possibility of an attack being implemented as a sequence of individually innocuous steps. The alerts generated by such systems may in fact be aggregated centrally, but this architecture offers no synergy between IDS instances.
Hierarchic systems
If one considers the alerts generated by an IDS instance to be events in themselves, suitable for feeding into a higher-level IDS structure, an intrusion detection hierarchy results. At the root of the hierarchy lie a resolver unit and controller. Below these lie one or more monitor components, with subsidiary probes distributed across the protected systems. Effectively, the whole hierarchy forms a macro-scale IDS. The use of a centralized controller unit allows information from different subsystems to be correlated, potentially identifying transitive or distributed attacks. For example, a simple address-range probe, while difficult to detect using a network of monolithic host IDS instances, can be trivial to observe when correlating connections using a hierarchic structure.
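The address-range-probe example can be made concrete with a small sketch. The event format is an assumption: each per-host probe reports one harmless-looking connection, but the central resolver, which sees all reports, recognizes the sweep.

```python
from collections import defaultdict

def correlate(probe_events, sweep_threshold=5):
    """probe_events: (reporting_host, source_ip) pairs sent up the
    hierarchy by host-level probes. A source contacting many distinct
    hosts is reported as a likely address-range sweep."""
    targets = defaultdict(set)
    for host, src in probe_events:
        targets[src].add(host)
    return [src for src, hosts in targets.items()
            if len(hosts) >= sweep_threshold]

# One connection per host: invisible to any single monolithic IDS,
# obvious to the central resolver.
events = [(f"host-{i}", "203.0.113.7") for i in range(8)]
events.append(("host-1", "198.51.100.2"))
print(correlate(events))  # the sweeping source stands out centrally
```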
Agent-based systems
A more recent model of IDS architecture divides the system into distinct functional units: probes, monitors, resolver and controller units. These may be distributed across multiple systems, with each component receiving input from a series of subsidiaries and reporting to one or more higher-level components: probes report to monitors, which may report to resolver units or higher-level monitors, and so forth. This architecture, implemented in systems such as EMERALD, allows great flexibility in the placement and application of individual components. In addition, it offers greater survivability in the face of overload or attack, high extensibility, and multiple levels of reporting throughout the structure. FIRE is a product prototype that uses the agent-based approach to intrusion detection; this method is discussed in detail in [186].
Distributed systems
All the IDS architectural models described so far consider attacks in terms of events on individual systems. A recent development, typified by the GrIDS system, lies in regarding the whole system as a unit. Attacks are modeled as interconnection patterns between systems, with each link representing network activity. The graphs that form can be viewed at different scales, ranging from small systems to the interconnections between large and complex systems (where sub-networks are collapsed into points). This novel approach promises high scalability and the potential to recognize widely distributed attack patterns such as worm behavior. This architecture is also implemented in DIDS.
B.3.5 Analysis frequency
The classication depending on execution frequency or periodicity is based on
how often an IDS analyzes data from its information sources. Most commercial
IDS claim real-time processing capability, and a few provide the capability for
batch processing of historical data.
Dynamic execution
An IDS of this type performs concurrent and continuous automated processing and analysis, implying real-time operation or on-the-fly processing. IDSs deployable in real-time environments are designed for online monitoring and analysis of system events and user actions.
Static execution
An IDS of this type performs periodic processing and analysis, implying batch or other sporadic operation. This can be effective against the low-intensity probes an attacker makes to hide his presence by spreading his attack over a very long period with appreciable gaps between consecutive probes. Audit trail analysis is the prevalent method used by periodically operated systems.
B.3.6 Response
The behavior on detection of an attack describes the response of the IDS. An IDS may respond to an identified attack, misuse, or anomalous activity in the following three ways:
Passive
In passive response, the IDS simply generates alarms to inform responsible personnel of an event by way of console messages, email, paging, and report updates. Passive or indirect gathering of information aids in identifying the source of an attack using techniques such as DNS lookups and passive fingerprinting.
Reactive
A reactive response is an active response to critical events, in which the IDS takes corrective action that stops the attacker from gaining further access to resources, thus mitigating the effects of the attack. These responses are executed after the attack has been detected by the IDS. Reactive responses change the surrounding system environment, either on the host on which the IDS resides or in the surrounding network. For example, the IDS may reconfigure another system such as a firewall or router to block out the attacker, use TCP reset frames to tear down connection attempts, correct a system vulnerability, log off a user, selectively increase monitoring, or disconnect a port as specified by the user.
Proactive
This is an active response to critical events, in which the IDS intervenes and actively stops an attack from taking place. The only difference between proactive and reactive responses is when they are executed. A
proactive response could be to drop a network packet before it has reached its
destination, thereby intervening and stopping the actual attack. A reactive re-
sponse would have been able to terminate the ongoing connection, but it would
not have stopped the packet that triggered the IDS from reaching its destination.
A more exhaustive taxonomy of intrusion detection systems is available in the
work of Sabahi [187].
B.4 Latest Intrusion Detection Software
Along with the intrusion detection systems mentioned above that have made significant contributions to ongoing research in the field, there are a few other products that deserve special discussion. Most of the currently used open-source and free software packages, commercial software packages and academic software packages are included below:
AirCERT Automated Incident Reporting (AirCERT) is a scalable distributed
system for sharing security event data among administrative domains. Us-
ing AirCERT, organizations can exchange security data ranging from raw
alerts generated automatically by network intrusion detection systems (and
related sensor technology), to incident reports based on the assessments of
human analysts.
ISS Real Secure This IDS works satisfactorily at Gigabit speed. The high speed is made possible by integrating the IDS into the switch or by using a specific port called the span port, which mirrors all the traffic on the switch. The BlackICE technology of this sensor includes protocol analysis and anomaly detection combined with Real Secure's library of signature-based detection capabilities.
Real Secure Server Sensor It is a hybrid IDS which resides on one host and still monitors the network traffic and detects attacks in the network layer of the protocol stack. However, the sensor also detects attacks at higher layers and can therefore detect attacks hidden in encrypted sessions such as IPsec or SSL encryption. The sensors can also monitor application and operating system logs.
Snort Snort is an open-source network intrusion detection system that keeps track of intrusion attempts and signs of possible bad behavior or hacking exploits. It is capable of performing real-time traffic analysis and packet logging on IP networks. It can perform protocol analysis and content searching/matching, and can be used to detect a variety of attacks and probes, such as buffer overflows, stealth port scans, CGI attacks, SMB probes, OS fingerprinting attempts, and much more. It is non-intrusive, easily configured, utilizes familiar methods for rule development, and currently includes the ability to detect more than 1200 potential vulnerabilities.
Sourcefire Founded by the creators of Snort, the most widely deployed intrusion detection technology worldwide, Sourcefire has been recognized throughout the industry for enabling customers to quickly and effectively address security risks. Today, Sourcefire is redefining the network security industry by combining enhanced Snort with sophisticated proprietary technologies to offer the first ever unified security monitoring infrastructure, delivering all of the capabilities needed to proactively identify threats and defend against intruders.
Shadow Shadow is an intrusion detection system developed on inexpensive PC hardware running open source, public domain, or freely available software. A SHADOW system consists of at least two pieces: a sensor located at a point near an organization's firewall, and an analyzer inside the firewall. Shadow performs traffic analysis; the sensor collects packet headers from all IP packets that it sees; the analyzer examines the collected data and displays user-defined interesting events on a web page.
Entercept Entercept is a HIDS that prevents and detects attacks; uses a com-
bination of signatures and behavioral rules; safeguards the server, applica-
tions and resources from known and unknown worms and buffer-overflow
attacks; reduces false positives and protects customer data.
McAfee Desktop Firewall It is a HIDS which provides firewall protection and intrusion detection for the desktop; guards against threats from internal and external intruders, malicious code and silent attacks.
OKENA StormWatch It is an HIDS that intercepts all system calls to file, net-
work, COM and registry resources and correlates behaviors of such system
requests to make real time allow or deny decisions; supports XP, Win2K
and UNIX systems; scalable to 5000 intelligent agents manageable from
one console.
Symantec Host IDS It is a HIDS that detects unauthorized and malicious
activity like access to critical files and bad logins, alerts administrators and
takes precautionary action to prevent information theft or loss, without any
overhead to the deployed monitoring machine. It has the advantage that it
supports all the popular operating systems.
SMART Watch It is a HIDS that performs file-change detection; provides a
restoration tool that reacts in near-real time without polling.
GFI LANguard It is a HIDS that monitors the security event logs of all Win-
dows XP, Windows 2000, and Windows NT servers and workstations on
your network; alerts administrators in real time about possible intrusions
and attacks.
NetRanger NetRanger is a network-based IDS that monitors network traffic with special hardware devices that can be integrated into Cisco routers and switches or act as stand-alone boxes. In addition to network packets, router log files can also be used as an additional source of information. The system consists of Sensors, centralized data processing units called Directors and a proprietary communication subsystem called Post Office. NetRanger is integrated into the Cisco Secure Intrusion Detection System.
Network Flight Recorder NFR is a network-based ID system that uses filters for misuse detection. NFR did not start as an IDS, but it provides an architecture to monitor and filter network packets, log results, perform statistical evaluation and initiate alarms when certain conditions are met, and therefore can be used to detect intrusions as well. NFR is designed to provide post-mortem analysis capability for networks after malicious activities have happened. This can be used to shorten the lifetime of new attacks by quickly adding their signatures to the detection unit. Additionally, the system performs statistics gathering and provides information about usage growth of applications or traffic peaks of certain protocol types. The architecture is built in a modular fashion with interfaces between the main components so that new subsystems can be added easily. NFR Security's intelligent intrusion management system not only detects and deters network attacks, but also integrates with popular firewall providers to prevent future attacks.
Fuzzy Intrusion Recognition Engine FIRE is a network intrusion detection system that uses fuzzy systems to assess malicious activity against computer networks. The system uses an agent-based approach to separate monitoring tasks. Individual agents perform their own fuzzy processing of input data sources. All agents communicate with a fuzzy evaluation engine that combines the results of individual agents using fuzzy rules to produce alerts that are true to a certain degree. The results show that fuzzy systems can easily identify port scanning and denial of service attacks. The system can be effective at detecting some types of backdoor and Trojan horse attacks. The paper [191] gives more details on this product.
Intelligent intrusion detection system IIDS is being developed to demonstrate the effectiveness of data mining techniques that utilize fuzzy logic. This system combines two distinct intrusion detection approaches: anomaly-based intrusion detection using fuzzy data mining techniques, and misuse detection using traditional rule-based expert system techniques. The anomaly-based components look for deviations from stored patterns of normal behavior. The misuse detection components look for previously described patterns of behavior that are likely to indicate an intrusion. Both network traffic and system audit data are used as inputs. This prototype is described in [193].
DERBI DERBI is a computer security tool aimed at diagnosing and recovering from network-based break-ins. The technology adopted has the
ability to handle multiple methods (often with different costs) of obtaining
desired information, and the ability to work around missing information.
The prototype will not be an independent program, but will invoke and
coordinate a suite of third-party computer security programs (COTS or
public) and utility programs.
MINDS MINDS (Minnesota Intrusion Detection System) project is develop-
ing a suite of data mining techniques to automatically detect attacks against
computer networks and systems. It uses an unsupervised anomaly detec-
tion system that assigns a score to each network connection that reflects
how anomalous that connection is.
NetSTAT NetSTAT is a tool aimed at real-time network-based intrusion de-
tection. The NetSTAT approach extends the state transition analysis tech-
nique (STAT) to network-based intrusion detection in order to represent
attack scenarios in a networked environment. NetSTAT is oriented to-
wards the detection of attacks in complex networks composed of several
subnetworks.
BlackICE The BlackICE IDS scans network traffic for hostile signatures in much the same way that virus scanners examine files for virus signatures. BlackICE runs at 148,000 packets per second, checks all 7 layers of the stack and rates each attack on a scale of 1 to 100 so that only attacks it considers serious are alerted. There are two versions: a desktop agent (BlackICE Defender) and a network agent (BlackICE Sentry). The desktop agent runs on Win95/WinNT desktops. The network agent runs just like any other sniffer-type IDS.
Cyclops
Snort-based Cyclops IDS provides advanced and flexible intrusion detection at
Gigabit speeds and secures networks by performing high-speed packet analysis
to detect malicious activities in real-time and automatically launch preventive
measures before security can be compromised.
Dragon Sensor
The Dragon sensor detects suspicious activity with both signature-based and anomaly-based techniques. Its library of attacks detects thousands of potential network
attacks and probes, and also hundreds of successful system compromises and
backdoors.
E-Trust
eTrust intrusion detection delivers state-of-the-art network protection including
DDoS attacks. All incoming and outgoing traffic is checked against a categorized list of web sites to ensure compliance. It is then checked for content, malicious code and viruses, and the administrator is notified of offending payloads.
Manhunt
Symantec ManHunt provides high-speed network intrusion detection, real-time analysis and correlation, and proactive prevention and response to protect enterprise networks against internal and external intrusions and denial-of-service attacks. The ability to detect unknown threats using protocol anomaly detection helps in eliminating the network exposure and vulnerability inherent in signature-based intrusion detection systems. Symantec ManHunt's traffic rate monitoring capability allows for detection of stealth scans and denial-of-service attacks that can cripple even the most sophisticated networks.
NetDetector
NetDetector is a network surveillance system for IP networks that provides non-intrusive, continuous traffic recording and real-time traffic analysis. NetDetector records network traffic, analyzes every packet, detects the activities of intruders, sets alarms for real-time alerting, and gathers evidence for post-event analysis.
B.5 Review of the data processing techniques used in IDS
It is clear from the discussions on IDSs in the above sections that various pro-
cessing methods are employed on the network traffic for detection by different
IDSs. A brief review of the various systems is presented in this section.
Anomaly detection methods
1. Statistical analysis In the statistical analysis approach, user or system behavior, characterized by a set of attributes, is measured over a period of time by a number of variables such as user login, logout, number of files accessed in a period of time, and usage of disk space, memory, CPU etc. The system stores mean values for each variable and gives an alert when a measurement exceeds a predefined threshold. A sophisticated model of user behavior has been developed using short- and long-term user profiles. These profiles are regularly updated to keep up with changes in user behavior. Statistical methods are often used in implementations of normal-user-behavior profile-based intrusion detection systems. IDES, NIDES, EMERALD, SECURENET, and SPADE use the statistical analysis approach.
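The threshold test described above can be sketched as follows. This is an illustrative toy in Python, not the model used by IDES or NIDES; the variable name and the three-standard-deviation threshold are assumptions made only for the example.

```python
from statistics import mean, stdev

class StatisticalProfile:
    """Toy per-variable profile: alert when an observation deviates
    from the stored mean by more than k standard deviations."""

    def __init__(self, k=3.0):
        self.k = k
        self.history = {}          # variable name -> list of past values

    def train(self, variable, value):
        self.history.setdefault(variable, []).append(value)

    def is_anomalous(self, variable, value):
        past = self.history.get(variable, [])
        if len(past) < 2:
            return False           # not enough data to judge
        m, s = mean(past), stdev(past)
        return abs(value - m) > self.k * max(s, 1e-9)

profile = StatisticalProfile(k=3.0)
for logins in [4, 5, 6, 5, 4, 5]:  # typical daily login counts
    profile.train("logins_per_day", logins)

print(profile.is_anomalous("logins_per_day", 5))    # False: normal
print(profile.is_anomalous("logins_per_day", 60))   # True: alert
```

Real statistical IDSs replace the single mean with short- and long-term profiles that decay over time, as described above.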
2. Artificial Neural Networks Artificial neural networks use their learning algorithms to learn the relationship between input and output vectors and to generalize it to extract new input/output relationships. With
the neural network approach to intrusion detection, the main purpose is
to learn the behavior of actors in the system (e.g., users, daemons). The
advantage of using neural networks over statistics lies in the simple way of expressing nonlinear relationships between variables, and in learning such relationships automatically.
3. User intention identification This technique models the normal behavior of users by the set of high-level tasks they have to perform on the system in relation to their functions. These tasks are taken as series of actions, which in turn are matched to the appropriate audit data. The analyzer keeps a set of tasks that are acceptable for each user. Whenever a mismatch is encountered, an alarm is produced. SECURENET uses this technique for intrusion detection.
4. Computer immunology Analogies with immunology have led to the development of a technique that constructs a model of normal behavior of UNIX network services, rather than of individual users. This model consists of short sequences of system calls made by the processes. Attacks that exploit flaws in the application code are very likely to take unusual execution paths. First, a set of reference audit data is collected which represents the appropriate behavior of services; then the knowledge base is populated with all the known good sequences of system calls. These patterns are then used for continuous monitoring of system calls, to check whether a generated sequence is listed in the knowledge base; if not, an alarm is generated. This technique has a potentially very low false alarm rate, provided that the knowledge base is fairly complete. Its drawback is the inability to detect errors in the configuration of network services: whenever an attacker uses legitimate actions on the system to gain unauthorized access, no alarm is generated.
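The short-sequence idea can be sketched as follows: a minimal illustration using fixed-length windows (n-grams) over a system-call trace. The call names and window length are assumptions for the example, not the implementation of any actual immunology-based system.

```python
def ngrams(seq, n=3):
    """All length-n windows of a system-call trace."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

# Build the knowledge base from traces of normal service behavior.
normal_traces = [
    ["open", "read", "mmap", "mmap", "close"],
    ["open", "read", "close"],
]
knowledge_base = set()
for trace in normal_traces:
    knowledge_base |= ngrams(trace)

def mismatches(trace, n=3):
    """Count windows of a monitored trace absent from the knowledge base."""
    return sum(1 for g in ngrams(trace, n) if g not in knowledge_base)

print(mismatches(["open", "read", "close"]))              # 0 -> normal
print(mismatches(["open", "execve", "socket", "close"]))  # 2 -> alarm
```

An unusual execution path produces windows never seen in training, so the mismatch count rises and an alarm can be raised above some tolerance.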
5. Machine learning This is an artificial intelligence technique that stores the user-input stream of commands in vectorial form and uses it as a reference profile of normal user behavior. Profiles are then grouped in a library of user commands having certain common characteristics.
6. Data mining Data mining generally refers to a set of techniques for extracting previously unknown but potentially useful information from large stores of data. Data mining methods excel at processing large system logs of audit data; however, they are less useful for stream analysis of network traffic. One of the fundamental data mining techniques used in intrusion detection is associated with decision trees. Decision tree models allow one to detect anomalies in large databases. Another technique refers to segmentation, allowing extraction of patterns of unknown attacks. This is done by matching patterns extracted from a simple audit set against those of warehoused unknown attacks. A typical data mining technique is associated with finding association rules. With data mining it is easy to correlate data related to alarms with mined audit data, thereby considerably reducing the rate of false alarms. Examples include ADAM (Anomaly Data Analysis and Mining), IDDM and MINDS.
Misuse detection methods
1. Expert system Expert systems work with a previously defined set of rules
describing an attack. All security related events incorporated in an audit
trail are translated in terms of if-then-else rules. Examples are IDES, Wis-
dom & Sense and ComputerWatch.
2. Signature analysis Signature analysis detects attacks by capturing features of an attack in the audit trail. Attack signatures can thus be found in logs, as the sequence of audit events that a given attack generates, or in input data streams, as patterns of searchable data captured in the audit trail. This method uses abstract equivalents of audit trail data, and detection is accomplished using common text-string matching mechanisms. Examples are Real Secure, Haystack, NetRanger, and EMERALD.
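A minimal illustration of the text-string matching mechanism mentioned above; the signature names and patterns here are invented for the example and are not taken from any real rule set.

```python
# Toy signature matcher over audit-log or request lines.
SIGNATURES = {
    "php_passthru": "passthru(",
    "dir_traversal": "../../",
    "cgi_phf": "GET /cgi-bin/phf",
}

def match_signatures(log_line):
    """Return the names of all signatures whose pattern occurs in the line."""
    return [name for name, pattern in SIGNATURES.items()
            if pattern in log_line]

print(match_signatures("GET /cgi-bin/phf?Qalias=x HTTP/1.0"))  # ['cgi_phf']
print(match_signatures("GET /index.html HTTP/1.0"))            # []
```

Production systems use the same principle but with far larger rule sets and efficient multi-pattern matching rather than a linear scan.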
3. State-transition analysis In state-transition analysis, an attack is represented on a state-transition diagram as a set of transitions that an intruder must complete to compromise a system. Examples are USTAT and NetSTAT.
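The idea can be sketched as a small state machine; the attack steps below are hypothetical and are not signatures from USTAT or NetSTAT.

```python
# Toy state-transition signature: the intruder must complete these
# transitions, in order, to reach the compromised state.
TRANSITIONS = {
    ("start", "create_file"): "staged",
    ("staged", "chmod_setuid"): "armed",
    ("armed", "execute"): "compromised",
}

def scan(events):
    """Walk the event stream through the diagram; alarm on compromise."""
    state = "start"
    for ev in events:
        state = TRANSITIONS.get((state, ev), state)  # ignore unrelated events
        if state == "compromised":
            return True
    return False

print(scan(["login", "create_file", "chmod_setuid", "execute"]))  # True
print(scan(["login", "create_file", "execute"]))                  # False
```

Because only completed transition sequences trigger an alarm, interleaved benign activity does not cause false positives for this signature.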
4. Colored Petri Nets The Colored Petri Nets approach is often used to gener-
alize attacks from expert knowledge bases and to represent attacks graphi-
cally. With this technique, it is easy for system administrators to add new
signatures to the system. However, matching a complex signature to the audit trail data may be time-consuming, and hence the approach is not used in commercial systems. Purdue University's IDIOT system uses Colored Petri Nets.
5. Data Mining Data Mining is the non-trivial process of identifying valid
and novel attack patterns in the network trafc. Examples are the Mining
Audit Data for Automated Models for Intrusion detection (MADAM ID),
and JAM.
B.6 Current Intrusion Detection research
In any network environment, the firewall takes the role of protection while detection is handled by the IDS. The IDS can be used to assess the effectiveness of the firewall rule sets and policies. While the role of reaction has traditionally been assumed by the system or network manager, an IDS that can operate online and in real time can also be programmed to behave either reactively or proactively. A reactive IDS would respond to the detection of an intrusion by, say, stopping the suspect process, disconnecting the suspicious user or modifying a router access control list. A proactive IDS instead takes pre-emptive countermeasures, such as actively interrogating all extant user processes and stopping all processes which did not originate from bona fide users at approved sites. Thus the proactive IDS, which can also be called an Intrusion Prevention System, combines the functionalities of a firewall, which has blocking capabilities depending on where a packet came from, with those of an IDS with deep packet inspection.
B.6.1 Intrusion Prevention System
Intrusion Prevention Systems (IPS) actively search a computer or network of computers for security flaws and alert the administrator about security problems before those problems are exploited by an attacker. However, as new attack methods are discovered, they must be updated with information about the attacks. COPS (Computer Oracle and Password System) is an example of an intrusion prevention system. It is a collection of shell scripts which check for a variety of security flaws, including checking whether files have the correct permissions and scanning the system for any files with the setuid bit set.
Present-day attacks spread at tremendous speed. These fast-moving attacks can infiltrate a network before conventional tools such as anti-virus software have time to formulate a signature to prevent infection. IPSs, with their behavioral analysis and speed, operate fast enough to detect such attacks without performance degradation. Thus an IPS can be properly configured to prevent intrusions and also worm or virus attacks. The main problem identified with the Intrusion Prevention System is the critical need to minimize false positives, failing which a legitimate user may be disconnected or a network service shut down unnecessarily. Hence the greatest challenge for an IPS is to allow legitimate traffic while blocking attacks, and to do this without adversely affecting performance.
Types of IPS
Host-based Intrusion Prevention System (HIPS) OKENA's StormWatch uses a kernel-based approach and works on servers and workstations. It has four interceptors: the Network interceptor provides address and port blocking like a firewall; the File system and Configuration interceptors monitor and prevent changes to critical files or registry keys; and the Network and File system interceptors together provide worm prevention.
By correlating events from multiple systems at the management station, StormWatch not only blocks the threat but also pushes out a new policy to all agents and blocks future attacks. This reduces the number of false positives and false negatives.
Network-based Intrusion Prevention System (NIPS)
NIPS uses different detection methods, stateful signature detection, protocol anomaly detection and some proprietary methods, to block specific attacks. Stateful signature detection looks at the relevant portions of the traffic, where the attack can be perpetrated. It does this by tracking state and detects an attack based on the context specified by the user.
IPS adds to the defense-in-depth approach to security and is an evolution of IDS technology. Its proactive capabilities will help to keep our networks safer from more sophisticated attacks. Even though a NIPS will prevent attacks, if something slips through, a HIPS would prevent it. HIPS, being the last line of defense, provides operating system hardening with greater granularity and application-specific control.
Snort Inline is a mode of operation for Snort that provides it with intrusion prevention capabilities. It uses the Netfilter and iptables software to provide detection at the application layer to the iptables firewall, so that it can respond dynamically to real-time attacks that take advantage of vulnerabilities at the application level. It is the Netfilter/iptables software that implements the response mechanism, while Snort Inline provides the policies based on which iptables makes the decision to allow or deny packets. After an incoming packet to the network is handed over by iptables, Snort performs rule matching against the packet. Thus Snort Inline provides a more proactive and dynamic capability against today's attacks. However, the rule matching is against a statically created rule base, and thus needs a prior estimate of the kinds of attacks that will be seen, and the action is taken at the site of detection.
McAfee Internet Security Suite (ISS) has been developed for the Windows operating system platform and integrates many security technologies to protect desktop computers from malicious code, spam and unwanted or unauthorized access. Thus it functions both as an antivirus and as a firewall. The antivirus subsystem allows for the detection of viruses, worms, and other types of malicious code by using a signature-based approach along with a heuristic engine for unknown attacks. The firewall component scans multiple points of data entry. McAfee IntruShield IPS is a network prevention product for encrypted attacks, botnets, and VoIP vulnerability-based attacks. It delivers unique forensic features to analyze key characteristics of known and zero-day threats and intrusions.
B.7 Intrusion detection using multi-sensor fusion
The motivation for applying sensor fusion to enhance the performance of intrusion detection systems is that a better analysis of the existing data gathered by various individual IDSs can detect many attacks that currently go undetected. This represents the broadest solution of advanced intrusion detection technology, providing ubiquitous coverage through individual IDSs observing the same network traffic. The essential components are:
Intrusion detection systems that perform real-time monitoring of network packets
A fusion unit that aggregates the decisions generated by the IDSs
When the individual IDS performance is suboptimal, distributed decision making systems and the subsequent fusion of the decisions create a variety of new
circumstances that may exacerbate or ameliorate the problems. The IDS fusion
offers the following advantages over a single IDS:
Analytically proved [220] higher system detection rates and lower system false alarm rates than those of a single IDS or a weighted average.
Error probabilities of the fusion system are significantly reduced and approach zero [220].
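A simple way to see why fusion can reduce error probabilities is majority voting over independent detectors. The sketch below computes the exact error of a majority vote and is only an illustration under an independence assumption; the fusion rules analyzed in [220] and in this thesis are more general.

```python
from itertools import product

def majority_error(p_err, n):
    """Probability that a majority of n independent detectors,
    each wrong with probability p_err, is itself wrong."""
    total = 0.0
    for outcomes in product([0, 1], repeat=n):   # 1 = detector in error
        k = sum(outcomes)
        if k > n / 2:                            # majority of votes wrong
            total += (p_err ** k) * ((1 - p_err) ** (n - k))
    return total

print(round(majority_error(0.2, 1), 4))  # 0.2    (single detector)
print(round(majority_error(0.2, 3), 4))  # 0.104  (three fused detectors)
print(round(majority_error(0.2, 5), 4))  # 0.0579 (five fused detectors)
```

With each detector wrong 20% of the time, fusing five of them already cuts the error to under 6%, illustrating the claimed reduction; correlated detectors, of course, weaken this gain.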
As Axelsson highlights in [221], "In reality there are many different types of intrusions, and different detectors are needed to detect them." The same argument is made by Lee et al. [222], who additionally mention that "combining evidence from multiple base classifiers ... is likely to improve the effectiveness in detecting intrusions." As such, analyzing the data from multiple
sensors should increase the accuracy of the IDS [222]. Kumar [15] observes that "correlation of information from different sources has allowed additional information to be inferred that may be difficult to obtain directly." Such correlation is also useful in assessing the severity of other threats, be it severe because an attacker is making a concerted effort to break in to a particular host, or severe because the source of the activity is a worm with the potential to infect a large number of hosts in a short amount of time.
Multisensor correlation has long been a theme in Intrusion Detection, especially
as most of the early IDS work took place in the wake of the Morris Worm, as
well as by the need to centrally manage the alerts from a network of host based
IDSs. Recently, a great deal of work has been done to standardize the protocols
that IDS components use to communicate with each other. The first solid proto-
col to do this is the Common Intrusion Detection Format (CIDF) [223]. CIDF
spurred additional work in protocols for multisensor correlation, for example,
Ning et al. [224] extended CIDF with a query mechanism to allow IDSs to
query their peers to obtain more information on currently observed suspicious
activity.
B.7.1 Existing fusion IDSs
Some of the IDSs that make use of multisensor correlation and various fusion
techniques are covered in this section.
Research IDSs
The rst couple of IDSs that performed data fusion and cross sensor correla-
tion were the Information Security Officer's Assistant (ISOA) [225] and the
Distributed Intrusion Detection System (DIDS)[226]. ISOA used the audit in-
formation from numerous hosts whereas DIDS used the audit information from
numerous host and network-based IDSs. Both made use of a rule-based expert
system to perform the centralized analysis. The primary difference between the
two was that ISOA was more focused on anomaly detection and DIDS on mis-
use detection. Additional features of note were that ISOA provided a suite of
statistical analysis tools that could be employed either by the expert system or
a human analyst, and the DIDS expert system featured a limited learning capa-
bility.
EMERALD was an extension to NIDES [7, 227] with a hierarchical analysis system. The various levels (host, network, enterprise, etc.) would each perform some level of analysis and pass any interesting results up the chain for correlation [228, 229, 230]. It provided a feedback system such that the higher levels could request more information for a given activity. Of particular interest is the analysis done at the top level, which monitored the system for network-wide threats such as "Internet worm-like attacks, attacks repeated against common network services across domains, or coordinated attacks from multiple domains against a single domain" [228]. The EMERALD architects employed numerous approaches such as statistical analysis, an expert system and modular analysis engines, as they believed "no one paradigm can cover all types of threats."
Commercial IDSs
RealSecure SiteProtector does advanced data correlation and analysis by interoperating with other RealSecure products [231]. Symantec ManHunt [232] and nSecure nPatrol [233] integrate the means to collect alarms. Cisco IDS [234] and Network Flight Recorder (NFR) [235] provide a means to do centralized sensor configuration and alarm collection. The problem with all of these systems is that they are designed more for prioritizing what conventional intrusion (misuse) detection systems already detect, and not for finding new threats. Other products, such as Computer Associates' eTrust Intrusion Detection Log View [236], and NetSecure Log [237], are more focused on capturing log information to a database and doing basic analysis on it. Such an approach seems to be oriented more towards ensuring the integrity of the audit trail (itself an important activity in an enterprise environment) than towards data correlation and analysis.
B.7.2 Current status of applying sensor fusion in IDS
Despite the proved utility of multiple classifier systems, no general answer to the original question, about the possibility of exploiting the strengths while avoiding the weaknesses of different IDS designs, has yet emerged. Many fundamental issues are a matter of ongoing research in different research communities. The results achieved during the past few years are also spread over different research communities, and this makes it difficult to exchange such results and promote their cross-fertilization.
B.8 Conclusion
The Intrusion Detection System is currently gaining considerable interest from both the research community and commercial companies. It has become an indispensable and integral component of any comprehensive enterprise security program, the reason being that the intrusion detection system has the potential to alleviate many of the problems facing current network security. A number of the techniques and solutions found in current systems and in the literature are outlined in this work. As evidenced by recent events, however, network security has some way to go before any network can be considered safe, and hence the near-term future of intrusion detection is very promising. It is clear, though, that under the pressures of a highly competitive global research environment, the field of Intrusion Detection Systems will re-mould rapidly and overcome many current limitations and hurdles.
Appendix C
Modeling of the Internet Attacks and the
Countermeasure for Detection
Success is the ability to go from one failure to another with no loss of enthusiasm.
Winston Churchill
C.1 Introduction
This appendix introduces dynamic models for the attack-detector interactions,
starting with the simple Nicholson-Bailey precursor, in which the detector
randomly searches the network traffic for attacks, independent of the attack
distribution. The dependence between the detectors and their heterogeneity is
introduced as a subsequent step. The heterogeneity is incorporated by the use
of the negative binomial distribution as introduced in chapter 2, which also
accounts for the non-randomness in the attacks and the detectors. The
attack-detector models that incorporate the attack carrying capacity, detector
improvement with the attacks detected, detector correlation, and the
non-randomness of attacks and detectors are derived in this appendix. The
proposed modeling idea is new, and the related works other than Shimeall and
Williams [54] and Browne et al. [55] are discussed here.
Ravishankar Iyer et al. [238] combine an analysis of data on security
vulnerabilities and a focused source-code examination to develop a Finite State
Machine (FSM) model to depict and reason about security vulnerabilities, and to
extract characteristics shared by a large class of commonly seen vulnerabilities.
This information is used to devise a generic, randomization-based technique
for protecting against a wide range of security attacks. Jonsson and Olovsson
[239, 240] try to quantitatively model the security intrusion process based on
attacker behavior. This model presents the phases in performing attacks on a
system in the presence of a detector system. They discuss the three phases in the
security intrusion process, namely the learning phase, the standard attack phase,
and the innovative attack phase.
Ed Skoudis [241], in his book Counter Hack: A Step-by-Step Guide to Computer
Attacks and Effective Defenses, presents a model of an attack using five
phases: reconnaissance, scanning, gaining access, maintaining access, and
covering tracks. McDermott [242] mentions that most of the quantitative models
of security or survivability have been defined on a range of probable intruder
behavior. This measures survivability as a statistic such as mean time to
breach. This kind of purely stochastic quantification is not suitable for
high-consequence systems. Detailed aspects of the intruder's attack potential can
have significant impact on the expected survivability of an approach.
This section also surveys the different research efforts related to the field of
intrusion correlation. IBM has developed a prototype called the aggregation
and correlation component (ACC) [243]. The purpose of the aggregation and
correlation algorithm is to form groups of related alerts using a small number
of relationships. M2D2 uses a formal data model to include external information
in the alert correlation process [244]. Four different information types are
handled: information about the monitored system, information about known
vulnerabilities, information about security tools (vulnerability scanners and
intrusion detection systems), and information generated by the security tools,
e.g. scans and alerts. A relational database is used to store information from
IDSs and scanners, together with product information from the ICAT vulnerability
database [245].
SRI has introduced a probabilistic approach to alert correlation [246, 247, 248].
To be able to handle heterogeneous alerts, a generic alert template is used. The
correlation is then performed hierarchically. Threads are used to correlate alerts
relating to the same incident on the same sensor. Security incidents are then
composed of the same incidents correlated over several sensors. It is then
possible to create correlated attack reports by correlating over several alert
classes. A similar approach to the one developed at SRI has been chosen by MIT
Lincoln Laboratory [249]. To perform correlation, alerts are partitioned into
five attack categories called discovery, scan, escalation, denial-of-service, and
stealth. New alerts are possibly added to existing intrusion scenarios after the
evaluation of the probability that one attack category is followed by another,
the time difference between alerts, and the proximity of source IP addresses.
One of the sources that significantly supported numerous concepts developed
in this work is VulDa, a database of attacks and vulnerabilities collected
at the IBM site. VulDa provides the necessary and profound knowledge of
practical security issues. This database is used for categorizing a large number
of attacks, which yielded results that were highly valuable to the IDS analysis
approach developed in this work. It categorizes more than 350 attacks and
analyzes the IDS scopes. It collects information from security-relevant material
like Bugtraq, CERT/CC, SANS and NIAP.
The rest of the appendix is organized as follows. Section C.2 models the
attack-detector relationship, taking into account the various possibilities of
interaction between the two groups of population. Finally, section C.3
summarizes the developed model.
C.2 Nicholson-Bailey model
In addition to the assumptions introduced in section 2.5, the following
assumption is made in the initial phase of working with the model that defines
the attack-detector population dynamics:

Detectors detect attacks randomly and cause the attack to be ineffective or
unsuccessful, at a rate proportional to the detector density. (As an initial
assumption, it is reasonable to consider the case with no prior knowledge
of the probable attacks that happen on the Internet. Hence, it is reasonable
to consider that the detectors search randomly for attacks; the more the
detectors, the greater the chance of attacks becoming ineffective or
unsuccessful.)
The logistic model simulates the effect of limiting resources on the growth
of the two interacting populations. Such a model can be used to incorporate the
effect of attacks on a detector, or the detector effect on an attack. The trivial
growth rate of both the attack and the detector can be given by the functions
defined by Nicholson and Bailey. Preliminary investigations have been carried out
using the Nicholson-Bailey model [?, ?] to explain the attack-detector growth
rate. This model takes care of the first four of the assumptions given in section
2.5.
Let A_t and D_t denote the number of attacks and detectors at any time t. Let d
denote the detection efficiency constant of the IDS on the attack, and let a
denote the attack increase rate ignoring detection. The number of encounters
between the detector system and the attack is given by:

N_e = d A_t D_t (since d = N_e / A_t when D_t = 1).

To distribute encounters among attacks, the assumption is made that an IDS
searches at random and thus would re-encounter attacks previously detected.
Taking into consideration the stochastic nature of the network traffic, it is
easy to assume that the detector does a random search. The Poisson model can
hence be used to distribute encounters among attacks as:

P(X) = exp(-G_x) G_x^X / X!,

where P(X) is the probability of X occurrences and G_x is the average
occurrence of X. Solving for zero detection, the probability of not being
detected (the proportion of attacks undetected) is given by P(0) = exp(-G_x).

Setting G_x = N_e / A_t gives P(0) = exp(-N_e / A_t).
Figure C.1: Attack-Detector relationship using the Nicholson-Bailey model
(A(t) and D(t) plotted against time)
Since N_e / A_t defines the detection efficiency constant (for a single
detector), multiplying it by the number of detectors D_t gives d D_t, the effect
of the detector density D_t, or in other words the mean (the Poisson rate). The
probability of an attack escaping detection by D_t detectors is therefore
P(0) = exp(-d D_t), and the probability of being detected is
P(X > 0) = 1 - exp(-d D_t). Hence the attack and the detector growth can
be given by:

A_{t+1} = a A_t exp(-d D_t)    (C.1)

and

D_{t+1} = A_t [1 - exp(-d D_t)]    (C.2)
respectively. Figure C.1 shows the attack-detector relationship using the
Nicholson-Bailey model with typical values and initial conditions a = 0.25,
A(1) = 20000, d = 0.9, D(1) = 1, and t varying from 1 to 5. This basic
Nicholson-Bailey model showing the detector performance over the years was in
agreement with the figure of merit of IDSs over the years from 1995 to 2004 as
shown in figure 2.6. The decrease in the growth rate of attacks over the years,
as seen in Figure C.1, depends on the number of detectors initially deployed and
also on the efficiency of the deployed detectors. The actual attacks that happen
on the Internet are expected to be more numerous than what gets reported,
because some of the attacks may not be detected, and only a small portion of the
detected attacks gets reported.
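The iteration defined by equations C.1 and C.2 can be sketched in a few lines of Python. This is an illustrative re-implementation, not the thesis code; the function name is mine, while the parameter values are those quoted above:

```python
import math

def nicholson_bailey(a, d, A0, D0, steps):
    """Iterate A_{t+1} = a*A_t*exp(-d*D_t) and D_{t+1} = A_t*(1 - exp(-d*D_t))."""
    A, D = [A0], [D0]
    for _ in range(steps):
        escape = math.exp(-d * D[-1])       # proportion of attacks escaping detection
        A.append(a * A[-1] * escape)        # surviving attacks times increase rate a
        D.append(A[-2] * (1.0 - escape))    # detections made on the previous attack level
    return A, D

# Parameter values and initial conditions quoted in the text above.
A, D = nicholson_bailey(a=0.25, d=0.9, A0=20000, D0=1, steps=4)
```

Running this with the quoted values reproduces the qualitative behavior of Figure C.1: the detector population jumps up in the first step while the attack population falls away.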
With the available experimental data to test the model, and also with the
practical data in the work of Shimeall et al. [54], the Nicholson-Bailey model is
seen to be in good agreement. It is seen to provide a quantitative possibility of
oscillations in the attack-detector interactions. The attack-detector interactions
are often characterized by very strong fluctuations from year to year, and then
the complete extinction of either the attack or the detector. Moreover, during
certain time spans, detector levels become so low that the model predicts the
eventual extinction of the detector, when the attacks increase exponentially, or
vice versa. In this appendix, an attempt has been made to model the dynamic
relationship existing between the detectors and the attacks, and this knowledge
can be used to enrich the design and development of IDSs. For each combination
of a and d there is an unstable attack-detector equilibrium, with the slightest
disturbance leading to expanding population oscillations. With the
Nicholson-Bailey equations C.1 and C.2, it is shown that, depending on the
initial state, the system can evolve towards a simple steady state or a limit
cycle, in which the attack-detector populations oscillate periodically in time.
The attack-detector relationship may thus exhibit coupled oscillations. The aim
is to study the oscillatory behavior of an attack-detector model with intelligent
pursuit and evasion rules. Usually detectors respond to the attack distribution,
so a constant searching efficiency is difficult to accept: the searching
efficiency depends on the speed of the traffic, on the attack density on a priori
grounds, and also on the detector density.
C.2.1 Attack/Detection as they stand alone
This section investigates the fate of the attack and the detector in the absence
of the other, in order to assess whether modeling the attack-detector
relationship using the Nicholson-Bailey model would be reasonable:

A_{t+1} = a A_t exp(-d D_t)

and

D_{t+1} = A_t [1 - exp(-d D_t)]
In the absence of detectors, A_{t+1} = a A_t and D_{t+1} = 0; attacks increase
exponentially, and at any time (t + n), A_{t+n} = a^n A_t. With this simple
model, it is clear that when the detector density D_t = 0, the attack density
will follow the logistic function. It is reasonable to set an attack carrying
capacity k for the attacks beyond a certain limit if the detectors are totally
absent.
In the absence of any attack, it is reasonable to assume that the presence of
detectors is of no use. So the existing detection systems will also die out at
the next instant of time if there are no attacks; i.e., if A_t = 0, then
D_{t+1} = 0 and A_{t+1} = 0. To explain in detail, if the beginning years of the
Internet are taken as years of no attacks, the succeeding year will not have any
detectors. This holds until attacks are found and, with a latency, detectors
evolve. The detector density is a function of the attack density weighted by the
efficiency and the density of the detector. As soon as one of these detector
parameters becomes very large, the detector density will match, but never
overcome, the attack density.
The basic Nicholson-Bailey model can be extended by incorporating additional
features such as density-dependence in the attacks, interference among
detectors, and refuges. The classification of the detectors spans a wide
range of complexity. The general statistics show that the more successful the
detectors are in detecting the attacks, the greater the chances of highly
sophisticated detectors emerging, possibly learning from the detected attacks.
Thus the Nicholson-Bailey model for attack-detector modeling is based on data
that reflect the knowledge that one has about the system and/or the potential
attacks, but it does not express all the different possibilities that are
encountered in the attack-detector interaction. The following sections look into
the different possibilities in order to generalize the attack-detector
interactions.
C.2.2 Attack carrying capacity
The Nicholson-Bailey model suffers from the important defect of having attacks
with a constant rate of increase and thus a potentially unlimited number of
attacks. It is necessary to take into account the fact that the attack density
does not grow beyond some carrying capacity. The first definitive theoretical
treatment of this relationship is that, while detectors and attacks grow
exponentially, the resources on which they depend may not follow such a fast
rate of increase. Thus the demand for resources must eventually exceed the
supply, and population growth, being dependent on the resource supply, must then
cease. This is mathematically modeled as a logistic equation with the attack
remaining at a saturation value equal to the attack carrying capacity.
Practically, there are technical bottlenecks for the attacks to increase beyond
this value; other reasons for the sated state of the attacks can be the
ineffectiveness of detectors, or that the existing attacks serve all the
malicious intents. Hence the attacks should saturate at the attack carrying
capacity, given by A_{t+1} = A_t when A_t = k, where k is the attack equilibrium
density or the attack carrying capacity. This condition is substituted in the
attack equation

A_{t+1} = a A_t exp(-d D_t)
Attacks saturate as A_{t+1} = A_t = k. This simplifies the attack equation to
exp(-d D_t) = 1/a, or d D_t = ln(a). Hence, to incorporate the attack carrying
capacity, the attack-detector equations can be modified as:

A_{t+1} = a A_t exp(-ln(a) A_t / k - d D_t)

and

D_{t+1} = A_t [1 - exp(-d D_t)]

respectively, where ln(a) is made density-dependent through the expression
ln(a) A_t / k, such that as A_t approaches k, the growth rate of the attack
approaches zero. The introduction of the carrying capacity causes the system to
be stable, thereby making the Nicholson-Bailey model more realistic. The impact
of the attack getting sated on the detector is that it will vary depending on
the probability of detecting attacks. The detectors pick up to a stage of
maximum detection during this time when the attacks are sated. If given enough
time lag, detectors also will stabilize. Since k depends on the technology
bottleneck, it is expected to increase every year. Expecting around 500
varieties of attacks in year 2000, it can be several fold higher now.

The stability of this density-dependent model is determined by the attack
increase rate a and also by the detector searching efficiency d, as shown in
Figure C.2 with k = 500. If the detector is extremely efficient, with a large
value of d,
Figure C.2: Attack-Detector relationship with attack carrying capacity
(A(t) and D(t) plotted against time)
then it is expected to hold the attacks below their carrying capacity. The
dynamics are then determined most strongly by the unstable attack-detector
interactions. If the detector is inefficient, with a small value of d, attack
dynamics are determined largely by the density-dependent feedback. Thus the
density-dependent attack growth rate can be applied to populations facing
limited resources, a situation that is unlikely to occur in successful cases of
stable systems, where equilibria occur at very low levels at which resources are
not limited. Thus the detections and the increase rate can be considered as a
means of stabilizing the interactions.
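A single step of the carrying-capacity model can be coded directly as a sanity check on the modified equations. This is an illustrative sketch (the function name and the chosen test values are mine, not the thesis's); it verifies the saturation property derived above, that with no detection and A_t = k the attack density stays exactly at k:

```python
import math

def density_dependent_step(A, D, a, d, k):
    """One step of the carrying-capacity model:
    A' = a*A*exp(-ln(a)*A/k - d*D),  D' = A*(1 - exp(-d*D))."""
    A_next = a * A * math.exp(-math.log(a) * A / k - d * D)
    D_next = A * (1.0 - math.exp(-d * D))
    return A_next, D_next

# With no detectors (D = 0) and the attack density at the carrying capacity
# (A = k), the attack population should remain at k: a*k*exp(-ln(a)) = k.
A_next, D_next = density_dependent_step(A=500, D=0, a=2.0, d=0.9, k=500)
```

Below the carrying capacity and with no detection the attack population still grows, which is the density-dependent feedback the text describes.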
C.2.3 Stability in attack-detector model
With D_t detectors searching for A_t attacks, the ones that are not detected or
survive patch fixing, along with the new attacks generated in the interval
between t and t + 1, are given by A_{t+1}. Similarly, there will be detectors
that learn from the attacks detected, and also detectors that remain effective
even with new vulnerabilities or service closure at any point of time; hence
D_{t+1} denotes the number of detectors at time t + 1. This is a cyclic pickup
of attacks and detectors and hence is oscillatory in nature, without any
over-damping as such.

It is shown in section 2.5.2 that simple detector models, when aggregated in a
network of high attack density, contribute to the stability of an
attack-detector interaction. For the detection of external intrusion activities,
if there are multiple paths to the Internet, an IDS needs to be present at every
entry point, whereas for the detection of internal intrusion activities, an IDS
is required in every network segment. This specifies the broadest solution of
advanced intrusion detection technology: providing ubiquitous coverage through
individual IDSs spread everywhere on the network. The success of a security
system depends on the detector or the security measures reducing the attack
population and maintaining it at a new lower level in a stable interaction.
These equilibrium levels depend on the following two factors:

1. the effective rate of increase of the attack unaffected by detection;

2. the average proportion of the attacks detected, which in turn depends on
the number of detectors and all factors affecting the searching efficiency
(N_e / (A_t D_t)).
The IDSs that are likely to stabilize the attack population at low levels have
the following characteristics:

- high intrinsic searching efficiency;

- small attack handling time;

- detector interference to a certain level;

- a high level of detector aggregation using the techniques of sensor fusion.
C.2.4 Inclusion of stealthy attacks
If both the attack and the detector were randomly and independently distributed
in Nicholson-Bailey fashion, then the proportion of the attacks escaping
detection at time t is given by exp(-d D_t). If a proportion b of the attacks
that are at risk of detection are allowed to hide themselves, for example
through the fragmentation of packets or even tunneling, then the proportion of
the attacks escaping detection is raised to:

exp(-d D_t) + b (1 - exp(-d D_t)) for 0 <= b <= 1
The equations for the number of attacks and detectors at any instant of time
t are:

A_{t+1} = a A_t (b + (1 - b) exp(-d D_t))

and

D_{t+1} = A_t - A_{t+1} / a

The equilibrium solutions of the above equations are:

d D* = ln( a (1 - b) / (1 - a b) )

and

A* = (a / (a - 1)) D*.

It is necessary for the value of b to lie between 0 and 1 for a solution to
exist. These solutions are stable against small disturbances. When b tends
towards zero, the system resembles the Nicholson-Bailey model.
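The equilibrium expression for d D* can be checked numerically: at the equilibrium, the per-step attack multiplier a(b + (1 - b)exp(-d D*)) must equal 1, and as b tends to zero the expression collapses to the plain Nicholson-Bailey equilibrium d D* = ln(a). A small sketch, with illustrative values of a and b chosen by me:

```python
import math

def stealth_equilibrium_dD(a, b):
    """d*D* = ln(a(1-b)/(1-a*b)); needs a > 1 and b < 1/a for a positive solution."""
    return math.log(a * (1 - b) / (1 - a * b))

a, b = 2.0, 0.25                      # hypothetical illustrative values
dD_star = stealth_equilibrium_dD(a, b)

# Per-step attack multiplier from A_{t+1} = a*A_t*(b + (1-b)*exp(-d*D_t)):
growth_at_equilibrium = a * (b + (1 - b) * math.exp(-dD_star))
```

The multiplier evaluates to exactly 1, confirming that the attack population is stationary at the derived value of d D*.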
C.2.5 Modeling of non-random attacks and detection
The random search is a mathematically convenient assumption, but not a
realistic one. When a network intrusion happens, the sequence of attacks does
not take place in a totally random order. Intruders come with a set of tools,
trying to achieve a specific goal. The selection of the cracking tools and the
order of their application depend heavily on the situation as well as on the
responses from the targeted system. Typically there are multiple ways to invade
a system. Nevertheless, it usually requires several actions/tools to be applied
in a particular logical order to launch a sequence of effective attacks that
achieves a particular goal. It is this logical partial order that reveals the
short- and long-term goals of the invasion. Random search is an exception rather
than a rule. Real attacks are likely to be distributed in a patchwork of high
and low densities, and the detectors can be expected to respond to the attacks
by orienting towards high-density patches. It is natural that the detector
searches certain traffic features for signs of attack. This provides a strong
selective advantage for detectors that perform a more focused search. The
modeling of the non-random behavior of the attack-detector interactions is
provided in detail in chapter 2.
C.3 Summary
The modeling shows the restricted growth rates of both the attacks and their
detection. With the existing IDSs it is not possible to attain a growth rate
such that the effect of attacks is not felt in the information systems. Hence,
it is required to look at advanced techniques for performance enhancement of the
available IDSs. The level of severity of an alert is understood with this
modeling. This knowledge could then potentially be used by a security analyst to
understand and respond more effectively to future intrusions. As seen from the
model, the existing as well as emerging attacks are not expected to totally
evade the detectors monitoring the network. The modeling is realistic in an
environment of a network with multiple IDSs for protection, looking at the
system as a whole instead of at the individual responses to an attack. For more
proactive defense, it is essential to understand the network defensive and
offensive strategies. With the attack-detector scenario better understood, the
future evolution of attacks can be estimated in a certain way, thereby aiding
better attack detection and in turn reducing false negatives. This knowledge
helps the security community to become proactive rather than reactive with
respect to incident response.
Appendix D
Methodology for Evaluation of Intrusion
Detection Systems
Make everything as simple as possible, but not simpler.
Albert Einstein
D.1 Introduction
The poor understanding of the performance of Intrusion Detection Systems
available in the literature may be caused in part by the shortage of an
effective, unbiased evaluation and testing methodology that is both
scientifically rigorous and technically feasible. The choice of intrusion
detection systems for a particular environment is a general problem, more
concisely stated as the intrusion detection evaluation problem, and its solution
usually depends on several factors. The most basic of these factors are the
false alarm rate and the detection rate, and their tradeoff can be intuitively
analyzed with the help of the Receiver Operating Characteristic (ROC) curve
[14], [57], [12], [58], [59]. However, as pointed out by earlier investigators
[21], [60], [61], the information provided by the detection rate and the false
alarm rate alone might not be enough to provide a good evaluation of the
performance of an IDS. Hence, the evaluation metrics need to consider the
environment the IDS is going to operate in, such as the maintenance costs and
the hostility of the operating environment (the likelihood of an attack). In an
effort to provide such an evaluation method, several performance metrics, such
as the Bayesian detection rate [21], expected cost [60], sensitivity [62] and
intrusion detection capability [63], have been proposed in the literature. These
metrics usually assume the knowledge of some uncertain parameters, like the
likelihood of an attack or the costs of false alarms and missed detections. Yet
despite the fact that each of these performance metrics makes its own
contribution to the analysis of intrusion detection systems, they are rarely
applied in the literature when proposing a new IDS.
This appendix introduces a framework for evaluating IDSs, along with some new
metrics for IDS evaluation. Classification accuracy in intrusion detection
systems deals with such fundamental problems as how to compare two or more
IDSs, how to evaluate the performance of an IDS, and how to determine the best
configuration of an IDS. In an effort to analyze and solve these related
problems, evaluation metrics such as the Area Under the ROC Curve, precision,
recall, and F-score have been introduced. Additionally, we introduce the P-test,
which is a more intuitive way of comparing two IDSs and also more relevant to
the intrusion detection evaluation problem. We also introduce a formal framework
for reasoning about the performance of an IDS and the proposed metrics against
adaptive adversaries. We provide simulations and experimental results with these
metrics using real-world network traffic and the DARPA 1999 data set, in order
to illustrate the benefits of the proposed algorithms in chapters five to nine.
D.2 Metrics for performance evaluation
This section introduces the metrics for IDS performance evaluation, with their
merits and demerits, and analyzes them in a unified framework.
D.2.1 Detection rate and false alarm rate
Let TP be the number of attacks that are correctly detected, FN the number of
attacks that are not detected, TN the number of normal traffic
packets/connections that are correctly classified, and FP the number of normal
traffic packets/connections that are incorrectly detected as attacks. In the
case of an IDS, there are both security requirements and usability requirements.
The security requirement is determined by the TP rate, and the usability
requirement is decided by the number of FPs. There is a natural trade-off
between these two metrics. The concept of finding the optimal trade-off of the
metrics used to evaluate an IDS is an instance of the more general problem of
multi-criteria optimization. In this setting, we want to maximize (or minimize)
two quantities that are related by a trade-off, which can be done via two
approaches. The first approach is to directly compare the two metrics via a
trade-off curve. The second approach is to find a suitable way of combining
these two metrics in a single objective function to optimize. We therefore
classify the above-defined metrics into two general approaches that will be
explored in the rest of this section: the trade-off approach and the
maximization of a figure-of-merit value.
D.2.2 Receiver Operating Characteristic (ROC) Curve
ROC curves are used to evaluate classifier performance over a range of
trade-offs between the TP rate and the FP rate. A ROC curve is a plot with the
false alarm rate on the x-axis and the detection rate on the y-axis, i.e.,
ROC = <TP rate, FP rate>. One of the benefits of ROC graphs is the ability to
separate error cost considerations from the IDS performance. Additionally, ROC
curves remain invariant under changing class distributions. However, the
disadvantage of the ROC curve is that even small changes in the false alarm rate
may cause drastic differences in the detection rate when normal traffic abounds
in comparison to the attack traffic.
D.2.3 The Area Under ROC Curve (AUC)
The AUC is a convenient way of comparing IDSs and serves as the summary
performance metric for the ROC curve. A random IDS has an area of 0.5, whereas
an ideal one has an area of one.
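For a finite set of ROC operating points, the AUC can be approximated with the trapezoidal rule. The following sketch is my own illustrative helper, not a routine from the thesis; it reproduces the two reference values just mentioned:

```python
def auc(points):
    """Area under a ROC curve given (FP_rate, TP_rate) points, trapezoidal rule."""
    pts = sorted(points)                  # order by increasing false alarm rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

random_ids = auc([(0.0, 0.0), (1.0, 1.0)])             # the chance diagonal
ideal_ids = auc([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])  # a perfect detector
```

The chance diagonal gives 0.5 and the ideal step curve gives 1.0, matching the reference values stated above.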
D.2.4 Accuracy
The commonly used IDS evaluation metric on a test data set is the overall
accuracy:

Overall Accuracy = (TP + TN) / (TP + FP + TN + FN)

Overall accuracy is not a good metric for comparison in the case of network
traffic data, since the true negatives abound.
D.2.5 Precision
Precision (P) is a measure of what fraction of the test data detected as attack
is actually from the attack class:

P = TP / (TP + FP)
D.2.6 Recall
Recall (R) is a measure of what fraction of the attack class is correctly
detected:

R = TP / (TP + FN)
There is a trade-off between the two metrics precision and recall. As the number
of detections increases with the lowering of the threshold, the recall will
increase, while the precision is expected to decrease. A plot showing the
recall-precision characterization of a particular IDS is used to analyze the
relative and absolute performance of an IDS over a range of operating
conditions.
D.2.7 F-score
The F-score expresses the balance between precision and recall and is a measure
of the accuracy of a test. The F-score is the harmonic mean of recall and
precision, given as:

F-score = 2PR / (P + R)

The standard measures, namely precision, recall, and F-score, are grounded in a
probabilistic framework and hence allow one to take into account the intrinsic
variability of performance estimation. Comparing IDSs with the F-score has the
limitation that tests of significance cannot be directly applied to it in order
to determine the confidence level of the comparison. The primary goal was to
achieve improvement in both precision and recall, and hence the P-test [110]
was used for IDS comparison.
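The four count-based metrics above are one-line functions. The confusion counts in the sketch below are hypothetical, chosen only to illustrate the point made earlier: when true negatives abound, the overall accuracy looks excellent while the F-score exposes the weaker detection quality:

```python
def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Hypothetical confusion counts for a trace dominated by normal packets.
tp, fp, tn, fn = 80, 40, 9900, 20
p, r = precision(tp, fp), recall(tp, fn)
```

With these counts the accuracy exceeds 0.99 even though a third of the alarms are false, while the F-score stays well below 0.75.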
D.2.8 P-test
To compare two IDSs X and Y, let (R_X, P_X) and (R_Y, P_Y) be their values of
recall and precision with respect to attacks, respectively. Let IDS X and Y
predict N^Pos_X and N^Pos_Y positives respectively, and let N^Pos be the total
number of positives in the test sample. Then the P-test is applied as follows:

Z_R = (R_X - R_Y) / sqrt( 2 R (1 - R) / N^Pos )

Z_P = (P_X - P_Y) / sqrt( 2 P (1 - P) (1/N^Pos_X + 1/N^Pos_Y) )

where R = (R_X + R_Y) / 2 and
P = (N^Pos_X P_X + N^Pos_Y P_Y) / (N^Pos_X + N^Pos_Y).
If Z_R >= 1.96, then R_X can be regarded as being significantly better than R_Y
at the 95% confidence level.

If Z_R <= -1.96, then R_X can be regarded as being significantly poorer than
R_Y at the 95% confidence level.

If |Z_R| < 1.96, then R_X can be regarded as being comparable to R_Y.

Similar tests are applied to compare P_X and P_Y.
Now, in order to compare the two IDSs X and Y, IDS X is better than IDS Y if
any of the following criteria is satisfied:

- R_X is significantly better than R_Y and P_X is significantly better than P_Y;

- R_X is significantly better than R_Y and P_X is comparable to P_Y;

- R_X is comparable to R_Y and P_X is significantly better than P_Y.

If R_X is comparable to R_Y and P_X is comparable to P_Y, then X is comparable
to Y.

It may so happen that one metric is significantly better and the other metric
is significantly worse. In such cases of conflict, the non-probabilistic metric
F-score can be used instead of applying the significance test.
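The Z statistics and the three-way verdict can be wrapped up in a short routine. This is an illustrative sketch (function names and the sample numbers are mine); it uses the pooled R and P definitions given above:

```python
import math

def p_test(Rx, Ry, N_pos, Px, Py, Nx, Ny):
    """Z_R and Z_P statistics of the P-test with pooled R and P."""
    R = (Rx + Ry) / 2.0
    P = (Nx * Px + Ny * Py) / (Nx + Ny)
    Z_R = (Rx - Ry) / math.sqrt(2 * R * (1 - R) / N_pos)
    Z_P = (Px - Py) / math.sqrt(2 * P * (1 - P) * (1.0 / Nx + 1.0 / Ny))
    return Z_R, Z_P

def verdict(z):
    """Three-way decision at the 95% confidence level."""
    if z >= 1.96:
        return "significantly better"
    if z <= -1.96:
        return "significantly poorer"
    return "comparable"

# Hypothetical numbers: equal recall, but X is noticeably more precise than Y.
Z_R, Z_P = p_test(Rx=0.8, Ry=0.8, N_pos=1000, Px=0.7, Py=0.6, Nx=500, Ny=500)
```

With these numbers the recalls are comparable (Z_R = 0) while the precision difference is significant at the 95% level, so by the criteria above X is better than Y.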
D.3 Test setup
The test setup for the experimental evaluation undertaken in this thesis work
consists of three Pentium machines with the Linux operating system. A
combination of shallow and deep sensors distributed across a single subnet and
observing the same domain is required for good protection. Intrusion detection
systems can extract information to detect attacks from different layers, like
the packet headers (shallow), the packet payload (deep), or both. To take
advantage of such a complementary collection, the following three IDSs are
chosen:

1. PHAD [67], which is based on attack detection by extracting the packet
header information;

2. ALAD [68], which is application payload-based;

3. Snort [69], which collects information from both the header and the payload
part of every packet, in a time-based as well as connection-based manner.
This choice of heterogeneous sensors in terms of their functionality is to exploit
the advantages of fusion IDS [94]. In addition, complementary IDSs provide
versatility and similar IDSs ensure reliability. An experimental Packet Header
Anomaly Detector (PHAD) [67] that monitors the 33 fields of the Ethernet,
TCP, UDP and ICMP protocols is chosen as one of the IDSs for the combination.
Observing the header fields makes it efficient to detect Probes and DoS
attacks. The second sensor chosen is Application Layer Anomaly Detector
(ALAD) [68] and it complements PHAD in detection by monitoring incoming
TCP connections to well-known server ports. ALAD has six attributes for
detection, namely the source IP address, destination IP address, destination port,
TCP flags, application keywords and the application argument. It detects the R2L
attack with a high detection rate since the R2L attack normally exploits the
application layer. Other than the diversity of the chosen IDSs, yet another reason
for the choice of the two anomaly detectors PHAD and ALAD was their acceptably
low false alarm rates. Snort is an open source network intrusion prevention and
detection system utilizing a rule-driven language, which combines the benets
of signature-, protocol- and anomaly-based inspection methods. Snort is the most
widely deployed intrusion detection and prevention technology worldwide and
has become the de facto standard for the industry. Snort is efficient in detecting
the DoS attacks and the U2R attacks with a high detection rate.
D.4 Summary
In an effort to analyze and solve the IDS evaluation problems identified in this
thesis, evaluation metrics such as the Area Under the ROC Curve, precision, recall,
and F-score have been introduced in this appendix. Additionally, the P-test, which
is a more intuitive way of comparing two IDSs and also more relevant to the
intrusion detection evaluation problem, has been included. Metrics like the
F-score and the P-test provide an effective basis for a fair comparison of IDSs.
References
[1] M. McLuhan, Letters of Marshall McLuhan, Oxford University Press,
1987, pp. 254.
[2] Internet Domain Survey Host Count, https://www.isc.org/solutions/survey
[3] J. McHugh, A. Christie, J.Allen, Defending Yourself: The Role of Intru-
sion Detection Systems, IEEE software, Sep/Oct. 2000.
[4] Losses due to cyber crime can be as high as $40 billion, Business Line,
Business Daily from THE HINDU group of publications, Monday, May 21,
2007.
[5] CSI/FBI Computer Crime and Security Survey,
http://www.gocsi.com/press/20020407
[6] J.P. Anderson, Computer Security Threat Monitoring and Surveillance,
Technical report, James P. Anderson Co., Fort Washington, PA., April
1980.
[7] D.E. Denning, An Intrusion-Detection Model, IEEE Transactions on Soft-
ware Engineering, vol. SE-13, pp. 222-232, 1987.
[8] P. Helman, G. Liepins, Statistical Foundations of Audit Trail Analysis for
the Detection of Computer Misuse, IEEE Transactions on Software Engineering,
Vol. 19, No. 9, pp. 886-901, 1993.
[9] H.S. Javitz, A. Valdes, The NIDES Statistical Component Description and
Justification, Technical report, SRI International, Menlo Park, CA, March
1994.
[10] C. Ko, M. Ruschitzka, K. Levitt, Execution Monitoring of Security-
Critical Programs in Distributed Systems: A Specification-based Ap-
proach, In Proceedings of the 1997 IEEE Symposium on Security and
Privacy, pp. 175-187, May 1997.
[11] D. Wagner, D. Dean, Intrusion Detection via Static Analysis, In Proceed-
ings of the IEEE Symposium on Security and Privacy, IEEE Press, 2001.
[12] C. Warrender, S. Forrest, B.A. Pearlmutter, Detecting intrusions using sys-
tem calls: Alternative data models, In IEEE Symposium on Security and
Privacy, pages 133-145, 1999.
[13] DEF CON 8 conference. Las Vegas, NV, 2000. www.defcon.org
[14] W. Lee, S.J. Stolfo, P.K. Chan, E. Eskin, W. Fan, M. Miller, S. Her-
shkop, and J. Zhang, Real time data mining-based intrusion detection. In
Proc. Second DARPA Information Survivability Conference and Exposi-
tion, IEEE Computer Society, pp. 85-100.
[15] S. Kumar, Classication and Detection of Computer Intrusions, PhD the-
sis, West Lafayette, IN: Purdue University, Computer Sciences, 1995.
[16] T.D. Lane, Machine Learning Techniques for the computer security do-
main of anomaly detection, Ph. D. thesis, Purdue Univ., West Lafayette,
IN, 2000.
[17] F. Neri, Comparing local search with respect to genetic evolution to detect
intrusion in computer networks, In Proc. of the 2000 Congress on Evolu-
tionary Computation CEC'00, IEEE Press, pp. 238-243.
[18] P.K. Chan, S. Stolfo, Toward parallel and distributed learning by meta-
learning, In Working Notes AAAI Work, Knowledge Discovery in
Databases, Portland, OR, pp. 227-240, AAAI Press, 1993.
[19] A. L. Prodromidis, and S. J. Stolfo, Cost complexity-based pruning of
ensemble classifiers, Knowledge and Information Systems, 3(4), 449-469,
2001.
[20] P.L. Carbone, Data mining or knowledge discovery in databases: An
overview, In Data Management Handbook, New York: Auerbach Publi-
cations, 1997.
[21] S. Axelsson, A preliminary attempt to apply detection and estimation the-
ory to intrusion detection, Technical Report 00-4, Chalmers Univ. of Tech-
nology, Goteborg, Sweden, 2000.
[22] W. Lee, A Data Mining Framework for Constructing Features and Models
for Intrusion Detection Systems, Ph. D. thesis, Columbia University.
[23] W. Lee, R. A. Nimbalkar, K. K. Yee, S. B. Patil, P. H. Desai, T. T. Tran,
and S. J. Stolfo, A data mining and CIDF based approach for detecting
novel and distributed intrusions, 2000.
[24] W. Lee, and S. J. Stolfo, Data mining approaches for intrusion detection,
In Proc. of the 7th USENIX Security Symp., San Antonio, TX. USENIX,
1998.
[25] M.V. Mahoney, P.K. Chan, Learning nonstationary models of normal
network traffic for detecting novel attacks, SIGKDD, 2002.
[26] W. Fan, Cost-Sensitive, Scalable and Adaptive Learning Using Ensemble-
based Methods. Ph. D. thesis, Columbia University, 2001.
[27] K. Kendall, A database of computer attacks for the evaluation of intrusion
detection systems, Thesis, MIT, 1999.
[28] L. Didaci, G. Giacinto, F. Roli, Intrusion detection in computer networks
by multiple classifier systems, International Conference on Pattern Recognition,
2002.
[29] G. Giacinto, and F. Roli, Intrusion detection in computer networks by mul-
tiple classifier systems, In Proc. of the 16th International Conference on
Pattern Recognition (ICPR), Volume 2, Quebec City, Canada, pp. 390-393.
IEEE press, 2002.
[30] A. Porras and P. G. Neumann, EMERALD: Event Monitoring Enabling
Responses to Anomalous Live Disturbances, Proc. 20th NISSC, pp. 353-
365, 1997.
[31] M. Kubat, R.C. Holte, S. Matwin, Learning when negative examples
abound: One-sided selection, Proceedings of the ninth European Confer-
ence on machine learning, pp. 146-153, 1997.
[32] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and
regression trees, Belmont, CA: Wadsworth, 1984.
[33] P.K. Chan, S. Stolfo, Towards scalable learning with non-uniform class
and cost distributions: A case study in credit card fraud detection, Pro-
ceedings of 4th International Conference on knowledge discovery and
data mining (KDD-98), pp. 164-168, 1998.
[34] K. McCarthy, B. Zabar, G. Weiss, Does cost-sensitive learning beat sam-
pling for classifying rare classes?, Proceedings of the 1st International
workshop on utility-based data mining, pp. 69-77, 2005.
[35] Z. Chair, P.K. Varshney, Optimal data fusion in multiple sensor detection
systems, IEEE Transactions on Aerospace and Electronics systems, 22, 1,
98-101, 1986.
[36] M.V. Joshi, On evaluating performance of classifiers for rare classes, Pro-
ceedings of the 2002 IEEE International Conference on data mining, pp.
641-644, 2002.
[37] W.W. Cohen, Fast effective rule induction, Proceedings of 12th Interna-
tional conference on machine learning, California, 1995.
[38] J.R. Quinlan, C4.5: Programs for machine learning, Morgan Kaufmann,
1993
[39] K. Lan, A. Hussain, D. Dutta, Effect of malicious trafc on the network,
Proceedings of PAM, 2003.
[40] S. Bay, A framework for discovering anomalous regimes
in multivariate time-series data with local models, 2004,
http://cll.stanford.edu/symposia/anomaly/abstracts.html
[41] T. Fawcett, Activity monitoring: anomaly detection as an on-line
classification, 2004.
[42] DARPA intrusion detection evaluation,
http://www.ll.mit.edu/IST/ideval/data/data_index.html
[43] W. Lee, S.J.Stolfo, A Data Mining framework for building intrusion de-
tection models, IEEE Symposium on Security and Privacy, 1999.
[44] T. Lane, C.E. Brodley, Temporal sequence learning and data reduction for
anomaly detection, ACM Trans. Inform. Syst. Secur. 2 (3), 1999.
[45] M. Thottan, C. Ji, Anomaly detection in IP networks, IEEE Trans. Signal
Processing, 51 (8) (2003) 2191-2204.
[46] S. Jin, D. Yeung, A covariance analysis model for DDoS attack detec-
tion, IEEE International Communication Conference (ICC'04), vol. 4, June
2004, pp. 20-24.
[47] S. Jin, D. S. Yeung, Xizhao Wang, Network intrusion detection in
covariance feature space, Pattern Recognition, vol. 40, pp. 2185-2197, 2007.
[48] DARPA intrusion detection evaluation, http://www.ll.mit.edu/IST/ideval/
[49] S. Axelsson, The base-rate fallacy and its implications for the difficulty of
intrusion detection, In Proceedings of the 6th ACM Conference on Computer
and Communications Security (CCS '99), pages 1-7, November 1999.
[50] D.E. Denning, Information Warfare and Security, Addison Wesley, 1999.
[51] W. Lee, W. Fan, M. Miller, S. Stolfo, E. Zadok, Toward cost-sensitive
modeling for intrusion detection and response, Technical report CUCS-
002-00, Computer Science, Columbia University, 2000.
[52] Elkan, C., Results of the KDD'99 classifier learning, SIGKDD Explo-
rations, Vol. 1, Issue 2, pp. 63-64, Jan 2000.
[53] CERT report of vulnerabilities,
http://www.cert.org/stats/cert_stats.html#vulnerabilities
[54] T. Shimeall, P. Williams, Models of Information Security Trend Analysis,
www.cert.org/archive/pdf/info-security.pdf
[55] H.K. Browne, W.A. Arbaugh, J. McHugh, W. L. Fithen, A Trend Analysis
of Exploitations, www.cs.umd.edu/~waa/pubs/CS-TR-4200.pdf
[56] L. Hamza, K. Adi, K. El Guemhioui, Automatic generation of attack sce-
narios for intrusion detection systems, Proceedings of the advanced Inter-
national Conference on Telecommunications and International Conference
on Internet and Web applications and services, AICT/ICIW, 2006.
[57] S.J. Stolfo, and K. Mok, A data mining framework for building intrusion
detection models, In Proceedings of the IEEE Symposium on Security and
Privacy, pages 120-132, Oakland, CA, USA, 1999.
[58] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo. A geometric
framework for unsupervised anomaly detection: Detecting intrusions in
unlabeled data. In D. Barbara and S. Jajodia, editors, Data Mining for
Security Applications. Kluwer, 2002.
[59] C. Kruegel, D. Mutz, W. Robertson, and F. Valeur, Bayesian event
classification for intrusion detection, In Proceedings of the 19th Annual
Computer Security Applications Conference (ACSAC), pp. 14-24, 2003.
[60] J. E. Gaffney and J. W. Ulvila, Evaluation of intrusion detectors: A decision
theory approach, In Proceedings of the 2001 IEEE Symposium on Security
and Privacy, pages 50-61, Oakland, CA, USA, 2001.
[61] G. Gu, P. Fogla, D. Dagon, W. Lee, and B. Skoric. Measuring intrusion
detection capability: An information-theoretic approach. In Proceedings
of ACM Symposium on Information, Computer and Communications Se-
curity (ASIACCS 06), Taipei, Taiwan, March 2006.
[62] G. Di Crescenzo, A. Ghosh, and R. Talpade. Towards a theory of intrusion
detection. In ESORICS 2005, 10th European Symposium on Research in
Computer Security, Springer Lecture Notes in Computer Science, 3679,
pp. 267-286, 2005.
[63] G. Gu, P. Fogla, D. Dagon, W. Lee, B. Skoric, Towards an Information-
Theoretic Framework for Analyzing Intrusion Detection Systems, In Pro-
ceedings of the 11th European Symposium on Research in Computer Se-
curity.
[64] J. McHugh, Testing Intrusion Detection Systems: A Critique of the 1998
and 1999 DARPA IDS evaluations as performed by Lincoln Laboratory,
ACM Transactions on Information and System Security, vol.3, No.4, Nov.
2000.
[65] M. V. Mahoney, P. K. Chan, An analysis of the 1999 DARPA/Lincoln Lab-
oratory evaluation data for network anomaly detection, Technical Report
CS-2003-02.
[66] V. Paxson, The Internet Traffic Archive, http://ita.ee.lbl.gov/, 2002
[67] M.V. Mahoney, P.K. Chan, Detecting Novel attacks by identifying anoma-
lous Network Packet Headers, Florida Institute of Technology Technical
Report CS-2001-2.
[68] M.V. Mahoney, P.K. Chan, Learning nonstationary models of normal
network traffic for detecting novel attacks, SIGKDD, 2002.
[69] Snort Manual, www.snort.org/docs/snort_htmanuals/htmanual_260
[70] Cisco IDS4215 manual, http://www.cisco.com/en/US/products/hw/vpndevc/
ps4077/index.html
[71] S. T. Brugger, J. Chow, An assessment of the DARPA IDS evaluation
dataset using Snort, Tech. Report, CSE-2007-1, 2005.
[72] K. J. Pickering, Evaluating the viability of intrusion detection system
benchmarking, Bachelor Thesis, University of Virginia, US, 2002.
[73] S. M. Bellovin, Packets found on an Internet, Technical report, AT&T Bell
Laboratories, May, 1992.
[74] J. Sommers, V. Yegneswaran, P. Barford, Toward comprehensive traffic
generation for online IDS evaluation, Technical Report, University of Wis-
consin.
[75] SAFE: A security blueprint for enterprise networks, White paper, Cisco
Systems, 2000.
[76] R. Durst, T. Champion, B. Witten, E. Miller, L. Spagnuolo, Testing and
evaluating computer intrusion detection systems, Communications of the
ACM, vol.42, No.7, Jul. 1999.
[77] S.S. Iyengar, R.R. Brooks, Multi-Sensor Fusion: Fundamentals and Ap-
plications with Software, Prentice Hall, 1998.
[78] W. Fan, W. Lee, S. J. Stolfo, and M. Miller, A multiple model cost-sensitive
approach for intrusion detection. In R. L. de Mantaras and E. Plaza (Eds.),
Proc. of Machine Learning: ECML 2000, 11th European Conference on
Machine Learning, Volume 1810, Springer Lecture Notes in Computer
Science, Barcelona, Spain, pp. 142-153.
[79] C. Siaterlis, B. Maglaris, Towards Multisensor Data Fusion for DoS detec-
tion, ACM Symposium on Applied Computing, 2004.
[80] G. Brown, Diversity in Neural Network ensembles, PhD thesis, The Uni-
versity of Birmingham, B15 2TT United Kingdom, 2004.
[81] R.S. Blum, On multisensor image fusion performance limits from an esti-
mation theory, Information Fusion Journal, vol.7 , 3, pp. 250-263, 2006.
[82] B.V. Dasarathy, Sensor fusion potential exploitation-innovative architec-
tures and illustrative applications, Proceedings of the IEEE, Vol. 85, 1,
pp.24-38, 1997.
[83] O. Cohen, Y. Edan, E. Schechtman, Statistical Evaluation Method for
Comparing Grid Map Based Sensor Fusion Algorithms, International Jour-
nal of Robotic Research, Vol. 25, No. 2, pp.117-133, 2006.
[84] X. R. Li, Y.-M. Zhu, and C.-Z. Han, Unified optimal linear estimation
fusion - Part I: Unified models and fusion rules, Proc. 2000 International
Conf. Information Fusion, pp. MoC2-10-MoC2-17, 2000.
[85] A. Krogh and J. Vedelsby, Neural network ensembles, cross validation,
and active learning, NIPS, 7, pp.231-238, 1995.
[86] D. H. Hall, S. A. H. McMullen, Mathematical Techniques in Multi-Sensor
Data Fusion, Second Edition, Artech House.
[87] P.J. Nahin, J.L. Pokoski, NCTR Plus Sensor Fusion Equals IFFN or can
Two Plus Two Equal Five?, IEEE Transactions on Aerospace and Elec-
tronic Systems,vol. AES-16, 3, pp.320-337, 1980.
[88] S.C.A. Thomopoulos, R. Vishwanathan, D.C. Bougoulias, Optimal deci-
sion fusion in multiple sensor systems, IEEE Transactions on Aerospace
and Electronics Systems, vol. 23, 5, pp.644-651, 1987.
[89] W. Baek and S. Bommareddy, Optimal m-ary data fusion with distributed
sensors, IEEE Transactions on Aerospace and Electronics Systems, vol.31,
3, pp.1150-1152, 1995.
[90] V. Aalo, R. Viswanathan, On distributed detection with correlated sensors:
Two examples, IEEE Trans. Aerospace Electron. Syst., vol.25, pp.414-
421.
[91] E. Drakopoulos, C.C. Lee, Optimum multisensor fusion of correlated local
decisions, IEEE Trans. Aerospace Electron. Syst., vol. 27, pp. 593-606.
[92] M. Kam, Q. Zhu, W. Gray, Optimal data fusion of correlated local deci-
sions in multiple sensor detection systems. IEEE Trans. Aerospace Elec-
tron. Syst., vol.28, pp.916-920.
[93] R. Blum, S. Kassam, H. Poor, Distributed detection with multiple sensors
- Part II: Advanced topics, Proceedings of IEEE, pp.64-79.
[94] T. Bass, Multisensor Data Fusion for Next Generation Distributed Intru-
sion Detection Systems, IRIS National Symposium, 1999.
[95] G. Giacinto, F. Roli, L. Didaci, Fusion of multiple Classifiers for Intrusion
Detection in Computer Networks, Pattern Recognition Letters, 24,
pp. 1795-1803, 2003.
[96] Y. Wang, H. Yang, X. Wang, R. Zhang, Distributed intrusion detection
system based on data fusion method, Intelligent control and automation,
WCICA 2004.
[97] W. Hu, J. Li, Q. Gao, Intrusion Detection Engine on Dempster-Shafer's
Theory of Evidence, Proceedings of International Conference on
Communications, Circuits and Systems, vol. 3, pp. 1627-1631, Jun 2006.
[98] A. Siraj, R.B. Vaughn, S.M. Bridges, Intrusion Sensor Data Fusion in an
Intelligent Intrusion Detection System Architecture, Proceedings of the
37th Hawaii international Conference on System Sciences, 2004.
[99] R. Perdisci, G. Giacinto, F. Roli, Alarm clustering for intrusion detec-
tion systems in computer networks, Engineering Applications of Artificial
Intelligence, Elsevier publications, March 2006.
[100] A. Valdes, K. Skinner, Probabilistic alert correlation, Springer Verlag
Lecture notes in Computer Science, 2001.
[101] O.M. Dain, R.K. Cunningham, Building Scenarios from a Heterogeneous
Alert Stream, IEEE Workshop on Information Assurance and Security,
2001.
[102] F. Cuppens, A. Miege, Alert correlation in a cooperative intrusion detec-
tion framework, Proceedings of the 2002 IEEE symposium on security and
privacy, 2002.
[103] B. Morin, H. Debar, Correlation of Intrusion Symptoms : an Application
of Chronicles, RAID 2003.
[104] H. Debar, A. Wespi, Aggregation and Correlation of Intrusion-Detection
Alerts, RAID 2001.
[105] F. Valeur, G. Vigna, C. Kruegel, R. Kemmerer, A Comprehensive Ap-
proach to Intrusion Detection Alert Correlation, In IEEE Transactions on
Dependable and Secure Computing, 2004.
[106] H. Wu, M. Siegel, R. Stiefelhagen, J. Yang, Sensor Fusion using
Dempster-Shafer Theory, IEEE Instrumentation and Measurement Tech-
nology Conference, 2002.
[107] M. Zhu, S. Ding, R. R. Brooks, Q. Wu, S. S. Iyengar, N. S. V. Rao, Deci-
sion making-based multiple sensor data Fusion, Report, US Department
of Energy.
[108] rfp@wiretrip.net/libwhisker
[109] R.C. Holte, N. Japkowicz, C.X. Ling, Learning from imbalanced data
sets, Technical Report WS-00-05, AAAI Press, Menlo Park, CA.
[110] R. Agarwal, M.V. Joshi, PNrule: A new framework for learning classifier
models in data mining (a case-study in network intrusion detection), Tech.
Rep. RC 21719, IBM Research report, Computer Science/Mathematics,
2000.
[111] Lippmann, R.P., An introduction to computing with Neural Nets, IEEE
ASSP Magazine, Vol.4, pp. 4-22, April 1987.
[112] G. Shafer, A Mathematical Theory of Evidence, Princeton University
Press.
[113] G. Shafer, Perspectives on the theory and practice of belief functions,
International Journal of Approximate Reasoning 31-40, 1990.
[114] P. Smets, What is Dempster-Shafer's model? in Advances in the
Dempster-Shafer theory of evidence, pp. 5-34, John Wiley & Sons, 1994,
iridia.ulb.ac.be/~psmets/WhatIsDS.pdf
[115] G. Pasi, R. R. Yager, Modeling the concept of majority opinion in group
decision making, Information Sciences, 176, 390-414, 2006.
[116] R. R. Yager, On the determination of strength of belief for decision sup-
port under uncertainty - Part II: fusing strengths of belief, Fuzzy Sets and
Systems, 142, 129-142, 2004.
[117] P. Smets, The combination of evidence in the transferable belief
model, IEEE Transactions on pattern analysis and machine intelligence,
12(5):447-458, May 1990.
[118] D. Yong, S. WenKang, Z. ZhenFu, L. Qi, Combining belief functions
based on distance of evidence, Science Direct, Volume 38, Issue 3, Pages
489-493, Dec. 2004.
[119] R.R. Tenney, N. R. Sandell, Detection with distributed sensors, IEEE
Trans. Aerospace Electronic Systems, 23(4), 501-509, 1981.
[120] J.D. Howard, An analysis of security incidents on the Internet, 1989-
1995, PhD thesis, Carnegie Mellon University, Department of Engineering
and Public Policy, April 1997.
[121] www.cert.org/research/JHThesis/table_of_contents.html
[122] U. Lindqvist, E. Jonsson, How to systematically classify computer secu-
rity intrusions, IEEE Symposium on Security and Privacy, pp. 154-163, Los
Alamitos, CA, 1997.
[123] D.J. Weber, A taxonomy of computer intrusions, Master's thesis, Depart-
ment of Electrical Engineering and Computer Science, Massachusetts In-
stitute of Technology, June 1998.
[124] Chris Rodgers. Threats to TCP/IP Network Security. 2001.
[125] G. Alvarez, S. Petrovic, A new taxonomy of web attacks suitable for
efficient encoding, Computers and Security, 22(5): pp. 435-449, July 2003.
[126] M.A. Bishop, A taxonomy of Unix and network security vulnerabilities,
Technical report, Department of Computer Science, University of Califor-
nia at Davis, May 1995.
[127] I.V. Krsul, Software Vulnerability Analysis. PhD thesis, Comp. Sci.
Dept., Purdue University, May 1998.
[128] C.E. Landwehr, A.R. Bull, A taxonomy of computer program security
flaws, with examples, ACM Computing Surveys, 26(3), pp. 211-254, 1994.
[129] J. Korba, Windows NT Attacks for the Evaluation of Intrusion Detection
Systems, M. Eng. Thesis, MIT Department of Electrical Engineering and
Computer Science, June 2000.
[130] A. Baker, J.B. Beale, Snort 2.1 Intrusion Detection (Second Edition)
pp.751, 2004.
[131] Xerox Palo Alto Research Center, Parc history, 2003,
http://www.parc.xerox.com/about/history/default.html
[132] Eugene Spafford. The Internet Worm Program: An Analysis. Technical
report, Department of Computer Sciences, Purdue University, 1988.
[133] Fred Cohen. Computer Viruses. PhD thesis, University of Southern Cali-
fornia, 1985.
[134] CERT Coordination Center. Advisory CA-2001-19 Code Red Worm
Exploiting Buffer Overflow In IIS Indexing Service DLL. July 2001.
http://www.cert.org/advisories/CA-2001-19.html.
[135] CERT Coordination Center. Advisory CA-2001-26 Nimda Worm.
September 2001. http://www.cert.org/advisories/CA-2001-26.html.
[136] CERT Coordination Center. Advisory CA-2003-04 MS-SQL Server
Worm. January 2003. http://www.cert.org/advisories/CA-2003-04.html.
[137] CERT Coordination Center. Advisory CA-2003-20 W32/Blaster Worm.
August 2003. http://www.cert.org/advisories/CA-2003-20.html.
[138] Top Ten Cyber Security Menaces for 2008,
http://www.sans.org/2008menaces/
[139] Symantec. Symantec Internet Security Threat Report Volume III.
February 2003.
http://enterprisesecurity.symantec.com/content.cfm?articleid=1539&EID=0
[140] Symantec. Symantec Internet Security Threat Report Volume IV.
September 2003.
http://enterprisesecurity.symantec.com/content.cfm?articleid=1539&EID=0
[141] Icove, David, Seger, VonStorch, A Crimefighter's Handbook, O'Reilly &
Associates, 1995.
[142] http://www.cert.org/research/JHThesis/chapter6.html
[143] D.E. Denning, Cyberterrorism,
http://www.cs.georgetown.edu/~denning/infosec/cyberterror.html
[144] 2004 web Server Intrusion Statistics, www.zone-h.org
[145] CERT Coordination Center. Advisory CA-1999-04 Melissa Macro Virus.
March 1999. http://www.cert.org/advisories/CA-1999-04.html
[146] CERT Coordination Center. Denial of Service Attacks. 1997.
http://www.cert.org/tech_tips/denial_of_service.html
[147] H. Debar, M. Dacier, and A. Wespi, A revised taxonomy of Intrusion
Detection Systems, Research Report, IBM, 1999.
[148] M. Esmaili, R. S. Naini, B. Balachandran, J. Pieprzyk, Case-based rea-
soning for intrusion detection, 12th annual computer security applications
conference, pp. 214-223, 1996.
[149] S. Northcutt, J. Novak, Network Intrusion Detection, New Rid-
ers/Pearson, Indianapolis, IN, third edition, 2003.
[150] D. E. Denning, An intrusion detection model, IEEE Trans. S. E., SE-
13(2), pp. 222-232, 1987.
[151] H. Debar, M. Dacier, A. Wespi, Towards a taxonomy of intrusion detec-
tion systems, Computer Networks, vol. 31, pp. 805-822, 1999.
[152] R. Weber, Information systems control and audit, Upper Saddle River,
NJ: Prentice Hall, 1999.
[153] R. P. Lippmann, R. K. Cunningham, Improving intrusion detection per-
formance using keyword selection and neural networks, Computer Net-
works, vol. 34, pp. 597-603, 2000.
[154] James P. Anderson, Computer Security Threat Monitoring and Surveil-
lance, Technical report, James P. Anderson Co., Fort Washington, PA.,
April 1980.
[155] T. F. Lunt, A survey of intrusion detection techniques, Comput. Security,
vol. 12, no. 4, pp. 405-418, June 1993.
[156] Teresa Lunt et al., IDES: The enhanced prototype, Technical report, SRI
International, Computer Science Lab, October 1988.
[157] D. Anderson, T. Frivold, A. Valdes, Next-generation intrusion detection
expert system (NIDES), Technical report, SRI-CSL-95-07, SRI Interna-
tional, Computer Science Lab, May 1995.
[158] S. E. Smaha, Haystack: An Intrusion Detection System, Proceedings of
the IEEE Fourth Aerospace Computer Security Applications Conference,
Orlando, FL., December 1988.
[159] M. Sebring et al., Expert systems in intrusion detection: A case study, Pro-
ceedings of the 11th National Computer Security Conference, Baltimore,
MD., October 1988.
[160] H. S. Vaccaro, G. E. Liepins, Detection of anomalous computer session
activity, Proceedings of the 1989 Symposium on Research in Security and
Privacy, Oakland, CA., May 1989.
[161] J. R. Winkler, W. J. Page, Intrusion and Anomaly Detection in Trusted
Systems, Proceedings of the Fifth Annual Computer Security Applications
Conference, Tucson, AZ., December 1989.
[162] L. T. Heberlein et al., A network security monitor, Proceedings of the
IEEE Symposium on Research in Security and Privacy, Oakland, CA.,
May 1990.
[163] K. Jackson, D. DuBois, C. Stallings, An expert system application for
network intrusion detection, Proceedings of the 14th Department of En-
ergy Computer Security Group Conference, 1991.
[164] S. R. Snapp et al., A system for distributed intrusion detection, Proceed-
ings of the IEEE COMPCON 91, San Francisco, CA., February 1991.
[165] Mark Crosbie, Gene Spafford, Defending a Computer System using Au-
tonomous Agents, Technical report No. 95-022, COAST Laboratory, De-
partment of Computer Sciences, Purdue University, March 1994.
[166] S. Staniford-Chen, S. Cheung, R. Crawford, M. Dilger, J. Frank, J.
Hoagland, K. Levitt, C. Wee, R. Yip, D. Zerkle, GrIDS A Graph-Based
Intrusion Detection System for Large Networks, The 19th National Infor-
mation Systems Security Conference, Baltimore, MD., October 1996.
[167] Ross Anderson, Abida Khattak, The Use of Information Retrieval Tech-
niques for Intrusion Detection, Proceedings of RAID 98, Louvain-la-
Neuve, Belgium, September 1998.
[168] Biswanath Mukherjee, L. Todd Heberlein, Karl N. Levitt, Network Intru-
sion Detection, IEEE Network, May/June 1994.
[169] May Grance, The DIDS (Distributed Intrusion Detection System) pro-
totype, Proceedings of the Summer USENIX Conference, 227-233, San
Antonio, Texas, 8-12 June 1992.
[170] Herve Debar, Marc Dacier and Andreas Wespi, Towards a taxonomy of
Intrusion-Detection Systems, Computer Networks, 31(8): 805-822, April
1999.
[171] A. Abraham, Evolutionary Computation in Intelligent Network Manage-
ment, Evolutionary Computing in Data Mining, Springer, pp. 189-210,
2004.
[172] J. L. Zhao, J. F. Zhao, and J. J. Li, Intrusion Detection Based on Clus-
tering Genetic Algorithm, International Conference on Machine Learning
and Cybernetics IEEE, Guangzhou, pp. 3911-3914, 2005.
[173] R. H. Gong, M. Zulkernine, and Purang, A software Implementation
of a Genetic Algorithm Based Approach to Network Intrusion Detection,
SNPD/SAWN'05, IEEE, 2005.
[174] Dong Seong Kim, Ha-Nam Nguyen, Jong Sou Park, Genetic Algorithm
to Improve SVM Based Network Intrusion Detection System, AINA'05,
IEEE, 2005.
[175] A. Abraham and C. Grosan: Evolving Intrusion Detection Systems, Stud-
ies in Computational Intelligence (SCI) 13, 57-79, 2006
[176] Sundaram A., An Introduction to Intrusion Detection,
http://www.acm.org 2001.
[177] Koral Ilgun, Richard A Kemmerer, and Phillip A. Porras, State Transition
Analysis: A rule-based Intrusion Detection Approach, IEEE Transactions
on Software Engineering, 21(3): 181-199, March 1995.
[178] Verwoerd T., and Hunt R. Intrusion Detection Techniques and Ap-
proaches, 2001, http://www. Elsevier.com
[179] Jean Phillippe, Application of Neural Networks to Intrusion Detection,
2004, http://www.sans.org
[180] Gordeev M., Intrusion Detection Techniques and Approaches, 2004,
http://www.ict.tuwien.ac.at
[181] Stefan Axelsson, Intrusion Detection Systems: A Survey and Taxonomy,
Technical Report 99-15, Department of Computer Engineering, Chalmers
University of Technology, Sweden, March 2000.
[182] Bace R., Intrusion Detection, Macmillan Technical Publishing, 2002.
[183] R. Lippmann, D. Fried, I. Graf, J. Haines, K. Kendall, D. McClung, D.
Weber, S. Webster, D. Wyschogrod, R. Cunningham and M. Zissman,
Evaluating Intrusion Detection Systems: The 1998 DARPA Off-line In-
trusion Detection Evaluation, IEEE Computer Society Press, 2000.
[184] S. Axelsson, Intrusion Detection Systems: A Sur-
vey and Taxonomy, Chalmers University 99-15, 2000,
http://citeseer.nj.nec.com/axelsson00intrusion.html
[185] S. Axelsson, Research in Intrusion-Detection Systems: A survey,
Chalmers University of Technology, 1998, revisited 1999.
[186] Christopher Krügel and Thomas Toth, A Survey on Intrusion Detection
Systems, TUV-1841-00-11 Technical University of Vienna, Information
Systems Institute, Distributed Systems Group, December 12, 2000.
[187] F. Sabahi, Intrusion Detection: A Survey, The Third International Confer-
ence on Systems and Networks Communications, IEEE Computer society.
[188] S. Forrest, S. A. Hofmeyr, and A. Somayaji, Computer immunology,
Commun. ACM, vol. 40, no. 10, pp. 88-96, Oct. 1997.
[189] H. Debar, M. Dacier, and A. Wespi, Toward a taxonomy of Intrusion De-
tection Systems, Comput. Networks, vol. 31, pp. 805-822, 1999.
[190] Shambhu Upadhyaya, Ramkumar Chinchani, and Kevin
Kwiat, An Analytical Framework for Reasoning About Intru-
sions, http://ieeexplore.ieee.org/iel5/7654/20915/00969760.pdf
[191] John E. Dickerson, Jukka Juslin, Ourania Koukousoula, Julie A. Dick-
erson, Fuzzy Intrusion Detection, Proceedings: IFSA World Congress
and 20th North American Conference, 2001,
http://ieeexplore.ieee.org/iel5/7506/20427/00943772.pdf
[192] Dipankar Dasgupta and Fabio Gonzalez, An Immunity-Based Technique
to Characterize Intrusions in Computer Networks, IEEE Transactions on
Evolutionary Computation, Vol. 6, No. 3, June 2002.
[193] Susan M. Bridges and Rayford B. Vaughn, Intrusion Detection via Fuzzy
Data Mining, The Twelfth Annual Canadian Information Technology Se-
curity Symposium June 19-23, 2000.
[194] Kristopher Kendall, A Database of Computer attacks for the evaluation
of intrusion detection systems, Thesis Report, Department of Electrical
Engineering and Computer Science at MIT, June 1999.
[195] Shieh et al., A pattern-oriented ID model and its applications, Proceedings
of the Symposium on security and privacy, 1991.
[196] Ilgun, Koral, USTAT: a real-time IDS for Unix, Proceedings of the 1993
IEEE Computer Society Symposium on research in security and privacy,
1993.
[197] Christoph, Gray G, UNICORN: misuse detection for UNICOS, Proceed-
ings of the 1995 ACM/IEEE Supercomputing Conference, Dec. 1995.
[198] White, Gregory B, PEER-based hardware protocol for IDSs, Journal of
Engineering and Applied Science, v 2, 1996.
[199] Bonifacio, Jose Mauricio, Neural Networks applied in IDSs, IEEE Inter-
national Conference on neural networks, 1998.
[200] Paxson, Vern, Bro: A system for detecting network intruders in real-time,
Computer Networks, v 31, n 23, Dec 1999.
[201] Ning, P., Wang, X.S., and Jajodia, S., Modelling requests among cooperating
IDSs, Computer Communications, v 23, n 17, Nov 2000.
[202] Dickerson, John E, Fuzzy network profiling for intrusion detection, An-
nual Conference of the North American Fuzzy Information Processing So-
ciety, 2000.
[203] Luo, JianXiong, Mining Fuzzy association rules and fuzzy frequent
episodes for intrusion detection, International Journal of Intelligent sys-
tems, v15, n 8, Aug, 2000.
[204] Andrew P. Kosoresow and Steven A. Hofmeyr, Intrusion Detection via
System Call Traces, IEEE Software, 14(5), pp. 24-42, September/October
1997.
[205] Aviv Bergman, Intrusion Detection with Neural Networks, Technical Re-
port A012, SRI International, February 1993.
[206] D. Gunetti and G. Ruffo, Intrusion Detection through Behavioral Data,
Proc. of the Third Symposium on Intelligent Data Analysis, Lecture Notes
in Computer Science, Springer-Verlag, 1999.
[207] Gunar E. Liepins and H. S. Vaccaro, Intrusion Detection: Its Role and
Validation, Computers & Security, 11(4), pp. 347-355, 1992.
[208] Guy Helmer and Johnny Wong and Vasant Honavar and Les Miller, Fea-
ture Selection Using a Genetic Algorithm for Intrusion Detection, Pro-
ceedings of the Genetic and Evolutionary Computation Conference, Vol.
2, p. 1781, Morgan Kaufmann, 13-17 July 1999.
[209] Jake Ryan and Meng-Jang Lin and Risto Miikkulainen, Intrusion Detec-
tion with Neural Networks, Advances in Neural Information Processing
Systems 10 (Proceedings of NIPS97, Denver, CO), MIT Press, 1998.
[210] K. Ilgun, USTAT: A Real-Time Intrusion Detection System for UNIX,
Proceedings of the IEEE Symposium on Security and Privacy, pp. 16-29,
1993.
[211] Koral Ilgun and Richard A. Kemmerer and Phillip A. Porras, State Tran-
sition Analysis: A Rule-Based Intrusion Detection Approach, IEEE Trans-
actions on Software Engineering, 21(3), pp. 181-199, March 1995.
[212] Mark Crosbie and Eugene H. Spafford, Applying Genetic Programming
to Intrusion Detection, Working Notes for the AAAI Symposium on Ge-
netic Programming, pp. 1-8, AAAI, 10-12 November 1995.
[213] Phillip Andrew Porras, A State Transition Analysis Tool For Intrusion
Detection, Technical Report, University of California, Santa Barbara,
1992.
[214] S. Kumar and E. Spafford, A pattern-matching model for intrusion de-
tection, Proceedings National Computer Security Conference, pp. 11-21,
1994.
[215] Wenke Lee and Salvatore J. Stolfo, Data Mining Approaches for In-
trusion Detection, Proceedings of the 7th USENIX Security Symposium
(SECURITY-98), pp. 79-94, Usenix Association, January 26-29 1998.
[216] J. McHugh, The 1998 Lincoln Laboratory IDS Evaluation (A Critique),
Proceedings of the Recent Advances in Intrusion Detection, pp. 145-161,
Toulouse, France, 2000.
[217] Network Computing, Security Feature, November 15, 1999,
http://www.nwc.com/1023/1023f19.html
[218] Peter Mell, Vincent Hu, Richard Lippmann, An
Overview of Issues in Testing Intrusion Detection Systems,
http://csrc.nist.gov/publications/nistir/nistir-7007.pdf
[219] CERT report of vulnerabilities,
http://www.cert.org/stats/cert_stats.html#vulnerabilities
[220] Zhu, Ding, Brooks, Wu, Iyengar, and Rao, Decision making-based multiple
sensor data fusion, Report, US Department of Energy.
[221] S. Axelsson, A preliminary attempt to apply detection and estimation
theory to intrusion detection, Technical Report 00-4, Chalmers University
of Technology, Goteborg, Sweden.
[222] W. Lee and S. J. Stolfo, Data mining approaches for intrusion detection, In
Proc. of the 7th USENIX Security Symposium, San Antonio, TX, USENIX.
[223] B. Tung, Common intrusion detection framework,
http://www.isi.edu/gost/cidf/
[224] P. Ning, X. S. Wang, and S. Jajodia, Modeling requests among coop-
erating intrusion detection systems, Computer Communications, 23(17),
pp. 1702-1716.
[225] J.R. Winkler and W.J. Page, Intrusion and anomaly detection in trusted
systems, In Fifth Annual Computer Security Applications Conf., Tucson,
AZ, 1989, pp. 39-45.
[226] S.R. Snapp, J. Brentano, G. V. Dias, T. L. Goan, T. Grance, L. T. Heber-
lein, C. Lin Ho, K. N. Levitt, B. Mukherjee, D. Mansur, K. L. Pon, and
S. E. Smaha, A system for distributed intrusion detection, In COMPCON
Spring '91, Digest of Papers, San Francisco, pp. 170-176.
[227] NIDES: Next-generation Intrusion Detection Expert System,
http://www.sdl.sri.com/projects/nides/
[228] P.A. Porras and P. G. Neumann, EMERALD: conceptual overview state-
ment, http://www.sdl.sri.com/papers/emerald-position1/
[229] P.A. Porras and A. Valdes, Live traffic analysis of TCP/IP gateways, In
Proc. of the 1998 ISOC Symp. on Network and Distributed Systems Secu-
rity (NDSS'98), San Diego, 1998.
[230] P. A. Porras, Experience with EMERALD to date, In First USENIX
Workshop on Intrusion Detection and Network Monitoring, Santa Clara,
CA, April 1999, pp. 73-80.
[231] Internet Security Systems, RealSecure SiteProtector,
http://www.iss.net/products_services/enterprise_protection/
rssite_protector/siteprotector.php
[232] Symantec Enterprise Solutions: ManHunt, http://enterprisesecurity.
symantec.com/products/products.cfm?ProductID=156
[233] nSecure Software, nSecure nPatrol, http://www.nsecure.net/features.htm
[234] Cisco intrusion detection,
http://www.cisco.com/warp/public/cc/pd/sqsw/sqidsz/index.shtml
[235] NFR Intrusion Management System, http://www.nfr.net/products/
[236] eTrust Intrusion Detection Log View,
http://www.cai.com/solutions/enterprise/etrust/intrusion_
detection/product_info/sw3_log_view.htm
[237] NetSecure Log, http://www.netsecuresoftware.com/netsecurenew/
Products/NetSecure_Log/netsecure_log.html
[238] R.K. Iyer, S. Chen, J. Xu, and Z. Kalbarczyk, Security Vulnerabilities: from
analysis to detection and masking techniques, Proceedings of the Ninth
International Workshop on Object-oriented Real-time Dependable Systems,
2004.
[239] E. Jonsson and T. Olovsson, An empirical model of the security intrusion
process, Proc. of the 11th Annual Conference on Computer Assurance, Sys-
tems Integrity and Software Safety, pp. 176-186, 1996.
[240] E. Jonsson and T. Olovsson, A quantitative model of the security intrusion
process based on attacker behavior, IEEE Trans. on Software Engineering,
23(4), pp. 235-245, 1997.
[241] E. Skoudis, Counter Hack: A step-by-step guide to computer attacks and
effective defenses, Prentice Hall, 2002.
[242] J. McDermott, Attack-Potential-Based Survivability Modeling for High-
Consequence Systems, Proc. of the third IEEE International Workshop on
Information Assurance, 2005.
[243] H. Debar and A. Wespi, Aggregation and Correlation of Intrusion-
Detection Alerts, In Proceedings of the 4th International Symposium,
Recent Advances in Intrusion Detection (RAID) 2001, Springer-Verlag
LNCS, 2001.
[244] F. Cuppens, Managing alerts in multi-intrusion detection environments,
In 17th Annual Computer Security Applications Conference (ACSAC),
New Orleans, 2001.
[245] ICAT METABASE, http://icat.nist.gov/icat.cfm. Aug. 2003.
[246] A. Valdes and K. Skinner, Probabilistic Alert Correlation, In Proceedings
of the 4th International Symposium, Recent Advances in Intrusion Detec-
tion (RAID) 2001, Springer-Verlag, LNCS.
[247] A. Valdes and K. Skinner, Adaptive, Model-Based Monitoring for Cy-
ber Attack Detection, In Proceedings of the third International Workshop,
Recent Advances in Intrusion Detection (RAID) 2000, Springer-Verlag
LNCS, 2000.
[248] D. Andersson, M. Fong, and A. Valdes, Heterogeneous Sensor Correla-
tion: A Case Study of Live Traffic Analysis, In IEEE Information Assur-
ance Workshop, 2002.
[249] O. M. Dain and R. K. Cunningham, Building Scenarios from a Hetero-
geneous Alert Stream, In IEEE Workshop on Information Assurance and
Security, June 5-6, 2001.