
XSSDM: Towards Detection and Mitigation of Cross-Site Scripting Vulnerabilities in Web Applications
Mukesh Kumar Gupta, Mahesh Chandra Govil, Girdhari Singh, Priya Sharma

Department of Computer Science & Engineering


Malviya National Institute of Technology, Jaipur-302017, Rajasthan, India
Swami Keshvanand Institute of Technology, Jaipur-302017, Rajasthan, India

Email: mukesh.iitb08@gmail.com, govilmc@yahoo.com, girdharisingh@rediffmail.com, pryashrma@gmail.com



Abstract: With the growth of the Internet, web applications are becoming very popular among users. However, security vulnerabilities in the source code of these applications are rapidly raising the cyber crime rate. These vulnerabilities need to be detected and mitigated before they are exploited in the execution environment. Recently, the Open Web Application Security Project (OWASP) and the Common Weakness Enumeration (CWE) reported Cross-Site Scripting (XSS) as one of the most serious vulnerabilities in web applications. Although many vulnerability detection approaches have been proposed in the past, existing approaches have limitations in terms of false positive and false negative results. This paper proposes a context-sensitive approach based on static taint analysis and pattern matching techniques to detect and mitigate XSS vulnerabilities in the source code of web applications. The proposed approach has been implemented in a prototype tool and evaluated on a public data set of 9408 samples. Experimental results show that the tool based on the proposed approach outperforms existing popular open source tools in the detection of XSS vulnerabilities.
Index Terms: Cross-site scripting (XSS), Static Analysis, Pattern matching, Web Application Security, Context Sensitive

I. INTRODUCTION
Web applications have grown considerably in complexity, from static web sites to dynamic web applications. Dynamic applications accept user input and use it in output statements to generate a dynamic response for end users. Using user input in output statements without validation permits attackers to inject malicious scripts for account hijacking, cookie theft, and web content manipulation. This scenario is termed a cross-site scripting (XSS) attack.
A cross-site scripting attack occurs in three different ways: reflected XSS (Type 1), stored XSS (Type 2), and DOM-based XSS (Type 0). Reflected XSS, considered non-persistent, allows attackers to insert malicious scripts via GET or POST parameters into the response page that the server returns immediately. Stored (persistent) XSS occurs when the attacker's malicious input is stored on the server and later inserted into an output statement. For example, an attacker logs into a forum and stores a comment that contains malicious JavaScript. Later, even if the page is loaded from a clean URL, that script is executed in the client browser. DOM-based XSS occurs when user input containing unsafe JavaScript is allowed to update the document object model.

978-1-4799-8792-4/15/$31.00 ©2015 IEEE
The impact of XSS ranges from a petty nuisance, such as popping up an alert box, to a significant security threat. An attack may steal cookies, send illegal HTTP requests, redirect the user to malicious websites, install malware, and perform other malicious operations. The impact mainly depends on the sensitivity of the data handled and the nature of the security mitigation implemented in the web application. The seriousness of XSS can be understood from the fact that it is consistently ranked in the OWASP [1] top ten vulnerability list. Therefore, detecting these vulnerabilities protects the web application from malicious activities in its actual execution environment.
Researchers have proposed many approaches to detect XSS vulnerabilities in the source code of web applications developed in PHP. Most existing taint-analysis-based approaches [2], [3] consider the source code free of vulnerabilities if the user input is validated through any standard PHP built-in sanitization function (e.g. htmlspecialchars, htmlentities). However, these functions neutralize only some special characters (e.g. <, >) and are not always sufficient. In dynamic web applications, an output statement embeds user input in constant HTML strings to generate the dynamic response. This combination determines an HTML context, in which a standard built-in function is not always sufficient to mitigate the vulnerability.
Consider the code snippets given in Fig. 1, which show that the vulnerability status of source code varies with the context of the user input in the output statement. In both code snippets, the user input is sanitized by a standard built-in function (i.e. htmlspecialchars). Our practical inspection found that the code snippet in Fig. 1(a) is non-vulnerable, as the sanitization function is sufficient to protect against script injection, whereas the code snippet in Fig. 1(b) is vulnerable because the user input is referenced inside an unquoted attribute value, and an input such as 10 onmouseover=alert(attacked) as the value of UserData is able to pop up an alert dialog box. The existing popular open-source vulnerability detection tools, i.e. Pixy [2] and RIPS [3], report both code snippets as non-vulnerable, as these tools do not consider the effect of the HTML context in the output statement. Therefore, it is necessary to consider context sensitivity for precise identification of XSS vulnerabilities.

Fig. 1. Examples to Illustrate the need of Context Sensitivity Consideration
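The effect described for Fig. 1(b) can be reproduced outside PHP. The sketch below is a minimal illustration in Python and is not the code of Fig. 1: html.escape stands in for PHP's htmlspecialchars, and the two div templates are hypothetical output statements. It escapes the input, places it in the template, and lists the attribute names that an HTML parser sees in the result.

```python
from html import escape
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects the attribute names of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.attr_names = []

    def handle_starttag(self, tag, attrs):
        self.attr_names.extend(name for name, _ in attrs)


def rendered_attributes(template, user_input):
    """Escape user_input, place it in template, and parse the result."""
    markup = template.format(escape(user_input, quote=True))
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attr_names


payload = "10 onmouseover=alert(attacked)"  # input quoted in the text above

# Hypothetical templates: unquoted versus double-quoted attribute value.
unquoted = rendered_attributes("<div id={}>content</div>", payload)
quoted = rendered_attributes('<div id="{}">content</div>', payload)

print("unquoted template:", unquoted)
print("quoted template:  ", quoted)
```

Under these assumptions the escaped payload is unchanged, because it contains none of the characters that the escaping function encodes. In the unquoted template the event handler therefore arrives as a separate attribute, while in the quoted template it stays inside the id value.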
In this paper, we present an approach that adds context sensitivity to the existing taint analysis technique, with the sole purpose of identifying XSS vulnerabilities more precisely. We focus on web applications developed in PHP, as it has the largest share among server-side programming languages [4] used in web application development. The contributions of this paper are as follows:

• A context-sensitive approach based on taint analysis and pattern matching techniques that meets the current need for precise detection and mitigation of XSS vulnerabilities.
• An implementation of the proposed approach in a prototype tool, developed using C#.Net, to compare it with existing approaches.

The rest of the paper is organized as follows. Section II presents the literature survey. Section III explains the proposed approach, followed by the results and discussion in Section IV. Conclusions are drawn in Section V.
II. LITERATURE SURVEY
Static-analysis-based vulnerability detection approaches can find vulnerabilities in the source code even before the application is executed for the first time. Hydara et al. (2015) [5] argued that security researchers and developers should concentrate more on eliminating XSS vulnerabilities before web applications are deployed. It is desirable to develop policies and tools that support the development of secure applications.
Pixy [2] is the first open source static source code analyzer tool, written in Java. It uses flow-sensitive, inter-procedural, and context-sensitive data-flow analysis techniques to detect vulnerabilities. However, it can only detect reflected XSS, and it does not support PHP 5.x. It finds XSS and SQL injection vulnerabilities with high false positive rates. RIPS [3] is a static source code analyzer that detects various common vulnerabilities, including XSS and SQL injection. By using different verbosity levels, RIPS can find vulnerabilities with persistent payloads that are stored in files or databases.
Wassermann et al. (2008) [6] presented a static analysis approach to discover XSS vulnerabilities that directly addresses the problem of weak or absent input validation. Their approach integrates work on tainted information flow with string analysis. Input validation is complicated by the fact that a JavaScript interpreter can be invoked in several different ways. G. Agosta et al. (2012) used symbolic execution and string analysis techniques to improve precision by approximating the string values that may appear in a sensitive sink.
Shar et al. (2012) [7] suggested that code auditing is needed even when defensive coding practices and vulnerability detection methods are used. Later, in [8], they presented a two-phase approach, similar to ours, for finding and removing XSS vulnerabilities in Java web applications. Their approach considers the HTML context and strictly follows the OWASP prevention rules [9] for identifying and mitigating XSS vulnerabilities in the source code of web applications developed in Java.
To the best of our knowledge, existing taint-analysis-based open source tools for PHP do not analyze or consider the HTML context. Therefore, these tools cannot detect XSS vulnerabilities in PHP source code precisely.
III. PROPOSED APPROACH
Our approach is motivated by the observation that XSS threats result from improper input validation routines in the source code. Hence, if the context in which user input is referenced is taken into account while detecting XSS vulnerabilities, a more precise detection rate can be achieved. The architecture of the proposed approach is depicted in Fig. 2 and explained in the following subsections. The proposed approach uses static analysis and pattern matching techniques for precise detection of XSS vulnerabilities. Specifically, it proceeds in three phases. First, a list of probable vulnerable output-statements (pv-out) and their associated dependent statements is prepared. Second, the context of the input in each pv-out statement is determined. Third, the input validation mechanism used to sanitize the pv-out statement and its dependent statements is examined to conclude whether the pv-out statement is vulnerable.

2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI)


Fig. 2. An Architecture of Proposed Approach

A. Identification of Plausible Vulnerable Statements
Initially, the entire PHP code file is traversed, and the output-statements (pv-out) that can trigger XSS vulnerabilities are identified. Then, the variables used in each pv-out statement are traced back to the source (i.e. the input statement) to identify the dependent statements. Input statements in PHP involve the superglobal variables $_GET, $_POST, $_REQUEST, $_COOKIE, $_SESSION, $_SERVER, and $_FILES. The outcome of this phase is the list of pv-out statements with their corresponding dependent statements.
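The backward trace of this phase can be sketched in a few lines. The following Python sketch is an illustration under strong simplifying assumptions and is not the XSSDM implementation: it handles only straight-line PHP with simple assignments and echo/print statements, splits statements naively on semicolons, and ignores functions, control flow, and includes. The names find_pv_out and split_statements and the sample PHP source are hypothetical.

```python
import re

# Superglobals treated as taint sources, as listed in the text.
SOURCES = ("$_GET", "$_POST", "$_REQUEST", "$_COOKIE",
           "$_SESSION", "$_SERVER", "$_FILES")

VAR = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")
ASSIGN = re.compile(r"^\s*(\$[A-Za-z_][A-Za-z0-9_]*)\s*=(?!=)\s*(.+)$", re.S)
OUTPUT = re.compile(r"^\s*(echo|print)\b", re.I)


def split_statements(php_source):
    """Naive split on ';' (assumes no ';' inside string literals)."""
    body = php_source.replace("<?php", "").replace("?>", "")
    return [s.strip() for s in body.split(";") if s.strip()]


def find_pv_out(php_source):
    """Return (output statement, dependent statements) pairs that reach a source."""
    statements = split_statements(php_source)
    results = []
    for index, stmt in enumerate(statements):
        if not OUTPUT.match(stmt):
            continue
        wanted = set(VAR.findall(stmt))
        dependents = []
        tainted = any(src in stmt for src in SOURCES)
        # Walk backwards so the latest definition of each variable is met first.
        for earlier in reversed(statements[:index]):
            match = ASSIGN.match(earlier)
            if not match or match.group(1) not in wanted:
                continue
            target, expr = match.groups()
            dependents.insert(0, earlier)
            wanted.discard(target)
            wanted.update(VAR.findall(expr))
            if any(src in expr for src in SOURCES):
                tainted = True
        if tainted:
            results.append((stmt, dependents))
    return results


# Hypothetical sample; only the first echo depends on user input.
sample = """<?php
$name = $_GET['UserData'];
$checked_data = htmlspecialchars($name);
$title = "Welcome";
echo "<div id=" . $checked_data . ">content</div>";
echo $title;
?>"""

for pv_out, deps in find_pv_out(sample):
    print("pv-out:", pv_out)
    for dep in deps:
        print("  depends on:", dep)
```

The sketch reports an output statement only when its backward trace reaches one of the superglobals, so the second echo in the sample, which prints a constant, is not listed.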


B. Context Identification
In this phase, we process each probable vulnerable statement discovered in the previous phase to identify the context associated with it. First, we determine the block context (e.g. body, script, style, etc.) in which the pv-out statement is present. Then, we analyze the constant string in the pv-out statement for statement-level context identification. The proposed rules for context identification, which also cover nested contexts (e.g. a script inside a body block), are as follows.
1) Rule #1: If the user input is referenced in an output-statement whose constant string contains either a complete HTML tag or no HTML tag, then the context of the user input in the output-statement is equal to its block context.
Example:
<!-- <?php $var = $_GET['input'];
echo $var; ?> -->
In this example, the output-statement (i.e. echo) is in a comment block and does not contain any constant string, so the context of the user input in the output-statement is the comment block context.
2) Rule #2: If the user input is referenced in an output-statement whose constant string begins with a special tag (e.g. anchor, style, script, etc.) and ends with a double-quote (i.e. ="), single-quote (i.e. =') or no-quote (i.e. =) symbol, then the input is referenced in a special tag attribute value context. We combine this with the block context to represent the user-input context.
Example:
<body> <?php $var=$_GET['input'];
echo "<a href=".$var.">content</a>";
?> </body>
In this example, the output-statement (i.e. echo) is in a body block, and the input is referenced in the anchor tag as a no-quote attribute value, so the context is the body_Anchor_NQ_Attr_Val context.
3) Rule #3: If the user input is referenced in an output-statement whose constant string begins with any HTML tag except a special tag (e.g. style, script, etc.) and ends with a double-quote (i.e. ="), single-quote (i.e. =') or no-quote (i.e. =) symbol, then the input is referenced in a simple tag attribute value context.
Example:
<?php $var = $_GET['input'];
echo "<div id='".$var."'>content</div>"; ?>
In this example, the output-statement (i.e. echo) is in a body block, and the user input is referenced in a tag as a single-quote attribute value, so the context is the body_Tag_SQ_Attr_Val context.
4) Rule #4: If the user input is referenced in an output-statement whose constant string begins with any HTML tag, contains an event handler, and ends with a double-quote (i.e. =") or single-quote (i.e. =') symbol, then the input is referenced in an event attribute value context.
Example:
<?php $var=$_GET['input'];
echo "<div id=\"abc\" onmouseover=\"".$var."\">content</div>"; ?>
Here, the user-input context in the output-statement is the body_Event_DQ_Attr_Val context.
5) Rule #5: If the user input is referenced in an output-statement whose constant string begins with any HTML tag and ends with anything other than a double-quote (i.e. ="), single-quote (i.e. =') or no-quote (i.e. =) symbol, then the input is referenced in an attribute name context.
Example:
<body> <?php $var=$_GET['input'];
echo "<div ".$var."=\"bob\">content</div>"; ?></body>
In this example, the output-statement (i.e. echo) is in a body block, and the context is the body_Attr_Name context.
6) Rule #6: If the user input is referenced in an output-statement whose constant string contains only the < symbol, then the input is referenced in an HTML tag name context.
Example:
<?php $var=$_GET['input'];
echo "<".$var." href=\"www.mweb.in\">content</a>"; ?>
In this example, the output-statement (i.e. echo) is in a body block, and the user-input context is the body_Tag_Name context.
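The six rules above can be read as a small classifier over the constant string that precedes the user input. The Python sketch below is one possible reading, not the tool's actual rule engine. The function name identify_context, the regular expressions, and the labels Style and Script for special tags are assumptions; only the context names that appear in the examples above come from the text. The sketch also assumes that no > character occurs inside an attribute value.

```python
import re

# Display names for "special" tags; only Anchor appears in the text,
# the other two labels are assumed for illustration.
SPECIAL_TAGS = {"a": "Anchor", "style": "Style", "script": "Script"}

# Attribute name, "=", and an optional opening quote at the end of the prefix.
ATTR_VALUE_START = re.compile(r"""([A-Za-z_:][-\w:.]*)\s*=\s*(['"]?)$""")
TAG_NAME = re.compile(r"<\s*([A-Za-z][A-Za-z0-9]*)")


def identify_context(prefix, block="body"):
    """Classify the context of user input that directly follows `prefix`."""
    text = prefix.rstrip()
    open_at = text.rfind("<")
    # Rule 1: no HTML tag, or the last tag is already closed.
    if open_at == -1 or text.rfind(">") > open_at:
        return block
    tag_text = text[open_at:]
    # Rule 6: the constant string holds only the "<" symbol.
    if tag_text == "<":
        return f"{block}_Tag_Name"
    name_match = TAG_NAME.match(tag_text)
    tag = name_match.group(1).lower() if name_match else ""
    value_match = ATTR_VALUE_START.search(tag_text)
    if value_match:
        attr, quote = value_match.groups()
        style = {'"': "DQ", "'": "SQ", "": "NQ"}[quote]
        # Rule 4: event-handler attribute with a quoted value.
        if attr.lower().startswith("on") and style != "NQ":
            return f"{block}_Event_{style}_Attr_Val"
        # Rule 2: attribute value of a special tag.
        if tag in SPECIAL_TAGS:
            return f"{block}_{SPECIAL_TAGS[tag]}_{style}_Attr_Val"
        # Rule 3: attribute value of any other tag.
        return f"{block}_Tag_{style}_Attr_Val"
    # Rule 5: inside a tag but not at the start of an attribute value.
    return f"{block}_Attr_Name"


# Constant-string prefixes taken from the rule examples above.
examples = [
    ("", "comment"),
    ("<a href=", "body"),
    ("<div id='", "body"),
    ('<div id="abc" onmouseover="', "body"),
    ("<div ", "body"),
    ("<", "body"),
]
for prefix, block in examples:
    print(repr(prefix), "->", identify_context(prefix, block))
```

Checking the event-handler rule before the tag rules is a design choice of this sketch: an event attribute is treated as the more specific context even when it sits on a special tag.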
C. Identify & Validate Escaping Mechanisms
In this phase, the escaping mechanism is identified and validated. First, a list is constructed of the sanitization functions or escaping mechanisms used in each pv-out statement and its dependent statements. If more than one sanitizing statement is available for a variable, the latest one (i.e. the sanitization performed just before the variable is used in the pv-out statement) is chosen. Based on the context of the pv-out statement, a list of safe functions is constructed that are capable of securing the pv-out statement against the XSS threat. The identified sanitization function is validated against this list of safe functions. If the sanitization function is validated as sufficient, the pv-out statement is marked as non-vulnerable; otherwise, it is marked as vulnerable. If the sanitization function is missing, the statement is also marked as vulnerable. For a vulnerable statement, a list of safe functions is suggested, from which the developer can select the best-suited sanitization function to mitigate the detected XSS vulnerability. Manual intervention is preferred over automated replacement in the code to ensure that a well-matched sanitization function is used and to encourage good programming practice.
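The validation step amounts to a lookup of the latest sanitization function in a per-context list of safe functions. The Python sketch below illustrates this under stated assumptions. The paper does not publish its safe-function lists, so SAFE_FUNCTIONS is hypothetical: only two facts are taken from the text, namely that htmlspecialchars suffices in the body context and does not suffice in an unquoted attribute value. The names latest_sanitizer and validate are also hypothetical.

```python
import re

# Illustrative safe-function lists per context. Apart from the two facts
# stated in the lead-in, every entry here is an assumption.
SAFE_FUNCTIONS = {
    "body": {"htmlspecialchars", "htmlentities"},
    "body_Tag_NQ_Attr_Val": {"intval", "floatval"},
}

CALL = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def latest_sanitizer(dependents, variable):
    """Return the function applied in the last assignment to `variable`."""
    for stmt in reversed(dependents):
        target, _, expr = stmt.partition("=")
        if target.strip() == variable:
            call = CALL.search(expr)
            return call.group(1) if call else None
    return None


def validate(context, dependents, variable):
    """Mark a pv-out statement and suggest safe functions when vulnerable."""
    safe = SAFE_FUNCTIONS.get(context, set())
    used = latest_sanitizer(dependents, variable)
    if used is not None and used in safe:
        return "non-vulnerable", []
    return "vulnerable", sorted(safe)


# Hypothetical dependent statements of a pv-out statement.
deps = [
    "$name = $_GET['UserData']",
    "$checked_data = htmlspecialchars($name)",
]

print(validate("body", deps, "$checked_data"))
print(validate("body_Tag_NQ_Attr_Val", deps, "$checked_data"))
```

A missing sanitization function and an insufficient one both lead to the vulnerable verdict, matching the text; the suggestions are returned rather than applied, which mirrors the manual mitigation step.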
D. Example
Let us apply the proposed approach to the code snippets presented in Fig. 1. In the first step, taint analysis is performed and the data dependency statements are identified, as shown in Fig. 3. Recall that a pv-out statement prints the input on the output interface and may reference the input in an HTML context. For both code snippets, line 47 (the others are comment lines) is identified as a pv-out statement. The body block is the default block context for these statements. For the code snippet in Fig. 3(a), the context of the pv-out statement is the body block context (by Rule #1). For the code snippet in Fig. 3(b), the context of the pv-out statement is the body_Tag_NQ_Attr_Val context (by Rule #3). Next, the variable used inside the pv-out statement (i.e. $checked_data) is extracted, and the data dependency statements are searched for the statement in which sanitization is performed on the $checked_data value. In line 46, the sanitization function htmlspecialchars() is used. The extracted function is matched against the list of safe functions generated for the context associated with the pv-out statement. Finally, the pv-out statements in Fig. 3(a) and Fig. 3(b) are determined to be non-vulnerable and vulnerable, respectively.
IV. RESULTS AND ANALYSIS
The test subject for the evaluation was taken from [10]. It contains 9408 sample files written in PHP, of which 5600 are safe and 3808 are unsafe, organized into different categories. Pixy and RIPS were chosen for comparison because they are popular, detect XSS vulnerabilities efficiently, and, being open source, are easily accessible. We selected 7056 files from the test subject by excluding the object-oriented code files, as neither tool supports the object-oriented paradigm. In this data set, 2856 files have at least one vulnerability and the remaining 4200 files do not contain any XSS vulnerability (as shown in Table I). We checked these files with each tool and recorded its detection results. The results of RIPS were obtained by selecting verbosity level 1 (user tainted only) and setting the vulnerability type to cross-site scripting. The results produced by Pixy and RIPS are promising, but for certain test cases they were either false positives or false negatives. Table II summarizes the outcome of our experiment. It shows that our approach can detect XSS vulnerabilities in different HTML contexts very precisely. As shown in Fig. 4, the ratio of positive (unsafe) to negative (safe) results for Pixy and RIPS deviates considerably from the actual results.
TABLE I
ACTUAL FILE VULNERABILITY STATUS
Dataset    # of Safe Sample Files    # of Unsafe Sample Files
XSS        4200                      2856

TABLE II
FALSE RESULTS (%) OF EACH TOOL
                          Vulnerabilities Detection Tools
Results (# of Samples)    Pixy      RIPS      XSSDM
False Positive            34.45%    0%        0%
False Negative            0%        35.88%    0%
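The paper does not state how the percentages in Table II are computed. As a hedged illustration only, the Python sketch below counts false positives and false negatives from per-file verdicts and expresses each as a share of all samples. That denominator is an assumption, and the four files and their verdicts are hypothetical, not taken from the evaluated data set.

```python
def false_result_rates(actual, reported):
    """Return (false positive %, false negative %) over all samples.

    `actual` and `reported` map a file name to True when the file is,
    or is reported as, vulnerable.
    """
    total = len(actual)
    false_pos = sum(1 for f in actual if reported[f] and not actual[f])
    false_neg = sum(1 for f in actual if actual[f] and not reported[f])
    return 100.0 * false_pos / total, 100.0 * false_neg / total


# Hypothetical verdicts for four files; not taken from the evaluated data set.
actual = {"a.php": True, "b.php": False, "c.php": True, "d.php": False}
reported = {"a.php": True, "b.php": True, "c.php": False, "d.php": False}

fp_rate, fn_rate = false_result_rates(actual, reported)
print(f"false positives: {fp_rate:.2f}%, false negatives: {fn_rate:.2f}%")
```

Dividing false positives by the number of safe files and false negatives by the number of unsafe files is an equally common convention; the choice changes the percentages but not the counts.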

The standard sanitization functions may provide protection in one HTML context but not in another. Therefore, the source code may be vulnerable to XSS attacks even in the presence of standard sanitization functions such as htmlentities and htmlspecialchars, as they are not sufficient to eliminate XSS threats in every context. Our tool identifies the context in the output-statement and then checks whether the sanitization is sufficient to prevent XSS attacks. Based on the status (i.e. vulnerable or non-vulnerable), the tool further provides suggestions to the developer for mitigating the XSS threats. From Tables I and II, it can be interpreted that XSSDM is more efficient than the existing open source tools for the considered data set.

Fig. 3. Taint Analysis Results

Fig. 4. Vulnerability Detection Results

Table III presents a list of some functions for which the existing tools generated false results. The table was compiled by examining the test subject. We found that functions such as addslashes, http_build_query, settype, mysql_real_escape_string, filter_var (introduced in PHP 5 and therefore not supported by the tools), rawurlencode, and urlencode (both of which encode the URL) produce false results.
TABLE III
LIST OF PHP SANITIZATION / VALIDATION FUNCTIONS
Sanitization Functions
Cast value to numeric: (float), (int), += 0, += 0.0, $tainted + 0, settype(), floatval(), intval()
Basic sanitization functions: addslashes(), htmlentities(), htmlspecialchars(), http_build_query(), rawurlencode(), urlencode(), mysql_real_escape_string()
Filter a variable with a specified filter: filter_var()

Similar to existing open source tools (e.g. Pixy, RIPS), our tool cannot work with object-oriented code. Also, the approach is intended for PHP server programs but can be easily extended to support other server-side programming languages. Although the detection of XSS is automated, mitigation requires manual intervention, as the user selects the appropriate sanitization function from the list of suggested functions.
V. CONCLUSION AND FUTURE WORK
Static-analysis-based detection techniques use a set of predefined rules to detect vulnerabilities in source code without executing it. In this paper, we incorporated context sensitivity into the existing static taint analysis technique for the sole purpose of improving detection precision. An implementation of the proposed approach, the prototype tool XSSDM, has been tested against a public data set. The experimental results support the precision and efficiency of the proposed approach.
Although the experimental results of the XSSDM tool on the considered data set are promising, the tool still needs to be tested on real-world web applications. In future, the proposed work will be extended to support the object-oriented paradigm, as it is an urgent need for current web application development.
REFERENCES
[1] OWASP, "Usage top 10 2013," https://www.owasp.org/index.php, 2015, accessed: 2015-04-09.
[2] N. Jovanovic, C. Kruegel, and E. Kirda, "Pixy: a static analysis tool for detecting web application vulnerabilities," pp. 258-263, May 2006.
[3] J. Dahse, "A vulnerability scanner for different kinds of vulnerabilities," http://rips-scanner.sourceforge.net, 2015, accessed: 2015-04-09.
[4] W3Techs, "Usage of server-side programming languages for websites," http://w3techs.com/technologies/overview/programming_language/all, 2015, accessed: 2015-04-09.
[5] I. Hydara, A. B. M. Sultan, H. Zulzalil, and N. Admodisastro, "Current state of research on cross-site scripting - a systematic literature review," Information and Software Technology, vol. 58, no. 0, pp. 170-186, 2015.
[6] G. Wassermann and Z. Su, "Static detection of cross-site scripting vulnerabilities," New York, NY, USA, pp. 171-180, 2008.
[7] L. Shar and H. Tan, "Auditing the xss defence features implemented in web application programs," Software, IET, vol. 6, no. 4, pp. 377-390, August 2012.
[8] L. K. Shar and H. B. K. Tan, "Automated removal of cross site scripting vulnerabilities in web applications," Inf. Software Technology, vol. 54, no. 5, pp. 467-478, May 2012.
[9] OWASP, "Xss (cross site scripting) prevention cheat sheet," https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet, 2015, accessed: 2015-04-09.
[10] B. S. Aurelien DELAITRE, "Php vulnerabilities test suite," https://github.com/stivalet/PHP-Vulnerability-test-suite, 2015, accessed: 2015-04-09.


VI. APPENDIX
As an illustration, some PHP source code samples, the HTML context of their output-statements, and their vulnerability status are shown in Table IV. It shows that a standard PHP sanitization function (e.g. htmlspecialchars) is not sufficient to mitigate XSS vulnerabilities in different HTML contexts.
