Keheliya Gallaba, Student Member, IEEE, and Shane McIntosh, Member, IEEE
Abstract—Continuous Integration (CI) is a popular practice where software systems are automatically compiled and tested as changes
appear in the version control system of a project. Like other software artifacts, CI specifications require maintenance effort. Although
there are several service providers like TRAVIS CI offering various CI features, it is unclear which features are being (mis)used. In this
paper, we present a study of feature use and misuse in 9,312 open source systems that use TRAVIS CI. Analysis of the features that
are adopted by projects reveals that explicit deployment code is rare; 48.16% of the studied TRAVIS CI specification code is instead
associated with configuring job processing nodes. To analyze feature misuse, we propose HANSEL, an anti-pattern detection tool for
TRAVIS CI specifications. We define four anti-patterns, and HANSEL detects anti-patterns in the TRAVIS CI specifications of 894
projects in the corpus (9.60%), achieving a recall of 82.76% in a sample of 100 projects. Furthermore, we propose GRETEL, an
anti-pattern removal tool for TRAVIS CI specifications, which can remove 69.60% of the most frequently occurring anti-pattern
automatically. Using GRETEL, we have produced 36 accepted pull requests that remove TRAVIS CI anti-patterns automatically.
Index Terms—Continuous integration, Anti-patterns, Mining software repositories
© 2018 IEEE. Author pre-print copy. The final publication is available online at: https://dx.doi.org/10.1109/TSE.2018.2838131
TRAVIS CI service. Through empirical analysis of the CI configuration files of the studied projects, we address the following research questions about feature usage:
• RQ1 What are the commonly used languages in TRAVIS CI projects?
Despite being the default TRAVIS CI language, RUBY is only the sixth most popular language in our data set. NODE.JS is the most popular language in our corpus.
• RQ2 How are statements in CI specifications distributed among different sections?
We find that 48.16% of the studied TRAVIS CI configuration code applies to build job processing nodes. Explicit deployment code is rare (2%). This shows that although developers are using tools to integrate changes into their repositories, they rarely use these tools to implement continuous delivery [16], the process of automatically releasing code that integrates cleanly.
• RQ3 Which sections in the CI specifications induce the most churn?
Most CI configuration files, once committed, rarely change. The sections that are related to the configuration of job processing nodes account for the most modifications. In the projects that are modified, all sections are likely to be modified an equal number of times. Similar to RQ2, this again suggests that deployment-related features in CI tools are not being used.
To study misuse, we define four anti-patterns: (1) redirecting scripts into interpreters (e.g., curl https://install.sandstorm.io|bash); (2) bypassing security checks (e.g., setting the ssh_known_hosts property to unsafe values); (3) using irrelevant properties; and (4) using commands in an incorrect phase (e.g., using install phase commands in the script phase). Using HANSEL, our tool for detecting anti-patterns in .travis.yml files, we address the following research question:
• RQ4 How prevalent are anti-patterns in CI specifications?
HANSEL detects at least one anti-pattern in the CI specifications of 894 projects in the corpus (9.60%), and achieves a recall of 82.76% in a sample of 100 projects.
Using GRETEL, our anti-pattern removal tool for CI configuration code, we address the following research questions:
• RQ5 Can anti-patterns in CI specifications be removed automatically?
Yes, GRETEL can remove the detected cases of the most frequent anti-pattern automatically with a precision of 69.60%. This increases to 97.20% if a post hoc manual inspection phase is included.
• RQ6 Are automatic removals of CI anti-patterns accepted by developers?
Yes, we submitted 174 pull requests that contain GRETEL-generated fixes, of which developers have: (1) responded to 49 (response rate of 28.16%); and (2) accepted 36 (20.69% of submitted pull requests and 73.47% of pull requests with responses).
Our study of CI feature usage leads us to conclude that future CI research and tooling would have the most immediate impact if it targets the configuration of job processing nodes. Moreover, our study of misuse of CI shows that anti-patterns that threaten the correctness, performance, and security of build jobs are impacting a considerable proportion of TRAVIS CI users (9.60%). HANSEL and GRETEL can detect and remove these anti-patterns accurately, allowing teams to mitigate or avoid the consequences of misusing CI features.
Paper organization. The remainder of the paper is organized as follows. Section 2 describes the modern CI process. Section 3 outlines the design of our study of CI feature usage, while Section 4 presents the results. Sections 5 and 6 outline the motivation for and design of our study of CI misuse, respectively, while Section 7 presents the results. Section 8 discusses the broader implications of our results. Section 9 discloses the threats to the validity of our study. Section 10 situates this paper with respect to the related work. Finally, Section 11 draws conclusions.
2 MODERN CI PROCESS
The main goal of CI is automating the integration of software as soon as it is developed so that it can be released rapidly and reliably [11]. Figure 1 provides an overview of the cycle. We describe each step below.
• Build-triggering events: In projects that adopt CI, the cycle begins with a build-triggering event. These events can occur in the development, review, or integration stages. While a feature is being developed, builds can be triggered manually by the developer to try out the feature under development. Later, when the code is submitted to be reviewed, builds are triggered to avoid wasting reviewers' time on patches that do not compile. Finally, when the change is integrated into the project VCS, a build is triggered to ensure that the change does not introduce regression errors.
• Build job creation service: When a build-triggering event occurs, a build job creation node will add a job to the queue of pending build jobs if certain criteria are met. For example, in TRAVIS CI, developers can specify the VCS branches on which commits should (or should not) generate build jobs.
• Build job processing service: Build jobs in the pending queue will be allocated to build job processing nodes for processing. The job processing node will first download the latest version of the source code and apply the change under consideration. Next, the job processing node will initiate the build process, which will compile the system (if necessary), execute a suite of automated unit and integration tests to check for regression, and in the case of Continuous Delivery (CD) [16], make the updated system available for users to download or interact with. Finally, the job processing node will add the results of the build job to the reporting queue.
• Build job reporting service: In this final stage, build job results in the reporting queue will be communicated to the development team. Reporting preferences can be configured such that particular recipients receive notifications when build jobs are marked as successful, unsuccessful, or irrespective of the job status. Traditionally, these results were shared via mailing lists or IRC channels; however, other communication media are also popular nowadays (e.g., Slack, web dashboards).
Operating and maintaining CI infrastructure is a burden for modern software organizations. As organizations grow,
[Fig. 1 (image not recovered). Labels: Code Review System, Version Control System, Integrate, Load Balancer, Schedule Builds, Build + Test, Report Results, Slack, Web Dashboard.]
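The cycle described in Section 2 can be illustrated with a small simulation. This is a minimal sketch rather than how TRAVIS CI is implemented: the function name run_ci_cycle, the event dictionaries, and the build callback are hypothetical names introduced only for illustration.

```python
from collections import deque


def run_ci_cycle(events, allowed_branches, build):
    """Push build-triggering events through creation, processing, and reporting.

    events: iterable of dicts with "commit" and "branch" keys (hypothetical shape).
    allowed_branches: branches on which commits should generate build jobs.
    build: callable that takes a job and returns True if the build passes.
    """
    pending = deque()    # queue of pending build jobs
    reporting = deque()  # queue of finished build job results

    # Build job creation: enqueue a job only if the branch criterion is met.
    for event in events:
        if event["branch"] in allowed_branches:
            pending.append(event)

    # Build job processing: run the build and add the result to the reporting queue.
    while pending:
        job = pending.popleft()
        status = "passed" if build(job) else "failed"
        reporting.append((job["commit"], status))

    # Build job reporting: a real service would notify recipients here;
    # this sketch simply returns the results.
    return list(reporting)


if __name__ == "__main__":
    demo_events = [
        {"commit": "a1", "branch": "master"},
        {"commit": "b2", "branch": "wip"},
    ]
    print(run_ci_cycle(demo_events, {"master"}, lambda job: True))
```

Only the commit on an allowed branch reaches the processing stage, which mirrors the branch criterion that the build job creation service applies.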
2.1.1 Node Configuration
This section specifies how CI nodes should be prepared before building commences.
• Build job creation nodes: In this subsection, nodes that are responsible for creating build jobs can be configured. For example, the branches property specifies the branches where commits should create build jobs.
• Build job processing nodes: In this subsection, nodes that are responsible for processing build jobs can be configured. For example, since different programming languages have different basic toolchain requirements (e.g., PYTHON projects require the python interpreter to be installed, while NODE.JS projects require the node interpreter to be installed), specifying the language property allows the TRAVIS CI runtime to configure processing nodes appropriately. Moreover, if there are libraries and services that need to be installed on the job processing nodes prior to build execution, they can be specified using the services property. The environment variables that need to be set prior to build execution can be configured using the env property.
• Build job reporting nodes: In this subsection, nodes that are responsible for reporting on the status of build jobs can be configured. Notification services, such as e-mail and Slack, are configured to notify the development team about the status of build jobs. For example, using the notifications property, TRAVIS CI users can specify the list of recipients of build status reports (recipients) and the scenarios under which they should be notified (on_success, on_failure).
2.1.2 Build Process Configuration
This section comprises the install, script, and deploy phases, each of which consists of sub-phases. These sub-phases check pre- and post-conditions before (before_X) and after (after_X) executing the main phase.
• The install phase prepares job processing nodes for build job execution, and has install_apt_addons, before_install, and install sub-phases. Unless otherwise specified, the phase runs a default command for the specified programming language. For example, TRAVIS CI runs npm install by default for NODE.JS projects.
• The script phase executes the bulk of the build job, and has before_script, script, after_success, after_failure, and after_script sub-phases. In this phase, systems are compiled, tested, scanned by static code analyzers, and packaged for deployment. Similar to the install phase, script runs default commands for the specified programming language, unless otherwise specified. For example, TRAVIS CI runs npm test by default for NODE.JS projects.
• The deploy phase makes newly produced deliverables visible to system users, and has before_deploy, deploy, and after_deploy sub-phases. When this phase is present, the CI process is transformed into a continuous delivery process [16], where regression-free changes are released to system users.
2.2 Research Questions
As a community, knowing how CI is being used in reality is important for several reasons. First, CI service providers will be able to make data-driven decisions about how to evolve their products, e.g., where to focus feature development to maximize (or minimize) impact. Second, researchers will be able to target elements of CI that are of greater impact to users of CI. Finally, individuals and companies who provide products and services that depend on or are related to CI (such as HANSEL and GRETEL) will be able to tailor their solutions to fit the needs of target users.
Hilton et al. [15] analyzed a broad spectrum of properties of CI specifications. We aim to complement the prior work by studying how features within CI specifications are used to configure their build nodes and jobs. To do so, we conduct an empirical study of 9,312 GITHUB projects that use TRAVIS CI, addressing the following research questions:
• RQ1 What are the commonly used languages in TRAVIS CI projects?
We first aim to understand whether projects that are developed in certain languages are more common among the TRAVIS CI user base. This will help future tool developers and researchers studying CI processes to identify potential target languages and technologies.
• RQ2 How are statements in CI specifications distributed among different sections?
To develop an understanding of the spread of CI configuration code across sections, we are interested in the quantity of code that appears within each section.
• RQ3 Which sections in the CI specifications induce the most churn?
While RQ2 provides a high-level view of which sections in CI specifications require the most code, it does not help in understanding which sections require the most change. To complete the picture, we set out to study how churn is dispersed among the sections.
3 CI USAGE STUDY DESIGN
In this section, we provide our rationale for studying GITHUB projects and explain our data filtering approach.
3.1 Corpus of Candidate Systems
In order to arrive at reliable conclusions, it is important to select a large and diverse set of software projects. With this in mind, we begin our analysis with systems that are hosted on the popular GITHUB platform.
We start by querying the public GITHUB dataset on Google BigQuery (footnote 8) for project activity (i.e., the number of commits) and project size heuristics (i.e., the number of files). This query returns 4,022,651,601 commits and 2,133,880,097 files spanning 2,991,522 GITHUB repositories.
8. https://cloud.google.com/bigquery/public-data/github
3.2 Data Filtering
While GITHUB is a large corpus, it is known to contain projects that have not yet reached maturity [18]. To prevent the bulk of immature projects from impacting our conclusions, we first apply a set of filters to our GITHUB data. Figure 3 provides an overview of our data filtering approach. We describe each step in the approach below.
[Fig. 3 (image not recovered): data filtering pipeline. Google BigQuery (2,991,522 projects) → DF1: Select Active and Large Projects (145,876) → DF2: Select Projects that use TRAVIS CI (56,947) → DF3: Select Non-Forked Projects (12,153) → DF4: Select Non-Duplicated Projects (9,312 subject systems).]
DF1: Select Active and Large Projects
We first remove inactive projects from our corpus. To detect such projects, Figure 4 plots threshold values against the number of surviving systems. Selecting a threshold of 100 commits reduces the corpus to 574,325 projects.
[Fig. 4: Threshold plot for commit activity. x-axis: Threshold (# of commits); y-axis: # of Projects.]
Next, we remove small projects from our corpus. To detect such projects, Figure 5 again plots threshold values against the number of surviving systems. Selecting a threshold of 500 files further reduces the corpus to 145,876 projects.
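The DF1 step (thresholding on commit and file counts) can be sketched as follows. This is an illustrative sketch only: the record layout, the function names, and the choice of an inclusive comparison (>=) are assumptions, since the text does not state whether the thresholds are inclusive.

```python
def threshold_curve(projects, key, thresholds):
    """Count the projects that survive each candidate threshold.

    This is the kind of data behind a threshold plot: for each threshold t,
    how many projects have at least t commits (or files).
    """
    return {t: sum(1 for p in projects if p[key] >= t) for t in thresholds}


def select_active_and_large(projects, min_commits=100, min_files=500):
    """Apply both DF1 criteria: enough commits (active) and enough files (large)."""
    return [
        p for p in projects
        if p["commits"] >= min_commits and p["files"] >= min_files
    ]


if __name__ == "__main__":
    # Hypothetical toy records; the real study queries commit and file counts
    # from the public GitHub dataset on Google BigQuery.
    toy = [
        {"name": "a", "commits": 5, "files": 10},
        {"name": "b", "commits": 150, "files": 20},
        {"name": "c", "commits": 1200, "files": 900},
    ]
    print(threshold_curve(toy, "commits", [10, 100, 1000]))
    print([p["name"] for p in select_active_and_large(toy)])
```

Computing the curve before fixing a threshold shows how sensitive the surviving corpus size is to that choice.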
[Fig. 5: Threshold plot for project size. x-axis: Threshold (# of files); y-axis: # of Projects.]
DF2: Select Projects that use TRAVIS CI
We focus our study on users of the TRAVIS CI service for two reasons. First, while other CI services are available, TRAVIS CI is the most popular, accounting for roughly 50% of the CI market on GITHUB (footnote 7). CIRCLECI ranks second with roughly 25%, while JENKINS (a CI tool rather than a service) ranks third with roughly 10%. Second, since other CI services have a similar configuration syntax (YAML-based DSL), it is likely that our observations will be applicable to other CI services. We elaborate on this in Section 8.4.
To identify GITHUB projects that use TRAVIS CI, we check for a .travis.yml configuration file in the root directory. This filter reduces the corpus to 56,947 projects.
DF3: Select Non-Forked Projects
Forking (footnote 9) allows GITHUB users to duplicate a repository in order to make changes without affecting the original project. Developers working on forked repositories can submit Pull Requests to contribute changes to the original project.
Forks should not be analyzed individually, since they are primarily duplicates of the forked repository. If forks are not removed from the corpus, the same development activity will be counted multiple times. We detect forks using the GITHUB API. Repositories that are flagged as forks according to this API are removed from our corpus. This filter reduces the corpus to 12,153 projects.
9. https://help.github.com/articles/fork-a-repo/
DF4: Select Non-Duplicated Projects
The DF3 filter only removes explicitly forked repositories that were created using the GITHUB fork feature. Repositories may also be re-uploaded under a different owner and/or name without using the fork feature.
To detect these duplicated repositories, we extract the list of commit hashes (SHAs) in each of the candidate repositories that survive the prior filters. If any two repositories share more than 70% of the same commit SHAs, we label both repositories as duplicates. Since we cannot automatically detect which of the duplicated repositories is the original repository and which ones are the copies, we remove all duplicated repositories from our corpus. 9,312 candidate repositories survive this final filter and are selected as subject systems for the following analyses.
To check if the selected similarity threshold for filtering out duplicated projects is suitable, for each project that survives the DF1–DF3 filters, we compute all pairwise commit similarity percentages. Then, for each project, we select the maximum similarity percentage. Figure 6 shows the histogram of these maximum similarity percentages. We observe a largely bimodal distribution where many projects are either distinct (similarity = 0%) or almost identical to another project in terms of commit SHAs (similarity ≈ 100%). Indeed, a more stringent 60% threshold only removes 140 more projects (1.50%) and a more lenient threshold of 80% only adds 73 projects (0.78%), indicating that the sample does not depend heavily upon the threshold value.
[Fig. 6: A histogram of the maximum commit similarity among the candidate repositories. x-axis: Max. Similarity with Another Project; y-axis: Project Count.]
3.3 Domain of the Subject Systems
To understand the domain of subject systems, we need to classify each subject system by inspecting its source and documentation. Since this is impractical in our context, we analyze a randomly selected subset of 152 subject systems. Table 1 shows that our corpus contains a broad variety of subject systems, including games, and web and mobile apps.
TABLE 1: Domains in a sub-sample of our subject systems.
Type                                # Projects   Percentage
Web Application                     23           15.13
Graphics/Visualization              21           13.82
Application Framework/Library       15            9.87
Development Tools                   15            9.87
Communication/Collaboration Tool    13            8.55
DevOps                              10            6.58
Scientific Computing                10            6.58
Games/Game Engine                    8            5.26
Mobile Application                   7            4.61
Other                               30           19.74
Total                               152          100.00
4 RESULTS OF CI USAGE STUDY
In this section, we present the results of our CI usage study with respect to our three research questions. For each research question, we first present our approach for addressing it followed by the results that we observe.
(RQ1) What are the commonly used languages in TRAVIS CI projects?
Approach. We identify the commonly used languages in TRAVIS CI projects by detecting the setting of the language property in the TRAVIS CI configuration file.
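The RQ1 approach (reading the language property of each configuration file) can be approximated with a few lines of code. This is a simplified sketch that assumes the property appears as a top-level `language: value` line; a full YAML parser would be more robust. The function names and the "(unspecified)" label are introduced here for illustration.

```python
import re
from collections import Counter

# Top-level "language:" key (no indentation), optionally quoted value.
LANGUAGE_RE = re.compile(r"^language:[ \t]*['\"]?([^'\"#\s]+)", re.MULTILINE)


def declared_language(travis_yml_text):
    """Return the lower-cased language value, or None if none is declared."""
    match = LANGUAGE_RE.search(travis_yml_text)
    return match.group(1).lower() if match else None


def count_languages(configs):
    """Tally declared languages across the text of many .travis.yml files.

    Files without a language property are kept in a separate bucket so that
    they can be reported apart from files that explicitly declare a language.
    """
    return Counter(declared_language(text) or "(unspecified)" for text in configs)


if __name__ == "__main__":
    samples = [
        "language: node_js\nnode_js:\n  - '6'\n",
        "language: Ruby\n",
        "script: make test\n",
    ]
    print(count_languages(samples).most_common())
```

Keeping undeclared files in their own bucket makes it possible to report them separately from projects that are labelled explicitly.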
[Table 2, rows recovered from the extraction]
Language       # Projects   % of corpus
C++            995          10.69
RUBY           811           8.71
C              702           7.54
GO             290           3.11
OBJECTIVE-C    250           2.68
ANDROID        195           2.09
OTHER          987          10.60
Results. Table 2 shows the ten most popular languages in our corpus of studied projects. Hilton et al. [15] explored the rate at which users of particular languages adopt CI, observing higher rates of adoption in projects that are primarily implemented using dynamic languages. Six of the top ten languages with the highest rates of CI adoption [15] appear in our list, i.e., JAVASCRIPT (NODE.JS in our setting), RUBY, GO, PYTHON, PHP, and C++. The four languages from the Hilton et al. setting that do not appear in our sample (i.e., SCALA, COFFEESCRIPT, CLOJURE, and EMACS LISP) are infrequently used, altogether appearing in 5.8% of the projects in the top ten languages in their setting.
When compared with the language statistics released by GITHUB (footnote 10), we find nine of our top ten languages are among the ten most popular languages on GITHUB (by opened pull requests). ANDROID does not appear in the list by GITHUB because it is grouped with Java projects. C# appears in GITHUB's top ten, but not ours. Although not shown, C# appears in 149 projects, and would rank eleventh.
Observation 1: Despite being the default TRAVIS CI language, RUBY is not the most popular language in our corpus of studied systems. Table 2 shows that 811 projects are labelled explicitly as RUBY projects, making RUBY the sixth ranked language in our corpus. There are an additional 421 projects that do not specify a language property. In this case, the TRAVIS CI execution environment assumes that the project is using RUBY. Even if all 421 of these unlabelled projects are indeed RUBY projects, this would only increase the RUBY project count to 1,232, which would rank third.
Observation 2: NODE.JS is the most popular language in our corpus of studied systems. Table 2 shows that there are 1,460 projects (16%) that are labelled explicitly as NODE.JS projects in our corpus. Our study is not the only context in which the popularity of NODE.JS has been observed. For example, according to a recent StackOverflow survey (footnote 11) of 64,000 developers, NODE.JS was the most commonly used framework. Moreover, the recent left-pad debacle, where the removal of an NPM package for left-padding strings had a ripple effect that crippled several popular e-commerce websites (footnote 12), highlights the pivotal role that NODE.JS plays in the development stacks of several prominent web applications.
10. https://octoverse.github.com/
11. http://stackoverflow.com/insights/survey/2017
12. https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/
Since some languages may require more files than others, we repeat our analysis with four file count threshold values (DF1). Figure 7 shows that while the third through sixth ranked languages vary, six ranks are resilient to threshold changes and NODE.JS remains the most popular language.
[Fig. 7: The percentage of the corpus that uses the ten most popular languages. Asterisks (*) denote languages that change ranks when the file count threshold changes (DF1). y-axis: Percentage of Corpus; x-axis labels: Node.js, Java, *Python, *Ruby, *PHP, *C++, Go, Objective-C, Android.]
Summary: Although RUBY is the default language in TRAVIS CI, NODE.JS is more popular in our sample.
Implications: Since language popularity fluctuates, CI service providers should carefully consider whether a popular language of the day should be implicit when no language is declared explicitly.
(RQ2) How are statements in CI specifications distributed among different sections?
Approach. To answer this research question, we first label each property in the .travis.yml file as related to CI node configuration or build process configuration. The tags that specify the phases in the CI process are labelled as build process configuration. The tags that are related to CI node configuration are further divided into four sub-categories depending on the type of CI nodes that are being configured, i.e., build job creation, build job processing, build status notification, or other. Table 3 shows our mapping of .travis.yml tags to these sub-categories.
We then parse the .travis.yml files of our subject systems. We use the parsed output to count lines in each of the sections of each file. Finally, we apply the Scott-Knott Effect Size Difference (ESD) test [31], an enhancement to the Scott-Knott test [27] that also considers the effect size when clustering CI sections into statistically distinct ranks.
Results. Table 4 shows the popularity of the sections, as well as their overall length and proportion within the corpus.
Observation 3: For CI node configuration, sections that are related to job processing nodes appear in the most projects. Table 4 shows that 8,852 (95.06%) of the studied .travis.yml
Figure 8b shows the distribution of commands after removing zero-length sections. The difference in the deploy phases in Figure 8a (with zeros) and Figure 8b (without zeros) is striking. It appears that when the deploy phase is included, it tends to require plenty of .travis.yml configuration code. For example, the oden-lang/oden project (footnote 15) requires 42 lines of code to describe their deployment process. These lines of code describe how to deploy the release artifacts for a specific release and the current commit on the master branch to Amazon S3. Indeed, it may be the case that organizations avoid using deploy phase features because it requires lengthy and complex configuration.
The .travis.yml file supports configuration of deployment to many popular cloud services including AWS, AZURE, GOOGLE APP ENGINE, and HEROKU. So it is unlikely that the reason for developers not using TRAVIS CI for deployment is lack of platform support. Since ANSIBLE is a popular tool used by developers for the automation of deployments, we study the use of ANSIBLE as an alternative to the deployment features of TRAVIS CI in our corpus by searching for syntactically valid ANSIBLE playbooks. Unfortunately, we find only 109 (1%) projects where ANSIBLE is being used. Further studies are needed to identify why deployment features of TRAVIS CI are rarely used.
Summary: Although code for configuring job processing nodes is most common (48.16%), and deployment code is
TABLE 3: The identified build process configuration tags.
Sub-category   Key
Creation       branches
Processing     addons, android, bundler_args, compiler, cran, d, dart, dist, dotnet, elixir, env,
               gemfile, ghc, git, go, haxe, jdk, julia, language, lein, matrix, mono, node, node_js,
               nodejs, os, osx_image, otp_release, perl, php, podfile, python, r, r_binary_packages,
               r_build_args, r_check_args, r_github_packages, r_packages, repos, ruby, rust, rvm,
               sbt_args, scala, services, smalltalk, solution, sudo, virtualenv, warnings_are_errors,
               with_content_shell, xcode_scheme, xcode_sdk, xcode_workspace, xcode_project
Notification   notifications
Other          before_cache, cache, group, source_key
TABLE 4: The popularity of .travis.yml sections, as well as their length and proportion of lines in our corpus.
Group                   Section          # Projects   # lines   % lines
CI Node Config.         creation         1,441         2,236     1.45
                        processing       8,852        74,285    48.16
                        reporting        2,914         7,361     4.77
                        other            1,836         3,500     2.27
Build Process Config.   before_install   3,551        14,452     9.37
                        install          3,519        11,895     7.71
                        before_script    3,863        14,597     9.46
                        script           7,122        18,972    12.30
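The labelling and line-counting step of the RQ2 approach can be sketched with the Table 3 mapping. This is a rough approximation and not the authors' parser: it assigns every non-blank, non-comment line to the most recent top-level key, it lists only a subset of the processing keys from Table 3, and the names section_of and count_section_lines are hypothetical.

```python
from collections import Counter

# Subset of the Table 3 mapping (the full processing list is much longer).
NODE_CONFIG = {
    "creation": {"branches"},
    "processing": {"addons", "dist", "env", "language", "matrix", "node_js",
                   "os", "python", "rvm", "services", "sudo"},
    "reporting": {"notifications"},
    "other": {"before_cache", "cache", "group", "source_key"},
}
# Build process sub-phases named in Section 2.1.2.
BUILD_PHASES = {"before_install", "install", "before_script", "script",
                "after_success", "after_failure", "after_script",
                "before_deploy", "deploy", "after_deploy"}


def section_of(key):
    """Map a top-level .travis.yml key to a section name."""
    if key in BUILD_PHASES:
        return key
    for section, keys in NODE_CONFIG.items():
        if key in keys:
            return section
    return "unknown"  # key not covered by this abbreviated mapping


def count_section_lines(travis_yml_text):
    """Count non-blank, non-comment lines under each section."""
    counts = Counter()
    current = None
    for line in travis_yml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # A new top-level key starts at column 0 and is not a list item.
        is_key = (not line[0].isspace()
                  and not stripped.startswith("-")
                  and ":" in line)
        if is_key:
            current = section_of(line.split(":", 1)[0].strip())
        if current is not None:
            counts[current] += 1
    return counts


if __name__ == "__main__":
    example = (
        "language: python\n"
        "python:\n  - '3.6'\n"
        "branches:\n  only:\n    - master\n"
        "script:\n  - pytest\n"
        "notifications:\n  email: false\n"
    )
    print(dict(count_section_lines(example)))
```

Per-file counts of this kind could then be summed across a corpus to obtain totals of the sort reported in Table 4.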
[Fig. 8 (plots not recovered). y-axis: Line Count; x-axis: .travis.yml sections, grouped into panels 1 to 8. (a) The distribution for all projects. (b) The distribution after removing zero-length sections.]
[Fig. 9 (plots not recovered). y-axis: Churn; x-axis: .travis.yml sections, grouped into panels 1 to 8. (a) The distribution for all projects. (b) The distribution after removing zero-length sections.]
... FrameworkBenchmarks (footnote 17), a project that provides performance benchmarks for web application frameworks, has 290 modifications to its .travis.yml file, of which 242 modify its job processing node configuration. In this case, it is because the benchmarks are contributed by the developer community and the benchmarks for each framework require job processing nodes to be configured differently. This complements our earlier observations that most of the effort in configuring CI is spent on the processing node configuration.
Hilton et al. [15] studied the frequency of reasons for CI changes and observed different rankings than those that we observe. This discrepancy is likely due to differences in the granularities of our analyses. For example, in our analysis, we study distributions of project-specific rates of change, while their analysis uses a single measurement of the overall rates of change for each identified reason. Nonetheless, there are similarities in our rankings. For example, their top ranked reason for CI change is related to the build matrix, which is a subset of our top ranked job processing section.
17. https://github.com/TechEmpower/FrameworkBenchmarks
Observation 8: In the projects that are modified, all sections are likely to be modified an equal number of times. Since Figure 9a shows that sections after the fourth rank are not modified in most of the projects (i.e., the median churn of these sections is 0), we omit such projects in the next box plot shown in Figure 9b. Here, we can observe that the median churn for all of the sections is in the range of 1–10.
Summary: In 75% of the studied configurations, sections of .travis.yml files are modified fewer than ten times.
Implications: Research and tooling for CI configuration should focus on the creation of an initial specification rather than supporting specification maintenance.
5 ANTI-PATTERNS IN CI SPECIFICATIONS
If improperly configured, TRAVIS CI build jobs may have unintended behaviour, resulting in broken or incorrect builds. Violating the semantics of CI specifications could also introduce maintenance and comprehensibility problems. Furthermore, the TRAVIS CI runtime environment
9
may be unable to optimize provisioning of CI job processing nodes for specifications where semantics are violated.

To help Travis CI users avoid common pitfalls, the Travis CI team provides Travis Lint,18 an online service and an open source tool that scans .travis.yml files for mistakes (e.g., YAML formatting issues, missing mandatory fields). If the issues are fixed, Travis Lint can prevent configuration errors from breaking project builds.

5.1 Research Questions

The .travis.yml files that are syntactically valid can still violate the semantics of Travis CI and introduce build correctness, performance, and security problems.

To detect such semantic violations, we propose Hansel, a .travis.yml anti-pattern detector. Then, we also propose Gretel, a tool for removing anti-patterns from .travis.yml files. We apply Hansel and Gretel to the 9,312 .travis.yml files in our corpus in order to address the following research questions:

• RQ4 How prevalent are anti-patterns in CI specifications?
In this research question, we aim to study what types of CI anti-patterns commonly occur in software projects “in the wild”.
• RQ5 Can anti-patterns in CI specifications be removed automatically?
This research question explores whether the detected anti-patterns can be fixed automatically and to what degree the transformed files are still valid.
• RQ6 Are automatic removals of CI anti-patterns accepted by developers?
This research question explores whether our anti-pattern detection technique is useful for real developers in practice. If developers accept our fixes and integrate them into their projects, it would suggest that our findings are useful to some degree.

6 CI Misuse Study Design

We implement Hansel to detect anti-patterns and Gretel to remove them. In a nutshell, Hansel parses a .travis.yml file using YAML and bashlex parsers in order to detect anti-patterns. Then, Gretel applies the ruamel.yaml serialization/deserialization framework19 to remove the detected anti-patterns automatically.

We define CI specification anti-patterns as violations of best practices in CI configuration files that could hinder the correctness, performance, or security of the CI process of a software system. Similar to the approach followed by prior work [17], [29], we first read the rules implemented by Travis Lint,18 formal Travis CI documentation,20 informal documentation from the Travis CI user community (e.g., blogs, posts on Q&A sites such as Stack Overflow) and inspect a sample of artifacts (i.e., .travis.yml files) to prepare a list of recommended best practices. Then, we group related best practices and deduce corresponding anti-patterns (i.e., cases where best practices are being violated).

Below, for each anti-pattern, we present our rationale for labelling it as an anti-pattern, and the approach that (1) Hansel uses to detect it and (2) Gretel uses to remove it.

Anti-pattern 1: Redirecting Scripts into Interpreters

Motivation. A common approach to software package installation is to download a script from a hardcoded URL and pipe it into a shell interpreter. For example, the installation instructions for the Sandstorm package,21 a self-hostable web productivity suite, include a shell command: curl https://install.sandstorm.io|bash. While this installation procedure is convenient, it is known to be susceptible to security vulnerabilities.22 Moreover, if a network failure occurs during the execution of the curl command, the installation script may only be partially executed.

Detection. In order to detect this anti-pattern, we follow a three-step approach. First, we parse the .travis.yml file to identify commands that contain a pipe. Next, those commands are split into the pre- and post-pipe sub-commands using the bashlex library. We check the pre-pipe command for known downloaders (i.e., wget, curl). We then check the post-pipe command for known shell interpreters (i.e., sh, bash, node). If both of these conditions are met, we identify the command as an instance of this anti-pattern.

Removal. CI specifications should verify the integrity of externally hosted scripts before executing them. This could be achieved by automatically verifying the script after downloading it but before execution. Alternatively, one could download the installation scripts, verify their integrity, and commit known-to-be secure versions to the VCS. Since either solution requires changes that are beyond the scope of the .travis.yml file, we have not implemented an automatic removal for this anti-pattern in Gretel yet.

Anti-pattern 2: Bypassing Security Checks

Motivation. During the CI process, if the Travis CI job processing node communicates with other servers via SSH for transferring artifacts, it is important to have this connection be configured securely. A misconfigured connection can make job processing node(s) vulnerable to network attacks. For example, using the ssh_known_hosts property in the addons section of the .travis.yml file exposes job processing nodes to man-in-the-middle attacks.23,24

Detection. We parse .travis.yml files and check whether they satisfy at least one of the following conditions:
• There exists an addons section, which contains an ssh_known_hosts property.
• There exists a command containing the line StrictHostKeyChecking=no.
• There exists a command containing the line UserKnownHostsFile=/dev/null.

Removal. To remove this anti-pattern, three steps should be followed. First, all of the vulnerability-inducing lines

18. https://docs.travis-ci.com/user/travis-lint
19. https://pypi.python.org/pypi/ruamel.yaml
20. https://docs.travis-ci.com/
21. https://sandstorm.io/install
22. https://www.idontplaydarts.com/2016/04/detecting-curl-pipe-bash-server-side/
23. https://annevankesteren.nl/2017/01/secure-secure-shell
24. https://docs.travis-ci.com/user/ssh-known-hosts/#Security-Implications
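The three-step detection heuristic described for anti-pattern 1 (find commands that contain a pipe, split them at the pipe, then look for a downloader before the pipe and an interpreter after it) can be illustrated with a minimal Python sketch. This is not Hansel's implementation: Hansel splits commands with the bashlex library, whereas the sketch substitutes a naive regular-expression split and the standard-library shlex module so that it runs without extra dependencies. The function and constant names are hypothetical; the downloader and interpreter lists are the ones given in the text.

```python
import posixpath
import re
import shlex

# Lists taken from the Detection paragraph for anti-pattern 1.
DOWNLOADERS = {"wget", "curl"}
INTERPRETERS = {"sh", "bash", "node"}

# Match a single '|' (a pipe) but not '||' (logical OR).
_PIPE = re.compile(r"(?<!\|)\|(?!\|)")


def _first_word(segment):
    """Return the basename of the first token of a sub-command, or ''."""
    try:
        tokens = shlex.split(segment)
    except ValueError:
        # The naive split can cut through a quoted string; fall back to
        # whitespace splitting instead of failing.
        tokens = segment.split()
    return posixpath.basename(tokens[0]) if tokens else ""


def pipes_download_into_interpreter(command):
    """True if a downloader stage is piped directly into an interpreter stage."""
    stages = [_first_word(segment) for segment in _PIPE.split(command)]
    return any(
        pre in DOWNLOADERS and post in INTERPRETERS
        for pre, post in zip(stages, stages[1:])
    )
```

Because only the first word of each stage is inspected, a command such as wget ... | sudo bash is not flagged, and pipes that occur inside quoted strings are split naively. A full shell parser, as used by Hansel, avoids both limitations.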
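The three detection conditions listed for anti-pattern 2 amount to one key lookup and two substring checks. The sketch below is a hypothetical illustration, not Hansel's code. It assumes that the .travis.yml file has already been parsed into a Python dictionary (for example by a YAML library), and the list of phase names is an assumption based on the Travis CI job lifecycle rather than something stated in this section.

```python
# SSH options named in the Detection paragraph for anti-pattern 2.
INSECURE_SSH_OPTIONS = (
    "StrictHostKeyChecking=no",
    "UserKnownHostsFile=/dev/null",
)

# Assumed list of phases that may hold shell commands.
PHASES = (
    "before_install", "install", "before_script", "script",
    "after_success", "after_failure", "before_deploy",
    "after_deploy", "after_script",
)


def _commands(spec):
    """Yield every command string found in the known phases of a parsed spec."""
    for phase in PHASES:
        value = spec.get(phase) or []
        if isinstance(value, str):
            value = [value]
        for command in value:
            if isinstance(command, str):
                yield command


def bypassed_security_checks(spec):
    """Return a list naming each condition of anti-pattern 2 that the spec meets."""
    findings = []
    addons = spec.get("addons")
    if isinstance(addons, dict) and "ssh_known_hosts" in addons:
        findings.append("addons.ssh_known_hosts")
    for command in _commands(spec):
        for option in INSECURE_SSH_OPTIONS:
            if option in command:
                findings.append(option)
    return findings
```

An empty result means that none of the three conditions holds. The substring checks only match the exact key=value spelling quoted in the text.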
In the remaining two false negatives, Hansel fails to find anti-pattern 4 (commands unrelated to the phase) where composer.phar, the dependency management tool for PHP, is used in the before_script phase. Our initial mapping of commands to phases did not bind the composer tool to the install phase (see Table 5). This can easily be remedied by adding the missing binding.

Observation 11: The majority of instances of anti-pattern 1 are installing the popular Meteor web framework. We detect 206 instances where scripts are being downloaded and piped into shell interpreters directly, of which, 106 (51%) are in projects using Node.js. In these 106 projects, we find that 94 of them (88%) are using the above anti-pattern to install the Meteor web framework.26 In fact, the Meteor documentation instructs users to install the framework using this method (curl https://install.meteor.com|/bin/sh).27

We reached out to the Meteor team to discuss the potential security implications of this installation approach. The Meteor team explained that the developer community is divided about using script redirection to install software packages. On the one hand, some have shown how script redirection can be exploited by attackers22 or how networking interruptions during the download command may lead to partial execution of the installation script.28 On the other hand, members of the Sandstorm project defend script redirection for cases where script downloads are served strictly over HTTPS.29 The Sandstorm team argues that script redirection allows developers to iterate faster by avoiding the hassle of maintaining a variety of package formats for different platforms (e.g., .rpm and .deb for RedHat-type and Debian-type Linux distributions, respectively). Moreover, discussion threads on Hacker News30 argue that other standard package distribution methods (e.g., binary installers, package managers) are also susceptible to man-in-the-middle attacks unless the delivered packages are signed cryptographically. The Meteor team argues that they have not been able to identify a more secure alternative for the script redirection installation method.

If a project advocates for the script redirection installation method, we propose the following guidelines:
• The installation script should be served over HTTPS.
• The installation script should be made resilient to network interruptions by wrapping the core script behaviour in a function, which is invoked at the end of the script. Doing so will prevent partial execution of the script, since the interpreter will only execute the script instructions when the function is invoked at the end.
• Users should regularly audit the installation script.

However, when the project has identified the supported platforms or has accumulated several external dependencies, migration to a package manager may pay off.

26. https://www.meteor.com
27. https://www.meteor.com/install
28. https://www.seancassidy.me/dont-pipe-to-your-shell.html
29. https://sandstorm.io/news/2015-09-24-is-curl-bash-insecure-pgp-verified-install
30. https://news.ycombinator.com/item?id=12766049

Observation 12: Although rare, there are instances of anti-pattern 2 in Travis CI specifications. Hansel detects 63 instances of this anti-pattern in our corpus. In 37 (58.73%) of these cases, the StrictHostKeyChecking=no command is being used. This command disables an interactive prompt for permission to add the host server fingerprint to the known_hosts file. Developers may disable the prompt because it will impede cloning a repository via SSH in a headless environment, such as Travis CI, which can lead to build breakage. However, setting StrictHostKeyChecking=no exposes the host to man-in-the-middle attacks by skipping security checks in ssh.

In 18 instances (28.57%), the ssh_known_hosts property is set in the addons section to define host names or IP addresses of the servers to which Travis CI job processing nodes need to connect during the CI process. This is insecure because if the network is compromised (e.g., by DNS spoofing), Travis CI job processing nodes may connect and share private data with an attacker’s machine.

In another eight instances of anti-pattern 2 (12.70%), UserKnownHostsFile=/dev/null is being used. In this case, host server fingerprints are written to and read from an empty file, effectively disabling host key checking, and exposing the host to man-in-the-middle attacks.

The secure way to prevent the interactive prompt from interrupting scripted operations is to store the public keys of the hosts that Travis CI job processing nodes connect to in a known_hosts file. The file may be enabled within the .travis.yml file using the -o UserKnownHostsFile=<file_name> option of ssh.

Observation 13: Irrelevant properties that are ignored by the Travis CI runtime (anti-pattern 3) appear frequently. Hansel detects 242 instances of anti-pattern 3, which can present imminent concerns or future risks (see Table 6).

Making spelling mistakes when defining properties and placing properties in the incorrect location within the .travis.yml are example causes of irrelevant properties that raise imminent concerns. We find 74 instances of misspelled properties in our corpus. These misspelled properties are an imminent concern because misspelled properties and all of the commands that are associated with those properties are ignored by the Travis CI runtime. In the best case, ignored properties will lead to build breakage, which is frustrating and may slow development progress down. In the worst case, the CI job will successfully build while
producing incorrect deliverables, which may allow failures or unintended behaviour to leak into official releases.

We also find 148 instances of misplaced properties in our corpus. For example, the webhooks property should be defined as a sub-property of the notifications property; however, it appears as a root-level property in four subject systems. This is an imminent concern because misconfigured properties are also ignored by the Travis CI runtime.

We label the use of experimental or deprecated features in the Travis CI specification as a future risk. There are 15 instances of using experimental properties in the corpus. For example, the undocumented group property allows users to specify which set of build images are to be used by the Travis CI runtime. Since this feature is actively being developed, the Travis CI team does not recommend using it yet. Projects that use the group property may encounter future problems if the property name or behaviour changes.

Users may also use deprecated properties such as source_key. We find five instances of use of deprecated features in the corpus. They present a future risk because Travis CI may stop supporting these properties at any time.

TABLE 7: Commands that appear in unrelated phases.

                     Observed in
Expected in    Install    Script    Deploy
Install           -         467        0
Script            0          -         0
Deploy            0          52        -

Observation 14: The most common variant of anti-pattern 4 is using install phase commands in the script phase. Table 7 shows that commands that we expect to appear in the install phase appear 467 times in other phases. We find that this often occurs because developers prepend lines to install required packages to the body of the script phase. By not using the install phase for installing dependencies, these projects are unable to leverage Travis CI runtime optimizations (e.g., caching), which speed up builds.

The commands that we expect in the deploy phase appear 52 times in the script phase. We find that developers tend to run deployment-related commands in the script phase immediately after compiling and testing.

The Travis CI team states that compiling and testing tasks should appear in the script phase. The deploy phase is typically reserved for uploading deliverables to cloud service providers (e.g., Heroku, AWS, Google App Engine) or package repositories (e.g., npm, PyPI, RubyGems). This separation of concerns allows the Travis CI runtime to optimize resources within its CI infrastructure. For example, during the script phase, the infrastructure can be tuned to perform more CPU- and I/O-heavy operations, while during the deploy phase, the infrastructure can allocate additional network bandwidth and less CPU horsepower. If the separation of concerns is not respected, the Travis CI team cannot make such optimizations.

Observation 15: Developers often violate semantics by applying static analysis too late in the CI process. For detecting semantics violations in sub-phases of the CI process, we search for calls to popular code coverage and static analysis tools (listed in the ‘static analysis’ row of Table 5) in the after_script phase. We detect 40 such instances.

One plausible explanation for the occurrence of this anti-pattern is that developers may assume that the after_script phase is executed immediately after the script phase, similar to how the after_deploy phase is executed immediately after the deploy phase. Yet, as shown in Figure 2, the after_script phase is executed after deployment-related phases are executed. Indeed, we find 40 cases where static analysis tools are being executed at the end of the CI process, after deployment, when it is likely too late to act upon issues that are detected.

Summary: Developers misuse and misconfigure CI specifications. The anti-patterns that we define can expose a system to security vulnerabilities, cause unintended CI behaviour, or delay SQA activities until after deployment.
Implications: Hansel, our anti-pattern detector, can detect misuse and misconfiguration of CI specifications. If Hansel’s warnings are addressed, the consequences of CI misuse and misconfiguration can be avoided.

(RQ5) Can anti-patterns in CI specifications be removed automatically?

Approach. We aim to check whether Hansel-detected anti-patterns can be removed automatically. To do so, we randomly select a subset of candidates for removal and manually classify them until we achieve saturation [24], i.e., when new data do not add to the meaning of the categories. In our case, saturation was achieved after analyzing 250 candidates for removal, where no new categories were detected during the analysis of the last 79 candidates.

Before transforming the candidates, we check whether they are valid specifications by using the Travis Lint tool.18 We then apply Gretel to the valid candidates in order to remove the anti-pattern. We apply Travis Lint again to the transformed files to make sure that they are still valid. Finally, we manually inspect the instances of removed anti-patterns to check whether the transformation has changed the behaviour of the original specification.

Results. We find that 174 of the 250 randomly selected anti-pattern instances (69.60%) can be removed automatically. Moreover, 69 (27.60%) of the remaining cases can be fixed, but require manual verification to ensure that the original behaviour is preserved. We perform this manual verification and provide three observations about these 69 cases.

Observation 16: There are 38 instances of anti-patterns where the command under analysis is preceded by a state-altering command. The state-altering commands include:
• File system operations (i.e., cp, cd, mv, mkdir).
• Package managers (i.e., npm update, npm cache clean, gem update, apt-get update, bower cache clean, git submodule update).
• Environment variable and database-related operations.

State-altering commands may also need to migrate along with the anti-pattern commands to the more appropriate section. Figure 10 shows an example where a state-altering command impacts the removal of an anti-pattern, taken from lamkeewei/battleships,31 a tool for building Python apps for the Google App Engine. In this case, lines 6–8 are implicated in the anti-pattern, but line 5 must be executed before lines 6–8, and thus, must be included in the fix.

Observation 17: In 12 instances, there are compound commands that are connected by a double ampersand. In this case, the bash shell only invokes the command(s) that follow after the ampersands if the command(s) that precede the ampersands did not fail (i.e., returned an exit code of zero). Installation commands that appear before the ampersands can be safely moved to the install phase while preserving this behaviour, since if the install phase fails, the build job terminates with an error status in Travis CI.

Observation 18: In 29 instances, limitations in the ruamel.yaml framework19 lead to problems in the removal of anti-patterns. The problems that we encountered are listed below:
• Version numbers may be parsed as floating point numbers, causing trailing zeros to be removed in the output. For example, 0.10 is transformed into 0.1.
• Property-level comments are missing after removal.
• Duplicate properties are missing after removal.
• Line breaks in multi-line commands are replaced with ‘\n’ after removal.

We manually fix these minor issues before proceeding.

Seven (2.80%) of the remaining projects use the Yarn package manager32 along with npm to manage dependencies. The removals that we propose are incompatible with such projects. We plan to add support for Yarn and other package managers in the future.

Summary: The detected instances of the most frequent CI anti-pattern can be removed automatically in 69.60% of cases. This improves to 97.20% if a post hoc manual inspection phase is included (semi-automatic removal).
Implications: Hansel-detected anti-patterns can be removed (semi-)automatically with Gretel to avoid the consequences of CI misuse and misconfiguration.

(RQ6) Are automatic removals of CI anti-patterns accepted by developers?

Approach. To better understand the utility of Gretel, we apply it to the 174 instances that could be removed automatically to fix the anti-patterns and offer these improvements to the studied projects as pull requests.

Results. Of the submitted pull requests, 49 received responses from the projects’ developers (response rate: 28.16%).

Observation 19: 36 of the 49 pull requests that received responses (73.47%) have been accepted and integrated by the subject systems. Of the 49 anti-pattern fixes to which developers responded, 36 have already been accepted by the projects at the time of this submission.

13 pull requests were rejected by project maintainers. Two of the 13 were rejected because our pull request appeared to introduce build breaks, which were introduced by other commits. In another two pull requests, developers did not understand why our change had added new commands. These commands were added to preserve the implicit behaviour of phases that did not exist prior to applying our removal. Two other rejected pull requests came from projects that are no longer being maintained.

Only in one pull request were our changes rejected because the developer did not agree with our premise that this change is beneficial. The developer pointed to Travis CI documentation, which has an example that uses install-related commands in the before_script phase.33 We contacted the Travis CI team regarding this and they agreed that the documentation needs to be fixed by moving the install commands out of the before_script phase in the example as it is violating the semantics.

The six other rejected pull requests were closed without any explanation from the project maintainers.

Summary: Automated fixes for CI anti-patterns are often accepted by developers and integrated into their projects (73.47% of pull requests that received a response or 20.68% of all submitted pull requests).
Implications: Hansel and Gretel produce patches that are of value to active development teams.

31. https://github.com/lamkeewei/battleships
32. https://yarnpkg.com/en/
33. https://docs.travis-ci.com/user/languages/javascript-with-nodejs/#Using-Gulp
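Observations 14, 16, and 17 together describe what a removal of anti-pattern 4 has to account for: install commands that sit in the script phase, state-altering commands that precede them, and compound commands joined by a double ampersand. The following Python sketch is a hypothetical illustration of that reasoning and should not be read as Gretel's algorithm. It assumes a specification that is already parsed into a dictionary. The two prefix tuples are illustrative stand-ins for the command-to-phase mapping of Table 5, and the sketch does not handle the implicit default phases mentioned in the RQ6 results.

```python
import copy

# Hypothetical stand-ins for the command-to-phase mapping (Table 5).
INSTALL_PREFIXES = ("npm install", "pip install", "gem install", "bower install")
# State-altering commands named in Observation 16.
STATE_ALTERING_PREFIXES = (
    "cd ", "cp ", "mv ", "mkdir ",
    "npm update", "npm cache clean", "gem update",
    "apt-get update", "bower cache clean", "git submodule update",
)


def _as_list(value):
    """Normalise a phase value (missing, single string, or list) to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def _expand(command):
    """Split leading install commands off an 'a && b' compound command."""
    parts = [part.strip() for part in command.split("&&")]
    leading = []
    while len(parts) > 1 and parts[0].startswith(INSTALL_PREFIXES):
        leading.append(parts.pop(0))
    return leading + [" && ".join(parts)]


def move_install_commands(spec):
    """Return (spec, needs_review) with install commands moved out of script.

    The leading part of the script phase, up to the last install command, is
    moved to the install phase only if it consists solely of install and
    state-altering commands. Otherwise the spec is returned unchanged and
    flagged for manual review.
    """
    script = [c for entry in _as_list(spec.get("script")) for c in _expand(entry)]
    hits = [i for i, c in enumerate(script) if c.startswith(INSTALL_PREFIXES)]
    if not hits:
        return spec, False
    cut = hits[-1] + 1
    prefix, rest = script[:cut], script[cut:]
    movable = INSTALL_PREFIXES + STATE_ALTERING_PREFIXES
    if not all(c.startswith(movable) for c in prefix):
        return spec, True
    fixed = copy.deepcopy(spec)
    fixed["install"] = _as_list(spec.get("install")) + prefix
    fixed["script"] = rest
    return fixed, False
```

Moving the whole leading block keeps the relative order of state-altering and install commands, which is the concern raised in Observation 16. Splitting at the double ampersand follows the argument of Observation 17 that a failing install phase already terminates the build job.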
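The first problem listed under Observation 18 (0.10 becoming 0.1) arises whenever an unquoted scalar that looks like a number is resolved to a floating point value and then written back. The short Python sketch below mimics this with the built-in float type only and does not call ruamel.yaml. Both function names are hypothetical, and quoting version-like scalars is one possible guard, not the fix that the authors report applying.

```python
import re

# A scalar of the form digits.digits, e.g. a language version such as 0.10.
_VERSION_LIKE = re.compile(r"\d+\.\d+")


def roundtrip_as_float(scalar):
    """Mimic loading an unquoted scalar as a float and dumping it again."""
    return str(float(scalar))


def quote_if_version_like(scalar):
    """Wrap a version-like scalar in double quotes so that it stays a string."""
    return f'"{scalar}"' if _VERSION_LIKE.fullmatch(scalar) else scalar
```

With these definitions, roundtrip_as_float("0.10") yields the string 0.1, which reproduces the loss of the trailing zero, while quote_if_version_like("0.10") yields "0.10" with the quotes included.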
this paper may also apply to these services. For example, CircleCI uses a config.yml file40 to configure the CI process. Since commands to be executed during build jobs are specified in this file, anti-pattern 1 (i.e., redirecting scripts into interpreters) may occur in CircleCI specifications.

CircleCI users are also susceptible to anti-pattern 2 (i.e., bypassing security checks) because users can manually set StrictHostKeyChecking=no in the config.yml file, exposing the host to man-in-the-middle attacks, when executing commands that require an SSH connection.41

CircleCI is robust to anti-pattern 3 (i.e., using irrelevant properties) because build jobs terminate immediately if an unsupported property is processed in the config.yml file. This behaviour differs from Travis CI, where unsupported properties do not prevent build jobs from proceeding.

CircleCI users are susceptible to anti-pattern 4 (commands unrelated to the phase). Similar to .travis.yml files, config.yml files have seven sections that represent phases of the CI process (i.e., machine, checkout, dependencies, database, compile, test, and deployment). Each phase has three sub-phases (i.e., pre, override, and post). Similar to Table 5, we can map commands to CircleCI phases where they should appear.

40. https://circleci.com/docs/2.0/
41. https://discuss.circleci.com/t/add-known-hosts-on-startup-via-config-yml-configuration/12022

9 Threats to Validity

This section describes the threats to the validity of our study.

9.1 Internal Validity

The list of anti-patterns that we present in the paper is not exhaustive. However, to the best of our knowledge, this paper is the first to define, detect, and remove anti-patterns in CI specifications. Our set of anti-patterns is a starting point for future studies to build upon. Future studies that define anti-patterns using other data sources, e.g., developer surveys [9], may prove fruitful.

Hansel uses a lightweight approach to detect instances of anti-patterns. A more rigorous analysis may uncover additional instances of anti-patterns. Thus, our anti-pattern frequency results should be interpreted as a lower bound.

Projects may use Travis CI without a .travis.yml file. In this case, the Travis CI runtime assumes that the project is using Ruby and would apply the conventional Ruby CI process. Since we are unable to identify such projects automatically, we only consider projects with a .travis.yml file in the root directory of the project.

9.2 External Validity

In terms of the generalizability of our results to other systems, we focus only on open source subject systems, which are hosted on GitHub and use Travis CI as the CI service provider. GitHub is one of the most popular hosting platforms for open source software projects and Travis CI is the most widely adopted CI service among open source projects [15]. Therefore, our findings are applicable to a large proportion of open source projects. Moreover, given the similarities among the popular CI services (see Section 8.4), our observations are likely applicable to some degree. Nonetheless, replication studies using other CI services may yield further insight.

9.3 Construct Validity

Our proposed CI anti-patterns are subject to our interpretation. To mitigate this threat, we review Travis CI documentation and consult with the Travis CI support team when inconsistencies are encountered. Furthermore, the rate at which our pull requests are being accepted (73.47%) is suggestive of the value of addressing these anti-patterns.

CI use and misuse statistics are computed using various scripts that we have written. These scripts may themselves contain defects, which would affect our results. To address this threat, we test our tools and scripts on subsamples of our datasets, and manually verify the results.

The filters that we apply to remove small, inactive, and duplicated repositories from our corpus are based on thresholds, i.e., project size in files, project activity in commits, and rate of duplication in percentage of duplicated commits. The specific threshold values that we selected may impact our observations. With this in mind, we did not select threshold values arbitrarily. First, we analyze threshold plots to understand the impact that various threshold values will have on the number of retained systems. Second, we perform sensitivity analyses (Figures 6 and 7), where the impact of selecting different thresholds is shown to be minimal.

10 Related Work

In this section, we situate our work with respect to the literature on continuous integration and configuration smells.

10.1 Continuous Integration

As a relatively new practice in software development, CI has only just begun to attract the attention of software engineering researchers [2].

Recent work has characterized CI practices and outcomes along different dimensions. Meyer [23] discussed features of the CI tools that were used by practitioners. He emphasizes the importance of good tooling, fully automated builds, fast test suites, feature-toggling, and monitoring for CI. Ståhl and Bosch [30] also provided a systematic overview of CI practices and their differences from a technical perspective. Vasilescu et al. [32] studied quality and productivity outcomes of using CI. They find that teams that are using CI are significantly more effective at merging the pull requests of core members.

In addition to positive outcomes, challenges and limitations of CI have been pointed out by researchers. For example, Hilton et al. [15] analyzed open source projects from GitHub and surveyed developers to understand which CI systems developers use, how developers use CI, and reasons for using CI (or not). They conclude that the main reason why open source projects choose not to use CI is that the developers are not familiar enough with it. In a recent qualitative study, Hilton et al. [14] also found that, when adopting CI, developers face trade-offs between speed and certainty, accessibility and security, and configurability
and ease of use. Laukkanen et al. [19] surveyed the recent literature for the problems, causes, and solutions when adopting continuous delivery. They point out large commits, merge conflicts, broken builds, and slow integration approval as problems that are related to integration. By interviewing practitioners in 15 ICT companies, Leppänen et al. [20] found that domain-imposed restrictions, resistance to change, customer needs, and developers’ skill and confidence are adoption obstacles for continuous deployment.

Other works focus on improving specific stages of the CI process. Beller et al. [4] studied testing practices in CI, particularly focusing on Java and Ruby projects. They conclude that testing is an established and integral part of the CI process of open source software. However, Beller et al. [4] also observe a latency of more than 20 minutes between writing code and receiving test feedback from CI when compared to the fast-paced nature of testing in the local environments. They suggest that low test failure rates from CI are a sign that developers submit pre-tested contributions to CI. Similarly, Elbaum et al. [12] propose algorithms based on test case selection and prioritization techniques to make CI processes more cost effective. Other work has studied how to improve the effectiveness of automated testing in CI [6], [10] and how CI can be extended to include additional performance and robustness tests when standard testing frameworks are insufficient for highly concurrent, real-time applications [7].

Our goal in this paper is to characterize the usage of CI features by analyzing a large corpus of existing CI specifications. Our work is complementary to prior studies, contributing to a larger understanding of how CI tools and techniques are being adopted in real-world projects.

10.2 Software Configuration Smells

To the best of our knowledge, this paper is the first to define, detect, and remove anti-patterns in CI specifications; however, prior work has explored anti-patterns in the context of other configuration files. Brown et al. [5] published a catalog of anti-patterns and patterns for software configuration management. Shambaugh et al. [28] proposed Rehearsal, a verification tool for Puppet configurations. Sharma et al. [29] have also recently explored smells that are related to the Puppet configuration management language. They presented a set of implementation and design configuration smells that violate recommended best practices. Bent et al. [9] surveyed developers and used the findings to develop a Puppet code quality analysis tool. Rahman and Williams [26] applied text mining techniques to identify defects in Puppet scripts, identifying file system operations, infrastructure provisioning, and user account management properties as characteristics of defective Puppet scripts. Jha et al. [17] proposed a static analysis tool for detecting errors in configuration files of Android apps. In an exploratory empirical study, Cito et al. [8] assessed the quality of Docker configuration files on GitHub, observing that they violate 3.1 linter rules on average.

Another related context is architectural or design smells. Marinescu [21] has defined detection strategies for capturing important flaws of object-oriented design that were reported in the literature. Garcia et al. [13] have defined ar-

negatively impact the understandability, testability, extensibility, and reusability of a software system. Moha et al. [25] define smells as poor solutions to recurring implementation and design problems. They also specify four well-known design smells and define their detection algorithms.

The anti-patterns that we propose share similarities with configuration smells defined in prior work. For example, since externally-hosted scripts are not analyzed by the Travis CI runtime, anti-pattern 1 (redirecting scripts into interpreters) can lead to non-deterministic errors and non-idempotence problems that were identified by Shambaugh et al. [28]. Alicherry and Keromytis [3] showed that trusting SSH host keys (also known as trust-on-first-use) exposes hosts to man-in-the-middle attacks. Our anti-pattern 2 also detects instances where users bypass ssh security measures by disabling SSH host key checking. Our anti-pattern 3 (using irrelevant properties) is similar to the Invalid Property Value and Deprecated Statement Usage configuration smells proposed by Sharma et al. [29] and the Silent Failure problem proposed by Shambaugh et al. [28]. Finally, our anti-pattern 4 (commands unrelated to the phase) is similar to Sharma et al.’s Misplaced Attribute and Multifaceted Abstraction configuration smells [29]. Indeed, if dependency installation, compilation, and testing commands are all included in the script phase, the tasks in that phase are not cohesive, violating the single responsibility principle.

11 Conclusions

CI has become a widely used practice among many software teams today. A CI service typically consists of nodes for creating, processing, and reporting of build jobs. To mitigate the overhead of maintaining and operating this infrastructure themselves, many organizations are moving to cloud-based CI services. These services allow for customizing the CI process using configuration files. Similar to programming languages, the features in CI configuration files can be used and misused by the developers.

Through our study of 9,312 open source systems that use Travis CI, we make the following observations about the use and misuse of CI specifications:
• Despite being the default Travis CI language, Ruby is not the most popular language in our corpus of studied systems. Node.js is the most popular language in our corpus of studied systems (Observations 1 & 2).
• In terms of CI node configuration, sections that are related to job processing nodes appear in the most projects, while for build process configuration, sections that are related to the script phase appear in the most projects (Observations 3 & 4).
• Job processing configuration and script phase configuration have statistically distinct and higher ranks in projects compared to other sections (Observation 5).
• Although commands in the deploy phase appear only in 343 projects (3.68%), the median number of commands is comparable to other sections (Observation 6).
• The CI code that configures job processing nodes accounts for the most modifications. In the projects that are modified, all sections are likely to be modified an
chitectural bad smells as architectural design decisions that equal number of times (Observations 7 & 8).