
Sebastian Stöckl

Tidy Portfoliomanagement
in R
DEDICATION
Contents

List of Tables vii

List of Figures ix
0.1 Introduction . . . . . . . . . . . . . . . . . . . . xi

0.1.1 Introduction to Timeseries . . . . . . . . . xi

0.1.1.1 Date and Time . . . . . . . . . . xii

0.1.1.2 eXtensible Timeseries . . . . . . xxii

0.1.1.3 Downloading timeseries and basic visualization with quantmod . . xxvii
0.1.2 Introduction to the tidyVerse . . . . . . xxx

0.1.2.1 Tibbles . . . . . . . . . . . . . . xxx

0.1.2.2 Summary statistics . . . . . . . xxxiii

0.1.2.3 Plotting . . . . . . . . . . . . . . xxxiii

0.2 Managing Data . . . . . . . . . . . . . . . . . . . xxxvi

0.2.1 Getting Data . . . . . . . . . . . . . . . . xxxvi

0.2.1.1 Downloading from Online Datasources . . . xxxvi
0.2.1.2 Manipulate Data . . . . . . . . . l

0.3 Exploring Data . . . . . . . . . . . . . . . . . . . lvi


0.3.1 Plotting Data . . . . . . . . . . . . . . . . lvii

0.3.1.1 Time-series plots . . . . . . . . . lvii

0.3.1.2 Box-plots . . . . . . . . . . . . . lvii

0.3.1.3 Histogram and Density Plots . . lvii

0.3.1.4 Quantile Plots . . . . . . . . . . lvii

0.3.2 Analyzing Data . . . . . . . . . . . . . . . lvii

0.3.2.1 Calculating Statistics . . . . . . lvii

0.3.2.2 Testing Data . . . . . . . . . . . lvii

0.3.2.3 Exposure to Factors . . . . . . . lvii

0.4 Managing Portfolios . . . . . . . . . . . . . . . . lviii

0.4.1 Introduction . . . . . . . . . . . . . . . . lviii

0.4.1.1 The portfolio.spec() Object . lviii

0.4.1.2 Constraints . . . . . . . . . . . . lxi

0.4.1.3 Objectives . . . . . . . . . . . . lxviii

0.4.1.4 Solvers . . . . . . . . . . . . . . lxxi

0.4.2 Mean-variance Portfolios . . . . . . . . . . lxxiii

0.4.2.1 Introduction and Theoretics . . . lxxiii

0.4.3 Mean-CVaR Portfolios . . . . . . . . . . . lxxiv

0.5 Managing Portfolios in the Real World . . . . . . lxxiv

0.5.1 Rolling Portfolios . . . . . . . . . . . . . . lxxiv

0.5.2 Backtesting . . . . . . . . . . . . . . . . . lxxiv

0.6 Further applications in Finance . . . . . . . . . . lxxiv

0.6.1 Portfolio Sorts . . . . . . . . . . . . . . . lxxiv

0.6.2 Fama-MacBeth-Regressions . . . . . . . . lxxiv

0.6.3 Risk Indices . . . . . . . . . . . . . . . . . lxxiv

0.7 References . . . . . . . . . . . . . . . . . . . . . lxxiv

.0.1 Introduction to R . . . . . . . . . . . . . . lxxiv



.0.1.1 Getting started . . . . . . . . . . lxxv

.0.1.2 Working directory . . . . . . . . lxxv

.0.1.3 Basic calculations . . . . . . . . lxxvi

.0.1.4 Mapping variables . . . . . . . . lxxvii

.0.1.5 Sequences, vectors and matrices lxxx

.0.1.6 Vectors and matrices . . . . . . lxxxiii

.0.1.7 Functions in R . . . . . . . . . . lxxxv

.0.1.8 Plotting . . . . . . . . . . . . . . lxxxviii

.0.1.9 Control Structures . . . . . . . . lxxxix

Bibliography xci
List of Tables

List of Figures

Preface

This book should accompany my lectures “Research Methods”, “Quantitative Analysis”, “Portfoliomanagement and Financial Analysis” and (to a smaller degree) “Empirical Methods in Finance”. In the past years I have been a heavy promoter of the Rmetrics1 tools for my lectures and research. However, in the last year the development of the project has stagnated due to the tragic death of its founder Prof. Dr. Diethelm Würtz2. It therefore happened several times that code from past semesters and lectures stopped working and no support for the project was available.
Also, in the past year I have started to be a heavy user of the tidyverse3 and the financial packages that have been developed on top of it (e.g. tidyquant). Therefore I have taken the chance to put together some material from my lectures and start writing this book. In structure it is kept similar to the excellent Rmetrics book Würtz et al. (2015) on Portfolio Optimization with R/Rmetrics4, which I have been heavily using and recommending to my students in the past years!
1 https://www.rmetrics.org/
2 https://www.rmetrics.org/about
3 https://www.tidyverse.org/
4 https://www.rmetrics.org/ebooks-portfolio


Why read this book

Because it may help my students :-)

Structure of the book

Not yet fixed. But the book will start with an introduction to the most important tools for portfolio analysis: timeseries and the tidyverse. Afterwards, the possibilities of managing and exploring financial data will be developed. Then we do portfolio optimization for mean-variance and mean-CVaR portfolios. This will be followed by a chapter on backtesting, before I show further applications in finance, such as predictions, portfolio sorting, Fama-MacBeth-regressions etc.

Prerequisites

To start, install/load all necessary packages using the pacman package (the list will be expanded as the book grows).

pacman::p_load(tidyverse,tidyquant,PortfolioAnalytics,quantmod,PerformanceAnalytics,
  tibbletime,timetk,ggthemes,timeDate,Quandl,alphavantager,readxl,
  DEoptim,pso,GenSA,Rglpk,ROI,ROI.plugin.glpk,ROI.plugin.quadprog)

Acknowledgments

I thank my family…
I especially thank the developers of:

• the excellent fPortfolio-Book
• the tidyquant package and its vignettes
• the PerformanceAnalytics developers and the package vignettes
• the PortfolioAnalytics developers (currently working very hard on the package) and its package vignettes

Sebastian Stöckl
University of Liechtenstein
Vaduz, Liechtenstein

0.1 Introduction

0.1.1 Introduction to Timeseries

For an introduction to R see the Appendix @ref(ss_991IntrotoR).

Many of the datasets we will be working with have a (somewhat regular) time dimension and are therefore often called timeseries. In R there are a variety of classes available to handle data, such as vector, matrix, data.frame or their more modern implementation: tibble. Adding a time dimension creates a timeseries from these objects. The most common/flexible package in R that handles timeseries based on the first three formats is xts (according to its vignette5), which we will discuss in the following. Afterwards we will introduce the timetk package that allows xts to interplay with tibbles to create the most powerful framework for handling (even very large) time-based datasets (as we often encounter in finance).

The community is currently working heavily to develop time-aware tibbles to bring together the powerful grouping features of the dplyr package (for tibbles) with the abilities of xts, which is the most powerful and most used timeseries method in finance to date, due to the available interplay with quantmod and other financial packages. See also this link6 for more information.

5 https://cran.r-project.org/web/packages/xts/vignettes/xts.pdf

All information regarding tibbles and the financial universe is summarized and updated on the business-science.io website7.

In the following, we will define a variety of date and time classes before we introduce xts, tibble and tibbletime. Most of these packages come with excellent vignettes that I will reference for further reading, while I will only pick up the features necessary for portfolio management, which is the focus of this book.

0.1.1.1 Date and Time

There are some basic functionalities in base R, but most of the time we will need additional functions to perform all necessary tasks. Available date (time) classes are Date, POSIXct, (chron), yearmon, yearqtr and timeDate (from the Rmetrics bundle).

0.1.1.1.1 Basic Date and Time Classes

There are several Date and Time Classes in R that can all be used
as time-index for xts. We start with the most basic as.Date()

d1 <- "2018-01-18"
str(d1) # str() checks the structure of the R-object

## chr "2018-01-18"

d2 <- as.Date(d1)
str(d2)

## Date[1:1], format: "2018-01-18"



In the second case, R automatically detects the format of the Date object, but if there is something more complex involved you can specify the format (for all available format definitions, see ?strptime()).

d3 <- "4/30-2018"
as.Date(d3, "%m/%d-%Y") # as.Date(d3) will not work

## [1] "2018-04-30"

If you are working with monthly or quarterly data, yearmon and yearqtr will be your friends (both coming from the zoo package that serves as the foundation for xts).

as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))

## [1] "Jan 2018"

## [1] "Apr 2018"

as.yearqtr(d1); as.yearqtr(as.Date(d3, "%m/%d-%Y"))

## [1] "2018 Q1"

## [1] "2018 Q2"

Note that as.yearmon shows dates in terms of the current locale of your computer (e.g. Austrian German). You can find out about your locale with Sys.getlocale() and set a different locale with Sys.setlocale().

Sys.setlocale("LC_TIME","German_Austria")

## [1] "German_Austria.1252"

as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))

## [1] "Jän 2018"

## [1] "Apr 2018"

Sys.setlocale("LC_TIME","English")

## [1] "English_United States.1252"

as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))

## [1] "Jan 2018"

## [1] "Apr 2018"

When your data also requires you to include information on time, you will need either POSIXct (the basic class behind all times and dates in R) or the timeDate package. The latter includes excellent abilities to work with financial data (see the next section). Note that talking about time also requires you to talk about timezones! We start with several examples of the POSIXct class:

strptime("2018-01-15 13:55:23.975", "%Y-%m-%d %H:%M:%OS") # converts from character

## [1] "2018-01-15 13:55:23 CET"

as.POSIXct("2009-01-05 14:19:12", format="%Y-%m-%d %H:%M:%S", tz="UTC")

## [1] "2009-01-05 14:19:12 UTC"

We will mainly use the timeDate package that provides many useful functions for financial timeseries.

An introduction to timeDate by the Rmetrics group can be found at https://www.rmetrics.org/sites/default/files/2010-02-timeDateObjects.pdf.

Dates <- c("1989-09-28","2001-01-15","2004-08-30","1990-02-09")


Times <- c( "23:12:55", "10:34:02", "08:30:00", "11:18:23")
DatesTimes <- paste(Dates, Times)
as.Date(DatesTimes)

## [1] "1989-09-28" "2001-01-15" "2004-08-30" "1990-02-09"

as.timeDate(DatesTimes)

## GMT
## [1] [1989-09-28 23:12:55] [2001-01-15 10:34:02] [2004-08-30 08:30:00]
## [4] [1990-02-09 11:18:23]

You see that timeDate comes along with timezone information (GMT) that is set according to your computer's locale. timeDate allows you to specify the timezone of origin (zone) as well as the financial center whose timezone the data should be converted to (FinCenter):

timeDate(DatesTimes, zone = "Tokyo", FinCenter = "Zurich")

## Zurich
## [1] [1989-09-28 15:12:55] [2001-01-15 02:34:02] [2004-08-30 01:30:00]
## [4] [1990-02-09 03:18:23]

timeDate(DatesTimes, zone = "Tokyo", FinCenter = "NewYork")

## NewYork
## [1] [1989-09-28 10:12:55] [2001-01-14 20:34:02] [2004-08-29 19:30:00]
## [4] [1990-02-08 21:18:23]

timeDate(DatesTimes, zone = "NewYork", FinCenter = "Tokyo")

## Tokyo
## [1] [1989-09-29 12:12:55] [2001-01-16 00:34:02] [2004-08-30 21:30:00]
## [4] [1990-02-10 01:18:23]

listFinCenter("Europe/Vi*") # get a list of all financial centers available

## [1] "Europe/Vaduz" "Europe/Vatican" "Europe/Vienna"


## [4] "Europe/Vilnius" "Europe/Volgograd"

The Date class as well as the timeDate package allow you to create time sequences (necessary if you want to manually create timeseries):

dates1 <- seq(as.Date("2017-01-01"), length=12, by="month"); dates1 # or use to= instead of length=

## [1] "2017-01-01" "2017-02-01" "2017-03-01" "2017-04-01" "2017-05-01"


## [6] "2017-06-01" "2017-07-01" "2017-08-01" "2017-09-01" "2017-10-01"
## [11] "2017-11-01" "2017-12-01"

dates2 <- timeSequence(from = "2017-01-01", to = "2017-12-31", by = "month"); dates2

## GMT
## [1] [2017-01-01] [2017-02-01] [2017-03-01] [2017-04-01] [2017-05-01]
## [6] [2017-06-01] [2017-07-01] [2017-08-01] [2017-09-01] [2017-10-01]
## [11] [2017-11-01] [2017-12-01]

Now there are several very useful functions in the timeDate package to determine the first/last days of months/quarters/… (I let them speak for themselves):

timeFirstDayInMonth(dates1 - 7) # btw check the difference between "dates1-7" and "dates1"

## GMT
## [1] [2016-12-01] [2017-01-01] [2017-02-01] [2017-03-01] [2017-04-01]
## [6] [2017-05-01] [2017-06-01] [2017-07-01] [2017-08-01] [2017-09-01]
## [11] [2017-10-01] [2017-11-01]

timeFirstDayInQuarter(dates1)

## GMT
## [1] [2017-01-01] [2017-01-01] [2017-01-01] [2017-04-01] [2017-04-01]
## [6] [2017-04-01] [2017-07-01] [2017-07-01] [2017-07-01] [2017-10-01]
## [11] [2017-10-01] [2017-10-01]

timeLastDayInMonth(dates1)

## GMT
## [1] [2017-01-31] [2017-02-28] [2017-03-31] [2017-04-30] [2017-05-31]
## [6] [2017-06-30] [2017-07-31] [2017-08-31] [2017-09-30] [2017-10-31]
## [11] [2017-11-30] [2017-12-31]

timeLastDayInQuarter(dates1)

## GMT
## [1] [2017-03-31] [2017-03-31] [2017-03-31] [2017-06-30] [2017-06-30]
## [6] [2017-06-30] [2017-09-30] [2017-09-30] [2017-09-30] [2017-12-31]
## [11] [2017-12-31] [2017-12-31]

timeNthNdayInMonth("2018-01-01",nday = 5, nth = 3) # useful for option expiry dates (third Friday)

## GMT
## [1] [2018-01-19]

timeNthNdayInMonth(dates1,nday = 5, nth = 3)

## GMT
## [1] [2017-01-20] [2017-02-17] [2017-03-17] [2017-04-21] [2017-05-19]
## [6] [2017-06-16] [2017-07-21] [2017-08-18] [2017-09-15] [2017-10-20]
## [11] [2017-11-17] [2017-12-15]

If one wants to create a more specific sequence of times, this can be done with timeCalendar using time ‘atoms’:

timeCalendar(m = 1:4, d = c(28, 15, 30, 9), y = c(1989, 2001, 2004, 1990), FinCenter = "Europe/Zurich")

## Europe/Zurich
## [1] [1989-01-28 01:00:00] [2001-02-15 01:00:00] [2004-03-30 02:00:00]
## [4] [1990-04-09 02:00:00]

timeCalendar(d=1, m=3:4, y=2018, h = c(9, 14), min = c(15, 23), s=c(39,41), FinCenter = "Europe/Zurich")

## Europe/Zurich
## [1] [2018-03-01 10:15:39] [2018-04-01 16:23:41]

0.1.1.1.2 Week-days and Business-days

One of the most important functionalities existing only in the timeDate package is the possibility to check for business days in almost any timezone. The holiday calendars of the most important financial centers can be called via functions of the form holidayXXX():

holidayNYSE()

## NewYork
## [1] [2018-01-01] [2018-01-15] [2018-02-19] [2018-03-30] [2018-05-28]
## [6] [2018-07-04] [2018-09-03] [2018-11-22] [2018-12-25]

holiday(year = 2018, Holiday = c("GoodFriday","Easter","FRAllSaints"))

## GMT
## [1] [2018-03-30] [2018-04-01] [2018-11-01]

dateSeq <- timeSequence(Easter(year(Sys.time()), -14), Easter(year(Sys.time()), +14)); dateSeq

## GMT
## [1] [2018-03-18] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22]
## [6] [2018-03-23] [2018-03-24] [2018-03-25] [2018-03-26] [2018-03-27]
## [11] [2018-03-28] [2018-03-29] [2018-03-30] [2018-03-31] [2018-04-01]
## [16] [2018-04-02] [2018-04-03] [2018-04-04] [2018-04-05] [2018-04-06]
## [21] [2018-04-07] [2018-04-08] [2018-04-09] [2018-04-10] [2018-04-11]
## [26] [2018-04-12] [2018-04-13] [2018-04-14] [2018-04-15]

dateSeq2 <- dateSeq[isWeekday(dateSeq)]; dateSeq2 # select only weekdays

## GMT
## [1] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22] [2018-03-23]
## [6] [2018-03-26] [2018-03-27] [2018-03-28] [2018-03-29] [2018-03-30]
## [11] [2018-04-02] [2018-04-03] [2018-04-04] [2018-04-05] [2018-04-06]
## [16] [2018-04-09] [2018-04-10] [2018-04-11] [2018-04-12] [2018-04-13]

dayOfWeek(dateSeq2)

## 2018-03-19 2018-03-20 2018-03-21 2018-03-22 2018-03-23 2018-03-26


## "Mon" "Tue" "Wed" "Thu" "Fri" "Mon"
## 2018-03-27 2018-03-28 2018-03-29 2018-03-30 2018-04-02 2018-04-03
## "Tue" "Wed" "Thu" "Fri" "Mon" "Tue"
## 2018-04-04 2018-04-05 2018-04-06 2018-04-09 2018-04-10 2018-04-11
## "Wed" "Thu" "Fri" "Mon" "Tue" "Wed"
## 2018-04-12 2018-04-13
## "Thu" "Fri"

dateSeq3 <- dateSeq[isBizday(dateSeq, holidayZURICH(year(Sys.time())))]; dateSeq3

## GMT
## [1] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22] [2018-03-23]
## [6] [2018-03-26] [2018-03-27] [2018-03-28] [2018-03-29] [2018-04-03]
## [11] [2018-04-04] [2018-04-05] [2018-04-06] [2018-04-09] [2018-04-10]
## [16] [2018-04-11] [2018-04-12] [2018-04-13]

dayOfWeek(dateSeq3)

## 2018-03-19 2018-03-20 2018-03-21 2018-03-22 2018-03-23 2018-03-26


## "Mon" "Tue" "Wed" "Thu" "Fri" "Mon"
## 2018-03-27 2018-03-28 2018-03-29 2018-04-03 2018-04-04 2018-04-05
## "Tue" "Wed" "Thu" "Tue" "Wed" "Thu"
## 2018-04-06 2018-04-09 2018-04-10 2018-04-11 2018-04-12 2018-04-13
## "Fri" "Mon" "Tue" "Wed" "Thu" "Fri"

Now, one of the strongest points for the timeDate package is made when one puts together times and dates from different timezones. This can be a challenging task (imagine hourly stock prices from London, Tokyo and New York). Luckily the timeDate package can handle this easily:

ZH <- timeDate("2015-01-01 16:00:00", zone = "GMT", FinCenter = "Zurich")


NY <- timeDate("2015-01-01 18:00:00", zone = "GMT", FinCenter = "NewYork")
c(ZH, NY)

## Zurich
## [1] [2015-01-01 17:00:00] [2015-01-01 19:00:00]

c(NY, ZH) # it always takes the Financial Center of the first entry

## NewYork
## [1] [2015-01-01 13:00:00] [2015-01-01 11:00:00]

0.1.1.1.3 Assignments

Create a daily time series for 2018:

1. Find the subset of first and last days per month/quarter (uniquely).
2. Take December 2017 and remove all weekends and holidays in Zurich (Tokyo).
3. Create a series of five dates & times in New York. Show them for New York, London and Belgrade.
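For the first of these tasks, a possible starting point is sketched below (a sketch assuming the timeDate package is loaded; building the sequence by month directly yields the first days, while timeLastDayInMonth() returns the corresponding last days):

```r
library(timeDate)

# the first days of each month in 2018, built directly as a monthly sequence
firsts <- seq(as.Date("2018-01-01"), by = "month", length.out = 12)
firsts
# the matching last days of each month via timeDate
lasts <- timeLastDayInMonth(firsts)
lasts
```

The quarterly variants timeFirstDayInQuarter() and timeLastDayInQuarter() work analogously; wrapping their results in unique() removes the repeated quarter boundaries.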

0.1.1.2 eXtensible Timeseries

The xts format is based on the timeseries format zoo, but extends its power to be more compatible with other data classes. For example, if one converts dates from timeDate, xts is flexible enough to memorize the financial center the dates came from, and upon retransformation to that class this information is restored; with a pure zoo object it would have been lost. As we quite often (might) want to transform our data to and from xts, this is a great feature and makes our lives a lot easier. xts also comes with a bundle of other features.
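A minimal sketch of this behavior (assuming xts and timeDate are loaded; tclass() reports which class the internal index was built from, so that class can be restored on the way back):

```r
library(timeDate)
library(xts)

# a timeDate index with an explicit financial center
td <- timeSequence(from = "2017-01-01", length.out = 3, by = "day",
                   FinCenter = "Zurich")
x <- xts(1:3, order.by = td)
tclass(x)  # the xts object remembers that its index came from timeDate
```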

For the reader who wants to dig deeper, we recommend the excellent zoo vignettes (vignette("zoo-quickref"), vignette("zoo"), vignette("zoo-faq"), vignette("zoo-design") and vignette("zoo-read")). Read up on xts in vignette("xts") and vignette("xts-faq").

To start, we create an xts object consisting of a series of randomly created data points:

data <- rnorm(5) # 5 std. normally distributed random numbers


dates <- seq(as.Date("2017-05-01"), length=5, by="days")
xts1 <- xts(x=data, order.by=dates); xts1

## [,1]
## 2017-05-01 0.72838032
## 2017-05-02 0.47100977
## 2017-05-03 -0.04537768
## 2017-05-04 1.61845234
## 2017-05-05 0.07191067

coredata(xts1) # access data



## [,1]
## [1,] 0.72838032
## [2,] 0.47100977
## [3,] -0.04537768
## [4,] 1.61845234
## [5,] 0.07191067

index(xts1) # access time (index)

## [1] "2017-05-01" "2017-05-02" "2017-05-03" "2017-05-04" "2017-05-05"

Here, the xts object was built from a vector and a series of Dates.
We could also have used timeDate, yearmon or yearqtr and a
data.frame:

s1 <- rnorm(5); s2 <- 1:5


data <- data.frame(s1,s2)
dates <- timeSequence("2017-01-01",by="months",length.out=5,zone = "GMT")
xts2 <- xts(x=data, order.by=dates); xts2

## Warning: timezone of object (GMT) is different than current timezone ().

## s1 s2
## 2017-01-01 0.7462329 1
## 2017-02-01 -0.1551448 2
## 2017-03-01 -0.9693310 3
## 2017-04-01 0.3428151 4
## 2017-05-01 0.4692079 5

dates2 <- as.yearmon(dates)


xts3 <- xts(x=data, order.by = dates2)

In the next step we evaluate the merging of two timeseries:

set.seed(1)
xts3 <- xts(rnorm(6), timeSequence(from = "2017-01-01", to = "2017-06-01", by = "month"))
xts4 <- xts(rnorm(5), timeSequence(from = "2017-04-01", to = "2017-08-01", by = "month"))
colnames(xts3) <- "tsA"; colnames(xts4) <- "tsB"
merge(xts3,xts4)

Please be aware that joining timeseries in R sometimes requires you to specify a left/right/inner/outer join of the two objects:

merge(xts3,xts4,join = "left")
merge(xts3,xts4,join = "right")
merge(xts3,xts4,join = "inner")
merge(xts3,xts4,join="outer",fill=0)

In the next step, we subset and replace parts of xts objects

xts5 <- xts(rnorm(24), timeSequence(from = "2016-01-01", to = "2017-12-01", by = "month"))


xts5["2017-01-01"]
xts5["2017-05-01/2017-08-12"]
xts5[c("2017-01-01","2017-05-01")] <- NA
xts5["2016"] <- 99
xts5["2016-05-01/"]
first(xts5)
last(xts5)
first(xts5,"3 months")
xts6 <- last(xts5,"1 year")

Now let us handle the missing values we introduced. One possibility is just to omit them using na.omit(). Other possibilities would be to carry the last value forward with na.locf() or to use linear interpolation with na.approx().

na.omit(xts6)
na.locf(xts6)
na.locf(xts6,fromLast = TRUE,na.rm = TRUE)
na.approx(xts6,na.rm = FALSE)

Finally, standard calculations can be done on xts objects, and there are some pretty helper functions to make life easier:

periodicity(xts5)
nmonths(xts5); nquarters(xts5); nyears(xts5)
to.yearly(xts5)
to.quarterly(xts6)
round(xts6^2,2)
xts6[which(is.na(xts6))] <- rnorm(2)
# For aggregation of timeseries
ep1 <- endpoints(xts6,on="months",k = 2) # endpoints every 2 months
ep2 <- endpoints(xts6,on="months",k = 3) # endpoints every 3 months
period.sum(xts6, INDEX = ep2)
period.apply(xts6, INDEX = ep1, FUN=mean) # 2month means
period.apply(xts6, INDEX = ep2, FUN=mean) # 3month means
# Lead, lag and diff operations
cbind(xts6,lag(xts6,k=-1),lag(xts6,k=1),diff(xts6))

Next, I will show some applications that go beyond xts, for example the use of lapply to operate on a list:

# splitting timeseries (result is a list)


xts6_yearly <- split(xts5,f="years")
lapply(xts6_yearly,FUN=mean,na.rm=TRUE)
# using elaborate functions from the zoo-package
rollapply(as.zoo(xts6), width=3, FUN=sd) # rolling standard deviation

Last but not least, we plot xts data and save it to a (csv) file, then open it again:

tmp <- tempfile()


write.zoo(xts2,sep=",",file = tmp)
xts8 <- as.xts(read.zoo(tmp, sep=",", FUN=as.yearmon))
plot(xts8)

0.1.1.3 Downloading timeseries and basic visualization with quantmod

Many downloading and plotting functions are (still) available in quantmod. We first require the package, then download data for Google, Apple and the S&P500 from yahoo finance. Each of these “Symbols” will be downloaded into its own “environment”. For plotting, a large variety of technical indicators is available; for an overview see here8.

Quantmod is developed by Jeffrey Ryan and Joshua Ulrich9 and has a homepage10. The homepage includes an Introduction11, describes how data can be handled between xts and quantmod12 and has examples about Financial Charting with quantmod and TTR13. More documents will be developed within 2018.

8 https://www.r-bloggers.com/a-guide-on-r-quantmod-package-how-to-get-started/

require(quantmod)
# the easiest form of getting data is for yahoo finance where you know the ticker symbol
getSymbols(Symbols = "AAPL", from="2010-01-01", to="2018-03-01", periodicity="monthly")
head(AAPL)
is.xts(AAPL)
plot(AAPL[, "AAPL.Adjusted"], main = "AAPL")
chartSeries(AAPL, TA=c(addVo(),addBBands(), addADX())) # plot and add technical indicators
getSymbols(Symbols = c("GOOG","^GSPC"), from="2000-01-01", to="2018-03-01", periodicity="monthly")
getSymbols('DTB3', src='FRED') # FRED does not recognize from and to

Now we create an xts from all relevant parts of the data

stocks <- cbind("Apple"=AAPL[,"AAPL.Adjusted"],"Google"=GOOG[,"GOOG.Adjusted"],"SP500"=GSPC[,"GSPC.Adjusted"])


rf.daily <- DTB3["2010-01-01/2018-03-01"]
rf.monthly <- to.monthly(rf.daily)[,"rf.daily.Open"]
rf <- xts(coredata(rf.monthly),order.by = as.Date(index(rf.monthly)))

One possibility (that I adopted from here: https://www.quantinsti.com/blog/an-example-of-a-trading-strategy-coded-in-r/) is to use the technical indicators provided by quantmod to devise a technical trading strategy. We make use of a fast and a slow moving average (function MACD in the TTR package that belongs to quantmod). Whenever the fast moving average crosses the slow one from below, we invest (there is a short-term trend to exploit), and we drop out of the investment once the red (fast) line falls below the grey (slow) line. To evaluate the trading strategy we also need to calculate returns for the S&P500 index using ROC.

chartSeries(GSPC, TA=c(addMACD(fast=3, slow=12,signal=6,type=SMA)))


macd <- MACD(GSPC[,"GSPC.Adjusted"], nFast=3, nSlow=12, nSig=6, maType=SMA, percent=FALSE)
buy_sell_signal <- Lag(ifelse(macd$macd < macd$signal, -1, 1))

buy_sell_returns <- (ROC(GSPC[,"GSPC.Adjusted"])*buy_sell_signal)["2001-06-01/"]


portfolio <- exp(cumsum(buy_sell_returns)) # for nice plotting we assume that we start with a wealth of 1
plot(portfolio)

For the evaluation of trading strategies/portfolios and other financial timeseries, almost every tool is available through the package PerformanceAnalytics. In this case charts.PerformanceSummary() calculates cumulative returns (similar to above), monthly returns and the maximum drawdown (the maximum loss relative to the best previous value, see here14).

PerformanceAnalytics is a large package with an uncountable variety of tools. There are vignettes on the estimation of higher-order (co)moments (vignette("EstimationComoments")), on performance attribution measures according to Bacon (2008) (vignette("PA-Bacon")), on charting (vignette("PA-charts")) and more that can be found on the PerformanceAnalytics CRAN page15.

require(PerformanceAnalytics)
rets <- cbind(buy_sell_returns,ROC(GSPC[,"GSPC.Adjusted"]))
colnames(rets) <- c("investment","benchmark")
charts.PerformanceSummary(rets,colorset=rich6equal)
chart.Histogram(rets, main = "Risk Measures", methods = c("add.density", "add.normal"))

14 https://de.wikipedia.org/wiki/Maximum_Drawdown

0.1.2 Introduction to the tidyVerse

0.1.2.1 Tibbles

Since the middle of 2017 a lot of programmers have put in a huge effort to rewrite many R functions and data objects in a tidy way and thereby created the tidyverse16.

For updates check the tidyverse homepage17. A very well written book introducing the tidyverse can be found online: R for Data Science18. The core of the tidyverse currently contains several packages:

– ggplot2 for creating powerful graphs19 (see the vignette("ggplot2-specs"))
– dplyr for data manipulation20 (see the vignette("dplyr"))
– tidyr for tidying data21
– readr for importing datasets22 (see the vignette("readr"))
– purrr for programming23
– tibble for modern data.frames24 (see the vignette("tibble"))

and many more25.

require(tidyverse) # install first if not yet there, update regularly with tidyverse_update()
require(tidyquant) # this package wraps all the quantmod etc. packages into the tidyverse

Most of the following is adapted from “Introduction to Statistical Learning with Applications in R” by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani at http://www.science.smith.edu/~jcrouser/SDS293/labs/. We begin by loading in the Auto data set. This data is part of the ISLR package.

16 https://www.tidyverse.org/

require(ISLR)
data(Auto)

Nothing happens when you run this, but now the data is available in your environment. (In RStudio, you would see the name of the data in your Environment tab.) To view the data, we can either print the entire dataset by typing its name, or we can “slice” off some of the data to look at just a subset by piping the data into the slice function using the %>% operator. The piping operator is one of the most useful tools of the tidyverse: you can pipe command into command into command without saving and naming each intermediate step. The first step is to transform this data.frame into a tibble (a similar concept, but better26). A tibble has observations in rows and variables in columns. Those variables can have many different formats:

Auto %>% slice(1:10)


tbs1 <- tibble(
Date = seq(as.Date("2017-01-01"), length=12, by="months"),
returns = rnorm(12),
letters = sample(letters, 12, replace = TRUE)
)

As you can see, all three columns of tbs1 have different formats. One can get the different variables by name and by position. If you want to use the pipe operator, you need to use the special placeholder . (a single dot):

26 http://r4ds.had.co.nz/tibbles.html

tbs1$returns
tbs1[[2]]
tbs1 %>% .[[2]]

Before we go on to analyze a large tibble such as Auto, we quickly talk about reading and saving files with tools from the tidyverse. We save the file as csv using write_csv and read it back using read_csv. Because the columns of the read file are not in the exact format as before, we use mutate to transform the columns.

Auto <- as.tibble(Auto) # make tibble from Auto


tmp <- tempfile()
write_csv(Auto,path = tmp) # write
Auto2 <- read_csv(tmp)
Auto2 <- Auto2 %>%
  mutate(cylinders=as.double(cylinders),horsepower=as.double(horsepower),year=as.double(year))
all.equal(Auto,Auto2) # only the factor levels differ

Notice that the data looks just the same as when we loaded it from
the package. Now that we have the data, we can begin to learn
things about it.

dim(Auto)
str(Auto)
names(Auto)

The dim() function tells us that the data has 392 observations and nine variables. The original data had some empty rows, but when we read the data in, R knew to ignore them. The str() function tells us that most of the variables are numeric or integer, although the name variable is a character vector. names() lets us check the variable names.

0.1.2.2 Summary statistics

Often, we want to know some basic things about the variables in our data. Calling summary() on an entire dataset will give you an idea of the distributions of your variables, as it produces a numerical summary of each variable in a particular data set.

summary(Auto)

The summary suggests that origin might be better thought of as a factor. It only seems to have three possible values: 1, 2 and 3. If we read the documentation about the data (using ?Auto) we learn that these numbers correspond to where the car is from: 1. American, 2. European, 3. Japanese. So, let's mutate() that variable into a factor (categorical) variable.

Auto <- Auto %>%
  mutate(origin = factor(origin))
summary(Auto)

0.1.2.3 Plotting

We can use the ggplot2 package to produce simple graphics. ggplot2 has a particular syntax, which looks like this:

ggplot(Auto) + geom_point(aes(x=cylinders, y=mpg))

The basic idea is that you need to initialize a plot with ggplot()
and then add “geoms” (short for geometric objects) to the plot.

The ggplot2 package is based on the Grammar of Graphics27, a famous book on data visualization theory. It provides a way to map attributes in your data (like variables) to “aesthetics” on the plot; the parameter aes() is short for aesthetic. For more about the ggplot2 syntax, view the help by typing ?ggplot or ?geom_point. There are also great online resources for ggplot2, like the R Graphics Cookbook28.
The cylinders variable is stored as a numeric vector, so R has
treated it as quantitative. However, since there are only a small
number of possible values for cylinders, one may prefer to treat it
as a qualitative variable. We can turn it into a factor, again using
a mutate() call.

Auto <- Auto %>%
  mutate(cylinders = factor(cylinders))

To view the relationship between a categorical and a numeric variable, we might want to produce boxplots. As usual, a number of options can be specified in order to customize the plots.

ggplot(Auto) + geom_boxplot(aes(x=cylinders, y=mpg)) + xlab("Cylinders") + ylab("MPG")

The geom geom_histogram() can be used to plot a histogram.

27 https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448
28 http://www.cookbook-r.com/Graphs/

ggplot(Auto) + geom_histogram(aes(x=mpg), binwidth=5)

For small datasets, we might want to see all the bivariate relationships between the variables. The GGally package has an extension of the scatterplot matrix that can do just that. We use select() to keep only the two variables mpg and cylinders and pipe the result into the ggpairs() function:

Auto %>% select(mpg, cylinders) %>% GGally::ggpairs()

Because there are not many cars with 3 and 5 cylinders we use
filter to only select those cars with 4, 6 and 8 cylinders.

Auto %>% select(mpg, cylinders) %>% filter(cylinders %in% c(4,6,8)) %>% GGally::ggpairs()

Sometimes, we might want to save a plot for use outside of R. To do this, we can use the ggsave() function.

ggsave("histogram.png", ggplot(Auto) + geom_histogram(aes(x=mpg), binwidth=5))

TO DO:
* Tidyquant: Document more technical features.
* For extensive timeseries manipulations there is an extension of the tibble object: time-aware tibbles from tibbletime29, which provide much of the xts functionality without the necessary conversion.

29 https://github.com/business-science/tibbletime

0.2 Managing Data

In this chapter we will learn how to download/import data from various sources. Most importantly, we will use the quantmod library through tidyquant to download financial data from a variety of sources. We will also learn how to import '.xlsx' (Excel) files.

0.2.1 Getting Data

0.2.1.1 Downloading from Online Datasources

The tidyquant package comes with a variety of readily compiled datasets/datasources. For whole collections of data, the following commands are available:

tq_exchange_options() # find all exchanges available

## [1] "AMEX" "NASDAQ" "NYSE"

tq_index_options() # find all indices available

## [1] "RUSSELL1000" "RUSSELL2000" "RUSSELL3000" "DOW" "DOWGLOBAL"


## [6] "SP400" "SP500" "SP600" "SP1000"

tq_get_options() # find all data sources available

## [1] "stock.prices" "stock.prices.google" "stock.prices.japan"


## [4] "financials" "key.ratios" "dividends"
## [7] "splits" "economic.data" "exchange.rates"
## [10] "metal.prices" "quandl" "quandl.datatable"
## [13] "alphavantager" "rblpapi"

The commands tq_exchange() and tq_index() will now get you all symbols and some additional information on the stocks listed at that exchange or contained in that index.30

glimpse(sp500)

## Observations: 504
## Variables: 5
## $ symbol <chr> "AAPL", "MSFT", "AMZN", "BRK.B", "FB", "JPM", "JNJ...
## $ company <chr> "Apple Inc.", "Microsoft Corporation", "Amazon.com...
## $ weight <dbl> 0.044387857, 0.035053855, 0.032730459, 0.016868330...
## $ sector <chr> "Information Technology", "Information Technology"...
## $ shares_held <dbl> 53939268, 84297440, 4418447, 21117048, 26316160, 3...

glimpse(nyse)

## Observations: 3,139
## Variables: 7
## $ symbol <chr> "DDD", "MMM", "WBAI", "WUBA", "EGHT", "AHC", "...
## $ company <chr> "3D Systems Corporation", "3M Company", "500.c...
## $ last.sale.price <dbl> 18.4800, 206.7100, 11.6400, 68.1800, 23.2000, ...
## $ market.cap <chr> "$2.11B", "$121.26B", "$491.85M", "$10.06B", "...
## $ ipo.year <dbl> NA, NA, 2013, 2013, NA, NA, 2014, 2014, NA, NA...
## $ sector <chr> "Technology", "Health Care", "Consumer Service...
## $ industry <chr> "Computer Software: Prepackaged Software", "Me...

glimpse(nasdaq)

30 Note that tq_index() unfortunately makes use of the package XLConnect that requires Java to be installed on your system.

## Observations: 3,405
## Variables: 7
## $ symbol <chr> "YI", "PIH", "PIHPP", "TURN", "FLWS", "FCCY", ...
## $ company <chr> "111, Inc.", "1347 Property Insurance Holdings...
## $ last.sale.price <dbl> 13.800, 6.350, 25.450, 2.180, 11.550, 20.150, ...
## $ market.cap <chr> NA, "$38M", NA, "$67.85M", "$746.18M", "$168.8...
## $ ipo.year <dbl> 2018, 2014, NA, NA, 1999, NA, NA, 2011, 2014, ...
## $ sector <chr> NA, "Finance", "Finance", "Finance", "Consumer...
## $ industry <chr> NA, "Property-Casualty Insurers", "Property-Ca...

The dataset we will be using consists of the ten largest stocks within
the S&P500 that had an IPO before January 2000. Therefore we
need to merge both datasets using inner_join() because we only
want to keep symbols from the S&P500 that are also traded on
NYSE or NASDAQ:

stocks.selection <- sp500 %>%
  inner_join(rbind(nyse, nasdaq) %>% select(symbol, last.sale.price, market.cap, ipo.year),
             by = "symbol") %>% # keep only symbols also traded on NYSE or NASDAQ
  filter(ipo.year < 2000 & !is.na(market.cap)) %>% # filter stocks with ipo before 2000
  arrange(desc(weight)) %>% # sort in descending order of index weight
  slice(1:10) # keep the ten largest stocks
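inner_join() keeps only the rows whose key occurs in both tables; base R's merge() behaves the same way, as this sketch with toy data (not the real index constituents) shows:

```r
# Toy index table and toy exchange listing; "XYZ" is not listed on the exchange
sp500_toy  <- data.frame(symbol = c("AAPL", "MSFT", "XYZ"),
                         weight = c(0.044, 0.035, 0.001))
listed_toy <- data.frame(symbol = c("AAPL", "MSFT", "GE"),
                         ipo.year = c(1980, 1986, NA))
# Inner join on the common key: only symbols present in both tables survive
joined <- merge(sp500_toy, listed_toy, by = "symbol")
joined$symbol
```

Symbols that appear in only one of the two tables ("XYZ", "GE") are dropped, which is exactly the filtering behavior we rely on above.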

The ten largest stocks in the S&P500 with a history longer than January 2000:

symbol company                    weight sector                 shares_held last.sale.price market.cap ipo.year
AAPL   Apple Inc.                 0.044  Information Technology 53939268    221.07          $1067.75B  1980
MSFT   Microsoft Corporation      0.035  Information Technology 84297440    111.71          $856.62B   1986
AMZN   Amazon.com Inc.            0.033  Consumer Discretionary 4418447     1990.00         $970.6B    1997
CSCO   Cisco Systems Inc.         0.009  Information Technology 51606584    46.89           $214.35B   1990
NVDA   NVIDIA Corporation         0.007  Information Technology 6659463     268.20          $163.07B   1999
ORCL   Oracle Corporation         0.006  Information Technology 32699620    49.34           $196.43B   1986
AMGN   Amgen Inc.                 0.005  Health Care            7306144     199.50          $129.13B   1983
ADBE   Adobe Systems Incorporated 0.005  Information Technology 5402625     267.79          $131.13B   1986
QCOM   QUALCOMM Incorporated      0.004  Information Technology 15438597    71.75           $105.41B   1991
GILD   Gilead Sciences Inc.       0.004  Health Care            14310276    73.97           $95.89B    1992
Next, we will download stock prices from Yahoo.

Data from that source usually comes in the OHLC format (open, high, low, close) with additional information (volume, adjusted). We will additionally download data for the S&P500 index itself. Note that we get daily prices:

stocks.prices <- stocks.selection$symbol %>%
  tq_get(get = "stock.prices", from = "2000-01-01", to = "2017-12-31") %>%
  group_by(symbol)
index.prices <- "^GSPC" %>%
  tq_get(get = "stock.prices", from = "2000-01-01", to = "2017-12-31")
stocks.prices %>% slice(1:2) # show the first two entries of each group

## # A tibble: 20 x 8
## # Groups: symbol [10]
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2000-01-03 3.75 4.02 3.63 4.00 133949200 3.54
## 2 AAPL 2000-01-04 3.87 3.95 3.61 3.66 128094400 3.24
## 3 ADBE 2000-01-03 16.8 16.9 16.1 16.4 7384400 16.1
## 4 ADBE 2000-01-04 15.8 16.5 15.0 15.0 7813200 14.8
## 5 AMGN 2000-01-03 70 70 62.9 62.9 22914900 53.5
## 6 AMGN 2000-01-04 62 64.1 57.7 58.1 15052600 49.4
## 7 AMZN 2000-01-03 81.5 89.6 79.0 89.4 16117600 89.4
## 8 AMZN 2000-01-04 85.4 91.5 81.8 81.9 17487400 81.9
## 9 CSCO 2000-01-03 55.0 55.1 51.8 54.0 53076000 43.6
## 10 CSCO 2000-01-04 52.8 53.5 50.9 51 50805600 41.2
## 11 GILD 2000-01-03 1.79 1.80 1.72 1.76 54070400 1.61
## 12 GILD 2000-01-04 1.70 1.72 1.66 1.68 38960000 1.54
## 13 MSFT 2000-01-03 58.7 59.3 56 58.3 53228400 42.5
## 14 MSFT 2000-01-04 56.8 58.6 56.1 56.3 54119000 41.0
## 15 NVDA 2000-01-03 3.94 3.97 3.68 3.90 7522800 3.61
## 16 NVDA 2000-01-04 3.83 3.84 3.60 3.80 7512000 3.51
## 17 ORCL 2000-01-03 31.2 31.3 27.9 29.5 98114800 26.4
## 18 ORCL 2000-01-04 28.9 29.7 26.2 26.9 116824800 24.0
## 19 QCOM 2000-01-03 99.6 100 87 89.7 91334000 65.7
## 20 QCOM 2000-01-04 86.3 87.7 80 81.0 63567400 59.4

Dividends and stock splits can also be downloaded:

stocks.dividends <- stocks.selection$symbol %>%
  tq_get(get = "dividends", from = "2000-01-01", to = "2017-12-31") %>%
  group_by(symbol)
stocks.splits <- stocks.selection$symbol %>%
  tq_get(get = "splits", from = "2000-01-01", to = "2017-12-31") %>%
  group_by(symbol)

We can additionally download financial key ratios for the different stocks: financials, profitability, growth, cash flow, financial health, efficiency ratios and valuation ratios. These ratios come from Morningstar31 and arrive in a nested form that we will have to 'dig out' using unnest().

stocks.ratios <- stocks.selection$symbol %>%
  tq_get(get = "key.ratios", from = "2000-01-01", to = "2017-12-31") %>%
  group_by(symbol)

## # A tibble: 42 x 3
## # Groups: symbol [6]
## symbol section data
## <chr> <chr> <list>
## 1 AAPL Financials <tibble [150 x 5]>
## 2 AAPL Profitability <tibble [170 x 5]>
## 3 AAPL Growth <tibble [160 x 5]>
## 4 AAPL Cash Flow <tibble [50 x 5]>
## 5 AAPL Financial Health <tibble [240 x 5]>
## 6 AAPL Efficiency Ratios <tibble [80 x 5]>
## 7 AAPL Valuation Ratios <tibble [40 x 5]>
## 8 MSFT Financials <tibble [150 x 5]>
## 9 MSFT Profitability <tibble [170 x 5]>
31 http://www.morningstar.com/

## 10 MSFT Growth <tibble [160 x 5]>
## # ... with 32 more rows

We find that financial ratios are only available for a subset of the ten stocks. We first filter for the 'Growth' information, then unnest() the nested tibbles and filter again for 'EPS %' and the 'Year over Year' information. Then we use ggplot() to plot the timeseries of year-over-year earnings-per-share growth for the different companies.

stocks.ratios %>% filter(section == "Growth") %>% unnest() %>%
  filter(sub.section == "EPS %", category == "Year over Year") %>%
  ggplot(aes(x = date, y = value, color = symbol)) + geom_line(lwd = 1.1) +
  labs(title = "Year over Year EPS in %", x = "", y = "") +
  theme_tq() + scale_color_tq()

[Figure: "Year over Year EPS in %" — line plot of year-over-year EPS growth per symbol (AAPL, AMGN, AMZN, CSCO, MSFT, NVDA)]

A variety of other (professional) data services are integrated into tidyquant, which I will list in the following subsections.

0.2.1.1.1 Quandl

Quandl32 provides access to many different financial and economic databases. To use it, one should acquire an API key by creating a Quandl account.33 Searches can be done using quandl_search() (I personally would use their homepage to do that). Data can be downloaded as before with tq_get(); be aware that you can download either single timeseries or entire datatables with the arguments get = "quandl" and get = "quandl.datatable". Note that in the example for 'Apple' below, the adjusted close prices differ from the ones from Yahoo. An example for a datatable is Zacks Fundamentals Collection B34.

quandl_api_key("enter-your-api-key-here")
quandl_search(query = "Oil", database_code = "NSE", per_page = 3)
quandl.aapl <- c("WIKI/AAPL") %>%
  tq_get(get = "quandl",
         from = "2000-01-01",
         to = "2017-12-31",
         column_index = 11, # numeric column number (e.g. 1)
         collapse = "daily", # "none", "daily", "weekly", "monthly", "quarterly", "annual"
         transform = "none") # summarize the data: "none", "diff", "rdiff", "cumul", "normalize"

## Oil India Limited


## Code: NSE/OIL
## Desc: Historical prices for Oil India Limited<br><br>National Stock Exchan
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur
##
## Oil Country Tubular Limited
## Code: NSE/OILCOUNTUB
32 https://www.quandl.com/
33 If you do not use an API key, you are limited to 50 calls per day.
34 https://www.quandl.com/databases/ZFB/documentation/about

## Desc: Historical prices for Oil Country Tubular Limited<br><br>National St
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur
##
## Essar Oil Limited
## Code: NSE/ESSAROIL
## Desc: Historical prices for Essar Oil Limited<br><br>National Stock Exchan
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur

## # A tibble: 3 x 13
## id dataset_code database_code name description refreshed_at
## * <int> <chr> <chr> <chr> <chr> <chr>
## 1 6668 OIL NSE Oil ~ Historical~ 2018-09-13T~
## 2 6669 OILCOUNTUB NSE Oil ~ Historical~ 2018-09-13T~
## 3 6041 ESSAROIL NSE Essa~ Historical~ 2016-02-09T~
## # ... with 7 more variables: newest_available_date <chr>,
## # oldest_available_date <chr>, column_names <list>, frequency <chr>,
## # type <chr>, premium <lgl>, database_id <int>

## # A tibble: 5 x 12
## date open high low close volume ex.dividend split.ratio
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2000-01-03 105. 112. 102. 112. 4.78e6 0 1
## 2 2000-01-04 108. 111. 101. 102. 4.57e6 0 1
## 3 2000-01-05 104. 111. 103 104 6.95e6 0 1
## 4 2000-01-06 106. 107 95 95 6.86e6 0 1
## 5 2000-01-07 96.5 101 95.5 99.5 4.11e6 0 1
## # ... with 4 more variables: adj.open <dbl>, adj.high <dbl>,
## # adj.low <dbl>, adj.close <dbl>
0.2.1.1.2 Alpha Vantage

Alpha Vantage35 provides access to real-time and historical financial data. Here we also need to get and set an API key (for free).
35 https://www.alphavantage.co

av_api_key("enter-your-api-key-here")
alpha.aapl <- c("AAPL") %>%
tq_get(get = "alphavantager",
av_fun="TIME_SERIES_DAILY_ADJUSTED") # for daily data
alpha.aapl.id <- c("AAPL") %>%
tq_get(get = "alphavantager",
av_fun="TIME_SERIES_INTRADAY", # for intraday data
interval="5min") # 5 minute intervals

## # A tibble: 5 x 9
## timestamp open high low close adjusted_close volume dividend_amount
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 2018-04-24 166. 166. 161. 163. 162. 3.37e7 0
## 2 2018-04-25 163. 165. 162. 164. 162. 2.84e7 0
## 3 2018-04-26 164. 166. 163. 164. 163. 2.80e7 0
## 4 2018-04-27 164 164. 161. 162. 161. 3.57e7 0
## 5 2018-04-30 162. 167. 162. 165. 164. 4.24e7 0
## # ... with 1 more variable: split_coefficient <dbl>

## # A tibble: 5 x 6
## timestamp open high low close volume
## <dttm> <dbl> <dbl> <dbl> <dbl> <int>
## 1 2018-09-11 14:25:00 224. 224. 224. 224. 261968
## 2 2018-09-11 14:30:00 224. 224. 224. 224. 334069
## 3 2018-09-11 14:35:00 224. 224. 224. 224. 285138
## 4 2018-09-11 14:40:00 224. 224. 224. 224. 229329
## 5 2018-09-11 14:45:00 224. 224. 224. 224. 193316
0.2.1.1.3 FRED (Economic Data)

A large quantity of economic data can be extracted from the Federal Reserve Economic Data (FRED) database36. Below we download the 1-year (TB1YR) and 3-month (TB3MS) T-bill rates for the US as risk-free rates. Note that these are annualized rates!
36 https://fred.stlouisfed.org/

ir <- tq_get(c("TB1YR", "TB3MS"), get = "economic.data") %>%
  group_by(symbol)

## # A tibble: 6 x 3
## # Groups: symbol [2]
## symbol date price
## <chr> <date> <dbl>
## 1 TB1YR 2018-08-01 2.36
## 2 TB1YR 2018-07-01 2.31
## 3 TB1YR 2018-06-01 2.25
## 4 TB3MS 2018-08-01 2.03
## 5 TB3MS 2018-07-01 1.96
## 6 TB3MS 2018-06-01 1.9
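Since these FRED rates are annualized percentages, they need to be de-annualized before matching them against monthly returns. A base-R sketch of two common conventions, using an assumed reading of 2.36% p.a.:

```r
# De-annualize an annualized percentage rate to a monthly rate
annual_pct  <- 2.36                                  # e.g. a TB1YR reading, in % p.a.
monthly_geo <- (1 + annual_pct / 100)^(1 / 12) - 1   # geometric (compounding) convention
monthly_lin <- annual_pct / 100 / 12                 # simple division, also often used
round(c(geometric = monthly_geo, linear = monthly_lin), 5)
```

For rates this small the two conventions differ only in the fifth decimal; pick one and use it consistently.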

0.2.1.1.4 OANDA (Exchange Rates and Metal Prices)

Oanda37 provides a large quantity of exchange rates (currently only for the last 180 days). Enter them as currency pairs using "/" notation (e.g. "EUR/USD") and set get = "exchange.rates". Note that most of the data (having a much larger horizon) is also available on FRED.

eur_usd <- tq_get("EUR/USD",
                  get = "exchange.rates",
                  from = Sys.Date() - lubridate::days(10))
plat_price_eur <- tq_get("plat", get = "metal.prices",
                         from = Sys.Date() - lubridate::days(10),
                         base.currency = "EUR")
eur_usd %>% arrange(desc(date)) %>% slice(1:3)

## # A tibble: 3 x 2
## date exchange.rate
## <date> <dbl>
## 1 2018-09-12 1.16
## 2 2018-09-11 1.16
## 3 2018-09-10 1.16

37 https://www.oanda.com

plat_price_eur %>% arrange(desc(date)) %>% slice(1:3)

## # A tibble: 3 x 2
## date price
## <date> <dbl>
## 1 2018-09-12 681.
## 2 2018-09-11 681.
## 3 2018-09-10 680.

0.2.1.1.5 Bloomberg and Datastream

Bloomberg is officially integrated into the tidyquant package, but one needs to have Bloomberg running on the terminal one is using. Datastream is not integrated, but has a nice R interface in the package rdatastream38. However, you need to have the Thomson Dataworks Enterprise SOAP API (not free)39 licensed; then the package allows for convenient retrieval of data. If this is not the case, you have to manually retrieve your data and save it as an ".xlsx" Excel file that we can import using readxl::read_xlsx() from the readxl package.

0.2.1.1.6 Fama-French Data (Kenneth French’s Data Library)

To download Fama-French data in batch, there is a package FFdownload that I updated and that now can be installed via devtools::install_bitbucket("sstoeckl/FFdownload"). Currently you can either download all data or skip the (large) daily files using the argument exclude_daily=TRUE. The result is a list of data.frames that has to be cleaned somewhat, but nonetheless is quite usable.

38 https://github.com/fcocquemas/rdatastream
39 http://dataworks.thomson.com/Dataworks/Enterprise/1.0/

FFdownload(output_file = "FFdata.RData", # output file for the final dataset
           tempdir = NULL, # where the temporary downloads should go
           exclude_daily = TRUE, # exclude daily data
           download = FALSE) # if FALSE, the data is already in the temp directory
load(file = "FFdata.RData")
factors <- FFdownload$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>%
  tk_tbl(rename_index = "date") %>% # convert to tibble
  mutate(date = as.Date(date, frac = 1)) %>% # make proper month-end dates
  gather(key = FFvar, value = price, -date) # gather into tidy (long) format
factors %>% group_by(FFvar) %>% slice(1:2)

## # A tibble: 8 x 3
## # Groups: FFvar [4]
## date FFvar price
## <date> <chr> <dbl>
## 1 1926-07-31 HML -2.87
## 2 1926-08-31 HML 4.19
## 3 1926-07-31 Mkt.RF 2.96
## 4 1926-08-31 Mkt.RF 2.64
## 5 1926-07-31 RF 0.22
## 6 1926-08-31 RF 0.25
## 7 1926-07-31 SMB -2.3
## 8 1926-08-31 SMB -1.4

0.2.1.2 Manipulate Data

A variety of transformations can be applied to (financial) timeseries data. We will present some examples, merging our stock file with the index, the risk-free rate from FRED and the Fama-French factors.
Doing data transformations in tidy datasets is either called a transmute() (change the variable/dataset and only return the calculated column) or a mutate() (add the transformed variable). In the tidyquant package these functions are called tq_transmute() and tq_mutate(), because they simultaneously allow changes of periodicity (e.g. daily to monthly), so the returned dataset can have fewer rows than before. The core of these functions is the provision of a mutate_fun that can come from the xts/zoo, quantmod (Quantitative Financial Modelling & Trading Framework for R40) and TTR (Technical Trading Rules41) packages.
In the examples below, we show how to change the periodicity of the data (where we keep the adjusted close price and the volume information) and how to calculate monthly arithmetic returns for the ten stocks and the index. We then merge the price and return information for each stock, and at each point in time add the return of the S&P500 index and the 3 Fama-French factors.
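As a sanity check for what periodReturn computes: the arithmetic return is simply P_t/P_{t-1} − 1 on consecutive period-end prices. A base-R sketch with made-up prices, contrasted with log returns:

```r
# Hypothetical month-end adjusted prices
prices <- c(100, 105, 99.75)
# Arithmetic (simple) returns: P_t / P_{t-1} - 1
arith <- diff(prices) / head(prices, -1)
# Log returns, for comparison: log(P_t / P_{t-1})
logret <- diff(log(prices))
round(arith, 4)   # +5% followed by -5%
round(logret, 4)
```

Note that a +5% followed by a −5% arithmetic return does not bring the price back to its start, whereas log returns add up exactly over time.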

stocks.prices.monthly <- stocks.prices %>%
  tq_transmute(select = c(adjusted, volume), # which columns to keep
               mutate_fun = to.monthly, # function: convert to monthly periodicity
               indexAt = "lastof") %>% # index at the last day of the month
  ungroup() %>% mutate(date = as.yearmon(date))
stocks.returns <- stocks.prices %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn, # create monthly returns
               period = "monthly",
               type = "arithmetic") %>%
  ungroup() %>% mutate(date = as.yearmon(date))
index.returns <- index.prices %>%
  tq_transmute(select = adjusted, mutate_fun = periodReturn,
               period = "monthly", type = "arithmetic") %>%
  mutate(date = as.yearmon(date))
factors.returns <- factors %>% mutate(price = price/100) %>% # already monthly
  mutate(date = as.yearmon(date))
stocks.prices.monthly %>% ungroup() %>% slice(1:5) # show first 5 entries

40 https://www.quantmod.com/
41 https://www.rdocumentation.org/packages/TTR/

## # A tibble: 5 x 4
## symbol date adjusted volume
## <chr> <S3: yearmon> <dbl> <dbl>
## 1 AAPL Jan 2000 3.28 175420000
## 2 AAPL Feb 2000 3.63 92240400
## 3 AAPL Mar 2000 4.30 101158400
## 4 AAPL Apr 2000 3.93 62395200
## 5 AAPL May 2000 2.66 108376800

stocks.returns %>% ungroup() %>% slice(1:5) # show first 5 entries

## # A tibble: 5 x 3
## symbol date monthly.returns
## <chr> <S3: yearmon> <dbl>
## 1 AAPL Jan 2000 -0.0731
## 2 AAPL Feb 2000 0.105
## 3 AAPL Mar 2000 0.185
## 4 AAPL Apr 2000 -0.0865
## 5 AAPL May 2000 -0.323

index.returns %>% ungroup() %>% slice(1:5) # show first 5 entries

## # A tibble: 5 x 2
## date monthly.returns
## <S3: yearmon> <dbl>

## 1 Jan 2000 -0.0418
## 2 Feb 2000 -0.0201
## 3 Mar 2000 0.0967
## 4 Apr 2000 -0.0308
## 5 May 2000 -0.0219

factors.returns %>% ungroup() %>% slice(1:5) # show first 5 entries

## # A tibble: 5 x 3
## date FFvar price
## <S3: yearmon> <chr> <dbl>
## 1 Jul 1926 Mkt.RF 0.0296
## 2 Aug 1926 Mkt.RF 0.0264
## 3 Sep 1926 Mkt.RF 0.0036
## 4 Oct 1926 Mkt.RF -0.0324
## 5 Nov 1926 Mkt.RF 0.0253

Now, we merge all the information together:

## # A tibble: 5 x 10
## symbol date return adjusted volume sp500 Mkt.RF SMB HML
## <chr> <S3:> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Jan ~ -0.0731 3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL Feb ~ 0.105 3.63 9.22e7 -0.0201 0.0245 0.221 -0.106
## 3 AAPL Mar ~ 0.185 4.30 1.01e8 0.0967 0.052 -0.173 0.0794
## 4 AAPL Apr ~ -0.0865 3.93 6.24e7 -0.0308 -0.064 -0.0771 0.0856
## 5 AAPL May ~ -0.323 2.66 1.08e8 -0.0219 -0.0442 -0.0501 0.0243
## # ... with 1 more variable: RF <dbl>

Now we can calculate and add additional information, such as the MACD (Moving Average Convergence/Divergence42) and its driving signal. Be aware that you have to group_by() symbol, or the signal would just be calculated for one large stacked timeseries:

42 https://en.wikipedia.org/wiki/MACD

stocks.final %>% group_by(symbol) %>%
  tq_mutate(select = adjusted,
            mutate_fun = MACD,
            col_rename = c("MACD", "Signal")) %>%
  select(symbol, date, adjusted, MACD, Signal) %>%
  tail() # show last part of the dataset

## # A tibble: 6 x 5
## # Groups: symbol [1]
## symbol date adjusted MACD Signal
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 GILD 2018. 73.4 -5.40 -4.38
## 2 GILD 2018. 80.8 -3.86 -4.27
## 3 GILD 2018. 78.7 -2.85 -3.99
## 4 GILD 2018. 72.8 -2.68 -3.73
## 5 GILD 2018. 72.6 -2.52 -3.49
## 6 GILD 2018. 70.0 -2.66 -3.32

save(stocks.final,file="stocks.RData")
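Conceptually, the MACD line is the difference between a fast and a slow exponential moving average (EMA) of the price, and the signal line is an EMA of the MACD itself. A simplified base-R sketch on simulated prices (TTR::MACD differs in its initialization details and reports percentages by default):

```r
# Exponential moving average with smoothing 2/(n+1), seeded at the first value
ema <- function(x, n) {
  a <- 2 / (n + 1)
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in 2:length(x)) out[i] <- a * x[i] + (1 - a) * out[i - 1]
  out
}
set.seed(7)
price  <- 50 + cumsum(rnorm(100, mean = 0.1))  # simulated upward-drifting price
macd   <- ema(price, 12) - ema(price, 26)      # fast EMA minus slow EMA
signal <- ema(macd, 9)                         # EMA of the MACD line
head(round(cbind(macd, signal), 3))
```

A positive MACD above its signal line is the classic "bullish momentum" reading; the tq_mutate() call above produces the package's version of these two columns per stock.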

0.2.1.2.1 Rolling functions

One of the most important tools you will need in practice is the ability to perform a rolling analysis. One example would be a rolling regression to get the time-varying α and β of each stock with respect to the index or the Fama-French factors. To do that, we need to create a function that does everything we want in one step:

regr_fun <- function(data, formula) {
  coef(lm(formula, data = timetk::tk_tbl(data, silent = TRUE)))
}

This function takes a dataset and a regression formula as input, performs the regression and returns the coefficients. An important point is that the data will be passed to the regression function as an xts object; timetk::tk_tbl() takes care of converting it to a data frame so that lm() works properly with the columns.
Now we can use tq_mutate() to apply the custom regression function over a rolling window using rollapply from the zoo package. Internally, since we leave select = NULL, the data frame is passed automatically to the data argument of the rollapply function. All we need to specify is mutate_fun = rollapply and any additional arguments necessary for rollapply: the window length via width (e.g. width = 12 for a rolling year of monthly data), the FUN argument which is our custom regression function regr_fun, and the formula it should estimate. It is extremely important to specify by.column = FALSE, which tells rollapply to perform the computation using the data as a whole rather than applying the function to each column independently. The col_rename argument is used to rename the added columns. For our dataset, this looks roughly like:

stocks.final %>% group_by(symbol) %>%
  tq_mutate(mutate_fun = rollapply,
            width = 12,
            FUN = regr_fun,
            formula = return ~ Mkt.RF + SMB + HML,
            by.column = FALSE,
            col_rename = c("alpha", "beta.Mkt.RF", "beta.SMB", "beta.HML"))

The rolling regression coefficients are thereby added to the data frame.
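To see the mechanics of such a rolling regression without any packages, here is a base-R sketch on simulated data with known alpha and beta (window width 12, analogous to a rolling year of monthly data):

```r
set.seed(1)
n   <- 120
mkt <- rnorm(n, mean = 0.005, sd = 0.04)        # simulated market excess returns
ret <- 0.002 + 1.2 * mkt + rnorm(n, sd = 0.01)  # stock returns: alpha = 0.002, beta = 1.2
width <- 12
# One regression per rolling window; keep intercept (alpha) and slope (beta)
coefs <- t(sapply(width:n, function(i) {
  idx <- (i - width + 1):i
  coef(lm(ret[idx] ~ mkt[idx]))
}))
colnames(coefs) <- c("alpha", "beta")
dim(coefs)  # one row per window: (n - width + 1) rows, 2 columns
```

The individual window estimates are noisy (only 12 observations each), but they scatter around the true beta of 1.2 — exactly the behavior to expect from the rolling betas above.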

Also check out the functionality of tibbletime for that task (rollify())!

0.3 Exploring Data

In this chapter we show how to explore and analyze data using the
dataset created in Chapter @ref(#s_2Data):

load("stocks.RData")
glimpse(stocks.final)

## Observations: 2,160
## Variables: 10
## $ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date <S3: yearmon> Jan 2000, Feb 2000, Mar 2000, Apr 2000, May 2...
## $ return <dbl> -0.07314358, 0.10481916, 0.18484202, -0.08651613, -0....
## $ adjusted <dbl> 3.283894, 3.628109, 4.298736, 3.926826, 2.658767, 3.3...
## $ volume <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500 <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...

stocks.final %>% slice(1:2)

## # A tibble: 2 x 10
## symbol date return adjusted volume sp500 Mkt.RF SMB HML
## <chr> <S3:> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Jan ~ -0.0731 3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL Feb ~ 0.105 3.63 9.22e7 -0.0201 0.0245 0.221 -0.106
## # ... with 1 more variable: RF <dbl>

0.3.1 Plotting Data

In this section we show how to create various graphs of financial timeseries, which should help us get a better understanding of their properties before we go on to calculate and test their statistics.

0.3.1.1 Time-series plots

0.3.1.2 Box-plots

0.3.1.3 Histogram and Density Plots

0.3.1.4 Quantile Plots

0.3.2 Analyzing Data

0.3.2.1 Calculating Statistics

0.3.2.2 Testing Data

0.3.2.3 Exposure to Factors

The stocks in our example all have a certain exposure to risk factors (e.g. the Fama-French factors we have added to our dataset). Let us quantify these exposures by regressing each stock's return on the factors Mkt.RF, SMB and HML:

stocks.factor_exposure <- stocks.final %>%
  nest(-symbol) %>%
  mutate(model = map(data, ~ lm(return ~ Mkt.RF + SMB + HML, data = .x)),
         tidied = map(model, tidy)) %>%
  unnest(tidied, .drop = TRUE) %>%
  filter(term != "(Intercept)") %>%
  select(symbol, term, estimate) %>%
  spread(term, estimate) %>%
  select(symbol, Mkt.RF, SMB, HML)
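The nested map()/tidy() pipeline above boils down to one regression per symbol. The same idea in base R with split() and simulated data (two made-up symbols with known betas of 1.5 and 0.8):

```r
set.seed(42)
df <- data.frame(symbol = rep(c("AAA", "BBB"), each = 60),
                 Mkt.RF = rnorm(120, sd = 0.04))
true_beta <- ifelse(df$symbol == "AAA", 1.5, 0.8)
df$return <- true_beta * df$Mkt.RF + rnorm(120, sd = 0.005)
# One regression per symbol; extract the market beta from each fit
betas <- sapply(split(df, df$symbol),
                function(d) unname(coef(lm(return ~ Mkt.RF, data = d))["Mkt.RF"]))
round(betas, 2)
```

With enough observations per symbol, the estimated betas recover the values used in the simulation, which is the factor exposure we read off the tidy output above.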

0.4 Managing Portfolios

In this chapter we show how to construct and optimize portfolios using the dataset created in Chapter @ref(#s_2Data). At first we will learn how to optimize portfolios over the full sample; then (in the next chapters) we will do the same thing in a rolling analysis and also perform some backtesting. The major workhorse of this chapter is the PortfolioAnalytics package developed by Peterson and Carl (2018).

PortfolioAnalytics comes with an excellent introductory vignette vignette("portfolio_vignette") and includes more documents, detailing the use of ROI solvers vignette("ROI_vignette"), how to create custom moment functions vignette("custom_moments_objectives") and how to introduce CVaR budgets vignette("risk_budget_optimization").

0.4.1 Introduction

SHORT INTRODUCTION TO PORTFOLIOMANAGEMENT

We start by first creating a portfolio object, before we…

0.4.1.1 The portfolio.spec() Object

The portfolio object is a so-called S3 object43, which means that it has a certain class (portfolio) describing its properties, behavior and relation to other objects. Usually such an object comes with a variety of methods. To create such an object, we reuse the stock dataset that we have created in Chapter @ref(#s_2Data):

43 http://adv-r.had.co.nz/S3.html

load("stocks.RData")
glimpse(stocks.final)

## Observations: 2,160
## Variables: 10
## $ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date <S3: yearmon> Jan 2000, Feb 2000, Mar 2000, Apr 2000, May 2...
## $ return <dbl> -0.07314358, 0.10481916, 0.18484202, -0.08651613, -0....
## $ adjusted <dbl> 3.283894, 3.628109, 4.298736, 3.926826, 2.658767, 3.3...
## $ volume <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500 <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...

stocks.final %>% slice(1:2)

## # A tibble: 2 x 10
## symbol date return adjusted volume sp500 Mkt.RF SMB HML
## <chr> <S3:> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Jan ~ -0.0731 3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL Feb ~ 0.105 3.63 9.22e7 -0.0201 0.0245 0.221 -0.106
## # ... with 1 more variable: RF <dbl>

For the PortfolioAnalytics package we need our data in xts format (see @ref(#sss_112xts)); we therefore first spread() the returns into one column per stock and then convert to xts using tk_xts() from the timetk package.

returns <- stocks.final %>%
  select(symbol, date, return) %>%
  spread(symbol, return) %>%
  tk_xts(silent = TRUE)

Now it is time to initialize the portfolio.spec() object, passing along the names of our assets. Afterwards we print the object (most S3 objects come with a print method that nicely displays some of their information).

pspec <- portfolio.spec(assets = stocks.selection$symbol,
                        category_labels = stocks.selection$sector)
print(pspec)

## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## portfolio.spec(assets = stocks.selection$symbol, category_labels = stocks.
##
## Number of assets: 10
## Asset Names
## [1] "AAPL" "MSFT" "AMZN" "CSCO" "NVDA" "ORCL" "AMGN" "ADBE" "QCOM" "GILD"
##
## Category Labels
## Information Technology : AAPL MSFT CSCO NVDA ORCL ADBE QCOM
## Consumer Discretionary : AMZN
## Health Care : AMGN GILD

str(pspec)

## List of 6
## $ assets : Named num [1:10] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0
## ..- attr(*, "names")= chr [1:10] "AAPL" "MSFT" "AMZN" "CSCO" ...
## $ category_labels:List of 3
## ..$ Information Technology: int [1:7] 1 2 4 5 6 8 9
## ..$ Consumer Discretionary: int 3
## ..$ Health Care : int [1:2] 7 10
## $ weight_seq : NULL
## $ constraints : list()
## $ objectives : list()
## $ call : language portfolio.spec(assets = stocks.selection$symb
## - attr(*, "class")= chr [1:2] "portfolio.spec" "portfolio"

Checking the structure of the object with str(), we find that it contains several elements: assets, which contains the asset names and initial weights that are equally distributed unless otherwise specified (e.g. portfolio.spec(assets=c(0.6,0.4))); category_labels to categorize assets by sector (or geography etc.); weight_seq (a sequence of weights for later use by random_portfolios); constraints that we will set soon; objectives; and the call that initialized the object. Before we optimize any portfolio, we will show how to set constraints.
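The equal initial weights matter because they are the optimizer's starting point; a base-R reminder of what 1/N weights imply for a portfolio return (toy numbers, not our actual data):

```r
# portfolio.spec() assigns 1/N to each asset unless told otherwise
assets <- c("AAPL", "MSFT", "AMZN", "CSCO")
w <- rep(1 / length(assets), length(assets))
names(w) <- assets
# The portfolio return is the weighted sum of the asset returns
r <- c(AAPL = 0.02, MSFT = -0.01, AMZN = 0.03, CSCO = 0.00)  # made-up monthly returns
sum(w * r)
```

Everything the optimizer does later amounts to choosing a better w than this naive 1/N vector, subject to the constraints and objectives we add next.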

0.4.1.2 Constraints

Constraints define restrictions and boundary conditions on the weights of a portfolio. They are defined via add.constraint(), specifying a certain type and arguments for each type, as well as whether the constraint should be enabled or not (enabled=TRUE is the default).

0.4.1.2.1 Sum of Weights Constraint

Here we define how much of the available budget can/must be invested by specifying the maximum/minimum sum of portfolio weights. Usually we want to invest our entire budget and therefore set type="full_investment", which sets the sum of weights to 1. Alternatively, we can set type="weight_sum" with minimum/maximum weight_sum equal to 1.

pspec <- add.constraint(portfolio=pspec,


type="full_investment")
# print(pspec)
# pspec <- add.constraint(portfolio=pspec,type="weight_sum", min_sum=1, max_sum=1)

Another common constraint is to have the portfolio dollar-neutral, type="dollar_neutral" (or the equivalent formulations specified below).

# pspec <- add.constraint(portfolio=pspec,


# type="dollar_neutral")
# print(pspec)
# pspec <- add.constraint(portfolio=pspec, type="active")
# pspec <- add.constraint(portfolio=pspec, type="weight_sum", min_sum=0, max_sum=0)

0.4.1.2.2 Box Constraint

Box constraints specify upper and lower bounds on the asset


weights. If we pass min and max as scalars then the same max
and min weights are set per asset. If we pass vectors (that should
be of the same length as the number of assets) we can specify
position limits on individual stocks

pspec <- add.constraint(portfolio=pspec,


type="box",
min=0,

max=0.4)
# print(pspec)
# add.constraint(portfolio=pspec,
# type="box",
# min=c(0.05, 0, rep(0.05,8)),
# max=c(0.4, 0.3, rep(0.4,8)))

Another special type of box constraint is the long-only constraint, where we only allow positive weights per asset. It is set automatically if no min and max are given, or explicitly with type="long_only".

# pspec <- add.constraint(portfolio=pspec, type="box")


# pspec <- add.constraint(portfolio=pspec, type="long_only")

0.4.1.2.3 Group Constraints

Group constraints allow the user to specify constraints per group, such as industries, sectors or geography.[^44] These groups can be arbitrarily defined; below we set group constraints for the sectors as specified above. The input arguments are the following: groups, a list of vectors specifying the groups of the assets; group_labels, a character vector to label the groups (e.g. size, asset class, style, etc.); group_min and group_max, specifying minimum and maximum weight per group; and group_pos, specifying the number of non-zero weights per group (optional).

pspec <- add.constraint(portfolio=pspec,


type="group",

[^44]: Note that only the ROI, DEoptim and random portfolio solvers support group constraints. See also \@ref(sss_4solvers).

groups=list(pspec$category_labels$`Information Technology`,
pspec$category_labels$`Consumer Discretionary`,
pspec$category_labels$`Health Care`),
group_min=c(0.1, 0.15,0.1),
group_max=c(0.85, 0.55,0.4),
group_labels=pspec$category_labels)
# print(pspec)

0.4.1.2.4 Position Limit Constraint

The position limit constraint allows the user to specify limits on the number of assets with non-zero, long, or short positions. Its arguments are max_pos, which defines the maximum number of assets with non-zero weights, and max_pos_long/max_pos_short, which specify the maximum number of assets with long (i.e. buy) and short (i.e. sell) positions.[^45]

pspec <- add.constraint(portfolio=pspec, type="position_limit", max_pos=3)


# pspec <- add.constraint(portfolio=pspec, type="position_limit", max_pos_long=3, max_pos_short=3)
# print(pspec)

0.4.1.2.5 Diversification Constraint

The diversification constraint enables the user to set a minimum diversification limit by penalizing the optimizer if the deviation from the target is larger than 5%.[^46] Diversification is defined as $\sum_{i=1}^{N} w_i^2$ for $N$ assets. Its only argument is the diversification target div_target.

[^45]: Note that not all solvers support the different options. All of them are supported by the DEoptim and random portfolio solvers, while no ROI solver supports this type of constraint. The ROI solvers do not support the long/short position limit constraints, and (only) quadprog allows for the max_pos argument.

pspec <- add.constraint(portfolio=pspec, type="diversification", div_target=0.7)


# print(pspec)

0.4.1.2.6 Turnover Constraint

The turnover constraint allows the user to specify a maximum turnover from a set of initial weights that can either be given explicitly or default to the weights initially specified for the portfolio object. It is also implemented as an optimization penalty if the turnover deviates by more than 5% from the turnover_target.[^47]

pspec <- add.constraint(portfolio=pspec, type="turnover", turnover_target=0.2)


# print(pspec)

0.4.1.2.7 Target Return Constraint

The target return constraint allows the user to target an average


return specified by return_target.

pspec <- add.constraint(portfolio=pspec, type="return", return_target=0.007)


# print(pspec)

[^46]: Note that the diversification constraint is only supported by the global numeric solvers (not the ROI solvers).

[^47]: Note that the turnover constraint is not currently supported using the ROI solver for quadratic utility and minimum variance problems.

0.4.1.2.8 Factor Exposure Constraint

The factor exposure constraint allows the user to set upper and
lower bounds on exposures to risk factors. We will use the factor
exposures that we have calculated in \@ref(sss_3FactorExposure).
The major input is a vector or matrix B and upper/lower bounds
for the portfolio factor exposure. If B is a vector (with length equal
to the number of assets), lower and upper bounds must be scalars.
If B is a matrix, the number of rows must be equal to the number of
assets and the number of columns represent the number of factors.
In this case, the length of lower and upper bounds must be equal
to the number of factors. B should have column names specifying
the factors and row names specifying the assets.

B <- stocks.factor_exposure %>% as.data.frame() %>% column_to_rownames("symbol") %>% as.matrix()


pspec <- add.constraint(portfolio=pspec, type="factor_exposure",
B=B,
lower=c(0.8,0,-1),
upper=c(1.2,0.8,0))
# print(pspec)

0.4.1.2.9 Transaction Cost Constraint

The transaction cost constraint enables the user to specify (proportional) transaction costs.[^48] Here we will assume the proportional transaction cost ptc to be equal to 1%.

pspec <- add.constraint(portfolio=pspec, type="transaction_cost", ptc=0.01)


# print(pspec)

[^48]: For usage of the ROI (quadprog) solvers, transaction costs are currently only supported for global minimum variance and quadratic utility problems.

0.4.1.2.10 Leverage Exposure Constraint

The leverage exposure constraint specifies a maximum level of


leverage. Below we set leverage to 1.3 to create a 130/30 portfo-
lio.

pspec <- add.constraint(portfolio=pspec, type="leverage_exposure", leverage=1.3)


# print(pspec)

0.4.1.2.11 Checking and enabling/disabling constraints

Every constraint that is added to the portfolio object gets a number according to the order in which it was set. If one wants to update or enable/disable a specific constraint, this can be done via the indexnum argument.

summary(pspec)
# To get an overview of the specs, their indexnum and whether they are enabled:
consts <- plyr::ldply(pspec$constraints, function(x){c(x$type,x$enabled)})
consts
pspec$constraints[[which(consts$V1=="box")]]
pspec <- add.constraint(pspec, type="box",
min=0, max=0.5,
indexnum=which(consts$V1=="box"))
pspec$constraints[[which(consts$V1=="box")]]
# to disable constraints
pspec$constraints[[which(consts$V1=="position_limit")]]
pspec <- add.constraint(pspec, type="position_limit", enabled=FALSE, # only s
indexnum=which(consts$V1=="position_limit"))
pspec$constraints[[which(consts$V1=="position_limit")]]

0.4.1.3 Objectives

For an optimal portfolio, we first have to specify what optimal means in terms of the relevant (business) objective. Such objectives (target functions) can be added to the portfolio object with add.objective. With this function, the user specifies the type of objective to add to the portfolio object. Currently available are 'return', 'risk', 'risk_budget', 'quadratic_utility', 'weight_concentration', 'turnover' and 'minmax'. Each type of objective has additional arguments that need to be specified. Several objectives can be added, and enabled or disabled by specifying the indexnum argument.

0.4.1.3.1 Portfolio Risk Objective

Here the user can specify a risk function that should be minimized. We start by adding a risk objective to minimize portfolio variance (minimum variance portfolio). Another example could be the expected tail loss with a confidence level of 0.95. Whatever function is used (even user-defined ones are possible), the name must correspond to a function in R, and any necessary additional arguments have to be passed as a named list to arguments. Possible specifications are:

pspec <- add.objective(portfolio=pspec,


type='risk',
name='var')
pspec <- add.objective(portfolio=pspec,
type='risk',
name='ETL',
arguments=list(p=0.95),
enabled=FALSE)
# print(pspec)

0.4.1.3.2 Portfolio Return Objective

The return objective allows the user to specify a return function to


maximize. Here we add a return objective to maximize the port-
folio mean return.

pspec <- add.objective(portfolio=pspec,


type='return',
name='mean')
# print(pspec)

0.4.1.3.3 Portfolio Risk Budget Objective

The risk budget objective allows the user to minimize component contribution (i.e. equal risk contribution) or to specify upper and lower bounds on the percentage risk contribution. Here we specify that no asset can contribute more than 30% to total portfolio risk. See the risk budget optimization vignette for more detailed examples of portfolio optimizations with risk budgets.

pspec <- add.objective(portfolio=pspec,


type="risk_budget",
name="var",
max_prisk=0.3)
pspec <- add.objective(portfolio=pspec,
type="risk_budget",
name="ETL",
arguments=list(p=0.95),
max_prisk=0.3,
enabled=FALSE)

# for an equal risk contribution portfolio, set min_concentration=TRUE


pspec <- add.objective(portfolio=pspec,
type="risk_budget",
name="ETL",
arguments=list(p=0.95),
min_concentration=TRUE,
enabled=FALSE)
print(pspec)

## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## portfolio.spec(assets = stocks.selection$symbol, category_labels = stocks.
##
## Number of assets: 10
## Asset Names
## [1] "AAPL" "MSFT" "AMZN" "CSCO" "NVDA" "ORCL" "AMGN" "ADBE" "QCOM" "GILD"
##
## Category Labels
## Information Technology : AAPL MSFT CSCO NVDA ORCL ADBE QCOM
## Consumer Discretionary : AMZN
## Health Care : AMGN GILD
##
##
## Constraints
## Enabled constraint types
## - full_investment
## - box
## - group
## - position_limit
## - diversification
## - turnover
## - return
## - factor_exposure
## - transaction_cost
## - leverage_exposure
##
## Objectives:
## Enabled objective names
## - var
## - mean
## - var
## Disabled objective names
## - ETL

0.4.1.4 Solvers

Solvers are the workhorse of our portfolio optimization frame-


work, and there are a variety of them available to us through
the portfolioAnalytics-package. I will briefly introduce the
available solvers. Note that these solvers can be specified
through optimize_method in the optimize.portfolio and
optimize.portfolio.rebalancing method.
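As a sketch of how a solver is selected, the following (not run here) assumes a `returns` xts object of asset returns and the `pspec` specification built above; the solver choice and `search_size` value are illustrative only:

```r
# Sketch: run an optimization with the random-portfolio solver.
# `returns` and `pspec` are assumed to exist from earlier steps.
library(PortfolioAnalytics)
opt <- optimize.portfolio(R = returns,
                          portfolio = pspec,
                          optimize_method = "random",  # or "DEoptim", "pso", "GenSA", "ROI"
                          search_size = 2000,          # number of random portfolios to try
                          trace = TRUE)                # keep intermediate results
extractWeights(opt)                                    # optimal weight vector
```

Swapping `optimize_method` is usually all that is needed to compare solvers on the same specification.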

0.4.1.4.1 DEOptim

This solver comes from the R package DEoptim and implements a differential evolution algorithm (a global stochastic optimization algorithm) developed by Ardia et al. (2016). The help on ?DEoptim gives many more references. There is also a nice vignette("DEoptimPortfolioOptimization") on large-scale portfolio optimization using the PortfolioAnalytics package.

0.4.1.4.2 Random Portfolios

There are three methods to generate random portfolios contained


in PortfolioAnalytics:

1. The most flexible but also slowest method is ‘sample’.


It can take leverage, box, group, and position limit con-
straints into account.
2. The ‘simplex’ method is useful to generate random portfolios with the full investment and min box constraints (values for min_sum/max_sum are ignored). Other constraints (box max, group and position limit constraints) are handled by elimination, which might leave only very few feasible portfolios. Sometimes this will also lead to suboptimal solutions.
3. Using grid search, the ‘grid’ method only satisfies the min
and max box constraints.
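The three methods above can be tried out directly with random_portfolios(); the following is a minimal sketch (the permutations value is arbitrary, and `pspec` is the specification built earlier):

```r
# Sketch: generate 1000 random weight vectors that satisfy the
# (supported) constraints in pspec; rp_method is one of
# "sample", "simplex" or "grid" as described above.
library(PortfolioAnalytics)
rp <- random_portfolios(portfolio = pspec,
                        permutations = 1000,
                        rp_method = "sample")
head(rp)  # each row is one candidate weight vector
```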

0.4.1.4.3 pso

The psoptim function comes from the R package pso (Bendtsen, 2012) and uses particle swarm optimization.

0.4.1.4.4 GenSA

The GenSA function comes from the R package GenSA (Gubian et al., 2018) and is based on generalized simulated annealing (a generic probabilistic heuristic optimization algorithm).

0.4.1.4.5 ROI

ROI (the R Optimization Infrastructure) is a framework to handle optimization problems in R. It serves as an interface to the Rglpk and quadprog packages, which solve linear and quadratic programming problems. The available methods in the context of the PortfolioAnalytics package are given below (see section \@ref(sss_4Objectives) for the available objectives).

1. Maximize portfolio return subject to leverage, box, group, position limit, target mean return, and/or factor exposure constraints on weights.
2. Globally minimize portfolio variance subject to leverage,
box, group, turnover, and/or factor exposure constraints.
3. Minimize portfolio variance subject to leverage, box,
group, and/or factor exposure constraints given a desired
portfolio return.
4. Maximize quadratic utility subject to leverage, box,
group, target mean return, turnover, and/or factor ex-
posure constraints and risk aversion parameter. (The risk
aversion parameter is passed into optimize.portfolio
as an added argument to the portfolio object).
5. Minimize ETL subject to leverage, box, group, position
limit, target mean return, and/or factor exposure con-
straints and target portfolio return.
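As an illustration of case 2 above, a global minimum variance optimization with the ROI solver could look as follows. This is a sketch, not the book's worked example: it assumes a `returns` xts object and that the ROI plugin packages are installed.

```r
# Sketch: global minimum variance portfolio via the ROI (quadprog) solver.
# `returns` is assumed to be an xts of asset returns.
library(PortfolioAnalytics)
library(ROI)
require(ROI.plugin.quadprog)  # quadratic programming backend

mvspec <- portfolio.spec(assets = colnames(returns))
mvspec <- add.constraint(mvspec, type = "full_investment")
mvspec <- add.constraint(mvspec, type = "box", min = 0, max = 1)
mvspec <- add.objective(mvspec, type = "risk", name = "var")

opt.mv <- optimize.portfolio(R = returns, portfolio = mvspec,
                             optimize_method = "ROI")
opt.mv
```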

0.4.2 Mean-variance Portfolios

0.4.2.1 Introduction and Theoretics

0.4.2.1.1 The minimum risk mean-variance portfolio

0.4.2.1.2 Feasible Set and Efficient Frontier

0.4.2.1.3 Minimum variance portfolio

0.4.2.1.4 Capital market line and tangency portfolio

0.4.2.1.5 Box and Group Constrained mean-variance portfolios



0.4.2.1.6 Maximum return mean-variance portfolio

0.4.2.1.7 Covariance risk budget constraints

0.4.3 Mean-CVaR Portfolios

0.5 Managing Portfolios in the Real World

0.5.1 Rolling Portfolios

0.5.2 Backtesting

0.6 Further applications in Finance

0.6.1 Portfolio Sorts

0.6.2 Fama-MacBeth-Regressions

0.6.3 Risk Indices

0.7 References

# Appendix {#s_99Appendix}

.0.1 Introduction to R

For everyone who is more interested in these topics I strongly recommend this eBook: R for Data Science (http://r4ds.had.co.nz/).

.0.1.1 Getting started

Once you have started R, there are several ways to find help. First
of all, (almost) every command is equipped with a help page that
can be accessed via ?... (if the package is loaded). If the command
is part of a package that is not loaded or you have no clue about the
command itself, you can search the entire help (full-text) by using
??.... Be aware, that certain very-high level commands need to
be put in quotation marks ?'function'. Many of the packages
you find are either equipped with a demo() (get a list of all
available demos using demo(package=.packages(all.available
= TRUE))) and/or a vignette(), a document explaining
the purpose of the package and demonstrating its work
using suitable examples (find all available vignettes with
vignette(package=.packages(all.available = TRUE))). If
you want to learn how to do a certain task (e.g. conducting an event study), such a vignette is a good starting point (vignette("eventstudies")[^50]).
Executing code in RStudio is simple. Either you highlight the exact portion of the code that you want to execute and hit Ctrl+Enter, or you place the cursor somewhere in a line to execute that particular line of code with the same command.[^51]
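The help commands mentioned above can be collected into a short, runnable snippet (in a non-interactive session the help pages are printed rather than shown in the help pane):

```r
?mean                        # help page of a loaded function
??"portfolio"                # full-text search across all installed packages
?"function"                  # high-level commands need quotation marks
demo(package = .packages(all.available = TRUE))     # list all available demos
vignette(package = .packages(all.available = TRUE)) # list all available vignettes
```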

.0.1.2 Working directory

Before we start to learn how to program, we have to set a work-


ing directory. First, create a folder “researchmethods” (prefer-
ably never use directory names containing special characters or
empty spaces) somewhere on citrix/your laptop, this will be
your working directory where R looks for code, files to load
and saves everything that is not designated by a full path (e.g.
“D:/R/LAB/SS2018/…”). Note: In contrast to windows paths you
[^50]: If this command shows an error message you need to install the package first, see further down for how to do that.

[^51]: Under certain circumstances - either using pipes or within loops - RStudio will execute the entire loop/pipe structure. In this case you have to highlight the particular line that you want to execute.

have to use either “/” instead of “\” or use two “\\”. Now set the
working directory using setwd() and check with getwd()

setwd("D:/R/researchmethods")
getwd()

.0.1.3 Basic calculations

3+5; 3-5; 3*5; 3/5


# More complex including brackets
(5+3-1)/(5*10)
# is different to
5+3-1/5*10
# power of a variable
4*4*4
4^300
# root of a variable
sqrt(16)
16^(1/2)
16^0.5
# exponential and logarithms
exp(3)
log(exp(3))
exp(1)
# Log to the basis of 2
log2(8)
2^log2(8)
# raise the number of digits shown
options(digits=6)
exp(1)
# Rounding
20/3
round(20/3,2)

floor(20/3)
ceiling(20/3)

.0.1.4 Mapping variables

Defining variables (objects) in R is done via the arrow operator <-


that works in both directions ->. Sometimes you will see someone
use the equal sign = but for several (more complex) reasons, this
is not advisable.

n <- 10
n
n <- 11
n
12 -> n
n
n <- n^2
n

In the last case, we overwrite a variable recursively. You might


want to do that for several reasons, but I advise you to rarely
do that. The reason is that - depending on how often you have
executed this part of the code already - n will have a different value.
In addition, if you are checking the output of some calculation, it
is not nice if one of the input variables always has a different value.
In a next step, we will check variables. This is a very important
part of programming.

# check if m==10
m <- 11
m==10 # is equal to
m==11

m!=11 # is not equal to


m>10 # is larger than
m<10 # is smaller than
m<=11 # is smaller or equal than
m>=12 # is larger or equal than

If one wants to find out which variables are already set use ls().
Delete (Remove) variables using rm() (you sometimes might want
to do that to save memory - in this case always follow the rm()
command with gc()).

ls() # list variables


rm(m) # remove m
ls() # list variables again (m is missing)

Of course, often we do not only want to store numbers but also


characters. In this case enclose the value by quotation marks: name
<- "test". If you want to check whether a variable has a certain
format use available commands starting with is.. If you want to
change the format of a variable use as.

name <- "test"


is.numeric(n)
is.numeric(name)
is.character(n)
is.character(name)

If you do want to find out the format of a variable you can use
class(). Slightly different information will be given by mode()
and typeof()

class(n)
class(name)
mode(n)
mode(name)
typeof(n)
typeof(name)
# Lets change formats:
n1 <- n
is.character(n1)
n1 <- as.character(n)
is.character(n1)
as.numeric(name) # New thing: NA

Before we learn about NA, we have to define logical variables that


are very important when programming (e.g., as options in a func-
tion). Logical (boolean) variables will either assume TRUE or FALSE.

# last but not least we need boolean (logical) variables


n2 <- TRUE
is.numeric(n2)
class(n2)
is.logical(n2)
as.logical(2) # all values except 0 will be converted to TRUE
as.logical(0)

Now we can check whether a condition holds true. In this case, we


check if m is equal to 10. The output (as you have seen before) is
of type logical.

is.logical(n==10)
n3 <- n==10 # we can assign the logical output to a new variable
is.logical(n3)

Assignment: Create numeric variable x, set x equal to 5/3. What


happens if you divide by 0? By Inf? Set y<-NA. What could this
mean? Check if the variable is “na”. Is Inf numeric? Is NA numeric?

.0.1.5 Sequences, vectors and matrices

In this chapter, we are going to learn about higher-dimensional


objects (storing more information than just one number).

.0.1.5.1 Sequences

We define sequences of elements (numbers/characters/logicals) via


the concatenation operator c() and assign them to a variable. If
one of the elements of a sequence is of type character, the whole
sequence will be converted to character, else it will be of type
numeric (for other possibilities check the help ?vector). At the same time it will be of type vector.

x <- c(1,3,5,6,7)
class(x)
is.vector(x)
is.numeric(x)

To create ordered sequences make use of the command


seq(from,to,by). Please note that often programmers are lazy
and just write seq(1,10,2) instead of seq(from=1,to=10,by=2).
However, it makes the code much harder to understand, can produce unintended results, and if a function is changed (which happens, as R is always under construction) may yield something very different from what was intended. Therefore I strongly encourage you to always specify the arguments of a function by name. To do this, I advise you to make heavy use of the Tab key. Tab helps you to complete commands, produces a list of commands starting with the same letters (if you do not completely remember the spelling, for example), helps you to find out about the arguments, and even gives

information about the intended/possible values of the arguments.


A nice way and shortcut for creating ordered/regular sequences
with distance (by=) one is given by the : operator: 1:10 is equal
to seq(from=1,to=10,by=1).

x1 <- seq(from=1,to=5,by=1)
x2 <- 1:5

One can operate with sequences in the same way as with numbers.
Be aware of the order of the commands and use brackets where
necessary!

1:10-1
1:(10-1)
1:10^2-2*3

Assignment: 1. Create a series from -1 to 5 with distances 0.5?


Can you find another way to do it using the : operator and stan-
dard mathematical operations? 2. Create the same series, but this
time using the “length”-option 3. Create 20 ones in a row (hint:
find a function to do just that)
Of course, all logical operations are possible for vectors, too. In
this case, the output is a vector of logicals having the same size as
the input vector. You can check if a condition is true for any() or
all() parts of the vector.
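For example, with a small numeric vector:

```r
x <- c(1, 3, 5, 6, 7)
x > 4       # element-wise comparison returns a logical vector
any(x > 4)  # TRUE if at least one element satisfies the condition
all(x > 4)  # TRUE only if every element satisfies it
```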

.0.1.5.2 Random Sequences

One of the most important tasks of any programming language


that is used for data analysis and research is the ability to gener-
ate random numbers. In R all the random number commands
start with an r..., e.g. random normal numbers rnorm(). To find
out more about the command use the help ?rnorm. All of these

commands are a part of the stats package, where you find avail-
able commands using the package help: library(help=stats).
Notice that whenever you generate random numbers, they are dif-
ferent. If you prefer to work with the same set of random numbers
(e.g. for testing purposes) you can fix the starting value of the
random number generator by setting the seed to a chosen num-
ber set.seed(123). Notice that you have to execute set.seed()
every time before (re)using the random number generator.

rand1 <- rnorm(n = 100) # 100 standard normally distributed random numbers
set.seed(134) # fix the starting value of the random number generator (the sequence is then reproducible)
rand1a <- rnorm(n = 100)

Assignment: 1. Create a random sequence of 20 N(0,2)-


distributed variables and assign it to the variable rand2. 2. Create
a random sequence of 200 Uniform(-1,1) distributed variables and
save to rand3. 3. What other distributions can you find in the stats
package? 4. Use the functions mean and sd. Manipulate the ran-
dom variables to have a different mean and standard deviation.
Do you remember the normalization process (z-score)?
As in the last assignment you can use all the functions you learned
about in statistics to calculate the mean(), the standard deviation
sd(), skewness() and kurtosis() (the latter two after loading
and installing the moments package). To install/load a package we
use install.packages() (only once) and then load the package
with require().

#install.packages("moments") # only once, no need to reinstall every time


require(moments)
mean(rand1a)
sd(rand1a)
skewness(rand1a)
kurtosis(rand1a)
summary(rand1a)

.0.1.6 Vectors and matrices

We have created (random) sequences above and can determine


their properties, such as their length(). We also know how to
manipulate sequences through mathematical operations, such as
+-*/^. If you want to calculate a vector product, R provides the
%*% operator. In many cases (such as %*%) vectors behave like matrices, with R automatically deciding whether they should act as row or column vectors. However, to make this more explicit, transform your vector into a matrix using as.matrix. Now it has a dimension and the property matrix. You can transpose the matrix using t(), calculate its inverse using solve() and manipulate it in any other way imaginable. To create matrices use matrix() and be careful about the
available options!

x <- c(2,4,5,8,10,12)
length(x)
dim(x)
x^2/2-1
x %*% x # R automatically multiplies row and column vector
is.vector(x)
y <- as.matrix(x)
is.matrix(y); is.matrix(x)
dim(y); dim(x)
t(y) %*% y
y %*% t(y)
mat <- matrix(data = x,nrow = 2,ncol = 3, byrow = TRUE)
dim(mat); ncol(mat); nrow(mat)
mat2 <- matrix(c(1,2,3,4),2,2) # set a new (quadratic) matrix
mat2i <- solve(mat2)
mat2 %*% mat2i
mat2i %*% mat2

Assignment: 1. Create this matrix matrix(c(1,2,2,4),2,2)


and try to calculate its inverse. What is the problem? Remem-
ber the determinant? Calculate using det(). What do you learn?

2. Create a 4x3 matrix of ones and/or zeros. Try to matrix-


multiply with any of the vectors/matrices used before. 3. Try to
add/subtract/multiply matrices, vectors and scalars.
A variety of special matrices is available, such as diagonal ma-
trices using diag(). You can glue matrices together columnwise
(cbind()) or rowwise (rbind()).

diag(3)
diag(c(1,2,3,4))
mat4 <- matrix(0,3,3)
mat5 <- matrix(1,3,3)
cbind(mat4,mat5)
rbind(mat4,mat5)

.0.1.6.1 The indexing system

We can access the row/column elements of any object with at least


one dimension using [].

# We can access the single values of a vector/matrix
x[2] # one-dim
mat[,2] # two-dim column
mat[2,] # two-dim row
i <- c(1,3)
mat[i]
mat[1,2:3] # two-dim select second and third column, first row
mat[-1,] # two-dim suppress first row
mat[,-2] # two-dim suppress second column

Now we can use logical vectors/matrices to subset vec-


tors/matrices. This is very useful for data mining.

mat>=5 # which elements are large or equal to 5?


mat[mat>=5] # What are these elements?
which(mat>=5, arr.ind = TRUE) # another way with more explicit information

We can do something even more useful and name the rows and columns of a matrix using colnames() and rownames().

colnames(mat) <- c("a","b","c")


rownames(mat) <- c("A","B")
mat["A",c("b","c")]

.0.1.7 Functions in R

.0.1.7.1 Useful Functions

Of course, there are thousands of functions available in R, espe-


cially through the use of packages. In the following you find a demo
of the most useful ones.

x <- c(1,2,4,-1,2,8) # example vector 1


x1 <- c(1,2,4,-1,2,8,NA,Inf) # example vector 2 (more complex)
sqrt(x) # square root of x
x^3 # x to the power of ...
sum(x) # sum of the elements of x
prod(x) # product of the elements of x
max(x) # maximum of the elements of x
min(x) # minimum of the elements of x
which.max(x) # returns the index of the greatest element of x
which.min(x) # returns the index of the smallest element of x
# statistical functions - use rand1 and rand2 created before
range(rand1) # returns the minimum and maximum of the elements
mean(rand1) # mean of the elements

median(rand1) # median of the elements
var(rand1) # variance of the elements
sd(rand1) # standard deviation of the elements
cor(cbind(rand1, rand2)) # correlation matrix
cov(rand1, rand2) # covariance between rand1 and rand2
cor(rand1, rand2) # linear correlation between rand1 and rand2
# more complex functions
# more complex functions
round(x, 2) # rounds the elements of x to 2 decimals
rev(x) # reverses the elements of x
sort(x) # sorts the elements of x in increasing order
rank(x) # ranks of the elements of x
log(x) # computes natural logarithms of x
cumsum(x) # a vector which ith element is the sum from x[1] to x[i]
cumprod(x) # id. for the product
cummin(x) # id. for the minimum
cummax(x) # id. for the maximum
unique(x) # duplicate elements are suppressed

.0.1.7.2 More complex objects in R

Next to numbers, sequences/vectors and matrices R offers a va-


riety of different and more complex objects that can stow more
complex information than just numbers and characters (e.g. func-
tions, output text. etc). The most important ones are data.frames
(extended matrices) and lists. Check the examples below to see
how to create these objects and how to access specific elements.

df <- data.frame(col1=c(2,3,4), col2=sin(c(2,3,4)), col3=c("a","b", "c"))


li <- list(x=c(2,3,4), y=sin(c(2,3,4)), z=c("a","b", "c","d","e"), fun=mean)
# to grab elements from a list or dataframe use $ or [[]]
df$col3; li$x # get variables
df[,"col3"]; li[["x"]] # get specific elements that can also be numbered
df[,3]; li[[1]]

Assignment: 1. Get the second entry of element y of list li

.0.1.7.3 Create simple functions in R

To create our own functions in R we need to give them a name,


determine necessary input variables and whether these variables
should be pre-specified or not. I use a couple of examples to show
how to do this below.

?"function" # "function" is such a high-level object that it is interpreted

# 1. Let's create a function that squares an entry x and name it square


square <- function(x){x^2}
square(5)
square(c(1,2,3))

# 2. Let us define a function that returns a list of several different results


stats <- function(v){
v.m <- mean(v) # create a variable that is only valid in the function
v.sd <- sd(v)
v.var <- var(v)
v.output <- list(Mean=v.m, StandardDeviation=v.sd, Variance=v.var)
return(v.output)
}
v <- rnorm(1000,mean=1,sd=5)
stats(v)
stats(v)$Mean

# 3. A function can have standard arguments.


### This time we also create a random vector within the function and use its
stats2 <- function(n,m=0,s=1){
v <- rnorm(n,mean=m,sd=s)
v.m <- mean(v) # create a variable that is only valid in the function
v.sd <- sd(v)
v.var <- var(v)

v.output <- list(Mean=v.m, StandardDeviation=v.sd, Variance=v.var)


return(v.output)
}
stats2(1000000)
stats2(1000,m=1)
stats2(1000,m=1,s=10)
stats2(m=1) # what happens if an obligatory argument is left out?

Assignment: 1. Create a function that creates two random sam-


ples with length n and m from the normal and the uniform distri-
bution resp., given the mean and sd for the first and min and max
for the second distribution. The function shall then calculate the
covariance-matrix and the correlation-matrix which it gives back
in a named list.

.0.1.8 Plotting

Plotting in R can be done very easily. Check the examples below


to get a reference and idea about the plotting capabilities in R.
A very good source for color names (that work in R) is http://en.wikipedia.org/wiki/Web_colors.

?plot
?colors # very good source for colors:
y1 <- rnorm(50,0,1)
plot(y1)
# set title, and x- and y-labels
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample")
# now make a line between elements, and color everything blue
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l')
# if you want to save plots or open them in separate windows you can use x11() or pdf()
?Devices
# x11 (opens seperate window)
x11(8,6)

plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l')


# pdf
pdf("plot1.pdf",6,6)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l',lty=2)
legend("topleft",col=c("blue"), lty=2, legend=c("normal Sample"))
dev.off()
# more extensive example
X11(6,6)
par(mfrow=c(2,1),cex=0.9,mar=c(3,3,1,3)+0.1)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l',lty=2)
legend("topleft",col=c("blue"), lty=2, legend=c("normal Sample"))
barplot(y1,col="blue") # making a barplot
# plotting a histogram
hist(y1) # there is a nicer version available once we get to time series analysis
# create a second sample
y2 <- rnorm(50)
# scatterplot
plot(y1,y2)
# boxplot
boxplot(y1,y2)

0.1.9 Control Structures

Last but not least for this lecture, we learn about control structures. These structures (for-loops, if/else checks, etc.) are very useful if you want to translate a tedious manual task (e.g. in Excel) into something R should do for you step by step (e.g. column by column). Again, see below for a variety of easy examples and the commands used.

x <- sample(-15:15,10) # sample() randomly draws 10 numbers from the integers between -15 and 15
# 1. We square every element of vector x in a loop
y <- NULL # 1.a) set an empty variable (NULL means it is truly nothing and has length zero)
is.null(y)

# 1.b) Use an easy for-loop:


for (i in 1:length(x)){
  y[i] <- x[i]^2
}
# 2. Now we use an if-condition to only replace negative values
y <- NULL
for (i in 1:length(x)){
  y[i] <- x[i]
  if(x[i]<0) {y[i] <- x[i]^2}
}
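As a side note, the same result can be obtained without an explicit loop, because R is vectorized: `ifelse()` checks the condition element by element.

```r
# Vectorized alternative to the if-condition inside the loop above
x <- sample(-15:15, 10)
y <- ifelse(x < 0, x^2, x)  # square only the negative elements
y
```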
# ASSIGNMENT: let's take the square root 100 times in a row, starting from the number 500
y <- rep(NA,101)
y[1] <- 500
for (i in 1:100){
  print(i)
  y[i+1] <- sqrt(y[i])
}
plot(y,type="l")
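Note that repeatedly taking the square root converges towards 1: after k steps the value equals 500^(1/2^k), which for k = 100 is practically indistinguishable from 1. A quick check:

```r
# After 100 iterated square roots the value is 500^(1/2^100), which is ~1
500^(1/2^100)
```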
Index

constraint
    diversification, lxii
    factor exposure, lxiv
    leverage exposure, lxv
    position limit, lxii
    target return, lxiii
    transaction cost, lxiv
    turnover, lxiii
constraints, lix–lxv
    active, lx
    box, lx
    dollar-neutral, lx
    full investment, lx
    group, lix, lxi
    long-only, lxi
    sum of weights, lix

date and time, x
    as.Date(), x
    business days, xvii
    holidays, xvii
    POSIXct, xii
    Sys.setlocale(), xi
    timeDate, xiii
    yearmon, xi, xxii
    yearqtr, xi, xxii

factor exposure, lv, lxiv

ggplot2, xxxi

objective, lxvi–lxix
    return, lxvii
    risk, lxvi
    risk budget, lxvii

PerformanceAnalytics, xxvii

quantmod, xxv
    TTR, xxvi

risk factors, lv

solver, lxix–lxxi
    GenSA, lxx
    pso, lxx
    random portfolios, lxix
        grid, lxx
        sample, lxx
        simplex, lxx
    Rglpk, lxix
    ROI, lxx
        quadprog, lxx
        Rglpk, lxx

tidyverse, xxviii–xxxiii
    ggplot2, xxxi

timeDate, xiii
    business days, xvii
    FinCenter, xiv
    holidays, xvii
    origin, xiv

timetk, lvii
TTR, xxvi

xts, xxi, lvii
    import/export, xxv
    join (inner/outer/left/right/full), xxiii
    merge, xxiii
    missing values, xxiv
    replace, xxiii
    subset, xxiii
    vignettes, xxi

zoo
    vignettes, xxi
