
Sebastian Stöckl

Tidy Portfoliomanagement
in R
DEDICATION
Contents

List of Tables vii

List of Figures ix
0.1 Introduction . . . . . . . . . . . . . . . . . . . . xi

0.1.1 Introduction to Timeseries . . . . . . . . . xi

0.1.1.1 Date and Time . . . . . . . . . . xii

0.1.1.2 eXtensible Timeseries . . . . . . xxii

0.1.1.3 Downloading timeseries and basic visualization with quantmod . . xxvii
0.1.2 Introduction to the tidyVerse . . . . . . xxx

0.1.2.1 Tibbles . . . . . . . . . . . . . . xxx

0.1.2.2 Summary statistics . . . . . . . xxxiii

0.1.2.3 Plotting . . . . . . . . . . . . . . xxxiii

0.2 Managing Data . . . . . . . . . . . . . . . . . . . xxxvi

0.2.1 Getting Data . . . . . . . . . . . . . . . . xxxvi

0.2.1.1 Downloading from Online Datasources . . . xxxvi
0.2.1.2 Manipulate Data . . . . . . . . . l

0.3 Exploring Data . . . . . . . . . . . . . . . . . . . lvi


0.3.1 Plotting Data . . . . . . . . . . . . . . . . lvii

0.3.1.1 Time-series plots . . . . . . . . . lvii

0.3.1.2 Box-plots . . . . . . . . . . . . . lvii

0.3.1.3 Histogram and Density Plots . . lvii

0.3.1.4 Quantile Plots . . . . . . . . . . lvii

0.3.2 Analyzing Data . . . . . . . . . . . . . . . lvii

0.3.2.1 Calculating Statistics . . . . . . lvii

0.3.2.2 Testing Data . . . . . . . . . . . lvii

0.3.2.3 Exposure to Factors . . . . . . . lvii

0.4 Managing Portfolios . . . . . . . . . . . . . . . . lviii

0.4.1 Introduction . . . . . . . . . . . . . . . . lviii

0.4.1.1 The portfolio.spec() Object . lviii

0.4.1.2 Constraints . . . . . . . . . . . . lxi

0.4.1.3 Objectives . . . . . . . . . . . . lxviii

0.4.1.4 Solvers . . . . . . . . . . . . . . lxxi

0.4.2 Mean-variance Portfolios . . . . . . . . . . lxxiii

0.4.2.1 Introduction and Theoretics . . . lxxiii

0.4.3 Mean-CVaR Portfolios . . . . . . . . . . . lxxiv

0.5 Managing Portfolios in the Real World . . . . . . lxxiv

0.5.1 Rolling Portfolios . . . . . . . . . . . . . . lxxiv

0.5.2 Backtesting . . . . . . . . . . . . . . . . . lxxiv

0.6 Further applications in Finance . . . . . . . . . . lxxiv

0.6.1 Portfolio Sorts . . . . . . . . . . . . . . . lxxiv

0.6.2 Fama-MacBeth-Regressions . . . . . . . . lxxiv

0.6.3 Risk Indices . . . . . . . . . . . . . . . . . lxxiv

0.7 References . . . . . . . . . . . . . . . . . . . . . lxxiv

.0.1 Introduction to R . . . . . . . . . . . . . . lxxiv



.0.1.1 Getting started . . . . . . . . . . lxxv

.0.1.2 Working directory . . . . . . . . lxxv

.0.1.3 Basic calculations . . . . . . . . lxxvi

.0.1.4 Mapping variables . . . . . . . . lxxvii

.0.1.5 Sequences, vectors and matrices lxxx

.0.1.6 Vectors and matrices . . . . . . lxxxiii

.0.1.7 Functions in R . . . . . . . . . . lxxxv

.0.1.8 Plotting . . . . . . . . . . . . . . lxxxviii

.0.1.9 Control Structures . . . . . . . . lxxxix

Bibliography xci
List of Tables

List of Figures

Preface

This book should accompany my lectures “Research Methods”, “Quantitative Analysis”, “Portfoliomanagement and Financial Analysis” and (to a smaller degree) “Empirical Methods in Finance”. In the past years I have been a heavy promoter of the Rmetrics1 tools for my lectures and research. However, in the last year the development of the project has stagnated due to the tragic death of its founder Prof. Dr. Diethelm Würtz2. It therefore happened several times that code from past semesters and lectures stopped working and no support for the project was available.
Also, in the past year I have started to be a heavy user of the tidyverse3 and the financial packages that have been developed on top of it (e.g. tidyquant). Therefore I have taken the chance to put together some material from my lectures and start writing this book. In structure it is kept similar to the excellent Rmetrics book Würtz et al. (2015) on Portfolio Optimization with R/Rmetrics4, which I have been heavily using and recommending to my students in the past years!
1 https://www.rmetrics.org/
2 https://www.rmetrics.org/about
3 https://www.tidyverse.org/
4 https://www.rmetrics.org/ebooks-portfolio


Why read this book

Because it may help my students :-)

Structure of the book

Not yet fixed. But the book will start with an introduction to the most important tools for portfolio analysis: timeseries and the tidyverse. Afterwards, the possibilities of managing and exploring financial data will be developed. Then we do portfolio optimization for mean-variance and mean-CVaR portfolios. This will be followed by a chapter on backtesting, before I show further applications in finance, such as predictions, portfolio sorting, Fama-MacBeth-regressions etc.

Prerequisites

To start, install/load all necessary packages using the pacman package (the list will be expanded as the book grows).

pacman::p_load(tidyverse,tidyquant,PortfolioAnalytics,quantmod,PerformanceAnalytics,
  tibbletime,timetk,ggthemes,timeDate,Quandl,alphavantager,readxl,
  DEoptim,pso,GenSA,Rglpk,ROI,ROI.plugin.glpk,ROI.plugin.quadprog)

Acknowledgments

I thank my family…
I especially thank the developers of:

• the excellent fPortfolio-Book
• the tidyquant package and its vignettes
• the PerformanceAnalytics developers and the package vignettes
• the PortfolioAnalytics developers (currently working very hard on the package) and its package vignettes

Sebastian Stöckl
University of Liechtenstein
Vaduz, Liechtenstein

0.1 Introduction

0.1.1 Introduction to Timeseries

For an introduction to R see the Appendix @ref(ss_991IntrotoR).

Many of the datasets we will be working with have a (somewhat regular) time dimension and are therefore often called timeseries. In R there are a variety of classes available to handle data, such as vector, matrix, data.frame or their more modern implementation: tibble. Adding a time dimension creates a timeseries from these objects. The most common/flexible package in R that handles timeseries based on the first three formats is xts (according to its vignette5), which we will discuss in the following. Afterwards we will introduce the timetk package that allows xts to interplay with tibbles to create the most powerful framework for handling (even very large) time-based datasets (as we often encounter in finance).

The community is currently working heavily to develop time-aware tibbles to bring together the powerful grouping features of the dplyr package (for tibbles) with the abilities of xts, which is the most powerful and most used timeseries method in finance to date, due to the available interplay with quantmod and other financial packages. See also this link6 for more information.

5 https://cran.r-project.org/web/packages/xts/vignettes/xts.pdf

All information regarding tibbles and the financial universe is summarized and updated on the business-science.io website7.

In the following, we will define a variety of date and time classes before we introduce xts, tibble and tibbletime. Most of these packages come with excellent vignettes that I will reference for further reading, while I will only pick up the features necessary for portfolio management, which is the focus of this book.

0.1.1.1 Date and Time

There are some basic functionalities in base R, but most of the time we will need additional functions to perform all necessary tasks. Available date (time) classes are Date, POSIXct, (chron), yearmon, yearqtr and timeDate (from the Rmetrics bundle).

0.1.1.1.1 Basic Date and Time Classes

There are several Date and Time Classes in R that can all be used
as time-index for xts. We start with the most basic as.Date()

d1 <- "2018-01-18"
str(d1) # str() checks the structure of the R-object

## chr "2018-01-18"

d2 <- as.Date(d1)
str(d2)

## Date[1:1], format: "2018-01-18"



In the second case, R automatically detects the format of the Date object, but if there is something more complex involved you can specify the format (for all available format definitions, see ?strptime()).

d3 <- "4/30-2018"
as.Date(d3, "%m/%d-%Y") # as.Date(d3) will not work

## [1] "2018-04-30"

If you are working with monthly or quarterly data, yearmon and yearqtr will be your friends (both coming from the zoo package that serves as the foundation for xts).

as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))

## [1] "Jan 2018"

## [1] "Apr 2018"

as.yearqtr(d1); as.yearqtr(as.Date(d3, "%m/%d-%Y"))

## [1] "2018 Q1"

## [1] "2018 Q2"

Note that as.yearmon shows dates in terms of the current locale of your computer (e.g. Austrian German). You can find out about your locale with Sys.getlocale() and set a different locale with Sys.setlocale().

Sys.setlocale("LC_TIME","German_Austria")

## [1] "German_Austria.1252"

as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))

## [1] "Jän 2018"

## [1] "Apr 2018"

Sys.setlocale("LC_TIME","English")

## [1] "English_United States.1252"

as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))

## [1] "Jan 2018"

## [1] "Apr 2018"

When your data also requires you to include information on time, you will need either POSIXct (the basic class behind all times and dates in R) or the timeDate package. The latter includes excellent abilities to work with financial data (see the next section). Note that talking about time also requires you to talk about timezones! We start with several examples of the POSIXct class:

strptime("2018-01-15 13:55:23.975", "%Y-%m-%d %H:%M:%OS") # converts from character

## [1] "2018-01-15 13:55:23 CET"

as.POSIXct("2009-01-05 14:19:12", format="%Y-%m-%d %H:%M:%S", tz="UTC")

## [1] "2009-01-05 14:19:12 UTC"

We will mainly use the timeDate package that provides many useful functions for financial timeseries.

An introduction to timeDate by the Rmetrics group can be found at https://www.rmetrics.org/sites/default/files/2010-02-timeDateObjects.pdf.

Dates <- c("1989-09-28","2001-01-15","2004-08-30","1990-02-09")


Times <- c( "23:12:55", "10:34:02", "08:30:00", "11:18:23")
DatesTimes <- paste(Dates, Times)
as.Date(DatesTimes)

## [1] "1989-09-28" "2001-01-15" "2004-08-30" "1990-02-09"

as.timeDate(DatesTimes)

## GMT
## [1] [1989-09-28 23:12:55] [2001-01-15 10:34:02] [2004-08-30 08:30:00]
## [4] [1990-02-09 11:18:23]

You see that timeDate comes along with timezone information (GMT) that is set according to your computer's locale. timeDate allows you to specify the timezone of origin (zone) as well as the financial center whose timezone the data should be converted to (FinCenter):

timeDate(DatesTimes, zone = "Tokyo", FinCenter = "Zurich")

## Zurich
## [1] [1989-09-28 15:12:55] [2001-01-15 02:34:02] [2004-08-30 01:30:00]
## [4] [1990-02-09 03:18:23]

timeDate(DatesTimes, zone = "Tokyo", FinCenter = "NewYork")

## NewYork
## [1] [1989-09-28 10:12:55] [2001-01-14 20:34:02] [2004-08-29 19:30:00]
## [4] [1990-02-08 21:18:23]

timeDate(DatesTimes, zone = "NewYork", FinCenter = "Tokyo")

## Tokyo
## [1] [1989-09-29 12:12:55] [2001-01-16 00:34:02] [2004-08-30 21:30:00]
## [4] [1990-02-10 01:18:23]

listFinCenter("Europe/Vi*") # get a list of all financial centers available

## [1] "Europe/Vaduz" "Europe/Vatican" "Europe/Vienna"


## [4] "Europe/Vilnius" "Europe/Volgograd"

The Date class as well as the timeDate package allow you to create time sequences (necessary if you want to manually create timeseries):

dates1 <- seq(as.Date("2017-01-01"), length=12, by="month"); dates1 # or use to= instead of length=

## [1] "2017-01-01" "2017-02-01" "2017-03-01" "2017-04-01" "2017-05-01"


## [6] "2017-06-01" "2017-07-01" "2017-08-01" "2017-09-01" "2017-10-01"
## [11] "2017-11-01" "2017-12-01"

dates2 <- timeSequence(from = "2017-01-01", to = "2017-12-31", by = "month"); dates2

## GMT
## [1] [2017-01-01] [2017-02-01] [2017-03-01] [2017-04-01] [2017-05-01]
## [6] [2017-06-01] [2017-07-01] [2017-08-01] [2017-09-01] [2017-10-01]
## [11] [2017-11-01] [2017-12-01]

Now there are several very useful functions in the timeDate package to determine the first/last days of months/quarters/… (I let them speak for themselves):

timeFirstDayInMonth(dates1 - 7) # btw check the difference between "dates1-7" and "dates1"

## GMT
## [1] [2016-12-01] [2017-01-01] [2017-02-01] [2017-03-01] [2017-04-01]
## [6] [2017-05-01] [2017-06-01] [2017-07-01] [2017-08-01] [2017-09-01]
## [11] [2017-10-01] [2017-11-01]

timeFirstDayInQuarter(dates1)

## GMT
## [1] [2017-01-01] [2017-01-01] [2017-01-01] [2017-04-01] [2017-04-01]
## [6] [2017-04-01] [2017-07-01] [2017-07-01] [2017-07-01] [2017-10-01]
## [11] [2017-10-01] [2017-10-01]

timeLastDayInMonth(dates1)

## GMT
## [1] [2017-01-31] [2017-02-28] [2017-03-31] [2017-04-30] [2017-05-31]
## [6] [2017-06-30] [2017-07-31] [2017-08-31] [2017-09-30] [2017-10-31]
## [11] [2017-11-30] [2017-12-31]

timeLastDayInQuarter(dates1)

## GMT
## [1] [2017-03-31] [2017-03-31] [2017-03-31] [2017-06-30] [2017-06-30]
## [6] [2017-06-30] [2017-09-30] [2017-09-30] [2017-09-30] [2017-12-31]
## [11] [2017-12-31] [2017-12-31]

timeNthNdayInMonth("2018-01-01",nday = 5, nth = 3) # useful for option expiry dates (third Friday)

## GMT
## [1] [2018-01-19]

timeNthNdayInMonth(dates1,nday = 5, nth = 3)

## GMT
## [1] [2017-01-20] [2017-02-17] [2017-03-17] [2017-04-21] [2017-05-19]
## [6] [2017-06-16] [2017-07-21] [2017-08-18] [2017-09-15] [2017-10-20]
## [11] [2017-11-17] [2017-12-15]

If one wants to create a more specific sequence of times, this can be done with timeCalendar using time ‘atoms’:

timeCalendar(m = 1:4, d = c(28, 15, 30, 9), y = c(1989, 2001, 2004, 1990), FinCenter = "Europe/Zurich")

## Europe/Zurich
## [1] [1989-01-28 01:00:00] [2001-02-15 01:00:00] [2004-03-30 02:00:00]
## [4] [1990-04-09 02:00:00]

timeCalendar(d=1, m=3:4, y=2018, h = c(9, 14), min = c(15, 23), s=c(39,41), FinCenter = "Europe/Zurich")

## Europe/Zurich
## [1] [2018-03-01 10:15:39] [2018-04-01 16:23:41]

0.1.1.1.2 Week-days and Business-days

One of the most important functionalities existing only in the timeDate package is the possibility to check for business days in almost any timezone. The holiday calendars of the most important financial centers can be called via functions of the form holidayXXX():

holidayNYSE()

## NewYork
## [1] [2018-01-01] [2018-01-15] [2018-02-19] [2018-03-30] [2018-05-28]
## [6] [2018-07-04] [2018-09-03] [2018-11-22] [2018-12-25]

holiday(year = 2018, Holiday = c("GoodFriday","Easter","FRAllSaints"))

## GMT
## [1] [2018-03-30] [2018-04-01] [2018-11-01]

dateSeq <- timeSequence(Easter(year(Sys.time()), -14), Easter(year(Sys.time()), +14)); dateSeq

## GMT
## [1] [2018-03-18] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22]
## [6] [2018-03-23] [2018-03-24] [2018-03-25] [2018-03-26] [2018-03-27]
## [11] [2018-03-28] [2018-03-29] [2018-03-30] [2018-03-31] [2018-04-01]
## [16] [2018-04-02] [2018-04-03] [2018-04-04] [2018-04-05] [2018-04-06]
## [21] [2018-04-07] [2018-04-08] [2018-04-09] [2018-04-10] [2018-04-11]
## [26] [2018-04-12] [2018-04-13] [2018-04-14] [2018-04-15]

dateSeq2 <- dateSeq[isWeekday(dateSeq)]; dateSeq2 # select only weekdays

## GMT
## [1] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22] [2018-03-23]
## [6] [2018-03-26] [2018-03-27] [2018-03-28] [2018-03-29] [2018-03-30]
## [11] [2018-04-02] [2018-04-03] [2018-04-04] [2018-04-05] [2018-04-06]
## [16] [2018-04-09] [2018-04-10] [2018-04-11] [2018-04-12] [2018-04-13]

dayOfWeek(dateSeq2)

## 2018-03-19 2018-03-20 2018-03-21 2018-03-22 2018-03-23 2018-03-26


## "Mon" "Tue" "Wed" "Thu" "Fri" "Mon"
## 2018-03-27 2018-03-28 2018-03-29 2018-03-30 2018-04-02 2018-04-03
## "Tue" "Wed" "Thu" "Fri" "Mon" "Tue"
## 2018-04-04 2018-04-05 2018-04-06 2018-04-09 2018-04-10 2018-04-11
## "Wed" "Thu" "Fri" "Mon" "Tue" "Wed"
## 2018-04-12 2018-04-13
## "Thu" "Fri"

dateSeq3 <- dateSeq[isBizday(dateSeq, holidayZURICH(year(Sys.time())))]; dateSeq3

## GMT
## [1] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22] [2018-03-23]
## [6] [2018-03-26] [2018-03-27] [2018-03-28] [2018-03-29] [2018-04-03]
## [11] [2018-04-04] [2018-04-05] [2018-04-06] [2018-04-09] [2018-04-10]
## [16] [2018-04-11] [2018-04-12] [2018-04-13]

dayOfWeek(dateSeq3)

## 2018-03-19 2018-03-20 2018-03-21 2018-03-22 2018-03-23 2018-03-26


## "Mon" "Tue" "Wed" "Thu" "Fri" "Mon"
## 2018-03-27 2018-03-28 2018-03-29 2018-04-03 2018-04-04 2018-04-05
## "Tue" "Wed" "Thu" "Tue" "Wed" "Thu"
## 2018-04-06 2018-04-09 2018-04-10 2018-04-11 2018-04-12 2018-04-13
## "Fri" "Mon" "Tue" "Wed" "Thu" "Fri"

Now, one of the strongest points for the timeDate package is made when one puts together times and dates from different timezones. This can be a challenging task (imagine hourly stock prices from London, Tokyo and New York). Luckily the timeDate package can handle this easily:

ZH <- timeDate("2015-01-01 16:00:00", zone = "GMT", FinCenter = "Zurich")


NY <- timeDate("2015-01-01 18:00:00", zone = "GMT", FinCenter = "NewYork")
c(ZH, NY)

## Zurich
## [1] [2015-01-01 17:00:00] [2015-01-01 19:00:00]

c(NY, ZH) # it always takes the Financial Center of the first entry

## NewYork
## [1] [2015-01-01 13:00:00] [2015-01-01 11:00:00]

0.1.1.1.3 Assignments

Create a daily time series for 2018:

1. Find the subset of first and last days per month/quarter (uniquely).
2. Take December 2017 and remove all weekends and holidays in Zurich (Tokyo).
3. Create a series of five dates & times in New York. Show them for New York, London and Belgrade.
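For the first of these tasks, a possible starting point is sketched below (a sketch assuming the timeDate package is loaded; building the sequence by month directly yields the first days, while timeLastDayInMonth() returns the corresponding last days):

```r
library(timeDate)

# the first days of each month in 2018, built directly as a monthly sequence
firsts <- seq(as.Date("2018-01-01"), by = "month", length.out = 12)
firsts
# the matching last days of each month via timeDate
lasts <- timeLastDayInMonth(firsts)
lasts
```

The quarterly variants timeFirstDayInQuarter() and timeLastDayInQuarter() work analogously; wrapping their results in unique() removes the repeated quarter boundaries.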

0.1.1.2 eXtensible Timeseries

The xts format is based on the timeseries format zoo, but extends its power to be more compatible with other data classes. For example, if one converts dates from timeDate, xts is flexible enough to memorize the financial center the dates came from, and upon retransformation to that class this information is restored; with a pure zoo object it would have been lost. As we quite often (might) want to transform our data to and from xts, this is a great feature and makes our lives a lot easier. xts also comes with a bundle of other features.
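A minimal sketch of this behavior (assuming xts and timeDate are loaded; tclass() reports which class the internal index was built from, so that class can be restored on the way back):

```r
library(timeDate)
library(xts)

# a timeDate index with an explicit financial center
td <- timeSequence(from = "2017-01-01", length.out = 3, by = "day",
                   FinCenter = "Zurich")
x <- xts(1:3, order.by = td)
tclass(x)  # the xts object remembers that its index came from timeDate
```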

For the reader who wants to dig deeper, we recommend the excellent zoo vignettes (vignette("zoo-quickref"), vignette("zoo"), vignette("zoo-faq"), vignette("zoo-design") and vignette("zoo-read")). Read up on xts in vignette("xts") and vignette("xts-faq").

To start, we create an xts object consisting of a series of randomly created data points:

data <- rnorm(5) # 5 std. normally distributed random numbers


dates <- seq(as.Date("2017-05-01"), length=5, by="days")
xts1 <- xts(x=data, order.by=dates); xts1

## [,1]
## 2017-05-01 0.72838032
## 2017-05-02 0.47100977
## 2017-05-03 -0.04537768
## 2017-05-04 1.61845234
## 2017-05-05 0.07191067

coredata(xts1) # access data



## [,1]
## [1,] 0.72838032
## [2,] 0.47100977
## [3,] -0.04537768
## [4,] 1.61845234
## [5,] 0.07191067

index(xts1) # access time (index)

## [1] "2017-05-01" "2017-05-02" "2017-05-03" "2017-05-04" "2017-05-05"

Here, the xts object was built from a vector and a series of Dates.
We could also have used timeDate, yearmon or yearqtr and a
data.frame:

s1 <- rnorm(5); s2 <- 1:5


data <- data.frame(s1,s2)
dates <- timeSequence("2017-01-01",by="months",length.out=5,zone = "GMT")
xts2 <- xts(x=data, order.by=dates); xts2

## Warning: timezone of object (GMT) is different than current timezone ().

## s1 s2
## 2017-01-01 0.7462329 1
## 2017-02-01 -0.1551448 2
## 2017-03-01 -0.9693310 3
## 2017-04-01 0.3428151 4
## 2017-05-01 0.4692079 5

dates2 <- as.yearmon(dates)


xts3 <- xts(x=data, order.by = dates2)

In the next step we evaluate the merging of two timeseries:

set.seed(1)
xts3 <- xts(rnorm(6), timeSequence(from = "2017-01-01", to = "2017-06-01", by = "month"))
xts4 <- xts(rnorm(5), timeSequence(from = "2017-04-01", to = "2017-08-01", by = "month"))
colnames(xts3) <- "tsA"; colnames(xts4) <- "tsB"
merge(xts3,xts4)

Please be aware that joining timeseries in R sometimes requires you to specify a left/right/inner/outer join of the two objects:

merge(xts3,xts4,join = "left")
merge(xts3,xts4,join = "right")
merge(xts3,xts4,join = "inner")
merge(xts3,xts4,join="outer",fill=0)

In the next step, we subset and replace parts of xts objects

xts5 <- xts(rnorm(24), timeSequence(from = "2016-01-01", to = "2017-12-01", by = "month"))


xts5["2017-01-01"]
xts5["2017-05-01/2017-08-12"]
xts5[c("2017-01-01","2017-05-01")] <- NA
xts5["2016"] <- 99
xts5["2016-05-01/"]
first(xts5)
last(xts5)
first(xts5,"3 months")
xts6 <- last(xts5,"1 year")

Now let us handle the missing values we introduced. One possibility is just to omit them using na.omit(). Other possibilities would be to carry the last value forward with na.locf() or to use linear interpolation with na.approx().

na.omit(xts6)
na.locf(xts6)
na.locf(xts6,fromLast = TRUE,na.rm = TRUE)
na.approx(xts6,na.rm = FALSE)

Finally, standard calculations can be done on xts objects, and there are some pretty helper functions to make life easier:

periodicity(xts5)
nmonths(xts5); nquarters(xts5); nyears(xts5)
to.yearly(xts5)
to.quarterly(xts6)
round(xts6^2,2)
xts6[which(is.na(xts6))] <- rnorm(2)
# For aggregation of timeseries
ep1 <- endpoints(xts6,on="months",k = 2) # endpoints every 2 months
ep2 <- endpoints(xts6,on="months",k = 3) # endpoints every 3 months
period.sum(xts6, INDEX = ep2)
period.apply(xts6, INDEX = ep1, FUN=mean) # 2month means
period.apply(xts6, INDEX = ep2, FUN=mean) # 3month means
# Lead, lag and diff operations
cbind(xts6,lag(xts6,k=-1),lag(xts6,k=1),diff(xts6))

Next, I will show some applications that go beyond xts, for example the use of lapply to operate on a list:

# splitting timeseries (result is a list)


xts6_yearly <- split(xts5,f="years")
lapply(xts6_yearly,FUN=mean,na.rm=TRUE)
# using elaborate functions from the zoo-package
rollapply(as.zoo(xts6), width=3, FUN=sd) # rolling standard deviation

Last but not least, we plot xts data and save it to a (csv) file, then open it again:

tmp <- tempfile()


write.zoo(xts2,sep=",",file = tmp)
xts8 <- as.xts(read.zoo(tmp, sep=",", FUN=as.yearmon))
plot(xts8)

0.1.1.3 Downloading timeseries and basic visualization with quantmod

Many downloading and plotting functions are (still) available in quantmod. We first require the package, then download data for Google, Apple and the S&P500 from yahoo finance. Each of these “Symbols” will be downloaded into its own “environment”. For plotting, a large variety of technical indicators is available; for an overview see here8.

Quantmod is developed by Jeffrey Ryan and Joshua Ulrich9 and has a homepage10. The homepage includes an Introduction11, describes how data can be handled between xts and quantmod12 and has examples about Financial Charting with quantmod and TTR13. More documents will be developed within 2018.

8 https://www.r-bloggers.com/a-guide-on-r-quantmod-package-how-to-get-started/

require(quantmod)
# the easiest form of getting data is for yahoo finance where you know the ticker symbol
getSymbols(Symbols = "AAPL", from="2010-01-01", to="2018-03-01", periodicity="monthly")
head(AAPL)
is.xts(AAPL)
plot(AAPL[, "AAPL.Adjusted"], main = "AAPL")
chartSeries(AAPL, TA=c(addVo(),addBBands(), addADX())) # plot and add technical indicators
getSymbols(Symbols = c("GOOG","^GSPC"), from="2000-01-01", to="2018-03-01", periodicity="monthly")
getSymbols('DTB3', src='FRED') # FRED does not recognize from and to

Now we create an xts from all relevant parts of the data

stocks <- cbind("Apple"=AAPL[,"AAPL.Adjusted"],"Google"=GOOG[,"GOOG.Adjusted"],"SP500"=GSPC[,"GSPC.Adjusted"])


rf.daily <- DTB3["2010-01-01/2018-03-01"]
rf.monthly <- to.monthly(rf.daily)[,"rf.daily.Open"]
rf <- xts(coredata(rf.monthly),order.by = as.Date(index(rf.monthly)))

One possibility (that I adopted from here: https://www.quantinsti.com/blog/an-example-of-a-trading-strategy-coded-in-r/) is to use the technical indicators provided by quantmod to devise a technical trading strategy. We make use of a fast and a slow moving average (function MACD in the TTR package that belongs to quantmod). Whenever the fast moving average crosses the slow one from below, we invest (there is a short-term trend to exploit), and we drop out of the investment once the red (fast) line falls below the grey (slow) line. To evaluate the trading strategy we also need to calculate returns for the S&P500 index using ROC.

chartSeries(GSPC, TA=c(addMACD(fast=3, slow=12,signal=6,type=SMA)))


macd <- MACD(GSPC[,"GSPC.Adjusted"], nFast=3, nSlow=12, nSig=6, maType=SMA, percent=FALSE)
buy_sell_signal <- Lag(ifelse(macd$macd < macd$signal, -1, 1))

buy_sell_returns <- (ROC(GSPC[,"GSPC.Adjusted"])*buy_sell_signal)["2001-06-01/"]


portfolio <- exp(cumsum(buy_sell_returns)) # for nice plotting we assume that we start with a wealth of 1
plot(portfolio)

For the evaluation of trading strategies/portfolios and other financial timeseries, almost every tool is available through the package PerformanceAnalytics. In this case charts.PerformanceSummary() calculates cumulative returns (similar to above), monthly returns and the maximum drawdown (the maximum loss relative to the best previous value, see here14).

PerformanceAnalytics is a large package with an uncountable variety of tools. There are vignettes on the estimation of higher-order (co)moments (vignette("EstimationComoments")), on performance attribution measures according to Bacon (2008) (vignette("PA-Bacon")), on charting (vignette("PA-charts")) and more that can be found on the PerformanceAnalytics CRAN page15.

require(PerformanceAnalytics)
rets <- cbind(buy_sell_returns,ROC(GSPC[,"GSPC.Adjusted"]))
colnames(rets) <- c("investment","benchmark")
charts.PerformanceSummary(rets,colorset=rich6equal)
chart.Histogram(rets, main = "Risk Measures", methods = c("add.density", "add.normal"))

14 https://de.wikipedia.org/wiki/Maximum_Drawdown

0.1.2 Introduction to the tidyVerse

0.1.2.1 Tibbles

Since the middle of 2017 a lot of programmers have put in a huge effort to rewrite many R functions and data objects in a tidy way and thereby created the tidyverse16.

For updates check the tidyverse homepage17. A very well written book introducing the tidyverse can be found online: R for Data Science18. The core of the tidyverse currently contains several packages:

– ggplot2 for creating powerful graphs19 (see the vignette("ggplot2-specs"))
– dplyr for data manipulation20 (see the vignette("dplyr"))
– tidyr for tidying data21
– readr for importing datasets22 (see the vignette("readr"))
– purrr for programming23
– tibble for modern data.frames24 (see the vignette("tibble"))

and many more25.

require(tidyverse) # install first if not yet there, update regularly with tidyverse_update()
require(tidyquant) # this package wraps all the quantmod etc. packages into the tidyverse

Most of the following is adapted from “Introduction to Statistical Learning with Applications in R” by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani at http://www.science.smith.edu/~jcrouser/SDS293/labs/. We begin by loading in the Auto data set. This data is part of the ISLR package.

16 https://www.tidyverse.org/

require(ISLR)
data(Auto)

Nothing happens when you run this, but now the data is available in your environment. (In RStudio, you would see the name of the data in your Environment tab.) To view the data, we can either print the entire dataset by typing its name, or we can “slice” off some of the data to look at just a subset by piping the data into the slice function using the %>% operator. The piping operator is one of the most useful tools of the tidyverse: you can pipe command into command into command without saving and naming each intermediate step. The first step is to transform this data.frame into a tibble (a similar concept, but better26). A tibble has observations in rows and variables in columns. Those variables can have many different formats:

Auto %>% slice(1:10)


tbs1 <- tibble(
Date = seq(as.Date("2017-01-01"), length=12, by="months"),
returns = rnorm(12),
letters = sample(letters, 12, replace = TRUE)
)

As you can see, all three columns of tbs1 have different formats. One can get the different variables by name and by position. If you want to use the pipe operator, you need to use the special placeholder . (a single dot):

26 http://r4ds.had.co.nz/tibbles.html

tbs1$returns
tbs1[[2]]
tbs1 %>% .[[2]]

Before we go on to analyze a large tibble such as Auto, we quickly talk about reading and saving files with tools from the tidyverse. We save the file as csv using write_csv and read it back using read_csv. Because the columns of the read file are not in the exact format as before, we use mutate to transform the columns.

Auto <- as.tibble(Auto) # make tibble from Auto


tmp <- tempfile()
write_csv(Auto,path = tmp) # write
Auto2 <- read_csv(tmp)
Auto2 <- Auto2 %>%
  mutate(cylinders=as.double(cylinders),horsepower=as.double(horsepower),year=as.double(year))
all.equal(Auto,Auto2) # only the factor levels differ

Notice that the data looks just the same as when we loaded it from
the package. Now that we have the data, we can begin to learn
things about it.

dim(Auto)
str(Auto)
names(Auto)

The dim() function tells us that the data has 392 observations and nine variables. The original data had some empty rows, but when we read the data in, R knew to ignore them. The str() function tells us that most of the variables are numeric or integer, although the name variable is a character vector. names() lets us check the variable names.

0.1.2.2 Summary statistics

Often, we want to know some basic things about the variables in our data. Calling summary() on an entire dataset will give you an idea of the distributions of your variables, as it produces a numerical summary of each variable in a particular data set.

summary(Auto)

The summary suggests that origin might be better thought of as a factor. It only seems to have three possible values: 1, 2 and 3. If we read the documentation about the data (using ?Auto) we learn that these numbers correspond to where the car is from: 1. American, 2. European, 3. Japanese. So, let's mutate() that variable into a factor (categorical) variable.

Auto <- Auto %>%
  mutate(origin = factor(origin))
summary(Auto)

0.1.2.3 Plotting

We can use the ggplot2 package to produce simple graphics. ggplot2 has a particular syntax, which looks like this:

ggplot(Auto) + geom_point(aes(x=cylinders, y=mpg))

The basic idea is that you need to initialize a plot with ggplot()
and then add “geoms” (short for geometric objects) to the plot.

The ggplot2 package is based on the Grammar of Graphics27, a famous book on data visualization theory. It provides a way to map attributes in your data (like variables) to “aesthetics” on the plot; the parameter aes() is short for aesthetic. For more about the ggplot2 syntax, view the help by typing ?ggplot or ?geom_point. There are also great online resources for ggplot2, like the R Graphics Cookbook28.
The cylinders variable is stored as a numeric vector, so R has
treated it as quantitative. However, since there are only a small
number of possible values for cylinders, one may prefer to treat it
as a qualitative variable. We can turn it into a factor, again using
a mutate() call.

Auto <- Auto %>%
  mutate(cylinders = factor(cylinders))

To view the relationship between a categorical and a numeric variable, we might want to produce boxplots. As usual, a number of options can be specified in order to customize the plots.

ggplot(Auto) + geom_boxplot(aes(x=cylinders, y=mpg)) + xlab("Cylinders") + ylab("MPG")

The geom geom_histogram() can be used to plot a histogram.

27 https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448
28 http://www.cookbook-r.com/Graphs/

ggplot(Auto) + geom_histogram(aes(x=mpg), binwidth=5)

For small datasets, we might want to see all the bivariate relationships between the variables. The GGally package has an extension of the scatterplot matrix that can do just that. We use select() to keep only the two variables mpg and cylinders and pipe the result into the ggpairs() function:

Auto %>% select(mpg, cylinders) %>% GGally::ggpairs()

Because there are not many cars with 3 and 5 cylinders we use
filter to only select those cars with 4, 6 and 8 cylinders.

Auto %>% select(mpg, cylinders) %>% filter(cylinders %in% c(4,6,8)) %>% GGally::ggpairs()

Sometimes, we might want to save a plot for use outside of R. To do this, we can use the ggsave() function.

ggsave("histogram.png", ggplot(Auto) + geom_histogram(aes(x=mpg), binwidth=5))

TO DO:
* Tidyquant: Document more technical features.
* For extensive timeseries manipulations there is an extension of the tibble object: time-aware tibbles from tibbletime29, which provide much of the xts functionality without the necessary conversion.

29 https://github.com/business-science/tibbletime

0.2 Managing Data

In this chapter we will learn how to download/import data from various sources. Most importantly, we will use the quantmod library through tidyquant to download financial data from a variety of sources. We will also learn how to import '.xlsx' (Excel) files.

0.2.1 Getting Data

0.2.1.1 Downloading from Online Datasources

The tidyquant package comes with a variety of readily compiled datasets/datasources. For whole collections of data, the following commands are available:

tq_exchange_options() # find all exchanges available

## [1] "AMEX" "NASDAQ" "NYSE"

tq_index_options() # find all indices available

## [1] "RUSSELL1000" "RUSSELL2000" "RUSSELL3000" "DOW" "DOWGLOBAL"


## [6] "SP400" "SP500" "SP600" "SP1000"

tq_get_options() # find all data sources available

## [1] "stock.prices" "stock.prices.google" "stock.prices.japan"


## [4] "financials" "key.ratios" "dividends"
## [7] "splits" "economic.data" "exchange.rates"
## [10] "metal.prices" "quandl" "quandl.datatable"
## [13] "alphavantager" "rblpapi"

The commands tq_exchange() and tq_index() will now get you all symbols and some additional information on the stocks listed at that exchange or contained in that index.30

glimpse(sp500)

## Observations: 504
## Variables: 5
## $ symbol <chr> "AAPL", "MSFT", "AMZN", "BRK.B", "FB", "JPM", "JNJ...
## $ company <chr> "Apple Inc.", "Microsoft Corporation", "Amazon.com...
## $ weight <dbl> 0.044387857, 0.035053855, 0.032730459, 0.016868330...
## $ sector <chr> "Information Technology", "Information Technology"...
## $ shares_held <dbl> 53939268, 84297440, 4418447, 21117048, 26316160, 3...

glimpse(nyse)

## Observations: 3,139
## Variables: 7
## $ symbol <chr> "DDD", "MMM", "WBAI", "WUBA", "EGHT", "AHC", "...
## $ company <chr> "3D Systems Corporation", "3M Company", "500.c...
## $ last.sale.price <dbl> 18.4800, 206.7100, 11.6400, 68.1800, 23.2000, ...
## $ market.cap <chr> "$2.11B", "$121.26B", "$491.85M", "$10.06B", "...
## $ ipo.year <dbl> NA, NA, 2013, 2013, NA, NA, 2014, 2014, NA, NA...
## $ sector <chr> "Technology", "Health Care", "Consumer Service...
## $ industry <chr> "Computer Software: Prepackaged Software", "Me...

glimpse(nasdaq)

30 Note that tq_index() unfortunately makes use of the package XLConnect that requires Java to be installed on your system.

## Observations: 3,405
## Variables: 7
## $ symbol <chr> "YI", "PIH", "PIHPP", "TURN", "FLWS", "FCCY", ...
## $ company <chr> "111, Inc.", "1347 Property Insurance Holdings...
## $ last.sale.price <dbl> 13.800, 6.350, 25.450, 2.180, 11.550, 20.150, ...
## $ market.cap <chr> NA, "$38M", NA, "$67.85M", "$746.18M", "$168.8...
## $ ipo.year <dbl> 2018, 2014, NA, NA, 1999, NA, NA, 2011, 2014, ...
## $ sector <chr> NA, "Finance", "Finance", "Finance", "Consumer...
## $ industry <chr> NA, "Property-Casualty Insurers", "Property-Ca...

The dataset we will be using consists of the ten largest stocks within
the S&P500 that had an IPO before January 2000. Therefore we
need to merge both datasets using inner_join() because we only
want to keep symbols from the S&P500 that are also traded on
NYSE or NASDAQ:

stocks.selection <- sp500 %>%
  inner_join(rbind(nyse, nasdaq) %>% select(symbol, last.sale.price, market.cap, ipo.year),
             by = "symbol") %>% # keep only symbols also traded on NYSE or NASDAQ
  filter(ipo.year < 2000 & !is.na(market.cap)) %>% # filter stocks with ipo before 2000
  arrange(desc(weight)) %>% # sort in descending order of index weight
  slice(1:10) # keep the ten largest stocks
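inner_join() keeps only the rows whose key occurs in both tables; base R's merge() behaves the same way, as this sketch with toy data (not the real index constituents) shows:

```r
# Toy index table and toy exchange listing; "XYZ" is not listed on the exchange
sp500_toy  <- data.frame(symbol = c("AAPL", "MSFT", "XYZ"),
                         weight = c(0.044, 0.035, 0.001))
listed_toy <- data.frame(symbol = c("AAPL", "MSFT", "GE"),
                         ipo.year = c(1980, 1986, NA))
# Inner join on the common key: only symbols present in both tables survive
joined <- merge(sp500_toy, listed_toy, by = "symbol")
joined$symbol
```

Symbols that appear in only one of the two tables ("XYZ", "GE") are dropped, which is exactly the filtering behavior we rely on above.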

The ten largest stocks in the S&P500 with a history longer than January 2000:

symbol company                    weight sector                 shares_held last.sale.price market.cap ipo.year
AAPL   Apple Inc.                 0.044  Information Technology 53939268    221.07          $1067.75B  1980
MSFT   Microsoft Corporation      0.035  Information Technology 84297440    111.71          $856.62B   1986
AMZN   Amazon.com Inc.            0.033  Consumer Discretionary 4418447     1990.00         $970.6B    1997
CSCO   Cisco Systems Inc.         0.009  Information Technology 51606584    46.89           $214.35B   1990
NVDA   NVIDIA Corporation         0.007  Information Technology 6659463     268.20          $163.07B   1999
ORCL   Oracle Corporation         0.006  Information Technology 32699620    49.34           $196.43B   1986
AMGN   Amgen Inc.                 0.005  Health Care            7306144     199.50          $129.13B   1983
ADBE   Adobe Systems Incorporated 0.005  Information Technology 5402625     267.79          $131.13B   1986
QCOM   QUALCOMM Incorporated      0.004  Information Technology 15438597    71.75           $105.41B   1991
GILD   Gilead Sciences Inc.       0.004  Health Care            14310276    73.97           $95.89B    1992
Next, we will download stock prices from Yahoo.

Data from that source usually comes in the OHLC format (open, high, low, close) with additional information (volume, adjusted). We will additionally download data for the S&P500 index itself. Note that we get daily prices:

stocks.prices <- stocks.selection$symbol %>%
  tq_get(get = "stock.prices", from = "2000-01-01", to = "2017-12-31") %>%
  group_by(symbol)
index.prices <- "^GSPC" %>%
  tq_get(get = "stock.prices", from = "2000-01-01", to = "2017-12-31")
stocks.prices %>% slice(1:2) # show the first two entries of each group

## # A tibble: 20 x 8
## # Groups: symbol [10]
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2000-01-03 3.75 4.02 3.63 4.00 133949200 3.54
## 2 AAPL 2000-01-04 3.87 3.95 3.61 3.66 128094400 3.24
## 3 ADBE 2000-01-03 16.8 16.9 16.1 16.4 7384400 16.1
## 4 ADBE 2000-01-04 15.8 16.5 15.0 15.0 7813200 14.8
## 5 AMGN 2000-01-03 70 70 62.9 62.9 22914900 53.5
## 6 AMGN 2000-01-04 62 64.1 57.7 58.1 15052600 49.4
## 7 AMZN 2000-01-03 81.5 89.6 79.0 89.4 16117600 89.4
## 8 AMZN 2000-01-04 85.4 91.5 81.8 81.9 17487400 81.9
## 9 CSCO 2000-01-03 55.0 55.1 51.8 54.0 53076000 43.6
## 10 CSCO 2000-01-04 52.8 53.5 50.9 51 50805600 41.2
## 11 GILD 2000-01-03 1.79 1.80 1.72 1.76 54070400 1.61
## 12 GILD 2000-01-04 1.70 1.72 1.66 1.68 38960000 1.54
## 13 MSFT 2000-01-03 58.7 59.3 56 58.3 53228400 42.5
## 14 MSFT 2000-01-04 56.8 58.6 56.1 56.3 54119000 41.0
## 15 NVDA 2000-01-03 3.94 3.97 3.68 3.90 7522800 3.61
## 16 NVDA 2000-01-04 3.83 3.84 3.60 3.80 7512000 3.51
## 17 ORCL 2000-01-03 31.2 31.3 27.9 29.5 98114800 26.4
## 18 ORCL 2000-01-04 28.9 29.7 26.2 26.9 116824800 24.0
## 19 QCOM 2000-01-03 99.6 100 87 89.7 91334000 65.7
## 20 QCOM 2000-01-04 86.3 87.7 80 81.0 63567400 59.4

Dividends and stock splits can also be downloaded:

stocks.dividends <- stocks.selection$symbol %>%
  tq_get(get = "dividends", from = "2000-01-01", to = "2017-12-31") %>%
  group_by(symbol)
stocks.splits <- stocks.selection$symbol %>%
  tq_get(get = "splits", from = "2000-01-01", to = "2017-12-31") %>%
  group_by(symbol)

We can additionally download financial key ratios for the different stocks: financials, profitability, growth, cash flow, financial health, efficiency ratios and valuation ratios. These ratios come from Morningstar31 and arrive in a nested form that we will have to 'dig out' using unnest().

stocks.ratios <- stocks.selection$symbol %>%
  tq_get(get = "key.ratios", from = "2000-01-01", to = "2017-12-31") %>%
  group_by(symbol)

## # A tibble: 42 x 3
## # Groups: symbol [6]
## symbol section data
## <chr> <chr> <list>
## 1 AAPL Financials <tibble [150 x 5]>
## 2 AAPL Profitability <tibble [170 x 5]>
## 3 AAPL Growth <tibble [160 x 5]>
## 4 AAPL Cash Flow <tibble [50 x 5]>
## 5 AAPL Financial Health <tibble [240 x 5]>
## 6 AAPL Efficiency Ratios <tibble [80 x 5]>
## 7 AAPL Valuation Ratios <tibble [40 x 5]>
## 8 MSFT Financials <tibble [150 x 5]>
## 9 MSFT Profitability <tibble [170 x 5]>
31 http://www.morningstar.com/

## 10 MSFT Growth <tibble [160 x 5]>
## # ... with 32 more rows

We find that financial ratios are only available for a subset of the ten stocks. We first filter for the 'Growth' information, then unnest() the nested tibbles and filter again for 'EPS %' and the 'Year over Year' information. Then we use ggplot() to plot the timeseries of year-over-year earnings-per-share growth for the different companies.

stocks.ratios %>% filter(section == "Growth") %>% unnest() %>%
  filter(sub.section == "EPS %", category == "Year over Year") %>%
  ggplot(aes(x = date, y = value, color = symbol)) + geom_line(lwd = 1.1) +
  labs(title = "Year over Year EPS in %", x = "", y = "") +
  theme_tq() + scale_color_tq()

[Figure: "Year over Year EPS in %" — line plot of year-over-year EPS growth per symbol (AAPL, AMGN, AMZN, CSCO, MSFT, NVDA)]

A variety of other (professional) data services are integrated into tidyquant, which I will list in the following subsections.

0.2.1.1.1 Quandl

Quandl32 provides access to many different financial and economic databases. To use it, one should acquire an API key by creating a Quandl account.33 Searches can be done using quandl_search() (I personally would use their homepage to do that). Data can be downloaded as before with tq_get(); be aware that you can download either single timeseries or entire datatables with the arguments get = "quandl" and get = "quandl.datatable". Note that in the example for 'Apple' below, the adjusted close prices differ from the ones from Yahoo. An example for a datatable is Zacks Fundamentals Collection B34.

quandl_api_key("enter-your-api-key-here")
quandl_search(query = "Oil", database_code = "NSE", per_page = 3)
quandl.aapl <- c("WIKI/AAPL") %>%
  tq_get(get = "quandl",
         from = "2000-01-01",
         to = "2017-12-31",
         column_index = 11, # numeric column number (e.g. 1)
         collapse = "daily", # "none", "daily", "weekly", "monthly", "quarterly", "annual"
         transform = "none") # summarize the data: "none", "diff", "rdiff", "cumul", "normalize"

## Oil India Limited


## Code: NSE/OIL
## Desc: Historical prices for Oil India Limited<br><br>National Stock Exchan
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur
##
## Oil Country Tubular Limited
## Code: NSE/OILCOUNTUB
32 https://www.quandl.com/
33 If you do not use an API key, you are limited to 50 calls per day.
34 https://www.quandl.com/databases/ZFB/documentation/about

## Desc: Historical prices for Oil Country Tubular Limited<br><br>National St
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur
##
## Essar Oil Limited
## Code: NSE/ESSAROIL
## Desc: Historical prices for Essar Oil Limited<br><br>National Stock Exchan
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur

## # A tibble: 3 x 13
## id dataset_code database_code name description refreshed_at
## * <int> <chr> <chr> <chr> <chr> <chr>
## 1 6668 OIL NSE Oil ~ Historical~ 2018-09-13T~
## 2 6669 OILCOUNTUB NSE Oil ~ Historical~ 2018-09-13T~
## 3 6041 ESSAROIL NSE Essa~ Historical~ 2016-02-09T~
## # ... with 7 more variables: newest_available_date <chr>,
## # oldest_available_date <chr>, column_names <list>, frequency <chr>,
## # type <chr>, premium <lgl>, database_id <int>

## # A tibble: 5 x 12
## date open high low close volume ex.dividend split.ratio
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2000-01-03 105. 112. 102. 112. 4.78e6 0 1
## 2 2000-01-04 108. 111. 101. 102. 4.57e6 0 1
## 3 2000-01-05 104. 111. 103 104 6.95e6 0 1
## 4 2000-01-06 106. 107 95 95 6.86e6 0 1
## 5 2000-01-07 96.5 101 95.5 99.5 4.11e6 0 1
## # ... with 4 more variables: adj.open <dbl>, adj.high <dbl>,
## # adj.low <dbl>, adj.close <dbl>
0.2.1.1.2 Alpha Vantage

Alpha Vantage35 provides access to real-time and historical financial data. Here we also need to get and set an API key (for free).
35 https://www.alphavantage.co

av_api_key("enter-your-api-key-here")
alpha.aapl <- c("AAPL") %>%
tq_get(get = "alphavantager",
av_fun="TIME_SERIES_DAILY_ADJUSTED") # for daily data
alpha.aapl.id <- c("AAPL") %>%
tq_get(get = "alphavantager",
av_fun="TIME_SERIES_INTRADAY", # for intraday data
interval="5min") # 5 minute intervals

## # A tibble: 5 x 9
## timestamp open high low close adjusted_close volume dividend_amount
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 2018-04-24 166. 166. 161. 163. 162. 3.37e7 0
## 2 2018-04-25 163. 165. 162. 164. 162. 2.84e7 0
## 3 2018-04-26 164. 166. 163. 164. 163. 2.80e7 0
## 4 2018-04-27 164 164. 161. 162. 161. 3.57e7 0
## 5 2018-04-30 162. 167. 162. 165. 164. 4.24e7 0
## # ... with 1 more variable: split_coefficient <dbl>

## # A tibble: 5 x 6
## timestamp open high low close volume
## <dttm> <dbl> <dbl> <dbl> <dbl> <int>
## 1 2018-09-11 14:25:00 224. 224. 224. 224. 261968
## 2 2018-09-11 14:30:00 224. 224. 224. 224. 334069
## 3 2018-09-11 14:35:00 224. 224. 224. 224. 285138
## 4 2018-09-11 14:40:00 224. 224. 224. 224. 229329
## 5 2018-09-11 14:45:00 224. 224. 224. 224. 193316
0.2.1.1.3 FRED (Economic Data)

A large quantity of economic data can be extracted from the Federal Reserve Economic Data (FRED) database36. Below we download the 1-year (TB1YR) and 3-month (TB3MS) T-bill rates for the US as risk-free rates. Note that these are annualized rates!
36 https://fred.stlouisfed.org/

ir <- tq_get(c("TB1YR", "TB3MS"), get = "economic.data") %>%
  group_by(symbol)

## # A tibble: 6 x 3
## # Groups: symbol [2]
## symbol date price
## <chr> <date> <dbl>
## 1 TB1YR 2018-08-01 2.36
## 2 TB1YR 2018-07-01 2.31
## 3 TB1YR 2018-06-01 2.25
## 4 TB3MS 2018-08-01 2.03
## 5 TB3MS 2018-07-01 1.96
## 6 TB3MS 2018-06-01 1.9
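Since these FRED rates are annualized percentages, they need to be de-annualized before matching them against monthly returns. A base-R sketch of two common conventions, using an assumed reading of 2.36% p.a.:

```r
# De-annualize an annualized percentage rate to a monthly rate
annual_pct  <- 2.36                                  # e.g. a TB1YR reading, in % p.a.
monthly_geo <- (1 + annual_pct / 100)^(1 / 12) - 1   # geometric (compounding) convention
monthly_lin <- annual_pct / 100 / 12                 # simple division, also often used
round(c(geometric = monthly_geo, linear = monthly_lin), 5)
```

For rates this small the two conventions differ only in the fifth decimal; pick one and use it consistently.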

0.2.1.1.4 OANDA (Exchange Rates and Metal Prices)

Oanda37 provides a large quantity of exchange rates (currently only for the last 180 days). Enter them as currency pairs using "/" notation (e.g. "EUR/USD") and set get = "exchange.rates". Note that most of the data (having a much larger horizon) is also available on FRED.

eur_usd <- tq_get("EUR/USD",
                  get = "exchange.rates",
                  from = Sys.Date() - lubridate::days(10))
plat_price_eur <- tq_get("plat", get = "metal.prices",
                         from = Sys.Date() - lubridate::days(10),
                         base.currency = "EUR")
eur_usd %>% arrange(desc(date)) %>% slice(1:3)

## # A tibble: 3 x 2
## date exchange.rate
## <date> <dbl>
## 1 2018-09-12 1.16
## 2 2018-09-11 1.16
## 3 2018-09-10 1.16

37 https://www.oanda.com

plat_price_eur %>% arrange(desc(date)) %>% slice(1:3)

## # A tibble: 3 x 2
## date price
## <date> <dbl>
## 1 2018-09-12 681.
## 2 2018-09-11 681.
## 3 2018-09-10 680.

0.2.1.1.5 Bloomberg and Datastream

Bloomberg is officially integrated into the tidyquant package, but one needs to have Bloomberg running on the terminal one is using. Datastream is not integrated, but has a nice R interface in the package rdatastream38. However, you need to have the Thomson Dataworks Enterprise SOAP API (not free)39 licensed; then the package allows for convenient retrieval of data. If this is not the case, you have to manually retrieve your data and save it as an ".xlsx" Excel file that we can import using readxl::read_xlsx() from the readxl package.

0.2.1.1.6 Fama-French Data (Kenneth French’s Data Library)

To download Fama-French data in batch, there is a package FFdownload that I updated and that now can be installed via devtools::install_bitbucket("sstoeckl/FFdownload"). Currently you can either download all data or skip the (large) daily files using the argument exclude_daily=TRUE. The result is a list of data.frames that has to be cleaned somewhat, but nonetheless is quite usable.

38 https://github.com/fcocquemas/rdatastream
39 http://dataworks.thomson.com/Dataworks/Enterprise/1.0/

FFdownload(output_file = "FFdata.RData", # output file for the final dataset
           tempdir = NULL, # where the temporary downloads should go
           exclude_daily = TRUE, # exclude daily data
           download = FALSE) # if FALSE, the data is already in the temp directory
load(file = "FFdata.RData")
factors <- FFdownload$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>%
  tk_tbl(rename_index = "date") %>% # convert to tibble
  mutate(date = as.Date(date, frac = 1)) %>% # make proper month-end dates
  gather(key = FFvar, value = price, -date) # gather into tidy (long) format
factors %>% group_by(FFvar) %>% slice(1:2)

## # A tibble: 8 x 3
## # Groups: FFvar [4]
## date FFvar price
## <date> <chr> <dbl>
## 1 1926-07-31 HML -2.87
## 2 1926-08-31 HML 4.19
## 3 1926-07-31 Mkt.RF 2.96
## 4 1926-08-31 Mkt.RF 2.64
## 5 1926-07-31 RF 0.22
## 6 1926-08-31 RF 0.25
## 7 1926-07-31 SMB -2.3
## 8 1926-08-31 SMB -1.4

0.2.1.2 Manipulate Data

A variety of transformations can be applied to (financial) timeseries data. We will present some examples, merging our stock file with the index, the risk-free rate from FRED and the Fama-French factors.
Doing data transformations in tidy datasets is either called a transmute() (change the variable/dataset and only return the calculated column) or a mutate() (add the transformed variable). In the tidyquant package these functions are called tq_transmute() and tq_mutate(), because they simultaneously allow changes of periodicity (e.g. daily to monthly), so the returned dataset can have fewer rows than before. The core of these functions is the provision of a mutate_fun that can come from the xts/zoo, quantmod (Quantitative Financial Modelling & Trading Framework for R40) and TTR (Technical Trading Rules41) packages.
In the examples below, we show how to change the periodicity of the data (where we keep the adjusted close price and the volume information) and how to calculate monthly arithmetic returns for the ten stocks and the index. We then merge the price and return information for each stock, and at each point in time add the return of the S&P500 index and the 3 Fama-French factors.
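As a sanity check for what periodReturn computes: the arithmetic return is simply P_t/P_{t-1} − 1 on consecutive period-end prices. A base-R sketch with made-up prices, contrasted with log returns:

```r
# Hypothetical month-end adjusted prices
prices <- c(100, 105, 99.75)
# Arithmetic (simple) returns: P_t / P_{t-1} - 1
arith <- diff(prices) / head(prices, -1)
# Log returns, for comparison: log(P_t / P_{t-1})
logret <- diff(log(prices))
round(arith, 4)   # +5% followed by -5%
round(logret, 4)
```

Note that a +5% followed by a −5% arithmetic return does not bring the price back to its start, whereas log returns add up exactly over time.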

stocks.prices.monthly <- stocks.prices %>%
  tq_transmute(select = c(adjusted, volume), # which columns to keep
               mutate_fun = to.monthly, # function: convert to monthly periodicity
               indexAt = "lastof") %>% # index at the last day of the month
  ungroup() %>% mutate(date = as.yearmon(date))
stocks.returns <- stocks.prices %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn, # create monthly returns
               period = "monthly",
               type = "arithmetic") %>%
  ungroup() %>% mutate(date = as.yearmon(date))
index.returns <- index.prices %>%
  tq_transmute(select = adjusted, mutate_fun = periodReturn,
               period = "monthly", type = "arithmetic") %>%
  mutate(date = as.yearmon(date))
factors.returns <- factors %>% mutate(price = price/100) %>% # already monthly
  mutate(date = as.yearmon(date))
stocks.prices.monthly %>% ungroup() %>% slice(1:5) # show first 5 entries

40 https://www.quantmod.com/
41 https://www.rdocumentation.org/packages/TTR/

## # A tibble: 5 x 4
## symbol date adjusted volume
## <chr> <S3: yearmon> <dbl> <dbl>
## 1 AAPL Jan 2000 3.28 175420000
## 2 AAPL Feb 2000 3.63 92240400
## 3 AAPL Mar 2000 4.30 101158400
## 4 AAPL Apr 2000 3.93 62395200
## 5 AAPL May 2000 2.66 108376800

stocks.returns %>% ungroup() %>% slice(1:5) # show first 5 entries

## # A tibble: 5 x 3
## symbol date monthly.returns
## <chr> <S3: yearmon> <dbl>
## 1 AAPL Jan 2000 -0.0731
## 2 AAPL Feb 2000 0.105
## 3 AAPL Mar 2000 0.185
## 4 AAPL Apr 2000 -0.0865
## 5 AAPL May 2000 -0.323

index.returns %>% ungroup() %>% slice(1:5) # show first 5 entries

## # A tibble: 5 x 2
## date monthly.returns
## <S3: yearmon> <dbl>

## 1 Jan 2000 -0.0418
## 2 Feb 2000 -0.0201
## 3 Mar 2000 0.0967
## 4 Apr 2000 -0.0308
## 5 May 2000 -0.0219

factors.returns %>% ungroup() %>% slice(1:5) # show first 5 entries

## # A tibble: 5 x 3
## date FFvar price
## <S3: yearmon> <chr> <dbl>
## 1 Jul 1926 Mkt.RF 0.0296
## 2 Aug 1926 Mkt.RF 0.0264
## 3 Sep 1926 Mkt.RF 0.0036
## 4 Oct 1926 Mkt.RF -0.0324
## 5 Nov 1926 Mkt.RF 0.0253

Now, we merge all the information together:

## # A tibble: 5 x 10
## symbol date return adjusted volume sp500 Mkt.RF SMB HML
## <chr> <S3:> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Jan ~ -0.0731 3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL Feb ~ 0.105 3.63 9.22e7 -0.0201 0.0245 0.221 -0.106
## 3 AAPL Mar ~ 0.185 4.30 1.01e8 0.0967 0.052 -0.173 0.0794
## 4 AAPL Apr ~ -0.0865 3.93 6.24e7 -0.0308 -0.064 -0.0771 0.0856
## 5 AAPL May ~ -0.323 2.66 1.08e8 -0.0219 -0.0442 -0.0501 0.0243
## # ... with 1 more variable: RF <dbl>

Now we can calculate and add additional information, such as the MACD (Moving Average Convergence/Divergence42) and its driving signal. Be aware that you have to group_by() symbol, or the signal would just be calculated for one large stacked timeseries:

42 https://en.wikipedia.org/wiki/MACD

stocks.final %>% group_by(symbol) %>%
  tq_mutate(select = adjusted,
            mutate_fun = MACD,
            col_rename = c("MACD", "Signal")) %>%
  select(symbol, date, adjusted, MACD, Signal) %>%
  tail() # show last part of the dataset

## # A tibble: 6 x 5
## # Groups: symbol [1]
## symbol date adjusted MACD Signal
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 GILD 2018. 73.4 -5.40 -4.38
## 2 GILD 2018. 80.8 -3.86 -4.27
## 3 GILD 2018. 78.7 -2.85 -3.99
## 4 GILD 2018. 72.8 -2.68 -3.73
## 5 GILD 2018. 72.6 -2.52 -3.49
## 6 GILD 2018. 70.0 -2.66 -3.32

save(stocks.final,file="stocks.RData")
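Conceptually, the MACD line is the difference between a fast and a slow exponential moving average (EMA) of the price, and the signal line is an EMA of the MACD itself. A simplified base-R sketch on simulated prices (TTR::MACD differs in its initialization details and reports percentages by default):

```r
# Exponential moving average with smoothing 2/(n+1), seeded at the first value
ema <- function(x, n) {
  a <- 2 / (n + 1)
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in 2:length(x)) out[i] <- a * x[i] + (1 - a) * out[i - 1]
  out
}
set.seed(7)
price  <- 50 + cumsum(rnorm(100, mean = 0.1))  # simulated upward-drifting price
macd   <- ema(price, 12) - ema(price, 26)      # fast EMA minus slow EMA
signal <- ema(macd, 9)                         # EMA of the MACD line
head(round(cbind(macd, signal), 3))
```

A positive MACD above its signal line is the classic "bullish momentum" reading; the tq_mutate() call above produces the package's version of these two columns per stock.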

0.2.1.2.1 Rolling functions

One of the most important tools you will need in practice is the ability to perform a rolling analysis. One example would be a rolling regression to get the time-varying α and β of each stock with respect to the index or the Fama-French factors. To do that, we need to create a function that does everything we want in one step:

regr_fun <- function(data, formula) {
  coef(lm(formula, data = timetk::tk_tbl(data, silent = TRUE)))
}

This function takes a dataset and a regression formula as input, performs the regression and returns the coefficients. An important point is that the data will be passed to the regression function as an xts object; timetk::tk_tbl() takes care of converting it to a data frame so that lm() works properly with the columns.
Now we can use tq_mutate() to apply the custom regression function over a rolling window using rollapply from the zoo package. Internally, since we leave select = NULL, the data frame is passed automatically to the data argument of the rollapply function. All we need to specify is mutate_fun = rollapply and any additional arguments necessary for rollapply: the window length via width (e.g. width = 12 for a rolling year of monthly data), the FUN argument which is our custom regression function regr_fun, and the formula it should estimate. It is extremely important to specify by.column = FALSE, which tells rollapply to perform the computation using the data as a whole rather than applying the function to each column independently. The col_rename argument is used to rename the added columns. For our dataset, this looks roughly like:

stocks.final %>% group_by(symbol) %>%
  tq_mutate(mutate_fun = rollapply,
            width = 12,
            FUN = regr_fun,
            formula = return ~ Mkt.RF + SMB + HML,
            by.column = FALSE,
            col_rename = c("alpha", "beta.Mkt.RF", "beta.SMB", "beta.HML"))

The rolling regression coefficients are thereby added to the data frame.
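To see the mechanics of such a rolling regression without any packages, here is a base-R sketch on simulated data with known alpha and beta (window width 12, analogous to a rolling year of monthly data):

```r
set.seed(1)
n   <- 120
mkt <- rnorm(n, mean = 0.005, sd = 0.04)        # simulated market excess returns
ret <- 0.002 + 1.2 * mkt + rnorm(n, sd = 0.01)  # stock returns: alpha = 0.002, beta = 1.2
width <- 12
# One regression per rolling window; keep intercept (alpha) and slope (beta)
coefs <- t(sapply(width:n, function(i) {
  idx <- (i - width + 1):i
  coef(lm(ret[idx] ~ mkt[idx]))
}))
colnames(coefs) <- c("alpha", "beta")
dim(coefs)  # one row per window: (n - width + 1) rows, 2 columns
```

The individual window estimates are noisy (only 12 observations each), but they scatter around the true beta of 1.2 — exactly the behavior to expect from the rolling betas above.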

Also check out the functionality of tibbletime for that task (rollify())!

0.3 Exploring Data

In this chapter we show how to explore and analyze data using the
dataset created in Chapter @ref(#s_2Data):

load("stocks.RData")
glimpse(stocks.final)

## Observations: 2,160
## Variables: 10
## $ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date <S3: yearmon> Jan 2000, Feb 2000, Mar 2000, Apr 2000, May 2...
## $ return <dbl> -0.07314358, 0.10481916, 0.18484202, -0.08651613, -0....
## $ adjusted <dbl> 3.283894, 3.628109, 4.298736, 3.926826, 2.658767, 3.3...
## $ volume <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500 <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...

stocks.final %>% slice(1:2)

## # A tibble: 2 x 10
## symbol date return adjusted volume sp500 Mkt.RF SMB HML
## <chr> <S3:> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Jan ~ -0.0731 3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL Feb ~ 0.105 3.63 9.22e7 -0.0201 0.0245 0.221 -0.106
## # ... with 1 more variable: RF <dbl>

0.3.1 Plotting Data

In this section we show how to create various graphs of financial timeseries, which should help us get a better understanding of their properties before we go on to calculate and test their statistics.

0.3.1.1 Time-series plots

0.3.1.2 Box-plots

0.3.1.3 Histogram and Density Plots

0.3.1.4 Quantile Plots

0.3.2 Analyzing Data

0.3.2.1 Calculating Statistics

0.3.2.2 Testing Data

0.3.2.3 Exposure to Factors

The stocks in our example all have a certain exposure to risk factors (e.g. the Fama-French factors we have added to our dataset). Let us quantify these exposures by regressing each stock's return on the factors Mkt.RF, SMB and HML:

stocks.factor_exposure <- stocks.final %>%
  nest(-symbol) %>%
  mutate(model = map(data, ~ lm(return ~ Mkt.RF + SMB + HML, data = .x)),
         tidied = map(model, tidy)) %>%
  unnest(tidied, .drop = TRUE) %>%
  filter(term != "(Intercept)") %>%
  select(symbol, term, estimate) %>%
  spread(term, estimate) %>%
  select(symbol, Mkt.RF, SMB, HML)
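The nested map()/tidy() pipeline above boils down to one regression per symbol. The same idea in base R with split() and simulated data (two made-up symbols with known betas of 1.5 and 0.8):

```r
set.seed(42)
df <- data.frame(symbol = rep(c("AAA", "BBB"), each = 60),
                 Mkt.RF = rnorm(120, sd = 0.04))
true_beta <- ifelse(df$symbol == "AAA", 1.5, 0.8)
df$return <- true_beta * df$Mkt.RF + rnorm(120, sd = 0.005)
# One regression per symbol; extract the market beta from each fit
betas <- sapply(split(df, df$symbol),
                function(d) unname(coef(lm(return ~ Mkt.RF, data = d))["Mkt.RF"]))
round(betas, 2)
```

With enough observations per symbol, the estimated betas recover the values used in the simulation, which is the factor exposure we read off the tidy output above.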

0.4 Managing Portfolios

In this chapter we show how to construct and optimize portfolios using the dataset created in Chapter @ref(#s_2Data). At first we will learn how to optimize portfolios over the full sample; then (in the next chapters) we will do the same thing in a rolling analysis and also perform some backtesting. The major workhorse of this chapter is the PortfolioAnalytics package developed by Peterson and Carl (2018).

PortfolioAnalytics comes with an excellent introductory vignette vignette("portfolio_vignette") and includes more documents, detailing the use of ROI solvers vignette("ROI_vignette"), how to create custom moment functions vignette("custom_moments_objectives") and how to introduce CVaR budgets vignette("risk_budget_optimization").

0.4.1 Introduction

SHORT INTRODUCTION TO PORTFOLIOMANAGEMENT

We start by first creating a portfolio object, before we…

0.4.1.1 The portfolio.spec() Object

The portfolio object is a so-called S3 object43, which means that it has a certain class (portfolio) describing its properties, behavior and relation to other objects. Usually such an object comes with a variety of methods. To create such an object, we reuse the stock dataset that we have created in Chapter @ref(#s_2Data):

43 http://adv-r.had.co.nz/S3.html

load("stocks.RData")
glimpse(stocks.final)

## Observations: 2,160
## Variables: 10
## $ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date <S3: yearmon> Jan 2000, Feb 2000, Mar 2000, Apr 2000, May 2...
## $ return <dbl> -0.07314358, 0.10481916, 0.18484202, -0.08651613, -0....
## $ adjusted <dbl> 3.283894, 3.628109, 4.298736, 3.926826, 2.658767, 3.3...
## $ volume <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500 <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...

stocks.final %>% slice(1:2)

## # A tibble: 2 x 10
## symbol date return adjusted volume sp500 Mkt.RF SMB HML
## <chr> <S3:> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Jan ~ -0.0731 3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL Feb ~ 0.105 3.63 9.22e7 -0.0201 0.0245 0.221 -0.106
## # ... with 1 more variable: RF <dbl>

For the PortfolioAnalytics package we need our data in xts format (see @ref(#sss_112xts)); we therefore first spread() the returns into one column per stock and then convert to xts using tk_xts() from the timetk package.

returns <- stocks.final %>%
  select(symbol, date, return) %>%
  spread(symbol, return) %>%
  tk_xts(silent = TRUE)

Now it is time to initialize the portfolio.spec() object, passing along the names of our assets. Afterwards we print the object (most S3 objects come with a print method that nicely displays some of their information).

pspec <- portfolio.spec(assets = stocks.selection$symbol,
                        category_labels = stocks.selection$sector)
print(pspec)

## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## portfolio.spec(assets = stocks.selection$symbol, category_labels = stocks.
##
## Number of assets: 10
## Asset Names
## [1] "AAPL" "MSFT" "AMZN" "CSCO" "NVDA" "ORCL" "AMGN" "ADBE" "QCOM" "GILD"
##
## Category Labels
## Information Technology : AAPL MSFT CSCO NVDA ORCL ADBE QCOM
## Consumer Discretionary : AMZN
## Health Care : AMGN GILD

str(pspec)

## List of 6
## $ assets : Named num [1:10] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0
## ..- attr(*, "names")= chr [1:10] "AAPL" "MSFT" "AMZN" "CSCO" ...
## $ category_labels:List of 3
## ..$ Information Technology: int [1:7] 1 2 4 5 6 8 9
## ..$ Consumer Discretionary: int 3
## ..$ Health Care : int [1:2] 7 10
## $ weight_seq : NULL
## $ constraints : list()
## $ objectives : list()
## $ call : language portfolio.spec(assets = stocks.selection$symb
## - attr(*, "class")= chr [1:2] "portfolio.spec" "portfolio"

Checking the structure of the object with str(), we find that it contains several elements: assets, which contains the asset names and initial weights that are equally distributed unless otherwise specified (e.g. portfolio.spec(assets=c(0.6,0.4))); category_labels to categorize assets by sector (or geography etc.); weight_seq (a sequence of weights for later use by random_portfolios); constraints that we will set soon; objectives; and the call that initialized the object. Before we optimize any portfolio, we will show how to set constraints.
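The equal initial weights matter because they are the optimizer's starting point; a base-R reminder of what 1/N weights imply for a portfolio return (toy numbers, not our actual data):

```r
# portfolio.spec() assigns 1/N to each asset unless told otherwise
assets <- c("AAPL", "MSFT", "AMZN", "CSCO")
w <- rep(1 / length(assets), length(assets))
names(w) <- assets
# The portfolio return is the weighted sum of the asset returns
r <- c(AAPL = 0.02, MSFT = -0.01, AMZN = 0.03, CSCO = 0.00)  # made-up monthly returns
sum(w * r)
```

Everything the optimizer does later amounts to choosing a better w than this naive 1/N vector, subject to the constraints and objectives we add next.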

0.4.1.2 Constraints

Constraints define restrictions and boundary conditions on the weights of a portfolio. They are defined via add.constraint(), specifying a certain type and arguments for each type, as well as whether the constraint should be enabled or not (enabled=TRUE is the default).

0.4.1.2.1 Sum of Weights Constraint

Here we define how much of the available budget can/must be invested by specifying the maximum/minimum sum of portfolio weights. Usually we want to invest our entire budget and therefore set type="full_investment", which sets the sum of weights to 1. Alternatively, we can set type="weight_sum" with minimum/maximum weight_sum equal to 1.

pspec <- add.constraint(portfolio=pspec,


type="full_investment")
# print(pspec)
# pspec <- add.constraint(portfolio=pspec,type="weight_sum", min_sum=1, max_sum=1)

Another common constraint is to have the portfolio dollar-neutral, type="dollar_neutral" (or the equivalent formulations specified below).

# pspec <- add.constraint(portfolio=pspec,


# type="dollar_neutral")
# print(pspec)
# pspec <- add.constraint(portfolio=pspec, type="active")
# pspec <- add.constraint(portfolio=pspec, type="weight_sum", min_sum=0, max_sum=0)

0.4.1.2.2 Box Constraint

Box constraints specify upper and lower bounds on the asset


weights. If we pass min and max as scalars then the same max
and min weights are set per asset. If we pass vectors (that should
be of the same length as the number of assets) we can specify
position limits on individual stocks

pspec <- add.constraint(portfolio=pspec,


type="box",
min=0,

max=0.4)
# print(pspec)
# add.constraint(portfolio=pspec,
# type="box",
# min=c(0.05, 0, rep(0.05,8)),
# max=c(0.4, 0.3, rep(0.4,8)))

Another special type of box constraint is the long-only constraint, where we only allow positive weights per asset. It is set automatically if no min and max are given, or explicitly with type="long_only".

# pspec <- add.constraint(portfolio=pspec, type="box")


# pspec <- add.constraint(portfolio=pspec, type="long_only")

0.4.1.2.3 Group Constraints

Group constraints allow the user to specify constraints per group, such as industries, sectors or geography.[^44] These groups can be arbitrarily defined; below we set group constraints for the sectors as specified above. The input arguments are the following: groups, a list of vectors specifying the groups of the assets; group_labels, a character vector to label the groups (e.g. size, asset class, style, etc.); group_min and group_max, specifying minimum and maximum weight per group; and group_pos, specifying the number of non-zero weights per group (optional).

pspec <- add.constraint(portfolio=pspec,


type="group",

[^44]: Note that only the ROI, DEoptim and random portfolio solvers support group constraints. See also \@ref(sss_4solvers).

groups=list(pspec$category_labels$`Information Technology`,
pspec$category_labels$`Consumer Discretionary`,
pspec$category_labels$`Health Care`),
group_min=c(0.1, 0.15,0.1),
group_max=c(0.85, 0.55,0.4),
group_labels=pspec$category_labels)
# print(pspec)

0.4.1.2.4 Position Limit Constraint

The position limit constraint allows the user to specify limits on the number of assets with non-zero, long, or short positions. Its arguments are max_pos, which defines the maximum number of assets with non-zero weights, and max_pos_long/max_pos_short, which specify the maximum number of assets with long (i.e. buy) and short (i.e. sell) positions.[^45]

pspec <- add.constraint(portfolio=pspec, type="position_limit", max_pos=3)


# pspec <- add.constraint(portfolio=pspec, type="position_limit", max_pos_long=3, max_pos_short=3)
# print(pspec)

0.4.1.2.5 Diversification Constraint

The diversification constraint enables the user to set a minimum diversification limit by penalizing the optimizer if the deviation from the target is larger than 5%.[^46] Diversification is defined as $\sum_{i=1}^{N} w_i^2$ for $N$ assets. Its only argument is the diversification target div_target.

[^45]: Note that not all solvers support the different options. All of them are supported by the DEoptim and random portfolio solvers, while no ROI solver supports this type of constraint. The ROI solvers do not support the long/short position limit constraints, and (only) quadprog allows for the max_pos argument.

pspec <- add.constraint(portfolio=pspec, type="diversification", div_target=0.7)


# print(pspec)

0.4.1.2.6 Turnover Constraint

The turnover constraint allows the user to specify a maximum turnover from a set of initial weights that can either be given explicitly or default to the weights initially specified for the portfolio object. It is also implemented as an optimization penalty if the turnover deviates by more than 5% from the turnover_target.[^47]

pspec <- add.constraint(portfolio=pspec, type="turnover", turnover_target=0.2)


# print(pspec)

0.4.1.2.7 Target Return Constraint

The target return constraint allows the user to target an average


return specified by return_target.

pspec <- add.constraint(portfolio=pspec, type="return", return_target=0.007)


# print(pspec)

[^46]: Note that the diversification constraint is only supported by the global numeric solvers (not the ROI solvers).

[^47]: Note that the turnover constraint is not currently supported using the ROI solver for quadratic utility and minimum variance problems.

0.4.1.2.8 Factor Exposure Constraint

The factor exposure constraint allows the user to set upper and
lower bounds on exposures to risk factors. We will use the factor
exposures that we have calculated in \@ref(sss_3FactorExposure).
The major input is a vector or matrix B and upper/lower bounds
for the portfolio factor exposure. If B is a vector (with length equal
to the number of assets), lower and upper bounds must be scalars.
If B is a matrix, the number of rows must be equal to the number of
assets and the number of columns represent the number of factors.
In this case, the length of lower and upper bounds must be equal
to the number of factors. B should have column names specifying
the factors and row names specifying the assets.

B <- stocks.factor_exposure %>% as.data.frame() %>% column_to_rownames("symbol") %>% as.matrix()


pspec <- add.constraint(portfolio=pspec, type="factor_exposure",
B=B,
lower=c(0.8,0,-1),
upper=c(1.2,0.8,0))
# print(pspec)

0.4.1.2.9 Transaction Cost Constraint

The transaction cost constraint enables the user to specify (proportional) transaction costs.[^48] Here we will assume the proportional transaction cost ptc to be equal to 1%.

pspec <- add.constraint(portfolio=pspec, type="transaction_cost", ptc=0.01)


# print(pspec)

[^48]: For usage of the ROI (quadprog) solvers, transaction costs are currently only supported for global minimum variance and quadratic utility problems.

0.4.1.2.10 Leverage Exposure Constraint

The leverage exposure constraint specifies a maximum level of


leverage. Below we set leverage to 1.3 to create a 130/30 portfo-
lio.

pspec <- add.constraint(portfolio=pspec, type="leverage_exposure", leverage=1.3)


# print(pspec)

0.4.1.2.11 Checking and enabling/disabling constraints

Every constraint that is added to the portfolio object gets a number according to the order in which it was set. If one wants to update or enable/disable a specific constraint, this can be done via the indexnum argument.

summary(pspec)
# To get an overview of the specs, their indexnum and whether they are enabled:
consts <- plyr::ldply(pspec$constraints, function(x){c(x$type,x$enabled)})
consts
pspec$constraints[[which(consts$V1=="box")]]
pspec <- add.constraint(pspec, type="box",
min=0, max=0.5,
indexnum=which(consts$V1=="box"))
pspec$constraints[[which(consts$V1=="box")]]
# to disable constraints
pspec$constraints[[which(consts$V1=="position_limit")]]
pspec <- add.constraint(pspec, type="position_limit", enabled=FALSE, # only s
indexnum=which(consts$V1=="position_limit"))
pspec$constraints[[which(consts$V1=="position_limit")]]

0.4.1.3 Objectives

For an optimal portfolio, we first have to specify what optimal means in terms of the relevant (business) objective. Such objectives (target functions) can be added to the portfolio object with add.objective. With this function, the user specifies the type of objective to add to the portfolio object. Currently available are 'return', 'risk', 'risk_budget', 'quadratic_utility', 'weight_concentration', 'turnover' and 'minmax'. Each type of objective has additional arguments that need to be specified. Several objectives can be added, and enabled or disabled by specifying the indexnum argument.

0.4.1.3.1 Portfolio Risk Objective

Here the user can specify a risk function that should be minimized. We start by adding a risk objective to minimize portfolio variance (minimum variance portfolio). Another example could be the expected tail loss with a confidence level of 0.95. Whatever function is used (even user-defined ones are possible), the name must correspond to a function in R, and any necessary additional arguments have to be passed as a named list to arguments. Possible specifications are:

pspec <- add.objective(portfolio=pspec,


type='risk',
name='var')
pspec <- add.objective(portfolio=pspec,
type='risk',
name='ETL',
arguments=list(p=0.95),
enabled=FALSE)
# print(pspec)

0.4.1.3.2 Portfolio Return Objective

The return objective allows the user to specify a return function to


maximize. Here we add a return objective to maximize the port-
folio mean return.

pspec <- add.objective(portfolio=pspec,


type='return',
name='mean')
# print(pspec)

0.4.1.3.3 Portfolio Risk Budget Objective

The risk budget objective allows the user to minimize component contribution (i.e. equal risk contribution) or to specify upper and lower bounds on the percentage risk contribution. Here we specify that no asset can contribute more than 30% to total portfolio risk. See the risk budget optimization vignette for more detailed examples of portfolio optimizations with risk budgets.

pspec <- add.objective(portfolio=pspec,


type="risk_budget",
name="var",
max_prisk=0.3)
pspec <- add.objective(portfolio=pspec,
type="risk_budget",
name="ETL",
arguments=list(p=0.95),
max_prisk=0.3,
enabled=FALSE)

# for an equal risk contribution portfolio, set min_concentration=TRUE


pspec <- add.objective(portfolio=pspec,
type="risk_budget",
name="ETL",
arguments=list(p=0.95),
min_concentration=TRUE,
enabled=FALSE)
print(pspec)

## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## portfolio.spec(assets = stocks.selection$symbol, category_labels = stocks.
##
## Number of assets: 10
## Asset Names
## [1] "AAPL" "MSFT" "AMZN" "CSCO" "NVDA" "ORCL" "AMGN" "ADBE" "QCOM" "GILD"
##
## Category Labels
## Information Technology : AAPL MSFT CSCO NVDA ORCL ADBE QCOM
## Consumer Discretionary : AMZN
## Health Care : AMGN GILD
##
##
## Constraints
## Enabled constraint types
## - full_investment
## - box
## - group
## - position_limit
## - diversification
## - turnover
## - return
## - factor_exposure
## - transaction_cost
## - leverage_exposure
##
## Objectives:
## Enabled objective names
## - var
## - mean
## - var
## Disabled objective names
## - ETL

0.4.1.4 Solvers

Solvers are the workhorse of our portfolio optimization frame-


work, and there are a variety of them available to us through
the portfolioAnalytics-package. I will briefly introduce the
available solvers. Note that these solvers can be specified
through optimize_method in the optimize.portfolio and
optimize.portfolio.rebalancing method.
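As a sketch of how a solver is selected, the following (not run here) assumes a `returns` xts object of asset returns and the `pspec` specification built above; the solver choice and `search_size` value are illustrative only:

```r
# Sketch: run an optimization with the random-portfolio solver.
# `returns` and `pspec` are assumed to exist from earlier steps.
library(PortfolioAnalytics)
opt <- optimize.portfolio(R = returns,
                          portfolio = pspec,
                          optimize_method = "random",  # or "DEoptim", "pso", "GenSA", "ROI"
                          search_size = 2000,          # number of random portfolios to try
                          trace = TRUE)                # keep intermediate results
extractWeights(opt)                                    # optimal weight vector
```

Swapping `optimize_method` is usually all that is needed to compare solvers on the same specification.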

0.4.1.4.1 DEOptim

This solver comes from the R package DEoptim and implements a differential evolution algorithm (a global stochastic optimization algorithm) developed by Ardia et al. (2016). The help on ?DEoptim gives many more references. There is also a nice vignette("DEoptimPortfolioOptimization") on large-scale portfolio optimization using the PortfolioAnalytics package.

0.4.1.4.2 Random Portfolios

There are three methods to generate random portfolios contained


in PortfolioAnalytics:

1. The most flexible but also slowest method is ‘sample’.


It can take leverage, box, group, and position limit con-
straints into account.
2. The ‘simplex’ method is useful to generate random portfolios with the full investment and min box constraints (values for min_sum/max_sum are ignored). Other constraints (box max, group and position limit constraints) are handled by elimination, which might leave only very few feasible portfolios. Sometimes this will also lead to suboptimal solutions.
3. Using grid search, the ‘grid’ method only satisfies the min
and max box constraints.
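The three methods above can be tried out directly with random_portfolios(); the following is a minimal sketch (the permutations value is arbitrary, and `pspec` is the specification built earlier):

```r
# Sketch: generate 1000 random weight vectors that satisfy the
# (supported) constraints in pspec; rp_method is one of
# "sample", "simplex" or "grid" as described above.
library(PortfolioAnalytics)
rp <- random_portfolios(portfolio = pspec,
                        permutations = 1000,
                        rp_method = "sample")
head(rp)  # each row is one candidate weight vector
```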

0.4.1.4.3 pso

The psoptim function comes from the R package pso (Bendtsen, 2012) and uses particle swarm optimization.

0.4.1.4.4 GenSA

The GenSA function comes from the R package GenSA (Gubian et al., 2018) and is based on generalized simulated annealing (a generic probabilistic heuristic optimization algorithm).

0.4.1.4.5 ROI

ROI (the R Optimization Infrastructure) is a framework to handle optimization problems in R. It serves as an interface to the Rglpk and quadprog packages, which solve linear and quadratic programming problems. The available methods in the context of the PortfolioAnalytics package are given below (see section \@ref(sss_4Objectives) for the available objectives).

1. Maximize portfolio return subject to leverage, box, group, position limit, target mean return, and/or factor exposure constraints on weights.
2. Globally minimize portfolio variance subject to leverage,
box, group, turnover, and/or factor exposure constraints.
3. Minimize portfolio variance subject to leverage, box,
group, and/or factor exposure constraints given a desired
portfolio return.
4. Maximize quadratic utility subject to leverage, box,
group, target mean return, turnover, and/or factor ex-
posure constraints and risk aversion parameter. (The risk
aversion parameter is passed into optimize.portfolio
as an added argument to the portfolio object).
5. Minimize ETL subject to leverage, box, group, position
limit, target mean return, and/or factor exposure con-
straints and target portfolio return.
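As an illustration of case 2 above, a global minimum variance optimization with the ROI solver could look as follows. This is a sketch, not the book's worked example: it assumes a `returns` xts object and that the ROI plugin packages are installed.

```r
# Sketch: global minimum variance portfolio via the ROI (quadprog) solver.
# `returns` is assumed to be an xts of asset returns.
library(PortfolioAnalytics)
library(ROI)
require(ROI.plugin.quadprog)  # quadratic programming backend

mvspec <- portfolio.spec(assets = colnames(returns))
mvspec <- add.constraint(mvspec, type = "full_investment")
mvspec <- add.constraint(mvspec, type = "box", min = 0, max = 1)
mvspec <- add.objective(mvspec, type = "risk", name = "var")

opt.mv <- optimize.portfolio(R = returns, portfolio = mvspec,
                             optimize_method = "ROI")
opt.mv
```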

0.4.2 Mean-variance Portfolios

0.4.2.1 Introduction and Theoretics

0.4.2.1.1 The minimum risk mean-variance portfolio

0.4.2.1.2 Feasible Set and Efficient Frontier

0.4.2.1.3 Minimum variance portfolio

0.4.2.1.4 Capital market line and tangency portfolio

0.4.2.1.5 Box and Group Constrained mean-variance portfolios



0.4.2.1.6 Maximum return mean-variance portfolio

0.4.2.1.7 Covariance risk budget constraints

0.4.3 Mean-CVaR Portfolios

0.5 Managing Portfolios in the Real World

0.5.1 Rolling Portfolios

0.5.2 Backtesting

0.6 Further applications in Finance

0.6.1 Portfolio Sorts

0.6.2 Fama-MacBeth-Regressions

0.6.3 Risk Indices

0.7 References

# Appendix {#s_99Appendix}

.0.1 Introduction to R

For everyone who is more interested in these topics I strongly recommend this eBook: R for Data Science (http://r4ds.had.co.nz/).

.0.1.1 Getting started

Once you have started R, there are several ways to find help. First
of all, (almost) every command is equipped with a help page that
can be accessed via ?... (if the package is loaded). If the command
is part of a package that is not loaded or you have no clue about the
command itself, you can search the entire help (full-text) by using
??.... Be aware, that certain very-high level commands need to
be put in quotation marks ?'function'. Many of the packages
you find are either equipped with a demo() (get a list of all
available demos using demo(package=.packages(all.available
= TRUE))) and/or a vignette(), a document explaining
the purpose of the package and demonstrating its work
using suitable examples (find all available vignettes with
vignette(package=.packages(all.available = TRUE))). If
you want to learn how to do a certain task (e.g. conducting an event study), such a vignette is a good starting point (vignette("eventstudies")[^50]).
Executing code in RStudio is simple. Either you highlight the exact portion of the code that you want to execute and hit Ctrl+Enter, or you place the cursor somewhere in a line to execute that particular line of code with the same command.[^51]
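The help commands mentioned above can be collected into a short, runnable snippet (in a non-interactive session the help pages are printed rather than shown in the help pane):

```r
?mean                        # help page of a loaded function
??"portfolio"                # full-text search across all installed packages
?"function"                  # high-level commands need quotation marks
demo(package = .packages(all.available = TRUE))     # list all available demos
vignette(package = .packages(all.available = TRUE)) # list all available vignettes
```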

.0.1.2 Working directory

Before we start to learn how to program, we have to set a work-


ing directory. First, create a folder “researchmethods” (prefer-
ably never use directory names containing special characters or
empty spaces) somewhere on citrix/your laptop, this will be
your working directory where R looks for code, files to load
and saves everything that is not designated by a full path (e.g.
“D:/R/LAB/SS2018/…”). Note: In contrast to windows paths you
[^50]: If this command shows an error message you need to install the package first, see further down for how to do that.

[^51]: Under certain circumstances - either using pipes or within loops - RStudio will execute the entire loop/pipe structure. In this case you have to highlight the particular line that you want to execute.

have to use either “/” instead of “\” or use two “\\”. Now set the
working directory using setwd() and check with getwd()

setwd("D:/R/researchmethods")
getwd()

.0.1.3 Basic calculations

3+5; 3-5; 3*5; 3/5


# More complex including brackets
(5+3-1)/(5*10)
# is different to
5+3-1/5*10
# power of a variable
4*4*4
4^300
# root of a variable
sqrt(16)
16^(1/2)
16^0.5
# exponential and logarithms
exp(3)
log(exp(3))
exp(1)
# Log to the basis of 2
log2(8)
2^log2(8)
# raise the number of digits shown
options(digits=6)
exp(1)
# Rounding
20/3
round(20/3,2)

floor(20/3)
ceiling(20/3)

.0.1.4 Mapping variables

Defining variables (objects) in R is done via the arrow operator <-


that works in both directions ->. Sometimes you will see someone
use the equal sign = but for several (more complex) reasons, this
is not advisable.

n <- 10
n
n <- 11
n
12 -> n
n
n <- n^2
n

In the last case, we overwrite a variable recursively. You might


want to do that for several reasons, but I advise you to rarely
do that. The reason is that - depending on how often you have
executed this part of the code already - n will have a different value.
In addition, if you are checking the output of some calculation, it
is not nice if one of the input variables always has a different value.
In a next step, we will check variables. This is a very important
part of programming.

# check if m==10
m <- 11
m==10 # is equal to
m==11

m!=11 # is not equal to


m>10 # is larger than
m<10 # is smaller than
m<=11 # is smaller or equal than
m>=12 # is larger or equal than

If one wants to find out which variables are already set use ls().
Delete (Remove) variables using rm() (you sometimes might want
to do that to save memory - in this case always follow the rm()
command with gc()).

ls() # list variables


rm(m) # remove m
ls() # list variables again (m is missing)

Of course, often we do not only want to store numbers but also


characters. In this case enclose the value by quotation marks: name
<- "test". If you want to check whether a variable has a certain
format use available commands starting with is.. If you want to
change the format of a variable use as.

name <- "test"


is.numeric(n)
is.numeric(name)
is.character(n)
is.character(name)

If you do want to find out the format of a variable you can use
class(). Slightly different information will be given by mode()
and typeof()

class(n)
class(name)
mode(n)
mode(name)
typeof(n)
typeof(name)
# Lets change formats:
n1 <- n
is.character(n1)
n1 <- as.character(n)
is.character(n1)
as.numeric(name) # New thing: NA

Before we learn about NA, we have to define logical variables that


are very important when programming (e.g., as options in a func-
tion). Logical (boolean) variables will either assume TRUE or FALSE.

# last but not least we need boolean (logical) variables


n2 <- TRUE
is.numeric(n2)
class(n2)
is.logical(n2)
as.logical(2) # all values except 0 will be converted to TRUE
as.logical(0)

Now we can check whether a condition holds true. In this case, we


check if m is equal to 10. The output (as you have seen before) is
of type logical.

is.logical(n==10)
n3 <- n==10 # we can assign the logical output to a new variable
is.logical(n3)

Assignment: Create numeric variable x, set x equal to 5/3. What


happens if you divide by 0? By Inf? Set y<-NA. What could this
mean? Check if the variable is “na”. Is Inf numeric? Is NA numeric?

.0.1.5 Sequences, vectors and matrices

In this chapter, we are going to learn about higher-dimensional


objects (storing more information than just one number).

.0.1.5.1 Sequences

We define sequences of elements (numbers/characters/logicals) via


the concatenation operator c() and assign them to a variable. If
one of the elements of a sequence is of type character, the whole
sequence will be converted to character, else it will be of type
numeric (for other possibilities check the help ?vector). At the same time it will be of type vector.

x <- c(1,3,5,6,7)
class(x)
is.vector(x)
is.numeric(x)

To create ordered sequences make use of the command


seq(from,to,by). Please note that often programmers are lazy
and just write seq(1,10,2) instead of seq(from=1,to=10,by=2).
However, it makes the code much harder to understand, can produce unintended results, and if a function is changed (which happens, as R is always under construction) may yield something very different from what was intended. Therefore I strongly encourage you to always specify the arguments of a function by name. To do this, I advise you to make heavy use of the Tab key. Tab helps you to complete commands, produces a list of commands starting with the same letters (if you do not completely remember the spelling, for example), helps you to find out about the arguments, and even gives

information about the intended/possible values of the arguments.


A nice way and shortcut for creating ordered/regular sequences
with distance (by=) one is given by the : operator: 1:10 is equal
to seq(from=1,to=10,by=1).

x1 <- seq(from=1,to=5,by=1)
x2 <- 1:5

One can operate with sequences in the same way as with numbers.
Be aware of the order of the commands and use brackets where
necessary!

1:10-1
1:(10-1)
1:10^2-2*3

Assignment: 1. Create a series from -1 to 5 with distances 0.5?


Can you find another way to do it using the : operator and stan-
dard mathematical operations? 2. Create the same series, but this
time using the “length”-option 3. Create 20 ones in a row (hint:
find a function to do just that)
Of course, all logical operations are possible for vectors, too. In
this case, the output is a vector of logicals having the same size as
the input vector. You can check if a condition is true for any() or
all() parts of the vector.
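For example, with a small numeric vector:

```r
x <- c(1, 3, 5, 6, 7)
x > 4       # element-wise comparison returns a logical vector
any(x > 4)  # TRUE if at least one element satisfies the condition
all(x > 4)  # TRUE only if every element satisfies it
```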

.0.1.5.2 Random Sequences

One of the most important tasks of any programming language


that is used for data analysis and research is the ability to gener-
ate random numbers. In R all the random number commands
start with an r..., e.g. random normal numbers rnorm(). To find
out more about the command use the help ?rnorm. All of these

commands are a part of the stats package, where you find avail-
able commands using the package help: library(help=stats).
Notice that whenever you generate random numbers, they are dif-
ferent. If you prefer to work with the same set of random numbers
(e.g. for testing purposes) you can fix the starting value of the
random number generator by setting the seed to a chosen num-
ber set.seed(123). Notice that you have to execute set.seed()
every time before (re)using the random number generator.

rand1 <- rnorm(n = 100) # 100 standard normally distributed random numbers
set.seed(134) # fix the starting value of the random number generator (the sequence is then reproducible)
rand1a <- rnorm(n = 100)

Assignment: 1. Create a random sequence of 20 N(0,2)-


distributed variables and assign it to the variable rand2. 2. Create
a random sequence of 200 Uniform(-1,1) distributed variables and
save to rand3. 3. What other distributions can you find in the stats
package? 4. Use the functions mean and sd. Manipulate the ran-
dom variables to have a different mean and standard deviation.
Do you remember the normalization process (z-score)?
As in the last assignment you can use all the functions you learned
about in statistics to calculate the mean(), the standard deviation
sd(), skewness() and kurtosis() (the latter two after loading
and installing the moments package). To install/load a package we
use install.packages() (only once) and then load the package
with require().

#install.packages("moments") # only once, no need to reinstall every time


require(moments)
mean(rand1a)
sd(rand1a)
skewness(rand1a)
kurtosis(rand1a)
summary(rand1a)

.0.1.6 Vectors and matrices

We have created (random) sequences above and can determine


their properties, such as their length(). We also know how to
manipulate sequences through mathematical operations, such as
+-*/^. If you want to calculate a vector product, R provides the
%*% operator. In many cases (such as %*%) vectors behave like matrices, with R automatically deciding whether they should act as row or column vectors. However, to make this more explicit, transform your vector into a matrix using as.matrix. Now it has a dimension and the property matrix. You can transpose the matrix using t(), calculate its inverse using solve() and manipulate it in any other way imaginable. To create matrices use matrix() and be careful about the
available options!

x <- c(2,4,5,8,10,12)
length(x)
dim(x)
x^2/2-1
x %*% x # R automatically multiplies row and column vector
is.vector(x)
y <- as.matrix(x)
is.matrix(y); is.matrix(x)
dim(y); dim(x)
t(y) %*% y
y %*% t(y)
mat <- matrix(data = x,nrow = 2,ncol = 3, byrow = TRUE)
dim(mat); ncol(mat); nrow(mat)
mat2 <- matrix(c(1,2,3,4),2,2) # set a new (quadratic) matrix
mat2i <- solve(mat2)
mat2 %*% mat2i
mat2i %*% mat2

Assignment: 1. Create this matrix matrix(c(1,2,2,4),2,2)


and try to calculate its inverse. What is the problem? Remem-
ber the determinant? Calculate using det(). What do you learn?

2. Create a 4x3 matrix of ones and/or zeros. Try to matrix-


multiply with any of the vectors/matrices used before. 3. Try to
add/subtract/multiply matrices, vectors and scalars.
A variety of special matrices is available, such as diagonal ma-
trices using diag(). You can glue matrices together columnwise
(cbind()) or rowwise (rbind()).

diag(3)
diag(c(1,2,3,4))
mat4 <- matrix(0,3,3)
mat5 <- matrix(1,3,3)
cbind(mat4,mat5)
rbind(mat4,mat5)

.0.1.6.1 The indexing system

We can access the row/column elements of any object with at least


one dimension using [].

# We can access the single values of a vector/matrix
x[2] # one-dim
mat[,2] # two-dim column
mat[2,] # two-dim row
i <- c(1,3)
mat[i]
mat[1,2:3] # two-dim select second and third column, first row
mat[-1,] # two-dim suppress first row
mat[,-2] # two-dim suppress second column

Now we can use logical vectors/matrices to subset vec-


tors/matrices. This is very useful for data mining.

mat>=5 # which elements are large or equal to 5?


mat[mat>=5] # What are these elements?
which(mat>=5, arr.ind = TRUE) # another way with more explicit information

We can do something even more useful and name the rows and columns of a matrix using colnames() and rownames().

colnames(mat) <- c("a","b","c")


rownames(mat) <- c("A","B")
mat["A",c("b","c")]

.0.1.7 Functions in R

.0.1.7.1 Useful Functions

Of course, there are thousands of functions available in R, espe-


cially through the use of packages. In the following you find a demo
of the most useful ones.

x <- c(1,2,4,-1,2,8) # example vector 1


x1 <- c(1,2,4,-1,2,8,NA,Inf) # example vector 2 (more complex)
sqrt(x) # square root of x
x^3 # x to the power of ...
sum(x) # sum of the elements of x
prod(x) # product of the elements of x
max(x) # maximum of the elements of x
min(x) # minimum of the elements of x
which.max(x) # returns the index of the greatest element of x
which.min(x) # returns the index of the smallest element of x
# statistical functions - use rand1 and rand2 created before
range(rand1) # returns the minimum and maximum of the elements
mean(rand1) # mean of the elements

median(rand1) # median of the elements
var(rand1) # variance of the elements
sd(rand1) # standard deviation of the elements
cor(cbind(rand1, rand2)) # correlation matrix
cov(rand1, rand2) # covariance between rand1 and rand2
cor(rand1, rand2) # linear correlation between rand1 and rand2
# more complex functions
# more complex functions
round(x, 2) # rounds the elements of x to 2 decimals
rev(x) # reverses the elements of x
sort(x) # sorts the elements of x in increasing order
rank(x) # ranks of the elements of x
log(x) # computes natural logarithms of x
cumsum(x) # a vector which ith element is the sum from x[1] to x[i]
cumprod(x) # id. for the product
cummin(x) # id. for the minimum
cummax(x) # id. for the maximum
unique(x) # duplicate elements are suppressed

.0.1.7.2 More complex objects in R

Next to numbers, sequences/vectors and matrices R offers a va-


riety of different and more complex objects that can stow more
complex information than just numbers and characters (e.g. func-
tions, output text. etc). The most important ones are data.frames
(extended matrices) and lists. Check the examples below to see
how to create these objects and how to access specific elements.

df <- data.frame(col1=c(2,3,4), col2=sin(c(2,3,4)), col3=c("a","b", "c"))


li <- list(x=c(2,3,4), y=sin(c(2,3,4)), z=c("a","b", "c","d","e"), fun=mean)
# to grab elements from a list or dataframe use $ or [[]]
df$col3; li$x # get variables
df[,"col3"]; li[["x"]] # get specific elements that can also be numbered
df[,3]; li[[1]]

Assignment: 1. Get the second entry of element y of list li

.0.1.7.3 Create simple functions in R

To create our own functions in R we need to give them a name,


determine necessary input variables and whether these variables
should be pre-specified or not. I use a couple of examples to show
how to do this below.

?"function" # "function" is such a high-level object that it is interpreted

# 1. Let's create a function that squares an entry x and name it square


square <- function(x){x^2}
square(5)
square(c(1,2,3))

# 2. Let us define a function that returns a list of several different results


stats <- function(v){
v.m <- mean(v) # create a variable that is only valid in the function
v.sd <- sd(v)
v.var <- var(v)
v.output <- list(Mean=v.m, StandardDeviation=v.sd, Variance=v.var)
return(v.output)
}
v <- rnorm(1000,mean=1,sd=5)
stats(v)
stats(v)$Mean

# 3. A function can have standard arguments.


### This time we also create a random vector within the function and use its
stats2 <- function(n,m=0,s=1){
v <- rnorm(n,mean=m,sd=s)
v.m <- mean(v) # create a variable that is only valid in the function
v.sd <- sd(v)
v.var <- var(v)

v.output <- list(Mean=v.m, StandardDeviation=v.sd, Variance=v.var)


return(v.output)
}
stats2(1000000)
stats2(1000,m=1)
stats2(1000,m=1,s=10)
stats2(m=1) # what happens if an obligatory argument is left out?

Assignment: 1. Create a function that creates two random sam-


ples with length n and m from the normal and the uniform distri-
bution resp., given the mean and sd for the first and min and max
for the second distribution. The function shall then calculate the
covariance-matrix and the correlation-matrix which it gives back
in a named list.

.0.1.8 Plotting

Plotting in R can be done very easily. Check the examples below


to get a reference and idea about the plotting capabilities in R.
A very good source for color names (that work in R) is http://en.wikipedia.org/wiki/Web_colors.

?plot
?colors # very good source for colors:
y1 <- rnorm(50,0,1)
plot(y1)
# set title, and x- and y-labels
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample")
# now make a line between elements, and color everything blue
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l')
# if you want to save plots or open them in separate windows you can use x11() or pdf()
?Devices
# x11 (opens seperate window)
x11(8,6)

plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l')


# pdf
pdf("plot1.pdf",6,6)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l',lty=2)
legend("topleft",col=c("blue"), lty=2, legend=c("normal Sample"))
dev.off()
# more extensive example
X11(6,6)
par(mfrow=c(2,1),cex=0.9,mar=c(3,3,1,3)+0.1)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l',lty=2)
legend("topleft",col=c("blue"), lty=2, legend=c("normal Sample"))
barplot(y1,col="blue") # making a barplot
# plotting a histogram
hist(y1) # there is a nicer version available once we get to time series analysis
# create a second sample
y2 <- rnorm(50)
# scatterplot
plot(y1,y2)
# boxplot
boxplot(y1,y2)

0.1.9 Control Structures

Last but not least for this lecture, we learn about control structures. These structures (for-loops, if/else checks, etc.) are very useful if you want to translate a tedious manual task (e.g. in Excel) into something R should do for you step by step (e.g. column by column). Again, see below for a variety of easy examples and the commands used.

x <- sample(-15:15,10) # sample() randomly draws 10 numbers from the integers between -15 and 15
# 1. We square every element of vector x in a loop
y <- NULL # 1.a) set an empty variable (NULL means it is truly nothing and has length zero)
is.null(y)

# 1.b) Use an easy for-loop:


for (i in 1:length(x)){
  y[i] <- x[i]^2
}
# 2. Now we use an if-condition to only replace negative values
y <- NULL
for (i in 1:length(x)){
  y[i] <- x[i]
  if(x[i]<0) {y[i] <- x[i]^2}
}
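As a side note, the same result can be obtained without an explicit loop, because R is vectorized: `ifelse()` checks the condition element by element.

```r
# Vectorized alternative to the if-condition inside the loop above
x <- sample(-15:15, 10)
y <- ifelse(x < 0, x^2, x)  # square only the negative elements
y
```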
# ASSIGNMENT: let's take the square root 100 times in a row, starting from the number 500
y <- rep(NA,101)
y[1] <- 500
for (i in 1:100){
  print(i)
  y[i+1] <- sqrt(y[i])
}
plot(y,type="l")
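Note that repeatedly taking the square root converges towards 1: after k steps the value equals 500^(1/2^k), which for k = 100 is practically indistinguishable from 1. A quick check:

```r
# After 100 iterated square roots the value is 500^(1/2^100), which is ~1
500^(1/2^100)
```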
Index

constraint
    diversification, lxii
    factor exposure, lxiv
    leverage exposure, lxv
    position limit, lxii
    target return, lxiii
    transaction cost, lxiv
    turnover, lxiii
constraints, lix–lxv
    active, lx
    box, lx
    dollar-neutral, lx
    full investment, lx
    group, lix, lxi
    long-only, lxi
    sum of weights, lix

date and time, x
    as.Date(), x
    business days, xvii
    holidays, xvii
    POSIXct, xii
    Sys.setlocale(), xi
    timeDate, xiii
    yearmon, xi, xxii
    yearqtr, xi, xxii

factor exposure, lv, lxiv

ggplot2, xxxi

objective, lxvi–lxix
    return, lxvii
    risk, lxvi
    risk budget, lxvii

PerformanceAnalytics, xxvii

quantmod, xxv
    TTR, xxvi

risk factors, lv

solver, lxix–lxxi
    GenSA, lxx
    pso, lxx
    random portfolios, lxix
        grid, lxx
        sample, lxx
        simplex, lxx
    Rglpk, lxix
    ROI, lxx
        quadprog, lxx
        Rglpk, lxx

tidyverse, xxviii–xxxiii
    ggplot2, xxxi

timeDate, xiii
    business days, xvii
    FinCenter, xiv
    holidays, xvii
    origin, xiv

timetk, lvii
TTR, xxvi

xts, xxi, lvii
    import/export, xxv
    join (inner/outer/left/right/full), xxiii
    merge, xxiii
    missing values, xxiv
    replace, xxiii
    subset, xxiii
    vignettes, xxi

zoo
    vignettes, xxi
