Tidy Portfoliomanagement
in R
DEDICATION
Preface
The structure is not yet fixed, but the book will start with an introduction to the most important tools for portfolio analysis: time series and the tidyverse. Afterwards, we develop ways of managing and exploring financial data. Then we optimize mean-variance and mean-CVaR portfolios. This is followed by a chapter on backtesting, before I show further applications in finance, such as predictions, portfolio sorting and Fama-MacBeth regressions.
Prerequisites
pacman::p_load(tidyverse,tidyquant,PortfolioAnalytics,quantmod,PerformanceAnalytics,
               tibbletime,timetk,ggthemes,timeDate,Quandl,alphavantager,readxl,
               DEoptim,pso,GenSA,Rglpk,ROI,ROI.plugin.glpk,ROI.plugin.quadprog)
Acknowledgments
I thank my family…
I especially thank the developers of:
0.1 Introduction
5
https://cran.r-project.org/web/packages/xts/vignettes/xts.pdf
There are several date and time classes in R that can all be used as the time index of an xts object. We start with the most basic one, as.Date():
d1 <- "2018-01-18"
str(d1) # str() checks the structure of the R-object
## chr "2018-01-18"
d2 <- as.Date(d1)
str(d2)
d3 <- "4/30-2018"
as.Date(d3, "%m/%d-%Y") # as.Date(d3) will not work
## [1] "2018-04-30"
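The following is a short base-R sketch of the same idea, using made-up dates. The format string tells as.Date() how to read a non-ISO layout, and format() writes a Date back out in any layout. A Date is stored as the number of days since 1970-01-01, so arithmetic works in days. The variable names are only illustrative.

```r
# Two differently formatted strings that describe the same day
d_us <- as.Date("04/30/2018", format = "%m/%d/%Y")
d_eu <- as.Date("30.04.2018", format = "%d.%m.%Y")
identical(d_us, d_eu)         # both parse to the same Date

# Convert back to text in any layout
format(d_us, "%Y-%m-%d")      # ISO 8601
format(d_us, "%d/%m/%Y")

# A Date counts days since 1970-01-01, so arithmetic is in days
as.numeric(d_us)
d_us - as.Date("2018-01-18")  # a difftime in days
d_us + 7                      # one week later
```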
Sys.setlocale("LC_TIME","German_Austria")
## [1] "German_Austria.1252"
Sys.setlocale("LC_TIME","English")
as.timeDate(DatesTimes)
## GMT
## [1] [1989-09-28 23:12:55] [2001-01-15 10:34:02] [2004-08-30 08:30:00]
## [4] [1990-02-09 11:18:23]
As you can see, the timeDate object comes with time zone information (GMT, which here is the default financial center). timeDate allows you to specify both the time zone in which the data were recorded (zone) and the financial center to which they should be converted (FinCenter):
## Zurich
## [1] [1989-09-28 15:12:55] [2001-01-15 02:34:02] [2004-08-30 01:30:00]
## [4] [1990-02-09 03:18:23]
## NewYork
## [1] [1989-09-28 10:12:55] [2001-01-14 20:34:02] [2004-08-29 19:30:00]
## [4] [1990-02-08 21:18:23]
## Tokyo
## [1] [1989-09-29 12:12:55] [2001-01-16 00:34:02] [2004-08-30 21:30:00]
## [4] [1990-02-10 01:18:23]
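Conceptually, a timeDate object stores one instant and displays it for a chosen financial center. Base R's POSIXct gives a rough analogue, although it is not the timeDate API. You can record an instant in GMT and pass an Olson time zone name to format() to show it as local clock time elsewhere. The timestamp and zone names below are chosen only for illustration.

```r
# One instant in time, recorded in GMT
t_gmt <- as.POSIXct("2015-01-01 12:00:00", tz = "GMT")

# Display the same instant as local clock time in other financial centers
format(t_gmt, "%Y-%m-%d %H:%M", tz = "Europe/Zurich", usetz = TRUE)     # 13:00, UTC+1 in winter
format(t_gmt, "%Y-%m-%d %H:%M", tz = "America/New_York", usetz = TRUE)  # 07:00, UTC-5 in winter
format(t_gmt, "%Y-%m-%d %H:%M", tz = "Asia/Tokyo", usetz = TRUE)        # 21:00, UTC+9

# The same wall-clock time read in two different zones gives two different instants
t_zh <- as.POSIXct("2015-01-01 12:00:00", tz = "Europe/Zurich")
difftime(t_gmt, t_zh, units = "hours")
```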
## GMT
## [1] [2017-01-01] [2017-02-01] [2017-03-01] [2017-04-01] [2017-05-01]
## [6] [2017-06-01] [2017-07-01] [2017-08-01] [2017-09-01] [2017-10-01]
## [11] [2017-11-01] [2017-12-01]
The timeDate package also offers several useful functions to determine the first and last days of months, quarters, etc. (I let them speak for themselves):
## GMT
## [1] [2016-12-01] [2017-01-01] [2017-02-01] [2017-03-01] [2017-04-01]
## [6] [2017-05-01] [2017-06-01] [2017-07-01] [2017-08-01] [2017-09-01]
## [11] [2017-10-01] [2017-11-01]
timeFirstDayInQuarter(dates1)
## GMT
## [1] [2017-01-01] [2017-01-01] [2017-01-01] [2017-04-01] [2017-04-01]
## [6] [2017-04-01] [2017-07-01] [2017-07-01] [2017-07-01] [2017-10-01]
## [11] [2017-10-01] [2017-10-01]
timeLastDayInMonth(dates1)
## GMT
## [1] [2017-01-31] [2017-02-28] [2017-03-31] [2017-04-30] [2017-05-31]
## [6] [2017-06-30] [2017-07-31] [2017-08-31] [2017-09-30] [2017-10-31]
## [11] [2017-11-30] [2017-12-31]
timeLastDayInQuarter(dates1)
## GMT
## [1] [2017-03-31] [2017-03-31] [2017-03-31] [2017-06-30] [2017-06-30]
## [6] [2017-06-30] [2017-09-30] [2017-09-30] [2017-09-30] [2017-12-31]
## [11] [2017-12-31] [2017-12-31]
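To see what these helpers compute, the same calendar logic can be sketched in base R. The function month_start() is a hypothetical helper written for this example and is not part of timeDate. The last day of a month or quarter is simply the day before the next one starts.

```r
# Base-R sketch (not the timeDate API) of first/last days of months and quarters

# hypothetical helper: first day of month m in year y; m may exceed 12
month_start <- function(y, m) {
  y <- y + (m - 1) %/% 12   # carry month overflow into the year
  m <- (m - 1) %% 12 + 1    # wrap the month back into 1..12
  as.Date(sprintf("%d-%02d-01", as.integer(y), as.integer(m)))
}

dates   <- seq(as.Date("2017-01-15"), by = "month", length.out = 12)
yr      <- as.integer(format(dates, "%Y"))
mo      <- as.integer(format(dates, "%m"))
q_start <- 3 * ((mo - 1) %/% 3) + 1   # first month of the quarter: 1, 4, 7 or 10

first_in_month   <- month_start(yr, mo)
last_in_month    <- month_start(yr, mo + 1) - 1       # day before next month starts
first_in_quarter <- month_start(yr, q_start)
last_in_quarter  <- month_start(yr, q_start + 3) - 1  # day before next quarter starts

data.frame(dates, first_in_month, last_in_month, first_in_quarter, last_in_quarter)
```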
## GMT
## [1] [2018-01-19]
timeNthNdayInMonth(dates1,nday = 5, nth = 3)
## GMT
## [1] [2017-01-20] [2017-02-17] [2017-03-17] [2017-04-21] [2017-05-19]
## [6] [2017-06-16] [2017-07-21] [2017-08-18] [2017-09-15] [2017-10-20]
## [11] [2017-11-17] [2017-12-15]
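With nday = 5 and nth = 3, timeNthNdayInMonth() returns the third Friday of each month. The base-R sketch below applies the same rule and reproduces the dates printed above. It uses a hypothetical helper, nth_weekday_in_month(), and the POSIXlt weekday convention (0 = Sunday, ..., 5 = Friday, 6 = Saturday).

```r
# hypothetical helper: date of the nth occurrence of a weekday in a month
nth_weekday_in_month <- function(year, month, wday, nth) {
  first  <- as.Date(sprintf("%d-%02d-01", as.integer(year), as.integer(month)))
  offset <- (wday - as.POSIXlt(first)$wday) %% 7   # days until the first such weekday
  first + offset + 7 * (nth - 1)
}

# Third Friday of every month in 2017
third_fridays <- nth_weekday_in_month(2017, 1:12, wday = 5, nth = 3)
third_fridays
```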
timeCalendar(m = 1:4, d = c(28, 15, 30, 9), y = c(1989, 2001, 2004, 1990), FinCenter = "Europe/Zurich")
## Europe/Zurich
## [1] [1989-01-28 01:00:00] [2001-02-15 01:00:00] [2004-03-30 02:00:00]
## [4] [1990-04-09 02:00:00]
## Europe/Zurich
## [1] [2018-03-01 10:15:39] [2018-04-01 16:23:41]
holidayNYSE()
## NewYork
## [1] [2018-01-01] [2018-01-15] [2018-02-19] [2018-03-30] [2018-05-28]
## [6] [2018-07-04] [2018-09-03] [2018-11-22] [2018-12-25]
## GMT
## [1] [2018-03-30] [2018-04-01] [2018-11-01]
## GMT
## [1] [2018-03-18] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22]
## [6] [2018-03-23] [2018-03-24] [2018-03-25] [2018-03-26] [2018-03-27]
## [11] [2018-03-28] [2018-03-29] [2018-03-30] [2018-03-31] [2018-04-01]
## [16] [2018-04-02] [2018-04-03] [2018-04-04] [2018-04-05] [2018-04-06]
## [21] [2018-04-07] [2018-04-08] [2018-04-09] [2018-04-10] [2018-04-11]
## [26] [2018-04-12] [2018-04-13] [2018-04-14] [2018-04-15]
## GMT
## [1] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22] [2018-03-23]
## [6] [2018-03-26] [2018-03-27] [2018-03-28] [2018-03-29] [2018-03-30]
## [11] [2018-04-02] [2018-04-03] [2018-04-04] [2018-04-05] [2018-04-06]
## [16] [2018-04-09] [2018-04-10] [2018-04-11] [2018-04-12] [2018-04-13]
dayOfWeek(dateSeq2)
## GMT
## [1] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22] [2018-03-23]
## [6] [2018-03-26] [2018-03-27] [2018-03-28] [2018-03-29] [2018-04-03]
## [11] [2018-04-04] [2018-04-05] [2018-04-06] [2018-04-09] [2018-04-10]
## [16] [2018-04-11] [2018-04-12] [2018-04-13]
dayOfWeek(dateSeq3)
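The business-day sequences above can also be sketched in base R. Build all calendar days, drop Saturdays and Sundays, then drop a list of holidays. Which holidays apply depends on the financial center. Here Good Friday (2018-03-30) and Easter Monday (2018-04-02) are assumed, which matches the gaps in the last date output above.

```r
# All calendar days in the window
days <- seq(as.Date("2018-03-18"), as.Date("2018-04-15"), by = "day")

# Drop weekends (POSIXlt wday: 0 = Sunday, 6 = Saturday)
wd <- as.POSIXlt(days)$wday
weekdays_only <- days[!(wd %in% c(0, 6))]

# Drop holidays; the holiday list is an assumption for this example
holidays <- as.Date(c("2018-03-30", "2018-04-02"))
business_days <- weekdays_only[!(weekdays_only %in% holidays)]

length(days); length(weekdays_only); length(business_days)
```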
One of the strongest points of the timeDate package shows when times and dates from different time zones are put together:
## Zurich
## [1] [2015-01-01 17:00:00] [2015-01-01 19:00:00]
c(NY, ZH) # it always takes the Financial Center of the first entry
## NewYork
## [1] [2015-01-01 13:00:00] [2015-01-01 11:00:00]
0.1.1.1.3 Assignments
The xts format is based on the time series format zoo but extends it to be more compatible with other data classes. For example, if one converts dates from timeDate, xts will be so
## [,1]
## 2017-05-01 0.72838032
## 2017-05-02 0.47100977
## 2017-05-03 -0.04537768
## 2017-05-04 1.61845234
## 2017-05-05 0.07191067
## [,1]
## [1,] 0.72838032
## [2,] 0.47100977
## [3,] -0.04537768
## [4,] 1.61845234
## [5,] 0.07191067
Here, the xts object was built from a vector and a series of Dates.
We could also have used timeDate, yearmon or yearqtr and a
data.frame:
## s1 s2
## 2017-01-01 0.7462329 1
## 2017-02-01 -0.1551448 2
## 2017-03-01 -0.9693310 3
## 2017-04-01 0.3428151 4
## 2017-05-01 0.4692079 5
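library(xts)       # xts(), merge(), na.locf(), endpoints(), period.apply()
library(timeDate)  # timeSequence()
set.seed(1)
xts3 <- xts(rnorm(6), timeSequence(from = "2017-01-01", to = "2017-06-01", by = "month"))
xts4 <- xts(rnorm(5), timeSequence(from = "2017-04-01", to = "2017-08-01", by = "month"))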
colnames(xts3) <- "tsA"; colnames(xts4) <- "tsB"
merge(xts3,xts4)
merge(xts3,xts4,join = "left")
merge(xts3,xts4,join = "right")
merge(xts3,xts4,join = "inner")
merge(xts3,xts4,join="outer",fill=0)
na.omit(xts6)
na.locf(xts6)
na.locf(xts6,fromLast = TRUE,na.rm = TRUE)
na.approx(xts6,na.rm = FALSE)
periodicity(xts5)
nmonths(xts5); nquarters(xts5); nyears(xts5)
to.yearly(xts5)
to.quarterly(xts6)
round(xts6^2,2)
xts6[which(is.na(xts6))] <- rnorm(2)
# Aggregation of time series
ep1 <- endpoints(xts6,on="months",k = 2) # positions of every 2nd month-end
ep2 <- endpoints(xts6,on="months",k = 3) # positions of every 3rd month-end
period.sum(xts6, INDEX = ep2)
period.apply(xts6, INDEX = ep1, FUN=mean) # two-month means
period.apply(xts6, INDEX = ep2, FUN=mean) # three-month means
# Lead, lag and diff operations
cbind(xts6,lag(xts6,k=-1),lag(xts6,k=1),diff(xts6))
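If you want to see what these xts helpers do under the hood, here is a base-R sketch on plain vectors and data frames. It is not the xts implementation:

- merge() is the data.frame method.
- locf() is a hypothetical helper.
- cut() plus tapply() stand in for endpoints() plus period.apply().

All values are made up.

```r
# --- Joins: merge() on data frames with an explicit date column ---
a <- data.frame(date = seq(as.Date("2017-01-01"), by = "month", length.out = 6), tsA = 1:6)
b <- data.frame(date = seq(as.Date("2017-04-01"), by = "month", length.out = 5), tsB = 11:15)
inner <- merge(a, b, by = "date")                # only dates present in both
left  <- merge(a, b, by = "date", all.x = TRUE)  # all dates of a
right <- merge(a, b, by = "date", all.y = TRUE)  # all dates of b
full  <- merge(a, b, by = "date", all = TRUE)    # union of dates ("outer")
full_filled <- full
full_filled[is.na(full_filled)] <- 0             # like fill = 0

# --- Missing values ---
# hypothetical helper: last observation carried forward (the idea of na.locf)
locf <- function(x) {
  obs  <- which(!is.na(x))    # positions of observed values
  last <- cumsum(!is.na(x))   # number of observed values up to each position
  out  <- x                   # leading NAs have nothing to carry and stay NA
  out[last > 0] <- x[obs[last[last > 0]]]
  out
}
v <- c(NA, 1, NA, NA, 4, NA)
locf(v)                                          # NA 1 1 1 4 4
rev(locf(rev(v)))                                # from the end: 1 1 4 4 4 NA
i <- seq_along(v)
approx(i[!is.na(v)], v[!is.na(v)], xout = i)$y   # linear interpolation: NA 1 2 3 4 NA

# --- Aggregation per period (the idea of endpoints() + period.apply()) ---
set.seed(1)
dates <- seq(as.Date("2017-01-01"), by = "month", length.out = 12)
x <- rnorm(12)
quarter <- cut(dates, breaks = "quarter")        # one factor level per quarter
tapply(x, quarter, sum)                          # quarterly sums
tapply(x, quarter, mean)                         # quarterly means

# --- Lag, lead and differences ---
lag1  <- c(NA, head(x, -1))   # previous value (xts: lag(x, k = 1))
lead1 <- c(tail(x, -1), NA)   # next value (xts: lag(x, k = -1))
d1    <- c(NA, diff(x))       # x minus its previous value
head(cbind(x, lag1, lead1, d1))
```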
Finally, I will show some applications that go beyond xts, for example the use of lapply() to operate on a list:
Last but not least, we plot xts data, save it to a (csv) file and then open it again:
8
https://www.r-bloggers.com/a-guide-on-r-quantmod-package-how-to-get-started/
require(quantmod)
# the easiest form of getting data is for yahoo finance where you know the
getSymbols(Symbols = "AAPL", from="2010-01-01", to="2018-03-01", periodicity=
head(AAPL)
is.xts(AAPL)
plot(AAPL[, "AAPL.Adjusted"], main = "AAPL")
chartSeries(AAPL, TA=c(addVo(),addBBands(), addADX())) # Plot and add techni
getSymbols(Symbols = c("GOOG","^GSPC"), from="2000-01-01", to="2018-03-01", p
getSymbols('DTB3', src='FRED') # fred does not recognize from and to
require(PerformanceAnalytics)
rets <- cbind(buy_sell_returns,ROC(GSPC[,"GSPC.Adjusted"]))
colnames(rets) <- c("investment","benchmark")
charts.PerformanceSummary(rets,colorset=rich6equal)
chart.Histogram(rets, main = "Risk Measures", methods = c("add.density", "ad
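charts.PerformanceSummary() shows cumulative returns and drawdowns. Below is a minimal base-R sketch of both quantities using made-up returns. Wealth grows with the compounded returns. The drawdown is the percentage distance to the running peak. The maximum drawdown is its minimum, expressed here as a negative number.

```r
r <- c(0.05, -0.10, 0.02, -0.05, 0.08)   # hypothetical periodic returns

wealth   <- cumprod(1 + r)               # growth of 1 unit invested
peak     <- cummax(wealth)               # running maximum of wealth
drawdown <- wealth / peak - 1            # percentage below the running peak
max_dd   <- min(drawdown)                # maximum drawdown (negative number)

round(cbind(r, wealth, peak, drawdown), 4)
max_dd
```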
14
https://de.wikipedia.org/wiki/Maximum_Drawdown
0.1.2.1 Tibbles
require(ISLR)
data(Auto)
Nothing happens when you run this, but now the data is available in your environment. (In RStudio, you would see the name of the data in your Environment tab.) To view the data, we can either print the entire dataset by typing its name, or we can "slice" off a subset by piping the data with the %>% operator into the slice() function. The pipe operator is one of the most useful tools of the tidyverse: it lets you pipe command into command into command without saving and naming each intermediate step. The first step is to transform this data.frame into a tibble (a similar concept, but better26). A tibble has observations in rows and variables in columns. Those variables can have many different formats:
As you can see, all three columns of tbs1 have different formats. One can extract the variables by name or by position. If you want to use the pipe operator, you need to use the special placeholder . (dot):
26
http://r4ds.had.co.nz/tibbles.html
tbs1$returns
tbs1[[2]]
tbs1 %>% .[[2]]
Notice that the data looks just the same as when we loaded it from
the package. Now that we have the data, we can begin to learn
things about it.
dim(Auto)
str(Auto)
names(Auto)
The dim() function tells us that the data has 392 observations and nine variables. The original data had some rows with missing values, but these were removed in the version that comes with the ISLR package. The str() function tells us that most of the variables are numeric or integer, although the name variable is a factor. names() lets us check the variable names.
summary(Auto)
0.1.2.3 Plotting
The basic idea is that you need to initialize a plot with ggplot()
and then add “geoms” (short for geometric objects) to the plot.
27
https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448
28
http://www.cookbook-r.com/Graphs/
For small datasets, we might want to see all the bivariate relationships between the variables. The GGally package has an extension of the scatterplot matrix that can do just that. We use select() to keep only the two variables mpg and cylinders and pipe them into the ggpairs() function.
Because there are not many cars with 3 or 5 cylinders, we use filter() to keep only the cars with 4, 6 or 8 cylinders:
Auto %>% select(mpg, cylinders) %>% filter(cylinders %in% c(4,6,8)) %>% GGal
glimpse(sp500)
## Observations: 504
## Variables: 5
## $ symbol <chr> "AAPL", "MSFT", "AMZN", "BRK.B", "FB", "JPM", "JNJ...
## $ company <chr> "Apple Inc.", "Microsoft Corporation", "Amazon.com...
## $ weight <dbl> 0.044387857, 0.035053855, 0.032730459, 0.016868330...
## $ sector <chr> "Information Technology", "Information Technology"...
## $ shares_held <dbl> 53939268, 84297440, 4418447, 21117048, 26316160, 3...
glimpse(nyse)
## Observations: 3,139
## Variables: 7
## $ symbol <chr> "DDD", "MMM", "WBAI", "WUBA", "EGHT", "AHC", "...
## $ company <chr> "3D Systems Corporation", "3M Company", "500.c...
## $ last.sale.price <dbl> 18.4800, 206.7100, 11.6400, 68.1800, 23.2000, ...
## $ market.cap <chr> "$2.11B", "$121.26B", "$491.85M", "$10.06B", "...
## $ ipo.year <dbl> NA, NA, 2013, 2013, NA, NA, 2014, 2014, NA, NA...
## $ sector <chr> "Technology", "Health Care", "Consumer Service...
## $ industry <chr> "Computer Software: Prepackaged Software", "Me...
glimpse(nasdaq)
30
Note that tq_index() unfortunately makes use of the package XLConnect
that requires Java to be installed on your system.
## Observations: 3,405
## Variables: 7
## $ symbol <chr> "YI", "PIH", "PIHPP", "TURN", "FLWS", "FCCY", ...
## $ company <chr> "111, Inc.", "1347 Property Insurance Holdings...
## $ last.sale.price <dbl> 13.800, 6.350, 25.450, 2.180, 11.550, 20.150, ...
## $ market.cap <chr> NA, "$38M", NA, "$67.85M", "$746.18M", "$168.8...
## $ ipo.year <dbl> 2018, 2014, NA, NA, 1999, NA, NA, 2011, 2014, ...
## $ sector <chr> NA, "Finance", "Finance", "Finance", "Consumer...
## $ industry <chr> NA, "Property-Casualty Insurers", "Property-Ca...
The dataset we will be using consists of the ten largest stocks within the S&P500 that had an IPO before January 2000. Therefore, we merge both datasets using inner_join(), because we only want to keep those S&P500 symbols that are also traded on NYSE or NASDAQ:
The ten largest stocks in the S&P500 with a price history going back to before January 2000.

symbol  company                     weight  sector                  shares_held  last.sale.price  market.cap  ipo.year
AAPL    Apple Inc.                  0.044   Information Technology     53939268           221.07   $1067.75B      1980
MSFT    Microsoft Corporation       0.035   Information Technology     84297440           111.71    $856.62B      1986
AMZN    Amazon.com Inc.             0.033   Consumer Discretionary      4418447          1990.00     $970.6B      1997
CSCO    Cisco Systems Inc.          0.009   Information Technology     51606584            46.89    $214.35B      1990
NVDA    NVIDIA Corporation          0.007   Information Technology      6659463           268.20    $163.07B      1999
ORCL    Oracle Corporation          0.006   Information Technology     32699620            49.34    $196.43B      1986
AMGN    Amgen Inc.                  0.005   Health Care                 7306144           199.50    $129.13B      1983
ADBE    Adobe Systems Incorporated  0.005   Information Technology      5402625           267.79    $131.13B      1986
QCOM    QUALCOMM Incorporated       0.004   Information Technology     15438597            71.75    $105.41B      1991
GILD    Gilead Sciences Inc.        0.004   Health Care                14310276            73.97     $95.89B      1992
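The selection logic is an inner join, then a filter, then a ranking. It can be sketched with base R's merge() on two toy tables. The symbols and numbers are hypothetical and only illustrate the steps; the book itself uses dplyr's inner_join().

```r
# Hypothetical mini versions of the index constituents and the exchange listings
index_members <- data.frame(symbol = c("AAA", "BBB", "CCC", "DDD"),
                            weight = c(0.04, 0.03, 0.02, 0.01),
                            stringsAsFactors = FALSE)
listings <- data.frame(symbol   = c("AAA", "CCC", "DDD", "EEE"),
                       ipo.year = c(1980, 2013, 1990, 1995),
                       stringsAsFactors = FALSE)

# Inner join: keep only symbols that appear in both tables
joined <- merge(index_members, listings, by = "symbol")

# Keep stocks with an IPO before 2000 and rank them by index weight
old <- joined[joined$ipo.year < 2000, ]
top <- head(old[order(old$weight, decreasing = TRUE), ], 10)
top
```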
In the next step, we download stock prices from Yahoo Finance.
## # A tibble: 20 x 8
## # Groups: symbol [10]
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2000-01-03 3.75 4.02 3.63 4.00 133949200 3.54
## 2 AAPL 2000-01-04 3.87 3.95 3.61 3.66 128094400 3.24
## 3 ADBE 2000-01-03 16.8 16.9 16.1 16.4 7384400 16.1
## 4 ADBE 2000-01-04 15.8 16.5 15.0 15.0 7813200 14.8
## 5 AMGN 2000-01-03 70 70 62.9 62.9 22914900 53.5
## 6 AMGN 2000-01-04 62 64.1 57.7 58.1 15052600 49.4
## 7 AMZN 2000-01-03 81.5 89.6 79.0 89.4 16117600 89.4
## 8 AMZN 2000-01-04 85.4 91.5 81.8 81.9 17487400 81.9
## 9 CSCO 2000-01-03 55.0 55.1 51.8 54.0 53076000 43.6
## 10 CSCO 2000-01-04 52.8 53.5 50.9 51 50805600 41.2
## 11 GILD 2000-01-03 1.79 1.80 1.72 1.76 54070400 1.61
## 12 GILD 2000-01-04 1.70 1.72 1.66 1.68 38960000 1.54
## 13 MSFT 2000-01-03 58.7 59.3 56 58.3 53228400 42.5
## 14 MSFT 2000-01-04 56.8 58.6 56.1 56.3 54119000 41.0
## 15 NVDA 2000-01-03 3.94 3.97 3.68 3.90 7522800 3.61
## 16 NVDA 2000-01-04 3.83 3.84 3.60 3.80 7512000 3.51
## 17 ORCL 2000-01-03 31.2 31.3 27.9 29.5 98114800 26.4
## 18 ORCL 2000-01-04 28.9 29.7 26.2 26.9 116824800 24.0
## 19 QCOM 2000-01-03 99.6 100 87 89.7 91334000 65.7
## 20 QCOM 2000-01-04 86.3 87.7 80 81.0 63567400 59.4
## # A tibble: 42 x 3
## # Groups: symbol [6]
## symbol section data
## <chr> <chr> <list>
## 1 AAPL Financials <tibble [150 x 5]>
## 2 AAPL Profitability <tibble [170 x 5]>
## 3 AAPL Growth <tibble [160 x 5]>
## 4 AAPL Cash Flow <tibble [50 x 5]>
## 5 AAPL Financial Health <tibble [240 x 5]>
## 6 AAPL Efficiency Ratios <tibble [80 x 5]>
## 7 AAPL Valuation Ratios <tibble [40 x 5]>
## 8 MSFT Financials <tibble [150 x 5]>
## 9 MSFT Profitability <tibble [170 x 5]>
31
http://www.morningstar.com/
0.2.1.1.1 Quandl
quandl_api_key("enter-your-api-key-here")
quandl_search(query = "Oil", database_code = "NSE", per_page = 3)
quandl.aapl <- c("WIKI/AAPL") %>%
tq_get(get = "quandl",
from = "2000-01-01",
to = "2017-12-31",
column_index = 11, # numeric column number (e.g. 1)
collapse = "daily", # can be “none”, “daily”, “weekly”, “mon
transform = "none") # for summarizing data: “none”, “diff”,
## # A tibble: 3 x 13
## id dataset_code database_code name description refreshed_at
## * <int> <chr> <chr> <chr> <chr> <chr>
## 1 6668 OIL NSE Oil ~ Historical~ 2018-09-13T~
## 2 6669 OILCOUNTUB NSE Oil ~ Historical~ 2018-09-13T~
## 3 6041 ESSAROIL NSE Essa~ Historical~ 2016-02-09T~
## # ... with 7 more variables: newest_available_date <chr>,
## # oldest_available_date <chr>, column_names <list>, frequency <chr>,
## # type <chr>, premium <lgl>, database_id <int>
## # A tibble: 5 x 12
## date open high low close volume ex.dividend split.ratio
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2000-01-03 105. 112. 102. 112. 4.78e6 0 1
## 2 2000-01-04 108. 111. 101. 102. 4.57e6 0 1
## 3 2000-01-05 104. 111. 103 104 6.95e6 0 1
## 4 2000-01-06 106. 107 95 95 6.86e6 0 1
## 5 2000-01-07 96.5 101 95.5 99.5 4.11e6 0 1
## # ... with 4 more variables: adj.open <dbl>, adj.high <dbl>,
## # adj.low <dbl>, adj.close <dbl>
0.2.1.1.2 Alpha Vantage
av_api_key("enter-your-api-key-here")
alpha.aapl <- c("AAPL") %>%
tq_get(get = "alphavantager",
av_fun="TIME_SERIES_DAILY_ADJUSTED") # for daily data
alpha.aapl.id <- c("AAPL") %>%
tq_get(get = "alphavantager",
av_fun="TIME_SERIES_INTRADAY", # for intraday data
interval="5min") # 5 minute intervals
## # A tibble: 5 x 9
## timestamp open high low close adjusted_close volume dividend_amount
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 2018-04-24 166. 166. 161. 163. 162. 3.37e7 0
## 2 2018-04-25 163. 165. 162. 164. 162. 2.84e7 0
## 3 2018-04-26 164. 166. 163. 164. 163. 2.80e7 0
## 4 2018-04-27 164 164. 161. 162. 161. 3.57e7 0
## 5 2018-04-30 162. 167. 162. 165. 164. 4.24e7 0
## # ... with 1 more variable: split_coefficient <dbl>
## # A tibble: 5 x 6
## timestamp open high low close volume
## <dttm> <dbl> <dbl> <dbl> <dbl> <int>
## 1 2018-09-11 14:25:00 224. 224. 224. 224. 261968
## 2 2018-09-11 14:30:00 224. 224. 224. 224. 334069
## 3 2018-09-11 14:35:00 224. 224. 224. 224. 285138
## 4 2018-09-11 14:40:00 224. 224. 224. 224. 229329
## 5 2018-09-11 14:45:00 224. 224. 224. 224. 193316
0.2.1.1.3 FRED (Economic Data)
## # A tibble: 6 x 3
## # Groups: symbol [2]
## symbol date price
## <chr> <date> <dbl>
## 1 TB1YR 2018-08-01 2.36
## 2 TB1YR 2018-07-01 2.31
## 3 TB1YR 2018-06-01 2.25
## 4 TB3MS 2018-08-01 2.03
## 5 TB3MS 2018-07-01 1.96
## 6 TB3MS 2018-06-01 1.9
## # A tibble: 3 x 2
## date exchange.rate
## <date> <dbl>
## 1 2018-09-12 1.16
## 2 2018-09-11 1.16
## 3 2018-09-10 1.16
37
https://www.oanda.com
## # A tibble: 3 x 2
## date price
## <date> <dbl>
## 1 2018-09-12 681.
## 2 2018-09-11 681.
## 3 2018-09-10 680.
## # A tibble: 8 x 3
## # Groups: FFvar [4]
## date FFvar price
## <date> <chr> <dbl>
## 1 1926-07-31 HML -2.87
## 2 1926-08-31 HML 4.19
## 3 1926-07-31 Mkt.RF 2.96
## 4 1926-08-31 Mkt.RF 2.64
## 5 1926-07-31 RF 0.22
## 6 1926-08-31 RF 0.25
## 7 1926-07-31 SMB -2.3
## 8 1926-08-31 SMB -1.4
stock file with the index, the risk free rate from FRED and the
Fama-French-Factors.
Data transformations in tidy datasets are done either with transmute() (change variables and return only the calculated columns) or with mutate() (add the transformed variables to the dataset). In the tidyquant package, the corresponding functions are tq_transmute() and tq_mutate(). tq_transmute() in particular also allows changes of periodicity (e.g. from daily to monthly), so the returned dataset can have fewer rows than the input. At the core of these functions is the mutate_fun argument, which takes functions from the xts/zoo, quantmod (Quantitative Financial Modelling & Trading Framework for R40) and TTR (Technical Trading Rules41) packages.
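As a rough illustration of what a change of periodicity does, the following base-R sketch keeps the last daily price of each month from a simulated price path. This is not the tidyquant or xts code.

```r
set.seed(42)
days   <- seq(as.Date("2018-01-01"), as.Date("2018-03-31"), by = "day")
prices <- 100 * cumprod(1 + rnorm(length(days), sd = 0.01))  # simulated daily prices

month    <- format(days, "%Y-%m")                  # grouping label per day
last_idx <- tapply(seq_along(days), month, max)    # position of the last day in each month

monthly <- data.frame(month = names(last_idx),
                      date  = days[as.integer(last_idx)],
                      price = prices[as.integer(last_idx)])
monthly
```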
In the examples below, we show how to change the periodicity of the data (keeping the adjusted close price and the volume information) and calculate monthly returns for the ten stocks and the index. We then merge the price and return information for each stock and, at each point in time, add the return of the S&P500 index and the three Fama-French factors.
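Simple returns are P_t/P_{t-1} - 1; log returns are log(P_t/P_{t-1}). The choice matters for how returns aggregate over time.

The sketch below uses the AAPL adjusted prices for January to May 2000, as printed in the glimpse() output of the stocks.final dataset. Its simple returns reproduce the February to May values of the return column. The January return would need the December 1999 price, which is not shown.

```r
# Monthly adjusted close of AAPL, Jan-May 2000 (from the glimpse() output)
adjusted <- c(3.283894, 3.628109, 4.298736, 3.926826, 2.658767)

simple_ret <- adjusted[-1] / adjusted[-length(adjusted)] - 1   # P_t / P_{t-1} - 1
log_ret    <- diff(log(adjusted))                              # log(P_t / P_{t-1})

round(simple_ret, 4)
round(log_ret, 4)

# Log returns add up over time, simple returns compound
sum(log_ret) - log(adjusted[5] / adjusted[1])
prod(1 + simple_ret) - adjusted[5] / adjusted[1]
```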
40
https://www.quantmod.com/
41
https://www.rdocumentation.org/packages/TTR/
mutate(date=as.yearmon(date))
factors.returns <- factors %>% mutate(price=price/100) %>% # already is mon
mutate(date=as.yearmon(date))
stocks.prices.monthly %>% ungroup() %>% slice(1:5) # show first 5 entries
## # A tibble: 5 x 4
## symbol date adjusted volume
## <chr> <S3: yearmon> <dbl> <dbl>
## 1 AAPL Jan 2000 3.28 175420000
## 2 AAPL Feb 2000 3.63 92240400
## 3 AAPL Mar 2000 4.30 101158400
## 4 AAPL Apr 2000 3.93 62395200
## 5 AAPL May 2000 2.66 108376800
## # A tibble: 5 x 3
## symbol date monthly.returns
## <chr> <S3: yearmon> <dbl>
## 1 AAPL Jan 2000 -0.0731
## 2 AAPL Feb 2000 0.105
## 3 AAPL Mar 2000 0.185
## 4 AAPL Apr 2000 -0.0865
## 5 AAPL May 2000 -0.323
## # A tibble: 5 x 2
## date monthly.returns
## <S3: yearmon> <dbl>
## # A tibble: 5 x 3
## date FFvar price
## <S3: yearmon> <chr> <dbl>
## 1 Jul 1926 Mkt.RF 0.0296
## 2 Aug 1926 Mkt.RF 0.0264
## 3 Sep 1926 Mkt.RF 0.0036
## 4 Oct 1926 Mkt.RF -0.0324
## 5 Nov 1926 Mkt.RF 0.0253
## # A tibble: 5 x 10
## symbol date return adjusted volume sp500 Mkt.RF SMB HML
## <chr> <S3:> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Jan ~ -0.0731 3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL Feb ~ 0.105 3.63 9.22e7 -0.0201 0.0245 0.221 -0.106
## 3 AAPL Mar ~ 0.185 4.30 1.01e8 0.0967 0.052 -0.173 0.0794
## 4 AAPL Apr ~ -0.0865 3.93 6.24e7 -0.0308 -0.064 -0.0771 0.0856
## 5 AAPL May ~ -0.323 2.66 1.08e8 -0.0219 -0.0442 -0.0501 0.0243
## # ... with 1 more variable: RF <dbl>
42
https://en.wikipedia.org/wiki/MACD
## # A tibble: 6 x 5
## # Groups: symbol [1]
## symbol date adjusted MACD Signal
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 GILD 2018. 73.4 -5.40 -4.38
## 2 GILD 2018. 80.8 -3.86 -4.27
## 3 GILD 2018. 78.7 -2.85 -3.99
## 4 GILD 2018. 72.8 -2.68 -3.73
## 5 GILD 2018. 72.6 -2.52 -3.49
## 6 GILD 2018. 70.0 -2.66 -3.32
save(stocks.final,file="stocks.RData")
In this chapter we show how to explore and analyze data using the dataset created in the previous chapter (Managing Data):
load("stocks.RData")
glimpse(stocks.final)
## Observations: 2,160
## Variables: 10
## $ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date <S3: yearmon> Jan 2000, Feb 2000, Mar 2000, Apr 2000, May 2...
## $ return <dbl> -0.07314358, 0.10481916, 0.18484202, -0.08651613, -0....
## $ adjusted <dbl> 3.283894, 3.628109, 4.298736, 3.926826, 2.658767, 3.3...
## $ volume <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500 <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...
## # A tibble: 2 x 10
## symbol date return adjusted volume sp500 Mkt.RF SMB HML
## <chr> <S3:> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Jan ~ -0.0731 3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL Feb ~ 0.105 3.63 9.22e7 -0.0201 0.0245 0.221 -0.106
## # ... with 1 more variable: RF <dbl>
0.3.1.2 Box-plots
The stocks in our example all have a certain exposure to risk factors (e.g. the Fama-French factors we have added to our dataset). Let us estimate these exposures by regressing each stock's return on the factors Mkt.RF, SMB and HML:
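The regression itself can be sketched with lm(). Since the estimation code is not shown here, the example below uses simulated data. It generates hypothetical factor returns and an excess return with known betas (1.2, -0.3, 0.4) and zero alpha, then checks that lm() recovers them. With the book's data, one could regress each stock's return minus RF on Mkt.RF, SMB and HML in the same way.

```r
set.seed(123)
n_months <- 240                                   # 20 years of monthly observations
sim_factors <- data.frame(Mkt.RF = rnorm(n_months, 0.006, 0.045),
                          SMB    = rnorm(n_months, 0.002, 0.030),
                          HML    = rnorm(n_months, 0.003, 0.030))
true_beta <- c(Mkt.RF = 1.2, SMB = -0.3, HML = 0.4)

# hypothetical excess return = factor exposures + idiosyncratic noise (alpha = 0)
sim_factors$excess <- as.vector(as.matrix(sim_factors[, names(true_beta)]) %*% true_beta) +
  rnorm(n_months, 0, 0.02)

fit <- lm(excess ~ Mkt.RF + SMB + HML, data = sim_factors)
round(coef(fit), 3)   # intercept (alpha) and the three estimated exposures
```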
First, we learn how to optimize portfolios over the full sample; then (in the next chapters) we do the same in a rolling analysis and also perform some backtesting. The main workhorse of this chapter is the PortfolioAnalytics package developed by Peterson and Carl (2018).
0.4.1 Introduction
43
http://adv-r.had.co.nz/S3.html
load("stocks.RData")
glimpse(stocks.final)
## Observations: 2,160
## Variables: 10
## $ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date <S3: yearmon> Jan 2000, Feb 2000, Mar 2000, Apr 2000, May 2...
## $ return <dbl> -0.07314358, 0.10481916, 0.18484202, -0.08651613, -0....
## $ adjusted <dbl> 3.283894, 3.628109, 4.298736, 3.926826, 2.658767, 3.3...
## $ volume <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500 <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...
## # A tibble: 2 x 10
## symbol date return adjusted volume sp500 Mkt.RF SMB HML
## <chr> <S3:> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Jan ~ -0.0731 3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL Feb ~ 0.105 3.63 9.22e7 -0.0201 0.0245 0.221 -0.106
## # ... with 1 more variable: RF <dbl>
## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## portfolio.spec(assets = stocks.selection$symbol, category_labels = stocks.
##
## Number of assets: 10
## Asset Names
## [1] "AAPL" "MSFT" "AMZN" "CSCO" "NVDA" "ORCL" "AMGN" "ADBE" "QCOM" "GILD"
##
## Category Labels
## Information Technology : AAPL MSFT CSCO NVDA ORCL ADBE QCOM
## Consumer Discretionary : AMZN
## Health Care : AMGN GILD
str(pspec)
## List of 6
## $ assets : Named num [1:10] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0
## ..- attr(*, "names")= chr [1:10] "AAPL" "MSFT" "AMZN" "CSCO" ...
## $ category_labels:List of 3
## ..$ Information Technology: int [1:7] 1 2 4 5 6 8 9
## ..$ Consumer Discretionary: int 3
## ..$ Health Care : int [1:2] 7 10
## $ weight_seq : NULL
## $ constraints : list()
## $ objectives : list()
## $ call : language portfolio.spec(assets = stocks.selection$symb
## - attr(*, "class")= chr [1:2] "portfolio.spec" "portfolio"
0.4.1.2 Constraints
max=0.4)
# print(pspec)
# add.constraint(portfolio=pspec,
# type="box",
# min=c(0.05, 0, rep(0.05,8)),
# max=c(0.4, 0.3, rep(0.4,8)))
44
Note that only the ROI, DEoptim and random portfolio solvers support group constraints. See also the section on solvers below.
groups=list(pspec$category_labels$`Information Technology`,
            pspec$category_labels$`Consumer Discretionary`,
pspec$category_labels$`Health Care`),
group_min=c(0.1, 0.15,0.1),
group_max=c(0.85, 0.55,0.4),
group_labels=pspec$category_labels)
# print(pspec)
46
Note that the diversification constraint is only supported by the global numeric solvers (not by the ROI solvers).
47
Note that the turnover constraint is currently not supported by the ROI solver for quadratic utility and minimum variance problems.
The factor exposure constraint allows the user to set upper and lower bounds on exposures to risk factors. We will use the factor exposures that we calculated in the chapter Exploring Data. The main input is a vector or matrix B together with upper and lower bounds for the portfolio factor exposure. If B is a vector (with length equal to the number of assets), the lower and upper bounds must be scalars. If B is a matrix, the number of rows must equal the number of assets and the number of columns equals the number of factors. In this case, the length of the lower and upper bounds must equal the number of factors. B should have column names specifying the factors and row names specifying the assets.
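To make the role of B concrete, the portfolio's exposure to each factor is t(B) %*% w, where w is the weight vector. The constraint requires each entry of that exposure to lie between its lower and upper bound. The numbers below are hypothetical; this is a plain base-R check, not the PortfolioAnalytics code.

```r
# Hypothetical exposures of 3 assets to 2 factors (rows = assets, columns = factors)
B <- matrix(c(1.1, 0.9, 1.3,     # Mkt.RF column
              0.2, -0.1, 0.5),   # SMB column
            nrow = 3,
            dimnames = list(c("A1", "A2", "A3"), c("Mkt.RF", "SMB")))
w <- c(A1 = 0.5, A2 = 0.3, A3 = 0.2)   # portfolio weights
lower <- c(Mkt.RF = 0.8, SMB = -0.1)
upper <- c(Mkt.RF = 1.2, SMB = 0.3)

exposure <- drop(t(B) %*% w)           # portfolio exposure per factor
exposure
all(exposure >= lower & exposure <= upper)   # does w satisfy the constraint?
```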
48
For usage of the ROI (quadprog) solvers, transaction costs are currently
only supported for global minimum variance and quadratic utility problems.
summary(pspec)
# To get an overview on the specs, their indexnum and whether they are enab
consts <- plyr::ldply(pspec$constraints, function(x){c(x$type,x$enabled)})
consts
pspec$constraints[[which(consts$V1=="box")]]
pspec <- add.constraint(pspec, type="box",
min=0, max=0.5,
indexnum=which(consts$V1=="box"))
pspec$constraints[[which(consts$V1=="box")]]
# to disable constraints
pspec$constraints[[which(consts$V1=="position_limit")]]
pspec <- add.constraint(pspec, type="position_limit", enable=FALSE, # only s
indexnum=which(consts$V1=="position_limit"))
pspec$constraints[[which(consts$V1=="position_limit")]]
0.4.1.3 Objectives
Here, the user can specify a risk function that should be minimized. We start by adding a risk objective that minimizes portfolio variance (the minimum variance portfolio). Another example is the expected tail loss (ETL) at a confidence level of 0.95. Whatever function is used (even user-defined functions are possible, as long as the name corresponds to a function in R), any additional arguments it needs have to be passed as a named list to arguments. Possible functions are:
## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## portfolio.spec(assets = stocks.selection$symbol, category_labels = stocks.
##
## Number of assets: 10
## Asset Names
## [1] "AAPL" "MSFT" "AMZN" "CSCO" "NVDA" "ORCL" "AMGN" "ADBE" "QCOM" "GILD"
##
## Category Labels
## Information Technology : AAPL MSFT CSCO NVDA ORCL ADBE QCOM
## Consumer Discretionary : AMZN
## Health Care : AMGN GILD
##
##
## Constraints
## Enabled constraint types
## - full_investment
## - box
## - group
## - position_limit
## - diversification
## - turnover
## - return
## - factor_exposure
## - transaction_cost
## - leverage_exposure
##
## Objectives:
## Enabled objective names
## - var
## - mean
## - var
## Disabled objective names
## - ETL
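To see what the var and ETL objectives measure, here is a base-R sketch on simulated returns. It does not use the PortfolioAnalytics solvers. It computes the closed-form global minimum-variance weights, which only hold with a full-investment constraint and short sales allowed, so the box, group and other constraints above are ignored. ETL is computed historically here, as the average loss in the worst 5% of outcomes.

```r
set.seed(2018)
n_obs <- 500
R_sim <- cbind(A = rnorm(n_obs, 0.010, 0.05),   # simulated monthly returns
               B = rnorm(n_obs, 0.008, 0.03),
               C = rnorm(n_obs, 0.012, 0.07))
Sigma <- cov(R_sim)
ones  <- rep(1, ncol(R_sim))

# Global minimum-variance weights (full investment only, short sales allowed):
# w = Sigma^-1 1 / (1' Sigma^-1 1)
w_raw <- solve(Sigma, ones)
w_gmv <- w_raw / sum(w_raw)
port_var <- drop(t(w_gmv) %*% Sigma %*% w_gmv)   # the "var" objective at its minimum

# Historical expected tail loss (ETL, also called CVaR) at the 95% level
port_ret <- drop(R_sim %*% w_gmv)
VaR95 <- -quantile(port_ret, probs = 0.05, names = FALSE)  # loss threshold
ETL95 <- -mean(port_ret[port_ret <= -VaR95])               # average loss beyond it

round(w_gmv, 3); port_var; VaR95; ETL95
```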
0.4.1.4 Solvers
0.4.1.4.1 DEoptim
0.4.1.4.3 pso
0.4.1.4.4 GenSA
0.4.1.4.5 ROI
0.5.2 Backtesting
0.6.2 Fama-MacBeth-Regressions
0.7 References
Appendix
.0.1 Introduction to R
Once you have started R, there are several ways to find help. First of all, (almost) every command comes with a help page that can be accessed via ?... (if the package is loaded). If the command is part of a package that is not loaded, or if you have no clue about the command itself, you can search the entire help (full text) using ??.... Be aware that some special commands, such as reserved words and operators, need to be put in quotation marks: ?'function'. Many packages come with a demo() (get a list of all available demos using demo(package=.packages(all.available = TRUE))) and/or a vignette(), a document explaining the purpose of the package and demonstrating its use with suitable examples (find all available vignettes with vignette(package=.packages(all.available = TRUE))). Vignettes are also a good starting point if you want to learn how to do a certain task (e.g. conducting an event study: vignette("eventstudies")50).
Executing code in RStudio is simple. Either highlight the exact portion of code you want to execute and hit Ctrl+Enter, or place the cursor anywhere in a line and use the same shortcut to execute that line.51
have to use either “/” instead of “\” or use two backslashes (“\\”). Now set the working directory using setwd() and check it with getwd():
setwd("D:/R/researchmethods")
getwd()
floor(20/3)
ceiling(20/3)
n <- 10
n
n <- 11
n
12 -> n
n
n <- n^2
n
# check if m==10
m <- 11
m==10 # is equal to
m==11
To find out which variables are already set, use ls().
Delete (remove) variables using rm() (you might want to do that
to save memory; in this case, follow the rm() command with gc(),
which triggers a garbage collection and reports the memory in use).
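A short sketch of this workflow with two throw-away objects (their names are arbitrary): ls() lists them, rm() removes them, and gc() returns a small matrix summarising memory usage:

```r
tmp_a <- 1:1e6          # a large example vector
tmp_b <- "hello"        # a small example string
before <- c("tmp_a", "tmp_b") %in% ls()   # both objects exist
rm(tmp_a, tmp_b)        # remove both objects
after <- c("tmp_a", "tmp_b") %in% ls()    # both are gone
mem <- gc()             # garbage collection; returns a memory summary
```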
If you want to find out the type of a variable, use class().
Slightly different information is given by mode() and typeof():
class(n)
class(name)
mode(n)
mode(name)
typeof(n)
typeof(name)
# Let's change formats:
n1 <- n
is.character(n1)
n1 <- as.character(n)
is.character(n1)
as.numeric(name) # New thing: NA (a non-numeric string cannot be converted to a number; R also warns)
is.logical(n==10)
n3 <- n==10 # we can assign the logical output to a new variable
is.logical(n3)
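Because n and name from above are not defined in this excerpt, here is a self-contained sketch with three example objects that shows where class(), mode() and typeof() differ, followed by a few conversions. Note that as.integer() truncates instead of rounding:

```r
num_int <- 5L                     # an integer
num_dbl <- 5                      # a double (R's default for numbers)
df <- data.frame(a = 1:2)         # a data frame is internally a list
info <- rbind(
  class  = c(class(num_int),  class(num_dbl),  class(df)),
  mode   = c(mode(num_int),   mode(num_dbl),   mode(df)),
  typeof = c(typeof(num_int), typeof(num_dbl), typeof(df))
)
colnames(info) <- c("5L", "5", "data.frame")
print(info)

# conversions: strings that are not numbers become NA
conv <- suppressWarnings(as.numeric(c("1.5", "abc")))
as_int <- as.integer(3.9)         # truncates towards zero: 3
as_chr <- as.character(3.5)       # "3.5"
```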
.0.1.5.1 Sequences
x <- c(1,3,5,6,7)
class(x)
is.vector(x)
is.numeric(x)
x1 <- seq(from=1,to=5,by=1)
x2 <- 1:5
One can operate with sequences in the same way as with numbers.
Be aware of operator precedence (^ is evaluated before :, which is
evaluated before *, /, + and -) and use brackets where necessary!
1:10-1
1:(10-1)
1:10^2-2 *3
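The three lines above can be unpacked into what R actually evaluates. This sketch spells out each grouping in a comment and adds two further cases (unary minus and multiplication) for comparison:

```r
seq_a <- 1:10 - 1        # ":" before binary "-": (1:10) - 1, i.e. 0 ... 9
seq_b <- 1:(10 - 1)      # brackets first: 1:9
seq_c <- 1:10^2 - 2 * 3  # "^" before ":": (1:100) - 6, i.e. -5 ... 94
seq_d <- -1:2            # unary minus before ":": (-1):2, i.e. -1 0 1 2
seq_e <- 1:10 * 2        # ":" before "*": (1:10) * 2, i.e. 2 4 ... 20
```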
commands are part of the stats package; you can list all of its
available commands using the package help: library(help=stats).
Notice that each time you generate random numbers, they are
different. If you prefer to work with the same set of random numbers
(e.g. for testing purposes), you can fix the starting value of the
random number generator by setting the seed to a chosen number,
e.g. set.seed(123). Notice that you have to execute set.seed()
again every time you want to reproduce the same random numbers.
rand1 <- rnorm(n = 100) # 100 standard normally distributed random numbers
set.seed(134) # fix the starting value of the random number generator
rand1a <- rnorm(n = 100)
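The effect of resetting the seed can be checked directly: a second draw continues the random stream and gives new numbers, while calling set.seed() with the same value again reproduces the first draw exactly. The seed 123 is an arbitrary choice:

```r
set.seed(123)
draw1 <- rnorm(5)
draw2 <- rnorm(5)   # the generator has moved on: different numbers
set.seed(123)       # reset to the same starting value ...
draw3 <- rnorm(5)   # ... and draw1 is reproduced exactly
identical(draw1, draw3)
identical(draw1, draw2)
```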
x <- c(2,4,5,8,10,12)
length(x)
dim(x)
x^2/2-1
x %*% x # for two vectors, %*% returns their inner product as a 1x1 matrix
is.vector(x)
y <- as.matrix(x)
is.matrix(y); is.matrix(x)
dim(y); dim(x)
t(y) %*% y
y %*% t(y)
mat <- matrix(data = x,nrow = 2,ncol = 3, byrow = TRUE)
dim(mat); ncol(mat); nrow(mat)
mat2 <- matrix(c(1,2,3,4),2,2) # set a new (square) matrix
mat2i <- solve(mat2)
mat2 %*% mat2i
mat2i %*% mat2
diag(3)
diag(c(1,2,3,4))
mat4 <- matrix(0,3,3)
mat5 <- matrix(1,3,3)
cbind(mat4,mat5)
rbind(mat4,mat5)
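Besides inverting a matrix, solve() can solve a linear system A x = b directly when given a second argument. The sketch below uses a small example matrix of its own; results are compared with all.equal() rather than ==, because floating-point arithmetic may differ from the exact values in the last digits:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)  # filled column by column: [2 1; 1 3]
b <- c(1, 2)
A_inv <- solve(A)           # inverse of A
x_sol <- solve(A, b)        # solves A %*% x = b without forming the inverse
all.equal(A %*% A_inv, diag(2))        # A times its inverse is the identity
all.equal(as.vector(A %*% x_sol), b)   # the solution satisfies the system
x_sol                                  # 0.2 0.6
```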
########################################################
### 8) The INDEXING System
# We can access the single values of a vector/matrix
x[2] # one-dim
mat[,2] # two-dim column
mat[2,] # two-dim row
i <- c(1,3)
mat[i] # a single index vector treats the matrix as one vector, stored column by column
mat[1,2:3] # two-dim select second and third column, first row
mat[-1,] # two-dim suppress first row
mat[,-2] # two-dim suppress second column
We can do something even more useful and name the rows and
columns of a matrix using colnames() and rownames().
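A minimal sketch that rebuilds mat from above and labels it; the labels "A", "B", "C" and "r1", "r2" are hypothetical (in a portfolio context they could be asset tickers and dates). Once set, names can be used for indexing instead of positions:

```r
x <- c(2, 4, 5, 8, 10, 12)
mat <- matrix(data = x, nrow = 2, ncol = 3, byrow = TRUE)
colnames(mat) <- c("A", "B", "C")     # hypothetical column labels
rownames(mat) <- c("r1", "r2")        # hypothetical row labels
mat["r2", "B"]          # 10
mat[, c("A", "C")]      # select columns by name
dimnames(mat)           # both sets of names as a list
```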
.0.1.7 Functions in R
.0.1.8 Plotting
?plot
?colors # very good source for colors:
y1 <- rnorm(50,0,1)
plot(y1)
# set title, and x- and y-labels
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample")
# now make a line between elements, and color everything blue
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l')
# if you want to save plots or open them in separate windows you can use x1
?Devices
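# x11() opens a separate window (width and height in inches); dev.new() is the platform-independent alternative
x11(width = 8, height = 6)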
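To save a plot instead of displaying it, open a file device, draw the plot, and close the device with dev.off(). A minimal sketch using the pdf() device (png() and jpeg() from ?Devices work the same way); the file location in tempdir() is only an example:

```r
y1 <- rnorm(50, 0, 1)
plot_file <- file.path(tempdir(), "normal_rv.pdf")   # example location
pdf(plot_file, width = 8, height = 6)                # open a PDF device
plot(y1, main = "normal RV", xlab = "Point No.", ylab = "Sample",
     col = "blue", type = "l")
invisible(dev.off())                                 # close the device and write the file
file.exists(plot_file)
```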
Last but not least for this lecture, we learn about control structures.
These structures (for-loops, if/else checks etc.) are very useful if
you want to translate a tedious manual task (e.g. in Excel) into
something R does for you step by step (e.g. column by column).
Again, see below for a variety of simple examples.
x <- sample(-15:15,10) # sample() randomly draws 10 numbers from -15:15 (without replacement)
# 1. We square every element of vector x in a loop
y <- NULL # 1.a) set an empty variable (NULL means it is truly nothing)
is.null(y)
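A sketch of how the loop could continue, assuming the goal stated in the comment above (squaring every element of x). The second loop, which labels each element with an if/else chain, is a hypothetical extra illustration; the vectorised versions x^2 and ifelse() are shown for comparison:

```r
set.seed(123)
x <- sample(-15:15, 10)
y <- NULL
# 1.b) square every element; y grows by one element per iteration
for (i in seq_along(x)) {
  y[i] <- x[i]^2
}
# if/else inside a loop: label each element (preallocated result vector)
sign_label <- character(length(x))
for (i in seq_along(x)) {
  if (x[i] > 0) {
    sign_label[i] <- "positive"
  } else if (x[i] < 0) {
    sign_label[i] <- "negative"
  } else {
    sign_label[i] <- "zero"
  }
}
# vectorised equivalents give the same results
all.equal(y, x^2)
identical(sign_label, ifelse(x > 0, "positive", ifelse(x < 0, "negative", "zero")))
```

Growing y from NULL is fine for ten elements, but for long loops preallocating the result (as done for sign_label) is considerably faster.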
Ardia, D., Mullen, K., Peterson, B., and Ulrich, J. (2016). DEoptim:
Global Optimization by Differential Evolution. R package
version 2.2-4.
Gubian, S., Xiang, Y., Suomela, B., Hoeng, J., and SA., P. (2018).
GenSA: Generalized Simulated Annealing. R package version
1.1.7.
Würtz, D., Chalabi, Y., Chen, W., and Ellis, A. (2015). Portfolio
Optimization with R/Rmetrics. Rmetrics.