Figure 1: Daily returns for three reference classifiers before and after peak daily volume.
Figure 2: Cumulative returns for three reference classifiers before and after peak daily volume.
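For concreteness, cumulative-return curves like those in Figure 2 follow from a daily-return series by compounding. A minimal sketch (the function and input names are ours, not from the paper):

```python
def cumulative_from_daily(daily_bps):
    """Compound daily returns quoted in basis points (1 bp = 0.01%)
    into a cumulative-return curve."""
    curve, level = [], 1.0
    for r in daily_bps:
        level *= 1.0 + r / 10_000.0  # bps -> fractional return
        curve.append(level - 1.0)
    return curve
```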
both classifiers (“conflict of opinion”) will not be allocated funds. We require that the investments are always long/short balanced. The same total dollar allocation will be made to the long and short sides of the portfolio, each side subdivided equally among the available symbols. The balance requirement is absolute: if there are long predictions but no short predictions, the strategy will not trade at all.

Data

The source data for our implementation is the NYSE Trade and Quote (TAQ) data set from Wharton Research Data Services (WRDS) under academic license (New York Stock Exchange 2018). We use the daily trade files, which summarize reported trades from all major U.S. exchanges at one-second resolution through 2014 and subsequently at one-millisecond resolution.

The trade data is reported per time period, per symbol, per exchange. We process the source data to produce a data set of one-minute open-high-low-close (OHLC) bars plus volume for all symbols on each day. A stock that does not trade during a given minute retains its earlier close price with zero volume. We further maintain a list of known symbol changes for consistency. We exclude Exchange Traded Funds (ETFs) and test symbols from consideration.

For each day in a training, validation, or test period, one-minute OHLC bars are retrieved from disk for a selected universe of “important” stocks. By universe we mean the set of equities eligible for trading on a particular day. The universe is determined on a daily basis as the 500 most traded stocks from the previous twelve months. For example, the selected universe for the first day of a training period beginning January 1, 2002 is the top 500 dollar-volume equities of 2001. Only this daily selected universe of equities is used for training, validating, and testing the system.
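The bar-construction and universe-selection steps described above can be sketched as follows; the trade-record layout and function names here are illustrative assumptions, not the actual TAQ schema:

```python
def to_minute_bars(trades, minutes):
    """Build one-minute OHLC + volume bars for one symbol and day.

    trades: time-ordered list of (minute_index, price, size) tuples
    (a hypothetical simplification of the per-exchange trade records).
    A minute with no trades carries the earlier close forward with
    zero volume, as described in the text.
    """
    by_min = {}
    for m, p, s in trades:
        by_min.setdefault(m, []).append((p, s))
    bars, last_close = [], None
    for m in range(minutes):  # 390 minutes in a full session
        ticks = by_min.get(m)
        if ticks:
            prices = [p for p, _ in ticks]
            bar = {"open": prices[0], "high": max(prices),
                   "low": min(prices), "close": prices[-1],
                   "volume": sum(s for _, s in ticks)}
            last_close = bar["close"]
        else:
            # No trades this minute: retain earlier close, zero volume.
            bar = {"open": last_close, "high": last_close,
                   "low": last_close, "close": last_close, "volume": 0}
        bars.append(bar)
    return bars

def select_universe(dollar_volume, top_n=500):
    """Daily universe: the top-N symbols by total dollar volume
    over the trailing twelve months.

    dollar_volume: dict mapping symbol -> trailing dollar volume.
    """
    ranked = sorted(dollar_volume, key=dollar_volume.get, reverse=True)
    return ranked[:top_n]
```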
Only these symbols are considered by the trading strategy on a given day.

Methodology

Given a time period over which we wish to conduct an experiment, we proceed in an iterative manner as follows. The start date for our initial test period is 25 months after the beginning of the experimental period. For the current iteration, this date is treated as “today” before market open, and no subsequent data may be used. The validation period is the month preceding the test period. The training period is the twelve months preceding the validation period. The twelve months prior to the training period are needed for universe selection, as previously described.

The system is trained and validated at the start of each month only. For a single time iteration, the allowed universe of symbols is retrieved for every day during the training period. These approximately 500 × 252 = 126,000 rows of 390 one-minute close prices become the available training data. The same process is repeated for the validation and test periods.

Training Data: Input observations are transformed into cumulative returns backward from a final observation minute end x. Mean one-minute returns are also obtained for the universe and transformed in the same manner. Finally, the difference between the individual observations and the selected universe is taken to produce universe-relative training data normed to 0.0 at minute end x.

For each input, the forward cumulative return is computed from minute end x + 1 to market close, then made universe-relative as above to produce universe-relative gains from the end of the minute after the observations cease. This permits one minute of time to make a decision and enter orders. Norming two consecutive minutes to zero also produces a clean break that withholds any future information from the classifier.

Hyperparameter Search: For each reference learner method, two similar classifiers are trained to detect stocks that will rise or fall into the close.

The entry minute for the day’s trades, and the threshold in bps (hundredths of a percent) required for a price movement to be considered meaningful, are hyperparameter choices for the system to make. A hyperparameter grid search is performed across a simple 3 × 4 matrix: end x ∈ {−5, −10, −30}, bps ∈ {2, 5, 10, 25}. For each candidate hyperparameter pair, each classifier network is trained on all available training examples constructed as described above. For each hyperparameter pair, the trained network is evaluated on the validation data.

The objective function for the hyperparameter grid search is the precision of the trading strategy, evaluated on a per-symbol basis, where the true positive class indicates the strategy allocated long funds to a symbol that did rise or allocated short funds to a symbol that did fall, and so forth. The trained model with the maximum trading strategy precision for the validation month is selected for use during the upcoming test month.

Evaluation: During the test month, each trained model is queried with the daily observations up to the selected end x and a trading decision is made for every symbol every day. The results of the trading strategy are evaluated for precision, daily return, and cumulative return.

The entire described training, validation, and testing process is repeated across the experimental period with the test period aligned to each calendar month in succession. At the conclusion of the experiment, an analysis of logged models, trades, and results is used to produce and plot final daily and cumulative returns of the method across the entire period.

Experimental Results

In our initial analyses, we observed an abrupt change in market efficiency after October 10, 2008. In addition to that week representing one of the most significant drops in US market value, it was also the date of the highest-ever intra-day trading volume. Accordingly, we selected the September/October 2008 boundary as a dividing point in our evaluation of market efficiency. We separately evaluate the data before October 2008 as “early” and after as “late.”

As mentioned above, we evaluated three types of classification models (neural network, logistic regression, and random classifier) over these two time periods: 2001-2008 and 2009-2017, with each model trained each month on a roll-forward basis. Trade decisions are made on a daily basis using the most recently trained model.

All models were trained from, and evaluated on, the same data using the same hyperparameter search method, on a monthly basis. We compare results from the same classifier across time periods and across classifiers within the same time period. All returns are quoted in basis points (bps), or one-hundredths of one percent.

Returns by Time Period: The daily returns, cumulative returns, and precision of all three classification models are summarized in Table 1. The returns of each model are illustrated in Figures 1 and 2. As mentioned above, the analysis is broken into a “before” and “after” period at September 30, 2008, near the point of greatest universe share trade volume (October 10, 2008). Daily returns are smoothed with a 40-day centered window.

Observe that by all metrics the ML methods perform substantially better in the Early period, and furthermore that the control method (Random) is neutral in all time periods.

                   Daily Ret   Cumul Ret   Precision
Neural    Early        4.6        93.5       0.577
          Late        -0.4        -9.7       0.519
Logistic  Early        4.9       101.6       0.576
          Late        -0.9       -19.5       0.519
Random    Early        0.0         0.0       0.499
          Late         0.0         0.0       0.501

Table 1: Daily Returns (bps), Cumulative Returns (%), and Precision for three reference classifiers for two time periods.

To further assess the changing effectiveness of each model over time, we considered the distribution of daily returns for each model in the Early period and the Late period.
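The training-example construction described under Training Data can be sketched as follows. Names (`make_example`, `end_x` as a negative index from the close) are our own reading of the text, not the authors' code; the universe series stands in for the universe mean price path:

```python
def make_example(prices, universe, end_x, bps):
    """One training example from one symbol-day.

    prices, universe: one-minute close prices through market close,
    for the symbol and for the universe mean; end_x is negative,
    counted back from the close (grid values -5, -10, -30);
    bps is the significance threshold from the grid.
    """
    # Observations run through minute end_x inclusive.
    obs_p = prices[:end_x + 1]
    obs_u = universe[:end_x + 1]

    # Backward cumulative returns normed to 0.0 at minute end_x,
    # then made universe-relative by subtraction.
    features = [(p / obs_p[-1] - 1.0) - (u / obs_u[-1] - 1.0)
                for p, u in zip(obs_p, obs_u)]

    # Forward cumulative return from minute end_x + 1 to the close,
    # also universe-relative; the skipped minute leaves time to trade.
    gain = (prices[-1] / prices[end_x + 1] - 1.0) \
         - (universe[-1] / universe[end_x + 1] - 1.0)

    # Label: +1 rise, -1 fall, 0 insignificant move (threshold in bps).
    thresh = bps / 10_000.0
    label = 1 if gain > thresh else (-1 if gain < -thresh else 0)
    return features, gain, label
```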
Figure 3: Histogram of daily returns comparing same classifier across test periods. (a) Neural Classifier; (b) Logistic Classifier.
Figure 4: Histogram of trade precision comparing same classifier across test periods. (a) Neural Classifier; (b) Logistic Classifier.
Figure 3 provides a histogram of returns for each learning algorithm using bins of two basis points. In both cases, the distribution of returns in the early period is narrower and centered in positive territory, while in the late period the distribution is broader and zero-centered, suggesting that returns in the early period are more positive and more reliable. The logistic classifier, while providing slightly higher returns in the early period, has a wider variance of returns in both periods.

Accuracy of Trading Decisions: We have primarily focused on the potential monetary returns of the reference classifiers across time, as is common in financial market applications. In the field of Artificial Intelligence it is customary to evaluate classifier accuracy.

Our method employs two classifiers together to make a single decision for each symbol in our universe each day. Because no action is taken on the output of a single classifier, we do not report classifier precision. Instead we report “trade precision”, which is the precision of our system’s decisions on each day the market is open.

We present the distribution of daily “trade precision” in Figure 4. Note that we do not assess false negative decisions (or “recall”). Our method allocates the same total dollar amount long and short regardless of the number of symbols selected, so there is little harm in omitting potentially profitable symbols (Type II error), only in including unprofitable symbols (Type I error). Because we evaluate the classifiers with universe-relative returns, we would expect a random classifier to have precision 0.5, and indeed it does. Compared to this, both learning methods show substantially better-than-random discrimination and a relatively narrow distribution in the early time period, but not in the late time period.

Figure 5 shows the daily and cumulative returns of all three model types.
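The per-day trade-precision measure just described reduces to a simple count: a decision is a true positive when a long allocation rose or a short allocation fell in universe-relative terms. A sketch (argument names are ours):

```python
def trade_precision(decisions, rel_returns):
    """Precision of one day's trade decisions.

    decisions: dict mapping symbol -> +1 (long) or -1 (short);
    rel_returns: dict mapping symbol -> realized universe-relative
    return for the trade horizon. False negatives (untraded symbols)
    are deliberately not assessed, matching the text.
    """
    if not decisions:
        return float("nan")  # no trades on a "conflict of opinion" day
    tp = sum(1 for sym, side in decisions.items()
             if side * rel_returns[sym] > 0)
    return tp / len(decisions)
```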
Figure 5: Daily returns (bps) and cumulative returns for the Random, Logistic, and Neural Net models across the test period.
Figure 6: Correlation between HFT Ratio and Model Return before/after volume peak. (a) Through Sep 2008; (b) Oct 2008 onward.
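The relationship plotted in Figure 6 is an ordinary correlation between the monthly HFT share of total volume and the model's monthly return. A self-contained Pearson-correlation sketch (the input series here would be the hypothetical monthly values, not data from the paper):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series,
    e.g. monthly HFT volume ratio vs. model monthly return."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```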
Across the entire test period from February 2003 to December 2017, the initial performance of both learning algorithms is strong, though they appear to already be in decline as our test period begins. (Earlier data was not available to us.) Around the date of peak universe volume (October 10, 2008) both methods begin to produce noisy, flat returns.

Discussion and Future Work

We present objective measures of market efficiency, namely two reference machine learning trading methods that operate intra-day (Neural Networks and Logistic Regression). We show that both of these trading strategies provide substantial profit from 2003 to 2008, but that their profitability degrades gradually over that time. We further show that a Random Classifier control model trained and evaluated using the same process, method, and data has effectively zero returns throughout the entire study period.

This degradation in profitability for the same flexible learning model suggests a corresponding increase in weak-form market efficiency over the same period. What it does not address is any underlying cause.

Relationship of HFT Volume to Weak Efficiency: Recall that we assert that if a trading algorithm can profit using only historical price data, the market must be weak-form inefficient. Conversely, if that same algorithm is unable to find profit at a different time, we infer that the market is relatively more efficient. We concede that there likely exist other algorithms that can extract profit even when ours cannot. Accordingly, we only make claims regarding comparative efficiency.

Consider the prevalence of HFT (see Figure 7). According to Avramovic’s data, there was almost no HFT volume in
numerous trading agents select and execute strategies from a flexible policy space, with analysis held until the market reaches competitive equilibrium, for example as used by the